首页IT科技Implementing class(Implementing Regular Expressions)

Implementing class(Implementing Regular Expressions)

时间2025-04-29 18:00:36分类IT科技浏览4654
导读:Implementing Regular Expressions Russ Cox...

Implementing Regular Expressions

Russ Cox

rsc@swtch.com

This page collects resources about

implementing regular expression search

efficiently.

Articles and Notes

“Regular Expression Matching Can Be Simple And Fast           ”

“Regular Expression Matching: the Virtual Machine Approach                 ”

An introduction to submatch tracking during

efficient (non-backtracking) NFA-based regular expression matching.

Supporting programs:

http://code.google.com/p/re1/

“Regular Expression Matching in the Wild      ”

A tour of RE2, an efficient, production regular expression implementation.

Supporting programs:

http://code.google.com/p/re2/

“Regular Expression Matching with a Trigram Index           ”

How Google Code Search worked.

Supporting programs:

http://code.google.com/p/codesearch/

“IBM 7094 Cheat Sheet                ”

If you want to read Ken Thompsons original 1968 paper

(see below),

youll want to take this with you.

“Regular Expressions: Languages, Algorithms, and Software      ”

by Brian W. Kernighan and Rob Pike

The cleanest, simplest, backtracking implementation

youll ever see.

(But still backtracking, so runs very slowly on some inputs.)

See also Chapter 9 of The Practice of Programming

and Chapter 1 of Beautiful Code.

Efficient Implementations

RE2 regular expression library

Efficient automaton-based implementation of

Perl-syntax regular expressions (excluding backreferences).

Written by Russ Cox.

Plan9 regular expression library

Efficient (non-backtracking) NFA implementation

with submatch tracking.

Accepts UTF-8 and wide-character Unicode input.

Traditional egrep syntax only.

Written by Rob Pike.

TRE regular expression library

Efficient (non-backtracking) NFA implementation

with submatch tracking.

POSIX-compliant, wide-character input.

Written by Ville Laurikari.

Plan9 grep

Very fast DFA-based grep(1) implementation.

Accepts UTF-8 input.

Traditional egrep syntax only.

Written by Ken Thompson.

References

Michael Rabin and Dana Scott,

“Finite automata and their decision problems,      ”

IBM Journal of Research and Development 3 (1959), pp.114–125.

http://www.research.ibm.com/journal/rd/032/ibmrd0302C.pdf

Introduces the concept of NFAs, shows they are equivalent in power to DFAs.

Won Rabin and Scott a Turing Award

Ken Thompson,

“Regular expression search algorithm,                ”

Communications of the ACM 11(6) (June 1968), pp.419–422.

http://doi.acm.org/10.1145/363347.363387

(PDF)

The first efficient regular expression search.

Four pages but dense: every time I read it I learn something new.

Take an IBM 7094 cheat sheet with you.

See also old slides

from a lecture about the paper.

Ville Laurikari,

“NFAs with Tagged Transitions,

their Conversion to Deterministic Automata

and

Application to Regular Expressions,           ”

in Proceedings of the Symposium on String Processing and

Information Retrieval, September 2000.

http://laurikari.net/ville/spire2000-tnfa.ps

How to track submatches during efficient (non-backtracking)

NFA-based regular expression search.

M. Douglas McIlroy,

“Enumerating the strings of regular languages,      ”

Journal of Functional Programming 14 (2004), pp.503–518.

http://www.cs.dartmouth.edu/~doug/nfa.ps.gz (preprint)

Not relevant to implementing regular expression matching,

but a beautiful example of regular expression analysis

and manipulation using Haskell.
声明:本站所有文章           ,如无特殊说明或标注                 ,均为本站原创发布           。任何个人或组织      ,在未征得本站同意时     ,禁止复制           、盗用                 、采集      、发布本站内容到任何网站           、书籍等各类媒体平台                 。如若本站内容侵犯了原著者的合法权益                 ,可联系我们进行处理      。

创心域SEO版权声明:以上内容作者已申请原创保护,未经允许不得转载,侵权必究!授权事宜、对本内容有异议或投诉,敬请联系网站管理员,我们将尽快回复您,谢谢合作!

展开全文READ MORE
a breakdown of the factors(A breakdown pie chart ReportLab Snippets (Beta)) cnblogs博客园(STL MainTao 博客园)