Implementing class(Implementing Regular Expressions)
Implementing Regular Expressions
Russ Cox
rsc@swtch.com
This page collects resources about
implementing regular expression search
efficiently.Articles and Notes
“Regular Expression Matching Can Be Simple And Fast ”
“Regular Expression Matching: the Virtual Machine Approach ”
An introduction to submatch tracking during
efficient (non-backtracking) NFA-based regular expression matching.Supporting programs:
http://code.google.com/p/re1/“Regular Expression Matching in the Wild”
A tour of RE2, an efficient, production regular expression implementation.
Supporting programs:
http://code.google.com/p/re2/“Regular Expression Matching with a Trigram Index ”
How Google Code Search worked.
Supporting programs:
http://code.google.com/p/codesearch/“IBM 7094 Cheat Sheet ”
If you want to read Ken Thompsons original 1968 paper
(see below),
youll want to take this with you.“Regular Expressions: Languages, Algorithms, and Software ”
by Brian W. Kernighan and Rob Pike
The cleanest, simplest, backtracking implementation
youll ever see.(But still backtracking, so runs very slowly on some inputs.)
See also Chapter 9 of The Practice of Programming
and Chapter 1 of Beautiful Code.Efficient Implementations
RE2 regular expression library
Efficient automaton-based implementation of
Perl-syntax regular expressions (excluding backreferences).
Written by Russ Cox.Plan9 regular expression library
Efficient (non-backtracking) NFA implementation
with submatch tracking.
Accepts UTF-8 and wide-character Unicode input.
Traditional egrep syntax only.
Written by Rob Pike.TRE regular expression library
Efficient (non-backtracking) NFA implementation
with submatch tracking.
POSIX-compliant, wide-character input.
Written by Ville Laurikari.Plan9 grep
Very fast DFA-based grep(1) implementation.
Accepts UTF-8 input.
Traditional egrep syntax only.
Written by Ken Thompson.References
Michael Rabin and Dana Scott,
“Finite automata and their decision problems, ”
IBM Journal of Research and Development 3 (1959), pp.114–125.
http://www.research.ibm.com/journal/rd/032/ibmrd0302C.pdfIntroduces the concept of NFAs, shows they are equivalent in power to DFAs.
Won Rabin and Scott a Turing AwardKen Thompson,
“Regular expression search algorithm, ”
Communications of the ACM 11(6) (June 1968), pp.419–422.
http://doi.acm.org/10.1145/363347.363387
(PDF)The first efficient regular expression search.
Four pages but dense: every time I read it I learn something new.
Take an IBM 7094 cheat sheet with you.
See also old slides
from a lecture about the paper.Ville Laurikari,
“NFAs with Tagged Transitions,
their Conversion to Deterministic Automata
and
Application to Regular Expressions, ”
in Proceedings of the Symposium on String Processing and
Information Retrieval, September 2000.
http://laurikari.net/ville/spire2000-tnfa.psHow to track submatches during efficient (non-backtracking)
NFA-based regular expression search.M. Douglas McIlroy,
“Enumerating the strings of regular languages, ”
Journal of Functional Programming 14 (2004), pp.503–518.
http://www.cs.dartmouth.edu/~doug/nfa.ps.gz (preprint)Not relevant to implementing regular expression matching,
but a beautiful example of regular expression analysis
and manipulation using Haskell.创心域SEO版权声明:以上内容作者已申请原创保护,未经允许不得转载,侵权必究!授权事宜、对本内容有异议或投诉,敬请联系网站管理员,我们将尽快回复您,谢谢合作!