首页IT科技Implementing class(Implementing Regular Expressions)

Implementing class(Implementing Regular Expressions)

时间2025-08-29 17:47:25分类IT科技浏览6194
导读:Implementing Regular Expressions Russ Cox...

Implementing Regular Expressions

Russ Cox

rsc@swtch.com

This page collects resources about

implementing regular expression search

efficiently.

Articles and Notes

“Regular Expression Matching Can Be Simple And Fast               ”

“Regular Expression Matching: the Virtual Machine Approach                       ”

An introduction to submatch tracking during

efficient (non-backtracking) NFA-based regular expression matching.

Supporting programs:

http://code.google.com/p/re1/

“Regular Expression Matching in the Wild        ”

A tour of RE2, an efficient, production regular expression implementation.

Supporting programs:

http://code.google.com/p/re2/

“Regular Expression Matching with a Trigram Index               ”

How Google Code Search worked.

Supporting programs:

http://code.google.com/p/codesearch/

“IBM 7094 Cheat Sheet                      ”

If you want to read Ken Thompsons original 1968 paper

(see below),

youll want to take this with you.

“Regular Expressions: Languages, Algorithms, and Software        ”

by Brian W. Kernighan and Rob Pike

The cleanest, simplest, backtracking implementation

youll ever see.

(But still backtracking, so runs very slowly on some inputs.)

See also Chapter 9 of The Practice of Programming

and Chapter 1 of Beautiful Code.

Efficient Implementations

RE2 regular expression library

Efficient automaton-based implementation of

Perl-syntax regular expressions (excluding backreferences).

Written by Russ Cox.

Plan9 regular expression library

Efficient (non-backtracking) NFA implementation

with submatch tracking.

Accepts UTF-8 and wide-character Unicode input.

Traditional egrep syntax only.

Written by Rob Pike.

TRE regular expression library

Efficient (non-backtracking) NFA implementation

with submatch tracking.

POSIX-compliant, wide-character input.

Written by Ville Laurikari.

Plan9 grep

Very fast DFA-based grep(1) implementation.

Accepts UTF-8 input.

Traditional egrep syntax only.

Written by Ken Thompson.

References

Michael Rabin and Dana Scott,

“Finite automata and their decision problems,        ”

IBM Journal of Research and Development 3 (1959), pp.114–125.

http://www.research.ibm.com/journal/rd/032/ibmrd0302C.pdf

Introduces the concept of NFAs, shows they are equivalent in power to DFAs.

Won Rabin and Scott a Turing Award

Ken Thompson,

“Regular expression search algorithm,                      ”

Communications of the ACM 11(6) (June 1968), pp.419–422.

http://doi.acm.org/10.1145/363347.363387

(PDF)

The first efficient regular expression search.

Four pages but dense: every time I read it I learn something new.

Take an IBM 7094 cheat sheet with you.

See also old slides

from a lecture about the paper.

Ville Laurikari,

“NFAs with Tagged Transitions,

their Conversion to Deterministic Automata

and

Application to Regular Expressions,               ”

in Proceedings of the Symposium on String Processing and

Information Retrieval, September 2000.

http://laurikari.net/ville/spire2000-tnfa.ps

How to track submatches during efficient (non-backtracking)

NFA-based regular expression search.

M. Douglas McIlroy,

“Enumerating the strings of regular languages,        ”

Journal of Functional Programming 14 (2004), pp.503–518.

http://www.cs.dartmouth.edu/~doug/nfa.ps.gz (preprint)

Not relevant to implementing regular expression matching,

but a beautiful example of regular expression analysis

and manipulation using Haskell.
声明:本站所有文章               ,如无特殊说明或标注                       ,均为本站原创发布               。任何个人或组织        ,在未征得本站同意时       ,禁止复制               、盗用                       、采集        、发布本站内容到任何网站               、书籍等各类媒体平台                       。如若本站内容侵犯了原著者的合法权益                       ,可联系我们进行处理        。

创心域SEO版权声明:以上内容作者已申请原创保护,未经允许不得转载,侵权必究!授权事宜、对本内容有异议或投诉,敬请联系网站管理员,我们将尽快回复您,谢谢合作!

展开全文READ MORE
css样式表代码(CSS样式快速入门) lambda表达式实现原理(lambda表达是应用实践)