A Brief Guide to Regular Expressions

7 min read · Jun 19, 2023

Regular expressions are a DSL (domain-specific language) used to match patterns in text. They are everywhere: all major programming languages ship with a built-in library that supports some flavor of regular expressions. This article explores the origins of regular expressions and presents their basic syntactic variations.
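As a quick illustration of what "matching patterns in text" looks like in practice, here is a minimal sketch using Python's built-in re module. The sample text and the date pattern are hypothetical and chosen only for this example.

```python
import re

# Hypothetical sample text and pattern, used only for illustration.
text = "Released on 2023-06-19, updated on 2023-07-02."

# Four digits, a dash, two digits, a dash, two digits (YYYY-MM-DD shape).
# Note: this checks the shape only, not whether the date is valid.
date_pattern = re.compile(r"\d{4}-\d{2}-\d{2}")

# findall returns every non-overlapping match as a list of strings.
dates = date_pattern.findall(text)
print(dates)  # ['2023-06-19', '2023-07-02']
```

The same pattern would work with small or no changes in most other regular expression flavors, although escaping rules and supported shorthand classes differ between engines.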

Origins of Regular Expressions

The theoretical background of regular expressions lies in automata theory and formal languages. Regular expressions describe the regular languages, which are generated by Type-3 grammars in the Chomsky hierarchy. This hierarchy, described by Chomsky in 1956 [1], categorizes the grammars that describe formal languages.

In 1943, Warren McCulloch and Walter Pitts described the human neural system using automata [2]. The mathematician Stephen Kleene described the proposed models with a mathematical notation named regular sets [3]. Later, Brzozowski [4] provided mathematical definitions for Kleene's regular expression formalism and introduced ways to convert regular expressions into state diagrams. In the late 1960s, Ken Thompson proposed a compiler that translated a regular expression into the assembly language of the IBM 7094 [5]. He later implemented this notation in his text editor qed, and afterwards in ed, which became part of the UNIX distribution. From then on, regular expressions were widely used in almost all UNIX variants.
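To make the link between a regular expression and a state diagram concrete, the sketch below hand-writes a two-state deterministic automaton for the expression ab* and checks it against Python's re module. This is an illustrative table written by hand, not Brzozowski's construction or Thompson's compiler; the names TRANSITIONS, ACCEPTING and dfa_accepts are hypothetical.

```python
import re

# Hand-written state diagram for the regular expression ab*.
# State 0 is the start state, state 1 is the only accepting state.
# Any (state, character) pair missing from the table means "reject".
TRANSITIONS = {
    (0, "a"): 1,  # the mandatory leading 'a'
    (1, "b"): 1,  # zero or more trailing 'b' characters
}
ACCEPTING = {1}

def dfa_accepts(s: str) -> bool:
    """Return True if the automaton ends in an accepting state on s."""
    state = 0
    for ch in s:
        state = TRANSITIONS.get((state, ch))
        if state is None:  # no transition defined: reject immediately
            return False
    return state in ACCEPTING

samples = ["a", "abbb", "", "ba", "abab"]
results = {s: dfa_accepts(s) for s in samples}

# Cross-check the hand-built automaton against the library engine.
agrees = all(dfa_accepts(s) == bool(re.fullmatch(r"ab*", s)) for s in samples)
print(results, agrees)
```

Each input character triggers exactly one table lookup, which is why automaton-based matching runs in time proportional to the length of the input.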

Ehrenfeucht and Zeiger [6] provided the academic community with four metrics that measured the complexity of regular expressions. In 1980, Ernst Leiss [7] presented an algorithm for constructing a finite automaton from a given regular expression, and two years later Floyd and Ullman [8] proposed an approach for compiling regular expressions into integrated circuits. Sensory Networks [Net04] constructed a content classification accelerator that implements a regular expression engine in hardware.

Regular expressions have also become a standard feature in many programming languages. Programming languages like Java, Ruby and Python contain regular expression engines as part of their API, and, as shown in Section 3, Perl also defines a set of operators and integrates regular expressions into its syntax [9]. In addition, Perl permits a grammar…

Written by Vassilios Karakoidas
