{"id":25,"date":"2011-06-10T01:16:09","date_gmt":"2011-06-10T06:16:09","guid":{"rendered":"http:\/\/www.playwithlua.com\/?p=25"},"modified":"2020-05-22T13:25:06","modified_gmt":"2020-05-22T18:25:06","slug":"a-toy-regular-expression-matcher","status":"publish","type":"post","link":"http:\/\/www.playwithlua.com\/?p=25","title":{"rendered":"A Toy Regular-Expression Matcher"},"content":{"rendered":"<p>There&#8217;s a book I have, <em><a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/www.amazon.com\/gp\/product\/0596510047\/ref=as_li_tf_tl?ie=UTF8&amp;tag=plwilu-20&amp;linkCode=as2&amp;camp=217153&amp;creative=399705&amp;creativeASIN=0596510047');\"  href=\"http:\/\/www.amazon.com\/gp\/product\/0596510047\/ref=as_li_tf_tl?ie=UTF8&amp;tag=plwilu-20&amp;linkCode=as2&amp;camp=217153&amp;creative=399705&amp;creativeASIN=0596510047\">Beautiful Code<\/a><img loading=\"lazy\" style=\"border: none !important; margin: 0px !important;\" src=\"http:\/\/www.assoc-amazon.com\/e\/ir?t=plwilu-20&amp;l=as2&amp;o=1&amp;a=0596510047&amp;camp=217153&amp;creative=399705\" border=\"0\" alt=\"\" width=\"1\" height=\"1\" \/><\/em>, which is a bunch of essays about programming by various famous programmers. It&#8217;s a great book. The chapters are on all different topics; the only thing they have in common is a piece of code that makes you think &#8220;wow, that&#8217;s really clever&#8221;. This post is about chapter one of that book, which is about a regular expression matcher.<\/p>\n<p>Regular expressions are a domain-specific language for describing finite state machines. Specifically, finite state machines where the state transitions are successive characters in a string: you can use them to check if input strings match certain formats, like, a phone number: three digits followed by a dash followed by four digits. Kernighan&#8217;s matcher, which I&#8217;m going to translate to Lua and then extend a little bit, is really compact and beautiful. It&#8217;s about thirty lines, and recognizes these tokens:<\/p>\n<ul>\n<li>A dot (&#8220;.&#8221;) matches any single character<\/li>\n<li>A caret (&#8220;^&#8221;) matches the beginning of the string<\/li>\n<li>A dollar sign (&#8220;$&#8221;) matches the end of the string<\/li>\n<li>A star (&#8220;*&#8221;) matches zero or more repetitions of the previous token<\/li>\n<li>Any other character (like &#8220;a&#8221;) matches itself.<\/li>\n<\/ul>\n<p>My plan is first to duplicate Kernighan&#8217;s matcher in Lua and then extend it to also handle character sets, like &#8220;[abc]&#8221; would match a single &#8220;a&#8221;, &#8220;b&#8221;, or &#8220;c&#8221;. Because of this, I&#8217;m not going to port his code across directly, but rather make some changes for extensibility.<br \/>\n<!--more--><\/p>\n<h2>Basic matcher<\/h2>\n<p>Lua&#8217;s string metatable already has a match function, so we&#8217;re gonna, uh, destroy it. If you like yours feel free to call this <code>my_match<\/code> or something:<\/p>\n\n<div class=\"my_syntax_box\"><div class=\"my_syntax\"><div class=\"code\"><pre class=\"lua\" style=\"font-family:monospace;\"><span style=\"color: #aa9900; font-weight: bold;\">function<\/span> <span style=\"color: #0000aa;\">string<\/span><span style=\"color: #66cc66;\">.<\/span>match<span style=\"color: #66cc66;\">&#40;<\/span>str<span style=\"color: #66cc66;\">,<\/span> pattern<span style=\"color: #66cc66;\">&#41;<\/span>\n    <span style=\"color: #aa9900; font-weight: bold;\">if<\/span> pattern<span style=\"color: #66cc66;\">:<\/span>at<span style=\"color: #66cc66;\">&#40;<\/span><span style=\"color: #cc66cc;\">1<\/span><span style=\"color: #66cc66;\">&#41;<\/span> <span style=\"color: #66cc66;\">==<\/span> <span style=\"color: #ff6666;\">&quot;^&quot;<\/span> <span style=\"color: #aa9900; font-weight: bold;\">then<\/span>\n        <span style=\"color: #aa9900; font-weight: bold;\">return<\/span> matchhere<span style=\"color: #66cc66;\">&#40;<\/span>str<span style=\"color: #66cc66;\">,<\/span> pattern<span style=\"color: #66cc66;\">:<\/span>sub<span style=\"color: #66cc66;\">&#40;<\/span><span style=\"color: #cc66cc;\">2<\/span><span style=\"color: #66cc66;\">&#41;<\/span><span style=\"color: #66cc66;\">&#41;<\/span>\n    <span style=\"color: #aa9900; font-weight: bold;\">else<\/span>\n        <span style=\"color: #aa9900; font-weight: bold;\">repeat<\/span>\n            <span style=\"color: #aa9900; font-weight: bold;\">if<\/span> matchhere<span style=\"color: #66cc66;\">&#40;<\/span>str<span style=\"color: #66cc66;\">,<\/span> pattern<span style=\"color: #66cc66;\">&#41;<\/span> <span style=\"color: #aa9900; font-weight: bold;\">then<\/span> <span style=\"color: #aa9900; font-weight: bold;\">return<\/span> <span style=\"color: #aa9900;\">true<\/span> <span style=\"color: #aa9900; font-weight: bold;\">end<\/span>\n            str <span style=\"color: #66cc66;\">=<\/span> str<span style=\"color: #66cc66;\">:<\/span>sub<span style=\"color: #66cc66;\">&#40;<\/span><span style=\"color: #cc66cc;\">2<\/span><span style=\"color: #66cc66;\">&#41;<\/span>\n        <span style=\"color: #aa9900; font-weight: bold;\">until<\/span> str <span style=\"color: #66cc66;\">==<\/span> <span style=\"color: #ff6666;\">&quot;&quot;<\/span>\n    <span style=\"color: #aa9900; font-weight: bold;\">end<\/span>\n<span style=\"color: #aa9900; font-weight: bold;\">end<\/span><\/pre><\/div><\/div><\/div>\n\n<p>This is pretty straightforward, although you can see it calls a couple utility functions we&#8217;ll have to write. <code>matchhere<\/code> tries to find a match starting at the beginning of the string, so, if our pattern starts with a caret, we cut it off and try to match our pattern once and return. If it doesn&#8217;t, we enter a loop where we attempt to find a match starting at every character, until we find one or the string is empty. The code for <code>matchhere<\/code> is a little more complicated but not much:<\/p>\n\n<div class=\"my_syntax_box\"><div class=\"my_syntax\"><div class=\"code\"><pre class=\"lua\" style=\"font-family:monospace;\"><span style=\"color: #aa9900; font-weight: bold;\">function<\/span> matchhere<span style=\"color: #66cc66;\">&#40;<\/span>str<span style=\"color: #66cc66;\">,<\/span> pattern<span style=\"color: #66cc66;\">&#41;<\/span>\n    <span style=\"color: #808080; font-style: italic;\">-- Everything matches an empty pattern<\/span>\n    <span style=\"color: #aa9900; font-weight: bold;\">if<\/span> pattern <span style=\"color: #66cc66;\">==<\/span> <span style=\"color: #ff6666;\">&quot;&quot;<\/span> <span style=\"color: #aa9900; font-weight: bold;\">then<\/span> <span style=\"color: #aa9900; font-weight: bold;\">return<\/span> <span style=\"color: #aa9900;\">true<\/span> <span style=\"color: #aa9900; font-weight: bold;\">end<\/span>\n&nbsp;\n    <span style=\"color: #808080; font-style: italic;\">-- Try to match end-of-string<\/span>\n    <span style=\"color: #aa9900; font-weight: bold;\">if<\/span> pattern <span style=\"color: #66cc66;\">==<\/span> <span style=\"color: #ff6666;\">&quot;$&quot;<\/span> <span style=\"color: #aa9900; font-weight: bold;\">then<\/span> <span style=\"color: #aa9900; font-weight: bold;\">return<\/span> str <span style=\"color: #66cc66;\">==<\/span> <span style=\"color: #ff6666;\">&quot;&quot;<\/span> <span style=\"color: #aa9900; font-weight: bold;\">end<\/span>\n&nbsp;\n    <span style=\"color: #aa9900; font-weight: bold;\">local<\/span> first_token<span style=\"color: #66cc66;\">,<\/span> modifier<span style=\"color: #66cc66;\">,<\/span> after <span style=\"color: #66cc66;\">=<\/span> pattern<span style=\"color: #66cc66;\">:<\/span>at<span style=\"color: #66cc66;\">&#40;<\/span><span style=\"color: #cc66cc;\">1<\/span><span style=\"color: #66cc66;\">&#41;<\/span><span style=\"color: #66cc66;\">,<\/span> pattern<span style=\"color: #66cc66;\">:<\/span>at<span style=\"color: #66cc66;\">&#40;<\/span><span style=\"color: #cc66cc;\">2<\/span><span style=\"color: #66cc66;\">&#41;<\/span><span style=\"color: #66cc66;\">,<\/span> <span style=\"color: #cc66cc;\">2<\/span>\n    <span style=\"color: #aa9900; font-weight: bold;\">if<\/span> modifier <span style=\"color: #66cc66;\">==<\/span> <span style=\"color: #ff6666;\">&quot;*&quot;<\/span> <span style=\"color: #aa9900; font-weight: bold;\">then<\/span> after <span style=\"color: #66cc66;\">=<\/span> <span style=\"color: #cc66cc;\">3<\/span> <span style=\"color: #aa9900; font-weight: bold;\">end<\/span>\n&nbsp;\n    <span style=\"color: #aa9900; font-weight: bold;\">if<\/span> modifier <span style=\"color: #66cc66;\">==<\/span> <span style=\"color: #ff6666;\">&quot;*&quot;<\/span> <span style=\"color: #aa9900; font-weight: bold;\">then<\/span>\n        <span style=\"color: #aa9900; font-weight: bold;\">return<\/span> matchstar<span style=\"color: #66cc66;\">&#40;<\/span>str<span style=\"color: #66cc66;\">,<\/span> first_token<span style=\"color: #66cc66;\">,<\/span> pattern<span style=\"color: #66cc66;\">:<\/span>sub<span style=\"color: #66cc66;\">&#40;<\/span>after<span style=\"color: #66cc66;\">&#41;<\/span><span style=\"color: #66cc66;\">&#41;<\/span>\n&nbsp;\n    <span style=\"color: #aa9900; font-weight: bold;\">elseif<\/span> matches<span style=\"color: #66cc66;\">&#40;<\/span>str<span style=\"color: #66cc66;\">:<\/span>at<span style=\"color: #66cc66;\">&#40;<\/span><span style=\"color: #cc66cc;\">1<\/span><span style=\"color: #66cc66;\">&#41;<\/span><span style=\"color: #66cc66;\">,<\/span> first_token<span style=\"color: #66cc66;\">&#41;<\/span> <span style=\"color: #aa9900; font-weight: bold;\">then<\/span>\n        <span style=\"color: #aa9900; font-weight: bold;\">return<\/span> matchhere<span style=\"color: #66cc66;\">&#40;<\/span>str<span style=\"color: #66cc66;\">:<\/span>sub<span style=\"color: #66cc66;\">&#40;<\/span><span style=\"color: #cc66cc;\">2<\/span><span style=\"color: #66cc66;\">&#41;<\/span><span style=\"color: #66cc66;\">,<\/span> pattern<span style=\"color: #66cc66;\">:<\/span>sub<span style=\"color: #66cc66;\">&#40;<\/span>after<span style=\"color: #66cc66;\">&#41;<\/span><span style=\"color: #66cc66;\">&#41;<\/span>\n    <span style=\"color: #aa9900; font-weight: bold;\">end<\/span>\n<span style=\"color: #aa9900; font-weight: bold;\">end<\/span><\/pre><\/div><\/div><\/div>\n\n<p>We first handle the two degenerate cases, matching against an empty pattern or an empty string. After that we want to find the token we&#8217;re going to match against (a single letter or a dot) and its modifier (a star or, well, not a star). The <code>after<\/code> variable is used to try to find the portion of the pattern left over after this token\/modifier tuple. This is a difference between what I wrote and Kernighan&#8217;s code: he assumes that the tokens are one character long (either a char or a dot) and I plan to add character classes, so I can&#8217;t.<\/p>\n<p>After that we either call another helper function (<code>matchstar<\/code>) or recurse. Since we directly return the result of the recursive call, this is actually not horrible: Lua supports tail-call elimination so this won&#8217;t blow out the stack. More on that later though.<\/p>\n<p>The <code>matches<\/code> function is really simple so I won&#8217;t bother writing this first version, instead let&#8217;s look at <code>matchstar<\/code>:<\/p>\n\n<div class=\"my_syntax_box\"><div class=\"my_syntax\"><div class=\"code\"><pre class=\"lua\" style=\"font-family:monospace;\"><span style=\"color: #aa9900; font-weight: bold;\">function<\/span> matchstar<span style=\"color: #66cc66;\">&#40;<\/span>str<span style=\"color: #66cc66;\">,<\/span> token<span style=\"color: #66cc66;\">,<\/span> rest<span style=\"color: #66cc66;\">&#41;<\/span>\n    <span style=\"color: #aa9900; font-weight: bold;\">while<\/span> <span style=\"color: #aa9900;\">true<\/span> <span style=\"color: #aa9900; font-weight: bold;\">do<\/span>\n        <span style=\"color: #aa9900; font-weight: bold;\">if<\/span> matchhere<span style=\"color: #66cc66;\">&#40;<\/span>str<span style=\"color: #66cc66;\">,<\/span> rest<span style=\"color: #66cc66;\">&#41;<\/span> <span style=\"color: #aa9900; font-weight: bold;\">then<\/span> <span style=\"color: #aa9900; font-weight: bold;\">return<\/span> <span style=\"color: #aa9900;\">true<\/span> <span style=\"color: #aa9900; font-weight: bold;\">end<\/span>\n&nbsp;\n        <span style=\"color: #aa9900; font-weight: bold;\">if<\/span> <span style=\"color: #aa9900; font-weight: bold;\">not<\/span> matches<span style=\"color: #66cc66;\">&#40;<\/span>str<span style=\"color: #66cc66;\">:<\/span>at<span style=\"color: #66cc66;\">&#40;<\/span><span style=\"color: #cc66cc;\">1<\/span><span style=\"color: #66cc66;\">&#41;<\/span><span style=\"color: #66cc66;\">,<\/span> token<span style=\"color: #66cc66;\">&#41;<\/span> <span style=\"color: #aa9900; font-weight: bold;\">or<\/span> str <span style=\"color: #66cc66;\">==<\/span> <span style=\"color: #ff6666;\">&quot;&quot;<\/span> <span style=\"color: #aa9900; font-weight: bold;\">then<\/span> <span style=\"color: #aa9900; font-weight: bold;\">break<\/span> <span style=\"color: #aa9900; font-weight: bold;\">end<\/span>\n        str <span style=\"color: #66cc66;\">=<\/span> str<span style=\"color: #66cc66;\">:<\/span>sub<span style=\"color: #66cc66;\">&#40;<\/span><span style=\"color: #cc66cc;\">2<\/span><span style=\"color: #66cc66;\">&#41;<\/span> <span style=\"color: #808080; font-style: italic;\">-- Next char<\/span>\n    <span style=\"color: #aa9900; font-weight: bold;\">end<\/span>\n<span style=\"color: #aa9900; font-weight: bold;\">end<\/span><\/pre><\/div><\/div><\/div>\n\n<p>We take a string, a token that&#8217;s allowed to repeat, and a tail of a pattern. In a loop, we try to match the string to the tail. If we can&#8217;t, and the string starts with the allowed-repeatable token, then that&#8217;s probably why, so we strip that off and try again.<\/p>\n<p>So that&#8217;s everything you need. It can use up a few stack frames, but because it&#8217;s tail-recursive it only uses one extra one per star in the pattern, and there usually aren&#8217;t many of those. So now let&#8217;s add character classes to it.<\/p>\n<h2>Character classes<\/h2>\n<p>A character class is a set of characters inside brackets, like &#8220;[abc]&#8221;. The intent is for a character class to act like one token, but match any letter in the class, so &#8220;[abc]&#8221; would match &#8220;a&#8221; but not &#8220;d&#8221;, and &#8220;[01]*&#8221; would match &#8220;101010&#8221; but not &#8220;01210&#8221;. In order to add these, we have to modify two functions, <code>matches<\/code> (which we haven&#8217;t seen yet) and <code>matchhere<\/code> (where it finds <code>first_token<\/code> and <code>after<\/code>). Let&#8217;s do <code>matches<\/code> first:<\/p>\n\n<div class=\"my_syntax_box\"><div class=\"my_syntax\"><div class=\"code\"><pre class=\"lua\" style=\"font-family:monospace;\"><span style=\"color: #aa9900; font-weight: bold;\">function<\/span> matches<span style=\"color: #66cc66;\">&#40;<\/span>ch<span style=\"color: #66cc66;\">,<\/span> token<span style=\"color: #66cc66;\">&#41;<\/span>\n    <span style=\"color: #aa9900; font-weight: bold;\">if<\/span> token<span style=\"color: #66cc66;\">.<\/span>chars <span style=\"color: #aa9900; font-weight: bold;\">then<\/span>\n        <span style=\"color: #aa9900; font-weight: bold;\">return<\/span> token<span style=\"color: #66cc66;\">.<\/span>chars<span style=\"color: #66cc66;\">:<\/span>find<span style=\"color: #66cc66;\">&#40;<\/span>ch<span style=\"color: #66cc66;\">&#41;<\/span>\n    <span style=\"color: #aa9900; font-weight: bold;\">else<\/span>\n        <span style=\"color: #aa9900; font-weight: bold;\">return<\/span> token <span style=\"color: #66cc66;\">==<\/span> <span style=\"color: #ff6666;\">&quot;.&quot;<\/span> <span style=\"color: #aa9900; font-weight: bold;\">or<\/span> token <span style=\"color: #66cc66;\">==<\/span> ch\n    <span style=\"color: #aa9900; font-weight: bold;\">end<\/span>\n<span style=\"color: #aa9900; font-weight: bold;\">end<\/span><\/pre><\/div><\/div><\/div>\n\n<p>We&#8217;re going to say, now, that a token is just a string containing a single character, or a table <code>{chars = \"...\"}<\/code> containing a set of characters (this is to remove the ambiguity between &#8220;.&#8221; and &#8220;[.]&#8221;; we want the latter to only match a literal dot). Now we just need a way to pull the first token, whatever it is, off a pattern:<\/p>\n\n<div class=\"my_syntax_box\"><div class=\"my_syntax\"><div class=\"code\"><pre class=\"lua\" style=\"font-family:monospace;\"><span style=\"color: #aa9900; font-weight: bold;\">function<\/span> grabtoken<span style=\"color: #66cc66;\">&#40;<\/span>pattern<span style=\"color: #66cc66;\">&#41;<\/span>\n    <span style=\"color: #aa9900; font-weight: bold;\">local<\/span> token<span style=\"color: #66cc66;\">,<\/span> rest\n&nbsp;\n    <span style=\"color: #aa9900; font-weight: bold;\">if<\/span> pattern<span style=\"color: #66cc66;\">:<\/span>at<span style=\"color: #66cc66;\">&#40;<\/span><span style=\"color: #cc66cc;\">1<\/span><span style=\"color: #66cc66;\">&#41;<\/span> <span style=\"color: #66cc66;\">==<\/span> <span style=\"color: #ff6666;\">&quot;[&quot;<\/span> <span style=\"color: #aa9900; font-weight: bold;\">then<\/span> <span style=\"color: #808080; font-style: italic;\">-- A char class!<\/span>\n        <span style=\"color: #aa9900; font-weight: bold;\">local<\/span> close <span style=\"color: #66cc66;\">=<\/span> pattern<span style=\"color: #66cc66;\">:<\/span>find<span style=\"color: #66cc66;\">&#40;<\/span><span style=\"color: #ff6666;\">&quot;]&quot;<\/span><span style=\"color: #66cc66;\">&#41;<\/span>\n        token <span style=\"color: #66cc66;\">=<\/span> <span style=\"color: #66cc66;\">&#123;<\/span>chars <span style=\"color: #66cc66;\">=<\/span> pattern<span style=\"color: #66cc66;\">:<\/span>sub<span style=\"color: #66cc66;\">&#40;<\/span><span style=\"color: #cc66cc;\">2<\/span><span style=\"color: #66cc66;\">,<\/span> close<span style=\"color: #66cc66;\">-<\/span><span style=\"color: #cc66cc;\">1<\/span><span style=\"color: #66cc66;\">&#41;<\/span><span style=\"color: #66cc66;\">&#125;<\/span>\n        rest <span style=\"color: #66cc66;\">=<\/span> pattern<span style=\"color: #66cc66;\">:<\/span>sub<span style=\"color: #66cc66;\">&#40;<\/span>close<span style=\"color: #66cc66;\">+<\/span><span style=\"color: #cc66cc;\">1<\/span><span style=\"color: #66cc66;\">&#41;<\/span>\n    <span style=\"color: #aa9900; font-weight: bold;\">else<\/span> <span style=\"color: #808080; font-style: italic;\">-- Not a class<\/span>\n        token <span style=\"color: #66cc66;\">=<\/span> pattern<span style=\"color: #66cc66;\">:<\/span>at<span style=\"color: #66cc66;\">&#40;<\/span><span style=\"color: #cc66cc;\">1<\/span><span style=\"color: #66cc66;\">&#41;<\/span>\n        rest <span style=\"color: #66cc66;\">=<\/span> pattern<span style=\"color: #66cc66;\">:<\/span>sub<span style=\"color: #66cc66;\">&#40;<\/span><span style=\"color: #cc66cc;\">2<\/span><span style=\"color: #66cc66;\">&#41;<\/span>\n    <span style=\"color: #aa9900; font-weight: bold;\">end<\/span>\n&nbsp;\n    <span style=\"color: #aa9900; font-weight: bold;\">local<\/span> modifier <span style=\"color: #66cc66;\">=<\/span> rest<span style=\"color: #66cc66;\">:<\/span>at<span style=\"color: #66cc66;\">&#40;<\/span><span style=\"color: #cc66cc;\">1<\/span><span style=\"color: #66cc66;\">&#41;<\/span>\n    <span style=\"color: #aa9900; font-weight: bold;\">if<\/span> modifier <span style=\"color: #66cc66;\">==<\/span> <span style=\"color: #ff6666;\">&quot;*&quot;<\/span> <span style=\"color: #aa9900; font-weight: bold;\">then<\/span> rest <span style=\"color: #66cc66;\">=<\/span> rest<span style=\"color: #66cc66;\">:<\/span>sub<span style=\"color: #66cc66;\">&#40;<\/span><span style=\"color: #cc66cc;\">2<\/span><span style=\"color: #66cc66;\">&#41;<\/span> <span style=\"color: #aa9900; font-weight: bold;\">end<\/span>\n&nbsp;\n    <span style=\"color: #aa9900; font-weight: bold;\">return<\/span> token<span style=\"color: #66cc66;\">,<\/span> modifier<span style=\"color: #66cc66;\">,<\/span> rest\n<span style=\"color: #aa9900; font-weight: bold;\">end<\/span><\/pre><\/div><\/div><\/div>\n\n<p>We look at the first character to tell if this is a class (which will end in a close brace) or a normal character. Once we pull off the token part, we determine what the rest of the pattern is after it, which we may cut a character (a star) off of later. Then we return the token, the char right after the token (in case it&#8217;s a star), and the rest of the pattern.<\/p>\n<p>We can pretty much plug this directly into <code>matchhere<\/code>:<\/p>\n\n<div class=\"my_syntax_box\"><div class=\"my_syntax\"><div class=\"code\"><pre class=\"lua\" style=\"font-family:monospace;\"><span style=\"color: #aa9900; font-weight: bold;\">local<\/span> first_token<span style=\"color: #66cc66;\">,<\/span> modifier<span style=\"color: #66cc66;\">,<\/span> rest <span style=\"color: #66cc66;\">=<\/span> grabtoken<span style=\"color: #66cc66;\">&#40;<\/span>pattern<span style=\"color: #66cc66;\">&#41;<\/span><\/pre><\/div><\/div><\/div>\n\n<p>It returns exactly what we&#8217;d expect. I did change <code>matchhere<\/code> to want the rest of the pattern instead of the index of the rest of the pattern, just to make it a little cleaner.<\/p>\n<h2>Demonstration<\/h2>\n<p>It works pretty much the same way a subset of Lua&#8217;s built-in matcher works. Here&#8217;s an example of using it to determine if a string of binary digits is even:<\/p>\n\n<div class=\"my_syntax_box\"><div class=\"my_syntax\"><div class=\"code\"><pre class=\"lua\" style=\"font-family:monospace;\"><span style=\"color: #0000aa;\">assert<\/span><span style=\"color: #66cc66;\">&#40;<\/span><span style=\"color: #66cc66;\">&#40;<\/span><span style=\"color: #ff6666;\">&quot;10010110100&quot;<\/span><span style=\"color: #66cc66;\">&#41;<\/span><span style=\"color: #66cc66;\">:<\/span>match<span style=\"color: #66cc66;\">&#40;<\/span><span style=\"color: #ff6666;\">&quot;^[01]*0$&quot;<\/span><span style=\"color: #66cc66;\">&#41;<\/span><span style=\"color: #66cc66;\">&#41;<\/span><\/pre><\/div><\/div><\/div>\n\n<p>The real payoff here is in seeing how easy it is to add more features to it. Adding named character classes (&#8220;\\d&#8221; for all digits, say) is easy; more useful would be to add escapes, so we can match a literal dot. More useful and a lot harder would be adding captures. What have you always wished regular expressions would do? Play around with it, see what you can add. The code (with unit tests using <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/gist.github.com\/861635');\"  href=\"http:\/\/gist.github.com\/861635\">lt<\/a>) is <a onclick=\"javascript:pageTracker._trackPageview('\/outgoing\/gist.github.com\/1018317');\"  href=\"http:\/\/gist.github.com\/1018317\">here<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>There&#8217;s a book I have, Beautiful Code, which is a bunch of essays about programming by various famous programmers. It&#8217;s a great book. The chapters are on all different topics; the only thing they have in common is a piece of code that makes you think &#8220;wow, that&#8217;s really clever&#8221;. This post is about chapter [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[],"_links":{"self":[{"href":"http:\/\/www.playwithlua.com\/index.php?rest_route=\/wp\/v2\/posts\/25"}],"collection":[{"href":"http:\/\/www.playwithlua.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.playwithlua.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.playwithlua.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.playwithlua.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=25"}],"version-history":[{"count":0,"href":"http:\/\/www.playwithlua.com\/index.php?rest_route=\/wp\/v2\/posts\/25\/revisions"}],"wp:attachment":[{"href":"http:\/\/www.playwithlua.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=25"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.playwithlua.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=25"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.playwithlua.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=25"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}