Day 8 – Grammars generating grammars

来源:互联网 时间:2015-12-09

By now you have probably gotten used to the prefix “meta” appearing here and there in the Perl 6 world. Metaclasses, metaobjects, metaoperators, the mythical Meta-Object Protocol. Sounds like nothing scary, all good and familiar and you’ve seen it all, eh? Nothing further from the truth! Today, on the Perl 6 Advent Calendar we’re going full meta. We’re going to have grammars that parse grammars then generate grammars that are going to use to parse grammars. I’ll let this sink in for a moment while you scroll down for the next paragraph.

Grammars are hands-down one of the killer features of Perl 6. Taking Perl’s no less than legendary ability to process text and taking it to the next level. Regular expressions, to many people’s disappointment are just what they say they are: regular (well, regular regular expressions do. Perl regexes are a bit of a different beast, but that’s a material for another story). They parse regular, as opposed to, say, context-free languages, to the neverending disappointment of all the people who would just love to parse XML with them, like in that everyone’s favourite Stackoverflow answer. Say no more to silly theoretical limitations of language theory though, since now we have grammars which are all we ever missed in regexes: readability, composability and, of course, ability to parse even Perl 6 itself— and if that doesn’t sell its absolute power, I don’t know what does.

Writing parsers for predefined grammars (say, in Bachus-Naur Form) was always a bit of a dull job, almost as exciting as copypasting stuff. If you ever sat down and wrote a parser from scratch (perhaps while going through the excellent Let’s Build a Compilerbook), you probably recognize the pattern all too well: take a single rule from your grammar, write a subroutine for it, have it call (perhaps recursively) other subroutines similarly defined for other grammar rules, rinse, repeat. Well say no more to that now that we have Perl 6 grammars! In this wonderful new world we need not write subroutines for every token to get the work done. Now we write a grammarclass, where we put tokens, rules and regexes for every symbol in our grammar, and inside them we write regexes (or code) that refer to (perhaps recursively) other symbols in our Perl 6 grammar. Now, if you ever went through both of those alternatives, you will definitely realize how massive of a convenience the grammars are in Perl 6. Instead of painstakingly cranking out repetitive and error-prone code we have a wonderful, declarative way to specify our language, with an impressive collection of utility to get most of our common, boring work out of the way.

But what if we already have a grammar, specified perhaps in the previously mentioned BNF? What we do then is carefully retype the existing grammar (parsing it in our head, actually) into our new, shiny Perl 6 grammar that represents the exact same thing, but has the clear advantage of actually being executable code. A fair deal, you could say. For most men, no big deal at all. Weare not most men. We are programmers. We have the resources. The will. To make these grammars count! Our job revolves around building things that will do the work for us so we don’t have to. To take the well-defined specifications and turn them into tools that work for us, or for themselves. Tools that take the repetitive and automatable parts of our work the way. Why should building parsers be any different? Why, I’m glad you asked.

The wonderful thing about the Perl 6 grammars is that they are no more magical than any other element of the language. Just as classes are first class citizen that we can introspect, augment and build programatically, so are grammars. In fact, you can look at the source code of the compiler itself and notice that grammars are nothing else than a specialized kind of classes. They follow the same rules as classes do, allowing us to create them on the fly, add tokens to them on the fly and eventually finalize them in order to have a proper, instantiatable class object. So now that we can parse BNF grammars (since they’re just ordinary text) and create Perl 6 grammars from code, let’s put those pieces together and write us something that will save us the effort of converting a BNF grammar to Perl 6 grammar manually.

The grammar for a BNF grammar grammar Grammar::BNF { token TOP { /s* <rule>+ /s* } token rule { <opt-ws> '<' <rule-name> '>' <opt-ws> '::=' <opt-ws> <expression> <line-end> } token expression { <list-of-terms> +% [/s* '|' <opt-ws>] } token term { <literal> | '<' <rule-name> '>' } token list-of-terms { <term> +% <opt-ws> } token rule-name { <-[>]>+ } token opt-ws { /h* } token line-end { [ <opt-ws> /n ]+ } token literal { '"' <-["]>* '"' | "'" <-[']>* "'" } ...}

The interesting stuff happens in the three, almost-topmost tokens. ruleis the core building block of a BNF grammar: a <symbol> ::= <expression>pair, followed by a new line. The entire grammar is no more than a list of those. Each expression is a list of terms, or possibly and alternative of them. Each term is either a literal, or a symbol name surrounded by angle brackets. Easy enough! That covers the parsing part. Let’s look at the generating itself. We do have a built-in mechanism of “doing stuff for each token in the grammar”, in the form of Actions, so let’s go ahead and use that:

my class Actions { has $.name = 'BNFGrammar'; method TOP($/) { my $grmr := Metamodel::GrammarHOW.new_type(:$.name); $grmr.^add_method('TOP', EVAL 'token { <' ~ $<rule>[0].ast.key ~ '> }'); for $<rule>.map(*.ast) -> $rule { $grmr.^add_method($rule.key, $rule.value); } $grmr.^compose; make $grmr; } method expression($/) { make EVAL 'token { ' ~ ~$/ ~ ' }'; } method rule($/) { make ~$<rule-name> => $<expression>.ast; }}

The TOPmethod is definitely the most magical and scary, so let’s tackle that first to make the rest of the stuff look trivial in comparison. Basically, three things happen there:

1. We create a new grammar, as a new Perl 6 type2. We add tokens to it using the `^add_method` method3. We finalize the grammar using the `^compose` method

While Perl 6 specifies that the token named TOPis where the parsing starts, in BNF the first rule is always the starting point. To adapt one to the other, we craft a phony TOP token which just calls the first rule specfied in the BNF grammar. Unavoidably, the scary and discouraging EVALcatches our attention, as if it said “horrible stuff happens here!” It’s not entirelywrong when it says that, but since we have no other way of programmatically constructing the individual regexes (that I know of), we’ll have to accept this little discomfort in the name of the Greater Good, and look a little bit closer at what we’re actually EVALing in a moment.

After TOPwe proceed to add the rest of the BNF rules to our grammar, this time preserving their original names, then ^compose()the whole thing and finally makeit the result of the parsing: a readymade parser class.

In the expressionmethod we glue the parsed BNF elements together in order to produce valid Perl 6 code. This turns out to be pretty easy, since both separate symbols with whitespace, alternate them with the pipe character and sorround symbol names with angle brackets. So for a rule that looks like this:

<foo> ::= 'bar' | <baz>

The Perl 6 code that we EVALturns becomes:

token { 'bar' | <baz> }

Since we already validated in the grammar part of our code that the BNF we parse is correct, there’s nothing stopping us from literally pasting the entire expression into our code and wrap it in a token { }, so let’s go ahead and do just that.

Last but not least, for each BNF rule we parse we produce a nice Pair, so our TOPmethod has an easier times processing each of them.

Seems like we’re about done here, but just for the users’ convenience, let’s write a nice method that takes a BNF grammar and produces a ready to use type object for us. As we remember, grammars are just classes, so there’s nothing stopping us from just adding it straight to our grammar:

grammar Grammar::BNF { ... method generate($source, :$name = 'BNFGrammar') { my $actions = Actions.new(:$name); my $ret = self.new.parse($source, :$actions).ast; return $ret.WHAT; }}

Looks good from here! Before you start copypasting all this into your own projects, remember that Grammar::BNFis a Perl 6 module available in your Perl 6 Module Ecosystem, installable with your favourite module manager. It also ships with some goodies like a Perl 6 slang, allowing you to write BNF grammars straight in your Perl 6 code (as opposed to including them as strings), or parsing the more powerful (and perhaps more widespread) ABNF grammars as well.

If you indeed took the time for the beginning of this post to sink in, you may remember that I promised that we’re going to have grammars (a-one) that parse grammars (a-two) then generate grammars (a-three) that are going to use to parse grammars (a-four). So far we’ve seen the BNF::Grammar grammar (that’s our a-one), that parses a BNF grammar (that’s our a-two), generates a Perl 6 grammar in a form of a type object (that’s a-three) and… that’s it. We’re still missing the last part, using this whole thing to parse grammars. We’ve only gone 75%-meta, and that is just not good enough in this day and age. Why stop now? Why not take a BNF grammar of a BNF grammar, parse that with a Perl 6 grammar and use the resulting Perl 6 BNF grammar to parse our original BNF grammar of a BNF grammar? Wouldn’t that be sweet? Sure it will! That is, however, left as an exercise for you, my dear readers. After all, how fair would it be if Ihad all the fun while you just sit there and watch? Would you like it if you opened the little paper window in your advent calendar only to find a note saying. „There was a chocolate here for you. I ate it. It was truly delivious?” Me neither! There’s a BNF grammar for BNF both on Wikipedia and in Grammar::BNF’s very own test suite; the latter even includes a little breadcrumb that can help you with your adventure. I eagerly await thy results, and as always, thank you all for reading and I wish you a wonderful advent!

相关阅读:
Top