In the previous installment we built a simple text generator using some Ruby meta-programming tricks. It was still far from being our desired context-free grammar (CFG) generator, though, since it lacked many CFG prerequisites. Most flagrantly, we had no rule recursion and only one production (rule definition) per rule. Here鈥檚 the what a script that would use both features:
dictionary
noun 'dog', 'bus'
verb 'barked', 'parked'
preposition 'at'
rule 'phrase'
opt 'The', noun, verb, preposition, 'a', noun
opt 'Here goes some', phrase, 'recursion.'
opt 'Meet me', preposition, 'the station.'
grammar phrase: 10
The dictionary
section is just as we left it. Let鈥檚 see what changed in the rule
section.
Previously we had only one production per rule, so rule definitions such as phrase
were captured by the method_missing
method. This design would make multiple productions difficult to handle. Here鈥檚 how we re-implemented the rule method:
def rule *args
verbose "Read rule: #{args.to_s}"
@last_rule = args.first.to_s
@grammar.rules[@last_rule] = (Rule.new @last_rule)
define_method(args.first.to_s) { @grammar.rules[args.first.to_s] }
@state = :rule
end
Once more we use define_method
to dynamically define methods. Consider the rule 'phrase'
statement present in our script: this would define a method named phrase
which hopefully returns the Rule
object within @grammar.rules['phrase']
. Note that the returned rule is not evaluated (i.e., it is still a Rule object, not a String object).
Now we keep track of the @last_rule
so rule productions (options) are added to the appropriate rule. Options are captures by opt
:
def opt *args
if @state == :rule
verbose "Read option for rule #{@last_rule}: #{args.to_s}"
@grammar.rules[@last_rule].options << args
end
end
Here, args
is an array of Rule production symbols (both terminal and non-terminal, i.e., both Strings and Rules). The set of Rule options will ultimately be an Array of Arrays of Rule production symbols corresponding to each opt
line written in the DSL script (e.g. in the example above, phrase
would have 3 options).
Rules are evaluated by Grammar.generate
, which receives a Hash of rules and the amount of times they should be generated (e.g. phrase: 10
in our example):
def generate args
text = ''
args.each do |rulename, qty|
(1..qty).each { text << @rules[rulename.to_s].to_s }
end
puts "Final result: \n========\n#{text}\n========"
end
How does Rule recursion work, though? Let鈥檚 take a look at the to_s
method in Rule
:
def to_s
randkeys = options.sample
randkeys.map! { |k| k.to_s }
verbose "Applying rule #{@name} with keys: #{randkeys}."
randkeys.join(" ")
end
Pretty straightforward: a production is chosen at random (e.g., 1 of the 3 options in our example), and each symbol in the production is evaluated into a String and concatenated into the final result. For example, say the first rule, opt 'The', noun, verb, preposition, 'a', noun
, is chosen. Then randkeys.map!
would call to_s
for each key in the production: 'The'.to_s, noun.to_s
, etc. Recursion will happen if the key is a method that returns a Rule object (such as the phrase
method we mentioned in the beginning of the post).
Let鈥檚 try out a CFG classic: well-formed parenthesis. Here鈥檚 the script:
rule 'par'
opt '()'
opt '(', par, ')'
opt par, par
grammar par: 1
And here鈥檚 some sample output:
$ ruby lero.rb examples.le
Final result:
========
( ( ( ( () ) ) () ) () )
========
$ ruby lero.rb examples.le
Final result:
========
( () )
========
$ ruby lero.rb examples.le
Final result:
========
() ()
========
And now we鈥檙e done! With only 2 classes (Grammar and Rule) and 1 additional file that defines a DSL (lero.rb), we were able to build a CFG-like text generator with the most important CFG properties.
Full code is available in the same repository.