
Blowing my mind...

Nov 10, 2009 at 9:30 PM

Ok, I have to be honest. I was just reading Douglas Purdy's latest blog post (http://www.douglaspurdy.com/2009/11/10/from-oslo-to-sql-server-modeling/) about project "Oslo" and started to feel very afraid that MGrammar isn't going to turn out the way I hoped; it will probably get mired in licensing or lost in some SQL Server Modeling behemoth. That's OK, I guess, but now I'm out looking for a replacement, and I was doing so with low expectations. But then I found OMeta#: it looks different enough from MGrammar, but also strikingly similar... :)

I'll try to give you a little background on where I'm coming from and then ask some questions. Some are comparisons with MGrammar, and some are just for my own understanding of OMeta#.

First off, I'm working on a related project here on CodePlex called MetaSharp (http://metasharp.codeplex.com). Sorry for the similar name; it was an accident, but don't be threatened! It's actually very complementary rather than competitive. So far I've been using MGrammar for all of my parsing and grammar authoring, and everything MetaSharp is concerned with happens before and after the actual parsing. In short, it's a transformation engine, and transforming text into an AST is just one of those steps. That step is what I would like to get out of OMeta#: a way to parse arbitrary text into a tree of Nodes. The theory is that, with standard AST nodes for common programming constructs, you can build transformers that convert that AST into anything else (such as CodeDom objects, LINQ Expressions, or text). The goal is to enable arbitrary DSLs and scripting languages and, frankly, any type of language-oriented tool.

So I have a few questions. I was poking around in the source code here (for example http://ometasharp.codeplex.com/SourceControl/changeset/view/17706#361393) and I see that your right-hand projections appear to be arbitrary code (C#, even?). So it seems that OMeta# just generates code that matches the patterns described by the left-hand-side DSL and then emits the code from the right-hand side? And the "ometa" keyword generates some kind of specialized class? What I see here is that you're generating an HTML string, so I'm assuming Parser<char> indicates that the output of this parser is a stream of chars, otherwise known as a string? If I changed this to Parser<Node>, could I potentially write syntax such as:

Example -> { Push(new ExampleNode()) }

Additionally, I see things in your code such as:

InputDecl -> { CodeGen.Create()
                .Write("(")
                .Write("OMetaStream<", Get<string>("CodeType"), "> ")
                .WriteLeveledVar("inputStream")
                .Write(", out OMetaList<HostExpression> ")
                .WriteLeveledVar("result")
                .Write(", out OMetaStream<", Get<string>("CodeType"), "> ")
                .WriteLeveledVar("modifiedStream")
                .WriteLineEnd(")")
               },

This actually concerns me a little; it runs counter to my reasoning. So for this question I just want to explain what I feel is wrong with it and how I have tried to solve it in MetaSharp, and I'd like to hear what you think.

What it looks like is happening here is that you're doing a textual transformation directly inside your grammar. To me, the problem with this is that you end up with a grammar that is only good for one thing (and is kind of ugly). Ideally, I think you should generate AST nodes and then use visitors to generate this text instead. I suggest this because, to me, it makes the grammar more "pure" (akin to separating a view from its logic), and it would also let you write multiple transformers, so you could generate C# today and VB tomorrow, for example. You could also push the AST through multiple transformation steps, or let plugins modify it (aspects?), etc. What would be much more valuable to me than arbitrary code on the right-hand side is an AST declaration syntax; this was one thing MGrammar did very well. I could be satisfied with the Parser<Node> approach described above, though.
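
A minimal Python sketch of the separation suggested above, assuming hypothetical node classes (Identifier, BinaryExpression, MethodInvocation) rather than MetaSharp's or OMeta#'s actual types. The parser's only job would be to build the tree, and each target language gets its own visitor-style emitter; here only the operator spelling differs, so adding VB next to C# means one small class instead of a second grammar.

```python
from dataclasses import dataclass


@dataclass
class Identifier:
    name: str


@dataclass
class BinaryExpression:
    op: str       # language-neutral operator name: "eq" or "and"
    left: object
    right: object


@dataclass
class MethodInvocation:
    name: str
    arguments: list


class Emitter:
    """Visitor base class: dispatches on the node's class name."""
    OPS = {}  # neutral operator name -> target-language spelling

    def visit(self, node):
        return getattr(self, "visit_" + type(node).__name__)(node)

    def visit_Identifier(self, node):
        return node.name

    def visit_BinaryExpression(self, node):
        left, right = self.visit(node.left), self.visit(node.right)
        return f"({left} {self.OPS[node.op]} {right})"

    def visit_MethodInvocation(self, node):
        args = ", ".join(self.visit(a) for a in node.arguments)
        return f"{node.name}({args})"


class CSharpEmitter(Emitter):
    OPS = {"eq": "==", "and": "&&"}


class VbEmitter(Emitter):
    OPS = {"eq": "=", "and": "AndAlso"}
```

For `BinaryExpression("and", BinaryExpression("eq", Identifier("x"), Identifier("y")), MethodInvocation("ready", []))`, the C# emitter yields `((x == y) && ready())` and the VB emitter yields `((x = y) AndAlso ready())`.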

Next, suppose I wanted to define code to parse "sum(x, y)". How would I do this in OMeta#? Here it is in MGrammar; you should be able to understand what I'm saying:

syntax sum = "sum(" a:Join(Identifier, ",") ")" => MethodInvocationExpression { Name => "sum", Arguments => a };

syntax Join(item, separator) = x:item => [x] | x:item separator rest:Join(item, separator) => [x, valuesof(rest)];

[...] denotes a collection of results; "valuesof" in this context simply flattens the collection so you don't end up with a bunch of nested items.
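
To make the intended behavior concrete, the MGrammar rules can be approximated with a small hand-written recursive-descent parser. This Python sketch only illustrates the technique; it is not OMeta# or MGrammar output, and MethodInvocationExpression is a hypothetical node class. The join helper tries the longer alternative (item, separator, rest) first and falls back to a single item, and `[x] + values` plays the role of valuesof.

```python
import re
from dataclasses import dataclass


@dataclass
class MethodInvocationExpression:
    # Hypothetical node type standing in for the MGrammar projection.
    name: str
    arguments: list


IDENT = re.compile(r"\s*([A-Za-z_]\w*)")


def literal(text, pos, lit):
    """Skip whitespace, then match `lit`; return the new position or None."""
    while pos < len(text) and text[pos].isspace():
        pos += 1
    return pos + len(lit) if text.startswith(lit, pos) else None


def identifier(text, pos):
    """Match an identifier; return (name, new_position) or None."""
    m = IDENT.match(text, pos)
    return (m.group(1), m.end()) if m else None


def join(item, separator, text, pos):
    """`item separator <rest>` (tried first) or a single `item`.
    Returns (flat_list, new_position) or None."""
    first = item(text, pos)
    if first is None:
        return None
    x, pos = first
    after_sep = separator(text, pos)
    if after_sep is not None:
        rest = join(item, separator, text, after_sep)
        if rest is not None:
            values, end = rest
            return [x] + values, end  # flattened, like [x, valuesof(rest)]
    return [x], pos  # single item, like [x]


def sum_call(text):
    """sum = "sum(" Join(Identifier, ",") ")" => MethodInvocationExpression"""
    pos = literal(text, 0, "sum(")
    if pos is None:
        return None
    joined = join(identifier, lambda t, p: literal(t, p, ","), text, pos)
    if joined is None:
        return None
    args, pos = joined
    pos = literal(text, pos, ")")
    if pos is None or text[pos:].strip():
        return None  # missing ")" or trailing input
    return MethodInvocationExpression("sum", args)
```

Here `sum_call("sum(x, y)")` returns `MethodInvocationExpression(name='sum', arguments=['x', 'y'])`, while `sum_call("sum()")` and `sum_call("sum(x,)")` return None because Join requires at least one item and no dangling separator.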

Also, what sort of support do you have for grammar reuse? In addition to AST reuse, I want to promote grammar reuse. Specifically, in an external DSL it's really common to write something very declarative and then add support for BinaryExpressions or other common general-purpose language constructs, and it turns out that BinaryExpressions are really hard! When writing a new DSL, I would love to start from a place where I could pull in common things such as binary expressions or if/else statements without having to use everything else too (opt in). Also, are there any namespacing or import mechanisms for grouping your grammars?
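
One way to get the opt-in reuse described above is to package each reusable construct as its own grammar fragment and let each DSL compose only the fragments it wants. The Python sketch below does that with mixin classes; the class names, token set, and the tiny `when ... alert ...` DSL are all hypothetical and are not how OMeta# or MGrammar implement reuse. The binary-expression fragment uses precedence climbing, which handles precedence and left associativity in one loop.

```python
import re


class GrammarBase:
    """Shared plumbing every fragment relies on: a token list and a cursor."""
    # Characters outside this token set are silently skipped by findall;
    # acceptable for a sketch, not for a real grammar.
    TOKEN = re.compile(r"\s*(\d+|[A-Za-z_]\w*|==|[-+*/<>()])")

    def __init__(self, text):
        self.tokens = self.TOKEN.findall(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok


class BinaryExpressions:
    """Opt-in fragment: binary expressions by precedence climbing.
    Assumes the host grammar provides peek() and take()."""
    PRECEDENCE = {"==": 1, "<": 1, ">": 1, "+": 2, "-": 2, "*": 3, "/": 3}

    def expression(self, min_prec=1):
        left = self.operand()
        while self.peek() in self.PRECEDENCE and self.PRECEDENCE[self.peek()] >= min_prec:
            op = self.take()
            # Parsing the right side at a higher level makes operators left-associative.
            right = self.expression(self.PRECEDENCE[op] + 1)
            left = ("binary", op, left, right)
        return left

    def operand(self):
        if self.peek() == "(":
            self.take("(")
            inner = self.expression()
            self.take(")")
            return inner
        tok = self.take()
        if tok.isdigit():
            return ("number", int(tok))
        if re.fullmatch(r"[A-Za-z_]\w*", tok):
            return ("name", tok)
        raise SyntaxError(f"unexpected {tok!r}")


class AlertRules(GrammarBase, BinaryExpressions):
    """Hypothetical declarative DSL: 'when <expression> alert <name>'.
    It opts in to BinaryExpressions and nothing else."""

    def rule(self):
        self.take("when")
        condition = self.expression()
        self.take("alert")
        target = self.take()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected {self.peek()!r}")
        return ("rule", condition, target)
```

A DSL that never needs arithmetic would simply leave BinaryExpressions out of its base classes.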

I have more but I'll try to pace myself for now. This is exciting!

Nov 11, 2009 at 3:10 AM

Hi Justin,

When MGrammar was released, a few weeks or so after I started OMeta#, I decided to stop work on OMeta# and moved on to other things. The reason is that the language is the easy part; tool support is the hard part, and that was just too much to do well on my own in my spare cycles. My hope was that MGrammar would bring mainstream support for DSLs. I was willing to give up some of OMeta's features in exchange for Microsoft backing something similar.

It's sort of sad to hear where Oslo is going. MGrammar had far greater potential than just being SQL Server-specific.

I haven't been in the OMeta# code for over a year, but I'd like to point out a few things based on your questions:

  • The DLR and C# 4.0 weren't realistically available for use when I initially wrote OMeta#. Using them (especially C# 4.0's "dynamic" feature) would make the code simpler. It'd probably be best to start over mostly from scratch; you could be up and running in not too much time.
  • As mentioned in my initial blog post, OMeta handles streams of *anything* and does pattern matching on them; streams of characters are just one kind. Note that the optimizer is written in OMeta#. There, the input and output are a simplified AST: "ometa OMetaOptimizer<HostExpression> : Parser<HostExpression>" means that it takes in (and produces) HostExpression(s), which you can think of as LISP cons cells (more details here). You could easily (in theory) write generators for any language from these. I picked C# and wrote a HostExpression/AST -> C# translator as an example. Note that this is done without the explicit "visit" calls a Visitor pattern implementation would need. See Alex's thesis for more details on why OMeta eliminates the need for a specialized Visitor pattern and does everything with a more general pattern-matching concept.
  • As for your "sum" question, check out the Calculator example. In that case I do it two different ways. The first evaluates it directly:
    AddExpr  = AddExpr:x '+' MulExpr:y  -> { x + y }

    Here you can see that the result of matching AddExpr will be an Int32, and it'll be bound to "x" (the right-hand side of -> is the value the rule produces when its pattern matches). The beauty of OMeta is that you can then use that value directly in the semantic action. In fact, the OMeta parser itself (which takes characters and produces an AST) is written in OMeta.

    This is the approach if you're taking in text and producing a value. The other approach is to create an AST, like you are doing in your M example. I do this in the Calculator Parser example:
    AddExpr  = AddExpr:x '+' MulExpr:y    -> { "add", x, y }

    This is more or less exactly what you're doing in your example (the result is just a tuple/list/cons cell)
  • Grammar reuse is a huge concept in OMeta. You can extend other grammars, or use the "foreign" keyword to "lend" your stream to another grammar. This makes it easy to build language mashups.

I put OMeta# under a very permissive license. If you read my posts and Alex's thesis, you can go a long way and quickly surpass what I did here. One approach would be to rewrite the core of OMeta in C# 4.0 and use that as the basis for MetaSharp. Alternatively, I can give you commit access here and you could have MetaSharp reference it.

Does that help clarify things?

Nov 11, 2009 at 3:38 AM

Wow, thanks for the response. You've given me a lot to read and think about. I'll have to read more about it and decide whether or not to go the route of creating a new parser. Your words are encouraging, however, which is dangerous, because I think I'm starting to talk myself into it despite the time I know it will take.

I will have to look at your source code more closely as well.