What kind of parser does the Haxe compiler implement?

fullofcaffeine · June 25, 2019, 9:46pm

Or “where could I find more technical docs about the internals of the compiler”?

I’ve been digging into compiler theory and learning about all the different kinds of parser algorithms out there (LL, LR, PEG, top-down, bottom-up etc). I’m wondering what’s the one (or variation or combination of) Haxe uses.

Also, does Ocaml force or favor any algorithms/techniques?

Thanks in advance!

sundialservices · June 25, 2019, 10:03pm

Well, I confess that I have never “dumpster dived” into the source code yet, but this web page suggests that it might be good ol’ LEX/YACC.

http://caml.inria.fr/pub/docs/manual-ocaml/lexyacc.html

Certainly, this technology would be sufficient to do the job.

nadako · June 25, 2019, 10:11pm

I’m not very familiar with classification of parsers, but if I understand correctly, Haxe uses a simple recursive descent parser, and mostly LL(1) grammar (there’s one edge case with semicolon before else that requires two token peek). As for the software, we use sedlex for the lexer, and camlp4 for the parser.

fullofcaffeine · June 26, 2019, 5:25pm

@nadako Thanks for chiming in!

mostly LL(1) grammar (there’s one edge case with semicolon before else that requires two token peek)

Interesting. After learning a bit about each of those, I was expecting Haxe to use LR (or its variations) or PEG, which seem to be the most powerful and pragmatic ones and the ones that are generated by most parser generators. I also wonder what is the main advantage of OCaml + sedlex + camlp4 vs the usual lex/yacc route (which AFAIK, generates LALR parsers)? It looks like the grammar for Haxe is then not available in a declarative format (as it would be if a tool like yacc/bison was used)?

Also, from what I’ve read, RD parsers are not very performant, but perhaps the Haxe OCaml implementation is better in this aspect?

@sundialservices

Well, I confess that I have never “dumpster dived” into the source code yet, but this web page suggests that it might be good ol’ LEX/YACC.

Looks like it doesn’t use the OCaml impl. of Lex/Yacc, as stated by @nadako.

teotigraphix · June 26, 2019, 8:43pm

Hi, I am really familiar with Antlr and it’s grammar, I was curious on the surface, what are the differences between and Antlr lexer/parser and Ocaml (haxe’s)?

sundialservices · June 26, 2019, 10:22pm

Well, when it comes to parsers – academically interesting though they are (such that no one escapes alive from college these days without having constructed one …) – in this regard we can borrow a phrase from the Perl crowd:

TMTOWTDI (“Tim Toady”): There’s More Than One Way To Do It.™

And then:

“… as long as it works.”* Which of course it does.

“The problem of parsing” has been thoroughly solved by now. We’ve got lots of extremely-efficient ways to do it. (We’ve even got mature off-the-shelf tools (e.g. Bison …) with which to handle edge-cases that “simple YACC” can’t do.) My superficial assessment of the Haxe syntax, however, is that it is uncomplicated and apparently LALR(1). “Nothing to see here … move along …”

teotigraphix · June 27, 2019, 12:08pm

This is more coming from the stand point I absolutely love lexers and parsers, so I am one those guys and have a lot of experience/work with the Antlr ecosystem.

I would also like to work on the Haxe compiler and it’s features in the future, so just trying to start to ease into the dark recesses.

Simn · June 28, 2019, 8:15am

It’s a custom recursive-descent parser which deals with some special cases. I once tried to express the grammar in a declarative way but that was a bit of a mess due to a few cases:

if (expr) expr; else expr as already mentioned requires a lookahead of 2.
positions are significant in various places, e.g. @:meta(expr) is a metadata with argument whereas @:meta (expr) is a metadata without argument on a parenthesis expression. I dealt with this by lexing :meta( as a single token.
> is always lexed as a single token, so >> is OpGt OpGt in syntax. This is also affected by positions because > > is not valid syntax.

Let me know if you have any further questions!

teotigraphix · June 28, 2019, 2:30pm

For sure.

sundialservices · July 12, 2019, 2:50pm

Yup, say what you will about “recursive descent” but it works well.

In the Perl world I sometimes use a package called (I think) Parse::RecDescent which actually generates on the fly a recursive-descent parser based on a grammar provided, then runs the generated program. It runs like a bandit …

teotigraphix · July 25, 2019, 2:20pm

Kind of like Antlr but real-time. I have written a couple recursive descent parsers and I LOVE them. When I first really understood the power I almost tripped out.

HAHA