[Cuis-dev] No regex in parser

Thierry Goubier thierry.goubier at gmail.com
Wed Sep 25 15:40:48 PDT 2019


Hi Casey,

Le mer. 25 sept. 2019 à 21:09, Casey Ransberger via Cuis-dev
<cuis-dev at lists.cuis.st> a écrit :
>
> Hi Mariano,
>
> I’m familiar with PetitParser, OMeta/Ohm as well.
>
> What I’m after is more like, “has anyone studied ad-hoc parsers in depth?”

Yes.

Main ones are either recursive descent (i.e. LL(1)) or, recursive
ascent (LR(1)).

> I have a sneaking suspicion that when the grammar is sufficiently trivial, an ad-hoc parser ends up being the smaller, simpler code base versus a Chomskyan setup with a parser generator. But, how do I confirm that this gut-feeling is correct?

A chomskyian setup is 20 lines of code plus a table for LR(1), so it
is very hard to handwrite a parser that is smaller than that.

The main challenge, for me, is that the compiler theory has found a
very high bar for the simplest, most powerfull non backtracking parser
which is the LR(1).

> I can point at anecdotal evidence. Smalltalks, Lisps, and Forths don’t seem to need the — heh — Chomskyan overhead. Again, the grammars are relatively simple, and that’s probably the “why.” But I’m looking for a formal inquiry into the “how.”

C++ usually has ad-hoc handwritten parsers (for example gcc and
clang); that does not mean the language is simple.

> Maybe there isn’t one and I have to do it myself. Cuis would be a great subject, but I’m lazy and not at a college, so I look for research papers. What I’m hoping I can find is research into ad-hoc parsers (with a focus on the scanners they employ.)

A recommended reference is "Parsing techniques", from Grune and Jacob.
You find in it the complete menagerie of all parsers algorithms, and
therer are a lot of them. This book has an extensive (and commented)
bibliography.

> Anyway I think I’ve burned enough bandwidth on something only tangentially relevant to the list, so I’ll bow out now.

If you have the time, please share your findings. I'm working with
advanced (highly ambiguous) parsers for unconventional subjects, and
I'll be interested if you find something challenging on your path.

Regards,

Thierry

> Thanks for indulging me, people of Cuis :)
>
> —Casey
>
> > On Sep 24, 2019, at 4:54 AM, Mariano Montone via Cuis-dev <cuis-dev at lists.cuis.st> wrote:
> >
> > Hello,
> >
> >> El 24/9/19 a las 00:16, Casey Ransberger via Cuis-dev escribió:
> >> First, I should say I’m aware that there are external packages that do regular expressions. I’m interested in how we get by without them.
> >>
> >> Found myself marveling again tonight that the Smalltalk parsers of the world generally haven’t used them. That really astounded me at first, as I had been raised on lex/flex for lexical analysis. Later I learned about parsing expression grammars and saw those used for both scanning and parsing.
> >>
> >> I still don’t know how to write a scanner without either of those tools, I’m afraid. I’m also honestly not terribly familiar with how the various Smalltalks go about this.
> >>
> >> For context: I’ve run across a case where I want to build a scanner in a language that lacks any regular expression support at all, and thought I might look to Cuis for a good example of how to go about it. Sorry that this is only tangentially on-topic.
> >>
> >> Questions:
> >>
> >> * What classes / methods should I look at to see the magic?
> >
> > Smalltalk uses all sorts of magic in different areas, but I don't think
> > there's anything special about how it handles parsing. The Scanner and
> > Parser classes are implemented manually.
> >>
> >> * Is there a formal name for the kind of pattern matching that Cuis does when tokenizing input? Asking because I’d like to track down research papers as a matter of habit.
> >
> > If you are willing to reasearch, then I would look at Parsing Expression
> > Grammars. Smalltalk implemention of PEGs is PetitParser from Lukas
> > Renggli. PetitParser combines PEGs with a combinators api; it is
> > somewhat special, not easily found in other programming languages. The
> > PetitParser package for Cuis works very nicely. And this is cool
> > Smalltalk stuff using them: http://scg.unibe.ch/research/helvetia
> >
> > Cheers,
> >
> > Mariano
> > --
> > Cuis-dev mailing list
> > Cuis-dev at lists.cuis.st
> > https://lists.cuis.st/mailman/listinfo/cuis-dev
>
> --
> Cuis-dev mailing list
> Cuis-dev at lists.cuis.st
> https://lists.cuis.st/mailman/listinfo/cuis-dev


More information about the Cuis-dev mailing list