[Cuis-dev] No regex in parser

Thierry Goubier thierry.goubier at gmail.com
Wed Sep 25 23:23:40 PDT 2019


On Thu, Sep 26, 2019 at 01:19, Casey Ransberger via Cuis-dev
<cuis-dev at lists.cuis.st> wrote:
>
> Below
>
> > On Sep 25, 2019, at 3:40 PM, Thierry Goubier via Cuis-dev <cuis-dev at lists.cuis.st> wrote:
> >
> > C++ usually has ad-hoc handwritten parsers (for example gcc and
> > clang); that does not mean the language is simple.
>
> Yes, arguably this means that the grammar for C++ is complex. This is not an unexpected finding. If it was amenable to a formal representation and the cast of the show knew what they were doing, clang still wouldn’t be ad-hoc. But are you certain? I figured they’d be using flex/bison.

GCC stopped using bison around the 4.0 release, and never used flex
(to my knowledge, the 3.x series already had a hand-written scanner).
I spent some time reading through the GCC parser a while ago.

The Clang parser is a hand-written recursive descent parser, with a
very complex form of backtracking (where incorrect paths taken trigger
the generation of virtual tokens in the input to guide the following
paths). To me, it is about as complex as an idea can get, akin to a
sort of self-modifying code that depends on your inputs... I sometimes
liken writing ad-hoc parsers to writing an OS in assembly language.
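
To make "recursive descent with backtracking" concrete, here is a
minimal sketch in Python. It is not Clang's code and it leaves out the
virtual-token trick entirely; the Parser class, the token format
(whitespace-separated strings) and the type_names set are all
assumptions made up for the illustration. It tries to read `T * x ;`
as a declaration and, if that path fails, rewinds and reads it as an
expression.

```python
class Parser:
    """Toy recursive descent parser with backtracking (illustration only)."""

    def __init__(self, tokens, type_names):
        self.tokens = tokens
        self.pos = 0
        # Hypothetical symbol table: names currently known to be types.
        self.type_names = type_names

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, predicate):
        # Consume one token if it satisfies the predicate, else fail.
        tok = self.peek()
        if tok is None or not predicate(tok):
            raise SyntaxError(f"unexpected token {tok!r} at {self.pos}")
        self.pos += 1
        return tok

    def parse_declaration(self):
        # declaration := type-name '*' identifier ';'
        type_name = self.expect(lambda t: t in self.type_names)
        self.expect(lambda t: t == "*")
        name = self.expect(str.isidentifier)
        self.expect(lambda t: t == ";")
        return ("declare-pointer", type_name, name)

    def parse_expression_statement(self):
        # expression-statement := identifier '*' identifier ';'
        left = self.expect(str.isidentifier)
        self.expect(lambda t: t == "*")
        right = self.expect(str.isidentifier)
        self.expect(lambda t: t == ";")
        return ("multiply", left, right)

    def parse_statement(self):
        # Tentatively try the declaration path; on failure, rewind the
        # input position and take the expression path instead.
        saved = self.pos
        try:
            return self.parse_declaration()
        except SyntaxError:
            self.pos = saved
            return self.parse_expression_statement()


def parse(source, type_names):
    return Parser(source.split(), set(type_names)).parse_statement()
```

With `T` registered as a type, `parse("T * x ;", ["T"])` takes the
declaration path, while `parse("a * b ;", ["T"])` fails there, rewinds,
and is read as a multiplication. Even in this toy, the result depends
on what the parser already knows about the names, which is where the
hand-written C++ parsers get their complexity.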

The C++ formal grammar, the one that really works, is huge (see the
J.A. Roskind grammars). See, we even reference them.

The current state of the art is that you need a GLR parser (that is,
the full power of context-free parsing theory) to handle C++. In the
Smalltalk world, John Brant (who created the Refactoring Browser with
Don Roberts a long time ago, and also SmaCC) is working on a C++
grammar.
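
As a rough illustration of why a generalized parser is needed, here is
a small Python sketch that keeps every reading of a statement instead
of committing to the first one. It is not a GLR implementation (a real
one shares work between alternatives through a graph-structured stack)
and has nothing to do with SmaCC's internals; the ALTERNATIVES table
and the function names are invented for the example.

```python
# Each alternative is (label, pattern); "ID" stands for any identifier.
# Two pairs of alternatives deliberately share the same token pattern,
# because in C++ the tokens alone do not say which reading is meant.
ALTERNATIVES = [
    ("declaration", ["ID", "(", "ID", ")", ";"]),          # T (x);  declares x
    ("call", ["ID", "(", "ID", ")", ";"]),                 # f (x);  calls f
    ("pointer-declaration", ["ID", "*", "ID", ";"]),       # T * x;  declares x
    ("multiply", ["ID", "*", "ID", ";"]),                  # a * b;  multiplies
]


def fits(tokens, pattern):
    """Return True if the token list matches the pattern exactly."""
    if len(tokens) != len(pattern):
        return False
    for tok, want in zip(tokens, pattern):
        if want == "ID":
            if not tok.isidentifier():
                return False
        elif tok != want:
            return False
    return True


def all_parses(source):
    """Return every reading of the statement, not just the first."""
    tokens = source.split()
    results = []
    for label, pattern in ALTERNATIVES:
        if fits(tokens, pattern):
            names = tuple(t for t in tokens if t.isidentifier())
            results.append((label,) + names)
    return results
```

Calling `all_parses("T ( x ) ;")` yields both a declaration and a call
reading, and it takes later knowledge (is `T` a type?) to pick one. A
GLR parser carries such alternatives forward in the same spirit, only
without enumerating them one by one.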

Thierry
