I just finished creating a lexer for the DM language, and I was wondering: is there a [E]BNF somewhere for the language?
I assume the C BNF might be a good start, but if one exists already that would be better :)
May 5 2014, 5:43 pm
|
|
Nopers, sorry.
|
Out of curiosity: is a standard lexer/parser not used internally (i.e., is the parsing done entirely by hand?)
|
In response to Liquidweaver
|
|
Liquidweaver wrote:
Out of curiosity: is a standard lexer/parser not used internally (i.e., is the parsing done entirely by hand?)
BYOND's VM/Bytecode was written by Dan back in 95-98. Back then there really wasn't much open-source information on compilers freely available outside of university settings. It's anyone's guess how he constructed the lexical parser.
|
I recommend trying to get in touch with Jp. I believe he was working on this some time ago for Scintilla or a similar project. He might have something, at the very least, for you to go on.
|
DM does not use a standard lexical parser. (I'm not entirely sure how that would work with the indent-sensitive language, but presumably EBNF can manage.) Good luck working up the EBNF grammar. I'm very curious to see how it ends up.
|
Sure.
You are completely correct that a lexer designed for a context-free grammar would not properly tokenize an "off-sides" language like DM. The indentation gives it context. I had to take a standard tokenizer and add state. Also, handling the nested bracket syntax in strings requires a stateful tokenizer. |
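To illustrate the point about nested brackets in strings, here is a minimal Python sketch of a stateful string scanner. It is not the poster's lexer and not BYOND's; the function names `split_dm_string` and `skip_expression` are made up for this example. It assumes double-quoted strings, backslash escapes, and `[...]` embedded expressions that may themselves contain strings, and it ignores other literal forms.

```python
def split_dm_string(src, start=0):
    """Scan a double-quoted string beginning at src[start].

    Returns (parts, end): parts is a list of ("text", s) / ("expr", s)
    tuples, end is the index just past the closing quote.
    Hypothetical helper for illustration only.
    """
    if src[start] != '"':
        raise ValueError("expected opening quote")
    parts, text = [], []
    i = start + 1
    while i < len(src):
        ch = src[i]
        if ch == "\\":
            # Escape sequence: keep the backslash and the next char as raw text.
            text.append(src[i:i + 2])
            i += 2
        elif ch == '"':
            # Closing quote of this string.
            if text:
                parts.append(("text", "".join(text)))
            return parts, i + 1
        elif ch == "[":
            # Embedded expression: flush pending text, then find the matching ']'.
            if text:
                parts.append(("text", "".join(text)))
                text = []
            end = skip_expression(src, i + 1)
            parts.append(("expr", src[i + 1:end]))
            i = end + 1
        else:
            text.append(ch)
            i += 1
    raise ValueError("unterminated string")


def skip_expression(src, i):
    """Return the index of the ']' closing an embedded expression.

    i points just past the opening '['. Nested strings are skipped by
    calling back into split_dm_string, which is where the state comes from.
    """
    depth = 1
    while i < len(src):
        ch = src[i]
        if ch == '"':
            _, i = split_dm_string(src, i)  # jump past the nested string
            continue
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth == 0:
                return i
        i += 1
    raise ValueError("unterminated embedded expression")


parts, end = split_dm_string('"a [f("b [c]")] d"')
print(parts)
```

The mutual recursion between the two functions plays the role of the extra state: a single regular expression cannot match the closing quote reliably once strings and brackets can nest inside each other.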
Coincidentally I just dropped by to upload something to my member filespace.
Very simple lexing/parsing code I hacked up over a few hours here: http://www.byond.com/members/Jp/files/dreamcatcher.zip . Almost certainly awful and primitive, but might be worth looking at. Parses out types and variables from a DM file without procs, verbs, preprocessor statements, couple of other things probably.
This is a Scintilla lexer for DM: http://files.byondhome.com/Jp/dmlex/LexDM.cxx . That's for code highlighting and folding, not full-blown parsing, so it's much simpler and probably less useful.
IIRC my approach was a stateful Flex lexer - kept a count of how many indent levels I was at, when that increased generated an INDENT token, when it decreased generated a DEDENT token (and had braces generate those too). Had to have NEWLINE as a token, though, because otherwise distinguishing these two cases wasn't possible: a/b
Python lexing/grammar might be worth looking into. It was a thought I always had, but I was too lazy to do it.
|
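The INDENT/DEDENT/NEWLINE idea described above can be sketched roughly as follows. This is a Python illustration, not the Flex code from the linked files. It tracks a stack of indentation widths (the way Python's own tokenizer does) rather than a plain level count, it treats each non-blank line as a single opaque `LINE` token, and it leaves out the brace handling mentioned in the post. The name `layout_tokens` and the `tab_width` parameter are assumptions made for this example.

```python
def layout_tokens(source, tab_width=4):
    """Yield (kind, value) tuples for an indentation-sensitive source.

    kind is one of "INDENT", "DEDENT", "NEWLINE", "LINE".
    Hypothetical helper: real DM lexing would split LINE further.
    """
    levels = [0]  # stack of indentation widths currently open
    for raw in source.splitlines():
        line = raw.expandtabs(tab_width)
        stripped = line.strip()
        if not stripped:
            continue  # blank lines do not affect block structure
        width = len(line) - len(line.lstrip(" "))
        if width > levels[-1]:
            # Deeper than the enclosing block: open a new one.
            levels.append(width)
            yield ("INDENT", None)
        while width < levels[-1]:
            # Shallower: close blocks until the widths line up again.
            levels.pop()
            yield ("DEDENT", None)
        if width != levels[-1]:
            raise ValueError("dedent does not match any outer level")
        yield ("LINE", stripped)
        yield ("NEWLINE", None)  # explicit end-of-statement marker
    while len(levels) > 1:
        # Close any blocks still open at end of input.
        levels.pop()
        yield ("DEDENT", None)


example = "mob\n\tvar\n\t\thealth\n\tproc\nobj\n"
for token in layout_tokens(example):
    print(token)
```

With tokens like these, a Bison-style grammar can treat INDENT and DEDENT much like opening and closing braces, while NEWLINE tells it where one statement stops and the next begins.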
I'm working on a Bison/Flex parser as part of an open source C++ API I'm creating.
https://github.com/N3X15/OpenBYOND/blob/dev/openbyond-core/ grammar |
If I'm not very much mistaken, that grammar you're basing stuff on - the one you've credited to 'nan0desu' - is, in fact, the code I wrote all the way back in 2010. I've linked to my version of the files above, here's the blog post I wrote about it in 2010.
I don't mind too much - not only is the code in your openbyond-core very, very different by now, but the original grammar stuff I was fiddling with was pretty primitive, and also as far as I'm concerned it was public domain. I am a bit put out by nan0desu claiming to have written the code. |
Wouldn't calling such a project "OpenBYOND" be some sort of a trademark issue or something?
|
In response to Laser50
|
|
Jp wrote:
If I'm not very much mistaken, that grammar you're basing stuff on - the one you've credited to 'nan0desu' - is, in fact, the code I wrote all the way back in 2010. I've linked to my version of the files above, here's the blog post I wrote about it in 2010.
My apologies, I'll correct that ASAP.
Laser50 wrote:
Wouldn't calling such a project "OpenBYOND" be some sort of a trademark issue or something?
Probably. I will happily change the name of the project if BYOND asks. It was just a quick and easy name, since I have the creativity of a doorknob.
|