I’m trying to parse some specific lines of LuaDoc using hxparse.
Those lines look like this: `---@param name type description`, where the type can also be a sum type, so in practice it can be `string|number` with any number of alternatives.
I have already stripped out the prefix, so what is left to parse is just `name type description`, and the parsing is very context-aware.
I have defined the following token enums: one for regular parsing and another specifically for types.
The regular one:

```haxe
enum DocToken {
    Identifier(name:String);
    Description(text:String);
    DocType(type:TypeToken);
    ArrayMod;
    OptionalMod;
    Comma;
    CurlyOpen;
    CurlyClose;
    SquareOpen;
    SquareClose;
    Lparen;
    Rparen;
    TypeOpen;
    TypeClose;
    Pipe;
    Spc;
    EOL;
}
```
The one specific to types:

```haxe
enum TypeToken {
    Function;
    Number;
    String;
    Table;
    Boolean;
    Nil;
}
```
The problem arises when I try to switch on the different combinations of `DocType(t)`, with or without pipes.
Here are the lexer rules:

```haxe
using StringTools; // for ltrim() / rtrim()

class LuaDocLexer extends hxparse.Lexer implements hxparse.RuleBuilder {
    static var ident = "[a-zA-Z_][a-zA-Z0-9_]*";

    // The rest of the line, read once the name and type are done
    public static var desc = @:rule [
        "[^\n]*" => Description(lexer.current.ltrim()),
        "" => EOL
    ];

    // The parameter name
    public static var paramDoc = @:rule [
        ident => {final name = lexer.current.ltrim().rtrim(); Identifier(name);},
        "" => EOL,
    ];

    // The type expression; spaces and commas are skipped by lexing the next token
    public static var typeDoc = @:rule [
        // " " => Spc,
        " " => lexer.token(typeDoc),
        "," => lexer.token(typeDoc),
        "\\[\\]" => ArrayMod,
        "\\?" => OptionalMod,
        "<" => TypeOpen,
        ">" => TypeClose,
        "{" => CurlyOpen,
        "}" => CurlyClose,
        "\\[" => SquareOpen, // escaped like in the ArrayMod rule, since [ starts a character class
        "\\]" => SquareClose,
        "\\(" => Lparen,
        "\\)" => Rparen,
        "\\|" => Pipe,
        "number" => DocType(TypeToken.Number),
        "string" => DocType(TypeToken.String),
        "table" => DocType(TypeToken.Table),
        "boolean" => DocType(TypeToken.Boolean),
        "function" => DocType(TypeToken.Function),
        "fun" => DocType(TypeToken.Function),
        "nil" => DocType(Nil),
        "" => EOL,
        // ident => throw 'Unknown type "${lexer.current}"',
    ];
}
```
My first problem appears when I try to parse the three main elements. Because I have three different rulesets, I can't go as blindly as:

```haxe
case [Identifier(name), Spc, DocType(t), Spc, Description(d)]:
```

Instead, I have to match on the identifier first, then select the next ruleset, check whether the next token is `EOL` (and return in that case), and only then continue parsing:
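```haxe
public function parse() {
    return switch stream {
        case [Identifier(name)]:
            // Lex whatever follows the name with the type rules
            stream.ruleset = LuaDocLexer.typeDoc;
            // The identifier has already been consumed, so peek(0) is the next token
            if (this.peek(0) == EOL)
                return {name: name, type: null, description: null};
            try {
                final t = parseType();
                // Lex the rest of the line with the description rules
                stream.ruleset = LuaDocLexer.desc;
                final text = parseDesc();
                return {name: name, type: t, description: text};
            // ... (catch block and the rest of parse() not shown)
```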
This is not exactly what I wanted, but at least it works.
The problem gets even worse later in the `parseType` method: as soon as I try to put a `Pipe` between two types, the compiler complains that the cases below it are unused. Here:
```haxe
public function parseType() {
    return switch stream {
        case [DocType(Table), TypeOpen, t = parseTypeArgs()]:
            'Table<$t>';
        case [DocType(t)]:
            t + "";
        case [DocType(t), Pipe, t2 = parseEither()]: 'Either<$t, $t2>'; // Here it says Pipe is unused, and this never matches
    };
}

public function parseEither() {
    return switch stream {
        case [DocType(t), Pipe, t2 = parseEither()]: 'Either<$t, $t2>';
        case [DocType(t)]: '$t'; // Here it also says the case is unused
    };
}
```
I am starting to think that I am missing some key concept. I tried emitting spaces as a `Spc` token, but I’m not sure how to match on it without an explosion of cases to account for extra spaces.
If I use concrete types, for example like this:

```haxe
case [DocType(Number), Pipe, t2 = parseEither()]: 'Either<Number, $t2>';
```
then there is no problem, but I really want to be able to combine any types with the `Pipe` operator.
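For comparison, here is a minimal, self-contained sketch of the grammar being aimed for, `type := DocType ('|' type)?`, written in plain Haxe without hxparse. The enum `MiniToken`, the class `PipeParser` and its `peek`/`junk` helpers are hypothetical stand-ins (the helper names only mimic hxparse's `Parser`), and tokens are passed in as an array instead of coming from a lexer. What it illustrates is left factoring: the shared `DocType(t)` prefix is consumed exactly once, and only the token after it decides between a plain type and a pipe, so no two alternatives start with the same token. That overlap is one plausible reason for the "unused case" errors above, since a case beginning with `DocType(t)` already catches every `DocType` token.

```haxe
// Hypothetical, trimmed-down tokens for illustration; no hxparse involved.
enum TypeToken {
    Function;
    Number;
    String;
    Table;
    Boolean;
    Nil;
}

enum MiniToken {
    DocType(t:TypeToken);
    Pipe;
    EOL;
}

class PipeParser {
    var tokens:Array<MiniToken>;
    var pos = 0;

    public function new(tokens:Array<MiniToken>) {
        this.tokens = tokens;
    }

    // Look at the next token without consuming it (EOL once input runs out).
    function peek():MiniToken {
        return pos < tokens.length ? tokens[pos] : EOL;
    }

    // Consume the next token.
    function junk():Void {
        pos++;
    }

    // type := DocType ('|' type)?
    // DocType is matched once; the following token decides whether a Pipe
    // (and another type) comes next, so pipes nest to the right.
    public function parseType():String {
        return switch (peek()) {
            case DocType(t):
                junk();
                switch (peek()) {
                    case Pipe:
                        junk();
                        final rest = parseType();
                        'Either<$t, $rest>';
                    case _:
                        '$t';
                }
            case other:
                throw 'Expected a type, got $other';
        }
    }
}

class Main {
    static function main() {
        final p = new PipeParser([DocType(TypeToken.String), Pipe, DocType(TypeToken.Number), Pipe, DocType(TypeToken.Nil)]);
        // traces Either<String, Either<Number, Nil>> (after the file:line prefix)
        trace(p.parseType());
    }
}
```

With hxparse, the analogous shape would presumably be a single `case [DocType(t)]:` whose body runs a nested `switch stream` with a `[Pipe, t2 = parseType()]` case and a `case _:` fallback returning `'$t'`; this is untested against hxparse's macro.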