Having difficulty with EReg in Java

I’m having trouble with a regex when compiling to Java. The regex is meant to match ANSI sequences in strings so that I can split the string with the regex and join the resulting array with "", removing ANSI formatting.

This is the Haxe regex:

~/[\x1b\x9b][[\]()#;?]*((([a-zA-Z0-9]*(;[-a-zA-Z0-9\/#&.:=?%@~_]*)*)?\x07)|(([0-9][0-9]?[0-9]?[0-9]?(;[0-9]?[0-9]?[0-9]?[0-9]?)*)?[0-9A-PR-TZcf-ntqry=><~]))/g

This is the compiled Java output:

new haxe.root.EReg("[\\x1b\\x9b][[\\]()#;?]*((([a-zA-Z0-9]*(;[-a-zA-Z0-9/#&.:=?%@~_]*)*)?\\x07)|(([0-9][0-9]?[0-9]?[0-9]?(;[0-9]?[0-9]?[0-9]?[0-9]?)*)?[0-9A-PR-TZcf-ntqry=><~]))", "g")

And this is the exception I get thrown:

Exception in thread "main" java.lang.ExceptionInInitializerError
    <stack tracing>
Caused by: java.util.regex.PatternSyntaxException: Unclosed character class near index 152
[\x1b\x9b][[\]()#;?]*((([a-zA-Z0-9]*(;[-a-zA-Z0-9/#&.:=?%@~_]*)*)?\x07)|(([0-9][0-9]?[0-9]?[0-9]?(;[0-9]?[0-9]?[0-9]?[0-9]?)*)?[0-9A-PR-TZcf-ntqry=><~]))
                                                                                                                                                        ^
    <more stack tracing>

Apparently this is caused by me not escaping something, probably in that last set of [].

Does anyone have experience with Java regexes?

This is happening with Haxe 4.0.0-rc.3, hxjava 3.2.0 & 4.0.0-alpha, openjdk-8-jdk + openjdk-8-jre, and openjdk-11-jdk + openjdk-11-jre; and I’m on Ubuntu 18.04, if it helps.

Thanks!

Escaping can be very tricky when you’re building a regex. I’d suggest printing the exact string that you’re currently passing to the EReg constructor and looking for tpyos. :slight_smile:
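To make that check concrete, here is a minimal sketch in plain Java that feeds the pattern text from the exception message straight to java.util.regex.Pattern, bypassing Haxe. The class and method names are made up for illustration. It relies on one thing I’m fairly sure of: Java treats an unescaped `[` inside a character class as the start of a nested class (that is how it spells unions like `[a-d[m-p]]`), so `[[\]()#;?]` opens an outer class that is never closed. Writing it as `[\[\]()#;?]` avoids that. I have not run this through the Haxe compiler, only reasoned about the Java side.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper class, not part of Haxe or hxjava.
final class EscapeCheckSketch {
    // Pattern text exactly as it appears in the exception message.
    static final String ORIGINAL =
        "[\\x1b\\x9b][[\\]()#;?]*((([a-zA-Z0-9]*(;[-a-zA-Z0-9/#&.:=?%@~_]*)*)?\\x07)"
        + "|(([0-9][0-9]?[0-9]?[0-9]?(;[0-9]?[0-9]?[0-9]?[0-9]?)*)?[0-9A-PR-TZcf-ntqry=><~]))";

    // Same pattern, but the '[' inside the second character class is escaped.
    static final String ESCAPED = ORIGINAL.replace("[[\\]", "[\\[\\]");

    // Returns null when the pattern compiles, otherwise the parser's description.
    static String compileError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription();
        }
    }

    // Removes every match of the escaped pattern from the input.
    static String stripAnsi(String s) {
        return Pattern.compile(ESCAPED).matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(ORIGINAL);   // look at what the engine really receives
        System.out.println("original: " + compileError(ORIGINAL));
        System.out.println("escaped:  " + compileError(ESCAPED));
        System.out.println(stripAnsi("\u001b[31mred\u001b[0m"));
    }
}
```

If the escaped version behaves for you, the equivalent change on the Haxe side would presumably be `[\[\]()#;?]` in the regex literal, which other targets should also accept since `\[` is a plain escaped bracket there.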

I’ve also been known to break down a complicated regular-expression string into several concatenated constants, just to make things easier to read for the programmer … even if it’s just me. As you see, such “chicken scratch strings” can be unreadable.
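As a rough illustration of the concatenated-constants idea, here is how the same pattern might be split up in plain Java. The constant names are my own invention, and I took two liberties that should not change what gets matched: non-capturing groups `(?:...)` instead of plain ones, and `[0-9]{1,4}` / `[0-9]{0,4}` in place of the repeated `[0-9]?`. The `[` inside the introducer class is escaped here, because Java would otherwise read it as the start of a nested character class.

```java
import java.util.regex.Pattern;

// Hypothetical class; the names below are illustrative only.
final class AnsiPatternPartsSketch {
    // ESC (0x1b) or the single-byte CSI (0x9b) that starts a sequence.
    static final String START = "[\\x1b\\x9b]";

    // Any run of introducer characters; note the escaped '[' and ']'.
    static final String INTRODUCER = "[\\[\\]()#;?]*";

    // Body made of ;-separated fields, terminated by BEL (0x07).
    static final String BEL_TERMINATED =
        "(?:[a-zA-Z0-9]*(?:;[-a-zA-Z0-9/#&.:=?%@~_]*)*)?\\x07";

    // Optional numeric parameters (up to four digits each), then a final character.
    static final String PARAMS_AND_FINAL =
        "(?:[0-9]{1,4}(?:;[0-9]{0,4})*)?[0-9A-PR-TZcf-ntqry=><~]";

    // The full pattern, assembled from the named pieces.
    static final Pattern ANSI = Pattern.compile(
        START + INTRODUCER + "(?:" + BEL_TERMINATED + "|" + PARAMS_AND_FINAL + ")");

    // Removes every ANSI sequence the pattern recognises.
    static String strip(String s) {
        return ANSI.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(strip("\u001b[1;32mok\u001b[0m done"));
    }
}
```

With the pieces named, a mistake like an unescaped bracket is confined to one short constant instead of hiding in a 150-character string.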