Rationale for lenient/forgiving behavior of substr and substring

uvtc · October 1, 2019, 2:30am

Was just reading the docs for String and noticed that substr and substring go to some lengths to accept possibly bad args. For example, substr(pos:Int, ?len:Int):

if pos is negative, and this.length + pos is negative, 0 is used instead
if the calculated pos + len exceeds this.length, it just goes to the end of this String.
And finally, if len is negative, the result is unspecified.
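
For reference, here is a minimal sketch of the first two rules in action. The class name SubstrDemo is only for illustration, and the expected values in the comments follow from the String documentation quoted above.

```haxe
class SubstrDemo {
    static function main() {
        var s = "Haxe is great!"; // length is 14

        // pos = -100: 14 + (-100) is negative, so 0 is used instead.
        trace(s.substr(-100, 4)); // Haxe

        // pos + len = 8 + 100 exceeds the length, so the result runs to the end.
        trace(s.substr(8, 100)); // great!

        // A negative len is documented as unspecified, so it is not shown here;
        // its result may differ between targets.
    }
}
```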

I’m curious: why does Haxe allow this, rather than telling me I may have passed a bad argument? Why not fail if this.length + pos is < 0, if pos + len exceeds this.length, or if len is negative?

RealyUniqueName · October 1, 2019, 7:12am

Most likely because that was the behavior of Haxe’s primary targets when the String API was designed.

sundialservices · October 1, 2019, 3:37pm

Actually, many languages do these things, because it makes life easier when getting substrings. For instance, “give me the leftmost, or the rightmost, 5 characters of a string” can be written in just one way, even when the string in question is "abc". (Only 3 characters long.) It works better to let the function be forgiving of what are technically incorrect inputs, to avoid the (pointless …) extra work that would otherwise be required to ensure that they are always within bounds.
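
A rough sketch of that "leftmost/rightmost" idiom in Haxe. The helpers left and right are hypothetical names, not part of the standard library. They rely only on the documented substr behavior, and right assumes n is greater than 0.

```haxe
class LeftRight {
    // Hypothetical helper: leftmost n characters (n >= 0 assumed).
    public static function left(s:String, n:Int):String {
        return s.substr(0, n);
    }

    // Hypothetical helper: rightmost n characters (n > 0 assumed;
    // with n == 0, substr(0) would return the whole string).
    public static function right(s:String, n:Int):String {
        return s.substr(-n);
    }

    static function main() {
        trace(left("Haxe is great!", 5));  // "Haxe "
        trace(right("Haxe is great!", 5)); // "reat!"

        // The same calls on a string shorter than 5 characters need no bounds checks.
        trace(left("abc", 5));  // "abc"
        trace(right("abc", 5)); // "abc"
    }
}
```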

uvtc · October 2, 2019, 5:11am

Thanks. Yes, I see that some languages are more forgiving/accepting than others, and that this trait comes in a few different flavors. For example, I like Python because it complains any time things don’t look exactly right — but unfortunately it does so at run-time, and thus pays a performance penalty.

It’s funny. Maybe someone else can comment and explain further, but I see two axes that stand out here:

the degree to which the language implementation helps you catch your errors at all, and
how early or late the language helps you catch those errors

And both tie in to another axis: performance… (so many intertwined trade-offs…)

For example,

Python helps you a lot in catching errors, and does so later (at run-time), and so pays a performance penalty.
From what I can tell, Lua doesn’t appear to help you very much in catching errors, and has pretty quick performance.
From what I recall, Java tries hard to catch errors, and does so early on (compile-time), and performance is good.
JavaScript I’m still learning, but it may be hard to classify in this way because of all the tech and hours pumped into making it fast regardless…

Given Haxe’s static typing, the fact that it’s compiled, and given its ability to compile to some very forgiving/accepting languages (like JS and Lua), I figure a big advantage of Haxe is that it works very hard to help me catch my errors early, which is why I was surprised about how accommodating those String methods are (I see also that some Array methods work similarly, and lots of other methods say something like “if passed null, the results are unspecified”).

Am I mistaking a little innocent convenience for lack of vigilance in helping me catch my errors at compile-time?

(Note, an item that I still trip over is knowing which things Haxe can check at compile-time, vs dynamic things that aren’t known until run-time. But I think the subject of this thread is the former.)

Thanks for any feedback on this!

Gama11 · October 2, 2019, 9:54am

I don’t think this sort of thing can be caught at compile-time anyway. For that to even be theoretically possible, all the components of the call would have to be constants / known at compile-time: the string itself as well as the pos and len arguments (something like "Hello World".substr(0, 4)). Usually at least one or more of these are computed dynamically at runtime though.

Regarding null arguments, the new experimental null safety feature could help with that. But the standard library hasn’t really been updated for proper compatibility with that (i.e. adding Null<T> where needed).

piboistudios · October 2, 2019, 5:27pm

Really it all just has to do with whether the language is static or dynamic. A static language can work off of a host of assumptions that a dynamic language cannot, and can therefore throw more errors at compile time. In a dynamic language, most ‘compile’ time errors will be strictly syntax related.

Haxe does well to point out typing issues early because it has a rather robust typing system, but like Gama said, this isn’t really something that can be caught at compile-time, and it’s difficult to say objectively whether the substring methods should stay lenient or should throw an error at runtime.

mark.knol · October 4, 2019, 11:21am

Btw it’s not super hard to create your own fail-safe utility function that does this. But I would rather ask, why provide bad args in the first place?

class Test {
    static function main() {
        trace(Util.substr("Haxe is great!", 5, 2));
        trace(Util.substr("Haxe is great!", 5, 123213122));
        trace(Util.substr("Haxe is great!", -5, 123213122));
    }
}

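class Util {
    /** Like String.substr, but clamps pos and len into the bounds of value first. **/
    public static function substr(value:String, pos:Int, ?len:Int):String {
        // Clamp pos into [0, value.length - 1]; a negative pos becomes 0 rather than counting from the end.
        var pos = min(max(pos, 0), value.length - 1);
        // Cap len at the string length; if len is omitted, take everything from pos to the end.
        return value.substr(pos, len != null ? min(len, value.length) : value.length - pos);
    }

    /** Returns largest of two values. **/
    inline public static function max<T:Float>(a:T, b:T):T {
        return (a > b) ? a : b;
    }

    /** Returns smallest of two values. **/
    inline public static function min<T:Float>(a:T, b:T):T {
        return (a < b) ? a : b;
    }
}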

uvtc · October 5, 2019, 8:41pm

I think there’s a misunderstanding here. I don’t want String.substr and String.substring to be more lenient/forgiving with what I pass in; I want them to be more strict, that is, less lenient and forgiving.

I’m using those two String methods just as an example here. They are the first methods I’ve come across that seem to me to be too lenient and forgiving with the args passed to them.

The point being, I’d like Haxe to work as hard as it can to help me as early as possible to catch errors that I inevitably make. If Haxe instead is very tolerant of possibly-erroneous inputs, to me that sounds like a recipe for creating hard-to-find bugs. (For example, String.substring, if passed 5, 2, reverses them as if I passed 2, 5. But what if I’d written and am using a function to figure out string indices and it has a bug but String.substring is hiding that bug by accepting the bad input anyway?)
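
To make the concern concrete, here is a sketch of what a strict variant could look like. StrictString and its substring function are hypothetical, not part of the Haxe standard library, and throwing a plain string message is just one possible way to report the error.

```haxe
class StrictString {
    // Hypothetical strict variant: throws instead of swapping or clamping.
    public static function substring(s:String, startIndex:Int, endIndex:Int):String {
        if (startIndex < 0 || endIndex > s.length || startIndex > endIndex) {
            throw 'substring: bad range ${startIndex}..${endIndex} for length ${s.length}';
        }
        return s.substring(startIndex, endIndex);
    }

    static function main() {
        var s = "Haxe is great!";

        // The standard method silently swaps the arguments.
        trace(s.substring(5, 2) == s.substring(2, 5)); // true

        // The strict variant surfaces the reversed range instead.
        try {
            StrictString.substring(s, 5, 2);
        } catch (e:Dynamic) {
            trace("caught: " + Std.string(e));
        }
    }
}
```

With a wrapper like this, a buggy index calculation fails loudly at the call site rather than returning a plausible-looking substring.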

But as I’m still learning the Haxe standard library, maybe those String methods are just forgiving/lenient/accepting-of-possibly-bad-input because it’s very convenient for them to be that way? Maybe making them strict would make working with Strings a pain in the neck?

As I’m sure I’ll find out down the road, what I’d like to know is: are Haxe and the Haxe std lib generally intentionally accepting of weird inputs (as a design goal), or are they generally pretty strict and String is just an exception to the rule?

emugel · October 8, 2019, 6:44am

Getting a screwdriver on a swiss-army knife is not leniency.

In the beginning you will want to make checks, only to realize that, most of the time, the way the function prototype is designed means you don’t need them.

If you are super strict on some requirements then you may want EReg or maybe a simple test on s.length.

This also holds true for Array.slice(); it works the same way. These (ubiquitous) functions are a blessing for the general use case.
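
A short sketch of the Array.slice() behavior referred to here. SliceDemo is an illustrative name, and the results in the comments follow the Array documentation.

```haxe
class SliceDemo {
    static function main() {
        var a = [1, 2, 3];

        // An end beyond the length is treated as the end of the array.
        trace(a.slice(0, 10)); // [1,2,3]

        // A negative pos counts from the end.
        trace(a.slice(-2)); // [2,3]

        // A pos beyond the length gives an empty array.
        trace(a.slice(5)); // []
    }
}
```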

uvtc · October 9, 2019, 4:32am

Ok, got it. Thanks, emugel.