Removing function call in native python code

I’ve found that for my use case, normal iteration over an array is pretty slow on python. Say I have an array with a million elements in it that I want to iterate over. Doing for(entry in arr) {} or for(idx in 0...arr.length) {} were both significantly slower in python than I expected them to be.

Digging in to it a bit, the python compiled code for both of those loops evaluated to a while loop, e.g. this simplified haxe code:

for (x in arr)
    if (x == "12345") continue;

compiled to this in python:

_g = 0
while (_g < len(arr)):
    x = (arr[_g] if _g >= 0 and _g < len(arr) else None)
    _g = (_g + 1)
    if (x == "12345"):

I replaced that with a python for loop and the performance increase was significant - an order of magnitude on my machine.

for str in {0}:
    if (str == \"12345\"):
", arr);

for being faster than while on python seems to be a thing, so I tried to make this a bit generic so I can use it in a few places where I need better performance. What I currently have is this:

static inline function fastIterator<T>(arr:Array<T>, f:(entry:T) -> Void)
#if python
    python.Syntax.code("for x in {0}: {1}(x)", arr, f);
    for (x in arr)

fastIterator(arr, (str) ->
    if (str == "12345"){

This works, and is faster than using the haxe for loops, but it appears that the function call is still slowing things down a bit. The python code generated from the above looks like this:

def _hx_local_1(_hx_str):
    tmp = (_hx_str == "12345")

for x in arr: _hx_local_1(x)

In my tests, if I can get rid of that _hx_local_1 function call and put its contents directly as the body of the for loop it appears I could save a good amount of time - but I have no idea how to go about this, or even if it’s possible. Any ideas on whether this could be rewritten? I’m not sure if a macro can be used here since I need to modify the python code - maybe there’s another way to write this?

1 Like