Array indexOf is very slowly!

Zhan · August 22, 2019, 4:32pm

Array indexOf is very slowly!!!

when array length >10000, it’s very slowly,

how to improve?

package;

import Random;
import haxe.MainLoop;

class Gen {
    var arr = new Array<Array<Int>>();
    var index = 0;

    public function new() {
        
    }

    public function gen5(callback:Array<Array<Int>>->Void):Void {
        for (r in 0...4) {
            MainLoop.addThread(function() {
                while (arr.indexOf([1, 2, 3, 4, 5]) == -1) {
                    var genArray =[];
                    // for (i in 0...5) {
                    while (genArray.length < 5) {
                        var insert = Random.int(0, 9);

                        var temp = genArray.slice(0, genArray.length); 

                        // if(check(tem))

                        temp.push(insert);

                        if (check(temp)) {
                            // if(genArray.indexOf(insert)==-1)
                            genArray.push(insert);
                        }
                    }

                    if (arr.indexOf(genArray) == -1) {
                        index++;
                        arr.push(genArray);

                            
                    }

                    if (arr.length % 10000 == 0) {
                        trace(arr.length);
                    }
                }


                callback(arr);
            });
        }

        
    }

    private function checkRepeat(arr:Array<Int>):Bool {
        var same = 0;
        
        for (i in 0...arr.length) {
            if (i < arr.length - 1) {
                if (arr[i] == arr[i + 1]) {
                    same += 1;

                    if (same > 3) {
                        return false;
                    }
                } else {
                    same = 0;
                }
            }
        }
        return true;
    }

    public function check(temp:Array<Int>):Bool {
        if (temp.length < 4) {
            return true; 
        }

        

        return checkRepeat(temp);
    }
}


import sys.io.File;
import Random;
import sys.FileSystem;
import haxe.Json;

class Main {
    static function main() {
        trace("Hello, world!");

        var gen = new Gen();

        gen.gen5(function(arr) {
            trace(arr.length);

            var content = Json.stringify(arr);

            File.saveContent("c:/gen5.json", content);
        });
    }
}

RealyUniqueName · August 22, 2019, 4:40pm

This will always return -1 because [1, 2, 3, 4, 5] creates a new array everytime. And it cannot be equal to an item in arr since arrays checked for equality by reference.
So you basically created an infinite loop.

sundialservices · August 23, 2019, 1:58am

The performance problem in this case probably appears due to the logic that you execute when the search comes up empty. I’m not entirely sure what you are trying to do here but the manner in which you are trying to do it appears to me to be remarkably inefficient … as an algorithm.

So, if you could please describe to us exactly what it is that you are trying to do, perhaps we can help you to devise a far more efficient way in which to do it.

“Don’t ‘diddle’ code to make it faster – find a better algorithm.”
– Kernighan & Plauger: The Elements of Programming Style (1978)

P.S.: "Yes, that Kernighan. One of the guys who invented ‘C’ … and Unix.

sundialservices · August 24, 2019, 12:18pm

The checkRepeat function is the bugaboo. If you need to produce a list of random tuples without duplicates, one way to do it – a bit counter-intuitive – is this:

Build an array of random tuples.
Sort the list.
Scan the list – any duplicates will now be adjacent. Remove the spares. (The odds of actually finding any are very slight.)
If duplicates were found,top off the list by appending more random tuples, then goto 2.
Finally, shuffle the list. (Lots of fast algorithms to do that in one linear pass.)

The cost will most likely be “one fast sort,” because the actual odds of duplication are slight, and the sort will be done in O(log(n)) time or faster. Your present algorithm has the performance characteristics of a Bubble sort. O(n^2) and that’s the real reason why it’s slow.

Zhan · August 26, 2019, 2:31am

thank you @sundialservices , yes,you are right

I rewrite this low performance

here is rule

1: Generate 5 non-repeating numbers from 0 to 9

2: The number of repetitions allowed is no more than 3 times, for example

   000123, 01222 This is correct

   000230,99299 This is wrong

3: Give specific rewards according to their arrangement rules, similar to lottery tickets

Initially I wanted to generate a collision with a random algorithm, but the efficiency was too low.
Now that this is the case, there may be better algorithms, what do you think?

package;


import haxe.Json;


class Main 
{
    
    static function main() 
    {
        //trace(Date.now().toString());
        var t = Date.now().getTime();
        var gen = gen();
        trace(Date.now().getTime()-t);
        
    }
    public static function gen():Array<Array<Int>> {
        var _array:Array<Array<String>> = [];
        var array:Array<Array<Int>> = [];

        for (i in 10...100000) {
            var arr:Array<String> = Std.string(i).split('');
            if (arr.length == 2) {
                arr.unshift('0');
                arr.unshift('0');
                arr.unshift('0');
            }
            if (arr.length == 3) {
                arr.unshift('0');
                arr.unshift('0');
            }
            if (arr.length == 4) {
                arr.unshift('0');
            }
            var insert = true;
            var limit = 3;

            for (j in 0...2) {
                var count = 1;

                for (k in j + 1...5) {
                    if (arr[j] == arr[k]) {
                        count++;
                    }
                }
                if (count > limit) {
                    insert = false;
                }
            }
            if (insert) {
                _array.push(arr);
            }
        }
        trace("gen ok");
        var str = Json.stringify(_array);
        trace("to json ok");
        var r = ~/"/g;
        str = r.replace(str, '');
        trace("replace ok");
        array = Json.parse(Std.string(str));
        trace("covert to ok");
        return array;
    }
}

Zhan · August 26, 2019, 2:34am

This may be a bit like the difference between === and ==

sundialservices · August 26, 2019, 4:28pm

I’m afraid that I simply don’t have time to desk-check your code. Here is the algorithm I suggest:

(1) Generate an array of randomly-chosen tuples.

(2) Sort this array. (You will need to write a custom comparison-function, I think …)

(3) Iterate linearly through the array: if any duplicate tuples exist, they will now be adjacent. Remove the duplicates.

(4) If duplicates were removed, generate enough new random tuples to fill the array up again, and goto 2.

(5) Finally, shuffle the array to put the tuples in random order.

Although intuitively it may seem that “sorting is expensive,” it actually isn’t, and it trivializes the work of finding duplicate values (and counting how many duplicates there are). If for instance you want to know if there are “more than 3 duplicates,” simply compare the entry at [n]with that at position [n+3]. If they are identical, you know that you’re looking at a sequence of at least four repetitions. (Necessarily, all four tuples must be identical, and so you don’t have to check each one.) And you don’t have to “search” for anything. Sorting completely avoids searching.

P.S.: If you ever wondered what those people were doing with those tape-drives and punched cards, most likely decades before you were born … “this is why.”