Summary for the Impatient: If you are not interested in improving your Testing approach, please disregard this post.
Strongly Suggested:
The attitude when doing Testing should be: there ARE Defects to be found.
The problem is how to find those Defects effectively.
Practically REQUIRED
NUMBER 1 Way To Help Avoid Defects Before Testing ( Giant Process DEFECT ! )
Output of ALL compiles needs to be captured before Tests AND reviewed by Testers!
IF there are Warnings / Errors THEN Developers need to clean up the Application!
Just disabling Warnings is an indication of a LAZY Developer and should not be allowed except after careful Review (and the original Developer does not get a vote!).
The Compiler and Interpreter providers put a lot of Time and Effort into giving anyone using those tools good feedback through Warnings (Problems that happen to your Users but that you can't reproduce locally) and Errors (Problems that are so bad that Lazy Developers like me are forced to take action to fix them).
I have literally lost count of the number of times some Developer (including my earlier Self) thought a Warning was “HARMLESS” only to find after more Debugging that the Warning pointed directly at the starting point (Root Cause) of a serious DEFECT.
Alternate Name: Keep Those Lazy Developers Honest !
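A minimal sketch of the "capture and review" idea, in Python only for brevity. The regular expression is an assumption about what diagnostics look like; real Compilers format Warnings differently, so adjust it for your toolchain. The function names are made up for illustration.

```python
import re
import subprocess
from typing import List

# Assumption: diagnostic lines contain the whole word "warning" or "error".
DIAGNOSTIC = re.compile(r"\b(warning|error)\b", re.IGNORECASE)

def capture_build(command: List[str]) -> str:
    """Run a build command and return stdout and stderr as one captured log."""
    done = subprocess.run(command, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, text=True)
    return done.stdout

def find_diagnostics(build_output: str) -> List[str]:
    """Return every captured line that looks like a Warning or Error."""
    return [line for line in build_output.splitlines()
            if DIAGNOSTIC.search(line)]

def clean_enough_to_test(build_output: str) -> bool:
    """True only when the captured build log has no diagnostics at all."""
    return not find_diagnostics(build_output)
```

The point of returning the offending lines (instead of just True / False) is that the Testers get something concrete to hand back to the Developers.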
Important Safety Tip: Even Testing Frameworks and UI / GUI Unit Tests may have built-in Blind Spots or Weaknesses. Adopting the Palm Gremlins point of view may help you avoid deploying Defects.
Palm had a great built-in approach called Gremlins: automatic pseudo-random activation of the Keyboard, Gestures, GUI Buttons, Sliders, etc. Anything in the UI that could change, and any UI control, was tested. There was pretty strong guidance that Palm would check up on your App by running Gremlins at Palm before you were approved to publish the App. The Gremlins ran As Fast As Possible (way faster than Monkeys) and would find subtle Timing Window Defects that no other testing approach could find. Gremlins could use a random start value, or you could give it a specific value if you needed to reproduce a Defect that would only happen after 900,000 Gremlin entries. Our little startup would let Gremlins run overnight, and if the count got to 1M then we would maybe trust our own Code. Typically we would start a few more runs with different start numbers to get that “Warm and Fuzzy” feeling.
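To show why the start value matters, here is a rough Gremlins-style sketch in Python. This is NOT Palm's implementation; the action names, the screen size and the function names are all invented for illustration. The only idea carried over is that a seeded pseudo-random generator replays exactly the same event stream.

```python
import random

# Hypothetical UI actions and screen size; a real harness drives real controls.
ACTIONS = ("tap", "swipe", "key", "slider")
SCREEN_W, SCREEN_H = 320, 480

def gremlin_events(seed, count):
    """Yield `count` pseudo-random UI events; the same seed repeats the sequence."""
    rng = random.Random(seed)
    for step in range(count):
        action = rng.choice(ACTIONS)
        x, y = rng.randrange(SCREEN_W), rng.randrange(SCREEN_H)
        yield step, action, x, y

def run_gremlins(app, seed, count):
    """Feed events to `app` (any callable taking action, x, y).

    Returns (step, exception) for the first failure, or None if all events pass.
    """
    for step, action, x, y in gremlin_events(seed, count):
        try:
            app(action, x, y)
        except Exception as exc:
            return step, exc
    return None
```

If a run fails at some step, running again with the same seed against a freshly started App should fail at the same step, as long as the App itself has no other source of randomness or timing dependence.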
Another thing to look for in Testing Frameworks is excellent support for Debugging.
Best approach might be:
Just always run a Debug Build of your app under the Debugger for automatic tests.
Set Breakpoints where the Code is handling unusual Exceptions or unexpected Errors.
Browser(s)
I think your Headless approach should work, but I was really ignorant of Web dev until a few months ago.
What I expect any Developer would want:
Maybe a Callback into the App from the Browser (or fake Browser) when it thinks it found a Warning or Error. What would be awesome for Debugging is a related Call Stack, with parameters, captured at the time of the Warning or Error.
Definitely a saved Log file that includes enough actions and other context from before the Warning or Error to help isolate the Defect.
What would also be awesome is if the Logging of the Test tool(s) could provide a Call Stack, or at least the most recently called part of your Application including all the parameters.
A deterministic way to reproduce the Defect.
If you use all the same initial settings for the Browser, Debugger and anything else, will the Error always happen? If not, see below.
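For the "enough context before the Error" wish, one cheap approach is a ring buffer of recent actions that gets dumped together with the Call Stack when something goes wrong. A small Python sketch; the class and function names are hypothetical and not from any particular Test tool.

```python
import traceback
from collections import deque

class ActionLog:
    """Keeps only the most recent actions, with their parameters."""

    def __init__(self, capacity=50):
        self._recent = deque(maxlen=capacity)  # old entries fall off the front

    def record(self, action, **params):
        self._recent.append((action, params))

    def dump(self):
        return [f"{action} {params}" for action, params in self._recent]

def run_step(log, action, func, **params):
    """Record the step, run it, and on failure return context plus Call Stack."""
    log.record(action, **params)
    try:
        func(**params)
        return None
    except Exception:
        return {"recent_actions": log.dump(), "stack": traceback.format_exc()}
```

A fixed capacity keeps the Log small enough to leave switched on for long overnight runs.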
Sometimes testing with a Release build as well makes it easier to pin down a Defect that a Debug build found. Optimization changes the Timing of your App, its related support Tools and all that.
After the Debug build seems to do OK then test with a Release build.
Hopefully you will not find any tricky Timing Window Defects in your App or the Libraries your App uses or other places.
IF you do AND you do not have Source code to change (3rd party Library, Framework, Tool or even OS): THEN add some DEFENSIVE code in your Application. The added code explicitly checks for the Warning or Error conditions Before OR Immediately After the call and then does a Retry. I had to deal with an API that returned absolutely No Error while the requested Network setup just quietly FAILED. So I put in code that, even after a so-called Successful return, would check the Network to be sure it actually worked. It was very subtle, as it would only happen on certain deployments of the OS with certain background processes running. And in this case the silent No Error when there really IS an ERROR originally happened in fewer than 1 in 1000 tests (the DREADED Timing Window Defect class).
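The "check even after a Successful return, then Retry" pattern can be sketched like this in Python. The names, the attempt count and the delay are assumptions for illustration; `action` stands in for the 3rd party call and `verify` for whatever independent check you can make.

```python
import time

def call_with_verification(action, verify, attempts=3, delay_s=0.0):
    """Run `action`, then independently confirm the result with `verify`.

    `action` may report success even when nothing happened, so its return
    value is deliberately ignored. Returns the attempt number that verified,
    or raises RuntimeError when every attempt fails verification.
    """
    for attempt in range(1, attempts + 1):
        action()
        if verify():
            return attempt
        time.sleep(delay_s)  # give the platform a moment before retrying
    raise RuntimeError(f"still not verified after {attempts} attempts")
```

Returning the attempt number is a design choice: logging it tells you how often the silent failure really happens in the field.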
I suggest you try running some background processes on the same platform as the Browser or Headless Browser. After all, Users don’t run only your App; they run several processes and yours is just 1 of them. You can try this with both Debug and Release builds.
This is yet another way to change the Timing of your App and especially the Resource Availability that your App will see: Load testing, Stress testing, low RAM tests, low Disk space tests, Network Busy / Not Available, etc.
Another Weakness of Testing:
There is a fairly large burden from using some Testing approaches.
It requires creating tests that really test Limits, Corner Cases, Bad Data input, etc.
Relying on Automated Tests may completely miss some really important Defects because no tests really hit the logical limits or assumptions of the Application. Examples are Bad Data that exposes Security Defects, or Math values being used incorrectly.
What might give you some help is what I call Testing by Contract.
The idea can be used for White Box and Black Box Tests along with practically any other type of Test.
You apply the Design by Contract ideas of Pre Conditions, Invariants and Post Conditions when you look at the various Application interfaces: UI, public API of the Application, settings, preferences, Haxe code interfaces, etc.
Think about what happens if the caller (Test code or script) gives questionable or bad input that fails to honor just 1 Pre Condition out of 4. What Error returns or Exceptions are reasonable to expect?
In a similar way, what is reasonable if an Invariant is broken after return from the Application?
In a similar way, check whether any Post Conditions were not met.
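Here is a small Python sketch of the "1 Pre Condition out of 4" idea. The `resize` function is a made-up stand-in for your Application's API, and raising ValueError is just one reasonable choice of how to reject bad input.

```python
def resize(width, height, scale, name):
    """Hypothetical API under test: 4 Pre Conditions, 1 Post Condition."""
    if width <= 0:
        raise ValueError("width must be positive")
    if height <= 0:
        raise ValueError("height must be positive")
    if not 0 < scale <= 4:
        raise ValueError("scale must be in (0, 4]")
    if not isinstance(name, str) or not name:
        raise ValueError("name must be a non-empty string")
    return round(width * scale), round(height * scale)

GOOD = dict(width=10, height=20, scale=2, name="icon")

# Each entry breaks exactly 1 Pre Condition and honors the other 3.
ONE_BAD_EACH = [
    dict(GOOD, width=0),
    dict(GOOD, height=-1),
    dict(GOOD, scale=5),
    dict(GOOD, name=""),
]

def violations_rejected(func, bad_inputs):
    """True if every single-violation input is rejected with ValueError."""
    for kwargs in bad_inputs:
        try:
            func(**kwargs)
        except ValueError:
            continue
        return False  # bad input was silently accepted
    return True

def postcondition_holds(func, kwargs):
    """Post Condition for this example: both result dimensions are positive."""
    new_width, new_height = func(**kwargs)
    return new_width > 0 and new_height > 0
```

Breaking only one Pre Condition at a time matters: if you break all 4 at once you only learn that the first check works.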
AND if you really want to cut back on Defects:
Inspection(s) of source(s) is a process that I have seen work very impressively.
In one Inspection meeting, 5 people identified over 50 Defects in a single source code file in 1 hour. Yes, there is significant overhead to Inspection meetings, which is another reason to schedule them as early as possible in your process: Agile does not change any fundamentals about the COST of Defects. Early Correction is a Huge Saving.
And the Inspections approach detects types of Defects that literally NO amount of even very dedicated and otherwise thoughtful dynamic Tests will FIND. Pair programming in Agile will help find SOME of what Inspections find but I would think not as completely. And I am guessing that trying to convince any Agile team to actually use the Inspections approach would be difficult to do.
Spoken as a former Palm and Intel employee.
p.s. really incomplete summary for other Platforms:
Mobile platforms: Same as above (use the Emulator / Simulator, Luke!)
Desktop / Laptop: Test on both Slowest and Fastest reasonable HW / OS.
Server(s): Minimum Performance metrics are required to be met. Meet them always!
Network(s): Like Server(s), but with more Protocol combinations / connections that can FAIL.