Hi! Everybody seems to have fun with AI so I decided to chime in & speedrun one of my old, crazy ideas:
Enter Fiberus, a native Haxe compiler target with a fully fiber-based runtime. It is designed specifically for workloads that can benefit from native, lightweight, cooperative concurrency using fibers/coroutines combined with a work-stealing scheduler and tight GC integration.
Fiberus exists to let you write clean, idiomatic Haxe code while enjoying cooperative concurrency that feels like a natural part of the language and runtime - without the friction of high-level async/await constructs that always seemed like an afterthought layered on top in other languages and runtimes.
Eventually I want to use this to build all kinds of things, from web services (see GitHub - fiberus-hx/warp10) to game engines (maybe fiber-cortex? :P).
It’s all alpha software: only Linux is supported and there is little documentation right now. So you’d better expect some nice explosions!
The concurrency model you’ve chosen essentially blocks the current fiber on IO, right? Are there primitives for firing off multiple blocking calls in parallel, i.e. something roughly equivalent to const [users, posts] = await Promise.all([getUsers(), getPosts()]) in JS?
I’ve also kinda grown tired of the noise from async IO, so solutions that try to get rid of it are very appealing. But then again, I wonder if the noise is not a feature? It clearly signals to me that I’m yielding control, potentially for hundreds of milliseconds, and that this is the place where race conditions are most likely to creep in. Or do you have plans for something like actors, or some other approach to deal with this?
Anyway, super cool. Can’t wait for the Windows version
A fiber can yield at different points, e.g. at function entries and in for/while loops, whenever there are other fibers waiting in the current thread’s queue/inbox (strong fairness!). For IO we use io_uring to get OS-level async operations at the scheduler level (we never want to block a scheduler thread), and a fiber requesting an operation suspends execution until a wake-up lets it continue.
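To make the checkpoint-and-suspend idea concrete, here is a toy Python sketch (none of these names come from Fiberus; the scheduler, the "checkpoint"/"io" markers, and immediate IO completion are all invented for illustration): fibers are generators that run until they yield at a checkpoint or request IO, and the scheduler thread itself never blocks.

```python
from collections import deque

class Scheduler:
    """Toy cooperative scheduler: one ready-queue, no real io_uring."""
    def __init__(self):
        self.ready = deque()  # runnable fibers (generators)

    def spawn(self, fiber):
        self.ready.append(fiber)

    def run(self):
        while self.ready:
            fiber = self.ready.popleft()
            try:
                request = fiber.send(None)  # run until the next yield point
            except StopIteration:
                continue                    # fiber finished
            if request == "io":
                # A real runtime would submit the op to io_uring and park the
                # fiber until completion; here we "complete" it immediately.
                self.ready.append(fiber)
            else:
                self.ready.append(fiber)    # yielded at a fairness checkpoint

def worker(name, out):
    for i in range(2):
        out.append(f"{name}:{i}")
        yield "checkpoint"   # e.g. a loop back-edge checkpoint
    yield "io"               # pretend blocking IO: suspends this fiber
    out.append(f"{name}:done")

log = []
sched = Scheduler()
sched.spawn(worker("a", log))
sched.spawn(worker("b", log))
sched.run()
print(log)  # the two fibers interleave at every checkpoint
```

Because both fibers yield at every loop iteration, the scheduler alternates between them, which is the "strong fairness" behavior described above.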
Spawning & syncing fibers can be done with a simple counter. I plan to add more utility/sugar around that (e.g. channels).
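The counter-based join can be sketched outside Fiberus too. Below is a minimal Python version (class and helper names are invented; OS threads stand in for fibers): spawn one task per operation, decrement a shared counter when each finishes, and block the parent until the counter reaches zero, which is the moral equivalent of `Promise.all`.

```python
import threading

class JoinCounter:
    """Counts outstanding tasks; wait() blocks until all have called done()."""
    def __init__(self, n):
        self.remaining = n
        self.cond = threading.Condition()

    def done(self):
        with self.cond:
            self.remaining -= 1
            if self.remaining == 0:
                self.cond.notify_all()

    def wait(self):
        with self.cond:
            while self.remaining:
                self.cond.wait()

def fetch(name, results, counter):
    results[name] = f"data:{name}"   # stand-in for a blocking IO call
    counter.done()

results = {}
counter = JoinCounter(2)
for name in ("users", "posts"):
    threading.Thread(target=fetch, args=(name, results, counter)).start()
counter.wait()   # like `await Promise.all([...])`
print(results["users"], results["posts"])
```

In a fiber runtime the `wait()` would suspend the fiber instead of blocking a thread, but the counting scheme is the same.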
But then again I wonder if the noise is not a feature?
Valid point. “Colorless” async can feel weird. But I think it’s fine, since you should be aware of your data access patterns/mutability in any case, and for debugging there is heavy instrumentation and profiling available via the Tracy profiler. If it really turns out to be a problem, there are multiple ways to address it (e.g. async/await macros).
Ah, ok … so essentially one would spawn one fiber per parallel op and then suspend until all of them are done. Neat!
It’s not that it feels weird … to the contrary: it’s what feels natural. And it’s super convenient too, until multiple concurrent processes/threads contend over shared resources.
I just remember when I got excited about async/await in JS, only to realize that my problems didn’t go away, they only started to look nicer. So even though the flow of the code becomes more easily understood with await, I learned that it’s also a marker for where things can go wrong - especially on a server where you may have to deal with thousands of concurrent requests. Hence my paranoia.
That holds up until there’s indirection. A function type that returns a plain result (or an interface with a method that returns a plain result) essentially means the code is going to return synchronously, at least in runtimes where threads are isolated (JS) or cannot run in parallel (Python with the GIL). Annoying as coloring may be, it does mean you can make sync vs. async a contract. With blocking vs. non-blocking, it’s trickier. Anyway, I’m not even suggesting you should do anything about this; it might wind up as annoying as checked exceptions. I was just wondering if you had any synchronization approaches in place (or even in mind) that are not so loudly advertised ^^
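The "sync vs. async as a contract" point can be shown with type annotations. A minimal Python sketch (all names here are hypothetical): the return type of the function parameter advertises whether a call may suspend, so the caller can rely on it.

```python
import asyncio
from typing import Awaitable, Callable

def run_sync(f: Callable[[], int]) -> int:
    # Contract: f returns a plain int, so it cannot suspend mid-call.
    return f()

async def run_async(f: Callable[[], Awaitable[int]]) -> int:
    # Contract: f returns an awaitable, so callers know control may yield here.
    return await f()

def get_count() -> int:
    return 41

async def get_count_async() -> int:
    return 42

plain = run_sync(get_count)                          # ordinary call
awaited = asyncio.run(run_async(get_count_async))    # must go through the loop
print(plain, awaited)
```

With colorless blocking calls there is no such type-level marker, which is exactly the trade-off being debated above.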
Ok, my Fibrix GC is no longer a full STW mark-sweep; instead we run a concurrent mark-sweep with SATB barriers (Go-like). Things are pointing in the right direction!
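For readers unfamiliar with SATB: it is a deletion-style write barrier that records the *old* value of any overwritten reference while marking is active, so every object reachable in the snapshot at mark-start still gets marked even if the mutator unlinks it concurrently. A toy Python sketch of just the barrier (not the actual Fibrix code; all names invented):

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.ref = None

marking = False
satb_buffer = []   # overwritten refs, to be drained by the concurrent marker

def write_ref(obj, new_ref):
    # SATB (deletion) barrier: shade the old target before the store, so an
    # object reachable at mark-start cannot be hidden from the marker.
    if marking and obj.ref is not None:
        satb_buffer.append(obj.ref)
    obj.ref = new_ref

a, b, c = Obj("a"), Obj("b"), Obj("c")
write_ref(a, b)        # before marking: no barrier work
marking = True
write_ref(a, c)        # during marking: the old target b gets recorded
print([o.name for o in satb_buffer])
```

The marker later treats everything in `satb_buffer` as gray, which is what keeps concurrent marking sound.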
Oh! I’m super thankful for every bit of feedback I get! No, I really mean it; we will have to see. I literally haven’t built anything big yet, and it might very well be the case that I missed something that invalidates my mission directly. I’ll find that out sooner rather than later.
Ok, so the more I tested and the more complex the applications became (beyond the thousands of tests), the clearer it became that the Immix-inspired GC was either super fast or super slow. The whole STW and synchronization kung-fu really dragged things down and made me question my architecture.
The good news is that I found several interesting research papers, and those gave me ideas for a complete overhaul. We now have a MaPLe-inspired hierarchical heap GC (per-fiber heaps, depth-based write barriers, a Cheney copying collector, TypeDis extensions, …), and we no longer have to pay the costly price of waits & pauses.
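A Cheney collector is the classic semi-space copying scheme: live objects are copied breadth-first into to-space, with a scan pointer chasing the allocation pointer and forwarding pointers preventing double copies. A toy Python sketch of the idea (not the Fiberus implementation; names invented):

```python
class Obj:
    def __init__(self, name, refs=()):
        self.name = name
        self.refs = list(refs)
        self.forward = None   # set once the object has been copied

def copy(obj, to_space):
    """Copy obj to to-space once; return its forwarding pointer."""
    if obj.forward is None:
        clone = Obj(obj.name, obj.refs)
        obj.forward = clone
        to_space.append(clone)
    return obj.forward

def cheney_collect(roots):
    to_space = []
    roots = [copy(r, to_space) for r in roots]
    scan = 0
    while scan < len(to_space):   # scan pointer chases the allocation pointer
        obj = to_space[scan]
        obj.refs = [copy(r, to_space) for r in obj.refs]
        scan += 1
    return roots, to_space

c = Obj("c")
b = Obj("b", [c])
a = Obj("a", [b, c])
garbage = Obj("garbage")          # unreachable: never copied
roots, survivors = cheney_collect([a])
print([o.name for o in survivors])
```

Everything reachable from the roots survives in breadth-first order, and unreachable objects are simply left behind in from-space, which is why copying collectors pay nothing for dead objects.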
It actually works so well that I don’t quite trust it atm, but I’m still happy to share the progress here. Take it with a grain of salt; it seems like we play in the league of Rust frameworks now?! I guess I’ll set up the TechEmpower benchmarks soon to get a clearer picture.
I wonder if something like Nim’s “Optimistic Reference Counting” might be worth looking into. In a nutshell, it uses automatic reference counting (not unlike Swift’s) with a cycle collector: Introduction to ARC/ORC in Nim - Nim Blog. It’s basically a bet on the idea that most object graphs are cycle-free. Plus, it’s my understanding that they also avoid the overhead of atomic RC for thread-local objects, which your hierarchical heap might lend itself to.
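The reason ARC-style schemes need a cycle collector at all is easy to demonstrate: two objects that reference each other keep nonzero counts even after every external reference is gone. A toy Python sketch with manual refcounts (names invented; real ARC does this in the compiler/runtime):

```python
class Cell:
    def __init__(self, name):
        self.name = name
        self.rc = 0
        self.ref = None

    def set_ref(self, other):
        # Manual RC discipline: release the old target, retain the new one.
        if self.ref is not None:
            self.ref.rc -= 1
        self.ref = other
        if other is not None:
            other.rc += 1

a, b = Cell("a"), Cell("b")
a.rc += 1; b.rc += 1        # pretend stack roots hold them
a.set_ref(b); b.set_ref(a)  # form a cycle
a.rc -= 1; b.rc -= 1        # roots go out of scope
print(a.rc, b.rc)           # both still 1: pure RC never frees the cycle
```

ORC’s bet is that this situation is rare, so the tracing cycle collector only has to inspect the few objects that could participate in cycles.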
Then again, your numbers look so good that it’s quite possibly not even worth the hassle ^^