Lu(x)i solves the problem of floating-point drift in parallel computing, delivering bit-exact reproducible results for safety-critical infrastructure.
Most engineers assume sin(0.5) always returns the same value. It does not: math libraries, compilers, and hardware can disagree in the last bits of the result.
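When GPUs run thousands of parallel threads, the order of operations varies from run to run. Floating-point arithmetic is not associative, so each ordering rounds differently, and those small differences compound. The same code, same hardware, and same input can produce different outputs.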
This causes real problems:
You cannot reproduce historical results.
You cannot demonstrate the predictable behavior regulators require.
You cannot reproduce failures.
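The order dependence described above can be seen in a few lines of plain Python. This is a minimal illustration, not Lu(x)i code: the helper `sum_in_order` and the two sample orderings are made up for the example, and the two lists stand in for the same values arriving from threads in different orders.

```python
def sum_in_order(values):
    """Plain left-to-right accumulation: one rounding step per addition."""
    total = 0.0
    for v in values:
        total += v
    return total


# Same three numbers, different grouping, different rounding.
grouped_left = (0.1 + 0.2) + 0.3
grouped_right = 0.1 + (0.2 + 0.3)
print(grouped_left == grouped_right)  # False

# Same four numbers, different arrival order (hypothetical thread schedules).
thread_order_a = [1e16, 1.0, -1e16, 1.0]
thread_order_b = [1e16, -1e16, 1.0, 1.0]
result_a = sum_in_order(thread_order_a)  # the first 1.0 is absorbed by 1e16
result_b = sum_in_order(thread_order_b)  # the large terms cancel first
print(result_a, result_b)  # 1.0 2.0
```

The loop is written out by hand on purpose: recent Python versions use compensated summation inside the built-in `sum`, which would hide the effect.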
Lu(x)i compiles your expression into a single fused kernel with a fixed operation order. No thread-scheduling variance. No accumulated drift.
Same sequence regardless of parallelism
Explicit rounding, no fast-math
No undefined behavior
Cryptographic verification
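To make "same sequence regardless of parallelism" concrete, here is one generic way to fix the operation order of a reduction. It is a sketch of the general technique, not Lu(x)i's kernel: the function `fixed_tree_sum` and its `rng` parameter are hypothetical, and the shuffle only simulates workers finishing in an arbitrary order. The reduction tree depends solely on the input length, so every addition always sees the same two operands.

```python
import random


def fixed_tree_sum(values, rng=None):
    """Pairwise reduction whose tree shape depends only on len(values).

    If rng is given, the pairs on each level are evaluated in a shuffled
    order to mimic threads completing at different times. Each result is
    written to a fixed slot, so the final value cannot change.
    """
    level = list(values)
    if not level:
        return 0.0
    while len(level) > 1:
        n_pairs = len(level) // 2
        nxt = [0.0] * n_pairs
        order = list(range(n_pairs))
        if rng is not None:
            rng.shuffle(order)  # simulated scheduling variance
        for p in order:
            nxt[p] = level[2 * p] + level[2 * p + 1]
        if len(level) % 2:
            nxt.append(level[-1])  # odd element carries up unchanged
        level = nxt
    return level[0]


data_rng = random.Random(0)
data = [data_rng.uniform(-1e6, 1e6) for _ in range(1000)]

baseline = fixed_tree_sum(data)
run_a = fixed_tree_sum(data, random.Random(1))
run_b = fixed_tree_sum(data, random.Random(2))
print(baseline.hex() == run_a.hex() == run_b.hex())  # True
```

Comparing `float.hex()` strings checks every bit of the result, not just the printed decimal digits.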
Other engines achieve determinism by disabling SIMD and avoiding GPU acceleration. Box2D and Rapier take this approach. For games, that tradeoff works.
For quant finance running Monte Carlo at scale? For defense systems requiring real-time performance AND certification? That tradeoff doesn't work.
Lu(x)i delivers determinism WITH full acceleration:
| Capability | Others | Lu(x)i |
|---|---|---|
| SIMD (AVX-512, Neon) | Disabled for determinism | ✅ Enabled |
| GPU (CUDA, Vulkan) | Not supported | ✅ Full support |
| Throughput | Game-scale | 286.94B ops/sec |
No tradeoffs. No compromises.
Every Lu(x)i response includes a SHA-256 hash of the output. Store the hash at computation time. Months later, re-run the same input. If the hashes match, the output is bit-for-bit identical to the original.
{
  "expr": "sin(x)*cos(x)",
  "x": [0.5, 1.0, 1.57],
  "y": [0.4207, 0.4546, 0.0007],
  "hash": "98bd97026a738671..."
}
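The store-then-compare workflow can be sketched as follows. This is an illustration under stated assumptions, not the Lu(x)i API: the helpers `hash_outputs` and `verify` are hypothetical, the byte layout (little-endian IEEE 754 doubles, concatenated) is an assumed serialization, and the values come from Python's `math` module, so the digest will not match the hash shown in the response above.

```python
import hashlib
import math
import struct


def hash_outputs(values):
    """SHA-256 over the raw little-endian binary64 bytes of each value.

    Hashing the bytes rather than a printed decimal means a change in
    even the last bit of any value produces a different digest.
    """
    payload = b"".join(struct.pack("<d", v) for v in values)
    return hashlib.sha256(payload).hexdigest()


def verify(values, stored_hash):
    """True only if the outputs are bit-for-bit what was hashed earlier."""
    return hash_outputs(values) == stored_hash


xs = [0.5, 1.0, 1.57]
ys = [math.sin(x) * math.cos(x) for x in xs]

stored = hash_outputs(ys)   # saved at computation time
print(verify(ys, stored))   # re-run with identical output: True

# Nudge one value by a single unit in the last place (Python 3.9+).
drifted = [math.nextafter(ys[0], math.inf)] + ys[1:]
print(verify(drifted, stored))  # False
```

A drift of one unit in the last place is invisible at four decimal places, which is why the check works on bytes instead of rounded text.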
Faster computation means less time at peak power. Less time at peak power means less heat generated. Less heat means lower cooling costs and longer battery life.
Most benchmarks test isolated operations. That is like playing a single note on a piano.
The Art of Fugue benchmark simulates a polyphonic mathematical workload with three concurrent "voices" of conflicting intensity: trigonometric identities, logarithmic decay, and discontinuous transcendentals.
The output is captured as a single SHA-256 hash. On most engines, this hash differs from platform to platform. On Lu(x)i, the hash is identical on Apple M1, NVIDIA H100, and NVIDIA L4.