Deterministic math at GPU scale

Lu(x)i solves the problem of floating-point drift in parallel computing, delivering bit-exact reproducible results for safety-critical infrastructure.

The Problem

Floating-Point Math Is Not Deterministic

Most engineers assume an expression like sin(0.5) evaluates to the same value on every run. Under parallel execution, it does not.

When GPUs run thousands of parallel threads, the order of operations varies. Small rounding differences compound. The same code, same hardware, same input can produce different outputs.

This breaks:

  • Audits

    You cannot reproduce historical results.

  • Certifications

    Regulators require predictable behavior.

  • Debugging

    You cannot reproduce failures.
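The drift comes from a basic property of floating point: addition is not associative, so regrouping the same operations changes the rounded result. A minimal demonstration:

```python
# Floating-point addition is not associative:
# the same three values, summed in two groupings, give different bits.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # one scheduling order
right = a + (b + c)  # another order, same inputs

print(left)           # 0.6000000000000001
print(right)          # 0.6
print(left == right)  # False
```

Scale this up to thousands of threads, each finishing in a different order, and run-to-run drift is the default, not the exception.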

The Solution

Lu(x)i Guarantees Bit-Exact Results

Lu(x)i compiles your expression into a single fused kernel with fixed operation order. No thread scheduling variance. No accumulated drift.

01  Canonical operation order: same sequence regardless of parallelism

02  IEEE 754 strict mode: explicit rounding, no fast-math

03  Rust memory safety: no undefined behavior

04  SHA-256 output hashing: cryptographic verification
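Canonical operation order can be sketched as a reduction whose combining tree is fixed by the input size alone, not by thread scheduling. This is an illustrative sketch of the idea, not Lu(x)i's actual kernel:

```python
def pairwise_sum(xs):
    """Sum in a fixed binary-tree order.

    The tree shape depends only on len(xs), so any two runs
    (and any parallelization of the two halves) combine partial
    sums in the same order and reproduce the same bits.
    """
    if len(xs) == 1:
        return xs[0]
    mid = len(xs) // 2
    return pairwise_sum(xs[:mid]) + pairwise_sum(xs[mid:])

data = [0.1] * 10
assert pairwise_sum(data) == pairwise_sum(data)  # bit-identical every run
```

Left-to-right summation would also be deterministic but serial; a fixed tree keeps the order canonical while still exposing the two halves as independent parallel work.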

Optimization Trade-offs

Why Not Just Disable Optimizations?

Other engines achieve determinism by disabling SIMD and avoiding GPU acceleration. Box2D and Rapier take this approach. For games, that trade-off works.

For quant finance running Monte Carlo at scale? For defense systems requiring real-time performance AND certification? That trade-off doesn't work.

Lu(x)i delivers determinism WITH full acceleration:

Capability           | Others                   | Lu(x)i
---------------------|--------------------------|----------------
SIMD (AVX-512, Neon) | Disabled for determinism | ✅ Enabled
GPU (CUDA, Vulkan)   | Not supported            | ✅ Full support
Throughput           | Game-scale               | 286.94B ops/sec

No trade-offs. No compromises.

Verification

Prove It Without Re-Running

Every Lu(x)i response includes a SHA-256 hash of the output. Store the hash at computation time. Months later, re-run the same input. If the hashes match, the result is verified as bit-exact.

// Response Format
{
  "expr": "sin(x)*cos(x)",
  "x": [0.5, 1.0, 1.57],
  "y": [0.4207, 0.4546, 0.0007],
  "hash": "98bd97026a738671..."
}
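Verification then reduces to a hash comparison. The sketch below assumes the hash covers a canonical serialization of the output array; Lu(x)i's actual serialization format is not specified here, so `output_hash` is a hypothetical stand-in:

```python
import hashlib
import json

def output_hash(y):
    # ASSUMPTION: canonical form = compact JSON of the y array.
    # The real Lu(x)i serialization may differ.
    canonical = json.dumps(y, separators=(",", ":")).encode()
    return hashlib.sha256(canonical).hexdigest()

stored = output_hash([0.4207, 0.4546, 0.0007])  # saved at computation time
rerun = output_hash([0.4207, 0.4546, 0.0007])   # recomputed months later
assert stored == rerun  # bit-exact outputs -> identical hashes
print(stored[:16] + "...")
```

The comparison never requires storing the full output, only the 32-byte digest.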

Efficiency

Race to Idle

Faster computation means less time at peak power. Less time at peak power means less heat generated. Less heat means lower cooling costs and longer battery life.

2.35B
Operations Per Joule
Validated by TestFort QA Lab
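The two headline figures relate through energy = power × time. If the throughput and efficiency numbers describe the same run (an assumption; the page does not say so), the implied average power draw follows directly:

```python
# Relating the published figures (ASSUMPTION: same run for both):
ops_per_sec = 286.94e9   # throughput
ops_per_joule = 2.35e9   # efficiency

# (ops/s) / (ops/J) = J/s = watts
implied_watts = ops_per_sec / ops_per_joule
print(round(implied_watts, 1))  # 122.1
```

The "race to idle" argument is the same identity read the other way: for a fixed operation count, higher ops/s means fewer seconds at that power level.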

The Art of Fugue

We Needed a Benchmark That Plays a Symphony

Most benchmarks test isolated operations. That is like playing a single note on a piano.

The Art of Fugue benchmark simulates a polyphonic mathematical workload with three concurrent "voices" of conflicting intensity: trigonometric identities, logarithmic decay, and discontinuous transcendentals.

The output is captured as a single SHA-256 hash. On most engines, this hash drifts across platforms. On Lu(x)i, the hash is identical on M1, H100, and L4.
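A toy version of such a polyphonic workload looks like the sketch below: three "voices" evaluated in lockstep, with every raw IEEE 754 bit pattern fed into one SHA-256 digest. This is illustrative only, not the actual Art of Fugue benchmark:

```python
import hashlib
import math
import struct

def fugue(n=1000):
    """Hash three concurrent 'voices' of a mixed transcendental workload."""
    h = hashlib.sha256()
    for i in range(1, n + 1):
        x = i / n
        v1 = math.sin(x) ** 2 + math.cos(x) ** 2  # trigonometric identity
        v2 = math.exp(-x) * math.log(1 + x)       # logarithmic decay
        v3 = math.tan(x) % 1.0                    # discontinuous transcendental
        # Pack raw IEEE 754 bits so the digest sees exact values,
        # not rounded decimal representations.
        h.update(struct.pack("<ddd", v1, v2, v3))
    return h.hexdigest()

print(fugue()[:16] + "...")
```

Any single rounding difference in any voice at any step flips the digest, which is what makes a one-hash summary a strict cross-platform determinism check.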

View on GitHub