
Non-provisional patent filed January 31, 2026  ·  Issuance expected early July 2026

Deterministic.
Receipt-attested.
Built for quant risk.

A Rust quant engine that produces bit-exact, reproducible risk outputs on the f64 CPU gold path — with a cryptographic SHA256 receipt on every run. The same determinism thesis is being extended onto GPUs.

LuxiEdge is a deterministic, receipt-attested quant risk engine. Its f64 CPU path produces bit-exact, reproducible results across CPUs (compiled with -fp-contract=off), and every evaluation carries a cryptographic SHA256 receipt. The audited risk pipeline is built on f64 linear algebra — Kahan-compensated covariance and matmul for portfolio variance — and deterministic normcdf/exp primitives for Black–Scholes-style Greeks. Written in memory-safe Rust with a stateless REST interface, it is designed for desks where reproducibility and auditability matter more than peak throughput.
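As an illustration of the Kahan-compensated linear algebra described above, the following is a minimal Python sketch of a portfolio variance computed as w' Σ w. It is not the engine's Rust code: the function names, the fixed left-to-right accumulation order, and the sample weights and covariance matrix are all assumptions made for this example.

```python
from typing import List, Sequence


def kahan_dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product with Kahan compensation, accumulated strictly left to right."""
    total = 0.0
    comp = 0.0  # running estimate of the low-order bits lost so far
    for x, y in zip(a, b):
        term = x * y - comp             # re-inject the previously lost bits
        new_total = total + term        # low-order bits of term may be lost here
        comp = (new_total - total) - term  # recover what was just lost
        total = new_total
    return total


def kahan_matvec(m: Sequence[Sequence[float]], v: Sequence[float]) -> List[float]:
    """Matrix-vector product where every row uses the compensated dot product."""
    return [kahan_dot(row, v) for row in m]


def portfolio_variance(weights: Sequence[float],
                       cov: Sequence[Sequence[float]]) -> float:
    """Portfolio variance w' * cov * w using compensated accumulation throughout."""
    return kahan_dot(weights, kahan_matvec(cov, weights))


# Hypothetical three-asset portfolio, for illustration only.
weights = [0.5, 0.3, 0.2]
cov = [
    [0.040, 0.006, 0.000],
    [0.006, 0.090, 0.012],
    [0.000, 0.012, 0.160],
]
variance = portfolio_variance(weights, cov)

# Many tiny terms that an uncompensated running sum would drop entirely.
tiny_terms = [1.0] + [1e-16] * 10
compensated = kahan_dot(tiny_terms, [1.0] * len(tiny_terms))
```

A fixed accumulation order matters as much as the compensation itself: floating-point addition is not associative, so reordering the terms can change the low-order bits of the result.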

The same determinism thesis is advancing onto accelerators. The quant engine already accelerates covariance and matmul on a wgpu GPU path, with Greeks held on deterministic CPU primitives. In parallel, the luxi-jit expression compiler proves the thesis on NVIDIA GPUs: rms_norm is bit-exact across Ampere (A100) and Hopper (H200), at 0 ULP for both tested sizes, and the other operations stay within a disclosed envelope of at most 2 ULP. luxi-jit is the consolidation path for GPU-scale work: proven in isolation, integrating next.

SHIPPING: f64 CPU Risk Pipeline

Audited risk receipts, today. Kahan-compensated matmul, Welford sample covariance (--online-cov), and deterministic normcdf/exp for Black–Scholes-style Greeks. Bit-exact across CPUs when compiled with -fp-contract=off. Every run produces a SHA256 receipt over the canonical output vector.
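The Welford-style sample covariance mentioned above can be sketched as a single-pass accumulator. This is an illustrative Python version under stated assumptions, not the code behind --online-cov: the class name, method names, and sample data are hypothetical.

```python
class OnlineCovariance:
    """Single-pass (Welford-style) sample covariance of paired observations."""

    def __init__(self) -> None:
        self.n = 0
        self.mean_x = 0.0
        self.mean_y = 0.0
        self.co_moment = 0.0  # running sum of (x - mean_x) * (y - mean_y)

    def update(self, x: float, y: float) -> None:
        self.n += 1
        dx = x - self.mean_x                      # deviation from the OLD mean of x
        self.mean_x += dx / self.n
        self.mean_y += (y - self.mean_y) / self.n
        self.co_moment += dx * (y - self.mean_y)  # deviation from the NEW mean of y

    def sample_covariance(self) -> float:
        if self.n < 2:
            raise ValueError("sample covariance needs at least two observations")
        return self.co_moment / (self.n - 1)      # unbiased (n - 1) estimator


# Hypothetical paired return series, for illustration only.
acc = OnlineCovariance()
for x, y in [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 9.0)]:
    acc.update(x, y)
cov_xy = acc.sample_covariance()
```

Because each observation is folded in with a fixed sequence of operations, feeding the same data in the same order yields the same accumulator state, which is the property a reproducible receipt depends on.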

The --gpu flag accelerates covariance and matmul via wgpu. Greeks remain on deterministic CPU primitives. The wgpu GPU path is reproducible on the same hardware but is not bit-exact with the f64 CPU oracle; this is disclosed when the flag is enabled.

PROVEN → INTEGRATING: luxi-jit GPU Acceleration

The same determinism thesis extended onto NVIDIA GPUs via Triton. 12 preset expressions verified: gelu, silu, softplus, tanh, cross_entropy, rms_norm, and others. Receipt model: CPU f32 oracle is authoritative; GPU drift bounded and disclosed in response fields.

Cross-architecture proof point: rms_norm achieves 0 ULP bit-exact on both A100 SXM4 80GB (Ampere SM 8.0) and H200 (Hopper SM 9.0), at small and 64K sizes. Artifacts available in results/determinism_results.json (H200) and results/determinism_results_a100.json (A100) in the luxi-jit repository. Consolidation with the risk pipeline is the documented next step.
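For readers unfamiliar with ULP counts, the sketch below shows one common way to measure the distance in units in the last place between two f32 values by comparing their bit patterns. It is an assumption about how such a check can be written, not the comparison code that produced the luxi-jit artifacts; the function names are hypothetical.

```python
import math
import struct
from typing import Sequence


def ordered_key(x: float) -> int:
    """Round x to f32 and map its bit pattern to an integer that is monotone in x."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # IEEE-754 floats are sign-magnitude, so negative values are mirrored.
    return -(bits & 0x7FFFFFFF) if bits & 0x80000000 else bits


def ulp_distance(a: float, b: float) -> int:
    """Number of representable f32 steps between a and b."""
    if math.isnan(a) or math.isnan(b):
        raise ValueError("ULP distance is undefined for NaN")
    return abs(ordered_key(a) - ordered_key(b))


def max_ulp(reference: Sequence[float], candidate: Sequence[float]) -> int:
    """Largest element-wise ULP distance between two equally sized vectors."""
    if len(reference) != len(candidate):
        raise ValueError("vectors must have the same length")
    return max((ulp_distance(r, c) for r, c in zip(reference, candidate)), default=0)


next_after_one = 1.0 + 2.0 ** -23  # the f32 value immediately above 1.0
drift = max_ulp([1.0, -1.0, 0.5], [next_after_one, -1.0, 0.5])
```

Note that this metric treats +0.0 and -0.0 as 0 ULP apart even though their bit patterns differ, so a byte-level hash comparison is stricter than a 0 ULP result.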

Quant Engine — Shipped Capabilities
Receipt-Attested Outputs

Every risk pipeline evaluation produces a SHA256 receipt over the canonical f64 output vector. Receipts are identical across CPUs when the engine is compiled with -fp-contract=off, enabling reproducibility verification and regulatory audit trails.
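A receipt of this kind can be sketched in a few lines of Python. The canonical encoding used here (each f64 serialized as 8 little-endian IEEE-754 bytes, concatenated in output order) is an assumption for illustration; the engine's actual canonical form is not specified on this page, and the function name is hypothetical.

```python
import hashlib
import struct
from typing import Sequence


def receipt_sha256(outputs: Sequence[float]) -> str:
    """SHA256 hex digest over an assumed canonical byte encoding of an f64 vector."""
    # Assumed canonical form: little-endian binary64, no separators, fixed order.
    payload = struct.pack(f"<{len(outputs)}d", *outputs)
    return hashlib.sha256(payload).hexdigest()


run_a = receipt_sha256([0.5, 1.0, -0.25])
run_b = receipt_sha256([0.5, 1.0, -0.25])

# A difference in the last bit is enough to change the receipt.
exact = receipt_sha256([0.3])
off_by_one_bit = receipt_sha256([0.1 + 0.2])
```

Hashing the raw bytes rather than a decimal rendering means two runs match only if every output agrees bit for bit, including the sign of zero.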

Deterministic Linear Algebra

Kahan-compensated matmul and Welford online covariance provide stable, reproducible portfolio variance estimates. Deterministic normcdf and exp primitives underpin Black–Scholes-style Greeks. Receipt hot path: linear algebra + normcdf + exp + scalar sqrt.
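To show where normcdf, exp, and sqrt enter a Black–Scholes-style calculation, here is a minimal Python sketch of European call Greeks. It relies on the platform math library (math.erfc, math.exp, math.log), which is not guaranteed to be bit-identical across platforms, so it stands in for the engine's deterministic primitives rather than reproducing them. The function names and the sample inputs are assumptions.

```python
import math
from typing import Dict


def normcdf(x: float) -> float:
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def normpdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def call_greeks(spot: float, strike: float, rate: float,
                vol: float, t: float) -> Dict[str, float]:
    """Price, delta, gamma and vega of a European call without dividends."""
    sqrt_t = math.sqrt(t)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol * vol) * t) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t
    return {
        "price": spot * normcdf(d1) - strike * math.exp(-rate * t) * normcdf(d2),
        "delta": normcdf(d1),
        "gamma": normpdf(d1) / (spot * vol * sqrt_t),
        "vega": spot * normpdf(d1) * sqrt_t,  # per unit (1.00) change in volatility
    }


# Hypothetical at-the-money option: S=100, K=100, r=5%, sigma=20%, T=1 year.
greeks = call_greeks(spot=100.0, strike=100.0, rate=0.05, vol=0.2, t=1.0)
```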

REST Interface

Stateless REST API. The /evaluate endpoint can route to the luxi-jit companion on :10000 when backend=auto|triton|flashinfer. When deterministic=true, the CPU oracle remains the receipt authority — the hash covers oracle output, not raw GPU output.
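The receipt-authority rule can be sketched as follows: the hash is taken over the CPU oracle output, and the GPU result only contributes a disclosure flag. This is a hypothetical Python illustration, not the engine's response schema; the field names, the little-endian f32 encoding, and the function names are all assumptions.

```python
import hashlib
import struct
from typing import Dict, Sequence, Union


def pack_f32(values: Sequence[float]) -> bytes:
    """Assumed canonical encoding: little-endian IEEE-754 binary32, fixed order."""
    return struct.pack(f"<{len(values)}f", *values)


def build_response(oracle_out: Sequence[float],
                   gpu_out: Sequence[float]) -> Dict[str, Union[str, bool]]:
    """Assemble a response whose receipt depends only on the CPU oracle output."""
    oracle_bytes = pack_f32(oracle_out)
    return {
        # Hypothetical field names, chosen for this sketch.
        "receipt_sha256": hashlib.sha256(oracle_bytes).hexdigest(),
        "receipt_source": "cpu_oracle",
        "gpu_bit_exact": oracle_bytes == pack_f32(gpu_out),
    }


oracle = [0.5, 1.0]
gpu_exact = [0.5, 1.0]
gpu_drifted = [0.5, 1.0 + 2.0 ** -23]  # last element is one f32 step above 1.0

resp_exact = build_response(oracle, gpu_exact)
resp_drifted = build_response(oracle, gpu_drifted)
```

In this sketch the two responses carry the same receipt, because the oracle output is unchanged; only the disclosure flag differs.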

Determinism Scorecard — luxi-jit Preset Expressions
Oracle SHA256 and GPU bit-exact status as of 2026-06-15. Source: luxi-jit repository artifacts.

| Metric | Result | Notes |
|---|---|---|
| Oracle SHA256 (CPU f32) | 12 / 12 | All 12 preset expressions produce stable CPU oracle receipts across runs. |
| GPU bit-exact (vs. oracle) | 5 / 12 | 5 expressions achieve a 0 ULP match to the CPU oracle on GPU. The remainder are bounded at ≤2 ULP; drift is disclosed in response fields. |
| rms_norm (cross-architecture) | 0 ULP | Bit-exact on A100 SXM4 80GB (Ampere SM 8.0) and H200 (Hopper SM 9.0), at both small and 64K input sizes. Headline cross-architecture proof point. |
| Kernel-only speed (GELU, 4M, H200) | 9,455× | Kernel-only benchmark vs. CPU. Excludes the receipt-overhead wrapper, so it is not a full-pipeline figure. |

What's Shipping, What's Next, What's in Development
✓ Shipping Today
  • f64 deterministic risk pipeline with SHA256 receipt
  • Kahan matmul, Welford covariance, normcdf/exp Greeks
  • wgpu GPU covariance/matmul (same-hardware reproducible; not equivalent to CPU oracle)
  • LuxiEdge REST API with luxi-jit routing for expression evaluation
  • CPU oracle receipt authority when deterministic=true
  • Memory-safe Rust implementation
→ Proven, Integrating Next
  • luxi-jit: 12 preset expressions verified via CPU oracle (12/12 SHA256)
  • rms_norm 0 ULP bit-exact: A100 and H200 (cross-architecture proof point)
  • 5/12 GPU presets bit-exact to oracle; remainder bounded ≤2 ULP
  • luxi-jit → risk-pipeline consolidation (documented in luxi-jit README)
  • luxi-quant online statistics for enhanced receipt backtesting
○ In Development
  • Geodesic Attention Engine (GAE) — active development, not production
  • Adiabatic Transform Engine (ATE) — early testing
  • Additional dtype support: BF16, FP8
  • Full-layer geodesic kernel optimizations
  • Expanded TestFort energy and determinism validation
REST API — Risk Evaluation with Receipt
# Submit a risk evaluation on the CPU oracle path.
# The SHA256 receipt attached to the response covers the CPU oracle output.
POST /evaluate
{
  "expression": "normcdf(x)",
  "inputs":     [0.5, 1.0, -0.25],
  "deterministic": true
}

# Route to luxi-jit GPU companion (receipt hashes oracle, not raw GPU output)
POST /evaluate
{
  "expression": "gelu(x)",
  "inputs":     [...],
  "backend":    "triton",
  "deterministic": true
}
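
A client for the requests above might look like the following Python sketch, which builds the POST without sending it. The base URL is a placeholder, since the engine's host and port are not stated here, and the helper name is hypothetical; only the /evaluate path and the body fields come from the examples above.

```python
import json
import urllib.request
from typing import List, Optional

BASE_URL = "http://localhost:8080"  # placeholder; the real host and port are not documented here


def build_evaluate_request(expression: str,
                           inputs: List[float],
                           deterministic: bool = True,
                           backend: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST /evaluate request with a JSON body."""
    body = {
        "expression": expression,
        "inputs": inputs,
        "deterministic": deterministic,
    }
    if backend is not None:
        body["backend"] = backend  # e.g. "auto", "triton" or "flashinfer"
    return urllib.request.Request(
        f"{BASE_URL}/evaluate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


cpu_request = build_evaluate_request("normcdf(x)", [0.5, 1.0, -0.25])
gpu_request = build_evaluate_request("gelu(x)", [0.5, 1.0], backend="triton")

# Sending requires a running engine, for example:
#   with urllib.request.urlopen(cpu_request) as resp:
#       print(json.load(resp))
```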
Technical Characteristics
  • 01 Bit-exact on the f64 CPU gold path: reproducible across CPUs when the engine is compiled with -fp-contract=off; every run is receipt-attested via SHA256
  • 02 Oracle-attested receipts: the CPU f32 oracle is authoritative for luxi-jit GPU expressions; GPU drift is bounded and disclosed in response fields
  • 03 Ultra-low footprint: the Rust kernel binary is smaller than a typical phone photo; minimal attack surface. This applies to the Rust kernel binary only, not to the full luxi-jit Python/Triton deployment
  • 04 Memory-safe Rust core: risk pipeline and kernel primitives are written in Rust, with a stateless REST interface for integration

Designed for quant desks, model risk teams, and safety-critical deployments where reproducibility and auditability take precedence over peak throughput — edge, scientific computing, defense, and high-trust inference.

All technical claims are traceable to code and artifacts in the luxi-quant-engine and luxi-jit repositories. Both repositories are currently private; full source available under NDA. Ongoing work follows the roadmap documented in LUXIEDGE_BUILD_ROADMAP.md.

GPU acceleration is proven in isolation and integrating next. No claim is made that the risk pipeline currently runs on luxi-jit Triton GPUs.

Try the Demo Binary

Pre-built binaries for Mac ARM64 and Linux x86_64 are available on GitHub. Download, run, and see the deterministic receipt output for yourself — no build toolchain required.

LuxiDemo on GitHub
Validation: Determinism and kernel performance have been validated in prior third-party testing (December 2025 TestFort QA Lab report on foundational numeric kernels). Current full-layer TRADE/AUDIT benchmarks and quant receipt integration are available in the private repository for NDA review. Expanded determinism and energy validation is planned as the engine matures.

Ready for confidential technical discussion

Contact: e@ewaller.com

Eric Waller  ·  Proprietary technology  ·  Full source available under NDA