Non-provisional patent filed January 31, 2026 · Issuance expected early July 2026
A Rust-based transformer engine with CUDA, Metal, and WebGPU support. Optimizes joules per useful token through minimized HBM data movement, online-softmax causal attention, and full-layer geodesic fusion.
Geodesic fused transformer layer on GPU: LN → packed QKV GEMM → attention + Wo → residual + LN₂ → MLP. Focuses on measured ms/layer and joules/token. Uses fp16 by default. Validated from laptop CPU to data-center GPU (RunPod/H100 tested).
CPU f32 reference execution providing cryptographic SHA-256 receipts over output bit patterns. Enables reproducibility verification and compliance auditing independent of GPU hardware. Suitable for quant finance and high-trust workloads requiring a verifiable audit trail.
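A receipt over output bit patterns can be sketched as below. This is a minimal illustration under stated assumptions, not the engine's implementation: the helper names `f32_bit_bytes` and `receipt` are hypothetical, and FNV-1a (64-bit, non-cryptographic) stands in for SHA-256 so the sketch needs no external crate. The point it shows is that hashing `to_bits()` patterns distinguishes outputs that compare equal as floats, such as 0.0 and -0.0.

```rust
/// Serialize f32 values by their exact IEEE-754 bit patterns (little-endian),
/// so 0.0 and -0.0, or two different NaN payloads, produce different bytes.
fn f32_bit_bytes(values: &[f32]) -> Vec<u8> {
    values.iter().flat_map(|v| v.to_bits().to_le_bytes()).collect()
}

/// FNV-1a 64-bit: a stand-in for SHA-256 so this sketch stays std-only.
/// It is NOT cryptographic; a real receipt would hash the same bytes with SHA-256.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// Hypothetical helper: hex fingerprint of an output tensor's bit patterns.
fn receipt(values: &[f32]) -> String {
    format!("{:016x}", fnv1a64(&f32_bit_bytes(values)))
}

fn main() {
    let run_a = [1.0_f32, 0.5, 0.0];
    let run_b = [1.0_f32, 0.5, 0.0];
    let run_c = [1.0_f32, 0.5, -0.0]; // equal to run_a under float comparison
    println!("a matches b: {}", receipt(&run_a) == receipt(&run_b));
    println!("a matches c: {}", receipt(&run_a) == receipt(&run_c));
}
```

Two runs match only if every output value agrees bit for bit, which is a stricter check than comparing within a tolerance.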
Online-softmax causal attention with O(N) HBM usage for attention scores — no materialized N×N matrix. Delivers measurable memory and energy reductions at long contexts while maintaining mathematical equivalence to standard softmax attention.
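The online-softmax recurrence can be sketched for a single query as follows. This is a simplified CPU illustration, not the engine's GPU kernel: values are scalars rather than head-dimension vectors, the function names are hypothetical, and the same score row is reused for every query to keep the example short. Causal masking is modeled by letting query i see only keys 0..=i. The two paths agree up to floating-point rounding, not necessarily bit for bit.

```rust
/// Reference path: softmax-weighted sum with every weight materialized first.
fn naive_attention_row(scores: &[f32], values: &[f32]) -> f32 {
    let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let weights: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
    let denom: f32 = weights.iter().sum();
    weights.iter().zip(values).map(|(w, v)| w * v).sum::<f32>() / denom
}

/// Online softmax: a single pass that keeps only a running max, a running
/// denominator, and a running weighted sum. No per-key weight is stored.
/// Assumes `scores` is non-empty and contains finite values.
fn online_attention_row(scores: &[f32], values: &[f32]) -> f32 {
    let mut running_max = f32::NEG_INFINITY;
    let mut denom = 0.0_f32;
    let mut acc = 0.0_f32;
    for (&s, &v) in scores.iter().zip(values) {
        let new_max = running_max.max(s);
        // Rescale everything accumulated so far to the new max.
        let scale = (running_max - new_max).exp();
        let w = (s - new_max).exp();
        denom = denom * scale + w;
        acc = acc * scale + w * v;
        running_max = new_max;
    }
    acc / denom
}

fn main() {
    let scores = [0.5_f32, 2.0, -1.0, 3.5];
    let values = [1.0_f32, -2.0, 0.25, 4.0];
    // Causal masking: query i only attends to keys 0..=i.
    for i in 0..scores.len() {
        let naive = naive_attention_row(&scores[..=i], &values[..=i]);
        let online = online_attention_row(&scores[..=i], &values[..=i]);
        println!("query {}: naive={:.6} online={:.6}", i, naive, online);
    }
}
```

The rescaling step is what removes the need for a stored score row: when a larger score arrives, earlier contributions are multiplied by `exp(old_max - new_max)` instead of being recomputed.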
Waller Null-Space Multiplexing carries auxiliary payloads (e.g., scaling proofs) in MLP null-space with provable zero impact (0.00e+0 difference) on primary outputs. Enables in-band verification data without altering model behavior.
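The general linear-algebra idea behind carrying data in a null space can be illustrated with a toy example. This is a generic sketch, not the WNSM method itself: the 2x3 matrix `W`, the direction `NULL_DIR`, and the helpers `embed` and `extract` are all hypothetical. It shows that adding a multiple of a null-space vector to an activation leaves the product with `W` unchanged. Exact equality holds here because the values are small integers; with arbitrary floats, rounding has to be handled separately.

```rust
/// Hypothetical 2x3 weight matrix whose null space is spanned by NULL_DIR.
const W: [[f32; 3]; 2] = [[1.0, 2.0, 3.0], [0.0, 1.0, 1.0]];
/// W * NULL_DIR = (1 + 2 - 3, 0 + 1 - 1) = (0, 0).
const NULL_DIR: [f32; 3] = [1.0, 1.0, -1.0];

fn matvec(w: &[[f32; 3]; 2], x: &[f32; 3]) -> [f32; 2] {
    let mut y = [0.0; 2];
    for (row, out) in w.iter().zip(y.iter_mut()) {
        *out = row.iter().zip(x).map(|(a, b)| a * b).sum();
    }
    y
}

fn dot(a: &[f32; 3], b: &[f32; 3]) -> f32 {
    a.iter().zip(b).map(|(p, q)| p * q).sum()
}

/// Add `payload * NULL_DIR` to the activation.
fn embed(x: &[f32; 3], payload: f32) -> [f32; 3] {
    let mut out = *x;
    for (o, n) in out.iter_mut().zip(NULL_DIR) {
        *o += payload * n;
    }
    out
}

/// Recover the payload by projecting onto NULL_DIR.
/// Assumes the primary signal is orthogonal to NULL_DIR.
fn extract(x: &[f32; 3]) -> f32 {
    dot(x, &NULL_DIR) / dot(&NULL_DIR, &NULL_DIR)
}

fn main() {
    let x = [1.0_f32, 2.0, 3.0]; // a row of W, so orthogonal to NULL_DIR
    let carried = embed(&x, 3.0);
    println!("primary output unchanged: {}", matvec(&W, &x) == matvec(&W, &carried));
    println!("recovered payload: {}", extract(&carried));
}
```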
Bit-exact reproducibility across runs and hardware, verified via SHA-256 over f32 bit patterns. 50+ passing tests. CPU full proof kit. GPU gates re-validated after the recent QKV stride fixes.
Determinism and kernel performance have been validated in prior third-party testing (December 2025 TestFort report on foundational numeric kernels). Current full-layer TRADE/AUDIT benchmarks and quant receipt integration are available in the private repository for NDA review.
// Configure and run a decoder layer with WNSM + audit receipt
let config = Config {
    d_model: 512,
    n_heads: 8,
    d_ff: 2048,
    lane: Lane::Trade, // or Lane::Audit for CPU f32 reference
};
let decoder = WNSM_GAE_Decoder::new(&config)?;

// Forward pass: returns output tensor + optional WNSM payload
let (output, payload) = decoder.forward(&input, mask)?;

// Generate SHA-256 receipt over output bit patterns
let receipt = decoder.receipt(&output)?;
println!("SHA-256: {}", receipt.hex());
Honest HBM data-movement accounting, not theoretical FLOP counts. Strong reductions in data movement versus naive score-matrix attention at longer sequences. Full-layer benchmarks versus PyTorch baselines are available in the repository. Measured in ms/layer and joules/token.
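The scaling gap between a materialized score matrix and online-softmax bookkeeping can be worked out with back-of-envelope arithmetic. The sketch below is illustrative only and does not reproduce the repository's measured numbers. It assumes fp16 scores (2 bytes each) for the naive path and, for the online path, one f32 running max plus one f32 running denominator per query row; a fused kernel may keep those in on-chip memory and write even less to HBM. Function names are hypothetical.

```rust
/// Bytes needed to materialize a full N x N score matrix for one head.
fn naive_score_bytes(seq_len: u64, bytes_per_score: u64) -> u64 {
    seq_len * seq_len * bytes_per_score
}

/// Bytes for online-softmax bookkeeping: one running max and one running
/// denominator per query row (assumed f32, 4 bytes each).
fn online_stat_bytes(seq_len: u64) -> u64 {
    seq_len * 2 * 4
}

fn main() {
    for n in [1_024_u64, 8_192, 32_768] {
        let naive = naive_score_bytes(n, 2); // fp16 scores
        let online = online_stat_bytes(n);
        println!(
            "N={:>6}: naive {:>11} B, online {:>7} B, ratio {}x",
            n, naive, online, naive / online
        );
    }
}
```

Under these assumptions the ratio grows as N/4, so the gap widens linearly with context length: at N = 32,768 the naive matrix is 2 GiB per head against 256 KiB of per-row statistics.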
50+ passing tests covering correctness, determinism, and WNSM null-space impact. CPU full proof kit validates the entire layer against independent reference implementations. GPU gate suite re-validates after each architectural change.
Designed for workloads where long-context efficiency, reproducibility, and measured energy performance are essential: edge deployment, scientific computing, defense, and high-trust inference.
The engine evolves while preserving its core verification and energy-modeling discipline. Ongoing work targets further single-kernel geodesic optimizations, additional dtype support (BF16/FP8), and deeper layer fusion as outlined in GEODESIC_SWEEP_DESIGN.md and LUXIEDGE_BUILD_ROADMAP.md.
All technical claims are based on code and benchmarks in the attention-transformer-v2 repository. The repository is currently private; full source is available under NDA.
The TRADE GPU path and AUDIT CPU path are production-ready. Core deterministic kernels are fully functional. The Waller Operator and WNSM components have demonstrated correct results in testing with verified zero null-space impact. GPU gate suite re-validated following recent QKV stride fixes. Recent integration of luxi-quant online statistics supports enhanced receipt backtesting for quantitative finance workloads. Full production integration and additional dtype support remain in active development.
Eric Waller · Proprietary technology · Full source available under NDA