LuxiEdge is a Rust-based JSON math engine for dense numeric vector expressions (y = f(x)).
On real hardware, we have measured:
Unlike typical stacks, where "faster" usually means "more power," LuxiEdge's specialization lets you run faster while drawing less power on the right workloads.
Same JSON input → same output, even under heavy HTTP load. No hidden state. No Python.
/evaluate and /health endpoints over HTTP
Stateless microservice built with memory-safe Rust • Patent Pending
Most systems force a trade-off:
LuxiEdge was built to break that trade-off.
On the same hardware, you can either:
This is especially important at the edge, in vehicles, and in space, where watts and dollars both matter.
Third-party load testing confirms LuxiEdge stability and determinism under production conditions
Independent third-party validation of deterministic execution under concurrent load
Validation Partner
PFLB (QArea) — enterprise load testing platform used by Fortune 500 companies
Dojo Proxy Validation: Linear scaling from L4 to H100 confirmed
Investment Grade: Zero throttling. Stable thermals throughout 21-minute endurance test.
| Metric | L4 (Edge) | H100 (Hyperscale) | Factor |
|---|---|---|---|
| Throughput | 30.7B/s | 243.9B/s | 8.2× |
| Power | 72W | 366W | 5.1× |
| Efficiency | 426M/J | 670M/J | Stable ✅ |
| Peak Temp | — | 43°C | Cool ❄️ |
8.2× throughput increase from L4 to H100 with stable efficiency.
52% of TDP utilized—room for burst capacity when needed.
Validated for Tesla Dojo proxy workloads at hyperscale.
10-Minute Sustained FMA Test (FP16, 50M-element vector)
x = 1.5 * x + 2.0, FP16, 50M‑element vector
Confirms that LuxiEdge delivers tens of billions of operations per second and hundreds of millions of ops per joule on a single NVIDIA L4 over a realistic 10‑minute window.
This is a straightforward benchmark that can be reproduced with a simple FMA loop and NVIDIA's NVML power readings.
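As a rough sketch of how such a reproduction could account for energy, the helpers below integrate sampled power readings into joules and divide the operation count by the result. The function names, the one-second polling interval, and the use of the third-party `pynvml` bindings are assumptions made for illustration; the FMA kernel itself is not shown and is expected to run in a separate process or stream.

```python
import time


def read_power_watts(gpu_index=0):
    """Instantaneous board power in watts, read through NVML.

    Needs the third-party pynvml package and an NVIDIA GPU. The import is
    lazy so the arithmetic helpers below work without either.
    """
    import pynvml

    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(gpu_index)
        # NVML reports power draw in milliwatts.
        return pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0
    finally:
        pynvml.nvmlShutdown()


def sample_power(duration_s, interval_s=1.0, read=read_power_watts):
    """Poll power as (elapsed_seconds, watts) pairs for about duration_s seconds."""
    samples = []
    start = time.monotonic()
    while True:
        elapsed = time.monotonic() - start
        samples.append((elapsed, read()))
        if elapsed >= duration_s:
            return samples
        time.sleep(interval_s)


def energy_joules(samples):
    """Trapezoidal integral of (elapsed_seconds, watts) samples."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total


def ops_per_joule(total_ops, samples):
    """Operations completed divided by energy consumed over the same window."""
    return total_ops / energy_joules(samples)
```

How an "operation" is counted (for example, one FMA as one op or two) is left to the caller and should match the counting used for the throughput figure.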
Test workload: `sin(x) * cos(x)`, FP16 precision, 4M‑element batches (more complex transcendental kernel)
This stress test demonstrates the upper bound on a more complex kernel. The primary public reference remains the 10‑minute FMA result above.
November 24, 2025 | NVIDIA L4 | RunPod | Investment-Grade Verified
Two Validated Configurations: LuxiEdge offers both throughput-optimized (30.7B ops/sec @ ~72W) and power-optimized (29.67B ops/sec @ 33.64W) deployment profiles. Choose based on your workload: maximum throughput or maximum energy efficiency.
The 'Gold Master' power-optimized configuration achieves 53% power savings and 2.1× energy efficiency improvement over initial targets, while maintaining 29.67B ops/sec throughput. This validates our thesis: deterministic, high-precision math can be delivered with massive energy efficiency suitable for hyperscale environments.
| Metric | Target | Gold Master Actual | Improvement |
|---|---|---|---|
| Throughput | 8.3B Ops/Sec | 29.67 Billion Ops/Sec | 3.5× 🚀 |
| Power Usage | < 72 W (TDP) | 33.64 Watts | 53% Savings ⚡ |
| Efficiency | 0.4 B Ops/J | 0.88 Billion Ops/Joule | 2.1× 🔋 |
| Thermals | < 70°C | < 37°C (Sustained) | Cool-Running ❄️ |
| Stability | 99.9% | 100% (0 Throttles/Errors) | Perfect ✅ |
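The efficiency and power rows follow directly from the throughput and power figures quoted on this page. The short sketch below redoes that arithmetic; the function names are illustrative and not part of LuxiEdge.

```python
def ops_per_joule(ops_per_second, watts):
    """Ops/s divided by W (joules per second) gives ops per joule."""
    return ops_per_second / watts


def power_savings(actual_watts, budget_watts):
    """Fraction of the power budget left unused."""
    return 1.0 - actual_watts / budget_watts


# Figures quoted on this page for the two L4 configurations.
power_optimized = ops_per_joule(29.67e9, 33.64)
throughput_optimized = ops_per_joule(30.7e9, 72.0)
savings = power_savings(33.64, 72.0)

print(f"power-optimized:      {power_optimized / 1e9:.2f} B ops/J")
print(f"throughput-optimized: {throughput_optimized / 1e6:.0f} M ops/J")
print(f"power savings vs 72 W: {savings:.0%}")
```

With the quoted inputs this gives 0.88 B ops/J, 426 M ops/J, and 53%, matching the tables.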
53% power savings translates to millions in OpEx savings for hyperscale providers.
20-minute endurance test with zero thermal throttling, zero errors.
Architecture primed to scale linearly on next-gen GPU clusters.
LuxiEdge is live on Azure Container Apps for independent testing.
PFLB Test 6143 (HTTP Load): 29,600 requests across 200 concurrent virtual users with zero errors. Production-ready architecture with deterministic latency guarantees under real-world conditions.
GPU Validation (10-Minute L4 FMA Test): Sustained 30.7B ops/sec for 600 seconds at ~72W with 426M ops/J energy efficiency. Validates production deployment reliability across realistic workloads.
Same input → same output, even under concurrent HTTP load. Stateless by design: no hidden caches or cross-request state. Implemented in memory-safe Rust; hot paths avoid unsafe.
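One way to check the same-input, same-output property from the client side is to send an identical payload many times in parallel and compare digests of the responses. The sketch below assumes a caller-supplied `evaluate` callable (hypothetical; for example a thin HTTP wrapper around /evaluate) and is exercised here with a local stand-in rather than a live service.

```python
import hashlib
import json
from concurrent.futures import ThreadPoolExecutor


def response_digest(response):
    """SHA-256 of a canonical JSON encoding, so any differing value changes the digest."""
    canonical = json.dumps(response, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_deterministic(evaluate, payload, requests_total=200, workers=20):
    """Call evaluate(payload) concurrently and report whether every response was identical."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        digests = set(
            pool.map(lambda _: response_digest(evaluate(payload)), range(requests_total))
        )
    return len(digests) == 1


def fake_evaluate(payload):
    """Local stand-in for the service: a pure function of the payload."""
    return {"y": [2.0 * v for v in payload["x"]]}


if __name__ == "__main__":
    print(is_deterministic(fake_evaluate, {"expr": "2*x", "x": [1.0, 2.0, 3.0]}))
```

Comparing digests rather than values within a tolerance is deliberate: the claim being tested is exact repeatability, not numerical closeness.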
Specialized for explicit numeric expressions of the form y = f(x) over large vectors. SIMD-optimized CPU backend (AVX-512/AVX2/ARM Neon) with runtime selection. GPU kernels for FP16/FP32 dense math on NVIDIA T4/L4-class GPUs.
Focus on operations per joule, not just peak FLOPs. Supports both GPU-accelerated servers and low-power edge ARM64 devices. Designed so that, on the right workloads, you can get more math per watt instead of trading speed for power.
On representative dense-math workloads we have measured:
≈426–428M ops/J sustained (10-min + 1-hour FMA on L4 FP16)
sin(x)*cos(x), 4M-batches, 815M ops/J (internal stress test)
Speedup vs 30M ops/sec SIMD baseline
8.3B planning target now exceeded by measured runs
NVIDIA L4 FP16 FMA @ ~72W, ~400M ARM Neon edge
Sustained across 10‑min, 1‑hour, and stress test runs
Primary Benchmark (FP16 FMA): 10‑minute and 1‑hour sustained runs on NVIDIA L4 confirm ~30.7–30.8B ops/sec and ~4.3×10⁸ ops/J. Secondary Stress Test: Transcendental workload (sin(x)*cos(x), 4M‑element batches) achieves 58.77B avg / 815M ops/J. These are reproducible, purpose‑built kernels demonstrating what LuxiEdge achieves on real hardware.
For the right math workloads, LuxiEdge can deliver both much higher throughput and better energy efficiency than typical Python/ML stacks – especially at the edge.
If you run control loops, signal processing, or analytics at the edge, you are usually constrained by:
LuxiEdge lets you:
/evaluate – Vectorized Expression Evaluation
Stateless JSON request:
POST /evaluate
{
"expr": "2*x + cos(x)",
"x": [1.0, 2.0, 3.0]
}
Response:
{
"y": [2.5403, 3.5839, 5.0100]
}
expr is a restricted mathematical expression (addition, multiplication, powers, standard trig, etc.)
x is a scalar or vector of input values
/health – Service Health Check
Other endpoints (/bisect, /bisect_auto) exist for controlled root-finding, but /evaluate is the primary production path.
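A minimal client sketch for the request and response shapes above, using only the Python standard library. The URL is the local address used in the curl example on this page; the helper names, the timeout, and the comparison tolerance are assumptions, and the tolerance in particular would need loosening for FP16 results.

```python
import json
import math
import urllib.request

DEFAULT_URL = "http://localhost:8080/evaluate"  # local address from the curl example


def build_request(expr, x, url=DEFAULT_URL):
    """Build the POST request carrying {"expr": ..., "x": ...} as JSON."""
    body = json.dumps({"expr": expr, "x": x}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def evaluate(expr, x, url=DEFAULT_URL, timeout=5.0):
    """Send the request and return the "y" array from the JSON response."""
    with urllib.request.urlopen(build_request(expr, x, url), timeout=timeout) as resp:
        return json.load(resp)["y"]


def reference(x):
    """Local double-precision reference for the expression 2*x + cos(x)."""
    return [2 * v + math.cos(v) for v in x]


def matches(y, expected, tol=1e-4):
    """True when both vectors have equal length and agree element-wise within tol."""
    return len(y) == len(expected) and all(abs(a - b) <= tol for a, b in zip(y, expected))
```

Against a running instance, `matches(evaluate("2*x + cos(x)", xs), reference(xs))` gives a quick sanity check of a deployment for any input vector `xs`.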
Clear boundaries prevent feature creep and set accurate expectations. Specialization is why these performance/energy numbers are possible.
Use SymPy instead if you need: Symbolic differentiation, equation solving, simplification
Why it matters: Symbolic preprocessing adds latency variance, which would break LuxiEdge's deterministic execution guarantee.
Example:
• SymPy: d/dx(x² + sin(x)) → Symbolic derivative
• LuxiEdge: Evaluate 2x + cos(x) at GPU speed
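The two bullets are related: 2x + cos(x) is the derivative of x² + sin(x), so the symbolic step can be done once offline and only the resulting explicit expression needs to be evaluated at runtime. The standard-library sketch below checks that relationship numerically with a central difference; the step size and sample points are arbitrary choices for illustration.

```python
import math


def f(x):
    """The expression handed to the symbolic tool."""
    return x ** 2 + math.sin(x)


def df(x):
    """The explicit derivative that would be sent for evaluation."""
    return 2 * x + math.cos(x)


def central_difference(func, x, h=1e-5):
    """Second-order finite-difference estimate of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


if __name__ == "__main__":
    for x in (0.0, 1.0, 2.0, 3.0):
        print(f"x={x}: explicit={df(x):.6f} numeric={central_difference(f, x):.6f}")
```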
Use JAX/TensorFlow instead if you need: Automatic differentiation, graph compilation, training loops, research workflows
Why it matters: General frameworks add graph compilation overhead and Python layers. LuxiEdge is purpose-built for explicit expression evaluation—no graph, no compilation, just fast deterministic math.
Example:
• JAX: jax.grad(lambda x: x**2 + sin(x)) → Auto-diff
• LuxiEdge: Evaluate 2*x + cos(x) at 58.77B ops/sec GPU
LuxiEdge is a stateless HTTP/gRPC microservice, not a Python library.
Why it matters: Embedding Python would eliminate the performance gains. We intentionally avoid Python overhead.
# NOT: pip install luxiedge
# YES: HTTP POST to /evaluate endpoint
curl -X POST http://localhost:8080/evaluate \
-H 'Content-Type: application/json' \
-d '{"expr": "sin(x)*cos(x)", "x": [1.0, 2.0]}'
LuxiEdge evaluates explicit expressions: y = f(x)
Why it matters: Implicit solving requires iterative methods whose iteration counts vary with the input, so latency varies too. That breaks the deterministic execution guarantee.
Example:
• ❌ LuxiEdge can't solve: x² + y² = 1 (implicit)
• ✅ LuxiEdge can evaluate: y = sqrt(1 - x²) (explicit)
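To make the latency argument concrete, the sketch below compares the explicit form with an iterative solve of the implicit equation for y. Newton's method, the starting guess, and the tolerance are illustrative choices, not a description of anything LuxiEdge does; the point is only that the iteration count changes with the input while the explicit form does a fixed amount of work.

```python
import math


def explicit_y(x):
    """Closed form for the upper half of the unit circle: one sqrt per input."""
    return math.sqrt(1.0 - x * x)


def newton_y(x, tol=1e-12, max_iter=100):
    """Solve y**2 = 1 - x**2 for y >= 0 by Newton's method, starting from y = 1.

    Returns (y, iterations); the iteration count depends on x.
    """
    c = 1.0 - x * x
    y = 1.0
    iterations = 0
    while abs(y * y - c) > tol and iterations < max_iter:
        y = 0.5 * (y + c / y)  # Newton update for g(y) = y**2 - c
        iterations += 1
    return y, iterations


if __name__ == "__main__":
    for x in (0.0, 0.5, 0.9, 0.99):
        y, n = newton_y(x)
        print(f"x={x}: explicit={explicit_y(x):.6f} newton={y:.6f} iterations={n}")
```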
Use NumPy for: General array operations, linear algebra, broadcasting
LuxiEdge complements NumPy for: Mathematical expression evaluation at GPU speed
import numpy as np
import requests

x = np.linspace(0, 2*np.pi, 1000)  # NumPy: build the input vector
# LuxiEdge: evaluate the expression at GPU speed
response = requests.post('http://...', ...)
Use xsimd/Highway if you need: Custom SIMD intrinsics, hand-optimized kernels
LuxiEdge provides: Pre-optimized cross-platform SIMD (AVX-512/AVX2/ARM Neon) with automatic runtime selection. No custom coding needed.
The world's fastest deterministic JSON math engine for dense numeric vector expressions (y = f(x))
Understanding where LuxiEdge excels vs. general frameworks
JAX excels at ML research with automatic differentiation and JIT compilation. LuxiEdge delivers production-grade deterministic execution for safety-critical systems where bit-exact reproducibility matters more than research flexibility.
LuxiEdge Advantage:
CuPy provides GPU-accelerated NumPy operations with Python integration. LuxiEdge delivers ≈30.7B ops/sec sustained with deterministic execution guarantees and cross-platform SIMD optimization (x86/ARM/GPU) beyond just GPU.
LuxiEdge Advantage:
TensorFlow is a general ML framework with graph compilation overhead. LuxiEdge is purpose-built for mathematical computation, with speedups of thousands of times over interpreted evaluation and a memory-safe Rust architecture.
LuxiEdge Advantage:
NumExpr optimizes CPU-based NumPy expressions (0.95-4x speedup). LuxiEdge delivers ≈30.7B ops/sec sustained with deterministic execution and cross-platform SIMD for real-time control systems.
LuxiEdge Advantage:
SymPy solves symbolic math problems (differentiation, simplification, solving). LuxiEdge evaluates explicit expressions at GPU speed with deterministic results. Complementary, not competitive.
LuxiEdge Advantage:
Real-time expression evaluation for manufacturing automation, process control systems, and robotics motion planning with deterministic execution guarantees.
Accelerate pre/post-processing for machine learning workloads, data normalization, feature engineering, and batch transformations.
Power path planning, sensor fusion calculations, and real-time decision-making for autonomous vehicles, drones, and navigation systems.
Fast computation for satellite trajectory optimization and orbital mechanics. Efficient math for power-limited space applications.
Luxi™ is available for white-label licensing, strategic partnerships, and custom enterprise deployments. Our NDA Partner Program provides early access to roadmap features and dedicated integration support.
Deploy under your brand with custom SLAs and support agreements
Patent Pending • Commercial license (LicenseRef-Luxi-Business-1.0) with NDA coverage
Direct engineering access and custom integration guidance