Structure is performance.
Models have internal geometry. That geometry is full of friction. Dystrio is the first structural compiler for AI inference — we reshape models before they run, and emit standard artifacts that deploy in any stack.
Every efficient system in nature converges on the same geometry: the shape that moves the most with the least friction.
AI models have structure. That structure is inefficient relative to real workloads. The industry has optimized runtimes, kernels, and serving stacks — everything around the model. We optimize the model itself: its width, its topology, its internal allocation of compute.
We call this Structural Inference Optimization. It is a new layer of the inference stack — a compiler step between training and deployment that didn't exist before.
We don't change how models run. We change what they are.
MoE models pay for cross-GPU communication every time co-activating experts land on different ranks, even over NVLink. Forge observes routing patterns, builds a co-activation graph, and places co-activating experts on the same rank. Same model. Same stack. Less friction.
Read-only observation. Output is a placement artifact you apply at deploy time. No runtime changes.
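The core idea can be sketched in a few lines. This is an illustrative toy, not Forge's implementation: the function names (`build_coactivation_graph`, `place_experts`) and the greedy capacity-bounded heuristic are assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations

def build_coactivation_graph(routing_traces):
    """Count how often each pair of experts fires for the same token."""
    graph = defaultdict(int)
    for active in routing_traces:  # each trace: set of expert ids for one token
        for a, b in combinations(sorted(active), 2):
            graph[(a, b)] += 1
    return graph

def place_experts(num_experts, num_ranks, graph):
    """Greedily co-locate the hottest co-activating pairs on one rank."""
    capacity = num_experts // num_ranks
    rank_of, load = {}, [0] * num_ranks
    # Walk pairs from most to least frequently co-activated.
    for (a, b), _ in sorted(graph.items(), key=lambda kv: -kv[1]):
        for e, partner in ((a, b), (b, a)):
            if e in rank_of:
                continue
            r = rank_of.get(partner)
            if r is None or load[r] >= capacity:  # partner unplaced or rank full
                r = min(range(num_ranks), key=load.__getitem__)
            rank_of[e] = r
            load[r] += 1
    for e in range(num_experts):  # experts that never co-activated
        if e not in rank_of:
            r = min(range(num_ranks), key=load.__getitem__)
            rank_of[e] = r
            load[r] += 1
    return rank_of

# Experts 0/1 and 2/3 fire together; the placement keeps each pair on one rank.
traces = [{0, 1}, {0, 1}, {0, 1}, {2, 3}, {2, 3}]
placement = place_experts(4, 2, build_coactivation_graph(traces))
```

A production placer would weigh load balance, per-layer routing drift, and interconnect topology; the point is only that the input is observed routing and the output is a static assignment.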
Models allocate uniform width across every layer regardless of actual activation demand. Sculpt measures that demand, physically rewrites MLP dimensions, stabilizes the result, and emits a standard dense model. Not masking. Not sparse pruning. Tensor shape recompilation.
Hugging Face compatible. No custom kernels. No sparse runtime. Loads in vLLM, SGLang, Transformers.
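The difference between masking and recompilation is easiest to see in code. A minimal NumPy sketch, assuming a ReLU MLP and a mean-activation demand metric (the name `sculpt_mlp` and the selection rule are illustrative, not Sculpt's actual algorithm):

```python
import numpy as np

def sculpt_mlp(w_in, w_out, calib_x, keep_ratio=0.5):
    """Shrink an MLP's intermediate width by slicing out low-demand neurons.

    w_in  : (d_model, d_ff) up-projection
    w_out : (d_ff, d_model) down-projection
    Returns physically smaller dense matrices: sliced, not masked.
    """
    acts = np.maximum(calib_x @ w_in, 0.0)   # ReLU activations on calibration data
    demand = np.abs(acts).mean(axis=0)       # mean activation per neuron, shape (d_ff,)
    k = int(w_in.shape[1] * keep_ratio)
    keep = np.sort(np.argsort(demand)[-k:])  # indices of the k highest-demand neurons
    return w_in[:, keep], w_out[keep, :]     # new tensor shapes, no mask, no sparsity

# Toy model whose neurons 4-7 never activate on the calibration inputs.
rng = np.random.default_rng(0)
x = np.abs(rng.normal(size=(16, 4)))   # positive calibration inputs
w_in = rng.normal(size=(4, 8))
w_in[:, :4] = np.abs(w_in[:, :4])      # neurons 0-3 always fire on x
w_in[:, 4:] = -np.abs(w_in[:, 4:])     # neurons 4-7 never fire on x
w_out = rng.normal(size=(8, 4))
s_in, s_out = sculpt_mlp(w_in, w_out, x)  # d_ff drops from 8 to 4
```

Because the dead neurons are physically removed rather than zeroed, the result is an ordinary dense model at a smaller width, which is why no sparse runtime or custom kernel is needed. (Real workloads have no exactly-dead neurons, which is why a stabilization step follows the rewrite.)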
Reshape the model. Not the stack.
The output is a model, not a runtime. Works wherever models work.
Standard model files and placement JSON. Zero pipeline modification.
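To make the artifact concrete, a placement file might look something like this. The schema below is hypothetical, invented for illustration; it is not Dystrio's actual format.

```json
{
  "model": "example/moe-8x7b",
  "num_ranks": 2,
  "placement": {
    "layers.0.experts.0": 0,
    "layers.0.experts.1": 0,
    "layers.0.experts.2": 1,
    "layers.0.experts.3": 1
  }
}
```

The serving stack reads the file once at load time and assigns each expert's weights to the listed rank; nothing about the forward pass changes.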
Structural optimization before quantization, fine-tuning, and serving. Makes every downstream step more efficient.
Every product quantifies expected gain and recommends whether to apply. When optimization won't help, Dystrio tells you.
Want to see Dystrio on your workload?
We're working with design partners running inference in production.