
AN1: Field Compression for Hyperscale Inference
We compress transformer meaning fields by 40× with near-zero accuracy loss, then sell you the CUDA path that makes those compressed models fly.
- AN1 delivers 6–10× faster inference out of the box, and 15–30× in a fully optimized serving stack
- Accuracy within about 0.5% of the full model across multiple NLP tasks
- Planned CUDA drop-in for hyperscalers and large AI platforms
What AN1 Actually Does
AN1 reads intermediate hidden states from multiple layers of a frozen transformer (for example, a 24-layer GPT-style model).
It stitches those layer slices into a single "multi-anchor field" (10,240 dimensions in our experiments).
AN1 then learns a 256-dimensional representation that preserves task-relevant structure.
A simple classifier on top of these 256-dimensional fields nearly matches the full multi-anchor classifier.
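The pipeline above can be sketched in PyTorch. Everything here is a scaled-down toy, not the AN1 implementation: a small frozen encoder stack stands in for the 24-layer teacher, and the anchor layers, mean-pooling, and dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy frozen teacher: a small encoder stack standing in for a 24-layer model.
# Dimensions are scaled down from the real setup (10,240-d field, 256-d output).
HIDDEN, N_LAYERS, ANCHORS = 64, 8, (1, 3, 5, 7)   # anchor choice is an assumption

layers = nn.ModuleList(
    nn.TransformerEncoderLayer(d_model=HIDDEN, nhead=4, batch_first=True)
    for _ in range(N_LAYERS)
)
layers.eval()                                      # deterministic: disable dropout
for p in layers.parameters():                      # frozen teacher: no gradients
    p.requires_grad_(False)

def multi_anchor_field(tokens: torch.Tensor) -> torch.Tensor:
    """Run the frozen stack, mean-pool each anchor layer, concatenate."""
    h, slices = tokens, []
    for i, layer in enumerate(layers):
        h = layer(h)
        if i in ANCHORS:
            slices.append(h.mean(dim=1))           # (batch, HIDDEN) per anchor
    return torch.cat(slices, dim=-1)               # (batch, HIDDEN * n_anchors)

field_dim = HIDDEN * len(ANCHORS)                  # 256 here; 10,240 in the paper
# AN1-style compression (a learned projection) plus a simple linear classifier.
an1 = nn.Sequential(nn.Linear(field_dim, 16), nn.ReLU(), nn.Linear(16, 2))

x = torch.randn(4, 12, HIDDEN)                     # (batch, seq_len, hidden)
logits = an1(multi_anchor_field(x))
print(logits.shape)                                # torch.Size([4, 2])
```

In the real system the teacher would be a pretrained transformer read out through its hidden states, and only the projection and classifier carry trainable parameters.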
Field Compression Invariance
This is an empirical pattern we are observing: transformer intermediate layers contain far more information than their output dimensions suggest, and that information can be compressed dramatically while preserving task performance. We are not claiming this as a final law of nature, but the consistency across benchmarks is striking.
Evidence so far
The following table summarizes our current experimental results across multiple NLP benchmarks. All experiments use frozen teacher models, multi-anchor field extraction, and AN1 compression to 256 dimensions.
| Task | Dataset | Classes | Compression | Baseline (full field) | AN1-256d | Δ |
|---|---|---|---|---|---|---|
| Sentiment | SST-2 | 2 | 40× | 90.9% | 91.4% | +0.5% |
| Entailment | MNLI | 3 | 40× | 70.9% | 71.3% | +0.4% |
| Yes/No QA | BoolQ | 2 | 40× | 68.3% | 68.4% | +0.1% |
| Low-data entailment | RTE | 2 | 40× | parity | parity | ~0% |
| Commonsense reasoning | HellaSwag | 4 | 16× | 83.08% | 83.36% | +0.28% |
Across all benchmarks tested so far, AN1 compresses multi-layer transformer fields by 16× to 40× while matching or slightly improving task accuracy.
Notes
RTE
Early results suggested a large AN1 advantage only because the baseline was under-trained. Once both models were trained on the same budget (20 epochs each), they reached parity while AN1 still used 40× fewer features.
HellaSwag
We now have a strong RoBERTa-large teacher (≈82–84% accuracy). AN1-256 achieves 83.36%, slightly beating the full 4,096-d field while compressing it by 16×.
Teacher-Free Field Generation (New Finding)
Preliminary experiments show that AN1 can train compact student encoders that approximate transformer field representations directly from text. On SST-2, a 14M-parameter student model achieved near-parity accuracy (−0.34%) relative to the teacher's field classifier, without running the transformer at inference time. These results suggest the possibility of transformer-free inference pathways once a task's field representation has been learned.
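A minimal sketch of what such a distillation loop might look like, assuming the teacher's 256-d fields have been precomputed and cached. The student architecture, input features, and training budget below are hypothetical stand-ins, not the 14M-parameter model described above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

FIELD_DIM = 256
teacher_fields = torch.randn(128, FIELD_DIM)   # stand-in for cached teacher fields
text_feats = torch.randn(128, 32)              # stand-in for cheap student inputs

# Compact student encoder: regresses the teacher's field from its own features.
student = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, FIELD_DIM))
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

loss_start = nn.functional.mse_loss(student(text_feats), teacher_fields).item()
for _ in range(200):                           # distill: match the cached fields
    loss = nn.functional.mse_loss(student(text_feats), teacher_fields)
    opt.zero_grad()
    loss.backward()
    opt.step()
loss_end = loss.item()

# At inference only the small student runs: features -> field -> classifier.
print(f"distillation loss: {loss_start:.3f} -> {loss_end:.3f}")
```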
Open Source Core, Licensed CUDA Path
AN1-Core
Licensed under a permissive open-source license.
- PyTorch field extraction from frozen transformers
- AN1 architecture and training scripts
- Evaluation code across multiple benchmarks
- Meant for reproduction, research, and early adoption
AN1-Turbo
Coming soon: a proprietary CUDA kernel stack that sits underneath AN1-Core.
- Custom fused kernels for field projection
- Optimized matmuls for 256 dimensions
- Quantization and batched inference optimization
- Integration with existing attention kernels
Designed for 6–10× faster inference on AN1 workloads out of the box, and a 15–30× speedup when combined with standard optimized serving techniques, depending on batch size and deployment shape.
Licensed monthly or annually, with usage-based pricing aligned to verified compute savings. Annual contracts receive reduced rates and dedicated support.
Open Source Layer (AN1-Core)
PyTorch implementation, research code, benchmarks
Proprietary Layer (AN1-Turbo)
CUDA kernels, 6–10× faster out of the box and 15–30× in an optimized serving stack, hyperscale licensing
Patent pending. AN1 is powered by proprietary semantic-field technology developed by Anima Core Inc. US provisional patents have been filed to protect the core innovations.
For Hyperscalers and Large AI Platforms
If a cloud provider or internal platform team plugs AN1-Turbo into their stack, they can run compressed models with much higher throughput and lower cost for their customers.
They get better latency and margin. You get performance-aligned license revenue.
What AN1-Turbo delivers
- Drop-in CUDA kernels that integrate with standard PyTorch-style inference flows
- Field-compressed models that reduce hidden-state size by up to 40×
- Practical speed gains that scale with deployment conditions: early tests show 6–10× faster inference using the compressed field alone, and 15–30× inside a fully optimized serving stack
- Flexible licensing for cloud platforms and enterprise environments, offered as monthly or annual plans with support
Pricing model: performance-aligned
AN1 licensing is structured around shared value. Instead of flat fees, clients pay a small percentage of the compute they save by deploying AN1 in place of dense CUDA workloads. This ensures every deployment is a net positive for cost and energy.
Hyperscaler scale
- 5–12% of verified compute savings at hyperscaler scale
- 10–20% participation in green credits generated by reduced GPU energy consumption
- Typical savings reach eight to nine figures per year, so even single-digit percentages remain profitable
- AN1 ships as a CUDA-compatible drop-in that fits into existing infrastructure without rewriting the stack
Enterprise
- 8–15% of measured GPU and inference savings
- Optional green-credit co-registration, where offsets are shared by both parties
- Replaces traditional mid-six- to low-seven-figure licenses with a structure that scales naturally with adoption
- Ideal for AI teams that want acceleration without a large upfront commitment
Growing teams
- 10–18% of realized compute savings for smaller but growing workloads
- A simple green-credit bundle where AN1 handles registration and the client receives certified renewable offsets
- Designed so that savings from AN1 cover the license cost, making adoption cash-positive from day one
Every client pays only when AN1 actually saves them money and energy. This makes AN1 one of the few AI licensing models aligned with efficiency, sustainability, and long-term trust.
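As a worked illustration of the performance-aligned structure, with entirely hypothetical dollar amounts and an 8% rate drawn from the 8–15% band above:

```python
# All figures below are hypothetical illustrations, not client data.
annual_gpu_spend = 50_000_000   # hypothetical pre-AN1 inference spend, $/yr
savings_fraction = 0.60         # hypothetical fraction of spend eliminated by AN1
fee_rate = 0.08                 # within the 8-15% enterprise band

savings = annual_gpu_spend * savings_fraction   # $30,000,000
license_fee = savings * fee_rate                # $2,400,000
client_net = savings - license_fee              # $27,600,000

print(f"client saves ${savings:,.0f}, pays ${license_fee:,.0f}, "
      f"nets ${client_net:,.0f}")
```

The point of the structure is visible in the arithmetic: the fee is always a fraction of realized savings, so the client's net position is positive whenever AN1 saves anything at all.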
How AN1 works under the hood
Multi-anchor fields
AN1 extracts hidden states from multiple layers of a frozen transformer (typically 3–6 layers distributed across the model depth). These slices are concatenated into a single high-dimensional "multi-anchor field" (e.g., the 10,240-dimensional field used in our experiments).
Low-dimensional manifold
AN1 learns to project this high-dimensional field onto a 256-dimensional manifold that captures task-relevant structure. The key insight is that despite the massive dimensionality reduction, nearly all task-critical information is preserved.
Concentrated singular values
Latent geometry analysis shows that multi-anchor fields exhibit highly concentrated singular-value distributions. This mathematical property explains why aggressive compression (40×) can work: the intrinsic dimensionality of task-relevant information is far lower than the raw field size.
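This concentration effect is easy to illustrate on synthetic data: a field matrix generated from a small number of latent factors has a cumulative singular-value energy curve that saturates long before the raw dimensionality. The dimensions below are scaled-down stand-ins, not AN1's measured spectra.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a multi-anchor field matrix: 500 samples of a
# 2,048-dimensional field driven by only 40 latent factors plus small noise.
n, field_dim, latent_dim = 500, 2048, 40
Z = rng.standard_normal((n, latent_dim))
W = rng.standard_normal((latent_dim, field_dim))
X = Z @ W + 0.01 * rng.standard_normal((n, field_dim))

s = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)   # descending order
energy = np.cumsum(s**2) / np.sum(s**2)
k90 = int(np.searchsorted(energy, 0.90)) + 1   # directions holding 90% of energy

print(f"{k90} of {min(n, field_dim)} directions carry 90% of the field energy")
```

Running the same analysis on real field matrices is how a 40× compression target can be sanity-checked before training the projection.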
Frozen teacher design
All experiments use frozen teacher models with small trainable classifiers. This is a clean test of field compressibility, not a giant finetuned model. It isolates the question: how much information do intermediate layers already contain?
Where this is and where it is going
These results are research-stage and will continue to evolve.
We are not claiming to "replace transformers" yet. Instead, we are showing that the information carried by their intermediate layers is far more compressible than people assumed, and that this opens the door to new acceleration paths.
Next milestones:
1. Training stronger teachers on additional hard benchmarks. HellaSwag now uses a RoBERTa-large teacher; we are extending the same approach to other difficult tasks.
2. Extending AN1 to vision and multi-modal tasks, testing whether field compression invariance holds across modalities.
3. Implementing and benchmarking the CUDA kernels that underpin AN1-Turbo, moving from PyTorch research code to production-grade CUDA acceleration.
Trust and honesty
We are committed to transparent, reproducible research. All experimental claims will be backed by open code, clear methodology, and honest discussion of limitations. We are building something real, not hype.
Get in touch
If you are a hyperscaler, large enterprise, researcher, or investor interested in AN1, we would like to hear from you.
Contact us at partner@animacore.ai
