Anima Core - Built With Soul

AN1: Field Compression for Hyperscale Inference

We compress transformer meaning fields by 40× with near-zero accuracy loss, then sell you the CUDA path that makes those compressed models fly.

  • AN1 delivers 6–10× faster inference out of the box, and 15–30× in a fully optimized serving stack
  • Accuracy within about 0.5% of the full model across multiple NLP tasks
  • Planned CUDA drop-in for hyperscalers and large AI platforms

What AN1 Actually Does

AN1 reads intermediate hidden states from multiple layers of a frozen transformer (for example, a 24-layer GPT-style model).

It stitches those layer slices into a "multi-anchor field" of dimension 10,240 in our experiments.

AN1 then learns a 256-dimensional representation that preserves task-relevant structure.

A simple classifier on top of these 256-dimensional fields nearly matches the full multi-anchor classifier.
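
In code, the flow looks roughly like the following PyTorch sketch; the batch size, anchor count, and single-linear projector are illustrative assumptions, not the exact AN1 design:

```python
import torch
import torch.nn as nn

# Illustrative sketch of the AN1 pipeline described above. Batch size,
# anchor count, and the single-linear projector are assumptions, not the
# exact (patent-pending) AN1 design.
hidden_size, field_dim, an1_dim, num_classes = 1024, 10240, 256, 2

# Stand-in for pooled hidden states captured at several anchor layers of a
# frozen 24-layer teacher, concatenated into one high-dimensional field.
anchors = [torch.randn(8, hidden_size) for _ in range(field_dim // hidden_size)]
field = torch.cat(anchors, dim=-1)              # (8, 10240)

projector = nn.Linear(field_dim, an1_dim)       # learned 10,240 -> 256 map
classifier = nn.Linear(an1_dim, num_classes)    # simple head on the 256-d field

logits = classifier(projector(field))           # (8, num_classes)
print(field.shape, logits.shape)
```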

Field Compression Invariance

This is an empirical pattern we are observing: transformer intermediate layers contain far more information than their output dimensions suggest, and that information can be compressed dramatically while preserving task performance. We are not claiming this as a final law of nature, but the consistency across benchmarks is striking.

Evidence so far

The following table summarizes our current experimental results across multiple NLP benchmarks. All experiments use frozen teacher models, multi-anchor field extraction, and AN1 compression to 256 dimensions.

| Task                  | Dataset   | Classes | Compression | Baseline (full field) | AN1-256d | Δ      |
|-----------------------|-----------|---------|-------------|-----------------------|----------|--------|
| Sentiment             | SST-2     | 2       | 40×         | 90.9%                 | 91.4%    | +0.5%  |
| Entailment            | MNLI      | 3       | 40×         | 70.9%                 | 71.3%    | +0.4%  |
| Yes/No QA             | BoolQ     | 2       | 40×         | 68.3%                 | 68.4%    | +0.1%  |
| Low-data entailment   | RTE       | 2       | 40×         | parity                | parity   | ~0%    |
| Commonsense reasoning | HellaSwag | 4       | 16×         | 83.08%                | 83.36%   | +0.28% |

Across all benchmarks tested so far, AN1 compresses multi-layer transformer fields by 16× to 40× while matching or slightly improving task accuracy.

Notes

RTE

Early results suggested a large AN1 advantage only because the baseline was under-trained. Once both models were trained for the same budget (20 epochs each), they reached parity while AN1 still used 40× fewer features.

HellaSwag

We now have a strong RoBERTa-large teacher (≈82–84% accuracy). AN1-256 achieves 83.36%, slightly beating the full 4096-d field while compressing it by 16×.

Teacher-Free Field Generation (New Finding)

Preliminary experiments show that AN1 can train compact student encoders that approximate transformer field representations directly from text. On SST-2, a 14M-parameter student model achieved near-parity accuracy (−0.34%) relative to the teacher's field classifier, without running the transformer at inference time. These results suggest the possibility of transformer-free inference pathways once a task's field representation has been learned.
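
One plausible reading of this setup is a distillation loop in which a small student encoder learns to reproduce the teacher's compressed field; the student architecture and MSE objective below are our assumptions for illustration, not the published AN1 method:

```python
import torch
import torch.nn as nn

# Hypothetical sketch of teacher-free field generation: a compact student is
# trained to reproduce the teacher's compressed 256-d field directly from
# token ids, so the transformer never runs at inference time. The student
# architecture and MSE loss are assumptions, not the published AN1 method.
vocab_size, emb_dim, an1_dim = 30522, 256, 256

student = nn.Sequential(
    nn.EmbeddingBag(vocab_size, emb_dim, mode="mean"),  # crude pooled text encoder
    nn.Linear(emb_dim, 512), nn.GELU(),
    nn.Linear(512, an1_dim),
)
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

def distill_step(token_ids, teacher_field):
    """One step: push the student's output toward the teacher's 256-d field."""
    loss = nn.functional.mse_loss(student(token_ids), teacher_field)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# At inference only the student (plus the trained task head) runs:
tokens = torch.randint(0, vocab_size, (4, 128))
field_hat = student(tokens)                     # (4, 256), no transformer forward pass
```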

Open Source Core, Licensed CUDA Path

AN1-Core

Licensed under a permissive open-source license.

  • PyTorch field extraction from frozen transformers
  • AN1 architecture and training scripts
  • Evaluation code across multiple benchmarks
  • Meant for reproduction, research, and early adoption

AN1-Turbo

Coming Soon

Proprietary CUDA kernel stack that sits underneath AN1-Core.

  • Custom fused kernels for field projection
  • Optimized matmuls for 256 dimensions
  • Quantization and batched inference optimization
  • Integration with existing attention kernels

Designed for 6–10× faster inference on AN1 workloads out of the box, and 15–30× when combined with standard optimized serving techniques, depending on batch size and deployment shape.

Licensed monthly or annually, with usage-based pricing aligned to verified compute savings. Annual contracts receive reduced rates and dedicated support.

Open Source Layer (AN1-Core)

PyTorch implementation, research code, benchmarks

Proprietary Layer (AN1-Turbo)

CUDA kernels, 6–10× faster out of the box and 15–30× in an optimized serving stack, hyperscale license

Patent pending. AN1 is powered by proprietary semantic-field technology developed by Anima Core Inc. US provisional patents have been filed to protect the core innovations.

For Hyperscalers and Large AI Platforms

If a cloud provider or internal platform team plugs AN1-Turbo into their stack, they can run compressed models with much higher throughput and lower cost for their customers.

They get better latency and margin. We get performance-aligned license revenue.

What AN1-Turbo delivers

  • Drop-in CUDA kernels that integrate with standard PyTorch-style inference flows
  • Field-compressed models that reduce hidden-state size by up to 40×
  • Practical speed gains that scale with deployment conditions: early tests show 6–10× faster inference using the compressed field alone, and 15–30× inside a fully optimized serving stack
  • Flexible licensing for cloud platforms or enterprise environments, offered as monthly or annual plans with support

Pricing model: performance-aligned

AN1 licensing is structured around shared value. Instead of flat fees, clients pay a small percentage of the compute they save by deploying AN1 in place of dense, uncompressed workloads. This ensures every deployment is a net positive for cost and energy (a worked example follows the tiers below).

Tier 1: Cloud providers

  • 5–12% of verified compute savings at hyperscaler scale
  • 10–20% participation in green credits generated by reduced GPU energy consumption
  • Typical savings reach eight to nine figures per year, so even single-digit percentages remain profitable
  • AN1 ships as a CUDA-compatible drop-in that fits into existing infrastructure without rewriting the stack

Tier 2: Large enterprises

  • 8–15% of measured GPU and inference savings
  • Optional green-credit co-registration where offsets are shared by both parties
  • Replaces traditional mid-six to low-seven-figure licenses with a structure that scales naturally with adoption
  • Ideal for AI teams that want acceleration without a large upfront commitment

Tier 3: Smaller enterprises

  • 10–18% of realized compute savings for smaller but growing workloads
  • Simple green-credit bundle where AN1 handles registration and the client receives certified renewable offsets
  • Designed so that savings from AN1 cover the license cost, making adoption cash-positive from day one

Every client pays only when AN1 actually saves them money and energy. This creates one of the few AI licensing models aligned with efficiency, sustainability, and long-term trust.
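
To make the performance-aligned structure concrete, here is a small illustrative calculation; the rate and dollar figures are hypothetical examples, not quotes:

```python
# Illustrative arithmetic only: the rate and dollar figures below are
# hypothetical examples, not quotes or verified savings.

def an1_license_fee(verified_savings_usd: float, rate: float) -> float:
    """Fee = agreed percentage of verified compute savings."""
    return verified_savings_usd * rate

# Example: a Tier 1 cloud provider verifies $20M/year of compute savings at
# an agreed 8% rate. The fee is $1.6M and the client retains $18.4M.
savings = 20_000_000
fee = an1_license_fee(savings, 0.08)
print(f"fee=${fee:,.0f}, client keeps ${savings - fee:,.0f}")
```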

How AN1 works under the hood

Multi-anchor fields

AN1 extracts hidden states from multiple layers of a frozen transformer (typically 3–6 layers distributed across the model depth). These slices are concatenated into a single high-dimensional "multi-anchor field" (e.g., the 10,240-dimensional field in our experiments, built from multiple anchor slices of a 24-layer model).
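
As a concrete illustration, here is a minimal extraction sketch using Hugging Face Transformers; the teacher checkpoint, anchor layers, and mean-pooling are our assumptions, not the exact AN1 recipe:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Sketch of multi-anchor field extraction from a frozen teacher. The
# checkpoint, anchor layers, and mean-pooling are illustrative assumptions.
tok = AutoTokenizer.from_pretrained("roberta-large")
teacher = AutoModel.from_pretrained("roberta-large", output_hidden_states=True)
teacher.eval().requires_grad_(False)            # frozen: inference only

@torch.no_grad()
def extract_field(texts, anchor_layers=(6, 12, 18, 24)):
    batch = tok(texts, return_tensors="pt", padding=True, truncation=True)
    hidden = teacher(**batch).hidden_states     # tuple: embeddings + one entry per layer
    # Mean-pool tokens at each anchor layer, then concatenate the slices.
    slices = [hidden[i].mean(dim=1) for i in anchor_layers]
    return torch.cat(slices, dim=-1)            # (batch, num_anchors * hidden_size)

field = extract_field(["AN1 compresses transformer fields."])
print(field.shape)                              # (1, 4096): 4 anchors x 1024 dims
```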

Low-dimensional manifold

AN1 learns to project this high-dimensional field onto a 256-dimensional manifold that captures task-relevant structure. The key insight is that despite the massive dimensionality reduction, nearly all task-critical information is preserved.
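
A minimal sketch of that projection step, assuming a plain linear-plus-GELU head trained with task supervision (AN1's actual projector architecture and objective are not specified here):

```python
import torch
import torch.nn as nn

# Sketch of learning the 256-d manifold: a trainable projector + classifier
# on top of frozen 4096-d fields. The linear+GELU head is an assumption;
# AN1's exact projector architecture and objective are not public.
field_dim, an1_dim, num_classes = 4096, 256, 4

head = nn.Sequential(nn.Linear(field_dim, an1_dim), nn.GELU(),
                     nn.Linear(an1_dim, num_classes))
opt = torch.optim.AdamW(head.parameters(), lr=3e-4)

def train_step(fields, labels):
    """Fields come precomputed from the frozen teacher; only the head trains."""
    loss = nn.functional.cross_entropy(head(fields), labels)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

train_step(torch.randn(32, field_dim), torch.randint(0, num_classes, (32,)))
```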

Concentrated singular values

Latent-geometry analysis shows that multi-anchor fields exhibit highly concentrated singular-value distributions. This mathematical property explains why aggressive compression (40×) can work: the intrinsic dimensionality of task-relevant information is far lower than the raw field size.
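
The check itself is easy to reproduce; the sketch below computes the fraction of spectral energy captured by the top 256 singular values of a field matrix (a random stand-in here, since real fields require the extraction step above):

```python
import torch

# Reproducible version of the latent-geometry check: what fraction of a field
# matrix's spectral energy sits in the top 256 singular values. The random
# matrix is a stand-in for real extracted fields, where the claim is that
# this ratio approaches 1.0.
fields = torch.randn(2000, 4096)                # (num_examples, field_dim) stand-in
s = torch.linalg.svdvals(fields)                # singular values, descending
energy = (s ** 2).cumsum(0) / (s ** 2).sum()
print(f"energy in top 256 dims: {energy[255].item():.1%}")
```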

Frozen teacher design

All experiments use frozen teacher models with small trainable classifiers. This is a clean test of field compressibility rather than of a giant fine-tuned model; it isolates the question: how much information do the intermediate layers already contain?
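
A small sketch of that protocol, freezing the teacher and confirming that only the head receives gradients (the head size shown is an assumption):

```python
import torch
from transformers import AutoModel

# Frozen-teacher protocol: every teacher weight is frozen, and only a small
# head is trainable. This isolates what the intermediate layers already encode.
teacher = AutoModel.from_pretrained("roberta-large")
teacher.eval().requires_grad_(False)

head = torch.nn.Linear(256, 2)                  # small trainable classifier (assumed size)
opt = torch.optim.AdamW(head.parameters(), lr=3e-4)  # teacher params never enter the optimizer

trainable = sum(p.numel() for p in head.parameters())
frozen = sum(p.numel() for p in teacher.parameters())
print(f"trainable head params: {trainable:,} vs frozen teacher params: {frozen:,}")
```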

Where this is and where it is going

These results are research-stage and will continue to evolve.

We are not claiming to "replace transformers" yet. Instead, we are showing that the information carried by their intermediate layers is far more compressible than people assumed, and that this opens the door to new acceleration paths.

Next milestones:

  1. Training stronger teachers on hard benchmarks. Our first HellaSwag runs used a weak teacher; the current numbers come from a RoBERTa-strength model (see Notes above), and we are applying the same upgrade to other hard benchmarks.

  2. Extending AN1 to vision and multi-modal tasks. Testing whether field compression invariance holds across modalities.

  3. Implementing and benchmarking the CUDA kernels that underpin AN1-Turbo. Moving from PyTorch research code to production-grade CUDA acceleration.

Trust and honesty

We are committed to transparent, reproducible research. All experimental claims will be backed by open code, clear methodology, and honest discussion of limitations. We are building something real, not hype.

Get in touch

If you are a hyperscaler, large enterprise, researcher, or investor interested in AN1, we would like to hear from you.

Contact us at partner@animacore.ai