RunInfra is part of RightNow, a research lab co-designing models and hardware. The papers below are the public output of that work, focused on two areas that matter to anyone running open models in production. The full reading list with PDFs, arXiv links, and code repos lives at runinfra.ai/research. This page is a brief index of what is published and how to cite it.

Research areas

Compute efficiency

Faster, leaner, memory-bounded ways to run models on existing hardware: sparse attention, early exit, and autonomous GPU kernel search.

Model architectures

New ways to compose, compute, and adapt model internals at training and inference time: recursive transformers and causal world models.

Published papers

Compute efficiency

A memory-bounded sparse attention mechanism that selects top-k keys in a single streaming pass over the sequence, fused as a Triton kernel for production inference workloads.
arXiv: 2605.02568 (PDF)
Code: github.com/RightNow-AI/streamindex
Tags: Sparse Attention, Streaming Top-k, Triton

Per-token early exit in LLM inference, driven by token-informed depth signals that decide when each token has enough computation to commit to a final logit.
arXiv: 2603.21365 (PDF)
Code: github.com/RightNow-AI/tide
Tags: LLM Inference, Early Exit, Efficiency
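
To make the "single streaming pass" idea in the first entry concrete, here is a minimal sketch of streaming top-k key selection in plain Python. It is an illustration under stated assumptions, not the StreamIndex kernel: the function name `streaming_top_k`, the dot-product scoring, and the min-heap are all choices made here for the example, and the published method is a fused Triton kernel whose details are in the paper.

```python
import heapq
from typing import Iterable, List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product, used here as the (assumed) query-key score."""
    return sum(x * y for x, y in zip(a, b))


def streaming_top_k(
    query: Sequence[float],
    keys: Iterable[Sequence[float]],
    k: int,
) -> List[Tuple[int, float]]:
    """Return (index, score) for the k best-scoring keys, best first.

    Keys are consumed one at a time, so extra memory stays O(k)
    no matter how long the sequence is.
    """
    if k <= 0:
        return []
    heap: List[Tuple[float, int]] = []  # min-heap of (score, index)
    for idx, key in enumerate(keys):
        score = dot(query, key)
        if len(heap) < k:
            heapq.heappush(heap, (score, idx))
        elif score > heap[0][0]:
            # New key beats the weakest kept key: swap it in.
            heapq.heapreplace(heap, (score, idx))
    return [(idx, score) for score, idx in sorted(heap, reverse=True)]


# Usage: keys can be a generator, so the full sequence is never materialized.
example_keys = (key for key in [[0.1, 5.0], [0.9, 0.0], [0.5, 1.0], [0.7, 2.0]])
selected = streaming_top_k([1.0, 0.0], example_keys, k=2)
```

Attention would then be computed only over the selected indices. The heap is one simple way to bound memory at k entries; a GPU kernel would more likely use a different selection structure.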

Model architectures

A recursive transformer where each layer generates its own weights through input-conditioned LoRA modulation, enabling dynamic capacity allocation without storing additional parameters.
arXiv: 2604.02051 (PDF)
Code: github.com/RightNow-AI/ouroboros
Tags: Transformers, LoRA, Weight Generation

Hierarchical causal latent state machines for object-centric world modeling, with explicit slots for entities and the causal relationships between them.
arXiv: 2603.29090 (PDF)
Code: github.com/RightNow-AI/hclsm
Tags: World Models, Object-Centric, Causal
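
As a rough illustration of what "input-conditioned LoRA modulation" in a recursive block can look like, the sketch below reuses one shared weight matrix across several steps and adds a low-rank update whose strength depends on the current input. Everything here is an assumption made for the example: the names `modulated_layer` and `recursive_forward`, the sigmoid gate `G`, and the form y = W x + B (g(x) * (A x)) are hypothetical and are not taken from the paper or its repository.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def modulated_layer(x: Vector, W: Matrix, A: Matrix, B: Matrix, G: Matrix) -> Vector:
    """Compute y = W x + B (g(x) * (A x)) with g(x) = sigmoid(G x).

    W is the shared base weight, A and B are rank-r LoRA factors, and G is a
    hypothetical gate projection. The effective weight W + B diag(g(x)) A
    changes with the input without storing a separate matrix per step.
    """
    base = matvec(W, x)
    low_rank = matvec(A, x)                      # project down to rank r
    gate = [sigmoid(z) for z in matvec(G, x)]    # input-conditioned scaling
    delta = matvec(B, [g * h for g, h in zip(gate, low_rank)])
    return [b + d for b, d in zip(base, delta)]


def recursive_forward(x: Vector, W: Matrix, A: Matrix, B: Matrix, G: Matrix, steps: int) -> Vector:
    """Apply the same modulated layer `steps` times (weight sharing across depth)."""
    for _ in range(steps):
        x = modulated_layer(x, W, A, B, G)
    return x


# Tiny example: 2-dimensional input, rank-1 update, gate fixed at 0.5 (G is all zeros).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 0.0]]
B = [[0.0], [1.0]]
G = [[0.0, 0.0]]
out = recursive_forward([2.0, 3.0], W, A, B, G, steps=2)
```

The point of the sketch is only the shape of the computation: one stored block, applied repeatedly, with a cheap input-dependent low-rank correction at each step.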

Authors

All papers are joint work by Jaber Jaber (RunInfra founder, RightNow) and Osama Jaber (RightNow). Correspondence to jaber@runinfra.ai.

How to cite

Each paper's canonical BibTeX entry is available on its arXiv page. Use the arXiv ID as the identifier, as in this entry for StreamIndex:
@misc{streamindex2026,
  title  = {StreamIndex: Memory-Bounded Compressed Sparse Attention via Streaming Top-k},
  author = {Jaber, Jaber and Jaber, Osama},
  year   = {2026},
  eprint = {2605.02568},
  archivePrefix = {arXiv},
  primaryClass  = {cs.LG},
}