LoopGraph is the memory layer for autonomous ML research. It remembers every experiment, ranks methods by what actually worked, and turns papers into the next winning edit.
Today’s coding agents start from scratch on every run. Grep is useful, but it is not memory: agents need active retrieval, code maps, provenance, and a record of which ideas worked.
Without memory, agents repeat experiments that already flopped.
Papers live in PDFs; agents can’t turn “use SwiGLU” into a working code change.
When something works, no one knows which method or prompt caused it.
LoopGraph wraps your existing training loop and adds a retrieval layer that gets smarter with every experiment.
Ingest papers from arXiv or GitHub, or use the curated method pack covering SwiGLU, GQA, Muon LR, sliding-window attention, and more.
The agent edits train.py, trains for a fixed 5-minute budget, and logs the result to results.tsv and ResearchFS.
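As a rough illustration of the logging step, the sketch below appends one experiment row in tab-separated form. The column names (`run_id`, `method`, `val_bpb`, `train_seconds`) are assumptions made for this example; the actual results.tsv schema is not described here. An in-memory buffer stands in for the file.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class ExperimentResult:
    # Hypothetical columns; the real results.tsv layout may differ.
    run_id: str
    method: str
    val_bpb: float
    train_seconds: float


def append_result(handle, result: ExperimentResult, write_header: bool = False) -> None:
    """Write one tab-separated row (and optionally a header) to an open text handle."""
    names = [f.name for f in fields(ExperimentResult)]
    writer = csv.DictWriter(handle, fieldnames=names, delimiter="\t", lineterminator="\n")
    if write_header:
        writer.writeheader()
    writer.writerow(asdict(result))


# StringIO stands in for open("results.tsv", "a", newline="").
buffer = io.StringIO()
append_result(
    buffer,
    ExperimentResult("run-001", "swiglu", 2.18, 300.0),  # 300 s = the 5-minute budget
    write_header=True,
)
tsv_text = buffer.getvalue()
```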
ResearchFS scores each method by success rate, BPB delta, query fatigue, and recency — so the agent reuses winners.
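One way the four signals could be combined is a weighted sum, sketched below. The field names, the weights, the smoothing of the success rate, and the exponential recency decay are all assumptions chosen for illustration, not the scoring formula ResearchFS actually uses.

```python
from dataclasses import dataclass


@dataclass
class MethodStats:
    # Hypothetical per-method statistics.
    wins: int                   # experiments where the method improved BPB
    trials: int                 # experiments where the method was tried
    mean_bpb_delta: float       # average change in BPB; negative means improvement
    recent_retrievals: int      # how often the method was surfaced lately (fatigue)
    days_since_last_win: float  # recency signal


def score(stats: MethodStats, w_success: float = 1.0, w_delta: float = 10.0,
          w_fatigue: float = 0.1, half_life_days: float = 7.0) -> float:
    """Illustrative score: higher means the method is a better candidate to reuse."""
    # Smoothed success rate so untried methods are not stuck at zero.
    success_rate = (stats.wins + 1) / (stats.trials + 2)
    # Lower BPB is better, so flip the sign of the delta.
    improvement = -stats.mean_bpb_delta
    # Recent wins count more; the bonus halves every half_life_days.
    recency = 0.5 ** (stats.days_since_last_win / half_life_days)
    # Methods that keep being retrieved are penalized to encourage variety.
    fatigue_penalty = w_fatigue * stats.recent_retrievals
    return w_success * success_rate + w_delta * improvement + recency - fatigue_penalty


winner = MethodStats(wins=3, trials=4, mean_bpb_delta=-0.02, recent_retrievals=1, days_since_last_win=1.0)
loser = MethodStats(wins=0, trials=4, mean_bpb_delta=0.03, recent_retrievals=1, days_since_last_win=1.0)
overused = MethodStats(wins=3, trials=4, mean_bpb_delta=-0.02, recent_retrievals=8, days_since_last_win=1.0)
```

With these made-up numbers, `winner` outscores `loser`, and `overused` scores below `winner` purely because of the fatigue penalty.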
Every retrieval, outcome, and token cost is stored locally. The next experiment starts smarter than the last.
ResearchFS is a local SQLite knowledge store with full-text search, empirical scoring, and an agent-native SDK. It turns passive code search into active experiment memory.
Combine full-text search with empirical rankings to surface the right method at the right time.
Methods are ranked by prior BPB improvement, win rate, and fatigue — not just keyword match.
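The sketch below shows one possible shape for this kind of retrieval: SQLite's FTS5 extension filters methods by keyword, and a separate statistics table decides the order. The table names, columns, sample rows, and the ranking expression are hypothetical, and the example assumes the local SQLite build includes FTS5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE VIRTUAL TABLE method_text USING fts5(name, summary);
CREATE TABLE method_stats (
    name TEXT PRIMARY KEY,
    win_rate REAL,
    mean_bpb_gain REAL,  -- positive = BPB went down on average
    fatigue REAL
);
""")
conn.executemany("INSERT INTO method_text (name, summary) VALUES (?, ?)", [
    ("swiglu", "gated MLP activation for transformer feed-forward layers"),
    ("gqa", "grouped-query attention shares key and value heads"),
    ("sliding_window", "local attention over a fixed window of tokens"),
])
conn.executemany("INSERT INTO method_stats VALUES (?, ?, ?, ?)", [
    ("swiglu", 0.75, 0.02, 0.1),          # made-up numbers for illustration
    ("gqa", 0.50, 0.01, 0.0),
    ("sliding_window", 0.25, 0.00, 0.6),
])


def retrieve(query: str, limit: int = 5) -> list:
    """Keyword match picks the candidates; empirical stats decide the order."""
    rows = conn.execute("""
        SELECT method_text.name,
               s.win_rate + 10.0 * s.mean_bpb_gain - s.fatigue AS empirical
        FROM method_text
        JOIN method_stats AS s ON s.name = method_text.name
        WHERE method_text MATCH ?
        ORDER BY empirical DESC
        LIMIT ?
    """, (query, limit)).fetchall()
    return [name for name, _ in rows]


ranked = retrieve("attention")
```

Both attention methods match the keyword, but the one with the better track record and lower fatigue comes first. A fuller version might also blend in the FTS5 `bm25()` relevance score.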
Every method links back to its paper, chunk, retrieval event, and experiment outcome.
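A provenance chain like this maps naturally onto foreign keys. The schema below is a minimal sketch under assumed table and column names; it only shows how an outcome could be traced back to its source paper with one join, and the inserted rows are placeholders rather than real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE papers (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE chunks (id INTEGER PRIMARY KEY,
                     paper_id INTEGER NOT NULL REFERENCES papers(id),
                     text TEXT NOT NULL);
CREATE TABLE methods (id INTEGER PRIMARY KEY,
                      name TEXT NOT NULL,
                      chunk_id INTEGER NOT NULL REFERENCES chunks(id));
CREATE TABLE retrievals (id INTEGER PRIMARY KEY,
                         method_id INTEGER NOT NULL REFERENCES methods(id),
                         query TEXT NOT NULL);
CREATE TABLE outcomes (id INTEGER PRIMARY KEY,
                       retrieval_id INTEGER NOT NULL REFERENCES retrievals(id),
                       val_bpb REAL NOT NULL);
""")

# Placeholder rows, not real papers or results.
conn.execute("INSERT INTO papers VALUES (1, 'placeholder paper title')")
conn.execute("INSERT INTO chunks VALUES (1, 1, 'placeholder chunk describing a gated MLP')")
conn.execute("INSERT INTO methods VALUES (1, 'swiglu', 1)")
conn.execute("INSERT INTO retrievals VALUES (1, 1, 'better MLP activation')")
conn.execute("INSERT INTO outcomes VALUES (1, 1, 2.18)")


def trace(outcome_id: int):
    """Walk from an experiment outcome back to the paper that motivated it."""
    return conn.execute("""
        SELECT p.title, c.id, m.name, r.query, o.val_bpb
        FROM outcomes AS o
        JOIN retrievals AS r ON r.id = o.retrieval_id
        JOIN methods AS m ON m.id = r.method_id
        JOIN chunks AS c ON c.id = m.chunk_id
        JOIN papers AS p ON p.id = c.paper_id
        WHERE o.id = ?
    """, (outcome_id,)).fetchone()


lineage = trace(1)
```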
LoopGraph was compared against a baseline autonomous agent on the nanochat BPB benchmark. Same run harness, same fixed training budget; one agent received ResearchFS retrieval context and one did not.
MLP SwiGLU during the GLM run.

| Run | Baseline best BPB | LoopGraph best BPB | Tokens saved | Time saved | Outcome |
|---|---|---|---|---|---|
| GPT-4o-mini | 2.204652 | 2.180112 | 13.0% | 17.3% | LoopGraph wins BPB, tokens, and time |
| GLM 5.2 | 2.128834 | 2.153019 | 10.5% | 9.5% | Baseline wins BPB; LoopGraph wins efficiency |
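
To make the BPB columns easier to compare, the short sketch below computes the difference between the two agents for each run, using only the values in the table. Lower BPB is better, so a negative delta favors LoopGraph. The helper name `bpb_delta` is introduced here for illustration.

```python
# (baseline best BPB, LoopGraph best BPB), copied from the table above.
rows = {
    "GPT-4o-mini": (2.204652, 2.180112),
    "GLM 5.2": (2.128834, 2.153019),
}


def bpb_delta(baseline: float, loopgraph: float) -> float:
    """LoopGraph minus baseline; negative means LoopGraph reached lower BPB."""
    return round(loopgraph - baseline, 6)


deltas = {run: bpb_delta(base, loop) for run, (base, loop) in rows.items()}

for run, delta in deltas.items():
    verdict = "LoopGraph better" if delta < 0 else "baseline better"
    print(f"{run}: {delta:+.6f} BPB ({verdict})")
```

The two gaps are nearly the same size but point in opposite directions, which matches the split outcome reported in the table.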

optimized_from_karpathy report mean BPB around 0.9109, while these quick local LoopGraph runs are around 2.13–2.20. The benchmark signal today is efficiency and retrieval behavior; the next milestone is closing the quality gap.
Runs on real PyTorch + nanochat, not a toy environment. Tested on H100; MPS/CPU fallbacks included.
Your experiment history, papers, and API calls stay in a local SQLite database — no cloud lock-in.
Plug in OpenAI, OpenRouter, Venice, or local models. Switch models without rewriting the loop.
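One common way to keep the loop independent of the model provider is to hide each backend behind the same callable interface. The sketch below is an assumption about how that could look, not LoopGraph's SDK: the `Backend` type, the registry, and both stub backends are hypothetical, and the stubs return canned text instead of calling a real API.

```python
from typing import Callable, Dict

# Hypothetical interface: a backend is any function from prompt text to reply text.
Backend = Callable[[str], str]


def stub_remote_backend(prompt: str) -> str:
    # Stand-in for a hosted model client; a real one would make an API call here.
    return f"[remote] {prompt}"


def stub_local_backend(prompt: str) -> str:
    # Stand-in for a locally served model.
    return f"[local] {prompt}"


BACKENDS: Dict[str, Backend] = {
    "remote": stub_remote_backend,
    "local": stub_local_backend,
}


def propose_edit(backend_name: str, context: str) -> str:
    """The loop only knows the backend's name; swapping models is a config change."""
    backend = BACKENDS[backend_name]
    return backend(f"Propose one edit to train.py given: {context}")


remote_reply = propose_edit("remote", "swiglu improved BPB last run")
local_reply = propose_edit("local", "swiglu improved BPB last run")
```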
1,693 lines of tests cover parsing, ranking, fatigue, deduplication, and harness-agent behavior.
Join the early access list. We’re working with ML teams to turn LoopGraph into the default memory layer for autonomous research.