LoopGraph is the memory layer for autonomous ML research. It remembers every experiment, ranks methods by what actually worked, and turns papers into the next winning edit.
Today’s coding agents start from scratch on every run. They rediscover dead ends, ignore your team’s past experiments, and can’t connect a paper’s insight to your codebase.
Without memory, agents repeat experiments that already flopped.
Papers live in PDFs; agents can’t turn “use SwiGLU” into a working code change.
When something works, no one knows which method or prompt caused it.
LoopGraph wraps your existing training loop and adds a retrieval layer that gets smarter with every experiment.
Ingest papers from arXiv or GitHub, or use the curated method pack covering SwiGLU, GQA, Muon LR, sliding-window attention, and more.
The agent edits train.py, trains for a fixed 5-minute budget, and logs the result to results.tsv and ResearchFS.
ResearchFS scores each method by success rate, BPB (bits-per-byte) delta, query fatigue, and recency, so the agent reuses winners.
Every retrieval, outcome, and token cost is stored locally. The next experiment starts smarter than the last.
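The run-and-log step can be sketched in a few lines of Python. This is a minimal illustration, not LoopGraph's actual harness: the function names, the TSV column layout, and the assumption that the training script prints a line such as `val_bpb: 1.234` are all hypothetical.

```python
import csv
import re
import subprocess
import time
from pathlib import Path

BUDGET_SECONDS = 5 * 60  # fixed per-experiment budget described above
FIELDS = ["method", "status", "val_bpb", "wall_s"]  # hypothetical column layout


def run_with_budget(cmd, budget_s=BUDGET_SECONDS):
    """Run one training command; stop it if it exceeds the wall-clock budget."""
    start = time.monotonic()
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=budget_s)
        status = "ok" if proc.returncode == 0 else "failed"
        output = proc.stdout
    except subprocess.TimeoutExpired:
        status, output = "timeout", ""
    # Assumed log format: the script prints a line such as "val_bpb: 1.234".
    match = re.search(r"val_bpb:\s*(\d+(?:\.\d+)?)", output)
    return {
        "status": status,
        "val_bpb": float(match.group(1)) if match else None,
        "wall_s": round(time.monotonic() - start, 2),
    }


def log_result(path, method, result):
    """Append one experiment row to a TSV file, writing the header only once."""
    path = Path(path)
    is_new = not path.exists()
    with path.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS, delimiter="\t")
        if is_new:
            writer.writeheader()
        writer.writerow({"method": method, **result})
```

Under these assumptions, one experiment would be `log_result("results.tsv", "swiglu", run_with_budget(["python", "train.py"]))`. Failed or timed-out runs are logged as well, which is how dead ends stay visible to later runs.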
ResearchFS is a local SQLite knowledge store with full-text search, empirical scoring, and an agent-native SDK. It is the reason LoopGraph beats a blank-prompt agent.
Combine full-text search with empirical rankings to surface the right method at the right time.
Methods are ranked by prior BPB improvement, win rate, and fatigue — not just keyword match.
Every method links back to its paper, chunk, and experiment outcomes.
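To make the idea concrete, here is a rough sketch of hybrid retrieval on SQLite: FTS5 finds candidate methods by keyword, then a re-ranking step adds an empirical score built from win rate, BPB gain, fatigue, and recency. The schema, weights, half-life, and function names are assumptions made for illustration, not the ResearchFS implementation, and the sketch assumes a Python build whose SQLite includes the FTS5 extension.

```python
import math
import sqlite3

# Hypothetical schema, weights, and half-life, chosen only for illustration.
SCHEMA = """
CREATE VIRTUAL TABLE methods USING fts5(name, summary, paper_id UNINDEXED);
CREATE TABLE outcomes (method TEXT, bpb_delta REAL, age_days REAL);
CREATE TABLE retrievals (method TEXT);
"""
W_WIN, W_GAIN, W_FATIGUE = 1.0, 10.0, 0.1
HALF_LIFE_DAYS = 14.0


def empirical_score(outcomes, pulls):
    """Score a method from (bpb_delta, age_days) pairs and its retrieval count.

    Lower BPB is better, so a negative delta counts as a win. Older outcomes
    are down-weighted exponentially; frequent retrieval adds a fatigue penalty.
    """
    fatigue = W_FATIGUE * math.log1p(pulls)
    if not outcomes:
        return -fatigue  # untested methods get no credit, only fatigue
    weights = [0.5 ** (age / HALF_LIFE_DAYS) for _, age in outcomes]
    total = sum(weights)
    win_rate = sum(w for (d, _), w in zip(outcomes, weights) if d < 0) / total
    mean_gain = sum(-d * w for (d, _), w in zip(outcomes, weights)) / total
    return W_WIN * win_rate + W_GAIN * mean_gain - fatigue


def search(conn, query, limit=5):
    """Return (score, method, paper_id) tuples, best first."""
    # FTS5's bm25() is more negative for better matches, so negate it.
    candidates = conn.execute(
        "SELECT name, paper_id, -bm25(methods) FROM methods WHERE methods MATCH ?",
        (query,),
    ).fetchall()
    ranked = []
    for name, paper_id, relevance in candidates:
        outcomes = conn.execute(
            "SELECT bpb_delta, age_days FROM outcomes WHERE method = ?", (name,)
        ).fetchall()
        pulls = conn.execute(
            "SELECT COUNT(*) FROM retrievals WHERE method = ?", (name,)
        ).fetchone()[0]
        ranked.append((relevance + empirical_score(outcomes, pulls), name, paper_id))
    ranked.sort(reverse=True)
    return ranked[:limit]
```

Each result carries its `paper_id`, so a hit can be traced back to its source. In this sketch, two methods that match the same keyword are separated mostly by their recorded outcomes, which is the behavior the ranking description above calls for.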
Head-to-head against a baseline agent on the nanochat BPB benchmark. Same model, same budget, one with ResearchFS and one without.
| Model | Agent | Final val_bpb | Tokens used | Wall time |
|---|---|---|---|---|
| gpt-4o-mini | Baseline | — | baseline | baseline |
| gpt-4o-mini | LoopGraph | better | −10% to −13% | −9% to −17% |
| zai-org/glm-5-2 | Baseline | better | baseline | baseline |
| zai-org/glm-5-2 | LoopGraph | — | −10% to −13% | −9% to −17% |
Source: comparison.md · Fixed 5-minute training budget per experiment · Token usage and wall time are lower with LoopGraph on both models.
Runs on real PyTorch + nanochat, not a toy environment. Tested on H100; MPS/CPU fallbacks included.
Your experiment history, papers, and API calls stay in a local SQLite database — no cloud lock-in.
Plug in OpenAI, OpenRouter, Venice, or local models. Switch models without rewriting the loop.
1,693 lines of tests cover parsing, ranking, fatigue, deduplication, and harness-agent behavior.
Join the early access list. We’re working with ML teams to turn LoopGraph into the default memory layer for autonomous research.