Autonomous research agents that learn

Give your research agent a second brain

LoopGraph is the memory layer for autonomous ML research. It remembers every experiment, ranks methods by what actually worked, and turns papers into the next winning edit.

~12 experiments / hour unattended
10–13% fewer tokens vs. baseline
9–17% faster wall-clock time
SQLite: local, auditable memory

Most agents forget everything

Today’s coding agents start from scratch on every run. They rediscover dead ends, ignore your team’s past experiments, and can’t connect a paper’s insight to your codebase.

Re-running failed ideas

Without memory, agents repeat experiments that already flopped.

Literature stays separate

Papers live in PDFs; agents can’t turn “use SwiGLU” into a working code change.

No provenance

When something works, no one knows which method or prompt caused it.

# Without LoopGraph: every run is a blank slate
agent.run() # propose → edit → train → evaluate → forget
# With LoopGraph: every run learns from the last
agent.use(LoopGraph()) # propose → retrieve ranked methods → edit → train → log → improve next proposal
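
To make "learns from the last run" concrete, here is a minimal sketch of the idea, not LoopGraph's actual API. The `ExperimentMemory` class, its table layout, and the sample ideas and deltas are all hypothetical, chosen only to show how a local SQLite log lets an agent skip ideas that already flopped.

```python
import sqlite3


class ExperimentMemory:
    """Hypothetical local memory: one row per tried idea and its BPB change."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS experiments ("
            "idea TEXT NOT NULL, bpb_delta REAL NOT NULL)"
        )

    def log(self, idea, bpb_delta):
        # A negative delta means validation BPB went down, i.e. the idea helped.
        self.db.execute("INSERT INTO experiments VALUES (?, ?)", (idea, bpb_delta))
        self.db.commit()

    def already_failed(self, idea):
        # An idea counts as failed if it was tried and never lowered BPB.
        tries, best = self.db.execute(
            "SELECT COUNT(*), MIN(bpb_delta) FROM experiments WHERE idea = ?",
            (idea,),
        ).fetchone()
        return tries > 0 and best >= 0.0


def next_idea(candidates, memory):
    """Return the first candidate that has not already failed, or None."""
    for idea in candidates:
        if not memory.already_failed(idea):
            return idea
    return None


# Illustrative values only, not measured results.
memory = ExperimentMemory()
memory.log("raise learning rate 10x", 0.012)   # BPB got worse
memory.log("swap GELU for SwiGLU", -0.004)     # BPB got better
print(next_idea(["raise learning rate 10x", "swap GELU for SwiGLU"], memory))
```

A blank-slate agent would be free to propose the first candidate again; with the log in place, it is filtered out before any training time is spent.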

How it works

LoopGraph wraps your existing training loop and adds a retrieval layer that gets smarter with every experiment.

1. Seed the knowledge base

Ingest papers from arXiv or GitHub, or use the curated method pack covering SwiGLU, GQA, Muon LR, sliding-window attention, and more.

2. Run experiments

The agent edits train.py, trains for a fixed 5-minute budget, and logs the result to results.tsv and ResearchFS.

3. Rank what works

ResearchFS scores each method by success rate, BPB (bits per byte) delta, query fatigue, and recency, so the agent reuses winners.

4. Keep improving

Every retrieval, outcome, and token cost is stored locally. The next experiment starts smarter than the last.
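
The ranking step above names four signals: success rate, BPB delta, query fatigue, and recency. The sketch below shows one way such signals could be combined into a single score. The formula, the weights, the `MethodStats` fields, and the sample numbers are assumptions for illustration; the page does not specify how ResearchFS actually computes its scores.

```python
from dataclasses import dataclass


@dataclass
class MethodStats:
    name: str
    wins: int                   # experiments where the method lowered val BPB
    trials: int                 # total experiments that used the method
    mean_bpb_delta: float       # average change in val BPB (negative is better)
    times_retrieved: int        # how often it was recently suggested (fatigue)
    days_since_last_use: float


def score(m, fatigue_weight=0.1, half_life_days=14.0):
    # Laplace-smoothed success rate, so an untried method starts at 0.5.
    success_rate = (m.wins + 1) / (m.trials + 2)
    # Reward BPB reductions; regressions contribute nothing.
    improvement = max(0.0, -m.mean_bpb_delta)
    # Damp methods that keep getting suggested.
    fatigue = 1.0 / (1.0 + fatigue_weight * m.times_retrieved)
    # Exponential decay: the recency term halves every half_life_days.
    recency = 0.5 ** (m.days_since_last_use / half_life_days)
    return success_rate * (1.0 + improvement) * fatigue * (0.5 + 0.5 * recency)


# Illustrative numbers only, not measured results.
methods = [
    MethodStats("GQA", wins=1, trials=4, mean_bpb_delta=0.002,
                times_retrieved=6, days_since_last_use=10),
    MethodStats("SwiGLU", wins=4, trials=5, mean_bpb_delta=-0.006,
                times_retrieved=2, days_since_last_use=1),
]
ranked = sorted(methods, key=score, reverse=True)
print([m.name for m in ranked])
```

Multiplying the terms means a method has to do reasonably well on every signal to rank highly; a weighted sum would be an equally plausible design.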

Meet ResearchFS

A local SQLite knowledge store with full-text search, empirical scoring, and an agent-native SDK. It is the reason LoopGraph beats a blank-prompt agent.

🔍

FTS5 hybrid search

Combine full-text search with empirical rankings to surface the right method at the right time.

📊

Success-weighted scoring

Methods are ranked by prior BPB improvement, win rate, and fatigue — not just keyword match.

🔗

Full provenance

Every method links back to its paper, chunk, and experiment outcomes.

# Query ResearchFS from your agent
from loopgraph import ResearchFSClient

client = ResearchFSClient("researchfs.db")
context = client.suggest_query(
    goal="improve validation bpb",
    current_code=train_py,
)
# Returns ranked method pack + experiment brief
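
For readers curious how full-text search and empirical rankings might be blended inside SQLite, here is a rough sketch using only Python's standard `sqlite3` module and the FTS5 extension. The table names, the `method_scores` side table, the sample rows, and the choice to multiply the two signals are all assumptions, not ResearchFS's real schema. It also assumes the local SQLite build includes FTS5, which is common but not guaranteed.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# FTS5 virtual table holding searchable method text.
db.execute("CREATE VIRTUAL TABLE methods USING fts5(name, description)")
# Hypothetical side table with one empirical score per method.
db.execute("CREATE TABLE method_scores (name TEXT PRIMARY KEY, empirical REAL)")

db.executemany(
    "INSERT INTO methods (name, description) VALUES (?, ?)",
    [
        ("SwiGLU", "gated activation for the feed-forward block"),
        ("GQA", "grouped-query attention shares key and value heads"),
        ("sliding-window attention", "local attention over a fixed window"),
    ],
)
# Illustrative scores only, not measured results.
db.executemany(
    "INSERT INTO method_scores VALUES (?, ?)",
    [("SwiGLU", 0.58), ("GQA", 0.17), ("sliding-window attention", 0.40)],
)


def hybrid_search(query, limit=5):
    # bm25() is more negative for better text matches, so negate it and
    # scale by the empirical score to blend both signals.
    return db.execute(
        """
        SELECT methods.name, -bm25(methods) * s.empirical AS hybrid
        FROM methods
        JOIN method_scores AS s ON s.name = methods.name
        WHERE methods MATCH ?
        ORDER BY hybrid DESC
        LIMIT ?
        """,
        (query, limit),
    ).fetchall()


for name, hybrid in hybrid_search("attention"):
    print(name, hybrid)
```

Only methods whose text matches the query are returned, and among those, a stronger track record moves a method up the list.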

It actually runs faster and cheaper

Head-to-head against a baseline agent on the nanochat BPB benchmark. Same model, same budget, one with ResearchFS and one without.

| Model | Agent | Final val_bpb | Tokens used | Wall time |
|---|---|---|---|---|
| gpt-4o-mini | Baseline | | baseline | baseline |
| gpt-4o-mini | LoopGraph | better | −10–13% | −9–17% |
| zai-org/glm-5-2 | Baseline | better | baseline | baseline |
| zai-org/glm-5-2 | LoopGraph | | −10–13% | −9–17% |

Source: comparison.md · Fixed 5-minute training budget per experiment · Lower token use and shorter wall-clock time hold across both models.

Built for researchers, not tourists

Single-GPU real training

Runs on real PyTorch + nanochat, not a toy environment. Tested on H100; MPS/CPU fallbacks included.

🔒

Local-first memory

Your experiment history, papers, and API calls stay in a local SQLite database — no cloud lock-in.

🧩

Provider agnostic

Plug in OpenAI, OpenRouter, Venice, or local models. Switch models without rewriting the loop.

🧪

Tested retrieval logic

1,693 lines of tests cover parsing, ranking, fatigue, deduplication, and harness-agent behavior.

Stop letting your agent forget

Join the early access list. We’re working with ML teams to turn LoopGraph into the default memory layer for autonomous research.