How Rocky Works
The pipeline at a glance
Rocky doesn’t extract topics one commit at a time. It batches diffs across a whole working session, pairs them with the Claude Code transcript and a cached project summary, and produces rich nodes with a question bank (~4 implication-grounded Q+A+clue triples each) — driven by an agent already sitting in your editor.
| Stage | What happens | Where |
|---|---|---|
| rocky explore | Reads CLAUDE.md / README / docs / recent commits and synthesises a project context summary cached at ~/.rocky/summaries/<repo>.txt | Run once per project, and again after major structural changes |
| rocky post-commit (queue mode) | Silently appends the latest commit’s diff to a per-project queue. No LLM call. | Wired by rocky install claude-all |
| /rocky-review (Claude skill, default) | Umbrella: runs /rocky-checkpoint then /rocky-promptiq back-to-back and prints one fused summary. The two passes deliberately stay sequential so the agent’s evaluative stance (generative vs critical) doesn’t bleed across them. | Invoked inside Claude Code, end of a session |
| /rocky-checkpoint (Claude skill) | Drains the diff queue, reads the active session’s transcript, and writes nodes + generic question banks straight into the PKG. Questions are stripped of repo-specific identifiers so the same question still works when the topic resurfaces in a different project. Uses the global dedup list: the same topic across two projects becomes one node with repos[] accumulating. | Called by /rocky-review. Call directly from CI after every push. |
| /rocky-promptiq (Claude skill) | Walks recent prompts, scores each against the prompting-quality rubric, and writes the result via rocky prompt-eval. Refines the heuristic baseline with judgment the heuristic can’t apply. | Called by /rocky-review. Call directly on a slower CI cadence (weekly cron). |
| /rocky-quiz (Claude skill) | Picks the weakest topics by recall_now, asks from the canonical question bank, and records scores back to FSRS. | Invoked inside Claude Code any time |
| /rocky-backfill (Claude skill) | Seeds the PKG from a project’s existing git history. Same extract-and-question loop as /rocky-checkpoint but driven by rocky checkpoint history instead of the post-commit queue, so it works on repos where Rocky was installed after the commits happened. | Invoked once per repo when adopting Rocky on an existing project |
The motivation: a single commit message like “feat: rotate refresh tokens” is too thin a context to ground good questions in. By batching at session end, the extractor has the project summary, the actual diffs, and the agent’s reasoning trail. And by running inside Claude Code itself rather than shelling out to a local Ollama on every Stop event, the per-turn latency drops to zero — the heavy step only happens when you ask for it.
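The queue-then-drain split described above can be sketched as follows. This is a minimal illustration, not Rocky's actual storage format: the JSON Lines layout, the field names, and the helper names `enqueue_diff` and `drain_queue` are all assumptions made for the example.

```python
import json
from pathlib import Path


def enqueue_diff(queue_file: Path, commit: str, diff: str) -> None:
    """Append one commit's diff to the per-project queue. No LLM call here."""
    queue_file.parent.mkdir(parents=True, exist_ok=True)
    with queue_file.open("a", encoding="utf-8") as fh:
        # One JSON object per line (hypothetical format); json.dumps escapes
        # the newlines inside the diff, so each entry stays on a single line.
        fh.write(json.dumps({"commit": commit, "diff": diff}) + "\n")


def drain_queue(queue_file: Path) -> list[dict]:
    """Return every queued entry in commit order, then empty the queue."""
    if not queue_file.exists():
        return []
    lines = queue_file.read_text(encoding="utf-8").splitlines()
    queue_file.unlink()
    return [json.loads(line) for line in lines if line.strip()]
```

The point of the design is that the hook side only appends to a file, so committing stays fast, while the expensive extraction sees the whole batch at once when the queue is drained.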
The legacy Stop-hook → Ollama path still ships and is opt-in via rocky install stop. It runs after every Claude turn and is useful if you don’t keep a Claude session open the whole day. The skill-driven default is faster and produces sharper extraction because it sees the full transcript at once.
Inspecting what was generated
The web UI is the canonical viewer — see Quick Start → Step 3 for screenshots of each tab. The Knowledge Map shows the topic graph; the Review Queue surfaces what’s most overdue; the Saga tab is a cinematic timelapse of the graph growing.
Cross-project dedup
When the checkpoint skill extracts topics, it’s fed the global topic list (across every project Rocky knows about) and asked to reuse exact names where a new finding is semantically equivalent. The result: a topic like Token Rotation lives as one node with a repos[] array that accumulates as the same idea reappears in another codebase.
▸ refactor: surface is_rotated helper for refresh token reuse checks
◇ Token Rotation (existing — encounter +1, repos: [taskify, home-bank])
Done. 0 new topic(s), 1 encounter update(s).
The web UI’s Project field on a topic shows everywhere it’s appeared. Layer-2 (semantic dedup via embeddings + cosine, beyond today’s lexical Jaccard) is on the backlog.
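As a rough illustration of lexical dedup with an accumulating repos[] list, here is a small sketch based on token-level Jaccard similarity. The node shape, the `record_topic` helper, and the 0.8 threshold are assumptions for the example; Rocky's real matching rules may differ.

```python
import re


def tokens(name: str) -> set[str]:
    """Lowercased alphanumeric tokens of a topic name."""
    return set(re.findall(r"[a-z0-9]+", name.lower()))


def jaccard(a: str, b: str) -> float:
    """Size of the token intersection divided by size of the token union."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def record_topic(pkg: dict[str, dict], name: str, repo: str,
                 threshold: float = 0.8) -> str:
    """Merge `name` into the closest existing topic, or create a new node."""
    best = max(pkg, key=lambda existing: jaccard(existing, name), default=None)
    if best is not None and jaccard(best, name) >= threshold:
        node = pkg[best]
        node["encounters"] += 1
        if repo not in node["repos"]:
            node["repos"].append(repo)  # same idea seen in another codebase
        return best
    pkg[name] = {"encounters": 1, "repos": [repo]}
    return name
```

Recording "Token Rotation" from taskify and then "token rotation" from home-bank leaves a single node whose repos list holds both projects, while an unrelated name such as "Event Sourcing" creates a second node.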
How the PKG classifies topics
Every topic has a recall score: recall_now = retrievability × mastery.
- Retrievability (R) — FSRS freshness from spaced repetition. Decays with time since the last review. Decays slower when stability is high (you’ve demonstrated solid understanding).
- Mastery (M) — mean of the last 3 review scores (default 0.5 if you’ve never been quizzed). Captures how well you’ve actually been answering, not just how recently.
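The two factors can be combined as in the sketch below. The mastery rule follows the description above; the retrievability curve is an assumption, using the FSRS-4 power-law form in which R is 0.9 when the elapsed time equals the stability. The curve Rocky actually uses is not stated here, and the function names are illustrative.

```python
from statistics import mean


def retrievability(days_since_review: float, stability: float) -> float:
    """Assumed FSRS-4 style curve: decays with time, slower at high stability."""
    return 1.0 / (1.0 + days_since_review / (9.0 * stability))


def mastery(scores: list[float]) -> float:
    """Mean of the last three review scores, or 0.5 if never quizzed."""
    recent = scores[-3:]
    return mean(recent) if recent else 0.5


def recall_now(days_since_review: float, stability: float,
               scores: list[float]) -> float:
    """recall_now = retrievability x mastery."""
    return retrievability(days_since_review, stability) * mastery(scores)
```

Because retrievability never exceeds 1, a topic that has never been quizzed cannot score above 0.5, whichever curve is used.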
Recall lifecycle:
- Topic created → stability set by kind (Concept 4.0, Pattern 2.5, Implementation 1.5), mastery defaults to 0.5
- Topic reviewed → a score from 0.0 to 1.0 is recorded. Stability rises if score ≥ 0.65; if score < 0.4, stability is floored at 1.0 and difficulty is bumped. The new score replaces the oldest in the last-3 mastery window.
- Over time → R decays. M is sticky until you take another quiz.
- Classification:
  - recall ≥ 0.6 → Known: skipped automatically
  - 0.3 ≤ recall < 0.6 → Fading: surfaced as a review candidate
  - recall < 0.3 → Gap: full Socratic Q&A from the question bank
The multiplicative model is the point: a freshly-reviewed topic where you got the question wrong (R high, M low) is still a gap. Freshness alone doesn’t count as knowing.
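The lifecycle thresholds above translate into a few small functions. This is a sketch only: the initial stabilities and the classification bands come from the text, while the function names and the `weakest_topics` helper (a guess at how a quiz might pick candidates by recall_now) are hypothetical.

```python
# Initial stability by topic kind, as listed in the recall lifecycle.
INITIAL_STABILITY = {"Concept": 4.0, "Pattern": 2.5, "Implementation": 1.5}


def classify(recall: float) -> str:
    """Map a recall score to Known / Fading / Gap."""
    if recall >= 0.6:
        return "Known"
    if recall >= 0.3:
        return "Fading"
    return "Gap"


def record_review(scores: list[float], score: float) -> list[float]:
    """Add a new score to the last-3 mastery window, dropping the oldest."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    return (scores + [score])[-3:]


def weakest_topics(recalls: dict[str, float], limit: int = 3) -> list[str]:
    """Topic names ordered from lowest recall upward (hypothetical helper)."""
    return sorted(recalls, key=recalls.get)[:limit]
```

For instance, a topic reviewed today (R close to 1) with a mastery of 0.2 has a recall near 0.2, so `classify` still reports it as a Gap.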
Domain taxonomy
Every topic is assigned to one of 13 domains when it’s first extracted. Domains group topics in the PKG into subfolders and are used for Obsidian graph view clustering.
| Domain | Examples |
|---|---|
| Language | Rust lifetimes, Python decorators, Go channels |
| Database | SQL indexes, Redis TTL, Postgres transactions |
| Auth | JWT, OAuth2, RBAC, session tokens |
| API | REST design, GraphQL, WebSockets |
| Frontend | React hooks, DOM events, CSS layout |
| DevOps | Docker networking, CI/CD pipelines |
| Architecture | Event sourcing, retry patterns, microservices |
| Performance | Caching strategies, query optimisation |
| Security | OWASP, encryption, input validation |
| Testing | Unit vs integration, mocking, TDD |
| Tooling | Build systems, package managers |
| Data | Algorithms, data structures, ML concepts |
| Other | Anything that doesn’t fit above |
Use rocky classify to assign domains to any older topics that predate this feature.
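One way to model the taxonomy in code is an enum with a fallback to Other, plus a helper that places each topic in a per-domain subfolder. This is a hedged sketch: the `parse_domain` and `topic_path` helpers, the root folder name, and the `.md` note extension are assumptions for illustration, not Rocky's documented layout.

```python
from enum import Enum
from pathlib import PurePosixPath


class Domain(Enum):
    LANGUAGE = "Language"
    DATABASE = "Database"
    AUTH = "Auth"
    API = "API"
    FRONTEND = "Frontend"
    DEVOPS = "DevOps"
    ARCHITECTURE = "Architecture"
    PERFORMANCE = "Performance"
    SECURITY = "Security"
    TESTING = "Testing"
    TOOLING = "Tooling"
    DATA = "Data"
    OTHER = "Other"


# Case-insensitive lookup so "devops" and "DevOps" resolve to the same domain.
_BY_NAME = {d.value.lower(): d for d in Domain}


def parse_domain(label: str) -> Domain:
    """Resolve a label to a Domain, falling back to Other when unknown."""
    return _BY_NAME.get(label.strip().lower(), Domain.OTHER)


def topic_path(root: str, domain: Domain, topic: str) -> PurePosixPath:
    """Hypothetical layout: <root>/<Domain>/<topic>.md."""
    return PurePosixPath(root) / domain.value / f"{topic}.md"
```

Keeping Other as an explicit member means an unrecognised label still lands in a valid subfolder rather than failing extraction.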