⚡ Promptolis Original · Coding & Development
🗿 Codebase Archaeologist
Reads a codebase not to fix it but to understand its history. What was designed, what grew. The ghost engineer. The frozen decision. The load-bearing wall.
Why this is epic
Distinguishes intent from accretion. Every mature codebase is half-archaeology. This separates the two, which is the first step to working with either.
Identifies the ghost engineer: the former team member whose patterns are still everywhere, and who typically left 18-36 months ago. Knowing who they were explains roughly 40% of the weird decisions.
Names the trap for new engineers: the change that will look obvious, pass linters, pass tests, and break production 6 days later. Put it in your onboarding doc verbatim.
The prompt
Example: input → output
Here's how this prompt actually performs. Real input below, real output from Claude Opus 4.
Context: 7-year-old Rails monolith, ~180k lines of Ruby + 40k lines of React, team of 25 engineers, B2B SaaS for logistics companies (shipment tracking, invoicing).
What prompted this: Hiring a new staff engineer and need to write their onboarding doc. I want to know what to warn them about.
10 most-touched files (from git log):
1. app/models/shipment.rb (2,847 lines, 412 commits)
2. app/services/pricing_calculator.rb (1,234 lines, 289 commits)
3. app/controllers/api/v2/shipments_controller.rb (891 lines, 198 commits)
4. app/javascript/packs/dashboard.js (2,102 lines, 176 commits)
5. config/application.rb (234 lines, 167 commits)
6. lib/pricing/v1/legacy_rate_engine.rb (1,567 lines, 142 commits; last commit 2 years ago)
7. spec/services/pricing_calculator_spec.rb (3,421 lines, 134 commits)
8. app/models/customer.rb (1,876 lines, 128 commits)
9. app/services/invoice_generator.rb (945 lines, 119 commits)
10. config/routes.rb (567 lines, 112 commits)
Archaeological Report
The Story This Code Tells
This is a codebase that started as one engineer's weekend project sometime around 2018, hit product-market fit around 2020, hired fast in 2021, and has been paying down the growth tax ever since. The core domain (`Shipment`, `Customer`, `pricing_calculator`) was clearly designed before the team had a process, and the patterns from that era are still visible: fat models, service objects layered on top of the models rather than replacing them, and a "legacy rate engine" in `lib/pricing/v1/` that was meant to be replaced but became critical infrastructure and is now untouchable.
The React side (`dashboard.js` at 2,102 lines, 176 commits) is younger — probably 2021-2022 — and was written by a different hand than the Rails side. You can see the seam: it's all jQuery-era muscle memory translated to React, heavy on imperative state management, light on component composition. Whoever owned this wasn't a frontend specialist.
By 2022-2023, the team grew fast and testing got uneven. `pricing_calculator_spec.rb` is 3,421 lines of tests — that's one of the original files, still lovingly tended. Almost nothing else has that kind of coverage. This is diagnostic: one engineer on the team cares deeply about the pricing calculator, and their absence would be felt within weeks.
The `legacy_rate_engine.rb` with its last commit 2 years ago is the real story. It was supposed to be dead. It isn't. It's the load-bearing wall for 15-20% of revenue-critical code paths, and nobody knows exactly which ones.
Designed vs. Grown
Clearly designed:
- `app/services/` pattern (service objects layered on fat models) — an intentional Rails-era architecture decision, circa 2019.
- API versioning with `/api/v2/` namespace — someone thought about backward compatibility early.
- `pricing_calculator_spec.rb` with 3,421 lines of tests — specific, rigorous, clearly labored over.
- `config/routes.rb` structure (567 lines across 112 commits suggests incremental, considered additions).
Clearly grown:
- `shipment.rb` at 2,847 lines is the classic Rails fat-model disease — this grew organically as every new feature added methods.
- `customer.rb` at 1,876 lines is the same story, second offender.
- `dashboard.js` at 2,102 lines is the frontend version of the same disease.
- `application.rb` at 234 lines with 167 commits — this file should have ~20 meaningful lines. The other 214 are config flags that accumulated.
- `invoice_generator.rb` at 945 lines is where the generator logic + edge cases + customer-specific exceptions all pile up because nobody wanted to touch invoicing.
Ambiguous:
- The fact that `lib/pricing/v1/legacy_rate_engine.rb` exists at that path suggests there was a `v2/` planned that never happened. Was the rename the design, or was v2 the design and it was abandoned? Either way, the naming lies to you.
- The split between `pricing_calculator.rb` (service) and `legacy_rate_engine.rb` (lib) suggests a migration that got 60% done.
The Ghost
There's almost certainly a single engineer whose fingerprints are all over this, who left sometime in 2022-2023. Based on the patterns:
- They wrote the original `Shipment`, `Customer`, and pricing code.
- They loved service objects with verb names ("PricingCalculator", "InvoiceGenerator").
- They were suspicious of metaprogramming — the code is remarkably explicit for a Rails app.
- They wrote the `pricing_calculator_spec.rb` tests personally and someone who respected them has been maintaining those specs.
- They were not a frontend engineer. When they left, the frontend got worse.
- They probably were the first or second engineering hire. They had CTO vibes without the title.
The longest-tenured engineer currently on your team knows exactly who this is. Ask them.
The Frozen Decision
The `legacy_rate_engine.rb` split.
Circa 2021, the team decided to migrate pricing logic from `lib/pricing/v1/` to a cleaner service-object architecture. They got maybe 60% of the way. Then a pivot, a deadline, or a key person leaving froze the migration. Now you have two pricing systems: the new `pricing_calculator.rb` handles the happy path for 80% of customers. The `legacy_rate_engine.rb` handles the 20% of edge cases that nobody had time to rewrite: grandfathered pricing, custom negotiations, region-specific rules.
The decision to split is defensible. The decision to never finish is the frozen one. Two years with no commits to a file that's still running revenue-critical code means: every bug in that file will be harder to fix than the original work would have been.
The Cargo Cult
Service objects with overly specific names.
Somewhere, someone read the "service objects in Rails" blog posts from 2016-2019 and applied them, correctly, in 2019. Then, as the team grew, new engineers saw the pattern and over-applied it. You probably have 40-80 service objects now, and 15-25 of them are thin wrappers around a single model method, which provide no benefit and add indirection.
The original ghost knew when to use service objects. The engineers who came after applied the pattern cargo-cult style. Look at the `/services/` directory — I'd bet $50 there's at least one service object with 30 lines of code that just calls `Model.where(...)` and returns the result.
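One way to test that bet is a rough heuristic that flags service-object files that are short and define at most one method. The sketch below is an illustration only: the helper names (`code_lines`, `thin_wrapper?`) and the 30-line threshold are assumptions, and a flagged file is a candidate for review, not proof of cargo-culting.

```ruby
# Heuristic sketch: helper names and thresholds are hypothetical.

# Returns the lines of Ruby source that are neither blank nor full-line comments.
def code_lines(source)
  source.lines.map(&:strip).reject { |line| line.empty? || line.start_with?("#") }
end

# True when the source is short and defines at most one method,
# which is the typical shape of a wrapper around a single model call.
def thin_wrapper?(source, max_lines: 30)
  lines = code_lines(source)
  method_count = lines.count { |line| line.match?(/\Adef\s/) }
  lines.size <= max_lines && method_count <= 1
end
```

To scan a real tree, something like `Dir.glob("app/services/**/*.rb").select { |path| thin_wrapper?(File.read(path)) }` would list the candidates; read each one before deciding it adds no value.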
Where Tests Stopped
The rupture is almost certainly Q2-Q3 of 2022. To verify, run `git log --since=2021-01-01 --until=2023-12-31 --format=%ai -- spec/ | wc -l`, run the same command against `app/`, and compare the two counts while narrowing the date window quarter by quarter. The ratio probably flipped sometime in that window.
Whatever happened (a departure, a major product launch, a customer crisis) marked the point where the team went from "tests before merge" to "tests after we ship this fire." That's when the codebase started accreting technical debt at a compound rate.
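The spec-versus-app comparison can be made per quarter instead of as one total. The sketch below assumes the input is a list of commit dates in the `%ai` format (`2022-04-13 10:00:00 +0200`); the function names are hypothetical, and the result only shows where the ratio dropped, not why.

```ruby
# Sketch: function names are hypothetical; dates must start with "YYYY-MM".

# Counts commits per calendar quarter, e.g. { "2022-Q2" => 14 }.
def commits_per_quarter(dates)
  dates.each_with_object(Hash.new(0)) do |date, counts|
    year, month = date[0, 7].split("-").map(&:to_i)
    counts["#{year}-Q#{(month - 1) / 3 + 1}"] += 1
  end
end

# For every quarter that has app commits, returns spec commits per app commit.
def spec_to_app_ratio(spec_dates, app_dates)
  spec = commits_per_quarter(spec_dates)
  app = commits_per_quarter(app_dates)
  app.keys.sort.to_h { |quarter| [quarter, (spec[quarter].to_f / app[quarter]).round(2)] }
end
```

In a repository, the two inputs could come from `` `git log --format=%ai -- spec/`.lines `` and the same command for `app/`. A quarter where the ratio falls sharply and stays low is the likely rupture.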
The Load-Bearing Wall
`lib/pricing/v1/legacy_rate_engine.rb`.
Not the file with the most commits. The file with no commits in 2 years but still in the hot path of revenue. Touch it and you touch customer invoices. Customer invoices are the only thing the business cannot break. Every new engineer will think this file looks simple and wonder why nobody has cleaned it up. That's the trap.
Also: `app/models/shipment.rb` is load-bearing in a different way — everything depends on it, so any change cascades. But `shipment.rb` has 412 commits; people are at least paying attention. The legacy rate engine is invisible precisely because it's "done."
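If nobody knows which code paths still reach the legacy engine, one low-risk option is to record call sites before touching anything. The following is a minimal sketch, assuming the engine exposes instance methods; `LegacyRateEngine`, `rate_for`, and `CallAudit` are hypothetical stand-ins, not names taken from the real file, and the in-memory hash is not thread-safe.

```ruby
# Hypothetical stand-in for the class in lib/pricing/v1/legacy_rate_engine.rb.
class LegacyRateEngine
  def rate_for(customer_id)
    customer_id.even? ? 120 : 100
  end
end

# Prepended module that counts calls per call site and then defers to the
# original method, so pricing behavior is unchanged.
module CallAudit
  CALLS = Hash.new(0)

  def rate_for(*args)
    site = caller_locations(1, 1).first
    CALLS["#{site.path}:#{site.lineno}"] += 1
    super
  end
end

LegacyRateEngine.prepend(CallAudit)
```

After a representative period, `CallAudit::CALLS` maps each `file:line` to a hit count. In production you would more likely send these to a logger or metrics system than keep them in a hash.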
The Trap for New Engineers
In week 3, a new engineer will look at `app/models/shipment.rb` — all 2,847 lines — and suggest extracting the pricing-related methods into a separate module. It looks like an obvious clean-up.
They will not realize that 6 of those methods are called from `legacy_rate_engine.rb` via metaprogramming (`send(method_name)` or similar) that depends on them living on the `Shipment` class. The linter will pass. The tests will pass. Production will get subtly wrong invoices for edge-case customers for 6-14 days before anyone catches it.
Put this warning in the onboarding doc verbatim: "Do not refactor Shipment methods without checking lib/pricing/v1/ for dynamic method calls. The legacy rate engine uses metaprogramming that static analysis won't catch."
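The mechanism behind this trap can be illustrated in a few lines. In the sketch below every name (`Shipment#fuel_surcharge`, `SurchargeRules`, the rule list) is hypothetical. It shows why a search for the method name finds no callers, and one possible guard: a check that every dynamically built method name still exists on the class.

```ruby
# Hypothetical stand-ins; none of these names come from the real codebase.
class Shipment
  def initialize(weight_kg)
    @weight_kg = weight_kg
  end

  # No caller spells these names out, so they look unused to grep and linters.
  def fuel_surcharge
    @weight_kg * 10 # cents
  end

  def customs_surcharge
    @weight_kg * 25 # cents
  end
end

module SurchargeRules
  # Rule names arrive as data (config rows, contracts), not as code.
  RULES = %w[fuel customs].freeze

  # Builds each method name at runtime and dispatches to it.
  def self.total_surcharge(shipment, rules = RULES)
    rules.sum { |rule| shipment.public_send("#{rule}_surcharge") }
  end

  # Guard: names the dispatcher would call that the class no longer defines.
  def self.missing_methods(klass, rules = RULES)
    rules.map { |rule| "#{rule}_surcharge" }.reject { |name| klass.method_defined?(name) }
  end
end
```

A spec asserting `SurchargeRules.missing_methods(Shipment)` is empty would fail as soon as someone extracts one of those methods, which turns a silent production error into a red build.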
What This Codebase Is Optimized For
The stated values are probably "clean architecture," "test coverage," "maintainability."
The actual optimization target (readable in the commits) is: revenue preservation + deploy velocity. When something broke, it got patched. When a customer needed custom behavior, an edge case was added. When speed was needed, tests were skipped.
This is NOT a moral failure. This is what a growing B2B SaaS codebase optimizes for when it can't afford downtime or lost customers. But your onboarding doc should state this explicitly so new engineers don't waste energy pushing purity where pragmatism is the design.
One Question to Ask the Longest-Tenured Engineer
"Who wrote the original Shipment model, and what did they think of the legacy_rate_engine split? What didn't make it into that refactor?"
Their 10-minute answer will compress 3 months of archaeology for your new hire. Get the answer in writing. Put it in the onboarding doc. Name the person. Link to any remaining PRs they authored. This is how institutional knowledge gets preserved — not through docs written from scratch, but through specific questions about specific humans.
Common use cases
- Writing onboarding docs for new engineers
- Before planning a major refactor or rewrite
- When you inherited a codebase and don't know its history
- After a senior engineer leaves and you need to map what they held
- As a technical due-diligence tool during acquisitions
Best AI model for this
Claude Opus 4 (its large context window handles sizable codebases, and it is nuanced at reading historical patterns). GPT-5 Pro is second-best.
Pro tips
- Paste real code when you can. Actual code beats descriptions 10-to-1.
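- Include git output: `git log --pretty=format: --name-only | grep -v '^$' | sort | uniq -c | sort -rn | head -10` lists the most-touched files, which is where the archaeology lives. The `grep -v '^$'` step drops the blank separator lines that would otherwise top the count.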
- Run this BEFORE rewrite planning. Skipping archaeology means rebuilding the same frozen decisions with newer syntax.
- The 'one question for the longest-tenured engineer' is the most actionable output. Actually ask it.
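If you would rather rank files inside a script than in a shell pipeline, the counting step can be sketched in Ruby. This assumes the input is the raw text of `git log --pretty=format: --name-only`; the function name `most_touched` is hypothetical.

```ruby
# Sketch: counts how often each path appears in `git log --name-only` output.
# Blank separator lines are dropped; ties are broken alphabetically by path.
def most_touched(log_output, limit = 10)
  log_output.lines
            .map(&:strip)
            .reject(&:empty?)
            .tally
            .sort_by { |path, count| [-count, path] }
            .first(limit)
end
```

Calling it with `` `git log --pretty=format: --name-only` `` returns pairs of path and commit count, ready to paste into the prompt.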
Customization tips
- If you can't paste real code, include file paths + brief descriptions of what each does. The structure alone tells a story.
Variants
Monolith Autopsy
Specifically for codebases over 200k LOC where accretion dominates design
Migration Readiness
Assesses whether a codebase is actually ready for a rewrite, or whether archaeology reveals that it isn't
Post-Acquisition Scan
For engineering leaders inheriting a codebase via acquisition — what to keep, what to deprecate, what to hire for