The Truth About How to Build a Better AI Knowledge Base with Graphify

Transforming Local Directories into High-Efficiency Knowledge Graphs for LLMs

Most people look at a massive folder of local files and see a chaotic mess. If you’ve ever tried to get an LLM to “understand” a large codebase or a folder full of research papers, you know the frustration: token limits get hit, context gets lost, and the AI starts hallucinating connections that don’t exist. You might have heard the hype about Andrej Karpathy’s post on his /raw folder, where he suggested there’s room for a new kind of tool. Well, the truth is, the gap between a pile of raw files and a structured, usable knowledge base is exactly where most projects go to die. That is why graphify was built.

It isn’t just another file crawler. It turns your local directories into a persistent knowledge graph, one that actually understands the relationships between your files, rather than just treating them as long strings of text.

How Graphify Works Under the Hood

The secret sauce here isn’t throwing everything into a vector database and hoping for the best. Instead, the tool performs a deterministic pass across 19 different programming languages using tree-sitter, a powerful incremental parsing library.

Here is the best part: this initial pass consumes zero tokens and zero API calls. By doing the heavy lifting locally before you even engage an LLM, you are saving money and avoiding unnecessary latency. Once the structure is mapped, the tool uses Claude to process your documentation, papers, and images in parallel.
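Graphify's actual pass runs tree-sitter across 19 languages, but the idea of a deterministic, zero-token structural scan is easy to see with nothing but the Python standard library. The sketch below uses Python's built-in ast module (not tree-sitter, and not graphify's real code) to pull inheritance relationships out of source text with no API calls at all:

```python
import ast

SOURCE = """
class Animal: ...
class Dog(Animal): ...
class Puppy(Dog): ...
"""

def inheritance_edges(source: str) -> list[tuple[str, str]]:
    # Walk the syntax tree and record (child, parent) pairs.
    # Purely local analysis: zero tokens, zero network calls.
    edges = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            for base in node.bases:
                if isinstance(base, ast.Name):
                    edges.append((node.name, base.id))
    return edges

print(inheritance_edges(SOURCE))
# → [('Dog', 'Animal'), ('Puppy', 'Dog')]
```

The same principle scales up: because the parse is deterministic, every edge it emits is a fact about the code, not a model's guess.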

“On a recent project, I tested this on a legacy Unity codebase. We had over 6,000 files, and within minutes, the tool surfaced nearly 4,000 hidden inheritance relationships that weren’t even documented in the primary files.”

Every connection it finds is tagged: is it confirmed, inferred, or uncertain? This distinction is vital. It means you aren’t just getting an “AI opinion”—you are getting a data-backed map of your project.
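A minimal sketch of what such tagged edges could look like in practice (the field names and schema here are illustrative, not graphify's actual data model):

```python
from dataclasses import dataclass
from typing import Literal

Confidence = Literal["confirmed", "inferred", "uncertain"]

@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    relation: str
    confidence: Confidence  # how the connection was established

edges = [
    # Found deterministically by the parser:
    Edge("Dog", "Animal", "inherits", "confirmed"),
    # Suggested by the LLM while reading documentation:
    Edge("docs/api.md", "Dog", "describes", "inferred"),
]

# Downstream consumers can filter to parser-verified facts only.
confirmed = [e for e in edges if e.confidence == "confirmed"]
print(len(confirmed))  # → 1
```

Keeping the confidence level on every edge is what lets you separate "the parser proved this" from "the model thinks this."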

Why You Need a Local Knowledge Graph

If you are tired of watching your token costs skyrocket just because you asked an LLM to “look at this folder,” you aren’t alone. In testing, using a structured graph resulted in 71.5x fewer tokens per query than the standard approach of reading raw files.

Because it persists across sessions and merges automatically via git hooks whenever you commit, your “brain” for that project is always up to date. It works natively with Claude Code, meaning your assistant essentially gains a high-speed, local lookup table before it ever tries to answer a question.
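A commit-triggered refresh of this kind might look like the following hook file (a hypothetical sketch: the hook graphify actually installs, and the exact command it runs, may differ):

```shell
#!/bin/sh
# .git/hooks/post-commit -- refresh the knowledge graph after each commit.
# Assumes a `graphify` binary on PATH that rebuilds/merges the graph
# when invoked from the repository root.
graphify
```

Because the hook fires on every commit, the graph stays in lockstep with the code without any manual rebuild step.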

Common Traps We Fall Into

One of the biggest mistakes developers make is trying to dump everything into a vector store. The problem? Vector stores are great for semantic similarity, but terrible for structural relationships. If you want to know “Which class inherits from X?” or “Who calls this specific function?”, a vector store will often fail.
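To see why, notice that a structural question is just a graph traversal once the edges exist: an exact lookup, not a similarity search. A toy sketch (all names illustrative):

```python
# Toy inheritance graph, child -> parent, as a structural pass would emit it.
inherits = {
    "Dog": "Animal",
    "Cat": "Animal",
    "Puppy": "Dog",
}

def subclasses_of(parent: str) -> list[str]:
    """Answer 'which classes inherit from X?' by exact edge lookup --
    no embeddings, no similarity threshold, no false positives."""
    direct = [child for child, p in inherits.items() if p == parent]
    # Include transitive subclasses as well.
    return direct + [g for d in direct for g in subclasses_of(d)]

print(subclasses_of("Animal"))  # → ['Dog', 'Cat', 'Puppy']
```

A vector store can only tell you which chunks of text *sound* related to "Animal"; the graph tells you exactly which classes *are* related, and how.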

Don’t fall for the “more data is better” trap. You need structured data, not just more raw context.

Getting Started with Graphify

You don’t need a complex setup. Since the graph never leaves your machine, you get the benefit of AI assistance without the privacy nightmare of uploading your entire codebase to the cloud.

  1. Install it via pip: pip install graphify
  2. Run graphify in your project directory to build the graph.
  3. Use graphify claude install to bridge it with your existing workflow.

The project is already gaining massive traction—over 6,000 stars in its first 48 hours—because it solves a problem we all face: the gap between “having” data and “understanding” it.

Frequently Asked Questions

Does the data leave my computer?
No. Graphify is designed with privacy in mind. There is no telemetry, no vendor lock-in, and it is GDPR compliant by design because the graph stays local to your machine.

Can it handle non-code files?
Yes. While it excels at code analysis via tree-sitter, it also processes documentation, research papers, and images.

Does it require a paid API key for the initial scan?
No. The initial deterministic pass is performed locally, meaning you pay zero tokens for the structural mapping phase.

How does it handle updates to my codebase?
It uses git hooks. Every time you run a git commit, the graph is rebuilt or updated, ensuring your AI assistant is never looking at stale info.

Key Takeaways

  • Stop wasting tokens: Use structural mapping to cut context-window usage by a factor of more than 70.
  • Understand, don’t just search: Use deterministic parsing (tree-sitter) to find actual relationships, not just semantic guesses.
  • Keep it local: Maintain privacy and security by keeping your knowledge graphs on your own machine.
  • Automate the maintenance: Use git hooks to ensure your graph evolves alongside your code.

The next thing you should do is clone the GitHub repository and try it on a single directory today. You will be surprised by how much “hidden” information is already sitting in your folders.