We don't sell "RAG". We choose the right architecture.
RAG is a family of strategies, not a buzzword. The key is choosing which one fits your data, your latency budget, and your traceability requirements.
One technique doesn't fit all
When someone says "let's implement RAG," the unresolved question is: which one? There are at least six distinct retrieval strategies, each with radically different trade-offs in latency, traceability, scale, and cost. Forcing vector RAG onto a 20k-token corpus because "it's what everyone does" is like using a cargo bike to deliver an envelope: it works, but it makes no sense.
Our position: choose the architecture the corpus and use case deserve. And explain exactly why.
What powers this site's assistant
Our knowledge base is curated and small, around 28k tokens. At that scale, reading complete curated nodes into the model's context (Wiki-LLM, no embeddings) outperforms top-k retrieval over embedded chunks: it's deterministic, fully traceable, and cites the exact source it used. No vectors, no probabilistic retrieval, no hallucinations from a misretrieved chunk.
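To make the "whole nodes into context" idea concrete, here is a minimal sketch of how such a context could be assembled. It is an illustration, not this site's actual implementation: the `Node` type, the `[source: ...]` tag format, the `estimate_tokens` heuristic (roughly four characters per token), and the 100k budget default are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    node_id: str  # hypothetical identifier, e.g. a markdown file name
    text: str     # the full curated content of the node


def estimate_tokens(text: str) -> int:
    # Rough heuristic (about 4 characters per token); a real tokenizer
    # would give a more accurate count.
    return max(1, len(text) // 4)


def build_context(nodes: list[Node], budget_tokens: int = 100_000) -> str:
    """Concatenate whole nodes in a fixed order, each tagged with its source."""
    # Sorting by id means the same corpus always yields the same prompt.
    ordered = sorted(nodes, key=lambda n: n.node_id)
    total = sum(estimate_tokens(n.text) for n in ordered)
    if total > budget_tokens:
        # No silent truncation: past the budget, a different strategy is needed.
        raise ValueError(f"corpus (~{total} tokens) exceeds budget {budget_tokens}")
    return "\n\n".join(f"[source: {n.node_id}]\n{n.text}" for n in ordered)
```

Because every node arrives whole and labeled, the model can cite the node id it relied on, and there is no retrieval step that could pick the wrong chunk.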
That doesn't mean this is the right architecture for everyone. It means it's the right one for this case.
The toolkit: six strategies and their trade-offs
Wiki-LLM: curated markdown nodes read whole into context. Ideal up to ~100k tokens, maximum traceability.
Context Engineering: knowledge encoded directly in the prompt, zero retrieval latency, for catalogs under 15k tokens.
Vector RAG: semantic similarity over embeddings (pgvector, Pinecone), the classic baseline for medium corpora.
Hybrid RAG: vector + BM25 with RRF fusion. Catches exact terms vector search misses: SKUs, acronyms, product names.
Agentic RAG: the model decides what to search and whether evidence suffices (ReAct loop), for large heterogeneous corpora.
GraphRAG: entity and relationship graph with traversal and community summarization, for complex relational questions.
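The RRF (Reciprocal Rank Fusion) step in Hybrid RAG is small enough to sketch. Each document scores 1 / (k + rank) in every ranked list it appears in, and the scores are summed. The function name, the document ids, and the two sample rankings below are made up for illustration; k = 60 is the commonly used default, not a value taken from this page.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document ids with Reciprocal Rank Fusion."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # A document earns more the higher it ranks in each list.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results: the vector search misses the exact-SKU document,
# while BM25 ranks it first.
vector_hits = ["doc-a", "doc-b", "doc-c"]
bm25_hits = ["doc-sku", "doc-a", "doc-b"]
fused = rrf_fuse([vector_hits, bm25_hits])
```

Documents that both retrievers agree on rise to the top, while a document only one retriever found (here the exact-match `doc-sku`) still makes the fused list instead of being dropped.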
The selection criterion isn't which is trending. It's which is correct for the corpus.
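One way to read the toolkit as a decision rule is sketched below. The token thresholds (15k, 100k) and the strategy names come from the list above; the `CorpusProfile` fields, the order of the checks, and the fallback to Vector RAG are assumptions made for illustration. A real selection would also weigh latency budget, cost, and traceability requirements.

```python
from dataclasses import dataclass


@dataclass
class CorpusProfile:
    # All field names are hypothetical, chosen for this sketch.
    tokens: int                         # approximate corpus size
    relational_questions: bool = False  # questions span entities and relationships
    heterogeneous: bool = False         # many sources and formats
    exact_terms_matter: bool = False    # SKUs, acronyms, product names


def suggest_strategy(p: CorpusProfile) -> str:
    """Return a first-guess strategy; the check order is an assumption."""
    if p.tokens < 15_000:
        return "Context Engineering"
    if p.tokens <= 100_000:
        return "Wiki-LLM"
    if p.relational_questions:
        return "GraphRAG"
    if p.heterogeneous:
        return "Agentic RAG"
    if p.exact_terms_matter:
        return "Hybrid RAG"
    return "Vector RAG"
```

For a curated corpus of about 28k tokens, `suggest_strategy(CorpusProfile(tokens=28_000))` returns "Wiki-LLM", which matches the choice described for this site's assistant.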