RAG · May 14, 2026 · 5 min read

Querying 10,000 internal documents without sending a byte to OpenAI

What a document assistant with answers traceable to their source looks like — and why traceability matters more than the model.

The problem isn’t answering — it’s answering with a source

An assistant that "sounds confident" but fabricates is worse than no assistant. The real value of a document system is that every answer points to the exact source it came from: the user can verify, and compliance can audit. Without traceability it isn’t a business tool — it’s a liability.

The pipeline, with nothing leaving

Documents are chunked, embedded with a local embedding model, and indexed in pgvector or Chroma inside your infrastructure. On each question the relevant chunks are retrieved and a private LLM drafts the answer, citing the chunks it drew from. No document or query ever touches an external API.
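The steps above can be sketched in a few lines of standard-library Python. This is an illustration under loud assumptions, not the production pipeline: `toy_embed` is a term-frequency stand-in for a local embedding model, `InMemoryIndex` stands in for pgvector or Chroma, and the prompt built by `build_prompt` would be handed to a private LLM that is not shown. All names and sample documents are hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_id: str    # which document the text came from
    position: int  # chunk number inside that document
    text: str


def chunk_document(doc_id, text, max_words=40):
    """Split a document into fixed-size word windows, keeping its id."""
    words = text.split()
    return [
        Chunk(doc_id, start // max_words, " ".join(words[start:start + max_words]))
        for start in range(0, len(words), max_words)
    ]


def toy_embed(text):
    """Stand-in for a local embedding model: a sparse term-frequency vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class InMemoryIndex:
    """Stand-in for pgvector or Chroma: stores (chunk, vector) pairs in memory."""

    def __init__(self):
        self.rows = []

    def add(self, chunk):
        self.rows.append((chunk, toy_embed(chunk.text)))

    def search(self, question, k=2):
        query = toy_embed(question)
        ranked = sorted(self.rows, key=lambda row: cosine(query, row[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question, chunks):
    """Label every retrieved chunk with a source id the model is asked to cite."""
    sources = "\n".join(f"[{c.doc_id}#{c.position}] {c.text}" for c in chunks)
    return (
        "Answer using only the sources below and cite them by their bracketed id.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )


# Hypothetical documents, for illustration only.
index = InMemoryIndex()
documents = {
    "hr-policy": "Employees receive 25 vacation days per year.",
    "it-security": "Passwords must be rotated every 90 days.",
}
for doc_id, text in documents.items():
    for chunk in chunk_document(doc_id, text):
        index.add(chunk)

question = "How many vacation days do employees get?"
top_chunks = index.search(question, k=1)
prompt = build_prompt(question, top_chunks)
print(prompt)
```

The design point is that the source id travels with every chunk from ingestion to the prompt, so a citation in the answer can be checked against the exact passage it names.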

Architecture is chosen by the case, not the trend

At personal or team scale, reading whole curated nodes into the model’s context beats vector chunking, because the result is deterministic and traceable. At millions of documents, vector or hybrid (vector + keyword) retrieval is still necessary. The expertise is in choosing the technique that fits the case, not in forcing one technique everywhere.
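One way to make that choice concrete is sketched below, under stated assumptions. `choose_strategy` is a hypothetical rule of thumb (read everything when the curated set fits a token budget, otherwise retrieve), and the budget and sizes are made-up numbers. For the hybrid case, the sketch merges a vector ranking and a keyword ranking with reciprocal rank fusion, a common fusion method chosen here for illustration; the article does not say which fusion method is used.

```python
def choose_strategy(doc_count, avg_tokens_per_doc, context_budget_tokens):
    """Hypothetical rule of thumb: read whole documents when they fit the budget."""
    if doc_count * avg_tokens_per_doc <= context_budget_tokens:
        return "full-context"
    return "hybrid-retrieval"


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into one.

    Each document scores 1 / (k + rank) in every list that contains it;
    the scores are summed and the ids returned best first.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Illustrative numbers only.
small = choose_strategy(doc_count=40, avg_tokens_per_doc=1500, context_budget_tokens=100_000)
large = choose_strategy(doc_count=2_000_000, avg_tokens_per_doc=1500, context_budget_tokens=100_000)

vector_ranking = ["a", "b", "c"]   # ids ordered by embedding similarity
keyword_ranking = ["b", "d", "a"]  # ids ordered by keyword match
fused = reciprocal_rank_fusion([vector_ranking, keyword_ranking])
print(small, large, fused)
```

A document that ranks well in both lists rises to the top of the fused ranking, which is the behavior hybrid retrieval is meant to provide.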

Tell us the problem your data hasn't solved yet.