Querying 10,000 internal documents without sending a byte to OpenAI
What a document assistant looks like when every answer is traceable to its source, and why traceability matters more than the model.
The problem isn’t answering — it’s answering with a source
An assistant that "sounds confident" but fabricates is worse than no assistant. The real value of a document system is that every answer points to the exact source it came from: the user can verify, and compliance can audit. Without traceability it isn’t a business tool — it’s a liability.
The pipeline, with nothing leaving
Documents are chunked, embedded with a local embedding model, and indexed in pgvector or Chroma inside your infrastructure. On each question, the most relevant chunks are retrieved and a private LLM drafts the answer, citing the chunks it drew on. No document or query ever touches an external API.
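As a rough illustration of the flow above, here is a minimal Python sketch using only the standard library. It rests on loud assumptions: `embed` is a toy hashed bag-of-words function standing in for a real local embedding model, `LocalIndex` is an in-memory stand-in for pgvector or Chroma, and the two sample documents are invented for the example. The call to the private LLM is left out because the text does not name one; the sketch stops at the prompt, where every chunk carries a source tag the model can cite.

```python
import hashlib
import math
import re
from dataclasses import dataclass

DIM = 1024  # size of the toy embedding vector (arbitrary choice)

@dataclass
class Chunk:
    doc_id: str  # source document identifier
    start: int   # index of the chunk's first word within the document
    text: str

def chunk_document(doc_id, text, size=40, overlap=10):
    """Split a document into overlapping word windows, keeping each window's origin."""
    words = text.split()
    step = size - overlap  # assumes size > overlap
    chunks = []
    for start in range(0, max(len(words), 1), step):
        window = words[start:start + size]
        if window:
            chunks.append(Chunk(doc_id, start, " ".join(window)))
        if start + size >= len(words):
            break
    return chunks

def embed(text):
    """Toy stand-in for a local embedding model: hashed bag of words, L2-normalised."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

class LocalIndex:
    """In-memory stand-in for a vector store such as pgvector or Chroma."""
    def __init__(self):
        self.entries = []  # list of (vector, Chunk) pairs

    def add(self, chunk):
        self.entries.append((embed(chunk.text), chunk))

    def search(self, query, k=3):
        q = embed(query)
        # Vectors are unit length, so the dot product is the cosine similarity.
        scored = [(sum(a * b for a, b in zip(q, vec)), chunk)
                  for vec, chunk in self.entries]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]

def build_prompt(question, hits):
    """Assemble the prompt for the private LLM, tagging each chunk with its source."""
    sources = "\n".join(f"[{c.doc_id}#{c.start}] {c.text}" for _, c in hits)
    return ("Answer using only the sources below and cite them by tag.\n\n"
            f"{sources}\n\nQuestion: {question}")

# Invented sample documents, for illustration only.
docs = {
    "hr-policy.txt": "Employees accrue twenty days of paid vacation per year.",
    "it-security.txt": "Passwords must be rotated every ninety days.",
}
index = LocalIndex()
for doc_id, text in docs.items():
    for chunk in chunk_document(doc_id, text):
        index.add(chunk)

question = "How many vacation days do employees get?"
hits = index.search(question, k=1)
prompt = build_prompt(question, hits)
print(prompt)
```

The design point is that the source tag (`doc_id` plus position) travels with the chunk from indexing through to the prompt, so a citation in the answer can be checked against the original text.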
Architecture is chosen by the case, not the trend
At personal or team scale, reading whole curated nodes into the model’s context beats vector chunking: the model sees the same complete text every time, so the result is deterministic and traceable. At millions of documents, the corpus no longer fits in context, and vector or hybrid (vector + keyword) retrieval is still necessary. The expertise is in choosing, not in forcing one technique.
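The choice described above can be sketched in a few lines of Python. Everything named here is an assumption made for illustration: `fits_in_context` uses a plain word count against a hypothetical budget, and the hybrid branch merges a vector ranking and a keyword ranking with reciprocal rank fusion, which is one common way to combine them and not necessarily the method the author has in mind. The document IDs are invented.

```python
from collections import defaultdict

def fits_in_context(documents, budget_words):
    """Return True if the whole curated set can be read into the model's context.

    Uses a crude word count; budget_words is a hypothetical limit chosen by the caller.
    """
    return sum(len(text.split()) for text in documents) <= budget_words

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) in every list it appears in;
    ties are broken alphabetically so the output is deterministic.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

# Small curated set: read everything, no retrieval step needed.
small_set = ["Onboarding checklist for new hires.", "Team holiday calendar."]
use_full_context = fits_in_context(small_set, budget_words=1000)

# Large corpus: fuse a vector ranking with a keyword ranking (invented IDs).
vector_hits = ["contract-12", "memo-7", "policy-3"]
keyword_hits = ["policy-3", "contract-12", "faq-1"]
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(use_full_context, fused)
```

Documents that rank well in both lists rise to the top, while a document found by only one retriever still survives in the fused list, which is the practical reason for combining vector and keyword search.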