Technical reference

AI Knowledge Stack: The Reference FAQ

Thirty answers on stack decisions, pricing, scaling, protocols, and how everything fits together.

A Short Preamble

This AI knowledge stack FAQ consolidates recurring technical queries found across the site's deep-dive guides. Each response provides a direct answer to accelerate implementation, while linking to comprehensive articles for detailed architectural analysis.

The content focuses on retrieval-augmented generation (RAG), vector databases, and the transition from static documentation to dynamic knowledge layers. It serves as a quick-reference layer for engineers and product managers auditing their current AI infrastructure.

If a specific technical scenario or tool integration is missing from this list, contact the editor at guerin@novcog.com to request an addition to the documentation.

Appendix · Questions


What is an AI knowledge stack?
An AI knowledge stack is the full technical architecture required to turn raw data into actionable AI intelligence. It comprises ingestion pipelines, vector databases for semantic storage, and a RAG (Retrieval-Augmented Generation) layer that allows LLMs to query your specific business context.
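The ingest-embed-store-retrieve flow can be sketched end to end in a few lines. This is a toy, in-memory version: the fixed `VOCAB` stands in for a real embedding model, and a plain Python list stands in for the vector database.

```python
import math

# Toy stand-ins: a fixed vocabulary instead of a real embedding model,
# and a plain list instead of a vector database.
VOCAB = ["refund", "policy", "shipping", "invoice", "days", "login"]

def embed(text: str) -> list[float]:
    """Map text to a unit-normalized bag-of-words vector over VOCAB."""
    words = text.lower().split()
    vec = [float(words.count(w)) for w in VOCAB]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

class MiniKnowledgeStack:
    def __init__(self) -> None:
        self.store: list[tuple[list[float], str]] = []

    def ingest(self, doc: str) -> None:
        # Ingestion pipeline: embed the document and store the pair.
        self.store.append((embed(doc), doc))

    def retrieve(self, query: str, k: int = 1) -> list[str]:
        # Semantic retrieval: rank stored documents by similarity
        # (vectors are unit-normalized, so a dot product suffices).
        q = embed(query)
        ranked = sorted(self.store,
                        key=lambda pair: sum(a * b for a, b in zip(q, pair[0])),
                        reverse=True)
        return [doc for _, doc in ranked[:k]]
```

A real stack replaces `embed` with an API call, the list with pgvector or a dedicated store, and passes `retrieve(...)` results to an LLM as context.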
What's the difference between an AI knowledge stack and an AI knowledge base?
A knowledge base is the repository of information itself, whereas the stack is the infrastructure that powers it. The stack includes the tools for embedding data, the vector database for retrieval, and the orchestration layer that connects your data to a conversational interface.
Is pgvector production-ready?
Yes, pgvector is widely used in production for teams wanting to keep their embeddings alongside their relational data. It is highly stable and supported by major cloud providers like AWS RDS and Azure, though extremely high-scale workloads may eventually require a dedicated vector store.
What's the best embedding model for an AI knowledge stack?
The 'best' model depends on your trade-off between cost and accuracy. OpenAI's text-embedding-3-small is excellent for general-purpose use at low cost, while Cohere or Voyage AI models often provide superior performance for complex enterprise retrieval and domain-specific nuances.
Do I need Pinecone for my AI knowledge stack?
Not necessarily. While Pinecone is a powerful managed vector database that scales effortlessly, you can achieve similar results using pgvector in PostgreSQL or open-source alternatives like Milvus and Qdrant if you prefer more control over your data residency.
How much does a typical AI knowledge stack cost?
Costs vary wildly based on data volume. A small-scale setup using Supabase (pgvector) and OpenAI API calls can cost under $100/month, while enterprise stacks with millions of vectors and high-throughput RAG pipelines can reach thousands per month in compute and token fees.
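A back-of-envelope calculation helps size the embedding portion of that bill. The prices here are illustrative parameters you pass in, not live rates:

```python
def embedding_cost_usd(num_chunks: int, tokens_per_chunk: int,
                       usd_per_million_tokens: float) -> float:
    """Estimate a one-off embedding cost. Pass in your provider's current
    per-million-token price; the figures used below are illustrative."""
    total_tokens = num_chunks * tokens_per_chunk
    return total_tokens / 1_000_000 * usd_per_million_tokens

# Example: 10,000 chunks of ~500 tokens at a hypothetical $0.02/1M tokens
# is 5M tokens, i.e. about $0.10. Embedding is rarely the dominant cost;
# LLM generation tokens and database compute usually are.
```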
What is Model Context Protocol (MCP)?
MCP is an open standard that allows AI models to seamlessly connect to external data sources and tools. Instead of writing custom integrations for every app, MCP provides a universal way for LLMs to fetch real-time context from your knowledge stack.
Which AI clients support Model Context Protocol (MCP)?
Claude Desktop is currently the flagship MCP client and the primary driver of the protocol's adoption. Other IDEs and agentic frameworks are rapidly adding support, letting users plug in their own data sources without rebuilding the integration layer.
Can I use ChatGPT or Claude with my own knowledge base?
Yes, via RAG (Retrieval-Augmented Generation). You can either use built-in features like OpenAI's 'GPTs' for simple uploads or build a custom middleware that retrieves relevant snippets from your vector DB and feeds them into the prompt window.
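The custom-middleware path boils down to assembling a prompt from retrieved snippets. A minimal sketch follows; the template wording and character budget are arbitrary choices, not a requirement of any provider's API:

```python
def build_rag_prompt(question: str, snippets: list[str],
                     max_context_chars: int = 2000) -> str:
    """Pack retrieved snippets (pre-sorted by relevance) into a prompt,
    stopping once the context budget is exhausted."""
    parts: list[str] = []
    used = 0
    for s in snippets:
        if used + len(s) > max_context_chars:
            break  # drop lower-ranked snippets rather than overflow
        parts.append(s)
        used += len(s)
    context = "\n---\n".join(parts)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

The returned string is what you send as the user (or system) message to ChatGPT or Claude via their APIs.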
How long does it take to build an AI knowledge stack?
A basic prototype using managed tools like LangChain and Pinecone can be built in a few days. A production-grade system with automated ingestion pipelines, evaluation frameworks, and security controls typically takes 2 to 4 months.
Do I need to know Python to build an AI knowledge stack?
While Python is the industry standard for AI orchestration (via LlamaIndex or LangChain), it isn't strictly required. Many developers use TypeScript/Node.js, and no-code platforms are increasingly offering 'drag-and-drop' RAG builders.
What's the difference between pgvector and a dedicated vector DB?
pgvector is an extension that adds vector capabilities to a relational database, allowing you to join metadata with embeddings in one query. Dedicated databases like Pinecone or Weaviate are built from the ground up for high-dimensional search and typically offer faster indexing at massive scales.
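The one-query advantage is easiest to see in SQL. The sketch below builds a hypothetical pgvector statement: `documents` and `embeddings` are made-up table names, and `<=>` is pgvector's cosine-distance operator.

```python
def pgvector_search_sql(team_id: int, k: int = 5) -> str:
    """Build one SQL statement that combines a relational metadata filter
    with a vector ranking. Table and column names are hypothetical."""
    return (
        "SELECT d.title\n"
        "FROM documents d\n"
        "JOIN embeddings e ON e.doc_id = d.id\n"
        f"WHERE d.team_id = {team_id}\n"      # relational filter
        "ORDER BY e.vec <=> %(query_vec)s\n"  # pgvector cosine distance
        f"LIMIT {k};"
    )
```

In production code, bind `team_id` as a query parameter too rather than interpolating it; it is inlined here only to keep the sketch short.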
Should I use Supabase or self-host Postgres for my vector store?
Use Supabase if you want a managed experience with pgvector pre-installed and an instant API layer. Self-hosting is better only if you have strict regulatory requirements regarding data residency or need deep control over the underlying hardware.
How do I handle document updates in an AI knowledge stack?
You must implement a synchronization pipeline that tracks file hashes. When a document changes, the system should trigger a re-chunking process and update only the affected embeddings in the vector database to prevent stale information.
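A sketch of the hash-tracking idea, using SHA-256 over file contents; the dict of previous hashes is a stand-in for wherever you persist sync state:

```python
import hashlib

def content_hash(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def paths_to_reembed(previous_hashes: dict[str, str],
                     current_files: dict[str, bytes]) -> list[str]:
    """Return paths that are new or whose content changed since the last
    sync; only these need re-chunking and re-embedding."""
    changed = []
    for path, data in current_files.items():
        if previous_hashes.get(path) != content_hash(data):
            changed.append(path)
    return changed
```

The reverse check also matters: paths present in `previous_hashes` but missing from `current_files` were deleted, and their vectors should be removed from the store.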
What chunking strategy works best for RAG?
Recursive character splitting is a reliable baseline, but 'semantic chunking', which breaks text at shifts in meaning, usually yields better retrieval. Overlap between chunks (e.g., 10-20%) is essential to ensure context isn't lost at the boundaries.
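A fixed-size chunker with overlap shows the boundary mechanics; real pipelines usually split on separators or semantic shifts rather than raw character offsets:

```python
def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into `size`-character chunks, each starting `overlap`
    characters before the previous one ended, so content near a boundary
    appears in two chunks."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]
```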
What is a typical latency budget for an AI knowledge stack?
For a responsive chat experience, aim for total round-trip latency under 2 seconds. This involves keeping retrieval (vector search) under 100ms and using streaming responses from the LLM to reduce perceived wait time.
Can I shard pgvector for higher performance?
Yes, though it is more complex than sharding standard tables. You can use tools like Citus to distribute pgvector data across multiple nodes, allowing you to scale horizontally as your embedding count grows.
How do I migrate my data from Notion AI to a custom stack?
You'll need to export your Notion pages via API or CSV, clean the Markdown formatting, and run them through an embedding model. Once embedded, you upload these vectors to your new database (e.g., pgvector) to maintain searchability.
How do I migrate from Pinecone to another vector database?
Since embeddings are just arrays of floats, you can export your vectors and metadata as JSON or Parquet files. You then bulk-upload these into your new target DB; however, if you change embedding models during the move, you must re-embed all data.
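A minimal JSON round-trip with a dimension guard catches the most common migration mistake: mixing vectors from two different embedding models in one index. Field names (`id`, `vector`, `metadata`) are arbitrary choices for this sketch.

```python
import json

def export_records(records: list[dict]) -> str:
    """Serialize {id, vector, metadata} records for bulk transfer."""
    return json.dumps(records)

def import_records(payload: str, expected_dim: int) -> list[dict]:
    """Parse exported records, rejecting any vector whose dimension does
    not match the target index. A mismatch usually means the embedding
    model changed, and the data must be re-embedded rather than copied."""
    records = json.loads(payload)
    for r in records:
        if len(r["vector"]) != expected_dim:
            raise ValueError(
                f"record {r['id']}: dim {len(r['vector'])} != {expected_dim}")
    return records
```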
How do I scale an AI knowledge stack past 1 million entries?
At this scale, move from flat indexing to HNSW (Hierarchical Navigable Small World) graphs for faster approximate nearest neighbor search. You should also implement a multi-stage retrieval process: fast coarse filtering followed by a precise re-ranking step.
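The two-stage idea in miniature: the brute-force cosine scan below stands in for an HNSW index, and the re-ranker is any expensive, precise scorer (a cross-encoder, for instance).

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def two_stage_search(query_vec, corpus, coarse_k, rerank, final_k):
    # Stage 1: cheap, approximate candidate generation. Here it is a full
    # cosine scan; at scale an HNSW index returns coarse_k candidates fast.
    coarse = sorted(corpus, key=lambda item: cosine(query_vec, item["vec"]),
                    reverse=True)[:coarse_k]
    # Stage 2: expensive, precise re-ranking over the small shortlist only.
    return sorted(coarse, key=rerank, reverse=True)[:final_k]
```

Note the trade-off this encodes: a document the coarse stage misses can never be recovered by the re-ranker, which is why `coarse_k` is typically set well above `final_k`.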
What typically breaks first when scaling an AI knowledge stack?
The ingestion pipeline usually breaks first, specifically the ability to handle real-time updates without creating duplicates. Following that, retrieval quality often degrades (the 'lost in the middle' problem) as the volume of retrieved context increases.
Is Nomic Embed as good as OpenAI's embedding models?
Nomic is highly competitive and offers a significant advantage for those needing open-source, locally hostable models. While OpenAI may lead in raw benchmark versatility, Nomic provides excellent performance with better transparency and data privacy.
Should I use hybrid search in my AI knowledge stack?
Yes. Hybrid search combines semantic vector search (intent) with traditional keyword search (BM25). This ensures that specific technical terms or unique IDs—which vectors sometimes blur—are found accurately.
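Reciprocal Rank Fusion (RRF) is a common, score-free way to merge the two result lists. It uses only ranks, so BM25 scores and cosine similarities never need to be normalized against each other; `k = 60` is the conventional default from the original RRF paper.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked ID lists (e.g. one from BM25, one from vector search).
    Each document scores sum(1 / (k + rank)) over the lists it appears in."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```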
Is LlamaIndex worth using for a knowledge stack?
Absolutely. LlamaIndex provides the essential 'glue' for RAG, offering advanced data connectors and query engines that would take months to build from scratch. It is currently the gold standard for structuring data for LLMs.
What's the difference between an AI knowledge stack and RAG?
RAG (Retrieval-Augmented Generation) is the specific technique of retrieving data to inform a prompt. The AI knowledge stack is the entire ecosystem—the databases, pipelines, and models—that makes RAG possible.
Can I build an open-source version of Glean?
Yes, by combining tools like Apache Airbyte for ingestion, Qdrant or Milvus for the vector store, and a frontend like Verba or Dify. The challenge is not the search, but building the deep connectors for every enterprise SaaS app.
Do I need an LLM to have an AI knowledge stack?
Technically, you can have a 'knowledge stack' that only performs semantic search (returning documents). However, without an LLM to synthesize those documents into an answer, you have a semantic search engine rather than a generative knowledge system.
What is the security story for self-hosted AI stacks?
Self-hosting provides maximum data sovereignty because your embeddings and raw text never leave your VPC. To secure it, you should implement row-level security (RLS) in your database to ensure users only retrieve documents they are authorized to see.
Can I share a single knowledge stack across multiple users?
Yes, but you must implement multi-tenancy. This is typically done by adding a `user_id` or `org_id` metadata tag to every vector and applying a hard filter during the retrieval phase so users don't see each other's private data.
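A sketch of the hard-filter pattern: filtering on the tenant tag happens before similarity ranking, so cross-tenant leakage is structurally impossible. Field names are illustrative; in pgvector this filter is a WHERE clause, in Pinecone a metadata filter on the query.

```python
def search_for_org(query_vec: list[float], records: list[dict],
                   org_id: str, k: int = 2) -> list[str]:
    # Hard tenant filter FIRST: records from other orgs never enter the
    # candidate set, no matter how similar their vectors are.
    candidates = [r for r in records if r["org_id"] == org_id]
    ranked = sorted(
        candidates,
        key=lambda r: sum(a * b for a, b in zip(query_vec, r["vec"])),
        reverse=True,
    )
    return [r["id"] for r in ranked[:k]]
```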
What is NovCog Brain?
NovCog Brain is an integrated AI knowledge stack designed to automate the ingestion, structuring, and retrieval of enterprise data. It simplifies the complex RAG pipeline into a unified system for creating context-aware AI agents.