AI Knowledge Stack
A reference publication on the stack that makes AI-era knowledge systems actually work: pgvector, Model Context Protocol, Supabase, and the architectural decisions that separate a memory from a mess.
What This Publication Is
Technical Decisioning for AI Operators
AI Knowledge Stack is a technical resource for operators making architectural decisions. It rejects the generic 'top 10 tools' format in favor of precise, data-driven analysis. The focus remains on solving specific engineering hurdles: selecting a vector database for exactly 100k embeddings, calculating Supabase costs at scale, or evaluating how the Model Context Protocol (MCP) resolves the N×M integration problem.
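The N×M problem mentioned above is simple arithmetic: without a shared protocol, every AI client needs its own bespoke connector to every data source, so integrations multiply; with MCP, each side implements the protocol once, so they add. A minimal illustration (the function name is illustrative, not from any library):

```python
def connectors_needed(num_clients: int, num_sources: int, with_mcp: bool) -> int:
    """Integrations to build and maintain.

    Without a shared protocol, each client pairs with each source (N x M).
    With MCP, each client and each source implements the protocol once (N + M).
    """
    if with_mcp:
        return num_clients + num_sources
    return num_clients * num_sources

# 5 AI clients talking to 20 data sources:
print(connectors_needed(5, 20, with_mcp=False))  # 100 bespoke integrations
print(connectors_needed(5, 20, with_mcp=True))   # 25 protocol implementations
```

The gap widens with every client or source added, which is why the bespoke-integration failure mode described later compounds over time.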
Most industry guides from Forrester or Gartner provide high-level buyer fluff that ignores implementation reality. This publication serves as a builder-to-builder alternative, prioritizing latency, token costs, and retrieval accuracy over marketing slide decks.
Content here centers on the AI knowledge stack through the lens of production stability. Examples include comparing pgvector performance against dedicated stores or analyzing the cost-per-query delta between different embedding models. The goal is to provide a blueprint for retrieval-first architectures that support autonomous agents and copilots without unnecessary overhead.
The Stack We Recommend
The Canonical Production Blueprint
For most production use cases, the recommended AI knowledge stack prioritizes modularity and cost-efficiency over proprietary lock-in. The core storage layer utilizes Supabase (PostgreSQL) with the pgvector extension, allowing relational data and vector embeddings to reside in a single database.
The protocol layer leverages the Model Context Protocol (MCP) to standardize how AI agents access external data sources. For embeddings, Nomic Embed provides a high-performance free tier, while OpenAI's text-embedding-3-small remains the benchmark for paid options at $0.02 per 1M tokens. Orchestration is handled via plain Python for simple RAG or LangGraph for complex agentic loops requiring state management.
| Component | Recommended Tool | Estimated Cost (Small Team) |
|---|---|---|
| Storage/Vector | Supabase + pgvector | $25 - $50 / mo |
| Embeddings | OpenAI text-embedding-3-small | Usage based (~$1-5 / mo) |
| Orchestration | Python / LangGraph | $0 (Self-hosted) |
| Total | Lean Stack | <$60 / mo |
This contrasts sharply with enterprise-heavy stacks combining Pinecone, AWS Bedrock, and custom middleware. Those configurations frequently exceed $500 to $2,000 per month for equivalent functionality due to managed service premiums and data transfer fees.
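As a sanity check on the usage-based embeddings line in the table, a back-of-envelope cost estimate for text-embedding-3-small at $0.02 per 1M tokens, assuming the common heuristic of roughly 4 characters per token for English text:

```python
def embedding_cost_usd(num_docs: int, avg_chars_per_doc: int,
                       price_per_million_tokens: float = 0.02) -> float:
    """Rough one-time cost to embed a corpus with text-embedding-3-small.

    Assumes ~4 characters per token, a common heuristic for English text.
    """
    tokens = num_docs * avg_chars_per_doc / 4
    return tokens / 1_000_000 * price_per_million_tokens

# 100k documents averaging 2,000 characters each:
print(f"${embedding_cost_usd(100_000, 2_000):.2f}")  # → $1.00
```

Even re-embedding a 100k-document corpus monthly lands comfortably inside the $1-5 range quoted in the table, which is why embedding spend only becomes a concern at much larger scales.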
What Goes Wrong
Common Failure Modes in AI Architecture
Many teams over-engineer their AI knowledge stack by adopting dedicated vector databases like Pinecone or Weaviate before they hit the scale limits of pgvector. For datasets under several million vectors, a dedicated DB adds unnecessary network latency and operational complexity without providing measurable retrieval gains.
Another frequent error is building bespoke integrations for every AI client. This creates a maintenance nightmare that MCP was specifically designed to solve by decoupling the data source from the LLM interface. A separate but related trap is cost and performance drift at scale: relying solely on OpenAI embeddings leads to cost spikes as datasets grow, and skipping an eviction or pruning strategy once a table exceeds 1M rows degrades query performance and bloats storage costs.
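The pruning strategy mentioned above does not need to be elaborate. A sketch of the selection logic, assuming each row tracks a `last_accessed` timestamp (the column name and cap are hypothetical; in production this would run as a `DELETE ... ORDER BY last_accessed LIMIT n` against Postgres rather than in Python):

```python
from datetime import datetime, timedelta

def rows_to_prune(rows, max_rows):
    """Return ids of the least-recently-accessed rows beyond max_rows.

    `rows` is a list of (id, last_accessed) tuples. Rows are ranked
    newest-first; everything past the cap is selected for deletion.
    """
    if len(rows) <= max_rows:
        return []
    ranked = sorted(rows, key=lambda r: r[1], reverse=True)  # newest first
    return [row_id for row_id, _ in ranked[max_rows:]]

now = datetime(2024, 1, 10)
rows = [(i, now - timedelta(days=i)) for i in range(5)]  # id 0 is the newest
print(rows_to_prune(rows, max_rows=3))  # → [3, 4]
```

Running a pass like this on a schedule keeps the vector table bounded, which is what keeps index scans fast as the corpus churns.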
Frameworks like LlamaIndex, Haystack, and Mem0 are powerful but frequently lead to over-engineering. Developers often wrap simple retrieval logic in layers of abstraction that make debugging difficult.
```sql
-- Over-engineering to avoid: wrapping a simple query in 5+ framework layers.
-- Direct SQL is often enough; pgvector's <=> operator is cosine distance:
SELECT content FROM documents
ORDER BY embedding <=> '[0.12, -0.23, ...]'
LIMIT 5;
```
Over-engineering the retrieval layer is among the most common causes of high latency in production RAG systems.
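For corpora small enough to fit in memory, the "plain Python" path from the stack table is just as short as the SQL: an exact cosine-similarity top-k over a list of embeddings, no framework required. A self-contained sketch (toy two-dimensional vectors stand in for real embeddings):

```python
import math

def top_k(query_vec, docs, k=5):
    """Exact cosine-similarity search over an in-memory corpus.

    `docs` is a list of (content, embedding) pairs; returns the k contents
    closest to the query. At larger scale, a pgvector query replaces this loop.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    ranked = sorted(docs, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    return [content for content, _ in ranked[:k]]

docs = [
    ("pgvector docs", [1.0, 0.0]),
    ("billing FAQ",   [0.0, 1.0]),
    ("mcp spec",      [0.9, 0.1]),
]
print(top_k([1.0, 0.0], docs, k=2))  # → ['pgvector docs', 'mcp spec']
```

When this loop gets slow, the fix is moving the same ranking into Postgres, not adding an orchestration framework on top of it.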
How to Read the Rest
Navigating the Technical Documentation
This site is structured as a dependency graph for building an AI knowledge stack. Start with What is an AI Knowledge Base for foundational definitions, then move to the Build Guide for step-by-step implementation.
For specific architectural decisions, refer to these deep dives:
- Tools: A comprehensive stack comparison.
- vs-pinecone: When to move from pgvector to a dedicated store.
- mcp-architecture: Detailed protocol implementation for agents.
- vs-notion-ai: Evaluating custom stacks against SaaS alternatives.
- FAQ: Edge cases and troubleshooting.
For a live, operator-opinionated reference implementation, visit novcog.dev.