Technical reference

Build an AI Knowledge Stack in an Afternoon

A step-by-step build: Supabase + pgvector + an MCP server + a basic ingestion loop. Buildable in under three hours, runnable indefinitely for roughly ten dollars a month.

What You'll Have When You're Done

The finished system is a working semantic memory layer: a Supabase-backed knowledge store that uses pgvector for embedding storage, connected to an AI agent via the Model Context Protocol (MCP). This lets tools like Claude Desktop or Cursor query your private documentation in real time.

The stack consists of four components: a PostgreSQL database with vector support; a Python ingestion pipeline for Markdown files, PDFs, and chat logs; an MCP server acting as the bridge; and an LLM interface. Implementation typically takes under three hours of active development.

Operational costs remain low. With Supabase's free tier or basic plan and OpenAI's text-embedding-3-small model, total monthly expenditure ranges between $8 and $12, depending on the volume of documents processed. This provides a scalable foundation to build an AI knowledge stack without significant infrastructure overhead.

Step 1: Supabase and Schema

Initialization begins with a Supabase project. The core requirement is the pgvector extension, which transforms PostgreSQL into a vector database capable of performing cosine similarity searches on embeddings.

The schema must support both the raw content for LLM context and the vector representation for retrieval. A UUID primary key ensures compatibility across distributed systems, while JSONB allows flexible metadata filtering (e.g., by date or author) without altering the table structure.

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE knowledge_entries (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    content TEXT NOT NULL,
    embedding VECTOR(1536), -- Dimension of OpenAI text-embedding-3-small
    source_uri TEXT NOT NULL,
    source_type TEXT,
    metadata JSONB,
    created_at TIMESTAMPTZ DEFAULT now()
);

-- IVFFlat index for faster retrieval on medium datasets
CREATE INDEX ON knowledge_entries 
USING ivfflat (embedding vector_cosine_ops) 
WITH (lists = 100);

IVFFlat is chosen for the initial setup because it builds quickly. Note that IVFFlat picks its list centroids from the data present at build time, so rebuild the index after any large bulk ingestion. For datasets in the hundreds of thousands of rows, transitioning to an HNSW index is recommended to maintain recall during semantic search.
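Once rows exist, a nearest-neighbor query uses pgvector's cosine-distance operator `<=>` directly. In this sketch the bracketed vector is a placeholder for a full 1536-dimension embedding:

```sql
-- Five nearest chunks; <=> returns cosine distance, so lower is closer
SELECT content,
       1 - (embedding <=> '[0.01, 0.02, ...]'::vector) AS similarity
FROM knowledge_entries
ORDER BY embedding <=> '[0.01, 0.02, ...]'::vector
LIMIT 5;
```

The `ORDER BY embedding <=> ...` form is what allows the IVFFlat index to be used; wrapping the operator in an expression in the ORDER BY clause can force a sequential scan.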

Step 2: Ingestion Pipeline

The ingestion pipeline converts unstructured data into searchable vectors. The process involves reading source files, splitting text into manageable chunks to avoid LLM context window overflow, and generating embeddings via an API provider like OpenAI or a local Nomic instance.

To prevent duplicate entries, the script skips any chunk whose content hash already exists for the same source_uri. For production use, replacing the basic paragraph splitting with LangChain's RecursiveCharacterTextSplitter is advised to maintain semantic coherence across chunks.

import hashlib
import os

from openai import OpenAI
from supabase import create_client

client = OpenAI()  # reads OPENAI_API_KEY from the environment
supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])

def get_embedding(text):
    response = client.embeddings.create(input=text, model="text-embedding-3-small")
    return response.data[0].embedding

def process_file(path, source_type="markdown"):
    with open(path, "r") as f:
        content = f.read()

    # Simple paragraph splitting
    chunks = [c.strip() for c in content.split("\n\n") if c.strip()]

    for chunk in chunks:
        content_hash = hashlib.sha256(chunk.encode()).hexdigest()

        # Skip chunks already stored for this source (dedupe by content hash),
        # so re-runs don't pay for embeddings or insert duplicates
        existing = (supabase.table("knowledge_entries")
                    .select("id")
                    .eq("source_uri", path)
                    .eq("metadata->>hash", content_hash)
                    .execute())
        if existing.data:
            continue

        supabase.table("knowledge_entries").insert({
            "content": chunk,
            "embedding": get_embedding(chunk),
            "source_uri": path,
            "source_type": source_type,
            "metadata": {"hash": content_hash},
        }).execute()

process_file("./docs/architecture.md")

With this pipeline, keeping the vector store synchronized with the source documentation is a matter of dropping files into a directory and re-running the script.
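If you want chunk overlap without pulling in LangChain, a minimal sliding-window splitter is a few lines of Python. This is character-based for simplicity, and the 512/50 figures are illustrative defaults, not tuned values:

```python
def split_with_overlap(text: str, chunk_size: int = 512, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows, each sharing
    `overlap` characters with its predecessor."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    step = chunk_size - overlap
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), step)]
    # Drop a trailing fragment that is fully contained in the previous chunk
    if len(chunks) > 1 and len(chunks[-1]) <= overlap:
        chunks.pop()
    return chunks
```

Swapping this for the paragraph split in process_file keeps boundary sentences present in two neighboring chunks, which is what preserves context at chunk edges.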

Step 3: MCP Server

The Model Context Protocol (MCP) server acts as the interface between the LLM and the database. It exposes specific tools that the AI can call to retrieve relevant context or update its own knowledge base without manual SQL intervention.

Below is a conceptual implementation of an MCP server using the Python SDK, exposing two primary tools: search_knowledge for RAG retrieval and add_knowledge for real-time updates.

import os

from mcp.server.fastmcp import FastMCP
from openai import OpenAI
from supabase import create_client

mcp = FastMCP("KnowledgeStack")
client = OpenAI()  # reads OPENAI_API_KEY from the environment
supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])

def embed(text: str) -> list[float]:
    return client.embeddings.create(input=text, model="text-embedding-3-small").data[0].embedding

@mcp.tool()
async def search_knowledge(query: str) -> str:
    """Search the knowledge base for relevant technical context."""
    # Call a Supabase RPC that performs the cosine similarity search
    res = supabase.rpc("match_documents", {
        "query_embedding": embed(query),
        "match_threshold": 0.5,
    }).execute()
    return "\n---\n".join(item["content"] for item in res.data)

@mcp.tool()
async def add_knowledge(text: str, source: str) -> str:
    """Add a new piece of information to the knowledge base."""
    supabase.table("knowledge_entries").insert({
        "content": text,
        "embedding": embed(text),
        "source_uri": source,
    }).execute()
    return "Knowledge stored successfully."

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default
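One caveat: match_documents is not built into Supabase; it is a Postgres function you create once. A minimal version consistent with the Step 1 schema, returning similarity as 1 minus cosine distance, might look like:

```sql
CREATE OR REPLACE FUNCTION match_documents(
    query_embedding VECTOR(1536),
    match_threshold FLOAT,
    match_count INT DEFAULT 5
)
RETURNS TABLE (id UUID, content TEXT, similarity FLOAT)
LANGUAGE sql STABLE AS $$
    SELECT knowledge_entries.id,
           knowledge_entries.content,
           1 - (knowledge_entries.embedding <=> query_embedding) AS similarity
    FROM knowledge_entries
    WHERE 1 - (knowledge_entries.embedding <=> query_embedding) > match_threshold
    ORDER BY knowledge_entries.embedding <=> query_embedding
    LIMIT match_count;
$$;
```

The column references are fully qualified to avoid ambiguity with the output column names declared in RETURNS TABLE.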

To integrate this with Claude Desktop, add the following to claude_desktop_config.json. The env block passes credentials to the server process; the values shown are placeholders:

{
  "mcpServers": {
    "knowledge-stack": {
      "command": "python",
      "args": ["/path/to/mcp_server.py"],
      "env": {
        "OPENAI_API_KEY": "sk-...",
        "SUPABASE_URL": "https://your-project.supabase.co",
        "SUPABASE_KEY": "your-service-key"
      }
    }
  }
}

Step 4: Operate It

Ongoing operations focus on retrieval quality and index maintenance. As the dataset grows, monitor the match threshold in similarity searches: too loose, and irrelevant chunks enter the prompt and push the model toward confidently wrong answers; too strict, and legitimate context gets filtered out.

Dataset Size       Recommended Index      Primary Benefit
< 10k rows         None (exact search)    Perfect recall
10k–500k rows      IVFFlat                Fast build time
> 500k rows        HNSW                   High query speed and recall

When migrating to HNSW, be aware that index build times are longer and memory consumption is higher. For those seeking a maintained, opinionated version of this architecture rather than a manual build, the NovCog Brain at novcog.dev provides a streamlined implementation of these patterns.
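The migration itself is one statement. The parameters below are pgvector's defaults, shown explicitly, and the index name is illustrative; tune m and ef_construction against your own recall measurements:

```sql
-- Build the HNSW index, then drop the old IVFFlat index once it is ready
CREATE INDEX knowledge_entries_embedding_hnsw
    ON knowledge_entries
    USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);
```

Building the new index before dropping the old one keeps queries fast throughout the migration.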

Appendix · Questions


How long does it take to build an AI knowledge stack?
A basic prototype using Supabase and pgvector can be deployed in a few hours. However, building a production-ready stack with optimized chunking strategies, HNSW indexing, and refined RAG pipelines typically takes two to four weeks of development and testing.
Do I need to know Python to build an AI knowledge stack?
While not strictly required, Python is the industry standard for handling embeddings via libraries like LangChain or LlamaIndex. You can use JavaScript/TypeScript with the Supabase client, but Python offers superior tooling for data preprocessing and integration with models from OpenAI or Hugging Face.
What's the cheapest way to host an AI knowledge base?
The most cost-effective approach is combining a local embedding model via Ollama with Supabase's free tier for vector storage. This eliminates per-token costs for embeddings and keeps your database overhead low until you need to scale to paid tiers.
How do I ingest PDFs into pgvector?
You must first extract text using a library like PyPDF2 or Unstructured, then split that text into smaller chunks. These chunks are passed through an embedding model (e.g., OpenAI's text-embedding-3-small) and inserted into the pgvector column of your Supabase table.
What chunking strategy works best for AI knowledge bases?
Recursive character splitting with a slight overlap (e.g., 512 tokens with a 50-token overlap) is generally most effective. This ensures that semantic context isn't lost at the boundaries of chunks, which improves retrieval accuracy during cosine similarity searches.
How do I handle updates and re-ingestion in an AI knowledge stack?
Implement a hashing system where you store an MD5 or SHA-256 hash of the original content in your metadata. When re-scanning documents, only regenerate embeddings for files whose hashes have changed, to avoid unnecessary API costs and database churn.
Can I use Ollama for embeddings locally?
Yes, you can run embedding models like Qwen or mxbai-embed-large via Ollama. This allows you to generate vectors on your own hardware and push them to a remote pgvector instance, ensuring your raw data never leaves your local environment during the embedding phase.
What MCP clients can I connect to my knowledge base?
The Model Context Protocol (MCP) allows you to connect tools like Claude Desktop and Cursor directly to your Supabase backend. This enables these AI editors to query your schema, manage migrations, and debug your knowledge stack using natural language.
How do I scale an AI knowledge base past 1 million entries?
Switch from IVFFlat to HNSW (Hierarchical Navigable Small World) indexes in pgvector for faster, high-recall searches at scale. You should also implement partitioning on your Postgres tables and consider a dedicated vector database if latency exceeds acceptable thresholds.
Should I use hybrid search or pure semantic search?
Hybrid search is superior because it combines semantic vectors with traditional BM25 keyword matching. This prevents 'hallucinated' retrievals by ensuring that specific technical terms or unique IDs are found exactly, while the vector search handles broader conceptual queries.
How do I handle access control on my AI knowledge store?
Use Supabase Row Level Security (RLS) to define policies at the database level. By adding a `user_id` or `org_id` column to your knowledge table, you can ensure that the vector search only returns embeddings that the authenticated user is authorized to see.
What typically goes wrong first when scaling an AI knowledge stack?
Retrieval quality usually degrades first due to 'noise' in the top-k results. As your dataset grows, simple cosine similarity often returns irrelevant chunks; this is typically solved by adding a re-ranking step using a Cross-Encoder model after the initial retrieval.