Vector Database
Learn what Vector Database means in AI and machine learning, with examples and related concepts.
Definition
A vector database is a specialized database designed to store, index, and search high-dimensional vectors — the numerical representations (called embeddings) that AI models use to understand meaning.
Traditional databases search by exact matches: “find all users where name = ‘John’”. Vector databases search by similarity: “find documents most similar in meaning to this query.” This is what powers semantic search, recommendation systems, and RAG pipelines.
When you ask Perplexity AI a question, it converts your query to a vector, searches a vector database for semantically similar documents, retrieves the most relevant ones, and feeds them to an LLM to generate an answer. The vector database is the engine that makes this fast — searching millions of documents in milliseconds.
How It Works
1. EMBED — Convert text to vectors using an embedding model
"How to train a neural network" → [0.23, -0.87, 0.45, ..., 0.12]
(typically 768 or 1536 dimensions)
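The embedding step above can be sketched without any external service. Real embeddings come from trained models (sentence-transformers, OpenAI's embedding API, etc.); the `toy_embed` function below is a purely illustrative stand-in that hashes words into buckets, but it shows the interface every embedding model shares: text in, fixed-length normalized vector out.

```python
import hashlib
import math

def toy_embed(text: str, dims: int = 16) -> list[float]:
    """Toy stand-in for a real embedding model: hash each word into one
    of `dims` buckets, count hits, then L2-normalize. Real models learn
    dense vectors that capture meaning; this only captures word overlap,
    but the input/output shape is the same."""
    vec = [0.0] * dims
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % dims
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

v = toy_embed("How to train a neural network")
print(len(v))  # → 16
```

A production pipeline swaps `toy_embed` for a model call; everything downstream (storage, search) stays the same.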
2. STORE — Save vectors in the database with metadata
{
  vector: [0.23, -0.87, 0.45, ...],
  metadata: { source: "tutorial.md", section: "chapter 3" },
  text: "Neural network training involves..."
}
3. SEARCH — Find similar vectors using distance metrics
Query: "deep learning training process"
→ [0.25, -0.82, 0.41, ...] (similar vector!)
→ Returns top-k nearest neighbors
Distance metrics:
• Cosine similarity — angle between vectors (most common for text)
• Euclidean distance — straight-line distance
• Dot product — equivalent to cosine similarity for normalized vectors, and cheaper to compute
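All three metrics are a few lines of arithmetic. This sketch computes them for the two example vectors from the SEARCH step (truncated here to three dimensions just for illustration):

```python
import math

def cosine_similarity(a, b):
    # Angle-based: 1.0 means same direction, 0 means orthogonal
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def euclidean_distance(a, b):
    # Straight-line distance: smaller means more similar
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dot_product(a, b):
    # Matches cosine ranking when vectors are L2-normalized
    return sum(x * y for x, y in zip(a, b))

query = [0.25, -0.82, 0.41]
doc   = [0.23, -0.87, 0.45]
print(round(cosine_similarity(query, doc), 4))   # → 0.9993
print(round(euclidean_distance(query, doc), 4))  # → 0.0671
print(round(dot_product(query, doc), 4))         # → 0.9554
```

The near-1.0 cosine score is why the database would return this document as a close match.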
Approximate Nearest Neighbor (ANN)
Brute-force comparison against millions of vectors would be too slow. Vector databases use ANN algorithms to find approximate matches much faster:
Brute force: Compare query against ALL 10M vectors → 100% accurate, very slow
ANN (HNSW): Navigate a graph structure → ~99% accurate, 1000x faster
Common ANN algorithms:
HNSW (Hierarchical Navigable Small World) — best general-purpose
IVF (Inverted File Index) — good for very large datasets
PQ (Product Quantization) — compressed vectors for memory efficiency
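To make the trade-off concrete, here is the exact brute-force baseline that ANN indexes approximate: score the query against every stored vector, then take the top-k. The function name and three-document store are illustrative, not from any particular library:

```python
import math

def brute_force_top_k(query, vectors, k=2):
    """Exact nearest-neighbor search: O(n) comparisons against every
    stored vector. ANN indexes like HNSW return nearly the same top-k
    while visiting only a small fraction of the vectors."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb)
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in vectors.items()]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:k]

store = {
    "doc1": [0.9, 0.1, 0.0],
    "doc2": [0.1, 0.9, 0.2],
    "doc3": [0.85, 0.15, 0.05],
}
print(brute_force_top_k([1.0, 0.0, 0.0], store))
# doc1 and doc3 rank highest: they point in nearly the same direction as the query
```

At a few thousand vectors this loop is fine; at millions, the linear scan is the bottleneck that HNSW, IVF, and PQ exist to remove.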
Why It Matters
- RAG — Vector databases are the retrieval layer in RAG pipelines, the standard architecture for grounding LLMs
- Semantic search — Search by meaning, not just keywords. “dog food” matches “canine nutrition” even without shared words.
- Scale — Search millions of documents in <50ms, enabling real-time AI applications
- Recommendations — “Users who liked this also liked…” is a nearest-neighbor search in embedding space
Major Vector Databases (as of 2026)
| Database | Type | Best For | Pricing |
|---|---|---|---|
| Pinecone | Managed cloud | Production RAG, zero ops | Pay per usage |
| Weaviate | Open source / cloud | Hybrid search (vector + keyword) | Free / managed |
| ChromaDB | Open source (embedded) | Prototyping, small projects | Free |
| Qdrant | Open source / cloud | Performance-critical apps | Free / managed |
| pgvector | Postgres extension | Existing Postgres users | Free (extension) |
| Milvus | Open source | Large-scale (billions of vectors) | Free / managed |
Example
```python
# ChromaDB — simplest way to start (no server needed)
import chromadb

# Create a client and collection
client = chromadb.Client()
collection = client.create_collection(
    name="knowledge_base",
    metadata={"hnsw:space": "cosine"}  # cosine similarity
)

# Add documents — ChromaDB auto-generates embeddings
collection.add(
    documents=[
        "Python is a high-level programming language known for its simplicity.",
        "JavaScript runs in web browsers and is essential for frontend development.",
        "Rust provides memory safety without garbage collection.",
        "Go was designed at Google for building scalable server software.",
        "TypeScript adds static typing to JavaScript.",
    ],
    ids=["doc1", "doc2", "doc3", "doc4", "doc5"],
    metadatas=[
        {"language": "python"},
        {"language": "javascript"},
        {"language": "rust"},
        {"language": "go"},
        {"language": "typescript"},
    ],
)

# Semantic search
results = collection.query(
    query_texts=["What language is good for beginners?"],
    n_results=3
)
print(results["documents"][0])
# → ['Python is a high-level programming language known for its simplicity.',
#    'Go was designed at Google for building scalable server software.',
#    'JavaScript runs in web browsers...']
# Python ranks first because "simplicity" is semantically closest to "beginners"
```
```python
# Full RAG pipeline: Vector DB + Claude
import chromadb
from anthropic import Anthropic

# Set up vector store with your documents
chroma = chromadb.Client()
docs = chroma.create_collection("product_docs")

# Index your documentation
docs.add(
    documents=[
        "The Pro plan costs $29/month and includes 50GB storage and priority support.",
        "To reset your password, go to Settings > Security > Change Password.",
        "Our API rate limit is 1000 requests per minute on the Pro plan.",
        "Billing is processed on the 1st of each month. You can cancel anytime.",
    ],
    ids=["pricing", "password", "api-limits", "billing"],
)

# Answer questions using retrieved context
anthropic = Anthropic()

def answer(question: str) -> str:
    # Step 1: Retrieve relevant documents
    results = docs.query(query_texts=[question], n_results=2)
    context = "\n".join(results["documents"][0])

    # Step 2: Generate grounded answer
    response = anthropic.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=300,
        system="Answer based only on the provided context. If the answer isn't in the context, say so.",
        messages=[{
            "role": "user",
            "content": f"Context:\n{context}\n\nQuestion: {question}"
        }]
    )
    return response.content[0].text

print(answer("How much does the Pro plan cost?"))
# → "The Pro plan costs $29/month and includes 50GB storage and priority support."
print(answer("What's the API rate limit?"))
# → "The API rate limit is 1000 requests per minute on the Pro plan."
```
Key Takeaways
- Vector databases store and search high-dimensional embeddings — enabling search by meaning, not just keywords
- They’re the core infrastructure for RAG pipelines, semantic search, and recommendation systems
- ANN algorithms (like HNSW) make similarity search across millions of vectors possible in milliseconds
- ChromaDB is great for prototyping; Pinecone, Weaviate, and Qdrant are production-grade options
- If you’re building any AI application that needs to search or retrieve information, you likely need a vector database
Part of the DeepRaft Glossary — AI and ML terms explained for developers.