Vector Database
Learn what Vector Database means in AI and machine learning, with examples and related concepts.
Definition
A vector database is a specialized database designed to store, index, and search high-dimensional vectors — the numerical representations (called embeddings) that AI models use to understand meaning.
Traditional databases search by exact matches: “find all users where name = ‘John’”. Vector databases search by similarity: “find documents most similar in meaning to this query.” This is what powers semantic search, recommendation systems, and RAG pipelines.
When you ask Perplexity AI a question, it converts your query to a vector, searches a vector database for semantically similar documents, retrieves the most relevant ones, and feeds them to an LLM to generate an answer. The vector database is the engine that makes this fast — searching millions of documents in milliseconds.
How It Works
1. EMBED — Convert text to vectors using an embedding model
"How to train a neural network" → [0.23, -0.87, 0.45, ..., 0.12]
(typically 768 or 1536 dimensions)
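The embedding step above can be sketched without any external service. Real embeddings come from trained models (sentence-transformers, OpenAI's embedding API, etc.); the `toy_embed` function below is a purely illustrative stand-in that hashes words into buckets, but it shows the interface every embedding model shares: text in, fixed-length normalized vector out.

```python
import hashlib
import math

def toy_embed(text: str, dims: int = 16) -> list[float]:
    """Toy stand-in for a real embedding model: hash each word into one
    of `dims` buckets, count hits, then L2-normalize. Real models learn
    dense vectors that capture meaning; this only captures word overlap,
    but the input/output shape is the same."""
    vec = [0.0] * dims
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % dims
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

v = toy_embed("How to train a neural network")
print(len(v))  # → 16
```

A production pipeline swaps `toy_embed` for a model call; everything downstream (storage, search) stays the same.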
2. STORE — Save vectors in the database with metadata
{
  vector: [0.23, -0.87, 0.45, ...],
  metadata: { source: "tutorial.md", section: "chapter 3" },
  text: "Neural network training involves..."
}
3. SEARCH — Find similar vectors using distance metrics
Query: "deep learning training process"
→ [0.25, -0.82, 0.41, ...] (similar vector!)
→ Returns top-k nearest neighbors
Distance metrics:
• Cosine similarity — angle between vectors (most common for text)
• Euclidean distance — straight-line distance
• Dot product — equivalent to cosine similarity for normalized vectors, and cheaper to compute
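All three metrics are a few lines of arithmetic. This sketch computes them for the two example vectors from the SEARCH step (truncated here to three dimensions just for illustration):

```python
import math

def cosine_similarity(a, b):
    # Angle-based: 1.0 means same direction, 0 means orthogonal
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def euclidean_distance(a, b):
    # Straight-line distance: smaller means more similar
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dot_product(a, b):
    # Matches cosine ranking when vectors are L2-normalized
    return sum(x * y for x, y in zip(a, b))

query = [0.25, -0.82, 0.41]
doc   = [0.23, -0.87, 0.45]
print(round(cosine_similarity(query, doc), 4))   # → 0.9993
print(round(euclidean_distance(query, doc), 4))  # → 0.0671
print(round(dot_product(query, doc), 4))         # → 0.9554
```

The near-1.0 cosine score is why the database would return this document as a close match.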
Approximate Nearest Neighbor (ANN)
Brute-force comparison against millions of vectors would be too slow. Vector databases use ANN algorithms to find approximate matches much faster:
Brute force: Compare query against ALL 10M vectors → 100% accurate, very slow
ANN (HNSW): Navigate a graph structure → ~99% accurate, 1000x faster
Common ANN algorithms:
HNSW (Hierarchical Navigable Small World) — best general-purpose
IVF (Inverted File Index) — good for very large datasets
PQ (Product Quantization) — compressed vectors for memory efficiency
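To make the trade-off concrete, here is the exact brute-force baseline that ANN indexes approximate: score the query against every stored vector, then take the top-k. The function name and three-document store are illustrative, not from any particular library:

```python
import math

def brute_force_top_k(query, vectors, k=2):
    """Exact nearest-neighbor search: O(n) comparisons against every
    stored vector. ANN indexes like HNSW return nearly the same top-k
    while visiting only a small fraction of the vectors."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb)
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in vectors.items()]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:k]

store = {
    "doc1": [0.9, 0.1, 0.0],
    "doc2": [0.1, 0.9, 0.2],
    "doc3": [0.85, 0.15, 0.05],
}
print(brute_force_top_k([1.0, 0.0, 0.0], store))
# doc1 and doc3 rank highest: they point in nearly the same direction as the query
```

At a few thousand vectors this loop is fine; at millions, the linear scan is the bottleneck that HNSW, IVF, and PQ exist to remove.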
Why It Matters
- RAG — Vector databases are the retrieval layer in RAG pipelines, the standard architecture for grounding LLMs
- Semantic search — Search by meaning, not just keywords. “dog food” matches “canine nutrition” even without shared words.
- Scale — Search millions of documents in <50ms, enabling real-time AI applications
- Recommendations — “Users who liked this also liked…” is a nearest-neighbor search in embedding space
Major Vector Databases (as of 2026)
| Database | Type | Best For | Pricing |
|---|---|---|---|
| Pinecone | Managed cloud | Production RAG, zero ops | Pay per usage |
| Weaviate | Open source / cloud | Hybrid search (vector + keyword) | Free / managed |
| ChromaDB | Open source (embedded) | Prototyping, small projects | Free |
| Qdrant | Open source / cloud | Performance-critical apps | Free / managed |
| pgvector | Postgres extension | Existing Postgres users | Free (extension) |
| Milvus | Open source | Large-scale (billions of vectors) | Free / managed |
Example
```python
# ChromaDB — simplest way to start (no server needed)
import chromadb

# Create a client and collection
client = chromadb.Client()
collection = client.create_collection(
    name="knowledge_base",
    metadata={"hnsw:space": "cosine"}  # cosine similarity
)

# Add documents — ChromaDB auto-generates embeddings
collection.add(
    documents=[
        "Python is a high-level programming language known for its simplicity.",
        "JavaScript runs in web browsers and is essential for frontend development.",
        "Rust provides memory safety without garbage collection.",
        "Go was designed at Google for building scalable server software.",
        "TypeScript adds static typing to JavaScript.",
    ],
    ids=["doc1", "doc2", "doc3", "doc4", "doc5"],
    metadatas=[
        {"language": "python"},
        {"language": "javascript"},
        {"language": "rust"},
        {"language": "go"},
        {"language": "typescript"},
    ],
)

# Semantic search
results = collection.query(
    query_texts=["What language is good for beginners?"],
    n_results=3
)
print(results["documents"][0])
# → ['Python is a high-level programming language known for its simplicity.',
#    'Go was designed at Google for building scalable server software.',
#    'JavaScript runs in web browsers...']
# Python ranks first because "simplicity" is semantically closest to "beginners"
```
```python
# Full RAG pipeline: Vector DB + Claude
import chromadb
from anthropic import Anthropic

# Set up vector store with your documents
chroma = chromadb.Client()
docs = chroma.create_collection("product_docs")

# Index your documentation
docs.add(
    documents=[
        "The Pro plan costs $29/month and includes 50GB storage and priority support.",
        "To reset your password, go to Settings > Security > Change Password.",
        "Our API rate limit is 1000 requests per minute on the Pro plan.",
        "Billing is processed on the 1st of each month. You can cancel anytime.",
    ],
    ids=["pricing", "password", "api-limits", "billing"],
)

# Answer questions using retrieved context
anthropic = Anthropic()

def answer(question: str) -> str:
    # Step 1: Retrieve relevant documents
    results = docs.query(query_texts=[question], n_results=2)
    context = "\n".join(results["documents"][0])

    # Step 2: Generate grounded answer
    response = anthropic.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=300,
        system="Answer based only on the provided context. If the answer isn't in the context, say so.",
        messages=[{
            "role": "user",
            "content": f"Context:\n{context}\n\nQuestion: {question}"
        }]
    )
    return response.content[0].text

print(answer("How much does the Pro plan cost?"))
# → "The Pro plan costs $29/month and includes 50GB storage and priority support."
print(answer("What's the API rate limit?"))
# → "The API rate limit is 1000 requests per minute on the Pro plan."
```
Key Takeaways
- Vector databases store and search high-dimensional embeddings — enabling search by meaning, not just keywords
- They’re the core infrastructure for RAG pipelines, semantic search, and recommendation systems
- ANN algorithms (like HNSW) make similarity search across millions of vectors possible in milliseconds
- ChromaDB is great for prototyping; Pinecone, Weaviate, and Qdrant are production-grade options
- If you’re building any AI application that needs to search or retrieve information, you likely need a vector database
Part of the DeepRaft Glossary — AI and ML terms explained for developers.