Embeddings

Learn what Embeddings (Vector Embeddings) means in AI and machine learning, with examples and related concepts.

Definition

Embeddings (or vector embeddings) are numerical representations of text, images, or other data as arrays of numbers (vectors) that capture semantic meaning.

The key insight: similar concepts end up near each other in vector space. “dog” and “puppy” have very similar embeddings, while “dog” and “airplane” are far apart. This makes it possible to search by meaning, not just keywords.

Embeddings are the foundation of RAG systems, semantic search, recommendation engines, and clustering.
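To make "search by meaning" concrete, here is a minimal sketch of semantic search over a tiny collection. The 3-dimensional vectors are hand-written stand-ins (an assumption for illustration, not output from any real model), and `search` is a hypothetical helper name. A real system would get its vectors from an embedding model and would usually store them in a vector database.

```python
import numpy as np

# Hypothetical 3-dimensional embeddings, hand-written for illustration only.
# Real models produce hundreds or thousands of dimensions.
documents = {
    "dog": np.array([0.90, 0.10, 0.00]),
    "puppy": np.array([0.85, 0.20, 0.05]),
    "airplane": np.array([0.00, 0.10, 0.95]),
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def search(query_vector, docs, top_k=2):
    """Return the top_k (label, score) pairs, most similar first."""
    scored = [(label, cosine_similarity(query_vector, vec))
              for label, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Hypothetical query vector standing in for the embedding of "young dog".
query = np.array([0.88, 0.15, 0.02])
results = search(query, documents)
for label, score in results:
    print(f"{label}: {score:.3f}")
```

With these toy vectors the two dog-related entries rank above "airplane", even though the query shares no keywords with them.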

How It Works

Text: "How to train a neural network"
        ↓ Embedding Model
Vector: [0.023, -0.451, 0.887, ..., 0.012]  (1536 dimensions)

Text: "Deep learning model training guide"
        ↓ Embedding Model
Vector: [0.019, -0.448, 0.891, ..., 0.009]  (similar! → close in space)

Text: "Best pizza recipes"
        ↓ Embedding Model
Vector: [0.756, 0.234, -0.112, ..., 0.445]  (different → far apart)

The distance between vectors tells you how semantically similar two pieces of text are. Common measures: cosine similarity and dot product (higher means more similar), and Euclidean distance (lower means more similar).
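The three measures can be compared side by side on small vectors. This is a sketch using NumPy with made-up 2-dimensional vectors chosen so the results are easy to check by hand. The function names are illustrative, not part of any library.

```python
import numpy as np

a = np.array([1.0, 0.0])
b = np.array([2.0, 0.0])  # same direction as a, twice the length
c = np.array([0.0, 1.0])  # perpendicular to a

def cosine_similarity(u, v):
    """Compares direction only; vector length is ignored."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def dot_product(u, v):
    """Grows with both alignment and vector length."""
    return float(np.dot(u, v))

def euclidean_distance(u, v):
    """Straight-line distance; 0.0 means the vectors are identical."""
    return float(np.linalg.norm(u - v))

print(cosine_similarity(a, b))            # 1.0 (same direction)
print(cosine_similarity(a, c))            # 0.0 (perpendicular)
print(dot_product(a, b))                  # 2.0
print(euclidean_distance(a, b))           # 1.0
print(f"{euclidean_distance(a, c):.3f}")  # 1.414
```

Note that `a` and `b` have a cosine similarity of 1.0 but a Euclidean distance of 1.0, because cosine similarity ignores length. When vectors are normalized to unit length, the dot product equals the cosine similarity.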

Why It Matters

Example

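from openai import OpenAI
import numpy as np

# The client reads the API key from the OPENAI_API_KEY environment variable.
client = OpenAI()

def get_embedding(text: str) -> list[float]:
    """Return the embedding vector for a single piece of text."""
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    return response.data[0].embedding

def cosine_similarity(a, b) -> float:
    """Cosine of the angle between a and b; higher means more similar."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Compare similarity
e1 = get_embedding("How to build a REST API")
e2 = get_embedding("Creating a web service endpoint")
e3 = get_embedding("Best chocolate cake recipe")

# The scores in the comments are illustrative; exact values vary by model.
print(cosine_similarity(e1, e2))  # e.g. 0.89 (very similar)
print(cosine_similarity(e1, e3))  # e.g. 0.12 (not similar)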

Model                    Dimensions  Provider      Use Case
text-embedding-3-small   1536        OpenAI        General purpose, cost-effective
text-embedding-3-large   3072        OpenAI        Higher accuracy
Cohere embed-v3          1024        Cohere        Multilingual
all-MiniLM-L6-v2         384         Hugging Face  Free, runs locally
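
One practical consequence of the dimension counts in the table is storage size. The sketch below estimates the raw size of one million stored vectors per model. It assumes each number is stored as a 4-byte float32 and ignores index overhead and metadata. The storage format, the vector count, and the helper name `raw_size_mb` are assumptions for illustration.

```python
# Dimensions taken from the table above.
MODEL_DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "Cohere embed-v3": 1024,
    "all-MiniLM-L6-v2": 384,
}

BYTES_PER_FLOAT32 = 4     # assumption: vectors are stored as float32
NUM_VECTORS = 1_000_000   # assumption: an example collection size

def raw_size_mb(dimensions: int, num_vectors: int) -> float:
    """Raw vector storage in megabytes (1 MB = 1,000,000 bytes)."""
    return dimensions * BYTES_PER_FLOAT32 * num_vectors / 1_000_000

sizes = {model: raw_size_mb(dims, NUM_VECTORS)
         for model, dims in MODEL_DIMENSIONS.items()}

for model, size in sizes.items():
    print(f"{model}: {size:,.0f} MB")
```

Under these assumptions, doubling the dimensions doubles the raw storage, which is one reason smaller models can be attractive for large collections.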

Key Takeaways


Part of the DeepRaft Glossary — AI and ML terms explained for developers.