LLM
Learn what LLM (Large Language Model) means in AI and machine learning, with examples and related concepts.
Definition
LLM stands for Large Language Model — a type of AI model trained on massive amounts of text data that can understand and generate human language.
LLMs are the technology behind tools like ChatGPT, Claude, and Gemini. They work by predicting the next word (or token) in a sequence, but this simple mechanism produces remarkably sophisticated behavior: answering questions, writing code, translating languages, and reasoning through complex problems.
The “large” in LLM refers to both the size of the training data (often trillions of words from books, websites, and code) and the number of parameters in the model (ranging from a few billion to over a trillion).
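Those parameter counts translate directly into memory requirements. A rough back-of-envelope calculation, assuming 2 bytes per parameter for 16-bit weights (the sizes below are illustrative, not tied to any specific model):

```python
# Rough memory footprint of model weights at 16-bit precision.
BYTES_PER_PARAM = 2  # fp16 / bf16

def weight_memory_gb(num_params):
    """Approximate GB needed just to hold the weights."""
    return num_params * BYTES_PER_PARAM / 1e9

for name, params in [("7B", 7e9), ("70B", 70e9), ("1T", 1e12)]:
    print(f"{name} parameters: ~{weight_memory_gb(params):.0f} GB")  # 14, 140, 2000 GB
```

This is why the largest models require multiple GPUs just to load, while smaller models can run on a single consumer machine.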
How It Works
An LLM processes text in three stages:
- Tokenization — Input text is broken into tokens (roughly word pieces). “I love programming” might become ["I", " love", " program", "ming"].
- Processing — Tokens pass through dozens of Transformer layers, where attention mechanisms determine how each token relates to every other token.
- Prediction — The model outputs a probability distribution over the vocabulary for the next token, then samples from it.
Input: "The capital of France is"
↓ Tokenize
Tokens: [The, capital, of, France, is]
↓ Transformer layers (96+ layers)
↓ Attention + Feed-forward
Output: { "Paris": 0.95, "Lyon": 0.02, "the": 0.01, ... }
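The prediction step in the diagram can be sketched in a few lines of Python. This is a toy illustration, not a real model: the logits below are hard-coded stand-ins for the scores the Transformer layers would actually compute.

```python
import math

# Hypothetical logits for the next token after "The capital of France is"
# (hard-coded for illustration — a real model computes these).
logits = {"Paris": 6.0, "Lyon": 2.1, "the": 1.4, "a": 0.9}

def softmax(scores):
    """Convert raw logits into a probability distribution over tokens."""
    m = max(scores.values())  # subtract max for numerical stability
    exps = {tok: math.exp(s - m) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

probs = softmax(logits)

# Greedy decoding: pick the single most probable token.
next_token = max(probs, key=probs.get)
print(next_token)  # Paris
```

Real decoders usually sample from this distribution (controlled by a temperature setting) rather than always taking the top token, which is why the same prompt can produce different completions.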
The key insight is that by training on enough text, the model learns not just grammar but facts, reasoning patterns, and even coding conventions.
Why It Matters
LLMs have transformed software development and knowledge work:
- Code generation — GitHub Copilot and Claude Code use LLMs to write, debug, and refactor code
- Content creation — Marketing teams use LLMs for drafts, translations, and brainstorming
- Search & research — Perplexity and Google’s AI Overviews use LLMs to synthesize search results
- Customer support — Companies deploy LLMs as first-line support agents
The practical impact: tasks that took hours (writing documentation, analyzing data, translating content) now take minutes.
Example
from anthropic import Anthropic
client = Anthropic()
# Basic LLM usage — ask a question
response = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=256,
messages=[
{"role": "user", "content": "Explain quantum computing in one paragraph."}
]
)
print(response.content[0].text)
# Using OpenAI's API
from openai import OpenAI
client = OpenAI()
response = client.chat.completions.create(
model="gpt-4o",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is the difference between RAM and ROM?"}
]
)
print(response.choices[0].message.content)
Key Takeaways
- LLMs predict the next token in a sequence — but this simple mechanism enables complex reasoning
- Model quality depends on training data size, parameter count, and training technique (RLHF, etc.)
- The most capable LLMs (GPT-4o, Claude Opus, Gemini Ultra) have hundreds of billions of parameters
- LLMs can hallucinate — generating plausible-sounding but incorrect information
- RAG and grounding techniques help reduce hallucinations
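The RAG idea in the last bullet can be sketched minimally: retrieve the documents most relevant to a question and prepend them to the prompt, so the model answers from supplied text rather than from memory alone. The corpus, the keyword-overlap scoring, and the prompt format below are illustrative assumptions, not a production pipeline (real systems use embedding-based retrieval).

```python
# Minimal RAG sketch: keyword-overlap retrieval + prompt grounding.
# Documents and scoring are toy assumptions for illustration.
DOCS = [
    "Our API rate limit is 100 requests per minute per key.",
    "Refunds are processed within 5 business days.",
    "The mobile app supports iOS 16 and later.",
]

def retrieve(question, docs, k=1):
    """Rank docs by word overlap with the question (stand-in for embeddings)."""
    q_words = set(question.lower().split())
    scored = sorted(
        docs,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(question, docs):
    """Ground the model: instruct it to answer only from the retrieved context."""
    context = "\n".join(docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

question = "What is the API rate limit?"
top = retrieve(question, DOCS)
print(build_prompt(question, top))
```

The grounded prompt gives the model the relevant facts up front, so a correct answer no longer depends on what the model happened to memorize during training.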
Part of the DeepRaft Glossary — AI and ML terms explained for developers.