LLM Learning Journey
Intro

Analogy + Concept

Just like a baby learns to speak…
an LLM learns to generate text.

Large Language Models (LLMs) are AI systems trained on massive amounts of text to predict and generate human-like language. They grow through observation → trial & error → refinement.

How to use this page
  1. Read the short idea on the left.
  2. Try the mini activity on the right.
  3. Use the “Next” button to continue.
What you’ll understand by the end
Tokens · Probabilities · Attention · Transformers · RLHF
Think of it like learning to talk: listen → guess → get corrected.
Start the journey
Scroll ↓

Slide 2 — Pretraining

The Exposure Phase

Babies absorb language patterns for months before speaking. LLMs do the same during pretraining — they read enormous datasets and learn the structure of language.

  • Massive text exposure
  • Learns patterns of grammar & style
  • Requires heavy GPU compute
Takeaway

Pretraining is just reading a lot. The model isn’t “smart” yet — it’s absorbing patterns.

Try this

Imagine your phone reading every book you own, and then every book in every library. That’s closer to the scale of pretraining data.

Slide 3 — Tokens

Building Vocabulary: Tokens

Models don’t “see” words directly. Text is split into tokens (word pieces), and each token is mapped to a number.

For teaching: explain “tokens” as the syllables and word chunks a baby learns first.

Checkpoint: Tokenize your sentence
What to notice

Long words split into pieces. Models work with pieces, not full words.

Try this

Type a long word like “internationalization” and hit Tokenize.
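
To go one level deeper, here is a minimal tokenization sketch, assuming the tiktoken package is installed (pip install tiktoken). Any BPE tokenizer would do; the exact splits vary by tokenizer.

```python
# Minimal tokenization sketch, assuming the `tiktoken` package.
import tiktoken

enc = tiktoken.get_encoding("gpt2")  # the GPT-2 byte-pair encoding

text = "internationalization"
token_ids = enc.encode(text)                   # the numbers the model sees
pieces = [enc.decode([t]) for t in token_ids]  # the word pieces behind them

print(token_ids)  # several integers, not one ID for the whole word
print(pieces)     # the long word split into smaller chunks
```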

Slide 4 — Next Token + Softmax

Trial, Error & Probability

The core trick: next-token prediction. Given the current text, the model scores every possible next token, and softmax converts those scores into probabilities that sum to 1.

“The cat sat on the …” → {mat: 0.62, sofa: 0.21, moon: 0.02}
Higher probability → more likely token
Checkpoint: Reroll the probabilities
Takeaway

The model makes a weighted guess at the next token.

Try this

Click “Reroll” and watch how the probabilities change.
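
Under the hood, softmax is just a few lines of math. Here is a minimal NumPy sketch; the scores (logits) are made-up toy values echoing the example above, not output from a real model.

```python
# Minimal softmax sketch with NumPy; logits are toy values.
import numpy as np

tokens = ["mat", "sofa", "moon"]
logits = np.array([3.4, 2.3, 0.0])   # raw scores from the model (made up)

def softmax(x):
    x = x - x.max()                  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum()

probs = softmax(logits)
for tok, p in zip(tokens, probs):
    print(f"{tok}: {p:.2f}")         # higher score → higher probability
```

Notice the probabilities always sum to 1, so the model can sample its weighted guess directly from them.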

Slide 5 — Embeddings

Understanding Context: Embeddings

Words become vectors in a “meaning space”. Similar concepts cluster together (like juice & milk → drinks).

Classic intuition
King − Man + Woman ≈ Queen
Clusters animate into place when this section appears.
Try: orange → blue line, van → gray, person → purple line, tata → between car + person.
What to notice

Related words land near each other in a map of meaning.

Think of it like

A map where cities close together share culture. Words close together share meaning.
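
You can play with the classic intuition in code. This is a toy sketch with hand-made 3-D vectors; real embeddings are learned during training and have hundreds of dimensions.

```python
# Toy "meaning space" arithmetic; the vectors are hand-made, not learned.
import numpy as np

vec = {
    "king":  np.array([0.9, 0.8, 0.1]),   # royal + male-leaning (toy values)
    "queen": np.array([0.9, 0.1, 0.8]),   # royal + female-leaning
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

target = vec["king"] - vec["man"] + vec["woman"]
best = max(vec, key=lambda w: cosine(vec[w], target))
print(best)  # → queen: the nearest word in meaning space
```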

Slide 6 — Attention

Focus & Attention

Attention lets each word “look at” other words and decide what matters most. This resolves ambiguity (like pronouns).

Example
The animal didn’t cross the road because it was tired.

Tip: Wrap the word to focus in [brackets].

Heatmap cells brighten on the most relevant tokens.
Checkpoint: Toggle attention or apply your sentence
Takeaway

Attention decides which words matter most right now.

Try this

Toggle focus and see which words the model “pays attention” to.
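
For the curious, the mechanism behind the heatmap is scaled dot-product attention. Here is a minimal NumPy sketch; random toy matrices stand in for the learned queries, keys, and values.

```python
# Minimal scaled dot-product attention; Q, K, V are random toy matrices.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # how much each token "looks at" the others
    weights = softmax(scores)         # each row sums to 1, like the heatmap rows
    return weights @ V, weights

rng = np.random.default_rng(0)
n_tokens, d = 5, 8                    # toy sizes: 5 tokens, 8-dim vectors
Q, K, V = (rng.normal(size=(n_tokens, d)) for _ in range(3))

out, weights = attention(Q, K, V)
print(weights.round(2))               # brighter heatmap cell ≈ bigger weight
```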

Slide 7 — Transformer

The “Brain”: Transformer Architecture

Transformers stack many repeated blocks: Self-Attention + Feed-Forward. Deeper stacks learn richer relationships.

Layers animate in as you add them.
Checkpoint: Add a layer
Takeaway

Transformers stack the same building block many times.

Try this

Add layers and see how depth builds capability.
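
Here is a minimal sketch of that repeated block: self-attention followed by a feed-forward layer, with residual (skip) connections. The weights are random stand-ins; a real model learns them during pretraining. Adding one more block is exactly the “Add a layer” step.

```python
# Minimal transformer-block sketch; weights are random, not learned.
import numpy as np

rng = np.random.default_rng(0)
d = 8  # toy model width

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def block(x, Wq, Wk, Wv, W1, W2):
    # self-attention: every token mixes in information from the others
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    attn = softmax(Q @ K.T / np.sqrt(d)) @ V
    x = x + attn                        # residual (skip) connection
    # feed-forward: each token is transformed on its own
    return x + np.maximum(0, x @ W1) @ W2

def make_weights():
    return [rng.normal(scale=0.1, size=(d, d)) for _ in range(5)]

x = rng.normal(size=(5, d))             # 5 tokens, each a d-dim vector
n_layers = 4                            # each extra layer = one more block
for layer in (make_weights() for _ in range(n_layers)):
    x = block(x, *layer)
print(x.shape)                          # same shape in, same shape out
```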

Slide 8 — Fine-tuning + RLHF

Alignment & Feedback

Humans correct mistakes and reward good behavior. LLMs are refined the same way, using fine-tuning and RLHF (Reinforcement Learning from Human Feedback).

Click a rating to run the feedback loop.
User Prompt
Model Answer
Human Feedback
Update Model
Checkpoint: Click Helpful or Not helpful
Takeaway

Human feedback shapes how the model responds.

Try this

Click Helpful or Not helpful and watch the loop update.
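
To make the loop concrete, here is a toy sketch. This is not real RLHF, which trains a reward model and then optimizes the LLM with reinforcement learning (e.g. PPO); the toy version only shows the core idea that ratings steer future answers. The answer styles and scoring below are invented for illustration.

```python
# Toy feedback loop (illustration only, not real RLHF).
import random

scores = {"concise answer": 0.0, "rambling answer": 0.0}  # invented styles

def pick_answer():
    best = max(scores.values())                   # prefer the top-rated style
    return random.choice([a for a, s in scores.items() if s == best])

def feedback(answer, helpful):
    scores[answer] += 1.0 if helpful else -1.0    # the "Update Model" step

for _ in range(5):
    answer = pick_answer()
    helpful = (answer == "concise answer")        # this human prefers concise
    feedback(answer, helpful)
    print(answer, "→", "Helpful" if helpful else "Not helpful", scores)
```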

Slide 9 — Hallucinations

Growing Pains

Toddlers confidently tell tall tales. LLMs can also be confidently wrong, because they optimize for plausibility, not guaranteed truth.

⚠️
Tip: Use citations, verification, and guardrails for high-stakes use.
Model says (confidently):
“The moon is made of cheese.”
Checkpoint: Fact-check your claim
Takeaway

LLMs can sound confident even when they are wrong.

Try this

Click Fact-check to see a safe correction.

Slide 10 — Future

From Learner to Capable Assistant

Next wave: Multimodal AI, Autonomous Agents, and Personalized AI in healthcare & education.

🖼️🎧
Multimodal
Understands text + image + audio + video.
🤖🧩
Agents
Plans, uses tools, and completes tasks end-to-end.
🧑‍⚕️📚
Personalized
Adapts to your goals, context, and learning style.
Quick recap

LLMs learn from data, guess next tokens, use attention, and improve with feedback.

Summary

You now have the basics to interpret LLM demos and compare model behavior.