Working with AI

How LLMs work and which coding tools to use

Author

Dr. Tobias Vlćek

LLM Fundamentals

Welcome!

This workshop covers AI Programming & Large Language Models. You have likely already used LLMs such as ChatGPT, Claude, Gemini, and others. By the end of this workshop, you will understand:

  • How LLMs work at a conceptual level
  • The difference between models, apps, and harnesses
  • Which AI coding tools to use and how to get started

What is an LLM?

Think of LLMs as advanced pattern recognition systems. They have “read” massive amounts of text from the internet, books, and code, and based on those patterns, they can generate new text.

What they really do is predict the next token, like an incredibly advanced version of autocomplete on your phone. But they cannot truly “think” or “understand” like humans.

Despite their limitations, LLMs are genuinely useful for coding, writing, research, and analysis.
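The "advanced autocomplete" idea can be made concrete with a deliberately tiny sketch: count which word follows which in some training text, then predict the most frequent successor. Real LLMs use neural networks over subword tokens rather than word counts, so this toy is only a conceptual illustration.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for each word, how often each successor follows it."""
    words = text.lower().split()
    successors = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        successors[current][nxt] += 1
    return successors

def predict_next(successors, word):
    """Return the most frequent successor seen in training, or None."""
    counts = successors.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]

corpus = "the cat sat on the mat because the cat was warm"
model = train_bigrams(corpus)
print(predict_next(model, "the"))  # prints "cat" ("cat" followed "the" twice, "mat" once)
```

The same objective, scaled up to billions of parameters and trillions of tokens, is essentially what pre-training optimizes.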

How LLMs work

Modern LLMs use the Transformer architecture, introduced in “Attention is All You Need” (Vaswani et al., 2017).

The process (simplified):

  1. Tokenization: Text is split into tokens, meaning not whole words but subwords like “un-”, “break”, “-able”. This lets the model handle words it has never seen before.
  2. Relationship mapping: The model builds a mathematical representation of how these tokens relate to each other.
  3. Attention mechanism: The model focuses on the most relevant parts of the input to generate each output token.

Consider the sentence “The cat sat on the mat because it was warm.” Attention helps the model work out that “it” refers to “the mat,” not “the cat.”
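The core of attention can be sketched in plain Python: a token's vector is compared (via dot product) against every other token's vector, the scores are normalized with a softmax, and the results become weights for mixing information. The three-dimensional vectors below are invented for illustration; real models learn thousands of dimensions plus separate query/key/value projections.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    """Score the query vector against every key vector, then normalize."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    return softmax(scores)

# Made-up 3-d vectors: "it" should attend more to "mat" than to "cat"
vectors = {"cat": [0.9, 0.1, 0.0], "mat": [0.1, 0.9, 0.2], "it": [0.2, 0.8, 0.1]}
weights = attention_weights(vectors["it"], [vectors["cat"], vectors["mat"]])
# weights[1] (for "mat") comes out larger than weights[0] (for "cat")
```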

You can explore how text is tokenized at platform.openai.com/tokenizer.
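Subword splitting can be mimicked with a greedy longest-match against a small vocabulary. This is a simplification of real algorithms like byte-pair encoding, and the vocabulary here is invented for the example:

```python
def tokenize(word, vocab):
    """Greedily match the longest vocabulary entry at each position;
    fall back to single characters so every word can be tokenized."""
    tokens, i = [], 0
    while i < len(word):
        for end in range(len(word), i, -1):
            piece = word[i:end]
            if piece in vocab or len(piece) == 1:
                tokens.append(piece)
                i = end
                break
    return tokens

vocab = {"un", "break", "able", "ing"}
print(tokenize("unbreakable", vocab))  # prints ['un', 'break', 'able']
```

The single-character fallback is why the model never encounters a word it cannot represent, only words it must split into smaller pieces.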

Context windows

The context window is the number of tokens an LLM can process at once. Think of it as the model’s short-term memory.

Larger context window:

  • “Remembers” more of the conversation
  • More coherent responses
  • More computational cost

Smaller context window:

  • Faster processing
  • May lose track of earlier parts
  • Less expensive to run

How to manage context well:

Break problems into manageable chunks rather than dumping everything at once, and be specific about which files or functions you need help with. When you switch topics, start a fresh conversation so stale context does not confuse the model. Providing relevant context upfront, instead of expecting the model to guess, consistently leads to better results.

Think of context like a desk: keep only what you need for the current task on it, and clear it when you move to the next task.
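The "keep the desk clear" idea is roughly what chat apps do automatically: when a conversation exceeds the window, the oldest messages are dropped (or summarized). A minimal sketch, using word count as a crude stand-in for real token counting:

```python
def trim_history(messages, max_tokens):
    """Keep the most recent messages that fit in the token budget.
    Word count is a crude stand-in for a real tokenizer."""
    kept, used = [], 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = len(message.split())
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    return list(reversed(kept))  # restore chronological order

history = ["explain tokenization please",
           "tokenization splits text into subwords",
           "now explain context windows"]
print(trim_history(history, max_tokens=8))  # keeps only the newest message
```

This is also why long conversations lose track of early details: once trimmed, those messages simply never reach the model again.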

How LLMs are trained

Training an LLM happens in three phases:

1. Self-supervised learning (pre-training)

  • Trained on massive text datasets without explicit labels
  • Learns to predict the next token, picking up grammar, facts, and patterns

2. Supervised fine-tuning

  • Trained on smaller, curated datasets for specific tasks
  • Makes the model better at following instructions

3. Reinforcement learning from human feedback (RLHF)

  • Human evaluators rank model outputs by quality
  • Model learns to produce responses that are more helpful, honest, and harmless

What LLMs get wrong

LLMs have several important limitations to be aware of:

  • Unreliable output: LLMs can confidently generate incorrect information (hallucinations), and because their training data contains biases, those biases can surface in their responses, sometimes in subtle ways you will not catch without careful review.
  • No true understanding: What looks like reasoning is pattern matching on training data, not thinking from first principles. Models also only know what they were trained on, so anything after their knowledge cutoff is simply absent.
  • Inconsistent behavior: Small changes in wording can produce very different results. Models can also drift persona mid-conversation. Researchers have documented cases where fine-tuning on one task triggered unexpected behavior elsewhere, and long conversations can push a model away from its intended character entirely.
  • Overconfident tone: LLMs rarely signal uncertainty, so a wrong answer reads exactly like a correct one.

A 2025 Anthropic study found that developers using AI scored 17% lower on comprehension tests (roughly two letter grades), with the biggest gap in debugging. Leaning on AI for code generation without engaging with what it produces erodes the skills you need to catch its mistakes.

Models, Apps & Harnesses

A framework for understanding AI tools

This framework is based on Ethan Mollick’s work from “One Useful Thing”. There are three layers to understand:

  1. Models = the underlying AI “brains”
  2. Apps = the products you interact with
  3. Harnesses = systems within apps that let the AI use tools and take action

The same model can produce very different results depending on the harness it is placed in!

Models: The AI brains

The foundation model is the core intelligence. Different providers have different strengths:

Provider  | Model               | Strength
OpenAI    | GPT-5.4             | General purpose, multimodal
Anthropic | Claude Opus/Sonnet  | Reasoning, coding, long context
Google    | Gemini Pro          | Google integration, multimodal
Mistral   | Mistral Large       | Affordable, European provider
Meta      | Llama (open-source) | Local deployment, customizable

This changes fast. New models appear every few months.

Free/default tiers often use weaker models. Always select the frontier or advanced model when available, as the difference is noticeable.

Apps and harnesses

Apps are how you access the models.

App     | URL               | Provider
ChatGPT | chatgpt.com       | OpenAI
Claude  | claude.ai         | Anthropic
Gemini  | gemini.google.com | Google
Le Chat | chat.mistral.ai   | Mistral

All four offer free access to capable models, while paid tiers unlock stronger models and higher usage limits. Mistral offers an affordable student tier with access to Mistral Vibe (not flawless, but acceptable).

Every app also comes with a harness, a system that gives the model tools and capabilities. Same model, different harness = completely different results.

Different apps, different harnesses:

  • Claude.ai harnesses Claude with web search and code execution → a chat assistant
  • Claude Code harnesses the same Claude with file access, terminal, and testing → an autonomous coding agent
  • Copilot in Zed harnesses a model with your editor context → a pair programmer

Pick the right app for the task! A chat app is great for brainstorming, but a coding agent is better for building features.

Why this framework matters

Question: Why does the same AI feel “smarter” in some tools than others?

Because the harness determines:

  • What context the model receives and what tools it can use (web search, file access, code execution)
  • How many steps it can take autonomously and how results are presented back to you

Anthropic’s own engineering team showed this concretely: the same model failed at complex coding tasks when given a bare prompt, but succeeded when wrapped in a harness with progress tracking, environment setup, and automated testing.

When someone says “AI can’t do X,” ask: which model, in which harness? The answer often changes with the right setup.
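At its core, a harness is a loop: feed the model context, let it either call a tool or answer, and feed tool results back in. A hypothetical sketch of that loop — the `toy_model` callable and the `read_file` tool are invented for illustration; real harnesses like Claude Code add planning, testing, and safety layers on top:

```python
def run_harness(model, tools, task, max_steps=5):
    """Minimal agent loop: the model either calls a tool or answers.
    `model` is any callable returning ("tool", name, arg) or ("answer", text)."""
    context = [f"Task: {task}"]
    for _ in range(max_steps):
        action = model(context)
        if action[0] == "answer":
            return action[1]
        _, name, arg = action
        result = tools[name](arg)                      # run the requested tool
        context.append(f"{name}({arg}) -> {result}")   # feed the result back
    return None  # gave up after max_steps

# Toy model: reads a file on the first step, then answers
def toy_model(context):
    if len(context) == 1:
        return ("tool", "read_file", "notes.txt")
    return ("answer", f"done: {context[-1]}")

tools = {"read_file": lambda path: f"<contents of {path}>"}
print(run_harness(toy_model, tools, "summarize notes.txt"))
```

Everything that distinguishes one harness from another — which tools exist, how much context is gathered, how many steps are allowed — lives in this loop.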

AI Coding Tools

Homebrew: Your package manager

Homebrew is a package manager for macOS that installs command-line tools and applications with a single command. We will use it to install all coding tools in this workshop.

Install Homebrew:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Then installing tools is as simple as:

brew install <tool-name>        # CLI tools
brew install --cask <app-name>  # GUI applications

If you are not on macOS, download links are provided as alternatives for each tool.

Zed: A modern code editor

Zed is the recommended IDE for this course: a modern, fast code editor built in Rust with a built-in AI assistant and multi-model support.

Features:

  • Supports cloud models (Claude, GPT) and local models (Ollama)
  • Great for pair programming with AI
  • Free, open-source, and actively developed

How to install (macOS):

brew install --cask zed

For Linux or Windows, download from zed.dev.

Zed offers a free student tier with unlimited edit predictions and $10/month in API credits for one year. Sign up at zed.dev/education with your GitHub account and university email.

GitHub Copilot

GitHub Copilot is an AI pair programmer integrated into editors (VS Code, Zed, JetBrains). It is free for students via the GitHub Student Developer Pack and provides inline code completion as you type.

How to get access:

  1. Go to education.github.com/pack
  2. Sign up with your university email
  3. Verify your student status
  4. Wait for approval (usually 1-2 days)

Copilot’s code completion can also be activated in Zed if you prefer it over Zed’s built-in completion.

Terminal-based AI tools

Terminal-based coding assistants let you work with AI directly from the command line, without leaving your editor or browser. Two options stand out:

OpenCode

  • Open-source terminal-based AI coding assistant
  • Supports multiple AI providers (bring your own API key)
  • Reads, edits, and creates files from the terminal, good for quick fixes and file operations

How to install (macOS):

brew install opencode

For other platforms, see opencode.ai.

Claude Code

  • Anthropic’s CLI coding agent that goes beyond suggestions: autonomous multi-step task execution
  • Reads your entire codebase for context and can plan, write, test, and debug code autonomously
  • Executes terminal commands and iterates on errors, best for involved, multi-step tasks

How to install (macOS):

brew install --cask claude-code

For other platforms, see claude.ai/code.

Claude Code requires a Claude subscription or Anthropic API key. OpenCode is free but requires you to bring your own API key for whichever provider you choose.

Choosing the right tool

Tool           | Type           | Best for                    | Cost
Zed            | IDE            | Daily coding with AI assist | Free
GitHub Copilot | IDE extension  | Inline code suggestions     | Free for students
OpenCode       | Terminal       | Quick terminal tasks        | Free (bring API key)
Claude Code    | Terminal agent | Multi-step tasks            | API costs

Question: When would you use a chat interface vs. a coding agent?

Chat interface (Claude.ai, ChatGPT)

  • Brainstorming ideas and explaining concepts
  • Drafting text, outlines, or quick one-off questions

Coding agent (Claude Code, OpenCode)

  • Building features across files and debugging involved issues
  • Refactoring codebases, running and testing code

Remember: same model, different harness = different capabilities. Pick the right tool for the job.

Wrap-up

Tips and takeaways

1. Be precise in your instructions

The more specific your prompt, the better the result. Include context: language, framework, expected behavior.

2. Always review generated code

LLMs make mistakes, suggest inefficient solutions, or miss edge cases. Your code, your responsibility. In the Anthropic study mentioned earlier, developers who asked the AI to explain its code rather than just generate it performed nearly as well as those coding by hand.

3. Iterate and refine

Use the output as a starting point. Ask follow-up questions to improve the result, and break requests into smaller steps when things get involved.

Takeaways:

  1. AI is a tool, not a replacement. You still need to understand the fundamentals
  2. Understand before accepting. Always read and verify AI-generated output
  3. Pick the right harness. Chat for brainstorming, agents for building, IDE for daily coding
  4. Models matter. Always use the most capable model available to you
  5. Context matters. Clear prompts and managed context give you better output

If you use free tiers, be aware that your prompts may be used by providers for training and are not private. For learning and experimenting, this is usually not an issue.

Resources

Getting started:

Going deeper:

Note: LLM architecture types (reference)

Architecture       | How it works                              | Best for
Encoder-Decoder    | Understands input, then generates output  | Translation, summarization
Encoder-Only       | Focuses on understanding input            | Classification, search (BERT)
Decoder-Only       | Generates text token by token             | Chatbots, code generation (GPT, Claude)
Mixture of Experts | Routes input to specialized sub-models    | Efficient scaling (fewer params active)

Most LLMs you interact with today (ChatGPT, Claude, Gemini) are decoder-only models, sometimes enhanced with Mixture of Experts.