5. Prompt Engineering
Work in progress
This section is under construction. This information hasn’t been reviewed or edited yet!
Introduction
At their core, LLMs work by responding to “prompts” - text inputs that tell the model what we want it to do. Think of a prompt as a conversation starter or instruction that guides the AI’s response. However, there’s more complexity to prompts than meets the eye, especially when working with different API types and managing conversations.
What will I get out of this?
By the end of this section, you will be able to:
- Explain the concept of prompts and their role in guiding LLM responses.
- Describe the key components of an effective prompt, including task instructions, context, and format specifications.
- Analyze the impact of essential parameters like temperature and top-P sampling on LLM outputs.
- Identify best practices for prompt engineering, including clarity, specificity, and error handling.
- Differentiate between traditional prompt engineering techniques and approaches optimized for modern reasoning models.
- Evaluate the appropriate use cases for different prompt engineering strategies based on task requirements and model capabilities.
Prompt Engineering: as much Art as Science
Prompt Engineering is a surprisingly complex discipline! Different models, different methods of inference, different tasks - all are criteria that influence the creation of a good prompt. While going into extreme minutiae on this is outside the scope of this course, we’ll cover general good practices.
Ultimately, the best way to craft a good prompt will involve a lot of experimentation and evaluation!
Anatomy of an Effective Prompt
A well-structured prompt typically includes several key components:
-
Task Instructions:
- Clear, specific directions about what you want
- Example: “Analyze this code for security vulnerabilities”
-
Context and Background:
- Relevant information the model needs
- Previous conversation history (in chat contexts)
- Example: “Given a Python web application using Flask…”
-
Format Specifications:
- How you want the output structured
- Example: “Provide your answer in bullet points”
-
Examples (Few-Shot Learning):
- Demonstrations of desired input-output pairs
- Helps the model understand patterns
Input: "Hello" Output: "Hi there! How can I help?" Input: "What's the weather?" Output: "I don't have access to current weather data."
Context Management
Managing context in longer conversations requires careful consideration:
-
Context Window Limits:
- Models have maximum token limits
- Need to strategically manage conversation history
- Consider summarizing or pruning older messages
-
Conversation Memory:
- Recent messages are more influential than older ones
- Important to maintain relevant context while removing unnecessary details
- Example strategy:
Keep: Last 3-5 exchanges Summarize: Earlier important points Remove: Off-topic or resolved discussions
Effective Context Management
- Keep track of token usage
- Prioritize recent and relevant information
- Use summarization for long conversations
- Consider implementing memory systems for persistent knowledge
Best Practices
-
Clarity and Specificity:
- Be explicit about what you want
- Avoid ambiguous instructions
- Example: “Generate a Python function that calculates the Fibonacci sequence up to n terms”
-
Safety and Control:
- Include guardrails in system messages
- Specify output constraints
- Example: “Never generate executable code without safety checks”
-
Error Handling:
- Plan for edge cases
- Include fallback instructions
- Example: “If you’re unsure, ask for clarification rather than making assumptions”
Common Pitfalls
- Overloading context windows with unnecessary information
- Mixing multiple tasks in a single prompt
- Assuming the model remembers previous conversations without proper context
- Not setting clear boundaries in system messages
A Note on API Types
When implementing LLMs, you’ll use either a Completion API (for single-turn interactions) or a Chat API (for multi-turn conversations). Each has strengths for different scenarios. We’ll explore these integration patterns in detail in the next section on Inference Techniques, but it’s important to consider which API you’ll use as it affects how you structure your prompts.
The art of crafting effective prompt engineering is crucial because:
- The same question asked differently can yield vastly different results
- Prompts can include context, examples, or specific formatting instructions
- The way we phrase prompts can help prevent or inadvertently enable harmful outputs
- Different API types require different prompt structures
Advanced Prompt Engineering Techniques
Chain-of-Thought Prompting
Chain-of-thought (CoT) prompting is a technique that encourages LLMs to break down complex problems into step-by-step reasoning. Instead of jumping straight to an answer, the model explains its thinking process.
Example:
Basic prompt: "What's 123 × 456?"
CoT prompt: "Let's solve 123 × 456 step by step:
1. First, let's break down 456: 400 + 50 + 6
2. Now multiply 123 by each part:
- 123 × 400 = 49,200
- 123 × 50 = 6,150
- 123 × 6 = 738
3. Finally, add these results:
49,200 + 6,150 + 738 = 56,088"This technique is particularly effective for:
- Mathematical problems
- Logical reasoning tasks
- Complex decision-making
- Debugging code
- Analysis requiring multiple steps
Evolution of Reasoning
While Chain-of-Thought prompting has been a breakthrough in getting traditional LLMs to show their work, newer reasoning models are changing this paradigm entirely. Let’s explore how these models are reshaping our approach to prompt engineering.
Prompt Engineering for Reasoning Models
Modern reasoning models (like OpenAI’s o3 and DeepSeek R1) have built-in multi-step reasoning capabilities that fundamentally change how we should approach prompt engineering. In fact, many of the explicit CoT techniques we just covered may actually hinder these models’ performance.
Key Differences from Traditional Models
| Aspect | Traditional LLMs | Reasoning Models |
|---|---|---|
| Reasoning Process | Needs explicit CoT prompting | Has automatic internal reasoning |
| Best Prompt Style | Detailed instructions + examples | Concise, direct queries |
| Few-Shot Learning | Generally improves performance | Can actually reduce quality |
| Processing Style | Single-pass prediction | Multi-step deliberation |
| Error Handling | Requires manual iteration | Has built-in verification |
Optimizing for Reasoning Models
-
Keep It Simple
# Instead of: "Please solve this equation step by step: 3x + 7 = 22" # Use: "Solve 3x + 7 = 22"The model will automatically break down complex problems - explicit instructions can interfere with this process.
-
Embrace Conciseness
# Instead of: "Please carefully analyze all aspects and provide a detailed report..." # Use: "Analyze [topic] with supporting evidence" -
Model-Specific Considerations
| Model | Best For | Prompt Style |
|---|---|---|
| o3 | Structured coding tasks | JSON schema constraints |
| R1 | Mathematical reasoning | Open-ended questions |
When to Use Reasoning Models
Consider these models when:
- Tasks require complex logical steps
- Output consistency is critical
- You need automated verification
- Dealing with mathematical or coding challenges
Stick with traditional LLMs for:
- Simple text transformations
- Creative writing
- Tasks where cost is a primary concern
Essential Parameters
While understanding how LLMs work is crucial, effectively using them requires mastering their control parameters. These parameters shape how models generate and process text:
Temperature
Temperature controls the randomness in the model’s responses:
- Low values (e.g., 0.2) → more predictable, focused responses
- High values (e.g., 0.8) → more creative, diverse responses
Think of it This Way…
At low temperatures, LLMs stick to the most probable responses (like saying “the sky is blue”). At higher temperatures, it might get more unpredictable with the words it uses (like “the sky is a canvas painted in azure hues”) 🎨 It is why it is often correlated with the “creativity” of the model.
Top-P (Nucleus) Sampling
While temperature affects overall randomness, top-P sampling controls which words the model considers:
- Setting p = 0.9 means only the top 90% most likely tokens are considered
- Lower values → more focused, conservative text
- Higher values → more diverse vocabulary
For example:
Question: "What color is the sky?"
Top-P = 0.5: "blue" (sticks to most common answer)
Top-P = 0.9: "azure", "cerulean", "sapphire" (considers more options)Response Length
Controls how much text the model generates:
- Set by maximum token count
- Longer isn’t always better
- Consider context window limits
Context Window Trade-offs
Remember that longer responses consume more of your context window. A 1000-token response means 1000 fewer tokens available for future context in the conversation.
Quiz
Let’s test your understanding!
Want to test your understanding of prompt engineering principles? This quiz focuses on applying these concepts in real-world scenarios.
Coming up next
Now that we’ve explored how to effectively communicate with AI models through well-crafted prompts, let’s dive into the technical approaches for integrating these models into applications. In the next section, we’ll examine different inference techniques and API integration patterns.