Activity 1.4: Managing Context and Memory
Work in progress
This section is under construction. This information hasn’t been reviewed or edited yet!
Practical Activity Overview
In this activity, we’ll enhance our chat application to maintain conversation history and implement memory management techniques. This builds directly on our previous chat app.
Prerequisites
- Python 3.8 or higher installed on your system
- Basic familiarity with command line/terminal
- A text editor or IDE of your choice
Activities
Step 1: Set Up Your Development Environment
1.1 Make sure you are using the virtual environment we created in the previous activity:
- On Windows:
.\venv\Scripts\activate
- On macOS/Linux:
source venv/bin/activate
Step 2: Install Required Packages
2.1 Add tiktoken to our requirements.txt file. We will use it for token counting.
streamlit
google-generativeai
python-dotenv
requests
tiktoken
2.2 Install the new packages:
pip install -r requirements.txt
Step 3: Token Counting
3.1 First, let’s add tiktoken for token counting. The cl100k_base encoding is an OpenAI tokenizer, so the counts are estimates for Gemini and Llama models:
# Add this import at the top
import tiktoken

# Add this function for token counting
def count_tokens(text, encoding_name="cl100k_base"):
    """Estimate the number of tokens in text using tiktoken."""
    encoding = tiktoken.get_encoding(encoding_name)
    return len(encoding.encode(text))
3.2 Next, add session state for conversation history:
# Initialize conversation history in session state
if "messages" not in st.session_state:
    st.session_state.messages = []

# Display chat history
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.write(message["content"])
Step 4: Adding Conversation History
4.1 Add functions to generate responses with conversation history:
# Assumes `requests` is already imported in the app from the previous activity
def generate_ollama_response(messages):
    # Convert our message format to Ollama's expected format
    ollama_messages = [{"role": m["role"], "content": m["content"]} for m in messages]
    response = requests.post(
        "http://localhost:11434/api/chat",
        json={
            "model": "llama3.2:1b",
            "messages": ollama_messages,
            "stream": False,
        },
    )
    if response.status_code == 200:
        return response.json()["message"]["content"]
    else:
        raise Exception(f"Ollama error: {response.text}")

# Assumes `client` and `types` (from google.genai) were set up in the previous activity
def generate_gemini_response(messages):
    # Convert our messages to Gemini format; Gemini uses "model" for assistant turns
    gemini_messages = []
    for message in messages:
        role = "user" if message["role"] == "user" else "model"
        gemini_messages.append(
            types.Content(role=role, parts=[types.Part.from_text(text=message["content"])])
        )
    response = client.models.generate_content(
        model="gemini-2.0-flash",
        contents=gemini_messages,
    )
    return response.text
4.2 Add context window controls:
# Add context window size slider
context_window = st.sidebar.slider(
    "Context Window Size", 1, 10, 5,
    help="Number of previous messages to include in context",
)

# Add token counting display
if st.session_state.messages:
    total_tokens = sum(count_tokens(msg["content"]) for msg in st.session_state.messages)
    st.sidebar.metric("Tokens Used", total_tokens)
    # Display context window usage.
    # These values are working budgets for the progress bar; the real limit
    # depends on the model and, for Ollama, on its num_ctx setting.
    if model_type == "Ollama":
        max_tokens = 4096
        st.sidebar.progress(min(1.0, total_tokens / max_tokens))
        st.sidebar.text(f"Ollama context: ~{max_tokens} tokens")
    else:
        max_tokens = 32768
        st.sidebar.progress(min(1.0, total_tokens / max_tokens))
        st.sidebar.text(f"Gemini context: ~{max_tokens} tokens")
Step 5: Update User Input Handling
5.1 Update the user input handling:
if user_input:
    # Add user message to history
    st.session_state.messages.append({"role": "user", "content": user_input})
    # Display user message
    with st.chat_message("user"):
        st.write(user_input)
    try:
        # Use only the most recent messages based on context window setting
        recent_messages = st.session_state.messages[-context_window:]
        # Generate response based on selected model
        if model_type == "Ollama":
            response = generate_ollama_response(recent_messages)
        else:
            response = generate_gemini_response(recent_messages)
        # Add assistant response to history
        st.session_state.messages.append({"role": "assistant", "content": response})
        # Display assistant response
        with st.chat_message("assistant"):
            st.write(response)
    except Exception as e:
        st.error(str(e))
5.2 Add a button to clear conversation history:
# Add button to clear conversation history
if st.sidebar.button("Clear Conversation"):
    st.session_state.messages = []
    # st.experimental_rerun() was removed in recent Streamlit releases; use st.rerun()
    st.rerun()
Step 6: Experiment and Observe
6.1 Now that you’ve implemented conversation memory and context management, experiment with:
- Asking the model to remember information from earlier in the conversation
- Adjusting the context window size and observing how it affects the model’s memory
- Trying a summarization approach with longer conversations (the steps above do not implement summarization, so this is an extension exercise)
- Comparing token usage between different models and approaches
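One approach worth comparing against the fixed message-count window is trimming by token budget instead. The following is a minimal sketch, not part of the activity's app: the names trim_to_budget and estimate_tokens are hypothetical, and the default estimator uses a rough 1.3 tokens per word so the example runs without tiktoken. In the app you could pass the count_tokens function from Step 3 as count_fn.

```python
def estimate_tokens(text):
    """Rough estimate (assumption): about 1.3 tokens per whitespace-separated word."""
    return int(len(text.split()) * 1.3)


def trim_to_budget(messages, max_tokens, count_fn=estimate_tokens):
    """Keep the newest messages whose combined token count fits max_tokens.

    The newest message is always kept, even if it alone exceeds the budget.
    """
    kept = []
    used = 0
    # Walk backwards so the most recent messages are considered first.
    for message in reversed(messages):
        cost = count_fn(message["content"])
        if kept and used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = [
    {"role": "user", "content": "My name is Ada and I like graph theory."},
    {"role": "assistant", "content": "Nice to meet you, Ada."},
    {"role": "user", "content": "What is my name?"},
]
trimmed = trim_to_budget(history, max_tokens=12)
```

With this small budget the first message no longer fits, so a model that only receives trimmed has no way to answer the final question. This is the same forgetting effect you see when lowering the context window slider.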
Key Learning Points
- Context windows determine how much previous conversation the model can “see”
- Efficient token usage is crucial for managing context windows
- Summarization can help extend effective context by condensing information
- Different models have different context window limitations
- Memory management is essential for creating coherent, contextual AI interactions
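The summarization idea in the list above can be sketched as follows. This is an illustration under stated assumptions rather than code from the activity: compact_history and naive_summarize are hypothetical names, and naive_summarize only joins and truncates text so the example runs offline. In a real app you would replace it with a call to your model that asks for a short summary.

```python
def naive_summarize(messages, max_chars=200):
    """Placeholder summarizer (assumption): join the messages and truncate.

    Swap this for a model call that returns a real summary.
    """
    text = " ".join(f'{m["role"]}: {m["content"]}' for m in messages)
    return text[:max_chars]


def compact_history(messages, keep_recent=4, summarize=naive_summarize):
    """Replace all but the last keep_recent messages with one summary message."""
    if len(messages) <= keep_recent:
        return list(messages)
    split = len(messages) - keep_recent
    older, recent = messages[:split], messages[split:]
    # The summary is sent with the "user" role so it passes through the same
    # role mapping as ordinary messages; this is a design choice, not a rule.
    summary = {
        "role": "user",
        "content": "Summary of earlier conversation: " + summarize(older),
    }
    return [summary] + recent


history = [
    {"role": "user", "content": "My name is Ada."},
    {"role": "assistant", "content": "Hello Ada."},
    {"role": "user", "content": "I am studying graph theory."},
    {"role": "assistant", "content": "Great topic."},
    {"role": "user", "content": "Suggest a first exercise."},
    {"role": "assistant", "content": "Try proving the handshake lemma."},
]
compacted = compact_history(history, keep_recent=2)
```

Here compacted holds three messages: one summary of the first four turns followed by the last two turns verbatim. Older details survive only as well as the summarizer preserves them, which is the trade-off to watch for when experimenting.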
Security Note
Never commit your .env file to version control. Add it to your .gitignore file if you’re using Git.
Complete Script
Here’s a full app.py script. Note that this version uses LangChain’s memory classes instead of the hand-written helpers from the steps above, so it needs the langchain, langchain-community, and langchain-google-genai packages in addition to those in requirements.txt:
import os
import streamlit as st
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_community.llms import Ollama
from langchain.memory import ConversationBufferWindowMemory
from langchain.chains import ConversationChain
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

# Check for API key
if not os.getenv('GOOGLE_API_KEY'):
    st.error("Please set your Google API Key in the .env file!")
    st.stop()

st.title("Chat App with Memory")
model_type = st.sidebar.selectbox("Model", ["Gemini", "Ollama"])

# Initialize conversation history in session state
if "messages" not in st.session_state:
    st.session_state.messages = []

# Display chat history
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.write(message["content"])

# Add context window size slider
context_window = st.sidebar.slider(
    "Context Window Size", 1, 10, 5,
    help="Number of previous exchanges (user and assistant pairs) kept in memory",
)

# Build the model and chain for the current selection.
# The chain is rebuilt on each rerun so its memory always mirrors session state.
def get_conversation_chain():
    if model_type == "Gemini":
        llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash")
        max_tokens = 32768
    else:
        llm = Ollama(model="llama3.2:1b", base_url="http://localhost:11434")
        max_tokens = 4096
    # Window memory keeps only the last k exchanges
    memory = ConversationBufferWindowMemory(k=context_window)
    # Load existing conversation into memory
    for msg in st.session_state.messages:
        if msg["role"] == "user":
            memory.chat_memory.add_user_message(msg["content"])
        else:
            memory.chat_memory.add_ai_message(msg["content"])
    chain = ConversationChain(llm=llm, memory=memory)
    return chain, max_tokens

# Token counting display (word-based estimate)
if st.session_state.messages:
    _, max_tokens = get_conversation_chain()
    total_tokens = sum(len(msg["content"].split()) * 1.3 for msg in st.session_state.messages)
    st.sidebar.metric("Estimated Tokens Used", int(total_tokens))
    st.sidebar.progress(min(1.0, total_tokens / max_tokens))
    st.sidebar.text(f"{model_type} context: ~{max_tokens} tokens")

user_input = st.chat_input("Type your message here...")
if user_input:
    # Display user message
    with st.chat_message("user"):
        st.write(user_input)
    try:
        # Build the chain before storing the new message so it is not sent twice
        chain, _ = get_conversation_chain()
        # Add user message to history
        st.session_state.messages.append({"role": "user", "content": user_input})
        # Generate response
        response = chain.invoke({"input": user_input})
        assistant_response = response["response"]
        # Add assistant response to history
        st.session_state.messages.append({"role": "assistant", "content": assistant_response})
        # Display assistant response
        with st.chat_message("assistant"):
            st.write(assistant_response)
    except Exception as e:
        st.error(str(e))

# Add button to clear conversation history
if st.sidebar.button("Clear Conversation"):
    st.session_state.messages = []
    st.rerun()