Activity 1.4: Managing Context and Memory

Work in progress

This section is under construction. This information hasn’t been reviewed or edited yet!

Practical Activity Overview

In this activity, we’ll enhance our chat application to maintain conversation history and implement memory management techniques. This builds directly on our previous chat app.

Prerequisites

Python 3.8 or higher installed on your system
Basic familiarity with command line/terminal
A text editor or IDE of your choice

Activities

Step 1: Set Up Your Development Environment

1.1 Make sure you are using the virtual environment we created in the previous activity:

On Windows:

.\venv\Scripts\activate

On macOS/Linux:

source venv/bin/activate

Step 2: Install Required Packages

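2.1 Add tiktoken to the requirements.txt file; it will handle token counting. The Gemini code in Step 4 calls client.models.generate_content, which belongs to the google-genai SDK, so list google-genai as the Gemini package:

streamlit
google-genai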
python-dotenv
requests
tiktoken

2.2 Install the new packages:

pip install -r requirements.txt

Step 3: Token Counting

3.1 First, add tiktoken for token counting. The cl100k_base encoding is an OpenAI tokenizer, so treat the counts as estimates for Llama and Gemini models:

# Add this import at the top
import tiktoken

# Add this function for token counting
def count_tokens(text, encoding_name="cl100k_base"):
    """Estimate the token count of text using a tiktoken encoding"""
    encoding = tiktoken.get_encoding(encoding_name)
    return len(encoding.encode(text))
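
If tiktoken cannot be installed or its encoding data cannot be loaded, a word-count heuristic can stand in for it. The sketch below is only an illustration: estimate_tokens and count_tokens_safe are hypothetical helper names, and the ratio of 1.3 tokens per word is an assumed average for English prose rather than a property of any tokenizer.

```python
def estimate_tokens(text, tokens_per_word=1.3):
    """Rough token estimate from the whitespace-separated word count.

    tokens_per_word=1.3 is an assumed average, not a tokenizer property.
    """
    return int(len(text.split()) * tokens_per_word)


def count_tokens_safe(text, encoding_name="cl100k_base"):
    """Use tiktoken when it is available, otherwise fall back to the estimate."""
    try:
        import tiktoken
        encoding = tiktoken.get_encoding(encoding_name)
    except Exception:  # package missing, or encoding data could not be loaded
        return estimate_tokens(text)
    return len(encoding.encode(text))


print(estimate_tokens("one two three four five six seven eight nine ten"))
```

The fallback keeps the sidebar metric working in restricted environments, at the cost of a coarser number.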

3.2 Next, add session state for conversation history:

# Initialize conversation history in session state
if "messages" not in st.session_state:
    st.session_state.messages = []

# Display chat history
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.write(message["content"])

Step 4: Adding Conversation History

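4.1 Add functions to generate responses with conversation history. They rely on import requests and on the client and types objects from the google-genai SDK (from google import genai and from google.genai import types), which must already be defined near the top of app.py: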

def generate_ollama_response(messages):
    # Convert our message format to Ollama's expected format
    ollama_messages = [{"role": m["role"], "content": m["content"]} for m in messages]
    
    response = requests.post("http://localhost:11434/api/chat", 
        json={
            "model": "llama3.2:1b",
            "messages": ollama_messages,
            "stream": False
        }
    )
    if response.status_code == 200:
        return response.json()["message"]["content"]
    else:
        raise Exception(f"Ollama error: {response.text}")

def generate_gemini_response(messages):
    # Convert our messages to Gemini format
    gemini_messages = []
    for message in messages:
        role = "user" if message["role"] == "user" else "model"
        gemini_messages.append(types.Content(role=role, parts=[types.Part.from_text(text=message["content"])]))
    
    response = client.models.generate_content(
        model="gemini-2.0-flash",
        contents=gemini_messages
    )
    return response.text

4.2 Add context window controls:

# Add context window size slider
context_window = st.sidebar.slider("Context Window Size", 1, 10, 5,
                                   help="Number of previous messages to include in context")

# Add token counting display (counts the whole history, not only the window sent)
if st.session_state.messages:
    total_tokens = sum(count_tokens(msg["content"]) for msg in st.session_state.messages)
    st.sidebar.metric("Tokens Used", total_tokens)

    # Display context window usage. These limits are rough budgets for the
    # progress bar, not exact model limits; check each model's documentation.
    max_tokens = 4096 if model_type == "Ollama" else 32768
    st.sidebar.progress(min(1.0, total_tokens / max_tokens))
    st.sidebar.text(f"{model_type} context: ~{max_tokens} tokens")

Step 5: Update User Input Handling

5.1 Update the user input handling:

if user_input:
    # Add user message to history
    st.session_state.messages.append({"role": "user", "content": user_input})
    
    # Display user message
    with st.chat_message("user"):
        st.write(user_input)
    
    try:
        # Use only the most recent messages based on context window setting
        recent_messages = st.session_state.messages[-context_window:]
        
        # Generate response based on selected model
        if model_type == "Ollama":
            response = generate_ollama_response(recent_messages)
        else:
            response = generate_gemini_response(recent_messages)
        
        # Add assistant response to history
        st.session_state.messages.append({"role": "assistant", "content": response})
        
        # Display assistant response
        with st.chat_message("assistant"):
            st.write(response)
    except Exception as e:
        st.error(str(e))
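
The slice above limits context by message count, so a few very long messages can still exceed a model's limit. As a minimal sketch of an alternative, the hypothetical helper trim_to_token_budget below keeps the newest messages that fit within a token budget. The word_count function is a stand-in counter for the demo; in the app, count_tokens could be passed instead.

```python
def trim_to_token_budget(messages, max_tokens, count_fn):
    """Return the longest run of most recent messages that fits in max_tokens.

    The newest message is always kept, even if it alone exceeds the budget.
    """
    kept = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = count_fn(message["content"])
        if kept and used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


def word_count(text):
    # Stand-in counter for the demo: one "token" per word
    return len(text.split())


history = [
    {"role": "user", "content": "My name is Ada and I like chess"},
    {"role": "assistant", "content": "Nice to meet you Ada"},
    {"role": "user", "content": "What is my name"},
]
recent = trim_to_token_budget(history, max_tokens=10, count_fn=word_count)
print([m["content"] for m in recent])
# prints ['Nice to meet you Ada', 'What is my name']
```

With a budget of 10, the first message no longer fits, so the model would not see the user's name. This is the same forgetting effect the slider produces, measured in tokens rather than messages.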

5.2 Add a button to clear conversation history:

# Add button to clear conversation history
if st.sidebar.button("Clear Conversation"):
    st.session_state.messages = []
    st.rerun()  # st.experimental_rerun() is deprecated in favor of st.rerun()

Step 6: Experiment and Observe

6.1 Now that you’ve implemented conversation memory and context management, experiment with:

Asking the model to remember information from earlier in the conversation
Adjusting the context window size and observing how it affects the model’s memory
Holding longer conversations and noting which details are lost once older messages fall outside the window (the steps above do not implement summarization)
Comparing token usage between different models and approaches

Key Learning Points

Context windows determine how much previous conversation the model can “see”
Efficient token usage is crucial for managing context windows
Summarization can help extend effective context by condensing information
Different models have different context window limitations
Memory management is essential for creating coherent, contextual AI interactions
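
Summarization is not implemented in the steps above. As a minimal sketch of the idea, the hypothetical helper compact_history below replaces everything except the most recent messages with a single summary message. The summarize_fn argument is an assumed callable; in a real app it might call the model, while here a truncating placeholder keeps the example self-contained.

```python
def compact_history(messages, keep_last, summarize_fn):
    """Replace all but the last keep_last messages with one summary message.

    summarize_fn is a hypothetical callable that takes a transcript string
    and returns a shorter string.
    """
    if keep_last < 1:
        raise ValueError("keep_last must be at least 1")
    if len(messages) <= keep_last:
        return list(messages)  # nothing old enough to summarize

    older, recent = messages[:-keep_last], messages[-keep_last:]
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in older)
    summary = {
        # "user" is used so the summary survives a user/model role mapping
        "role": "user",
        "content": "Summary of earlier conversation: " + summarize_fn(transcript),
    }
    return [summary] + recent


def truncate_summary(transcript, limit=80):
    # Placeholder summarizer for the demo: truncates instead of calling a model
    return transcript[:limit]


history = [
    {"role": "user", "content": "My name is Ada."},
    {"role": "assistant", "content": "Nice to meet you, Ada."},
    {"role": "user", "content": "I am learning Streamlit."},
    {"role": "assistant", "content": "Great, what are you building?"},
]
compacted = compact_history(history, keep_last=2, summarize_fn=truncate_summary)
for message in compacted:
    print(message["role"], "|", message["content"])
```

The compacted list has three entries instead of four, and the first one carries the earlier exchange in condensed form. Whether a summary preserves the details that matter depends entirely on the summarizer used.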

Security Note

Never commit your .env file to version control. Add it to your .gitignore file if you’re using Git.

Complete Script

Here’s the full app.py script with all the changes incorporated:

import os

import requests
import streamlit as st
import tiktoken
from dotenv import load_dotenv
from google import genai          # google-genai SDK (pip install google-genai)
from google.genai import types

# Load environment variables
load_dotenv()

# Check for API key
api_key = os.getenv("GOOGLE_API_KEY")
if not api_key:
    st.error("Please set your Google API Key in the .env file!")
    st.stop()

# Gemini client used by generate_gemini_response
client = genai.Client(api_key=api_key)

def count_tokens(text, encoding_name="cl100k_base"):
    """Estimate the token count of text using a tiktoken encoding"""
    encoding = tiktoken.get_encoding(encoding_name)
    return len(encoding.encode(text))

def generate_ollama_response(messages):
    # Convert our message format to Ollama's expected format
    ollama_messages = [{"role": m["role"], "content": m["content"]} for m in messages]

    response = requests.post("http://localhost:11434/api/chat",
        json={
            "model": "llama3.2:1b",
            "messages": ollama_messages,
            "stream": False
        }
    )
    if response.status_code == 200:
        return response.json()["message"]["content"]
    else:
        raise Exception(f"Ollama error: {response.text}")

def generate_gemini_response(messages):
    # Convert our messages to Gemini format
    gemini_messages = []
    for message in messages:
        role = "user" if message["role"] == "user" else "model"
        gemini_messages.append(types.Content(role=role, parts=[types.Part.from_text(text=message["content"])]))

    response = client.models.generate_content(
        model="gemini-2.0-flash",
        contents=gemini_messages
    )
    return response.text

st.title("Chat App with Memory")
model_type = st.sidebar.selectbox("Model", ["Gemini", "Ollama"])

# Initialize conversation history in session state
if "messages" not in st.session_state:
    st.session_state.messages = []

# Display chat history
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.write(message["content"])

# Add context window size slider
context_window = st.sidebar.slider("Context Window Size", 1, 10, 5,
                                   help="Number of previous messages to include in context")

# Token counting display (counts the whole history, not only the window sent)
if st.session_state.messages:
    total_tokens = sum(count_tokens(msg["content"]) for msg in st.session_state.messages)
    st.sidebar.metric("Tokens Used", total_tokens)

    # Rough budgets for the progress bar, not exact model limits
    max_tokens = 4096 if model_type == "Ollama" else 32768
    st.sidebar.progress(min(1.0, total_tokens / max_tokens))
    st.sidebar.text(f"{model_type} context: ~{max_tokens} tokens")

user_input = st.chat_input("Type your message here...")
if user_input:
    # Add user message to history
    st.session_state.messages.append({"role": "user", "content": user_input})

    # Display user message
    with st.chat_message("user"):
        st.write(user_input)

    try:
        # Use only the most recent messages based on context window setting
        recent_messages = st.session_state.messages[-context_window:]

        # Generate response based on selected model
        if model_type == "Ollama":
            response = generate_ollama_response(recent_messages)
        else:
            response = generate_gemini_response(recent_messages)

        # Add assistant response to history
        st.session_state.messages.append({"role": "assistant", "content": response})

        # Display assistant response
        with st.chat_message("assistant"):
            st.write(response)
    except Exception as e:
        st.error(str(e))

# Add button to clear conversation history
if st.sidebar.button("Clear Conversation"):
    st.session_state.messages = []
    st.rerun()