I'll guide you through creating a complete Snake game implementation and then training a Reinforcement Learning (RL) agent to play it. We'll start with the game implementation using PyGame, then implement Q-learning to train an AI agent to play Snake autonomously. The approach will be step-by-step, ensuring you understand both game development and RL concepts.
Keywords: Snake game, Reinforcement Learning, Q-learning, PyGame, Markov Decision Process, Bellman equation, epsilon-greedy, state representation, reward function, neural networks, deep Q-learning.
Creating a Snake game and training an RL agent to play it is an excellent project that combines game development with artificial intelligence. This project will teach you fundamental concepts in both domains: game loops, collision detection, state representation, reward design, and RL algorithms. By the end, you'll have a fully functional Snake game and an AI that learns to play it through trial and error, just like humans do!
The beauty of this project is that it demonstrates how machines can learn complex behaviors through simple reward mechanisms. The Snake game provides a perfect environment for RL because it has clear rules, discrete states (in our simplified version), and immediate feedback through rewards (eating food) and penalties (dying).
Before implementing RL, we need a solid game environment. The Snake game follows these basic principles:
- Game Loop: Continuous cycle of processing input, updating game state, and rendering graphics
- Collision Detection: Checking if the snake hits walls, itself, or food
- State Management: Tracking snake position, direction, food location, and score
- Rendering: Displaying game elements on screen
Reference: Rogers, D. (2010). Mathematics for Game Developers. Course Technology Press.
Reinforcement Learning is a machine learning paradigm where an agent learns to make decisions by interacting with an environment:
- Markov Decision Process (MDP): Formal framework for RL with states (S), actions (A), transitions (P), and rewards (R)
- Q-learning: Model-free RL algorithm that learns action-value function Q(s,a)
- Bellman Equation: Foundation for temporal difference learning:
  $Q(s,a) \leftarrow Q(s,a) + \alpha[r + \gamma \max_{a'} Q(s',a') - Q(s,a)]$
- Exploration vs Exploitation: Balancing trying new actions (exploration) with using known good actions (exploitation)
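Both ideas translate directly into code. Here is a minimal sketch, assuming integer state and action indices and a NumPy array as the Q-table (the names `epsilon_greedy` and `q_update` are illustrative, not from a library):

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded for reproducibility

def epsilon_greedy(q_table, state, epsilon, n_actions):
    """Explore with probability epsilon, otherwise exploit the best-known action."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))   # exploration: random action
    return int(np.argmax(q_table[state]))     # exploitation: greedy action

def q_update(q_table, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One Bellman update: move Q(s,a) toward r + gamma * max_a' Q(s',a')."""
    td_target = r + gamma * np.max(q_table[s_next])
    q_table[s, a] += alpha * (td_target - q_table[s, a])

q = np.zeros((4, 4))       # 4 states x 4 actions, all zeros
q_update(q, 0, 1, 1.0, 2)  # take action 1 in state 0, reward +1, land in state 2
print(q[0, 1])             # 0.1
```

Note how the update only nudges the old estimate by a fraction `alpha` of the temporal difference, rather than replacing it outright.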
References:
- Sutton, R. S., & Barto, A. G. (2018). Reinforcement Learning: An Introduction. MIT Press.
- Watkins, C. J. C. H., & Dayan, P. (1992). Q-learning. Machine Learning, 8(3-4), 279-292.
For RL to work effectively, we need to represent the game state in a way the agent can understand:
- Grid-based representation: Divide game area into discrete cells
- Feature extraction: Extract relevant information about danger, food direction, etc.
- Image-based representation: Use raw pixels as input (more advanced)
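As a concrete illustration of feature extraction, here is a hedged sketch of a compact state: four danger flags plus the food direction relative to the snake's head. The coordinate convention (y grows downward) and grid bounds are assumptions for illustration:

```python
def get_state(head, snake, food, grid_w, grid_h):
    """Return a tuple of binary features: danger in each direction + food direction."""
    x, y = head

    def blocked(cell):
        # A cell is dangerous if it is outside the grid or occupied by the snake
        cx, cy = cell
        return cx < 0 or cx >= grid_w or cy < 0 or cy >= grid_h or cell in snake

    return (
        int(blocked((x - 1, y))),  # danger_left
        int(blocked((x + 1, y))),  # danger_right
        int(blocked((x, y - 1))),  # danger_up
        int(blocked((x, y + 1))),  # danger_down
        int(food[0] < x),          # food_left
        int(food[1] < y),          # food_up
    )

# Head at the center of a 3x3 grid, food in the top-left corner
print(get_state((1, 1), [(1, 1)], (0, 0), 3, 3))  # (0, 0, 0, 0, 1, 1)
```

With six binary features the state space has only $2^6 = 64$ entries, which is what keeps a tabular Q-table feasible.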
Reference: Mnih, V., et al. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529-533.
Method 1: Tabular Q-learning
- Approach: Discretize game state into finite grid positions, use a Q-table
- Pros: Simple to implement, easy to understand, guaranteed convergence (under standard conditions)
- Cons: State space explosion for larger grids, doesn't scale well

Method 2: Deep Q-Network (DQN)
- Approach: Use a neural network to approximate the Q-function from game pixels or features
- Pros: Handles high-dimensional state spaces, can learn complex patterns
- Cons: Requires more computational resources, can be unstable during training

Method 3: Policy Gradient
- Approach: Directly learn the policy without a value function
- Pros: Better for continuous action spaces, can learn stochastic policies
- Cons: Higher variance, slower learning

Method 4: DQN with Target Network and Experience Replay
- Approach: Advanced DQN with a separate target network and a replay memory buffer
- Pros: More stable training, reduces overestimation bias
- Cons: More complex implementation
For beginners, I recommend starting with Method 1: Tabular Q-learning because:
- It's the most educational - you can see exactly how Q-values update
- It works well for small grid sizes
- It demonstrates core RL concepts without neural network complexities
- Once mastered, you can graduate to more advanced methods
However, for better performance and scalability, Method 2: Deep Q-Network is superior for larger games. We'll implement the simpler Q-learning first for understanding, then discuss how to extend it to DQN.
The core of Q-learning is the Bellman equation update:

$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha[r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t)]$

Let's break down each term:
- $Q(s_t, a_t)$: Current Q-value for state $s_t$ and action $a_t$. Represents the expected cumulative reward from taking action $a_t$ in state $s_t$.
- $\alpha$: Learning rate ($0 < \alpha \leq 1$). Controls how much new information overrides old information. Example: $\alpha = 0.1$ means we update the Q-value by 10% toward the new estimate.
- $r_{t+1}$: Immediate reward received after taking action $a_t$. In Snake: +10 for eating food, -10 for dying, -0.1 for each move to encourage efficiency.
- $\gamma$: Discount factor ($0 \leq \gamma < 1$). Determines the importance of future rewards. Example: $\gamma = 0.9$ means future rewards are worth 90% of immediate rewards.
- $\max_{a} Q(s_{t+1}, a)$: Maximum Q-value for the next state $s_{t+1}$. Estimates the best possible future reward from the next state.
- $r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t)$: Temporal difference error. The difference between the new estimate and the current Q-value.
- Initialize Q-table with zeros for all state-action pairs
- For each episode:
  - Initialize game state
  - While game not over:
    - Choose action using epsilon-greedy policy
    - Execute action, observe reward and next state
    - Update Q-table using Bellman equation
    - Update current state
- Repeat for many episodes until convergence
Let's walk through a simple example with a 2x2 grid:
- States: 4 possible positions (0,0), (0,1), (1,0), (1,1)
- Actions: Up, Down, Left, Right
- Initial Q-table: All zeros
- Parameters: $\alpha = 0.1$, $\gamma = 0.9$, $\epsilon = 0.1$
If in state (0,0) we take Right, get reward +1, and move to (0,1):
- Old Q((0,0), Right) = 0
- max Q for (0,1) = 0 (initially)
- New Q = 0 + 0.1 * [1 + 0.9*0 - 0] = 0.1
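Checking that arithmetic in plain Python, mirroring the numbers above:

```python
# State (0,0), action Right, reward +1, next state (0,1); Q-table starts at zero.
alpha, gamma = 0.1, 0.9
old_q = 0.0
reward = 1.0
max_q_next = 0.0  # max over actions in (0,1), still all zeros initially

new_q = old_q + alpha * (reward + gamma * max_q_next - old_q)
print(new_q)  # 0.1
```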
We need to create two main components:
- Snake Game Environment: A fully playable Snake game with:
- Snake that grows when eating food
- Collision detection with walls and itself
- Score tracking
- Visual display
- RL Agent: An AI that learns to play Snake by:
- Observing game state
- Choosing actions (up, down, left, right)
- Receiving rewards
- Updating its policy based on experience
The challenge is designing a state representation that captures enough information for learning while keeping the state space manageable.
We've already compared methods and selected tabular Q-learning for initial implementation.
1. Initialize Q-table with dimensions [num_states, num_actions]
2. Set hyperparameters: alpha, gamma, epsilon, epsilon_decay, min_epsilon
3. For episode = 1 to max_episodes:
4. Reset game environment
5. Get initial state s
6. While game not over:
7. With probability epsilon: choose random action a
Else: choose a = argmax(Q[s, :])
8. Execute action a, get reward r and next state s'
9. Update Q[s, a] = Q[s, a] + alpha * (r + gamma * max(Q[s', :]) - Q[s, a])
10. s = s'
11. Decrease epsilon
12. Log episode score and length
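The pseudocode above maps to a short, generic training loop. The sketch below exercises it on a toy one-dimensional environment instead of Snake; `TinyAgent` and `WalkEnv` are illustrative stand-ins that assume the env/agent interfaces described in this section:

```python
import numpy as np

def train_agent(agent, env, episodes):
    """Steps 3-12 of the pseudocode: run episodes, learn, decay epsilon, log scores."""
    scores = []
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            action = agent.choose_action(state)
            next_state, reward, done = env.step(action)
            agent.learn(state, action, reward, next_state, done)
            state = next_state
        agent.decay_epsilon()
        scores.append(env.score)
    return scores

class TinyAgent:
    """Minimal tabular Q-learning agent (illustrative stand-in)."""
    def __init__(self, n_states, n_actions, alpha=0.5, gamma=0.9):
        self.q = np.zeros((n_states, n_actions))
        self.alpha, self.gamma = alpha, gamma
        self.epsilon, self.min_epsilon = 1.0, 0.05
        self.rng = np.random.default_rng(0)

    def choose_action(self, s):
        if self.rng.random() < self.epsilon:
            return int(self.rng.integers(self.q.shape[1]))
        return int(np.argmax(self.q[s]))

    def learn(self, s, a, r, s2, done):
        target = r if done else r + self.gamma * np.max(self.q[s2])
        self.q[s, a] += self.alpha * (target - self.q[s, a])

    def decay_epsilon(self):
        self.epsilon = max(self.min_epsilon, self.epsilon * 0.99)

class WalkEnv:
    """Toy stand-in for Snake: reach cell 4 from cell 2 on a 5-cell line."""
    def reset(self):
        self.pos, self.score = 2, 0
        return self.pos

    def step(self, action):
        self.pos += 1 if action == 1 else -1  # 1 = right, 0 = left
        if self.pos == 4:
            self.score = 1
            return self.pos, 10.0, True       # "ate the food"
        if self.pos == 0:
            return self.pos, -10.0, True      # "crashed"
        return self.pos, -0.1, False          # step penalty

agent, env = TinyAgent(5, 2), WalkEnv()
train_agent(agent, env, 500)
print(int(np.argmax(agent.q[2])))  # the agent learns to head right from the start cell
```

The same `train_agent` function works unchanged once the Snake environment exposes `reset()` and `step(action)`.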
1. Initialize pygame, clock, display
2. Create snake with initial position and direction
3. Place food at random position
4. While game running:
5. Handle user input (for manual play)
6. Update snake position based on direction
7. Check collisions:
- If snake hits wall or itself: game over
- If snake head at food position: grow snake, increase score, place new food
8. Draw all game elements
9. Update display, control frame rate
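The movement and collision logic (steps 6-7) can be tested without any rendering. This headless sketch uses the direction encoding from the game attributes (0: up, 1: right, 2: down, 3: left); the grid size and the return convention are illustrative assumptions:

```python
DIRS = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}  # up, right, down, left

def step_snake(snake, direction, food, grid_w, grid_h):
    """Move the head one cell; return (snake, ate_food, game_over)."""
    dx, dy = DIRS[direction]
    head = (snake[0][0] + dx, snake[0][1] + dy)
    hit_wall = not (0 <= head[0] < grid_w and 0 <= head[1] < grid_h)
    # Simplification: `head in snake` also flags the tail cell that is about
    # to move away; a fuller implementation would exclude it.
    if hit_wall or head in snake:
        return snake, False, True
    new_snake = [head] + snake
    if head == food:
        return new_snake, True, False     # grow: keep the tail
    return new_snake[:-1], False, False   # no food: drop the tail

# Head at (1,1) on a 3x3 grid, food directly above: moving up eats it
snake, ate, over = step_snake([(1, 1)], 0, (1, 0), 3, 3)
print(snake, ate, over)  # [(1, 0), (1, 1)] True False
```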
Let's trace through a simplified example:
Initial Setup:
- Grid: 3x3
- Snake: [(1,1)] (head at center)
- Food: (0,0)
- Q-table: All zeros
- State: Represented as danger directions and food direction
Step 1: State s = [danger_left=0, danger_right=0, danger_up=0, danger_down=0, food_left=1, food_up=1]
- Food is up and left from snake
Step 2: Choose action (epsilon-greedy with epsilon=0.1)
- 90% chance: choose the action with max Q (all zeros initially, so ties are broken randomly)
- Let's say we choose Up
Step 3: Execute Up
- Snake moves to (1,0)
- No food eaten, so reward = -0.1
- New state s' = [danger_left=0, danger_right=0, danger_up=1 (wall), danger_down=0, food_left=1, food_up=0]
Step 4: Update Q-value
- Old Q[s, Up] = 0
- max Q[s', :] = 0
- New Q = 0 + 0.1 * (-0.1 + 0.9*0 - 0) = -0.01
This shows how the agent learns that moving Up in the initial state gives a small negative reward.
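The same four steps can be replayed in code. Here the Q-table is a plain dict keyed by (state, action); the state tuples and the -0.1 step penalty follow the trace above:

```python
alpha, gamma = 0.1, 0.9
actions = ("Up", "Down", "Left", "Right")
q = {}  # Q-table as a dict: (state, action) -> value, missing entries read as 0

s = (0, 0, 0, 0, 1, 1)        # Step 1: no danger, food up and left
a = "Up"                      # Step 2: chosen action
r = -0.1                      # Step 3: moved, no food eaten
s_next = (0, 0, 1, 0, 1, 0)   # new state: wall above, food no longer up

# Step 4: Bellman update (max over next-state actions is 0, table is empty)
max_next = max(q.get((s_next, b), 0.0) for b in actions)
q[(s, a)] = q.get((s, a), 0.0) + alpha * (r + gamma * max_next - q.get((s, a), 0.0))
print(round(q[(s, a)], 4))  # -0.01
```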
- grid_size: Tuple (width, height) in cells
- cell_size: Pixel size of each cell
- snake: List of (x,y) positions
- food: (x,y) position of food
- direction: Current moving direction (0: up, 1: right, 2: down, 3: left)
- score: Current score
- game_over: Boolean flag
- clock: PyGame clock for frame rate control
- screen: PyGame display surface
- q_table: Numpy array of shape [num_states, num_actions]
- alpha: Learning rate
- gamma: Discount factor
- epsilon: Exploration rate
- epsilon_decay: Rate at which epsilon decreases
- min_epsilon: Minimum exploration rate
- state: Current state representation
- action_space: List of possible actions
- total_rewards: List of rewards per episode
- scores: List of scores per episode
Role: Manages the game environment
Attributes:
- grid_width, grid_height: Grid dimensions
- cell_size: Size of each cell in pixels
- snake: List of (x,y) positions
- food: (x,y) position
- direction: Current direction (0-3)
- score: Current score
- game_over: Boolean
- screen: PyGame display
- clock: PyGame clock
Methods:
- __init__(width, height, cell_size): Initialize game
- reset(): Reset game to initial state
- get_state(): Return current state as RL-friendly representation
- step(action): Execute action, return (next_state, reward, done)
- move(): Update snake position
- check_collision(): Check wall/self collisions
- check_food(): Check if food eaten
- place_food(): Place food at random empty position
- draw(): Render game to screen
- get_game_info(): Return score and snake length
Role: Learns to play Snake using Q-learning
Attributes:
- q_table: Q-value table
- alpha: Learning rate
- gamma: Discount factor
- epsilon: Exploration rate
- epsilon_decay: Decay rate
- min_epsilon: Minimum epsilon
- action_space: Possible actions
- state_size: Number of possible states
- action_size: Number of possible actions
Methods:
- __init__(state_size, action_size, alpha, gamma, epsilon): Initialize agent
- get_state_index(state): Convert state to table index
- choose_action(state): Epsilon-greedy action selection
- learn(state, action, reward, next_state, done): Update Q-table
- decay_epsilon(): Reduce exploration rate
- save_model(path): Save Q-table to file
- load_model(path): Load Q-table from file
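A sketch of this agent class, assuming the state is a tuple of binary features packed into a base-2 table index (one plausible `get_state_index` scheme; save/load are omitted for brevity):

```python
import numpy as np

class QLearningAgent:
    def __init__(self, state_size, action_size, alpha=0.1, gamma=0.9, epsilon=1.0,
                 epsilon_decay=0.995, min_epsilon=0.01):
        self.q_table = np.zeros((state_size, action_size))
        self.alpha, self.gamma = alpha, gamma
        self.epsilon, self.epsilon_decay, self.min_epsilon = epsilon, epsilon_decay, min_epsilon
        self.action_size = action_size
        self.rng = np.random.default_rng()

    def get_state_index(self, state):
        # Interpret a tuple of binary features as a base-2 number
        idx = 0
        for bit in state:
            idx = idx * 2 + int(bit)
        return idx

    def choose_action(self, state):
        s = self.get_state_index(state)
        if self.rng.random() < self.epsilon:
            return int(self.rng.integers(self.action_size))  # explore
        return int(np.argmax(self.q_table[s]))               # exploit

    def learn(self, state, action, reward, next_state, done):
        s, s2 = self.get_state_index(state), self.get_state_index(next_state)
        target = reward if done else reward + self.gamma * np.max(self.q_table[s2])
        self.q_table[s, action] += self.alpha * (target - self.q_table[s, action])

    def decay_epsilon(self):
        self.epsilon = max(self.min_epsilon, self.epsilon * self.epsilon_decay)

# One update with reward +10 (ate food): Q moves 10% of the way toward 10
agent = QLearningAgent(state_size=2**6, action_size=4, epsilon=0.0)
agent.learn((0, 0, 0, 0, 1, 1), 0, 10.0, (0, 0, 1, 0, 1, 0), False)
print(round(agent.q_table[agent.get_state_index((0, 0, 0, 0, 1, 1)), 0], 2))  # 1.0
```

Note that `learn` uses the `done` flag to drop the bootstrap term on terminal steps, so dying doesn't pull in a spurious future-reward estimate.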
- train_agent(agent, env, episodes): Train agent for specified episodes
- test_agent(agent, env, episodes): Test trained agent
- plot_training_results(scores, rewards): Visualize training progress
- play_human(env): Allow human to play game
- PyGame: Game development library
  - Role: Handles graphics, input, and game loop
  - Utility: Creates game window, draws shapes, handles keyboard input
- NumPy: Numerical computing
  - Role: Efficient array operations
  - Utility: Q-table storage and manipulation, state representation
- Matplotlib: Plotting library
  - Role: Data visualization
  - Utility: Plot training progress, scores, and rewards
- Random: Python standard library
  - Role: Random number generation
  - Utility: Food placement, epsilon-greedy exploration
- Time: Python standard library
  - Role: Time measurement
  - Utility: Frame rate control, training duration measurement
