---
slug: rl-tennis-atari-game
title: Reinforcement Learning for Tennis Strategy Optimization
type: Academic Project
description: An academic project exploring the application of reinforcement learning to optimize tennis strategies. The project involves training RL agents on Atari Tennis (ALE) to evaluate strategic decision-making through competitive self-play and baseline benchmarking.
shortDescription: Reinforcement learning algorithms applied to Atari tennis matches for strategy optimization and competitive benchmarking.
publishedAt: 2026-03-13
readingTime: 3
status: Completed
tags:
  - Reinforcement Learning
  - Python
  - Gymnasium
  - Atari
  - ALE
icon: i-ph-lightning-duotone
---

Comparison of Reinforcement Learning algorithms on Atari Tennis (ALE/Tennis-v5 via Gymnasium/PettingZoo).

::BackgroundTitle{title="Overview"}
::

This project implements and compares five RL agents playing Atari Tennis against the built-in AI and in head-to-head tournaments.

::BackgroundTitle{title="Algorithms"}
::

| Agent | Type | Policy | Update Rule |
|---|---|---|---|
| Random | Baseline | Uniform random | None |
| SARSA | TD(0), on-policy | ε-greedy | $W_a \leftarrow W_a + \alpha \cdot (r + \gamma \hat{q}(s', a') - \hat{q}(s, a)) \cdot \phi(s)$ |
| Q-Learning | TD(0), off-policy | ε-greedy | $W_a \leftarrow W_a + \alpha \cdot (r + \gamma \max_{a'} \hat{q}(s', a') - \hat{q}(s, a)) \cdot \phi(s)$ |
| Monte Carlo | First-visit MC | ε-greedy | $W_a \leftarrow W_a + \alpha \cdot (G_t - \hat{q}(s, a)) \cdot \phi(s)$ |
| DQN | Deep Q-Network | ε-greedy | MLP (256→256) with experience replay & target network |
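The three linear update rules differ only in their TD target. As a minimal NumPy sketch of the Q-learning case (the function name and step sizes here are illustrative, not the notebook's exact code):

```python
import numpy as np

N_FEATURES, N_ACTIONS = 128, 18  # RAM features, full Atari action set

def q_learning_update(W, s, a, r, s_next, alpha=0.01, gamma=0.99):
    """One off-policy TD(0) step on linear weights W (shape: actions x features).

    q_hat(s, a) = W[a] @ phi(s); here phi(s) is the raw feature vector itself.
    """
    q_sa = W[a] @ s
    td_target = r + gamma * np.max(W @ s_next)  # max over a' (off-policy)
    td_error = td_target - q_sa
    W[a] += alpha * td_error * s                # W_a <- W_a + alpha * delta * phi(s)
    return td_error

# usage on random feature vectors
rng = np.random.default_rng(0)
W = np.zeros((N_ACTIONS, N_FEATURES))
s, s_next = rng.random(N_FEATURES), rng.random(N_FEATURES)
delta = q_learning_update(W, s, a=3, r=1.0, s_next=s_next)
```

Swapping the `max` for the value of the next action actually taken gives SARSA; replacing the TD target with the episode return $G_t$ gives the Monte Carlo rule.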

::BackgroundTitle{title="Architecture"}
::

- Linear agents (SARSA, Q-Learning, Monte Carlo): $\hat{q}(s, a; \mathbf{W}) = \mathbf{W}_a^\top \phi(s)$ with $\phi(s) \in \mathbb{R}^{128}$ (RAM observation)
- DQN: MLP (128 → 128 → 64 → 18) trained with the Adam optimizer, Huber loss, and periodic target-network sync
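The DQN head described above can be sketched in PyTorch as follows; the class name and learning rate are illustrative assumptions, not the notebook's exact code:

```python
import torch
import torch.nn as nn

class DQNNet(nn.Module):
    """MLP mapping the 128-byte RAM observation to 18 action values."""
    def __init__(self, n_obs=128, n_actions=18):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_obs, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, x):
        return self.net(x)

# online and target networks; the target is synced periodically during training
online, target = DQNNet(), DQNNet()
target.load_state_dict(online.state_dict())

optimizer = torch.optim.Adam(online.parameters(), lr=1e-4)
loss_fn = nn.SmoothL1Loss()  # Huber loss

q_values = online(torch.rand(32, 128))  # forward pass on a batch of 32 RAM states
```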

::BackgroundTitle{title="Environment"}
::

- Game: Atari Tennis via PettingZoo (`tennis_v3`)
- Observation: RAM state (128 features)
- Action Space: 18 discrete actions
- Agents: 2 players (`first_0` and `second_0`)

::BackgroundTitle{title="Project Structure"}
::

.
├── Project_RL_DANJOU_VON-SIEMENS.ipynb    # Main notebook
├── README.md                              # This file
├── checkpoints/                           # Saved agent weights
│   ├── sarsa.pkl
│   ├── q_learning.pkl
│   ├── montecarlo.pkl
│   └── dqn.pkl
└── plots/                                 # Training & evaluation plots
    ├── SARSA_training_curves.png
    ├── Q-Learning_training_curves.png
    ├── MonteCarlo_training_curves.png
    ├── DQN_training_curves.png
    ├── evaluation_results.png
    └── championship_matrix.png

::BackgroundTitle{title="Key Results"}
::

Win Rate vs Random Baseline

| Agent | Win Rate |
|---|---|
| SARSA | 88.9% |
| Q-Learning | 41.2% |
| Monte Carlo | 47.1% |
| DQN | 6.2% |

Championship Tournament

Full round-robin tournament where each agent faces every other agent in both positions (first_0/second_0).
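The round-robin schedule (every agent against every other, in both seats) amounts to all ordered pairs; a small sketch, with illustrative agent names:

```python
from itertools import permutations

agents = ["Random", "SARSA", "Q-Learning", "MonteCarlo", "DQN"]

# ordered pairs: each agent meets every other once as first_0 and once as second_0
matchups = list(permutations(agents, 2))
print(len(matchups))  # 5 * 4 = 20 matches
```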

::BackgroundTitle{title="Notebook Sections"}
::

  1. Configuration & Checkpoints — Incremental training workflow with pickle serialization
  2. Utility Functions — Observation normalization, ε-greedy policy
  3. Agent Definitions — `RandomAgent`, `SarsaAgent`, `QLearningAgent`, `MonteCarloAgent`, `DQNAgent`
  4. Training Infrastructure — `train_agent()`, `plot_training_curves()`
  5. Evaluation — Match system, random baseline, round-robin tournament
  6. Results & Visualization — Win rate plots, matchup matrix heatmap
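The utilities in section 2 might look like the following sketch (function names and the scaling choice are assumptions, not the notebook's exact code):

```python
import numpy as np

def normalize_obs(ram):
    """Scale the 128-byte RAM observation to floats in [0, 1]."""
    return np.asarray(ram, dtype=np.float32) / 255.0

def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

# usage
rng = np.random.default_rng(0)
a = epsilon_greedy(np.array([0.1, 0.9, 0.3]), epsilon=0.0, rng=rng)  # greedy -> 1
```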

::BackgroundTitle{title="Known Issues"}
::

- Monte Carlo & DQN: Checkpoint loading issues — saved weights may not restore properly during evaluation (training works correctly)
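A common cause of this kind of restore failure is pickling whole agent objects, whose class definitions change between runs. One mitigation (a sketch under that assumption, with a hypothetical filename) is to checkpoint only plain data:

```python
import pickle
from pathlib import Path
import numpy as np

def save_checkpoint(path, weights, epsilon):
    """Persist plain data (arrays, floats) rather than the agent object itself."""
    with open(path, "wb") as f:
        pickle.dump({"weights": weights, "epsilon": epsilon}, f)

def load_checkpoint(path):
    with open(path, "rb") as f:
        return pickle.load(f)

# usage
path = Path("sarsa_demo.pkl")  # hypothetical filename
save_checkpoint(path, np.zeros((18, 128)), epsilon=0.1)
state = load_checkpoint(path)
```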

::BackgroundTitle{title="Dependencies"}
::

- Python 3.13+
- numpy, matplotlib
- torch
- gymnasium, ale-py
- pettingzoo
- tqdm

::BackgroundTitle{title="Authors"}
::

- Arthur DANJOU
- Moritz VON SIEMENS