---
slug: rl-tennis-atari-game
title: Reinforcement Learning for Tennis Strategy Optimization
type: Academic Project
description: An academic project exploring the application of reinforcement learning to optimize tennis strategies. The project involves training RL agents on Atari Tennis (ALE) to evaluate strategic decision-making through competitive self-play and baseline benchmarking.
shortDescription: Reinforcement learning algorithms applied to Atari tennis matches for strategy optimization and competitive benchmarking.
publishedAt: 2026-03-13
readingTime: 3
status: Completed
tags:
  - Reinforcement Learning
  - Python
  - Gymnasium
  - Atari
  - ALE
icon: i-ph-lightning-duotone
---

Comparison of Reinforcement Learning algorithms on Atari Tennis (ALE/Tennis-v5 via Gymnasium/PettingZoo).

::BackgroundTitle{title="Overview"}
::

This project implements and compares five RL agents playing Atari Tennis against the built-in AI and in head-to-head tournaments.

::BackgroundTitle{title="Algorithms"}
::

| Agent | Type | Policy | Update Rule |
|---|---|---|---|
| Random | Baseline | Uniform random | None |
| SARSA | TD(0), on-policy | ε-greedy | $W_a \leftarrow W_a + \alpha \cdot (r + \gamma \hat{q}(s', a') - \hat{q}(s, a)) \cdot \phi(s)$ |
| Q-Learning | TD(0), off-policy | ε-greedy | $W_a \leftarrow W_a + \alpha \cdot (r + \gamma \max_{a'} \hat{q}(s', a') - \hat{q}(s, a)) \cdot \phi(s)$ |
| Monte Carlo | First-visit MC | ε-greedy | $W_a \leftarrow W_a + \alpha \cdot (G_t - \hat{q}(s, a)) \cdot \phi(s)$ |
| DQN | Deep Q-Network | ε-greedy | MLP (256→256) with experience replay & target network |
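The three linear update rules differ only in their TD target. As a minimal NumPy sketch of the Q-learning case (the function name and step sizes here are illustrative, not the notebook's exact code):

```python
import numpy as np

N_FEATURES, N_ACTIONS = 128, 18  # RAM features, full Atari action set

def q_learning_update(W, s, a, r, s_next, alpha=0.01, gamma=0.99):
    """One off-policy TD(0) step on linear weights W (shape: actions x features).

    q_hat(s, a) = W[a] @ phi(s); here phi(s) is the raw feature vector itself.
    """
    q_sa = W[a] @ s
    td_target = r + gamma * np.max(W @ s_next)  # max over a' (off-policy)
    td_error = td_target - q_sa
    W[a] += alpha * td_error * s                # W_a <- W_a + alpha * delta * phi(s)
    return td_error

# usage on random feature vectors
rng = np.random.default_rng(0)
W = np.zeros((N_ACTIONS, N_FEATURES))
s, s_next = rng.random(N_FEATURES), rng.random(N_FEATURES)
delta = q_learning_update(W, s, a=3, r=1.0, s_next=s_next)
```

Swapping the `max` for the value of the next action actually taken gives SARSA; replacing the TD target with the episode return $G_t$ gives the Monte Carlo rule.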

::BackgroundTitle{title="Architecture"}
::

- Linear agents (SARSA, Q-Learning, Monte Carlo): $\hat{q}(s, a; \mathbf{W}) = \mathbf{W}_a^\top \phi(s)$ with $\phi(s) \in \mathbb{R}^{128}$ (RAM observation)
- DQN: MLP (128 → 128 → 64 → 18) trained with the Adam optimizer, Huber loss, and periodic target-network sync
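The DQN head described above can be sketched in PyTorch as follows; the class name and learning rate are illustrative assumptions, not the notebook's exact code:

```python
import torch
import torch.nn as nn

class DQNNet(nn.Module):
    """MLP mapping the 128-byte RAM observation to 18 action values."""
    def __init__(self, n_obs=128, n_actions=18):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_obs, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, x):
        return self.net(x)

# online and target networks; the target is synced periodically during training
online, target = DQNNet(), DQNNet()
target.load_state_dict(online.state_dict())

optimizer = torch.optim.Adam(online.parameters(), lr=1e-4)
loss_fn = nn.SmoothL1Loss()  # Huber loss

q_values = online(torch.rand(32, 128))  # forward pass on a batch of 32 RAM states
```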

::BackgroundTitle{title="Environment"}
::

- Game: Atari Tennis via PettingZoo (`tennis_v3`)
- Observation: RAM state (128 features)
- Action Space: 18 discrete actions
- Agents: 2 players (`first_0` and `second_0`)

::BackgroundTitle{title="Project Structure"}
::

.
├── Project_RL_DANJOU_VON-SIEMENS.ipynb    # Main notebook
├── README.md                              # This file
├── checkpoints/                           # Saved agent weights
│   ├── sarsa.pkl
│   ├── q_learning.pkl
│   ├── montecarlo.pkl
│   └── dqn.pkl
└── plots/                                 # Training & evaluation plots
    ├── SARSA_training_curves.png
    ├── Q-Learning_training_curves.png
    ├── MonteCarlo_training_curves.png
    ├── DQN_training_curves.png
    ├── evaluation_results.png
    └── championship_matrix.png

::BackgroundTitle{title="Key Results"}
::

Win Rate vs Random Baseline

| Agent | Win Rate |
|---|---|
| SARSA | 88.9% |
| Q-Learning | 41.2% |
| Monte Carlo | 47.1% |
| DQN | 6.2% |

Championship Tournament

Full round-robin tournament where each agent faces every other agent in both positions (first_0/second_0).
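The round-robin schedule (every agent against every other, in both seats) amounts to all ordered pairs; a small sketch, with illustrative agent names:

```python
from itertools import permutations

agents = ["Random", "SARSA", "Q-Learning", "MonteCarlo", "DQN"]

# ordered pairs: each agent meets every other once as first_0 and once as second_0
matchups = list(permutations(agents, 2))
print(len(matchups))  # 5 * 4 = 20 matches
```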

::BackgroundTitle{title="Notebook Sections"}
::

  1. Configuration & Checkpoints — Incremental training workflow with pickle serialization
  2. Utility Functions — Observation normalization, ε-greedy policy
  3. Agent Definitions — `RandomAgent`, `SarsaAgent`, `QLearningAgent`, `MonteCarloAgent`, `DQNAgent`
  4. Training Infrastructure — `train_agent()`, `plot_training_curves()`
  5. Evaluation — Match system, random baseline, round-robin tournament
  6. Results & Visualization — Win rate plots, matchup matrix heatmap
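The utilities in section 2 might look like the following sketch (function names and the scaling choice are assumptions, not the notebook's exact code):

```python
import numpy as np

def normalize_obs(ram):
    """Scale the 128-byte RAM observation to floats in [0, 1]."""
    return np.asarray(ram, dtype=np.float32) / 255.0

def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

# usage
rng = np.random.default_rng(0)
a = epsilon_greedy(np.array([0.1, 0.9, 0.3]), epsilon=0.0, rng=rng)  # greedy -> 1
```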

::BackgroundTitle{title="Known Issues"}
::

- Monte Carlo & DQN: Checkpoint loading issues — saved weights may not restore properly during evaluation (training works correctly)
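A common cause of this kind of restore failure is pickling whole agent objects, whose class definitions change between runs. One mitigation (a sketch under that assumption, with a hypothetical filename) is to checkpoint only plain data:

```python
import pickle
from pathlib import Path
import numpy as np

def save_checkpoint(path, weights, epsilon):
    """Persist plain data (arrays, floats) rather than the agent object itself."""
    with open(path, "wb") as f:
        pickle.dump({"weights": weights, "epsilon": epsilon}, f)

def load_checkpoint(path):
    with open(path, "rb") as f:
        return pickle.load(f)

# usage
path = Path("sarsa_demo.pkl")  # hypothetical filename
save_checkpoint(path, np.zeros((18, 128)), epsilon=0.1)
state = load_checkpoint(path)
```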

::BackgroundTitle{title="Dependencies"}
::

- Python 3.13+
- numpy, matplotlib
- torch
- gymnasium, ale-py
- pettingzoo
- tqdm

::BackgroundTitle{title="Authors"}
::

- Arthur DANJOU
- Moritz VON SIEMENS