---
slug: rl-tennis-atari-game
title: Reinforcement Learning for Tennis Strategy Optimization
type: Academic Project
description: An academic project exploring the application of reinforcement learning to optimize tennis strategies. The project involves training RL agents on Atari Tennis (ALE) to evaluate strategic decision-making through competitive self-play and baseline benchmarking.
shortDescription: Reinforcement learning algorithms applied to Atari tennis matches for strategy optimization and competitive benchmarking.
publishedAt: 2026-03-13
readingTime: 3
status: Completed
tags:
icon: i-ph-lightning-duotone
---
Comparison of Reinforcement Learning algorithms on Atari Tennis (ALE/Tennis-v5 via Gymnasium/PettingZoo).
- GitHub Repository: Tennis-Atari-Game
::BackgroundTitle{title="Overview"} ::
This project implements and compares five RL agents playing Atari Tennis against the built-in AI and in head-to-head tournaments.
::BackgroundTitle{title="Algorithms"} ::
| Agent | Type | Policy | Update Rule |
|---|---|---|---|
| Random | Baseline | Uniform random | None |
| SARSA | TD(0), on-policy | ε-greedy | $W_a \leftarrow W_a + \alpha \cdot (r + \gamma \hat{q}(s', a') - \hat{q}(s, a)) \cdot \phi(s)$ |
| Q-Learning | TD(0), off-policy | ε-greedy | $W_a \leftarrow W_a + \alpha \cdot (r + \gamma \max_{a'} \hat{q}(s', a') - \hat{q}(s, a)) \cdot \phi(s)$ |
| Monte Carlo | First-visit MC | ε-greedy | $W_a \leftarrow W_a + \alpha \cdot (G_t - \hat{q}(s, a)) \cdot \phi(s)$ |
| DQN | Deep Q-Network | ε-greedy | MLP (256→256) with experience replay & target network |
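The three linear update rules in the table differ only in the target they regress toward: the next action actually taken (SARSA), the greedy next action (Q-Learning), or the full observed return (Monte Carlo). A minimal numpy sketch of these updates, where the function names and the α, γ values are illustrative assumptions rather than the notebook's actual code:

```python
import numpy as np

N_FEATURES, N_ACTIONS = 128, 18  # RAM observation size, Atari action space

def q_hat(W, phi):
    """Linear action-value estimate: q(s, a) = W_a . phi(s), for every action at once."""
    return W @ phi  # shape (N_ACTIONS,)

def sarsa_update(W, phi, a, r, phi_next, a_next, alpha=0.01, gamma=0.99):
    """On-policy TD(0): bootstrap from the action a' actually taken next."""
    td_error = r + gamma * q_hat(W, phi_next)[a_next] - q_hat(W, phi)[a]
    W[a] += alpha * td_error * phi
    return W

def q_learning_update(W, phi, a, r, phi_next, alpha=0.01, gamma=0.99):
    """Off-policy TD(0): bootstrap from the greedy next action max_a' q(s', a')."""
    td_error = r + gamma * q_hat(W, phi_next).max() - q_hat(W, phi)[a]
    W[a] += alpha * td_error * phi
    return W

def monte_carlo_update(W, phi, a, G, alpha=0.01):
    """First-visit MC: regress toward the observed return G_t (no bootstrapping)."""
    W[a] += alpha * (G - q_hat(W, phi)[a]) * phi
    return W
```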
::BackgroundTitle{title="Architecture"} ::
- Linear agents (SARSA, Q-Learning, Monte Carlo): $\hat{q}(s, a; \mathbf{W}) = \mathbf{W}_a^\top \phi(s)$ with $\phi(s) \in \mathbb{R}^{128}$ (RAM observation)
- DQN: MLP network (128 → 128 → 64 → 18) trained with Adam optimizer, Huber loss, and periodic target network sync
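The DQN's two stabilizers, experience replay and a periodically synced target network, can be sketched independently of the torch model. The class, the capacity, and the sync interval below are illustrative assumptions, not values taken from the notebook:

```python
import random
from collections import deque

class ReplayBuffer:
    """Uniform experience replay: store transitions, sample decorrelated minibatches."""
    def __init__(self, capacity=50_000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted first

    def push(self, s, a, r, s_next, done):
        self.buffer.append((s, a, r, s_next, done))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

TARGET_SYNC_EVERY = 1_000  # gradient steps between hard target syncs (assumed value)

def maybe_sync_target(step, online_params, target_params):
    """Hard update: copy online weights into the target network every N steps."""
    if step % TARGET_SYNC_EVERY == 0:
        target_params.update(online_params)
    return target_params
```

Keeping the target network frozen between syncs decouples the bootstrap target from the weights being optimized, which reduces oscillation in the TD targets.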
::BackgroundTitle{title="Environment"} ::
- Game: Atari Tennis via PettingZoo (`tennis_v3`)
- Observation: RAM state (128 features)
- Action Space: 18 discrete actions
- Agents: 2 players (`first_0` and `second_0`)
::BackgroundTitle{title="Project Structure"} ::
```
.
├── Project_RL_DANJOU_VON-SIEMENS.ipynb   # Main notebook
├── README.md                             # This file
├── checkpoints/                          # Saved agent weights
│   ├── sarsa.pkl
│   ├── q_learning.pkl
│   ├── montecarlo.pkl
│   └── dqn.pkl
└── plots/                                # Training & evaluation plots
    ├── SARSA_training_curves.png
    ├── Q-Learning_training_curves.png
    ├── MonteCarlo_training_curves.png
    ├── DQN_training_curves.png
    ├── evaluation_results.png
    └── championship_matrix.png
```
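The `checkpoints/` directory holds pickled agent weights so training can resume incrementally. A minimal save/load sketch, assuming the helper names and the weights-as-dict layout rather than the notebook's exact code:

```python
import pickle
from pathlib import Path

def save_checkpoint(name, weights, ckpt_dir="checkpoints"):
    """Serialize agent weights to <ckpt_dir>/<name>.pkl."""
    ckpt_dir = Path(ckpt_dir)
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    with open(ckpt_dir / f"{name}.pkl", "wb") as f:
        pickle.dump(weights, f)

def load_checkpoint(name, ckpt_dir="checkpoints"):
    """Restore weights from a checkpoint, or return None if none exists yet."""
    path = Path(ckpt_dir) / f"{name}.pkl"
    if not path.exists():
        return None
    with open(path, "rb") as f:
        return pickle.load(f)
```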
::BackgroundTitle{title="Key Results"} ::
**Win Rate vs Random Baseline**
| Agent | Win Rate |
|---|---|
| SARSA | 88.9% |
| Q-Learning | 41.2% |
| Monte Carlo | 47.1% |
| DQN | 6.2% |
**Championship Tournament**

Full round-robin tournament where each agent faces every other agent in both positions (`first_0`/`second_0`).
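Playing every ordered pair covers both seats, so any first-mover or side advantage cancels out across the matrix. A sketch of the fixture generation (the function name is an assumption):

```python
from itertools import permutations

AGENTS = ["Random", "SARSA", "Q-Learning", "Monte Carlo", "DQN"]

def round_robin_fixtures(agents):
    """Ordered pairs: each agent meets every other agent twice,
    once as first_0 and once as second_0."""
    return list(permutations(agents, 2))

fixtures = round_robin_fixtures(AGENTS)  # 5 agents -> 5 * 4 = 20 matches
```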
::BackgroundTitle{title="Notebook Sections"} ::
- Configuration & Checkpoints — Incremental training workflow with pickle serialization
- Utility Functions — Observation normalization, ε-greedy policy
- Agent Definitions — `RandomAgent`, `SarsaAgent`, `QLearningAgent`, `MonteCarloAgent`, `DQNAgent`
- Training Infrastructure — `train_agent()`, `plot_training_curves()`
- Evaluation — Match system, random baseline, round-robin tournament
- Results & Visualization — Win rate plots, matchup matrix heatmap
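The two utility functions listed above (observation normalization and the ε-greedy policy) can be sketched in a few lines; the function names here are illustrative, not the notebook's:

```python
import numpy as np

def normalize_obs(obs):
    """Scale raw Atari RAM bytes (0..255) into [0, 1] features."""
    return np.asarray(obs, dtype=np.float64) / 255.0

def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a uniformly random action, else the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))
```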
::BackgroundTitle{title="Known Issues"} ::
- Monte Carlo & DQN: Checkpoint loading issues — saved weights may not restore properly during evaluation (training works correctly)
::BackgroundTitle{title="Dependencies"} ::
- Python 3.13+
- `numpy`, `matplotlib`
- `torch`
- `gymnasium`, `ale-py`
- `pettingzoo`
- `tqdm`
::BackgroundTitle{title="Authors"} ::
- Arthur DANJOU
- Moritz VON SIEMENS