Mirror of https://github.com/ArthurDanjou/artsite.git, synced 2026-03-16 05:09:46 +01:00
Refactor project documentation and structure
- Updated the data visualization project documentation to remove the warning about incomplete documentation.
- Deleted the glm-financial-assets project file and replaced it with the glm-implied-volatility project file, detailing a comprehensive study on implied volatility prediction using GLMs and machine learning.
- Marked the n8n automations project as completed.
- Added a new project on reinforcement learning applied to Atari Tennis, detailing agent comparisons and results.
- Removed the outdated rl-tennis project file.
- Updated package dependencies in package.json for improved stability and performance.
@@ -15,10 +15,6 @@ tags:
 icon: i-ph-chart-bar-duotone
 ---
 
-::warning
-The project is complete, but the documentation is still being expanded with more details.
-::
-
 This project involves building an interactive data visualization application using R and R Shiny. The goal is to deliver dynamic, explorable visualizations that let users interact with the data in meaningful ways.
 
 ::BackgroundTitle{title="Technologies & Tools"}
Deleted file (71 lines):
@@ -1,71 +0,0 @@
---
slug: implied-volatility-modeling
title: Implied Volatility Surface Modeling
type: Academic Project
description: A large-scale statistical study comparing Generalized Linear Models (GLMs) and black-box machine learning architectures to predict the implied volatility of S&P 500 options.
shortDescription: Predicting the SPX volatility surface using GLMs and black-box models on 1.2 million observations.
publishedAt: 2026-02-28
readingTime: 3
status: In progress
tags:
- R
- GLM
- Finance
- Machine Learning
icon: i-ph-graph-duotone
---

This project targets high-precision calibration of the **Implied Volatility Surface** using a large-scale dataset of S&P 500 (SPX) European options.

The core objective is to stress-test classic statistical models against modern predictive algorithms. **Generalized Linear Models (GLMs)** provide a transparent baseline, while more complex "black-box" architectures are evaluated on whether their accuracy gains justify reduced interpretability in a risk management context.

::BackgroundTitle{title="Dataset & Scale"}
::

The modeling is performed on a high-dimensional dataset with over **1.2 million observations**.

- **Target Variable**: `implied_vol_ref` (implied volatility).
- **Features**: Option strike price ($K$), underlying asset price ($S$), and time to maturity ($\tau$).
- **Volume**: A training set of $1,251,307$ rows and a test set of identical size.

::BackgroundTitle{title="Modeling Methodology"}
::

The project follows a rigorous statistical pipeline to compare two modeling philosophies:

### 1. The Statistical Baseline (GLM)
Using R's GLM framework, I implement models with targeted link functions and error distributions (such as **Gamma** or **Inverse Gaussian**) to capture the global structure of the volatility surface. These models serve as the benchmark for transparency and stability.

### 2. The Black-Box Challenge
To capture local non-linearities such as the volatility smile and skew, I explore more complex architectures. Performance is evaluated by **Root Mean Squared Error (RMSE)** relative to the GLM baselines.

### 3. Feature Engineering
Key financial indicators are derived from the raw data:
- **Moneyness**: Calculated as the ratio $K/S$.
- **Temporal Dynamics**: Transformations of time to maturity to linearize the term structure.

::BackgroundTitle{title="Evaluation & Reproducibility"}
::

Performance is measured strictly via RMSE on the original scale of the target variable. To ensure reproducibility and precise comparisons across model iterations, a fixed random seed is maintained throughout the workflow.

```r
set.seed(2025)

TrainData <- read.csv("train_ISF.csv", stringsAsFactors = FALSE)
TestX <- read.csv("test_ISF.csv", stringsAsFactors = FALSE)

rmse_eval <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}
```

::BackgroundTitle{title="Critical Analysis"}
::

Beyond pure prediction, the project addresses:

- Model Limits: Identifying market regimes where models fail (e.g., deep out-of-the-money options).
- Interpretability: Quantifying the trade-off between complexity and practical utility in a risk management context.
- Future Extensions: Considering richer dynamics, such as historical volatility or skew-specific targets.
content/projects/glm-implied-volatility.md (new file, 336 lines)
@@ -0,0 +1,336 @@
---
slug: implied-volatility-prediction-from-options-data
title: Implied Volatility Prediction from Options Data
type: Academic Project
description: A large-scale statistical study comparing Generalized Linear Models (GLMs) and black-box machine learning architectures to predict the implied volatility of S&P 500 options.
shortDescription: Predicting implied volatility using advanced regression techniques and machine learning models on financial options data.
publishedAt: 2026-02-28
readingTime: 3
status: Completed
tags:
- R
- GLM
- Finance
- Machine Learning
- Statistical Modeling
icon: i-ph-graph-duotone
---

> **M2 Master's Project** – Predicting implied volatility using advanced regression techniques and machine learning models on financial options data.

This project explores the prediction of **implied volatility** from options market data, combining classical statistical methods with modern machine learning approaches. The analysis covers data preprocessing, feature engineering, model benchmarking, and interpretability analysis using real-world financial panel data.

- **GitHub Repository:** [Implied-Volatility-from-Options-Data](https://github.com/ArthurDanjou/Implied-Volatility-from-Options-Data)

---

::BackgroundTitle{title="Project Overview"}
::

### Problem Statement

Implied volatility represents the market's forward-looking expectation of an asset's future volatility. Accurate prediction is crucial for:

- **Option pricing** and valuation
- **Risk management** and hedging strategies
- **Trading strategies** based on volatility arbitrage

### Dataset

The project uses a comprehensive panel dataset tracking **3,887 assets** across **544 observation dates** (2019-2022):

| File | Description | Shape |
|------|-------------|-------|
| `Train_ISF.csv` | Training data with target variable | 1,909,465 rows × 21 columns |
| `Test_ISF.csv` | Test data for prediction | 1,251,308 rows × 18 columns |
| `hat_y.csv` | Final predictions from both models | 1,251,308 rows × 2 columns |
### Key Variables

**Target Variable:**
- `implied_vol_ref` – The implied volatility to predict

**Feature Categories:**
- **Identifiers:** `asset_id`, `obs_date`
- **Market Activity:** `call_volume`, `put_volume`, `call_oi`, `put_oi`, `total_contracts`
- **Volatility Metrics:** `realized_vol_short`, `realized_vol_mid1-3`, `realized_vol_long1-4`, `market_vol_index`
- **Option Structure:** `strike_dispersion`, `maturity_count`

---

::BackgroundTitle{title="Methodology"}
::

### Data Pipeline

```
Raw Data
    ↓
┌─────────────────────────────────────────────────────────┐
│ Data Splitting (Chronological 80/20)                    │
│ - Training: 2019-10 to 2021-07                          │
│ - Validation: 2021-07 to 2022-03                        │
└─────────────────────────────────────────────────────────┘
    ↓
┌─────────────────────────────────────────────────────────┐
│ Feature Engineering                                     │
│ - Aggregation of volatility horizons                    │
│ - Creation of financial indicators                      │
└─────────────────────────────────────────────────────────┘
    ↓
┌─────────────────────────────────────────────────────────┐
│ Data Preprocessing (tidymodels)                         │
│ - Winsorization (99.5th percentile)                     │
│ - Log/Yeo-Johnson transformations                       │
│ - Z-score normalization                                 │
│ - PCA (95% variance retention)                          │
└─────────────────────────────────────────────────────────┘
    ↓
Three Datasets Generated:
├── Tree-based (raw, scale-invariant)
├── Linear (normalized, winsorized)
└── PCA (dimensionality-reduced)
```
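The winsorization step above can be sketched as follows (a minimal illustration assuming a one-sided clip at the 99.5th percentile; the project itself performs this step inside a tidymodels recipe in R):

```python
import numpy as np

# Minimal winsorization sketch: clip values above the 99.5th percentile
# so extreme outliers cannot dominate the fit. The one-sided clip is an
# assumption for this sketch.
def winsorize_upper(x: np.ndarray, q: float = 0.995) -> np.ndarray:
    return np.minimum(x, np.quantile(x, q))

x = np.concatenate([np.ones(999), [1000.0]])  # one extreme outlier
x_w = winsorize_upper(x)                      # outlier clipped to the 99.5th percentile
```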
### Feature Engineering

New financial indicators created to capture market dynamics:

| Feature | Description | Formula |
|---------|-------------|---------|
| `pulse_ratio` | Volatility trend direction | RV_short / RV_long |
| `stress_spread` | Asset vs market stress | RV_short - Market_VIX |
| `put_call_ratio_volume` | Immediate market stress | Put_Volume / Call_Volume |
| `put_call_ratio_oi` | Long-term risk structure | Put_OI / Call_OI |
| `liquidity_ratio` | Market depth | Total_Volume / Total_OI |
| `option_dispersion` | Market uncertainty | Strike_Dispersion / Total_Contracts |
| `put_low_strike` | Downside protection density | Strike_Dispersion / Put_OI |
| `put_proportion` | Hedging vs speculation | Put_Volume / Total_Volume |
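As a rough sketch, a few of the indicators in the table can be computed column-wise (pandas shown for illustration only; the project builds these in R, and the choice of `realized_vol_long1` as the long horizon and of put + call volume as total volume are assumptions here):

```python
import pandas as pd

# Illustrative computation of some engineered features from the table.
# Column names follow the dataset description; realized_vol_long1 as the
# long-horizon volatility is an assumption for this sketch.
df = pd.DataFrame({
    "realized_vol_short": [0.20, 0.35],
    "realized_vol_long1": [0.25, 0.25],
    "market_vol_index":   [0.18, 0.30],
    "put_volume":  [120.0, 400.0],
    "call_volume": [300.0, 250.0],
})

df["pulse_ratio"] = df["realized_vol_short"] / df["realized_vol_long1"]
df["stress_spread"] = df["realized_vol_short"] - df["market_vol_index"]
df["put_call_ratio_volume"] = df["put_volume"] / df["call_volume"]
# Assumes total volume = put volume + call volume
df["put_proportion"] = df["put_volume"] / (df["put_volume"] + df["call_volume"])
```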
---

::BackgroundTitle{title="Models Implemented"}
::

### Linear Models

| Model | Description | Best RMSE |
|-------|-------------|-----------|
| **OLS** | Ordinary Least Squares | 11.26 |
| **Ridge** | L2 regularization | 12.48 |
| **Lasso** | L1 regularization (variable selection) | 12.03 |
| **Elastic Net** | L1 + L2 combined | ~12.03 |
| **PLS** | Partial Least Squares (on PCA) | 12.79 |
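For intuition, the Ridge/Lasso distinction in the table can be reproduced on synthetic data (scikit-learn shown for illustration only; the project itself uses R's `glmnet`, and the data below is made up):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic illustration of L1 vs L2 regularization: Lasso zeroes out
# irrelevant coefficients (variable selection), Ridge only shrinks them.
rng = np.random.default_rng(2025)
X = rng.normal(size=(500, 5))
y = X @ np.array([2.0, 1.0, 0.0, 0.0, 0.0]) + rng.normal(scale=0.1, size=500)

lasso = Lasso(alpha=0.2).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

n_selected = int((lasso.coef_ != 0).sum())  # Lasso keeps only the true signals
```

Here only the two truly informative coefficients survive the L1 penalty, which is the "variable selection" behavior the table refers to.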
### Linear Mixed-Effects Models (LMM)

Advanced panel data models accounting for asset-specific effects:

| Model | Features | RMSE |
|-------|----------|------|
| LMM Baseline | All variables + Random Intercept | 8.77 |
| LMM Reduced | Collinearity removal | ~8.77 |
| LMM Interactions | Financial interaction terms | ~8.77 |
| LMM + Quadratic | Convexity terms (vol of vol) | 8.41 |
| **LMM + Random Slopes (mod_lmm_5)** | Asset-specific betas | **8.10** ⭐ |

### Tree-Based Models

| Model | Strategy | Validation RMSE | Training RMSE |
|-------|----------|-----------------|---------------|
| **XGBoost** | Level-wise, Bayesian tuning | 10.70 | 0.57 |
| **LightGBM** | Leaf-wise, feature regularization | **10.61** ⭐ | 10.90 |
| Random Forest | Bagging | DNF* | - |

*DNF: Did Not Finish (computational constraints)

### Neural Networks

| Model | Architecture | Status |
|-------|--------------|--------|
| MLP | 128-64 units, tanh activation | Failed to converge |

---
::BackgroundTitle{title="Results Summary"}
::

### Model Comparison

```
RMSE Performance (Lower is Better)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Linear Mixed-Effects (LMM5)      8.38  ████████████████████  Best Linear
Linear Mixed-Effects (LMM4)      8.41  ███████████████████
Linear Mixed-Effects (Baseline)  8.77  ██████████████████
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
LightGBM                        10.61  ███████████████       Best Non-Linear
XGBoost                         10.70  ██████████████
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
OLS (with interactions)         11.26  █████████████
OLS (baseline)                  12.01  ███████████
Lasso                           12.03  ███████████
Ridge                           12.48  ██████████
PLS                             12.79  █████████
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
```

### Key Findings

1. **Best Linear Model:** LMM with Random Slopes (RMSE = 8.38)
   - Captures asset-specific volatility sensitivities
   - Includes quadratic terms for convexity effects

2. **Best Non-Linear Model:** LightGBM (RMSE = 10.61)
   - Superior generalization vs XGBoost
   - Feature regularization prevents overfitting

3. **Interpretability Insights (SHAP Analysis):**
   - `realized_vol_mid` dominates (57% of gain)
   - Volatility clustering confirmed as primary driver
   - Non-linear regime switching in `stress_spread`

---

::BackgroundTitle{title="Repository Structure"}
::

```
PROJECT/
├── Projet_MRC_DANJOU_LEGRAND_MERIC_VONSIEMENS.qmd    # Main analysis (Quarto)
├── Projet_MRC_DANJOU_LEGRAND_MERIC_VONSIEMENS.html   # Rendered report
├── packages.R                                        # R dependencies installer
├── Train_ISF.csv                                     # Training data (~1.9M rows)
├── Test_ISF.csv                                      # Test data (~1.25M rows)
├── hat_y.csv                                         # Final predictions
├── README.md                                         # This file
└── results/
    ├── lightgbm/                                     # LightGBM model outputs
    └── xgboost/                                      # XGBoost model outputs
```

---

::BackgroundTitle{title="Getting Started"}
::

### Prerequisites

- **R** ≥ 4.0
- Required packages (auto-installed via `packages.R`)

### Installation

```r
# Install all dependencies
source("packages.R")
```

Or manually install key packages:

```r
install.packages(c(
  "tidyverse", "tidymodels", "caret", "glmnet",
  "lme4", "lmerTest", "xgboost", "lightgbm",
  "ranger", "pls", "shapviz", "rBayesianOptimization"
))
```

### Running the Analysis

1. **Open the Quarto document:**

   ```r
   # In RStudio
   rstudioapi::navigateToFile("Projet_MRC_DANJOU_LEGRAND_MERIC_VONSIEMENS.qmd")
   ```

2. **Render the document:**

   ```r
   quarto::quarto_render("Projet_MRC_DANJOU_LEGRAND_MERIC_VONSIEMENS.qmd")
   ```

3. **Or run specific sections interactively** using the code chunks in the `.qmd` file.

---

::BackgroundTitle{title="Technical Details"}
::
### Data Split Strategy

- **Chronological split** at the 80th percentile of dates
- Prevents look-ahead bias and data leakage
- Training: ~1.53M observations
- Validation: ~376K observations
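A minimal sketch of that chronological split (pandas for illustration, with a toy date column standing in for `obs_date`):

```python
import pandas as pd

# Chronological 80/20 split: cut at the 80th percentile of observation
# dates so validation data lies strictly after the training period,
# avoiding look-ahead bias. Toy dates and values for illustration only.
df = pd.DataFrame({
    "obs_date": pd.to_datetime([
        "2019-10-01", "2020-03-01", "2020-09-01", "2021-02-01",
        "2021-07-01", "2021-11-01", "2022-03-01",
    ]),
    "implied_vol_ref": [21.0, 35.0, 28.0, 24.0, 22.0, 26.0, 30.0],
})

cutoff = df["obs_date"].quantile(0.8)       # 80th percentile of dates
train = df[df["obs_date"] <= cutoff]
valid = df[df["obs_date"] > cutoff]
```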
### Hyperparameter Tuning

- **Method:** Bayesian Optimization (Gaussian Processes)
- **Acquisition:** Expected Improvement (UCB)
- **Goal:** Maximize negative RMSE

### Evaluation Metric

**Exponential RMSE** on the original scale:

$$
RMSE_{real} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \exp(\hat{y}_{\log, i}) - y_i \right)^2}
$$

Models are trained on the log-transformed target for variance stabilization.
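In other words, predictions made on the log scale are exponentiated back before scoring. A sketch of the metric (Python for illustration; the project computes this in R):

```python
import numpy as np

# RMSE on the original scale: predictions are made on the log scale,
# then mapped back with exp() before comparing to the raw target.
def rmse_original_scale(y_true: np.ndarray, y_pred_log: np.ndarray) -> float:
    return float(np.sqrt(np.mean((np.exp(y_pred_log) - y_true) ** 2)))

y_true = np.array([10.0, 20.0, 30.0])
perfect = rmse_original_scale(y_true, np.log(y_true))         # ≈ 0
off_by_two = rmse_original_scale(y_true, np.log(y_true + 2))  # ≈ 2
```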
---

::BackgroundTitle{title="Key Concepts"}
::

### Financial Theories Applied

1. **Volatility Clustering** – Past volatility predicts future volatility
2. **Variance Risk Premium** – Spread between implied and realized volatility
3. **Fear Gauge** – Put-call ratio as a sentiment indicator
4. **Mean Reversion** – Volatility tends to return to its long-term average
5. **Liquidity Premium** – Illiquid assets command higher volatility

### Statistical Methods

- Panel data modeling with fixed and random effects
- Principal Component Analysis (PCA)
- Bayesian hyperparameter optimization
- SHAP values for model interpretability

---

::BackgroundTitle{title="Authors"}
::

**Team:**
- Arthur DANJOU
- Camille LEGRAND
- Axelle MERIC
- Moritz VON SIEMENS

**Course:** Classification and Regression (M2)

**Academic Year:** 2025-2026

---

::BackgroundTitle{title="Notes"}
::

- **Computational Constraints:** Some models (Random Forest, MLP) failed due to hardware limitations (16GB RAM, CPU-only)
- **Reproducibility:** Set `seed = 2025` for consistent results
- **Language:** Analysis documented in English, course materials in French

---

::BackgroundTitle{title="References"}
::

Key R packages used:

- `tidymodels` – Modern modeling framework
- `glmnet` – Regularized regression
- `lme4` / `lmerTest` – Mixed-effects models
- `xgboost` / `lightgbm` – Gradient boosting
- `shapviz` – Model interpretability
- `rBayesianOptimization` – Hyperparameter tuning
@@ -6,7 +6,7 @@ description: An academic project exploring the automation of GenAI workflows usi
 shortDescription: Automating GenAI workflows with n8n and Ollama in a self-hosted environment.
 publishedAt: 2026-03-15
 readingTime: 2
-status: In progress
+status: Completed
 tags:
 - n8n
 - Gemini
content/projects/rl-tennis-atari-game.md (new file, 119 lines)
@@ -0,0 +1,119 @@
---
slug: rl-tennis-atari-game
title: Reinforcement Learning for Tennis Strategy Optimization
type: Academic Project
description: An academic project exploring the application of reinforcement learning to optimize tennis strategies. The project involves training RL agents on Atari Tennis (ALE) to evaluate strategic decision-making through competitive self-play and baseline benchmarking.
shortDescription: Reinforcement learning algorithms applied to Atari tennis matches for strategy optimization and competitive benchmarking.
publishedAt: 2026-03-13
readingTime: 3
status: Completed
tags:
- Reinforcement Learning
- Python
- Gymnasium
- Atari
- ALE
icon: i-ph-lightning-duotone
---

Comparison of Reinforcement Learning algorithms on Atari Tennis (`ALE/Tennis-v5` via Gymnasium/PettingZoo).

- **GitHub Repository:** [Tennis-Atari-Game](https://github.com/ArthurDanjou/Tennis-Atari-Game)

::BackgroundTitle{title="Overview"}
::

This project implements and compares five RL agents playing Atari Tennis against the built-in AI and in head-to-head tournaments.

::BackgroundTitle{title="Algorithms"}
::

| Agent | Type | Policy | Update Rule |
|-------|------|--------|-------------|
| **Random** | Baseline | Uniform random | None |
| **SARSA** | TD(0), on-policy | ε-greedy | $W_a \leftarrow W_a + \alpha \cdot (r + \gamma \hat{q}(s', a') - \hat{q}(s, a)) \cdot \phi(s)$ |
| **Q-Learning** | TD(0), off-policy | ε-greedy | $W_a \leftarrow W_a + \alpha \cdot (r + \gamma \max_{a'} \hat{q}(s', a') - \hat{q}(s, a)) \cdot \phi(s)$ |
| **Monte Carlo** | First-visit MC | ε-greedy | $W_a \leftarrow W_a + \alpha \cdot (G_t - \hat{q}(s, a)) \cdot \phi(s)$ |
| **DQN** | Deep Q-Network | ε-greedy | MLP (256→256) with experience replay & target network |
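The linear Q-Learning row, for instance, corresponds to the following update (a minimal NumPy sketch with one weight vector per action; dimensions assume the 128-feature RAM observation and 18-action space):

```python
import numpy as np

# Linear Q-Learning sketch: q(s, a) = W[a] @ phi(s), one weight vector
# per action, with the off-policy TD(0) update from the table above.
N_FEATURES, N_ACTIONS = 128, 18
W = np.zeros((N_ACTIONS, N_FEATURES))

def q_learning_update(phi, a, r, phi_next, alpha=0.01, gamma=0.99):
    td_target = r + gamma * np.max(W @ phi_next)  # bootstrap on the greedy action
    td_error = td_target - W[a] @ phi
    W[a] += alpha * td_error * phi

rng = np.random.default_rng(0)
phi, phi_next = rng.random(N_FEATURES), rng.random(N_FEATURES)
q_learning_update(phi, a=3, r=1.0, phi_next=phi_next)
```

Replacing the `max` over next actions with the value of the action actually taken turns this into the SARSA update from the same table.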
::BackgroundTitle{title="Architecture"}
::

- **Linear agents** (SARSA, Q-Learning, Monte Carlo): $\hat{q}(s, a; \mathbf{W}) = \mathbf{W}_a^\top \phi(s)$ with $\phi(s) \in \mathbb{R}^{128}$ (RAM observation)
- **DQN**: MLP network (128 → 128 → 64 → 18) trained with Adam optimizer, Huber loss, and periodic target network sync
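The DQN head can be sketched directly from that description (PyTorch, as listed in the dependencies; the layer sizes follow the 128 → 128 → 64 → 18 description, while the ReLU activations are an assumption of this sketch):

```python
import torch
from torch import nn

# DQN sketch matching the described sizes: 128 RAM features in,
# 18 action values out. ReLU activations are assumed here.
dqn = nn.Sequential(
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 18),
)

loss_fn = nn.HuberLoss()                        # Huber loss, as described
optimizer = torch.optim.Adam(dqn.parameters())  # Adam optimizer, as described

q = dqn(torch.zeros(1, 128))  # one Q-value per action for a RAM observation
```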
::BackgroundTitle{title="Environment"}
::

- **Game**: Atari Tennis via PettingZoo (`tennis_v3`)
- **Observation**: RAM state (128 features)
- **Action Space**: 18 discrete actions
- **Agents**: 2 players (`first_0` and `second_0`)

::BackgroundTitle{title="Project Structure"}
::

```
.
├── Project_RL_DANJOU_VON-SIEMENS.ipynb   # Main notebook
├── README.md                             # This file
├── checkpoints/                          # Saved agent weights
│   ├── sarsa.pkl
│   ├── q_learning.pkl
│   ├── montecarlo.pkl
│   └── dqn.pkl
└── plots/                                # Training & evaluation plots
    ├── SARSA_training_curves.png
    ├── Q-Learning_training_curves.png
    ├── MonteCarlo_training_curves.png
    ├── DQN_training_curves.png
    ├── evaluation_results.png
    └── championship_matrix.png
```

::BackgroundTitle{title="Key Results"}
::

### Win Rate vs Random Baseline

| Agent | Win Rate |
|-------|----------|
| SARSA | 88.9% |
| Q-Learning | 41.2% |
| Monte Carlo | 47.1% |
| DQN | 6.2% |

### Championship Tournament

Full round-robin tournament where each agent faces every other agent in both positions (`first_0`/`second_0`).

::BackgroundTitle{title="Notebook Sections"}
::

1. **Configuration & Checkpoints** — Incremental training workflow with pickle serialization
2. **Utility Functions** — Observation normalization, ε-greedy policy
3. **Agent Definitions** — `RandomAgent`, `SarsaAgent`, `QLearningAgent`, `MonteCarloAgent`, `DQNAgent`
4. **Training Infrastructure** — `train_agent()`, `plot_training_curves()`
5. **Evaluation** — Match system, random baseline, round-robin tournament
6. **Results & Visualization** — Win rate plots, matchup matrix heatmap
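The ε-greedy policy mentioned in the utility functions boils down to a few lines (a sketch; the notebook's actual helper may differ):

```python
import numpy as np

# ε-greedy action selection: explore uniformly with probability epsilon,
# otherwise take the greedy action over the Q-values.
def epsilon_greedy(q_values: np.ndarray, epsilon: float,
                   rng: np.random.Generator) -> int:
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))  # explore
    return int(np.argmax(q_values))              # exploit

rng = np.random.default_rng(42)
greedy = epsilon_greedy(np.array([0.1, 0.9, 0.3]), epsilon=0.0, rng=rng)  # → 1
```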
::BackgroundTitle{title="Known Issues"}
::

- **Monte Carlo & DQN**: Checkpoint loading issues — saved weights may not restore properly during evaluation (training works correctly)

::BackgroundTitle{title="Dependencies"}
::

- Python 3.13+
- `numpy`, `matplotlib`
- `torch`
- `gymnasium`, `ale-py`
- `pettingzoo`
- `tqdm`

::BackgroundTitle{title="Authors"}
::

- Arthur DANJOU
- Moritz VON SIEMENS
Deleted file (55 lines):
@@ -1,55 +0,0 @@
---
slug: rl-tennis
title: Reinforcement Learning for Tennis Strategy Optimization
type: Academic Project
description: An academic project exploring the application of reinforcement learning to optimize tennis strategies. The project involves training RL agents on Atari Tennis (ALE) to evaluate strategic decision-making through competitive self-play and baseline benchmarking.
shortDescription: Reinforcement learning algorithms applied to Atari tennis matches for strategy optimization and competitive benchmarking.
publishedAt: 2026-03-13
readingTime: 3
status: In progress
tags:
- Reinforcement Learning
- Python
- Gymnasium
- Atari
- ALE
icon: i-ph-lightning-duotone
---

::BackgroundTitle{title="Overview"}
::

This project serves as a practical application of theoretical Reinforcement Learning (RL) principles. The goal is to develop and train autonomous agents capable of mastering the complex dynamics of **Atari Tennis**, using the **Arcade Learning Environment (ALE)** via Farama Foundation's Gymnasium.

Instead of simply reaching a high score, this project focuses on **strategy optimization** and **comparative performance** through a multi-stage tournament architecture.

::BackgroundTitle{title="Technical Objectives"}
::

The project is divided into three core phases:

### 1. Algorithm Implementation
I am implementing several key RL algorithms covered during my academic curriculum to observe their behavioral differences in a high-dimensional state space:
* **Value-Based Methods:** Deep Q-Networks (DQN) and its variants (Double DQN, Dueling DQN).
* **Policy Gradient Methods:** Proximal Policy Optimization (PPO) for more stable continuous action control.
* **Exploration Strategies:** Implementing epsilon-greedy and entropy-based exploration to handle the sparse reward signals in tennis rallies.

### 2. The "Grand Slam" Tournament (Self-Play)
To determine the most robust strategy, I developed a competitive framework:
* **Agent vs. Agent:** Different algorithms (e.g., PPO vs. DQN) are pitted against each other in head-to-head matches.
* **Evolutionary Ranking:** Success is measured not just by points won, but by the ability to adapt to the opponent's playstyle (serve-and-volley vs. baseline play).
* **Winner Identification:** The agent with the highest win rate and most stable policy is crowned the "Optimal Strategist."

### 3. Benchmarking Against Atari Baselines
The final "Boss Level" involves taking my best-performing trained agent and testing it against the pre-trained, high-performance algorithms provided by the Atari/ALE benchmarks. This serves as a validation step to measure the efficiency of my custom implementations against industry-standard baselines.

::BackgroundTitle{title="Tech Stack & Environment"}
::

* **Environment:** [ALE (Arcade Learning Environment) - Tennis](https://ale.farama.org/environments/tennis/)
* **Frameworks:** Python, Gymnasium, PyTorch (for neural network backends).
* **Key Challenges:** Handling the long-horizon dependency of a tennis match and the high-frequency input of the Atari RAM/Pixels.

---

*This project is currently in the training phase. I am fine-tuning the reward function to discourage "passive" play and reward aggressive net approaches.*
package.json (28 lines changed)
@@ -18,11 +18,11 @@
   },
   "dependencies": {
     "@libsql/client": "^0.17.0",
-    "@nuxt/content": "3.11.2",
-    "@nuxt/eslint": "1.15.1",
-    "@nuxt/ui": "^4.4.0",
-    "@nuxthub/core": "0.10.6",
-    "@nuxtjs/mdc": "0.20.1",
+    "@nuxt/content": "3.12.0",
+    "@nuxt/eslint": "1.15.2",
+    "@nuxt/ui": "4.5.1",
+    "@nuxthub/core": "0.10.7",
+    "@nuxtjs/mdc": "0.20.2",
     "@nuxtjs/seo": "3.4.0",
     "@vueuse/core": "^14.2.1",
     "@vueuse/math": "^14.2.1",
@@ -30,23 +30,23 @@
     "drizzle-kit": "^0.31.9",
     "drizzle-orm": "^0.45.1",
     "nuxt": "4.3.1",
-    "nuxt-studio": "1.3.2",
-    "vue": "3.5.28",
-    "vue-router": "5.0.2",
+    "nuxt-studio": "1.4.0",
+    "vue": "3.5.30",
+    "vue-router": "5.0.3",
     "zod": "^4.3.6"
   },
   "devDependencies": {
-    "@iconify-json/devicon": "1.2.58",
+    "@iconify-json/devicon": "1.2.59",
     "@iconify-json/file-icons": "^1.2.2",
     "@iconify-json/logos": "^1.2.10",
     "@iconify-json/ph": "^1.2.2",
     "@iconify-json/twemoji": "1.2.5",
-    "@iconify-json/vscode-icons": "1.2.43",
-    "@types/node": "25.2.3",
+    "@iconify-json/vscode-icons": "1.2.45",
+    "@types/node": "25.4.0",
     "@vueuse/nuxt": "14.2.1",
-    "eslint": "10.0.0",
+    "eslint": "10.0.3",
     "typescript": "^5.9.3",
-    "vue-tsc": "3.2.4",
-    "wrangler": "4.66.0"
+    "vue-tsc": "3.2.5",
+    "wrangler": "4.71.0"
   }
 }