Refactor code structure for improved readability and maintainability

2026-03-16 05:11:40 +01:00 · 2026-03-10 11:39:11 +01:00
parent 5ab7d46608
commit 895463e9e9
22 changed files with 96 additions and 1260436 deletions
--- a/Regression/Projet_MRC_DANJOU_LEGRAND_MERIC_VONSIEMENS.html
+++ b/Regression/Projet_MRC_DANJOU_LEGRAND_MERIC_VONSIEMENS.html
--- a/Regression/Projet_MRC_DANJOU_LEGRAND_MERIC_VONSIEMENS.qmd
+++ b/Regression/Projet_MRC_DANJOU_LEGRAND_MERIC_VONSIEMENS.qmd
--- a/Regression/README.md
+++ b/Regression/README.md
@@ -1,310 +0,0 @@
-# Implied Volatility Prediction from Options Data
-
-[![R](https://img.shields.io/badge/R-4.0+-276DC3.svg)](https://www.r-project.org/)
-[![Course](https://img.shields.io/badge/Course-Classification%20%26%20Regression-orange.svg)]()
-[![License](https://img.shields.io/badge/License-Academic-blue.svg)]()
-
-> **M2 Master's Project** – Predicting implied volatility using advanced regression techniques and machine learning models on financial options data.
-
-This project explores the prediction of **implied volatility** from options market data, combining classical statistical methods with modern machine learning approaches. The analysis covers data preprocessing, feature engineering, model benchmarking, and interpretability analysis using real-world financial panel data.
-
---
-
-## 📋 Project Overview
-
-### Problem Statement
-
-Implied volatility represents the market's forward-looking expectation of an asset's future volatility. Accurate prediction is crucial for:
- **Option pricing** and valuation
- **Risk management** and hedging strategies
- **Trading strategies** based on volatility arbitrage
-
-### Dataset
-
-The project uses a comprehensive panel dataset tracking **3,887 assets** across **544 observation dates** (2019-2022):
-
-| File | Description | Shape |
-|------|-------------|-------|
-| `Train_ISF.csv` | Training data with target variable | 1,909,465 rows × 21 columns |
-| `Test_ISF.csv` | Test data for prediction | 1,251,308 rows × 18 columns |
-| `hat_y.csv` | Final predictions from both models | 1,251,308 rows × 2 columns |
-
-### Key Variables
-
-**Target Variable:**
- `implied_vol_ref` – The implied volatility to predict
-
-**Feature Categories:**
- **Identifiers:** `asset_id`, `obs_date`
- **Market Activity:** `call_volume`, `put_volume`, `call_oi`, `put_oi`, `total_contracts`
- **Volatility Metrics:** `realized_vol_short`, `realized_vol_mid1-3`, `realized_vol_long1-4`, `market_vol_index`
- **Option Structure:** `strike_dispersion`, `maturity_count`
-
---
-
-## 🏗️ Methodology
-
-### Data Pipeline
-
-```
-Raw Data
-    ↓
-┌─────────────────────────────────────────────────────────┐
-│  Data Splitting (Chronological 80/20)                  │
-│  - Training: 2019-10 to 2021-07                         │
-│  - Validation: 2021-07 to 2022-03                     │
-└─────────────────────────────────────────────────────────┘
-    ↓
-┌─────────────────────────────────────────────────────────┐
-│  Feature Engineering                                   │
-│  - Aggregation of volatility horizons                 │
-│  - Creation of financial indicators                   │
-└─────────────────────────────────────────────────────────┘
-    ↓
-┌─────────────────────────────────────────────────────────┐
-│  Data Preprocessing (tidymodels)                       │
-│  - Winsorization (99.5th percentile)                  │
-│  - Log/Yeo-Johnson transformations                    │
-│  - Z-score normalization                              │
-│  - PCA (95% variance retention)                       │
-└─────────────────────────────────────────────────────────┘
-    ↓
-Three Datasets Generated:
-├── Tree-based (raw, scale-invariant)
-├── Linear (normalized, winsorized)
-└── PCA (dimensionality-reduced)
-```
-
-### Feature Engineering
-
-New financial indicators created to capture market dynamics:
-
-| Feature | Description | Formula |
-|---------|-------------|---------|
-| `pulse_ratio` | Volatility trend direction | RV_short / RV_long |
-| `stress_spread` | Asset vs market stress | RV_short - Market_VIX |
-| `put_call_ratio_volume` | Immediate market stress | Put_Volume / Call_Volume |
-| `put_call_ratio_oi` | Long-term risk structure | Put_OI / Call_OI |
-| `liquidity_ratio` | Market depth | Total_Volume / Total_OI |
-| `option_dispersion` | Market uncertainty | Strike_Dispersion / Total_Contracts |
-| `put_low_strike` | Downside protection density | Strike_Dispersion / Put_OI |
-| `put_proportion` | Hedging vs speculation | Put_Volume / Total_Volume |
-
---
-
-## 🤖 Models Implemented
-
-### Linear Models
-
-| Model | Description | Best RMSE |
-|-------|-------------|-----------|
-| **OLS** | Ordinary Least Squares | 11.26 |
-| **Ridge** | L2 regularization | 12.48 |
-| **Lasso** | L1 regularization (variable selection) | 12.03 |
-| **Elastic Net** | L1 + L2 combined | ~12.03 |
-| **PLS** | Partial Least Squares (on PCA) | 12.79 |
-
-### Linear Mixed-Effects Models (LMM)
-
-Advanced panel data models accounting for asset-specific effects:
-
-| Model | Features | RMSE |
-|-------|----------|------|
-| LMM Baseline | All variables + Random Intercept | 8.77 |
-| LMM Reduced | Collinearity removal | ~8.77 |
-| LMM Interactions | Financial interaction terms | ~8.77 |
-| LMM + Quadratic | Convexity terms (vol of vol) | 8.41 |
-| **LMM + Random Slopes (mod_lmm_5)** | Asset-specific betas | **8.10** ⭐ |
-
-### Tree-Based Models
-
-| Model | Strategy | Validation RMSE | Training RMSE |
-|-------|----------|-----------------|---------------|
-| **XGBoost** | Level-wise, Bayesian tuning | 10.70 | 0.57 |
-| **LightGBM** | Leaf-wise, feature regularization | **10.61** ⭐ | 10.90 |
-| Random Forest | Bagging | DNF* | - |
-
-*DNF: Did Not Finish (computational constraints)
-
-### Neural Networks
-
-| Model | Architecture | Status |
-|-------|--------------|--------|
-| MLP | 128-64 units, tanh activation | Failed to converge |
-
---
-
-## 📊 Results Summary
-
-### Model Comparison
-
-```
-RMSE Performance (Lower is Better)
-━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
-Linear Mixed-Effects (LMM5)     8.38 ████████████████████ Best Linear
-Linear Mixed-Effects (LMM4)     8.41 ███████████████████
-Linear Mixed-Effects (Baseline) 8.77 ██████████████████
-━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
-LightGBM                       10.61 ███████████████ Best Non-Linear
-XGBoost                        10.70 ██████████████
-━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
-OLS (with interactions)        11.26 █████████████
-OLS (baseline)                 12.01 ███████████
-Lasso                          12.03 ███████████
-Ridge                          12.48 ██████████
-PLS                            12.79 █████████
-━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
-```
-
-### Key Findings
-
-1. **Best Linear Model:** LMM with Random Slopes (RMSE = 8.38)
-   - Captures asset-specific volatility sensitivities
-   - Includes quadratic terms for convexity effects
-
-2. **Best Non-Linear Model:** LightGBM (RMSE = 10.61)
-   - Superior generalization vs XGBoost
-   - Feature regularization prevents overfitting
-
-3. **Interpretability Insights (SHAP Analysis):**
-   - `realized_vol_mid` dominates (57% of gain)
-   - Volatility clustering confirmed as primary driver
-   - Non-linear regime switching in stress_spread
-
---
-
-## 📁 Repository Structure
-
-```
-PROJECT/
-├── Projet_MRC_DANJOU_LEGRAND_MERIC_VONSIEMENS.qmd    # Main analysis (Quarto)
-├── Projet_MRC_DANJOU_LEGRAND_MERIC_VONSIEMENS.html    # Rendered report
-├── packages.R                                         # R dependencies installer
-├── Train_ISF.csv                                      # Training data (~1.9M rows)
-├── Test_ISF.csv                                       # Test data (~1.25M rows)
-├── hat_y.csv                                          # Final predictions
-├── README.md                                          # This file
-└── results/
-    ├── lightgbm/                                      # LightGBM model outputs
-    └── xgboost/                                       # XGBoost model outputs
-```
-
---
-
-## 🚀 Getting Started
-
-### Prerequisites
-
- **R** ≥ 4.0
- Required packages (auto-installed via `packages.R`)
-
-### Installation
-
-```r
-# Install all dependencies
-source("packages.R")
-```
-
-Or manually install key packages:
-
-```r
-install.packages(c(
-  "tidyverse", "tidymodels", "caret", "glmnet",
-  "lme4", "lmerTest", "xgboost", "lightgbm",
-  "ranger", "pls", "shapviz", "rBayesianOptimization"
-))
-```
-
-### Running the Analysis
-
-1. **Open the Quarto document:**
-   ```r
-   # In RStudio
-   rstudioapi::navigateToFile("Projet_MRC_DANJOU_LEGRAND_MERIC_VONSIEMENS.qmd")
-   ```
-
-2. **Render the document:**
-   ```r
-   quarto::quarto_render("Projet_MRC_DANJOU_LEGRAND_MERIC_VONSIEMENS.qmd")
-   ```
-
-3. **Or run specific sections interactively** using the code chunks in the `.qmd` file
-
---
-
-## 🛠️ Technical Details
-
-### Data Split Strategy
-
- **Chronological split** at 80th percentile of dates
- Prevents look-ahead bias and data leakage
- Training: ~1.53M observations
- Validation: ~376K observations
-
-### Hyperparameter Tuning
-
- **Method:** Bayesian Optimization (Gaussian Processes)
- **Acquisition:** Expected Improvement (UCB)
- **Goal:** Maximize negative RMSE
-
-### Evaluation Metric
-
-**RMSE on the original scale**, computed after exponentiating the log-scale predictions:
-
-$$
-RMSE_{real} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \exp(\hat{y}_{\log, i}) - y_i \right)^2}
-$$
-
-Models are trained on the log-transformed target to stabilize its variance; predictions are exponentiated back to the original scale before scoring.
-
---
-
-## 📖 Key Concepts
-
-### Financial Theories Applied
-
-1. **Volatility Clustering** – Past volatility predicts future volatility
-2. **Variance Risk Premium** – Spread between implied and realized volatility
-3. **Fear Gauge** – Put-call ratio as sentiment indicator
-4. **Mean Reversion** – Volatility tends to return to long-term average
-5. **Liquidity Premium** – Illiquid assets command higher volatility
-
-### Statistical Methods
-
- Panel data modeling with fixed and random effects
- Principal Component Analysis (PCA)
- Bayesian hyperparameter optimization
- SHAP values for model interpretability
-
---
-
-## 👥 Authors
-
-**Team:**
- Arthur DANJOU
- Camille LEGRAND  
- Axelle MERIC
- Moritz VON SIEMENS
-
-**Course:** Classification and Regression (M2)
-**Academic Year:** 2025-2026
-
---
-
-## 📝 Notes
-
- **Computational Constraints:** Some models (Random Forest, MLP) failed due to hardware limitations (16GB RAM, CPU-only)
- **Reproducibility:** Set `seed = 2025` for consistent results
- **Language:** Analysis documented in English, course materials in French
-
---
-
-## 📚 References
-
-Key R packages used:
- `tidymodels` – Modern modeling framework
- `glmnet` – Regularized regression
- `lme4` / `lmerTest` – Mixed-effects models
- `xgboost` / `lightgbm` – Gradient boosting
- `shapviz` – Model interpretability
- `rBayesianOptimization` – Hyperparameter tuning
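
The evaluation metric in the removed Regression README (RMSE on the original scale for models fitted to a log-transformed target) can be sketched as follows. This is a minimal illustration in Python with NumPy, not the project's own R code; the function name `rmse_original_scale` and its argument names are hypothetical.

```python
import numpy as np


def rmse_original_scale(y_true, y_pred_log):
    """RMSE between true values and exponentiated log-scale predictions.

    y_true:      targets on the original scale (y_i in the README formula)
    y_pred_log:  model predictions on the log scale (y-hat_log,i)
    """
    y_true = np.asarray(y_true, dtype=float)
    # Back-transform the predictions before measuring the error.
    y_pred = np.exp(np.asarray(y_pred_log, dtype=float))
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))
```

Scoring on the original scale keeps the reported error in the same units as the implied volatility itself, which is presumably why the README compares models this way rather than on the log scale.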
--- a/Regression/hat_y.csv
+++ b/Regression/hat_y.csv
--- a/Regression/packages.R
+++ b/Regression/packages.R
@@ -1,44 +0,0 @@
-# List of required packages
-packages <- c(
-  "tidyverse",
-  "rsample",
-  "scales",
-  "dplyr",
-  "tidyr",
-  "glue",
-  "corrplot",
-  "ggfortify",
-  "carData",
-  "car",
-  "MASS",
-  "ggplot2",
-  "DataExplorer",
-  "skimr",
-  "plotly",
-  "gridExtra",
-  "grid",
-  "rlang",
-  "caret",
-  "reshape2",
-  "class",
-  "ROCR",
-  "randomForest",
-  "fitdistrplus",
-  "hexbin",
-  "paletteer"
-)
-
-# Function to install missing packages
-install_if_missing <- function(p) {
-  if (!require(p, character.only = TRUE)) {
-    install.packages(p, dependencies = TRUE)
-  }
-}
-
-# Apply the function to the whole list
-invisible(sapply(packages, install_if_missing))
-
-# Load all libraries
-invisible(lapply(packages, library, character.only = TRUE))
-
-message("All packages were installed and loaded successfully!")
--- a/Learning/project/Project_RL_DANJOU_VON-SIEMENS.ipynb
+++ b/Learning/project/Project_RL_DANJOU_VON-SIEMENS.ipynb
--- a/Learning/project/README.md
+++ b/Learning/project/README.md
@@ -1,91 +0,0 @@
-# RL Project: Atari Tennis Tournament
-
-Comparison of Reinforcement Learning algorithms on Atari Tennis (`ALE/Tennis-v5` via Gymnasium/PettingZoo).
-
-## Overview
-
-This project implements and compares five RL agents playing Atari Tennis against the built-in AI and in head-to-head tournaments.
-
-## Algorithms
-
-| Agent | Type | Policy | Update Rule |
-|-------|------|--------|-------------|
-| **Random** | Baseline | Uniform random | None |
-| **SARSA** | TD(0), on-policy | ε-greedy | $W_a \leftarrow W_a + \alpha \cdot (r + \gamma \hat{q}(s', a') - \hat{q}(s, a)) \cdot \phi(s)$ |
-| **Q-Learning** | TD(0), off-policy | ε-greedy | $W_a \leftarrow W_a + \alpha \cdot (r + \gamma \max_{a'} \hat{q}(s', a') - \hat{q}(s, a)) \cdot \phi(s)$ |
-| **Monte Carlo** | First-visit MC | ε-greedy | $W_a \leftarrow W_a + \alpha \cdot (G_t - \hat{q}(s, a)) \cdot \phi(s)$ |
-| **DQN** | Deep Q-Network | ε-greedy | MLP (256→256) with experience replay & target network |
-
-## Architecture
-
- **Linear agents** (SARSA, Q-Learning, Monte Carlo): $\hat{q}(s, a; \mathbf{W}) = \mathbf{W}_a^\top \phi(s)$ with $\phi(s) \in \mathbb{R}^{128}$ (RAM observation)
- **DQN**: MLP network (128 → 128 → 64 → 18) trained with Adam optimizer, Huber loss, and periodic target network sync
-
-## Environment
-
- **Game**: Atari Tennis via PettingZoo (`tennis_v3`)
- **Observation**: RAM state (128 features)
- **Action Space**: 18 discrete actions
- **Agents**: 2 players (`first_0` and `second_0`)
-
-## Project Structure
-
-```
-.
-├── Project_RL_DANJOU_VON-SIEMENS.ipynb   # Main notebook
-├── README.md                              # This file
-├── checkpoints/                           # Saved agent weights
-│   ├── sarsa.pkl
-│   ├── q_learning.pkl
-│   ├── montecarlo.pkl
-│   └── dqn.pkl
-└── plots/                                 # Training & evaluation plots
-    ├── SARSA_training_curves.png
-    ├── Q-Learning_training_curves.png
-    ├── MonteCarlo_training_curves.png
-    ├── DQN_training_curves.png
-    ├── evaluation_results.png
-    └── championship_matrix.png
-```
-
-## Key Results
-
-### Win Rate vs Random Baseline
-
-| Agent | Win Rate |
-|-------|----------|
-| SARSA | 88.9% |
-| Q-Learning | 41.2% |
-| Monte Carlo | 47.1% |
-| DQN | 6.2% |
-
-### Championship Tournament
-
-Full round-robin tournament where each agent faces every other agent in both positions (first_0/second_0).
-
-## Notebook Sections
-
-1. **Configuration & Checkpoints** — Incremental training workflow with pickle serialization
-2. **Utility Functions** — Observation normalization, ε-greedy policy
-3. **Agent Definitions** — `RandomAgent`, `SarsaAgent`, `QLearningAgent`, `MonteCarloAgent`, `DQNAgent`
-4. **Training Infrastructure** — `train_agent()`, `plot_training_curves()`
-5. **Evaluation** — Match system, random baseline, round-robin tournament
-6. **Results & Visualization** — Win rate plots, matchup matrix heatmap
-
-## Known Issues
-
- **Monte Carlo & DQN**: Checkpoint loading issues — saved weights may not restore properly during evaluation (training works correctly)
-
-## Dependencies
-
- Python 3.13+
- `numpy`, `matplotlib`
- `torch`
- `gymnasium`, `ale-py`
- `pettingzoo`
- `tqdm`
-
-## Authors
-
- Arthur DANJOU
- Moritz VON SIEMENS
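
The SARSA and Q-Learning update rules from the Algorithms table of the removed RL README can be sketched with NumPy as below. This is an illustrative sketch, not the notebook's actual code: the names `q_values` and `td_update`, the default `alpha` and `gamma` values, and the `done` flag (which drops the bootstrap term on terminal transitions) are assumptions. Only the weight shape (18 actions × 128 RAM features) and the update formulas come from the README.

```python
import numpy as np

N_ACTIONS, N_FEATURES = 18, 128  # sizes quoted in the README


def q_values(W, phi):
    """Return q_hat(s, a) = W_a^T phi(s) for every action a."""
    return W @ phi


def td_update(W, phi, a, r, phi_next, a_next,
              alpha=0.01, gamma=0.99, off_policy=False, done=False):
    """Apply one TD(0) update to W in place and return the TD error.

    off_policy=False -> SARSA target:      r + gamma * q_hat(s', a')
    off_policy=True  -> Q-Learning target: r + gamma * max_a' q_hat(s', a')
    """
    q_sa = W[a] @ phi
    if done:
        bootstrap = 0.0  # assumption: no bootstrapping past a terminal state
    elif off_policy:
        bootstrap = np.max(q_values(W, phi_next))
    else:
        bootstrap = W[a_next] @ phi_next
    td_error = r + gamma * bootstrap - q_sa
    # Only the row of the action actually taken is updated.
    W[a] += alpha * td_error * phi
    return float(td_error)
```

The two agents differ only in the bootstrap term: SARSA uses the action it will actually take next, while Q-Learning uses the greedy action regardless of what the ε-greedy policy selects.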
--- a/Learning/project/checkpoints/dqn.pkl
+++ b/Learning/project/checkpoints/dqn.pkl
--- a/Learning/project/checkpoints/montecarlo.pkl
+++ b/Learning/project/checkpoints/montecarlo.pkl
--- a/Learning/project/checkpoints/q_learning.pkl
+++ b/Learning/project/checkpoints/q_learning.pkl
--- a/Learning/project/checkpoints/sarsa.pkl
+++ b/Learning/project/checkpoints/sarsa.pkl
--- a/Learning/project/plots/DQN_training_curves.png
+++ b/Learning/project/plots/DQN_training_curves.png
--- a/Learning/project/plots/MonteCarlo_training_curves.png
+++ b/Learning/project/plots/MonteCarlo_training_curves.png
--- a/Learning/project/plots/Q-Learning_training_curves.png
+++ b/Learning/project/plots/Q-Learning_training_curves.png
--- a/Learning/project/plots/SARSA_training_curves.png
+++ b/Learning/project/plots/SARSA_training_curves.png
--- a/Learning/project/plots/championship_matrix.png
+++ b/Learning/project/plots/championship_matrix.png
--- a/Learning/project/plots/championship_results.png
+++ b/Learning/project/plots/championship_results.png
--- a/Learning/project/plots/evaluation_results.png
+++ b/Learning/project/plots/evaluation_results.png
--- a/M2/VBA/Course1.xlsm
+++ b/M2/VBA/Course1.xlsm