mirror of
https://github.com/ArthurDanjou/artsite.git
synced 2026-03-16 07:09:20 +01:00
72 lines
3.2 KiB
Markdown
72 lines
3.2 KiB
Markdown
---
|
|
slug: implied-volatility-modeling
|
|
title: Implied Volatility Surface Modeling
|
|
type: Academic Project
|
|
description: A large-scale statistical study comparing Generalized Linear Models (GLMs) and black-box machine learning architectures to predict the implied volatility of S&P 500 options.
|
|
shortDescription: Predicting the SPX volatility surface using GLMs and black-box models on 1.2 million observations.
|
|
publishedAt: 2026-02-28
|
|
readingTime: 3
|
|
status: In progress
|
|
tags:
|
|
- R
|
|
- GLM
|
|
- Finance
|
|
- Machine Learning
|
|
icon: i-ph-graph-duotone
|
|
---
|
|
|
|
This project targets high-precision calibration of the **Implied Volatility Surface** using a large-scale dataset of S&P 500 (SPX) European options.
|
|
|
|
The core objective is to stress-test classic statistical models against modern predictive algorithms. **Generalized Linear Models (GLMs)** provide a transparent baseline, while more complex "black-box" architectures are evaluated on whether their accuracy gains justify reduced interpretability in a risk management context.
|
|
|
|
::BackgroundTitle{title="Dataset & Scale"}
|
|
::
|
|
|
|
The modeling is performed on a high-dimensional dataset with over **1.2 million observations**.
|
|
|
|
- **Target Variable**: `implied_vol_ref` (implied volatility).
|
|
- **Features**: Option strike price ($K$), underlying asset price ($S$), and time to maturity ($\tau$).
|
|
- **Volume**: A training set of $1,251,307$ rows and a test set of identical size.
|
|
|
|
::BackgroundTitle{title="Modeling Methodology"}
|
|
::
|
|
|
|
The project follows a rigorous statistical pipeline to compare two modeling philosophies:
|
|
|
|
### 1. The Statistical Baseline (GLM)
|
|
Using R's GLM framework, I implement models with targeted link functions and error distributions (such as **Gamma** or **Inverse Gaussian**) to capture the global structure of the volatility surface. These models serve as the benchmark for transparency and stability.
|
|
|
|
### 2. The Black-Box Challenge
|
|
To capture local non-linearities such as the volatility smile and skew, I explore more complex architectures. Performance is evaluated by **Root Mean Squared Error (RMSE)** relative to the GLM baselines.
|
|
|
|
### 3. Feature Engineering
|
|
Key financial indicators are derived from the raw data:
|
|
- **Moneyness**: Calculated as the ratio $K/S$.
|
|
- **Temporal Dynamics**: Transformations of time to maturity to linearize the term structure.
|
|
|
|
::BackgroundTitle{title="Evaluation & Reproducibility"}
|
|
::
|
|
|
|
Performance is measured strictly via RMSE on the original scale of the target variable. To ensure reproducibility and precise comparisons across model iterations, a fixed random seed is maintained throughout the workflow.
|
|
|
|
```r
|
|
set.seed(2025)
|
|
|
|
TrainData <- read.csv("train_ISF.csv", stringsAsFactors = FALSE)
|
|
TestX <- read.csv("test_ISF.csv", stringsAsFactors = FALSE)
|
|
|
|
rmse_eval <- function(actual, predicted) {
|
|
sqrt(mean((actual - predicted)^2))
|
|
}
|
|
|
|
```
|
|
|
|
::BackgroundTitle{title="Critical Analysis"}
|
|
::
|
|
|
|
Beyond pure prediction, the project addresses:
|
|
|
|
- Model Limits: Identifying market regimes where models fail (e.g., deep out-of-the-money options).
|
|
- Interpretability: Quantifying the trade-off between complexity and practical utility in a risk management context.
|
|
- Future Extensions: Considering richer dynamics, such as historical volatility or skew-specific targets.
|