---
slug: dropout-reduces-underfitting
title: Dropout Reduces Underfitting
type: Research Project
description: TensorFlow/Keras implementation and reproduction of "Dropout Reduces Underfitting" (Liu et al., 2023). A comparative study of Early and Late Dropout strategies to optimize model convergence.
publishedAt: 2024-12-10
readingTime: 6
status: Completed
tags:
  - Python
  - TensorFlow
  - Deep Learning
  - Research
icon: i-ph-share-network-duotone
---

Study and reproduction of the paper: Liu, Z., et al. (2023). *Dropout Reduces Underfitting*. arXiv:2303.01500.

The paper is available at: https://arxiv.org/abs/2303.01500

This repository contains a robust and modular implementation in TensorFlow/Keras of Early Dropout and Late Dropout strategies. The goal is to verify the hypothesis that dropout, traditionally used to reduce overfitting, can also combat underfitting when applied solely during the initial training phase.

## 🎯 Scientific Objectives

The study compares four training configurations in order to validate the operating regimes of Dropout described in the paper:

  1. Early Dropout (targeting underfitting): active only during the initial phase to reduce gradient variance and align gradient directions, allowing for better final optimization.
  2. Late Dropout (targeting overfitting): disabled at the start to allow rapid learning, then activated to regularize final convergence.
  3. Standard Dropout: constant rate throughout training (baseline).
  4. No Dropout: control experiment without dropout.
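As a quick reference, the four schedules above can be expressed as a single rate function. This is an illustrative sketch, not the repo's API; the function name and defaults are assumptions:

```python
def dropout_rate_at(epoch, mode, switch_epoch=10, rate=0.4):
    """Dropout rate in effect at `epoch` for each strategy (illustrative sketch)."""
    if mode == "early":      # dropout only before the switch (targets underfitting)
        return rate if epoch < switch_epoch else 0.0
    if mode == "late":       # dropout only from the switch onward (targets overfitting)
        return 0.0 if epoch < switch_epoch else rate
    if mode == "standard":   # constant-rate baseline
        return rate
    return 0.0               # "none": control run without dropout
```

For example, with `switch_epoch=10`, `dropout_rate_at(3, "early")` returns `0.4` while `dropout_rate_at(12, "early")` returns `0.0`.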

## 🛠️ Technical Architecture

Unlike naive Keras callback implementations, this project uses a dynamic approach via the TensorFlow graph to ensure the dropout rate is properly updated on the GPU without model recompilation.

### Key Components

  • `DynamicDropout`: A custom layer inheriting from `keras.layers.Layer` that reads its rate from a shared `tf.Variable`.
  • `DropoutScheduler`: A Keras `Callback` that drives the rate variable based on the current epoch and the chosen strategy (`early`, `late`, `standard`).
  • `ExperimentPipeline`: An orchestrator class that handles data loading (MNIST, CIFAR-10, Fashion MNIST), model creation (Dense or CNN), and execution of comparative benchmarks.
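The core design idea — one mutable rate shared between the layer and the callback, so the schedule takes effect without recompiling the model — can be sketched framework-free. This is a simplified illustration, not the repo's code (the real versions build on `tf.Variable` and `keras.callbacks.Callback`), and all names here are hypothetical:

```python
class SharedRate:
    """Stand-in for the tf.Variable that every DynamicDropout layer reads."""
    def __init__(self, value=0.0):
        self.value = value


class SchedulerSketch:
    """Mimics the Keras Callback: drives the shared rate at the start of each epoch."""
    def __init__(self, shared, mode, switch_epoch, rate):
        self.shared, self.mode = shared, mode
        self.switch_epoch, self.rate = switch_epoch, rate

    def on_epoch_begin(self, epoch):
        if self.mode == "early":    # dropout on, then off at the switch
            self.shared.value = self.rate if epoch < self.switch_epoch else 0.0
        elif self.mode == "late":   # dropout off, then on at the switch
            self.shared.value = 0.0 if epoch < self.switch_epoch else self.rate
        else:                       # "standard": constant rate
            self.shared.value = self.rate
```

Because each layer reads `shared.value` on every forward pass, a rate change applied at the top of an epoch is picked up immediately by every dropout layer in the model.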

### File Structure

```
.
├── README.md                         # This documentation file
├── Dropout reduces underfitting.pdf  # Original research paper
├── pipeline.py                       # Main experiment pipeline
├── pipeline.ipynb                    # Jupyter notebook for experiments
├── pipeline_mnist.ipynb              # Jupyter notebook for MNIST experiments
├── pipeline_cifar10.ipynb            # Jupyter notebook for CIFAR-10 experiments
├── pipeline_cifar100.ipynb           # Jupyter notebook for CIFAR-100 experiments
├── pipeline_fashion_mnist.ipynb      # Jupyter notebook for Fashion MNIST experiments
├── requirements.txt                  # Python dependencies
├── .python-version                   # Python version specification
└── uv.lock                           # Dependency lock file
```

## 🚀 Installation

```bash
# Clone the repository
git clone https://github.com/arthurdanjou/dropoutreducesunderfitting.git
cd dropoutreducesunderfitting

# Install dependencies
pip install tensorflow numpy matplotlib seaborn scikit-learn
```

## 📊 Usage

The main notebook `pipeline.ipynb` contains all the necessary code. Here is how to run a typical experiment via the pipeline API.

### 1. Initialization

Choose your dataset (`cifar10`, `fashion_mnist`, `mnist`) and architecture (`cnn`, `dense`).

```python
from pipeline import ExperimentPipeline

# Fashion MNIST is recommended to observe underfitting/overfitting nuances
exp = ExperimentPipeline(dataset_name="fashion_mnist", model_type="cnn")
```

### 2. Learning Curves Comparison

Compare the training dynamics (loss and accuracy) of the three dropout strategies.

```python
exp.compare_learning_curves(
    modes=["standard", "early", "late"],
    switch_epoch=10,  # The epoch where dropout state changes
    rate=0.4,         # Dropout rate
    epochs=30
)
```

### 3. Ablation Studies

Study the impact of the duration of the "early" phase and of the dropout intensity.

```python
# Impact of the switch epoch on final performance
exp.compare_switch_epochs(
    switch_epochs=[5, 10, 15, 20],
    modes=["early"],
    rate=0.4,
    epochs=30
)

# Impact of the dropout rate
exp.compare_drop_rates(
    rates=[0.2, 0.4, 0.6],
    modes=["standard", "early"],
    switch_epoch=10,
    epochs=25
)
```

### 4. Data Regimes (Data Scarcity)

Verify the paper's hypothesis that Early Dropout shines on large datasets (or capacity-limited models), while Standard Dropout protects small datasets against overfitting.

```python
# Training on 10%, 50% and 100% of the dataset
exp.run_dataset_size_comparison(
    fractions=[0.1, 0.5, 1.0],
    modes=["standard", "early"],
    rate=0.3,
    switch_epoch=10
)
```
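For context, drawing a class-balanced fraction of the training set might look like the following. This helper is hypothetical and not taken from the repo; the actual pipeline may sample differently:

```python
import numpy as np

def take_fraction(x, y, fraction, seed=0):
    """Return a class-balanced random subset covering `fraction` of (x, y).

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    keep = []
    for cls in np.unique(y):
        idx = np.flatnonzero(y == cls)   # indices of all samples of this class
        rng.shuffle(idx)
        keep.append(idx[: max(1, int(len(idx) * fraction))])
    keep = np.concatenate(keep)
    rng.shuffle(keep)                    # avoid class-ordered batches
    return x[keep], y[keep]
```

Sampling each class separately keeps the label distribution stable across fractions, so any performance gap can be attributed to dataset size rather than class imbalance.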

## 📈 Expected Results

According to the paper, you should observe:

  • Early Dropout: higher initial loss, followed by a sharp drop after `switch_epoch`, often reaching a lower minimum than Standard Dropout (reduced underfitting).
  • Late Dropout: rapid rise in accuracy at the start (potential overfitting), then stabilization once dropout is activated.

## 📄 Detailed Report

## 📝 Authors

M.Sc. in Statistical and Financial Engineering (ISF), Data Science track, Université Paris-Dauphine - PSL.

Based on the work of Liu, Z., et al. (2023). Dropout Reduces Underfitting.