[PR #2] [MERGED] Complete TP4 DeepLearning notebooks - RNN with Embedding layer exercises

Closed
opened 2025-12-01 17:04:31 +01:00 by arthur · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ArthurDanjou/ArtStudies/pull/2
Author: @Copilot
Created: 11/26/2025
Status: Merged
Merged: 11/26/2025
Merged by: @ArthurDanjou

Base: master ← Head: copilot/fix-code-cells-tp4


📝 Commits (3)

  • dc05441 Initial plan
  • 886a7a2 Complete TP4 Bonus notebook code cells for DeepLearning
  • aad17ec Implement feature X to enhance user experience and optimize performance

📊 Changes

1 file changed (+227 additions, -93 deletions)


📝 M2/Deep Learning/TP4 - Récurrents/TP4 - Bonus.ipynb (+227 -93)

📄 Description

The TP4 Bonus notebook had empty code cells for the Embedding layer comparison exercises. This PR completes the missing implementations.

Changes to TP4 - Bonus.ipynb

  • Training loop: Compare embedding dimensions [8, 16, 32, 64, 128] over 10 epochs, tracking min validation loss
  • Results processing: DataFrame conversion with mean/std computation per dimension
  • Visualization: Matplotlib plot showing embedding dimension vs. validation loss
  • Analysis: Added markdown cell explaining expected tradeoffs (underfitting at low dims, overfitting risk at high dims)
```python
# Training loop structure
import keras

dimensions = [8, 16, 32, 64, 128]
n_epochs = 10
results = []
for dimension in dimensions:
    model = get_model(dimension, n_characters)
    model.compile(loss="sparse_categorical_crossentropy", optimizer=keras.optimizers.Adam())
    history = model.fit(X_train, y_train, epochs=n_epochs, validation_data=(X_val, y_val))
    results.append({"dimension": dimension, "val_loss": min(history.history["val_loss"])})
```
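The results-processing step described in the changes list can be sketched as follows. This is a minimal sketch: the shape of the `results` list matches the training loop above, but the `groupby`/`agg` call and the zero-fill for the single-run std are assumptions, not code taken from the notebook.

```python
# Sketch: aggregate per-dimension validation losses into (dimension, mean, std) tuples.
import pandas as pd

# Hypothetical single-run results, shaped like the training loop's `results` list
results = [
    {"dimension": 8, "val_loss": 2.10},
    {"dimension": 16, "val_loss": 1.95},
    {"dimension": 32, "val_loss": 1.88},
]

df = pd.DataFrame(results)
# With a single run per dimension, std is NaN (ddof=1), so fill it with 0
summary = df.groupby("dimension")["val_loss"].agg(["mean", "std"]).fillna(0.0)
tuples = list(summary.itertuples(name=None))  # [(dimension, mean, std), ...]
```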

Note: the Beaudelaire.txt data file must be downloaded from Project Gutenberg (pg6099) to run the notebooks.

Warning

Firewall rules blocked me from connecting to one or more addresses

I tried to connect to the following addresses, but was blocked by firewall rules:

  • www.gutenberg.org
    • Triggering command: /usr/bin/curl curl -s REDACTED -o Beaudelaire.txt (dns block)
    • Triggering command: /usr/bin/wget wget -q REDACTED -O Beaudelaire.txt (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

  • Configure Actions setup steps (https://gh.io/copilot/actions-setup-steps) to set up my environment, which run before the firewall is enabled
  • Add the appropriate URLs or hosts to the custom allowlist in this repository's Copilot coding agent settings (admins only)

Original prompt

Fix the code cells already present and complete the others by answering the questions asked in TP 4 of the M2 DeepLearning course.




Note

Completes the Bonus notebook by adding embedding-based data prep, model factory, training across embedding dimensions, result aggregation, plotting, and brief analysis.

  • Notebook: M2/Deep Learning/TP4 - Récurrents/TP4 - Bonus.ipynb
    • Data prep: Build character index mappings; generate integer sequences X/y; split into train/val.
  • Model: Add get_model(dimension, vocabulary_size) creating Embedding → SimpleRNN(128) → Dense(vocabulary_size, softmax).
    • Training sweep: Loop over embedding dimensions [8, 16, 32, 64, 128], train for 10 epochs, record min val_loss per run.
    • Results processing: Convert to DataFrame; produce list of tuples (dimension, mean_val_loss, std_val_loss) (single-run std=0).
    • Visualization: Plot embedding dimension vs. min validation loss (Matplotlib).
    • Analysis: Add markdown commenting on trade-offs across embedding sizes.

Written by Cursor Bugbot for commit aad17ec465. This will update automatically on new commits. Configure here.
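The data-prep step summarized in the note above can be sketched as follows. The corpus line, window length, and split ratio here are placeholders, not values taken from the notebook.

```python
# Sketch: character index mappings, integer sequences X/y, and a train/val split.
import numpy as np

text = "le poete est semblable au prince des nuees"  # stand-in for Beaudelaire.txt
chars = sorted(set(text))
char_to_idx = {c: i for i, c in enumerate(chars)}
idx_to_char = {i: c for c, i in char_to_idx.items()}

seq_len = 10  # assumed context window
encoded = [char_to_idx[c] for c in text]
X = np.array([encoded[i:i + seq_len] for i in range(len(encoded) - seq_len)])
y = np.array([encoded[i + seq_len] for i in range(len(encoded) - seq_len)])

split = int(0.8 * len(X))  # simple ordered 80/20 split
X_train, X_val = X[:split], X[split:]
y_train, y_val = y[:split], y[split:]
```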


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

arthur added the pull-request label 2025-12-01 17:04:31 +01:00

Reference: arthur/ArtStudies#2