[PR #2] [MERGED] Complete TP4 DeepLearning notebooks - RNN with Embedding layer exercises

Closed
opened 2025-12-01 17:04:31 +01:00 by arthur · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ArthurDanjou/ArtStudies/pull/2
Author: @Copilot
Created: 11/26/2025
Status: Merged
Merged: 11/26/2025
Merged by: @ArthurDanjou

Base: master ← Head: copilot/fix-code-cells-tp4


📝 Commits (3)

  • dc05441 Initial plan
  • 886a7a2 Complete TP4 Bonus notebook code cells for DeepLearning
  • aad17ec Implement feature X to enhance user experience and optimize performance

📊 Changes

1 file changed (+227 additions, -93 deletions)


📝 M2/Deep Learning/TP4 - Récurrents/TP4 - Bonus.ipynb (+227 -93)

📄 Description

The TP4 Bonus notebook had empty code cells for the Embedding layer comparison exercises. This PR completes the missing implementations.

Changes to TP4 - Bonus.ipynb

  • Training loop: Compare embedding dimensions [8, 16, 32, 64, 128] over 10 epochs, tracking min validation loss
  • Results processing: DataFrame conversion with mean/std computation per dimension
  • Visualization: Matplotlib plot showing embedding dimension vs. validation loss
  • Analysis: Added markdown cell explaining expected tradeoffs (underfitting at low dims, overfitting risk at high dims)
```python
# Training loop structure
import keras

dimensions = [8, 16, 32, 64, 128]
n_epochs = 10
results = []
for dimension in dimensions:
    model = get_model(dimension, n_characters)
    model.compile(loss="sparse_categorical_crossentropy", optimizer=keras.optimizers.Adam())
    history = model.fit(X_train, y_train, epochs=n_epochs, validation_data=(X_val, y_val))
    results.append({"dimension": dimension, "val_loss": min(history.history["val_loss"])})
```
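The results-processing step described in the changes list can be sketched as follows. This is a minimal sketch: the shape of the `results` list matches the training loop above, but the `groupby`/`agg` call and the zero-fill for the single-run std are assumptions, not code taken from the notebook.

```python
# Sketch: aggregate per-dimension validation losses into (dimension, mean, std) tuples.
import pandas as pd

# Hypothetical single-run results, shaped like the training loop's `results` list
results = [
    {"dimension": 8, "val_loss": 2.10},
    {"dimension": 16, "val_loss": 1.95},
    {"dimension": 32, "val_loss": 1.88},
]

df = pd.DataFrame(results)
# With a single run per dimension, std is NaN (ddof=1), so fill it with 0
summary = df.groupby("dimension")["val_loss"].agg(["mean", "std"]).fillna(0.0)
tuples = list(summary.itertuples(name=None))  # [(dimension, mean, std), ...]
```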

Note: the Beaudelaire.txt data file must be downloaded from Project Gutenberg (pg6099) to run the notebooks.

Warning

Firewall rules blocked me from connecting to one or more addresses

I tried to connect to the following addresses, but was blocked by firewall rules:

  • www.gutenberg.org
    • Triggering command: /usr/bin/curl curl -s REDACTED -o Beaudelaire.txt (dns block)
    • Triggering command: /usr/bin/wget wget -q REDACTED -O Beaudelaire.txt (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

  • Configure Actions setup steps (https://gh.io/copilot/actions-setup-steps) to set up my environment, which run before the firewall is enabled
  • Add the appropriate URLs or hosts to the custom allowlist in this repository's Copilot coding agent settings (admins only)

Original prompt

Fix the code cells already present and complete the others by answering the questions asked in TP 4 of the M2 DeepLearning course.




Note

Completes the Bonus notebook by adding embedding-based data prep, model factory, training across embedding dimensions, result aggregation, plotting, and brief analysis.

  • Notebook: M2/Deep Learning/TP4 - Récurrents/TP4 - Bonus.ipynb
    • Data prep: Build character index mappings; generate integer sequences X/y; split into train/val.
  • Model: Add get_model(dimension, vocabulary_size) creating Embedding → SimpleRNN(128) → Dense(vocabulary_size, softmax).
    • Training sweep: Loop over embedding dimensions [8, 16, 32, 64, 128], train for 10 epochs, record min val_loss per run.
    • Results processing: Convert to DataFrame; produce list of tuples (dimension, mean_val_loss, std_val_loss) (single-run std=0).
    • Visualization: Plot embedding dimension vs. min validation loss (Matplotlib).
    • Analysis: Add markdown commenting on trade-offs across embedding sizes.

Written by Cursor Bugbot for commit aad17ec465. This will update automatically on new commits. Configure here.
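The data-prep step summarized in the note above can be sketched as follows. The corpus line, window length, and split ratio here are placeholders, not values taken from the notebook.

```python
# Sketch: character index mappings, integer sequences X/y, and a train/val split.
import numpy as np

text = "le poete est semblable au prince des nuees"  # stand-in for Beaudelaire.txt
chars = sorted(set(text))
char_to_idx = {c: i for i, c in enumerate(chars)}
idx_to_char = {i: c for c, i in char_to_idx.items()}

seq_len = 10  # assumed context window
encoded = [char_to_idx[c] for c in text]
X = np.array([encoded[i:i + seq_len] for i in range(len(encoded) - seq_len)])
y = np.array([encoded[i + seq_len] for i in range(len(encoded) - seq_len)])

split = int(0.8 * len(X))  # simple ordered 80/20 split
X_train, X_val = X[:split], X[split:]
y_train, y_val = y[:split], y[split:]
```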


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

arthur added the pull-request label 2025-12-01 17:04:31 +01:00

Reference: arthur/ArtStudies#2