Merge branch 'master' into fix-chapter3-header

Aurélien Geron
2021-03-02 10:33:10 +13:00
committed by GitHub
32 changed files with 1590 additions and 1177 deletions

@@ -84,6 +84,13 @@
"# MNIST"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Warning:** since Scikit-Learn 0.24, `fetch_openml()` returns a Pandas `DataFrame` by default. To avoid this and keep the same code as in the book, we use `as_frame=False`."
]
},
{
"cell_type": "code",
"execution_count": 2,
@@ -91,7 +98,7 @@
"outputs": [],
"source": [
"from sklearn.datasets import fetch_openml\n",
"mnist = fetch_openml('mnist_784', version=1)\n",
"mnist = fetch_openml('mnist_784', version=1, as_frame=False)\n",
"mnist.keys()"
]
},
@@ -345,7 +352,7 @@
"* first, Scikit-Learn and other libraries evolve, and algorithms get tweaked a bit, which may change the exact result you get. If you use the latest Scikit-Learn version (and in general, you really should), you probably won't be using the exact same version I used when I wrote the book or this notebook, hence the difference. I try to keep this notebook reasonably up to date, but I can't change the numbers on the pages in your copy of the book.\n",
"* second, many training algorithms are stochastic, meaning they rely on randomness. In principle, it's possible to get consistent outputs from a random number generator by setting the seed from which it generates the pseudo-random numbers (which is why you will see `random_state=42` or `np.random.seed(42)` pretty often). However, sometimes this does not suffice due to the other factors listed here.\n",
"* third, if the training algorithm runs across multiple threads (as do some algorithms implemented in C) or across multiple processes (e.g., when using the `n_jobs` argument), then the precise order in which operations will run is not always guaranteed, and thus the exact result may vary slightly.\n",
"* lastly, other things may prevent perfect reproducibility, such as Python maps and sets whose order is not guaranteed to be stable across sessions, or the order of files in a directory which is also not guaranteed."
"* lastly, other things may prevent perfect reproducibility, such as Python dicts and sets whose order is not guaranteed to be stable across sessions, or the order of files in a directory which is also not guaranteed."
]
},
{
@@ -393,7 +400,7 @@
},
{
"cell_type": "code",
"execution_count": 27,
"execution_count": 25,
"metadata": {},
"outputs": [],
"source": [
@@ -412,7 +419,7 @@
},
{
"cell_type": "code",
"execution_count": 28,
"execution_count": 27,
"metadata": {},
"outputs": [],
"source": [
@@ -481,7 +488,7 @@
},
{
"cell_type": "code",
"execution_count": 30,
"execution_count": 34,
"metadata": {},
"outputs": [],
"source": [
@@ -491,7 +498,7 @@
},
{
"cell_type": "code",
"execution_count": 31,
"execution_count": 35,
"metadata": {},
"outputs": [],
"source": [
@@ -502,7 +509,7 @@
},
{
"cell_type": "code",
"execution_count": 32,
"execution_count": 36,
"metadata": {},
"outputs": [],
"source": [
@@ -533,7 +540,7 @@
},
{
"cell_type": "code",
"execution_count": 33,
"execution_count": 37,
"metadata": {},
"outputs": [],
"source": [
@@ -542,7 +549,7 @@
},
{
"cell_type": "code",
"execution_count": 35,
"execution_count": 38,
"metadata": {},
"outputs": [],
"source": [
@@ -564,7 +571,7 @@
},
{
"cell_type": "code",
"execution_count": 42,
"execution_count": 39,
"metadata": {},
"outputs": [],
"source": [
@@ -573,7 +580,7 @@
},
{
"cell_type": "code",
"execution_count": 43,
"execution_count": 40,
"metadata": {},
"outputs": [],
"source": [
@@ -582,7 +589,7 @@
},
{
"cell_type": "code",
"execution_count": 44,
"execution_count": 41,
"metadata": {},
"outputs": [],
"source": [
@@ -591,7 +598,7 @@
},
{
"cell_type": "code",
"execution_count": 45,
"execution_count": 42,
"metadata": {},
"outputs": [],
"source": [
@@ -600,7 +607,7 @@
},
{
"cell_type": "code",
"execution_count": 46,
"execution_count": 43,
"metadata": {},
"outputs": [],
"source": [
@@ -616,7 +623,7 @@
},
{
"cell_type": "code",
"execution_count": 47,
"execution_count": 44,
"metadata": {},
"outputs": [],
"source": [
@@ -627,7 +634,7 @@
},
{
"cell_type": "code",
"execution_count": 50,
"execution_count": 45,
"metadata": {},
"outputs": [],
"source": [
@@ -651,7 +658,7 @@
},
{
"cell_type": "code",
"execution_count": 53,
"execution_count": 46,
"metadata": {},
"outputs": [],
"source": [
@@ -669,7 +676,7 @@
},
{
"cell_type": "code",
"execution_count": 54,
"execution_count": 47,
"metadata": {},
"outputs": [],
"source": [
@@ -681,7 +688,7 @@
},
{
"cell_type": "code",
"execution_count": 55,
"execution_count": 48,
"metadata": {},
"outputs": [],
"source": [
@@ -691,7 +698,7 @@
},
{
"cell_type": "code",
"execution_count": 57,
"execution_count": 49,
"metadata": {},
"outputs": [],
"source": [
@@ -713,7 +720,7 @@
},
{
"cell_type": "code",
"execution_count": 58,
"execution_count": 50,
"metadata": {},
"outputs": [],
"source": [
@@ -722,7 +729,7 @@
},
{
"cell_type": "code",
"execution_count": 59,
"execution_count": 51,
"metadata": {},
"outputs": [],
"source": [
@@ -732,7 +739,7 @@
},
{
"cell_type": "code",
"execution_count": 60,
"execution_count": 52,
"metadata": {},
"outputs": [],
"source": [
@@ -836,6 +843,13 @@
"sgd_clf.decision_function([some_digit])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Warning**: the following two cells may take close to 30 minutes to run, or more depending on your hardware."
]
},
{
"cell_type": "code",
"execution_count": 62,
@@ -1209,7 +1223,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"**Warning**: the next cell may take hours to run, depending on your hardware."
"**Warning**: the next cell may take close to 16 hours to run, or more depending on your hardware."
]
},
{
@@ -1355,6 +1369,13 @@
"knn_clf.fit(X_train_augmented, y_train_augmented)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Warning**: the following cell may take close to an hour to run, depending on your hardware."
]
},
{
"cell_type": "code",
"execution_count": 99,
@@ -1925,7 +1946,7 @@
"source": [
"import os\n",
"import tarfile\n",
"import urllib\n",
"import urllib.request\n",
"\n",
"DOWNLOAD_ROOT = \"http://spamassassin.apache.org/old/publiccorpus/\"\n",
"HAM_URL = DOWNLOAD_ROOT + \"20030228_easy_ham.tar.bz2\"\n",
@@ -2156,7 +2177,7 @@
},
{
"cell_type": "code",
"execution_count": 185,
"execution_count": 142,
"metadata": {},
"outputs": [],
"source": [
@@ -2517,7 +2538,7 @@
},
{
"cell_type": "code",
"execution_count": 183,
"execution_count": 158,
"metadata": {},
"outputs": [],
"source": [
@@ -2540,7 +2561,7 @@
},
{
"cell_type": "code",
"execution_count": 184,
"execution_count": 159,
"metadata": {},
"outputs": [],
"source": [
@@ -2581,7 +2602,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.2"
"version": "3.7.9"
},
"nav_menu": {},
"toc": {