Update notebooks 1 to 8 to latest library versions (in particular Scikit-Learn 0.20)

Aurélien Geron
2018-12-21 10:18:31 +08:00
parent dc16446c5f
commit b54ee1b608
8 changed files with 694 additions and 586 deletions


@@ -661,15 +661,25 @@
"sample_incomplete_rows"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Warning**: Since Scikit-Learn 0.20, the `sklearn.preprocessing.Imputer` class was replaced by the `sklearn.impute.SimpleImputer` class."
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.preprocessing import Imputer\n",
"try:\n",
" from sklearn.impute import SimpleImputer # Scikit-Learn 0.20+\n",
"except ImportError:\n",
" from sklearn.preprocessing import Imputer as SimpleImputer\n",
"\n",
"imputer = Imputer(strategy=\"median\")"
"imputer = SimpleImputer(strategy=\"median\")"
]
},
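For readers following along outside the notebook, here is a minimal, self-contained sketch of the version-tolerant import together with a fit on a toy DataFrame (the data below is made up for illustration; the notebook itself applies the imputer to `housing_num`):

```python
import numpy as np
import pandas as pd

try:
    from sklearn.impute import SimpleImputer  # Scikit-Learn 0.20+
except ImportError:
    from sklearn.preprocessing import Imputer as SimpleImputer  # Scikit-Learn < 0.20

# Toy numeric data with a missing value (stand-in for housing_num).
df = pd.DataFrame({"rooms": [3.0, np.nan, 5.0], "age": [10.0, 20.0, 30.0]})

imputer = SimpleImputer(strategy="median")
imputer.fit(df)
print(imputer.statistics_)    # per-column medians learned during fit
print(imputer.transform(df))  # missing values replaced by those medians
```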
{
@@ -798,7 +808,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"**Warning**: earlier versions of the book used the `LabelEncoder` class or Pandas' `Series.factorize()` method to encode string categorical attributes as integers. However, the `OrdinalEncoder` class that is planned to be introduced in Scikit-Learn 0.20 (see [PR #10521](https://github.com/scikit-learn/scikit-learn/issues/10521)) is preferable since it is designed for input features (`X` instead of labels `y`) and it plays well with pipelines (introduced later in this notebook). For now, we will import it from `future_encoders.py`, but once it is available you can import it directly from `sklearn.preprocessing`."
"**Warning**: earlier versions of the book used the `LabelEncoder` class or Pandas' `Series.factorize()` method to encode string categorical attributes as integers. However, the `OrdinalEncoder` class that was introduced in Scikit-Learn 0.20 (see [PR #10521](https://github.com/scikit-learn/scikit-learn/issues/10521)) is preferable since it is designed for input features (`X` instead of labels `y`) and it plays well with pipelines (introduced later in this notebook). If you are using an older version of Scikit-Learn (<0.20), then you can import it from `future_encoders.py` instead."
]
},
{
@@ -807,7 +817,10 @@
"metadata": {},
"outputs": [],
"source": [
"from future_encoders import OrdinalEncoder"
"try:\n",
" from sklearn.preprocessing import OrdinalEncoder\n",
"except ImportError:\n",
" from future_encoders import OrdinalEncoder # Scikit-Learn < 0.20"
]
},
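As a quick, hedged illustration of what `OrdinalEncoder` does with string features (toy data rather than the notebook's `housing_cat`):

```python
import numpy as np

try:
    from sklearn.preprocessing import OrdinalEncoder  # Scikit-Learn 0.20+
except ImportError:
    from future_encoders import OrdinalEncoder  # Scikit-Learn < 0.20

# A single string feature; the encoder expects a 2D array (one column per feature).
X = np.array([["INLAND"], ["NEAR BAY"], ["INLAND"], ["<1H OCEAN"]])

ordinal_encoder = OrdinalEncoder()
X_encoded = ordinal_encoder.fit_transform(X)
print(ordinal_encoder.categories_)  # categories learned for each feature
print(X_encoded)                    # each category replaced by its integer index
```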
{
@@ -834,7 +847,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"**Warning**: earlier versions of the book used the `LabelBinarizer` or `CategoricalEncoder` classes to convert each categorical value to a one-hot vector. It is now preferable to use the `OneHotEncoder` class. Right now it can only handle integer categorical inputs, but in Scikit-Learn 0.20 it will also handle string categorical inputs (see [PR #10521](https://github.com/scikit-learn/scikit-learn/issues/10521)). So for now we import it from `future_encoders.py`, but when Scikit-Learn 0.20 is released, you can import it from `sklearn.preprocessing` instead:"
"**Warning**: earlier versions of the book used the `LabelBinarizer` or `CategoricalEncoder` classes to convert each categorical value to a one-hot vector. It is now preferable to use the `OneHotEncoder` class. Since Scikit-Learn 0.20 it can handle string categorical inputs (see [PR #10521](https://github.com/scikit-learn/scikit-learn/issues/10521)), not just integer categorical inputs. If you are using an older version of Scikit-Learn, you can import the new version from `future_encoders.py`:"
]
},
{
@@ -843,7 +856,11 @@
"metadata": {},
"outputs": [],
"source": [
"from future_encoders import OneHotEncoder\n",
"try:\n",
" from sklearn.preprocessing import OrdinalEncoder # just to raise an ImportError if Scikit-Learn < 0.20\n",
" from sklearn.preprocessing import OneHotEncoder\n",
"except ImportError:\n",
" from future_encoders import OneHotEncoder # Scikit-Learn < 0.20\n",
"\n",
"cat_encoder = OneHotEncoder()\n",
"housing_cat_1hot = cat_encoder.fit_transform(housing_cat)\n",
@@ -959,7 +976,7 @@
"from sklearn.preprocessing import StandardScaler\n",
"\n",
"num_pipeline = Pipeline([\n",
" ('imputer', Imputer(strategy=\"median\")),\n",
" ('imputer', SimpleImputer(strategy=\"median\")),\n",
" ('attribs_adder', CombinedAttributesAdder()),\n",
" ('std_scaler', StandardScaler()),\n",
" ])\n",
@@ -980,7 +997,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"**Warning**: earlier versions of the book applied different transformations to different columns using a solution based on a `DataFrameSelector` transformer and a `FeatureUnion` (see below). It is now preferable to use the `ColumnTransformer` class that will be introduced in Scikit-Learn 0.20. For now we import it from `future_encoders.py`, but when Scikit-Learn 0.20 is released, you can import it from `sklearn.compose` instead:"
"**Warning**: earlier versions of the book applied different transformations to different columns using a solution based on a `DataFrameSelector` transformer and a `FeatureUnion` (see below). It is now preferable to use the `ColumnTransformer` class that was introduced in Scikit-Learn 0.20. If you are using an older version of Scikit-Learn, you can import it from `future_encoders.py`:"
]
},
{
@@ -989,7 +1006,10 @@
"metadata": {},
"outputs": [],
"source": [
"from future_encoders import ColumnTransformer"
"try:\n",
" from sklearn.compose import ColumnTransformer\n",
"except ImportError:\n",
" from future_encoders import ColumnTransformer # Scikit-Learn < 0.20"
]
},
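Assuming Scikit-Learn 0.20+ (older versions would fall back to `future_encoders.py` as above), here is a minimal sketch of `ColumnTransformer` applying different transformers to numeric and categorical columns; the tiny DataFrame and column names are illustrative, not the notebook's `housing` data:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer          # Scikit-Learn 0.20+
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Illustrative stand-in for the housing DataFrame.
df = pd.DataFrame({
    "median_income": [3.2, 5.1, 2.7],
    "housing_median_age": [25.0, 40.0, 12.0],
    "ocean_proximity": ["INLAND", "NEAR BAY", "INLAND"],
})

preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["median_income", "housing_median_age"]),
    ("cat", OneHotEncoder(), ["ocean_proximity"]),
])

prepared = preprocess.fit_transform(df)  # scaled numbers + one-hot columns, side by side
print(prepared)
```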
{
@@ -1070,7 +1090,7 @@
"\n",
"old_num_pipeline = Pipeline([\n",
" ('selector', OldDataFrameSelector(num_attribs)),\n",
" ('imputer', Imputer(strategy=\"median\")),\n",
" ('imputer', SimpleImputer(strategy=\"median\")),\n",
" ('attribs_adder', CombinedAttributesAdder()),\n",
" ('std_scaler', StandardScaler()),\n",
" ])\n",
@@ -1275,6 +1295,13 @@
"display_scores(lin_rmse_scores)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Note**: we specify `n_estimators=10` to avoid a warning about the fact that the default value is going to change to 100 in Scikit-Learn 0.22."
]
},
{
"cell_type": "code",
"execution_count": 91,
@@ -1283,7 +1310,7 @@
"source": [
"from sklearn.ensemble import RandomForestRegressor\n",
"\n",
"forest_reg = RandomForestRegressor(random_state=42)\n",
"forest_reg = RandomForestRegressor(n_estimators=10, random_state=42)\n",
"forest_reg.fit(housing_prepared, housing_labels)"
]
},
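The note above is only about silencing a deprecation warning; a hedged, standalone sketch on synthetic data (not `housing_prepared`) shows the pattern:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data, standing in for housing_prepared / housing_labels.
rng = np.random.RandomState(42)
X = rng.rand(100, 3)
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.randn(100)

# Passing n_estimators explicitly keeps the same behavior across versions and
# silences the Scikit-Learn 0.20 warning about the default moving to 100 in 0.22.
forest_reg = RandomForestRegressor(n_estimators=10, random_state=42)
forest_reg.fit(X, y)
print(forest_reg.predict(X[:3]))
```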
@@ -2114,10 +2141,10 @@
"metadata": {},
"outputs": [],
"source": [
"param_grid = [\n",
" {'preparation__num__imputer__strategy': ['mean', 'median', 'most_frequent'],\n",
" 'feature_selection__k': list(range(1, len(feature_importances) + 1))}\n",
"]\n",
"param_grid = [{\n",
" 'preparation__num__imputer__strategy': ['mean', 'median', 'most_frequent'],\n",
" 'feature_selection__k': list(range(1, len(feature_importances) + 1))\n",
"}]\n",
"\n",
"grid_search_prep = GridSearchCV(prepare_select_and_predict_pipeline, param_grid, cv=5,\n",
" scoring='neg_mean_squared_error', verbose=2, n_jobs=4)\n",
@@ -2164,7 +2191,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
"version": "3.6.6"
},
"nav_menu": {
"height": "279px",