mirror of
https://github.com/ArthurDanjou/handson-ml3.git
synced 2026-02-02 21:17:49 +01:00
Update notebooks 1 to 8 to latest library versions (in particular Scikit-Learn 0.20)
@@ -661,15 +661,25 @@
  "sample_incomplete_rows"
  ]
  },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Warning**: Since Scikit-Learn 0.20, the `sklearn.preprocessing.Imputer` class was replaced by the `sklearn.impute.SimpleImputer` class."
+ ]
+ },
  {
  "cell_type": "code",
  "execution_count": 49,
  "metadata": {},
  "outputs": [],
  "source": [
- "from sklearn.preprocessing import Imputer\n",
+ "try:\n",
+ " from sklearn.impute import SimpleImputer # Scikit-Learn 0.20+\n",
+ "except ImportError:\n",
+ " from sklearn.preprocessing import Imputer as SimpleImputer\n",
  "\n",
- "imputer = Imputer(strategy=\"median\")"
+ "imputer = SimpleImputer(strategy=\"median\")"
  ]
  },
  {
@@ -798,7 +808,7 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "**Warning**: earlier versions of the book used the `LabelEncoder` class or Pandas' `Series.factorize()` method to encode string categorical attributes as integers. However, the `OrdinalEncoder` class that is planned to be introduced in Scikit-Learn 0.20 (see [PR #10521](https://github.com/scikit-learn/scikit-learn/issues/10521)) is preferable since it is designed for input features (`X` instead of labels `y`) and it plays well with pipelines (introduced later in this notebook). For now, we will import it from `future_encoders.py`, but once it is available you can import it directly from `sklearn.preprocessing`."
+ "**Warning**: earlier versions of the book used the `LabelEncoder` class or Pandas' `Series.factorize()` method to encode string categorical attributes as integers. However, the `OrdinalEncoder` class that was introduced in Scikit-Learn 0.20 (see [PR #10521](https://github.com/scikit-learn/scikit-learn/issues/10521)) is preferable since it is designed for input features (`X` instead of labels `y`) and it plays well with pipelines (introduced later in this notebook). If you are using an older version of Scikit-Learn (<0.20), then you can import it from `future_encoders.py` instead."
  ]
  },
  {
@@ -807,7 +817,10 @@
  "metadata": {},
  "outputs": [],
  "source": [
- "from future_encoders import OrdinalEncoder"
+ "try:\n",
+ " from sklearn.preprocessing import OrdinalEncoder\n",
+ "except ImportError:\n",
+ " from future_encoders import OrdinalEncoder # Scikit-Learn < 0.20"
  ]
  },
  {
@@ -834,7 +847,7 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "**Warning**: earlier versions of the book used the `LabelBinarizer` or `CategoricalEncoder` classes to convert each categorical value to a one-hot vector. It is now preferable to use the `OneHotEncoder` class. Right now it can only handle integer categorical inputs, but in Scikit-Learn 0.20 it will also handle string categorical inputs (see [PR #10521](https://github.com/scikit-learn/scikit-learn/issues/10521)). So for now we import it from `future_encoders.py`, but when Scikit-Learn 0.20 is released, you can import it from `sklearn.preprocessing` instead:"
+ "**Warning**: earlier versions of the book used the `LabelBinarizer` or `CategoricalEncoder` classes to convert each categorical value to a one-hot vector. It is now preferable to use the `OneHotEncoder` class. Since Scikit-Learn 0.20 it can handle string categorical inputs (see [PR #10521](https://github.com/scikit-learn/scikit-learn/issues/10521)), not just integer categorical inputs. If you are using an older version of Scikit-Learn, you can import the new version from `future_encoders.py`:"
  ]
  },
  {
@@ -843,7 +856,11 @@
  "metadata": {},
  "outputs": [],
  "source": [
- "from future_encoders import OneHotEncoder\n",
+ "try:\n",
+ " from sklearn.preprocessing import OrdinalEncoder # just to raise an ImportError if Scikit-Learn < 0.20\n",
+ " from sklearn.preprocessing import OneHotEncoder\n",
+ "except ImportError:\n",
+ " from future_encoders import OneHotEncoder # Scikit-Learn < 0.20\n",
  "\n",
  "cat_encoder = OneHotEncoder()\n",
  "housing_cat_1hot = cat_encoder.fit_transform(housing_cat)\n",
@@ -959,7 +976,7 @@
  "from sklearn.preprocessing import StandardScaler\n",
  "\n",
  "num_pipeline = Pipeline([\n",
- " ('imputer', Imputer(strategy=\"median\")),\n",
+ " ('imputer', SimpleImputer(strategy=\"median\")),\n",
  " ('attribs_adder', CombinedAttributesAdder()),\n",
  " ('std_scaler', StandardScaler()),\n",
  " ])\n",
@@ -980,7 +997,7 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "**Warning**: earlier versions of the book applied different transformations to different columns using a solution based on a `DataFrameSelector` transformer and a `FeatureUnion` (see below). It is now preferable to use the `ColumnTransformer` class that will be introduced in Scikit-Learn 0.20. For now we import it from `future_encoders.py`, but when Scikit-Learn 0.20 is released, you can import it from `sklearn.compose` instead:"
+ "**Warning**: earlier versions of the book applied different transformations to different columns using a solution based on a `DataFrameSelector` transformer and a `FeatureUnion` (see below). It is now preferable to use the `ColumnTransformer` class that was introduced in Scikit-Learn 0.20. If you are using an older version of Scikit-Learn, you can import it from `future_encoders.py`:"
  ]
  },
  {
@@ -989,7 +1006,10 @@
  "metadata": {},
  "outputs": [],
  "source": [
- "from future_encoders import ColumnTransformer"
+ "try:\n",
+ " from sklearn.compose import ColumnTransformer\n",
+ "except ImportError:\n",
+ " from future_encoders import ColumnTransformer # Scikit-Learn < 0.20"
  ]
  },
  {
@@ -1070,7 +1090,7 @@
  "\n",
  "old_num_pipeline = Pipeline([\n",
  " ('selector', OldDataFrameSelector(num_attribs)),\n",
- " ('imputer', Imputer(strategy=\"median\")),\n",
+ " ('imputer', SimpleImputer(strategy=\"median\")),\n",
  " ('attribs_adder', CombinedAttributesAdder()),\n",
  " ('std_scaler', StandardScaler()),\n",
  " ])\n",
@@ -1275,6 +1295,13 @@
  "display_scores(lin_rmse_scores)"
  ]
  },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Note**: we specify `n_estimators=10` to avoid a warning about the fact that the default value is going to change to 100 in Scikit-Learn 0.22."
+ ]
+ },
  {
  "cell_type": "code",
  "execution_count": 91,
@@ -1283,7 +1310,7 @@
  "source": [
  "from sklearn.ensemble import RandomForestRegressor\n",
  "\n",
- "forest_reg = RandomForestRegressor(random_state=42)\n",
+ "forest_reg = RandomForestRegressor(n_estimators=10, random_state=42)\n",
  "forest_reg.fit(housing_prepared, housing_labels)"
  ]
  },
@@ -2114,10 +2141,10 @@
  "metadata": {},
  "outputs": [],
  "source": [
- "param_grid = [\n",
- " {'preparation__num__imputer__strategy': ['mean', 'median', 'most_frequent'],\n",
- " 'feature_selection__k': list(range(1, len(feature_importances) + 1))}\n",
- "]\n",
+ "param_grid = [{\n",
+ " 'preparation__num__imputer__strategy': ['mean', 'median', 'most_frequent'],\n",
+ " 'feature_selection__k': list(range(1, len(feature_importances) + 1))\n",
+ "}]\n",
  "\n",
  "grid_search_prep = GridSearchCV(prepare_select_and_predict_pipeline, param_grid, cv=5,\n",
  " scoring='neg_mean_squared_error', verbose=2, n_jobs=4)\n",
@@ -2164,7 +2191,7 @@
  "name": "python",
  "nbconvert_exporter": "python",
  "pygments_lexer": "ipython3",
- "version": "3.6.5"
+ "version": "3.6.6"
  },
  "nav_menu": {
  "height": "279px",
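The changed code cells in this commit all follow one pattern: try the Scikit-Learn 0.20+ import location first, and fall back to the older location (or to `future_encoders.py`) on `ImportError`. As a rough sketch of that pattern only, the helper below generalizes it to an ordered list of candidates. The names `import_first_available`, `IMPUTER_CANDIDATES`, and `make_median_imputer` are hypothetical and do not appear in the notebook or in Scikit-Learn; the module paths are the ones shown in the diff.

```python
import importlib


def import_first_available(candidates):
    """Return the first object importable from a list of (module, name) pairs.

    Hypothetical helper for illustration; the notebook itself uses plain
    try/except ImportError blocks instead.
    """
    errors = []
    for module_name, attr_name in candidates:
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attr_name)
        except (ImportError, AttributeError) as exc:
            # Remember why this candidate failed, then try the next one.
            errors.append(f"{module_name}.{attr_name}: {exc}")
    raise ImportError("no candidate could be imported: " + "; ".join(errors))


# Same preference order as the notebook cell: new location first, old one second.
IMPUTER_CANDIDATES = [
    ("sklearn.impute", "SimpleImputer"),   # Scikit-Learn 0.20+
    ("sklearn.preprocessing", "Imputer"),  # Scikit-Learn < 0.20
]


def make_median_imputer():
    """Build a median imputer with whichever class is available (needs Scikit-Learn)."""
    imputer_class = import_first_available(IMPUTER_CANDIDATES)
    return imputer_class(strategy="median")
```

Calling `make_median_imputer()` requires Scikit-Learn to be installed; the explicit `try`/`except ImportError` blocks in the diff are shorter and are probably the better choice when there are only two candidates.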