From 7629334e9b9568eb6a232d63afbc523c28df7c97 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Aur=C3=A9lien=20Geron?= Date: Tue, 19 Sep 2017 13:01:23 +0200 Subject: [PATCH] Do not use LabelEncoder and LabelBinarizer, use factorize() and CategoricalEncoder instead. --- 02_end_to_end_machine_learning_project.ipynb | 494 +++++++++++++++---- 1 file changed, 398 insertions(+), 96 deletions(-) diff --git a/02_end_to_end_machine_learning_project.ipynb b/02_end_to_end_machine_learning_project.ipynb index cfd004b..6844f9c 100644 --- a/02_end_to_end_machine_learning_project.ipynb +++ b/02_end_to_end_machine_learning_project.ipynb @@ -782,18 +782,28 @@ "housing_tr.head()" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now let's preprocess the categorical input feature, `ocean_proximity`:" + ] + }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [], "source": [ - "from sklearn.preprocessing import LabelEncoder\n", - "\n", - "encoder = LabelEncoder()\n", "housing_cat = housing[\"ocean_proximity\"]\n", - "housing_cat_encoded = encoder.fit_transform(housing_cat)\n", - "housing_cat_encoded" + "housing_cat.head(10)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can use Pandas' `factorize()` method to convert this string categorical feature to an integer categorical feature, which will be easier for Machine Learning algorithms to handle:" ] }, { @@ -802,7 +812,8 @@ "metadata": {}, "outputs": [], "source": [ - "print(encoder.classes_)" + "housing_cat_encoded, housing_categories = housing_cat.factorize()\n", + "housing_cat_encoded[:10]" ] }, { @@ -810,6 +821,29 @@ "execution_count": 59, "metadata": {}, "outputs": [], + "source": [ + "housing_categories" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Warning**: earlier versions of the book used the `LabelEncoder` class instead of Pandas' `factorize()` method. 
This was incorrect: indeed, as its name suggests, the `LabelEncoder` class was designed for labels, not for input features. The code worked because we were handling a single categorical input feature, but it would break if you passed multiple categorical input features." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can convert each categorical value to a one-hot vector using a `OneHotEncoder`:" + ] + }, + { + "cell_type": "code", + "execution_count": 60, + "metadata": {}, + "outputs": [], "source": [ "from sklearn.preprocessing import OneHotEncoder\n", "\n", @@ -819,12 +853,10 @@ ] }, { - "cell_type": "code", - "execution_count": 60, + "cell_type": "markdown", "metadata": {}, - "outputs": [], "source": [ - "housing_cat_1hot.toarray()" + "The `OneHotEncoder` returns a sparse array by default, but we can convert it to a dense array if needed:" ] }, { @@ -833,11 +865,14 @@ "metadata": {}, "outputs": [], "source": [ - "from sklearn.preprocessing import LabelBinarizer\n", - "\n", - "encoder = LabelBinarizer()\n", - "housing_cat_1hot = encoder.fit_transform(housing_cat)\n", - "housing_cat_1hot" + "housing_cat_1hot.toarray()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Warning**: earlier versions of the book used the `LabelBinarizer` class at this point. Again, this was incorrect: just like the `LabelEncoder` class, the `LabelBinarizer` class was designed to preprocess labels, not input features. A better solution is to use Scikit-Learn's upcoming `CategoricalEncoder` class: it will soon be added to Scikit-Learn, and in the meantime you can use the code below (copied from [Pull Request #9151](https://github.com/scikit-learn/scikit-learn/pull/9151))." 
] }, { @@ -847,6 +882,273 @@ "collapsed": true }, "outputs": [], + "source": [ + "# Definition of the CategoricalEncoder class, copied from PR #9151.\n", + "# Just run this cell, or copy it to your code, do not try to understand it (yet).\n", + "\n", + "from sklearn.base import BaseEstimator, TransformerMixin\n", + "from sklearn.utils import check_array\n", + "from sklearn.preprocessing import LabelEncoder\n", + "from scipy import sparse\n", + "\n", + "class CategoricalEncoder(BaseEstimator, TransformerMixin):\n", + " \"\"\"Encode categorical features as a numeric array.\n", + " The input to this transformer should be a matrix of integers or strings,\n", + " denoting the values taken on by categorical (discrete) features.\n", + " The features can be encoded using a one-hot aka one-of-K scheme\n", + " (``encoding='onehot'``, the default) or converted to ordinal integers\n", + " (``encoding='ordinal'``).\n", + " This encoding is needed for feeding categorical data to many scikit-learn\n", + " estimators, notably linear models and SVMs with the standard kernels.\n", + " Read more in the :ref:`User Guide `.\n", + " Parameters\n", + " ----------\n", + " encoding : str, 'onehot', 'onehot-dense' or 'ordinal'\n", + " The type of encoding to use (default is 'onehot'):\n", + " - 'onehot': encode the features using a one-hot aka one-of-K scheme\n", + " (or also called 'dummy' encoding). This creates a binary column for\n", + " each category and returns a sparse matrix.\n", + " - 'onehot-dense': the same as 'onehot' but returns a dense array\n", + " instead of a sparse matrix.\n", + " - 'ordinal': encode the features as ordinal integers. 
This results in\n", + " a single column of integers (0 to n_categories - 1) per feature.\n", + " categories : 'auto' or a list of lists/arrays of values.\n", + " Categories (unique values) per feature:\n", + " - 'auto' : Determine categories automatically from the training data.\n", + " - list : ``categories[i]`` holds the categories expected in the ith\n", + " column. The passed categories are sorted before encoding the data\n", + " (used categories can be found in the ``categories_`` attribute).\n", + " dtype : number type, default np.float64\n", + " Desired dtype of output.\n", + " handle_unknown : 'error' (default) or 'ignore'\n", + " Whether to raise an error or ignore if an unknown categorical feature is\n", + " present during transform (default is to raise). When this parameter\n", + " is set to 'ignore' and an unknown category is encountered during\n", + " transform, the resulting one-hot encoded columns for this feature\n", + " will be all zeros.\n", + " Ignoring unknown categories is not supported for\n", + " ``encoding='ordinal'``.\n", + " Attributes\n", + " ----------\n", + " categories_ : list of arrays\n", + " The categories of each feature determined during fitting. When\n", + " categories were specified manually, this holds the sorted categories\n", + " (in order corresponding with output of `transform`).\n", + " Examples\n", + " --------\n", + " Given a dataset with three features and four samples, we let the encoder\n", + " find the unique values per feature and transform the data to a binary\n", + " one-hot encoding.\n", + " >>> from sklearn.preprocessing import CategoricalEncoder\n", + " >>> enc = CategoricalEncoder(handle_unknown='ignore')\n", + " >>> enc.fit([[0, 0, 3], [1, 1, 0], [0, 2, 1], [1, 0, 2]])\n", + " ... # doctest: +ELLIPSIS\n", + " CategoricalEncoder(categories='auto', dtype=<...
'numpy.float64'>,\n", + " encoding='onehot', handle_unknown='ignore')\n", + " >>> enc.transform([[0, 1, 1], [1, 0, 4]]).toarray()\n", + " array([[ 1., 0., 0., 1., 0., 0., 1., 0., 0.],\n", + " [ 0., 1., 1., 0., 0., 0., 0., 0., 0.]])\n", + " See also\n", + " --------\n", + " sklearn.preprocessing.OneHotEncoder : performs a one-hot encoding of\n", + " integer ordinal features. The ``OneHotEncoder`` assumes that input\n", + " features take on values in the range ``[0, max(feature)]`` instead of\n", + " using the unique values.\n", + " sklearn.feature_extraction.DictVectorizer : performs a one-hot encoding of\n", + " dictionary items (also handles string-valued features).\n", + " sklearn.feature_extraction.FeatureHasher : performs an approximate one-hot\n", + " encoding of dictionary items or strings.\n", + " \"\"\"\n", + "\n", + " def __init__(self, encoding='onehot', categories='auto', dtype=np.float64,\n", + " handle_unknown='error'):\n", + " self.encoding = encoding\n", + " self.categories = categories\n", + " self.dtype = dtype\n", + " self.handle_unknown = handle_unknown\n", + "\n", + " def fit(self, X, y=None):\n", + " \"\"\"Fit the CategoricalEncoder to X.\n", + " Parameters\n", + " ----------\n", + " X : array-like, shape [n_samples, n_features]\n", + " The data to determine the categories of each feature.\n", + " Returns\n", + " -------\n", + " self\n", + " \"\"\"\n", + "\n", + " if self.encoding not in ['onehot', 'onehot-dense', 'ordinal']:\n", + " template = (\"encoding should be either 'onehot', 'onehot-dense' \"\n", + " \"or 'ordinal', got %s\")\n", + " raise ValueError(template % self.encoding)\n", + "\n", + " if self.handle_unknown not in ['error', 'ignore']:\n", + " template = (\"handle_unknown should be either 'error' or \"\n", + " \"'ignore', got %s\")\n", + " raise ValueError(template % self.handle_unknown)\n", + "\n", + " if self.encoding == 'ordinal' and self.handle_unknown == 'ignore':\n", + " raise ValueError(\"handle_unknown='ignore' is not
supported for\"\n", + " \" encoding='ordinal'\")\n", + "\n", + " X = check_array(X, dtype=np.object, accept_sparse='csc', copy=True)\n", + " n_samples, n_features = X.shape\n", + "\n", + " self._label_encoders_ = [LabelEncoder() for _ in range(n_features)]\n", + "\n", + " for i in range(n_features):\n", + " le = self._label_encoders_[i]\n", + " Xi = X[:, i]\n", + " if self.categories == 'auto':\n", + " le.fit(Xi)\n", + " else:\n", + " valid_mask = np.in1d(Xi, self.categories[i])\n", + " if not np.all(valid_mask):\n", + " if self.handle_unknown == 'error':\n", + " diff = np.unique(Xi[~valid_mask])\n", + " msg = (\"Found unknown categories {0} in column {1}\"\n", + " \" during fit\".format(diff, i))\n", + " raise ValueError(msg)\n", + " le.classes_ = np.array(np.sort(self.categories[i]))\n", + "\n", + " self.categories_ = [le.classes_ for le in self._label_encoders_]\n", + "\n", + " return self\n", + "\n", + " def transform(self, X):\n", + " \"\"\"Transform X using one-hot encoding.\n", + " Parameters\n", + " ----------\n", + " X : array-like, shape [n_samples, n_features]\n", + " The data to encode.\n", + " Returns\n", + " -------\n", + " X_out : sparse matrix or a 2-d array\n", + " Transformed input.\n", + " \"\"\"\n", + " X = check_array(X, accept_sparse='csc', dtype=np.object, copy=True)\n", + " n_samples, n_features = X.shape\n", + " X_int = np.zeros_like(X, dtype=np.int)\n", + " X_mask = np.ones_like(X, dtype=np.bool)\n", + "\n", + " for i in range(n_features):\n", + " valid_mask = np.in1d(X[:, i], self.categories_[i])\n", + "\n", + " if not np.all(valid_mask):\n", + " if self.handle_unknown == 'error':\n", + " diff = np.unique(X[~valid_mask, i])\n", + " msg = (\"Found unknown categories {0} in column {1}\"\n", + " \" during transform\".format(diff, i))\n", + " raise ValueError(msg)\n", + " else:\n", + " # Set the problematic rows to an acceptable value and\n", + " # continue. The rows are marked in `X_mask` and will be\n", + " # removed later.\n", + " X_mask[:,
i] = valid_mask\n", + " X[:, i][~valid_mask] = self.categories_[i][0]\n", + " X_int[:, i] = self._label_encoders_[i].transform(X[:, i])\n", + "\n", + " if self.encoding == 'ordinal':\n", + " return X_int.astype(self.dtype, copy=False)\n", + "\n", + " mask = X_mask.ravel()\n", + " n_values = [cats.shape[0] for cats in self.categories_]\n", + " n_values = np.array([0] + n_values)\n", + " indices = np.cumsum(n_values)\n", + "\n", + " column_indices = (X_int + indices[:-1]).ravel()[mask]\n", + " row_indices = np.repeat(np.arange(n_samples, dtype=np.int32),\n", + " n_features)[mask]\n", + " data = np.ones(n_samples * n_features)[mask]\n", + "\n", + " out = sparse.csc_matrix((data, (row_indices, column_indices)),\n", + " shape=(n_samples, indices[-1]),\n", + " dtype=self.dtype).tocsr()\n", + " if self.encoding == 'onehot-dense':\n", + " return out.toarray()\n", + " else:\n", + " return out" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The `CategoricalEncoder` expects a 2D array containing one or more categorical input features. We need to reshape `housing_cat` to a 2D array:" + ] + }, + { + "cell_type": "code", + "execution_count": 63, + "metadata": {}, + "outputs": [], + "source": [ + "#from sklearn.preprocessing import CategoricalEncoder # in future versions of Scikit-Learn\n", + "\n", + "cat_encoder = CategoricalEncoder()\n", + "housing_cat_reshaped = housing_cat.values.reshape(-1, 1)\n", + "housing_cat_1hot = cat_encoder.fit_transform(housing_cat_reshaped)\n", + "housing_cat_1hot" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The default encoding is one-hot, and it returns a sparse array. 
You can use `toarray()` to get a dense array:" + ] + }, + { + "cell_type": "code", + "execution_count": 64, + "metadata": {}, + "outputs": [], + "source": [ + "housing_cat_1hot.toarray()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Alternatively, you can specify the encoding to be `\"onehot-dense\"` to get a dense matrix rather than a sparse matrix:" + ] + }, + { + "cell_type": "code", + "execution_count": 65, + "metadata": {}, + "outputs": [], + "source": [ + "cat_encoder = CategoricalEncoder(encoding=\"onehot-dense\")\n", + "housing_cat_1hot = cat_encoder.fit_transform(housing_cat_reshaped)\n", + "housing_cat_1hot" + ] + }, + { + "cell_type": "code", + "execution_count": 66, + "metadata": {}, + "outputs": [], + "source": [ + "cat_encoder.categories_" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's create a custom transformer to add extra attributes:" + ] + }, + { + "cell_type": "code", + "execution_count": 67, + "metadata": { + "collapsed": true + }, + "outputs": [], "source": [ "from sklearn.base import BaseEstimator, TransformerMixin\n", "\n", @@ -874,7 +1176,7 @@ }, { "cell_type": "code", - "execution_count": 63, + "execution_count": 68, "metadata": {}, "outputs": [], "source": [ @@ -882,9 +1184,16 @@ "housing_extra_attribs.head()" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now let's build a pipeline for preprocessing the numerical attributes:" + ] + }, { "cell_type": "code", - "execution_count": 64, + "execution_count": 69, "metadata": { "collapsed": true }, @@ -904,16 +1213,23 @@ }, { "cell_type": "code", - "execution_count": 65, + "execution_count": 70, "metadata": {}, "outputs": [], "source": [ "housing_num_tr" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "And a transformer to just select a subset of the Pandas DataFrame columns:" + ] + }, { "cell_type": "code", - "execution_count": 66, + "execution_count": 71, "metadata": { "collapsed": 
true }, @@ -936,28 +1252,13 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "**Important note**: the `LabelEncoder` and `LabelBinarizer` classes were designed for preprocessing labels, not input features, so their `fit()` and `fit_transform()` methods only accept one parameter `y` instead of two parameters `X` and `y`. The proper way to convert categorical input features to one-hot vectors should be to use the `OneHotEncoder` class, but unfortunately it does not work with string categories, only integer categories (people are working on it, see [Pull Request 7327](https://github.com/scikit-learn/scikit-learn/pull/7327)). In the meantime, one workaround was to use the `LabelBinarizer` class, as shown in the book. Unfortunately, since Scikit-Learn 0.19.0, pipelines now expect each estimator to have a `fit()` or `fit_transform()` method with two parameters `X` and `y`, so the code shown in the book won't work if you are using Scikit-Learn 0.19.0 (and possibly later as well). A temporary workaround (until PR 7327 is finished and you can use a `OneHotEncoder`) is to create a small wrapper class around the `LabelBinarizer` class, to fix its `fit_transform()` method, like this:" + "Now let's join all these components into a big pipeline that will preprocess both the numerical and the categorical features:" ] }, { "cell_type": "code", - "execution_count": 67, - "metadata": { - "collapsed": true - }, - "outputs": [], - "source": [ - "class PipelineFriendlyLabelBinarizer(LabelBinarizer):\n", - " def fit_transform(self, X, y=None):\n", - " return super(PipelineFriendlyLabelBinarizer, self).fit_transform(X)" - ] - }, - { - "cell_type": "code", - "execution_count": 68, - "metadata": { - "collapsed": true - }, + "execution_count": 72, + "metadata": {}, "outputs": [], "source": [ "num_attribs = list(housing_num)\n", @@ -972,13 +1273,13 @@ "\n", "cat_pipeline = Pipeline([\n", " ('selector', DataFrameSelector(cat_attribs)),\n", - " ('label_binarizer', 
PipelineFriendlyLabelBinarizer()),\n", + " ('cat_encoder', CategoricalEncoder(encoding=\"onehot-dense\")),\n", " ])" ] }, { "cell_type": "code", - "execution_count": 69, + "execution_count": 73, "metadata": { "collapsed": true }, @@ -994,7 +1295,7 @@ }, { "cell_type": "code", - "execution_count": 70, + "execution_count": 74, "metadata": {}, "outputs": [], "source": [ @@ -1004,7 +1305,7 @@ }, { "cell_type": "code", - "execution_count": 71, + "execution_count": 75, "metadata": {}, "outputs": [], "source": [ @@ -1020,7 +1321,7 @@ }, { "cell_type": "code", - "execution_count": 72, + "execution_count": 76, "metadata": {}, "outputs": [], "source": [ @@ -1032,7 +1333,7 @@ }, { "cell_type": "code", - "execution_count": 73, + "execution_count": 77, "metadata": {}, "outputs": [], "source": [ @@ -1053,7 +1354,7 @@ }, { "cell_type": "code", - "execution_count": 74, + "execution_count": 78, "metadata": {}, "outputs": [], "source": [ @@ -1062,7 +1363,7 @@ }, { "cell_type": "code", - "execution_count": 75, + "execution_count": 79, "metadata": {}, "outputs": [], "source": [ @@ -1071,7 +1372,7 @@ }, { "cell_type": "code", - "execution_count": 76, + "execution_count": 80, "metadata": {}, "outputs": [], "source": [ @@ -1085,7 +1386,7 @@ }, { "cell_type": "code", - "execution_count": 77, + "execution_count": 81, "metadata": {}, "outputs": [], "source": [ @@ -1097,7 +1398,7 @@ }, { "cell_type": "code", - "execution_count": 78, + "execution_count": 82, "metadata": {}, "outputs": [], "source": [ @@ -1109,7 +1410,7 @@ }, { "cell_type": "code", - "execution_count": 79, + "execution_count": 83, "metadata": {}, "outputs": [], "source": [ @@ -1128,7 +1429,7 @@ }, { "cell_type": "code", - "execution_count": 80, + "execution_count": 84, "metadata": { "collapsed": true }, @@ -1143,7 +1444,7 @@ }, { "cell_type": "code", - "execution_count": 81, + "execution_count": 85, "metadata": {}, "outputs": [], "source": [ @@ -1157,7 +1458,7 @@ }, { "cell_type": "code", - "execution_count": 82, + 
"execution_count": 86, "metadata": {}, "outputs": [], "source": [ @@ -1169,7 +1470,7 @@ }, { "cell_type": "code", - "execution_count": 83, + "execution_count": 87, "metadata": {}, "outputs": [], "source": [ @@ -1181,7 +1482,7 @@ }, { "cell_type": "code", - "execution_count": 84, + "execution_count": 88, "metadata": {}, "outputs": [], "source": [ @@ -1193,7 +1494,7 @@ }, { "cell_type": "code", - "execution_count": 85, + "execution_count": 89, "metadata": {}, "outputs": [], "source": [ @@ -1207,7 +1508,7 @@ }, { "cell_type": "code", - "execution_count": 86, + "execution_count": 90, "metadata": {}, "outputs": [], "source": [ @@ -1217,7 +1518,7 @@ }, { "cell_type": "code", - "execution_count": 87, + "execution_count": 91, "metadata": {}, "outputs": [], "source": [ @@ -1233,7 +1534,7 @@ }, { "cell_type": "code", - "execution_count": 88, + "execution_count": 92, "metadata": {}, "outputs": [], "source": [ @@ -1262,7 +1563,7 @@ }, { "cell_type": "code", - "execution_count": 89, + "execution_count": 93, "metadata": {}, "outputs": [], "source": [ @@ -1271,7 +1572,7 @@ }, { "cell_type": "code", - "execution_count": 90, + "execution_count": 94, "metadata": {}, "outputs": [], "source": [ @@ -1287,7 +1588,7 @@ }, { "cell_type": "code", - "execution_count": 91, + "execution_count": 95, "metadata": {}, "outputs": [], "source": [ @@ -1298,7 +1599,7 @@ }, { "cell_type": "code", - "execution_count": 92, + "execution_count": 96, "metadata": {}, "outputs": [], "source": [ @@ -1307,7 +1608,7 @@ }, { "cell_type": "code", - "execution_count": 93, + "execution_count": 97, "metadata": {}, "outputs": [], "source": [ @@ -1327,7 +1628,7 @@ }, { "cell_type": "code", - "execution_count": 94, + "execution_count": 98, "metadata": {}, "outputs": [], "source": [ @@ -1338,7 +1639,7 @@ }, { "cell_type": "code", - "execution_count": 95, + "execution_count": 99, "metadata": {}, "outputs": [], "source": [ @@ -1348,19 +1649,20 @@ }, { "cell_type": "code", - "execution_count": 96, + "execution_count": 100, 
"metadata": {}, "outputs": [], "source": [ "extra_attribs = [\"rooms_per_hhold\", \"pop_per_hhold\", \"bedrooms_per_room\"]\n", - "cat_one_hot_attribs = list(encoder.classes_)\n", + "cat_encoder = cat_pipeline.named_steps[\"cat_encoder\"]\n", + "cat_one_hot_attribs = list(cat_encoder.categories_[0])\n", "attributes = num_attribs + extra_attribs + cat_one_hot_attribs\n", "sorted(zip(feature_importances, attributes), reverse=True)" ] }, { "cell_type": "code", - "execution_count": 97, + "execution_count": 101, "metadata": { "collapsed": true }, @@ -1380,7 +1682,7 @@ }, { "cell_type": "code", - "execution_count": 98, + "execution_count": 102, "metadata": {}, "outputs": [], "source": [ @@ -1403,7 +1705,7 @@ }, { "cell_type": "code", - "execution_count": 99, + "execution_count": 103, "metadata": {}, "outputs": [], "source": [ @@ -1425,7 +1727,7 @@ }, { "cell_type": "code", - "execution_count": 100, + "execution_count": 104, "metadata": { "collapsed": true }, @@ -1436,7 +1738,7 @@ }, { "cell_type": "code", - "execution_count": 101, + "execution_count": 105, "metadata": { "collapsed": true }, @@ -1457,7 +1759,7 @@ }, { "cell_type": "code", - "execution_count": 102, + "execution_count": 106, "metadata": {}, "outputs": [], "source": [ @@ -1495,7 +1797,7 @@ }, { "cell_type": "code", - "execution_count": 103, + "execution_count": 107, "metadata": {}, "outputs": [], "source": [ @@ -1521,7 +1823,7 @@ }, { "cell_type": "code", - "execution_count": 104, + "execution_count": 108, "metadata": {}, "outputs": [], "source": [ @@ -1539,7 +1841,7 @@ }, { "cell_type": "code", - "execution_count": 105, + "execution_count": 109, "metadata": {}, "outputs": [], "source": [ @@ -1569,7 +1871,7 @@ }, { "cell_type": "code", - "execution_count": 106, + "execution_count": 110, "metadata": {}, "outputs": [], "source": [ @@ -1602,7 +1904,7 @@ }, { "cell_type": "code", - "execution_count": 107, + "execution_count": 111, "metadata": {}, "outputs": [], "source": [ @@ -1620,7 +1922,7 @@ }, { "cell_type": 
"code", - "execution_count": 108, + "execution_count": 112, "metadata": {}, "outputs": [], "source": [ @@ -1643,7 +1945,7 @@ }, { "cell_type": "code", - "execution_count": 109, + "execution_count": 113, "metadata": {}, "outputs": [], "source": [ @@ -1668,7 +1970,7 @@ }, { "cell_type": "code", - "execution_count": 110, + "execution_count": 114, "metadata": {}, "outputs": [], "source": [ @@ -1707,7 +2009,7 @@ }, { "cell_type": "code", - "execution_count": 111, + "execution_count": 115, "metadata": { "collapsed": true }, @@ -1745,7 +2047,7 @@ }, { "cell_type": "code", - "execution_count": 112, + "execution_count": 116, "metadata": { "collapsed": true }, @@ -1763,7 +2065,7 @@ }, { "cell_type": "code", - "execution_count": 113, + "execution_count": 117, "metadata": {}, "outputs": [], "source": [ @@ -1773,7 +2075,7 @@ }, { "cell_type": "code", - "execution_count": 114, + "execution_count": 118, "metadata": {}, "outputs": [], "source": [ @@ -1789,7 +2091,7 @@ }, { "cell_type": "code", - "execution_count": 115, + "execution_count": 119, "metadata": {}, "outputs": [], "source": [ @@ -1805,7 +2107,7 @@ }, { "cell_type": "code", - "execution_count": 116, + "execution_count": 120, "metadata": { "collapsed": true }, @@ -1819,7 +2121,7 @@ }, { "cell_type": "code", - "execution_count": 117, + "execution_count": 121, "metadata": { "collapsed": true }, @@ -1837,7 +2139,7 @@ }, { "cell_type": "code", - "execution_count": 118, + "execution_count": 122, "metadata": {}, "outputs": [], "source": [ @@ -1853,7 +2155,7 @@ }, { "cell_type": "code", - "execution_count": 119, + "execution_count": 123, "metadata": {}, "outputs": [], "source": [ @@ -1883,7 +2185,7 @@ }, { "cell_type": "code", - "execution_count": 120, + "execution_count": 124, "metadata": { "collapsed": true }, @@ -1898,7 +2200,7 @@ }, { "cell_type": "code", - "execution_count": 121, + "execution_count": 125, "metadata": {}, "outputs": [], "source": [ @@ -1914,7 +2216,7 @@ }, { "cell_type": "code", - "execution_count": 122, + 
"execution_count": 126, "metadata": {}, "outputs": [], "source": [ @@ -1948,7 +2250,7 @@ }, { "cell_type": "code", - "execution_count": 123, + "execution_count": 127, "metadata": {}, "outputs": [], "source": [ @@ -1964,7 +2266,7 @@ }, { "cell_type": "code", - "execution_count": 124, + "execution_count": 128, "metadata": {}, "outputs": [], "source": [ @@ -1980,7 +2282,7 @@ }, { "cell_type": "code", - "execution_count": 125, + "execution_count": 129, "metadata": {}, "outputs": [], "source": [