From 8370cafbb7eb84bef57b1a28f3c8998bad2213d9 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Aur=C3=A9lien=20Geron?= Date: Thu, 3 Mar 2016 18:40:31 +0100 Subject: [PATCH] Remove one level of headers --- tools_numpy.ipynb | 503 ++++++++++++++++++++++++--------------------- tools_pandas.ipynb | 94 +++++---- 2 files changed, 314 insertions(+), 283 deletions(-) diff --git a/tools_numpy.ipynb b/tools_numpy.ipynb index 0a6f5b5..5ec032d 100644 --- a/tools_numpy.ipynb +++ b/tools_numpy.ipynb @@ -4,10 +4,11 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Tools - NumPy\n", + "**Tools - NumPy**\n", + "\n", "*NumPy is the fundamental library for scientific computing with Python. NumPy is centered around a powerful N-dimensional array object, and it also contains useful linear algebra, Fourier transform, and random number functions.*\n", "\n", - "## Creating arrays\n", + "# Creating arrays\n", "First let's make sure that this notebook works both in python 2 and 3:" ] }, @@ -19,9 +20,7 @@ }, "outputs": [], "source": [ - "from __future__ import division\n", - "from __future__ import print_function\n", - "from __future__ import unicode_literals" + "from __future__ import division, print_function, unicode_literals" ] }, { @@ -46,7 +45,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### `np.zeros`" + "## `np.zeros`" ] }, { @@ -89,7 +88,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Some vocabulary\n", + "## Some vocabulary\n", "\n", "* In NumPy, each dimension is called an **axis**.\n", "* The number of axes is called the **rank**.\n", @@ -150,7 +149,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### N-dimensional arrays\n", + "## N-dimensional arrays\n", "You can also create an N-dimensional array of arbitrary rank. 
For example, here's a 3D array (rank=3), with shape `(2,3,4)`:" ] }, @@ -169,7 +168,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Array type\n", + "## Array type\n", "NumPy arrays have the type `ndarray`s:" ] }, @@ -188,7 +187,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### `np.ones`\n", + "## `np.ones`\n", "Many other NumPy functions create `ndarrays`.\n", "\n", "Here's a 3x4 matrix full of ones:" @@ -209,7 +208,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### `np.full`\n", + "## `np.full`\n", "Creates an array of the given shape initialized with the given value. Here's a 3x4 matrix full of `π`." ] }, @@ -228,7 +227,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### `np.empty`\n", + "## `np.empty`\n", "An uninitialized 2x3 array (its content is not predictable, as it is whatever is in memory at that point):" ] }, @@ -248,7 +247,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### np.array\n", + "## np.array\n", "Of course you can initialize an `ndarray` using a regular python array. Just call the `array` function:" ] }, @@ -267,7 +266,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### `np.arange`\n", + "## `np.arange`\n", "You can create an `ndarray` using NumPy's `range` function, which is similar to python's built-in `range` function:" ] }, @@ -343,7 +342,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### `np.linspace`\n", + "## `np.linspace`\n", "For this reason, it is generally preferable to use the `linspace` function instead of `arange` when working with floats. 
The `linspace` function returns an array containing a specific number of points evenly distributed between two values (note that the maximum value is *included*, contrary to `arange`):" ] }, @@ -362,7 +361,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### `np.rand` and `np.randn`\n", + "## `np.rand` and `np.randn`\n", "A number of functions are available in NumPy's `random` module to create `ndarray`s initialized with random values.\n", "For example, here is a 3x4 matrix initialized with random floats between 0 and 1 (uniform distribution):" ] @@ -413,7 +412,17 @@ "outputs": [], "source": [ "%matplotlib inline\n", - "import matplotlib.pyplot as plt\n", + "import matplotlib.pyplot as plt" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ "plt.hist(np.random.rand(100000), normed=True, bins=100, histtype=\"step\", color=\"blue\", label=\"rand\")\n", "plt.hist(np.random.randn(100000), normed=True, bins=100, histtype=\"step\", color=\"red\", label=\"randn\")\n", "plt.axis([-2.5, 2.5, 0, 1.1])\n", @@ -428,13 +437,13 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### np.fromfunction\n", + "## np.fromfunction\n", "You can also initialize an `ndarray` using a function:" ] }, { "cell_type": "code", - "execution_count": 23, + "execution_count": 24, "metadata": { "collapsed": false }, @@ -468,15 +477,15 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Array data\n", - "### `dtype`\n", + "# Array data\n", + "## `dtype`\n", "NumPy's `ndarray`s are also efficient in part because all their elements must have the same type (usually numbers).\n", "You can check what the data type is by looking at the `dtype` attribute:" ] }, { "cell_type": "code", - "execution_count": 24, + "execution_count": 25, "metadata": { "collapsed": false, "scrolled": true @@ -489,7 +498,7 @@ }, { "cell_type": "code", - "execution_count": 25, + "execution_count": 26, "metadata": { 
"collapsed": false }, @@ -508,7 +517,7 @@ }, { "cell_type": "code", - "execution_count": 26, + "execution_count": 27, "metadata": { "collapsed": false }, @@ -524,13 +533,13 @@ "source": [ "Available data types include `int8`, `int16`, `int32`, `int64`, `uint8`|`16`|`32`|`64`, `float16`|`32`|`64` and `complex64`|`128`. Check out [the documentation](http://docs.scipy.org/doc/numpy-1.10.1/user/basics.types.html) for the full list.\n", "\n", - "### `itemsize`\n", + "## `itemsize`\n", "The `itemsize` attribute returns the size (in bytes) of each item:" ] }, { "cell_type": "code", - "execution_count": 27, + "execution_count": 28, "metadata": { "collapsed": false }, @@ -544,13 +553,13 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### `data` buffer\n", + "## `data` buffer\n", "An array's data is actually stored in memory as a flat (one dimensional) byte buffer. It is available *via* the `data` attribute (you will rarely need it, though)." ] }, { "cell_type": "code", - "execution_count": 28, + "execution_count": 29, "metadata": { "collapsed": false, "scrolled": false @@ -570,7 +579,7 @@ }, { "cell_type": "code", - "execution_count": 29, + "execution_count": 30, "metadata": { "collapsed": false }, @@ -595,14 +604,14 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Reshaping an array\n", - "### In place\n", + "# Reshaping an array\n", + "## In place\n", "Changing the shape of an `ndarray` is as simple as setting its `shape` attribute. However, the array's size must remain the same." 
] }, { "cell_type": "code", - "execution_count": 30, + "execution_count": 31, "metadata": { "collapsed": false }, @@ -615,7 +624,7 @@ }, { "cell_type": "code", - "execution_count": 31, + "execution_count": 32, "metadata": { "collapsed": false }, @@ -628,7 +637,7 @@ }, { "cell_type": "code", - "execution_count": 32, + "execution_count": 33, "metadata": { "collapsed": false, "scrolled": true @@ -644,13 +653,13 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### `reshape`\n", + "## `reshape`\n", "The `reshape` function returns a new `ndarray` object pointing at the *same* data. This means that modifying one array will also modify the other." ] }, { "cell_type": "code", - "execution_count": 33, + "execution_count": 34, "metadata": { "collapsed": false, "scrolled": true @@ -671,7 +680,7 @@ }, { "cell_type": "code", - "execution_count": 34, + "execution_count": 35, "metadata": { "collapsed": false }, @@ -690,7 +699,7 @@ }, { "cell_type": "code", - "execution_count": 35, + "execution_count": 36, "metadata": { "collapsed": false }, @@ -703,13 +712,13 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### `ravel`\n", + "## `ravel`\n", "Finally, the `ravel` function returns a new one-dimensional `ndarray` that also points to the same data:" ] }, { "cell_type": "code", - "execution_count": 36, + "execution_count": 37, "metadata": { "collapsed": false }, @@ -722,13 +731,13 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Arithmetic operations\n", + "# Arithmetic operations\n", "All the usual arithmetic operators (`+`, `-`, `*`, `/`, `//`, `**`, etc.) can be used with `ndarray`s. 
They apply *elementwise*:" ] }, { "cell_type": "code", - "execution_count": 37, + "execution_count": 38, "metadata": { "collapsed": false, "scrolled": false @@ -759,7 +768,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Broadcasting" + "# Broadcasting" ] }, { @@ -768,13 +777,13 @@ "source": [ "In general, when NumPy expects arrays of the same shape but finds that this is not the case, it applies the so-called *broadcasting* rules:\n", "\n", - "### First rule\n", + "## First rule\n", "*If the arrays do not have the same rank, then a 1 will be prepended to the smaller ranking arrays until their ranks match.*" ] }, { "cell_type": "code", - "execution_count": 38, + "execution_count": 39, "metadata": { "collapsed": false }, @@ -793,7 +802,7 @@ }, { "cell_type": "code", - "execution_count": 39, + "execution_count": 40, "metadata": { "collapsed": false }, @@ -806,13 +815,13 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Second rule\n", + "## Second rule\n", "*Arrays with a 1 along a particular dimension act as if they had the size of the array with the largest shape along that dimension. 
The value of the array element is repeated along that dimension.*" ] }, { "cell_type": "code", - "execution_count": 40, + "execution_count": 41, "metadata": { "collapsed": false }, @@ -831,7 +840,7 @@ }, { "cell_type": "code", - "execution_count": 41, + "execution_count": 42, "metadata": { "collapsed": false }, @@ -849,7 +858,7 @@ }, { "cell_type": "code", - "execution_count": 42, + "execution_count": 43, "metadata": { "collapsed": false }, @@ -867,7 +876,7 @@ }, { "cell_type": "code", - "execution_count": 43, + "execution_count": 44, "metadata": { "collapsed": false }, @@ -880,13 +889,13 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Third rule\n", + "## Third rule\n", "*After rules 1 & 2, the sizes of all arrays must match.*" ] }, { "cell_type": "code", - "execution_count": 44, + "execution_count": 45, "metadata": { "collapsed": false }, @@ -910,13 +919,13 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Upcasting\n", + "## Upcasting\n", "When trying to combine arrays with different `dtype`s, NumPy will *upcast* to a type capable of handling all possible values (regardless of what the *actual* values are)." 
] }, { "cell_type": "code", - "execution_count": 45, + "execution_count": 46, "metadata": { "collapsed": false }, @@ -928,7 +937,7 @@ }, { "cell_type": "code", - "execution_count": 46, + "execution_count": 47, "metadata": { "collapsed": false }, @@ -947,7 +956,7 @@ }, { "cell_type": "code", - "execution_count": 47, + "execution_count": 48, "metadata": { "collapsed": false }, @@ -961,7 +970,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Conditional operators" + "# Conditional operators" ] }, { @@ -973,7 +982,7 @@ }, { "cell_type": "code", - "execution_count": 48, + "execution_count": 49, "metadata": { "collapsed": false }, @@ -992,7 +1001,7 @@ }, { "cell_type": "code", - "execution_count": 49, + "execution_count": 50, "metadata": { "collapsed": false }, @@ -1010,7 +1019,7 @@ }, { "cell_type": "code", - "execution_count": 50, + "execution_count": 51, "metadata": { "collapsed": false }, @@ -1023,7 +1032,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Mathematical and statistical functions" + "# Mathematical and statistical functions" ] }, { @@ -1032,13 +1041,13 @@ "source": [ "Many mathematical and statistical functions are available for `ndarray`s.\n", "\n", - "### `ndarray` methods\n", + "## `ndarray` methods\n", "Some functions are simply `ndarray` methods, for example:" ] }, { "cell_type": "code", - "execution_count": 51, + "execution_count": 52, "metadata": { "collapsed": false }, @@ -1060,7 +1069,7 @@ }, { "cell_type": "code", - "execution_count": 52, + "execution_count": 53, "metadata": { "collapsed": false }, @@ -1079,7 +1088,7 @@ }, { "cell_type": "code", - "execution_count": 53, + "execution_count": 54, "metadata": { "collapsed": false }, @@ -1091,7 +1100,7 @@ }, { "cell_type": "code", - "execution_count": 54, + "execution_count": 55, "metadata": { "collapsed": false }, @@ -1102,7 +1111,7 @@ }, { "cell_type": "code", - "execution_count": 55, + "execution_count": 56, "metadata": { "collapsed": false }, @@ -1120,7 +1129,7 @@ }, 
{ "cell_type": "code", - "execution_count": 56, + "execution_count": 57, "metadata": { "collapsed": false }, @@ -1131,7 +1140,7 @@ }, { "cell_type": "code", - "execution_count": 57, + "execution_count": 58, "metadata": { "collapsed": false }, @@ -1144,13 +1153,13 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Universal functions\n", + "## Universal functions\n", "NumPy also provides fast elementwise functions called *universal functions*, or **ufunc**. They are vectorized wrappers of simple functions. For example `square` returns a new `ndarray` which is a copy of the original `ndarray` except that each element is squared:" ] }, { "cell_type": "code", - "execution_count": 58, + "execution_count": 59, "metadata": { "collapsed": false }, @@ -1169,7 +1178,7 @@ }, { "cell_type": "code", - "execution_count": 59, + "execution_count": 60, "metadata": { "collapsed": false }, @@ -1186,13 +1195,13 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Binary ufuncs\n", + "## Binary ufuncs\n", "There are also many binary ufuncs, that apply elementwise on two `ndarray`s. 
Broadcasting rules are applied if the arrays do not have the same shape:" ] }, { "cell_type": "code", - "execution_count": 60, + "execution_count": 61, "metadata": { "collapsed": false }, @@ -1205,7 +1214,7 @@ }, { "cell_type": "code", - "execution_count": 61, + "execution_count": 62, "metadata": { "collapsed": false }, @@ -1216,7 +1225,7 @@ }, { "cell_type": "code", - "execution_count": 62, + "execution_count": 63, "metadata": { "collapsed": false }, @@ -1227,7 +1236,7 @@ }, { "cell_type": "code", - "execution_count": 63, + "execution_count": 64, "metadata": { "collapsed": false }, @@ -1240,14 +1249,14 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Array indexing\n", - "### One-dimensional arrays\n", + "# Array indexing\n", + "## One-dimensional arrays\n", "One-dimensional NumPy arrays can be accessed more or less like regular python arrays:" ] }, { "cell_type": "code", - "execution_count": 64, + "execution_count": 65, "metadata": { "collapsed": false }, @@ -1259,7 +1268,7 @@ }, { "cell_type": "code", - "execution_count": 65, + "execution_count": 66, "metadata": { "collapsed": false }, @@ -1270,7 +1279,7 @@ }, { "cell_type": "code", - "execution_count": 66, + "execution_count": 67, "metadata": { "collapsed": false }, @@ -1281,7 +1290,7 @@ }, { "cell_type": "code", - "execution_count": 67, + "execution_count": 68, "metadata": { "collapsed": false }, @@ -1292,7 +1301,7 @@ }, { "cell_type": "code", - "execution_count": 68, + "execution_count": 69, "metadata": { "collapsed": false }, @@ -1303,7 +1312,7 @@ }, { "cell_type": "code", - "execution_count": 69, + "execution_count": 70, "metadata": { "collapsed": false }, @@ -1321,7 +1330,7 @@ }, { "cell_type": "code", - "execution_count": 70, + "execution_count": 71, "metadata": { "collapsed": false }, @@ -1340,7 +1349,7 @@ }, { "cell_type": "code", - "execution_count": 71, + "execution_count": 72, "metadata": { "collapsed": false }, @@ -1354,13 +1363,13 @@ "cell_type": "markdown", "metadata": {}, "source": [ 
- "### Differences with regular python arrays\n", + "## Differences with regular python arrays\n", "Contrary to regular python arrays, if you assign a single value to an `ndarray` slice, it is copied across the whole slice, thanks to broadcasting rules discussed above." ] }, { "cell_type": "code", - "execution_count": 72, + "execution_count": 73, "metadata": { "collapsed": false }, @@ -1379,7 +1388,7 @@ }, { "cell_type": "code", - "execution_count": 73, + "execution_count": 74, "metadata": { "collapsed": false, "scrolled": false @@ -1401,7 +1410,7 @@ }, { "cell_type": "code", - "execution_count": 74, + "execution_count": 75, "metadata": { "collapsed": false }, @@ -1422,7 +1431,7 @@ }, { "cell_type": "code", - "execution_count": 75, + "execution_count": 76, "metadata": { "collapsed": false }, @@ -1435,7 +1444,7 @@ }, { "cell_type": "code", - "execution_count": 76, + "execution_count": 77, "metadata": { "collapsed": false }, @@ -1454,7 +1463,7 @@ }, { "cell_type": "code", - "execution_count": 77, + "execution_count": 78, "metadata": { "collapsed": false }, @@ -1467,7 +1476,7 @@ }, { "cell_type": "code", - "execution_count": 78, + "execution_count": 79, "metadata": { "collapsed": false }, @@ -1481,13 +1490,13 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Multi-dimensional arrays\n", + "## Multi-dimensional arrays\n", "Multi-dimensional arrays can be accessed in a similar way by providing an index or slice for each axis, separated by commas:" ] }, { "cell_type": "code", - "execution_count": 79, + "execution_count": 80, "metadata": { "collapsed": false }, @@ -1499,7 +1508,7 @@ }, { "cell_type": "code", - "execution_count": 80, + "execution_count": 81, "metadata": { "collapsed": false }, @@ -1510,7 +1519,7 @@ }, { "cell_type": "code", - "execution_count": 81, + "execution_count": 82, "metadata": { "collapsed": false }, @@ -1521,7 +1530,7 @@ }, { "cell_type": "code", - "execution_count": 82, + "execution_count": 83, "metadata": { "collapsed": false }, @@ 
-1539,7 +1548,7 @@ }, { "cell_type": "code", - "execution_count": 83, + "execution_count": 84, "metadata": { "collapsed": false, "scrolled": true @@ -1551,7 +1560,7 @@ }, { "cell_type": "code", - "execution_count": 84, + "execution_count": 85, "metadata": { "collapsed": false }, @@ -1571,13 +1580,13 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Fancy indexing\n", + "## Fancy indexing\n", "You may also specify a list of indices that you are interested in. This is referred to as *fancy indexing*." ] }, { "cell_type": "code", - "execution_count": 85, + "execution_count": 86, "metadata": { "collapsed": false, "scrolled": true @@ -1589,7 +1598,7 @@ }, { "cell_type": "code", - "execution_count": 86, + "execution_count": 87, "metadata": { "collapsed": false }, @@ -1607,7 +1616,7 @@ }, { "cell_type": "code", - "execution_count": 87, + "execution_count": 88, "metadata": { "collapsed": false }, @@ -1620,13 +1629,13 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Higher dimensions\n", + "## Higher dimensions\n", "Everything works just as well with higher dimensional arrays, but it's useful to look at a few examples:" ] }, { "cell_type": "code", - "execution_count": 88, + "execution_count": 89, "metadata": { "collapsed": false }, @@ -1638,7 +1647,7 @@ }, { "cell_type": "code", - "execution_count": 89, + "execution_count": 90, "metadata": { "collapsed": false }, @@ -1649,7 +1658,7 @@ }, { "cell_type": "code", - "execution_count": 90, + "execution_count": 91, "metadata": { "collapsed": false }, @@ -1667,7 +1676,7 @@ }, { "cell_type": "code", - "execution_count": 91, + "execution_count": 92, "metadata": { "collapsed": false }, @@ -1680,21 +1689,10 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Ellipsis (`...`)\n", + "## Ellipsis (`...`)\n", "You may also write an ellipsis (`...`) to ask that all non-specified axes be entirely included." 
] }, - { - "cell_type": "code", - "execution_count": 92, - "metadata": { - "collapsed": false - }, - "outputs": [], - "source": [ - "c[2, ...] # matrix 2, all rows, all columns. This is equivalent to c[2, :, :]" - ] - }, { "cell_type": "code", "execution_count": 93, @@ -1703,7 +1701,7 @@ }, "outputs": [], "source": [ - "c[2, 1, ...] # matrix 2, row 1, all columns. This is equivalent to c[2, 1, :]" + "c[2, ...] # matrix 2, all rows, all columns. This is equivalent to c[2, :, :]" ] }, { @@ -1714,12 +1712,23 @@ }, "outputs": [], "source": [ - "c[2, ..., 3] # matrix 2, all rows, column 3. This is equivalent to c[2, :, 3]" + "c[2, 1, ...] # matrix 2, row 1, all columns. This is equivalent to c[2, 1, :]" ] }, { "cell_type": "code", "execution_count": 95, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [ + "c[2, ..., 3] # matrix 2, all rows, column 3. This is equivalent to c[2, :, 3]" + ] + }, + { + "cell_type": "code", + "execution_count": 96, "metadata": { "collapsed": false, "scrolled": false @@ -1733,13 +1742,13 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Boolean indexing\n", + "## Boolean indexing\n", "You can also provide an `ndarray` of boolean values on one axis to specify the indices that you want to access." 
] }, { "cell_type": "code", - "execution_count": 96, + "execution_count": 97, "metadata": { "collapsed": false }, @@ -1751,7 +1760,7 @@ }, { "cell_type": "code", - "execution_count": 97, + "execution_count": 98, "metadata": { "collapsed": false }, @@ -1763,7 +1772,7 @@ }, { "cell_type": "code", - "execution_count": 98, + "execution_count": 99, "metadata": { "collapsed": false }, @@ -1777,13 +1786,13 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### `np.ix_`\n", + "## `np.ix_`\n", "You cannot use boolean indexing this way on multiple axes, but you can work around this by using the `ix_` function:" ] }, { "cell_type": "code", - "execution_count": 99, + "execution_count": 100, "metadata": { "collapsed": false }, @@ -1794,7 +1803,7 @@ }, { "cell_type": "code", - "execution_count": 100, + "execution_count": 101, "metadata": { "collapsed": false }, @@ -1812,7 +1821,7 @@ }, { "cell_type": "code", - "execution_count": 101, + "execution_count": 102, "metadata": { "collapsed": false }, @@ -1825,13 +1834,13 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Iterating\n", + "# Iterating\n", "Iterating over `ndarray`s is very similar to iterating over regular python arrays. Note that iterating over multidimensional arrays is done with respect to the first axis." 
] }, { "cell_type": "code", - "execution_count": 102, + "execution_count": 103, "metadata": { "collapsed": false }, @@ -1843,7 +1852,7 @@ }, { "cell_type": "code", - "execution_count": 103, + "execution_count": 104, "metadata": { "collapsed": false }, @@ -1856,7 +1865,7 @@ }, { "cell_type": "code", - "execution_count": 104, + "execution_count": 105, "metadata": { "collapsed": false }, @@ -1876,7 +1885,7 @@ }, { "cell_type": "code", - "execution_count": 105, + "execution_count": 106, "metadata": { "collapsed": false }, @@ -1890,13 +1899,13 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Stacking arrays\n", + "# Stacking arrays\n", "It is often useful to stack together different arrays. NumPy offers several functions to do just that. Let's start by creating a few arrays." ] }, { "cell_type": "code", - "execution_count": 106, + "execution_count": 107, "metadata": { "collapsed": false }, @@ -1908,7 +1917,7 @@ }, { "cell_type": "code", - "execution_count": 107, + "execution_count": 108, "metadata": { "collapsed": false }, @@ -1920,7 +1929,7 @@ }, { "cell_type": "code", - "execution_count": 108, + "execution_count": 109, "metadata": { "collapsed": false }, @@ -1934,13 +1943,13 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### `vstack`\n", + "## `vstack`\n", "Now let's stack them vertically using `vstack`:" ] }, { "cell_type": "code", - "execution_count": 109, + "execution_count": 110, "metadata": { "collapsed": false }, @@ -1952,7 +1961,7 @@ }, { "cell_type": "code", - "execution_count": 110, + "execution_count": 111, "metadata": { "collapsed": false }, @@ -1967,13 +1976,13 @@ "source": [ "This was possible because q1, q2 and q3 all have the same shape (except for the vertical axis, but that's ok since we are stacking on that axis).\n", "\n", - "### `hstack`\n", + "## `hstack`\n", "We can also stack arrays horizontally using `hstack`:" ] }, { "cell_type": "code", - "execution_count": 111, + "execution_count": 112, "metadata": { "collapsed": 
false }, @@ -1985,7 +1994,7 @@ }, { "cell_type": "code", - "execution_count": 112, + "execution_count": 113, "metadata": { "collapsed": false }, @@ -2003,7 +2012,7 @@ }, { "cell_type": "code", - "execution_count": 113, + "execution_count": 114, "metadata": { "collapsed": false }, @@ -2019,13 +2028,13 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### `concatenate`\n", + "## `concatenate`\n", "The `concatenate` function stacks arrays along any given existing axis." ] }, { "cell_type": "code", - "execution_count": 114, + "execution_count": 115, "metadata": { "collapsed": false }, @@ -2037,7 +2046,7 @@ }, { "cell_type": "code", - "execution_count": 115, + "execution_count": 116, "metadata": { "collapsed": false }, @@ -2057,13 +2066,13 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### `stack`\n", + "## `stack`\n", "The `stack` function stacks arrays along a new axis. All arrays have to have the same shape." ] }, { "cell_type": "code", - "execution_count": 116, + "execution_count": 117, "metadata": { "collapsed": false }, @@ -2075,7 +2084,7 @@ }, { "cell_type": "code", - "execution_count": 117, + "execution_count": 118, "metadata": { "collapsed": false }, @@ -2088,7 +2097,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Splitting arrays\n", + "# Splitting arrays\n", "Splitting is the opposite of stacking. 
For example, let's use the `vsplit` function to split a matrix vertically.\n", "\n", "First let's create a 6x4 matrix:" @@ -2096,7 +2105,7 @@ }, { "cell_type": "code", - "execution_count": 118, + "execution_count": 119, "metadata": { "collapsed": false }, @@ -2115,7 +2124,7 @@ }, { "cell_type": "code", - "execution_count": 119, + "execution_count": 120, "metadata": { "collapsed": false }, @@ -2127,7 +2136,7 @@ }, { "cell_type": "code", - "execution_count": 120, + "execution_count": 121, "metadata": { "collapsed": false }, @@ -2138,7 +2147,7 @@ }, { "cell_type": "code", - "execution_count": 121, + "execution_count": 122, "metadata": { "collapsed": false }, @@ -2156,7 +2165,7 @@ }, { "cell_type": "code", - "execution_count": 122, + "execution_count": 123, "metadata": { "collapsed": false }, @@ -2168,7 +2177,7 @@ }, { "cell_type": "code", - "execution_count": 123, + "execution_count": 124, "metadata": { "collapsed": false }, @@ -2181,7 +2190,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Transposing arrays\n", + "# Transposing arrays\n", "The `transpose` method creates a new view on an `ndarray`'s data, with axes permuted in the given order.\n", "\n", "For example, let's create a 3D array:" @@ -2189,7 +2198,7 @@ }, { "cell_type": "code", - "execution_count": 124, + "execution_count": 125, "metadata": { "collapsed": false }, @@ -2208,7 +2217,7 @@ }, { "cell_type": "code", - "execution_count": 125, + "execution_count": 126, "metadata": { "collapsed": false }, @@ -2220,7 +2229,7 @@ }, { "cell_type": "code", - "execution_count": 126, + "execution_count": 127, "metadata": { "collapsed": false }, @@ -2238,7 +2247,7 @@ }, { "cell_type": "code", - "execution_count": 127, + "execution_count": 128, "metadata": { "collapsed": false }, @@ -2250,7 +2259,7 @@ }, { "cell_type": "code", - "execution_count": 128, + "execution_count": 129, "metadata": { "collapsed": false }, @@ -2268,7 +2277,7 @@ }, { "cell_type": "code", - "execution_count": 129, + "execution_count": 
130, "metadata": { "collapsed": false }, @@ -2280,7 +2289,7 @@ }, { "cell_type": "code", - "execution_count": 130, + "execution_count": 131, "metadata": { "collapsed": false }, @@ -2293,16 +2302,16 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Linear algebra\n", - "NumPy 2D arrays can be used to represent matrices efficiently in python. Let's go through some of the main matrix operations available.\n", + "# Linear algebra\n", + "NumPy 2D arrays can be used to represent matrices efficiently in python. We will just quickly go through some of the main matrix operations available. For more details about Linear Algebra, vectors and matrices, go through the [Linear Algebra tutorial](math_linear_algebra.ipynb).\n", "\n", - "### Matrix transpose\n", + "## Matrix transpose\n", "The `T` attribute is equivalent to calling `transpose()` when the rank is ≥2:" ] }, { "cell_type": "code", - "execution_count": 131, + "execution_count": 132, "metadata": { "collapsed": false }, @@ -2314,7 +2323,7 @@ }, { "cell_type": "code", - "execution_count": 132, + "execution_count": 133, "metadata": { "collapsed": false }, @@ -2332,7 +2341,7 @@ }, { "cell_type": "code", - "execution_count": 133, + "execution_count": 134, "metadata": { "collapsed": false, "scrolled": true @@ -2345,7 +2354,7 @@ }, { "cell_type": "code", - "execution_count": 134, + "execution_count": 135, "metadata": { "collapsed": false, "scrolled": true @@ -2364,7 +2373,7 @@ }, { "cell_type": "code", - "execution_count": 135, + "execution_count": 136, "metadata": { "collapsed": false }, @@ -2376,7 +2385,7 @@ }, { "cell_type": "code", - "execution_count": 136, + "execution_count": 137, "metadata": { "collapsed": false }, @@ -2389,13 +2398,13 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Matrix dot product\n", + "## Matrix dot product\n", "Let's create two matrices and execute a matrix [dot product](https://en.wikipedia.org/wiki/Dot_product) using the `dot` method."
] }, { "cell_type": "code", - "execution_count": 137, + "execution_count": 138, "metadata": { "collapsed": false }, @@ -2407,7 +2416,7 @@ }, { "cell_type": "code", - "execution_count": 138, + "execution_count": 139, "metadata": { "collapsed": false }, @@ -2419,7 +2428,7 @@ }, { "cell_type": "code", - "execution_count": 139, + "execution_count": 140, "metadata": { "collapsed": false }, @@ -2439,13 +2448,13 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Matrix inverse and pseudo-inverse\n", + "## Matrix inverse and pseudo-inverse\n", "Many of the linear algebra functions are available in the `numpy.linalg` module, in particular the `inv` function to compute a square matrix's inverse:" ] }, { "cell_type": "code", - "execution_count": 140, + "execution_count": 141, "metadata": { "collapsed": false }, @@ -2459,7 +2468,7 @@ }, { "cell_type": "code", - "execution_count": 141, + "execution_count": 142, "metadata": { "collapsed": false }, @@ -2477,7 +2486,7 @@ }, { "cell_type": "code", - "execution_count": 142, + "execution_count": 143, "metadata": { "collapsed": false }, @@ -2490,13 +2499,13 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Identity matrix\n", + "## Identity matrix\n", "The product of a matrix by its inverse returns the identiy matrix (with small floating point errors):" ] }, { "cell_type": "code", - "execution_count": 143, + "execution_count": 144, "metadata": { "collapsed": false }, @@ -2514,7 +2523,7 @@ }, { "cell_type": "code", - "execution_count": 144, + "execution_count": 145, "metadata": { "collapsed": false }, @@ -2527,13 +2536,13 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### QR decomposition\n", + "## QR decomposition\n", "The `qr` function computes the [QR decomposition](https://en.wikipedia.org/wiki/QR_decomposition) of a matrix:" ] }, { "cell_type": "code", - "execution_count": 145, + "execution_count": 146, "metadata": { "collapsed": false }, @@ -2545,7 +2554,7 @@ }, { "cell_type": "code", - 
"execution_count": 146, + "execution_count": 147, "metadata": { "collapsed": false }, @@ -2556,7 +2565,7 @@ }, { "cell_type": "code", - "execution_count": 147, + "execution_count": 148, "metadata": { "collapsed": false }, @@ -2569,13 +2578,13 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Determinant\n", + "## Determinant\n", "The `det` function computes the [matrix determinant](https://en.wikipedia.org/wiki/Determinant):" ] }, { "cell_type": "code", - "execution_count": 148, + "execution_count": 149, "metadata": { "collapsed": false }, @@ -2588,13 +2597,13 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Eigenvalues and eigenvectors\n", + "## Eigenvalues and eigenvectors\n", "The `eig` function computes the [eigenvalues and eigenvectors](https://en.wikipedia.org/wiki/Eigenvalues_and_eigenvectors) of a square matrix:" ] }, { "cell_type": "code", - "execution_count": 149, + "execution_count": 150, "metadata": { "collapsed": false }, @@ -2606,7 +2615,7 @@ }, { "cell_type": "code", - "execution_count": 150, + "execution_count": 151, "metadata": { "collapsed": false }, @@ -2617,7 +2626,7 @@ }, { "cell_type": "code", - "execution_count": 151, + "execution_count": 152, "metadata": { "collapsed": false }, @@ -2630,13 +2639,13 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Singular Value Decomposition\n", + "## Singular Value Decomposition\n", "The `svd` function takes a matrix and returns its [singular value decomposition](https://en.wikipedia.org/wiki/Singular_value_decomposition):" ] }, { "cell_type": "code", - "execution_count": 152, + "execution_count": 153, "metadata": { "collapsed": false }, @@ -2648,7 +2657,7 @@ }, { "cell_type": "code", - "execution_count": 153, + "execution_count": 154, "metadata": { "collapsed": false }, @@ -2660,7 +2669,7 @@ }, { "cell_type": "code", - "execution_count": 154, + "execution_count": 155, "metadata": { "collapsed": false }, @@ -2678,7 +2687,7 @@ }, { "cell_type": "code", - 
"execution_count": 155, + "execution_count": 156, "metadata": { "collapsed": false }, @@ -2691,7 +2700,7 @@ }, { "cell_type": "code", - "execution_count": 156, + "execution_count": 157, "metadata": { "collapsed": false }, @@ -2702,7 +2711,7 @@ }, { "cell_type": "code", - "execution_count": 157, + "execution_count": 158, "metadata": { "collapsed": false }, @@ -2715,12 +2724,12 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Diagonal and trace" + "## Diagonal and trace" ] }, { "cell_type": "code", - "execution_count": 158, + "execution_count": 159, "metadata": { "collapsed": false }, @@ -2731,7 +2740,7 @@ }, { "cell_type": "code", - "execution_count": 159, + "execution_count": 160, "metadata": { "collapsed": false }, @@ -2744,7 +2753,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Solving a system of linear scalar equations" + "## Solving a system of linear scalar equations" ] }, { @@ -2759,7 +2768,7 @@ }, { "cell_type": "code", - "execution_count": 160, + "execution_count": 161, "metadata": { "collapsed": false }, @@ -2780,7 +2789,7 @@ }, { "cell_type": "code", - "execution_count": 161, + "execution_count": 162, "metadata": { "collapsed": false }, @@ -2798,7 +2807,7 @@ }, { "cell_type": "code", - "execution_count": 162, + "execution_count": 163, "metadata": { "collapsed": false, "scrolled": true @@ -2812,7 +2821,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Vectorization\n", + "# Vectorization\n", "Instead of executing operations on individual array items, one at a time, your code is much more efficient if you try to stick to array operations. This is called *vectorization*. This way, you can benefit from NumPy's many optimizations.\n", "\n", "For example, let's say we want to generate a 768x1024 array based on the formula $sin(xy/40.5)$. 
A **bad** option would be to do the math in python using nested loops:" @@ -2820,7 +2829,7 @@ }, { "cell_type": "code", - "execution_count": 163, + "execution_count": 164, "metadata": { "collapsed": false }, @@ -2842,7 +2851,7 @@ }, { "cell_type": "code", - "execution_count": 164, + "execution_count": 165, "metadata": { "collapsed": false }, @@ -2856,7 +2865,7 @@ }, { "cell_type": "code", - "execution_count": 165, + "execution_count": 166, "metadata": { "collapsed": false }, @@ -2876,7 +2885,7 @@ }, { "cell_type": "code", - "execution_count": 166, + "execution_count": 167, "metadata": { "collapsed": false }, @@ -2894,7 +2903,7 @@ }, { "cell_type": "code", - "execution_count": 167, + "execution_count": 168, "metadata": { "collapsed": false }, @@ -2911,16 +2920,16 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Saving and loading\n", + "# Saving and loading\n", "NumPy makes it easy to save and load `ndarray`s in binary or text format.\n", "\n", - "### Binary `.npy` format\n", + "## Binary `.npy` format\n", "Let's create a random array and save it." 
] }, { "cell_type": "code", - "execution_count": 168, + "execution_count": 169, "metadata": { "collapsed": false, "scrolled": true @@ -2933,7 +2942,7 @@ }, { "cell_type": "code", - "execution_count": 169, + "execution_count": 170, "metadata": { "collapsed": false }, @@ -2951,7 +2960,7 @@ }, { "cell_type": "code", - "execution_count": 170, + "execution_count": 171, "metadata": { "collapsed": false }, @@ -2972,7 +2981,7 @@ }, { "cell_type": "code", - "execution_count": 171, + "execution_count": 172, "metadata": { "collapsed": false }, @@ -2986,13 +2995,13 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Text format\n", + "## Text format\n", "Let's try saving the array in text format:" ] }, { "cell_type": "code", - "execution_count": 172, + "execution_count": 173, "metadata": { "collapsed": false }, @@ -3010,7 +3019,7 @@ }, { "cell_type": "code", - "execution_count": 173, + "execution_count": 174, "metadata": { "collapsed": false }, @@ -3029,7 +3038,7 @@ }, { "cell_type": "code", - "execution_count": 174, + "execution_count": 175, "metadata": { "collapsed": true }, @@ -3047,7 +3056,7 @@ }, { "cell_type": "code", - "execution_count": 175, + "execution_count": 176, "metadata": { "collapsed": false }, @@ -3061,13 +3070,13 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Zipped `.npz` format\n", + "## Zipped `.npz` format\n", "It is also possible to save multiple arrays in one zipped file:" ] }, { "cell_type": "code", - "execution_count": 176, + "execution_count": 177, "metadata": { "collapsed": false }, @@ -3079,7 +3088,7 @@ }, { "cell_type": "code", - "execution_count": 177, + "execution_count": 178, "metadata": { "collapsed": true }, @@ -3097,7 +3106,7 @@ }, { "cell_type": "code", - "execution_count": 178, + "execution_count": 179, "metadata": { "collapsed": false }, @@ -3118,7 +3127,7 @@ }, { "cell_type": "code", - "execution_count": 179, + "execution_count": 180, "metadata": { "collapsed": false }, @@ -3137,7 +3146,7 @@ }, { "cell_type": 
"code", - "execution_count": 180, + "execution_count": 181, "metadata": { "collapsed": false }, @@ -3148,7 +3157,7 @@ }, { "cell_type": "code", - "execution_count": 181, + "execution_count": 182, "metadata": { "collapsed": false }, @@ -3161,7 +3170,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## What next?\n", + "# What next?\n", "Now you know all the fundamentals of NumPy, but there are many more options available. The best way to learn more is to experiment with NumPy, and go through the excellent [reference documentation](http://docs.scipy.org/doc/numpy/reference/index.html) to find more functions and features you may be interested in." ] } @@ -3183,6 +3192,20 @@ "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.11" + }, + "toc": { + "toc_cell": false, + "toc_number_sections": true, + "toc_section_display": "block", + "toc_threshold": 6, + "toc_window_display": false + }, + "toc_position": { + "height": "677px", + "left": "1195.02px", + "right": "20px", + "top": "78px", + "width": "238px" } }, "nbformat": 4, diff --git a/tools_pandas.ipynb b/tools_pandas.ipynb index b4ec061..27dea4f 100644 --- a/tools_pandas.ipynb +++ b/tools_pandas.ipynb @@ -4,10 +4,11 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Tools - pandas\n", + "**Tools - pandas**\n", + "\n", "*The `pandas` library provides high-performance, easy-to-use data structures and data analysis tools. The main data structure is the `DataFrame`, which you can think of as an in-memory 2D table (like a spreadsheet, with column names and row labels). Many features available in Excel are available programmatically, such as creating pivot tables, computing columns based on other columns, plotting graphs, etc. You can also group rows by column value, or join tables much like in SQL. 
Pandas is also great at handling time series.*\n", "\n", - "**Prerequisites:**\n", + "Prerequisites:\n", "* NumPy – if you are not familiar with NumPy, we recommend that you go through the [NumPy tutorial](tools_numpy.ipynb) now." ] }, @@ -15,7 +16,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Setup\n", + "# Setup\n", "First, let's make sure this notebook works well in both python 2 and 3:" ] }, @@ -54,7 +55,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## `Series` objects\n", + "# `Series` objects\n", "The `pandas` library contains these useful data structures:\n", "* `Series` objects, that we will discuss now. A `Series` object is 1D array, similar to a column in a spreadsheet (with a column name and row labels).\n", "* `DataFrame` objects. This is a 2D table, similar to a spreadsheet (with column names and row labels).\n", @@ -65,7 +66,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Creating a `Series`\n", + "## Creating a `Series`\n", "Let's start by creating our first `Series` object!" ] }, @@ -85,7 +86,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Similar to a 1D `ndarray`\n", + "## Similar to a 1D `ndarray`\n", "`Series` objects behave much like one-dimensional NumPy `ndarray`s, and you can often pass them as parameters to NumPy functions:" ] }, @@ -159,7 +160,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Index labels\n", + "## Index labels\n", "Each item in a `Series` object has a unique identifier called the *index label*. By default, it is simply the rank of the item in the `Series` (starting at `0`) but you can also set the index labels manually:" ] }, @@ -332,7 +333,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Init from `dict`\n", + "## Init from `dict`\n", "You can create a `Series` object from a `dict`. 
The keys will be used as index labels:" ] }, @@ -372,7 +373,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Automatic alignment\n", + "## Automatic alignment\n", "When an operation involves multiple `Series` objects, `pandas` automatically aligns items by matching index labels." ] }, @@ -425,7 +426,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Init with a scalar\n", + "## Init with a scalar\n", "You can also initialize a `Series` object using a scalar and a list of index labels: all items will be set to the scalar." ] }, @@ -445,7 +446,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### `Series` name\n", + "## `Series` name\n", "A `Series` can have a `name`:" ] }, @@ -465,7 +466,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Plotting a `Series`\n", + "## Plotting a `Series`\n", "Pandas makes it easy to plot `Series` data using matplotlib (for more details on matplotlib, check out the [matplotlib tutorial](tools_matplotlib.ipynb)). Just import matplotlib and call the `plot` method:" ] }, @@ -497,14 +498,14 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Time series\n", + "# Handling time\n", "Many datasets have timestamps, and pandas is awesome at manipulating such data:\n", "* it can represent periods (such as 2016Q3) and frequencies (such as \"monthly\"),\n", "* it can convert periods to actual timestamps, and *vice versa*,\n", "* it can resample data and aggregate values any way you like,\n", "* it can handle timezones.\n", "\n", - "### Time range\n", + "## Time range\n", "Let's start by creating a time series using `timerange`. This returns a `DatetimeIndex` containing one datetime per hour for 12 hours starting on October 29th 2016 at 5:30pm." ] }, @@ -564,7 +565,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Resampling\n", + "## Resampling\n", "Pandas let's us resample a time series very simply. 
Just call the `resample` method and specify a new frequency:" ] }, @@ -622,7 +623,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Upsampling and interpolation\n", + "## Upsampling and interpolation\n", "This was an example of downsampling. We can also upsample (ie. increase the frequency), but this creates holes in our data:" ] }, @@ -676,7 +677,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Timezones\n", + "## Timezones\n", "By default datetimes are *naive*: they are not aware of timezones, so 2016-10-30 02:30 might mean October 30th 2016 at 2:30am in Paris or in New York. We can make datetimes timezone *aware* by calling the `tz_localize` method:" ] }, @@ -776,7 +777,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Periods\n", + "## Periods\n", "The `period_range` function returns a `PeriodIndex` instead of a `DatetimeIndex`. For example, let's get all quarters in 2016 and 2017:" ] }, @@ -957,10 +958,10 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## `DataFrame` objects\n", + "# `DataFrame` objects\n", "A DataFrame object represents a spreadsheet, with cell values, column names and row index labels. You can define expressions to compute columns based on other columns, create pivot-tables, group rows, draw graphs, etc. You can see `DataFrame`s as dictionaries of `Series`.\n", "\n", - "### Creating a `DataFrame`\n", + "## Creating a `DataFrame`\n", "You can create a DataFrame by passing a dictionary of `Series` objects:" ] }, @@ -1156,7 +1157,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Multi-indexing\n", + "## Multi-indexing\n", "If all columns are tuples of the same size, then they are understood as a multi-index. The same goes for row index labels. 
For example:" ] }, @@ -1216,7 +1217,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Dropping a level\n", + "## Dropping a level\n", "Let's look at `d5` again:" ] }, @@ -1254,7 +1255,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Transposing\n", + "## Transposing\n", "You can swap columns and indices using the `T` attribute:" ] }, @@ -1274,7 +1275,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Stacking and unstacking levels\n", + "## Stacking and unstacking levels\n", "Calling the `stack` method will push the lowest column level after the lowest index:" ] }, @@ -1354,7 +1355,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Most methods return modified copies\n", + "## Most methods return modified copies\n", "As you may have noticed, the `stack` and `unstack` methods do not modify the object they apply to. Instead, they work on a copy and return that copy. This is true of most methods in pandas." ] }, @@ -1362,7 +1363,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Accessing rows\n", + "## Accessing rows\n", "Let's go back to the `people` `DataFrame`:" ] }, @@ -1471,7 +1472,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Adding and removing columns\n", + "## Adding and removing columns\n", "You can generally treat `DataFrame` objects like dictionaries of `Series`, so the following work fine:" ] }, @@ -1555,7 +1556,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Assigning new columns\n", + "## Assigning new columns\n", "You can also create new columns by calling the `assign` method. Note that this returns a new `DataFrame` object, the original is not modified:" ] }, @@ -1672,7 +1673,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Evaluating an expression\n", + "## Evaluating an expression\n", "A great feature supported by pandas is expression evaluation. This relies on the `numexpr` library which must be installed." 
] }, @@ -1730,7 +1731,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Querying a `DataFrame`\n", + "## Querying a `DataFrame`\n", "The `query` method lets you filter a `DataFrame` based on a query expression:" ] }, @@ -1749,7 +1750,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Sorting a `DataFrame`\n", + "## Sorting a `DataFrame`\n", "You can sort a `DataFrame` by calling its `sort_index` method. By default it sorts the rows by their index label, in ascending order, but let's reverse the order:" ] }, @@ -1806,7 +1807,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Plotting a `DataFrame`\n", + "## Plotting a `DataFrame`\n", "Just like for `Series`, pandas makes it easy to draw nice graphs based on a `DataFrame`.\n", "\n", "For example, it is trivial to create a line plot from a `DataFrame`'s data by calling its `plot` method:" @@ -1855,7 +1856,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Operations on `DataFrame`s\n", + "## Operations on `DataFrame`s\n", "Although `DataFrame`s do not try to mimick NumPy arrays, there are a few similarities. Let's create a `DataFrame` to demonstrate this:" ] }, @@ -2058,7 +2059,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Automatic alignment\n", + "## Automatic alignment\n", "Similar to `Series`, when operating on multiple `DataFrame`s, pandas automatically aligns them by row index label, but also by column names. Let's create a `DataFrame` with bonus points for each person from October to December:" ] }, @@ -2093,7 +2094,7 @@ "source": [ "Looks like the addition worked in some cases but way too many elements are now empty. That's because when aligning the `DataFrame`s, some columns and rows were only present on one side, and thus they were considered missing on the other side (`NaN`). 
Then adding `NaN` to a number results in `NaN`, hence the result.\n", "\n", - "### Handling missing data\n", + "## Handling missing data\n", "Dealing with missing data is a frequent task when working with real life data. Pandas offers a few tools to handle missing data.\n", " \n", "Let's try to fix the problem above. For example, we can decide that missing data should result in a zero, instead of `NaN`. We can replace all `NaN` values by a any value using the `fillna` method:" @@ -2274,7 +2275,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Aggregating with `groupby`\n", + "## Aggregating with `groupby`\n", "Similar to the SQL language, pandas allows grouping your data into groups to run calculations over each group.\n", "\n", "First, let's add some extra data about each person so we can group them, and let's go back to the `final_grades` `DataFrame` so we can see how `NaN` values are handled:" @@ -2551,7 +2552,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Saving & loading\n", + "# Saving & loading\n", "Pandas can save `DataFrame`s to various backends, including file formats such as CSV, Excel, JSON, HTML and HDF5, or to a SQL database. Let's create a `DataFrame` to demonstrate this:" ] }, @@ -2575,7 +2576,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Saving\n", + "## Saving\n", "Let's save it to CSV, HTML and JSON:" ] }, @@ -2641,7 +2642,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Loading\n", + "## Loading\n", "Now let's load our CSV file back into a `DataFrame`:" ] }, @@ -2693,9 +2694,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Combining `DataFrame`s\n", + "# Combining `DataFrame`s\n", "\n", - "### SQL-like joins\n", + "## SQL-like joins\n", "One powerful feature of pandas is it's ability to perform SQL-like joins on `DataFrame`s. Various types of joins are supported: inner joins, left/right outer joins and full joins. 
To illustrate this, let's start by creating a couple simple `DataFrame`s:" ] }, @@ -2817,7 +2818,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Concatenation\n", + "## Concatenation\n", "Rather than joining `DataFrame`s, we may just want to concatenate them. That's what `concat` is for:" ] }, @@ -2961,7 +2962,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Categories\n", + "# Categories\n", "It is quite frequent to have values that represent categories, for example `1` for female and `2` for male, or `\"A\"` for Good, `\"B\"` for Average, `\"C\"` for Bad. These categorical values can be hard to read and cumbersome to handle, but fortunately pandas makes it easy. To illustrate this, let's take the `city_pop` `DataFrame` we created earlier, and add a column that represents a category:" ] }, @@ -3062,6 +3063,13 @@ "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.11" + }, + "toc": { + "toc_cell": false, + "toc_number_sections": true, + "toc_section_display": "none", + "toc_threshold": 6, + "toc_window_display": true } }, "nbformat": 4,