Use tf.layers instead of tf.contrib.layers

This commit is contained in:
Aurélien Geron
2017-04-30 10:21:27 +02:00
parent 14101abcf9
commit 326d32cae0
7 changed files with 531 additions and 258 deletions

@@ -297,6 +297,20 @@
" display(HTML(iframe))"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"Note: the book uses `tensorflow.contrib.layers.fully_connected()` rather than `tf.layers.dense()` (which did not exist when this chapter was written). It is now preferable to use `tf.layers.dense()`, because anything in the contrib module may change or be deleted without notice. The `dense()` function is almost identical to the `fully_connected()` function. The main differences relevant to this chapter are:\n",
"* several parameters are renamed: `scope` becomes `name`, `activation_fn` becomes `activation` (and similarly the `_fn` suffix is removed from other parameters such as `normalizer_fn`), `weights_initializer` becomes `kernel_initializer`, etc.\n",
"* the default `activation` is now `None` rather than `tf.nn.relu`.\n",
"* it does not support `tensorflow.contrib.framework.arg_scope()` (introduced later in chapter 11).\n",
"* it does not support regularizer params (introduced later in chapter 11)."
]
},
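For readers comparing the two APIs on their own, here is a minimal standalone sketch (assuming TensorFlow 1.1+, where `tf.layers` is available) of the parameter renames described in the note above; the layer names and sizes are illustrative only:

    import tensorflow as tf

    tf.reset_default_graph()
    X = tf.placeholder(tf.float32, shape=(None, 28 * 28), name="X")
    he_init = tf.contrib.layers.variance_scaling_initializer()

    # old contrib API: activation_fn / weights_initializer / scope,
    # and the activation defaults to tf.nn.relu
    hidden_old = tf.contrib.layers.fully_connected(
        X, 300, activation_fn=tf.nn.relu,
        weights_initializer=he_init, scope="hidden_old")

    # new tf.layers API: activation / kernel_initializer / name,
    # and the activation defaults to None (linear), so set it explicitly
    hidden_new = tf.layers.dense(
        X, 300, activation=tf.nn.relu,
        kernel_initializer=he_init, name="hidden_new")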
{
"cell_type": "code",
"execution_count": 12,
@@ -307,8 +321,6 @@
},
"outputs": [],
"source": [
"from tensorflow.contrib.layers import fully_connected\n",
"\n",
"tf.reset_default_graph()\n",
"\n",
"n_inputs = 28*28 # MNIST\n",
@@ -321,9 +333,9 @@
"y = tf.placeholder(tf.int64, shape=(None), name=\"y\")\n",
"\n",
"with tf.name_scope(\"dnn\"):\n",
" hidden1 = fully_connected(X, n_hidden1, activation_fn=leaky_relu, scope=\"hidden1\")\n",
" hidden2 = fully_connected(hidden1, n_hidden2, activation_fn=leaky_relu, scope=\"hidden2\")\n",
" logits = fully_connected(hidden2, n_outputs, activation_fn=None, scope=\"outputs\")\n",
" hidden1 = tf.layers.dense(X, n_hidden1, activation=leaky_relu, name=\"hidden1\")\n",
" hidden2 = tf.layers.dense(hidden1, n_hidden2, activation=leaky_relu, name=\"hidden2\")\n",
" logits = tf.layers.dense(hidden2, n_outputs, name=\"outputs\")\n",
"\n",
"with tf.name_scope(\"loss\"):\n",
" xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)\n",
@@ -377,6 +389,24 @@
"# Batch Normalization"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"Note: the book uses `tensorflow.contrib.layers.batch_norm()` rather than `tf.layers.batch_normalization()` (which did not exist when this chapter was written). It is now preferable to use `tf.layers.batch_normalization()`, because anything in the contrib module may change or be deleted without notice. Instead of using the `batch_norm()` function as a regularizer parameter to the `fully_connected()` function, we now use `batch_normalization()` and we explicitly create a distinct layer. The parameters are a bit different, in particular:\n",
"* `decay` is renamed to `momentum`,\n",
"* `is_training` is renamed to `training`,\n",
"* `updates_collections` is removed: the update operations needed by batch normalization are added to the `UPDATE_OPS` collection and you need to explicity run these operations during training (see the execution phase below),\n",
"* we don't need to specify `scale=True`, as that is the default.\n",
"\n",
"Also note that in order to run batch norm just _before_ each hidden layer's activation function, we apply the ELU activation function manually, right after the batch norm layer.\n",
"\n",
"Note: since the `tf.layers.dense()` function is incompatible with `tf.contrib.layers.arg_scope()` (which is used in the book), we now use python's `functools.partial()` function instead. It makes it easy to create a `my_dense_layer()` function that just calls `tf.layers.dense()` with the desired parameters automatically set (unless they are overridden when calling `my_dense_layer()`). As you can see, the code remains very similar."
]
},
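As a standalone illustration of the `functools.partial()` pattern mentioned in the note above (a sketch only, assuming TensorFlow 1.1+), partially applying `tf.layers.dense()` and `tf.layers.batch_normalization()` plays the role that `arg_scope()` played for `fully_connected()`:

    import tensorflow as tf
    from functools import partial

    tf.reset_default_graph()
    X = tf.placeholder(tf.float32, shape=(None, 28 * 28), name="X")
    is_training = tf.placeholder(tf.bool, shape=(), name="is_training")
    he_init = tf.contrib.layers.variance_scaling_initializer()

    # partial() freezes the keyword arguments shared by all layers,
    # much like arg_scope() did for the contrib functions
    my_dense_layer = partial(tf.layers.dense, kernel_initializer=he_init)
    my_batch_norm_layer = partial(tf.layers.batch_normalization,
                                  training=is_training, momentum=0.9)

    hidden1 = my_dense_layer(X, 300, name="hidden1")
    bn1 = tf.nn.elu(my_batch_norm_layer(hidden1))   # ELU applied right after batch norm
    hidden2 = my_dense_layer(bn1, 100, name="hidden2")
    bn2 = tf.nn.elu(my_batch_norm_layer(hidden2))
    # any frozen argument can still be overridden per call, e.g.
    # my_dense_layer(..., kernel_initializer=tf.zeros_initializer())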
{
"cell_type": "code",
"execution_count": 14,
@@ -387,11 +417,10 @@
},
"outputs": [],
"source": [
"from tensorflow.contrib.layers import fully_connected, batch_norm\n",
"from tensorflow.contrib.framework import arg_scope\n",
"\n",
"tf.reset_default_graph()\n",
"\n",
"from functools import partial\n",
"\n",
"n_inputs = 28 * 28 # MNIST\n",
"n_hidden1 = 300\n",
"n_hidden2 = 100\n",
@@ -405,22 +434,23 @@
"\n",
"with tf.name_scope(\"dnn\"):\n",
" he_init = tf.contrib.layers.variance_scaling_initializer()\n",
" batch_norm_params = {\n",
" 'is_training': is_training,\n",
" 'decay': 0.9,\n",
" 'updates_collections': None,\n",
" 'scale': True,\n",
" }\n",
"\n",
" with arg_scope(\n",
" [fully_connected],\n",
" activation_fn=tf.nn.elu,\n",
" weights_initializer=he_init,\n",
" normalizer_fn=batch_norm,\n",
" normalizer_params=batch_norm_params):\n",
" hidden1 = fully_connected(X, n_hidden1, scope=\"hidden1\")\n",
" hidden2 = fully_connected(hidden1, n_hidden2, scope=\"hidden2\")\n",
" logits = fully_connected(hidden2, n_outputs, activation_fn=None, scope=\"outputs\")\n",
" my_batch_norm_layer = partial(\n",
" tf.layers.batch_normalization,\n",
" training=is_training,\n",
" momentum=0.9)\n",
"\n",
" my_dense_layer = partial(\n",
" tf.layers.dense,\n",
" kernel_initializer=he_init)\n",
"\n",
" hidden1 = my_dense_layer(X, n_hidden1, name=\"hidden1\")\n",
" bn1 = tf.nn.elu(my_batch_norm_layer(hidden1))\n",
" hidden2 = my_dense_layer(bn1, n_hidden2, name=\"hidden2\")\n",
" bn2 = tf.nn.elu(my_batch_norm_layer(hidden2))\n",
" logits_before_bn = my_dense_layer(bn2, n_outputs, activation=None, name=\"outputs\")\n",
" logits = my_batch_norm_layer(logits_before_bn)\n",
" extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)\n",
"\n",
"with tf.name_scope(\"loss\"):\n",
" xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)\n",
@@ -438,6 +468,16 @@
"saver = tf.train.Saver()"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"Note: since we are using `tf.layers.batch_normalization()` rather than `tf.contrib.layers.batch_norm()` (as in the book), we need to explicitly run the extra update operations needed by batch normalization (`sess.run([training_op, extra_update_ops],...`)."
]
},
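An alternative to fetching `extra_update_ops` in every `sess.run()` call (not what the cell below does, just a sketch under the same TensorFlow 1.1+ assumption) is to attach the `UPDATE_OPS` to the training op with `tf.control_dependencies()`:

    import tensorflow as tf

    tf.reset_default_graph()
    X = tf.placeholder(tf.float32, shape=(None, 10), name="X")
    is_training = tf.placeholder(tf.bool, shape=(), name="is_training")
    bn = tf.layers.batch_normalization(X, training=is_training, momentum=0.9)
    loss = tf.reduce_mean(tf.square(bn))  # dummy loss, just to have something to minimize

    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    with tf.control_dependencies(update_ops):
        # the moving-average updates now run whenever training_op runs,
        # so sess.run(training_op, ...) alone is enough during training
        training_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)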
{
"cell_type": "code",
"execution_count": 15,
@@ -449,14 +489,14 @@
"outputs": [],
"source": [
"n_epochs = 20\n",
"batch_size = 50\n",
"batch_size = 200\n",
"\n",
"with tf.Session() as sess:\n",
" init.run()\n",
" for epoch in range(n_epochs):\n",
" for iteration in range(len(mnist.test.labels)//batch_size):\n",
" X_batch, y_batch = mnist.train.next_batch(batch_size)\n",
" sess.run(training_op, feed_dict={is_training: True, X: X_batch, y: y_batch})\n",
" sess.run([training_op, extra_update_ops], feed_dict={is_training: True, X: X_batch, y: y_batch})\n",
" acc_train = accuracy.eval(feed_dict={is_training: False, X: X_batch, y: y_batch})\n",
" acc_test = accuracy.eval(feed_dict={is_training: False, X: mnist.test.images, y: mnist.test.labels})\n",
" print(epoch, \"Train accuracy:\", acc_train, \"Test accuracy:\", acc_test)\n",
@@ -464,11 +504,21 @@
" save_path = saver.save(sess, \"my_model_final.ckpt\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"Now the same model with $\\ell_1$ regularization:"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": false,
"collapsed": true,
"deletable": true,
"editable": true
},
@@ -476,29 +526,32 @@
"source": [
"tf.reset_default_graph()\n",
"\n",
"from functools import partial\n",
"\n",
"X = tf.placeholder(tf.float32, shape=(None, n_inputs), name=\"X\")\n",
"y = tf.placeholder(tf.int64, shape=(None), name=\"y\")\n",
"is_training = tf.placeholder(tf.bool, shape=(), name='is_training')\n",
"\n",
"with tf.name_scope(\"dnn\"):\n",
" he_init = tf.contrib.layers.variance_scaling_initializer()\n",
" batch_norm_params = {\n",
" 'is_training': is_training,\n",
" 'decay': 0.9,\n",
" 'updates_collections': None,\n",
" 'scale': True,\n",
" }\n",
"\n",
" with arg_scope(\n",
" [fully_connected],\n",
" activation_fn=tf.nn.elu,\n",
" weights_initializer=he_init,\n",
" normalizer_fn=batch_norm,\n",
" normalizer_params=batch_norm_params,\n",
" weights_regularizer=tf.contrib.layers.l1_regularizer(0.01)):\n",
" hidden1 = fully_connected(X, n_hidden1, scope=\"hidden1\")\n",
" hidden2 = fully_connected(hidden1, n_hidden2, scope=\"hidden2\")\n",
" logits = fully_connected(hidden2, n_outputs, activation_fn=None, scope=\"outputs\")\n",
" my_batch_norm_layer = partial(\n",
" tf.layers.batch_normalization,\n",
" training=is_training,\n",
" momentum=0.9)\n",
"\n",
" my_dense_layer = partial(\n",
" tf.layers.dense,\n",
" kernel_initializer=he_init,\n",
" kernel_regularizer=tf.contrib.layers.l1_regularizer(0.01))\n",
"\n",
" hidden1 = my_dense_layer(X, n_hidden1, name=\"hidden1\")\n",
" bn1 = tf.nn.elu(my_batch_norm_layer(hidden1))\n",
" hidden2 = my_dense_layer(bn1, n_hidden2, name=\"hidden2\")\n",
" bn2 = tf.nn.elu(my_batch_norm_layer(hidden2))\n",
" logits_before_bn = my_dense_layer(bn2, n_outputs, activation=None, name=\"outputs\")\n",
" logits = my_batch_norm_layer(logits_before_bn)\n",
" extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)\n",
"\n",
"with tf.name_scope(\"loss\"):\n",
" xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)\n",
@@ -513,7 +566,7 @@
"with tf.name_scope(\"eval\"):\n",
" correct = tf.nn.in_top_k(logits, y, 1)\n",
" accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))\n",
" \n",
"\n",
"init = tf.global_variables_initializer()\n",
"saver = tf.train.Saver()"
]
@@ -529,14 +582,14 @@
"outputs": [],
"source": [
"n_epochs = 20\n",
"batch_size = 50\n",
"batch_size = 200\n",
"\n",
"with tf.Session() as sess:\n",
" init.run()\n",
" for epoch in range(n_epochs):\n",
" for iteration in range(len(mnist.test.labels)//batch_size):\n",
" X_batch, y_batch = mnist.train.next_batch(batch_size)\n",
" sess.run(training_op, feed_dict={is_training: True, X: X_batch, y: y_batch})\n",
" sess.run([training_op, extra_update_ops], feed_dict={is_training: True, X: X_batch, y: y_batch})\n",
" acc_train = accuracy.eval(feed_dict={is_training: False, X: X_batch, y: y_batch})\n",
" acc_test = accuracy.eval(feed_dict={is_training: False, X: mnist.test.images, y: mnist.test.labels})\n",
" print(epoch, \"Train accuracy:\", acc_train, \"Test accuracy:\", acc_test)\n",
@@ -557,6 +610,16 @@
"[v.name for v in tf.global_variables()]"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"Note: the weights variable created by the `tf.layers.dense()` function is called `\"kernel\"` (instead of `\"weights\"` when using the `tf.contrib.layers.fully_connected()`, as in the book):"
]
},
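A standalone sketch (assuming TensorFlow 1.1+) of this naming difference; the layer name below is illustrative only:

    import tensorflow as tf

    tf.reset_default_graph()
    X = tf.placeholder(tf.float32, shape=(None, 28 * 28), name="X")
    hidden1 = tf.layers.dense(X, 300, name="hidden1")

    print([v.name for v in tf.global_variables()])
    # -> ['hidden1/kernel:0', 'hidden1/bias:0']
    #    (fully_connected() would have created 'hidden1/weights:0' and 'hidden1/biases:0')

    # the weights tensor can also be fetched from the graph by its full name:
    weights1 = tf.get_default_graph().get_tensor_by_name("hidden1/kernel:0")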
{
"cell_type": "code",
"execution_count": 19,
@@ -568,8 +631,8 @@
"outputs": [],
"source": [
"with tf.variable_scope(\"\", default_name=\"\", reuse=True): # root scope\n",
" weights1 = tf.get_variable(\"hidden1/weights\")\n",
" weights2 = tf.get_variable(\"hidden2/weights\")\n",
" weights1 = tf.get_variable(\"hidden1/kernel\")\n",
" weights2 = tf.get_variable(\"hidden2/kernel\")\n",
" "
]
},
@@ -689,6 +752,8 @@
"source": [
"tf.reset_default_graph()\n",
"\n",
"from functools import partial\n",
"\n",
"X = tf.placeholder(tf.float32, shape=(None, n_inputs), name=\"X\")\n",
"y = tf.placeholder(tf.int64, shape=(None), name=\"y\")\n",
"is_training = tf.placeholder(tf.bool, shape=(), name='is_training')\n",
@@ -701,12 +766,15 @@
" return max_norm\n",
"\n",
"with tf.name_scope(\"dnn\"):\n",
" with arg_scope(\n",
" [fully_connected],\n",
" weights_regularizer=max_norm_regularizer(1.5)):\n",
" hidden1 = fully_connected(X, n_hidden1, scope=\"hidden1\")\n",
" hidden2 = fully_connected(hidden1, n_hidden2, scope=\"hidden2\")\n",
" logits = fully_connected(hidden2, n_outputs, activation_fn=None, scope=\"outputs\")\n",
" \n",
" my_dense_layer = partial(\n",
" tf.layers.dense,\n",
" activation=tf.nn.relu,\n",
" kernel_regularizer=max_norm_regularizer(1.5))\n",
"\n",
" hidden1 = my_dense_layer(X, n_hidden1, name=\"hidden1\")\n",
" hidden2 = my_dense_layer(hidden1, n_hidden2, name=\"hidden2\")\n",
" logits = my_dense_layer(hidden2, n_outputs, activation=None, name=\"outputs\")\n",
"\n",
"clip_all_weights = tf.get_collection(\"max_norm\")\n",
" \n",
@@ -770,6 +838,18 @@
"show_graph(tf.get_default_graph())"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"Note: the book uses `tf.contrib.layers.dropout()` rather than `tf.layers.dropout()` (which did not exist when this chapter was written). It is now preferable to use `tf.layers.dropout()`, because anything in the contrib module may change or be deleted without notice. The `tf.layers.dropout()` function is almost identical to the `tf.contrib.layers.dropout()` function, except for a few minor differences. Most importantly:\n",
"* you must specify the dropout rate (`rate`) rather than the keep probability (`keep_prob`), where `rate` is simply equal to `1 - keep_prob`,\n",
"* the `is_training` parameter is renamed to `training`."
]
},
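A standalone sketch (assuming TensorFlow 1.1+) of the two differences listed above, building the same 50% dropout layer with both APIs:

    import tensorflow as tf

    tf.reset_default_graph()
    X = tf.placeholder(tf.float32, shape=(None, 28 * 28), name="X")
    is_training = tf.placeholder(tf.bool, shape=(), name="is_training")

    keep_prob = 0.5
    dropout_rate = 1 - keep_prob   # rate = 1 - keep_prob

    # old contrib API: keep probability + is_training
    X_drop_old = tf.contrib.layers.dropout(X, keep_prob, is_training=is_training)

    # new tf.layers API: drop rate + training
    X_drop_new = tf.layers.dropout(X, dropout_rate, training=is_training)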
{
"cell_type": "code",
"execution_count": 30,
@@ -780,7 +860,7 @@
},
"outputs": [],
"source": [
"from tensorflow.contrib.layers import dropout\n",
"from functools import partial\n",
"\n",
"tf.reset_default_graph()\n",
"\n",
@@ -795,20 +875,22 @@
"learning_rate = tf.train.exponential_decay(initial_learning_rate, global_step,\n",
" decay_steps, decay_rate)\n",
"\n",
"keep_prob = 0.5\n",
"dropout_rate = 0.5\n",
"\n",
"with tf.name_scope(\"dnn\"):\n",
" he_init = tf.contrib.layers.variance_scaling_initializer()\n",
" with arg_scope(\n",
" [fully_connected],\n",
" activation_fn=tf.nn.elu,\n",
" weights_initializer=he_init):\n",
" X_drop = dropout(X, keep_prob, is_training=is_training)\n",
" hidden1 = fully_connected(X_drop, n_hidden1, scope=\"hidden1\")\n",
" hidden1_drop = dropout(hidden1, keep_prob, is_training=is_training)\n",
" hidden2 = fully_connected(hidden1_drop, n_hidden2, scope=\"hidden2\")\n",
" hidden2_drop = dropout(hidden2, keep_prob, is_training=is_training)\n",
" logits = fully_connected(hidden2_drop, n_outputs, activation_fn=None, scope=\"outputs\")\n",
"\n",
" my_dense_layer = partial(\n",
" tf.layers.dense,\n",
" activation=tf.nn.elu,\n",
" kernel_initializer=he_init)\n",
"\n",
" X_drop = tf.layers.dropout(X, dropout_rate, training=is_training)\n",
" hidden1 = my_dense_layer(X_drop, n_hidden1, name=\"hidden1\")\n",
" hidden1_drop = tf.layers.dropout(hidden1, dropout_rate, training=is_training)\n",
" hidden2 = my_dense_layer(hidden1_drop, n_hidden2, name=\"hidden2\")\n",
" hidden2_drop = tf.layers.dropout(hidden2, dropout_rate, training=is_training)\n",
" logits = my_dense_layer(hidden2_drop, n_outputs, activation=None, name=\"outputs\")\n",
"\n",
"with tf.name_scope(\"loss\"):\n",
" xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)\n",
@@ -970,7 +1052,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2+"
"version": "3.5.3"
},
"nav_menu": {
"height": "360px",