Use tf.layers instead of tf.contrib.layers

This commit is contained in:
Aurélien Geron
2017-04-30 10:21:27 +02:00
parent 14101abcf9
commit 326d32cae0
7 changed files with 531 additions and 258 deletions

@@ -297,6 +297,20 @@
" display(HTML(iframe))"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"Note: the book uses `tensorflow.contrib.layers.fully_connected()` rather than `tf.layers.dense()` (which did not exist when this chapter was written). It is now preferable to use `tf.layers.dense()`, because anything in the contrib module may change or be deleted without notice. The `dense()` function is almost identical to the `fully_connected()` function. The main differences relevant to this chapter are:\n",
"* several parameters are renamed: `scope` becomes `name`, `activation_fn` becomes `activation` (and similarly the `_fn` suffix is removed from other parameters such as `normalizer_fn`), `weights_initializer` becomes `kernel_initializer`, etc.\n",
"* the default `activation` is now `None` rather than `tf.nn.relu`.\n",
"* it does not support `tensorflow.contrib.framework.arg_scope()` (introduced later in chapter 11).\n",
"* it does not support regularizer params (introduced later in chapter 11)."
]
},
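For readers comparing the two APIs on their own, here is a minimal standalone sketch (assuming TensorFlow 1.1+, where `tf.layers` is available) of the parameter renames described in the note above; the layer names and sizes are illustrative only:

    import tensorflow as tf

    tf.reset_default_graph()
    X = tf.placeholder(tf.float32, shape=(None, 28 * 28), name="X")
    he_init = tf.contrib.layers.variance_scaling_initializer()

    # old contrib API: activation_fn / weights_initializer / scope,
    # and the activation defaults to tf.nn.relu
    hidden_old = tf.contrib.layers.fully_connected(
        X, 300, activation_fn=tf.nn.relu,
        weights_initializer=he_init, scope="hidden_old")

    # new tf.layers API: activation / kernel_initializer / name,
    # and the activation defaults to None (linear), so set it explicitly
    hidden_new = tf.layers.dense(
        X, 300, activation=tf.nn.relu,
        kernel_initializer=he_init, name="hidden_new")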
{
"cell_type": "code",
"execution_count": 12,
@@ -307,8 +321,6 @@
},
"outputs": [],
"source": [
"from tensorflow.contrib.layers import fully_connected\n",
"\n",
"tf.reset_default_graph()\n",
"\n",
"n_inputs = 28*28 # MNIST\n",
@@ -321,9 +333,9 @@
"y = tf.placeholder(tf.int64, shape=(None), name=\"y\")\n",
"\n",
"with tf.name_scope(\"dnn\"):\n",
" hidden1 = fully_connected(X, n_hidden1, activation_fn=leaky_relu, scope=\"hidden1\")\n",
" hidden2 = fully_connected(hidden1, n_hidden2, activation_fn=leaky_relu, scope=\"hidden2\")\n",
" logits = fully_connected(hidden2, n_outputs, activation_fn=None, scope=\"outputs\")\n",
" hidden1 = tf.layers.dense(X, n_hidden1, activation=leaky_relu, name=\"hidden1\")\n",
" hidden2 = tf.layers.dense(hidden1, n_hidden2, activation=leaky_relu, name=\"hidden2\")\n",
" logits = tf.layers.dense(hidden2, n_outputs, name=\"outputs\")\n",
"\n",
"with tf.name_scope(\"loss\"):\n",
" xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)\n",
@@ -377,6 +389,24 @@
"# Batch Normalization"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"Note: the book uses `tensorflow.contrib.layers.batch_norm()` rather than `tf.layers.batch_normalization()` (which did not exist when this chapter was written). It is now preferable to use `tf.layers.batch_normalization()`, because anything in the contrib module may change or be deleted without notice. Instead of using the `batch_norm()` function as a regularizer parameter to the `fully_connected()` function, we now use `batch_normalization()` and we explicitly create a distinct layer. The parameters are a bit different, in particular:\n",
"* `decay` is renamed to `momentum`,\n",
"* `is_training` is renamed to `training`,\n",
"* `updates_collections` is removed: the update operations needed by batch normalization are added to the `UPDATE_OPS` collection and you need to explicity run these operations during training (see the execution phase below),\n",
"* we don't need to specify `scale=True`, as that is the default.\n",
"\n",
"Also note that in order to run batch norm just _before_ each hidden layer's activation function, we apply the ELU activation function manually, right after the batch norm layer.\n",
"\n",
"Note: since the `tf.layers.dense()` function is incompatible with `tf.contrib.layers.arg_scope()` (which is used in the book), we now use python's `functools.partial()` function instead. It makes it easy to create a `my_dense_layer()` function that just calls `tf.layers.dense()` with the desired parameters automatically set (unless they are overridden when calling `my_dense_layer()`). As you can see, the code remains very similar."
]
},
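As a standalone illustration of the `functools.partial()` pattern mentioned in the note above (a sketch only, assuming TensorFlow 1.1+), partially applying `tf.layers.dense()` and `tf.layers.batch_normalization()` plays the role that `arg_scope()` played for `fully_connected()`:

    import tensorflow as tf
    from functools import partial

    tf.reset_default_graph()
    X = tf.placeholder(tf.float32, shape=(None, 28 * 28), name="X")
    is_training = tf.placeholder(tf.bool, shape=(), name="is_training")
    he_init = tf.contrib.layers.variance_scaling_initializer()

    # partial() freezes the keyword arguments shared by all layers,
    # much like arg_scope() did for the contrib functions
    my_dense_layer = partial(tf.layers.dense, kernel_initializer=he_init)
    my_batch_norm_layer = partial(tf.layers.batch_normalization,
                                  training=is_training, momentum=0.9)

    hidden1 = my_dense_layer(X, 300, name="hidden1")
    bn1 = tf.nn.elu(my_batch_norm_layer(hidden1))   # ELU applied right after batch norm
    hidden2 = my_dense_layer(bn1, 100, name="hidden2")
    bn2 = tf.nn.elu(my_batch_norm_layer(hidden2))
    # any frozen argument can still be overridden per call, e.g.
    # my_dense_layer(..., kernel_initializer=tf.zeros_initializer())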
{
"cell_type": "code",
"execution_count": 14,
@@ -387,11 +417,10 @@
},
"outputs": [],
"source": [
"from tensorflow.contrib.layers import fully_connected, batch_norm\n",
"from tensorflow.contrib.framework import arg_scope\n",
"\n",
"tf.reset_default_graph()\n",
"\n",
"from functools import partial\n",
"\n",
"n_inputs = 28 * 28 # MNIST\n",
"n_hidden1 = 300\n",
"n_hidden2 = 100\n",
@@ -405,22 +434,23 @@
"\n",
"with tf.name_scope(\"dnn\"):\n",
" he_init = tf.contrib.layers.variance_scaling_initializer()\n",
" batch_norm_params = {\n",
" 'is_training': is_training,\n",
" 'decay': 0.9,\n",
" 'updates_collections': None,\n",
" 'scale': True,\n",
" }\n",
"\n",
" with arg_scope(\n",
" [fully_connected],\n",
" activation_fn=tf.nn.elu,\n",
" weights_initializer=he_init,\n",
" normalizer_fn=batch_norm,\n",
" normalizer_params=batch_norm_params):\n",
" hidden1 = fully_connected(X, n_hidden1, scope=\"hidden1\")\n",
" hidden2 = fully_connected(hidden1, n_hidden2, scope=\"hidden2\")\n",
" logits = fully_connected(hidden2, n_outputs, activation_fn=None, scope=\"outputs\")\n",
" my_batch_norm_layer = partial(\n",
" tf.layers.batch_normalization,\n",
" training=is_training,\n",
" momentum=0.9)\n",
"\n",
" my_dense_layer = partial(\n",
" tf.layers.dense,\n",
" kernel_initializer=he_init)\n",
"\n",
" hidden1 = my_dense_layer(X, n_hidden1, name=\"hidden1\")\n",
" bn1 = tf.nn.elu(my_batch_norm_layer(hidden1))\n",
" hidden2 = my_dense_layer(bn1, n_hidden2, name=\"hidden2\")\n",
" bn2 = tf.nn.elu(my_batch_norm_layer(hidden2))\n",
" logits_before_bn = my_dense_layer(bn2, n_outputs, activation=None, name=\"outputs\")\n",
" logits = my_batch_norm_layer(logits_before_bn)\n",
" extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)\n",
"\n",
"with tf.name_scope(\"loss\"):\n",
" xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)\n",
@@ -438,6 +468,16 @@
"saver = tf.train.Saver()"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"Note: since we are using `tf.layers.batch_normalization()` rather than `tf.contrib.layers.batch_norm()` (as in the book), we need to explicitly run the extra update operations needed by batch normalization (`sess.run([training_op, extra_update_ops],...`)."
]
},
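An alternative to fetching `extra_update_ops` in every `sess.run()` call (not what the cell below does, just a sketch under the same TensorFlow 1.1+ assumption) is to attach the `UPDATE_OPS` to the training op with `tf.control_dependencies()`:

    import tensorflow as tf

    tf.reset_default_graph()
    X = tf.placeholder(tf.float32, shape=(None, 10), name="X")
    is_training = tf.placeholder(tf.bool, shape=(), name="is_training")
    bn = tf.layers.batch_normalization(X, training=is_training, momentum=0.9)
    loss = tf.reduce_mean(tf.square(bn))  # dummy loss, just to have something to minimize

    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    with tf.control_dependencies(update_ops):
        # the moving-average updates now run whenever training_op runs,
        # so sess.run(training_op, ...) alone is enough during training
        training_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)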
{
"cell_type": "code",
"execution_count": 15,
@@ -449,14 +489,14 @@
"outputs": [],
"source": [
"n_epochs = 20\n",
"batch_size = 50\n",
"batch_size = 200\n",
"\n",
"with tf.Session() as sess:\n",
" init.run()\n",
" for epoch in range(n_epochs):\n",
" for iteration in range(len(mnist.test.labels)//batch_size):\n",
" X_batch, y_batch = mnist.train.next_batch(batch_size)\n",
" sess.run(training_op, feed_dict={is_training: True, X: X_batch, y: y_batch})\n",
" sess.run([training_op, extra_update_ops], feed_dict={is_training: True, X: X_batch, y: y_batch})\n",
" acc_train = accuracy.eval(feed_dict={is_training: False, X: X_batch, y: y_batch})\n",
" acc_test = accuracy.eval(feed_dict={is_training: False, X: mnist.test.images, y: mnist.test.labels})\n",
" print(epoch, \"Train accuracy:\", acc_train, \"Test accuracy:\", acc_test)\n",
@@ -464,11 +504,21 @@
" save_path = saver.save(sess, \"my_model_final.ckpt\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"Now the same model with $\\ell_1$ regularization:"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": false,
"collapsed": true,
"deletable": true,
"editable": true
},
@@ -476,29 +526,32 @@
"source": [
"tf.reset_default_graph()\n",
"\n",
"from functools import partial\n",
"\n",
"X = tf.placeholder(tf.float32, shape=(None, n_inputs), name=\"X\")\n",
"y = tf.placeholder(tf.int64, shape=(None), name=\"y\")\n",
"is_training = tf.placeholder(tf.bool, shape=(), name='is_training')\n",
"\n",
"with tf.name_scope(\"dnn\"):\n",
" he_init = tf.contrib.layers.variance_scaling_initializer()\n",
" batch_norm_params = {\n",
" 'is_training': is_training,\n",
" 'decay': 0.9,\n",
" 'updates_collections': None,\n",
" 'scale': True,\n",
" }\n",
"\n",
" with arg_scope(\n",
" [fully_connected],\n",
" activation_fn=tf.nn.elu,\n",
" weights_initializer=he_init,\n",
" normalizer_fn=batch_norm,\n",
" normalizer_params=batch_norm_params,\n",
" weights_regularizer=tf.contrib.layers.l1_regularizer(0.01)):\n",
" hidden1 = fully_connected(X, n_hidden1, scope=\"hidden1\")\n",
" hidden2 = fully_connected(hidden1, n_hidden2, scope=\"hidden2\")\n",
" logits = fully_connected(hidden2, n_outputs, activation_fn=None, scope=\"outputs\")\n",
" my_batch_norm_layer = partial(\n",
" tf.layers.batch_normalization,\n",
" training=is_training,\n",
" momentum=0.9)\n",
"\n",
" my_dense_layer = partial(\n",
" tf.layers.dense,\n",
" kernel_initializer=he_init,\n",
" kernel_regularizer=tf.contrib.layers.l1_regularizer(0.01))\n",
"\n",
" hidden1 = my_dense_layer(X, n_hidden1, name=\"hidden1\")\n",
" bn1 = tf.nn.elu(my_batch_norm_layer(hidden1))\n",
" hidden2 = my_dense_layer(bn1, n_hidden2, name=\"hidden2\")\n",
" bn2 = tf.nn.elu(my_batch_norm_layer(hidden2))\n",
" logits_before_bn = my_dense_layer(bn2, n_outputs, activation=None, name=\"outputs\")\n",
" logits = my_batch_norm_layer(logits_before_bn)\n",
" extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)\n",
"\n",
"with tf.name_scope(\"loss\"):\n",
" xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)\n",
@@ -513,7 +566,7 @@
"with tf.name_scope(\"eval\"):\n",
" correct = tf.nn.in_top_k(logits, y, 1)\n",
" accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))\n",
" \n",
"\n",
"init = tf.global_variables_initializer()\n",
"saver = tf.train.Saver()"
]
@@ -529,14 +582,14 @@
"outputs": [],
"source": [
"n_epochs = 20\n",
"batch_size = 50\n",
"batch_size = 200\n",
"\n",
"with tf.Session() as sess:\n",
" init.run()\n",
" for epoch in range(n_epochs):\n",
" for iteration in range(len(mnist.test.labels)//batch_size):\n",
" X_batch, y_batch = mnist.train.next_batch(batch_size)\n",
" sess.run(training_op, feed_dict={is_training: True, X: X_batch, y: y_batch})\n",
" sess.run([training_op, extra_update_ops], feed_dict={is_training: True, X: X_batch, y: y_batch})\n",
" acc_train = accuracy.eval(feed_dict={is_training: False, X: X_batch, y: y_batch})\n",
" acc_test = accuracy.eval(feed_dict={is_training: False, X: mnist.test.images, y: mnist.test.labels})\n",
" print(epoch, \"Train accuracy:\", acc_train, \"Test accuracy:\", acc_test)\n",
@@ -557,6 +610,16 @@
"[v.name for v in tf.global_variables()]"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"Note: the weights variable created by the `tf.layers.dense()` function is called `\"kernel\"` (instead of `\"weights\"` when using the `tf.contrib.layers.fully_connected()`, as in the book):"
]
},
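A standalone sketch (assuming TensorFlow 1.1+) of this naming difference; the layer name below is illustrative only:

    import tensorflow as tf

    tf.reset_default_graph()
    X = tf.placeholder(tf.float32, shape=(None, 28 * 28), name="X")
    hidden1 = tf.layers.dense(X, 300, name="hidden1")

    print([v.name for v in tf.global_variables()])
    # -> ['hidden1/kernel:0', 'hidden1/bias:0']
    #    (fully_connected() would have created 'hidden1/weights:0' and 'hidden1/biases:0')

    # the weights tensor can also be fetched from the graph by its full name:
    weights1 = tf.get_default_graph().get_tensor_by_name("hidden1/kernel:0")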
{
"cell_type": "code",
"execution_count": 19,
@@ -568,8 +631,8 @@
"outputs": [],
"source": [
"with tf.variable_scope(\"\", default_name=\"\", reuse=True): # root scope\n",
" weights1 = tf.get_variable(\"hidden1/weights\")\n",
" weights2 = tf.get_variable(\"hidden2/weights\")\n",
" weights1 = tf.get_variable(\"hidden1/kernel\")\n",
" weights2 = tf.get_variable(\"hidden2/kernel\")\n",
" "
]
},
@@ -689,6 +752,8 @@
"source": [
"tf.reset_default_graph()\n",
"\n",
"from functools import partial\n",
"\n",
"X = tf.placeholder(tf.float32, shape=(None, n_inputs), name=\"X\")\n",
"y = tf.placeholder(tf.int64, shape=(None), name=\"y\")\n",
"is_training = tf.placeholder(tf.bool, shape=(), name='is_training')\n",
@@ -701,12 +766,15 @@
" return max_norm\n",
"\n",
"with tf.name_scope(\"dnn\"):\n",
" with arg_scope(\n",
" [fully_connected],\n",
" weights_regularizer=max_norm_regularizer(1.5)):\n",
" hidden1 = fully_connected(X, n_hidden1, scope=\"hidden1\")\n",
" hidden2 = fully_connected(hidden1, n_hidden2, scope=\"hidden2\")\n",
" logits = fully_connected(hidden2, n_outputs, activation_fn=None, scope=\"outputs\")\n",
" \n",
" my_dense_layer = partial(\n",
" tf.layers.dense,\n",
" activation=tf.nn.relu,\n",
" kernel_regularizer=max_norm_regularizer(1.5))\n",
"\n",
" hidden1 = my_dense_layer(X, n_hidden1, name=\"hidden1\")\n",
" hidden2 = my_dense_layer(hidden1, n_hidden2, name=\"hidden2\")\n",
" logits = my_dense_layer(hidden2, n_outputs, activation=None, name=\"outputs\")\n",
"\n",
"clip_all_weights = tf.get_collection(\"max_norm\")\n",
" \n",
@@ -770,6 +838,18 @@
"show_graph(tf.get_default_graph())"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"Note: the book uses `tf.contrib.layers.dropout()` rather than `tf.layers.dropout()` (which did not exist when this chapter was written). It is now preferable to use `tf.layers.dropout()`, because anything in the contrib module may change or be deleted without notice. The `tf.layers.dropout()` function is almost identical to the `tf.contrib.layers.dropout()` function, except for a few minor differences. Most importantly:\n",
"* you must specify the dropout rate (`rate`) rather than the keep probability (`keep_prob`), where `rate` is simply equal to `1 - keep_prob`,\n",
"* the `is_training` parameter is renamed to `training`."
]
},
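A standalone sketch (assuming TensorFlow 1.1+) of the two differences listed above, building the same 50% dropout layer with both APIs:

    import tensorflow as tf

    tf.reset_default_graph()
    X = tf.placeholder(tf.float32, shape=(None, 28 * 28), name="X")
    is_training = tf.placeholder(tf.bool, shape=(), name="is_training")

    keep_prob = 0.5
    dropout_rate = 1 - keep_prob   # rate = 1 - keep_prob

    # old contrib API: keep probability + is_training
    X_drop_old = tf.contrib.layers.dropout(X, keep_prob, is_training=is_training)

    # new tf.layers API: drop rate + training
    X_drop_new = tf.layers.dropout(X, dropout_rate, training=is_training)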
{
"cell_type": "code",
"execution_count": 30,
@@ -780,7 +860,7 @@
},
"outputs": [],
"source": [
"from tensorflow.contrib.layers import dropout\n",
"from functools import partial\n",
"\n",
"tf.reset_default_graph()\n",
"\n",
@@ -795,20 +875,22 @@
"learning_rate = tf.train.exponential_decay(initial_learning_rate, global_step,\n",
" decay_steps, decay_rate)\n",
"\n",
"keep_prob = 0.5\n",
"dropout_rate = 0.5\n",
"\n",
"with tf.name_scope(\"dnn\"):\n",
" he_init = tf.contrib.layers.variance_scaling_initializer()\n",
" with arg_scope(\n",
" [fully_connected],\n",
" activation_fn=tf.nn.elu,\n",
" weights_initializer=he_init):\n",
" X_drop = dropout(X, keep_prob, is_training=is_training)\n",
" hidden1 = fully_connected(X_drop, n_hidden1, scope=\"hidden1\")\n",
" hidden1_drop = dropout(hidden1, keep_prob, is_training=is_training)\n",
" hidden2 = fully_connected(hidden1_drop, n_hidden2, scope=\"hidden2\")\n",
" hidden2_drop = dropout(hidden2, keep_prob, is_training=is_training)\n",
" logits = fully_connected(hidden2_drop, n_outputs, activation_fn=None, scope=\"outputs\")\n",
"\n",
" my_dense_layer = partial(\n",
" tf.layers.dense,\n",
" activation=tf.nn.elu,\n",
" kernel_initializer=he_init)\n",
"\n",
" X_drop = tf.layers.dropout(X, dropout_rate, training=is_training)\n",
" hidden1 = my_dense_layer(X_drop, n_hidden1, name=\"hidden1\")\n",
" hidden1_drop = tf.layers.dropout(hidden1, dropout_rate, training=is_training)\n",
" hidden2 = my_dense_layer(hidden1_drop, n_hidden2, name=\"hidden2\")\n",
" hidden2_drop = tf.layers.dropout(hidden2, dropout_rate, training=is_training)\n",
" logits = my_dense_layer(hidden2_drop, n_outputs, activation=None, name=\"outputs\")\n",
"\n",
"with tf.name_scope(\"loss\"):\n",
" xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)\n",
@@ -970,7 +1052,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2+"
"version": "3.5.3"
},
"nav_menu": {
"height": "360px",