Keras sequential models make deep neural network modeling about as simple as it can be.
As I discussed in my review of PyTorch, the foundational deep neural network (DNN) frameworks such as TensorFlow (Google) and CNTK (Microsoft) tend to be hard to use for model building. However, TensorFlow now contains three high-level APIs for creating models, one of which, tf.keras, is a bespoke version of Keras.
Keras proper, a high-level front end for building neural network models, ships with support for three back-end deep learning frameworks: TensorFlow, CNTK, and Theano. Amazon is currently developing an MXNet back end for Keras. It’s also possible to use PlaidML (an independent project) as a back end for Keras, to take advantage of PlaidML’s OpenCL support for all GPUs.
As an aside, the name Keras comes from the Greek word for horn, κέρας, and refers to a passage from the Odyssey: the dream spirits that come through the gate of horn announce a true future, while the ones that come through the gate of ivory deceive men with false visions.
TensorFlow is the default back end for Keras, and the one recommended for many use cases involving GPU acceleration on Nvidia hardware via CUDA and cuDNN, as well as for TPU acceleration in the Google Cloud. I used the TensorFlow back end configured for CPU-only to do my basic Keras testing on a MacBook Pro.
Keras vs. PyTorch
Keras (Google) and PyTorch (Facebook) are often mentioned in the same breath, especially when the subject is easy creation of deep neural networks. Both are designed to make it as simple as possible to build models. PyTorch says it’s designed for “fast, flexible experimentation.” Keras “was developed with a focus on enabling fast experimentation.” Both expose Python APIs.
There are some practical differences between the two. While Keras is a front end for three DNN frameworks, PyTorch provides its own back ends, primarily C/C++ code adapted from Torch, with some production features from Caffe2.
Keras has a high-level environment that reduces adding a layer to a neural network to one line of code in its sequential model, and requires just one function call each for compiling and training a model. PyTorch model-building code can look very similar if you add layers using its sequential container, but PyTorch requires you to write your own optimization loop for training, as opposed to making a single call in Keras. Frankly, writing that loop isn’t a big deal, as the sketch below shows.
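For what it’s worth, here’s a minimal sketch of such a loop; the model shape, learning rate, and dummy data are my own arbitrary choices, not anything from either project’s documentation:

```python
import torch
import torch.nn as nn

# a PyTorch sequential model roughly comparable to a small Keras one
model = nn.Sequential(
    nn.Linear(784, 32),
    nn.ReLU(),
    nn.Linear(32, 10),
)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# dummy data so the loop below actually runs
x = torch.randn(320, 784)
y = torch.randint(0, 10, (320,))
loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(x, y), batch_size=32)

# the optimization loop you write yourself in PyTorch
for epoch in range(5):
    for x_batch, y_batch in loader:
        optimizer.zero_grad()                    # clear accumulated gradients
        loss = loss_fn(model(x_batch), y_batch)  # forward pass plus loss
        loss.backward()                          # back-propagate
        optimizer.step()                         # update the weights
```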
Both Keras and PyTorch let you work at a lower level if you want. Keras calls that level its model or functional API. Keras also allows you to drop down even farther, to the Python coding level, by subclassing keras.Model, but prefers the functional API when possible.
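As a rough illustration, a subclassed model looks something like this; the layer sizes here are arbitrary:

```python
import keras

# a minimal subclassed model: define the forward pass imperatively in call()
class MyModel(keras.Model):
    def __init__(self):
        super(MyModel, self).__init__()
        self.dense1 = keras.layers.Dense(32, activation='relu')
        self.dense2 = keras.layers.Dense(10, activation='softmax')

    def call(self, inputs):
        x = self.dense1(inputs)
        return self.dense2(x)

model = MyModel()
```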
PyTorch claims two distinctions: the ability to change the model dynamically from step to step during training, and the ability to compute gradients using tape-based back-propagation. Keras lacks dynamic modeling, but it does have tape-based gradients, courtesy of the TensorFlow back end’s GradientTape class.
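As a tiny illustration of what tape-based gradients look like (the values here are my own toy example):

```python
import tensorflow as tf

x = tf.constant(3.0)
with tf.GradientTape() as tape:
    tape.watch(x)                # record operations involving x on the tape
    y = x * x
dy_dx = tape.gradient(y, x)      # d(x^2)/dx at x=3, i.e. 6.0
```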
Keras also has a Scikit-learn API, so that you can use the Scikit-learn grid search to perform hyperparameter optimization in Keras models. In a way, that ability can replace the need for PyTorch-like dynamic models, especially if you’re doing your training on multiple GPUs. Essentially, you’re doing the hyperparameter optimizations in parallel training runs instead of within a single training.
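A sketch of how that might look, with a made-up model and dummy data; the hyperparameter being searched, units, is my own choice:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV

# build_fn returns a freshly compiled model; 'units' is the knob to search
def build_model(units=32):
    model = Sequential()
    model.add(Dense(units, activation='relu', input_dim=784))
    model.add(Dense(10, activation='softmax'))
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model

clf = KerasClassifier(build_fn=build_model, epochs=3, batch_size=32, verbose=0)
grid = GridSearchCV(clf, param_grid={'units': [32, 64, 128]}, cv=3)

# dummy data just to make the call concrete
x = np.random.random((300, 784))
y = np.random.randint(10, size=300)
grid_result = grid.fit(x, y)   # runs one training per parameter combination
```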
Keras simplicity
The 30-second intro to Keras explains that the Keras model, a way to organize layers in a neural network, is the framework’s core data structure. The sequential model is a linear stack of layers, and the layers can be described with one call each. By contrast, describing a layer in TensorFlow takes multiple lines of code.
Keras architecture
As noted above, the model is the core Keras data structure. There are two main types of models available in Keras: the sequential model, and the Model class used with the functional API.
Both sequential and functional models have methods or attributes for layers, inputs, outputs, summary(), get_config(), from_config(config), get_weights(), set_weights(weights), to_json(), to_yaml(), save_weights(), and load_weights(). I won’t dwell on keras.Model subclassing, which doesn’t have all the listed methods and attributes.
Keras sequential models
As I discussed earlier, you can use the model.add() method to add layers to sequential models.
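Here is a minimal example along those lines, patterned on the Keras guide; the layer sizes match the discussion that follows:

```python
from keras.models import Sequential
from keras.layers import Dense, Activation

model = Sequential()
model.add(Dense(32, input_dim=784))   # first layer declares its input dimension
model.add(Activation('relu'))         # inferred input dimension: 32
model.add(Dense(10))
model.add(Activation('softmax'))      # inferred input dimension: 10
```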
The first layer in a sequential model normally specifies its input shape or dimension. The other layers get their input shapes from the output of the previous layers; in the code above, the relu activation layer has an input dimension of 32, and the softmax activation layer has an input dimension of 10. There is a mechanism for delayed sequential model building that infers the input shape the first time fit() is called if you don’t specify the shape or dimension; it only seems to be mentioned in the sequential.py source code.
Model compilation configures the learning process. It sets the optimizer, the loss function, and a list of metrics. Keras has a full set of all of these predefined, and calls the back end when appropriate. You can pass string identifiers for these, or instances of the appropriate classes.
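Continuing the sequential example above, compilation might look like this; the specific optimizer, loss, and metric are illustrative choices:

```python
from keras import optimizers

# string identifiers name predefined components
model.compile(loss='categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])

# or pass a configured class instance instead of a string
model.compile(loss='categorical_crossentropy',
              optimizer=optimizers.SGD(lr=0.01, momentum=0.9),
              metrics=['accuracy'])
```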
Training takes NumPy arrays of input data and labels as input. You normally call the fit() method to run the entire training process, but you can also feed in data batch by batch with the train_on_batch() method. If you need even more control, you can train a model on data from a Python generator function, using the fit_generator() method.
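Continuing the same example, with dummy NumPy data of my own invention:

```python
import numpy as np
from keras.utils import to_categorical

# dummy data: 1,000 samples of 784 features, with one-hot labels for 10 classes
x_train = np.random.random((1000, 784))
y_train = to_categorical(np.random.randint(10, size=(1000, 1)), num_classes=10)

# run the entire training process in one call
model.fit(x_train, y_train, epochs=5, batch_size=32)

# or feed the model a single batch yourself
model.train_on_batch(x_train[:32], y_train[:32])
```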
Keras layers
Keras has numerous layers pre-defined, organized into categories: core, convolutional, pooling, locally connected, recurrent, embedding, merge, advanced activations, normalization, and noise. There are also two layer wrappers, for time series generation and bidirectional RNNs, and an API for writing custom layers.
For example, the core layers include Dense, the regular densely-connected neural network layer that does a dot product with optional bias and activation function; Activation, which applies an activation function; Dropout, which randomly drops input units to 0 to prevent overfitting; and several more. Convolutional layers can be 1D (temporal convolution), 2D (spatial convolution), 3D (spatial convolution over volumes), separable, transposed, cropping, upsampling, and so on. In general, layers pass most of the work to the back end (TensorFlow, etc.) where compute-intensive operations such as convolution of large tensors can be optimized with, for example, GPU or TPU support.
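As a rough illustration of how some of these layers combine, here is an arbitrary small convolutional stack; the input shape and layer sizes are my own choices:

```python
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu',
                 input_shape=(28, 28, 1)))      # 2D spatial convolution
model.add(MaxPooling2D(pool_size=(2, 2)))       # pooling layer
model.add(Dropout(0.25))                        # randomly zeroes input units
model.add(Flatten())                            # core layer: maps 2D to vector
model.add(Dense(10, activation='softmax'))      # core densely connected layer
```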
Keras functional API
The Keras functional API is useful for creating complex models, such as multi-input/multi-output models, directed acyclic graphs (DAGs), and models with shared layers. The functional API uses the same layers as the sequential model, but provides more flexibility in putting them together. In the functional API you define the layers first, and then create the model, compile it, and fit (train) it.
The functional model that follows takes an input, runs it through two 64-unit Dense layers with ReLU (rectified linear unit) activation, and finally runs it through a 10-unit Dense layer with softmax (normalized exponential function) activation. It could just as easily have been created with a sequential model. The input could be the MNIST data set of handwritten numerals or something else that has 10 classes of 28×28 (784) pixel images.
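Here is what that model looks like in the functional API; the rmsprop optimizer and cross-entropy loss are typical but illustrative choices:

```python
from keras.layers import Input, Dense
from keras.models import Model

# a placeholder tensor for batches of 784-dimensional inputs
inputs = Input(shape=(784,))

# layers are callables on tensors: each call returns a new tensor
x = Dense(64, activation='relu')(inputs)
x = Dense(64, activation='relu')(x)
predictions = Dense(10, activation='softmax')(x)

# the Model ties the input tensor to the output tensor
model = Model(inputs=inputs, outputs=predictions)
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
```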
You can do much cooler things with functional models than you can with sequential models, since you can blithely apply models (both the model architecture and the trained weights) to tensors. For example, the code that follows turns the image classification model defined above into a video classification model.
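Here is a sketch of that idea, assuming 20-frame sequences of the same 784-pixel images; model is the functional model defined above:

```python
from keras.layers import Input, TimeDistributed

# input tensor for sequences of 20 frames, each a 784-dimensional vector
input_sequences = Input(shape=(20, 784))

# apply the image model (architecture and trained weights) to every frame;
# the output is a sequence of 20 vectors of 10 class probabilities
processed_sequences = TimeDistributed(model)(input_sequences)
```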
Installing Keras
Keras installation is basically a two-step process, because you have to install a back end as well as Keras itself. On my MacBook, I started by upgrading pip; then I upgraded TensorFlow and installed Keras, both with pip. I also pulled fresh source code for both repositories, so that I could use the code for reference in areas where the documentation wasn’t complete enough for me.