7.11. Recurrent Neural Network¶
DNN/NN: deal with general, fixed-size input data with no particular structure
CNN: captures the spatial structure of data (image analysis)
RNN: captures the sequential structure of data (sentences, stock prices)
Example: “I like eating apple.” and “Apple is a company.” To distinguish the different meanings of a word such as “apple”, we need to take the nearby words into consideration.
How does an RNN work?
The hidden layer remembers the information of the previous hidden state \(h_{t-1}\) and then learns from the current input \(X_t\).
$$
\begin{aligned}
O_t &= g(Wh_t)\\
h_t &= f(UX_t + Vh_{t-1})
\end{aligned}
$$
\(X_t\): input vector
\(h_t\): hidden layer vector
\(O_t\): output vector
\(W,U,V\): parameter matrices
Example: assume we have trained an RNN with 2 hidden nodes, where \(f\) and \(g\) are identity maps and every entry of \(W\), \(U\), and \(V\) is 0.5, and the input sequence is
\(X_1=(1,1)', X_2=(1,2)', \ldots\)
\(h_1=UX_1=(0.5*1+0.5*1, 0.5*1+0.5*1)'=(1,1)'\), so \(O_1=Wh_1=(1,1)'\)
\(h_2=UX_2+Vh_1=(0.5*1+0.5*2, 0.5*1+0.5*2)'+(1,1)'=(2.5,2.5)'\), so \(O_2=Wh_2=(2.5,2.5)'\)
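A minimal NumPy sketch of this recurrence (under the same assumptions: \(f\) and \(g\) are identity maps and every entry of \(W\), \(U\), \(V\) is 0.5) reproduces \(h_1, O_1\) and \(h_2, O_2\) above:
import numpy as np
# f and g are identity maps; W, U, V are 2*2 matrices with every entry 0.5
W = U = V = np.full((2, 2), 0.5)
h = np.zeros(2)                    # h_0 = (0, 0)'
for X in [np.array([1, 1]), np.array([1, 2])]:
    h = U @ X + V @ h              # h_t = f(U X_t + V h_{t-1})
    O = W @ h                      # O_t = g(W h_t)
    print(h, O)                    # (1, 1) then (2.5, 2.5)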
LSTM (“long short-term memory”): a commonly used RNN architecture designed to retain information over longer sequences.
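A hedged sketch of how an LSTM layer fits into a Keras model for padded, integer-encoded text (the layer sizes here are illustrative; the worked example below uses a simpler Embedding + Dense model instead):
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense
lstm_model = Sequential([
    Embedding(50, 8, input_length=4),   # map word indices to 8-dimensional vectors
    LSTM(16),                           # 16 hidden units; returns the final hidden state
    Dense(1, activation='sigmoid')      # binary (positive/negative) output
])
lstm_model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])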
from numpy import array
from tensorflow.keras.preprocessing.text import one_hot
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Embedding
# define documents
docs = ['Well done!',
'Good work',
'Great effort',
'nice work',
'Excellent!',
'Weak',
'Poor effort!',
'not good',
'poor work',
'Could have done better.']
# define class labels
labels = array([1,1,1,1,1,0,0,0,0,0])
# integer encode the documents
vocab_size = 50
encoded_docs = [one_hot(d, vocab_size) for d in docs]
print(encoded_docs)
[[16, 1], [46, 17], [5, 10], [35, 17], [27], [2], [4, 10], [40, 46], [4, 17], [26, 41, 1, 6]]
# example Embedding layer: vocabulary of 200 words, 32-dimensional vectors,
# input sequences of length 50 (for illustration only; not used in the model below)
e = Embedding(200, 32, input_length=50)
# pad documents to a max length of 4 words
max_length = 4
padded_docs = pad_sequences(encoded_docs, maxlen=max_length, padding='post')
print(padded_docs)
# define the model
model = Sequential()
model.add(Embedding(vocab_size, 8, input_length=max_length))
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))
# compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# summarize the model
print(model.summary())
# fit the model
model.fit(padded_docs, labels, epochs=50, verbose=0)
# evaluate the model
loss, accuracy = model.evaluate(padded_docs, labels, verbose=0)
print('Accuracy: %f' % (accuracy*100))
[[16 1 0 0]
[46 17 0 0]
[ 5 10 0 0]
[35 17 0 0]
[27 0 0 0]
[ 2 0 0 0]
[ 4 10 0 0]
[40 46 0 0]
[ 4 17 0 0]
[26 41 1 6]]
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding_1 (Embedding) (None, 4, 8) 400
flatten_1 (Flatten) (None, 32) 0
dense_2 (Dense) (None, 1) 33
=================================================================
Total params: 433
Trainable params: 433
Non-trainable params: 0
_________________________________________________________________
None
Accuracy: 89.999998
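As a short follow-up sketch (assuming the objects defined above are still in scope), a new document can be scored by pushing it through the same one_hot/pad_sequences pipeline and calling model.predict; the sigmoid output is the estimated probability of the positive class. The sentence used here is made up for illustration.
new_docs = ['very good work']                                 # hypothetical new document
new_encoded = [one_hot(d, vocab_size) for d in new_docs]      # same integer encoding
new_padded = pad_sequences(new_encoded, maxlen=max_length, padding='post')
print(model.predict(new_padded))                              # probability of label 1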
7.12. Convolutional Neural Network¶
7.12.1. Filter¶
A filter, also known as a kernel, is a small weight matrix in a CNN that slides over the input and extracts local features from the data.
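For intuition, here is a small NumPy sketch (the image and filter values are made up for illustration) that slides a \(3*3\) filter over a \(5*5\) input, taking one elementwise product-and-sum per position, and produces a \(3*3\) feature map:
import numpy as np
image = np.arange(25, dtype=float).reshape(5, 5)    # toy 5*5 "image"
kernel = np.array([[1., 0., -1.],                   # a simple vertical-edge filter
                   [1., 0., -1.],
                   [1., 0., -1.]])
out = np.zeros((3, 3))                              # (5-3+1) * (5-3+1) output
for i in range(3):
    for j in range(3):
        patch = image[i:i+3, j:j+3]                 # local 3*3 patch
        out[i, j] = np.sum(patch * kernel)          # filter response at this position
print(out)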
7.12.2. Basic concepts¶
Padding: addition of (typically zero-valued) pixels on the borders of an image, so the convolution does not shrink the spatial dimensions
Pooling: reduces the dimensions of the data by combining a neighborhood of outputs from the previous layer into a single value in the next layer (e.g., max pooling keeps the maximum)
Channels: the number of filters in a layer, i.e., the depth of that layer's output (each concept is illustrated in the shape sketch after this list)
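A quick sketch of these three ideas with Keras layers (the filter count of 16 is illustrative; only the output shapes matter here):
import tensorflow as tf
from tensorflow.keras import layers
x = tf.random.normal((1, 32, 32, 3))                    # one 32*32 RGB image
same = layers.Conv2D(16, (3, 3), padding='same')(x)     # zero padding keeps 32*32
valid = layers.Conv2D(16, (3, 3), padding='valid')(x)   # no padding: shrinks to 30*30
pooled = layers.MaxPooling2D((2, 2))(valid)             # pooling halves 30*30 to 15*15
print(same.shape, valid.shape, pooled.shape)            # last dimension 16 = number of filters (channels)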
import tensorflow as tf
from tensorflow.keras import datasets, layers, models
import matplotlib.pyplot as plt
(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()
# Normalize pixel values to be between 0 and 1
train_images, test_images = train_images / 255.0, test_images / 255.0
Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
170500096/170498071 [==============================] - 123s 1us/step
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 30, 30, 32) 896
max_pooling2d (MaxPooling2D) (None, 15, 15, 32) 0
conv2d_1 (Conv2D) (None, 13, 13, 64) 18496
max_pooling2d_1 (MaxPooling2D) (None, 6, 6, 64) 0
conv2d_2 (Conv2D) (None, 4, 4, 64) 36928
=================================================================
Total params: 56,320
Trainable params: 56,320
Non-trainable params: 0
_________________________________________________________________
The input is a \(32*32*3\) tensor, and the layers are (the resulting shapes are checked in the sketch after this list):
Conv layer: \(3*3\) filter, channels=32
Pooling layer: \(2*2\) max pooling
Conv layer: \(3*3\) filter, channels=64
Pooling layer: \(2*2\) max pooling
Conv layer: \(3*3\) filter, channels=64
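As a quick sanity check on the shapes reported by model.summary() (a sketch of the arithmetic only): each unpadded \(3*3\) convolution shrinks the height and width by 2, and each \(2*2\) max pooling halves them.
size = 32
size = size - 3 + 1    # Conv2D 3*3, no padding: 32 -> 30
size = size // 2       # MaxPooling2D 2*2: 30 -> 15
size = size - 3 + 1    # Conv2D 3*3: 15 -> 13
size = size // 2       # MaxPooling2D 2*2: 13 -> 6
size = size - 3 + 1    # Conv2D 3*3: 6 -> 4
print(size)            # 4, matching the (None, 4, 4, 64) output shape above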
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10))
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 30, 30, 32) 896
max_pooling2d (MaxPooling2D) (None, 15, 15, 32) 0
conv2d_1 (Conv2D) (None, 13, 13, 64) 18496
max_pooling2d_1 (MaxPooling2D) (None, 6, 6, 64) 0
conv2d_2 (Conv2D) (None, 4, 4, 64) 36928
flatten (Flatten) (None, 1024) 0
dense (Dense) (None, 64) 65600
dense_1 (Dense) (None, 10) 650
=================================================================
Total params: 122,570
Trainable params: 122,570
Non-trainable params: 0
_________________________________________________________________
To complete the model, the convolutional output is flattened into a 1-dimensional vector and passed through dense layers; the final dense layer outputs 10 logits, one per CIFAR-10 class.
model.compile(optimizer='adam',
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=['accuracy'])
history = model.fit(train_images, train_labels, epochs=10,
validation_data=(test_images, test_labels))
Epoch 1/10
1563/1563 [==============================] - 59s 37ms/step - loss: 1.5502 - accuracy: 0.4321 - val_loss: 1.3032 - val_accuracy: 0.5331
Epoch 2/10
1563/1563 [==============================] - 58s 37ms/step - loss: 1.2001 - accuracy: 0.5765 - val_loss: 1.1816 - val_accuracy: 0.5806
Epoch 3/10
1563/1563 [==============================] - 54s 34ms/step - loss: 1.0436 - accuracy: 0.6316 - val_loss: 1.0422 - val_accuracy: 0.6347
Epoch 4/10
1563/1563 [==============================] - 54s 35ms/step - loss: 0.9299 - accuracy: 0.6737 - val_loss: 0.9405 - val_accuracy: 0.6712
Epoch 5/10
1563/1563 [==============================] - 55s 35ms/step - loss: 0.8501 - accuracy: 0.7027 - val_loss: 0.9336 - val_accuracy: 0.6752
Epoch 6/10
1563/1563 [==============================] - 55s 35ms/step - loss: 0.7862 - accuracy: 0.7249 - val_loss: 0.8879 - val_accuracy: 0.6898
Epoch 7/10
1563/1563 [==============================] - 55s 35ms/step - loss: 0.7354 - accuracy: 0.7414 - val_loss: 0.8518 - val_accuracy: 0.7038
Epoch 8/10
1563/1563 [==============================] - 57s 36ms/step - loss: 0.6873 - accuracy: 0.7593 - val_loss: 0.8349 - val_accuracy: 0.7101
Epoch 9/10
1563/1563 [==============================] - 56s 36ms/step - loss: 0.6497 - accuracy: 0.7722 - val_loss: 0.8684 - val_accuracy: 0.7094
Epoch 10/10
1563/1563 [==============================] - 56s 36ms/step - loss: 0.6069 - accuracy: 0.7865 - val_loss: 0.8711 - val_accuracy: 0.7054
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=0)
print(test_acc)
0.7053999900817871
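Finally, a sketch of using the trained model for prediction (the class_names list is added here for readability; it follows the standard CIFAR-10 label order). The last Dense layer outputs logits, so a softmax is applied before taking the argmax.
import numpy as np
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']
logits = model.predict(test_images[:5])            # raw scores from the final Dense layer
probs = tf.nn.softmax(logits, axis=-1).numpy()     # convert logits to class probabilities
preds = np.argmax(probs, axis=-1)
print([class_names[p] for p in preds])             # predicted classes
print(test_labels[:5].flatten())                   # true labels for comparison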