The GRU (Gated Recurrent Unit) is an LSTM variant introduced by Kyunghyun Cho et al. GRUs retain the LSTM's resistance to the vanishing-gradient problem, but they are internally simpler and faster to compute.
An LSTM has three gates: input, output and forget. A GRU has only two: an update gate z and a reset gate r.
Update gate: the update gate decides how much of the previous memory to keep around.
Reset gate: the reset gate defines how to combine the new input with the previous memory.
Unlike the LSTM, the GRU has no persistent cell state distinct from its hidden state.
The equations for the GRU gating mechanism:
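One common formulation of these equations is given below (σ is the logistic sigmoid, ⊙ is element-wise multiplication; the weight names W, U and biases b follow a generic textbook convention rather than Keras' internal naming):

```latex
\begin{aligned}
z_t &= \sigma(W_z x_t + U_z h_{t-1} + b_z) && \text{(update gate)} \\
r_t &= \sigma(W_r x_t + U_r h_{t-1} + b_r) && \text{(reset gate)} \\
\tilde{h}_t &= \tanh\!\left(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h\right) && \text{(candidate state)} \\
h_t &= z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t && \text{(new hidden state)}
\end{aligned}
```

When z is close to 1 the previous memory h_{t-1} is kept almost unchanged; when z is close to 0 the state is overwritten by the candidate, and r controls how much of the previous state feeds into that candidate.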
GRU vs. LSTM
- GRU and LSTM have comparable performance, and there is no simple way to recommend one or the other for a specific task
- GRUs are faster to train and need less data to generalise
- When there is enough data, an LSTM’s greater expressive power may lead to better results
- Like LSTMs, GRUs are drop-in replacements for the SimpleRNN cell
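Like SimpleRNN, a GRU cell maps an input x_t and the previous hidden state h_{t-1} to a new hidden state h_t, with no separate cell state, which is why it can be swapped in directly. To make the gating concrete, here is a minimal NumPy sketch of a single GRU step (the function and parameter names are illustrative, not Keras' internals):

```python
import numpy as np


def gru_step(x, h_prev, Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh):
    """One GRU time step: (x_t, h_{t-1}) -> h_t."""
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    z = sigmoid(Wz @ x + Uz @ h_prev + bz)               # update gate
    r = sigmoid(Wr @ x + Ur @ h_prev + br)               # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h_prev) + bh)   # candidate state
    return z * h_prev + (1 - z) * h_tilde                # new hidden state


rng = np.random.default_rng(0)
n_in, n_h = 4, 3
x = rng.standard_normal(n_in)
h0 = np.zeros(n_h)                                        # initial hidden state
Wz, Wr, Wh = (rng.standard_normal((n_h, n_in)) for _ in range(3))
Uz, Ur, Uh = (rng.standard_normal((n_h, n_h)) for _ in range(3))
bz = br = bh = np.zeros(n_h)
h1 = gru_step(x, h0, Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh)
print(h1.shape)  # (3,)
```

Because tanh bounds the candidate state to (-1, 1) and z interpolates between the old and candidate states, the hidden state stays in a well-behaved range from step to step.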
GRU example using Keras
- Import all the required packages
- Load the IMDB dataset
- Define a vocabulary of the 30,000 most frequent words
- Set the maximum length of each sequence to 300; pad sequences shorter than 300 with zeros and truncate those that are longer
- Create a GRU model using Keras API.
- Fit the GRU model and check the accuracy.
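The padding step above can be sketched in plain Python (a simplified stand-in for Keras' pad_sequences; note that with Keras' defaults, truncating='pre' removes items from the front of a long sequence, while padding='post' appends zeros at the end):

```python
def pad_post(seq, maxlen, value=0):
    # truncating='pre' behaviour: keep only the last `maxlen` items
    seq = seq[-maxlen:]
    # padding='post' behaviour: append `value` until length is `maxlen`
    return seq + [value] * (maxlen - len(seq))


print(pad_post([1, 2, 3], 5))           # [1, 2, 3, 0, 0]
print(pad_post([1, 2, 3, 4, 5, 6], 5))  # [2, 3, 4, 5, 6]
```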
from keras.datasets import imdb
from keras.layers import GRU, Activation
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential

num_words = 30000
maxlen = 300

(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=num_words)

# pad the sequences with zeros
# padding='post' => zeros are appended to the end of each sequence
X_train = pad_sequences(X_train, maxlen=maxlen, padding='post')
X_test = pad_sequences(X_test, maxlen=maxlen, padding='post')

# add a trailing feature dimension: (samples, 300) -> (samples, 300, 1)
X_train = X_train.reshape(X_train.shape + (1,))
X_test = X_test.reshape(X_test.shape + (1,))

def gru_model():
    model = Sequential()
    model.add(GRU(50, input_shape=(300, 1), return_sequences=True))
    model.add(GRU(1, return_sequences=False))
    model.add(Activation('sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam',
                  metrics=['accuracy'])
    return model

model = gru_model()
model.fit(X_train, y_train, batch_size=100, epochs=10, verbose=0)

# evaluate() returns [loss, accuracy] when a metric is compiled in
scores = model.evaluate(X_test, y_test, verbose=0)
print("Accuracy: %.2f%%" % (scores[1] * 100))