Deep Learning with Gated Recurrent Units (GRU)

24th May 2019

GRU is an LSTM variant introduced by K. Cho et al. GRUs retain the LSTM's resistance to the vanishing gradient problem, but they are internally simpler and faster to train than LSTMs.

An LSTM has three gates: input, output, and forget. A GRU has only two: an update gate z and a reset gate r.

Update gate: the update gate decides how much of the previous memory to keep around.

Reset gate: the reset gate defines how to combine the new input with the previous hidden state.

Unlike the LSTM, the GRU has no persistent cell state separate from the hidden state.

The equations for the GRU gating mechanism:

z_t = σ(W_z x_t + U_z h_{t−1})
r_t = σ(W_r x_t + U_r h_{t−1})
h̃_t = tanh(W_h x_t + U_h (r_t ⊙ h_{t−1}))
h_t = (1 − z_t) ⊙ h_{t−1} + z_t ⊙ h̃_t

where σ is the sigmoid function and ⊙ denotes element-wise multiplication.

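A single GRU step can be sketched directly in NumPy. The dimensions and random weights below are illustrative placeholders, not trained values:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy dimensions: input size m = 3, hidden size n = 2.
rng = np.random.default_rng(0)
m, n = 3, 2
W_z, W_r, W_h = (rng.standard_normal((n, m)) for _ in range(3))
U_z, U_r, U_h = (rng.standard_normal((n, n)) for _ in range(3))

def gru_step(x_t, h_prev):
    z = sigmoid(W_z @ x_t + U_z @ h_prev)               # update gate
    r = sigmoid(W_r @ x_t + U_r @ h_prev)               # reset gate
    h_tilde = np.tanh(W_h @ x_t + U_h @ (r * h_prev))   # candidate state
    return (1 - z) * h_prev + z * h_tilde               # new hidden state

x_t = np.array([0.5, -1.0, 0.2])
h = gru_step(x_t, np.zeros(n))
print(h.shape)  # (2,)
```

Because the update gate interpolates between the old state and the candidate, the hidden state can carry information across many steps without being repeatedly squashed, which is where the gradient-friendliness comes from.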
  • GRU and LSTM have comparable performance and there is no simple way to recommend one or the other for a specific task
  • GRUs are faster to train and need less data to generalise
  • When there is enough data, an LSTM’s greater expressive power may lead to better results
  • Like LSTMs, GRUs are drop-in replacements for the SimpleRNN cell
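The speed difference is easy to see from a rough parameter count. The sketch below assumes the classic formulations (one weight block per gate, ignoring variants such as Keras's reset_after bias layout):

```python
def lstm_params(input_dim, units):
    # 4 blocks: input, forget, and output gates plus the candidate state
    return 4 * (units * (input_dim + units) + units)

def gru_params(input_dim, units):
    # 3 blocks: update and reset gates plus the candidate state
    return 3 * (units * (input_dim + units) + units)

# For the 50-unit layer used below with 1-dimensional input:
print(lstm_params(1, 50))  # 10400
print(gru_params(1, 50))   # 7800
```

A GRU layer has roughly three quarters of the parameters of an equally sized LSTM layer, which translates into fewer multiplications per step and faster training.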

GRU example using Keras

  1. Import all the required packages.
  2. Load the IMDB dataset.
  3. Define a vocabulary of 30,000 words.
  4. Set the maximum length of each sequence to 300; pad sequences shorter than 300 with zeros and truncate those longer than 300.
  5. Create a GRU model using the Keras API.
  6. Fit the GRU model and check the accuracy.
from keras.datasets import imdb
from keras.layers import GRU, Activation
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential

num_words = 30000
maxlen = 300

(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words = num_words)

# pad the sequences with zeros 
# padding parameter is set to 'post' => 0's are appended to end of sequences
X_train = pad_sequences(X_train, maxlen = maxlen, padding = 'post')
X_test = pad_sequences(X_test, maxlen = maxlen, padding = 'post')

X_train = X_train.reshape(X_train.shape + (1,))
X_test = X_test.reshape(X_test.shape + (1,))

def gru_model():
    model = Sequential()
    model.add(GRU(50, input_shape = (300,1), return_sequences = True))
    model.add(GRU(1, return_sequences = False))
    model.add(Activation('sigmoid'))  # squash the output to [0, 1] for binary classification
    model.compile(loss = 'binary_crossentropy', optimizer = 'adam', metrics = ['accuracy'])
    return model
model = gru_model()

model.fit(X_train, y_train, batch_size = 100, epochs = 10, verbose = 0)

scores = model.evaluate(X_test, y_test, verbose=0)
print("Accuracy: %.2f%%" % (scores[1]*100))


Manish Prasad

An experienced data scientist with a passion to work on new challenges
