TensorFlow provides a consistent framework and a compatible working environment for the fast-moving fields of machine learning and deep learning.
Neural networks are among the most talked-about topics in these fields, and their working principle can be demonstrated easily with the gradient descent algorithm. TensorFlow ships with almost all the tools required for this.
Conceptually, a neural network is modeled on how human brain cells operate, decide, recognize, and function. Our processing unit does not simply memorize all the information fed to it at once.
Rather, it learns at a certain rate and revises itself repeatedly as it deals with new situations. An important term in this process is the cost function, also called the loss function. Its partial derivatives with respect to the model's parameters give the basic term of this article: the gradient.
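As a minimal sketch of what "gradient" means here, the snippet below asks `tf.GradientTape` for the derivative of a toy function f(w) = w^2 at w = 3. The function and the variable name `w` are illustrative assumptions and are not part of the regression example that follows.

```python
import tensorflow as tf

# Hypothetical toy variable, used only for this illustration
w = tf.Variable(3.0)

# Operations on w executed inside the context are recorded on the tape
with tf.GradientTape() as tape:
    f = w * w  # f(w) = w^2

# df/dw = 2w, so the gradient at w = 3 is 6
df_dw = tape.gradient(f, w)
print(df_dw.numpy())
```

The tape tracks `tf.Variable` objects automatically, which is why the model parameters later in this article are declared as variables.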
To visualize the concept, we demonstrate it with a common model, linear regression. If you are not new to machine learning, we assume you are already familiar with it.
This model extracts a relationship between the independent and dependent variables with the help of a regression line.
While establishing the regression line, we need to know the sum of squared errors (SSE), which is the cost function for this particular case; its partial derivatives with respect to the line's parameters form the gradient. We expect the SSE to reach its minimum when we train our model.
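To make the SSE concrete, here is a small hand-checkable sketch. The three data points and the helper name `sse` are assumptions chosen only so that the arithmetic is easy to follow.

```python
import numpy as np

# Hypothetical toy data, chosen only to make the arithmetic easy to follow
x = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 3.0, 5.0])

def sse(a, b):
    """Sum of squared errors of the line y = a + b * x on the toy data."""
    residuals = y - (a + b * x)
    return float(np.sum(residuals ** 2))

print(sse(0.0, 0.0))  # residuals are 1, 3, 5 -> 1 + 9 + 25 = 35
print(sse(1.0, 2.0))  # the line y = 1 + 2x passes through every point -> 0
```

Training amounts to moving from a poor choice of a and b, such as the first call, toward a choice with a smaller SSE, such as the second.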
This is achieved only through the repeated process of training. Here, we calculate the output from our specified equation and update the parameter values by subtracting gradient x learning rate.
The learning rate ensures stability, so that our loss function doesn't change abruptly.
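The role of the learning rate can be sketched with a single update step computed by hand. The toy data, the helper names, and the two learning rates below are assumptions for illustration; the gradients are the analytic partial derivatives of the SSE for a line a + b * x.

```python
import numpy as np

# Hypothetical toy data for the illustration
x = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 3.0, 5.0])

def sse(a, b):
    return float(np.sum((y - (a + b * x)) ** 2))

def gradient_step(a, b, lr):
    """One gradient descent update: value = value - gradient * learning rate."""
    residuals = y - (a + b * x)
    da = -2.0 * np.sum(residuals)       # d(SSE)/da
    db = -2.0 * np.sum(residuals * x)   # d(SSE)/db
    return a - lr * da, b - lr * db

start_loss = sse(0.0, 0.0)
small_step_loss = sse(*gradient_step(0.0, 0.0, lr=0.01))
large_step_loss = sse(*gradient_step(0.0, 0.0, lr=0.5))
print(start_loss, small_step_loss, large_step_loss)
```

With the small learning rate the loss decreases after one step, while the large one overshoots and the loss grows, which is the abrupt behavior the learning rate is meant to prevent.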
After initializing the parameters with base values, we update them depending on the gradients. This backward pass is what lets the model learn, and tracking the loss shows whether the model was trained properly.
The gradient tape is the component that records the operations used to compute the cost function so that its gradients can be derived automatically. By logging the values at each step, we get a complete picture of the whole procedure.
Let’s visualize the method with an example.
Use of Gradient Tape in TensorFlow
In this section, we skip the theoretical overview. Instead, we walk through practical code and demonstrate how we rely on the gradient tape.
So, let's jump into the code. We are using Google Colab for this example.
First, we set up the library imports and generate a random dataset to work with. We also draw a scatter plot to see how the variables behave.
```python
# Necessary imports
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
import os

# Generating random data
x = np.random.uniform(0.0, 1.0, (200))
y = 0.3 + 0.5 * x + np.random.normal(0.0, 0.3, len(x))

plt.scatter(x, y)
plt.xlabel('Independent')
plt.ylabel('dependent')
plt.show()
```
The next code blocks declare the regression model and the loss function. The important part to focus on is that the regression model outputs the predicted value for the dependent variable y, and the equation goes like this:

y" = a + b * x

Here, x is the independent variable, i.e. the model's input, and y" is the predicted value. Mainly, we want to see whether the error between the given y and the predicted y" is at its minimum.
The loss function is the sum of squared errors (SSE):
```python
# The following equation for the model is ----> y" = a + b * x

# Linear Regression
class regression():
    def __init__(self):
        # a is the intercept and b is the slope, both starting at 0
        self.a = tf.Variable(initial_value=0.0, dtype=tf.float32)
        self.b = tf.Variable(initial_value=0.0, dtype=tf.float32)

    def __call__(self, x):
        x = tf.convert_to_tensor(x, dtype=tf.float32)
        y_est = tf.add(self.a, tf.multiply(self.b, x))
        return y_est

model = regression()

# The loss function / cost function (sum of squared errors)
def loss_func(y_true, y_pred):
    # both y_true and y_pred are tensors
    sse = tf.reduce_sum(tf.square(tf.subtract(y_true, y_pred)))
    return sse
```
Next, we define the training step, including the gradient tape. The update rule for the variables is given in the comment at the top of the code: each variable is reduced by its gradient multiplied by the learning rate.

```python
# Gradient Descent ----> a = ai - d(sse)/da * lr (learning rate)
#                        b = bi - d(sse)/db * lr
def train(model, inputs, outputs, lr):
    # convert outputs to tensor
    y_true = tf.convert_to_tensor(outputs, dtype=tf.float32)

    # Gradient Tape: record the forward pass and the loss computation
    with tf.GradientTape() as g:
        y_pred = model(inputs)
        current_loss = loss_func(y_true, y_pred)

    # slopes / partial derivatives / gradients
    da, db = g.gradient(current_loss, [model.a, model.b])

    # update values
    model.a.assign_sub(da * lr)
    model.b.assign_sub(db * lr)
```
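One way to convince yourself that the tape returns the right slopes is to compare them with the analytic partial derivatives of the SSE. The sketch below does this on three hypothetical data points; the data, the standalone variables `a` and `b`, and the `*_manual` names are assumptions and not part of the training code above.

```python
import tensorflow as tf

# Hypothetical toy data for the check
x = tf.constant([0.0, 1.0, 2.0])
y = tf.constant([1.0, 3.0, 5.0])

# Standalone parameters, starting at 0 like the regression model
a = tf.Variable(0.0)
b = tf.Variable(0.0)

# Record the forward pass and the SSE on the tape
with tf.GradientTape() as g:
    y_pred = a + b * x
    loss = tf.reduce_sum(tf.square(y - y_pred))

da, db = g.gradient(loss, [a, b])

# Analytic partial derivatives of the SSE at a = b = 0
residuals = y - (a + b * x)
da_manual = -2.0 * tf.reduce_sum(residuals)      # d(SSE)/da
db_manual = -2.0 * tf.reduce_sum(residuals * x)  # d(SSE)/db

print(da.numpy(), da_manual.numpy())
print(db.numpy(), db_manual.numpy())
```

Both pairs should agree, which is exactly the quantity that `assign_sub` scales by the learning rate in the training step.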
The next segment fits the regression line. We train for a number of epochs for better learning.
Printing the loss at each epoch lets us examine the saturation point, i.e. the epoch beyond which the loss function no longer decreases noticeably. We then plot the cost values against the epochs to see how the training progressed.
```python
# Fitting
def plotting(x, y):
    plt.scatter(x, y)  # scatter
    plt.plot(x, model(x), 'r--')

model = regression()
a_val = []
b_val = []
cost_val = []

# epochs
epochs = 70
# learning rate
lr = 0.001

for e in range(epochs):
    # store snapshots of the current parameter values
    a_val.append(model.a.numpy())
    b_val.append(model.b.numpy())

    # prediction values and errors
    y_pred = model(x)
    cost_value = loss_func(y, y_pred)
    cost_val.append(cost_value.numpy())

    # Train
    train(model, x, y, lr)
    plotting(x, y)
    plt.show()
    print('Epoch: %d, Loss: %0.2f' % (e, cost_value))
```
The output scatter-plot from the last epoch:
```python
plt.plot(cost_val)
plt.xlabel('epochs')
plt.ylabel('cost values')
```
So, it can be said that the cost values dropped drastically in the epoch range 0-10. When you run all the epochs, the printed values show where the loss saturates.
Technically, the gradient tape computed the gradients at every step, and logging the corresponding values made the visualization more intuitive. You can also preview the whole code in this link.
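As an optional sanity check on where the loss should saturate, the gradient descent result can be compared with the closed-form least-squares line, which minimizes the same SSE. This sketch uses `np.polyfit`, which is not part of the walkthrough above, and a fixed random seed that is an assumption for reproducibility, so the exact numbers will differ from a Colab run.

```python
import numpy as np

# Data generated as in the article; the fixed seed is an assumption
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 200)
y = 0.3 + 0.5 * x + rng.normal(0.0, 0.3, len(x))

def sse(a, b):
    """Sum of squared errors of the line y = a + b * x."""
    return float(np.sum((y - (a + b * x)) ** 2))

# np.polyfit with degree 1 returns [slope, intercept] of the least-squares line
b_ls, a_ls = np.polyfit(x, y, 1)

print('closed-form a = %0.3f, b = %0.3f' % (a_ls, b_ls))
print('SSE at the closed-form fit: %0.2f' % sse(a_ls, b_ls))
print('SSE at the starting point a = b = 0: %0.2f' % sse(0.0, 0.0))
```

The SSE at the closed-form fit is the floor that the cost curve approaches; if the curve is still well above it after the last epoch, more epochs or a different learning rate may be worth trying.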