Introduction to Deep Learning, Part 2 - Linear Regression

Posted on Tue 16 January 2018 in DL



Last time, we figured out how to use tensors with the TensorFlow framework. Now it's high time to do some DEEP LEARNING.

In this short tutorial, we are going to implement the so-called "Hello World!" of the machine learning world: linear regression.

I hope everyone is familiar with it; if not, here is a short reminder. Let's start with the line equation.

\begin{equation} y = kx+b \end{equation}

where \(k\) and \(b\) are coefficients (the slope and the bias), \(x\) is the input data, and \(y\) is the output data.

Often, you will see the vectorized equation:

\begin{equation} y = WX+B \end{equation}

where \(W\) is a tensor with weights and \(B\) is bias coefficient, also a tensor. \(X\) is a tensor with input data.

So, in short, with TensorFlow we want to find the coefficients \(W\) and \(B\).

For that reason, we have to introduce the loss function:

\begin{equation} L = \frac{1}{N}\sum^N_{i=1}(\hat{y}_i-y_i)^2\rightarrow \min \end{equation}

where \(y_i\) is the ground truth and \(\hat{y}_i\) is the predicted value for the \(i\)-th of \(N\) samples. Basically, we are computing the famous Mean Squared Error (MSE).
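To make the loss concrete, here is a minimal plain-Python sketch of the MSE on three hand-made points. The helper name mse and the toy data are assumptions made for illustration only; they are not part of the TensorFlow code in this post.

```python
def mse(y_true, y_pred):
    """Mean of the squared differences between predictions and ground truth."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data generated by the line y = 2x + 1 (no noise).
x = [1.0, 2.0, 3.0]
y = [3.0, 5.0, 7.0]

# Predictions of two candidate lines.
good = [2.0 * xi + 1.0 for xi in x]  # k = 2, b = 1: the true line
bad = [1.0 * xi + 0.0 for xi in x]   # k = 1, b = 0: a poor guess

print(mse(y, good))  # the true line has zero error on noise-free data
print(mse(y, bad))   # squared errors 4, 9, 16 average to 29 / 3
```

The better the line fits, the smaller the loss, which is why minimizing it leads us to good values of \(k\) and \(b\).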

If this is new to you, I recommend the blog post In Depth: Linear Regression. But I think you've seen it many times.

Enough talk and math, let's write some code!

I would like to post the whole code first, and then we can go through it line by line.

import numpy as np
import tensorflow as tf

n_samples, batch_size, num_steps = 1000, 100, 20000

# train data
x_data = np.random.uniform(1, 10, (n_samples, 1))
y_data = 2 * x_data + 1 + np.random.normal(0, 2, (n_samples, 1))

# placeholders
x = tf.placeholder(tf.float32, shape=(batch_size, 1))
y = tf.placeholder(tf.float32, shape=(batch_size, 1))

# trainable variables, grouped under a variable scope
with tf.variable_scope('linear-regression'):
    k = tf.Variable(tf.random_normal((1, 1), stddev=0.001), name='slope')
    b = tf.Variable(tf.zeros(1,), name='bias')

# y_hat
y_pred = tf.matmul(x, k) + b

# loss function
loss = tf.reduce_mean(tf.square(y_pred - y))

# optimizer
optimizer = tf.train.GradientDescentOptimizer(1.0e-2).minimize(loss)

display_step = 200
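with tf.Session() as session:
    # variables must be initialized before the graph can be run
    session.run(tf.global_variables_initializer())
    for i in range(num_steps):
        # sample a random mini-batch (with replacement)
        indices = np.random.choice(n_samples, batch_size)
        x_batch = x_data[indices]
        y_batch = y_data[indices]
        # run one optimization step and fetch the current loss and parameters
        _, loss_val, k_val, b_val = session.run([optimizer, loss, k, b],
                                                feed_dict={x: x_batch, y: y_batch})
        if (i + 1) % display_step == 0:
            print(f'Epoch {i+1}: loss = {loss_val:.3f}, ' +
                  f'k = {np.sum(k_val).item():.3f}, b = {np.sum(b_val).item():.3f}')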

OK, first we import the libraries we are going to use. Second, we define the number of training samples, the batch size, and the number of training steps (labelled "Epoch" in the printout, although each one processes a single mini-batch).

Next, we need a way to feed our data into the graph, and luckily TensorFlow has placeholders: they are filled with the actual batches at run time through feed_dict.

The next few lines, which create the trainable variables k and b inside a variable scope, are quite advanced, so feel free to skip them; however, if you are as curious as me, just check this link

After that, we define the prediction using the tf.matmul function. The same goes for the loss and the optimizer.

Finally, we can train our first TensorFlow model by starting a tf.Session.
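If you wonder what the optimizer is doing with the loss, the following plain-Python sketch performs roughly the same mini-batch gradient descent by hand, using the analytic gradients of the MSE with respect to the slope and the bias. It is an illustration under assumptions, not TensorFlow's actual implementation: the names slope, bias, and lr are made up for this example, and the number of steps is reduced to 5000 to keep the pure-Python loop fast.

```python
import random

random.seed(0)

# Synthetic data in the spirit of the tutorial: y = 2x + 1 plus Gaussian noise.
n_samples, batch_size, num_steps, lr = 1000, 100, 5000, 1.0e-2
xs = [random.uniform(1, 10) for _ in range(n_samples)]
ys = [2 * x + 1 + random.gauss(0, 2) for x in xs]

slope, bias = 0.0, 0.0
for step in range(num_steps):
    # Sample a random mini-batch of indices (with replacement).
    batch = random.choices(range(n_samples), k=batch_size)

    # Gradients of the batch MSE:
    #   dL/dslope = (2 / B) * sum(err * x),  dL/dbias = (2 / B) * sum(err)
    grad_slope = grad_bias = 0.0
    for i in batch:
        err = (slope * xs[i] + bias) - ys[i]
        grad_slope += 2 * err * xs[i] / batch_size
        grad_bias += 2 * err / batch_size

    # Gradient descent update: move against the gradient.
    slope -= lr * grad_slope
    bias -= lr * grad_bias

print(f"slope = {slope:.3f}, bias = {bias:.3f}")
```

The estimates should land near the generating values of 2 and 1, with the bias being the noisier of the two.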

And the output should be something like this:

Epoch 200: loss = 3.592, k = 2.087, b = 0.643
Epoch 400: loss = 3.647, k = 2.039, b = 0.796
Epoch 600: loss = 4.458, k = 2.031, b = 0.869
Epoch 800: loss = 4.031, k = 2.036, b = 0.924
Epoch 19600: loss = 3.476, k = 2.013, b = 0.941
Epoch 19800: loss = 4.460, k = 2.009, b = 0.941
Epoch 20000: loss = 4.013, k = 1.995, b = 0.927

Of course, the exact values are hard to recover, since gradient descent is an iterative approximation method and the training data is noisy. However, the final results are pretty close to the ones we used to generate the data (k = 2, b = 1).
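You may also notice that the loss hovers around 4 instead of approaching 0. A likely explanation is the noise floor: the data was generated with Gaussian noise of standard deviation 2, so even the true line has an expected squared error of 2² = 4. The sketch below checks this in plain Python; the sample count n and the variable names are assumptions chosen for illustration.

```python
import random

random.seed(0)

# Noisy data generated the same way as in the tutorial: y = 2x + 1 + N(0, 2).
n = 10000
xs = [random.uniform(1, 10) for _ in range(n)]
ys = [2 * x + 1 + random.gauss(0, 2) for x in xs]

# MSE of the *true* line on the noisy data: this is the best we can hope for.
noise_floor = sum((2 * x + 1 - y) ** 2 for x, y in zip(xs, ys)) / n
print(f"MSE of the true line: {noise_floor:.3f}")
```

So a loss of about 4 does not mean the model is bad; it means the model has learned everything except the noise.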

That's it. You did a great job today; share it with your colleagues!