Project 2: Traffic Sign Classification

Interpreting traffic signs is an essential part of driving for every human driver, and an autonomous vehicle inevitably has to understand traffic signs just like humans do in order to react appropriately. The second project of the Udacity Self Driving Car Engineer Nanodegree focuses on this interpretation aspect: the classification of traffic signs.

To do this, a convolutional neural network had to be built and trained to decide, for each provided image, which of 43 different German traffic sign classes fits the image best.

A neural network is a collection of connected neurons. There can be, and often are, several interconnected layers which learn from training data to reach, or get as close as possible to, the correct solution provided with the input. For example, if a neural network is shown an image of a car together with the label “car”, then over many training iterations the neurons learn to respond to certain features in the input and collectively generate the label “car”. Its success is evaluated by a scoring function, and the neurons’ sensitivities are adjusted to optimize that score.

A convolutional layer of a neural network exploits the fact that in images the location of a feature does not change its meaning: a car is a car no matter where it appears in the image. To do this, a convolutional layer looks at only a small area of the input at a time. Applied directly to the input image, this picks up features like edges and lines of different orientations or spots of a certain colour. Convolutional layers deeper in the network combine these small subunits to make out patterns. This builds up until, in the end, concepts like an entire car, animals or, in this case, traffic signs are recognized. An in-depth and highly commendable resource on neural networks is the Stanford course on convolutional neural networks (http://cs231n.github.io/convolutional-networks/).
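To make the idea of a small receptive field concrete, here is a minimal NumPy sketch of a single convolution. It is purely illustrative and not part of the project code; in a real network the kernel values are learned rather than hand-crafted.

```python
# Illustrative sketch: one convolutional filter sliding over an image.
import numpy as np

def convolve2d(image, kernel):
    """Slide a small kernel over the image and compute one response per position."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            patch = image[y:y + kh, x:x + kw]   # the small area looked at
            out[y, x] = np.sum(patch * kernel)  # weighted sum = feature response
    return out

# A hand-crafted vertical-edge kernel; a trained network learns kernels like this.
vertical_edge = np.array([[1, 0, -1],
                          [1, 0, -1],
                          [1, 0, -1]])

image = np.random.rand(32, 32)          # stand-in for a 32x32 grayscale sign image
response = convolve2d(image, vertical_edge)
print(response.shape)                   # (30, 30) feature map
```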

Back to the problem at hand. Udacity provided a training dataset of about 35,000 labelled images of traffic signs, as well as validation and test sets. The dataset is split this way to avoid evaluating the model on data it has already seen. The validation set is used to evaluate the model while iterating on it. Over many iterations the characteristics of the validation set bleed into the model, so another independent dataset is required to confirm the effectiveness of the final model. That is the test dataset.

All the dataset images have a resolution of 32 by 32 pixels and 3 (RGB) colour channels. The labels are numbers which correspond to the traffic sign classes.

[Image: Traffic sign from the Udacity dataset]

The plan for this problem was to first build a simple neural network and train it on the input images. The first attempt used the simple LeNet-5 architecture. It is a small neural network which works well and does not take long to train.

[Image: LeNet-5 architecture]
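Purely as an illustration of this kind of architecture, a LeNet-5-style network for 32x32x3 inputs and 43 classes could be sketched with the Keras API roughly as follows. The layer sizes follow the classic LeNet-5 layout, not necessarily the exact hyperparameters or framework calls of my submission.

```python
# Minimal sketch of a LeNet-5-style classifier for 32x32 RGB sign images.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_lenet(num_classes=43):
    model = models.Sequential([
        layers.Conv2D(6, (5, 5), activation='relu', input_shape=(32, 32, 3)),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(16, (5, 5), activation='relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(120, activation='relu'),
        layers.Dense(84, activation='relu'),
        layers.Dense(num_classes, activation='softmax'),
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model

model = build_lenet()
model.summary()
```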

This led to decent accuracy straight away, predicting the correct traffic sign about 89% of the time on the validation set. To improve this further, the training data was normalized in two ways. The first is histogram normalization. In short, histogram normalization aims to fully utilize the range of possible values in a colour channel. For a standard RGB image each pixel value lies between 0 and 255. Images often do not make full use of this range, e.g. mainly using 50-150; histogram normalization then stretches the existing values across the full range, improving the visible contrast.
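As an illustration, here is a small OpenCV sketch of histogram equalization applied to the luminance channel; the exact method and colour space used in my notebook may differ.

```python
# Sketch: stretch the brightness histogram of an RGB image to improve contrast.
import cv2
import numpy as np

def equalize_histogram(rgb_image):
    yuv = cv2.cvtColor(rgb_image, cv2.COLOR_RGB2YUV)
    yuv[:, :, 0] = cv2.equalizeHist(yuv[:, :, 0])   # equalize only the luminance channel
    return cv2.cvtColor(yuv, cv2.COLOR_YUV2RGB)

# Example on a dummy image that only uses a narrow value range (roughly 50-150).
dummy = np.random.randint(50, 150, size=(32, 32, 3), dtype=np.uint8)
equalized = equalize_histogram(dummy)
print(dummy.min(), dummy.max(), '->', equalized.min(), equalized.max())
```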

The second normalization is specifically for the maths of training the neural network. If the nodes of the neural network operate on values centred close to 0, training converges more quickly (http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf).
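A minimal sketch of this kind of value normalization, assuming 8-bit images with values in 0-255:

```python
# Sketch: map pixel values from [0, 255] to roughly [-1, 1] before training.
import numpy as np

def normalize(images):
    return (images.astype(np.float32) - 128.0) / 128.0

batch = np.random.randint(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)
print(normalize(batch).mean())   # close to 0 for typical images
```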

Those two measures improved the accuracy to 95%, above the 93% required for the project to be accepted. To take it one step further I looked into data augmentation. Data augmentation extends the available training dataset by adding images generated from existing ones, but with small alterations. Generally, augmentations are chosen that resemble realistic variations. Traffic sign images could be rotated or tilted, since signs can stand on a slope or be photographed from different angles. Different lighting conditions could be simulated by randomly changing the brightness of the image, or fake shadows could be drawn onto it. These variations help the model generalize better, as it learns to cope with varying conditions.
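A rough OpenCV sketch of such an augmentation is below; the rotation range is an illustrative assumption, not the exact value from my notebook.

```python
# Sketch: augment a training image with a small random rotation.
import cv2
import numpy as np

def random_rotate(image, max_angle=15.0):
    """Rotate the image by a random angle within +/- max_angle degrees."""
    h, w = image.shape[:2]
    angle = np.random.uniform(-max_angle, max_angle)
    matrix = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    return cv2.warpAffine(image, matrix, (w, h), borderMode=cv2.BORDER_REPLICATE)

image = np.random.randint(0, 256, size=(32, 32, 3), dtype=np.uint8)
augmented = random_rotate(image)
```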

[Image: Rotated and normalized traffic sign image]

In this scenario I slightly rotated images by random amounts. This was combined with an improved network architecture containing an inception module, which applies several convolutional elements with different receptive field sizes in parallel. Instead of choosing the receptive field size manually, the network learns on its own which ones work best.
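A minimal sketch of an inception-style module, assuming the Keras functional API; the filter counts are illustrative and not taken from my submission.

```python
# Sketch: parallel convolutions with different receptive field sizes, concatenated.
import tensorflow as tf
from tensorflow.keras import layers

def inception_module(x, filters=16):
    conv1x1 = layers.Conv2D(filters, (1, 1), padding='same', activation='relu')(x)
    conv3x3 = layers.Conv2D(filters, (3, 3), padding='same', activation='relu')(x)
    conv5x5 = layers.Conv2D(filters, (5, 5), padding='same', activation='relu')(x)
    pooled = layers.MaxPooling2D((3, 3), strides=(1, 1), padding='same')(x)
    pooled = layers.Conv2D(filters, (1, 1), padding='same', activation='relu')(pooled)
    return layers.Concatenate()([conv1x1, conv3x3, conv5x5, pooled])

inputs = tf.keras.Input(shape=(32, 32, 3))
outputs = inception_module(inputs)
print(outputs.shape)   # (None, 32, 32, 64): four branches of 16 filters each
```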

All these efforts combined resulted in a total accuracy of 96% against the test set, and the model was submitted in this form. This could likely be improved further through more data augmentation (e.g. brightness, tilting and shadows) and more complex neural networks. Methods I learned in later projects, like saving the model whenever it improves during training and terminating early when no more progress is being made, shorten iterations and help avoid overfitting the model.
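For illustration, checkpointing and early stopping could look like the following Keras sketch; the variable names in the commented-out training call are hypothetical.

```python
# Sketch: save the best model during training and stop when progress stalls.
import tensorflow as tf

callbacks = [
    # Save the model whenever validation accuracy improves.
    tf.keras.callbacks.ModelCheckpoint('best_model.h5',
                                       monitor='val_accuracy',
                                       save_best_only=True),
    # Stop training when validation accuracy has not improved for 5 epochs.
    tf.keras.callbacks.EarlyStopping(monitor='val_accuracy',
                                     patience=5,
                                     restore_best_weights=True),
]
# model.fit(X_train, y_train, validation_data=(X_valid, y_valid),
#           epochs=100, callbacks=callbacks)
```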

As usual the code is hosted on GitHub and you can download and try out my Jupyter notebook.


Project 1: Basic Lane Line Detection

Here it is, my first project-centric blog post. It took me much longer than I wanted to finally get this post started, mostly because I spent much more time than necessary on the project itself. That is not too bad, because the learning process has been a lot of fun. The results can be found in my GitHub repository. But without further ado, let’s get to the first project.

The first project’s aim is to detect and mark up highway lane lines in images as well as in video footage that could come from a simple dashcam. This project focuses on basic computer vision techniques to find lines in images. The main ones here are Canny edge detection, which looks for areas of high contrast in an image and marks them up, and the Hough transform, applied to the edge-detected version of the image to find lines.

The top left image shows the input image. The image gets converted to grayscale so there is only one colour channel to work with. To smooth out some noise, the image is blurred before the Canny edge detector is applied. The output of the Canny edge detection is depicted in the top right. This already looks much simpler for a computer to work on. There is far less going on in the image and the lane lines are clearly visible. Note that the image is now binary, either black or white, nothing in between. However, there are still plenty of contours from trees, the landscape and the horizon. To get rid of them, a trapezoid region of interest is applied and the rest discarded (bottom).
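A sketch of these preprocessing steps with OpenCV is below; the Canny thresholds and the trapezoid corners are illustrative assumptions, not the exact values from my notebook.

```python
# Sketch: grayscale, Gaussian blur, Canny edges and a trapezoid region of interest.
import cv2
import numpy as np

def preprocess(rgb_image):
    gray = cv2.cvtColor(rgb_image, cv2.COLOR_RGB2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    edges = cv2.Canny(blurred, 50, 150)

    # Keep only a trapezoid covering the road ahead, discard everything else.
    h, w = edges.shape
    trapezoid = np.array([[(int(0.1 * w), h), (int(0.45 * w), int(0.6 * h)),
                           (int(0.55 * w), int(0.6 * h)), (int(0.9 * w), h)]],
                         dtype=np.int32)
    mask = np.zeros_like(edges)
    cv2.fillPoly(mask, trapezoid, 255)
    return cv2.bitwise_and(edges, mask)
```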

Then this masked edge image gets fed into a Hough transform. Simply put, the Hough transform aims to find points that lie on common lines. For this, parameters have to be tuned specifying how many points need to lie on a line, how large the gaps between them are allowed to be, and how long a line has to be to count as a line. The output of such a Hough transform can be seen below. It does not show the lines of the scene displayed above and does not have a region of interest applied. Furthermore, I have coloured the lines based on their slope.
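The probabilistic Hough transform step could be sketched like this; the vote threshold, minimum line length and maximum gap are exactly the parameters that needed tuning, and the values shown are only illustrative.

```python
# Sketch: probabilistic Hough transform on a binary edge image.
import cv2
import numpy as np

def detect_lines(edge_image):
    return cv2.HoughLinesP(edge_image,
                           rho=2,                  # distance resolution in pixels
                           theta=np.pi / 180,      # angular resolution in radians
                           threshold=30,           # minimum number of votes
                           minLineLength=20,       # shortest accepted line
                           maxLineGap=100)         # largest gap bridged
```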

[Image: Line detection through Hough transform]

For this output the Hough transform was configured to find long continuous stretches of lane lines straight away. This had the advantage that gaps between the lane line segments were already bridged, but it later turned out to be problematic in some situations. More about that later.

The next step was to extrapolate the lines and only draw them up to a certain point, because with the parameters I had chosen the lines could reach beyond the horizon and cross over after intersecting. Because of this, and because lines become inaccurate to detect beyond a certain distance as they get very small, I did not draw lane lines above the bottom 40% of the image. To get continuous lane lines from the bottom of the image up to the 40% line, I sorted the detected lines, with the region of interest applied, by their slope into left lines and right lines.

The image above shows that at least two lines are drawn for each lane line: one at its left and one at its right edge. To merge them into a single lane line per side, the intersections with both the bottom of the image and the 40% horizontal line were calculated and averaged separately for the left and right lines. The outcome was one lane line per side, which is more robust due to, quite literally, fewer moving parts.
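A rough sketch of this merging step is below. It assumes the Hough segments have already been flattened into (x1, y1, x2, y2) tuples; the slope cut-off for near-horizontal segments is an illustrative assumption.

```python
# Sketch: split segments by slope into left/right and average their intersections
# with the bottom of the image and the line at 60% of the image height.
import numpy as np

def merge_lane_lines(segments, image_height):
    y_bottom = image_height
    y_top = int(0.6 * image_height)   # do not draw above the bottom 40%
    left, right = [], []

    for x1, y1, x2, y2 in segments:
        if x2 == x1:
            continue                              # skip vertical segments
        slope = (y2 - y1) / (x2 - x1)
        if abs(slope) < 0.3:
            continue                              # skip near-horizontal segments
        intercept = y1 - slope * x1
        # x positions where this segment's line crosses the two horizontals
        xs = ((y_bottom - intercept) / slope, (y_top - intercept) / slope)
        (left if slope < 0 else right).append(xs)  # negative slope = left lane line

    lanes = []
    for side in (left, right):
        if side:
            x_bottom, x_top = np.mean(side, axis=0)
            lanes.append(((int(x_bottom), y_bottom), (int(x_top), y_top)))
    return lanes   # one (bottom point, top point) pair per side
```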

[Image: Averaged and extrapolated lane lines]

Applying the built pipeline to movie clips instead of individual images showed that the approach of configuring the Hough transform to detect long lines, which requires relatively strict conditions to be met, resulted in stretches of footage without any detected lane lines. To overcome this, the thresholds of the Hough line detector were relaxed to allow more and shorter lines to be fed into the line merging step. This stabilized the lane lines and helped find them most of the time, as can be seen in the video below.

The video shows my final pipeline processing the optional challenge video. It is particularly challenging because of the different shades of road surface, the shadows cast by the trees and the curvature of the motorway. My original pipeline did not do as well, and to reach this point I had to filter the original input images for white and yellow colour ranges to improve the detection of lanes. As you can see, this still results in jittery lines and could be improved further.
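The colour filtering could be sketched with OpenCV as follows; the HSV ranges are illustrative assumptions rather than the values I actually tuned.

```python
# Sketch: keep only white and yellow pixels before edge detection.
import cv2
import numpy as np

def filter_lane_colours(rgb_image):
    hsv = cv2.cvtColor(rgb_image, cv2.COLOR_RGB2HSV)
    white = cv2.inRange(hsv, np.array([0, 0, 200]), np.array([180, 40, 255]))
    yellow = cv2.inRange(hsv, np.array([15, 80, 100]), np.array([35, 255, 255]))
    mask = cv2.bitwise_or(white, yellow)
    return cv2.bitwise_and(rgb_image, rgb_image, mask=mask)
```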

If you are interested in exploring this in depth, please have a look at my GitHub repo! It contains the report I submitted to Udacity, more videos and images, and the source code as an IPython notebook.

Welcome, driveting world of self driving cars!

Hello there, curious reader. I am a Software Developer living in Cambridge (UK) who got to know about the self driving car engineer course at Udacity through a news app a few months ago.

The idea of engineering the mind to make a car drive itself sounded very intriguing and I quickly found myself engaged and in the middle of an introduction to machine learning course. My programming background allowed me to progress quickly, enjoying the machine learning techniques and learning Python, a refreshing change from Java.

Researching the domain I got to know about deep learning and (deep) neural networks, a trendy machine learning method that supposedly resembles a network of neurons applied to machine learning problems. It is applied to e.g. speech recognition, analysis of image contents and, apparently, teaching cars how to drive. Needless to say that I, as a biologist and computer scientist, was hooked and applied to the self driving car engineer course as soon as I finished the machine learning intro course.

And recently I got accepted into Udacity’s self driving car engineer nanodegree course and decided to write about my experiences throughout the course. I think especially the projects will lend themselves to blogging about. The course finally starts the day after tomorrow and I am very excited to get my hands on the first project:
Detecting highway lane lines from a video stream.