In my role as Data Science Researcher at Abley, I work closely with my team, and together we have identified that a computer vision solution could be used to detect traffic signs, adding value and saving time in the road safety work we do here (see previous blog). The next step is to address the time-consuming process of data labelling. This blog will explain what data labelling is, why it’s an important step in preparing data, and the challenges it involves for this project.
When building a computer vision model to locate traffic signs from imagery, it’s helpful to first understand what our desired outcome will look like. Is our aim to identify only the part of the image that contains the sign? How many different varieties of signs do we want to identify? These are things we need to keep in mind as we get the data ready to train our model.
For the purpose of this project, the aim is to identify multiple traffic signs within the same image. In the field of computer vision, this is called object detection. Object detection is a computing task that identifies the specific area within a given image that contains an object of interest. These objects are typically defined using a set of descriptive labels. For our application, the labelling process involves taking each individual video frame, drawing a box around each traffic sign in the frame, and assigning each of those boxes a label (e.g. a give way sign). Labelling a single image may take 20 seconds, meaning it’s possible to label 1,440 images in a regular working day. This may sound like a lot of images, but most applications require at least 10,000 images. That’s nearly seven full working days!
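To make this concrete, here is a minimal sketch in Python of what one labelled frame might look like, along with the throughput arithmetic above. The field names, file name, and pixel values are illustrative assumptions, not the project’s actual annotation schema.

```python
# Illustrative annotation for a single video frame: each traffic sign gets
# a bounding box (top-left corner, width, height in pixels) and a label.
frame_annotation = {
    "image": "frame_000123.jpg",
    "boxes": [
        {"x": 412, "y": 156, "width": 48, "height": 52, "label": "give_way"},
        {"x": 890, "y": 201, "width": 40, "height": 44, "label": "speed_limit"},
    ],
}

# At roughly 20 seconds per image, an 8-hour working day covers:
images_per_day = 8 * 60 * 60 // 20
print(images_per_day)  # 1440 images

# So a 10,000-image dataset takes about:
print(round(10_000 / images_per_day, 1))  # 6.9 working days
```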
For a computer to recognise a sign, it needs to be shown many examples of that sign, not dissimilar to how a child learns to perceive the world. If a child has seen 40 dogs but only 2 cats, it could easily mistake a cat for a dog. Both animals have fur, four legs, and a tail, but from the child’s past experience it’s more likely to be a dog than a cat. Labelling is a way of creating flashcards for a computer to learn from.
There are hundreds of signs on our roads. If each label corresponds to one of these signs, and each label needs 1,000 examples, then we may need to label nearly 1,000,000 images - at 20 seconds each that’s 695 working days! Of course, that’s assuming there are 1,000 examples of that sign on our state highways in the first place.
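The back-of-envelope arithmetic behind that figure looks like this, taking 1,000,000 images as the round number from the text:

```python
import math

seconds_per_image = 20
seconds_per_day = 8 * 60 * 60      # one 8-hour working day
total_images = 1_000_000           # the round figure used above

# Round up: a partial day of labelling still costs a day on the calendar.
working_days = math.ceil(total_images * seconds_per_image / seconds_per_day)
print(working_days)  # 695
```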
There are a couple of ways to tackle this issue. First, we can eliminate any types of signs that aren’t relevant to our project. Second, we can group similar signs under a single label. Rather than having a label for each speed limit (e.g. 30km/h, 50km/h etc.), they can all be labelled as ‘Speed Limit’, reducing the number of labels from 11 to 1. That’s 10,000 fewer examples and saves us 7 days of work! There are drawbacks to this approach. The labels can become too generic: if you’re interested in knowing what the speed limit is, a generic ‘Speed Limit’ label won’t be useful. Another issue arises when trying to identify a sign that occurs infrequently. Defining labels is often an iterative process; the labels are refined as we learn more about the data and how common or rare each label is.
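One simple way to implement this grouping is a lookup table that maps fine-grained labels to coarser ones. This is a sketch with made-up sign names; the real label set would come from the project’s sign catalogue.

```python
# Illustrative mapping from fine-grained sign labels to grouped labels.
label_groups = {
    "speed_limit_30": "speed_limit",
    "speed_limit_50": "speed_limit",
    "speed_limit_80": "speed_limit",
    "speed_limit_100": "speed_limit",
    "give_way": "give_way",
}

def coarsen(label: str) -> str:
    """Map a fine-grained label to its group; unknown labels pass through."""
    return label_groups.get(label, label)

print(coarsen("speed_limit_50"))  # speed_limit
print(coarsen("give_way"))        # give_way
```

Because unknown labels pass through unchanged, the mapping can be refined iteratively, exactly as described above, without breaking labels that haven’t been grouped yet.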
Though the process of data labelling may seem time-consuming, hopefully you now have a better understanding of how critical and involved it is. In this blog, I explained that a computer vision solution needs thousands of images to train a computer to recognise signs. But how do we manage that much information? And if more examples are better, how do we know when we have enough? Next time I’ll answer these questions and discuss the realities of working with big data.
This is the third in a series of blogs about Joe's research and the progress we are making to use computer vision and machine learning to add value to client solutions.
Blog written by Joe Duncan, Data Science Researcher