In our previous computer vision blog, we talked about how big data can be managed in the context of computer vision. As part of that discussion, we mentioned that selecting a smaller sample from a much larger dataset can be a practical strategy for handling big data in computer vision applications. There are many ways to select such a sample, and these sampling strategies are not unique to computer vision but are rooted in statistics. This blog will look at more traditional methods based on random sampling, as well as a machine learning option, and explain why a good sample is so important.
For this project, we are not interested in individual images, but in groups of images that make up entire sections of a road. The reason for using sections over individual video frames is that we want to have the ability to understand the visibility of a sign as a vehicle moves towards it. An ideal sample for this application will contain variety consistent with the full dataset. Variety in this instance relates to different weather conditions, lighting conditions, surroundings (urban or rural), turns, and intersections among many others. Including this type of variety gives the computer a basis for identifying signs in a range of scenarios. The types of signs present in the sample will be very important as well. Ideally, we want each of the signs of interest to be well represented in the sample. As you can see, there are plenty of things to consider when selecting a sampling strategy.
The simplest method is to use a computer to randomly select several road sections. This method is fast to implement as there’s little complexity involved. A drawback, however, is that we may randomly select sections of road where certain signs cannot be found, or where the road looks very similar to other sections. Within random sampling there are several variants, including selecting sections completely at random, selecting at regular intervals (e.g. every fifth section), and selecting a random section of road from each state highway.
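The three variants above can be sketched in a few lines of Python using the standard library’s `random` module. The section IDs and highway groupings here are purely illustrative, not from the real dataset:

```python
import random

# Hypothetical road-section IDs grouped by state highway
# (names and structure are illustrative only).
sections_by_highway = {
    "SH1": ["SH1-001", "SH1-002", "SH1-003", "SH1-004"],
    "SH2": ["SH2-001", "SH2-002", "SH2-003"],
    "SH3": ["SH3-001", "SH3-002"],
}
all_sections = [s for secs in sections_by_highway.values() for s in secs]

random.seed(42)  # fixed seed so the sketch is reproducible

# 1. Simple random sampling: pick k sections completely at random.
simple_sample = random.sample(all_sections, k=3)

# 2. Systematic sampling: every fifth section from a random start.
start = random.randrange(5)
systematic_sample = all_sections[start::5]

# 3. Stratified sampling: one random section from each state highway.
stratified_sample = [random.choice(secs) for secs in sections_by_highway.values()]
```

Stratified sampling is the variant that guarantees every highway contributes at least one section, at the cost of a little extra bookkeeping.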
We began by sampling completely at random as this technique is easy to scale, meaning we can add more road sections as needed. As each new sample is added to the dataset, we can track any associated improvement in performance and decide whether we need more data. Approaching it this way means we can minimise both the amount of data and the computation time needed for the model we produce.
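The add-and-evaluate loop could look something like the sketch below. The `evaluate_model` function is a hypothetical stand-in for training on the current sample and measuring accuracy; here it just simulates diminishing returns as the sample grows:

```python
import random

random.seed(0)

def evaluate_model(sample):
    # Stand-in for training on the sample and measuring performance;
    # this toy curve simply improves less as the sample grows.
    return 1.0 - 1.0 / (1 + len(sample))

# Hypothetical pool of road-section IDs.
pool = [f"section-{i:03d}" for i in range(100)]
random.shuffle(pool)

sample, scores = [], []
batch_size, min_gain = 5, 0.01
while pool:
    # Add a batch of randomly ordered sections to the sample.
    sample.extend(pool[:batch_size])
    del pool[:batch_size]
    scores.append(evaluate_model(sample))
    # Stop adding data once the improvement falls below a threshold.
    if len(scores) >= 2 and scores[-1] - scores[-2] < min_gain:
        break
```

The stopping threshold (`min_gain`) is the knob that trades model performance against data volume and training time.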
A more complex technique is to use machine learning to better understand the contents of each frame. This is done by taking each frame and using an algorithm to group together visually similar images, a process called clustering. Clustering methods are unsupervised, which means the computer must decide, without supervision, what these groups are. Unsupervised methods are often used for problems where there is no right or wrong answer.
When testing this technique, we discovered that the computer may automatically group together left or right-hand turns, images that were over or under-exposed, roads surrounded by trees, and roads in urban or rural settings. From these groups or clusters, it's possible to determine which sections of road are visually similar and generate a dataset that contains at least one example from each cluster, giving the sample plenty of variety. The key drawback of this method is that the complexity means it requires time and plenty of computing power to run. It is for this reason that we are using random sampling in the first instance. Using a more complex technique such as this can be introduced later to improve performance if necessary.
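A minimal sketch of the cluster-then-sample idea is shown below, assuming scikit-learn is available. The per-frame feature vectors are random stand-ins; in practice they would come from the images themselves (e.g. brightness and colour statistics) or a pretrained network:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical per-frame feature vectors (stand-ins for real image features).
frames = rng.normal(size=(60, 8))
frame_ids = [f"frame-{i:03d}" for i in range(len(frames))]

# Group visually similar frames into clusters (unsupervised).
n_clusters = 5
labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(frames)

# Build a varied sample by taking one frame from each cluster.
sample = []
for c in range(n_clusters):
    members = [fid for fid, lab in zip(frame_ids, labels) if lab == c]
    sample.append(members[0])
```

Because every cluster contributes at least one frame, the resulting sample covers each visually distinct group, which is exactly the variety the full dataset exhibits.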
Hopefully you now have an appreciation for why data sampling is such an important part of building a machine learning solution, and understand a few ways that a sample can be selected. In the next blog, we will cover data augmentation, a process by which we modify the data before we apply any models. When it comes to images, this often means changing the image itself. We will talk about the role that data augmentation plays in computer vision and why it’s an important part of the process.
Blog written by Joe Duncan, Data Science Researcher