If you’ve been following our "Computer Vision" blog series, you will have gained an appreciation for the complexity and effort involved in preparing data before it can be modelled. Previously we discussed data sampling; this time we will cover another important step in the pipeline: data augmentation.
In computer vision applications it’s important for the data to be diverse because random variation occurs everywhere and is an unavoidable part of working with the real world. For example, an image may be over- or underexposed, the object you are trying to identify might be partially obscured by a passing car, or you might be approaching it from a different angle. The model we create needs to be capable of handling random events like these. Ideally, we would go out and collect data for every possible scenario, but that is not realistic: it would take a long time to photograph the same street under all possible weather, seasonal, and lighting conditions. Data augmentation is a way to introduce that variety and diversity into the data artificially. Augmentation techniques tend to fall into two categories: basic image manipulation and machine learning.
Basic image manipulation
These methods manipulate images in ways that many of us are familiar with through photo editing software. We could change the geometric properties of the image by resizing, flipping, rotating, warping, or cropping it. We could change the saturation, brightness or contrast, blur, sharpen, or convert the image to grayscale. These techniques are not the only ones available but illustrate the possibilities. When using these techniques, we need to ensure that the images are not being manipulated in a way that causes the image to lose its defining features. These features need to be retained as they underpin how a model will interpret the image.
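The manipulations above can be sketched with a few lines of NumPy. This is a minimal illustration, not a production pipeline: real projects typically use a library such as Pillow, OpenCV, or an augmentation framework, and the function names here are our own.

```python
import numpy as np

def flip_horizontal(img):
    """Mirror the image left-to-right (a geometric transform)."""
    return img[:, ::-1]

def adjust_brightness(img, factor):
    """Scale pixel intensities, clipping to the valid 0-255 range."""
    return np.clip(img.astype(np.float32) * factor, 0, 255).astype(np.uint8)

def to_grayscale(img):
    """Collapse RGB channels using the standard luminance weights."""
    weights = np.array([0.299, 0.587, 0.114])
    return (img @ weights).astype(np.uint8)

# A tiny 2x2 RGB "image" stands in for a real photo.
img = np.array([[[255, 0, 0], [0, 255, 0]],
                [[0, 0, 255], [255, 255, 255]]], dtype=np.uint8)

flipped = flip_horizontal(img)      # red pixel moves to the right
brighter = adjust_brightness(img, 1.5)
gray = to_grayscale(img)            # shape drops from (2, 2, 3) to (2, 2)
```

Each transform produces a new training image from an existing one, which is exactly how these augmentations multiply a dataset without further data collection.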
In the context of extracting traffic signs, it’s not uncommon for something to be partially blocking a sign, such as a passing car. These obstructions can make it difficult for the model to recognise a sign. A way that we can combat this issue is to use data augmentation to create obstructions artificially so that the model is better prepared for obscured signage. The augmented images may look like the images below, where random portions of the images have been blocked out. We use these augmented images in combination with the original ones to build the model.
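Blocking out random portions of an image, as described above, is often called random erasing or cutout. Here is a minimal sketch of the idea in NumPy; the function name and parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_erase(img, patch_frac=0.3, fill=0):
    """Black out one randomly placed rectangular patch of the image.

    patch_frac sets the patch size relative to each image dimension;
    fill is the pixel value written into the erased region.
    """
    out = img.copy()
    h, w = img.shape[:2]
    ph = max(1, int(h * patch_frac))
    pw = max(1, int(w * patch_frac))
    top = rng.integers(0, h - ph + 1)
    left = rng.integers(0, w - pw + 1)
    out[top:top + ph, left:left + pw] = fill
    return out

# Erase a patch from a synthetic all-white "sign" image.
img = np.full((100, 100, 3), 255, dtype=np.uint8)
augmented = random_erase(img)
```

Training on a mix of original and erased images encourages the model to rely on many parts of a sign rather than any single region, so a real-world obstruction is less likely to break recognition.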
More complex techniques use machine learning and artificial intelligence to create entirely new images based on the original data. A great example of this can be found here. Each time you refresh the webpage it will show you a new image of a cat. Each image you look at was produced by a computer, based on photos of real cats. If you want to learn more about this method, called a generative adversarial network (GAN), there are numerous examples showing it in action here.
Another method, called neural style transfer, imposes the style of one or more images onto another image. While the internet would have you believe that it is only used for creating unique artwork (and it certainly excels at that), it can also be very useful for augmenting images for analysis. The below examples show how it can be used to restyle an image from a day scene to a night scene or from summer to winter. This directly addresses some of the issues we have with wanting photos of the same scene under different lighting or weather conditions. If you want to have a play around with stylizing, you can try it out here.
The biggest drawback to these more advanced techniques is the time they take to implement. Each transformation takes far longer than cropping or rotating a photo, and time is often of the essence when working with vast amounts of data. Which techniques to use depends heavily on the specific application and the resources available, much like the trade-offs discussed in the last blog post on sampling.
You should now have a better understanding of what data augmentation is, why it is an important technique in computer vision, and some of the challenges involved in implementing these methods.
Computer vision blog series:
Blog written by Joe Duncan, Data Science Researcher