
Overview of albumentations – Open-source library for sophisticated image augmentations

Native PyTorch and TensorFlow augmenters have a major drawback: they cannot simultaneously augment an image and its segmentation mask, bounding box, or keypoint locations. That leaves two options – write the functions yourself or use a third-party library.

Why Albumentations?

Albumentations has several advantages, the key ones being:

  • It is open source.
  • It is intuitive to use.
  • It is fast.
  • It offers more than sixty different augmentations.
  • It is well documented.
  • And, most importantly, it can simultaneously augment an image and its segmentation mask, bounding box, or keypoint locations.

There are two other similar libraries – imgaug and Augmentor. Comparing them to Albumentations is outside the scope of this blog post.

Short Tutorial

In this brief tutorial, we’ll demonstrate how to augment images for segmentation and object detection tasks with just a few lines of code.

If you’d like to adhere to this tutorial:

  • Install Albumentations (a typical install command is shown after this list). It’s worth checking that you have an up-to-date version, since older releases can be buggy. Version 1.0.0 is a fine release.
  • Download a test image with labels. The library accepts images as NumPy arrays, segmentation masks as NumPy arrays, and bounding boxes as lists.
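
For reference, a typical install looks like this (the version pin is optional and matches the release mentioned above):

pip install albumentations==1.0.0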

When you load the test image, its binary pixel-wise segmentation mask and a bounding box are loaded along with it. The bounding box is defined as a 4-element list – [x_min, y_min, width, height].

import pickle
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as patches

# load the image, its segmentation mask, and its bounding box
with open("image_data.pickle", "rb") as handle:
    image_data = pickle.load(handle)

image = image_data["image"]
mask = image_data["mask"]
bbox = image_data["bbox_coco"]

# visualize the image alone, and with the bounding box and mask overlaid
fig, ax = plt.subplots(1, 2, figsize=(12, 5))
ax[0].imshow(image)
ax[0].set_title("Image")
ax[1].imshow(image)
bbox_rect = patches.Rectangle(
    bbox[:2], bbox[2], bbox[3], linewidth=2, edgecolor="r", facecolor="none"
)
ax[1].add_patch(bbox_rect)
ax[1].imshow(mask, alpha=0.3, cmap="gray_r")
ax[1].set_title("Image + BBox + Mask")
plt.show()

After loading and visualizing the data, you should get something like this.

Mask augmentation for segmentation: Now we can start with Albumentations. Transformations are defined much like in PyTorch and TensorFlow (Keras API):

  • Define a transformation by combining several augmentations in a ‘Compose’ object.
  • Every augmentation takes an argument ‘p’, the probability of being applied, plus augmentation-specific arguments such as ‘width’ and ‘height’ for RandomCrop.
  • Use the defined transformation as a function to augment the image and its mask. This function returns a dictionary with the keys ‘image’ and ‘mask’.

Below is the code that augments the image (and its mask) with a random 256×256 crop (always applied) and a horizontal flip (applied in half of the cases).

import albumentations as A
# define augmentation
transform = A.Compose([
    A.RandomCrop(width=256, height=256, p=1), 
    A.HorizontalFlip(p=0.5),
])
# augment and visualize images
fig, ax = plt.subplots(2, 3, figsize=(15, 10))
for i in range(6):
    transformed = transform(image=image, mask=mask) 
    ax[i // 3, i % 3].imshow(transformed["image"])
    ax[i // 3, i % 3].imshow(transformed["mask"], alpha=0.3, cmap="gray_r")
plt.show()

As a result, you should get something like this. Your augmented images will look different, since Albumentations generates random transformations.
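
If you need the same augmentations on every run (for example, when debugging), one option is to seed Python’s built-in ‘random’ module, which Albumentations relies on internally; a minimal sketch:

import random
random.seed(7)  # fixes the sequence of random crop positions and flips
transformed = transform(image=image, mask=mask)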

Bounding box augmentation for object detection: This works like augmentation for segmentation masks, except that:

  • You additionally define ‘bbox_params’, which specifies the bounding box format and an argument for bounding box classes. ‘coco’ means the COCO dataset format – [x_min, y_min, width, height]. The argument ‘bbox_classes’ will be used later to pass classes for the bounding boxes.
  • ‘transform’ accepts bounding boxes as a list of lists. It also requires bounding box classes (as a list), even if there is only a single bounding box in the image.

Below is the code that applies RandomCrop and HorizontalFlip simultaneously to the image and its bounding box.

# define augmentation
transform = A.Compose([
    A.RandomCrop(width=256, height=256, p=1),
    A.HorizontalFlip(p=0.5),
], bbox_params=A.BboxParams(format='coco', label_fields=["bbox_classes"]))
# augment and visualize images
bboxes = [bbox]  # `transform` accepts bounding boxes as a list of lists
bbox_classes = ["horse"]
fig, ax = plt.subplots(2, 3, figsize=(15, 10))
for i in range(6):
    transformed = transform(
        image=image, 
        bboxes=bboxes, 
        bbox_classes=bbox_classes
    )
    ax[i // 3, i % 3].imshow(transformed["image"])
    trans_bbox = transformed["bboxes"][0]
    bbox_rect = patches.Rectangle(
        trans_bbox[:2],
        trans_bbox[2],
        trans_bbox[3],
        linewidth=2,
        edgecolor="r",
        facecolor="none",
    )
    ax[i // 3, i % 3].add_patch(bbox_rect)
plt.show()

And here is the outcome.

Simultaneous augmentation of several targets: Besides augmenting several masks or several bounding boxes at once, Albumentations can simultaneously augment different kinds of labels, for instance a mask and a bounding box.

When calling ‘transform’, simply pass it everything you have:

# define augmentation
transform = A.Compose([
    A.RandomCrop(width=256, height=256, p=1),
    A.HorizontalFlip(p=0.5),
], bbox_params=A.BboxParams(format='coco', label_fields=["bbox_classes"]))
# augment and visualize images
bboxes = [bbox]
bbox_classes = ["horse"]
fig, ax = plt.subplots(2, 3, figsize=(15, 10))
for i in range(6):
    transformed = transform(
        image=image, 
        mask=mask, 
        bboxes=bboxes, 
        bbox_classes=bbox_classes
    )
    ax[i // 3, i % 3].imshow(transformed["image"])
    trans_bbox = transformed["bboxes"][0]
    bbox_rect = patches.Rectangle(
        trans_bbox[:2],
        trans_bbox[2],
        trans_bbox[3],
        linewidth=2,
        edgecolor="r",
        facecolor="none",
    )
    ax[i // 3, i % 3].add_patch(bbox_rect)
    ax[i // 3, i % 3].imshow(transformed["mask"], alpha=0.3, cmap="gray_r")
plt.show()

The outcome will look something like the image below.

Albumentations has many more features available, such as augmentation for keypoints and AutoAugment. In total it includes about sixty different augmentation types – practically one for any task you need.
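
As a taste of the keypoint support, here is a minimal sketch, assuming made-up (x, y) pixel coordinates for the keypoints; ‘KeypointParams’ plays the same role that ‘BboxParams’ did above:

import albumentations as A
# hypothetical keypoint locations in (x, y) pixel coordinates
keypoints = [(120, 80), (210, 145)]
transform = A.Compose([
    A.RandomCrop(width=256, height=256, p=1),
    A.HorizontalFlip(p=0.5),
], keypoint_params=A.KeypointParams(format="xy"))
transformed = transform(image=image, keypoints=keypoints)
# keypoints are shifted/flipped consistently with the image;
# points that fall outside the crop are dropped by default
print(transformed["keypoints"])
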
Compatibility with PyTorch and TensorFlow

Most likely you will use Albumentations as part of a PyTorch or TensorFlow training pipeline, so we’ll briefly describe how to do that.

PyTorch. When building a custom dataset, define the Albumentations transform in the ‘__init__’ function and call it in the ‘__getitem__’ function. PyTorch models require input data to be tensors, so make sure to include ‘ToTensorV2’ as the final step of the transform pipeline.
from torch.utils.data import Dataset
from albumentations.pytorch import ToTensorV2
class CustomDataset(Dataset):
    def __init__(self, images, masks):
        self.images = images  # assume it's a list of numpy images
        self.masks = masks  # assume it's a list of numpy masks
        self.transform = A.Compose([
            A.RandomCrop(width=256, height=256, p=1),
            A.HorizontalFlip(p=0.5),
            ToTensorV2(),
        ])
    def __len__(self):
        return len(self.images)
    def __getitem__(self, idx):
        image = self.images[idx]
        mask = self.masks[idx]
        transformed = self.transform(image=image, mask=mask)
        transformed_image = transformed["image"]
        transformed_mask = transformed["mask"]
        return transformed_image, transformed_mask
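
A dataset like this plugs into a standard ‘DataLoader’; a minimal usage sketch, assuming ‘images’ and ‘masks’ are lists of NumPy arrays:

from torch.utils.data import DataLoader
dataset = CustomDataset(images, masks)
loader = DataLoader(dataset, batch_size=4, shuffle=True)
batch_images, batch_masks = next(iter(loader))
# batch_images is a (4, C, 256, 256) tensor produced by ToTensorV2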

TensorFlow (Keras API) also supports building custom datasets, just like PyTorch: define the Albumentations transform in the ‘__init__’ function and call it in the ‘__getitem__’ function.

from tensorflow import keras
class CustomDataset(keras.utils.Sequence):
    def __init__(self, images, masks):
        self.images = images
        self.masks = masks
        self.batch_size = 1
        self.img_size = (256, 256)
        self.transform = A.Compose([
            A.RandomCrop(width=256, height=256, p=1), 
            A.HorizontalFlip(p=0.5),
        ])
    def __len__(self):
        return len(self.images) // self.batch_size
    def __getitem__(self, idx):
        """Returns a batch of samples"""
        i = idx * self.batch_size
        batch_images = self.images[i : i + self.batch_size]
        batch_masks = self.masks[i : i + self.batch_size]
        batch_images_stacked = np.zeros(
            (self.batch_size,) + self.img_size + (3,), dtype="uint8"
        )
        batch_masks_stacked = np.zeros(
            (self.batch_size,) + self.img_size, dtype="float32"
        )
        for i in range(len(batch_images)):
            transformed = self.transform(
                image=batch_images[i], 
                mask=batch_masks[i]
            )
            batch_images_stacked[i] = transformed["image"]
            batch_masks_stacked[i] = transformed["mask"]
        return batch_images_stacked, batch_masks_stacked
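
Since ‘CustomDataset’ subclasses ‘keras.utils.Sequence’, it can be passed directly to ‘model.fit’; a minimal sketch, assuming a compiled segmentation ‘model’ already exists:

train_dataset = CustomDataset(images, masks)
model.fit(train_dataset, epochs=10)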

That’s pretty much it! We hope this tutorial encourages you to try Albumentations the next time you work on a segmentation, object detection, or keypoint localization task.
