In this post, we will learn how to perform semantic image segmentation using pre-trained models available in TensorFlow Hub. TensorFlow Hub is a library and platform designed for sharing, discovering, and reusing pre-trained machine learning models. Its primary goal is to simplify the reuse of existing models, thereby promoting collaboration, reducing redundant work, and accelerating research and development in machine learning. Users can search for pre-trained models, known as modules, that have been contributed by the community or provided by Google. These modules can be easily integrated into a user's own machine learning projects with just a few lines of code.
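For instance, loading a published module typically takes just a couple of lines. Below is a minimal sketch using the URL of the segmentation model featured later in this post:

import tensorflow_hub as hub

# Load a published module by its tfhub.dev handle. The returned object can then
# be used for inference (this particular model exposes a predict() method).
seg_model = hub.load('https://tfhub.dev/google/HRNet/camvid-hrnetv2-w48/1')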
Image segmentation is analogous to image classification, but at the pixel level. The goal of image segmentation is to simplify the representation of an image and make it more meaningful for analysis or further processing. In other words, it aims to separate the important parts of an image, such as objects or regions of interest, from the background or irrelevant areas. You can read more about Image Segmentation in our introductory post on the topic.
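To make the idea concrete, here is a toy illustration (values invented purely for illustration): a segmentation mask is simply a per-pixel grid of class labels.

import numpy as np

# A 4x4 'image' segmented into two classes: 0 = background, 1 = object.
# Real masks are the same idea at full resolution, with one label per pixel
# drawn from all of the dataset's classes (32 for CamVid, used below).
toy_mask = np.array([
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
])
print(toy_mask.shape)  # (4, 4): one class ID per pixel.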
In this example, we will use the image segmentation model camvid-hrnetv2-w48, which was trained on CamVid (Cambridge-driving Labeled Video Database), a driving and scene understanding dataset containing images extracted from five video sequences taken during real-world driving scenarios. The dataset contains 32 classes. Several other image segmentation models can be found here as well.
import os
import numpy as np
import cv2
import zipfile
import requests
import glob
import tensorflow as tf
import tensorflow_hub as hub
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle
import warnings
import logging
import absl
# Filter absl warnings
warnings.filterwarnings("ignore", module="absl")
# Capture all warnings in the logging system
logging.captureWarnings(True)
# Set the absl logger level to ERROR to suppress warnings
absl_logger = logging.getLogger("absl")
absl_logger.setLevel(logging.ERROR)
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
Download Sample (CamVid) Images
def download_file(url, save_name):
    file = requests.get(url)
    open(save_name, 'wb').write(file.content)

def unzip(zip_file=None):
    try:
        with zipfile.ZipFile(zip_file) as z:
            z.extractall("./")
        print("Extracted all")
    except zipfile.BadZipFile:
        print("Invalid file")
download_file(
    'https://www.dropbox.com/s/5jhbvmqgzbzl9fd/camvid_images.zip?dl=1',
    'camvid_images.zip'
)
unzip(zip_file="camvid_images.zip")
Extracted all
Display Sample Images
image_paths = sorted(glob.glob('camvid_images' + '/*.png'))

for idx in range(len(image_paths)):
    print(image_paths[idx])
camvid_images/camvid_sample_1.png
camvid_images/camvid_sample_2.png
camvid_images/camvid_sample_3.png
camvid_images/camvid_sample_4.png
def load_image(path):
    image = cv2.imread(path)

    # Convert the image from BGR (OpenCV default) to RGB.
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

    # Add a batch dimension (required by the model) and scale to [0, 1].
    image = np.expand_dims(image, axis=0) / 255.0

    return image
images = []

fig, ax = plt.subplots(nrows=2, ncols=2, figsize=(16, 12))

for idx, axis in enumerate(ax.flat):
    image = load_image(image_paths[idx])
    images.append(image)
    axis.imshow(image[0])
    axis.axis('off')
Define a Dictionary that Maps Class IDs to Class Names and Class Colors
class_index is a dictionary that maps all 32 classes in the CamVid dataset to their associated class IDs and RGB color labels.
class_index = {
0: [(64, 128, 64), 'Animal'],
1: [(192, 0, 128), 'Archway'],
2: [(0, 128, 192), 'Bicyclist'],
3: [(0, 128, 64), 'Bridge'],
4: [(128, 0, 0), 'Building'],
5: [(64, 0, 128), 'Car'],
6: [(64, 0, 192), 'Cart/Luggage/Pram'],
7: [(192, 128, 64), 'Child'],
8: [(192, 192, 128), 'Column Pole'],
9: [(64, 64, 128), 'Fence'],
10: [(128, 0, 192), 'LaneMkgs Driv'],
11: [(192, 0, 64), 'LaneMkgs NonDriv'],
12: [(128, 128, 64), 'Misc Text'],
13: [(192, 0, 192), 'Motorcycle/Scooter'],
14: [(128, 64, 64), 'Other Moving'],
15: [(64, 192, 128), 'Parking Block'],
16: [(64, 64, 0), 'Pedestrian'],
17: [(128, 64, 128), 'Road'],
18: [(128, 128, 192), 'Road Shoulder'],
19: [(0, 0, 192), 'Sidewalk'],
20: [(192, 128, 128), 'Sign Symbol'],
21: [(128, 128, 128), 'Sky'],
22: [(64, 128, 192), 'SUV/Pickup/Truck'],
23: [(0, 0, 64), 'Traffic Cone'],
24: [(0, 64, 64), 'Traffic Light'],
25: [(192, 64, 128), 'Train'],
26: [(128, 128, 0), 'Tree'],
27: [(192, 128, 192), 'Truck/Bus'],
28: [(64, 0, 64), 'Tunnel'],
29: [(192, 192, 0), 'Vegetation Misc'],
30: [(0, 0, 0), 'Void'],
31: [(64, 192, 0), 'Wall']
}
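As a quick sanity check of the mapping, any entry can be looked up by its class ID:

# Each value in class_index is a [color, name] pair, so a lookup unpacks directly.
color, name = class_index[17]
print(color, name)  # (128, 64, 128) Road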
Model Inference using TensorFlow Hub
TensorFlow Hub contains many different pre-trained segmentation models. Here we will use the High-Resolution Network (HRNet) segmentation model trained on CamVid (camvid-hrnetv2-w48). The model has been pre-trained on the ImageNet ILSVRC-2012 classification task and fine-tuned on CamVid.
Load the Model from TensorFlow Hub
We can load the model into memory using its TensorFlow Hub URL.
model_url = 'https://tfhub.dev/google/HRNet/camvid-hrnetv2-w48/1'
print('loading model:', model_url)

seg_model = hub.load(model_url)
print('\nmodel loaded!')
loading model: https://tfhub.dev/google/HRNet/camvid-hrnetv2-w48/1
model loaded!
Perform Inference
Before we formalize the code to process several images and post-process the results, let's first see how to perform inference on a single image and study the output from the model.
Call the Model's predict() Method
# Make a prediction using the first image in the list of images.
pred_mask = seg_model.predict(images[0])

# The predicted mask has the following shape: [B, H, W, C].
print('Shape of predicted mask:', pred_mask.shape)
Shape of predicted mask: (1, 720, 960, 33)
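The 33 channels correspond to the 32 CamVid classes plus one background class added by the model, which we can verify against our dictionary:

# Sanity check: output channels = number of CamVid classes + 1 background channel.
assert pred_mask.shape[-1] == len(class_index) + 1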
Post-Process the Predicted Segmentation Mask
The predicted segmentation mask returned by the model contains a separate channel for each class. Each channel holds the probability that a given pixel in the input image belongs to the class for that channel. This output therefore requires some post-processing to obtain meaningful results. Several steps are needed to arrive at a final visual representation (combined into a single sketch after this list):
- Remove the batch dimension and the background class.
- Assign a class label to every pixel in the image based on the highest probability score across all channels.
- The previous step yields a single-channel image containing the class label for each pixel, so we need to map these class IDs to RGB values in order to visualize the results as a color-coded segmentation map.
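For reference, here is a minimal sketch that collapses the three steps into one helper (the cells below perform the same steps one at a time so we can inspect the intermediate results):

def postprocess_mask(pred_mask):
    # Collapse the raw model output [B, H, W, C] into an [H, W] class-ID map.
    pred_mask = pred_mask.numpy()           # Convert the TF tensor to a numpy array.
    pred_mask = pred_mask[:, :, :, 1:]      # Drop the background channel (channel 0).
    pred_mask = np.squeeze(pred_mask)       # Drop the batch dimension: [H, W, 32].
    return np.argmax(pred_mask, axis=-1)    # Per-pixel ID of the most probable class.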
Remove the Batch Dimension and the Background Class
# Convert the tensor to a numpy array.
pred_mask = pred_mask.numpy()

# The first channel is the background class added by the model, which we can remove for this dataset.
pred_mask = pred_mask[:, :, :, 1:]

# We also need to remove the batch dimension.
pred_mask = np.squeeze(pred_mask)

# Print the shape to confirm: [H, W, C].
print('Shape of predicted mask after removal of batch dimension and background class:', pred_mask.shape)
Shape of predicted mask after removal of batch dimension and background class: (720, 960, 32)
Visualize the Intermediate Results
# Each channel in `pred_mask` contains the probabilities that the pixels
# in the original image belong to the class for that channel.
plt.figure(figsize=(20, 6))

plt.subplot(1, 3, 1)
plt.title('Input Image', fontsize=14)
plt.imshow(np.squeeze(images[0]))

plt.subplot(1, 3, 2)
plt.title('Predictions for Class: Road', fontsize=14)
plt.imshow(pred_mask[:, :, 17], cmap='gray')  # Class 17 corresponds to the 'Road' class.
plt.axis('off')

plt.subplot(1, 3, 3)
plt.title('Predictions for Class: Sky', fontsize=14)
plt.imshow(pred_mask[:, :, 21], cmap='gray')  # Class 21 corresponds to the 'Sky' class.
plt.axis('off');
Assign Each Pixel a Class Label
Here we assign every pixel in the image a class ID based on the class with the highest probability score. We can visualize the result as a grayscale image. In the code cell below, we display just the top portion of the image to highlight a few of the class assignments.
# Assign each pixel in the image a class ID based on the channel that contains the
# highest probability score. This can be implemented using the `argmax` function.
pred_mask_class = np.argmax(pred_mask, axis=-1)

plt.figure(figsize=(15, 5))

plt.subplot(1, 2, 1)
plt.title('Input Image', fontsize=12)
plt.imshow(np.squeeze(images[0]))

plt.subplot(1, 2, 2)
plt.title('Segmentation Mask', fontsize=12)
plt.imshow(pred_mask_class, cmap='gray')
plt.gca().add_patch(Rectangle((450, 200), 200, 3, edgecolor='red', facecolor='none', lw=0.5));
Let's now inspect a small region of the segmentation mask to better understand how the values map to class IDs. For reference, the top portion (200 rows) of the segmentation mask (pred_mask_class) has been overlaid on the input image. Notice that the regions in the segmentation mask correspond to distinct regions in the input image (e.g., buildings, sky, trees).
# Print the class IDs from the last row in the above image.
print(pred_mask_class[200, 450:650])
[ 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 21 26 26 21 21 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26]
Notice that the values in pred_mask_class for the small section indicated by the red rectangle correspond to the class IDs for buildings, sky, and trees.
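We can also tally the classes present in that strip with np.unique (a small sketch using the arrays defined above):

# Count how many pixels in the strip were assigned to each predicted class.
ids, counts = np.unique(pred_mask_class[200, 450:650], return_counts=True)
for class_id, count in zip(ids, counts):
    print(f'{class_index[class_id][1]:<10} {count} pixels')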
Convert the Single-Channel Mask to a Color Representation
We will also need the function below, which converts a single-channel mask to an RGB representation for visualization purposes. Each class ID in the single-channel mask is converted to a different color according to the class_index dictionary mapping.
# Function to convert a single-channel mask representation to an RGB mask.
def class_to_rgb(mask_class, class_index):
    # Create the RGB channels.
    r_map = np.zeros_like(mask_class).astype(np.uint8)
    g_map = np.zeros_like(mask_class).astype(np.uint8)
    b_map = np.zeros_like(mask_class).astype(np.uint8)

    # Populate the RGB color channels based on the color assigned to each class.
    for class_id in range(len(class_index)):
        index = mask_class == class_id
        r_map[index] = class_index[class_id][0][0]
        g_map[index] = class_index[class_id][0][1]
        b_map[index] = class_index[class_id][0][2]

    seg_map_rgb = np.stack([r_map, g_map, b_map], axis=2)

    return seg_map_rgb
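As a design note, the per-class loop is easy to read, but the same mapping can be done in one vectorized step by building a color palette once and indexing it with the class-ID map (a sketch equivalent to class_to_rgb above):

# Build a [32, 3] palette of RGB colors, ordered by class ID.
palette = np.array([class_index[c][0] for c in range(len(class_index))], dtype=np.uint8)

def class_to_rgb_fast(mask_class):
    # Fancy indexing maps every class ID in the [H, W] mask to its RGB triplet,
    # producing an [H, W, 3] color image in a single operation.
    return palette[mask_class]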
Convert the grayscale segmentation mask to a color segmentation mask and display the results.
pred_mask_rgb = class_to_rgb(pred_mask_class, class_index)

plt.figure(figsize=(20, 8))

plt.subplot(1, 3, 1)
plt.title('Input Image', fontsize=14)
plt.imshow(np.squeeze(images[0]))
plt.axis('off')

plt.subplot(1, 3, 2)
plt.title('Grayscale Segmentation', fontsize=14)
plt.imshow(pred_mask_class, cmap='gray')
plt.axis('off')

plt.subplot(1, 3, 3)
plt.title('Color Segmentation', fontsize=14)
plt.imshow(pred_mask_rgb)
plt.axis('off');
Formalize the Implementation
In this section, we will formalize the implementation, which requires defining some additional convenience functions.
image_overlay()
image_overlay() is a helper function that overlays an RGB mask on top of the original image, making it easier to appreciate how the predictions line up with it.
# Function to overlay a segmentation map on top of an RGB image.
def image_overlay(image, seg_map_rgb):
    alpha = 1.0  # Transparency for the original image.
    beta = 0.6   # Transparency for the segmentation map.
    gamma = 0.0  # Scalar added to each sum.

    image = (image * 255.0).astype(np.uint8)
    seg_map_rgb = cv2.cvtColor(seg_map_rgb, cv2.COLOR_RGB2BGR)

    # Blend the two images: output = image * alpha + seg_map_rgb * beta + gamma.
    image = cv2.addWeighted(image, alpha, seg_map_rgb, beta, gamma)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

    return image
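The helper can be tried directly on the single image processed above (a quick usage sketch reusing the arrays already in memory):

overlayed = image_overlay(images[0][0], pred_mask_rgb)

plt.figure(figsize=(10, 6))
plt.imshow(overlayed)
plt.axis('off');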
run_inference()
To perform inference on several images, we define the function below, which accepts a list of images and a pre-trained model. This function also handles all the post-processing required to compute the final segmentation mask as well as the overlay.
def run_inference(images, model):
    for img in images:
        # Forward pass through the model (convert the tensor output to a numpy array).
        pred_mask = model.predict(img).numpy()

        # Remove the background class added by the model.
        pred_mask = pred_mask[:, :, :, 1:]

        # Remove the batch dimension.
        pred_mask = np.squeeze(pred_mask)

        # `pred_mask` is a numpy array of shape [H, W, 32], where each channel contains the
        # probability scores associated with a given class. We still need to assign a single
        # class to each pixel, which is done using argmax across the last dimension.
        pred_mask_class = np.argmax(pred_mask, axis=-1)

        # Convert the predicted (class) segmentation map to a color segmentation map.
        pred_mask_rgb = class_to_rgb(pred_mask_class, class_index)

        fig = plt.figure(figsize=(20, 15))

        # Display the original image.
        ax1 = fig.add_subplot(1, 3, 1)
        ax1.imshow(img[0])
        ax1.title.set_text('Input Image')
        plt.axis('off')

        # Display the predicted color segmentation mask.
        ax2 = fig.add_subplot(1, 3, 2)
        ax2.set_title('Predicted Mask')
        ax2.imshow(pred_mask_rgb)
        plt.axis('off')

        # Display the predicted color segmentation mask overlaid on the original image.
        overlayed_image = image_overlay(img[0], pred_mask_rgb)
        ax4 = fig.add_subplot(1, 3, 3)
        ax4.set_title('Overlayed Image')
        ax4.imshow(overlayed_image)
        plt.axis('off')

        plt.show()
plot_color_legend()
The function plot_color_legend() creates a color legend for the CamVid dataset, which is helpful for confirming the class assignments made by the model.
def plot_color_legend(class_index):
    # Extract the colors and labels from the class_index dictionary.
    color_array = np.array([[v[0][0], v[0][1], v[0][2]] for v in class_index.values()]).astype(np.uint8)
    class_labels = [val[1] for val in class_index.values()]

    fig, ax = plt.subplots(nrows=2, ncols=16, figsize=(20, 3))
    plt.subplots_adjust(wspace=0.5, hspace=0.01)

    # Display the color legend.
    for i, axis in enumerate(ax.flat):
        axis.imshow(color_array[i][None, None, :])
        axis.set_title(class_labels[i], fontsize=8)
        axis.axis('off')
plot_color_legend(class_index)
Make Predictions on the Sample Images
Now, let's use this function to perform inference on the sample images with the model we loaded above.
run_inference(images, seg_model)
Conclusion
In this post, we covered how to use pre-trained image segmentation models available in TensorFlow Hub. TensorFlow Hub simplifies the reuse of existing models by providing a central repository for sharing, discovering, and reusing pre-trained machine learning models. An essential aspect of working with these models is understanding how to interpret their output. Image segmentation models produce multi-channel segmentation masks, which consist of probability scores that require further processing to generate the final segmentation maps.