
Introducing GPU Instances: Using Deep Learning to Obtain Frontal Rendering of Facial Images
We just released GPU Instances, our first servers equipped with graphics processing units (GPUs). Powered by high-end 16 GB NVIDIA Tesla P100 cards and highly efficient Intel Xeon Gold 6148 CPUs, they are ideal for data processing, artificial intelligence, rendering, and video encoding. In addition to the dedicated GPU and 10 Intel Xeon Gold cores, each instance comes with 45 GB of memory and 400 GB of local NVMe SSD storage, and is billed at €1 per hour or €500 per month.
Today, we present a concrete use case for GPU Instances: using deep learning to obtain a frontal rendering of facial images. Feel free to try it yourself. To do so, visit the Scaleway console to request quotas before creating your first GPU Instance.
GPU Overview
The term graphics processing unit (GPU) became the go-to name for the specialized electronic circuit that powers graphics on a machine in the late 1990s, when it was popularized by the chip manufacturer NVIDIA.
GPUs were originally produced primarily to drive high-quality gaming experiences, producing life-like digital graphics. Today, those capabilities are being harnessed more broadly to accelerate computational workloads in areas such as artificial intelligence, machine learning and complex modeling.
GPU Instances at Scaleway are optimized for taking huge batches of data and performing the same operation on them over and over, very quickly. The combination of an efficient CPU and a powerful GPU delivers the best balance of system performance and price for your deep learning applications.
Writing Your Own Face Frontalization Software from Scratch
Screenwriters never cease to amuse us with bizarre portrayals of the tech industry, ranging from cringeworthy to hilarious. With the current advances in artificial intelligence, however, some of the most unrealistic technologies from TV and cinema screens are coming to life.
For example, the Enhance software from CSI: NY (or Les Experts : Manhattan for our francophone readers) has already been outshone by state-of-the-art Super Resolution neural networks. On the more extreme side of the imagination, there is Enemy of the State:
“Rotating [a video surveillance footage] 75 degrees around the vertical” must have seemed completely nonsensical long after the movie came out in 1998, as evinced by the YouTube comments below this particular excerpt:

Despite the apparent pessimism of the audience, thanks to machine learning, anyone today with a little bit of Python knowledge, a large enough dataset, and a Scaleway account can take a stab at writing a program worthy of a sci-fi drama.
Introduction
Forget MNIST, forget the boring cat vs. dog classifiers, today we are going to learn how to do something far more exciting! This article is inspired by the impressive work by R. Huang et al. (Beyond Face Rotation: Global and Local Perception GAN for Photorealistic and Identity Preserving Frontal View Synthesis), in which the authors synthesise frontal views of people’s faces given their images at various angles.
We are not going to try to reproduce the state-of-the-art model by R. Huang et al. Instead, you will learn:
- How to use NVIDIA’s DALI library for highly optimized pre-processing of images on the GPU and feeding them into a deep learning model.
- How to code a Generative Adversarial Network, praised as “the most interesting idea in the last ten years in Machine Learning” by Yann LeCun, the director of Facebook AI Research, in PyTorch.
You will also have your very own Generative Adversarial Network set up to be trained on a dataset of your choice. Without further ado, let’s dig in!
Step 1: Starting and Configuring a GPU Instance on Scaleway
If you have not already gotten yourself a GPU Instance hosted by Scaleway, you may do so by:
1. Logging in to your Scaleway console.
2. Selecting the Compute tab on the left sidebar and clicking on the green + Create a server button.
3. Choosing the GPU OS tab in Choose an Image and GPU in Select a Server. For this project, feel free to choose either of the two GPU OS images currently available (10.1 and 9.2 refer to the corresponding versions of CUDA) and select RENDER-S as your server.

4. Click on the green Create a new server button at the bottom of the page, and within seconds, your very own GPU Instance will be up and running!

5. You can now ssh to it using the IP address that you read off your list of instances under the Compute tab:
ssh root@[YOUR GPU INSTANCE IP ADDRESS]
6a. The Docker way:
If you are familiar with Docker, a convenient containerization platform that allows you to package up applications together with all their dependencies, go ahead and pull our Docker image containing all the packages and the code needed for the Frontalization project, as well as a small sample dataset:
nvidia-docker run -it rg.fr-par.scw.cloud/opetrova/frontalization:tutorial
root@b272693df1ca:/Frontalization# ls
Dockerfile data.py main.py network.py test.py training_set
(Note that you have to use nvidia-docker rather than the regular docker command due to the presence of a GPU.) You are now inside the Frontalization directory containing the four Python files whose contents we’ll go over below, and the training_set directory containing a sample training dataset. Great start, you can now proceed to Step 2!
6b. The native way:
If you are not familiar with Docker, no problem, you can easily set up the environment by hand. Scaleway GPU Instances come with CUDA, Python and conda already installed, but at the time of writing, you have to downgrade the Python version to Python 3.6 in order for NVIDIA’s DALI library to function:
conda install -y python==3.6.7
conda install -y pytorch torchvision
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist nvidia-dali==0.6.1
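Once these commands complete, you can run a quick sanity check (not part of the original tutorial) in a Python shell to confirm that PyTorch sees the GPU and that DALI was installed correctly:
import torch
from nvidia.dali.pipeline import Pipeline  # raises ImportError if DALI is not installed correctly
print(torch.cuda.is_available())      # should print True on a GPU Instance
print(torch.cuda.get_device_name(0))  # should mention the Tesla P100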
You can upload your own training set onto your GPU instance via:
scp -r path/to/local/training_set root@[YOUR GPU INSTANCE IP ADDRESS]:/root/Frontalization
and save the Python code that you will see below inside the Frontalization directory using your terminal text editor of choice (e.g. nano or vim, both of which are already installed). Alternatively, you may clone Scaleway’s GitHub repository for this project.
Step 2: Setting Up Your Data
At the heart of any machine learning project lies the data. Unfortunately, Scaleway cannot provide the CMU Multi-PIE Face Database that we used for training due to copyright, so we shall proceed assuming you already have a dataset that you would like to train your model on. In order to make use of the NVIDIA Data Loading Library (DALI), the images should be in JPEG format. The dimensions of the images do not matter, since we have DALI to resize all the inputs to the input size required by our network (128×128 pixels), but a 1:1 aspect ratio is desirable in order to obtain the most realistic synthesised images.
The advantage of using DALI over, e.g., a standard PyTorch Dataset is that whatever pre-processing is necessary (resizing, cropping, etc.) is performed on the GPU rather than the CPU, after which the pre-processed images are fed straight from GPU memory into the neural network.
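For comparison, here is a minimal sketch (not part of the project’s code) of the standard CPU-bound approach using torchvision; the directory name and normalisation values are illustrative only:
import torchvision.transforms as transforms
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader
# Every transform below runs on the CPU before the batch is copied over to the GPU
cpu_transform = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.ToTensor(),                                           # HWC uint8 -> CHW float in [0, 1]
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])  # scale to roughly [-1, 1]
])
dataset = ImageFolder("my_dataset_directory", transform=cpu_transform)
loader = DataLoader(dataset, batch_size=64, shuffle=True, num_workers=2)
# With DALI, the decoding, resizing and normalisation all happen on the GPU instead.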
Managing our dataset:
For the face frontalization project, we set up our dataset in the following way: the dataset folder contains a subfolder and a target frontal image for each person (aka subject). In principle, the names of the subfolders and the target images do not have to be identical (as they are in the figure below), but if we are to separately sort all the subfolders and all the targets alphanumerically, the ones corresponding to the same subject must appear at the same position on the two lists of names.
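To make this layout concrete, here is a sketch of what such a dataset directory might look like (the file names inside the subfolders are purely illustrative; only the alphanumeric ordering matters):
training_set/
├── 001/          # profile images of subject 001 under various poses and lighting conditions
│   ├── 001_01.jpg
│   ├── 001_02.jpg
│   └── ...
├── 001.jpg       # target frontal image of subject 001
├── 002/
│   └── ...
├── 002.jpg
└── ...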
As you can see, the subfolder 001/ corresponding to subject 001 contains images of the person pictured in 001.jpg: these are closely cropped images of the face under different poses, lighting conditions, and varying facial expressions. For the purposes of face frontalization, it is crucial to have the frontal images aligned as closely to one another as possible, whereas the other (profile) images have a little bit more leeway.
For instance, our target frontal images are all squares and cropped in such a way that the bottom of the person’s chin is located at the bottom of the image, and the centred point between the inner corners of the eyes is situated at 0.8h above and 0.5h to the right of the lower left corner (h being the image’s height). This way, once the images are resized to 128×128, the face features all appear at more or less the same locations on the images in the training set, and the network can learn to generate the said features and combine them together into realistic synthetic faces.
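To illustrate this cropping convention, here is a minimal sketch, assuming you already have the landmark coordinates from a face detector of your choice (the function and file names are hypothetical):
from PIL import Image

def crop_frontal(image_path, eye_mid, chin_bottom):
    # eye_mid and chin_bottom are (x, y) pixel coordinates, with y increasing downwards
    img = Image.open(image_path)
    h = (chin_bottom[1] - eye_mid[1]) / 0.8   # the eye midpoint must end up 0.8h above the bottom edge
    left = eye_mid[0] - 0.5 * h               # ... and 0.5h to the right of the left edge
    top = chin_bottom[1] - h                  # square crop of side h, with the chin at the bottom
    return img.crop((int(left), int(top), int(left + h), int(chin_bottom[1])))

# Example: prepare a target image before adding it to the dataset
# crop_frontal("raw/001.jpg", eye_mid=(250, 210), chin_bottom=(255, 330)).resize((128, 128)).save("training_set/001.jpg")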

Building a DALI pipeline:
We are now going to build a pipeline for our dataset that is going to inherit from nvidia.dali.pipeline.Pipeline. At the time of writing, DALI does not directly support reading (image, image) pairs from a directory, so we will be making use of nvidia.dali.ops.ExternalSource() to pass the inputs and the targets to the pipeline.
data.py
import collections
from random import shuffle
import os
from os import listdir
from os.path import join
import numpy as np
from nvidia.dali.pipeline import Pipeline
import nvidia.dali.ops as ops
import nvidia.dali.types as types
def is_jpeg(filename):
    return any(filename.endswith(extension) for extension in [".jpg", ".jpeg"])

def get_subdirs(directory):
    subdirs = sorted([join(directory, name) for name in sorted(os.listdir(directory)) if os.path.isdir(os.path.join(directory, name))])
    return subdirs
flatten = lambda l: [item for sublist in l for item in sublist]
class ExternalInputIterator(object):
    def __init__(self, imageset_dir, batch_size, random_shuffle=False):
        self.images_dir = imageset_dir
        self.batch_size = batch_size

        # First, figure out what are the inputs and what are the targets in your directory structure:
        # Get a list of filenames for the target (frontal) images
        self.frontals = np.array([join(imageset_dir, frontal_file) for frontal_file in sorted(os.listdir(imageset_dir)) if is_jpeg(frontal_file)])

        # Get a list of lists of filenames for the input (profile) images for each person
        profile_files = [[join(person_dir, profile_file) for profile_file in sorted(os.listdir(person_dir)) if is_jpeg(profile_file)] for person_dir in get_subdirs(imageset_dir)]

        # Build a flat list of frontal indices, corresponding to the *flattened* profile_files
        # The reason we are doing it this way is that we need to keep track of the multiple inputs corresponding to each target
        frontal_ind = []
        for ind, profiles in enumerate(profile_files):
            frontal_ind += [ind]*len(profiles)
        self.frontal_indices = np.array(frontal_ind)

        # Now that we have built frontal_indices, we can flatten profile_files
        self.profiles = np.array(flatten(profile_files))

        # Shuffle the (input, target) pairs if necessary: in practice, it is profiles and frontal_indices that get shuffled
        if random_shuffle:
            ind = np.array(range(len(self.frontal_indices)))
            shuffle(ind)
            self.profiles = self.profiles[ind]
            self.frontal_indices = self.frontal_indices[ind]

    def __iter__(self):
        self.i = 0
        self.n = len(self.frontal_indices)
        return self

    # Return a batch of (input, target) pairs
    def __next__(self):
        profiles = []
        frontals = []
        for _ in range(self.batch_size):
            profile_filename = self.profiles[self.i]
            frontal_filename = self.frontals[self.frontal_indices[self.i]]
            profile = open(profile_filename, 'rb')
            frontal = open(frontal_filename, 'rb')
            profiles.append(np.frombuffer(profile.read(), dtype=np.uint8))
            frontals.append(np.frombuffer(frontal.read(), dtype=np.uint8))
            profile.close()
            frontal.close()
            self.i = (self.i + 1) % self.n
        return (profiles, frontals)

    next = __next__
class ImagePipeline(Pipeline):
    '''
    Constructor arguments:
    - imageset_dir: directory containing the dataset
    - image_size = 128: length of the square that the images will be resized to
    - random_shuffle = False
    - batch_size = 64
    - num_threads = 2
    - device_id = 0
    '''
    def __init__(self, imageset_dir, image_size=128, random_shuffle=False, batch_size=64, num_threads=2, device_id=0):
        super(ImagePipeline, self).__init__(batch_size, num_threads, device_id, seed=12)
        eii = ExternalInputIterator(imageset_dir, batch_size, random_shuffle)
        self.iterator = iter(eii)
        self.num_inputs = len(eii.frontal_indices)

        # The source for the inputs and targets
        self.input = ops.ExternalSource()
        self.target = ops.ExternalSource()

        # nvJPEGDecoder below accepts CPU inputs, but returns GPU outputs (hence device = "mixed")
        self.decode = ops.nvJPEGDecoder(device="mixed", output_type=types.RGB)

        # The rest of pre-processing is done on the GPU
        self.res = ops.Resize(device="gpu", resize_x=image_size, resize_y=image_size)
        self.norm = ops.NormalizePermute(device="gpu", output_dtype=types.FLOAT,
                                         mean=[128., 128., 128.], std=[128., 128., 128.],
                                         height=image_size, width=image_size)

    # epoch_size = number of (profile, frontal) image pairs in the dataset
    def epoch_size(self, name=None):
        return self.num_inputs

    # Define the flow of the data loading and pre-processing
    def define_graph(self):
        self.profiles = self.input(name="inputs")
        self.frontals = self.target(name="targets")
        profile_images = self.decode(self.profiles)
        profile_images = self.res(profile_images)
        profile_output = self.norm(profile_images)
        frontal_images = self.decode(self.frontals)
        frontal_images = self.res(frontal_images)
        frontal_output = self.norm(frontal_images)
        return (profile_output, frontal_output)

    def iter_setup(self):
        (images, targets) = self.iterator.next()
        self.feed_input(self.profiles, images)
        self.feed_input(self.frontals, targets)
You can now use the ImagePipeline class that you wrote above to load images from your dataset directory, one batch at a time.
If you are using the code from this tutorial inside a Jupyter notebook, here is how you can use an ImagePipeline to display the images:
from __future__ import division
import matplotlib.gridspec as gridspec
import matplotlib.pyplot as plt
%matplotlib inline
def show_images(image_batch, batch_size):
    columns = 4
    rows = (batch_size + 1) // columns
    fig = plt.figure(figsize=(32, (32 // columns) * rows))
    gs = gridspec.GridSpec(rows, columns)
    for j in range(rows * columns):
        plt.subplot(gs[j])
        plt.axis("off")
        plt.imshow(np.transpose(image_batch.at(j), (1, 2, 0)))
batch_size = 8
pipe = ImagePipeline('my_dataset_directory', image_size=128, batch_size=batch_size)
pipe.build()
profiles, frontals = pipe.run()
# The images returned by ImagePipeline are currently on the GPU
# We need to copy them to the CPU via the asCPU() method in order to display them
show_images(profiles.asCPU(), batch_size=batch_size)
show_images(frontals.asCPU(), batch_size=batch_size)
Step 3: Setting Up Your Neural Network
Here comes the fun part: building the network’s architecture! We assume that you are already somewhat familiar with the idea behind convolutional neural networks, the architecture of choice for many computer vision applications today.
Beyond that, there are two main concepts that we will need for the face frontalization project, which we shall touch upon in this section:
- The Encoder/Decoder Network(s)
- The Generative Adversarial Network
Encoders and Decoders
The Encoder
As mentioned above, our network takes images that are sized 128 by 128 as input. Since the images are in colour (meaning 3 colour channels for each pixel), this results in the input being 3 × 128 × 128 = 49152 dimensional. Perhaps we do not need all 49152 values to describe a person’s face? This turns out to be correct: we can get away with a mere 512 dimensional vector (which is simply another way of saying “512 numbers