
A compressed version of StyleGAN: synthesizing high-fidelity images with fewer parameters and lower computational complexity


Heart of the Machine Report

Authors: Xiaozhou, Chen Ping

A new architecture called MobileStyleGAN, built on StyleGAN, greatly reduces the number of parameters and the computational complexity.

In recent years, generative adversarial networks (GANs) have been used more and more widely for generative image modeling. Style-based GANs can control different levels of detail, from the shape of the head to the color of the eyes, and achieve state-of-the-art results in high-fidelity image synthesis. However, the computational cost of the generation process is very high, making these models difficult to deploy on mobile devices such as smartphones.

Recently, a study focusing on the performance optimization of style-based generative models has attracted attention. It analyzes the most computationally expensive parts of StyleGAN2 and proposes changes to the generator network that make it possible to deploy style-based generative networks on edge devices. The study proposes a new architecture called MobileStyleGAN. Compared with StyleGAN2, this architecture has about 71% fewer parameters and about 90% lower computational complexity, while the generation quality remains almost unchanged.

Comparison of images generated by StyleGAN2 (top) and MobileStyleGAN (bottom).

The authors have released a PyTorch implementation of MobileStyleGAN on GitHub.

Paper address: https://arxiv.org/pdf/2104.04767.pdf

Project address: https://github.com/bes-dev/MobileStyleGAN.pytorch

The training code required for this implementation is very simple; the actual scripts are in the repository, and a rough sketch of what a distillation training entry point might look like is shown below.
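The snippet that follows is only a minimal, illustrative sketch of such an entry point in PyTorch. All module and class names here (StyleGAN2Teacher, MobileStyleGANStudent, DistillationTrainer) are hypothetical stand-ins, not the repository's actual API.

```python
# Minimal illustrative sketch only -- the module and class names below are
# hypothetical stand-ins, not the actual API of MobileStyleGAN.pytorch.
import torch

from models import StyleGAN2Teacher, MobileStyleGANStudent   # hypothetical
from distillation import DistillationTrainer                 # hypothetical

device = "cuda" if torch.cuda.is_available() else "cpu"

# A frozen, pretrained StyleGAN2 generator acts as the teacher.
teacher = StyleGAN2Teacher.from_pretrained("stylegan2-ffhq").to(device).eval()

# The student reuses the teacher's mapping network and learns a
# lightweight, wavelet-domain synthesis network.
student = MobileStyleGANStudent(mapping=teacher.mapping).to(device)

# Knowledge distillation: the student is optimized to reproduce the
# teacher's outputs for the same latent codes.
trainer = DistillationTrainer(teacher=teacher, student=student, batch_size=8)
trainer.fit(max_steps=500_000)
```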

Generation results of StyleGAN2 (left) and MobileStyleGAN (right).

Let’s take a look at the details of the MobileStyleGAN architecture.

MobileStyleGAN architecture

The MobileStyleGAN architecture is built on the style-based generative model and consists of a mapping network and a synthesis network. The mapping network is taken directly from StyleGAN2; the focus of this research is designing a computationally efficient synthesis network.

Differences between MobileStyleGAN and StyleGAN2

StyleGAN2 uses a pixel-based image representation and directly predicts the pixel values of the output image. MobileStyleGAN instead uses a frequency-based representation and predicts the discrete wavelet transform (DWT) of the output image. Applied to a 2D image, the DWT decomposes each channel into four sub-band channels of half the spatial resolution, each covering a different frequency band. The inverse discrete wavelet transform (IDWT) then reconstructs the pixel-based representation from the wavelet domain, as shown in the figure below.
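As a concrete illustration, the sketch below implements a one-level 2D Haar DWT/IDWT pair in PyTorch: each channel is split into four half-resolution sub-bands (LL, LH, HL, HH), and the IDWT reconstructs the original pixels. Using the simple Haar wavelet here is an assumption for illustration, not code taken from the paper's implementation.

```python
import torch

def haar_dwt(x):
    """One-level 2D Haar DWT: (N, C, H, W) -> (N, 4*C, H/2, W/2).

    Sub-bands are concatenated along the channel dimension as LL, LH, HL, HH.
    """
    a = x[:, :, 0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[:, :, 0::2, 1::2]  # top-right
    c = x[:, :, 1::2, 0::2]  # bottom-left
    d = x[:, :, 1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2
    lh = (a + b - c - d) / 2
    hl = (a - b + c - d) / 2
    hh = (a - b - c + d) / 2
    return torch.cat([ll, lh, hl, hh], dim=1)

def haar_idwt(y):
    """Inverse of haar_dwt: (N, 4*C, H/2, W/2) -> (N, C, H, W)."""
    ll, lh, hl, hh = torch.chunk(y, 4, dim=1)
    n, c, h, w = ll.shape
    x = y.new_zeros(n, c, h * 2, w * 2)
    x[:, :, 0::2, 0::2] = (ll + lh + hl + hh) / 2
    x[:, :, 0::2, 1::2] = (ll + lh - hl - hh) / 2
    x[:, :, 1::2, 0::2] = (ll - lh + hl - hh) / 2
    x[:, :, 1::2, 1::2] = (ll - lh - hl + hh) / 2
    return x

img = torch.randn(1, 3, 256, 256)   # pixel-domain image
coeffs = haar_dwt(img)              # (1, 12, 128, 128): 4 sub-bands per channel
recon = haar_idwt(coeffs)           # back to (1, 3, 256, 256)
assert torch.allclose(img, recon, atol=1e-5)
```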

StyleGAN2 uses a skip-connection generator that forms the output image by explicitly summing RGB predictions made at multiple resolutions of the same image. The study found that when the image is predicted in the wavelet domain, this skip-connection-based prediction head has little effect on the quality of the generated images. Therefore, to reduce computational complexity, the study replaces the skip generator with a single prediction head attached to the last block of the network. However, because predicting the target image from intermediate blocks matters for stable image synthesis, an auxiliary prediction head is added to each intermediate block, each predicting a version of the target image at the corresponding spatial resolution.

Difference between the prediction heads of StyleGAN2 and MobileStyleGAN.

As shown in the figure below, a modulated convolution consists of modulation, convolution, and normalization (left). The depthwise separable modulated convolution contains the same parts (middle). StyleGAN2 formulates modulation/demodulation as operations on the weights; this study applies them to the input/output activations instead, which makes the depthwise separable modulated convolution easier to express.
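The sketch below illustrates this idea: a style vector modulates the input activations, a depthwise plus pointwise convolution replaces the full convolution, and the output activations are renormalized. The demodulation here is simplified to a per-channel standard-deviation normalization, so treat it as an illustrative approximation rather than the paper's exact operator.

```python
import torch
import torch.nn as nn

class DWSeparableModulatedConv(nn.Module):
    """Illustrative sketch of a depthwise separable modulated convolution.

    Modulation is applied to the input activations and demodulation to the
    output activations (instead of to the weights, as in StyleGAN2), which
    makes the depthwise/pointwise factorization straightforward. This is a
    simplified stand-in, not the paper's exact implementation.
    """

    def __init__(self, in_ch, out_ch, style_dim, kernel_size=3):
        super().__init__()
        self.to_scale = nn.Linear(style_dim, in_ch)   # style -> per-channel scales
        padding = kernel_size // 2
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   padding=padding, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)

    def forward(self, x, style, eps=1e-8):
        # 1) Modulate: scale each input channel by a style-dependent factor.
        s = self.to_scale(style).unsqueeze(-1).unsqueeze(-1)   # (N, in_ch, 1, 1)
        x = x * s
        # 2) Depthwise followed by pointwise convolution.
        x = self.pointwise(self.depthwise(x))
        # 3) Demodulate: normalize output activations per channel
        #    (a simplification of StyleGAN2's weight demodulation).
        return x / (x.std(dim=(2, 3), keepdim=True) + eps)

block = DWSeparableModulatedConv(in_ch=64, out_ch=64, style_dim=512)
out = block(torch.randn(2, 64, 32, 32), torch.randn(2, 512))
```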

The StyleGAN2 building block uses ConvTranspose (left in the figure below) to upsample the input feature map, whereas the MobileStyleGAN building block uses the IDWT as its upsampling function (right in the figure below). Since the IDWT contains no trainable parameters, the study adds an extra depthwise separable modulated convolution after the IDWT layer.
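Combining the two previous sketches, a MobileStyleGAN-style building block might look roughly like the following, with a parameter-free IDWT upsampling step followed by depthwise separable modulated convolutions. It reuses the haar_idwt and DWSeparableModulatedConv helpers defined above; the layer count and channel bookkeeping are assumptions, not the paper's exact block.

```python
import torch.nn as nn

class MobileSynthesisBlock(nn.Module):
    """Illustrative building block: parameter-free IDWT upsampling followed by
    depthwise separable modulated convolutions. Reuses haar_idwt and
    DWSeparableModulatedConv from the earlier sketches; structure is an
    assumption for illustration, not the paper's precise block."""

    def __init__(self, in_ch, out_ch, style_dim):
        super().__init__()
        # IDWT maps 4*k sub-band channels at H/2 x W/2 to k channels at H x W,
        # so the first convolution sees in_ch // 4 channels.
        self.conv1 = DWSeparableModulatedConv(in_ch // 4, out_ch, style_dim)
        self.conv2 = DWSeparableModulatedConv(out_ch, out_ch, style_dim)

    def forward(self, x, style):
        x = haar_idwt(x)           # upsample without trainable parameters
        x = self.conv1(x, style)   # extra conv compensates for the parameter-free IDWT
        x = self.conv2(x, style)
        return x
```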

The complete building block structures of StyleGAN2 and MobileStyleGAN are shown in the figure below:

Distillation-based training process

As in some previous work, the training framework of this study is based on knowledge distillation: StyleGAN2 serves as the teacher network, and MobileStyleGAN is trained to mimic its behavior. The training framework is shown in the figure below.
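A rough sketch of what one distillation step could look like is shown below. The combination of a pixel-level term, a perceptual term, and an adversarial term, as well as the loss weights, are illustrative assumptions rather than the paper's exact objective; perceptual and discriminator are stand-ins for, e.g., a VGG-feature distance and a GAN discriminator.

```python
import torch
import torch.nn.functional as F

def distillation_step(student, teacher, discriminator, perceptual, z,
                      w_pix=1.0, w_perc=1.0, w_adv=0.1):
    """One illustrative distillation step: the student is trained to
    reproduce the frozen teacher's output for the same latent code.
    Loss terms and weights are assumptions for illustration."""
    with torch.no_grad():
        target = teacher(z)          # teacher (StyleGAN2) image, no gradients
    pred = student(z)                # student (MobileStyleGAN) image

    loss_pix = F.l1_loss(pred, target)                    # pixel-level term
    loss_perc = perceptual(pred, target)                  # feature-space distance
    loss_adv = F.softplus(-discriminator(pred)).mean()    # non-saturating GAN term

    return w_pix * loss_pix + w_perc * loss_perc + w_adv * loss_adv
```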
