Deep learning now powers numerous AI technologies in daily life, and convolutional neural networks (CNNs) can apply complex transformations to images at high speed. At Unity, we aim to offer seamless integration of CNN inference into the 3D rendering pipeline. To that end, Unity Labs works on improving state-of-the-art research and on developing an efficient neural inference engine called Barracuda. In this post, we experiment with a challenging use case: multi-style in-game style transfer.
Deep learning has long been confined to supercomputers and offline computation, but its real-time use on consumer hardware is fast approaching thanks to ever-increasing compute capability. With Barracuda, Unity Labs hopes to accelerate its arrival in creators’ hands. While neural networks are already being used for game AI thanks to ML-Agents, there are many applications in rendering that have yet to be demonstrated in real-time game engines: for example, deep-learned supersampling, ambient occlusion, global illumination, and style transfer. We chose the latter to demonstrate the full pipeline, from training the network to integrating it into Unity’s rendering loop.
Style transfer is the process of transferring the style of one image onto the content of another. Well-known examples transfer the style of famous paintings onto real photographs. Since 2015, the quality of results has improved dramatically thanks to the use of convolutional neural networks (CNNs). More recently, the research community has made a large effort to train CNNs to perform this task in a single pass: a given image is fed to the network, which outputs a stylized version of it in less than a second (on GPU). In this work, we use a small version of such a network, trained for multi-style transfer. We then plug it into the Unity rendering pipeline so that it takes the framebuffer as input and transforms it into its stylized version in real-time.
The result is real-time style transfer in your game. Here we see the great visuals from the Book of the Dead environment, stylized by the neural network running in real-time at 30 FPS on current high-end PC hardware, with on-the-fly style switching.
To start, we chose the state-of-the-art fast style-transfer neural network from Ghiasi and colleagues. This network has two parts:
1) from a style image, it estimates a compact representation of style using a neural network, and
2) it injects this compact representation into the actual style transfer network that transforms an input image into a stylized image. This way, one can change the style image at runtime, and the style transfer adapts.
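In Ghiasi et al.'s approach, the compact style representation is injected through conditional instance normalization: each feature channel is normalized, then scaled and shifted by style-specific parameters predicted by the style network. Below is a minimal NumPy sketch of that mechanism; the function name and toy shapes are illustrative, not the actual implementation.

```python
import numpy as np

def conditional_instance_norm(x, gamma, beta, eps=1e-5):
    """Normalize each channel of a feature map x (C, H, W) to zero mean
    and unit variance, then scale/shift with style-specific gamma and
    beta (each of shape (C,)). Swapping (gamma, beta) switches styles
    without retraining the transfer network."""
    mean = x.mean(axis=(1, 2), keepdims=True)
    std = x.std(axis=(1, 2), keepdims=True)
    x_norm = (x - mean) / (std + eps)
    return gamma[:, None, None] * x_norm + beta[:, None, None]

# Toy usage: one 3-channel feature map, two hypothetical "styles".
feat = np.random.randn(3, 8, 8)
style_a = conditional_instance_norm(feat, np.ones(3), np.zeros(3))
style_b = conditional_instance_norm(feat, 2.0 * np.ones(3), 0.5 * np.ones(3))
```

Because only `gamma` and `beta` change per style, switching the style image at runtime is cheap: the transfer network's convolution weights stay fixed.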
Our style transfer network is composed of two downsampling layers, five residual blocks, and two symmetric upsampling layers.
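The downsample/residual/upsample structure keeps the output at the input resolution. The sketch below traces spatial sizes through such a network, assuming typical fast-style-transfer settings (3x3 convolutions, stride-2 downsampling, nearest-neighbor 2x upsampling); the post does not specify these hyperparameters, so they are assumptions.

```python
def conv_out(size, kernel, stride, pad):
    """Standard output-size formula for a convolution layer."""
    return (size + 2 * pad - kernel) // stride + 1

s = 256                       # hypothetical input resolution
for _ in range(2):            # two stride-2 downsampling convs (3x3, pad 1)
    s = conv_out(s, 3, 2, 1)
for _ in range(5):            # five residual blocks preserve spatial size
    s = conv_out(s, 3, 1, 1)
for _ in range(2):            # two 2x upsampling layers mirror the downsampling
    s = s * 2
print(s)                      # prints 256: output matches input resolution
```

Running the residual blocks at quarter resolution (64x64 here) is what makes this architecture cheap enough for real-time inference.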
Once the architecture is chosen, we first pre-train the full network offline (once trained, it will be used at runtime). To this end, we use…