FPGAs in Neural Networks

Artificial intelligence (AI) is widely seen as the future of computing, with a large amount of research aimed at creating useful and dependable AI. Part of that development is deep learning, a branch of machine learning that uses algorithms to model high-level abstractions in data. Currently, the large systems that run deep learning algorithms are built around GPUs as their central processing source. GPUs have dominated mass computing because Moore's Law has held up over the past several decades; however, GPU advancement is no longer keeping pace with the demands of AI software algorithms. These GPU-based systems also consume large amounts of energy. So how can computing power be increased, while power consumption is reduced, when executing these deep learning algorithms?

Doug Burger, Director of Client and Cloud Applications at Microsoft, and his team have taken an interest in solving this dilemma. Microsoft has deployed large global data centers that carry out extremely important workloads through the use of Convolutional Neural Networks (CNNs). CNNs are a type of machine learning model that analyzes an image to learn features that help the computer identify patterns in it. To give a better understanding of what CNNs are, Dr. Burger explains that, “the algorithms apply a set of filters onto a ‘space’ of an input image. While centering around a set of pixels on the input image, a function is used to analyze the pixel set, gather a result, and slide the parameters of the set one pixel over.” He further explains that each repetition of the process produces a matrix smaller than the original. The process is executed across many functions, resulting in a three-dimensional matrix of learned values. Dr. Burger indicates that his team then applies a set of weights to the original image based on this prior processing and uses it to extract features from the images. The results of the CNN can then be fed into a fully connected layer, or deep network, where the system can learn higher-level concepts.
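The filter-sliding step Dr. Burger describes can be sketched in a few lines of Python. This is a generic "valid"-mode 2-D convolution, not Microsoft's implementation; the toy image and filter values are illustrative. Note how the output matrix comes out smaller than the input, as he describes:

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide a filter across the image one pixel at a time: at each
    position, multiply the filter against the pixel neighborhood and
    sum the result. The output shrinks relative to the input."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.arange(36, dtype=float).reshape(6, 6)  # toy 6x6 "image"
edge_filter = np.array([[1.0, 0.0, -1.0]] * 3)    # toy 3x3 filter
feature_map = convolve2d_valid(image, edge_filter)
print(feature_map.shape)  # (4, 4): smaller than the 6x6 input
```

Running many such filters over the same input and stacking the results is what produces the three-dimensional matrix of learned values the article mentions.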

This process of using CNNs for deep learning is hugely computationally intensive. Microsoft’s data centers were having trouble deploying GPUs at large scale as the sole computing source, due to their limited applicability and high power consumption. It was also discovered, prior to testing CNNs, that the servers were inefficient at processing Bing’s search rankings. In 2014, Microsoft’s Catapult project addressed this hurdle by augmenting the CPUs with Altera Stratix V FPGAs to process the Bing search ranking algorithms. The results showed a performance increase of nearly 2x, so the team decided to leverage this infrastructure to test the CNN accelerator.

The FPGA accelerator was developed to efficiently compute the forward propagation of convolutional layers, so the CNN accelerator must be able to accept an input image and process multiple convolutional layers in succession. The design must account for a few factors. Because the system processes multiple layers, its engine must be configurable enough to support each of them. Memory management is crucial, so the design must include an efficient data buffering scheme and an on-chip re-distribution network. Finally, the design must support a spatially distributed array of processing elements that can scale easily to thousands of units. Together, these let the CNN accelerator take an input image and analyze numerous convolutional layers in succession. How the system processes those layers depends heavily on the hardware used, and FPGAs have become the clear choice in the push for greater processing efficiency.
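The design factors above can be summed up in a minimal sketch: one compute engine that is re-configured per layer, with the output buffer of one layer re-used on chip as the input of the next. All of the names and shapes here are hypothetical stand-ins, not Microsoft's actual design:

```python
# Hypothetical sketch: a runtime-configurable engine processes the
# network's layers in succession, and intermediate results stay in
# on-chip buffers rather than round-tripping to external memory.

class LayerConfig:
    def __init__(self, num_filters, kernel_size):
        self.num_filters = num_filters
        self.kernel_size = kernel_size

class ConvEngine:
    """Stand-in for the spatially distributed array of processing
    elements; here it only tracks how the data shape evolves."""
    def configure(self, cfg):
        self.cfg = cfg  # load this layer's filter dimensions/weights

    def run(self, buf):
        # A "valid" convolution shrinks each side by (kernel_size - 1).
        h, w = buf
        k = self.cfg.kernel_size
        return (h - k + 1, w - k + 1)

def run_network(input_shape, layers):
    engine = ConvEngine()
    buf = input_shape          # on-chip input buffer
    for cfg in layers:         # process the layers in succession
        engine.configure(cfg)  # one engine, re-configured per layer
        buf = engine.run(buf)  # output buffer becomes the next input
    return buf

final = run_network((224, 224), [LayerConfig(64, 11), LayerConfig(128, 5)])
print(final)  # (210, 210)
```

The key design choice mirrored here is that a single configurable engine serves every layer, rather than dedicating separate hardware to each one.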

Individually, the FPGA units underperform GPUs. The FPGAs used in the CNN accelerator design are Altera Arria 10 FPGAs. On ImageNet 1K, a single Altera Arria 10 FPGA processes 233 images/sec while drawing around 25 W. By comparison, the NVIDIA Tesla K40 GPU can process 500 to 824 images/sec, but it consumes 235 W to sustain that rate. So the FPGA is slower, but it also consumes considerably less power. The kicker is that FPGAs are stackable: as few as three FPGAs connected together match the Tesla K40’s processing throughput while drawing roughly a third of its power, about 75 W versus 235 W.
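The performance-per-watt trade-off falls straight out of the figures quoted above (using the K40's upper-bound throughput):

```python
# Arithmetic on the article's quoted figures.
fpga_rate, fpga_power = 233, 25   # Arria 10: images/sec, watts
gpu_rate, gpu_power = 824, 235    # Tesla K40 upper bound: images/sec, watts

fpga_eff = fpga_rate / fpga_power
gpu_eff = gpu_rate / gpu_power
print(fpga_eff)  # ~9.3 images/sec per watt
print(gpu_eff)   # ~3.5 images/sec per watt

# Three FPGAs land inside the K40's 500-824 images/sec range...
print(3 * fpga_rate)              # 699 images/sec
# ...while drawing roughly a third of the K40's power.
print(3 * fpga_power, gpu_power)  # 75 W vs. 235 W
```

Per watt, the FPGA delivers more than twice the throughput of the GPU even at the GPU's best quoted rate.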

When weighing performance against power, the Arria 10 FPGA delivers up to 40 GFLOPS-per-watt. The Arria 10 is programmed with OpenCL, a high-level parallel programming language that serves as an alternative to hardware description languages such as VHDL, targeting its IEEE 754 hard floating-point digital signal processing (DSP) blocks. According to Michael Strickland, the director of the Compute and Storage Business Unit at Altera, the Arria 10 has a flexible data path that bypasses external memory and allows OpenCL kernels to pass data directly to each other. In addition to the flexible data path, the Arria 10 supports hard floating point for both multiplication and addition, which lets the FPGA fit more logic and run at a faster clock speed. With these improvements in hardware and software functionality, the Arria 10 can stack up against and outperform current GPU platforms. In the tests Microsoft performed using the array of Arria 10s to compute CNN functions, the team observed considerable performance gains: running the software on the FPGA setup versus a GPU setup yielded a 30x to 40x speed increase.
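The kernel-to-kernel data path Strickland describes, where one kernel's results stream straight into the next without a round trip through external memory, can be loosely imitated in software with chained Python generators. This is only an analogy for the dataflow, not Altera's OpenCL channel API:

```python
# Software analogy for OpenCL kernels chained through on-chip channels:
# each stage consumes the previous stage's stream element by element,
# so no intermediate buffer is ever materialized in "external memory".

def scale(stream, factor):   # first "kernel"
    for x in stream:
        yield x * factor

def offset(stream, bias):    # second "kernel", fed directly by the first
    for x in stream:
        yield x + bias

pixels = range(5)
pipeline = offset(scale(pixels, 2.0), 1.0)
result = list(pipeline)
print(result)  # [1.0, 3.0, 5.0, 7.0, 9.0]
```

On the FPGA, the equivalent wiring is fixed in hardware, which is what makes skipping external memory a bandwidth and power win.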

According to Dr. Burger, “When the whole system is tuned in using FPGAs, there was about a 2x throughput improvement. We could run with half the number of servers with the same load.” Arria 10 FPGAs are built to be programmable while still delivering strong performance with better efficiency. The ability to connect multiple FPGAs to each other on a single server allows the computing power to scale beyond current GPUs. By increasing processing power while decreasing power consumption, it is possible for Microsoft to advance its current data centers to handle the demands of deep learning.

Pursuing faster and more efficient resources enables the development of advanced applications. Deep learning and other computationally intensive applications are at the forefront of research. Because GPUs have historically kept up with demand, they have been used consistently in the data center environment; but with the advances in FPGA performance, data centers can now expand to meet researchers’ needs. This allows the team to run deep learning workloads by processing multiple layers of convolutional neural networks in succession. With the success of FPGAs on heavy computational applications, the door opens to future possibilities. Dr. Burger details that, “this is the first time that software is mixing with programmable hardware in data centers. Lots of interesting opportunities arise, including how to balance which applications, such as in CNNs and deep learning, should bounce between the two computing paradigms.” Modeling machine learning at scale is becoming more attainable, and based on the Microsoft team’s research, FPGAs can be a trailblazer in the future of computing.
