Accelerating Machine Learning with Field-Programmable Gate Arrays

In this article, BittWare, a Molex Company, explores a recent test of its technology stack for machine learning (ML) and field-programmable gate arrays (FPGAs) in a real-world traffic monitoring system using the BittWare 520N accelerator card. Read on to learn about the advantages FPGAs offer over a central processing unit (CPU) or graphics processing unit (GPU), and visit Arrow today to discover how you can try out a BittWare FPGA PCIe board through the Arrow Test Drive Program.

Machine learning (ML), a particular implementation of artificial intelligence (AI) technology, and field-programmable gate arrays (FPGAs) are enjoying a significant increase in demand due to their mutually beneficial characteristics. The demand is a result of the technical and business forces driving AI into “edge” devices -- that is, embedded electronic devices that don’t rely on a high-bandwidth internet connection or access to high-end, cloud-based computing hardware.

A key benefit of running ML inference engines on FPGAs locally, where the sensors are, is that this topology offers extremely low latency. The ability to act quickly on locally processed data (versus sending data up to the cloud to be crunched and waiting for the results) is crucial in applications where milliseconds can make a difference, such as collision detection for autonomous automobiles. Other application areas where it makes sense to push the computation as close as physically possible to the end user include aerial drones, factory robots and building automation.

The BittWare Advantage

BittWare 520N FPGA Accelerator Board

Molex subsidiary BittWare is constantly developing solutions that uniquely position its OEM customers for success by helping reduce the risks associated with technology maturity and minimizing time to revenue. Recently, BittWare had the opportunity to explore its ML/FPGA technology stack in a real-world application involving a traffic monitoring system. The research utilized the BittWare 520N accelerator card, which is built around an Intel Stratix 10 FPGA.

So why choose an FPGA over a central processing unit (CPU) or graphics processing unit (GPU)? First, the cost of FPGAs is steadily decreasing as demand ramps up. FPGAs were historically treated as mere proving grounds before an OEM committed a proprietary microchip’s intellectual property (IP) to silicon in the form of an application-specific integrated circuit (ASIC). However, FPGAs are increasingly seen as important for other use cases, chief among them ML applications. The reconfigurable nature of an FPGA is a plus because ML software libraries are still nascent and, as such, are constantly and rapidly evolving; being able to take advantage of these continual improvements in hardware over time is very appealing to designers and end users alike. FPGAs are also computationally fast once programmed. In addition, the development tools for programming FPGAs are becoming more accessible, much as microcontroller tools have over the past decade or so.

Implementing Software in Hardware

YOLOv3 Network Not Only Identifies Objects, It Also Places Bounding Boxes Around Objects, Which Is Useful for Applications That Require Objects to Be Tracked

A subset of ML technology is neural networks, which loosely mimic how the neurons in a biological brain operate. BittWare utilized the state-of-the-art real-time object detection system YOLOv3 (short for “you only look once”), which is part of a C-based open-source neural network framework. To implement the YOLOv3 code on the FPGA that resides at the heart of the BittWare 520N accelerator card, the team utilized the OpenCL (short for open computing language) framework. The advantages of using OpenCL over implementing YOLOv3 with a hardware description language (HDL) such as Verilog are threefold. First, it is much faster to get to a functional prototype since developers can utilize a more familiar software-like development toolchain and workflow. Second, it is much quicker to move the OpenCL application to different FPGAs as compared to the time it would otherwise take to target ML libraries to multiple hardware platforms using HDL. Last, iterative design is much easier as newer versions of ML libraries are released, letting developers stay on the cutting edge during the product development life cycle.
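
To make the OpenCL workflow concrete, below is a minimal, illustrative OpenCL C kernel for a single 2-D convolution over one feature map. It is not BittWare's production code; the kernel name, argument layout and fixed 3x3 filter size are assumptions made purely for the sketch. The point is that an FPGA OpenCL compiler (such as the Intel FPGA SDK for OpenCL) synthesizes this C-like source into a hardware pipeline, which is why iterating on it is faster than rewriting the design in an HDL.

```c
/* Illustrative OpenCL C kernel: one 2-D convolution over a single input
 * plane with a 3x3 filter. Kernel name and argument layout are hypothetical,
 * not BittWare's implementation. One work-item computes one output pixel. */
__kernel void conv2d(__global const float *input,   /* height x width input plane      */
                     __global const float *weights, /* 3x3 filter (kernel) coefficients */
                     __global float *output,        /* (height-2) x (width-2) feature map */
                     const int width,
                     const int height)
{
    int x = get_global_id(0);
    int y = get_global_id(1);
    if (x >= width - 2 || y >= height - 2)
        return;

    /* Multiply-accumulate the 3x3 neighborhood against the filter. */
    float acc = 0.0f;
    for (int ky = 0; ky < 3; ++ky)
        for (int kx = 0; kx < 3; ++kx)
            acc += input[(y + ky) * width + (x + kx)] * weights[ky * 3 + kx];

    output[y * (width - 2) + x] = acc;
}
```

Because the same source can be recompiled for a different FPGA target, moving to new silicon or to an updated ML library version is largely a recompile-and-verify exercise rather than a hardware redesign.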

Optimizing Neural Networks for FPGAs

Performance Graph Showing the BWNN Acceleration for the 520N FPGA Card Compared to an Earlier FPGA Card and a CPU

All that said, FPGAs are not a silver bullet solution. Great care must still be taken during the design and engineering phases to ensure that neural networks are optimized for different data sets. Striking an acceptable balance between detection accuracy and power consumption is also a major concern. Power consumption is a crucial consideration for edge devices that may be required to operate in low-power environments.

One way that BittWare achieves this balance is through a variant of the convolutional neural network (CNN) known as a binary weighted neural network (BWNN). Think of one convolution stage of a CNN as a 2-dimensional array filled with 32-bit numbers. (As a quick aside, convolution is a mathematical term for a specific set of operations performed on two functions that produces a third function.) These 32-bit coefficients, collectively referred to as a filter or kernel, form one half of the convolution operation; the sensor inputs form the second function, and the result is a third function called a feature map. What is very interesting from a practical perspective is that research has shown that using 16-bit coefficients has no significant effect on the accuracy of the network’s output while consuming half the storage and memory bandwidth within the FPGA. A BWNN goes even further and reduces the weight filters to single-bit values. This also eliminates the need for costly multiplication operations and reduces the math to simple addition and subtraction (a minimal code sketch of this idea follows the list below). These improvements translate into higher speed and improved power efficiency. Further performance gains depend on additional factors, including:

  • Device speed grade
  • Depth of combinatorial logic in a design
  • Fanout of a design (the number of inputs that a single signal must drive)
  • Routing congestion caused by overpopulating the device
  • Global memory bandwidth
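
As a rough illustration of why binary weights remove the multiplications, the C sketch below packs +1/-1 weights one bit per weight and accumulates a dot product by adding or subtracting each activation. The function name and data layout are hypothetical and are not taken from BittWare's BWNN implementation; they simply show the add/subtract reduction described above.

```c
#include <stdint.h>

/* Illustrative sketch of the binary-weight idea: each weight is one bit
 * (1 -> +1, 0 -> -1), so the multiply in a dot product collapses to an
 * add or a subtract. packed_weights must hold at least ceil(n/32) words. */
float bwnn_dot(const float *activations, const uint32_t *packed_weights, int n)
{
    float acc = 0.0f;
    for (int i = 0; i < n; ++i) {
        /* Bit i of the packed array selects the sign of activation i. */
        uint32_t bit = (packed_weights[i / 32] >> (i % 32)) & 1u;
        acc += bit ? activations[i] : -activations[i];
    }
    return acc;
}
```

Packing 32 weights into a single word also hints at why this maps well onto an FPGA: the sign selections are trivial logic that can be evaluated in parallel, and the weight storage shrinks by a factor of 32 compared with 32-bit coefficients.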

Are you considering integrating FPGAs as part of your ML-based solution? Do you have questions about how to proceed? BittWare offers a deeper look at these technologies in a whitepaper titled “FPGA Acceleration of Binary Weighted Neural Network Inference.”

Ready to try out BittWare’s FPGA PCIe board? Borrow it now through the Arrow Test Drive program.
