FPGA Programming Tutorial

Within the computational sphere, FPGAs and programmable SoCs are a class of devices that stand apart from CPU, MCU, DSP and GPU devices. FPGAs provide developers with implementations that offer increased throughput, lower latency and increased determinism.

Both FPGAs and programmable SoCs contain programmable logic cells such as lookup tables, registers and block RAMs. When a developer creates a programmable logic solution, it is the configuration and connections of these logic cells that the developer is defining.
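
To make the idea of configuring a cell rather than instructing it more concrete, the short Python sketch below models a 4-input lookup table as a 16-bit configuration word. The class name Lut4 and the init values are illustrative assumptions, not any vendor's primitive or bitstream format.

```python
class Lut4:
    """Toy model of a 4-input lookup table (hypothetical, not a vendor primitive)."""

    def __init__(self, init: int):
        # 'init' is the 16-bit configuration word: bit i holds the output
        # for the input combination whose binary value is i.
        if not 0 <= init < (1 << 16):
            raise ValueError("init must fit in 16 bits")
        self.init = init

    def evaluate(self, a: int, b: int, c: int, d: int) -> int:
        # The inputs form an address into the truth table; 'a' is the LSB.
        address = (d << 3) | (c << 2) | (b << 1) | a
        return (self.init >> address) & 1


# The same cell becomes a different function purely by changing its configuration.
and4 = Lut4(0x8000)  # only address 15 (all inputs high) returns 1
xor4 = Lut4(0x6996)  # 1 whenever an odd number of inputs are high

print(and4.evaluate(1, 1, 1, 1))  # 1
print(xor4.evaluate(1, 0, 0, 0))  # 1
```

Nothing in the model executes instructions; changing the configuration word changes the function the cell implements.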

Because the developer is describing the configuration of logic cells rather than a sequence of instructions for execution, the development flow is significantly different. This difference also introduces new challenges for developers, including:

- Verification — How does the designer verify and debug the functionality?
- Resource allocation — Does the device contain enough logic resources to implement the design?
- Timing closure — Can the utilized logic cells be connected as needed and still achieve the desired operating frequency?
- Power dissipation — Is the power dissipation of the final design acceptable for the power supplies and thermal environment?
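
The resource allocation question in the list above comes down to simple arithmetic, which can be sketched in Python as follows. The device capacities and design requirements are hypothetical figures chosen for illustration; in practice they would come from the device data sheet and the tool's utilization report.

```python
def utilization(required: dict, available: dict) -> dict:
    """Return the fraction of each resource type the design would consume."""
    return {name: required[name] / available[name] for name in required}


def fits(required: dict, available: dict) -> bool:
    """True if every resource requirement is within the device capacity."""
    return all(required[name] <= available[name] for name in required)


# Hypothetical figures, for illustration only.
device = {"luts": 20000, "registers": 40000, "block_rams": 50}
design = {"luts": 15000, "registers": 18000, "block_rams": 60}

for name, fraction in utilization(design, device).items():
    print(f"{name}: {fraction:.0%}")
print("fits:", fits(design, device))
```

In this example the design fits comfortably in terms of lookup tables and registers but needs more block RAMs than the device provides, so it would not fit as it stands.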

Using Hardware Description Language (HDL) for Design Capture

Since the introduction of these devices in 1984, the way that programmable logic designs are captured has changed significantly. Design entry has evolved from defining logic equations for each logic cell to schematically capturing logic circuits, then to the use of hardware description languages and, more recently, high-level synthesis.

Of course, one of the key driving factors behind the increasing level of abstraction used to program devices has been the increasing capacity and capability of the devices.

Most modern designs are captured using a hardware description language (HDL) such as Verilog or VHDL. Both languages allow the developer to describe the desired functionality at the register transfer level (RTL). Defining an RTL design means that the developer is describing a synchronous logic design and the transfer of information between registers, for example, state machines and counters.
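
The register transfer idea can be illustrated with a small behavioral model. The Python sketch below is not HDL and would not be synthesized; it only mimics how a synchronous counter separates the combinational next-state logic from the register that captures the result on each clock edge. The class and method names are assumptions made for this example.

```python
class Counter:
    """Toy RTL-style model of a synchronous counter with enable and reset."""

    def __init__(self, width: int = 4):
        self.mask = (1 << width) - 1
        self.count = 0  # the register: holds its value between clock edges

    def next_state(self, reset: bool, enable: bool) -> int:
        # Combinational logic: computes the next value from the current
        # register contents and the inputs, without changing the register.
        if reset:
            return 0
        if enable:
            return (self.count + 1) & self.mask  # wraps at 2**width
        return self.count

    def clock_edge(self, reset: bool, enable: bool) -> None:
        # Register transfer: the next value is captured on the clock edge.
        self.count = self.next_state(reset, enable)


counter = Counter(width=4)
counter.clock_edge(reset=True, enable=False)
for _ in range(17):
    counter.clock_edge(reset=False, enable=True)
print(counter.count)  # 1, because 17 counts wrap past 15 back through 0
```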

While HDLs contain the constructs that would be expected of a programming language (e.g., working with files), only a limited, synthesizable subset of the language can be used to create a programmable logic solution. The remaining constructs are used during the verification of the design.

Increasingly, however, developers of programmable logic solutions are using higher-level languages such as C, C++, OpenCL and MATLAB/Simulink for design capture. These languages are used in conjunction with a high-level synthesis tool, for example Vivado HLS (part of the Vivado HLx editions) or Intel’s HLS Compiler, which is responsible for converting the high-level language into a VHDL or Verilog description. This HDL description is then implemented using the standard FPGA development flow.

As with any other development, regardless of whether an HDL or a high-level language is used, it is good practice to take a modular approach, which makes the design easier to understand and reuse.

Test Benching the FPGA Module

As each HDL or HLS module is developed, it will require testing to ensure that it functions as expected. This is where the test bench comes in. The test bench will apply stimulus to the module’s inputs and monitor its outputs. In advanced cases, it will also check and report upon the behavior of the output signals. Because the simulator models the transactions for every register in the design, logic simulations can run much more slowly than the real hardware.

While conceptually like a software test harness, the test bench can require more detailed interaction because each signal and bus must be correctly driven and timed to stimulate the module. It is within the test bench that the wider constructs of the language are used as stimulus settings are read or results are logged to text files.

One of the key elements of the simulation is the test for corner cases and boundary conditions, which can lead to the module not functioning as intended.
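
The structure of a self-checking test bench, including its corner cases, can be sketched in Python. This is an analogy rather than an HDL test bench: the unit under test here is a hypothetical 8-bit saturating adder written as a plain function, and the stimulus vectors are assumptions chosen to exercise its boundaries.

```python
def saturating_add(a: int, b: int, width: int = 8) -> int:
    """Unit under test: unsigned adder that clamps at the maximum value."""
    limit = (1 << width) - 1
    total = a + b
    return limit if total > limit else total


def run_test_bench(dut, vectors):
    """Apply each stimulus vector to the unit and compare with the expected value."""
    log = []
    failures = 0
    for a, b, expected in vectors:
        actual = dut(a, b)
        status = "PASS" if actual == expected else "FAIL"
        if status == "FAIL":
            failures += 1
        log.append(f"{status} a={a} b={b} expected={expected} actual={actual}")
    return failures, log


# Stimulus deliberately includes boundary conditions, not just typical values.
vectors = [
    (1, 2, 3),        # typical case
    (0, 0, 0),        # lower boundary
    (255, 0, 255),    # upper boundary, no overflow
    (254, 1, 255),    # lands exactly on the limit
    (255, 1, 255),    # one past the limit: must saturate
    (255, 255, 255),  # maximum overflow
]

failures, log = run_test_bench(saturating_add, vectors)
print("\n".join(log))
print("failures:", failures)
```

A real HDL test bench would additionally have to drive clocks and respect signal timing, which this sketch leaves out.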

To apply the test bench to the unit under test, an HDL simulator like Vivado Simulator (which is supplied with Vivado HLx) is required as this allows developers to simulate the logic design.

It is common practice to simulate the design prior to implementation within the device. However, this means that the results of the simulation do not consider the timing delays that occur in the implemented device (e.g., setup and hold times). As such, it simulates the functional performance only. While simulations can be back-annotated with this information once the design has been implemented, doing so increases simulation runtime significantly.

Implementing the Module for FPGA Use

Once the developer is satisfied with the functional performance, the final stage of the development is implementation.

Implementation can be split into four distinct stages: synthesis, placing, routing and programming file generation. While the implementation consists of multiple stages, they are all performed using a proprietary tool supplied by the selected device vendor, e.g., Intel Quartus or Xilinx Vivado.

Synthesis takes the HDL files and synthesizes them into a description of the logic circuits to be implemented. As such, synthesis determines the settings of the configurable logic cells, registers, block RAMs and other dedicated logic resources available within the targeted device. It is during the synthesis stage that most logic optimization takes place, including the trimming of unused signals and variables. This can result in unwanted optimizations or synthesis decisions, so the developer can control synthesis options, strategies and optimizations using synthesis constraints. Constraints are text-based and guide the synthesis tool during its operation.
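
The trimming of unused signals can be pictured as a backwards walk from the design's outputs. The Python sketch below is a deliberately simplified assumption about how such trimming works, using a made-up netlist representation; real synthesis tools use far more elaborate data structures and optimizations.

```python
def trim_unused(netlist: dict, outputs: list) -> dict:
    """Keep only the signals that contribute to at least one output.

    'netlist' maps each signal name to the list of signals that drive it.
    """
    needed = set()
    pending = list(outputs)
    while pending:
        signal = pending.pop()
        if signal in needed:
            continue
        needed.add(signal)
        # Walk backwards through whatever drives this signal.
        pending.extend(netlist.get(signal, []))
    return {name: drivers for name, drivers in netlist.items() if name in needed}


# Hypothetical design: 'debug_count' is computed but never reaches an output.
netlist = {
    "sum": ["a", "b"],
    "result": ["sum", "carry_in"],
    "debug_count": ["sum"],
}

print(sorted(trim_unused(netlist, outputs=["result"])))  # ['result', 'sum']
```

The example also shows why an optimization can be unwanted: a signal kept only for debugging disappears unless the tool is told to preserve it.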

The output from synthesis is a netlist, which describes the logical behavior of the design. The next stage of implementation is to physically place each logic function within the device. Generally, the placer will use built-in algorithms to decide where it places the logic cells within the device. However, if desired, the user can also examine and move logic cells via the use of placement constraints. This is very useful when trying to achieve timing closure on the design.
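
One rough way to see why placement matters for timing is to estimate the wire length between connected cells. The Python sketch below uses Manhattan distance on an imaginary grid; the cell names and coordinates are hypothetical, and real placers optimize many more factors than distance.

```python
def wire_length(placement: dict, connections: list) -> int:
    """Total Manhattan distance between connected cells on a grid."""
    total = 0
    for source, sink in connections:
        (x1, y1), (x2, y2) = placement[source], placement[sink]
        total += abs(x1 - x2) + abs(y1 - y2)
    return total


connections = [("reg_a", "adder"), ("adder", "reg_b")]

# Hypothetical grid coordinates; real devices use vendor-specific site names.
scattered = {"reg_a": (0, 0), "adder": (10, 8), "reg_b": (1, 1)}
constrained = {"reg_a": (0, 0), "adder": (1, 0), "reg_b": (1, 1)}

print(wire_length(scattered, connections))    # 34
print(wire_length(constrained, connections))  # 2
```

Shorter connections generally mean less routing delay, which is the effect a placement constraint on timing-critical cells is trying to achieve.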

The penultimate stage of implementation is performed once the logical functions have been placed. These placed resources must be connected as defined by the design using the routing resources available in the device. This process is called “routing,” and it is here that the desired timing performance is used by the routing algorithms to try to achieve the desired operating frequency. Achieving the desired operating frequency is called “timing closure,” meaning that each register and clock element in the design achieves the required setup and hold time.
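
The setup side of timing closure reduces to a slack calculation for each register-to-register path. The Python sketch below shows a simplified version that assumes no clock skew or clock uncertainty, which real timing analysis does include; the delay values are hypothetical.

```python
def setup_slack_ns(clock_mhz, clk_to_q_ns, logic_ns, routing_ns, setup_ns):
    """Simplified setup slack for one register-to-register path.

    Assumes no clock skew or uncertainty, which real tools do account for.
    """
    period_ns = 1000.0 / clock_mhz
    data_arrival_ns = clk_to_q_ns + logic_ns + routing_ns
    return period_ns - setup_ns - data_arrival_ns


# Hypothetical path delays in nanoseconds, for illustration only.
path = dict(clk_to_q_ns=0.5, logic_ns=2.0, routing_ns=3.0, setup_ns=0.25)

for mhz in (100, 200):
    slack = setup_slack_ns(mhz, **path)
    verdict = "met" if slack >= 0 else "violated"
    print(f"{mhz} MHz: slack {slack:+.2f} ns, timing {verdict}")
```

With these numbers the path has 4.25 ns of positive slack at 100 MHz but misses by 0.75 ns at 200 MHz, so the logic or routing delay would have to be reduced to close timing at the higher frequency.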

Should timing closure not be achieved, there are several approaches that can be used, from selecting a different implementation strategy to updating the placement constraints to bring timing-critical blocks closer together, as well as updating the HDL design so that synthesis produces a more optimal logic structure.

The final stage of the implementation process is the generation of a programming file, which can be used to configure the target device. Once this is completed, we are ready to download it to our device and begin the fun that is integration with the wider system.

Of course, integration can bring about its own challenges for both the FPGA developer and system integrator.


The FPGA development process is certainly different from that used to create a more traditional computational solution. The learning curve for the languages and toolchains (especially with HLS), however, is not as steep as it may first appear. With enough time spent studying these processes, developers can begin implementing FPGA-based solutions that offer increased throughput, lower latency and increased determinism.