kirkr.xyz

Field-Programmable Gate Arrays (FPGAs)

Designers have long navigated the trade-off between the flexibility of software-driven CPUs and the raw performance of custom-designed silicon (ASICs). Field-Programmable Gate Arrays (FPGAs) represent a third paradigm: a "sea" of gates on a chip that can be rewired after manufacturing to become any digital circuit possible. This unique reconfigurability allows FPGAs to deliver hardware-level performance with software-like flexibility. This post explores the architecture that makes this possible, the design flow used to harness their power, and the real-world applications where they are used, like prototyping the next generation of chips and accelerating workloads in cloud data centres.

1. The Architecture of Reconfigurability

At its heart, an FPGA is a blank canvas for digital logic. Unlike a CPU that fetches and executes a linear sequence of instructions, an FPGA implements a circuit spatially. The design is not compiled; it is synthesised onto a physical fabric of programmable components. This fabric is primarily composed of three elements:

Configurable Logic Blocks (CLBs): These are the fundamental building blocks. Each CLB contains a few small, reconfigurable memory units called Look-Up Tables (LUTs) and storage elements called Flip-Flops. A 6-input LUT, for instance, is a tiny SRAM that can be programmed to implement any boolean logic function of up to six variables. Flip-flops provide memory, allowing the creation of registers and stateful, synchronous circuits.
Programmable Interconnect: A vast, flexible network of wires and programmable switches that route signals between the thousands of CLBs, I/O pins, and other resources on the chip. The efficiency of this routing fabric is a major determinant of the FPGA's final performance.
Input/Output (I/O) Blocks: These blocks sit at the perimeter of the chip, connecting the internal logic to the outside world. They are highly configurable to support a wide range of signaling standards (like LVDS, HDMI, or DDR memory interfaces).

The mental model for an FPGA designer is not writing code that runs sequentially, but describing a hardware structure that operates in parallel. For example, this snippet of Verilog, a Hardware Description Language (HDL), describes a simple 8-bit counter:

module simple_counter (
    input clk,
    input reset,
    output reg [7:0] count_out
);

    always @(posedge clk) begin
        if (reset) begin
            count_out <= 8'd0;
        end else begin
            count_out <= count_out + 1;
        end
    end

endmodule

When synthesised, this becomes a physical circuit: eight flip-flops to form the count_out register, a collection of LUTs configured to work as an 8-bit adder, and routing to connect them all together. Every clock cycle, the entire circuit updates in parallel.

2. The Modern FPGA SoC

Modern FPGAs are more than a grid of CLBs. To improve performance and efficiency for common tasks, they integrate hardened, dedicated silicon circuits known as hard blocks. Building a function in dedicated silicon is orders of magnitude more efficient in terms of power, area, and speed than constructing it from generic logic.

The Power of Hard Blocks: Implementing a 32x32 multiplier using LUTs might consume hundreds of logic cells and run at a few hundred MHz. A dedicated DSP hard block on the same chip can perform the same function faster, at lower power, and using zero general-purpose logic resources.

Common hard blocks found in today's FPGAs include:

Block RAM (BRAM): Dedicated on-chip memory blocks for efficient data storage.
DSP Slices: Digital Signal Processing units containing hardware multipliers and accumulators, essential for filtering, transforms, and AI inference.
High-Speed Transceivers: Complex SerDes (Serialiser/Deserialiser) blocks for implementing multi-gigabit protocols like PCI Express and 100G Ethernet.
Hard Processor Systems: Many FPGAs are now full-fledged Systems-on-Chips (SoCs), integrating one or more hardened Arm processor cores (e.g., Cortex-A53 or Cortex-R5). This allows developers to run a full operating system like Linux for control and management tasks, while using the programmable fabric for custom, high-performance hardware accelerators.

3. The Ecosystem and Real-World Impact

A powerful piece of silicon is only useful if developers have the tools and ecosystem to target it. The FPGA ecosystem has matured dramatically.

Design Tools: The industry is dominated by sophisticated EDA (Electronic Design Automation) toolchains from the device vendors, namely AMD (Xilinx) Vivado and Intel (Altera) Quartus Prime. These tools handle the complex process of synthesis, placement (assigning logic to specific CLBs), and routing (connecting everything through the interconnect). An open-source toolchain (Yosys, nextpnr) is also gaining significant traction for smaller devices.
High-Level Synthesis (HLS): To bridge the gap between software and hardware design, HLS tools allow developers to write algorithms in C, C++, or OpenCL. The HLS compiler analyses the code, infers parallelism, and synthesises it into a hardware implementation (RTL), making FPGA acceleration accessible to a much broader audience.

This powerful combination of hardware and software has unlocked FPGAs for a wide array of applications:

ASIC Prototyping: The original killer app. An entire ASIC design can be implemented on a large FPGA to be tested and validated with real software months before the final chip is fabricated.
Data Centre Acceleration: Uses cases like Microsoft's Project Catapult and Amazon deploy FPGAs in their data centres (e.g., AWS F1 instances) to accelerate networking, database queries, and machine learning workloads. Their reconfigurability is a key advantage, allowing new hardware accelerators to be deployed with a simple bitstream update.
High-Frequency Trading (HFT): Where every nanosecond counts, FPGAs are used to implement ultra-low-latency network stacks and trading algorithms directly in hardware.
Aerospace & Defense: FPGAs are staples in radar, software-defined radio, and avionics, where their performance and ability to be reconfigured in the field are critical.

4. The FPGA vs. ASIC Trade-off

The ultimate question for many product designers is when to use an FPGA versus committing to a full-custom ASIC. The decision hinges on a few trade-offs:

Cost & Time-to-Market: FPGAs have no non-recurring engineering (NRE) costs; you simply buy the chips off the shelf. An ASIC design requires millions of pounds in NRE for design, verification, and mask sets, and a typical design cycle is 18-24 months. FPGAs offer a vastly faster and cheaper path to market.
Performance & Power: An ASIC will always be superior for a given function. By optimising the silicon specifically for one task, an ASIC can achieve higher clock speeds, lower power consumption, and a smaller die area compared to an FPGA implementation of the same logic.
Volume: The trade-off often comes down to volume. The high NRE of an ASIC is amortised over the number of units sold. For low-to-medium volume products, FPGAs are almost always more cost-effective. For products that will ship in the millions of units, the lower per-unit cost of an ASIC eventually wins out.

Ultimately, FPGAs provide a unique middle ground, offering hardware performance and software flexibility that has cemented their place as a critical component in the modern computing landscape.