Different Approaches to Reduce Latency

Published in FPGAs for Stock Market Trading, Mar 29, 2021

Over the last two decades, financial markets have changed fundamentally, moving from conventional, human-driven physical venues and floor-based trading to electronic, automated trading controlled by sophisticated computer algorithms without human intervention. According to a study carried out in 2010, the Securities and Exchange Commission estimated that high-frequency trading (HFT) volume in U.S. equity markets in the second half of the decade was greater than 50% of the total trading volume. In this post we discuss some of the techniques used to reduce latency further.

1) UDP Offloading

In a typical HFT system, UDP data is received by a NIC in the form of packets. The packets are then forwarded to the kernel, which performs checks and decoding. This approach has high latency because interrupts are used to inform the OS of new packets, and the socket interface adds another translation layer on the way to user space. Additionally, OS jitter caused by other activities introduces latency spikes.

The first approach to reducing latency is to bypass the OS kernel and decode the received frames directly in hardware. Because the kernel's network stack is no longer involved, the hardware itself must support the Address Resolution Protocol (ARP) so that frames are sent to and received from the correct host. ARP maps IP addresses to the physical MAC addresses of the recipients.

Ethernet (ETH), IP and UDP are implemented in an efficient pipelined design that focuses on the lowest possible latency. IGMP and ARP frames, on the other hand, are not timing-critical for trading, as they do not deliver data. The checksums of the different protocols are checked in parallel, and the results of these checks are interpreted at the last stage of the pipeline.
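As a software illustration of these header checks, here is a minimal Python sketch that parses an Ethernet/IPv4/UDP frame. It computes every check independently and only interprets the results at the end, loosely mirroring how the pipeline defers that decision to its last stage. The names `internet_checksum`, `parse_udp_frame` and `build_frame`, the addresses and the ports are all illustrative assumptions and are not taken from the referenced design. In hardware the checks run concurrently in logic; here they are simply evaluated one after another.

```python
import struct

ETH_LEN, IP_LEN, UDP_LEN = 14, 20, 8    # assumes no VLAN tag and no IPv4 options
HDR_LEN = ETH_LEN + IP_LEN + UDP_LEN    # 42 bytes of headers before the payload

def internet_checksum(data: bytes) -> int:
    """RFC 1071 one's-complement checksum over 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def parse_udp_frame(frame: bytes):
    """Return the UDP payload, or None if any header check fails."""
    if len(frame) < HDR_LEN:
        return None
    ip = frame[ETH_LEN:ETH_LEN + IP_LEN]
    (eth_type,) = struct.unpack("!H", frame[12:14])
    (udp_len,) = struct.unpack("!H", frame[38:40])
    # Each check is computed on its own; none depends on another's result.
    checks = (
        eth_type == 0x0800,                  # EtherType: IPv4
        ip[0] == 0x45,                       # version 4, header length 5 words
        ip[9] == 17,                         # IP protocol: UDP
        internet_checksum(ip) == 0,          # valid header sums to zero
        UDP_LEN <= udp_len <= len(frame) - ETH_LEN - IP_LEN,
    )
    # The results are interpreted only here, at the "last stage".
    # Simplification: the UDP checksum itself is not verified in this sketch.
    if not all(checks):
        return None
    return frame[HDR_LEN:ETH_LEN + IP_LEN + udp_len]

def build_frame(payload: bytes) -> bytes:
    """Hypothetical helper that builds a test frame with made-up addresses."""
    eth = bytes(6) + bytes(6) + struct.pack("!H", 0x0800)
    udp_len = UDP_LEN + len(payload)
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, IP_LEN + udp_len, 0, 0, 64, 17, 0,
                     bytes([10, 0, 0, 1]), bytes([239, 1, 1, 1]))
    ip = ip[:10] + struct.pack("!H", internet_checksum(ip)) + ip[12:]
    udp = struct.pack("!HHHH", 5000, 5001, udp_len, 0)  # checksum 0: unused
    return eth + ip + udp + payload
```

Calling `parse_udp_frame(build_frame(b"hello"))` returns the payload, while a frame with a corrupted IP header is rejected as a whole.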

2) FAST Decoding in Hardware

This approach is a hardware extension that enables direct decoding of FAST (FIX Adapted for Streaming) messages. The FAST decoder is composed of three independent units. The first is the decompressor, which detects the stop bits and aligns the incoming fields to 64 bits, so that the following units work on fixed-size fields, which makes decoding easier.
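To make the stop-bit idea concrete, here is a minimal Python sketch. In FAST, each byte carries seven data bits, and the most significant bit is set on the last byte of a field. The sketch splits a byte stream at the stop bits and emits each field as a fixed-width 64-bit value. The function name `split_fields` is hypothetical, and the sketch assumes unsigned integer fields only; presence maps, signed and nullable values, and strings are left out. Whether the real decompressor also removes the stop bits at this stage is not stated in the text, so treat that detail as an assumption.

```python
MASK_64 = 0xFFFFFFFFFFFFFFFF

def split_fields(stream: bytes) -> list[int]:
    """Split a stop-bit encoded byte stream into 64-bit field values."""
    fields = []
    value = 0
    for byte in stream:
        value = (value << 7) | (byte & 0x7F)   # keep the 7 data bits
        if byte & 0x80:                        # stop bit: last byte of the field
            fields.append(value & MASK_64)     # fixed 64-bit width
            value = 0
    return fields
```

For example, the bytes `0x81, 0x07, 0xAE` contain two fields: the first ends immediately, and the second spans two bytes.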

The second is the microcode engine, which is used to decode any variation of the FAST protocol. It runs a program with subroutines that is loaded into the FPGA at startup. An assembler is used to produce the binary code for the microcode engine. Together, the assembler and the microcode engine make it easy to adapt to changes made by an exchange without a new hardware design, which speeds up protocol modifications.
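The benefit of a program-driven decoder can be illustrated with a tiny interpreter. The following Python sketch is purely hypothetical: the opcodes `CONST`, `LOAD` and `DELTA` and the function `run_microcode` are invented for illustration and do not reflect the instruction set of the referenced design. The point is only that the message layout lives in the program, so a layout change means loading a different program rather than changing the engine.

```python
def run_microcode(program, fields, state):
    """Interpret a tiny, hypothetical instruction list over pre-split fields.

    program: list of tuples (opcode, field_name, optional argument)
    fields:  field values in stream order
    state:   dict holding previous values, kept across messages
    """
    out = {}
    stream = iter(fields)
    for opcode, name, *arg in program:
        if opcode == "CONST":      # value comes from the program, not the stream
            out[name] = arg[0]
        elif opcode == "LOAD":     # take the next field from the stream unchanged
            out[name] = next(stream)
        elif opcode == "DELTA":    # next field is a difference to the previous value
            out[name] = state.get(name, 0) + next(stream)
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
        state[name] = out[name]
    return out

# A made-up message layout expressed as a program.
PROGRAM = [("CONST", "msg_type", 7), ("LOAD", "seq"), ("DELTA", "price")]
```

Decoding two messages in a row with the same `state` dictionary shows the delta field accumulating, while the engine function itself never changes.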

The third is the DMA engine, which provides the trading software with the decoded streams. It forwards the received data to a ring buffer in the address space of the trading software. Each write consists of eight quadwords, of which seven carry the content of fields and the eighth contains status information. The status information includes a bit that is inverted every time the ring buffer wraps around, so the software can use this bit to detect new data easily.

These units keep latency as low as possible by letting the software poll for new data, since polling gives lower latency than interrupts.
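The toggle-bit scheme and the polling loop can be sketched in software as follows. This is a simplified Python model, not the actual DMA engine: the class `RingBuffer`, the position of the toggle bit inside the status quadword, and the zero-initialised memory are all assumptions. The producer method stands in for the hardware, and there is no protection against the producer overwriting data that has not been read yet.

```python
RECORD_QWORDS = 8    # seven payload quadwords plus one status quadword
TOGGLE_BIT = 1       # assumed position of the wrap bit in the status quadword

class RingBuffer:
    def __init__(self, slots: int):
        self.slots = slots
        # Zeroed memory: every slot starts with toggle bit 0, meaning "empty".
        self.mem = [[0] * RECORD_QWORDS for _ in range(slots)]
        self.wr, self.wr_phase = 0, 1    # the first pass writes toggle bit 1
        self.rd, self.rd_phase = 0, 1    # the reader expects 1 on the first pass

    def write(self, payload):
        """Producer side (stands in for the DMA engine)."""
        record = (list(payload) + [0] * 7)[:7]
        # Real hardware must make the status quadword visible last.
        self.mem[self.wr] = record + [self.wr_phase]
        self.wr += 1
        if self.wr == self.slots:        # wrap around: invert the toggle bit
            self.wr = 0
            self.wr_phase ^= TOGGLE_BIT

    def poll(self):
        """Consumer side: return the payload if the next slot is new, else None."""
        record = self.mem[self.rd]
        if (record[7] & TOGGLE_BIT) != self.rd_phase:
            return None                  # slot still holds data from the last pass
        self.rd += 1
        if self.rd == self.slots:
            self.rd = 0
            self.rd_phase ^= TOGGLE_BIT
        return record[:7]
```

The trading software would call `poll()` in a tight loop. Because the expected toggle value flips on every wrap, the reader never needs to clear a slot or ask the producer where its write pointer is.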

3) Parallelization

The FAST protocol has a limitation: a single stream of FAST messages cannot be processed in parallel. Fortunately, the data feeds provided by stock exchanges are split across multiple FAST streams, so throughput can be improved and latency reduced by implementing several FAST processors in parallel.
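The structure of this idea, sequential decoding within a stream and parallelism across streams, can be sketched in Python as follows. The names `decode_stream` and `decode_all` are hypothetical, the decoder only handles stop-bit encoded unsigned integers, and a thread pool is merely a stand-in for independent hardware FAST processors. Python threads will not give a real speed-up here.

```python
from concurrent.futures import ThreadPoolExecutor

def decode_stream(stream: bytes) -> list[int]:
    """Decode one stream strictly in order (stop-bit encoded unsigned ints)."""
    fields, value = [], 0
    for byte in stream:
        value = (value << 7) | (byte & 0x7F)
        if byte & 0x80:          # stop bit marks the end of a field
            fields.append(value)
            value = 0
    return fields

def decode_all(streams: list[bytes]) -> list[list[int]]:
    """Decode several independent streams side by side, one worker per stream."""
    with ThreadPoolExecutor(max_workers=max(1, len(streams))) as pool:
        return list(pool.map(decode_stream, streams))
```

Each worker only ever touches its own stream, so no ordering between streams has to be preserved, and the results come back in the same order as the input list.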

References: C. Leber, B. Geib and H. Litz, “High Frequency Trading Acceleration Using FPGAs,” 2011 21st International Conference on Field Programmable Logic and Applications, 2011, pp. 317–322, doi: 10.1109/FPL.2011.64.

Written by KAUSHAL SHINDE