Our interviewee and his team have a short but demanding to-do list: to make the algorithmic trading services offered by their employer, one of the world's leading investment banks, faster, more reliable, and more sophisticated in minimising financial risk.
Key to forging a competitive advantage in providing so-called "black box" trading services to investors is performing as much risk analysis as possible without becoming the bottleneck in a client's trading flow.
"What we try to do is avert risk," he explains. This isn't just for the benefit of the bank's clients but also an essential safeguard of the bank's own business: it stands to be liable for losses resulting from bad trades triggered by a client's algorithm.
With the system already executing trades at speeds close to the theoretical limits of physics, the team is increasingly looking to the telecoms and defence industries for inspiration on how to cram even more risk calculations within the few microseconds available.
Processing in parallel
The team exhausted the limits of the sturdy old CPU long ago. The processors that power conventional computing devices accomplish an astonishingly wide range of tasks.
From the smartphones in our pockets to the servers running enterprise-level cloud-based applications, all these devices use high clock-speed processors that, generally speaking, perform calculations one after another.
Ideal for watching Netflix on an iPad; less so for analysing millions of pieces of market and trade data in just a few microseconds.
The key is parallelism - breaking down large computational problems into their fundamental operations and computing them simultaneously, or "in parallel". Conventional CPUs have limited capacity for parallel processing, which is why our interviewee's team relies on more specialist hardware.
Field-programmable gate arrays (FPGAs) offer an attractive alternative.
An FPGA is essentially a large matrix of rudimentary "logic blocks" that can be "wired together" into a desired configuration. It may sound unsophisticated, but that's its beauty: the embodiment of the principle that any computational task, however complex, can be distilled down to basic boolean operations.
Specialist software packages program an FPGA with the logic for a specific algorithm. The outcome is an entirely customised processor, poised to perform its intended operation with maximum parallelism.
Clock speeds are low, but performance brushes the edge of what's theoretically possible. Asked what the limiting factor is to the reduction of latency within the bank's system, our interviewee replies with one word: "physics".
Using custom hardware in place of software increases reliability as well as speed. He explains that FPGAs have cut execution times from under 20 microseconds for 90 per cent of trades to under five microseconds for 99.99 per cent of trades.
Putting it into practice
But such impressive results have come at a cost. Porting algorithms over from their software implementation is a considerable task, sometimes taking several months per algorithm.
In the early days of hardware-led algorithmic trading (back in 2009-2010) hardware experts - including our interviewee - were brought in from the telecoms and defence industries to lead the process.
But as the application of FPGAs has broadened, new software tools have made the process easier. Our interviewee explains: "Before, it used to be languages like VHDL or Verilog. Now there's a C variant for everything, and there's OpenCL that's coming up, and that will improve the time to market."
This is encouraging news for technology graduates aspiring to contribute to the advancement of hardware-led trading. A graduate joiner in the team, who studied electrical engineering, got involved straight away in coding for a variety of FPGA applications and plays a big part in the testing phase.
With the latency of algorithmic trading already so close to the limits of possibility, what next for our interviewee and his team? The answer, it seems, is squeezing even more computation within the few microseconds they have to complete trades.
"The bank is not about ultra-low latency. It's more about making the risk checks we need to make as quickly as possible."
"We have literally hundreds of risk checks, because we don't like to take risks. With FPGAs we can cram those risk checks into a reasonable amount of time."