
From Academia to Industry: Luk Burchard
Published:
The following interview with former Simula colleague, Luk Bjarne Burchard, PhD, covers his research contributions to the field of High-Performance Computing (HPC), the impact of that work, and the path that led him to accept a role in industry with Cerebras Systems Inc.
Can you share your educational background?
I completed my BA and MA in Berlin. During my BA, I interned at Google with Ahmet Alp Balkan, working on Kubernetes, an open-source container orchestration system for automating software deployment, scaling, and management.
I also interned at Simula, working on scientific codes for the Graphcore IPU; in simple terms, applications that run on these hardware accelerators. I liked it so much that I eventually moved to Norway to complete a PhD at UiO under Simula supervision, defending my thesis, "Repurposing Domain-specific Hardware Accelerators for Sparse and Irregular High-Performance General-Purpose Computation," ahead of schedule in December 2023.
What are some examples of recent work in your field?
To share my recent work, I need to offer some context. Since 2018 there has been frequent dialogue between Simula and both Graphcore and Cerebras, dating back to when the companies were in stealth mode. This gave me early access to emerging HPC hardware. For example, in December 2019, Simula was among the first in the world to receive a 64-processor IPU system from Graphcore.
The Graphcore relationship allowed me to start working on the Graphcore IPU Mk1 and Mk2 in 2020. We looked at graph processing first, as the Graph500 benchmark is a well-defined and challenging benchmark established in the HPC community. Graph processing is difficult to scale, especially across multiple machines. We focused on doing well in this benchmark first to understand the hardware's potential for graph applications, which form the foundation of many real-world applications.
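The Graph500 kernel is, at its core, a breadth-first search over a very large graph. As a rough illustration of the access pattern that makes this hard to scale, here is a minimal level-synchronous BFS in Python (the graph and vertex labels are made up for the example; real Graph500 implementations distribute the frontier across many processors):

```python
def bfs_levels(adj, source):
    """Level-synchronous BFS: returns the distance (level) of every
    vertex reachable from `source`. `adj` maps vertex -> neighbor list."""
    levels = {source: 0}
    frontier = [source]
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in adj[u]:
                if v not in levels:           # first visit fixes the level
                    levels[v] = levels[u] + 1
                    next_frontier.append(v)
        frontier = next_frontier              # advance one level at a time
    return levels

# Tiny example graph, stored as adjacency lists (undirected)
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
print(bfs_levels(adj, 0))  # {0: 0, 1: 1, 2: 1, 3: 2, 4: 3}
```

The irregular, data-dependent memory accesses in the inner loop are exactly what makes this workload a stress test for accelerators designed around regular, dense computation.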
Another area we successfully explored is the usage of AI accelerators for bioinformatics problems. We explored how to repurpose the IPU for running sequence alignment problems, which are still very much limited by the available hardware.
Finally, Simula and Xing Cai have expertise in cardiac modeling, which means simulating a heart inside a supercomputer, and we also sought to accelerate it. Compared to the two previous applications, this falls more into classical scientific computing.
Through the Cerebras relationship, we also had early access to the CS-2 Cerebras architecture in collaboration with EPCC, the supercomputing center at the University of Edinburgh, UK.
Our initial collaboration started as a passion project due to the interesting and promising hardware architecture the Cerebras AI accelerators would offer. With our experience implementing Breadth-First Search (BFS) on special hardware accelerators, we started a project with the Cerebras team. The collaboration was very fruitful, thanks in part to good communication and direct access to their engineers. Later, during research visits in Berkeley and San Francisco, I was invited to their HQ to present research results and look at the latest technology previews of their hardware.
What are IPUs and GPUs?
Intelligence Processing Units (IPUs) and Graphics Processing Units (GPUs) are specialized hardware designed to accelerate the training and inference of AI models, including models for generative AI. GPUs were originally developed for processing images and video for computer gaming. At a certain point, it was discovered that the massively parallel design of GPUs could also greatly improve performance for other types of computing.
Are there emerging trends or technologies within your field that you find particularly exciting or promising?
Right now, there’s a lot of new hardware emerging in the HPC space. Many new hardware products are created because Moore's law is stalling, and CPUs are often not fast enough for many applications.
Most supercomputers already use GPUs or other accelerators to compete in global benchmarks and speed up scientific applications, of which AI has also become a big part. The question is whether GPUs are the solution to everything. I think not. Thus, many new, interesting, and creative designs are emerging. This carries great potential and provides many opportunities to do research with a lasting impact on the field.
Another general trend is to integrate more into a single system; consumers have seen this with the great success of Apple's M1 silicon. The server market is also moving toward more tightly integrated systems.
The more one can squeeze onto a chip, the denser it becomes, which greatly reduces communication latency and increases bandwidth, and you get the benefits of having everything in the same localized place. The extreme is the Cerebras architecture, which has 850,000 cores on a single processor, basically integrating a whole server aisle into one chip.
Can you share what you did in your work with IPUs and GPUs?
We explored speeding up graph algorithms on the IPU because they are crucial for modeling complex relationships and interactions in various real-world systems, such as social networks, transportation networks, and biological networks. Their versatility and effectiveness in handling large-scale data make them indispensable in the era of big data and advanced computing.
One exciting project I worked on was with Professor Giulia Guidi, who invited me and Max Xiaohang Zhao from Charité to stay at Cornell to work on a challenging sequence alignment problem together with Aydın Buluç from LBNL. During the three months, we collaborated closely and improved existing bioinformatic pipelines for protein and DNA sequence alignment by up to 10x. In Prof. Guidi's words: “Sequence alignment is an extremely important and compute-intensive part of basically any computational biology workload. It is extremely common, and it’s usually one of the bottlenecks of the computation.”
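To give a flavor of why alignment is so compute-intensive: the kernels at the heart of such pipelines are typically dynamic-programming algorithms in the Smith-Waterman family, which fill a score matrix proportional to the product of the sequence lengths. Below is a minimal scoring-only sketch in Python; the scoring parameters are illustrative defaults, not those of the pipeline discussed above:

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between sequences a and b
    (scoring-only Smith-Waterman; parameters are illustrative)."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best local alignment score ending at a[:i], b[:j]
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch)
            # A local alignment may start anywhere, hence the 0 floor
            H[i][j] = max(0, diag, H[i-1][j] + gap, H[i][j-1] + gap)
            best = max(best, H[i][j])
    return best

print(smith_waterman_score("ACGT", "ACGT"))  # 8: four matches at +2 each
```

The quadratic work per sequence pair, multiplied by millions of pairs, is what makes this workload a natural target for hardware accelerators.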
Building upon these results, I have accepted a position at Cerebras and will continue this work starting in March. I will begin with an implementation of the Breadth-First Search (BFS) algorithm for the Cerebras architecture to better understand it.
See also
- Processor made for AI speeds up genome assembly, on the work while staying at Cornell
- Cerebras Crams More Compute Into Second-Gen ‘Dinner Plate Sized’ Chip, article from Electronic Engineering Times
- Luk Burchard defended his PhD thesis, Simula.no news