Embedded Systems Group (ES)

Sequence alignment

Sequence alignment

Sequence alignment plays an important role in bioinformatics: for example, given two DNA sequences, of which one is suspected to have originated from the other through evolutionary processes (e.g., crossover and mutation). Then in order to align these sequences, they are filled with gaps (wildcards) in such a way that they coincide at each non-gap position, while optimizing the number and placement of the gaps according to some metric.

For example, consider the following two sequences:

One possible alignment might look like this (where gaps are represented by a dash):

These sequences are aligned since there are no two different letters on top of each other. Another solution is:

Clearly, the first solution is preferable, since it introduces fewer gaps.

Gaps can be interpreted as follows: if there is a gap in the second sequence at a certain position, then the corresponding letter in the first sequence has been deleted; if there is a gap in the first sequence, then the corresponding letter in the second sequence has been inserted.

FPGA board The FPGA board that was used for sequence alignment in hardware. The host computer communicated with the FPGA through the USB cable on the left.

Due to its nature, sequence alignment is a prime candidate for speedup by parallelization. In our project, students designed a hardware implementation of sequence alignment in our programming language Quartz. The implementation targeted a board with a Xilinx Virtex-5 FPGA and 64 MiB DDR2 RAM. In order to interface with the FPGA, the host computer was connected to the board by a USB cable, which was used to send the input sequences to the FPGA and to receive the resulting aligned sequences.

Schematic
Schematic diagram of the hardware system.

The Smith-Waterman algorithm finds an optimal alignment for simple cost metrics. Using dynamic programming techniques, it runs in O(nm) time and O(nm) space, where n and m are the respective lengths of the sequences. In our project, a naive implementation of the Smith-Waterman algorithm would have required more memory than available on the board for its computation. Therefore, students either had to implement a more sophisticated memory management, or use a modified version of the algorithm which requires less than quadratic space.

For more information, please contact Adrian Willenbücher.

Further reading