DNA FRAGMENT ASSEMBLY

This project is in collaboration with Rebecca Parsons. We are applying the genetic algorithm to a computational biology problem---the assembly of DNA sequence fragments from a parent clone whose sequence is unknown into a consensus sequence corresponding to the parent sequence. This is called the ``fragment assembly problem'' and was chosen as one of two challenge problems for the 1994-95 DIMACS competition. It is a permutation problem, similar to the Traveling Salesman Problem (TSP), and similarly, NP-complete, but with some important differences (circular tours, noise, and special relationships between entities). In this work we have studied two different representations and many different genetic operators. We have compared the two representations and their associated operators on problems ranging from 2 kilobases to 34 kilobases. After experimenting with different combinations, we were able to solve a 10 kilobase sequence, consisting of 177 fragments, with no manual intervention. We recently extended our work to a 752 fragment set and achieved very close agreement with the published consensus sequence.

Major Sponsors:
Current projects result from the generous support of:
The National Science Foundation CCR-0311686 & CCR-0331580, The National Institute of Health COBRE Grant 5 P20 RR018754-03, The Santa Fe Institute, & Motorola, Inc.
University of New Mexico
Computer Science Dept.
Farris Engineering Building,
Albuquerque, NM 87131-1386
505-277-7104
forrest@cs.unm.edu
On-line PGP key