September 20, 2002
Troy, N.Y. — Two researchers at Rensselaer Polytechnic
Institute are creating a faster, more efficient data-mining
technique to uncover the basic rules that govern how proteins form. The
researchers are Mohammed Zaki, assistant professor of computer
science, and Chris Bystroff, assistant professor of
biology.
Researchers can identify a protein’s biological function, and
therefore its specific role in disease, if they can determine the
protein’s 3-D structure from its amino-acid sequence.
Twenty simple amino acids make up the “language” that forms
the thousands of complex proteins in the human body. The idea
is to discover how amino acids, the “letters,” combine into
“words,” or common patterns, that form proteins.
With that in mind, Zaki and Bystroff’s approach involves
creating a 3-D image of each known protein already recorded in
the worldwide Protein Data Bank. The researchers then reduce
the image to a simpler 2-D representation, called a “contact
map.” The 2-D map reveals the chemical and other interactions
among amino acids, data that are difficult to extract from the
more complex 3-D images.
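The release does not say exactly how Zaki and Bystroff build their maps. A common convention, assumed here, marks two amino acids as “in contact” when their representative atoms (for example, the alpha carbons taken from a Protein Data Bank entry) lie within a distance cutoff. The 8-angstrom threshold and the function name `contact_map` below are illustrative assumptions, not details from the research. This minimal sketch shows how 3-D coordinates reduce to a 2-D grid of 0s and 1s:

```python
import math
from typing import List, Tuple

# Assumed input: one (x, y, z) position per amino acid, in angstroms.
Coord = Tuple[float, float, float]


def contact_map(coords: List[Coord], threshold: float = 8.0) -> List[List[int]]:
    """Return an n-by-n 0/1 matrix where entry [i][j] is 1 when amino
    acids i and j lie within `threshold` angstroms of each other.

    The 8-angstrom default is an illustrative assumption; the diagonal
    is left at 0 so that self-contacts are not counted."""
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= threshold:
                cmap[i][j] = cmap[j][i] = 1  # contacts are symmetric
    return cmap


if __name__ == "__main__":
    # Four hypothetical amino acids spaced along a line.
    coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
    for row in contact_map(coords):
        print(row)
    # [0, 1, 1, 0]
    # [1, 0, 1, 0]
    # [1, 1, 0, 0]
    # [0, 0, 0, 0]
```

The grid keeps only which pairs touch and discards the full geometry, which is what makes it easier to mine for recurring patterns than the original 3-D structure.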
The data mined from the contact maps are then transferred
into a knowledge bank of “contact rules” and used to predict the
structures of unknown proteins and even how novel proteins might form.
The research is funded under a three-year, $333,928 Early
Career Principal Investigator Award from the U.S. Department of
Energy.
The research will appear in early 2003 in the IEEE (Institute of
Electrical and Electronics Engineers) journal Transactions on
Systems, Man, and Cybernetics. The work will also appear in 2003
as a chapter in the book Handbook of Data Mining (Lawrence
Erlbaum Associates).
Contact: Jodi Ackerman
Phone: (518) 276-6531
E-mail: N/A