Mapping Proteins: Researchers Discover a Better Way to Decode the Protein Language

September 20, 2002

Troy, N.Y. — Two researchers at Rensselaer Polytechnic Institute are creating a faster, more efficient data-mining technique to determine basic rules of how proteins form. The researchers are Mohammed Zaki, assistant professor of computer science, and Chris Bystroff, assistant professor of biology.

Researchers can identify a protein’s biological function, and therefore its specific role in disease, if they can determine the protein’s 3-D structure from its amino-acid sequence.

Twenty simple amino acids make up the “language” that forms the thousands of complex proteins in the human body. The idea is to discover how amino acids, or “letters,” combine into “words,” or common patterns, that form proteins.

With that in mind, Zaki and Bystroff’s approach involves creating a 3-D image of each known protein already recorded in the worldwide Protein Data Bank. The researchers then reduce the image to a simpler 2-D representation, called a “contact map.” The 2-D map reveals the chemical and other interactions among amino acids, data that are difficult to extract from the more complex 3-D images.
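As a rough illustration of the idea, and not the researchers' actual method, a contact map can be sketched as a square grid that marks which pairs of amino acids sit close together in the 3-D structure. The sketch below assumes each amino acid is represented by a single 3-D point and that two amino acids are "in contact" when they lie within 8 angstroms of each other. Both choices, along with the names contact_map and toy_coords, are assumptions made for this example and do not come from the research described here.

```python
import math

def contact_map(coords, threshold=8.0):
    """Build a symmetric N x N grid of booleans.

    Entry [i][j] is True when points i and j lie within `threshold`
    distance units of each other. The 8.0 default is an assumed cutoff
    chosen for illustration only.
    """
    n = len(coords)
    cmap = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Euclidean distance between the two representative points
            if math.dist(coords[i], coords[j]) <= threshold:
                cmap[i][j] = cmap[j][i] = True
    return cmap

# Hypothetical toy coordinates: three points close together, one far away
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
toy_map = contact_map(toy_coords)

# Print the grid: '#' marks a contact, '.' marks no contact
for row in toy_map:
    print("".join("#" if cell else "." for cell in row))
```

The 3-D coordinates collapse into a flat grid in which each marked cell records one pairwise interaction, which is the kind of 2-D pattern a data-mining technique can then search.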

The data mined from the contact map are then transferred into a knowledge bank of “contact rules” and used to predict unknown proteins and even how novel proteins might form.

The research is funded under a three-year, $333,928 Early Career Principal Investigator Award from the U.S. Department of Energy.

The research will appear in the IEEE (Institute of Electrical and Electronics Engineers) journal Transactions on Systems, Man and Cybernetics in early 2003. The work will also appear in 2003 as a chapter of the book Handbook of Data Mining (publisher: Lawrence Erlbaum Associates).

Contact: Jodi Ackerman
Phone: (518) 276-6531
