Rensselaer Polytechnic Institute Professor Wins HP Innovation Award for His Work on Searching for Complex Patterns in Massive Linked Data
August 31, 2010
Mohammed J. Zaki
The universe around us can be expressed as numbers, and those numbers, arranged in patterns, paint a picture: a network of friends from the vastness of the Internet; travel patterns among residents of cold climates; or a common factor among victims of a disease. Now, the deluge of digital-age data makes possible more complex patterns and a more complete picture: the link between the friends, their travel, and the illness.
The work of Mohammed J. Zaki, a Rensselaer professor of computer science, will enable us to see links and patterns we did not know existed. In recognition of the importance of his work, Zaki has been selected for the 2010 HP Labs Innovation Research Program.
Zaki searches for so-called “graph” patterns — patterns represented by links between points — on an unprecedented level, designing algorithms and systems that connect data through multiple layers and links.
“The ability to connect all the dots from the disparate data sources is extremely important to gain critical insights into complex social, business, or scientific phenomena of interest,” said Zaki.
Early systems for gathering and analyzing data were built on a “transactional” basis. They looked at one transaction at a time within a particular category (such as an e-mail to a friend among e-mail records, a flight to a particular destination among flight records, or an urgent visit to a hospital among hospital admissions records), independent of the links that exist between those different transactions. With the proliferation of data, the need for a new model has grown.
“If you think of the data of the world today, everything is related to everything else,” Zaki said. “Existing statistical frameworks have to be extended to reflect the new interconnectedness of this world.”
For example, Zaki said, researchers may have studied the structural properties of social or other interaction networks, but have largely ignored the content of individual nodes and entities.
“Other people have primarily looked at the topology of the networks; my goal is to simultaneously add the content,” Zaki said.
According to HP, the prestigious Innovation Research Program is designed to provide colleges, universities, and research institutes around the world with opportunities to conduct breakthrough collaborative research with HP.
HP reviewed more than 375 proposals from 202 universities across 36 countries. Rensselaer is one of only 52 universities in the world to receive a 2010 Innovation Research award. The HP Labs Innovation Research Program is designed to encourage open collaboration between HP and the academic community on mutually beneficial, high-impact research. This year's proposals were solicited on a range of topics within the eight broad research themes at HP Labs: analytics, cloud, content transformation, digital commercial print, immersive interaction, information management, intelligent infrastructure, and sustainability.
“Our goal with the HP Labs Innovation Research Program is to inspire the brightest minds from around the world to conduct high-impact scientific research, addressing the most important challenges and opportunities facing society in the next decade,” said Prith Banerjee, senior vice president of research at HP and director of HP Labs. “Rensselaer has demonstrated outstanding achievement and we look forward to collaborating with it in this dynamic area of research.”
The award will allow Zaki, working with HP Labs, to tackle two specific problems: graph pattern mining and graph indexing.
Pattern mining allows data miners to establish whether a link exists between two or more things.
“We’re looking for hidden patterns: Are entities connected? Are they connected in a particular configuration?” Zaki said.
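The two questions Zaki poses can be illustrated with a minimal sketch on a toy graph. It is not Zaki's algorithm: the data, the node labels, and the function names (`build_adjacency`, `connected`, `labeled_triangles`) are all hypothetical, chosen only to show a connectivity check and a search for one particular configuration (a triangle) that also takes node content into account.

```python
from collections import deque
from itertools import combinations

# Hypothetical toy data: undirected links between people,
# plus one content label per person.
EDGES = [("ann", "bob"), ("bob", "cy"), ("ann", "cy"), ("cy", "dee")]
LABELS = {"ann": "traveler", "bob": "traveler", "cy": "patient", "dee": "patient"}


def build_adjacency(edges):
    """Turn a list of undirected edges into a node -> set-of-neighbors map."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def connected(adj, start, goal):
    """Are two entities connected? Breadth-first search from start to goal."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def labeled_triangles(adj, labels, wanted):
    """Are they connected in a particular configuration?

    Returns every triangle (three mutually linked nodes) whose labels,
    taken together, match the three labels in `wanted`.
    """
    target = sorted(wanted)
    found = []
    for a, b, c in combinations(sorted(adj), 3):
        if b in adj[a] and c in adj[a] and c in adj[b]:
            if sorted(labels[n] for n in (a, b, c)) == target:
                found.append((a, b, c))
    return found


adj = build_adjacency(EDGES)
print(connected(adj, "ann", "dee"))
print(labeled_triangles(adj, LABELS, ["traveler", "traveler", "patient"]))
```

Checking every triple of nodes is only workable on tiny examples; mining patterns in massive linked data calls for far more selective search strategies, which is the kind of problem the research addresses.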
Pattern mining enables advances in a broad variety of applications, such as the development of the Semantic Web, protein-protein interactions, social networks, and pattern discovery.
Graph indexing refers to systems that can store complex graph and network data.
“Once you have enhanced interlinked data sets, how do you store them? It’s a way of physically storing data on a system on the back end for rapid search and mining,” Zaki said.
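As a rough illustration of storing interlinked data for rapid search, the sketch below keeps a small in-memory index over a labeled graph. It is an assumption-laden toy, not the indexing scheme Zaki or HP Labs proposes: the class `LabeledGraphIndex` and its methods are hypothetical names. The idea shown is simply that keeping a lookup from each label to its nodes, alongside the links themselves, lets a query start from the relevant nodes instead of scanning every link.

```python
from collections import defaultdict


class LabeledGraphIndex:
    """Toy in-memory index: links plus a label -> nodes lookup."""

    def __init__(self):
        self.labels = {}                        # node -> label
        self.neighbors = defaultdict(set)       # node -> linked nodes
        self.nodes_by_label = defaultdict(set)  # label -> nodes with that label

    def add_node(self, node, label):
        # Assumes each node is added once; relabeling is not handled here.
        self.labels[node] = label
        self.nodes_by_label[label].add(node)

    def add_edge(self, u, v):
        # Links are stored as undirected.
        self.neighbors[u].add(v)
        self.neighbors[v].add(u)

    def edges_between(self, label_a, label_b):
        """Return sorted (u, v) links where u has label_a and v has label_b."""
        result = []
        for u in self.nodes_by_label.get(label_a, ()):
            for v in self.neighbors.get(u, ()):
                if self.labels.get(v) == label_b:
                    result.append((u, v))
        return sorted(result)


index = LabeledGraphIndex()
for name, label in [("ann", "traveler"), ("bob", "traveler"), ("cy", "patient")]:
    index.add_node(name, label)
index.add_edge("ann", "bob")
index.add_edge("ann", "cy")
index.add_edge("bob", "cy")
print(index.edges_between("traveler", "patient"))
```

A system built for massive data would keep such structures on disk and index larger substructures than single links, but the trade-off is the same: extra storage spent up front in exchange for faster search and mining later.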
Contact: Mary L. Martialay
Phone: (518) 276-2146