Joe Caspermeyer, Media Relations Manager & Science Editor
(480) 727-0369 | joseph.caspermeyer@asu.edu
To answer some of the most challenging questions in biology, researchers have had to come to grips with an ever-increasing and unruly information onslaught. ASU assistant professors Michael Rosenberg and Jieping Ye of the Biodesign Institute’s Center for Evolutionary Functional Genomics have been awarded more than $1.2 million in grants by the National Science Foundation (NSF) to create and expand technology that helps researchers wade through the burgeoning data pool.
Their field, bioinformatics, uses the increasing power of computational tools to answer pivotal questions like how organisms develop from a single cell to an adult.
Ye, a computer scientist, is using a $583,603 grant to develop a computational framework for analyzing biological images. His system uses machine learning, a technique routinely used in face recognition and in thwarting credit card fraud. In this case, the “faces” are a large collection of Drosophila (fruit fly) embryonic images obtained through the Berkeley Drosophila Genome Project.
A collage of fruit fly gene expression images.
The proper development of each football-shaped fly embryo depends on the coordinated expression of thousands of genes. By studying the expression pattern of single genes, typically displayed in wide bands or narrow striped patterns, scientists can gain insight into the control and regulation of large genetic networks. Similar gene networks are found throughout biology, and breakdowns in these processes may result in birth defects, heart disease, cancer and aging.
Each image in the collection represents a different stage in time of an embryo’s development and a specific pattern of gene expression. “The key in this project is to be able to compare the expression patterns captured in the images,” Ye said. “If the images share similar spatial expression patterns, then the genes may share the same function.”
Ye and colleagues hope to design a system that can automatically identify an embryo’s developmental stage, and then identify spatial overlaps in the gene expression patterns. By identifying which genes are expressed at the same time and in the same pattern, the scientists hope to unravel some of the mystery behind developmental biology.
The most straightforward way to compare images would be to take a single image and check it against every other picture in the database. “At the moment we can check an image against the entire database and provide a ranking of all images based on similarity, but this takes a long time for each new query,” Ye said.
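The exhaustive comparison described here can be illustrated with a minimal sketch. It assumes each image has already been reduced to a numeric feature vector and uses cosine similarity as the score; the article does not say which features or similarity measure Ye's group uses, so the measure, the function names and the toy data below are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, database):
    """Score the query against every entry and sort, most similar first.

    The cost grows linearly with the size of the database, which is
    why each new query is slow on a large collection.
    """
    scored = [(name, cosine_similarity(query, vec))
              for name, vec in database.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy "expression patterns", one vector per image.
database = {
    "gene_a": [1.0, 0.0, 1.0, 0.0],
    "gene_b": [0.0, 1.0, 0.0, 1.0],
    "gene_c": [1.0, 0.0, 0.9, 0.1],
}
ranking = rank_by_similarity([1.0, 0.0, 1.0, 0.0], database)
```

In this toy example the query pattern matches gene_a exactly, closely resembles gene_c, and shares nothing with gene_b, so the ranking lists them in that order.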
The researchers expect to design the system to work much as internet search engines do, where a query can be made against a large database and matches located in a matter of seconds. Ye and colleagues are currently looking for a way to efficiently catalogue the important features of each image and give the database a hierarchical structure so that computing time isn’t wasted comparing images that aren’t related.
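One way to picture such a hierarchy is a two-level index: images are first grouped by developmental stage, and a query is compared only with images in its own group. The sketch below is an illustration under that assumption, not the design the researchers are building; the stage labels, the binary patterns and the overlap score are hypothetical.

```python
from collections import defaultdict


def build_index(records):
    """Group (name, stage, pattern) records by developmental stage."""
    index = defaultdict(list)
    for name, stage, pattern in records:
        index[stage].append((name, pattern))
    return index


def overlap(a, b):
    """Share of positions 'on' in both patterns among those 'on' in either."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def query(index, stage, pattern):
    """Compare the query only with images from the same stage."""
    candidates = index.get(stage, [])
    scored = [(name, overlap(pattern, other)) for name, other in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical records: image name, stage label, binary expression pattern.
records = [
    ("img1", "early", [1, 1, 0, 0]),
    ("img2", "early", [0, 0, 1, 1]),
    ("img3", "late", [1, 1, 0, 0]),
]
index = build_index(records)
results = query(index, "early", [1, 1, 0, 0])
```

Here img3 is never scored because it belongs to a different stage, which is the saving a hierarchical structure is meant to provide.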
Another challenge in the project is to prepare all of the biological images so that they can be compared with one another. This involves a painstaking process in which each image is manually adjusted to the same size, shape and orientation. So far, the research group has combed through more than 45,000 images, which is about half of the total number of images available.
Rosenberg has been awarded $642,862 to develop a methods and software package called PASSaGE (pattern analysis, spatial statistics and geographic exegesis).
“PASSaGE can be used to analyze the clustering of disease in different regions of the world, the distribution of species within an ecological environment, and really anything that has a geographical or spatial component,” Rosenberg said. His background as a biologist has also helped him develop the package to tackle emerging questions in the life sciences like how topographical and structural features of the genome influence gene expression.
Rosenberg will use the NSF grant to build on his success in the first version of PASSaGE. He is focusing his efforts on developing a second version that will provide a wider range of analytical methods and allow scientists to incorporate three-dimensional spatial data and analyses into their projects.
“There have been tools around to do this for a long time, except that each research discipline had its own way of doing it,” Rosenberg said. With PASSaGE, Rosenberg has taken elements common to methods used in scientific disciplines like ecology, geology and geography and combined them in a single package that is more user-friendly. The simple, menu-driven program will be broadly applicable to a variety of scientific fields.
PASSaGE is designed to analyze data in its spatial context, like information gathered from global positioning systems and remote sensing. Unlike a typical geographic information system, which is tailored for information storage and map construction, PASSaGE is built to provide a variety of powerful analytical functions, such as network and pattern analyses.
Rosenberg is completely rebuilding the program for the second version in order to improve its user interface, its ability to import and export data, its available functions and its graphical output. He hopes to release a beta version of the new program to the public in the next few months.
-story written by Dan Jenk