Using Complex Bioinformatics to Save Lives

Dr. Sohrab Shah

A focus on finding a direct route to a meaningful outcome seems a distinctive characteristic of Sohrab Shah’s life and work to date. As an undergrad in biology at Queen’s University, he had little knowledge of computers, but he figured out how to use them anyway to develop multimedia tutorials “functioning as souped-up lab manuals,” he notes, to help other students better understand biological diversity. After graduating from Queen’s in 1996, he became interested in web design. Through this largely self-taught work, and realizing that there were “some interesting things going on under the hood there,” Sohrab decided to enroll in computer science for a second bachelor’s degree. After researching his options, he chose UBC’s computer science department, which allowed him to transfer his elective credits and graduate with a CS degree in two years.

Making this switch proved serendipitous to say the least. During a course in the department Sohrab met Francis Ouellette, a guest lecturer who was working in the emergent field of bioinformatics, where biological information generally, and molecular genetics in particular, can be accessed, processed, and analyzed using computer technology. Impressed with Ouellette’s work and wanting to learn more about bioinformatics, Sohrab spoke to Ouellette and was offered a job in his lab at Vancouver’s Centre for Molecular Medicine and Therapeutics. This began a fruitful period of collaboration in which Sohrab and his colleagues developed Genecomber, a gene-prediction program that combines, and substantially improves on, the performances of two other gene-predicting programs then in use.

In 2002 Sohrab was hired as the head of software development at the UBC Bioinformatics Centre, where he developed a trifecta of applications to improve bioinformatics research. Named for the classical mythological figures whose characteristics they emulate, these tools are in use today at UBC and in the broader research community. The first, Pegasus, is a genome annotation system that allows the user to create analysis pipelines for biological sequence data. With the system, Sohrab notes, a researcher can use simple point-and-click means to create an analytic work path that dynamically and, like the winged horse, swiftly updates the interaction among the researcher’s input data, results, and further analysis. The second program, Atlas, is a data warehouse for integrative bioinformatics. Here, users link into existing biomedical databases such as Medline, GO, and EntrezGene, and, through a single SQL query, access published information on specific genomics data. The program literally maps the information available through existing databases and, at the same time, allows a functional search of that information that far exceeds traditional hunt-and-peck web methods. The third program, Ulysses, is a web-based user interface and analytical system for projecting homologous protein interactions from model organisms onto human proteins.

As he developed these tools, Sohrab learned how to build a network of contacts, approach funding agencies, and write grants; in short, he began to master the “business of science.” He also opened up to the possibilities in the field. “That’s really when the light went on…and I saw [how] computer science plays a central role in biomedical research in moving things forward and identifying potential molecular diagnostics, molecular targets for drugs. It’s now a full-fledged quantitative science. Clinical research that leverages genomics cannot exist without sophisticated computational research that goes along with it. It’s part and parcel of what we do.” At that point, Sohrab decided to return for a PhD at UBC’s Department of Computer Science, noting that his work experience to date helped him “hit the ground running” in the program because he really knew what he wanted to accomplish. There he worked with Professor Raymond Ng in bioinformatics, developing model-based approaches for detecting alterations in DNA sequences.

Now, with the ink barely dry on his PhD and just weeks into his new position as a researcher at the B.C. Cancer Agency, Sohrab is already in full swing, shuttling between two sets of offices at the B.C. Cancer Centre’s clinical facility at Heather and 10th in Vancouver. In these offices he will work on a variety of projects aimed at developing methods to improve cancer treatment outcomes. One project focuses specifically on determining the differences between tumour and normal genomes. The completion of the Human Genome Project (HGP) makes this undertaking exciting and transformative. Sequencing technology has evolved over the last few years such that it is now cost-effective to sequence specific tumours and then compare the results with reference data from the HGP. Sohrab notes that these “interrogations in parallel” yield an enormous amount of data from which treatment protocols may be derived, but the problem is in identifying which of the data are useful and which are part of the background noise. It’s as if a huge treasure chest lies buried under tons of sand that must be sifted away before the jewels can be found.

For Sohrab, this is where research into novel computational techniques will help make the difference. Sequencing a tumour, he notes, “generates millions and millions and millions of little sequences and the task is to find where in the reference genome [of 3 billion letters] each of these little sequences actually matches.” There’s simply no way humans can manually sift through these data and extract meaningful comparisons. Furthermore, many of the gene sequences in the tumour data have tiny “errors” embedded in them. These genetic alterations, such as deletions, amplifications, and translocations, reflect not the tumour but an individual’s personal DNA characteristics, and their presence complicates understanding the true difference between the tumour itself and the reference data. Evolving computational techniques will help push these problematic data to the background so that key differences can be ascertained and analyzed.
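
The matching task Sohrab describes can be illustrated with a toy sketch in Python. It is not the method used at the B.C. Cancer Agency, which the article does not describe; it is only a minimal "seed and verify" illustration, assuming a tiny made-up reference string and short made-up reads. The function names (build_kmer_index, align_read), the seed length k, and the mismatch threshold are all hypothetical choices made for this example. Real reference genomes and read sets are far too large for this approach and call for specialized alignment software.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every length-k substring of the reference to its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def hamming(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def align_read(read, reference, index, k, max_mismatches=1):
    """Return sorted (position, mismatches) pairs where the read fits the reference.

    Every k-mer of the read is tried as a seed, so a difference that falls
    inside one seed does not hide an otherwise good match.
    """
    hits = {}
    for offset in range(len(read) - k + 1):
        seed = read[offset:offset + k]
        for pos in index.get(seed, []):
            start = pos - offset
            # Skip placements that would hang off either end of the reference.
            if start < 0 or start + len(read) > len(reference):
                continue
            if start in hits:
                continue
            mismatches = hamming(read, reference[start:start + len(read)])
            if mismatches <= max_mismatches:
                hits[start] = mismatches
    return sorted(hits.items())


# Made-up data for illustration only.
reference = "ACGTTGCATGACCGTA"
k = 4
index = build_kmer_index(reference, k)

exact_read = "TGCATGAC"    # copied from the reference starting at position 4
altered_read = "TGCTTGAC"  # same read with one letter changed

print(align_read(exact_read, reference, index, k))
print(align_read(altered_read, reference, index, k))
```

The index lets each read jump straight to candidate locations instead of scanning the whole reference, and the mismatch count records how far a read departs from the reference at that location. Deciding whether such a departure comes from the tumour or from the individual’s own DNA is the harder problem the article goes on to describe, and this sketch does not attempt it.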

Bioinformatics also gives researchers access to another set of projects: retrospective studies on tissue-banked tumour samples. Researchers can examine the molecular patterns of these banked tumours against the reference bank. They can also examine the treatment protocols and clinical outcomes for the patients who exhibited the tumours. These are extremely important data because when doctors treat patients for cancer, Sohrab notes, some will respond to the standard protocol but some won’t and “almost always what’s underlying that differential response is a molecular difference…in the tumours of those people versus the tumours of these people.” When these molecular differences are truly understood, treatment protocols can be designed individually and the era of “personalized medicine” can begin in earnest.

Though his schedule would seem to leave little time for anything but work, Sohrab has an active home life that includes hiking and other outdoor activities with his wife, daughter, and son, and coaching his son’s elite soccer team. He also plays alto saxophone, a holdover from his “bohemian days,” when, between his undergrad degrees, he played professionally in Kingston. These days, there’s nothing bohemian about his focus or his mission to build bridges between two rapidly evolving scientific disciplines: “We think of computer science and we think of…bioinformatical sciences but what we’re really looking for is progress.… Students…do some novel algorithm [in one field or the other] and that’s what’s rewarded. But what you need to make is discoveries. In order to do that we need to have people working across fields. What’s important is that we come up with novel ways of treating cancer.”