Blog

Functional genomics and precision medicine in systemic sclerosis

Systemic sclerosis (SSc) is a rare disease, so it is often difficult to run clinical trials large enough to test the effectiveness of a given experimental therapy. Thus, even though genome-wide gene expression is routinely gathered in these trials, they are often statistically underpowered to detect differential expression of many important genes in patients who improve on the treatment. This makes it difficult to perform post hoc analysis to determine the drug’s functional role in changing pathological gene expression. Moreover, many of the molecular responses to a drug occur post-transcriptionally, e.g. affecting binding and signaling.

In collaboration with the inimitable Jaclyn Taroni, we recently performed a meta-analysis of five clinical trials in SSc. To overcome the above obstacles (low statistical power, non-transcriptional regulation), we re-analyzed gene expression in these subjects using a variant of the NetWAS strategy developed by Greene et al. [1]. Originally developed for genome-wide association studies (GWAS), NetWAS combines p-values for genes (from any statistical test) with functional genomic networks and machine learning classifiers to identify a coherent functional signature of the nominally statistically significant genes. The ethos of NetWAS is that although genome-wide statistical significance is hard to come by (except in the most high-powered settings), it is often the case that the nominally significant genes in a study are enriched for functionally relevant pathways and processes. Thus, while taking all genes that meet a statistically permissive cutoff results in many spurious gene associations, pairing statistical tests with functional information allows one to simultaneously prune the list of spurious hits and expand the list to other relevant genes that were missed by statistical criteria. The net result is that functional information about genes can be used to make an end run around statistical power issues. This opens the door for a number of genomic applications in rare diseases where samples are hard to come by (more on this in a future post, perhaps).
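
To make the idea concrete, here is a minimal Python sketch of NetWAS-style re-ranking on toy data. It is not the code from [1]: the gene names, p-values, and edge weights are invented, the function name netwas_rank is hypothetical, and a simple difference-of-centroids linear classifier stands in for the real classifier. Each gene's features are its edge weights in the functional network, and the nominally significant genes serve as the positive training examples.

```python
# Toy NetWAS-style re-ranking. Everything below is illustrative:
# gene names, p-values, and edge weights are made up.

def netwas_rank(pvalues, network, cutoff=0.05):
    """Rank genes by network similarity to the nominally significant set."""
    genes = sorted(pvalues)
    positives = [g for g in genes if pvalues[g] < cutoff]
    negatives = [g for g in genes if pvalues[g] >= cutoff]
    if not positives or not negatives:
        raise ValueError("need at least one gene on each side of the cutoff")

    def weight(a, b):
        # Undirected network stored as {frozenset({a, b}): weight}.
        return network.get(frozenset((a, b)), 0.0)

    def features(g):
        # A gene's feature vector is its edge weight to every gene.
        return [weight(g, other) for other in genes]

    def centroid(group):
        rows = [features(g) for g in group]
        return [sum(col) / len(rows) for col in zip(*rows)]

    # Stand-in linear classifier: weights are the difference between
    # the positive and negative class centroids.
    w = [p - n for p, n in zip(centroid(positives), centroid(negatives))]
    scores = {g: sum(wi * xi for wi, xi in zip(w, features(g))) for g in genes}
    return sorted(genes, key=lambda g: scores[g], reverse=True), scores


# Hypothetical p-values: A, B, C pass the permissive cutoff; D just misses.
pvalues = {"A": 0.01, "B": 0.03, "C": 0.04, "D": 0.20, "E": 0.60, "F": 0.80}

# Hypothetical functional network: D is tightly linked to A and B,
# while C is only linked to the non-significant genes E and F.
network = {
    frozenset(("A", "B")): 0.90,
    frozenset(("A", "D")): 0.80,
    frozenset(("B", "D")): 0.85,
    frozenset(("C", "E")): 0.70,
    frozenset(("C", "F")): 0.50,
    frozenset(("E", "F")): 0.60,
}

ranking, scores = netwas_rank(pvalues, network)
print(ranking)
```

In this toy network, gene D misses the cutoff but is tightly connected to A and B, so it ranks above C, which passes the cutoff but has no functional ties to the other hits. That is the pruning and expanding described above, in miniature.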

We co-opted NetWAS to perform an augmented form of differential expression analysis in SSc clinical trials. Instead of GWAS p-values, we used differential expression p-values. We ran NetWAS to identify the functionally similar genes among the nominally differentially expressed genes (i.e. genes that look differentially expressed by a permissive statistical cutoff). Then we extrapolated to the rest of the genome to identify genes that were similar to this signature. We systematically show that our strategy dramatically improves over differential expression analysis alone in identifying the known molecular targets of therapies, even when those targets are not even nominally differentially expressed. We show that there are significant commonalities between improver signatures across all therapies and that some therapies hit targets missed by others, suggesting the possibility of precision medicine in SSc. I’m sparing the details in this format, but please enjoy the paper [2], which just came out in JID, and get in touch with questions and comments!

  1. Greene, C. S. et al. Understanding multicellular function and disease with human tissue-specific networks. Nat. Genet. 47, 569–576 (2015).
  2. Taroni, J. N., Martyanov, V., Mahoney, J. M. & Whitfield, M. L. A functional genomic meta-analysis of clinical trials in systemic sclerosis: towards precision medicine and combination therapy. J. Invest. Dermatol. (2016). http://dx.doi.org/10.1016/j.jid.2016.12.007

It’s WORMHOLE day!

Almost three years ago, my friend George Sutphin and I were out for a hike chatting about orthologous genes. George studies the genetics of aging in mice at The Jackson Lab. But since mice live so long, he also studies aging in C. elegans. Thus, George finds himself looking back and forth between genomes trying to find the mouse gene that “corresponds” to a given worm gene. In this case, “corresponds” means “is an ortholog of”. A lot of ortholog mapping has been worked out by bioinformaticists using sophisticated statistical models of sequence data, but there are still a number of tough edge cases and not all methods agree. On our fateful hike, George told me about a meta-strategy that was being used in the worm community, which was to pool multiple predictors by simple voting. George was in the process of building the biggest, baddest meta-tool on the market. I had to get in on the action.

Voting for orthologs assumes that if many methods call a gene an ortholog, then it is more likely to be an ortholog than if only one method does. This seemed perfectly sensible to me, but I had the following benign thought: If you have examples of orthologous gene pairs and non-orthologous gene pairs, then you could use machine learning to learn the difference between their respective voting patterns. George and I agreed it would be sensible to try and, long story short, it worked! I am extremely pleased to bring you the fruits of that labor, our PLOS Computational Biology paper WORMHOLE: Novel Least Diverged Ortholog Prediction through Machine Learning.
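
For the curious, here is a minimal Python sketch of the difference between simple voting and learning from voting patterns. It is a toy under stated assumptions, not the WORMHOLE implementation: the four predictors and all vote patterns are invented, the function names are hypothetical, and a small logistic regression trained by gradient descent stands in for whatever classifier you prefer.

```python
import math

def vote_score(votes):
    """Simple voting: the fraction of predictors that call the pair orthologous."""
    return sum(votes) / len(votes)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(examples, labels, epochs=2000, lr=0.5):
    """Fit one weight per predictor plus a bias by batch gradient descent."""
    n = len(examples[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for x, y in zip(examples, labels):
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        w = [wi - lr * g / len(examples) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(examples)
    return w, b

def predict(w, b, votes):
    """Learned probability that a voting pattern belongs to a true ortholog pair."""
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, votes)))

# Hypothetical training data: each row is the 0/1 votes of four made-up
# predictors for one gene pair. Predictors 0 and 1 track the truth;
# predictors 2 and 3 also vote for pairs that are not orthologs.
examples = [
    [1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 1], [1, 0, 0, 0], [0, 1, 0, 0],  # orthologs
    [0, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 1, 1],                # non-orthologs
]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0]

w, b = train_logistic(examples, labels)

reliable_pair = [1, 1, 0, 0]  # two votes, both from the reliable predictors
noisy_pair = [0, 0, 1, 1]     # two votes, both from the noisy predictors

print(vote_score(reliable_pair), vote_score(noisy_pair))
print(predict(w, b, reliable_pair), predict(w, b, noisy_pair))
```

Simple voting cannot tell the two candidate pairs apart, since each gets two votes out of four. The trained model learns which predictors to trust, so it scores the pair backed by the reliable predictors well above the pair backed only by the noisy ones.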

Three years on, it feels good to have this finally in print. From the bottom of my heart, I wish you a happy WORMHOLE day!