FANTOM is a large scientific consortium with over 500 members all over the world, and its data is based on cap analysis of gene expression (CAGE). CAGE is a pretty advanced and unique next-generation sequencing technique that allows transcriptional start sites to be analysed across the entire genome with amazing resolution. The FANTOM 5 dataset is also the most comprehensive expression atlas that exists today, including 952 human and 396 mouse tissues, primary cells and cancer cell lines.
I started working with the FANTOM 5 data over two years ago, at the beginning of my PhD. So I’m very happy that this data is now finally released and out there in the open for everyone to explore: http://fantom.gsc.riken.jp/5/data/. That includes the results of the work we have been doing on it all this time.
A lot of people (including one of my supervisors, L.H.) spent a lot of sleepless nights polishing these results and checking them from all possible angles, then waiting for reviews, emails and finally an acceptance letter.
Our focus was on following the evolution of different types of mammalian cells and tissues, the fate of gene duplicates, and what all this new data on promoter architecture can tell us about the evolution of gene expression. If you are interested in these topics as well, I would humbly recommend approaching the papers in the following order:
1. A promoter-level mammalian expression atlas
First, of course, have a look at the main FANTOM5 paper in Nature. It gives a good overview of what this data set offers and contains some truly stunning visualisations, like this one of promoter expression:
2. The Evolution of Human Cells in Terms of Protein Innovation
http://mbe.oxfordjournals.org/content/31/6/1364.long (Open Access)
Next in line is a paper by Gough’s group that is in many ways aligned with our work and results, but they were the first to turn it into a neat story, by creating a timeline of cell evolution. Unfortunately, the figure demonstrating that timeline is completely unreadable (what were you thinking, MBE?). But the authors offer a slightly more detailed figure to explore on their website: http://supfam.cs.bris.ac.uk/SUPERFAMILY/trap/
FANTOM 5 makes it possible to look in particular detail at different cell types in the human brain, and this paper demonstrates the very curious process of evolutionary accumulation of novel cell functions that have shaped our brains ever since the Fungi/Metazoa divergence. The most interesting observation here, in my opinion, is that brain cells evolve under the same selective pressure as the spleen and thymus, which demonstrates how intertwined the nervous and immune systems are in the light of evolution.
The authors also use an interesting way of looking at the evolutionary profile of different cells. The following figure shows, for example, the evolutionary profile for T-cells; the bars represent the different protein domain architectures appearing at particular evolutionary times.
3. A simple metric of promoter architecture robustly predicts expression breadth of human genes suggesting that most transcription factors are positive regulators.
I think this paper clearly demonstrates an interesting observation that became apparent from the FANTOM5 data. The idea is really simple: the number of transcription factors (TFs) binding at a gene's promoter predicts the expression breadth of that gene, that is, in how many tissues and cell types it is expressed. This can be seen when one looks at the expression breadth of paralogs in the human genome. Another important conclusion is that the number of TFs defines where the gene is expressed, but not at what level. The HTML version of the paper is still in production, but you can view a provisional PDF over here: http://genomebiology.com/2014/15/7/413/abstract# (Open Access)
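To make the idea a bit more concrete, here is a minimal sketch in Python of how one could check such a relationship. It is not the method used in the paper: the gene names, TF counts, expression values and the expression threshold are all made up for illustration, and Spearman rank correlation is simply one straightforward way to test whether breadth rises with the number of TFs.

```python
from typing import Dict, List, Tuple


def expression_breadth(expression: List[float], threshold: float = 1.0) -> int:
    """Count the samples in which a gene is expressed above `threshold`.

    The threshold is an arbitrary illustrative cut-off, not a value from the paper.
    """
    return sum(1 for value in expression if value > threshold)


def rank(values: List[float]) -> List[float]:
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = rank(x), rank(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    return cov / (sd_x * sd_y)


# Hypothetical toy data: gene -> (number of TFs at the promoter,
# expression in five samples). None of these values come from FANTOM5.
TOY_GENES: Dict[str, Tuple[int, List[float]]] = {
    "geneA": (2, [0.0, 5.2, 0.3, 0.0, 0.1]),
    "geneB": (5, [3.1, 4.0, 0.2, 2.2, 0.0]),
    "geneC": (9, [7.5, 6.1, 8.0, 5.5, 0.4]),
    "geneD": (14, [9.0, 8.8, 7.9, 9.3, 8.1]),
}

if __name__ == "__main__":
    tf_counts = [float(tfs) for tfs, _ in TOY_GENES.values()]
    breadths = [float(expression_breadth(expr)) for _, expr in TOY_GENES.values()]
    print("TF counts:", tf_counts)
    print("Breadths: ", breadths)
    print("Spearman correlation:", spearman(tf_counts, breadths))
```

Note that breadth here only counts in how many samples a gene is switched on and ignores how strongly it is expressed, which is the distinction between where and at what level made above.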