The relationship of proteins
Researchers have, for the first time, mapped the proteomes of 100 organisms from all domains of life
© Johannes B. Müller © MPI of Biochemistry
What do the house mouse Mus musculus, the organism Haloferax mediterranei that lives in thermal springs and the intestinal bacterium Escherichia coli have in common? Nothing, one would assume, because these three organisms are far apart in their evolutionary ancestry. Each belongs to a different one of the three domains, the highest classification category of living organisms: eukaryotes, archaea and bacteria. Yet all three organisms use similar biomolecules called proteins, which are important for their survival.
In order to discover new similarities and differences between these and other organisms, researchers from the MPI of Biochemistry, in collaboration with research institutions from Munich and Copenhagen, have analyzed the proteomes of a total of 100 organisms from all domains of life, including these three. The proteome is the sum total of the proteins of a cell or living organism.
Johannes Müller, one of the two first authors of the study, explains: "Nowadays, evolutionary phylogeny is analyzed on the basis of the similarity of certain gene segments. Genes are the blueprints for proteins. With the current study we have looked at the different gene products, the proteins. We can now determine whether the individual organisms not only carry the building instructions for the proteins, but also produce them."
Protein analysis
Matthias Mann, head of the Department "Proteomics and Signal Transduction", and his team are experts in protein analysis. Using high-throughput mass spectrometry, known and unknown proteins can be identified and their quantities determined. Bioinformatic methods subsequently enable the analysis and integration of additional information from scientific databases.
Quantifying these proteins makes it possible to compare evolutionarily related proteins, called homologues, between organisms. The data allow the scientists to answer fundamental questions such as: Across all domains, in what do living organisms invest most of their resources? Related proteins conserved during evolution can now be compared quantitatively for the first time.
This is the largest number of proteins ever identified and quantified in the field of proteomics research. Two million unique peptides from a total of 340,000 proteins have been detected. Peptides are fragments of proteins. The identified protein fragments were assigned to proteins using databases of known and predicted gene products and linked to other known database information. This resulted in a data collection of 8 million data points with 53 million connections between them.
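The idea of assigning measured peptides to database proteins can be illustrated with a small Python sketch. This is a toy illustration only and not the pipeline used in the study: the protein IDs, sequences and peptides are invented, and exact substring matching stands in for the far more elaborate scoring that real proteomics search software performs.

```python
from collections import defaultdict

# Hypothetical toy database: protein ID -> amino-acid sequence (made-up data).
PROTEIN_DB = {
    "protA": "MKTAYIAKQRQISFVKSHFSRQ",
    "protB": "MSHFSRQLEERLGLIEVQAPILSR",
    "protC": "MGGLIEVQKTAYIAK",
}


def assign_peptides(peptides, protein_db):
    """Map each protein ID to the set of peptides found in its sequence."""
    matches = defaultdict(set)
    for peptide in peptides:
        for protein_id, sequence in protein_db.items():
            if peptide in sequence:  # exact substring match (simplifying assumption)
                matches[protein_id].add(peptide)
    return dict(matches)


def unique_peptides(matches):
    """Return peptides that occur in exactly one protein, with that protein's ID."""
    owners = defaultdict(set)
    for protein_id, peptides in matches.items():
        for peptide in peptides:
            owners[peptide].add(protein_id)
    return {p: next(iter(ids)) for p, ids in owners.items() if len(ids) == 1}


if __name__ == "__main__":
    measured = ["TAYIAK", "QRQISFVK", "LEERLGL", "GLIEVQ", "WWWW"]
    matches = assign_peptides(measured, PROTEIN_DB)
    for protein_id in sorted(matches):
        print(protein_id, sorted(matches[protein_id]))
    print(unique_peptides(matches))
```

In this sketch, a peptide that occurs in only one database sequence counts as unambiguous evidence for that protein, while peptides shared between several proteins are left out of the unique set.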
New discoveries
"We were able to prove the existence of assumed but never confirmed proteins. Many proteins were previously only known as predictions, based on the known gene sequence. 93% of the proteins we measured had not previously been experimentally confirmed," explains Johannes Müller.
It was already known that some genes are very similar across all domains. These include genes involved in protein folding, the protein production machinery and energy metabolism. With the study, the researchers have now shown that the associated proteins are produced in high concentrations in all organisms. Johannes Müller explains: "It is understandable that the protein folding assistants are essential for the survival of all organisms and are therefore conserved across domains. To perform their specific function, proteins have to be folded into a very particular three-dimensional shape. This is where the folding assistants help. For example, a receptor protein can only function if it has the proper shape to bind a signalling molecule."
From the raw data set that is now available, the research community can gain many more insights in the future. Matthias Mann summarizes: "We are providing the public with an unprecedented resource of proteomic data to increase biological knowledge. Figuratively speaking: With the proteomic data of 100 different organisms, we have generated new detailed molecular knowledge. As with 100 maps, we can now zoom into many sub-areas, discover new relationships and compare the data with each other. These results are an invaluable resource for any protein researcher who wants to know more about his or her proteins of interest in other organisms, as well as for bioinformaticians who benefit from our raw data and our systems biology findings. Among the large number of still uncharacterised proteins, there are certainly many that are particularly important for life and might therefore be of interest for medicine and for biotechnology to advance 'green chemistry'."