Proteomic data validation: a closer look
April 12, 2012
I’ve been getting reader feedback on my article on validating -omics data in the April 2012 issue of ASBMB Today. Some in the proteomics field feel that I made it sound as though proteomics as a whole has issues with data replication. It doesn’t, and I can now devote some words to explaining why the field of proteomics can’t be treated as one big lump.
Proteomics has different branches. The best-known branch is shotgun proteomics. You use mass spectrometry to find out what is in your sample of interest. It’s like a fishing expedition in a river you’ve not gone fishing in before. You take your sample, cut up the proteins with proteases (usually trypsin) and then analyze the peptides in a mass spectrometer. You try to fish out as many peptides in your sample as possible and identify them. Shotgun proteomics is the technique that produces those large lists of proteins or modifications on proteins.
Shotgun proteomics used to be poorly reproducible, even in one person’s hands, because the number of peptides in the sample overwhelmingly exceeded the number of sequencing attempts the instrument could make. But that’s no longer the case for many samples, especially those of moderate complexity, because the sequencing speed of modern mass spectrometers has greatly improved. If the number of precursor ions in a sample falls within the same order of magnitude as the number of sequencing cycles the mass spectrometer can perform, almost all precursors can be identified, and the method becomes reproducible.
Irreproducibility rears its head if a sample is not thoroughly analyzed. If you can’t pick out nearly all of the molecules in your sample, then it’s going to be hard for you or anyone else to come up with the same set of molecules when the experiment is repeated.
As Ruedi Aebersold at ETH Zurich explained it to me when I interviewed him for the “Valid -Omics” story, it’s like having a box of red, green, blue and yellow marbles. I can pull out a handful of marbles that would have a mix of colors different from the mix pulled out by Aebersold. The same thing happens with shotgun proteomics, although he explained that this sampling problem is growing smaller as higher-performance mass spectrometers come onto the market.
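For readers who like to see the marble analogy in numbers, here is a minimal Python sketch of the sampling problem. It is a toy model and not how any real instrument behaves: it assumes every precursor is equally likely to be picked, while real mass spectrometers tend to favor the more abundant ions. The function names and the figure of 1,000 precursors are made up for illustration.

```python
import random


def simulate_run(n_precursors, n_attempts, rng):
    """Return the set of precursors identified in one simulated run.

    Toy assumption: each sequencing attempt identifies one distinct
    precursor, chosen uniformly at random from the sample.
    """
    k = min(n_attempts, n_precursors)
    return set(rng.sample(range(n_precursors), k))


def run_overlap(n_precursors, n_attempts, seed=0):
    """Fraction of identifications shared by two simulated runs.

    Computed as |A & B| / |A | B| (the Jaccard index), so 1.0 means
    both runs produced exactly the same list.
    """
    rng = random.Random(seed)
    first = simulate_run(n_precursors, n_attempts, rng)
    second = simulate_run(n_precursors, n_attempts, rng)
    return len(first & second) / len(first | second)


if __name__ == "__main__":
    # Hypothetical sample with 1,000 precursors and a growing number
    # of sequencing attempts per run.
    for attempts in (100, 500, 900, 1000):
        print(attempts, round(run_overlap(1000, attempts), 2))
```

When the number of attempts is a small fraction of the precursors, the two runs share few identifications, just like two different handfuls of marbles. Once the attempts match or exceed the number of precursors, both runs see everything and the overlap is complete.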
There is another branch of proteomics called targeted mass spectrometry. It’s also known as targeted proteomics, and it is done by selected reaction monitoring or multiple reaction monitoring mass spectrometry (check out some papers on the topic in Molecular & Cellular Proteomics). Here, you’re not trying to identify all the proteins in a sample and build an inventory. Instead, you ask whether certain proteins exist in a given sample and, if so, how much of each protein is there. You need to know ahead of time what you are looking for.
This technique has been used with great reproducibility across laboratories, says Aebersold. You basically just need to have some coordinates that you feed to your instrument so it can identify the targeted peptides. People in other labs can find the same peptides if you give them your coordinates.
As Aebersold explained to me, when people talk about reproducibility in proteomics, it’s important to understand the distinctions between the different types of proteomic approaches. “If one wants to pursue the discovery of as many proteins in a sample as possible then, yes, reproducibility is a problem,” Aebersold said. “If one is attempting to measure a predetermined set of proteins in multiple samples by targeted mass spectrometry, then the reproducibility is not a huge problem.”