/r/proteomics
This subreddit is dedicated to the dissemination and discussion of the latest research and news in the field of proteomics.
The Proteomics Reddit
Proteomics - the large-scale study of proteins. Proteins are vital parts of living organisms, with many functions. The term proteomics was coined in 1997 in analogy with genomics, the study of the genome. The word proteome is a portmanteau of protein and genome, and was coined by Marc Wilkins in 1994 while he was a PhD student at Macquarie University.
The proteome is the entire set of proteins that are produced or modified by an organism or system. It varies with time and with the distinct requirements, or stresses, that a cell or organism undergoes. Proteomics is an interdisciplinary domain that has benefited greatly from the genetic information of the Human Genome Project; it is also an emerging area of research that explores proteomes at the level of overall intracellular protein composition, structure, and activity patterns. It is an important component of functional genomics.
While proteomics generally refers to the large-scale experimental analysis of proteins, it is often specifically used for protein purification and mass spectrometry. Wikipedia: proteomics
I'm trying to follow a method from a paper that states "We selected the top 35 precursors for MS^(2) analysis which consisted of HCD high-energy collision dissociation with the following parameters: Astral data acquisition (TMT on), AGC 100%, ... etc"
Where in the MS2 scan properties do I set the top N to 35?
Has anyone ever done this?
Hi, I'm new to the field and we want to validate our DDA data with PRM. I found a presentation saying that using Prosit can expedite this process without the need for synthetic peptides, but I can't find any additional information about it. I know that synthetic heavy-labeled peptides are the gold standard, but these are currently inaccessible to us. Any leads would be appreciated. Thank you so much!
Hello,
I am trying to process some DDA plasma data acquired on the Exploris 480 with DIA-NN. I know it is meant for DIA analysis, but I was under the impression that it can also process DDA data, since it can be used for spectral library curation. For some reason my results with DIA-NN are very inconsistent, and some files get 0 total IDs. I'm not sure what's wrong; are there certain parameters I need to change in order to analyze DDA data? For reference, I analyzed the same set of files with SEQUEST (PD) and got around 1,200 proteins. When the DIA-NN run finished I got 720, which is quite low. Any help or tips would be greatly appreciated!!
I am a complete newbie in proteomics. I stumbled onto the field but am staying to learn more because of its promise for unlocking deeper insights into our health.
I'm here to ask researchers who use the different proteomics tools hands-on: how do you see these tools developing in the future (MS, PEA (Olink), SomaLogic, etc.)?
Olink looks to be killing it commercially with the UK Biobank collaboration, getting longitudinal, disease-labeled data points. Is Olink going to take over the whole field as they add more and more paired antibodies to their repertoire?
I also tried to find researchers at my local medical university who publish with Olink, but there seem to be far more working with MS. Is it because Olink is too expensive compared to MS? A limited target portfolio? Something to do with precision, dynamic range, or simply researcher habits and preferences?
Extremely curious. Would be fantastic to hear your thoughts!
In metaproteomics, a two-step database search is often performed: the first step selects a subset of sequences from the database, which is then used as the sequence database for the search in the second step.
Usually, in the first step against the large sequence database, the spectra are searched using "relaxed" criteria.
Can someone point out how this can be done in Proteome Discoverer? Which nodes and parameters do I need to select in the Processing and Consensus workflows?
Should I use the Fixed Value PSM Validator, or Percolator with higher cutoffs for the High/Medium confidence FDRs?
Where can I make changes in the Consensus workflow?
Thanks
Does anyone here have a cheap source of magnetic beads compatible with SP3/PAC clean-up? We have been using hydroxyl-modified beads from MagReSyn and Cytiva (both with good results), but have an application where the cost is killing us.
Just wondering if anyone has worked or is working with the M3 emitter (Newomics) for bottom-up proteomics. Presently, I am using a 110 cm µPAC column + 15 µm EASY-Spray emitter connected to an Ascend + FAIMS. I want to explore the M3 emitter, but prior to spending $$$, I'd like to hear feedback from others.
Is there a way to convert my PD3.1 output to the format used in MaxQuant STY sites files?
PD output includes a modification sites file:
As well as the PSM, Peptide Groups, and Protein Groups files.
I really don't want to re-run this analysis in MaxQuant, because I was able to use Chimerys and some other specific search steps in PD. But the downstream analysis programs I want to use (DEP2, PhosphoAnalyst, PhosMap, etc.) currently only take the Phospho (STY)Sites.txt input.
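In case it helps, the conversion is mostly a column-renaming exercise, so a small R script may be enough. Below is a minimal sketch with readr/dplyr; the PD export column names used here ("Master Protein Accessions", "Target Amino Acid", "Position in Protein", "Site Probability", the Abundance columns) are assumptions to check against your actual modification-sites file, and the downstream tools may still expect extra columns this sketch does not create.
library(readr)
library(dplyr)

# Hypothetical file name for the tab-delimited PD 3.1 modification-sites export
pd_sites <- read_tsv("PD_ModificationSites.txt")

# Remap to a MaxQuant Phospho (STY)Sites-style layout; source column names are assumed
sty <- pd_sites %>%
  transmute(
    Proteins            = `Master Protein Accessions`,
    `Amino acid`        = `Target Amino Acid`,
    Position            = `Position in Protein`,
    `Localization prob` = `Site Probability` / 100,   # PD reports %, MaxQuant uses 0-1
    across(starts_with("Abundance"), ~ .x, .names = "Intensity {.col}")
  )

write_tsv(sty, "PhosphoSTYsites_from_PD.txt")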
These are IP samples. I was not expecting the data to look like this?
Hey guys. I am a PhD student who just got raw data back from an exploratory study in the form of label-free DIA. I have been recommended to process my files in Spectronaut.
I have zero experience in bioinformatics/biostatistics and computational work in general, but I am keen to learn with this great opportunity/project.
Can anyone advise what pipeline to follow and where I can find good resources to learn (literally) everything about going from raw files to visualisation graphs, please? How can I optimise all my stringency criteria along the way?
Any help will be greatly appreciated! 🙏
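Not a full answer, but for the statistics/visualisation end, here is a minimal R sketch assuming you have already exported a protein-level quantity matrix from Spectronaut (proteins as rows, samples as columns); the file name, column layout, and simple two-group design are all assumptions, and limma is just one common choice for the differential test.
library(limma)

# Assumed export: tab-delimited protein quant matrix, first column = protein IDs
quant <- read.delim("spectronaut_protein_matrix.tsv", row.names = 1, check.names = FALSE)
expr  <- log2(as.matrix(quant))                        # log2-transform the intensities

# Assumed design: three control columns followed by three treated columns
group  <- factor(c("ctrl", "ctrl", "ctrl", "trt", "trt", "trt"))
design <- model.matrix(~ group)

fit <- eBayes(lmFit(expr, design))                     # moderated t-statistics
res <- topTable(fit, coef = "grouptrt", number = Inf)  # log2 FC, p-values, adjusted p-values
head(res)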
I am very new to proteomics. Just wondering if anyone has a good book or review on proteomics analysis plots (heat maps, volcano plots, how to use GSEA, etc.). I know I can Google these terms, but the output is overwhelming and I need to comb through it. Thank you.
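While you collect reading suggestions, note that several of these plots are only a few lines of R. For example, a clustered heatmap with the pheatmap package, shown here on a random toy matrix standing in for a log2 protein-by-sample intensity table:
library(pheatmap)

# Toy matrix in place of real data: 50 proteins x 6 samples of log2 intensities
set.seed(1)
mat <- matrix(rnorm(300), nrow = 50,
              dimnames = list(paste0("prot", 1:50), paste0("sample", 1:6)))

pheatmap(mat,
         scale = "row",            # z-score each protein across samples
         clustering_method = "complete",
         show_rownames = FALSE)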
Does anyone have experience running a Micro BCA assay for total protein concentration before and after trypsin digestion? The buffer before digestion is PBS, and after digestion it is UA buffer. The measured total protein concentration increases up to 3-fold after digestion. Does urea interfere? The urea concentration is 20 mM. Thank you.
An open invitation to join a mass spectrometry omics Discord group
Hi, I got mass spec data in an Excel sheet. It is partially analysed, showing protein IDs, fold changes, -log10 p-values, the number of peptides identified per protein, etc. I have 3 repeats each of control and treated samples. What should I do next? I am doing basic analysis in Reactome by shortlisting significantly up- and down-regulated proteins. What else can I do? I am new to all of this and would appreciate any step-by-step guidance. The purpose is to find the key pathways/targets affected by the treatment. Thanks
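Since you already have fold changes and -log10 p-values, one common next step is a volcano plot plus up/down lists to feed into Reactome or another enrichment tool. A minimal R sketch; the file name and the column names ("log2FC", "neg_log10_p", "ProteinID") are assumptions to replace with whatever your sheet actually uses, and the thresholds are only a common starting point.
library(readxl)
library(ggplot2)

res <- read_excel("ms_results.xlsx")   # hypothetical file name

# Flag hits: |log2 FC| > 1 and p < 0.05 (i.e. -log10 p above ~1.3)
res$significant <- abs(res$log2FC) > 1 & res$neg_log10_p > -log10(0.05)

ggplot(res, aes(log2FC, neg_log10_p, colour = significant)) +
  geom_point(alpha = 0.6) +
  labs(x = "log2 fold change", y = "-log10 p-value")

# Up- and down-regulated protein lists for enrichment analysis
up   <- res$ProteinID[res$significant & res$log2FC > 0]
down <- res$ProteinID[res$significant & res$log2FC < 0]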
I am using a self-packed column for single-cell proteomics (Astral + Vanquish Neo, 50 μm inner diameter, 1.5 μm C18, flow rate 250 nL/min, column temperature 55 °C). When observing the tip of the column, I found a very obvious Taylor cone. How should I optimize my self-packed column?
Our purchasing department requires us to do market research for the instruments we want to buy. We already gave them the unique selling points of the instruments, but that was not enough. Do any of you have experience with market research for MS instruments for proteomics? Or could anyone share an example document? Thanks for the help!
Does anyone have experience you could share related to formaldehyde-based crosslinking experiments?
To give further information, I’m exploring a few possibilities to study a protein-protein interaction. Perhaps as expected, some of my formaldehyde tests have given me pretty much only garbage in return.
I'm also looking into other crosslinkers like DSSO, so if you can opine on that I would appreciate it as well.
Hello guys!
So, straight to the problem.
I have a proteomics dataset in the form of a matrix with 20 samples (as columns) and 6000 proteins (as rows); it's shown in the picture in this post. Protein expression is already log2-transformed.
Performing a PCA with the FactoMineR and factoextra packages, with the following code:
res.pca <- prcomp(datiprova_df_numeric, center = TRUE, scale. = FALSE)
fviz_pca_var(res.pca)
I obtain the PCA labeled 1 in the picture inside this post.
By writing
res.pca <- prcomp(datiprova_df_numeric, center = TRUE, scale. = TRUE)
fviz_pca_var(res.pca)
I obtain PCA 2 instead.
Now, when I transpose the matrix and write
res.pca_t <- prcomp(datiprova_df_numeric_t, center = TRUE, scale. = TRUE)
fviz_pca_ind(res.pca_t)
I obtain PCA 3.
Why do the PCAs look different? Using the same matrix, I would expect the same results, just with the plots swapped if I transpose the matrix. I get why variables become individuals when I transpose, but not why the PCA itself changes.
Can someone help?
Thanks!
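Part of what is going on is that prcomp() always centres (and, with scale. = TRUE, scales) the columns of whatever matrix it receives, so the original matrix and its transpose get different preprocessing rather than just swapped roles. A small self-contained illustration on random data (not your matrix):
set.seed(1)
X <- matrix(rnorm(200), nrow = 20, ncol = 10)   # toy stand-in: 20 rows x 10 columns

pca_a <- prcomp(X,    center = TRUE, scale. = FALSE)   # centres the 10 columns of X
pca_b <- prcomp(t(X), center = TRUE, scale. = FALSE)   # centres the 20 columns of t(X)

# Different centring gives different singular values, i.e. a different
# percentage of variance per component, not just a transposed plot
round(pca_a$sdev[1:5], 3)
round(pca_b$sdev[1:5], 3)
With scale. = TRUE the gap is even bigger, because in one orientation you standardise the 20 samples and in the other the 6000 proteins.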
When comparing phosphorylation between control and treated samples (paired data), what is the best way to go about this?
Right now I am using TMTanalyst (Monash) and treat the phospho-enriched samples as a different 'condition' than the total proteome in the annotation file so that I can get expression graphs that show me the total protein quant (left) and the phosphoprotein quant (right).
In the case of this example where there is only one phosphopeptide identified in this protein, the phosphoprotein quant boxplots technically only have quantification from that single phosphopeptide between the control and treatment.
Given that I don't expect the total proteome to change between my control and treatment samples, and that they are paired, if I check the total protein quant between control and treatment and don't see a difference, is it OK to just compare the quantification of individual phosphopeptides?
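One common alternative to eyeballing the total-protein boxplots is to explicitly normalise each phosphopeptide to its parent protein and run the paired test on that ratio. A minimal sketch with placeholder numbers purely for illustration (not real data); the object names are hypothetical:
# log2 quant for one phosphopeptide and its parent protein, paired by sample
phospho_ctrl <- c(20.1, 19.8, 20.5)   # placeholder values
phospho_trt  <- c(21.0, 20.9, 21.4)
protein_ctrl <- c(25.2, 25.0, 25.3)
protein_trt  <- c(25.1, 25.2, 25.4)

# Protein-normalised phospho signal (difference of log2 values), then a paired test
norm_ctrl <- phospho_ctrl - protein_ctrl
norm_trt  <- phospho_trt  - protein_trt
t.test(norm_trt, norm_ctrl, paired = TRUE)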
The LFQ intensities that MaxQuant produces are normalized internally if that option is selected. Is it OK to further normalize these already-normalized intensities in Perseus, for example with the VSN method?
Secondly, I have an LFQ dataset in which the control samples apparently have too many missing values; it looks like the amount of protein loaded was quite low. What kind of normalization/imputation is recommended in MaxQuant/Perseus and Proteome Discoverer?
Thanks
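For the first question, it can help to look at the per-sample distributions before deciding whether any further normalisation (VSN or otherwise) is needed. A quick diagnostic sketch in R, assuming a MaxQuant proteinGroups.txt with LFQ intensity columns:
# Read proteinGroups.txt and pull out the LFQ intensity columns
pg  <- read.delim("proteinGroups.txt", check.names = FALSE)
lfq <- as.matrix(pg[, grep("^LFQ intensity ", colnames(pg))])

lfq[lfq == 0] <- NA          # MaxQuant writes 0 for missing LFQ values
log_lfq <- log2(lfq)

# If the medians already line up, the samples are on a comparable scale
boxplot(log_lfq, las = 2, ylab = "log2 LFQ intensity")

# Fraction of missing values per sample: heavy missingness in the controls
# is more of a filtering/imputation question than a normalisation one
colMeans(is.na(log_lfq))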
Does anyone have experience using SDB-RPS StageTips for peptide desalting? I have been recommended the Empore brand, but I cannot, for the life of me, find what (if anything) I need to condition the tip with prior to sample loading. Can anyone clarify what I should equilibrate the SDB-RPS StageTip with before loading the sample? Thanks!
I am completely new to proteomics. Everyone in my lab uses formic acid instead of TFA, but this particular protocol uses TFA throughout: 0.1%, 0.2%, and 1% TFA at various steps. I went to order TFA and found that it is sold both neat, by weight (in grams), and already in solution (in mL).
I read that the density of TFA is quite different from that of water, so 1% TFA w/v and 1% TFA v/v are actually quite different solutions. I have tried to Google this and read papers, but no one states whether their TFA is w/v or v/v, which leads me to think there is some sort of convention in the field... Which should I use for my peptide desalting protocols, TFA solutions w/v or v/v? Thanks in advance for your help!
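I can't speak to which convention a given paper used, but the arithmetic showing how much the two differ is quick (assuming a TFA density of roughly 1.49 g/mL):
tfa_density <- 1.49      # g/mL, approximate density of neat TFA

# 1% v/v: 10 mL of TFA made up to 1 L
10 * tfa_density         # ~14.9 g of TFA per litre

# 1% w/v: 10 g of TFA made up to 1 L
10 / tfa_density         # ~6.7 mL of TFA per litre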
Hello,
We are integrating MSFragger with Scaffold on the command line (i.e. no FragPipe GUI).
Does anybody know what exact files and formats (pepXML or tsv) Scaffold expects?
thx in advance
keesh@ieee.org
Hello all,
Random question for our timsTOF (SCP) users: ever since we installed an Astral about 10 ft from our SCP, we've noticed the inlet filter on the source getting *really* dirty within a week, whereas previously it took more like a month to get even a little dirty. Is this an evil ploy by Thermo to poison the air for the competition, or are we just more aware now? Our lab is in a new building and the MS area is very clean (like the cleanest lab I've ever worked in).
With what frequency do you all change the inlet filter?
many thx
I want to place the freeze-dried sample directly in the injection vial. I would like to first draw 1 µL from a specific stock-solution vial, inject it into the freeze-dried sample, and then draw 1 µL for injection (don't ask why I want to do this, I just have this particular need).
I am doing FDR analysis on a big dataset with Percolator, but it runs out of memory. How can I fix this? Can I distribute the process or something?
I have PD data and am trying to convert it to MSstatsTMT format. However, when creating the input.pd object, there are several rows of peptides that end up with NA in the Mixture, TechRepMixture, Run, BioReplicate, and Condition columns. In the PSMs file from PD used to make raw.pd, there aren't any peptides that lack a Spectrum File (newly named "File ID"), so I'm not sure why these specific peptides are not being associated with the annotation information.
Since PDtoMSstatsTMTFormat expects a column named Spectrum.File in the raw.pd file, I just changed the name from File ID to Spectrum File and made sure the contents match the Run column in my annotation file.
When I run input.pd <- PDtoMSstatsTMTFormat(raw.pd, annotation.pd, which.proteinid = "Protein.Accessions") I get a warning:
WARN [2024-12-25 11:49:55] ** Condition in the input file must match condition in annotation.
I'm running R 4.4.2, MSstats 4.14.0, MSstatsConvert 1.16.1, and MSstatsTMT 2.14.1
This warning/error becomes an issue because when I run the proteinSummarization command I get this:
0%<simpleError in .Primitive("length")(newABUNDANCE, keep = TRUE): 2 arguments passed to 'length' which requires 1>
Error in merge.data.table(summarized, lab, by.x = c(merge_col, "Protein"), :
Elements listed in `by.x` must be valid column names in x.
In addition: Warning messages:
1: In dcast.data.table(LABEL + RUN ~ FEATURE, data = input, value.var = "newABUNDANCE", :
'fun.aggregate' is NULL, but found duplicate row/column combinations, so defaulting to length(). That is, the variables [LABEL, RUN, FEATURE] used in 'formula' do not uniquely identify rows in the input 'data'. In such cases, 'fun.aggregate' is used to derive a single representative value for each combination in the output data.table, for example by summing or averaging (fun.aggregate=sum or fun.aggregate=mean, respectively). Check the resulting table for values larger than 1 to see which combinations were not unique. See ?dcast.data.table for more details.
2: In merge.data.table(summarized, lab, by.x = c(merge_col, "Protein"), :
Input data.table 'x' has no columns.
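Those NA rows plus the warning usually point to a mismatch between the PSM table and the annotation, so a quick set comparison can narrow it down. A sketch, assuming raw.pd, annotation.pd, and input.pd are loaded with the column names described above:
# Runs present in the PSM table but missing from the annotation, and vice versa;
# any mismatch here leaves rows with NA Mixture/Run/Condition after conversion
setdiff(unique(raw.pd$Spectrum.File), unique(annotation.pd$Run))
setdiff(unique(annotation.pd$Run), unique(raw.pd$Spectrum.File))

# Channels named in the annotation vs channels that made it into input.pd
setdiff(unique(annotation.pd$Channel), unique(input.pd$Channel))

# Dropping the un-annotated rows lets proteinSummarization run, but it is a
# workaround, not a fix for the underlying mismatch
input.pd <- input.pd[!is.na(input.pd$Condition), ]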
Like the title says, I am using Chimerys in PD and getting errors. I have tried 30+ times with different settings and inputs and haven't gotten it to work once, so I'm considering giving up on it, because it just prolongs the processing time and there is no manual or description of the error codes anywhere.
Anyway, here are the 3 errors I consistently get some combination of:
(1) All charge groups contain less than 100 candidates which is the minimum requirement per group for CE calibration. Please revisit the combination of raw file, fasta file, and search settings.
(2) Not enough PSMs for refinement learning
(3) Number of target peptides with FDR <1% is too low. Please revisit the combination of raw file, fasta file, and search settings.
Errors 1 and 2 usually involve just one or two specific input files (one or two of the fractions), so only some of the Chimerys jobs end up failing (2 out of 4, let's say).
I have 8 fractionated runs of TMT10plex samples and another run with phospho enrichment of the same sample. I am working with a non-model organism that's been pretty tricky to get working all around, so I'm not sure whether the data I've acquired just isn't high enough quality for Chimerys or what. Without Chimerys I am still getting ~500 to 2000 high-confidence protein groups depending on the species/conditions for the experiment, and my labeling efficiency was ~98%, so I would say that's pretty good compared to what I expected and I don't think my data is complete crap. Maybe it's just not what's needed for Chimerys?
Does anyone else have experience with these kinds of errors?