/r/proteomics
This subreddit is dedicated to dissemination and discussion regarding the latest research and news in the field of proteomics.
The Proteomics Reddit
Proteomics - the large-scale study of proteins. Proteins are vital parts of living organisms, with many functions. The term proteomics was coined in 1997 in analogy with genomics, the study of the genome. The word proteome is a portmanteau of protein and genome, and was coined by Marc Wilkins in 1994 while he was a PhD student at Macquarie University.
The proteome is the entire set of proteins that are produced or modified by an organism or system. This varies with time and with the distinct requirements, or stresses, that a cell or organism undergoes. Proteomics is an interdisciplinary domain that has benefited greatly from the genetic information of the Human Genome Project; it also covers emerging research into proteomes at the overall level of intracellular protein composition, structure, and activity patterns. It is an important component of functional genomics.
While proteomics generally refers to the large-scale experimental analysis of proteins, it is often specifically used for protein purification and mass spectrometry. Wikipedia: proteomics
I'm working with non-human serum samples. While constructing a simple protein rank abundance plot I realized that the ranking output from Spectronaut differs from the ranking constructed with MS-DAP during downstream analysis (which uses MaxLFQ peptide-protein rollup with an input of the same Spectronaut "raw" report).
I want to have a better understanding of why these two different lists are generated. I'm inclined to trust the Spectronaut output since Albumin is ranked first and that is what I'd expect biologically, but I'm really curious as to why these two lists aren't just the same.
Looking at the Top 5 proteins from each, I get:
Spectronaut (Rank + Protein Description)
1. Albumin
2. Serotransferrin
3. Serpin Family A Member 1
4. Histidine-rich glycoprotein
5. Collagen Type XX alpha 1 chain
MS-DAP
1. Glycoprotein 1b platelet subunit beta
2. Collagen Type XX alpha 1 chain
3. Rotatin
4. Protein Kinase cAMP-dependent type 1 regulatory subunit beta
5. Albumin
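To make the disagreement between the two top-5 lists easier to see, here is a minimal sketch that puts both rankings side by side. It only uses the protein names from the post; the function name `rank_table` is made up for illustration, and it says nothing about why the two tools rank differently.

```python
# Top-5 lists copied from the post, in rank order.
spectronaut_top5 = [
    "Albumin",
    "Serotransferrin",
    "Serpin Family A Member 1",
    "Histidine-rich glycoprotein",
    "Collagen Type XX alpha 1 chain",
]
msdap_top5 = [
    "Glycoprotein 1b platelet subunit beta",
    "Collagen Type XX alpha 1 chain",
    "Rotatin",
    "Protein Kinase cAMP-dependent type 1 regulatory subunit beta",
    "Albumin",
]


def rank_table(list_a, list_b):
    """Return {protein: (rank_in_a, rank_in_b)}; None means 'not in that list'."""
    rank_a = {name: i for i, name in enumerate(list_a, start=1)}
    rank_b = {name: i for i, name in enumerate(list_b, start=1)}
    proteins = list(dict.fromkeys(list_a + list_b))  # union, first-seen order
    return {p: (rank_a.get(p), rank_b.get(p)) for p in proteins}


table = rank_table(spectronaut_top5, msdap_top5)
for protein, (r_spec, r_msdap) in table.items():
    print(f"{protein}: Spectronaut={r_spec}, MS-DAP={r_msdap}")
```

Running the same comparison on the full protein lists (not just the top 5) would show whether the disagreement is limited to a few proteins or is systematic.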
Hello community. I am trying to understand next steps after an ANOVA test. I started with a matrix from a time course experiment with 4 time points. For each time point, I have 2 biological replicates. Following filtering, normalisation and log2 transformation, I performed an ANOVA test with S0=0, Benjamini-Hochberg FDR 0.01. I then filtered the ANOVA significant values and performed the Tukey's Honestly Significant difference (THSD). The output lists the pairwise groups which are significantly differentially expressed. What is the next step of the analysis? Do you simply report the statistically different groups or is there a possibility to perform further statistical tests on the significantly different groups?
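For readers unfamiliar with the Benjamini-Hochberg step mentioned above, here is a minimal pure-Python sketch of the adjustment. The p-values are invented for illustration, and this is not the code of any particular package; it only shows how raw ANOVA p-values would be turned into adjusted values and compared to an FDR of 0.01.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical ANOVA p-values for five proteins.
pvals = [0.001, 0.008, 0.039, 0.041, 0.60]
qvals = benjamini_hochberg(pvals)
significant = [q <= 0.01 for q in qvals]
print(list(zip(pvals, qvals, significant)))
```

Only the proteins flagged here would go on to the pairwise Tukey comparison.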
When performing enrichment analysis on proteins, I use the significantly changing proteins against the background of all the proteins detected in my assay. For enrichment analysis of proteins with significantly changing phosphosites, what is the appropriate background list? Is it all the detected proteins as before or all the detected phosphorylated proteins?
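To see why the choice of background matters, here is a minimal sketch of a one-sided hypergeometric (over-representation) test run against two different backgrounds. All counts are hypothetical, and the sketch does not settle which background is appropriate; it only shows that the same hit list can give quite different p-values.

```python
from math import comb


def hypergeom_enrichment_p(background_size, annotated_in_background, hits, annotated_hits):
    """P(X >= annotated_hits) when `hits` proteins are drawn without replacement."""
    upper = min(hits, annotated_in_background)
    total = comb(background_size, hits)
    tail = sum(
        comb(annotated_in_background, k)
        * comb(background_size - annotated_in_background, hits - k)
        for k in range(annotated_hits, upper + 1)
    )
    return tail / total


# Hypothetical: 200 proteins with changing phosphosites, 12 of them in one pathway.
# Background 1: all detected proteins (5000, of which 100 are in the pathway).
p_all = hypergeom_enrichment_p(5000, 100, 200, 12)
# Background 2: only detected phosphoproteins (1500, of which 60 are in the pathway).
p_phospho = hypergeom_enrichment_p(1500, 60, 200, 12)
print(f"all detected proteins: p = {p_all:.3g}")
print(f"phosphoproteins only:  p = {p_phospho:.3g}")
```

In this made-up case the pathway is denser among phosphoproteins, so the enrichment looks weaker against the phosphoprotein background.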
Hi all, I am very new to proteomics and feel very lost with handling and representing such large datasets in the form of graphs/figures. I specifically work on characterizing the protein corona formed around nanoparticles and how this can be used to explain uptake levels of nanoparticles with different surface properties in mammalian cells. Any textbook and/or software suggestions would be really helpful. Thanks!
Hello everyone, I apologize if I sound like an idiot or am wasting people's time but this shows how truly new I am to this.
Long story short, I am trying to write a paper and decided I wanted to see if it is even realistic to discuss before saying, "Here's my theory." Anyway, I am on UCSF Chimera, and I FINALLY modified this glycoprotein the way I hypothesized, Chimera was telling me it was A-OK. I know my next steps are to write about this, get experimental validation, and possibly go into testing. Any advice on where to go or how?
The potential advantages of my modified protein include enhanced stability, improved binding affinity, biological activity, altered immune response, potential for remyelination, novel therapeutic approach, research innovation, and preliminary positive results.
Hello everyone, I'm searching for histidine phosphorylation using MaxQuant, and I find that some articles set HSTY phosphorylation together while others set H and STY phosphorylation independently. Are there any differences between these two types of settings? Which one should I choose?
I was able to follow the ProteoDA tutorial; however, I have abundances for one group and NAs for the second group (so unique proteins). At the end of the analysis, the output of the statistical analysis between groups has NAs (for p-values, etc.) for these proteins. How do I get stats for these proteins? Can I just add 0.1 to all abundances, including NAs?
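Before deciding how to handle these proteins, it can help to separate the ones that are entirely missing in one group from the ones that can be tested normally. Here is a minimal sketch, assuming abundances are held in a plain dictionary. The sample names, values, and the function name `classify` are hypothetical, and nothing here uses the ProteoDA API or recommends a particular imputation.

```python
import math


def is_missing(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def classify(abundances, group_a, group_b):
    """Label each protein by which group(s) it was quantified in."""
    labels = {}
    for protein, by_sample in abundances.items():
        seen_a = any(not is_missing(by_sample.get(s)) for s in group_a)
        seen_b = any(not is_missing(by_sample.get(s)) for s in group_b)
        if seen_a and seen_b:
            labels[protein] = "testable"
        elif seen_a:
            labels[protein] = "only_in_a"
        elif seen_b:
            labels[protein] = "only_in_b"
        else:
            labels[protein] = "absent"
    return labels


# Hypothetical log2 abundances for two groups of two samples each.
abundances = {
    "P1": {"a1": 21.3, "a2": 20.9, "b1": 19.8, "b2": 20.1},
    "P2": {"a1": 18.2, "a2": 18.5, "b1": float("nan"), "b2": None},
    "P3": {"a1": None, "a2": None, "b1": 17.0, "b2": 16.4},
}
labels = classify(abundances, ["a1", "a2"], ["b1", "b2"])
print(labels)
```

Proteins labelled `only_in_a` or `only_in_b` are the ones that end up with NA statistics.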
I have a sample that contains host and pathogen proteins (viral infection). I am interested in proteins from both. I am wondering, when doing the database search, should I upload both proteomes to search against (output contains list of proteins for both) or should I search them independently (two different output files for each species)? I will be using DIA-NN.
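If the combined-database route is taken, the two proteomes need to be merged into a single FASTA first. Here is a minimal sketch that concatenates FASTA text and skips records whose header line has already been seen. The headers and sequences are invented, and the function name `merge_fasta` is hypothetical. It does not address whether a combined or separate search is preferable.

```python
def merge_fasta(*fasta_texts):
    """Concatenate FASTA records, dropping records with an already-seen header."""
    seen = set()
    merged = []
    for text in fasta_texts:
        keep = False  # sequence lines before any header are ignored
        for line in text.splitlines():
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                keep = line not in seen
                seen.add(line)
            if keep:
                merged.append(line)
    return "\n".join(merged) + "\n"


# Hypothetical host and pathogen entries; the pathogen file repeats one host record.
host = ">sp|HOST1|A_HUMAN\nMKT\nAAA\n"
pathogen = ">sp|VIR1|B_VIRUS\nMSS\n>sp|HOST1|A_HUMAN\nMKT\nAAA\n"
combined = merge_fasta(host, pathogen)
print(combined)
```

With real files, the same function could be fed the contents of each file read with `open(path).read()`.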
I am a bit confused about how to do this. Can anyone help me with this process? 🪛
Is anyone familiar with the process of using multiple spectral libraries for DIA LFQ data analysis? Does DIA-NN allow that? Other than this, which software allows that?
How can I compile multiple spectral libraries into one?
Thanks
Hi, I'm a student in year 9 in Australia and I am working on a data science project for a university course I'm doing for fun. The data I need is plasma proteomics data for cancer, with both cancer and non-cancer samples. Can anybody help with this, share such data, or provide guidance? Any help will be appreciated.
Thank you
A DIA method generally includes an MS1 scan followed by a sequential series of MS2 scans. However, I’m struggling to understand the benefits of including an MS1 scan (and MS1 optimization such as BoxcarDIA) in the method, since the software (particularly DIA-NN) uses only MS2 for identification and quantification. Following this logic, we don’t even need an MS1 scan in the DIA run, so why bother sacrificing transient time for it?
Hi,
We’re offering an exciting PhD position for someone passionate about deep learning, especially in its application to bioinformatics. Our research group focuses on mass spectrometry, metabolomics, and enzymes, and we’re looking for someone with strong machine learning skills. No worries if your chemistry or biology background isn’t strong; our team includes experts who can support you in these areas.
The project is part of the European MSCA Doctoral Network ModBioTerp and involves designing deep learning models to predict enzyme activity. This has far-reaching applications in drug development and industrial biochemistry. If you’re interested in applying your ML expertise to bioinformatics and mass spectrometry, this could be a great fit for you!
PhD position details and application link: https://www.uochb.cz/en/open-positions/293/modeling-the-mechanisms-of-terpene-biosynthesis-using-deep-learning
If you’re interested or have any questions, feel free to reach out. We believe this is a fantastic opportunity for anyone eager to apply their ML skills to an exciting, real-world challenge in bioinformatics!
Thanks for your time and consideration!
Hi!
I’m trying to perform normalization in Spectronaut 18.6 for specific exosomes. I created a FASTA file containing the exosomes of interest and imported it into Spectronaut. However, when I try to filter using the FASTA file and include its name, I receive an error stating that no peptides remain. I’m not sure if Spectronaut even recognized that I included the FASTA file.
Has anyone successfully used the normalization filter? Could someone walk me through the process?
Thanks!
Hi everyone. I've recently been reading a paper in the field of metalloproteomics, and I find the choice of negative control protein confusing.
The researchers applied ICP-MS to detect zinc levels for GFPT1 and GFPT2 (two known zinc-binding proteins). They used tobacco etch virus (TEV) protease as the negative control protein.
I am new to this field and I'd like to know why TEV protease was chosen as the negative control. Any clues?
Hi everyone. I'm a biophysicist working on membrane proteins and GPCRs using tools like EPR and cryo-EM. Recently, there is a need for me to perform MS on membrane proteins, but my PI does not have the expertise.
Can I get your input on how easy/difficult it is to do MS on these monsters?
Thank you very much.
Hi everyone,
I'm currently working on analyzing NGS data for antibody sequences, specifically focusing on determining germline diversity usage (V, D, and J gene assignment). I'm looking for someone with experience in this area to guide me through the process or assist with the analysis. Familiarity with tools like IgBLAST, IMGT, Change-O, or similar software is preferred.
I'm willing to pay for your time and expertise. If you're experienced in this kind of analysis and are available to help, please reach out! My email is kongmike368@gmail.com
Looking forward to hearing from you. Thanks in advance!
Hi. I'm experiencing errors while running timsTOF DDA data in MaxQuant (version 2.6.5). The errors appear during different stages of the LFQ process, such as the collection, normalization, and quantification steps. Could anyone please advise what might be causing these errors and suggest potential fixes? Thank you.
I am trying to find differentially expressed proteins using the DEP and DEP2 packages. The issue is that when I run the test_diff function from DEP, it gives me a few significant proteins on the basis of my alpha value of 0.05. On the other hand, when I use the test_diff function from the DEP2 package with fdr.type = "BH" and then add rejections on the basis of my alpha of 0.05, I get no significant proteins. I have no idea why this is happening. I am using the same pipeline for both methods for filtering and imputation.
I was given an excel sheet from the company that did my TMT labeled proteomics. I currently have both the abundances of the protein and abundance ratio between different samples that I am interested in. They identified ~2000 proteins. What software would be best for organizing the proteins into different pathways/cellular processes so that it’s easier to see what pathways are being upregulated vs downregulated? Thank you so much!!
Hoping the protein people can help me.
I want to get information about gene function based on loci. I have feature tables filtered to my loci of interest (a few hundred spread across a genome). Is something like rentrez or GEO Profiles the right way to do this?
I've searched a few gene IDs; sometimes I get something informative and sometimes I don't. I figure there's probably a better way.
I completely understand that different iterations of software like MQ can produce different IDs and quant. values to a certain (minimal) extent.
What I am experiencing now, however, with a phosphoproteomic data set is a little bit mind-blowing. The set is DDA PASEF with 36 samples: a time course experiment with 3 biological replicates, sampled in two phases of a bioprocess with 6 time points each. Two replicates (26 and 27) initially had some injection errors, so I reran them afterwards on a new column.
I know that MQ since 2.5 has improved PTM search integration in Andromeda, especially for lower-abundance features (in benchmark sets I see a >50% increase in IDs after filtering). Also, based on investigating benchmark sets with the 2.4 and 2.6 versions, phosphosite allocation has become a little more stringent. Additionally, I know MBR has possibly become more funky, based on limited tests with the new versions.
Anyway, this is the point I cannot explain: after filtering, this 36-sample dataset has a biologically sound and comparable number of site IDs across replicates and all samples in MQ 2.4.10, while with 2.6.1 and 2.6.4 some samples completely lose IDs (see below). This also happens at the phosphopeptide, peptide and protein levels. Initially, I thought it was a problem with MBR and using 2 samples from an independent run, but no, the error persists if I remove those samples. Also, the samples that end up with close to no IDs vary with the MQ version, and they also vary if I include the separately run samples (which brings me back to funky MBR). I also found a bug thread on GitHub where a weird taxonomy ID setting did something similar, but no, the problem still persisted (see the release for 2.6.5, where this error-producing setting is now off by default).
I am currently running a search with MBR completely off, but we will see. Additionally, I will do a FragPipe search for this phospho set as well.
Any idea why I am experiencing this with 2.6 versions and not with 2.4?
EDIT: this also represents protein, peptide and phosphopeptide levels, not exclusively for ST phospho sites!
I'm a bit confused on how bin size (width?) is chosen for high resolution systems as cited in this paper, particularly depending on product mass and instrument accuracy. Can someone give a numerical example to illustrate?
Ref: https://pubmed.ncbi.nlm.nih.gov/24896981/
Thanks
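As a generic numerical illustration (not necessarily the rule used in the cited paper), a relative mass tolerance in ppm corresponds to an absolute width in Da that grows with the product m/z: width = m/z × ppm × 10⁻⁶. Here is a minimal sketch, assuming a 10 ppm instrument accuracy chosen only for the example:

```python
def ppm_to_da(mz, ppm):
    """Convert a relative tolerance in ppm to an absolute width in Da at a given m/z."""
    return mz * ppm * 1e-6


# Hypothetical product ion m/z values at an assumed 10 ppm accuracy.
for mz in (200.0, 500.0, 1500.0):
    tol = ppm_to_da(mz, 10.0)
    print(f"m/z {mz:7.1f}: +/- {tol:.4f} Da (full window {2 * tol:.4f} Da)")
```

So at m/z 500 a 10 ppm tolerance is about 0.005 Da on either side, and three times that at m/z 1500. A fixed bin width therefore has to be chosen with the upper end of the product mass range in mind.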
I had a cell pellet to which I added 0.2 M H2SO4 to extract histones. As per the protocol, I should have centrifuged the tubes, saved the supernatant, and added 100% TCA. However, I forgot to spin it and save the supernatant. Instead, I added 100% TCA to the pellet with H2SO4 and kept it at 4 °C for overnight incubation.
I am proceeding with the acetone wash after saving the supernatant. However, I do not see any pellet, so I am worried.
Hello :)
I'm a beginner Perseus user and have a question about PCA plot generation. I've applied data filtering and imputation (replacing missing values with values from a normal distribution).
I was wondering if Perseus automatically applies autoscaling or normalization when generating a PCA plot, or if I should perform scaling or normalization, such as Z-score normalization, before running the PCA?
Thank you!
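For reference, this is what Z-score scaling (autoscaling) of each protein across samples looks like when done by hand. It is a minimal sketch with made-up log2 intensities, and it makes no claim about what Perseus does internally when it draws a PCA plot.

```python
from statistics import mean, stdev


def zscore(values):
    """Scale one protein's values to mean 0 and sample standard deviation 1."""
    mu = mean(values)
    sd = stdev(values)  # sample standard deviation (n - 1 in the denominator)
    if sd == 0:
        return [0.0 for _ in values]  # constant row: nothing to scale
    return [(v - mu) / sd for v in values]


# Hypothetical log2 intensities: rows are proteins, columns are samples.
matrix = {
    "protein_A": [20.0, 22.0, 24.0],
    "protein_B": [10.0, 10.5, 11.0],
}
scaled = {name: zscore(row) for name, row in matrix.items()}
print(scaled)
```

After scaling, both proteins contribute equally to the PCA even though protein_A has a much larger raw spread.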
Hello!
I am new to proteomics and I was wondering if anyone has experience interpreting fragment information that has been formatted in the following example -
+2y12+1
For context, I am trying to format data for use in SAINTq (specifically the fragment-level analysis), and I see that the peptide information in the example file has been formatted in the manner shown above. I've analyzed my data using DIA-NN and am able to obtain the precursor and fragment charge, the type of ion ('y' or 'b'), and the fragment series number, and I am trying to format the data in a manner compatible with SAINTq.
I am guessing it's formatted as +2 (precursor charge), y (ion type), 12 (fragment series number), +1 (fragment charge), though I'm not quite sure.
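If that guess is right, the label can be split and rebuilt with a short regular expression. Here is a minimal sketch that assumes the layout guessed above (precursor charge, ion type, series number, fragment charge); this layout is not confirmed against the SAINTq documentation, and the function names are hypothetical.

```python
import re

# Assumed layout: +<precursor charge><ion type><series number>+<fragment charge>
FRAGMENT_RE = re.compile(r"^\+(\d+)([a-z])(\d+)\+(\d+)$")


def parse_fragment(label):
    """Split a label such as '+2y12+1' into its four assumed fields."""
    match = FRAGMENT_RE.match(label)
    if match is None:
        raise ValueError(f"unrecognized fragment label: {label!r}")
    precursor_charge, ion_type, series, fragment_charge = match.groups()
    return {
        "precursor_charge": int(precursor_charge),
        "ion_type": ion_type,
        "series": int(series),
        "fragment_charge": int(fragment_charge),
    }


def format_fragment(precursor_charge, ion_type, series, fragment_charge):
    """Build a label in the same assumed layout from separate fields."""
    return f"+{precursor_charge}{ion_type}{series}+{fragment_charge}"


parsed = parse_fragment("+2y12+1")
rebuilt = format_fragment(2, "y", 12, 1)
print(parsed, rebuilt)
```

`format_fragment` is the direction needed when going from per-fragment columns (charge, ion type, series number) to a single label.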
Hi all,
On ProteomeXchange there is a metadata tab called 'ModificationList'. In it I can find PTMs that have occurred on proteins in the data. However, there seems to be some discrepancy in how they are listed by people uploading their data.
For example, on ProteomeXchange the dataset PXD001684 lists only phosphorylation as a modification, but in the SDRF metadata sheet (which was manually annotated) the modifications listed are carbamidomethylation, oxidation, acetylation, and deamidation, as well as phosphorylation.
So, my first question is: are some modifications deemed too 'obvious' to list in ProteomeXchange metadata? Oxidation, deamidation, etc.?
As a follow-up question, if I am reanalysing a proteomics dataset and I have incomplete information (e.g. only phosphorylation is listed), is there a list of modifications I should assume have happened, or at least could have happened?
I have a proteomics dataset where I am confident that a lot of lipid peroxidation has taken place. Is there any way to look for lipid peroxide-derived adduct formation? Can someone point to any good resource/literature for this? Do I need sample enrichment in this case, or can I expect some adducts in regular bottom-up proteomics data?
How do you handle organic solutions in your lab? Do you just use plastic pipettes to transfer from the 2L bottle or do you have a better system? I'm thinking of getting bottle top dispensers, any opinion on that?