A subreddit dedicated to bioinformatics, computational genomics and systems biology.
news for genome hackers
- If you have a specific bioinformatics related question, there is also the question and answer site BioStar and the next generation sequencing community SEQanswers
- If you want to read more about genetics or personalized medicine, please visit /r/genomics
- Information about curated, biologically relevant databases can be found in /r/BioDatasets
- Multicore, cluster, and cloud computing news, articles and tools can be found over at /r/HPC.
Can someone explain what a typical day might look like in different years of a bioinformatics PhD student? I’m an undergrad in a wet lab right now and it doesn’t particularly appeal to me to continue pursuing it so I wanted to hear what a dry lab might be like.
I have a protein FASTA file output from Bakta, but I want to know the Entrez IDs of its proteins. How would I achieve this? So far I have DIAMOND-BLASTed my protein FASTA against the COG database. Is there any way to convert my COG terms to Entrez IDs? Or should I map my proteins to an Entrez database instead?
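A COG is a group of orthologous proteins drawn from many genomes, so there is no one-to-one COG-to-Entrez mapping. An alternative is to search against an NCBI protein database (for example RefSeq) and map each protein's best hit to an Entrez Gene ID through a table such as NCBI's gene2accession. A minimal sketch of the best-hit step, assuming DIAMOND was run with `--outfmt 6` and its default 12 columns; the `accession_to_gene` lookup is hypothetical and would have to be built from that mapping table yourself.

```python
import csv
import io

# Default columns of DIAMOND/BLAST tabular output (--outfmt 6).
OUTFMT6 = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle):
    """Return {query_id: row}, keeping only the highest-bitscore hit per query."""
    best = {}
    for fields in csv.reader(handle, delimiter="\t"):
        if not fields:
            continue  # skip blank lines
        row = dict(zip(OUTFMT6, fields))
        row["bitscore"] = float(row["bitscore"])
        current = best.get(row["qseqid"])
        if current is None or row["bitscore"] > current["bitscore"]:
            best[row["qseqid"]] = row
    return best


def map_to_entrez(hits, accession_to_gene):
    """Map each query to an Entrez Gene ID via its best hit's subject accession.

    accession_to_gene is a hypothetical {protein_accession: gene_id} dict,
    e.g. built from NCBI gene2accession; unmapped queries get None.
    """
    return {query: accession_to_gene.get(row["sseqid"])
            for query, row in hits.items()}


if __name__ == "__main__":
    # Made-up example lines in outfmt 6 layout.
    example = io.StringIO(
        "prot_1\tACC_A.1\t98.0\t300\t6\t0\t1\t300\t1\t300\t1e-150\t550\n"
        "prot_1\tACC_B.1\t60.0\t290\t116\t2\t5\t295\t3\t292\t1e-60\t210\n"
    )
    print(map_to_entrez(best_hits(example), {"ACC_A.1": "hypothetical_gene_id"}))
```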
A few years ago, we started a program at Stanford Medicine for patients who had exhausted their clinical options and needed to move beyond standard-of-care treatment. The idea was to generate advanced longitudinal multi-omic data, then make the data available to bioinformaticians around the world for cutting-edge analysis. The resulting insights went through a rapid translation phase with pharmacologists and oncologists, producing new treatment possibilities for the patients.
This week, we're presenting the results from this work for our first two pilot cases: Shirley and Vanessa. Please join us online this Friday, June 9th.
We're eager to have more folks from the bioinformatics community directly involved in patient cases!
Sorry for such an easy question, but I'm hoping someone here can help me out. I am working in Seurat with a 10X dataset and have successfully completed the workflow suggested by the documentation, but I am trying to get a list of all genes expressed in a given cluster (for example, I have 7 clusters in my tissue sample, and I would like a complete gene list for each of the clusters defined).
I am sure this is a very easy thing to do, but I can't seem to figure it out. I can get the differentially expressed genes per cluster, but not the full gene lists. Thanks in advance for any help!
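Differential-expression tests only return genes that differ between clusters, so a "complete" list first needs a definition of "expressed". A minimal Python sketch of one common definition, detection (count > 0) in at least a minimum fraction of a cluster's cells, assuming you have exported the raw counts matrix (genes × cells, the orientation Seurat uses) and each cell's cluster label. This is not Seurat's API, and `min_fraction` is a hypothetical threshold you would tune.

```python
from collections import defaultdict


def genes_per_cluster(counts, gene_names, cell_clusters, min_fraction=0.1):
    """List genes detected (count > 0) in at least min_fraction of each cluster's cells.

    counts: one row per gene, each row a list of raw counts per cell.
    gene_names: gene name for each row of counts.
    cell_clusters: cluster label for each column (cell).
    min_fraction: hypothetical detection threshold (0.1 = 10% of cells).
    """
    cells_by_cluster = defaultdict(list)
    for cell_index, cluster in enumerate(cell_clusters):
        cells_by_cluster[cluster].append(cell_index)

    result = {}
    for cluster, cell_indices in sorted(cells_by_cluster.items()):
        expressed = []
        for gene, row in zip(gene_names, counts):
            detected = sum(1 for i in cell_indices if row[i] > 0)
            if detected / len(cell_indices) >= min_fraction:
                expressed.append(gene)
        result[cluster] = expressed
    return result
```

Raising `min_fraction` trades completeness for confidence: at very low thresholds almost every gene with a stray count ends up "expressed".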
Title. Anything from genetics to bioinformatics. What books would you recommend?
If it helps guide your recommendations: I am a graduate student with 3 semesters left in my program. I have taken undergraduate statistics but remember very little of it. I would like to learn what I need to understand and interpret papers, as well as the mathematical methods I will use in my future career as a bioinformatics scientist in industry.
I am a junior in high school. I'm not going to lie: I know very little about bioinformatics, but I'm very passionate about it and it's a super interesting topic to me. I'd like to create a bioinformatics club at my high school. I have a Data Science teacher who's very knowledgeable and eager to learn, so he can definitely make up for my lack of knowledge and help here and there, but I still have to be the one to plan the club activities/labs. Do y'all have any ideas for fun labs/activities I could set up for high school students? I'm assuming 50% of the club members will have taken AP Statistics and AP Computer Science A, and only three members are familiar with data science in R and Python/JupyterLab.
I need to create mutant peptides with Rosetta to use in virtual screening. The mutations need to be somewhat random (not chosen directly by me) and, if possible, follow an evolutionary bias (a genetic algorithm). I'm really new to protein modelling and have only used AlphaFold. Rosetta can be used in many ways and I'm kind of lost: should I use RosettaScripts or PyRosetta? Can I do this with both? I haven't found any tutorial that explains how to create mutant peptides with a genetic algorithm in Rosetta :(
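A minimal genetic-algorithm loop in plain Python, to show the structure (random mutation, crossover, selection of the best-scoring peptides) independently of Rosetta. The `score` function is a hypothetical placeholder: in practice it would call whatever Rosetta-based scoring you set up (via PyRosetta or RosettaScripts, not shown here), with lower values treated as better, as with Rosetta energies. Population size, mutation rate and number of generations are illustrative assumptions.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mutate(peptide, rate, rng):
    """Replace each residue with a random amino acid with probability `rate`."""
    return "".join(rng.choice(AMINO_ACIDS) if rng.random() < rate else aa
                   for aa in peptide)


def crossover(parent_a, parent_b, rng):
    """Single-point crossover between two equal-length peptides (length >= 2)."""
    point = rng.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]


def evolve(seed_peptide, score, generations=20, population_size=30,
           mutation_rate=0.1, n_parents=10, rng=None):
    """Minimal genetic algorithm; lower score = better (like a Rosetta energy).

    `score` is a placeholder callable taking a sequence and returning a number.
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed for reproducibility
    population = [mutate(seed_peptide, mutation_rate, rng)
                  for _ in range(population_size)]
    for _ in range(generations):
        # Selection: keep the best-scoring peptides as parents.
        parents = sorted(population, key=score)[:n_parents]
        # Variation: recombine random parent pairs, then mutate the offspring.
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents), rng),
                           mutation_rate, rng)
                    for _ in range(population_size - n_parents)]
        # Elitism: parents survive unchanged into the next generation.
        population = parents + children
    return sorted(population, key=score)
```

Because `score` is called many times per generation, an expensive Rosetta evaluation would normally be cached per sequence.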
For my analysis, I have a transcriptome and a list of sequences obtained from that transcriptome. I would like to perform a functional enrichment analysis. I have annotated both sets using eggNOG-mapper. Now I want to test for differences between the two functional annotations, specifically the COGs (Clusters of Orthologous Groups). I have tried the R code at https://yulab-smu.top/biomedical-knowledge-mining-book/enrichment-overview.html#gsea-algorithm
with clusterProfiler, but it does not seem to work. Which tools or code can I use to perform this test?
Example of some of my data:
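Because the question is whether COG categories are over-represented in a list of sequences relative to the whole transcriptome, an over-representation test (hypergeometric, equivalent to a one-sided Fisher's exact test) is arguably a better fit than GSEA, which needs a ranked list of all genes. In R, clusterProfiler's `enricher()` with a custom `TERM2GENE` table runs this kind of test. The sketch below does the same calculation by hand in Python, assuming you have parsed the eggNOG-mapper COG category letters into a set per sequence; whether unannotated sequences belong in the background universe is a choice to make explicitly.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    total = comb(population, draws)
    return sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(k, upper + 1)) / total


def cog_enrichment(background, subset):
    """One-sided over-representation test for each COG category.

    background: {sequence_id: set of COG categories} for the whole transcriptome
                (sequences with no COG get an empty set).
    subset: ids of the sequences of interest; all must be in background.
    Returns (category, k, n, K, N, p, BH-adjusted p) tuples sorted by p.
    """
    N = len(background)
    n = len(subset)
    categories = set().union(*background.values())
    rows = []
    for cat in sorted(categories):
        K = sum(1 for cogs in background.values() if cat in cogs)  # in universe
        k = sum(1 for seq in subset if cat in background[seq])     # in subset
        rows.append((cat, k, n, K, N, hypergeom_sf(k, N, K, n)))

    # Benjamini-Hochberg adjustment over all tested categories.
    rows.sort(key=lambda r: r[5])
    m = len(rows)
    adjusted = [0.0] * m
    running_min = 1.0
    for i in range(m - 1, -1, -1):
        running_min = min(running_min, rows[i][5] * m / (i + 1))
        adjusted[i] = running_min
    return [row + (adj,) for row, adj in zip(rows, adjusted)]
```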
What's the best tool that I can use to annotate assembled fungal genomes (from ONT reads)?
I have experience with Prokka and I used it to annotate bacterial genomes but I don't think that it would work with fungi, for that I am looking for other tools.
Thanks in advance
I am currently working with the output from AutoDock Vina using the --score_only argument, and I'm having some difficulty understanding how to combine the different score terms (gauss 1, gauss 2, repulsion, hydrophobic, Hydrogen) to calculate the Affinity value.
According to the related publication (DOI 10.1002/jcc.21334), there is a set of weights for these terms, but how to use them to obtain the Affinity is unclear to me; it doesn't appear to be a straightforward linear combination. Can someone help me find the right formula to recover the Affinity from these terms?
Here's an example of the output I'm working with:
Affinity: -6.13476 (kcal/mol)
Intermolecular contributions to the terms, before weighting:
gauss 1 : 79.40942
gauss 2 : 1285.97614
repulsion : 2.61957
hydrophobic : 21.88511
Hydrogen : 4.41061
I've attempted a direct linear combination using these weights, as listed in Vina's option help:
--weight_gauss1 arg (=-0.035579) gauss_1 weight
--weight_gauss2 arg (=-0.005156) gauss_2 weight
--weight_repulsion arg (=0.84024500000000002) repulsion weight
--weight_hydrophobic arg (=-0.035069000000000003) hydrophobic weight
--weight_hydrogen arg (=-0.58743900000000004) Hydrogen bond weight
--weight_rot arg (=0.058459999999999998) N_rot weight
Furthermore, I've also considered the extra weight for rotatable bonds --weight_rot arg (=0.058459999999999998), given the example output I'm examining includes 13 active torsions.
Despite this, I haven't managed to successfully reconcile these figures to get the Affinity value.
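As I understand the Vina paper, the final score is not a plain linear combination: the weighted intermolecular sum is divided by a rotatable-bond term, affinity = (Σ w_i · term_i) / (1 + w_rot · N_rot), rather than adding w_rot as one more linear term. A minimal sketch of that formula, assuming the default weights quoted above and treating N_rot as an input. With the 13 active torsions from the example it gives about -6.03 rather than -6.13, so if the numbers don't reconcile, one thing to check is how Vina counts N_rot internally (the paper counts rotatable bonds between heavy atoms), which may differ from the active-torsion count.

```python
# Default weights from the Vina option help quoted above.
WEIGHTS = {
    "gauss1": -0.035579,
    "gauss2": -0.005156,
    "repulsion": 0.840245,
    "hydrophobic": -0.035069,
    "hydrogen": -0.587439,
}
WEIGHT_ROT = 0.05846


def weighted_intermolecular(terms):
    """Plain linear combination of the unweighted intermolecular terms."""
    return sum(WEIGHTS[name] * value for name, value in terms.items())


def vina_affinity(terms, n_rot):
    """Weighted sum divided by the rotatable-bond penalty (1 + w_rot * N_rot)."""
    return weighted_intermolecular(terms) / (1 + WEIGHT_ROT * n_rot)


# Unweighted terms from the example --score_only output above.
example_terms = {"gauss1": 79.40942, "gauss2": 1285.97614, "repulsion": 2.61957,
                 "hydrophobic": 21.88511, "hydrogen": 4.41061}

print(f"weighted sum: {weighted_intermolecular(example_terms):.3f}")        # -10.613
print(f"affinity with N_rot = 13: {vina_affinity(example_terms, 13):.3f}")  # -6.030
```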
I'm currently working on a coursework project in which we have developed a modified version of the 1-D diffusion model proposed in "Mechanistic stochastic model of histone modification pattern formation" to study the distribution pattern of the H3K9me3 modification.
Now I want to take it a step further by testing the model's performance against experimental ChIP-seq data. However, I have limited experience with ChIP-seq data analysis and I'm unsure how to do it.
Although this Cell paper shows how the test can be done, it doesn't provide any accompanying code, which has left me quite confused. Could someone kindly help me with this?
Our model basically just changes some parameters from here (from the Cell paper I mentioned above)
The visualization I want to achieve (also from the Cell paper)
P.S. I'm an undergraduate student new to the field of bioinformatics, so please forgive me if this question is too basic :)
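One simple way to set up such a comparison, assuming you have the ChIP-seq signal as bedGraph-style (start, end, value) intervals (for example, fold enrichment over input) around the locus you modelled, and a model output with one value per lattice site: average the signal into bins whose width matches one model site (the bin size is an assumption you choose), put both profiles on the same scale, and measure agreement. This is a generic sketch, not the method used in the Cell paper, and Pearson correlation is only one of several reasonable agreement measures.

```python
from math import sqrt


def bin_signal(intervals, region_start, region_end, bin_size):
    """Length-weighted mean of bedGraph-style (start, end, value) intervals per bin.

    Coordinates are 0-based and end-exclusive; bases not covered by any
    interval count as 0.
    """
    n_bins = (region_end - region_start + bin_size - 1) // bin_size
    sums = [0.0] * n_bins
    for start, end, value in intervals:
        start, end = max(start, region_start), min(end, region_end)
        if start >= end:
            continue  # interval lies outside the region
        first = (start - region_start) // bin_size
        last = (end - 1 - region_start) // bin_size
        for b in range(first, last + 1):
            bin_lo = region_start + b * bin_size
            bin_hi = min(bin_lo + bin_size, region_end)
            sums[b] += value * (min(end, bin_hi) - max(start, bin_lo))
    widths = [min(bin_size, region_end - (region_start + b * bin_size))
              for b in range(n_bins)]
    return [s / w for s, w in zip(sums, widths)]


def min_max(values):
    """Rescale to [0, 1] so model and ChIP-seq profiles share a scale."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)
```

Usage would be `pearson(min_max(model_profile), min_max(bin_signal(chip_intervals, start, end, site_width)))`, where the model profile has one value per bin.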
I just finished my second year of undergrad and I’m not too sure what to do for grad school. I used to be pre-med and switched out because of the long schooling and long work hours. I am a cell and molecular biology major and I am planning on adding data science and computer science minors. I want to go into bioinformatics because it seems to combine all of my interests, but from what I’ve heard, the job outlook with just a master's isn’t great. If I get a master's in data science, will I still be able to get jobs in bioinformatics? Or will I at least be able to work in a biology field? I'm not too interested in working in computer science, but I am open to data science careers in various fields. Any advice is appreciated :)
Hi, so I'm a final year BS-MS student majoring in Biology from India. I've just finished my MS thesis on zebrafish lipid droplet dynamics, trying to potentially connect it with the changes in lipid profile. It didn't exactly work out the way I had expected it to, but there were other factors beyond my control that contributed to it. My overall CGPA is decent though (around 8.6/10).
Anyway, during the course of working through this project and for a year or so before that, I taught myself to code. I'm comfortable in Python and R, and I've done a small project in epidemic modelling under a prof at my institute (which I'm currently trying to expand a bit to accommodate new ideas). I've also tried to teach myself to use Linux and Git (still a work in progress!) I took a course in bioinformatics and computational biology (as part of my curriculum) and learnt about biomarker screening for diseases. This is one of the topics I'm interested in, and I want to switch fields to include more of bioinfo/comp bio for my PhD (if possible I wouldn't want to give up completely on my wet lab though). I didn't apply for a PhD last cycle because my confidence was at an all time low but now that I'm out of that lab I'm slowly recovering. I've started to look for labs that work in the fields I'm interested in.
What advice would you have for me? (Also, I'm not sure if I've included all required info so please ask me anything else you'd want to know! Apologies in advance if this sounds muddled.)
Hi there, I am trying to install GEMINI (for genetic variation analysis) on my local machine, but even after multiple attempts I am unable to resolve the problem. Does anyone have any idea?
It initially says: The environment is inconsistent, please check the package plan carefully The following packages are causing the inconsistency: /
and gives a long list, after which it gets stuck on
failed with initial frozen solve. Retrying with flexible solve. Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source. Collecting package metadata (repodata.json): done Solving environment: failed with initial frozen solve. Retrying with flexible solve. Solving environment: /
and ends with some error
I have recently joined a lab that had some nanopore RNA-seq data and has now received a few more samples. I have little to no experience analysing long-read sequencing data, so I need some help here.
The median read quality (Phred score) of the previous samples was 9; in the new samples it is 12. Isn't this too low? Or is it normal for nanopore RNA-seq?
I also have a "smear" or a second lower quality circle in the density plot for the read quality/read length plot. This happens for most samples. Is this also normal? And what can explain it?
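Phred scores are logarithmic, p = 10^(-Q/10), so a median of Q9 corresponds to roughly a 12.6% per-base error probability and Q12 to about 6.3%: the new runs have about half the expected error rate of the old ones. A small sketch of the conversion; it also shows why mean read quality is often computed by averaging error probabilities rather than Phred values, since a few very low-quality bases dominate the actual error.

```python
from math import log10


def phred_to_error(q):
    """Per-base error probability for a Phred score: p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def mean_read_quality(base_qualities):
    """Mean Phred quality computed in probability space, then converted back."""
    mean_error = sum(phred_to_error(q) for q in base_qualities) / len(base_qualities)
    return -10 * log10(mean_error)


for q in (9, 12):
    print(f"Q{q}: ~{phred_to_error(q):.1%} expected error rate per base")
```

For example, a read with half its bases at Q10 and half at Q30 averages to Q20 if you average the scores directly, but to only about Q13 when averaged in probability space.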
Databases for Ion Channel Receptors.
Could you please tell me which country is best for a master's in bioinformatics? I want to find a well-paid job there after finishing my studies.
I am a new scientist designing a microbiome study for the first time. I am sampling cows in 9 herds. Half are diseased and half are controls (not diseased). I have extracted DNA from sample pools, and I have 4 samples from each herd: 2 diseased and 2 controls, 36 samples in total. Due to some constraints, I can only submit 18 samples for metagenomic sequencing to the NGS facility (the facility is preparing the libraries).
With analysis in mind, I am struggling to understand what would make statistical sense for study design.
(It will not matter if some of the samples are not sequenced by metagenomics, because all 36 samples will be sequenced in a 16S microbiome run at our collaborators'.) I tried to use the microbiome study design Shiny app, but it required a priori knowledge of the number of OTUs expected in the sample, and I have no idea how many OTUs to expect.
Could someone help with information about what makes better sense for a study design?
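With 2 diseased and 2 control pools per herd, one balanced option is to sequence one diseased and one control pool from each herd (9 × 2 = 18), so every herd contributes equally and disease status is not confounded with herd; herd can then be treated as a blocking or random effect in the analysis. A minimal sketch of a reproducible random selection, assuming each sample is described by a tuple (sample_id, herd, status) with status either "diseased" or "control".

```python
import random


def paired_subset(samples, seed=1):
    """Pick one diseased and one control sample per herd (paired, balanced subset).

    samples: list of (sample_id, herd, status) tuples,
             status being 'diseased' or 'control'.
    A fixed seed keeps the random choice reproducible and documentable.
    """
    rng = random.Random(seed)
    by_herd_status = {}
    for sample_id, herd, status in samples:
        by_herd_status.setdefault((herd, status), []).append(sample_id)

    selected = []
    for herd in sorted({herd for _, herd, _ in samples}):
        for status in ("diseased", "control"):
            # Sorting first makes the draw independent of input order.
            selected.append(rng.choice(sorted(by_herd_status[(herd, status)])))
    return selected
```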
Currently I'm in the 6th semester of my BS in bioinformatics, and this summer I want to gain some experience since my thesis is coming up. But the problem is that I don't have any particular skill set. Should I
or, if you have any advice, please share, since I'm feeling so lost these days and every batchmate I ask is at least doing something.
This may be a dumb question, but when doing microbial metagenomics on a sample, how can you determine which reads belong to a certain species? I was recently sent a bunch of 150 bp reads that I was told showed similarity to a certain bacterial species, but when I ran BLAST on all the reads, this species never showed up as the top match (based on BLAST's ranking) for any of them. In my head, the only way to tell which reads belong to a particular species is to assemble a long enough contig or to sequence a certain housekeeping gene.
Any advice would be appreciated, hopefully this was clear, thank you.
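Most read-based classifiers do not rely on the single top BLAST hit. A common approach (used, in different forms, by tools such as MEGAN and Kraken2) is to take all hits that score nearly as well as the best one and assign the read to their lowest common ancestor (LCA), so a 150 bp read from a region conserved across related species ends up at genus or family level rather than at one species. A minimal sketch, assuming each hit comes with a bitscore and a root-to-species lineage listed at the same ranks in the same order; the 10% cutoff is an arbitrary illustration.

```python
def lowest_common_ancestor(lineages):
    """Deepest taxon shared by all lineages (each a root-to-leaf list of names)."""
    shared = []
    for ranks in zip(*lineages):
        if all(r == ranks[0] for r in ranks):
            shared.append(ranks[0])
        else:
            break
    return shared[-1] if shared else None


def assign_read(hits, top_percent=10.0):
    """Assign a read to the LCA of all hits within top_percent of the best bitscore.

    hits: list of (bitscore, lineage) pairs for one read.
    """
    if not hits:
        return None
    best = max(score for score, _ in hits)
    cutoff = best * (1 - top_percent / 100)
    kept = [lineage for score, lineage in hits if score >= cutoff]
    return lowest_common_ancestor(kept)
```

Under this scheme a read that matches two genera almost equally well is deliberately left at family level, which is why a species can be "present" in a profile without ever being any single read's top hit.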
I have a BS in exercise science and have worked as a cardiac monitor tech for about a year. I am planning to begin a master's in biomedical informatics with a focus on health data science. If I do, what would my career options be? Would I be able to become a data scientist? What pay can I expect for careers with this degree?
Hello everyone, I would really appreciate your opinion on which masters program to choose out of these options. I was having issues formatting a pros and cons column so I will include that separately from the table.
Some background about me is that I'm a Canadian with a molecular biology degree. I tried applying to programs that have internships because I think it's really important to get practical experience and it would be a big plus if I can land a paid internship.
My main career goal is to get into the field of machine learning engineering preferably in the biology space but I'd be fine with that job in another space. I learned relatively recently that there are computer science masters that accept students who did not major in computer science. I have applied to these and my ideal plan is to get a masters in CS while self-studying the math needed for machine learning. The reason for this plan was because my math fundamentals are fairly weak and I'm not a fan of academic math courses. As a fallback, I could always go into software development after getting a CS degree. However, I haven't heard back from any of the CS programs I've applied to and there's a chance I could not be accepted into any. Meanwhile I've already been accepted to bioinformatics masters and the deadline to confirm enrollment is quickly approaching. So, I think the safe option would be to accept one of these bioinformatics programs instead of risking it and betting that I will get accepted into a CS masters.
| University | Costs | Curriculum | Internship | Program Length | Rankings (Average of QS and Times) |
|---|---|---|---|---|---|
| Boston University | 61,050 USD | link | 2 weeks | 1 year | 90 |
| Northeastern University (Boston) | 55,400 USD | link | 3-6 months | 2 years | 278 |
| UvA/VU (Netherlands) | 31,160 EUR | link (I would choose the bioinformatics concentration) | 10 months? | 2 years | 59 (Amsterdam) / 168 (Vrije) |
| Wageningen (Netherlands) | 39,200 EUR | link | 6 months | 2 years | 92 |
Right now I'm leaning toward either Boston University or the University of Amsterdam (UvA/VU), but I'd love to hear your opinion!
I want to do a functional analysis of the microbiome of fish cultured at different salinities to see how salinity influences the microbiome. The samples were sequenced with Nanopore technology, so the data are long reads. Please suggest tools you think are suitable here.
I’m looking for some input/advice regarding pivoting to an industry position and putting together a “portfolio” due to my lack of publications. So any and all feedback from those in industry is greatly appreciated. And if you read all of this, thank you.
Some background. I finished my PhD in plant science in 2015. I have two publications from that; both are first author. The first was an algal genome paper published in Science (2013). I did genome annotation for this project, but it was pretty vanilla/amateur and mostly brute force, e.g., running BLAST searches manually, etc. There was already a bioinformaticist (CS background) on the project that did most, if not all, of the “heavy lifting”. I also got into phylogenetics pretty heavily as a spin-off project (2nd publication), and had to basically teach myself Linux to run various programs at the time. Also became familiar with using clusters and such.
I’ve done 3 postdocs, albeit each was pretty short-lived. The first was all bench work and lasted about 10 months. The position/lab was not a good fit, and it was destroying aspects of my personal life, i.e., relationships, etc. The next year I accepted the second postdoc because I desperately needed a job. This was a combination of dry/wet lab work (plant epitranscriptomics). Through this I got to dabble with ChIP-seq and then started messing around with genome assembly and other NGS data (of my own volition). My contract was for 11 months and was not extended due to lack of funding. Fast forward to my third postdoc, which started out all wet lab (plant synthetic biology), and towards the end I got to work with RNA-seq data for differential expression. I started the analysis for this, but eventually had someone from a neighboring lab specializing in RNA-seq finish the analysis because they already had all of the pipelines basically drag and drop (I do have all of the scripts used to run the analysis from beginning to end). The third postdoc lasted 13 months, again ending due to lack of funding: the project I was initially brought in to take over had its funding revoked a few months after I arrived in the lab. The biggest kicker, in my opinion, is that I have no publications from any of these postdocs.
Following my last postdoc I accepted a position with a hemp company as a geneticist. They had NGS sequencing data provided by Medicinal Genomics (Kannapedia) for quite a few of their strains. I started with raw sequencing data and pulled out all of the relevant sequences for cannabinoid biosynthesis and such. This didn’t really go much beyond that as I was laid off 6 months after getting hired; the company ran out of money and started laying off everyone...
For the past couple of years I’ve been working in intellectual property. I have no intention of attending law school, so I think I’ve pretty much reached my ceiling in this field.
So, I want to pursue a remote bioinformatics position in industry. I have a pretty strong wetlab background, which I think will help my cause as I can communicate very well with bench scientists. I understand the lingo and science/methodologies.
I watched all of Harvard’s CS50 (I did some, but definitely not all, of the assignments), and I’m almost through Helsinki’s Python MOOC. I know I still have a lot to learn with Python, and I’ve barely touched R. I also have not used Nextflow or Snakemake. I have a friend in industry, and Nextflow is used heavily by their company.
My question is what should I do regarding putting together a “portfolio” that shows I’m competent sans publications? Should I attend a bootcamp? I could reach out to various research labs and offer to do free “on the side” work in exchange for my name on papers? Should I do yet another postdoc (don’t really want to) but focus strictly on “dry lab” work?
I do plan on using the hemp NGS data to do various analyses (not sure what just yet), and I can probably sneak that in my resume as real experience. I thought about even building a database to store all of the “cleaned” sequence data for simplified retrieval. Not that I need it, but just to show competency.
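For the idea of a database of the cleaned sequences, SQLite (built into Python's standard library) is enough for a portfolio-sized project. A minimal sketch, assuming FASTA headers of the hypothetical form `>strain|gene`; the table layout and function names are illustrative only.

```python
import sqlite3


def parse_fasta(lines):
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def records_from_fasta(lines):
    """Turn '>strain|gene' entries (hypothetical header format) into table rows."""
    for header, sequence in parse_fasta(lines):
        strain, gene = header.split("|", 1)
        yield strain, gene, sequence


def build_db(path, rows):
    """Store (strain, gene, sequence) rows; one sequence per strain/gene pair."""
    con = sqlite3.connect(path)
    con.execute("""CREATE TABLE IF NOT EXISTS sequences (
                       strain TEXT NOT NULL,
                       gene TEXT NOT NULL,
                       sequence TEXT NOT NULL,
                       PRIMARY KEY (strain, gene))""")
    con.executemany("INSERT OR REPLACE INTO sequences VALUES (?, ?, ?)", rows)
    con.commit()
    return con


def fetch_gene(con, gene):
    """Return {strain: sequence} for one gene across all strains."""
    rows = con.execute(
        "SELECT strain, sequence FROM sequences WHERE gene = ? ORDER BY strain",
        (gene,))
    return dict(rows)
```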
If you read all of this, thank you. Any and all feedback is appreciated. I consider my career thus far quite the dumpster fire, but I’m hoping to change that and settle into a somewhat new direction. I’ve also considered data science, but given my degree and experience, I think bioinformatics is a much more natural transition for the time being.
Hi, I would like to ask how to put a threshold line in a phylogenetic tree. I am using UGENE software to construct the tree.
Please answer ASAP😭 Thanks
I've got an ATAC-seq dataset and have identified motifs for my TF of interest in open regions. I've got a set of regions that are open only in my experimental group, and I want to see which genes nearest to open sites in this group have more TF motifs than expected from the background, which is the number of sites across all peaks open in control and experimental cells. I've tried a binomial p-value, but the data aren't binomially distributed, so I get artefacts like huge genes with a single site (and miRNAs) coming up as significant. I'd appreciate any advice about how to proceed. Thanks!
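Since the binomial assumption doesn't hold, one distribution-free alternative is a permutation test per gene: draw as many peaks at random from the background (all peaks open in control and experimental cells) as the gene has experiment-specific peaks, count how many contain the motif, and repeat. Because the null keeps each gene's peak count fixed, a gene with a single motif-containing peak can do no better than p ≈ the background motif rate, which guards against the one-site artefact. A sketch, assuming a boolean per peak meaning "contains at least one motif"; the permutation count and seed are arbitrary, and the resulting p-values still need multiple-testing correction across genes.

```python
import random


def motif_permutation_p(gene_peaks, peak_has_motif, background_peaks,
                        n_permutations=10_000, seed=0):
    """Empirical P(motif-peak count >= observed) for one gene.

    gene_peaks: ids of experimental-only peaks assigned to this gene.
    peak_has_motif: {peak_id: bool} for every peak in background_peaks
                    and gene_peaks.
    background_peaks: all peaks (control + experimental) to sample from.
    Each permutation draws len(gene_peaks) background peaks without replacement.
    """
    rng = random.Random(seed)
    pool = list(background_peaks)  # random.sample needs a sequence
    n = len(gene_peaks)
    observed = sum(peak_has_motif[p] for p in gene_peaks)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        sample = rng.sample(pool, n)
        if sum(peak_has_motif[p] for p in sample) >= observed:
            at_least_as_extreme += 1
    # +1 correction so the empirical p-value is never exactly 0.
    return (at_least_as_extreme + 1) / (n_permutations + 1)
```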
I currently have shotgun metagenome data. I quality-filtered reads at Q30 and employed Kaiju and GOTTCHA2 using default parameters.
My sample is marine water. And yes, I know phages are more abundant than bacteria. But this is my first time seeing a read-based taxonomic profile with almost 90% of reads assigned to phages! Is this a cause for alarm? Or is it just that phages dominate my sample?
I've handled wastewater samples before which are more known to harbor A LOT of phages but the reads suggested that there are still more bacteria than phages.
I'm still waiting for my metagenome assembly to corroborate whether an assembly-based approach would recapitulate my assembly-free taxonomic profile.
Any comments would be appreciated: how I might proceed, literature to read, or anything else.