Bioinformatics PhD student

Can someone explain what a typical day might look like for a bioinformatics PhD student in different years of the program? I’m an undergrad in a wet lab right now, and continuing down that path doesn’t particularly appeal to me, so I wanted to hear what a dry lab might be like.

01:29 UTC


How to get Entrez IDs for my protein fasta

I have a protein FASTA file output from Bakta, and I want to find the Entrez IDs for its proteins. How would I achieve this? So far I have run a DIAMOND BLAST search of my protein FASTA against the COG database. Is there any way to convert my COG terms to Entrez IDs, or, I guess, to map my proteins to an Entrez database?

21:44 UTC


Oncology Patient Cases @ Stanford Med

Hi All,

A few years ago, we started a program at Stanford Medicine for patients who had exhausted their clinical options and needed to move beyond Standard of Care for treatment. The idea was to generate advanced longitudinal multi-omic data, then make the data available to bioinformaticians around the world for cutting-edge analysis. The resulting insights went through a rapid translation phase with pharmacologists and oncologists, producing new treatment possibilities for the patients.

This week, we're presenting the results from this work, for our first two pilot cases: Shirley and Vanessa. Please Join Us Online This Friday June 9th.

We're eager to have more folks from the bioinformatics community directly involved in patient cases!

20:42 UTC


Help with getting gene lists for clusters in Seurat

Hi all,

Sorry for such an easy question, but I'm hoping someone here can help me out. I am working in Seurat with a 10X dataset and have successfully completed the workflow suggested by the documentation, but I am trying to get a list of all genes expressed in a given cluster (for example, I have 7 clusters in my tissue sample, and I would like a complete gene list for each of them).

I am sure this is a very easy thing to do, but I can't seem to figure it out. I can get the differentially expressed genes per cluster, but not the full gene lists. Thanks in advance for any help!
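To make the request concrete, here is a minimal sketch of what "all genes expressed in a cluster" could mean: a gene counts as expressed in a cluster if it has a nonzero count in at least `min_cells` cells of that cluster. This is plain Python on invented toy data, not Seurat code; the function name, the `min_cells` threshold, and the gene and cell names are all hypothetical. In Seurat the same logic would be applied to the counts matrix and the per-cell cluster identities.

```python
from collections import defaultdict


def genes_per_cluster(counts, cell_clusters, min_cells=1):
    """Return {cluster: sorted list of genes detected in >= min_cells cells}.

    counts:        {gene: {cell: count}} (a sparse gene-by-cell matrix)
    cell_clusters: {cell: cluster label}
    """
    # cluster -> gene -> number of cells in that cluster with a nonzero count
    detected = defaultdict(lambda: defaultdict(int))
    for gene, per_cell in counts.items():
        for cell, n in per_cell.items():
            if n > 0:
                detected[cell_clusters[cell]][gene] += 1
    return {
        cluster: sorted(g for g, n_cells in genes.items() if n_cells >= min_cells)
        for cluster, genes in detected.items()
    }


# Toy data (invented): four cells in two clusters, three genes.
toy_counts = {
    "GeneA": {"c1": 3, "c2": 1},
    "GeneB": {"c3": 2},
    "GeneC": {"c1": 1, "c3": 5, "c4": 0},
}
toy_clusters = {"c1": 0, "c2": 0, "c3": 1, "c4": 1}

expressed = genes_per_cluster(toy_counts, toy_clusters)
expressed_strict = genes_per_cluster(toy_counts, toy_clusters, min_cells=2)
```

Raising `min_cells` is one way to avoid listing genes that appear in only a stray cell or two; what threshold is sensible depends on the dataset.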

20:21 UTC


What is a must-read for anyone looking to explore the field of Bioinformatics?

Title. Anything from genetics to bioinformatics. What books would you recommend?

19:54 UTC


Please share the best resources for me to self-study statistics and statistical methods for use in my future work.

If it helps guide your recommendations, I am a current graduate student with 3 semesters left in my program, have taken undergraduate statistics before but remember very little from it. I would like to know what I need to know to understand and interpret papers, as well as the mathematical methods to deploy for use in my future career, this being working in industry as a bioinformatics scientist.

19:15 UTC


Ideas for a High School Bioinformatics Club?

I am a junior in high school. I'm not going to lie, I know very little about bioinformatics, but I'm very passionate about it and it's a super interesting topic to me. I'd like to create a bioinformatics club at my high school. I have a Data Science teacher who's very knowledgeable and eager to learn, so he can definitely fill in for my lack of knowledge and help here and there, but I still have to be the one to plan the club activities/labs. Do y'all have any ideas for fun labs/activities I could set up for high school students? I'm assuming 50% of the club members will have taken AP Statistics and AP Computer Science A, and only three members are familiar with data science with R and Python/JupyterLab.

16:16 UTC


How can I create mutant peptides with Rosetta?

I need to create mutant peptides with Rosetta to use in virtual screening. The mutations need to be kinda "random" (not chosen directly by me) and, if possible, follow an evolutionary bias (genetic algorithm). I'm really new to protein modelling and have only used AlphaFold. Rosetta can be used in many ways, and I'm kinda lost: should I use RosettaScripts or PyRosetta? Can I do this with both? I haven't found any tutorial that explains how to create mutant peptides following a genetic algorithm with Rosetta :(

16:14 UTC


Overrepresentation test between a transcriptome and candidate sequences obtained from that transcriptome

For an analysis of my data, I have a transcriptome and a list of candidate sequences obtained from that transcriptome. I would like to perform a functional enrichment analysis. I have annotated both data sets using eggNOG-mapper. Currently, I want to perform a test between the two functional annotations, specifically COGs (Clusters of Orthologous Groups). I have tried using the R code from https://yulab-smu.top/biomedical-knowledge-mining-book/enrichment-overview.html#gsea-algorithm with clusterProfiler, but it does not seem to work. Which tools or code can I use to perform this test, please?

Example of some of my data:
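For the overrepresentation question above, one common way to frame the test is a one-sided hypergeometric test per COG category, with the whole transcriptome as the background and the candidate list as the draw. The sketch below is plain Python with invented toy counts (not the poster's data); the function names and category letters are hypothetical, and it assumes every candidate sequence is also part of the background.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n items from N, of which K carry the category."""
    upper = min(K, n)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def cog_overrepresentation(background, candidates):
    """Return {category: p-value} for enrichment among the candidates.

    background: {COG category: number of annotated transcripts in the transcriptome}
    candidates: {COG category: number of annotated candidate sequences}
    """
    N = sum(background.values())  # annotated transcripts overall
    n = sum(candidates.values())  # annotated candidates
    return {
        cat: hypergeom_sf(candidates.get(cat, 0), N, K, n)
        for cat, K in background.items()
    }


# Toy counts (invented for illustration only).
toy_background = {"C": 200, "G": 150, "S": 650}
toy_candidates = {"C": 30, "G": 5, "S": 15}

pvals = cog_overrepresentation(toy_background, toy_candidates)
```

With many categories the resulting p-values would normally be corrected for multiple testing (for example with Benjamini-Hochberg) before interpretation.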

15:33 UTC


Annotation of fungal genomes

Hello everyone,

What's the best tool that I can use to annotate assembled fungal genomes (from ONT reads)?

I have experience with Prokka, which I used to annotate bacterial genomes, but I don't think it would work with fungi, so I am looking for other tools.

Thanks in advance

14:30 UTC


Deriving Affinity from Individual Terms in Autodock Vina Scoring Output

Reddit community,

I am currently working with the output from Autodock Vina using the --score_only argument and I'm having some difficulty understanding how to combine the different score terms (gauss 1, gauss 2, repulsion, hydrophobic, Hydrogen) to calculate the Affinity value.

According to the related publication (DOI 10.1002/jcc.21334), there's a set of weights for these terms, but the way to use these weights to obtain the Affinity is unclear to me. It doesn't appear to be a straightforward linear combination. Can someone help me to find the right formula to recover the Affinity from these terms?

Here's an example of the output I'm working with:
Affinity: -6.13476 (kcal/mol)
Intermolecular contributions to the terms, before weighting:
gauss 1 : 79.40942
gauss 2 : 1285.97614
repulsion : 2.61957
hydrophobic : 21.88511
Hydrogen : 4.41061

I've attempted a direct linear combination using the weights listed in Vina's options:

--weight_gauss1 arg (=-0.035579) gauss_1 weight
--weight_gauss2 arg (=-0.005156) gauss_2 weight
--weight_repulsion arg (=0.84024500000000002) repulsion weight
--weight_hydrophobic arg (=-0.035069000000000003) hydrophobic weight
--weight_hydrogen arg (=-0.58743900000000004) Hydrogen bond weight
--weight_rot arg (=0.058459999999999998) N_rot weight

Furthermore, I've also considered the extra weight for rotatable bonds --weight_rot arg (=0.058459999999999998), given the example output I'm examining includes 13 active torsions.

Despite this, I haven't managed to successfully reconcile these figures to get the Affinity value.
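As a hedged sketch of the arithmetic: the cited paper (DOI 10.1002/jcc.21334) describes the final score as the weighted sum of the intermolecular terms divided by (1 + w_rot × N_rot), rather than a purely linear combination. The Python below applies that form to the numbers quoted in the post. The function names are hypothetical, and the value of N_rot that Vina actually uses is an open assumption here.

```python
# Unweighted intermolecular terms copied from the --score_only output above.
terms = {
    "gauss1": 79.40942,
    "gauss2": 1285.97614,
    "repulsion": 2.61957,
    "hydrophobic": 21.88511,
    "hydrogen": 4.41061,
}

# Default weights as listed in the option help quoted above.
weights = {
    "gauss1": -0.035579,
    "gauss2": -0.005156,
    "repulsion": 0.840245,
    "hydrophobic": -0.035069,
    "hydrogen": -0.587439,
}
WEIGHT_ROT = 0.05846


def weighted_sum(terms, weights):
    """Linear combination of the intermolecular terms."""
    return sum(weights[name] * value for name, value in terms.items())


def vina_affinity(terms, weights, n_rot, weight_rot=WEIGHT_ROT):
    """Weighted sum divided by (1 + w_rot * N_rot), as described in the paper."""
    return weighted_sum(terms, weights) / (1.0 + weight_rot * n_rot)


def implied_n_rot(terms, weights, affinity, weight_rot=WEIGHT_ROT):
    """Back-solve affinity = sum / (1 + w_rot * N) for N."""
    return (weighted_sum(terms, weights) / affinity - 1.0) / weight_rot


linear_part = weighted_sum(terms, weights)
affinity_13 = vina_affinity(terms, weights, n_rot=13)
n_rot_from_output = implied_n_rot(terms, weights, affinity=-6.13476)
```

With these inputs the weighted sum is about -10.61 and N_rot = 13 gives roughly -6.03, close to but not equal to the reported -6.13476. Back-solving from the reported value implies an effective N_rot near 12.5, which suggests Vina's internal torsion count is not simply the number of active torsions; that is an inference from the numbers, not something the post confirms.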

12:39 UTC


How to validate my H3K9me3 pattern formation model using ChIP-seq data?

I'm currently working on a coursework project where we have developed a modified version of the 1-D diffusion model proposed in "Mechanistic stochastic model of histone modification pattern formation" to study the distribution pattern of the H3K9me3 modification.

Now, I want to take it a step further by testing the model's performance using experimental ChIP-seq data. However, I have limited experience with ChIP-seq data analysis and I'm unsure how to do it.

Although this Cell paper shows how the test can be done, it doesn't provide any accompanying code, which has left me quite confused. Could someone kindly assist me with this challenge?

Our model basically just changes some parameter from here (from the Cell paper I mentioned above)

The visualization I want to achieve (also from the Cell paper)

P.S. I'm an undergraduate student new to the field of bioinformatics, so please forgive me if this question is too basic :)

02:57 UTC


Bioinformatics or Data Science?

Hi everyone,

I just finished my second year of undergrad and I’m not too sure what to do for grad school. I used to be pre-med and switched out because of the long schooling and long work hours. I am a cell and molecular bio major and I am planning on adding data science and computer science minors. I want to go into bioinformatics because it seems to combine all of my interests, but from what I’ve heard the job outlook with just a master's isn’t too great. If I get a master's in data science, will I still be able to get jobs in bioinformatics? Or will I be able to work in a biology field? I’m not too interested in working in computer science but am open to data science careers in various fields. Any advice is appreciated :)

01:04 UTC


I want to switch fields for my PhD.

Hi, so I'm a final year BS-MS student majoring in Biology from India. I've just finished my MS thesis on zebrafish lipid droplet dynamics, trying to potentially connect it with the changes in lipid profile. It didn't exactly work out the way I had expected it to, but there were other factors beyond my control that contributed to it. My overall CGPA is decent though (around 8.6/10).

Anyway, during the course of working through this project and for a year or so before that, I taught myself to code. I'm comfortable in Python and R, and I've done a small project in epidemic modelling under a prof at my institute (which I'm currently trying to expand a bit to accommodate new ideas). I've also tried to teach myself to use Linux and Git (still a work in progress!) I took a course in bioinformatics and computational biology (as part of my curriculum) and learnt about biomarker screening for diseases. This is one of the topics I'm interested in, and I want to switch fields to include more of bioinfo/comp bio for my PhD (if possible I wouldn't want to give up completely on my wet lab though). I didn't apply for a PhD last cycle because my confidence was at an all time low but now that I'm out of that lab I'm slowly recovering. I've started to look for labs that work in the fields I'm interested in.

What advice would you have for me? (Also, I'm not sure if I've included all required info so please ask me anything else you'd want to know! Apologies in advance if this sounds muddled.)

00:22 UTC


Help with installing GEMINI

Hi there, I am trying to install GEMINI for genetic variation analysis on my local machine, but even after multiple attempts I am unable to resolve the installation errors. Does anyone have any idea?

it initially says: The environment is inconsistent, please check the package plan carefully The following packages are causing the inconsistency: /

  • conda-forge/noarch::flask==2.3.2=pyhd8ed1ab_0
  • conda-forge/noarch::huggingface_hub==0.15.1=pyhd8ed1ab_0
  • conda-forge/noarch::ipython==8.14.0=pyh41d4057_0
  • defaults/linux-64::conda-build==3.24.0=py310h06a4308_0
  • conda-forge/noarch::requests-toolbelt==1.0.0=pyhd8ed1ab_0
  • defaults/linux-64::transformers==4.24.0=py310h06a4308_0
  • conda-forge/noarch::ipykernel==6.23.1=pyh210e3f2_0
  • conda-forge/noarch::panel==0.14.0=pyhd8ed1ab_0
  • defaults/noarch::conda-token==0.4.0=pyhd3eb1b0_0
  • conda-forge/linux-64::matplotlib==3.7.1=py310hff52083_0
  • defaults/linux-64::anaconda-navigator==2.4.1=py310h06a4308_0
  • conda-forge/noarch::urllib3==2.0.2=pyhd8ed1ab_0
  • defaults/linux-64::anaconda==custom=py310_1
  • defaults/linux-64::scipy==1.10.1=py310hd5efca6_0
  • conda-forge/noarch::imbalanced-learn==0.10.1=pyhd8ed1ab_0
  • defaults/linux-64::qt-webengine==5.15.9=hd2b0992_4
  • conda-forge/noarch::cookiecutter==2.1.1=pyh6c4a22f_0
  • defaults/linux-64::_anaconda_depends==2023.03=py310_0
  • conda-forge/linux-64::scikit-image==0.20.0=py310h9b08913_1

and gives a long list, after which it gets stuck on

failed with initial frozen solve. Retrying with flexible solve. Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source. Collecting package metadata (repodata.json): done Solving environment: failed with initial frozen solve. Retrying with flexible solve. Solving environment: /

and ends with some error

20:52 UTC


Nanopore RNA-Seq Quality data interpretation

I have recently joined a lab that already had some nanopore RNA-seq data and has now received a few more samples. I have little to no long-read sequencing analysis experience, so I need some help here.
The median read quality (Phred score) of the previous samples was 9. In the new samples it is 12. Is this not too low? Or is it normal for nanopore RNA-seq?

I also have a "smear" or a second lower quality circle in the density plot for the read quality/read length plot. This happens for most samples. Is this also normal? And what can explain it?

Thank you
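For reference when reading those medians, a Phred score Q corresponds to a per-base error probability of 10^(-Q/10). The sketch below simply applies that standard definition to the two values mentioned above; the function names are hypothetical, and it says nothing about how any particular QC tool averages quality over a read.

```python
import math


def phred_to_error_prob(q):
    """Per-base error probability for Phred score q: P = 10 ** (-q / 10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Inverse mapping: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


# The two median qualities quoted in the post.
error_q9 = phred_to_error_prob(9)    # about 0.126, i.e. ~87% per-base accuracy
error_q12 = phred_to_error_prob(12)  # about 0.063, i.e. ~94% per-base accuracy
```

So moving from a median of 9 to 12 roughly halves the expected per-base error rate.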

16:31 UTC


Where can I get a list of all ion channel receptors in mouse?

Databases for Ion Channel Receptors.

14:58 UTC



Could you please tell me which country is the best for a master's in bioinformatics? I want to find well-paid work there after finishing my studies.

11:20 UTC


Computational Biology Tools

01:45 UTC


Microbiome study design

I am a new scientist designing a microbiome study for the first time. I am sampling cows in 9 herds. Half are diseased and half are control (not diseased). I have extracted DNA from sample pools and I have 4 samples from each herd, that is, 2 diseased and 2 controls. 36 samples in total. Due to some constraints, I can only submit 18 samples for meta-genomics sequencing to the NGS facility (facility is making the library).

With analysis in mind, I am struggling to understand what would make statistical sense for study design.

  1. Should I submit one diseased and one non diseased pool from each herd?
  2. Or should I submit 4 pooled samples from 4 herds and 2 from one herd.

(It will not matter if some of the samples are not sequenced by metagenomics, because all 36 samples will be sequenced in a 16S microbiome run at our collaborators'.) I tried to use the microbiome study design Shiny app, but it requires a priori knowledge of the number of OTUs expected in the sample, and I have no idea how many OTUs to expect.

Could someone help with information about what makes better sense for a study design?

13:12 UTC


internship advice for bioinformatics

Currently I'm in the 6th semester of my BS in bioinformatics, and this summer I want to gain some experience since my thesis is coming up too. The problem is that I don't have any particular skill set. Should I:

  1. Apply for an internship in a research lab (advice on what I should add to my CV to make them interested)?
  2. Learn some coding through online courses, because mine is very weak (which languages would you suggest)?

Or if you guys have any other advice, please share, since I'm feeling so lost these days and every batchmate I ask is at least doing something.

08:16 UTC


Question About Sequencing Reads

This may be a dumb question, but when doing microbial metagenomics on a sample, how can you determine which reads belong to a certain species? I was recently sent a bunch of 150 bp reads that I was told showed similarity to a certain bacterial species, but I ran BLAST on all the reads and this particular species never showed up as the top match for any of them. In my head, the only way to work out which reads belong to a particular species is to assemble a long enough contig or to sequence a certain housekeeping gene.

Any advice would be appreciated, hopefully this was clear, thank you.

05:00 UTC


Career options

I have a BS in exercise science and have worked as a cardiac monitor tech for about a year. I am planning to begin a master's in biomedical informatics with a focus on health data science. If I do, what would my career options be? Would I be able to become a data scientist? What pay can I expect for careers with this degree?

00:10 UTC


Opinion on These Masters Programs and Advice on Future Masters Choice

Hello everyone, I would really appreciate your opinion on which masters program to choose out of these options. I was having issues formatting a pros and cons column so I will include that separately from the table.

Some background about me is that I'm a Canadian with a molecular biology degree. I tried applying to programs that have internships because I think it's really important to get practical experience and it would be a big plus if I can land a paid internship.

My main career goal is to get into the field of machine learning engineering preferably in the biology space but I'd be fine with that job in another space. I learned relatively recently that there are computer science masters that accept students who did not major in computer science. I have applied to these and my ideal plan is to get a masters in CS while self-studying the math needed for machine learning. The reason for this plan was because my math fundamentals are fairly weak and I'm not a fan of academic math courses. As a fallback, I could always go into software development after getting a CS degree. However, I haven't heard back from any of the CS programs I've applied to and there's a chance I could not be accepted into any. Meanwhile I've already been accepted to bioinformatics masters and the deadline to confirm enrollment is quickly approaching. So, I think the safe option would be to accept one of these bioinformatics programs instead of risking it and betting that I will get accepted into a CS masters.

University | Costs | Curriculum | Internship | Program Length | Rankings (Average of QS and Times)
---|---|---|---|---|---
Boston University | 61,050 USD | link | 2 weeks | 1 year | 90
Northeastern University (Boston) | 55,400 USD | link | 3-6 months | 2 years | 278
UvA/VU (Netherlands) | 31,160 (Euro) | link (I would choose the bioinformatics concentration) | 10 months? | 2 years | 59 (Amsterdam) / 168 (Vrije)
Wageningen (Netherlands) | 39,200 (Euro) | link | 6 months | 2 years | 92

Boston University

Pros:

  • 1 year program means quicker to graduate and enter the job market
  • Access to lucrative US market
  • One of America's biotech hubs
  • 15k scholarship

Cons:

  • Very expensive
  • Not as much depth as other programs

Northeastern University

Pros:

  • Most of the pros from Boston University apply to Northeastern too
  • Nice choice in elective programs
  • No thesis

Cons:

  • Worst ranked
  • Don't provide much support in securing a sponsor

UvA/VU (Netherlands)

Pros:

  • Great selection of data science/machine learning courses
  • Opportunity for long internship

Cons:

  • Difficulty securing housing in Amsterdam (housing crisis)
  • Courses seem very demanding

Wageningen (Netherlands)

Pros:

  • University that specializes in the life sciences
  • Housing may not be as difficult to find/expensive as Amsterdam

Cons:

  • Courses don't look as appealing to me

Right now I'm leaning toward either Boston University or the University of Amsterdam (UvA/VU), but I'd love to hear your opinion!

Thank you

23:25 UTC


Tools suggestion for functional analysis

I want to do a functional analysis of fish microbiomes cultured at different salinities to see the influence of salinity on the microbiome. Nanopore technology was used for sequencing, so the reads are long. Please suggest tools you think are suitable here.

14:27 UTC


Input/Advice Regarding "Portfolio" Sans Publications (PhD industry)

I’m looking for some input/advice regarding pivoting to an industry position and putting together a “portfolio” due to my lack of publications. So any and all feedback from those in industry is greatly appreciated. And if you read all of this, thank you.

Some background. I finished my PhD in plant science in 2015. I have two publications from that; both are first author. The first was an algal genome paper published in Science (2013). I did genome annotation for this project, but it was pretty vanilla/amateur and mostly brute force, e.g., running BLAST searches manually, etc. There was already a bioinformaticist (CS background) on the project that did most, if not all, of the “heavy lifting”. I also got into phylogenetics pretty heavily as a spin-off project (2nd publication), and had to basically teach myself Linux to run various programs at the time. Also became familiar with using clusters and such.

I’ve done 3 postdocs, albeit each was pretty short-lived. The first was all bench work and lasted about 10 months. The position/lab was not a good fit, and it was destroying aspects of my personal life, i.e., relationships, etc. The next year I accepted the second postdoc because I desperately needed a job. This was a combination of dry/wet lab work (plant epitranscriptomics). Through this I got to dabble with ChIP-seq and then started messing around with genome assembly and other NGS data (of my own volition). My contract was for 11 months and was not extended due to lack of funding. Fast forward to my third postdoc, which started out all wet lab (plant synthetic biology), and then towards the end I got to mess with RNAseq data for differential expression. I started the analysis for this, but eventually had someone from a neighboring lab specializing in RNAseq finish the analysis because they already had all of the pipelines basically drag and drop (I do have all of the scripts used to run the analysis from beginning to end). The third postdoc lasted 13 months, again due to lack of funding. The project I was initially brought in to take over had its funding revoked a few months after I arrived at the lab. The biggest kicker, in my opinion, is that I have no publications from any of these postdocs.

Following my last postdoc I accepted a position with a hemp company as a geneticist. They had NGS sequencing data provided by Medicinal Genomics (Kannapedia) for quite a few of their strains. I started with raw sequencing data and pulled out all of the relevant sequences for cannabinoid biosynthesis and such. This didn’t really go much beyond that as I was laid off 6 months after getting hired; the company ran out of money and started laying off everyone...

For the past couple of years I’ve been working in intellectual property. I have no intention of attending law school, so I think I’ve pretty much reached my ceiling in this field.

So, I want to pursue a remote bioinformatics position in industry. I have a pretty strong wetlab background, which I think will help my cause as I can communicate very well with bench scientists. I understand the lingo and science/methodologies.

I watched all of Harvard’s CS50 (I did some, but definitely not all of the assignments), and I’m almost through Helsinki’s MOOC Python course. I know I still have a lot to learn with Python, and I’ve barely touched R. I also have not messed with Nextflow or Snakemake. I have a friend in industry and Nextflow is used heavily by their company.

My question is what should I do regarding putting together a “portfolio” that shows I’m competent sans publications? Should I attend a bootcamp? I could reach out to various research labs and offer to do free “on the side” work in exchange for my name on papers? Should I do yet another postdoc (don’t really want to) but focus strictly on “dry lab” work?

I do plan on using the hemp NGS data to do various analyses (not sure what just yet), and I can probably sneak that in my resume as real experience. I thought about even building a database to store all of the “cleaned” sequence data for simplified retrieval. Not that I need it, but just to show competency.

If you read all of this, thank you. Any and all feedback is appreciated. I consider my career thus far quite the dumpster fire, but I’m hoping to change that and settle into a somewhat new direction. I’ve also considered data science, but given my degree and experience, I think bioinformatics is a much more natural transition for the time being.

14:03 UTC


Phylogenetic analysis

Hi, I would like to ask how to put a threshold line in a phylogenetic tree. I am using UGENE to construct the tree.

Please answer ASAP😭 thanks

10:24 UTC


Phylogenetic analysis

Hello, I would like to ask how to put a threshold line in a phylogenetic tree.

Please answer ASAP😭

10:21 UTC


Looking for genes with enriched numbers of binding sites for specific transcription factors - stats help needed!

I've got an ATAC-seq data set and have identified motifs for my TF of interest in open regions. I've got a set of regions that are open only in my experimental group, and I want to see which genes nearest to open sites in this group have more TF motifs than expected from background, where the background is the number of sites on all peaks open in control and experimental cells. I've tried a binomial p-value, but the data isn't binomially distributed, so I get artefacts such as huge genes with a single site (and miRNAs) coming up as significant. I'd appreciate any advice about how to proceed. Thanks!

10:18 UTC


87% of my reads are from phages as predicted in Kaiju and GOTTCHA2

I currently have shotgun metagenome data. I quality-filtered reads at Q30 and employed Kaiju and GOTTCHA2 using default parameters.

My sample is marine water. And yes, I know phages are more abundant than bacteria. This is my first time seeing a read-based taxonomic profile with almost 90% of reads belonging to phages! Is this a cause for alarm? Or do phages simply dominate my sample?

I've handled wastewater samples before which are more known to harbor A LOT of phages but the reads suggested that there are still more bacteria than phages.

I'm still waiting for my metagenome assembly to corroborate whether an assembly-based approach would recapitulate my assembly-free taxonomic profile.

Any comments would be appreciated! Comments on how I may go about, literature to read, or whatever.


07:44 UTC

