/r/bioinformatics


A subreddit to discuss the intersection of computers and biology.


A subreddit dedicated to bioinformatics, computational genomics and systems biology.

The Biology Network
science askscience biology
microbiology bioinformatics biochemistry
evolution
Bioinformatics

news for genome hackers

Frequently Asked Questions
New to Reddit?
Learning Bioinformatics
#bioinformatics IRC at Freenode
Information
  • If you have a specific bioinformatics related question, there is also the question and answer site BioStar and the next generation sequencing community SEQanswers
  • If you want to read more about genetics or personalized medicine, please visit /r/genomics
  • Information about curated, biological-relevant databases can be found in /r/BioDatasets
  • Multicore, cluster, and cloud computing news, articles and tools can be found over at /r/HPC.
Getting a job in bioinformatics
Friends


97,587 Subscribers

2

UO Bioinformatics Master’s Program is hosting in-person (travel paid for) weekend event for underrepresented ethnic-minorities in STEM interested in a master’s degree

Hello all! I work for the University of Oregon’s Bioinformatics & Genomics Master’s Program (hello to all our students/alumni on this subreddit), and we are hosting a new in-person program next year, FEB 2-3, that serves underrepresented ethnic-minority prospective students interested in pursuing a master’s in bioinformatics. You should be a junior, senior, or current professional looking to return to school.

Those that attend will learn about the grad program, strengthen their grad applications, and explore Eugene and Oregon through a day trip, among other activities. You do need to apply, but the application is not meant to be time-consuming or a significant barrier, so if anything proves challenging, please message me so I can help. The deadline to apply is Nov. 28th at 11:59 PM. Link is: https://oregon.qualtrics.com/jfe/form/SV_8bKmCTTqrmqWmG2

We will be covering domestic airfare, Eugene transportation, lodging, event meals, and access to agenda-related activities for those accepted into the program.

International students are welcome to apply, but if selected to participate, you must be able to attend without having to secure a visa between now and the event. We will only cover domestic flights.

I’m also always happy to answer questions about grad school, regardless of whether you attend this program or not.

Thanks all!

0 Comments
2023/11/13
17:34 UTC

2

RNAseq help. Strandedness and Counts

Hello everyone.

I got my hands on an RNAseq dataset from a friend who asked if I could give a hand with it, given that my knowledge of bioinformatics is somewhat existent.

Initially I did not get any info regarding the strandedness, but given that they used dUTP in the library construction, I am assuming it is stranded. What I know for sure is that it is paired-end.

I checked quality (all good) and proceeded to align. I used STAR, which gave me 97% of uniquely mapped reads. So far so good. Then I decided to use the reads per gene command, in order to try to infer the strandedness. Surprisingly, I got the same value for the counts of unstranded, forward stranded and reverse stranded.

Thinking that it could be a problem with STAR, I tested with featureCounts. Again, I got the same values (very similar to STAR's) regardless of the -s flag in the script (0, 1, 2). For featureCounts I added -p and --countReadPairs, which apparently are both required for paired-end samples.

Any idea why I get the same values for all three conditions (unstranded, forward stranded and reverse stranded) with both tools?

Kind regards!
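For reference, STAR's ReadsPerGene.out.tab has four columns: gene ID, unstranded counts, counts with read 1 on the gene's strand (htseq-count -s yes), and counts with read 2 on the gene's strand (htseq-count -s reverse); its first four rows (N_unmapped, N_multimapping, N_noFeature, N_ambiguous) are summaries rather than genes. A minimal sketch that totals the gene rows and makes a rough strandedness call might look like this. The 0.8 cutoff and the function names are illustrative assumptions, not a standard; for dUTP libraries column 4 is typically the one that dominates.

```python
from typing import Iterable, Tuple


def strand_totals(lines: Iterable[str]) -> Tuple[int, int, int]:
    """Sum columns 2-4 of a STAR ReadsPerGene.out.tab over gene rows.

    Column 2: unstranded; column 3: read 1 on the gene strand (-s yes);
    column 4: read 2 on the gene strand (-s reverse). Rows whose ID starts
    with "N_" are STAR summary rows, not genes, so they are skipped.
    """
    unstranded = forward = reverse = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4 or fields[0].startswith("N_"):
            continue
        unstranded += int(fields[1])
        forward += int(fields[2])
        reverse += int(fields[3])
    return unstranded, forward, reverse


def guess_strandedness(forward: int, reverse: int, cutoff: float = 0.8) -> str:
    """Rough call from gene-assigned totals; the cutoff is an arbitrary choice."""
    total = forward + reverse
    if total == 0:
        return "undetermined"
    if reverse / total >= cutoff:
        return "reverse"      # typical of dUTP libraries; featureCounts -s 2
    if forward / total >= cutoff:
        return "forward"      # featureCounts -s 1
    return "unstranded"       # featureCounts -s 0
```

Usage would be along the lines of opening ReadsPerGene.out.tab, passing the file handle to strand_totals, and feeding the last two totals to guess_strandedness. If all three totals come out identical, as described above, that by itself is worth inspecting gene by gene before trusting any call.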

4 Comments
2023/11/13
16:00 UTC

2

Comparison of Molecular Dynamics Programs

I'm just getting into this topic, and I wonder if anyone has strong opinions about the various (free/open source) molecular dynamics software options. For reference, wikipedia has a nice table here: https://en.wikipedia.org/wiki/Comparison_of_software_for_molecular_mechanics_modeling

2 Comments
2023/11/13
14:00 UTC

6

Python

Is python a good first language to learn for bioinformatics? Any other languages you could recommend? Any good books on the topic anyone happens to know of? I’m planning on switching my major to bioinformatics in the spring and wanted to teach myself some stuff first tho. Thanks a lot.

5 Comments
2023/11/13
13:14 UTC

2

Good places for PhD in Europe?

I'm finishing my MSc in Data Science, with a focus on bioinformatics. I plan to be done by September 2024 and then start a PhD.

I'm planning to stay in Europe for it, and I've been looking for institutions with good PhD programs in bioinformatics.

So far I've found the obvious ones: EMBL, SIB, DKFZ and the University of Barcelona.

Does anyone know of any other institutions with good programs in Europe? Thanks!

0 Comments
2023/11/13
12:47 UTC

2

Biopython class project

Hi r/bioinformatics,

I’m looking for ideas for a basic coding project which displays the capabilities of biopython. My skill level is about 3 months of learning python.

I’d like to find an interesting project related to bioinformatics, and was hoping someone with more experience could offer some guidance. Any help is appreciated!

Thanks in advance

0 Comments
2023/11/13
11:52 UTC

2

Snakemake reruns a rule for all samples when one input changed, even though the outputs exist

Hello,

I recently encountered an issue with Snakemake and I'm seeking some advice to better understand what went wrong.

Context:

  • I have a Snakemake rule for adapter removal.
  • Initially, there were three samples processed using this rule.
  • I added a fourth sample via the configuration file.

Issue:

  • Despite having output files already present for the first three samples, when I ran the rule to process the newly added fourth sample, Snakemake unexpectedly reprocessed all four samples.
  • Snakemake cited a 'missing output file' as the reason, but the output files for the initial three samples were indeed present.

Additional Information:

  • Normally, I use a dry run, but this approach does not update the Conda environment.
  • I used cluster mode with sbatch. Could this have caused the output files not to be recognized by Snakemake?
  • I may have pulled an older version of my Snakemake repository before running sbatch.

Questions:

  1. Is there a way to make Snakemake verify the presence of all output files before executing a rule? For instance, it reported a missing output file, but ls had confirmed its existence beforehand.
  2. Does Snakemake use temporary files to track previously generated files? If so, can we force it to recognize all existing output files before running?
  3. Could the addition of a new sample trigger the reprocessing of all samples, even if the previous ones were completed?
  4. Or is it because running with --cluster sbatch didn't flag the outputs upon completion?
  5. Or is it because I ran Snakemake in a different conda environment than usual?

Maybe I should have commented out the already-created files in the expand() of rule all...

I appreciate any suggestions or advice on how to avoid this issue in the future. Thank you for your help in understanding this better.
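Question 1 can at least be checked independently of Snakemake. A minimal sketch, assuming outputs follow a per-sample path pattern: it fills the pattern for each sample in the config and lists the paths that are really absent, which you can compare against the "missing output" reason Snakemake prints. The pattern trimmed/{sample}.fastq.gz and the function name are hypothetical placeholders, not part of the original workflow.

```python
from pathlib import Path
from typing import Iterable, List


def missing_outputs(samples: Iterable[str], pattern: str, root: str = ".") -> List[str]:
    """Return the expected output paths that do not exist under `root`.

    `pattern` is a hypothetical per-sample template such as
    "trimmed/{sample}.fastq.gz"; each sample name is substituted into it.
    """
    missing = []
    for sample in samples:
        path = Path(root) / pattern.format(sample=sample)
        if not path.exists():
            missing.append(str(path))
    return missing
```

Run from the same working directory Snakemake uses; a file that exists relative to one directory can still be "missing" relative to another, so the root argument is worth setting explicitly.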

11 Comments
2023/11/13
11:11 UTC

1

Research topic for Masters degree in Bioinformatics

Does anyone with a solid background in biology know what topic I could choose for my master's thesis that could be tackled with computational approaches?

1 Comment
2023/11/13
09:38 UTC

7

Has anyone done their masters in Bioinformatics online?

I am currently looking into different bioinformatics (or similar) master's programs and I am thinking about doing an online one. Does anyone have some insight?

5 Comments
2023/11/13
08:59 UTC

31

What do bioinformaticians do in their day-to-day jobs?

I'm starting to be slightly skeptical about what my role would eventually be in the professional world. A lot of our cohort (easily 40%+) have switched to software engineering/computer science because it seems broader and much more lucrative.

I haven't switched but I work full-time as a software developer for a company, while simultaneously studying bioinformatics.

I'm starting to second-guess myself and I'd like to know what would be the average day-to-day tasks of a bioinformatician? An example of a work pipeline would be great to demonstrate.

In software for instance,

  1. I get assigned a ticket that's requesting a bug fix or a new feature
  2. I find the repository where the changes are to be made
  3. I implement code to fix the bug or implement the feature, as well as test it
  4. I have my team double-check my changes
  5. Once approved, I push those changes to the cloud and production

What would be the equivalent for a bioinformatician?

11 Comments
2023/11/13
07:05 UTC

8

What do you think about working without conda?

I am currently setting up the genome assembly environment I used in college again, and I'm having trouble installing software in a conda environment.

I remember conda being relatively easy to use (4 years ago?), but now I am somehow stuck in a troubleshooting loop and dependency hell.

Now I am about to give up and am thinking about installing everything without a package manager.

Am I making a terrible mistake? I don't think it is sustainable in the long run to only be able to have a single environment, but I don't seem to have a choice here.

21 Comments
2023/11/13
05:59 UTC

2

G4 tertiary structure

Is there any tool that can produce PDB files of G-quadruplex structures from sequence information?

1 Comment
2023/11/13
03:56 UTC

2

Entry-level positions and when to look for them

Hey guys, I just wanted to share my situation here and pose some specific questions I have about entry-level bioinformatics work.

I'm in my final year of my undergraduate degree at a large university in Canada, pursuing a major in Microbiology and a minor in Statistics. Experience-wise, I did a 4-month research project with a prof in the spring looking at RNA-seq data, spent the summer doing dry-lab work for my university's iGEM team (mostly protein structure predictions, protein-ligand docking, things like that... it was mostly self-taught and I wouldn't consider myself proficient at this), and now I'm starting a project doing some variant calling (with the same prof as previously). I graduate at the end of April 2024 and plan to move to the U.S. (assume there are no visa hassles here... I have work authorization/SSN/all that and won't need any sponsorship).

My end goal is to apply for masters/PhD programs in winter 2024 (so for 2025 intake), but I'm hoping to get in some work experience in the field in that year after finishing my undergrad. After finishing my masters/PhD, my end goal is to find a job in industry. So here go the questions:

  1. I feel that I can find my way around a ton of niche tasks with some googling (analyzing differentially-expressed genes, variant calling, etc) but have so much I don't know. How much do entry-level positions usually expect you to know coming into the job?

  2. When would be the right time to start applying for jobs? I know a couple friends in engineering who are already looking for summer co-op positions, so that just got me thinking... so far, I've done little in terms of my job search and mostly been saving it for the spring.

  3. Would it be worthwhile to try and find an internship for the spring semester before I graduate? I have loads of free time in my schedule in the spring, so I think I should be able to handle it alongside my courses. The reason I worry about this is because I have had zero exposure to computational work in industry at all so far.

  4. Is there any benefit to looking for RA work with labs I am interested in doing my masters/PhD in down the line, or do PIs generally not get a say in graduate admissions? I've heard so many differing answers for this question, so I figured I'd ask again. I'm also looking for both industry and academia jobs for that time before grad school, though I'm sure that the former is substantially easier to find.

  5. Any tips/suggestions overall?

0 Comments
2023/11/13
03:32 UTC

0

Help with mass downloading specific gene sequence of species

Super new to bioinformatics and trying to do an MSA of a common gene found in closely related bacterial species from NCBI for a research project. However, it seems like the only way to do so is by downloading the entire genome sequences, which take up many GB of space. There's the option to download a single reference sequence of the gene, but doing that for each sequence will take a lot of time.

Any recommendations to work around this? Hoping to align at least 100 sequences together. Also any advice on material to read or online classes would be greatly appreciated!
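One way around whole-genome downloads is NCBI's E-utilities EFetch, which for nucleotide records accepts seq_start, seq_stop and strand (1 = plus, 2 = minus) and can return just that region as FASTA. A minimal sketch, assuming you already know each gene's accession and coordinates (for example from each genome's annotation); the accession and coordinates in any call are placeholders you would supply, and the helper names are hypothetical.

```python
import time
from typing import Iterable, Iterator, Tuple
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_region_url(accession: str, start: int, stop: int, strand: str = "+") -> str:
    """Build an EFetch URL returning one region of a nucleotide record as FASTA.

    start/stop are 1-based inclusive; strand "-" requests the reverse complement.
    """
    params = {
        "db": "nuccore",
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
        "seq_start": start,
        "seq_stop": stop,
        "strand": 1 if strand == "+" else 2,
    }
    return f"{EFETCH}?{urlencode(params)}"


def fetch_regions(regions: Iterable[Tuple[str, int, int, str]]) -> Iterator[str]:
    """Yield FASTA text for each (accession, start, stop, strand) tuple."""
    for accession, start, stop, strand in regions:
        with urlopen(efetch_region_url(accession, start, stop, strand)) as resp:
            yield resp.read().decode()
        # NCBI allows about 3 requests/second without an API key.
        time.sleep(0.4)
```

Concatenating the yielded FASTA records gives a single file ready for an aligner; for around 100 sequences this stays well within NCBI's rate limits.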

4 Comments
2023/11/13
00:31 UTC

1

A question about creating an integrated platform to make predictions using big bio-data.

Hi folks, as a bench scientist, I find programming and data science all Greek to me. But I frequently use several web-based bioinformatics tools, such as BLAST, Gene Ontology, AlphaFold and PTM prediction tools, to query any gene of interest in a research context. I intend to create software driven by AI/ML-assisted text mining/natural language processing (NLP) algorithms (!) that can customize and integrate these separate prediction tools in one place. My overall goal is to get more accurate predictions to design my wet-lab experiments. It could be either a website or an installable program that uses publicly available transcriptome and proteome data to generate such predictions.

I’m wondering what skill set this project requires to build such a tool, either embedded in a website or as a downloadable application. I’m planning to reach out to a few experts for help and guidance, but I don’t know which field would have the right people to talk to about this project: AI experts, bioinformaticians, software developers or ML people. I don’t even know whether such a tool needs AI/ML or text mining algorithms, but it seems like AI may help put all the data together in a format that I can understand. Any advice would be greatly appreciated. Thanks.

5 Comments
2023/11/12
21:27 UTC

2

Modeling biological systems / drug discovery help + major help?

Hello all, I’d like to preface this by saying I hope I’m not breaking any of the rules. As the title states, I’d like to learn more about these specific fields within computational biology. I’m currently a freshman thinking of double majoring in molecular biology and statistics or applied math, with a minor in computer science. I know the topics listed above are broad, which is why I’d like to learn more about them and see which would be more applicable (applied math or statistics). I’m very interested in the idea of interpreting data for drug discovery or modeling biological systems. Does anyone have any insight into these fields? And what would be the best way to pursue them, or how best to combine these majors to do so? I’m okay with grad school and further education.

Thank you

0 Comments
2023/11/12
20:55 UTC

2

How can I get a curated FASTA file for genomes outside of BioMart?

Hey all,

I am an undergrad who is fairly new to bioinformatics, and I was wondering about some good alternatives to BioMart. BioMart is great and does most of what I want, especially letting me customize exactly what I want in my output.

That said, I am working with another lab on a project, and I am looking for a way to get a FASTA file containing the upstream flanking sequences of every gene in a certain bacterial genome, Streptococcus agalactiae COH1. I checked EnsemblBacteria and didn’t have any luck there.

What are some other ways I can find what I’m looking for? If it comes down to it, I am comfortable working at the command line. I know the genome of S. agalactiae COH1 is available on GenBank, but I have not needed to use GenBank for this kind of data curation before (as I mentioned, I’m used to tools like BioMart doing all of the formatting I request for me).

Thanks in advance!
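If you download the genome FASTA and its GFF3 annotation, the upstream flanks can be cut out directly. A minimal sketch, assuming a GFF3 with "gene" features and ID= attributes; the 200 bp default length and the function names are illustrative choices. GFF3 coordinates are 1-based and inclusive, and for minus-strand genes "upstream" lies past the gene end and must be reverse-complemented.

```python
from typing import Dict, Iterable, List, Tuple

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {sequence ID (first word of header): sequence}."""
    records: Dict[str, List[str]] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {k: "".join(v) for k, v in records.items()}


def upstream_flanks(genome: Dict[str, str], gff_lines: Iterable[str],
                    length: int = 200, feature: str = "gene") -> List[Tuple[str, str]]:
    """Return (gene ID, upstream sequence in gene orientation) for each feature.

    Flanks are truncated at contig ends; this sketch does not wrap around
    circular chromosomes. Strand "." is treated like "+".
    """
    flanks = []
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != feature:
            continue
        seqid, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        gene_id = attrs.get("ID", f"{seqid}:{start}-{end}")
        seq = genome[seqid]
        if strand == "-":
            # 0-based slice [end, end+length) = 1-based positions end+1..end+length
            flank = revcomp(seq[end:end + length])
        else:
            # 1-based positions start-length..start-1
            flank = seq[max(0, start - 1 - length):start - 1]
        flanks.append((gene_id, flank))
    return flanks


def to_fasta(flanks: List[Tuple[str, str]]) -> str:
    return "".join(f">{gene_id}_upstream\n{seq}\n" for gene_id, seq in flanks)
```

Writing to_fasta(upstream_flanks(genome, gff_lines)) to a file gives one record per gene, already in the gene's own orientation.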

5 Comments
2023/11/12
20:51 UTC

1

Seeking In-Depth Advice: Drawing MD Simulation with VMD, Avogadro, or Alternatives

Hello, community! I'm currently working on a molecular dynamics (MD) simulation using LAMMPS, and I'm looking for comprehensive guidance on visualizing and drawing the simulation results. Specifically, I'm interested in using VMD, Avogadro, or any other software that you would recommend.

I've attached an example image of my simulation results, and I'm aiming to achieve a similar visualization in the chosen software. Any tips, tutorials, or personal experiences you can share would be greatly appreciated!

https://preview.redd.it/vlw9pfy6wyzb1.png?width=365&format=png&auto=webp&s=81dcd7d212abeb522013d9a27901a45e559e84db

Thank you in advance for your help.

2 Comments
2023/11/12
19:21 UTC

28

WGCNA gone missing

Where did the Horvath lab site at UCLA genetics go?

I'm a new user of WGCNA and am interested in comparing and contrasting networks. I know there were several tutorials linked to the Horvath lab site at UCLA. However, the site has been suspended for a couple of weeks and I am wondering if anyone by chance knows where the tutorials have been moved to. Did this amazing resource just drop off the planet??

6 Comments
2023/11/12
19:07 UTC

8

Comparing blastn, tblastx, and blastp E-values

Hey there, I just started taking a bioinformatics course a few weeks ago and I've been thinking about this situation almost all day. Let's say that we have a nucleotide sequence, and a protein sequence of a gene (let's call the nucleotide sequence NucA, and the protein sequence ProtA).

If I blastn the NucA, tblastx the NucA, and blastp the ProtA sequences, how would the E-values compare? Now I understand that protein sequences are more conserved because many third-position codon changes are synonymous, thus allowing a lower E-value for blastp alignment results (and causing higher E-values for blastn, as even a single silent mutation decreases the identity score). However, I couldn't really understand how tblastx and blastn would compare, so I'd like to hear your thoughts about how blastn, tblastx, and blastp E-values compare.

This is not for homework or anything, I'm just really curious how these compare, and this is the only place I could ask this question as no one I talk to has a clue about it.

Thanks in advance!
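Part of the answer is that an E-value is not just a function of the alignment: in Karlin-Altschul statistics a raw score S is converted to a bit score S' = (λS - ln K) / ln 2, and then E = m · n · 2^(-S'), where m is the query length and n the total database length in the residues the program actually compares. A minimal sketch of those two formulas; the lengths below are illustrative numbers, and real BLAST uses corrected "effective" lengths and program-specific λ, K and scoring matrices, which this ignores.

```python
import math


def bit_score(raw: float, lam: float, k: float) -> float:
    """Normalized bit score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw - math.log(k)) / math.log(2)


def evalue_from_bitscore(bits: float, m: float, n: float) -> float:
    """Karlin-Altschul expectation E = m * n * 2**(-S')."""
    return m * n * 2.0 ** (-bits)


# Same bit score, different search spaces (illustrative lengths only):
# a 900 nt query against 3e9 nt versus its 300 aa translation against 1e9 aa.
e_nucleotide = evalue_from_bitscore(50, 900, 3e9)
e_protein = evalue_from_bitscore(50, 300, 1e9)
```

Here the nucleotide search's E-value is 9 times larger for the same bit score, purely because the search space is larger; tblastx adds another twist by searching six translated frames of both query and database.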

1 Comment
2023/11/12
14:19 UTC

3

Lipidomics Data Analysis

Hello everyone,

Does anyone have any experience with lipidomics data analysis? We are going to send some extracellular vesicles off for lipidomics profiling soon and I will need to perform the analysis.

Also, does anyone know of any consensus or standard pipelines for this analysis?

Thank you!

0 Comments
2023/11/12
13:34 UTC

13

Alternative PhD Routes

Hello everyone,

I am in a tough spot…

Currently, I am getting my MS in Bioinformatics. I love science and research, and I love working in bioinformatics. I am currently in a research lab and have been for nearly 3 years. I am also in the 4th semester of my master's degree. I am about to start some very interesting, important and impactful research in my lab; most of it was my idea, and the experiments are of my design.

Unfortunately, I am in a situation where I want very badly to pursue a PhD, but I cannot, as I have life responsibilities. My wife and I bought a house, which we cannot afford if I take a large pay cut to live on a PhD stipend. We also have other bills I will not get into, but…

My question to you all is: have you seen anyone, or have you yourself, acquired a PhD through a non-traditional route? I have read about people in the biotech space getting PhDs sponsored by their company while keeping their current job, but I have never met anyone who has done this. My understanding is that I would need to stay with the company for the duration of my PhD.

Any and all advice would be welcomed. Thank you!

9 Comments
2023/11/12
01:33 UTC

1

PharmD to bioinformatics

I'm doing my PharmD, but as a student researcher I fell in love with GWAS and biostatistics. Do you think a PhD in bioinformatics is enough to work as a researcher at a pharmaceutical company, or am I supposed to do a BSc/MSc in bioinformatics after graduating from pharmacy school?

4 Comments
2023/11/11
20:27 UTC

1

What courses should I take during undergrad for molecular dynamics work?

I'm a freshman looking to major in CS and pursue molecular dynamics work in grad school. The general consensus here is that you should major in CS and minor in bio, or vice versa, but does this still hold for molecular dynamics work? Should I skip the bio minor and instead take extra courses in physics? I've been told by some of my professors that physics is a crucial part of the field (and CS if you're developing MD tools), and that the biology can be picked up along the way. Currently, I'm in a research lab not focused on MD but related to biophysics and machine learning.

MD work doesn't seem to be the main focus here on this sub, so I wanted to clarify.

4 Comments
2023/11/11
20:19 UTC

2

Leveraging findings from scRNA-seq data by using bulk RNA-seq data

Hi, are there any methods to leverage findings from scRNA-seq data by using bulk RNA-seq data? I heard from someone at the ASHG meeting that we can extrapolate some results from large bulk RNA-seq datasets to interpret findings from small scRNA-seq datasets. In fact, I have scRNA-seq datasets of 15 disease cases and 15 controls, in addition to bulk RNA-seq datasets of 300 disease cases. My scRNA-seq datasets are relatively small, so I would like to reinforce any scRNA-seq insights by using the bulk RNA-seq data. Thank you very much!

3 Comments
2023/11/11
20:01 UTC

2

TF Genes Promoter issue

Hi folks, I want to investigate the presence of specific sequences (direct repeats, inverted repeats), written in IUPAC code (e.g. CGKTCANNCGKTCA), inside the promoter regions of a list of transcription factors. How would you write a script for this task? How would you deal with genes that are on the negative strand? Would you use the reverse complement of the 5 kb upstream flanking sequence of those genes, or would you reverse complement the pattern to find? How would you approach this task? And would you rather use R or Python?
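On the strand question: searching the reverse-complemented IUPAC pattern on the plus-strand sequence finds exactly the sites you would find by searching the original pattern on the reverse-complemented sequence, so you can keep one set of plus-strand coordinates and simply label each hit with its strand. A minimal Python sketch of that approach (function names are illustrative); it converts IUPAC codes to a regular expression and uses a lookahead so overlapping hits are reported.

```python
import re
from typing import List, Tuple

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement of each IUPAC code (e.g. K = G/T pairs with M = A/C).
IUPAC_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def revcomp_iupac(pattern: str) -> str:
    return pattern.upper().translate(IUPAC_COMPLEMENT)[::-1]


def iupac_regex(pattern: str) -> "re.Pattern[str]":
    # The lookahead makes finditer report overlapping matches.
    body = "".join(IUPAC[c] for c in pattern.upper())
    return re.compile(f"(?=({body}))")


def find_motif(seq: str, pattern: str) -> List[Tuple[int, str, str]]:
    """Return (0-based plus-strand start, strand, matched plus-strand text)."""
    seq = seq.upper()
    hits = []
    for strand, pat in (("+", pattern), ("-", revcomp_iupac(pattern))):
        for m in iupac_regex(pat).finditer(seq):
            hits.append((m.start(), strand, m.group(1)))
    return sorted(hits)
```

A palindromic pattern (a perfect inverted repeat) will be reported once per strand at the same position, which is expected rather than a bug.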

4 Comments
2023/11/11
17:20 UTC

9

What to do between Bachelors and Masters - best use of time to gain experience?

Hello all!

I graduated with a bachelor's degree in computer science in the spring. Upon graduation I worked as a cybersecurity analyst for a Fortune 500 company, and I am now in a position as a "research professional" in an unrelated field (what I do in my job has no relevance to CS or bioinformatics... not going to get into that).

I have decided to pursue bioinformatics because I have been unhappy with the tech industry, which I initially jumped into because I landed a high-paying job right out of undergrad. The $$ was not enough to keep me happy, and I want to pursue an advanced education and work in a more scientific/research-oriented role. I love CS, but want more meaningful applications of my CS skills. I love biology and there are many areas of bioinformatics where I would be happy applying my skills.

I am trying to swiftly get on a track towards a bioinformatics career. I am currently in the process of applying to masters programs in bioinformatics for fall 2024. My question is what would be the best use of my time in the meantime? I do not want to stay at my current job as it really has no relevance to what I want to do.

What do you think of these two options:

(1) Volunteer in a bioinformatics lab for research experience starting ASAP (if the opportunity for a paid position arose obviously I'd jump at that).

(2) Do a semester of independent studies with coursework more relevant to bioinformatics since my undergrad was a pure CS degree- and then try to get into a lab next summer for undergraduate research. I have been accepted to a university for this kind of study where I would take these courses - calc II, linear algebra II, 200-level molecular and general genetics, 200-level probability, 300 level biostatistics, 400 level computational biology

Thank you for any input! Additional ideas/advice is welcomed.

5 Comments
2023/11/11
17:11 UTC

2

Low number of DEGs in pseudobulk scRNA-seq

Hi all,

Without giving too much away, I have 3 drug treated samples and 3 untreated samples subjected to single cell sequencing (from mice)

I ran a standard pipeline, removed doublets, and trained an scVI model. I used scVI to do differential expression of treated vs. control on a cluster of about 1000 cells and noticed there were no significantly differentially expressed genes.

Then, to confirm this, I created pseudobulk profiles and ran differential expression with DESeq2, which identified something like 5 upregulated and maybe 10-20 downregulated genes. In some other clusters, DESeq2 also found no differences.

I guess my question is: is there a good way to confirm this? I would be surprised if there were no differentially expressed genes; even without any treatment, I would be surprised if there was that much consistency between samples.
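For reference, the pseudobulk step mentioned above amounts to summing raw (not normalized) counts over all cells of one cluster within each biological sample, so DESeq2 sees one column per mouse (here 3 treated vs. 3 untreated). A minimal sketch with a deliberately simple, hypothetical data layout of (sample, cluster, {gene: raw count}) per cell; real pipelines would do the same aggregation on a cell-by-gene matrix.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Tuple

Cell = Tuple[str, str, Mapping[str, int]]  # (sample, cluster, {gene: raw count})


def pseudobulk(cells: Iterable[Cell], cluster: str) -> Dict[str, Dict[str, int]]:
    """Sum raw counts per sample for cells belonging to `cluster`."""
    totals: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for sample, cell_cluster, counts in cells:
        if cell_cluster != cluster:
            continue
        for gene, count in counts.items():
            totals[sample][gene] += count
    return {sample: dict(genes) for sample, genes in totals.items()}
```

Checking how many cells and total counts each sample contributes to a cluster's pseudobulk column is a cheap sanity check before interpreting a small DEG list.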

9 Comments
2023/11/11
15:40 UTC
