/r/bioinformatics


A subreddit to discuss the intersection of computers and biology.


A subreddit dedicated to bioinformatics, computational genomics and systems biology.


news for genome hackers

Information
  • If you have a specific bioinformatics related question, there is also the question and answer site BioStar and the next generation sequencing community SEQanswers
  • If you want to read more about genetics or personalized medicine, please visit /r/genomics
  • Information about curated, biologically relevant databases can be found in /r/BioDatasets
  • Multicore, cluster, and cloud computing news, articles and tools can be found over at /r/HPC.

121,278 Subscribers

1

Building Singularity containers on macOS with Apple Silicon

Hello everyone! I want to get some advice from anyone who has experience building x86 Singularity/Apptainer containers for HPC on macOS with ARM processors. Does it work well consistently? How do you do it? I suppose one way would be via conda (an x86_64 env) with the Singularity/Apptainer package.

To provide context, I’m deciding what laptop to ask my PI to provide. In my lab I’m in charge of all the analysis that requires HPC, which includes building containers for some of the pipeline processes. I’ve been doing it on Windows + WSL2, and so far so good. The issue is that at my current workplace, Windows devices have additional limitations placed by IT, such as enforced BitLocker on removable drives, which makes it almost impossible for me to share files with my other lab members, who all use Macs. Additionally, I would not have admin rights on a Windows laptop provided by the institution, so running WSL2 might be an issue? I’m not sure. Therefore, I’m considering a Mac as my next laptop. This is not a ‘which laptop to get’ question per se, but rather whether macOS is a good platform for Singularity/Apptainer development, or for bioinformatics in general.
Alternatively, I could get an MBP plus a Linux desktop, which would solve all the problems. However, I would prefer to be able to do my work on the go, which a Linux desktop would hinder.

Thank you!

0 Comments
2024/11/18
03:03 UTC

11

AFusion: A Graphical User Interface for Simplifying AlphaFold 3 Predictions

I am pleased to introduce AFusion, an open-source graphical user interface (GUI) designed to streamline the use of AlphaFold 3 for protein structure prediction. AFusion aims to make the setup and execution of AlphaFold 3 more accessible, particularly for researchers who prefer an intuitive interface over command-line operations.

Key Features:

  • User-Friendly Interface: Configure job settings, sequences, and execution parameters through a clean and organized GUI.
  • Entity Management: Support for multiple entities, including proteins, RNA, DNA, and ligands, with options for modifications and templates.
  • Automated JSON Generation: Automatically generates the required JSON input files for AlphaFold 3 based on user inputs.
  • Integrated Execution: Run AlphaFold 3 directly from the GUI with customizable Docker settings.
  • Visual Feedback: Monitor command outputs within the interface to facilitate tracking and debugging.

Project Overview:

AFusion is developed to reduce the complexity associated with setting up and running AlphaFold 3 predictions. By providing a graphical interface, it allows researchers to focus more on their scientific inquiries rather than the technical details of command-line usage.

How to Use AFusion:

  1. Installation: Install AFusion using the Python package manager:

pip install afusion

  2. Launching the GUI: Start the AFusion application by running:

afusion

This command will launch the AFusion interface in your default web browser.

Useful Links:

Future Developments:

  • Integration with Alphafold-analysis: Incorporate detailed result analysis tools to enhance post-prediction evaluation.
  • Preset Options for Small Molecules and Metal Ions: Provide built-in configurations for common small molecules and metal ions to simplify their inclusion.
  • Enhanced Modification Support: Expand customization tools for covalent modifications and other sequence alterations.

I encourage interested researchers to explore AFusion and provide feedback or suggestions for further enhancements. Collaboration and community input are valuable for refining this tool to better serve the scientific community.

For any issues or contributions, please visit the GitHub repository linked above.

Happy folding! 🧬

1 Comment
2024/11/18
01:32 UTC

3

Interpreting Pathway 7049: Fatty Acid Salvage in PICRUSt2 Results from Nephele

Hi everyone,

I ran PICRUSt through Nephele to analyze functional pathways in my microbial community data. In the results, I noticed that Pathway 7049: Fatty Acid Salvage appears among the pathways with the highest fold change (as shown in the attached screenshot).

Does this indicate that Fatty Acid Salvage is more activated in one group compared to the other?

Is there a difference between fold change and log2 fold change, or are these terms used interchangeably in the context of pathway analysis?
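
On the second question, the two terms describe related but different numbers: a log2 fold change is the base-2 logarithm of the fold change, so the same effect looks numerically different depending on which one a tool reports. A minimal Python sketch, assuming hypothetical mean pathway abundances for two groups (the values and function names are made up for illustration and are not taken from the Nephele or PICRUSt2 output):

```python
import math

def fold_change(mean_group_b, mean_group_a):
    """Plain ratio of two positive group means (B relative to A)."""
    return mean_group_b / mean_group_a

def log2_fold_change(mean_group_b, mean_group_a):
    """Base-2 logarithm of the ratio; symmetric around 0."""
    return math.log2(fold_change(mean_group_b, mean_group_a))

# Hypothetical abundances: (mean in group B, mean in group A)
higher_in_b = (40.0, 10.0)
lower_in_b = (10.0, 40.0)

print(fold_change(*higher_in_b), log2_fold_change(*higher_in_b))  # 4.0 2.0
print(fold_change(*lower_in_b), log2_fold_change(*lower_in_b))    # 0.25 -2.0
```

A fold change of 4 and one of 0.25 are the same size of effect in opposite directions, which is easier to see as +2 and -2 on the log2 scale. Which group is the numerator depends on the tool, so it is worth checking the output's documentation before reading a direction into the sign.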

Thank you for your help!

https://preview.redd.it/nmnhfwselh1e1.png?width=1570&format=png&auto=webp&s=3e500397b60913fdeaf6bfb3a221d174398ea40f

1 Comment
2024/11/17
16:16 UTC

10

Experience basecalling legacy ONT data

I am working with an investigator who is planning to run direct RNA-seq on a few hundred samples on a PromethION instrument.

The investigator wants to archive the raw signal data for long-term storage after basecalling and methylation analysis are complete, with the intention of basecalling and performing the methylation analysis again in the future (4-5 years later).

I am curious if anyone has worked with a group that did something similar and if it was worth it in terms of storage and compute costs, time, and data quality or scientific benefit.

9 Comments
2024/11/17
13:49 UTC

0

Where to search for origin of replication in a fasta file?

I'm trying to find the origins of replication of several closely related viruses, and would like to know which sites I should look at to identify the original sequences.

1 Comment
2024/11/17
06:01 UTC

2

Modkit and beta values

Hi, I'm quite new to the field of bioinformatics, and I have a question about my understanding of a tool. Regarding modkit pileup, if I enable the options --cpg, --ignore-h, and --combine-strands, would I get a BED file where the beta methylation values for each CpG are in column 11, represented as values between 0 and 100? Or is this value interpreted differently?
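
One way to check this directly is to recompute the value from the count columns of the output. The sketch below assumes the bedMethyl layout as I understand it from the modkit documentation (column 10 = valid coverage, column 11 = percent modified, column 12 = modified count); please verify that against the documentation for your modkit version. The example line is made up.

```python
def parse_bedmethyl_line(line):
    """Extract coverage, percent modified and a 0-1 beta value from one line.

    Assumed column layout (1-based): 10 = N valid coverage,
    11 = percent modified, 12 = N modified. This is an assumption to verify.
    """
    fields = line.split()  # handles tab- or space-separated columns
    n_valid = int(fields[9])
    percent_modified = float(fields[10])
    n_mod = int(fields[11])
    beta = n_mod / n_valid if n_valid > 0 else float("nan")
    return {
        "chrom": fields[0],
        "start": int(fields[1]),
        "n_valid": n_valid,
        "percent_modified": percent_modified,
        "beta": beta,
    }

# Hypothetical bedMethyl record: 15 modified calls out of 20 valid calls
example = "chr1\t100\t101\tm\t20\t+\t100\t101\t255,0,0\t20\t75.00\t15\t5\t0\t0\t0\t0\t0"
record = parse_bedmethyl_line(example)
print(record["percent_modified"], record["beta"])  # 75.0 0.75
```

If column 11 in your file matches 100 * (column 12 / column 10), then dividing it by 100 gives the usual 0 to 1 beta value.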

1 Comment
2024/11/17
03:36 UTC

2

fastq-screen output on scRNA-seq library

I am struggling to interpret the output of a fastq-screen run on read 1 of a paired-end library from a commercial split-pool protocol for single-cell RNA-seq.

Organism is mouse.

What can I say about it? Can I conclude that ribosomal RNA is affecting a good number of reads?
Thanks a lot

https://preview.redd.it/9496vuh8p91e1.png?width=1858&format=png&auto=webp&s=81c021443c95c4a7b69b1e3a7010c866dd69538f

2 Comments
2024/11/16
13:43 UTC

0

NanoPore Data Pipeline Help

Long story short, I am not a bioinformatician, but I have done RNA-Seq and enrichment analysis in R before. I am involved in a project where I need to analyze within-species genomic variation in a plant. I am a complete beginner with bash and I need help with, well, basically anything. What would you recommend?

1 Comment
2024/11/16
12:19 UTC

1

【Joint tissue snRNA-seq】Should I make a cell suspension before isolating the nuclei?

Hello everyone,

Our lab has decided to do snRNA-seq to study a live mouse joint that contains a diverse range of cell types, including hard and soft tissue, cartilage, neurons, etc. We want to check changes across all these cell types after treatment.

Existing protocols all have options to isolate nuclei either from a cell suspension or directly from tissue. I've been advised to minimize cell processing time and disruption, so isolating directly from tissue seems to be the move.

However, since these tissues are so distinct, I’m wondering:

  1. Could "cooking" everything together lead to biased results, where nuclei from certain cell types are underrepresented? (With a cell suspension we at least have a chance to look at the composition or get rid of the dead cells.)
  2. Are there specific techniques or tips to ensure successful or less biased nuclei isolation across all cell types in this scenario?

I am new to this technique, so I’d really appreciate any advice, insights, or tips from those with experience in snRNA-seq. Thanks in advance for your help!

3 Comments
2024/11/16
02:25 UTC

40

Why is it standard practice on AWS Omics to convert genomic assembly fasta formats to fastq?

The initial step in our machine learning workflow focuses on preparing the data. We start by uploading the genomic sequences into a HealthOmics sequence store. Although FASTA files are the standard format for storing reference sequences, we convert these to FASTQ format. This conversion is carried out to better reflect the format expected to store the assembled data of a sequenced sample.

https://aws.amazon.com/blogs/machine-learning/pre-training-genomic-language-models-using-aws-healthomics-and-amazon-sagemaker/

https://github.com/aws-samples/genomic-language-model-pretraining-with-healthomics-seq-store/blob/70c9d37b57476897b71cb5c6977dbc43d0626304/load-genome-to-sequence-store.ipynb

It makes no sense to me why someone would do this. Are they trying to fit a round peg into a square hole?

34 Comments
2024/11/15
22:10 UTC

7

DE analysis-alternative test (Seurat)

Hey everyone,

I was wondering in what cases, based on your experience, you have decided to use the MAST test in the FindMarkers function in Seurat. I ask because I am currently facing a dilemma: there are more hypoxia cells than normoxia cells in my B cell type, yet I would like to compare these oxygen groups within the B cell type. Is this a scenario in which to use the MAST test? Or is the Wilcoxon rank sum test (the default) sufficient?

6 Comments
2024/11/15
19:01 UTC

2

Help Setting up GSEA

I'm a PhD student in psychopharmacology, with no expertise in bioinformatics. I was given access to a few bulk RNA-seq datasets which are related to my work. DGE analysis found very few significant DEGs after FDR correction (there are only 3 animals per condition), and I've been trying to see if I can make sense of the data.

I came across GSEA, and conceptually it makes sense to me that it would be useful in this setting. However, I have a question as to how exactly to go about performing it (for reference, I'm using WebGestaltR). Specifically, my question is about what data to include in the analysis. Do I include all the genes detected, even those with uncorrected p > 0.05? Do I include all the genes regardless of log2FC? Are there any criteria/cutoffs?
I've read that you should input the entire dataset, but it seems weird to me to include genes which have p = 0.8 in the analysis, for example.
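
To make the "input the entire dataset" advice concrete: a rank-based GSEA works on the ordering of all genes, so a gene with p = 0.8 is not treated as a hit; it simply lands near the middle of the list with a metric close to zero. Below is a minimal Python sketch of building such a ranked list. It assumes one common ranking metric, sign(log2FC) × -log10(p), which is a choice and not something WebGestaltR requires; the gene names and values are hypothetical, and the exact input format WebGestaltR expects is not shown here.

```python
import math

def rank_metric(log2fc, pvalue, min_p=1e-300):
    """Signed significance: direction from log2FC, magnitude from the p-value."""
    p = max(pvalue, min_p)  # guard against log10(0)
    sign = 1.0 if log2fc >= 0 else -1.0
    return sign * -math.log10(p)

def build_ranked_list(results):
    """results: iterable of (gene, log2fc, pvalue); returns genes sorted high to low."""
    scored = [(gene, rank_metric(lfc, p)) for gene, lfc, p in results]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical DGE results; no gene is filtered out
dge_results = [
    ("geneA", 2.1, 0.001),
    ("geneB", -1.5, 0.01),
    ("geneC", 0.2, 0.8),
    ("geneD", -0.1, 0.5),
]

ranked = build_ranked_list(dge_results)
for gene, score in ranked:
    print(gene, round(score, 3))
```

Here geneC (p = 0.8) ends up between the strongly up- and down-regulated genes, which is where uninformative genes are supposed to sit in a ranked list.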

Any input would be greatly appreciated!

10 Comments
2024/11/15
17:01 UTC

0

Manta issue not resolved

Hi guys,

I was running Manta (SV caller) on some data and it worked fine. I then tried it on another set of data, and it gave me this error (reported some time ago): https://github.com/Illumina/manta/issues/168. I tried all the things they suggested, but it still didn't work. What do you suggest? Any experience with this tool?

2 Comments
2024/11/15
15:16 UTC

22

Where do I go from here?

I finished a degree in Biology, developing a really great liking for bioinformatics. I like looking at genetic sequences comparatively and I like coding...

I feel lost because I feel hopeless looking and applying for jobs, and I really don't know how to look for experience or an internship... Is there anything out there that allowed you to go through a programme of a year or however long that let you learn and experience the job? Like how people who want to work in the animal industry can go to Africa for a couple of months (very different example, but hopefully this makes sense..?)

14 Comments
2024/11/15
13:13 UTC

15

integrating R and Python

Hi guys, first post! I'm a bioinf student and I'm writing a review on how to integrate R and Python to improve reproducibility in bioinformatics workflows. I'm talking about direct integration (reticulate and rpy2) and automated workflows using Nextflow, Docker, Snakemake, Conda, Git, etc.

Were there any obvious problems with Snakemake that led to Nextflow taking over?

Are there any landmark bioinformatics studies using any of the above that I could use as an example?

Are there any problems you often encounter when integrating the languages?

Any notable examples where studies using the above proved to not be very reproducible?

Thank you. From a student who wants to stop writing and get back in the terminal >:(

37 Comments
2024/11/15
11:48 UTC

5

Any tool to predict effect of protein variations?

Hello, I am currently studying the variations within the structural proteins of a virus. I have performed multiple sequence alignment on all entries available in GenBank and identified the variations. I also have its interactions with specific human proteins.
Now the task ahead of me is to find out whether these changes make the virus more virulent or less pathogenic. Is there any tool to predict this?
Thanks.

13 Comments
2024/11/15
10:11 UTC

1

Alternative to AMOScmp for contig assembly?

I am trying out reference-guided de novo assembly of Illumina reads using the protocol published by Lischer and Shimizu (BMC Bioinformatics, Volume 18, 2017). So basically, I have aligned the reads to a reference genome, and based on coverage, I have defined blocks and superblocks (areas across reference genome with continuous read coverage). Then I have performed de novo assembly within each superblock, and generated a set of contigs for each superblock.

Now of course there will be some redundancy within the resulting contigs. The paper has mentioned the use of AMOScmp v3.1.0, a homology-guided Sanger assembler for assembling the resulting contigs to output a set of supercontigs.

Unfortunately, try as I might, I am unable to install AMOScmp. I was wondering if there is any alternative software that I can use for this step. Any help would be appreciated!

0 Comments
2024/11/15
04:49 UTC

1

Sex determination from SRA

Is there anyone who would be able to give me a WGD-sex determination from the SRA data? 🙏🏻🙏🏻🙏🏻 Or a program to try it? Thank you so sooooo much!

10 Comments
2024/11/15
04:23 UTC

0

issue with nuc.div in R ape.

Hi,

I have an aligned DNAbin of ~30k sequences and when I try to determine the nucleotide diversity using nuc.div in R, the output is NaN. But if I use a subset of the sequences, I am able to get a value.

I don't understand why this is happening and was not able to find any solutions online. I thought there might be some sequences which are causing an issue, so I evaluated nuc.div of various subsets to see which sequences are causing this issue, but was not able to find such sequences.
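
One way to cross-check the value independently of ape is to compute nucleotide diversity column by column from base counts, which avoids building a pairwise distance matrix for ~30k sequences. The Python sketch below is not ape's implementation: it assumes an alignment given as equal-length strings, ignores any character other than A, C, G, T at each site, and divides by the full alignment length, so its handling of gaps and ambiguity codes may differ from nuc.div.

```python
from collections import Counter

def nucleotide_diversity(seqs):
    """Mean per-site probability that two randomly drawn sequences differ."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned, non-empty and of equal length")

    total = 0.0
    for col in range(length):
        counts = Counter(s[col].upper() for s in seqs)
        valid = [counts[base] for base in "ACGT"]  # ignore gaps/ambiguity codes
        n = sum(valid)
        if n < 2:
            continue  # site contributes 0: no pair can be compared
        # fraction of the n*(n-1)/2 pairs that differ at this site
        total += (n * n - sum(c * c for c in valid)) / (n * (n - 1))
    return total / length

# Tiny made-up alignment: pairwise differences are 1, 2 and 1 over 4 sites
toy = ["AAAA", "AAAT", "AATT"]
print(nucleotide_diversity(toy))
```

If this returns a finite number for the full alignment while nuc.div returns NaN, that would at least suggest the problem lies in how specific sequences or sites are handled rather than in the data as a whole.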

Any help is appreciated on how to approach this issue. Thank you in advance.

4 Comments
2024/11/15
02:48 UTC

2

Looking for candidate genes from biological processes highlighted by GSEA GO analysis

I’ve been tasked with identifying candidate genes related to biological processes that have been highlighted in Gene Ontology (GO). What would be the best way to approach this?

So far, I’ve selected genes associated with the relevant GO terms and performed a simple correlation with a disease-related score. I then selected the genes that showed significant correlations.

Is this the correct approach?

3 Comments
2024/11/14
13:45 UTC

80

Wouldn't it be lovely if every paper had a big honest section explaining the limitations of the method/study

Imagine if every Nature Methods paper had a nice section explaining the limitations of its methods compared to others. It would make for much healthier research. I see it's a bit more of a thing in Cell Press. It would help the field grow a lot more.

28 Comments
2024/11/14
11:33 UTC

4

What do you use to clear up Sanger sequencing data?

Hello there,

In our lab, we have a shared licence (with a colleague at another university) for CodonCode Aligner. We use it to align raw data from Sanger sequencing (.ab1 files), edit ambiguous positions and export them as fasta to use in downstream analyses.

Long story short, the other colleague is experiencing an issue with the computer that needs to be running for us to be able to use the licence, and we are stuck without a subscription. Our PI called the resource allocation department to get a quote on the timeline for us to get a licence, and they told him it's going to take months for it to be approved and implemented, plus we need a quote from the software company itself to even get started.

What other software do you use for this job? I am aware of Geneious Prime and that the restricted/free version allows us to align and view chromatograms, but not edit them. We thought of using it to view the chromatograms and edit the fasta files manually (through MEGA X, for example), but it seems too much of a hassle. What alternatives do you have to offer?

6 Comments
2024/11/14
11:00 UTC

2

How To Clip Multiple R-Groups in MOE at the same time

Hi people,

I am currently working on creating a combinatorial library in MOE (molecular operating environment). For that, I have a list of Clip Reactions to use on my database of R-groups. In MOE, I saw the panel to select one clip reaction and run it on my database under Compute > QuaSAR > Combinatorial Library... However, the list of reactions I want to run is relatively long, so I would like to do it in one go.

Does anybody here know if this can be properly implemented in an SVL script or manually done in MOE?

Thank you in advance.

1 Comment
2024/11/14
09:55 UTC

13

Proteomics in R

Hi everyone. I am currently a PhD student trying to analyze some proteomics data for my project. As I am fairly inexperienced with R, I tried my hand at BIOMEX, a free software from the Carmeliet lab that analyzes omics data. I got some good results, but I was losing a lot of features when I entered differential analysis. So, in the hope of having my data well analyzed, I tried my hand at R, mainly with the DEP package. To my surprise, the number of significant proteins plummeted, so I ended up with a bigger problem than I originally had.
Has anyone had experience with such problems and how did you solve them?
Thank you in advance.

6 Comments
2024/11/14
09:28 UTC

2

What is the difference between survfit(Surv(...)) and cuminc(Surv(...))? Can they both handle competing risk in survival analysis?

Assuming the event variable is coded 0 = alive (censored), 1 = died from cancer, 2 = died from other causes, can survfit(Surv(...)) correctly handle competing risk? If not, what is the difference between the two? Similarly, what is the difference between crr() from the tidycmprsk package and coxph() for handling competing risk? Does it come down to cause-specific vs subdistribution hazard?
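
To illustrate why the distinction matters, here is a small plain-Python sketch comparing a cumulative incidence estimate that accounts for competing events (Aalen-Johansen style) with the naive 1 - Kaplan-Meier obtained by treating deaths from other causes as censoring. It is not the code of survfit, cuminc, crr or coxph, and the four-patient dataset is made up; it only uses the 0/1/2 coding from the question.

```python
def cumulative_incidence(times, events, cause):
    """Cumulative incidence for one cause in the presence of competing events.

    events: 0 = censored, any other integer = cause of failure.
    Returns a list of (time, cumulative incidence) pairs.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    overall_surv = 1.0  # all-cause event-free probability just before t
    cif = 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_here = d_any = d_cause = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            n_here += 1
            d_any += data[i][1] != 0
            d_cause += data[i][1] == cause
            i += 1
        cif += overall_surv * d_cause / at_risk
        overall_surv *= 1.0 - d_any / at_risk
        curve.append((t, cif))
        at_risk -= n_here
    return curve

def naive_one_minus_km(times, events, cause):
    """1 - KM with competing events recoded as censored (tends to overestimate)."""
    recoded = [1 if e == cause else 0 for e in events]
    # with a single event type, cumulative incidence equals 1 - KM
    return cumulative_incidence(times, recoded, 1)

# Hypothetical data: 0 = censored, 1 = died from cancer, 2 = died from other causes
times = [1, 2, 3, 4]
events = [1, 2, 0, 1]

cancer_cif = cumulative_incidence(times, events, 1)[-1][1]
other_cif = cumulative_incidence(times, events, 2)[-1][1]
naive = naive_one_minus_km(times, events, 1)[-1][1]
print(cancer_cif, other_cif, naive)
```

In this toy example the competing-risk estimate for cancer death ends near 0.75 and the one for other causes near 0.25, while the naive 1 - KM for cancer death reaches 1.0, because it assumes patients who died of other causes could still have died of cancer later.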

0 Comments
2024/11/14
06:39 UTC

0

some questions about CHR_HG2247_PATCH

Hello, I am a bioinfo student. I want to know which reference genome this chr belongs to.

I searched https://genome.ucsc.edu/cgi-bin/hgSearch?search=HG2247&db=hub_3671779_hs1 but got nothing.

I want to map 3' UTR regions, some of which belong to CHR_HG2247_PATCH, to the reference genome to find their sequences. Are there other methods to do that, or can I just ignore them?

2 Comments
2024/11/14
06:32 UTC

16

Benchmarking Polygenic Risk Scores: A Tool for Your Research

Dear All,

I’ve been benchmarking Polygenic Risk Scores (PRS) and thought I would share my findings and tools with the community. If you're working with PRS tools or risk score prediction for datasets like UK BioBank, I believe this repository could be incredibly useful for your research.

Documentation Link: https://muhammadmuneeb007.github.io/PRSTools/Introduction.html
Code Link: https://github.com/MuhammadMuneeb007/PRSTools

Cheers,

2 Comments
2024/11/14
06:23 UTC

2

Determining the quality of assembly results

I'm a newbie to the bioinformatics world, so I need help here. I ran SPAdes on scorpion genome data; my reads were 150 bp. Here is the report of the results I obtained:

Statistics without reference
  • contigs: 3355
  • No. contigs (>= 0 bp): 25263
  • No. contigs (>= 1000 bp): 1340
  • Largest contig: 18850
  • Total length: 4804404
  • Total length (>= 0 bp): 10334389
  • Total length (>= 1000 bp): 3484807
  • N50: 2063
  • N90: 593
  • auN: 3176.5
  • L50: 573
  • L90: 2467
  • GC (%): 32.83

Mismatches
  • No. N's per 100 kbp: 67.02
  • No. N's: 3220

Can someone please interpret these? I'm kind of getting lost in the technicalities of it all
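
For the N50/L50-type numbers, it may help to see how they are usually computed: sort the contigs from longest to shortest and add up lengths until half (or 90%) of the total assembly length is reached. The length of the contig at that point is N50 (or N90), and the number of contigs needed to get there is L50 (or L90). A minimal Python sketch with made-up contig lengths (not the scorpion assembly), assuming that standard definition:

```python
def nx_lx(lengths, fraction=0.5):
    """Return (Nx, Lx) for the given fraction of total assembly length.

    fraction=0.5 gives (N50, L50); fraction=0.9 gives (N90, L90).
    """
    ordered = sorted(lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    raise ValueError("no contigs given")

# Hypothetical contig lengths, total = 32
contigs = [2, 2, 2, 3, 3, 4, 8, 8]
print(nx_lx(contigs, 0.5))  # (8, 2): two contigs of >= 8 bp cover half the assembly
print(nx_lx(contigs, 0.9))  # (2, 7)
```

Read that way, the report above says that 573 contigs of at least 2063 bp make up half of the reported total length, so most of the assembly sits in fairly short contigs.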

7 Comments
2024/11/14
06:01 UTC

2

Open Science / Open Source [Platforms, Tools, Infrastructure] for Cancer and Rare Disease Patients?

Folks, curious, who is building Open Science / Open Source stuff for Cancer and Rare Disease? Specifically, tools, platforms and infrastructure that patients can use?

We could definitely use more effort in this space!

17 Comments
2024/11/13
23:15 UTC

13

variant calling from amplicon sequencing data

deleted

5 Comments
2024/11/13
22:08 UTC
