/r/bioinformatics


A subreddit to discuss the intersection of computers and biology.


A subreddit dedicated to bioinformatics, computational genomics and systems biology.

The Biology Network
science askscience biology
microbiology bioinformatics biochemistry
evolution
Bioinformatics

news for genome hackers

Frequently Asked Questions
New to Reddit?
Learning Bioinformatics
#bioinformatics IRC at Freenode
Information
  • If you have a specific bioinformatics related question, there is also the question and answer site BioStar and the next generation sequencing community SEQanswers
  • If you want to read more about genetics or personalized medicine, please visit /r/genomics
  • Information about curated, biological-relevant databases can be found in /r/BioDatasets
  • Multicore, cluster, and cloud computing news, articles and tools can be found over at /r/HPC.
Getting a job in bioinformatics
Friends


106,155 Subscribers

2

A simple tutorial/course for spatial transcriptomics

Hello everyone,

I am writing because I am just getting started with R and spatial transcriptomics. I am fairly confident with Python (I have used it for AI and DL), but I want to tackle this topic in R, and I would like to keep learning more.

Can you kindly suggest a simple step-by-step tutorial/course/video that can help me to learn spatial transcriptomics analysis, please? I tried to follow:

https://bookdown.org/sjcockell/ismb-tutorial-2023/practical-session-2.html
https://satijalab.org/seurat/articles/spatial_vignette
https://bioconductor.org/books/3.13/OSCA.intro/the-singlecellexperiment-class.html (this one for single cell)
https://www.youtube.com/watch?v=L_7VdCeJ4Z8&list=PLOLdjuxsfI4N1SdaQQYXGoa5Z93hPxWVY&index=8

but I am finding them a bit difficult. I know that's on me; I just wanted to know whether there is something simpler, or whether these links are the state of the art for spatial transcriptomics tutorials.

Thank you so much in advance.

0 Comments
2024/04/05
14:44 UTC

6

First Job as Bioinformatician

Hi everyone,

I'm about to start a new job in the field of metagenomics. I have a microbiology background, and my thesis was about symbiotic microbiomes.

I would be really grateful if you could share some tips, skills, or learning paths for a bioinformatician. I am really happy to have this opportunity and don't want to mess it up.

Thank you all in advance.

13 Comments
2024/04/05
11:42 UTC

1

Help with Genomic Coordinates of mutation

Hello,

I have a list of mutation hotspots (e.g. gene: NOTCH1, amino acid change: L1574), in this format for several different genes. Is there a way to get the genomic coordinates, and the exon each one falls in, all at once? Right now I am searching them one by one and it is taking a lot of time. Any help would be really appreciated.

Thanks in advance :)
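A minimal sketch of the arithmetic involved, assuming you already have the coding-exon (CDS) coordinates of the relevant transcript, for example from an Ensembl GTF. Amino acid `p` starts at CDS offset `3*(p-1)`, and that offset is walked through the coding segments in transcript order (high to low coordinates on the minus strand). The segments below are made-up placeholders, not real NOTCH1 coordinates, and the "exon" number here counts only coding segments, so it can differ from the official exon number when a transcript has UTR-only exons. For batch lookups, Ensembl's REST API also has a protein-to-genome mapping endpoint (`map/translation`) that may be worth checking.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # (start, end) of a coding segment, 1-based inclusive genomic coordinates


def cds_offset_to_genomic(offset: int, segments: List[Segment], strand: str) -> Tuple[int, int]:
    """Map a 0-based offset within the CDS to (coding_exon_number, genomic_position)."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # Put coding segments in transcript (5' -> 3') order.
    ordered = sorted(segments, reverse=(strand == "-"))
    remaining = offset
    for number, (start, end) in enumerate(ordered, start=1):
        length = end - start + 1
        if remaining < length:
            # On the minus strand the CDS runs from high to low coordinates.
            pos = start + remaining if strand == "+" else end - remaining
            return number, pos
        remaining -= length
    raise ValueError(f"offset {offset} lies beyond the end of the CDS")


def codon_to_genomic(aa_pos: int, segments: List[Segment], strand: str) -> List[Tuple[int, int]]:
    """Return (coding_exon_number, genomic_position) for each base of codon aa_pos (1-based)."""
    first = 3 * (aa_pos - 1)
    # A codon can straddle an exon boundary, so map each of its three bases separately.
    return [cds_offset_to_genomic(first + k, segments, strand) for k in range(3)]


# Hypothetical transcript with two coding segments of 10 bp and 20 bp.
cds = [(100, 109), (200, 219)]
print(codon_to_genomic(4, cds, "+"))  # codon 4 spans both segments
```

Running this over a table of (transcript, amino acid position) rows gives all hotspots in one pass instead of one manual lookup each.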

0 Comments
2024/04/05
10:12 UTC

1

Fastqc sequence length distribution failure criterion

Hi,

I've been investigating this issue for a work related project, but I'm hitting some dead-ends.

Essentially, from the Fastqc documentation, we know that the "sequence length distribution" criterion fails when there's at least one read of length 0. My educated guess was that a read length of 0 indicates an issue with the sequencer, but I can't find any documentation about this; I've checked the sequencer documentation and nothing like this gets brought up. Does anyone have any ideas?
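One way to confirm the condition the FastQC documentation describes is to count zero-length reads directly. A minimal sketch, assuming standard four-line FASTQ records (plain or gzip-compressed). Zero-length reads are often produced upstream by trimming or demultiplexing that removed the entire sequence rather than by the sequencer itself, but that is a possibility to check against your own pipeline, not a conclusion.

```python
import gzip
from typing import Dict, Iterable, Tuple


def length_histogram(lines: Iterable[str]) -> Tuple[Dict[int, int], int]:
    """Count read lengths in four-line FASTQ records; return (histogram, zero_length_count)."""
    hist: Dict[int, int] = {}
    for i, line in enumerate(lines):
        if i % 4 == 1:  # the sequence line of each record
            n = len(line.rstrip("\r\n"))
            hist[n] = hist.get(n, 0) + 1
    return hist, hist.get(0, 0)


def open_fastq(path: str):
    """Open a plain or gzip-compressed FASTQ file as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


# Tiny in-memory example: the second record has an empty sequence.
example = ["@r1", "ACGT", "+", "IIII", "@r2", "", "+", "", "@r3", "AC", "+", "II"]
hist, zeros = length_histogram(example)
print(hist, zeros)
```

On a real file (the name is a placeholder): `with open_fastq("reads.fastq.gz") as fh: hist, zeros = length_histogram(fh)`.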

0 Comments
2024/04/05
07:49 UTC

8

Best way to move 1 TB of fasta files from shared Dropbox to university server

A colleague shared their Dropbox with a bunch of fasta files I need to work on. I want to get them to my root dir on the university server. I can download each file to my local machine by clicking the Dropbox link, then move it to the university ownCloud, and finally to my root dir on the university server. There must be a better way. A Dropbox app? A Python library designed for this stuff? Thanks for any ideas.

I could try wget or curl but there are so many files. Even getting the exact file name links from Dropbox would take some time. Maybe I could wget the directories? IDK. Thanks again.
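A possible shortcut, hedged: Dropbox shared links normally end in `dl=0`, which opens a preview page; changing it to `dl=1` makes the link return the file itself (for a shared folder, a zip archive), so the download can be run with a single `wget` or `curl` directly on the server. A minimal sketch of that URL rewrite; the links are made-up placeholders. Whether Dropbox will zip a 1 TB folder in one go is not something to assume; downloading subfolders separately, or a sync tool with Dropbox support such as rclone, may be more robust.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def force_download(url: str) -> str:
    """Return the shared link with dl=1, keeping any other query parameters (e.g. rlkey)."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query["dl"] = "1"
    return urlunsplit(parts._replace(query=urlencode(query)))


link = "https://www.dropbox.com/sh/abc123/xyz?dl=0"  # hypothetical placeholder link
print(force_download(link))
```

The printed URL can then be fetched on the server, e.g. `wget -O data.zip "<URL>"`, quoting it so the shell does not interpret `&`.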

17 Comments
2024/04/05
05:57 UTC

1

Coding challenges

Are the coding challenges in the interview the same as the practice ones? I mean the coding challenges on the websites they offer. Thank you.

1 Comment
2024/04/05
05:31 UTC

0

TPM/TP10K to raw counts

I saw a script somewhere to convert TPM/TP10K to raw counts, but cannot find it anymore. Can anyone help?

You can just assume that the lowest non-zero value corresponds to 1 and apply the same scaling to all values. I think the script I saw accounted for various edge cases.
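The heuristic above can be sketched in a few lines: per sample (column), divide every value by the smallest non-zero value and round. This is a minimal sketch, not the script mentioned, and it rests on the assumption that the smallest non-zero value in each sample corresponds to exactly one count. That is plausible for TP10K, which is counts scaled only by library size, but for TPM the values are also divided by gene length, so the output is only an approximation of counts, not the original counts.

```python
from typing import List


def rescale_to_counts(values: List[float]) -> List[int]:
    """Scale one sample so its smallest non-zero value becomes 1, then round."""
    nonzero = [v for v in values if v > 0]
    if not nonzero:
        return [0] * len(values)  # an all-zero sample stays all zero
    unit = min(nonzero)  # assumed to correspond to a single count
    return [round(v / unit) for v in values]


def matrix_to_counts(matrix: List[List[float]]) -> List[List[int]]:
    """Apply rescale_to_counts to each column (sample) of a genes x samples matrix."""
    columns = list(zip(*matrix))
    scaled = [rescale_to_counts(list(col)) for col in columns]
    return [list(row) for row in zip(*scaled)]


# Two cells in TP10K units (rows are genes, columns are cells).
tp10k = [[1000.0, 2.5], [3000.0, 0.0], [0.0, 5.0], [6000.0, 7.5]]
print(matrix_to_counts(tp10k))
```

Edge cases a fuller script would need to handle include cells where no gene has exactly one count, which makes the smallest value correspond to 2 or more.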

0 Comments
2024/04/05
03:23 UTC

1

Help with changing MD settings for ligands-protein interactions w/ pyMOL?

Is there any way to correct ligand-protein relationships using pyMOL or is this impossible without writing the code myself?

0 Comments
2024/04/05
02:38 UTC

2

How to highlight proficiency with specific packages on resume?

I'm trying to highlight my experience using popular packages on my resume, but I'm not sure the best way to go about it. Should I include a line under the skills section listing which packages I'm proficient in (e.g. "Packages: SAMtools, BEDtools,...")? Or should I only mention my experience in context (e.g. describing the packages I used to accomplish a certain task in my bullet points)? Or should I do both?

2 Comments
2024/04/04
21:46 UTC

67

Why do authors never attach their Single Cell analysis structure to their papers online?

I've been doing single cell analyses for a couple of years now and one thing I've consistently observed is that papers with single-cell analyses almost never make the Seurat object(s) (the most common single cell analysis structure in R) they constructed available in their data & materials section. It's almost always just SRA links to the raw sequencing data, a GitHub link to the code (which may or may not be what they actually used for the figures in the paper) and maybe a few spreadsheets indicating annotations for cluster labels, clustering coordinates, etc.

Now, I'm code savvy enough that I can normally reconstruct the original Seurat object using the bits and pieces they've left behind, but it would save me a heck of a lot of time if authors saved their Seurat object and uploaded it online. Plus, a lot of people use different versions of the software, so even if I do run through the whole analysis again with the code they've left behind, it's common to just get different results. Sometimes it just doesn't work out and I've had to contact the original authors and beg them for their Seurat object.

So if you are reading this and you are planning on publishing your single cell data soon, please make everyone's life easier and save your Seurat object as a .RDS (R object) or .h5seurat (Seurat object).

48 Comments
2024/04/04
21:08 UTC

0

How to make a genome Bed File for Picard

Hi all,

This might seem trivial, but I've been struggling to get the input needed for GATK even after googling and screening through Biostars threads. I am simply trying to get a genome BED file. I have the .fasta, .fna, .gtf, and .gff files, all from Ensembl/NCBI, and have tried turning my genome.fasta into genome.sam, then genome.bam, then genome.bed using bowtie2 and samtools, but nothing has worked.

My organism's BED files aren't on UCSC either :(. Does anyone have something that works?

Here's what I've tried:

bowtie2-build klactis.fasta klactis

# map reads

bowtie2 -p 4 -x klactis -U klactis.fasta -S klactis.sam (it stops here with Error Abort 6)

and with

pyfaidx

faidx --transform bed klactis.fasta > klactis.bed

And with awk:

samtools faidx klactis.fasta
awk 'BEGIN {FS="\t"}; {print $1 FS "0" FS $2}' klactis.fasta.fai > klactis.bed

My bed file currently looks like:

head klactis.bed

A 0 1062590
B 0 1320834
C 0 1753957
D 0 1715506
E 0 2234072
F 0 2602197
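For reference, the awk command above can be sketched in Python: a FASTA index (`.fai`) has the sequence name in column 1 and its length in column 2, and a whole-sequence BED interval is name, 0, length, tab-separated (BED starts are 0-based and ends exclusive). The output shown above already has that shape. If the Picard/GATK step actually expects an interval list rather than a BED file, Picard's `BedToIntervalList` converts between the two, given a sequence dictionary such as one made with `CreateSequenceDictionary`. In the example, the `.fai` columns after the length are illustrative only.

```python
from typing import Iterable, Iterator


def fai_to_bed(fai_lines: Iterable[str]) -> Iterator[str]:
    """Yield one tab-separated BED line (name, 0, length) per .fai record."""
    for line in fai_lines:
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\n").split("\t")
        name, length = fields[0], int(fields[1])
        # BED is 0-based and half-open, so a whole sequence spans [0, length).
        yield f"{name}\t0\t{length}"


# Lengths taken from the head output above; columns 3-5 are made up.
example_fai = ["A\t1062590\t3\t80\t81\n", "B\t1320834\t1075844\t80\t81\n"]
for record in fai_to_bed(example_fai):
    print(record)
```

To use it on the real index, iterate over `open("klactis.fasta.fai")` and write each record plus a newline to `klactis.bed`.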

Any help is greatly appreciated.

6 Comments
2024/04/04
20:03 UTC

1

Having trouble running RDML tools

Ran a qPCR around a month ago now and since then I have been trying to analyze my results. I'm using this program https://github.com/RDML-consortium/rdml-tools and Ubuntu terminal has been giving me a bunch of errors. Does anyone have experience with this and is willing to lend a quick hand?

0 Comments
2024/04/04
17:48 UTC

1

PyRx error - Autodock Vina: removing failed ligands.

Hi all,

I am trying to start using AutoDock Vina in PyRx. I have downloaded Enamine's solubility fragments (18K molecules) https://enamine.net/compound-collections/fragment-collection. In PyRx I used Open Babel to prepare the ligand files. I started by converting all files into pdbqt, and there are a lot of errors.

The error message is:

AD4LigandPreparation wrote less atoms that present in the molecule: 232.63 Location:C:\Program Files (x86)\PyRx\lib\site-packages\PyRx\vsModel.py:PrepareLigandMol

When looking at the renders of some of the ligands, there are clear issues (broken bonds). However, with so many failures, manually going through the entire list of fragments to see where the pdbqt conversion failed is silly. There doesn't appear to be any way to search for the chemicals that failed or to easily remove them. The only way to remove them is to go to the AutoDock tab, manually find each ligand, and delete it.

Does anyone have any insight into either this error, or how to best filter the results so I can clean up the sdf library?

Thanks,

0 Comments
2024/04/04
16:30 UTC

3

Illumina TruSeq Quality Trimming Using CutAdapt CutOffs To Use?

Background:

I am trimming some data and I have used a couple of references for what is involved in trimming. The easiest one to read is here. However, I cannot find what quality cutoffs are good to use for quality trimming with CutAdapt. CutAdapt does give these examples:

3' only:

cutadapt -q 10 -o output.fastq input.fastq

Both 3' and 5':

cutadapt -q 15,10 -o output.fastq input.fastq

But the numbers here seem to be largely for illustration rather than a recommendation?

Where `-q` is described as (linked here):

The -q (or --quality-cutoff) parameter can be used to trim low-quality ends from reads. If you specify a single cutoff value, the 3’ end of each read is trimmed
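For intuition about what the cutoff value does: the cutadapt documentation says its quality trimming uses the same algorithm as BWA. Walking from the 3' end inward, it accumulates (cutoff minus quality), stops once good bases push that running sum below zero, and cuts at the position where the sum was largest. The sketch below follows that description on plain integer Phred scores; it is an illustration, not cutadapt's own code, and the quality values are made up. With two values such as `-q 15,10`, the first is the 5' cutoff and the second the 3' cutoff.

```python
from typing import List


def quality_trim_3prime(quals: List[int], cutoff: int) -> int:
    """Return how many bases to keep after BWA-style 3' quality trimming."""
    running = 0       # sum of (cutoff - quality) from the 3' end inward
    best = 0          # largest running sum seen so far
    keep = len(quals)
    for i in range(len(quals) - 1, -1, -1):
        running += cutoff - quals[i]
        if running < 0:
            break     # good bases now outweigh the poor tail, so stop scanning
        if running > best:
            best = running
            keep = i  # cutting here removes the largest below-cutoff excess
    return keep


quals = [42, 40, 26, 27, 8, 7, 11, 4, 2, 3]
print(quality_trim_3prime(quals, 10))  # keeps the first 4 bases
```

Note that the single base with quality 11 is removed along with its low-quality neighbours: the algorithm trims a poor-quality tail as a whole rather than stopping at the first base above the cutoff.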

Question/Concerns:

Should `-q` be different for each end, or should I only trim the 3' end? I cannot find information about quality trimming cutoffs on the Illumina site, and I did not see anything that directly applied to Illumina's TruSeq. So any suggestion/help would be much appreciated.

7 Comments
2024/04/04
16:25 UTC

1

Is there any database of co-mutations available online?

So far I have only found cancer-specific ones. I'm interested in general co-mutations info across different genes.

And no, this isn't exactly the same as looking for protein-protein interactions. And gnomAD only contains info on co-occurring variants in the same gene.

Any help would be greatly appreciated!

0 Comments
2024/04/04
16:18 UTC

25

Open source tool to visualize multiple intersecting sets

Hi, about 4 years ago I created an open source Python library for visualizing intersecting sets called supervenn: https://github.com/gecko984/supervenn . It has since received more than 250 stars on GitHub.
My post about it in this subreddit has received a warm welcome, so I decided that another one after 4 years would do no harm. I've also implemented a new feature today, now you can use just intersection sizes instead of sets themselves. Hope you find it useful, have a great day.

https://preview.redd.it/o1i90bphugsc1.png?width=2106&format=png&auto=webp&s=a09e8ba06c5a9d5806b603c2b36b454c78bc384e
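For readers curious what such a plot is built from: each element belongs to exactly one combination of sets, and the diagram draws one "chunk" per combination, sized by how many elements it contains. The sketch below computes those chunks in plain Python as an illustration of the idea only; it is not supervenn's internal code or API, and the example gene sets are made up.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Set


def set_chunks(sets: List[Set[str]]) -> Dict[FrozenSet[int], int]:
    """Map each combination of set indices to the number of elements found in exactly those sets."""
    universe = set().union(*sets)
    combos = (frozenset(i for i, s in enumerate(sets) if item in s) for item in universe)
    return dict(Counter(combos))


sets = [
    {"TP53", "BRCA1", "ALK", "EGFR"},
    {"BRCA1", "ALK", "EGFR", "MYC"},
    {"EGFR", "KRAS"},
]
for combo, size in sorted(set_chunks(sets).items(), key=lambda kv: sorted(kv[0])):
    print(sorted(combo), size)
```

The chunk sizes always sum to the number of distinct elements, which is why a plot can be drawn from intersection sizes alone, without the underlying sets.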

5 Comments
2024/04/04
13:36 UTC

5

Would greatly appreciate some advice

I am a college junior who recently switched tracks from pre-med to bioinformatics (I kept my Biology major and my Chemistry and Bioinformatics minors the same) with a 3.8 GPA. It has been a little difficult finding bioinformatics opportunities for the summer with no previous experience in this field, so I was wondering if anyone could tell me what I should be doing right now, just starting out. Or should I not worry too much about college internships and just focus on a Master's and post-graduate work?

11 Comments
2024/04/04
12:40 UTC

0

What software should I use to make phylogenetic trees? Pls help, this is for my thesis

uhmm, so I'm diving deep into my thesis, and I'm all about that phylogenetic tree life right now. But yo, I'm hella lost on what softwares I should be usin'. Like, there's so many out there, and I don't wanna waste time tryin' 'em all out.

I need your help, squad! What softwares do y'all recommend for makin' phylogenetic trees? I need somethin' that's user-friendly 'cause, let's be real, I ain't no computer whiz. But it also gotta be legit, you know? Can't be usin' some janky software that's gonna mess up my data.

Hit me up with your suggestions and tips, y'all! And if you got any insider tricks on how to make these trees pop, I'm all ears. This thesis ain't gonna write itself, and I could use all the help I can get.

27 Comments
2024/04/04
06:20 UTC

2

GO Network Plot of DEGs

Hi all, I have two samples which I’ve done differential expression with comparing between two time points. I’m interested in finding the enriched GO terms of these high ranking genes, and comparing the results of each sample in a sort of network plot.

I’ve seen really cool plots that group GO terms by their term hierarchy, with node size based on -log2(p value) , and I’m wondering how I might be able to reproduce this sort of plot in Python? Any insight would be appreciated!
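One common way to build such a network (the EnrichmentMap style) is to make each enriched GO term a node sized by its significance and connect terms whose gene sets overlap strongly. A minimal sketch of that data preparation, assuming enrichment results are already available as term to (p-value, genes); the term names, genes, and the 0.3 Jaccard threshold are made-up placeholders. The resulting nodes and edges could then be drawn with a library such as networkx plus matplotlib. Grouping terms by GO hierarchy, as in the plots described, would additionally need the GO DAG (for example via goatools), which this sketch does not do.

```python
import math
from itertools import combinations
from typing import Dict, List, Set, Tuple

Enrichment = Dict[str, Tuple[float, Set[str]]]  # term -> (p-value, genes annotated to it)


def node_sizes(results: Enrichment, base: float = 2.0) -> Dict[str, float]:
    """Node size = -log(p) in the given base (-log2 as in the post; -log10 is also common)."""
    return {term: -math.log(p, base) for term, (p, _) in results.items()}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Shared genes divided by all genes in either term."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_edges(results: Enrichment, threshold: float = 0.3) -> List[Tuple[str, str, float]]:
    """Connect term pairs whose gene sets have Jaccard similarity >= threshold."""
    edges = []
    for (t1, (_, g1)), (t2, (_, g2)) in combinations(results.items(), 2):
        score = jaccard(g1, g2)
        if score >= threshold:
            edges.append((t1, t2, score))
    return edges


results = {
    "term_A": (0.25, {"g1", "g2", "g3"}),  # hypothetical terms and genes
    "term_B": (0.01, {"g2", "g3", "g4"}),
    "term_C": (0.5, {"g9"}),
}
print(node_sizes(results))
print(overlap_edges(results))
```

To compare the two samples, one option is to build the same node set for both and colour each node by which sample it is enriched in.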

6 Comments
2024/04/03
20:53 UTC

3

Is it necessary to perform deduplication for scRNA-seq?

I recently did a personal project on investigating differential gene expression in breast cancer samples (primary vs metastatic sites). I have 8 sequences and I was just wondering whether deduplicating is necessary after the alignment step?
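Context that may help frame the question, assuming a UMI-based protocol such as 10x Chromium: there, PCR duplicates are normally handled by counting distinct unique molecular identifiers (UMIs) per cell and gene, which pipelines such as Cell Ranger or STARsolo do while building the count matrix, rather than by the position-based duplicate removal used in bulk DNA-seq. Whether this applies depends on whether your protocol uses UMIs. A minimal sketch of the collapsing idea on made-up (barcode, UMI, gene) tuples; real tools also correct sequencing errors in barcodes and UMIs, which this skips.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Read = Tuple[str, str, str]  # (cell_barcode, umi, gene)


def umi_counts(reads: Iterable[Read]) -> Dict[Tuple[str, str], int]:
    """Count distinct UMIs per (cell, gene); reads sharing barcode, UMI and gene count once."""
    molecules: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for barcode, umi, gene in reads:
        molecules[(barcode, gene)].add(umi)
    return {key: len(umis) for key, umis in molecules.items()}


reads = [
    ("AAAC", "UMI1", "GAPDH"),
    ("AAAC", "UMI1", "GAPDH"),  # PCR duplicate of the read above
    ("AAAC", "UMI2", "GAPDH"),
    ("TTTG", "UMI1", "ACTB"),
]
print(umi_counts(reads))
```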

13 Comments
2024/04/03
15:45 UTC

25

Looking for advice

Hi everyone

I am currently a Master's student in Molecular Biology and Bioinformatics, graduating soon. During this time I realized that the wet lab is not for me and that I would rather build up my computational skills and apply for jobs in bioinformatics or computational biology once I graduate. I have experience with Python and R (in RStudio), I have data analysis skills, and I recently implemented a mathematical model in Python; however, I do not feel this is enough to land a job. The bioinformatics positions I have looked at require skills in scRNA-seq, RNA-seq, and other omics. My lab does not give me the opportunity to work with these, which is why I am worried: I feel like I'm going to be behind once I graduate, so I am looking for advice. How can I develop these skills? How long would it take? Do you know of any resources or internships that would help me learn them? Are there jobs that will hire you and train you?

I know this is a lot of questions, but that is because I really want to be well trained and succeed in landing my first job.

I would appreciate your comments.

28 Comments
2024/04/03
15:44 UTC

15

Help dealing with batch effects when combining datasets from different experiments

I currently don't have my own RNA-seq data, but I've found two publicly available datasets that I can use to start answering my question. Each dataset has its own disease condition and control, so I would be combining controls and disease conditions from two different papers. I have their data as count matrices; both were processed with HTSeq. I was thinking about using edgeR to analyze them, but I'm concerned about batch effects interfering with the analysis, so I was thinking of normalizing before combining the datasets, but edgeR prefers raw counts. Sorry if this is an easy question, I don't typically deal with this. Thanks for your help!
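A common alternative to normalizing before merging is to keep the raw HTSeq counts, combine them into one matrix, and include the dataset of origin in the model as a batch term (in edgeR, a design such as `model.matrix(~ batch + condition)`), so that condition effects are estimated after accounting for dataset differences. This relies on each dataset contributing control samples; if condition and dataset were completely confounded, no model could separate them. A minimal sketch of what such a treatment-coded design matrix looks like, written in Python purely for illustration; the sample labels are made up.

```python
from typing import List, Sequence


def design_matrix(batch: Sequence[str], condition: Sequence[str],
                  ref_batch: str, ref_condition: str) -> List[List[int]]:
    """Rows are samples; columns are intercept, one indicator per non-reference batch,
    then one per non-reference condition (treatment coding, like R's ~ batch + condition)."""
    batches = sorted(set(batch) - {ref_batch})
    conditions = sorted(set(condition) - {ref_condition})
    rows = []
    for b, c in zip(batch, condition):
        row = [1]  # intercept: reference batch, reference condition
        row += [int(b == level) for level in batches]
        row += [int(c == level) for level in conditions]
        rows.append(row)
    return rows


batch = ["study1"] * 4 + ["study2"] * 4
condition = ["control", "control", "disease", "disease"] * 2
for row in design_matrix(batch, condition, "study1", "control"):
    print(row)
```

If the two papers study different diseases, the same construction works with three condition levels (control plus one per disease), still anchored by the shared controls.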

10 Comments
2024/04/03
15:35 UTC

1

Please help: Phage Display

I am helping my online Chinese friend find an answer for her thesis. She first emailed the author of the article "Peptide-guided lipid nanoparticles deliver mRNA to the neural retina of rodents and nonhuman primates", but we're not sure the author will reply. So I suggested trying an internet forum, but unfortunately she can't access Reddit in China. So I'm posting this for her; FYI, I have no knowledge of this topic myself. Below is her problem:

"I have bought M13 phage peptide display library from NEB, but I can’t electroporate it to electrocompetent cell following its instructions. The parameter is 25 µF, 200 Ω, 2.5 kV."

If you can help, please post reference or materials I can share. I will share your answer to her also. If you are willing to be contacted just in case for follow up questions, please let me know too. Thank you!

1 Comment
2024/04/03
15:05 UTC

0

SOS help me - bracken-build segmentation fault (core dumped) error during STEP 3: CONVERTING KMER MAPPINGS INTO READ CLASSIFICATIONS even when requesting max memory on job - need to get this figured out asap so I can graduate!

Hi all,

I'm having a major issue with bracken when trying to do the third step of the bracken-build command where it attempts to convert kmer mappings into read classifications (step 1b from the manual at https://ccb.jhu.edu/software/bracken/index.shtml?t=manual). I keep getting a segmentation fault during this step, even when requesting the max memory available by my university's supercomputer (2988GB). Has anyone else dealt with this before? If so, PLEASE HELP ME. I have come across the same issue on github (https://github.com/jenniferlu717/Bracken/issues/54), but the creator of bracken doesn't seem to know how to fix it and hasn't responded to my issue thread on github. Here is the script that I am using:

#!/bin/bash
#SBATCH --partition=hugemem
#SBATCH --account=XX
#SBATCH --time=50:00:00
#SBATCH --mem=2988GB
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=80
#SBATCH --job-name=kraken2_bracken_build

/fs/scratch/PAS1725/metagenomics/Bracken-2.6.2/bracken-build -d /fs/scratch/PAS1725/metagenomics/kraken2_standard -t 24 -k 35 -l 100

#####

I am not great at bioinformatics problem solving, and don't really feel confident in the way I set up the script, but the job seems to run okay until I hit that segfault error. Is this just a memory issue? Is there even any way of fixing it or is it a hopeless cause?

I'm set to graduate soon and am freaking out because I've been struggling with bracken and kraken2 for WEEKS and have essentially made no progress, aside from completing the kraken2 steps. If it helps, here is the link that my university's supercomputer tech support directed me to: (https://www.osc.edu/resources/technical_support/supercomputers/pitzer/batch_limit_rules). While they are helping me with this issue, it takes a while for them to respond to my questions, and even more time waiting for my large job script to be queued on the supercomputer to find out it didn't even run properly.

TLDR: I desperately need help fixing the segmentation fault in bracken-build, I don't have time to troubleshoot this myself, and I am at my wits' end with bracken.

Edit: Thanks to those with helpful comments and suggestions! After talking with my tech support, the issue is with the installed kraken2 libraries being built on another system that is not compatible with the one I'm using, not a memory issue. It has been suggested to install everything again using a container and go from there. Hopefully it works! 🙏

16 Comments
2024/04/03
13:49 UTC

1

Struggling to understand how to install RAxML in windows

Hello, I'm a freshman biology student and my professor required us to download RAxML for an upcoming activity. I'm not tech savvy so I've been struggling in installing RAxML. All help is appreciated!

4 Comments
2024/04/03
12:14 UTC

3

Nanopore flowcell blockage by loading beads

Hi all, I have just started using the new R10 chips from oxford nanopore which include loading beads for loading the sample library onto the flowcell. However I found that when going to wash the flow cell for re-using it later I was unable to draw back any fluid from the flowcell in the initial wash step (the "initial draw back 20-30ul of fluid before priming/loading any wash fluid into the flowcell") before you are meant to load the wash mix. I faffed around very carefully repeating this process but to no avail, and had to continue to exert pressure at one point until suddenly the entire chip emptied into my pipette in one go, destroying all the remaining pores all together. I have never had this issue before using any previous chips and it seems as if this issue was caused by the loading beads causing a blockage in one of the channels, preventing me from drawing up any fluid in that initial step.

Has anyone else had this issue or know how to avoid it? And also whether it would be worth contacting nanopore for a replacement chip?

Cheers

15 Comments
2024/04/03
10:10 UTC

2

About hypothetical proteins: can I use them as novel peptides?

Hello, sometimes when searching for proteins within an organism, I come across short protein sequences, which are often referred to as hypothetical proteins. When I blast these sequences, they often do not show a 100% match with other organisms, but they can be closely related proteins.

What I'm curious about is whether I can take these short protein sequences, perform docking, and use them as a novel, untested peptide. What principles should I follow, and has there been any research done with such an approach in the literature? Can anyone familiar with this provide guidance?

Thank you.

Example link: https://www.ncbi.nlm.nih.gov/protein?term=txid562%5Borganism%3Aexp%5D+AND+((%2210%22%5BSLEN%5D+%3A+%2220%22%5BSLEN%5D)&cmd=DetailsSearch

edit: I apologize for not being clear enough, due to my English. What I actually meant to say is, would it be too absurd if I synthetically produce these hypothetical proteins (especially those around 10 to 40 amino acids in length) and investigate their anti-cancer or antimicrobial properties? The amino acid sequences are available in the link I provided above. The reason I'm asking this is whether these hypothetical proteins, which are small peptides, are truly unique entities on their own, or are they just small, meaningless fragments of larger proteins encountered during MS/MS analysis? In other words, are they protein fragments with no discernible properties? Therefore, I'm wondering if it's worth producing them using solid-phase peptide synthesis and whether it's worth researching their properties or not.
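If the candidate sequences are downloaded from that search as FASTA, a first practical step is simply to pull out the ones in the size range mentioned. A minimal sketch using only the standard library; the 10 to 40 residue window comes from the post, and the records are made up. It does not settle whether such peptides are real gene products or fragments of larger proteins; that needs other evidence.

```python
from typing import Dict, Iterable, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {header: sequence}, joining wrapped sequence lines."""
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def filter_by_length(records: Dict[str, str], min_len: int = 10, max_len: int = 40) -> Dict[str, str]:
    """Keep sequences whose length lies within [min_len, max_len]."""
    return {h: s for h, s in records.items() if min_len <= len(s) <= max_len}


example = [
    ">hyp1 hypothetical protein", "MKTAYIAKQR",
    ">hyp2 hypothetical protein", "MKV",
    ">hyp3", "MSTNPKPQRKTKRNTNRRPQDVKFPGG", "MSTNPKPQRK",  # wrapped over two lines
]
print(filter_by_length(read_fasta(example)))
```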

7 Comments
2024/04/03
09:58 UTC

1

Compound Classification using ML tools

I am doing a PhD in AI/Computer Vision. I have applied for an ML Engineer role at a biotechnology startup. I was given a dataset (a CSV file) with three columns: InChIKey, SMILES, and Activity. There are three activity classes: active, inactive, and intermediate.
I know ML and DL algorithms for classifying objects from input features. However, since I have no domain knowledge in chemistry or biology, I don't understand what to do with these two input features.
What I understand so far is that the InChIKey is a 27-character string, a key identifying a chemical compound, and SMILES is a string describing the chemical structure of that compound or molecule (I am not sure whether "molecule" or "chemical compound" is the correct term).
How should I preprocess these features before feeding them into the model? Is there any demo notebook that replicates this task?
Help me understand the task!!!
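Some orientation that may help: the InChIKey is a hashed identifier, so it is useful as a compound ID (e.g. for removing duplicates) but carries no usable structure as a feature, while the SMILES string encodes the molecular structure and is where the signal is. The usual route is to turn each SMILES into a fixed-length vector with a cheminformatics library (RDKit's Morgan/ECFP fingerprints are a common choice) and encode the three activity classes as integer labels. The sketch below is a dependency-free stand-in that uses character n-gram counts instead of fingerprints, only to show the shape of the pipeline; the label order and example molecules are assumptions.

```python
from collections import Counter
from typing import List, Sequence, Tuple

LABELS = {"inactive": 0, "intermediate": 1, "active": 2}  # assumed class encoding


def smiles_ngrams(smiles: str, n: int = 2) -> Counter:
    """Count overlapping character n-grams of a SMILES string (a crude structural proxy)."""
    return Counter(smiles[i:i + n] for i in range(len(smiles) - n + 1))


def featurize(smiles_list: Sequence[str], n: int = 2) -> Tuple[List[str], List[List[int]]]:
    """Build a shared n-gram vocabulary and one count vector per molecule."""
    counts = [smiles_ngrams(s, n) for s in smiles_list]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[g] for g in vocab] for c in counts]


smiles = ["CCO", "CC(=O)O"]        # ethanol, acetic acid
activity = ["inactive", "active"]  # hypothetical labels
vocab, X = featurize(smiles)
y = [LABELS[a] for a in activity]
print(vocab)
print(X, y)
```

`X` and `y` can go straight into any standard classifier; swapping the n-gram step for real fingerprints changes only `featurize`. Character n-grams split multi-letter atoms such as `Cl` and ignore that one molecule has many valid SMILES, which is why fingerprints of canonicalized molecules are the more usual choice.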

2 Comments
2024/04/03
08:33 UTC

17

opinions on Biostar for studying - focus on scRNAseq

Biologist and statistician (more statistician) here. I got my bachelor's more than 10 years ago. I am starting to get involved in scRNA-seq research and I am quite rusty in genetics and in bioinformatics generally. I was looking through this subreddit, and the most recent recommendation I found was the Biostar Handbook.

I'd like your opinions on this resource; it seems quite affordable. Any suggestions of other resources to get myself up to date and informed enough to engage in scRNA-seq will be appreciated.

12 Comments
2024/04/03
00:05 UTC
