/r/MLQuestions


A place for beginners to ask stupid questions and for experts to help them! /r/MachineLearning is a great subreddit, but it is for interesting articles and news related to machine learning. Here, you can feel free to ask any question regarding machine learning.

What kinds of questions do we want here?

"I've just started with deep nets. What are their strengths and weaknesses?" "What is the current state of the art in speech recognition?" "My data looks like X,Y what type of model should I use?"

If you are well versed in machine learning, please answer any question you feel knowledgeable about, even if they already have answers, and thank you!


Related Subreddits:

/r/MachineLearning
/r/mlpapers
/r/learnmachinelearning

/r/MLQuestions

64,130 Subscribers

1

Best online course or tutorial to get reacquainted with Python?

I was assigned an automation task at work, and in my graduate program we had a semester away from Python, so I am RUSTY. I'm struggling to remember all the functionality that comes with pandas and numpy; it's shameful. I'm not a beginner coder, so I don't want a super basic tutorial, but does anyone have recommendations for getting reacquainted with EDA and ETL tasks in Python?
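
Not a course recommendation, but one quick way to gauge how rusty you are is to re-drill the core pandas idioms directly. A minimal sketch (toy data, hypothetical column names) of the extract/transform/summarize pattern that most ETL tasks boil down to:

```python
import pandas as pd

# "Extract": in practice this would be pd.read_csv / pd.read_sql.
df = pd.DataFrame({
    "dept": ["a", "a", "b", "b", "b"],
    "hours": [3.0, 5.0, 2.0, None, 4.0],
})

# "Transform": fill missing values with the column mean, then aggregate.
df["hours"] = df["hours"].fillna(df["hours"].mean())
summary = df.groupby("dept")["hours"].agg(["mean", "count"])

print(summary)
```

If this feels foreign, the official "10 minutes to pandas" guide plus the NumPy quickstart are fast refreshers aimed at people who already code.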

0 Comments
2025/02/01
14:49 UTC

1

Is my model overfitting?

As in the title: I'm afraid my random forest might be overfitting on class 1. I've tried other algorithms and balancing the weights, but that didn't improve the results. What steps would you recommend to address it? Are there any other approaches I should try?

https://preview.redd.it/ujn1so2r2jge1.png?width=273&format=png&auto=webp&s=88facf4e00396b1f115e5e90c87ff60cdb013859

Predicted variable value counts:

1    20387
0     5064
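
One thing worth checking with counts this skewed is whether the weighting actually reflects the imbalance. A quick sketch (plain Python, using the counts from the post) of the weights that scikit-learn's class_weight="balanced" would assign:

```python
# Counts taken from the post's value counts.
counts = {1: 20387, 0: 5064}
n_samples = sum(counts.values())
n_classes = len(counts)

# scikit-learn's class_weight="balanced" formula:
#   weight_c = n_samples / (n_classes * count_c)
weights = {c: n_samples / (n_classes * n) for c, n in counts.items()}
print(weights)  # the minority class (0) gets roughly 4x the weight of class 1
```

Also look at per-class precision and recall (e.g. a classification report) rather than accuracy: with roughly 80% of labels in class 1, accuracy alone cannot distinguish overfitting from plain majority-class bias.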

2 Comments
2025/02/01
13:16 UTC

1

Questions about mechanistic interpretability, PhD workload, and applications of academic research in real-world business?

Dear all,

I am currently a Master's student in math interested in discrete math and theoretical computer science, and I have submitted PhD applications in these fields as well. However, recently, as we have seen advances in the reasoning capacity of foundation models, I'm also interested in pursuing ML/LLM reasoning and mechanistic interpretability, with goals such as applying reasoning models to formalised math proofs (e.g., Lean) and understanding the theoretical foundations of neural networks and/or architectures, such as the transformer.

If I really pursue a PhD in these directions, I may be torn between academic jobs and industry jobs, so I was wondering if you could help me with some questions:

  1. I have learned here and elsewhere that AI research in academic institutions is really cut-throat, and that PhD students have to work very hard (I'm not opposed to working hard, but to working too hard). Or would you say that only engineering-focused research teams are like this, and the theory ones are relatively more chill?

  2. Other than academic research, if possible, I'm also interested in building a business based on ML/DL/LLMs. From your experience and/or discussions with other people, do you think a PhD is more of a nice-to-have or a must-have in these scenarios? Or would you say that it depends on the nature of the business/product? For instance, there's a weather forecast company that uses atmospheric foundation models, which I believe would require knowledge from both CS and atmospheric science.

Many thanks!

0 Comments
2025/02/01
13:01 UTC

16

Anyone want to learn Machine learning in a group deeply?

Hi, I'm very passionate about different sciences like neuroscience, neurology, biology, chemistry, physics, and more. I think the combination of ML with those areas is very powerful and has a lot of potential. Would anyone be interested in joining a group to collaborate on research related to these subjects combined with ML, or even just to learn ML and math more deeply? Thanks.

35 Comments
2025/02/01
11:50 UTC

0

Perplexity Pro at 29$ / yr

I am selling Perplexity Pro for $29 (you save $171 on the $200/yr Pro plan).

It's through a partnership program; I can show my own account as proof, plus a few reviews on other posts. Please DM me if you are interested.

Payment: Wise / Crypto / UPI

1 Comment
2025/02/01
10:45 UTC

1

Project Suggestions for resume please?

  1. Please suggest 1 or 2 good ML/DL project ideas (preferably, but not compulsorily, in Gen AI) which I can build to add to my resume and GitHub. It should not be something very common or generic like clones or simple image classification; something that would stand out to recruiters.
  2. Also, I have planned to build a multimodal RAG-based website for my final year capstone project. Could anyone offer me some tips on how I can make it more innovative or better, or what model to use, etc., to be able to showcase it as my major AI/ML project?
2 Comments
2025/02/01
09:22 UTC

5

AI/ML Questions (First Year CS Student)

Hi, I'm a first year CS student and I've been having a few questions relating to the AI/ML field that I legitimately can't find the answer to anywhere, unfortunately...

First, I'm heavily debating leaning my education towards AI/ML by taking more math, specifically minoring in statistics. Going into uni, I thought I was just going to be a code demon and grind LeetCode and projects. But I thought, is that really still the move? What if AI/ML is truly the future? I've been trying to do more research and can't really find any useful insight. Just wondering if anyone thinks the SWE jobs will be cooked soon, like in 5+ years, and that AI/ML will be far superior.

Another question: what do you actually do in these new AI/ML jobs? I'm hearing so many different things from different people, so does it just depend on the company? Everywhere I look (YouTube, LinkedIn, personal friends), it's all so confusing. You see me refer to the term "AI/ML" and, to be frank, I don't even know exactly what that means. From my understanding, an ML Engineer, for example, doesn't actually work with the theory (the math and statistics) behind these models; that's the work of the Masters and/or PhD people. Are ML Engineers just SWEs who work with these pre-built/designed models? I've heard they help train and tune the models through programming and likely other tools that I'm unaware of, but no crazy math or stats is needed, I think? I've also heard that they help "deploy" the models into the real world, because the mathematicians and statisticians wouldn't know how to make them public, since that's what a SWE does in normal SWE jobs.

I mentioned potentially doing a stats minor. Is that at all useful? Some courses I would be taking are statistical modeling, probability, regression analysis, analysis of variance and experimental design, sampling methodology, and statistical computing. Maybe I should point out that I don't want to be working with a lot of data and graphs and all of that, hence why I don't want to become a Data Analyst or Data Scientist, for example. I want to code because it's something I enjoy doing, but I want to know if these AI/ML jobs are meant for SWEs just specific to that field, or whether they are different in the sense that you need a deeper understanding of math and statistics. If so, how much? And if you do need a higher level of math/statistics, is it just a matter of taking a few more courses, or do you need a Masters/PhD? If it's just a few more courses, does this mean you're basically a SWE who just needs some fundamental knowledge to help with your workflow, or is it completely different?

Essentially, is a stats minor significant in increasing the chances of working in that field? What types of tasks would you do in this field? And if anyone can explain when you would require a higher level of math and statistics versus when you wouldn't, depending on the job, I would appreciate it a lot. I enjoy math and, somewhat, statistics, if you were wondering; I'm just trying to figure out what this new field is all about... Thank you so much!

7 Comments
2025/01/31
19:10 UTC

0

What laptop for good performance ?

I'm currently learning on a MacBook Air 2017, so it's pretty old and performs quite slowly. It's struggling more and more, so I'm thinking I will need to change soon. All of my devices are in the Apple ecosystem at the moment, so if a MacBook Pro M2 2022, for example, is decent enough to work on, I'd be fine with it, but I've heard that lots of things are optimized for NVIDIA GPUs. Otherwise, would you have any recommendations? Also, not sure if it's relevant, but I study finance, so I mainly use machine learning for this. Thank you for your help!

2 Comments
2025/01/31
19:07 UTC

3

Helping keep up with Scientific Literature with Learning Disabilities

Hello Redditors,

I'm wondering if anyone in the AI/ML space has any tips and tricks on how to keep up with the scientific literature of the industry. I currently believe that spending an hour a day reading literature, plus 2 hours on weekends, is achievable, but I'm having difficulty getting those numbers up.

I was diagnosed with ADHD in high school, and despite getting multiple degrees in the sciences, I'm finding it difficult to get this into an easily maintainable routine. I've tried Pomodoro timers, and I'm definitely interested in the material that I'm reading, but any suggestions that others can offer would be highly, highly appreciated.

0 Comments
2025/01/31
18:16 UTC

2

Why is my LSTM just "copying" the previous day?

I'm currently trying to develop an LSTM for predicting the runoff of a river:
https://colab.research.google.com/drive/1jDWyVen5uEQ1ivLqBk7Dv0Rs8wCHX5kJ?usp=sharing

The problem is that the LSTM only does what looks like "copying" the previous day and outputting it as the prediction, rather than actually predicting the next value, as you can see in the plot in the Colab file. I've tried tuning the hyperparameters and adjusting the model architecture, but I can't seem to fix it; the only thing I've noticed is that the more I tried to "improve" the model, the more accurately it copied the previous day. I've spent multiple sessions on this so far and don't know what I should do.

I tried it with another dataset, the one from the guide I followed ( https://www.geeksforgeeks.org/long-short-term-memory-lstm-rnn-in-tensorflow/ ), and the model was able to predict that data correctly. Using a SimpleRNN instead of an LSTM on the runoff data produces the same problem.

Is the dataset maybe the problem, i.e. simply not predictable? I also added the seasonal decompose and autocorrelation plots to the notebook, but I don't really know how to interpret them.
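
A standard diagnostic here is to score a persistence baseline (predict tomorrow = today) on the same test split; if the LSTM's error is no better than that, the model has effectively just learned the copy. A minimal sketch with made-up runoff values:

```python
def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

# Hypothetical daily runoff series; in practice, use the real test split.
series = [3.1, 3.0, 4.2, 8.5, 7.9, 5.0, 4.4, 4.1]

# Persistence baseline: the prediction for each day is the previous day.
naive_pred = series[:-1]
target = series[1:]

baseline_mse = mse(target, naive_pred)
print(baseline_mse)
# Compare against the LSTM's test MSE: if model_mse >= baseline_mse,
# the network is doing no better than copying the previous day.
```

If the series is close to a random walk, persistence is near-optimal and no architecture will beat it by much; an autocorrelation plot that drops off sharply after lag 1 would point in that direction.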

2 Comments
2025/01/31
16:50 UTC

1

Questions about continuous ranked probability score (CRPS)

I wasn't able to find any answers online.

Is it bounded? e.g. from 0 to 1. Or is it unbounded?

Does it have any simple interpretation? How should two CRPS values be compared? E.g., for 5 and 20, in what sense is one model four times better than the other? Could it be that both models have the same point forecasts but one just has wider prediction intervals?
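
For intuition, the ensemble (empirical) form of CRPS is easy to compute by hand: the mean absolute error of the ensemble against the observation, minus half the expected spread between ensemble members. A small sketch with made-up ensembles showing how two forecasts with the same centre but different spread score differently:

```python
def crps_empirical(samples, obs):
    """Empirical CRPS for an ensemble forecast:
    E|X - y| - 0.5 * E|X - X'| over ensemble members X, X'."""
    n = len(samples)
    term1 = sum(abs(x - obs) for x in samples) / n
    term2 = sum(abs(x - y) for x in samples for y in samples) / (n * n)
    return term1 - 0.5 * term2

# Two hypothetical 3-member ensembles with the same mean, different spread.
sharp = [9.9, 10.0, 10.1]
wide = [8.0, 10.0, 12.0]
obs = 10.0

print(crps_empirical(sharp, obs))  # small: accurate and sharp
print(crps_empirical(wide, obs))   # larger: same centre, wider spread
```

This also hints at the answers: CRPS is non-negative but unbounded above, it is in the units of the forecast variable (so not a 0-to-1 score), and two models with identical point forecasts can indeed differ in CRPS purely through prediction-interval width.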

0 Comments
2025/01/31
15:28 UTC

3

Why is my validation/test loss not overfitting?

Hi all, I'm relatively new to ML, and I'm completely new to PyTorch.

I'm constructing a NN that takes 4 inputs and produces 2 outputs, and I've tested a bunch of hyperparameters.

My problem is that my train loss is decreasing as it should, and so is my test loss, but my predictions are still not satisfying.

I've split the data into an 80/20 train/test set, and I have 2 sets of inputs that I'm holding out to see the predictions after training.

I've tried training over a lot of epochs to see if I could induce overfitting, but my test loss never increases, which I think might be part of the problem with my predictions.

Any tips or help would be much appreciated!

Here is my code: https://github.com/Muldbak/Impedance_pred
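
For what it's worth, a test loss that keeps falling usually points to underfitting (or leakage between splits) rather than a broken setup; overfitting only shows up once validation loss turns upward while train loss keeps dropping. A plain-Python sketch of the patience logic commonly used to detect that turn:

```python
def best_epoch(val_losses, patience=3):
    """Return the epoch with the best validation loss, stopping early
    once the loss fails to improve for `patience` consecutive epochs."""
    best, best_ep, wait = float("inf"), -1, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_ep, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                break
    return best_ep

# Hypothetical loss curve: decreases, then flattens and creeps back up.
losses = [1.0, 0.6, 0.4, 0.35, 0.36, 0.37, 0.38]
print(best_epoch(losses))  # → 3
```

If this kind of check always lands on the final epoch no matter how long you train, consider increasing model capacity, or double-check that the 80/20 split doesn't leak information between train and test.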

2 Comments
2025/01/31
12:34 UTC

1

Machine Learning interview prep + My Interview Experience at a fast paced startup as MLE

This is to share my interview experience as an MLE at a startup, and everything you need to ace the interview for MLE roles: https://youtu.be/TksIKgYYWrw?si=08XubKjLelM8s422

0 Comments
2025/01/31
08:07 UTC

1

Advice/resources on best practices for research using pytorch

Hey, I am a PhD student in CS (1st year). I was not familiar with PyTorch until recently. I often go to the repos of machine learning papers, particularly those in safe RL and computer vision.

The quality of the code I'm seeing is just crazy and so well written. I can't seem to find any resource on best practices for things like customizing data modules properly, custom loggers, good practices for custom training loops, and, most importantly, how to architect the code (utils, training, data, infrastructure, and so on).

If anyone can guide me, I would be grateful. Just trying to figure out the most efficient way to learn these practices.

9 Comments
2025/01/31
03:52 UTC

1

LLM Deployment Course

Hi, I'm a data scientist trying to get a new position in my company as a Senior GenAI Engineer. To fit this position, I know that I'm missing some knowledge and experience in the deployment and monitoring of LLMs in production. Can you recommend a good course that covers the process after fine-tuning, including APIs, Docker, Kubernetes, and anything else related?

0 Comments
2025/01/31
03:20 UTC

0

What are some things required to know as someone planning to work in ML (industry or research) but not usually taught in bootcamps?

Not sure what flair works, or if this is a good place to ask this, but I'm kinda curious.

Generally, most bootcamps I've seen focus on all of the smaller fundamentals like getting used to working with ML frameworks and general ideas of models and how to use them. That said, that is obviously not everything one would need in, say, research or a job. In your opinion, what topics/ideas do you think should be possibly either included in bootcamps, or as supplemental knowledge one should pick up on their own? Especially for people who do know the basics but ofc want to specialize, and aren't in the place where they can enroll in an entire degree program and take in-depth classes, or join an internship that would help them explore some of the things a new hire would be expected to know.

Some thoughts that I had were maybe good coding practices as a main thing, and not just a run down of how python/R/SQL/whatever works, but like more in depth ideas about coding. Other than that, maybe specialized software/hardware that's used, like how it works, the intricacies of different chips or CUDA/GPU's, or even TPU's, or stuff that's useful for areas like neuromorphic computing. Specialized algorithms are usually not focused on unless someone's taking a specific focused course, or they're willing to go through the literature. Basically this is a rambling of things that I'd love to see condensed into a bootcamp and want to know more about, but what about everyone else here? What are your thoughts?

4 Comments
2025/01/30
23:11 UTC

1

[Q] Best local LLM for Onenote notebooks

Hello guys! I would like to use a local LLM over a lot of OneNote notebooks (text + images) that include procedures, troubleshooting, etc., and some Word documents with pretty much the same kind of content. Since it is a business use case for my company, I want to keep it local, with no access to the internet. But I want a kind of chatbot to ask and interrogate whether, for example, we have a procedure for something that is in the documents.

Is this feasible? Which tools can I try? Are they open source / free? What are the limitations? Can you suggest some combinations?

Thanks a lot!!
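
It is feasible; the usual pattern is retrieval-augmented generation (RAG): export the notebooks to text, chunk them, score the chunks against the question, and pass the top hits to a locally running model. A toy sketch of the retrieval half, using bag-of-words cosine similarity as a stand-in for the embedding search a real stack would use (the documents here are hypothetical):

```python
import math
from collections import Counter

def score(query, doc):
    """Cosine similarity between bag-of-words vectors; a crude stand-in
    for the embedding similarity a real local RAG stack would compute."""
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    num = sum(q[w] * d[w] for w in set(q) & set(d))
    den = (math.sqrt(sum(v * v for v in q.values()))
           * math.sqrt(sum(v * v for v in d.values())))
    return num / den if den else 0.0

# Hypothetical chunks exported from the notebooks.
docs = [
    "procedure to restart the print server",
    "troubleshooting VPN connection drops",
    "onboarding checklist for new laptops",
]
query = "vpn troubleshooting procedure"
best = max(docs, key=lambda d: score(query, d))
print(best)
```

Open-source stacks like LlamaIndex or LangChain implement this pipeline properly (with embedding models and vector stores instead of word counts), and the whole thing, including the LLM, can run fully offline.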

0 Comments
2025/01/30
21:57 UTC

1

[P](Easy-Help)Machine Learning

Good afternoon, developers. I am taking a Machine Learning course at BairesDev through DIO, and an easy challenge came up where I was literally lost. The mission is just to add photos (datasets) of dogs and cats, for example, into ready-made code so it can differentiate between a dog and a cat.

Challenge Description

Transfer Learning Project in Python

The project consists of applying the Transfer Learning method to a Deep Learning network in Python using the COLAB environment.

For example, we will use the following project, which performs Transfer Learning with the MNIST dataset: https://colab.research.google.com/github/kylemath/ml4a-guides/blob/master/notebooks/transfer-learning.ipynb

The dataset used includes two classes: cats and dogs. A description of the database can be viewed at this link: https://www.tensorflow.org/datasets/catalog/cats_vs_dogs

You can download the dataset from this link: https://www.microsoft.com/en-us/download/details.aspx?id=54765

Notes: In this project, you can use your own dataset (for example: photos of yourself, your parents, your friends, your pets, etc.). The example of cats and dogs can be replaced by two other classes of your interest. The dataset created in our previous project can be used now.

0 Comments
2025/01/30
21:31 UTC

1

How to fill missing data gaps in a time series with high variance?

How do we fill missing data gaps in a time series with high variance like this?

https://preview.redd.it/rmpgqbamu6ge1.png?width=507&format=png&auto=webp&s=e003cf015372f0894d09176cad335f39f505e2fb
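
With high variance, plain linear interpolation is the usual first baseline, but it visibly flattens the gaps; common refinements are adding noise scaled to the local standard deviation, or model-based imputation (Kalman smoothing, seasonal interpolation). A plain-Python sketch of the baseline (it assumes gaps are interior, not at the series edges):

```python
def interpolate_gaps(series):
    """Linearly interpolate runs of None. A baseline only: for
    high-variance data, consider adding noise scaled to the local std,
    or model-based imputation (Kalman smoothing, MICE)."""
    out = list(series)
    i = 0
    while i < len(out):
        if out[i] is None:
            j = i
            while j < len(out) and out[j] is None:
                j += 1
            left, right = out[i - 1], out[j]  # assumes an interior gap
            for k in range(i, j):
                frac = (k - i + 1) / (j - i + 1)
                out[k] = left + frac * (right - left)
            i = j
        else:
            i += 1
    return out

print(interpolate_gaps([1.0, None, None, 4.0]))  # → [1.0, 2.0, 3.0, 4.0]
```

With pandas, the equivalent is Series.interpolate(); method="time" respects unevenly spaced timestamps.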

2 Comments
2025/01/30
20:10 UTC

1

Hyperparameter transferability between different GPUs

I am trying to run hyperparameter tuning on a model and then use the hyperparameters to train the specific model. However, due to resource limitations, I am planning on running the hyperparameter tuning and the training on different hardwares, more specifically I will run the tuning on a Quadro RTX 6000 and the training on an A100.

Is the optimality of the hyperparameters dependent on the hardware I am using for training? For example, assume I find an optimal learning rate from tuning on the Quadro: is it safe to assume that it would also be optimal if I choose an A100 for training (or any other GPU, for that matter)? My ML professor told me that there should not be a problem, since the tuning process would be similar between the two GPUs, but I wanted to get an opinion here as well.
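
Your professor is right in the common case: hyperparameter optimality depends on the data, model, and training configuration, not on the GPU model, as long as batch size and numerical precision stay the same. The usual gotcha is using the A100's extra memory for a larger batch, at which point the tuned learning rate no longer matches. A sketch of the linear scaling heuristic (Goyal et al.) sometimes used to adjust for that (the numbers below are hypothetical):

```python
def scale_lr(base_lr, base_batch, new_batch):
    """Linear scaling rule: if the batch size grows to exploit a bigger
    GPU, scale the learning rate proportionally as a starting point."""
    return base_lr * new_batch / base_batch

# Tuned on the Quadro with batch 64; training on the A100 with batch 256.
print(scale_lr(3e-4, 64, 256))  # → 0.0012
```

Also watch precision: if the A100 run enables TF32 or mixed precision that the Quadro run didn't use, results can drift slightly even with identical hyperparameters.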

1 Comment
2025/01/30
09:36 UTC

2

NER on texts longer than max_length?

Hello,

I want to do NER on texts using this model: https://huggingface.co/urchade/gliner_large_bio-v0.1 . The texts I am working with are of variable length. I do not truncate or split them. The model seems to have run fine on them, except it displayed warnings like:

UserWarning: The sentencepiece tokenizer that you are converting to a fast tokenizer uses the byte fallback option which is not implemented in the fast tokenizers. In practice this means that the fast version of the tokenizer can produce unknown tokens whereas the sentencepiece version would have converted these unknown tokens into a sequence of byte tokens matching the original piece of text.
 warnings.warn(
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.

I manually set a max_length longer than what was in the config file:

model_name = "urchade/gliner_large_bio-v0.1"
model = GLiNER.from_pretrained(pretrained_model_name_or_path=model_name, max_length=2048)

What could be the consequences of this?

Thank you!
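
On the consequences: raising max_length beyond what the model was trained with can degrade quality on the long tail of the text, and encoder memory grows quickly with sequence length. A common alternative is to split long texts into overlapping windows, run NER on each window, and shift the predicted spans back by the window's offset. A plain-Python sketch of just the windowing step (the sizes are arbitrary):

```python
def windows(text, size=400, overlap=50):
    """Split text into overlapping character windows, returning
    (start_offset, chunk) pairs so entity spans can be shifted back
    to document coordinates after per-window NER."""
    step = size - overlap
    return [(i, text[i:i + size]) for i in range(0, max(len(text) - overlap, 1), step)]

chunks = windows("x" * 1000, size=400, overlap=50)
print([start for start, _ in chunks])  # → [0, 350, 700]
```

The overlap exists so that entities straddling a window boundary are fully contained in at least one window; after merging, deduplicate spans found in two windows.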

3 Comments
2025/01/30
09:24 UTC

0

Questions about different AI use cases and what AI to use for them?

I want to use AI for a few things: role play, image generation, decensoring hentai, and the like.

There are a lot of AIs for this already, but I can't just use whatever Google recommends, because I'm data- and privacy-conscious enough not to use whatever app is in the Play Store that just feeds off my data. So I mostly run things locally on my 7700X, 32 GB RAM, and 4070, unless the company is explicitly privacy-minded like MiocAI.

I've tried MiocAI, Stable Diffusion, and the like, and while some of that was over a year ago (especially Stable Diffusion), I could never get the AI to create the pics I wanted (some NSFW of anime canon characters, mostly what could be used as OC refs so I don't have to draw them or get them drawn, but I want fantasy races and not just a random human girl). MiocAI doesn't have group chats, and no matter how much they say otherwise, image generation has never really improved, if at all. I am not good at writing characters either, and the "generate a canon character bot" feature is bad...

You get the picture, so TL;DR: I need a good role-play bot, an image generation tool, and maybe something that can decensor hentai video. Whether it's a general assistant that knows the characters I ask about, isn't censored for RP, and doesn't accidentally interject other characters into the scene, or a dedicated RP bot, whatever it may be. The only requirement is that it has a real privacy focus or can run locally on my 4070, 7700X, and 32 GB RAM, and that it's good and not too censored. So what should I try out and use?

2 Comments
2025/01/30
07:50 UTC

1

Which AWS tools/services do you use regularly?

So I currently have a few ML Projects under my belt, including a multi-agent RAG and an image classifier. I would like to get more experience with AWS, to see which AWS tools professionals use. I am open to learning AWS through the certificates, but as with everything else I've learned about ML, working on a project hands-on is a better experience. Two questions:

  1. Which services should I focus on in regards to ML?
  2. If an AWS certificate is the way to go, which do you recommend? I've looked into the ML certs, but those seemed to have a lot of focus on the basics of ML, whereas I want to focus much more on how cloud services are used with ML.

Thanks!

0 Comments
2025/01/30
01:08 UTC

3

Beginner Seeking Resources on Weather (or time-series in general) Prediction

Hello, I recently finished a mathematics degree and completed Andrew Ng’s ML and deep learning courses. I’m starting my first personal project. It has to do with predicting niche weather events as they pertain to local farms. I doubt the result will be of any significant accuracy or value, I really just want to practice with TensorFlow. But I’m stuck on choosing hyperparameters (layer size, type, number, etc.) since I haven’t read enough about existing models.

Does anyone have recommended papers or models on time-series prediction? They don’t have to be weather-related—economics and biology work too. I’m new to this and want to focus on really basic neural networks before moving to RNNs or CNNs. Thank you!
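
Before picking layer types, it may help to see that even a basic feed-forward net can do time-series prediction once the series is framed as supervised (window, next-value) pairs; that framing, not the architecture, is usually the first hurdle. A minimal sketch of the windowing (the values are made up):

```python
def make_windows(series, lookback=3):
    """Turn a 1-D series into supervised (X, y) pairs: each X holds
    `lookback` consecutive values, and y is the value that follows."""
    n = len(series) - lookback
    X = [series[i:i + lookback] for i in range(n)]
    y = [series[i + lookback] for i in range(n)]
    return X, y

X, y = make_windows([10, 12, 11, 13, 15, 14], lookback=3)
print(X[0], y[0])  # → [10, 12, 11] 13
```

For survey-style reading, the M4 and M5 forecasting competition write-ups are a good overview of what actually works on time series, weather included.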

4 Comments
2025/01/29
22:32 UTC

2

supervised learning

Hello, I just started Andrew Ng's Machine Learning Specialization and am almost done with course 1 (regression and classification). Should I finish all 3 courses (regression and classification, advanced learning algorithms, etc.) before learning about image classification using TensorFlow and other topics?

1 Comment
2025/01/29
19:44 UTC

3

DeepSeek very slow when using Ollama

Ever wonder about the computation power required for Gen AI? Download one of the models (I suggest the smallest version unless you have massive computing power) and see how long it takes to generate some simple results!

I wanted to test how DeepSeek would work locally, so I downloaded deepseek-r1:1.5b and deepseek-r1:14b to try them out. To make it a bit more interesting, I also tried out the web GUI, so I'm not stuck in the cmd interface. One thing to note is that the cmd results are much quicker than the web GUI results for both models. But my laptop would take forever to generate a simple request like "can you give me a quick workout"...

Does anyone know why there is such a difference in results when using web GUI vs cmd?

Also, I noticed that currently there is no way to get the DeepSeek API, probably because it's overloaded. But I used the Docker option to get to the web GUI. I am using the default controls on the web GUI...

3 Comments
2025/01/29
18:42 UTC

1

Method for training line-level classification model

I'm writing a model for line-level classification of text. The labels are binary. Right now, the approach I'm using is:
- Use a pretrained encoder on the text to extract a representation of the words.
- Extract the embeddings corresponding to "\n" (newline) tokens, as these should be a good representation of the whole line.
- Feed these representations to a new encoder layer to better establish the relationships between the lines
- Feed the output to a linear layer to obtain a score for each line

I then use BCEWithLogitsLoss to calculate the loss. But I'm not confident in this approach for two reasons:
- First, I'm not sure my use of the newline representations carries enough meaningful information to represent the lines
- Second, each instance of my dataset can have a very large number of lines (128, for instance); however, the number of positive labels in each instance is very small (say, 0 to 20 positive lines). I was already using pos_weight on the loss, but I'm still not sure this is the correct approach.

Would love some feedback on this. How would you approach a line classification problem like this?
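
On the second concern: the conventional setting for pos_weight in BCEWithLogitsLoss is the ratio of negative to positive labels, computed over the training set rather than guessed per instance. A quick sketch with made-up counts at the scale described in the post:

```python
def pos_weight(labels):
    """pos_weight for BCEWithLogitsLoss: the negative-to-positive ratio,
    so each positive line contributes as much loss as ~ratio negatives."""
    pos = sum(labels)
    neg = len(labels) - pos
    return neg / pos

# Hypothetical instance: 128 lines, 8 of them labeled positive.
labels = [1] * 8 + [0] * 120
print(pos_weight(labels))  # → 15.0
```

Beyond the loss, evaluating with per-line precision/recall or PR-AUC will tell you much more than accuracy, given this skew.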

1 Comment
2025/01/29
15:03 UTC
