/r/MLQuestions


A place for beginners to ask stupid questions and for experts to help them! /r/MachineLearning is a great subreddit, but it is for interesting articles and news related to machine learning. Here, you can feel free to ask any question regarding machine learning.

What kinds of questions do we want here?

"I've just started with deep nets. What are their strengths and weaknesses?" "What is the current state of the art in speech recognition?" "My data looks like X,Y what type of model should I use?"

If you are well versed in machine learning, please answer any question you feel knowledgeable about, even if they already have answers, and thank you!


Related Subreddits:

/r/MachineLearning
/r/mlpapers
/r/learnmachinelearning


57,872 Subscribers

1

Basics of ML - Multiple Linear regression maths + derivation

I've covered maths and derivations behind Multiple Linear regression in detail. https://www.youtube.com/watch?v=_ctvTfqtX9c

0 Comments
2024/11/12
10:36 UTC

1

Question about Transformers

Hello, I am writing a transformer for my Sequence Processing library, but I got stuck at some point.

S = sequence length, L = word embedding length, V = vocabulary size

As you know, the transformer takes two inputs during training: one for the encoder and one for the decoder. However, the number of words in the encoder input (S_enc) may differ from that of the decoder input (S_dec). The problem is that, since the output of the encoder (an S×L matrix) is used in the decoder, it seems the input lengths of the encoder and the decoder must be the same (S_enc = S_dec?). Should we specify a fixed length that no instance will exceed?
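For reference, in the original encoder-decoder Transformer the encoder output enters the decoder only through cross-attention: queries come from the decoder, keys and values come from the encoder, so the score matrix is S_dec × S_enc and the two lengths do not have to match. Padding to a common length is typically only needed within a batch (together with a mask) so that tensors can be stacked. Below is a minimal single-head NumPy sketch, assuming random weights, no masking, and illustrative names (`cross_attention`, `Wq`, `Wk`, `Wv`); it is not a full Transformer layer.

```python
import numpy as np

def softmax(x, axis=-1):
    # subtract the row max for numerical stability
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(dec_states, enc_states, Wq, Wk, Wv):
    """Single-head cross-attention.

    dec_states: (S_dec, L) decoder hidden states -> queries
    enc_states: (S_enc, L) encoder output        -> keys and values
    """
    Q = dec_states @ Wq                        # (S_dec, d)
    K = enc_states @ Wk                        # (S_enc, d)
    V = enc_states @ Wv                        # (S_enc, d)
    scores = Q @ K.T / np.sqrt(K.shape[-1])    # (S_dec, S_enc): lengths need not match
    weights = softmax(scores, axis=-1)         # each decoder position attends over all encoder positions
    return weights @ V, weights                # (S_dec, d), (S_dec, S_enc)

rng = np.random.default_rng(0)
L, d = 8, 4
S_enc, S_dec = 7, 3                            # deliberately different lengths
enc = rng.normal(size=(S_enc, L))
dec = rng.normal(size=(S_dec, L))
Wq, Wk, Wv = (rng.normal(size=(L, d)) for _ in range(3))
out, w = cross_attention(dec, enc, Wq, Wk, Wv)
print(out.shape, w.shape)  # (3, 4) (3, 7)
```

The output always has one row per decoder position, whatever the encoder length is.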

Second question: the output of the decoder is an S×V matrix, but each output is for one word, so we only need a 1×V vector. What is the rest of the matrix used for?
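In the usual teacher-forced training setup, all S rows are used. Because the decoder's self-attention is causally masked, row i only sees positions up to i and scores the token that should follow position i, so one forward pass yields S next-token predictions and the loss averages over all of them. At inference time, typically only the last row is used to pick the next token. A NumPy sketch of that loss, assuming hypothetical token ids and random logits standing in for the S×V decoder output:

```python
import numpy as np

def next_token_loss(logits, targets):
    """Average cross-entropy over every position of one sequence.

    logits:  (S, V) decoder output; row i scores the token that should follow position i
    targets: (S,)   the decoder input shifted left by one (the "next" tokens)
    """
    z = logits - logits.max(axis=1, keepdims=True)              # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

# Hypothetical token ids: <s>=1, "je"=5, "suis"=9, </s>=2
decoder_tokens = np.array([1, 5, 9, 2])
dec_input = decoder_tokens[:-1]   # <s> je suis  -> fed to the decoder (S = 3)
targets = decoder_tokens[1:]      # je suis </s> -> what each of the 3 rows should predict

V = 12
rng = np.random.default_rng(0)
logits = rng.normal(size=(len(dec_input), V))   # stand-in for the S x V decoder output
loss = next_token_loss(logits, targets)
```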

0 Comments
2024/11/12
10:21 UTC

2

Looking for papers about the architecture/communication patterns of LLM-based Agents

Hey guys, as the title says, I'm looking for papers about the architecture of LLM-based agent systems. Any recommendations are highly appreciated!

0 Comments
2024/11/12
09:31 UTC

1

How to automatically identify product models in an e-commerce database?

I have an e-commerce product database, and my goal is to automatically identify products that belong to the same model (e.g., a black iPhone and a white iPhone would be variations of the same model).

Aside from embedding product names and searching by embedding proximity, are there other effective approaches for finding products that belong to the same model?

Thanks for any insights!
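Besides embeddings, a simple complementary approach is attribute stripping: remove variant attributes (colour, storage, size, ...) from the title and group products whose remaining normalised title is identical. This can also act as a blocking step before comparing embeddings. A sketch, assuming titles are free text; the `VARIANT_TOKENS` set is a hypothetical stand-in for the attribute values in your own catalogue (structured attribute columns, if you have them, are more reliable than parsing titles):

```python
import re
from collections import defaultdict

# Hypothetical variant vocabulary; in practice build it from your catalogue's attribute values
VARIANT_TOKENS = {
    "black", "white", "red", "blue", "silver", "gold",
    "64gb", "128gb", "256gb", "512gb", "small", "medium", "large",
}

def model_key(title):
    """Lowercase, tokenise, and drop variant tokens so variants collapse to one key."""
    tokens = re.findall(r"[a-z0-9]+", title.lower())
    return " ".join(t for t in tokens if t not in VARIANT_TOKENS)

def group_by_model(titles):
    """Map each normalised model key to the product titles that share it."""
    groups = defaultdict(list)
    for t in titles:
        groups[model_key(t)].append(t)
    return dict(groups)

titles = [
    "Apple iPhone 15 Black 128GB",
    "Apple iPhone 15 White 128GB",
    "Apple iPhone 15 Pro Black 256GB",
]
groups = group_by_model(titles)
```

Here the two iPhone 15 colours share the key "apple iphone 15", while the Pro stays separate.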

0 Comments
2024/11/12
05:24 UTC

3

I have deployed the model using FastAPI for the backend and Streamlit for the frontend on my secured office system

My company does not use cloud computing. Can someone help me with how to monitor the model and retrain it? For monitoring, Evidently AI looks good, and for retraining, Airflow looks good.

Could someone please provide a good tutorial or some simple basic code?
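Evidently AI can produce data drift reports for you; to show what such a check does underneath, here is a plain NumPy population stability index (PSI) for one numeric feature. This is a hand-rolled, swapped-in metric, not Evidently's API. A scheduled job (for example an Airflow DAG) could run it on recent inputs and trigger retraining when it crosses a threshold you choose.

```python
import numpy as np

def population_stability_index(reference, current, bins=10):
    """Population stability index (PSI) of one numeric feature.

    reference: values seen at training time; current: recent production values.
    Bin edges are reference quantiles; current values outside the training
    range are clipped into the outermost bins.
    """
    edges = np.quantile(reference, np.linspace(0.0, 1.0, bins + 1))
    current = np.clip(current, edges[0], edges[-1])
    ref_frac = np.histogram(reference, bins=edges)[0] / len(reference)
    cur_frac = np.histogram(current, bins=edges)[0] / len(current)
    eps = 1e-6  # avoids log(0) for empty bins
    ref_frac = ref_frac + eps
    cur_frac = cur_frac + eps
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))

rng = np.random.default_rng(0)
train_feature = rng.normal(0.0, 1.0, 5000)
psi_same = population_stability_index(train_feature, rng.normal(0.0, 1.0, 5000))
psi_shifted = population_stability_index(train_feature, rng.normal(1.0, 1.0, 5000))
```

A PSI below about 0.1 is often read as stable and above about 0.25 as a significant shift; treat these as a rule of thumb to tune, not a standard.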

0 Comments
2024/11/12
05:14 UTC

2

[D] How to report without a test set

The dataset I am using has no splits, and previous work does k-fold cross-validation without a test set. I think I have to do the same if I want to benchmark against their results. But my validation accuracy keeps fluctuating from fold to fold. What should I report as my result?
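When prior work reports k-fold results, a common practice is to report the mean and standard deviation of the per-fold scores (ideally with the same folds or seed as the prior work) rather than any single fold; fold-to-fold fluctuation is expected and is exactly what the standard deviation conveys. A stdlib sketch; the scores below are illustrative placeholders, not real results:

```python
import statistics

def summarize_folds(fold_scores):
    """Mean and sample standard deviation of per-fold validation scores."""
    mean = statistics.mean(fold_scores)
    std = statistics.stdev(fold_scores)  # sample std (n - 1); needs at least 2 folds
    return mean, std

# Illustrative placeholder numbers, not real results
scores = [0.81, 0.78, 0.84, 0.80, 0.77]
mean, std = summarize_folds(scores)
print(f"accuracy: {mean:.3f} ± {std:.3f} over {len(scores)} folds")
```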

6 Comments
2024/11/11
21:16 UTC

2

Is there any way to fine-tune using completely unlabeled data?

Say I want to fine-tune Llama to answer questions about medical data and just want to use a medical textbook. The textbook states facts but has no question-answer material that looks like the text I want my chatbot to replicate.

Is this possible? Or do I need to have it in a question-answer format?
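Yes, in the sense of continued (domain-adaptive) pretraining: train on the raw textbook with the ordinary next-token objective, where the labels are simply the input tokens. This adapts the model to the domain's text but does not by itself teach a question-answer style; a common pattern is to follow it with a small amount of Q&A-style data. A sketch of the data preparation, assuming the whole textbook has already been tokenized into one list of ids (tokenizer not shown; names are illustrative):

```python
def make_lm_blocks(token_ids, block_size):
    """Split one long stream of token ids into fixed-length training blocks.

    For causal-LM training the labels are the inputs themselves; the model (or the
    loss code) shifts them by one position so each token predicts the next.
    A trailing remainder shorter than block_size is dropped.
    """
    n_full = len(token_ids) // block_size * block_size
    blocks = [token_ids[i:i + block_size] for i in range(0, n_full, block_size)]
    return [{"input_ids": b, "labels": list(b)} for b in blocks]

# token_ids would come from tokenizing the whole textbook (tokenizer not shown)
examples = make_lm_blocks(list(range(10)), block_size=4)
```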

5 Comments
2024/11/11
20:08 UTC

2

Complex molecule representation

Hey Reddit, I’m trying to put together a GNN for a small molecule dataset. I’ve done this fine with planar molecules before, but now I have complex 3D bridged molecules, which make the 2D representation very messy. Any tips on how to better represent these? I’ve tried RDKit, but the alternatives to a GNN result in thousands of features, which is a no-go for my ~350-molecule dataset. Cheers!
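One option for 3D bridged systems is to supplement the 2D graph with 3D coordinates (for example from a conformer generated with RDKit): connect atoms within a distance cutoff and encode each interatomic distance with Gaussian radial basis functions, as distance-aware GNNs such as SchNet do. The per-edge feature size then stays small (`n_rbf` values) regardless of ring complexity. A NumPy sketch, assuming coordinates in Å are already available; `cutoff` and `n_rbf` are illustrative choices:

```python
import numpy as np

def distance_graph(coords, cutoff=5.0, n_rbf=16):
    """Edges between atoms closer than `cutoff` (same units as coords, e.g. Å),
    with each interatomic distance expanded in Gaussian radial basis functions."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)                  # (N, N) pairwise distances
    src, dst = np.nonzero((dist < cutoff) & (dist > 0))   # directed edges, no self-pairs
    centers = np.linspace(0.0, cutoff, n_rbf)
    gamma = 1.0 / (centers[1] - centers[0]) ** 2          # width tied to center spacing
    d = dist[src, dst][:, None]
    edge_attr = np.exp(-gamma * (d - centers) ** 2)       # (E, n_rbf)
    return np.stack([src, dst]), edge_attr

# Three illustrative atoms
coords = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [0.0, 1.5, 0.0]])
edge_index, edge_attr = distance_graph(coords)
```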

1 Comment
2024/11/11
16:59 UTC

3

How to Predict Future Shapes of Weather Radar Contours?

My friends and I are working on a project where we capture weather radar images from Windy and extract contours based on dBZ values, by mapping the RGB value of each pixel to a dBZ value. We've successfully automated capturing images and extracting contours, but moving from extracting contours via RGB to predicting the future shape of a contour is quite a leap. Currently, we are trying to find out:

  1. What kind of problem is this in the field of machine learning?
  2. Which topics, techniques should we look into to help predict the future shape of the contours?
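On question 1: this is usually treated as spatiotemporal sequence forecasting, known in meteorology as precipitation nowcasting; learned approaches include ConvLSTM-style models, and a standard non-learned baseline is extrapolation (moving the latest radar field along its estimated motion). A minimal sketch of that baseline, assuming you work on the gridded dBZ values rather than the contours and that the whole field moves by one global translation (real nowcasting systems estimate a dense motion field, e.g. with optical flow):

```python
import numpy as np

def estimate_shift(prev_frame, curr_frame):
    """Estimate the (rows, cols) translation from prev_frame to curr_frame via phase correlation."""
    F_prev = np.fft.fft2(prev_frame)
    F_curr = np.fft.fft2(curr_frame)
    cross = F_curr * np.conj(F_prev)
    cross /= np.abs(cross) + 1e-12                # keep only the phase
    corr = np.fft.ifft2(cross).real               # peaks at the shift
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # peaks past the midpoint correspond to negative shifts (the FFT wraps around)
    if dy > corr.shape[0] // 2:
        dy -= corr.shape[0]
    if dx > corr.shape[1] // 2:
        dx -= corr.shape[1]
    return int(dy), int(dx)

def extrapolate(curr_frame, shift, steps=1):
    """Forecast by moving the current field `steps` more times by the same shift."""
    dy, dx = shift
    return np.roll(curr_frame, (dy * steps, dx * steps), axis=(0, 1))

rng = np.random.default_rng(0)
frame0 = rng.random((64, 64))                     # stand-in for a dBZ grid
frame1 = np.roll(frame0, (2, -3), axis=(0, 1))    # field moved 2 rows down, 3 cols left
shift = estimate_shift(frame0, frame1)
forecast = extrapolate(frame1, shift)
```

`np.roll` wraps content around the edges, which is fine for a sketch but not for a real radar domain; contours can then be re-extracted from the forecast grid.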
4 Comments
2024/11/11
16:36 UTC

10

How to learn Calculus the proper way to prepare for Machine Learning Mathematics?

I was not a good student in school and never paid much attention to learning Mathematics. However, I am planning a career as an MLE, and I know that I need to learn Mathematics for a successful career in Machine Learning. I have planned to study Mathematics for Machine Learning the hard way: start from the beginning and move all the way up to Calculus. This is because Calculus has prerequisites, and I am not sure what I would be missing if I tried to pick and choose topics.

My question is, if I speedrun the following syllabus from the beginning till the end, will it be enough for me to start learning Mathematics for Machine Learning? https://www.khanacademy.org/math/in-math-ncert

12 Comments
2024/11/11
16:05 UTC

1

Non-Technical Resources

I recently began supporting an AI/ML/Big Data infrastructure group at my company. I'm looking for resources, preferably books rather than online videos but I'll take either, that are a step below the high-level 'what is ML/AI/etc.' material but also not heavily technical. I don't need to learn the code, but I want more information so I have a better understanding of what my folks are talking about on calls. If it matters for the resources, our setup is cloud-based rather than on-prem. Thanks for any direction you can provide!

0 Comments
2024/11/11
16:03 UTC

3

How to Speed Up CI/CD Pipelines for ML Model Deployment?

Hi all,

We’re working on setting up a CI/CD pipeline for deploying ML models, but the process is slow, especially with model retraining and testing in the loop. Does anyone have advice on optimizing the pipeline to cut down deployment time, particularly for rapid model iterations? Any tips or tools that have worked well for you?
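One common way to shorten the loop is to skip retraining when neither the training data nor the training config has changed, by keying the trained artifact on a content hash; tools such as DVC apply this idea to pipeline stages. A minimal sketch; how and where your pipeline stores artifacts under the key is left open (hypothetical):

```python
import hashlib
import json

def training_cache_key(data_bytes, config):
    """Content hash of the training data plus the training config.

    If the key matches one from an earlier pipeline run, the model trained in
    that run can be reused instead of retraining.
    """
    h = hashlib.sha256()
    h.update(hashlib.sha256(data_bytes).digest())                 # fixed-length digest of the data
    h.update(json.dumps(config, sort_keys=True).encode("utf-8"))  # key order does not matter
    return h.hexdigest()

data = b"feature,label\n1.0,0\n2.0,1\n"   # stand-in for the training file's bytes
k1 = training_cache_key(data, {"lr": 0.01, "epochs": 5})
k2 = training_cache_key(data, {"epochs": 5, "lr": 0.01})
k3 = training_cache_key(data, {"lr": 0.02, "epochs": 5})
```

Here `k1 == k2` (same data and config, different key order), while changing the learning rate produces a new key and therefore a retrain.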

4 Comments
2024/11/11
06:13 UTC

1

Any ideas for working on a ranking problem for sales representatives based on their historical performance.

I have a dataset of sales performance for multiple sales representatives (sales made, total amount of sales, talk time, number of customers talked to, etc.), and I am looking to rank them based on their predicted performance each day. My approach is to use a time series model to predict who will make the most sales the next day based on past performance (lags, weekly and monthly rolling averages, etc.) and then rank them by those predicted values. Could there be a better approach to this problem?
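Since the end goal is a ranking, it can help to evaluate any approach with a ranking metric (for example, how many of the predicted top-k reps actually end up in the next day's top k) and to compare it against a naive baseline; learning-to-rank objectives are another option. A plain-Python sketch of such a baseline and metric, assuming daily sales per rep are available as lists (names and numbers are illustrative):

```python
from statistics import mean

def rank_by_recent_average(daily_sales, window=7):
    """Rank reps by their mean sales over the last `window` days, highest first.

    daily_sales: dict mapping rep name -> list of daily sales, oldest first.
    A naive baseline that a forecasting model should beat.
    """
    scores = {rep: mean(s[-window:]) for rep, s in daily_sales.items() if s}
    return sorted(scores, key=scores.get, reverse=True)

def top_k_hit_rate(predicted_ranking, actual_next_day, k=3):
    """Fraction of the predicted top-k reps who were also in the actual top-k next day."""
    actual_top = set(sorted(actual_next_day, key=actual_next_day.get, reverse=True)[:k])
    return len(set(predicted_ranking[:k]) & actual_top) / k

history = {"ana": [5, 6, 7], "ben": [9, 2, 1], "cy": [4, 4, 4]}
ranking = rank_by_recent_average(history, window=2)
hit_rate = top_k_hit_rate(ranking, {"ana": 8, "ben": 3, "cy": 1}, k=2)
```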

1 Comment
2024/11/11
04:04 UTC

5

How does network structure enforce network function?

Hello Ladies and Gentlemen of the Machine Learning,

The more I read about neural networks, the more something troubles me.
I fail to build an intuition for how the structure of a network constrains what the network will actually learn.

For instance, an LSTM has built-in long-term and short-term memory, but why does that mean that, after training, those parts will actually work as long- and short-term memories? Yes, they are able to work like that, but how simple backpropagation actually enforces that the model learns this way feels like magic to me.

Similarly, with transformers: they have these matrices we call “Query,” “Key,” and “Value” that create a self-attention mechanism. However, why does the learning process make them actually function as a self-attention mechanism? Aren’t they just dense parameter matrices that happen to be named this way? What ensures that backpropagation will lead them to act in this intended way?

I think my main confusion is around how backpropagation leads these specialized sub structures to fulfill their expected roles, as opposed to just learning arbitrary functions. Any clarity on this, or pointers to resources that might help, would be greatly appreciated!

Thanks in advance for any insights!
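One way to see it for attention: the names do nothing, but the computation graph does. Whatever values Wq, Wk and Wv take, softmax(QKᵀ/√d) produces non-negative weights that sum to 1 across tokens, so every output row is a weighted average of the value rows. That structure holds before any training; backpropagation only changes which tokens receive the weight. A NumPy check with arbitrary, untrained weights (illustrative names):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention; returns outputs, attention weights, and values."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    A = np.exp(scores)
    A /= A.sum(axis=1, keepdims=True)             # softmax: each row is a probability distribution
    return A @ V, A, V

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 8))                                 # 5 tokens, 8 features
Wq, Wk, Wv = (rng.normal(size=(8, 4)) for _ in range(3))    # arbitrary, untrained weights
out, A, V = self_attention(X, Wq, Wk, Wv)
```

Each output lies inside the range spanned by the value vectors, for any weights.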

5 Comments
2024/11/10
23:32 UTC

3

Would You Be Interested in a Platform for Batch AI Inference?

Hey everyone,

We’ve been thinking about a project to simplify AI usage, while reducing cost. The core idea is that many tasks don't need an immediate response. This allows for more efficient HW usage as we can schedule the jobs to run at the cheapest times and increase overall utilization.

Examples of good asynchronous tasks would be adding metadata to photos indicating who's in the picture, or transcribing hundreds of hours of conference calls.

We're curious to hear from the community...would a batch inference platform be useful for you? What features would you want to see? Or maybe you're already using something like this, and I’d love to know what’s working (or not working) for you.

0 Comments
2024/11/10
21:18 UTC

1

Tired of using libraries like Keras, TensorFlow, and scikit-learn; I want to code the algorithms behind machine learning models mathematically. Can you suggest resources, paid or unpaid?

Hello, is there any resource that shows the mathematical implementation of each algorithm while solving a problem, ideally in Python?
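As an illustration of the kind of from-scratch implementation being asked about, here is multiple linear regression fitted by batch gradient descent on the mean squared error in NumPy, assuming noiseless synthetic data (all names are illustrative):

```python
import numpy as np

def fit_linear_regression(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on mean squared error for y ≈ X @ w + b."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        err = X @ w + b - y               # residuals, shape (n,)
        grad_w = 2.0 / n * (X.T @ err)    # d(MSE)/dw
        grad_b = 2.0 / n * err.sum()      # d(MSE)/db
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X @ np.array([3.0, -2.0]) + 0.5       # noiseless synthetic data
w, b = fit_linear_regression(X, y)
```

On real data, the result can be checked against the closed-form least-squares solution from `np.linalg.lstsq`.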

6 Comments
2024/11/10
16:42 UTC

1

Is there something like Weight AI that runs LOCALLY?

Just for the conversion, not training, btw.

0 Comments
2024/11/10
16:31 UTC

4

Hi! In Fortune 100 companies, do we use scikit-learn for ML algorithms or write ML algorithms from scratch?

7 Comments
2024/11/10
15:26 UTC

1

Help needed with LLM training implementation

Hello everyone! I recently started learning how LLMs work, and I think I have a decent understanding of the math behind them. I also got somewhat comfortable with PyTorch and managed to build a Transformer model with the exact architecture of GPT-2. The forward function of the model outputs a tensor of shape (b, t, voc_size).

So far so good, but while I do know how training works in theory, I can’t figure out how to do it in PyTorch (with a manual training loop). (I managed to do it for a simple image classification model (a normal NN), but just not for my llm)

It would be really nice if somebody could help me write this training loop. For example, I have two variables: example_prompt = "who is the best?" and example_response = "it's you!"

Now I need to know how to tokenize them with the GPT-2 tokenizer (from the transformers library) properly, with the correct padding and so on, and how to train the model correctly.

It would be already enough if somebody could write this part of the code so I at least have something to see how it would be actually done, bc I just can’t find anything for this!

Thanks in advance!
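A minimal sketch of the tokenization and label step, assuming prompt and response ids come from the GPT-2 tokenizer (e.g. `tokenizer.encode(example_prompt)`). GPT-2 has no padding token, so reusing the end-of-text token (id 50256) as padding is common. Prompt tokens and padding get the label -100 so they are excluded from the loss, and padding goes on the right, so under causal attention it never affects the real positions. The function and argument names are illustrative:

```python
IGNORE_INDEX = -100   # default ignore_index of torch.nn.functional.cross_entropy

def build_example(prompt_ids, response_ids, eos_id, pad_id, max_len):
    """Concatenate prompt + response + EOS, right-pad to max_len, and mask
    everything except the response (and EOS) in the labels."""
    ids = (prompt_ids + response_ids + [eos_id])[:max_len]
    labels = ([IGNORE_INDEX] * len(prompt_ids) + response_ids + [eos_id])[:max_len]
    n_pad = max_len - len(ids)
    attention_mask = [1] * len(ids) + [0] * n_pad
    input_ids = ids + [pad_id] * n_pad
    labels = labels + [IGNORE_INDEX] * n_pad   # padding never contributes to the loss
    return input_ids, labels, attention_mask

# Hypothetical ids standing in for tokenizer output; 50256 is GPT-2's <|endoftext|>
input_ids, labels, attention_mask = build_example(
    prompt_ids=[10, 11, 12], response_ids=[20, 21], eos_id=50256, pad_id=50256, max_len=8
)
```

With a custom model returning logits of shape (b, t, voc_size), the loss for a batch of such examples (converted to tensors) would be `F.cross_entropy(logits[:, :-1].reshape(-1, voc_size), labels[:, 1:].reshape(-1))`; its default `ignore_index` is -100, and the one-position shift makes position i predict token i + 1.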

0 Comments
2024/11/10
14:40 UTC

4

Dunno what to do next

Hi, I'm a college student looking to learn ML and then DL. I know Python, NumPy, pandas, Matplotlib, and maths. I also did Andrew Ng's ML Specialization course.

Now I'm stuck at the point where I don't know what I should do next. Should I learn EDA and preprocessing, or start learning ML algorithms? If so, where and how can I learn to do these? I need your guidance, guys. Please help me out. Thanks in advance!

( Edit : give some upvotes and make this post floating in top because it would be helpful for ppl like me)

4 Comments
2024/11/10
12:39 UTC

2

[Help] Seq2Seq model predicting same output token

Kaggle Notebook

I am trying to implement a seq2seq model in PyTorch for translation. The problem is that the model generates the same sequence for every input. My goal is to implement attention for seq2seq and then eventually move on to transformers. Can anyone look at my code (the Kaggle notebook is also attached)?

class Encoder(nn.Module):
  def __init__(self,vocab_size,embedding_dim,hidden_dim,num_layers):
    super(Encoder,self).__init__()
    self.vocab_size = vocab_size
    self.embedding_dim = embedding_dim
    self.hidden_dim = hidden_dim
    self.num_layers = num_layers
    self.embedding = nn.Embedding(self.vocab_size,self.embedding_dim)
    self.lstm = nn.LSTM(self.embedding_dim,self.hidden_dim,self.num_layers,batch_first=True)

  def forward(self,x):
    x = self.embedding(x)
    output,(hidden_state,cell_state) = self.lstm(x)
    return output,hidden_state,cell_state


class Decoder(nn.Module):
  def __init__(self,vocab_size,embedding_dim,hidden_dim,num_layers):
    super(Decoder,self).__init__()
    self.vocab_size = vocab_size
    self.embedding_dim = embedding_dim
    self.hidden_dim = hidden_dim
    self.num_layers = num_layers
    self.embedding = nn.Embedding(self.vocab_size,self.embedding_dim)
    self.lstm = nn.LSTM(self.embedding_dim,self.hidden_dim,self.num_layers,batch_first=True)
    self.fc = nn.Linear(self.hidden_dim,self.vocab_size)

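  def forward(self,x,h,c):
    x = self.embedding(x)
    # Condition on the encoder's final (h, c); without this the decoder ignores the source sentence
    output,(hidden_state,cell_state) = self.lstm(x,(h,c))
    output = self.fc(output)
    return output,hidden_state,cell_state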


class Seq2Seq(nn.Module):
  def __init__(self,encoder,decoder):
    super(Seq2Seq,self).__init__()
    self.encoder = encoder
    self.decoder = decoder
  
  def forward(self,X,Y):
    output,h,c = self.encoder(X)
    decoder_input = Y[:,0:1].long()  # (batch_size,1): the "<START>" token
    output_tensor = torch.zeros(Y.shape[0],Y.shape[1],self.decoder.vocab_size,device=Y.device)
    # position 0 (the "<START>" token) is left as zeros

    for i in range(1,Y.shape[1]):
      output_d,h,c = self.decoder(decoder_input,h,c)
      # output_d shape : (batch_size,1,fr_vocab_size)
      output_tensor[:,i] = output_d[:,0]
      # greedy decoding: feed the predicted token back in, shape (batch_size,1)
      decoder_input = torch.argmax(output_d,dim=-1)

    return output_tensor # output shape : (batch_size,seq_length,fr_vocab_size)


class Seq2Seq2(nn.Module):
  def __init__(self,encoder,decoder):
    super(Seq2Seq2,self).__init__()
    self.encoder = encoder
    self.decoder = decoder
  
  def forward(self,X,Y):
    output,h,c = self.encoder(X)
    decoder_input = Y[:,:-1].long()  # teacher forcing: feed the target without its last token
    output_tensor,h,c = self.decoder(decoder_input,h,c)
    return output_tensor  # (batch_size,seq_length-1,fr_vocab_size)

encoder = Encoder(ENG_VOCAB_SIZE,32,64,1).to(device)
decoder = Decoder(FR_VOCAB_SIZE,32,64,1).to(device)
model = Seq2Seq2(encoder,decoder).to(device)

lr = 0.001
optimizer = torch.optim.Adam(model.parameters(),lr=lr)
loss_fn = nn.CrossEntropyLoss(ignore_index=0)
epochs = 20

for epoch in range(epochs):
    running_loss = 0.0
    progress_bar = tqdm(train_dataloader, desc=f"Epoch {epoch+1}", leave=False)

    for X, Y in progress_bar:
        X, Y = X.to(device), Y.to(device)
        Y_pred = model(X, Y)  # (batch_size, seq_length-1, vocab_size)

        Y_pred = Y_pred.reshape(-1, Y_pred.size(-1))  # Flatten to (batch_size * (seq_length-1), vocab_size)
        # Targets are Y shifted left by one: position i of Y_pred predicts Y[:, i+1]
        Y_true = Y[:,1:].reshape(-1).long()  # Flatten to (batch_size * (seq_length-1))

        loss = loss_fn(Y_pred, Y_true)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        # Update running loss and display it in tqdm
        running_loss += loss.item()
        progress_bar.set_postfix(loss=loss.item())

    print(f"Epoch {epoch+1}, Loss = {running_loss/len(train_dataloader)}")
0 Comments
2024/11/10
02:32 UTC

2

What does "use log probability to automatically increase the temperature until certain thresholds are hit" mean when using OpenAI ASR with temperature=0?

I read on https://platform.openai.com/docs/api-reference/audio/createTranscription#audio-createtranscription-temperature (mirror):

temperature. number. Optional. Defaults to 0. The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model will use log probability to automatically increase the temperature until certain thresholds are hit.

What does "use log probability to automatically increase the temperature until certain thresholds are hit" mean when using OpenAI ASR with temperature=0?
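The API documentation does not spell out the thresholds, but the open-source Whisper package's `transcribe()` implements a "temperature fallback" that matches this description: decode at temperature 0 first, and if the result's average token log-probability is below a threshold (default -1.0) or the text is too repetitive (compression ratio above 2.4), retry at the next temperature in (0.0, 0.2, 0.4, 0.6, 0.8, 1.0). Whether the hosted API uses the same values is an assumption. A sketch of that loop, where `decode` is a hypothetical stand-in for one decoding pass:

```python
import zlib

def compression_ratio(text):
    """Repetitive (looping) transcripts compress very well, giving a high ratio."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))

def transcribe_with_fallback(decode, temperatures=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
                             logprob_threshold=-1.0, compression_ratio_threshold=2.4):
    """`decode(temperature)` is a hypothetical function returning (text, avg_logprob)."""
    for t in temperatures:
        text, avg_logprob = decode(t)
        too_repetitive = compression_ratio(text) > compression_ratio_threshold
        too_unlikely = avg_logprob < logprob_threshold
        if not (too_repetitive or too_unlikely):
            return text, t      # accept the first result that passes both checks
    return text, t              # otherwise keep the last (highest-temperature) attempt

def fake_decode(temperature):
    # Stand-in decoder: loops at temperature 0, gives a normal transcript otherwise
    if temperature == 0.0:
        return " ".join(["the"] * 30), -0.2
    return "hello world", -0.5

result = transcribe_with_fallback(fake_decode)
```

Here the greedy (temperature 0) output is rejected as too repetitive, and the 0.2 attempt is accepted.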

0 Comments
2024/11/09
19:45 UTC

6

How does your ML team manage the transition from research to production?

I'm curious to know how different teams handle the handoff from the research phase to production. Specifically, I’d love to learn about:

  1. Research Workflow: How do researchers in your team structure their work? Do they follow specific guidelines or frameworks?
  2. Data Management: If your team works with large datasets, how do you store and manage them? Are there specific tools or practices you rely on?
  3. Experiment Documentation: How do you document experiments, especially when they involve multiple iterations and parameters? Are there common tools or practices for tracking results and sharing findings?
  4. Transition to Production: How do you hand off models from research to production? Are there dedicated roles or steps involved in ensuring the transition is smooth and maintains model accuracy?
  5. Continuous Training: Once a model is in production, who manages the retraining cycle? How do you handle updating and monitoring models in production?

Any insights into your team’s process and the tools you use would be super helpful. Thanks in advance!

6 Comments
2024/11/09
18:15 UTC

2

The dynamics of SGD

Hello,

I have a background in pure mathematics, and I would like to better understand the dynamics of stochastic gradient descent (SGD), for example speed of convergence, convergence guarantees, and continuous approximations of SGD, but in the stochastic case, that is, not just classical convex optimization where the objective function is fully known.

Do you have any recent references to get up to date? I would prefer recent papers. Thank you very much!
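For orientation, these questions are usually phrased in terms of the objects below, with f(θ; ξ) the loss on a random sample ξ, F(θ) = E_ξ[f(θ; ξ)] the population objective, η_k the step size, and Σ(θ) the covariance of the stochastic gradient (notation chosen for this sketch). The last line is the first-order stochastic modified equation (diffusion approximation) for a constant step size η, valid only in a weak, small-η sense.

```latex
\theta_{k+1} = \theta_k - \eta_k \nabla_\theta f(\theta_k;\xi_k),
\qquad \mathbb{E}_{\xi}\!\left[\nabla_\theta f(\theta;\xi)\right] = \nabla F(\theta)

\sum_{k} \eta_k = \infty, \qquad \sum_{k} \eta_k^2 < \infty
\quad \text{(classical Robbins--Monro step-size conditions)}

d\Theta_t = -\nabla F(\Theta_t)\,dt + \sqrt{\eta}\,\Sigma(\Theta_t)^{1/2}\,dW_t,
\qquad \Sigma(\theta) = \operatorname{Cov}_{\xi}\!\left[\nabla_\theta f(\theta;\xi)\right],
\quad t \approx k\eta
```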

2 Comments
2024/11/09
16:23 UTC

5

BatchNorm and Normal Distribution

Why do so many resources assume that inputs and outputs follow a Normal/Gaussian distribution when discussing BatchNorm? My understanding is that there is no guarantee that the distribution of inputs into BatchNorm (or really anywhere else in a network) will be normal. All we're doing is standardizing those inputs; they could have almost any distribution, and BatchNorm doesn't change the shape of that distribution.
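A quick NumPy check of that last point, assuming BatchNorm in training mode with γ = 1 and β = 0: standardizing is a per-feature affine map with a positive scale, so it moves the mean to 0 and the variance to about 1 but leaves the shape, here measured by skewness, unchanged. The input is deliberately a skewed exponential sample (illustrative names):

```python
import numpy as np

def batchnorm_train(x, gamma=1.0, beta=0.0, eps=1e-5):
    """BatchNorm forward pass in training mode for a (batch, features) array."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def skewness(x):
    """Per-feature sample skewness (third standardized moment)."""
    z = (x - x.mean(axis=0)) / x.std(axis=0)
    return (z ** 3).mean(axis=0)

rng = np.random.default_rng(0)
x = rng.exponential(scale=3.0, size=(10_000, 1))   # clearly non-Gaussian, skewed input
y = batchnorm_train(x)
```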

1 Comment
2024/11/09
15:39 UTC

1

Need help with classification problem

Hello everyone.

I have a question. I am just starting my journey in machine learning, and I have encountered a problem.

I need to make a neural network that would determine from an image whether the camera was blocked during shooting (by a hand, a piece of paper, or an ass - it doesn't matter). In other words, I need to make a classifier. I took MobileNet, downloaded different videos from cameras, made a couple of videos with blockages, added augmentations, and retrained MobileNet on my data. It seems to work, but periodically the network misclassifies images.

Question: how can such a classifier be improved? Or is my approach completely wrong?
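One possible complement to the classifier, sketched under stated assumptions: a covered lens tends to produce frames with low contrast and little fine detail, so cheap image statistics can serve as a sanity check or as extra features alongside MobileNet's output. The thresholds below are hypothetical placeholders to tune on labelled frames, and the function names are illustrative:

```python
import numpy as np

def blockage_features(gray):
    """Simple image statistics that tend to drop when the lens is covered.

    gray: 2-D array of grayscale intensities in [0, 255].
    Returns (contrast, edge_energy): global std of intensities and the mean
    absolute difference between neighbouring pixels.
    """
    g = gray.astype(np.float64)
    contrast = g.std()
    edge_energy = (np.abs(np.diff(g, axis=0)).mean() + np.abs(np.diff(g, axis=1)).mean()) / 2
    return contrast, edge_energy

def looks_blocked(gray, contrast_thr=10.0, edge_thr=2.0):
    # Hypothetical thresholds; tune them on frames you have labelled
    contrast, edge_energy = blockage_features(gray)
    return bool(contrast < contrast_thr and edge_energy < edge_thr)

covered = np.full((120, 160), 40, dtype=np.uint8)   # flat, dark frame
textured = np.random.default_rng(0).integers(0, 256, (120, 160)).astype(np.uint8)
```

Because a blockage usually persists for many frames, smoothing predictions over consecutive frames can also reduce sporadic misclassifications.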

1 Comment
2024/11/09
14:46 UTC

1

I created a podcast trying to explain R-CNN. Can you suggest how I should improve it?

1 Comment
2024/11/09
08:22 UTC

1

How to train using my own dataset for real time object detection?

In recent weeks, I've become interested in creating my own video object detection model. I don’t want to build it entirely from scratch but would like to train it using my own dataset. However, I’m unsure where to start. Could someone guide me on where to begin, what tools I can use to prepare my dataset, and what trainable models are available? Any advice would be greatly appreciated.
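A common route is to annotate frames with bounding boxes, export them in the format your training framework expects, and fine-tune a pretrained detector rather than training from scratch. As one example of the data-preparation step, YOLO-family trainers (e.g. Ultralytics) expect one text file per image with one line per object, `class x_center y_center width height`, all normalised by image size. A sketch of converting a pixel-coordinate box to that line, assuming that format (other frameworks use e.g. COCO JSON instead):

```python
def to_yolo_line(class_id, box, img_w, img_h):
    """Convert a pixel box (x_min, y_min, x_max, y_max) into a YOLO-format label line:
    'class x_center y_center width height', all normalised to [0, 1]."""
    x_min, y_min, x_max, y_max = box
    xc = (x_min + x_max) / 2 / img_w
    yc = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"

line = to_yolo_line(0, (100, 50, 300, 250), img_w=640, img_h=480)
```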

4 Comments
2024/11/09
04:11 UTC
