/r/MLQuestions
A place for beginners to ask stupid questions and for experts to help them! /r/MachineLearning is a great subreddit, but it is for interesting articles and news related to machine learning. Here, feel free to ask any question about machine learning.
What kinds of questions do we want here?
"I've just started with deep nets. What are their strengths and weaknesses?" "What is the current state of the art in speech recognition?" "My data looks like X,Y what type of model should I use?"
If you are well versed in machine learning, please answer any question you feel knowledgeable about, even if it already has answers, and thank you!
I've covered the math and derivations behind multiple linear regression in detail: https://www.youtube.com/watch?v=_ctvTfqtX9c
Hello, I am writing a transformer for my sequence-processing library, but I got stuck at one point.
S = sequence length, L = word embedding length, V = vocabulary size
As you know, the transformer takes two inputs during training, one for the encoder and one for the decoder. However, the input word count of the encoder (S_encoder) may differ from that of the decoder (S_decoder). The problem is that since the output of the encoder (an S×L matrix) is used in the decoder, it seems the input lengths of the encoder and the decoder must be the same (S_encoder ?= S_decoder). Should we specify a fixed length that no instance will exceed?
My second question: the decoder output is an S×V matrix, but each prediction is for one word, so we only need a 1×V vector. What is the remaining part used for?
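For the shape question, note that in the standard encoder-decoder transformer the encoder output enters the decoder through cross-attention, where the queries come from the decoder side and the keys/values from the encoder side, so the two lengths do not have to match. A minimal sketch (all dimensions below are made up for illustration):

import torch
import torch.nn.functional as F

L = 64                      # embedding length
S_enc, S_dec = 10, 7        # encoder and decoder lengths, deliberately different

enc_out = torch.randn(S_enc, L)   # encoder output: (S_encoder x L)
dec_in = torch.randn(S_dec, L)    # decoder-side representation: (S_decoder x L)

# Cross-attention: queries from the decoder, keys/values from the encoder.
Q, K, V = dec_in, enc_out, enc_out
scores = Q @ K.T / L ** 0.5       # (S_decoder x S_encoder)
attn = F.softmax(scores, dim=-1)
context = attn @ V                # (S_decoder x L): one row per decoder position

print(context.shape)              # torch.Size([7, 64])

On the second question: during training every row of the S×V output is compared against the token at the next position, so all rows contribute to the loss; at generation time only the last row is used to pick the next word.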
Hey guys, as the title says, I'm looking for papers about the architecture of LLM-based agent systems. Any recommendations are highly appreciated!
I have an e-commerce product database, and my goal is to automatically identify products that belong to the same model (e.g., a black iPhone and a white iPhone would be variations of the same model).
Aside from embedding product names and searching by embedding proximity, are there other effective approaches for finding products that belong to the same model?
Thanks for any insights!
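For reference, the embedding-proximity baseline mentioned above can be sketched roughly like this; the model name, product names, and threshold are placeholder assumptions, not recommendations:

from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

names = ["iPhone 13 128GB Black", "iPhone 13 128GB White", "Galaxy S22 256GB"]
model = SentenceTransformer("all-MiniLM-L6-v2")
emb = model.encode(names)

sim = cosine_similarity(emb)
threshold = 0.85  # hand-tuned cutoff; pairs above it are candidate variations of one model
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        if sim[i, j] >= threshold:
            print(names[i], "<->", names[j], round(float(sim[i, j]), 3))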
We don't use cloud computing at my company. Can someone help me figure out how to monitor the model and retrain it? For monitoring, I see that Evidently AI is good, and for retraining, Airflow.
Please, could someone point me to a good tutorial or some simple basic code?
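As a rough illustration of what "monitoring" can mean with no cloud service at all, here is a minimal drift check on a single feature using the population stability index; Evidently wraps this kind of check in ready-made reports, and the data below is made up:

import numpy as np

def psi(reference, current, bins=10):
    """Population stability index between a reference sample and a current sample of one feature."""
    edges = np.quantile(reference, np.linspace(0, 1, bins + 1))
    # Bucket values by the reference quantiles; out-of-range values fall into the end bins.
    ref_idx = np.clip(np.searchsorted(edges, reference, side="right") - 1, 0, bins - 1)
    cur_idx = np.clip(np.searchsorted(edges, current, side="right") - 1, 0, bins - 1)
    ref_frac = np.bincount(ref_idx, minlength=bins) / len(reference) + 1e-6
    cur_frac = np.bincount(cur_idx, minlength=bins) / len(current) + 1e-6
    return np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac))

# Placeholder data: training-time feature values vs. values seen in production this week.
reference = np.random.normal(0, 1, 10_000)
current = np.random.normal(0.3, 1.1, 2_000)

print(f"PSI = {psi(reference, current):.3f}")  # common rule of thumb: > 0.2 suggests meaningful drift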
The dataset I am using has no splits, and previous work does k-fold cross-validation without a test set. I think I have to follow the same protocol if I want to benchmark against theirs. But my validation accuracy on each fold keeps fluctuating. What should I report as my result?
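One common convention in that setting is to report the mean and standard deviation of the per-fold validation scores rather than any single fold's number. A sketch with placeholder data and a placeholder model, just to show the reporting:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)  # stand-in dataset
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5, scoring="accuracy")

# Report mean and spread across folds.
print(f"5-fold accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")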
Say I want to fine-tune Llama to answer questions about medical data using just a medical textbook; the textbook states facts but has no question-answer material that looks like the text I want my chatbot to produce.
Is this possible? Or do I need to have it in a question answer format?
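If it helps to see what "just use the textbook" would look like mechanically: continued pretraining is plain next-token prediction on the raw text. A rough sketch, where the checkpoint name, file path, and block size are placeholder assumptions, and which says nothing about whether you also need QA-style data:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")  # placeholder checkpoint
text = open("medical_textbook.txt", encoding="utf-8").read()           # placeholder path

block_size = 512
ids = tokenizer(text)["input_ids"]

# For causal-LM training, labels are simply the input ids; the trainer/model
# shifts them so each position predicts the next token.
examples = [
    {"input_ids": ids[i : i + block_size], "labels": ids[i : i + block_size]}
    for i in range(0, len(ids) - block_size, block_size)
]
print(f"{len(examples)} training chunks of {block_size} tokens each")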
Hey Reddit, as the title says, I'm trying to put together a GNN for a small-molecule dataset. I've done this fine with planar molecules before, but now I have complex 3D bridged molecules, which make the 2D representation very messy. Any tips on how to better represent these? I've tried RDKit, but the alternatives to a GNN result in thousands of features, which is a no-go for my ~350-molecule dataset. Cheers!
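One way to sidestep the messy 2D depiction, if it fits your setup, is to generate a 3D conformer and use coordinates or interatomic distances as node/edge features. A minimal RDKit sketch; the SMILES string is just a placeholder bridged bicyclic:

from rdkit import Chem
from rdkit.Chem import AllChem
import numpy as np

mol = Chem.AddHs(Chem.MolFromSmiles("C1CC2CCC1C2"))  # norbornane as a toy bridged example
AllChem.EmbedMolecule(mol, randomSeed=42)            # generate a 3D conformer
AllChem.MMFFOptimizeMolecule(mol)                    # quick force-field relaxation

coords = mol.GetConformer().GetPositions()           # (num_atoms, 3) array of x, y, z
# Pairwise distances can serve as edge features in a 3D-aware GNN.
dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
print(coords.shape, dists.shape)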
My friends and I are working on a project where we capture weather radar images from Windy and extract contours based on dBZ values, by mapping the RGB value of each pixel to a dBZ value. We've successfully automated capturing images and extracting contours, but moving from extracting contours from RGB to predicting the shapes of a contour is quite a leap. Currently, we are trying to find out
I was not a good student in school and never paid much attention to learning mathematics. However, I am planning a career as an MLE, and I know I need to learn mathematics for a successful career in machine learning. I have planned to study mathematics for machine learning the hard way: start from the beginning and work all the way up to calculus. This is because learning calculus has prerequisites, and I am not sure what I would be missing if I tried to pick and choose topics.
My question is: if I speedrun the following syllabus from beginning to end, will it be enough for me to start learning mathematics for machine learning? https://www.khanacademy.org/math/in-math-ncert
I recently began supporting an AI/ML/Big Data infra group at my company. I'm looking for resources, preferably a book over online videos but I'll take either, that are a step below the high-level "what is ML/AI" material but also not heavily technical. I don't need to learn the code, but I want enough background to better understand what my folks are talking about on calls. If it matters for the resources, it's cloud-based rather than on-prem. Thanks for any direction you can provide!
Hi all,
We’re working on setting up a CI/CD pipeline for deploying ML models, but the process is slow, especially with model retraining and testing in the loop. Does anyone have advice on optimizing the pipeline to cut down deployment time, particularly for rapid model iterations? Any tips or tools that have worked well for you?
I have a dataset of sales performance for multiple sales representatives (sales made, total amount of sales, talk time, number of customers talked to, etc.), and I am looking to rank them based on their predicted performance each day. My approach is to use a time series model to predict who will make the most sales the next day based on past performance (lags, rolling averages over a week, month, etc.) and then rank them by those predicted values. Could there be a better approach to this problem?
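A rough sketch of that approach; the column names, file path, and choice of regressor are placeholder assumptions:

import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

# Assumed columns: date, rep_id, sales (plus talk_time etc. in practice).
df = pd.read_csv("sales.csv", parse_dates=["date"]).sort_values(["rep_id", "date"])

g = df.groupby("rep_id")["sales"]
df["sales_lag1"] = g.shift(1)                                        # yesterday's sales
df["sales_roll7"] = g.transform(lambda s: s.shift(1).rolling(7).mean())  # 7-day rolling mean
df["target"] = g.shift(-1)                                           # next day's sales
df = df.dropna()

features = ["sales_lag1", "sales_roll7"]
model = GradientBoostingRegressor().fit(df[features], df["target"])

# Rank reps for the latest available day by predicted next-day sales.
latest = df[df["date"] == df["date"].max()].copy()
latest["pred_next_day"] = model.predict(latest[features])
print(latest.sort_values("pred_next_day", ascending=False)[["rep_id", "pred_next_day"]])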
Hello, Ladies and Gentlemen of Machine Learning,
The more I read about neural networks, the more one thing troubles me.
I fail to build intuition about how the structure of a network constrains what the network will actually learn.
For instance, how does the fact that an LSTM has built-in long-term and short-term memories mean that, after learning, they will actually work as long- and short-term memories? Yes, they are able to work that way, but how plain backpropagation actually enforces that the model learns this way feels like magic to me.
Similarly, with transformers: they have these matrices we call “Query,” “Key,” and “Value” that create a self-attention mechanism. However, why does the learning process make them actually function as a self-attention mechanism? Aren’t they just dense parameter matrices that happen to be named this way? What ensures that backpropagation will lead them to act in this intended way?
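For what it's worth, the names really are just labels on three learned projections; the "attention" behavior comes from how they are wired together, not from the names. The whole mechanism is only a few lines (dimensions here are arbitrary):

import torch
import torch.nn.functional as F

d_model, seq_len = 64, 10
x = torch.randn(seq_len, d_model)   # token representations

# Three ordinary learned linear maps; "Query", "Key", "Value" are just their names.
W_q, W_k, W_v = (torch.nn.Linear(d_model, d_model, bias=False) for _ in range(3))
Q, K, V = W_q(x), W_k(x), W_v(x)

scores = Q @ K.T / d_model ** 0.5   # (seq_len, seq_len) pairwise compatibilities
weights = F.softmax(scores, dim=-1) # each row is a distribution over positions
out = weights @ V                   # each output is a weighted mix of all value vectors

Backpropagation only adjusts the weights inside this fixed wiring: the softmax-weighted mixing is guaranteed by the architecture itself, while what gets attended to is whatever weight setting happens to reduce the loss.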
I think my main confusion is around how backpropagation leads these specialized substructures to fulfill their expected roles, as opposed to just learning arbitrary functions. Any clarity on this, or pointers to resources that might help, would be greatly appreciated!
Thanks in advance for any insights!
Hey everyone,
We’ve been thinking about a project to simplify AI usage, while reducing cost. The core idea is that many tasks don't need an immediate response. This allows for more efficient HW usage as we can schedule the jobs to run at the cheapest times and increase overall utilization.
Examples of good asynchronous tasks would be adding metadata to photos indicating who's in the picture, or transcribing hundreds of hours of conference calls.
We're curious to hear from the community: would a batch inference platform be useful for you? What features would you want to see? Or maybe you're already using something like this, and I'd love to know what's working (or not working) for you.
just for the conversion not training btw
Hello everyone! I recently started learning how LLMs work, and I think I have a decent understanding of the math behind them. I also got somewhat comfortable using PyTorch and managed to build a Transformer model with the exact architecture of GPT-2. The model's forward function outputs a tensor of shape (b, t, voc_size).
So far so good, but while I know how training works in theory, I can't figure out how to do it in PyTorch with a manual training loop. (I managed to do it for a simple image-classification model, a plain feed-forward network, but not for my LLM.)
It would be really nice if somebody could help me write this training loop. For example, I have two variables: example_prompt = "who is the best?" and example_response = "it's you!"
Now I need to know how to tokenize them properly with the GPT-2 tokenizer (from the transformers library), with the correct padding and so on, and how to train the model correctly.
It would already be enough if somebody could write this part of the code so I at least have something showing how it would actually be done, because I just can't find anything on this!
Thanks in advance!
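Not a definitive recipe, but a minimal sketch of such a loop under these assumptions: the model takes input_ids of shape (b, t) and returns logits of shape (b, t, voc_size), and the goal is plain next-token prediction on the concatenated prompt + response. The Hugging Face GPT-2 is used here only as a stand-in for your own model:

import torch
import torch.nn.functional as F
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token     # GPT-2 has no pad token by default

# Stand-in model; swap in your own module whose forward returns (b, t, voc_size) logits.
model = GPT2LMHeadModel.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

pairs = [("who is the best?", "it's you!")]   # (prompt, response) pairs
texts = [p + " " + r + tokenizer.eos_token for p, r in pairs]
batch = tokenizer(texts, return_tensors="pt", padding=True)
input_ids = batch["input_ids"]                # shape (b, t)

model.train()
for step in range(50):
    logits = model(input_ids).logits          # with your own model: logits = model(input_ids)
    # Next-token prediction: position i is trained to predict token i+1.
    loss = F.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),
        input_ids[:, 1:].reshape(-1),
    )
    # With real batches you would also mask padded positions out of the loss.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if step % 10 == 0:
        print(step, loss.item())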
Hi, I'm a college student looking to learn ML and then DL. I know Python, NumPy, pandas, Matplotlib, and the maths. I also did Andrew Ng's ML Specialization course.
Now I'm stuck at the point where I don't know what I should do next. Should I learn EDA and preprocessing, or start learning ML algorithms? If so, where and how can I learn these? I need your guidance, guys. Please help me out. Thanks in advance!
(Edit: give some upvotes and push this post to the top, because it would be helpful for people like me.)
I am trying to implement a seq2seq model in PyTorch to do translation. The problem is that the model keeps generating the same sequence. My goal is to implement attention for seq2seq and then eventually move on to transformers. Can anyone look at my code (Kaggle notebook also attached)?
class Encoder(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, num_layers):
        super(Encoder, self).__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, num_layers, batch_first=True)

    def forward(self, x):
        x = self.embedding(x)
        output, (hidden_state, cell_state) = self.lstm(x)
        return output, hidden_state, cell_state


class Decoder(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, num_layers):
        super(Decoder, self).__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x, h, c):
        x = self.embedding(x)
        # The incoming hidden/cell states must be passed into the LSTM, otherwise the
        # decoder never sees the encoder's summary of the source sentence.
        output, (hidden_state, cell_state) = self.lstm(x, (h, c))
        output = self.fc(output)
        # Return the updated states, not the ones that were passed in.
        return output, hidden_state, cell_state


class Seq2Seq(nn.Module):
    """Step-by-step decoding: the decoder is fed its own previous prediction."""

    def __init__(self, encoder, decoder):
        super(Seq2Seq, self).__init__()
        self.encoder = encoder
        self.decoder = decoder

    def forward(self, X, Y):
        # Run the source sentence through the module's own encoder (not the globals).
        _, h, c = self.encoder(X)
        decoder_input = Y[:, 0:1].long()  # "<START>" token, shape (batch_size, 1)
        output_tensor = torch.zeros(Y.shape[0], Y.shape[1], FR_VOCAB_SIZE).to(device)
        for i in range(1, Y.shape[1]):
            # output_d shape: (batch_size, 1, fr_vocab_size)
            output_d, h, c = self.decoder(decoder_input, h, c)
            output_tensor[:, i] = output_d.squeeze(1)
            # Feed the most likely token back in as the next input.
            decoder_input = torch.argmax(output_d, dim=2)
        return output_tensor  # shape: (batch_size, seq_length, fr_vocab_size)


class Seq2Seq2(nn.Module):
    """Teacher forcing: the decoder is fed the ground-truth previous tokens."""

    def __init__(self, encoder, decoder):
        super(Seq2Seq2, self).__init__()
        self.encoder = encoder
        self.decoder = decoder

    def forward(self, X, Y):
        _, h, c = self.encoder(X)
        decoder_input = Y[:, :-1].long()  # every target token except the last one
        output_tensor, h, c = self.decoder(decoder_input, h, c)
        return output_tensor  # shape: (batch_size, seq_length - 1, fr_vocab_size)
encoder = Encoder(ENG_VOCAB_SIZE, 32, 64, 1).to(device)
decoder = Decoder(FR_VOCAB_SIZE, 32, 64, 1).to(device)
model = Seq2Seq2(encoder, decoder).to(device)

lr = 0.001
optimizer = torch.optim.Adam(model.parameters(), lr=lr)
loss_fn = nn.CrossEntropyLoss(ignore_index=0)  # index 0 is the padding token
epochs = 20

for epoch in range(epochs):
    running_loss = 0.0
    progress_bar = tqdm(train_dataloader, desc=f"Epoch {epoch+1}", leave=False)
    for X, Y in progress_bar:
        Y_pred = model(X, Y)                          # (batch_size, seq_length - 1, fr_vocab_size)
        Y_pred = Y_pred.reshape(-1, Y_pred.size(-1))  # (batch_size * (seq_length - 1), vocab_size)
        Y_true = Y[:, 1:].reshape(-1).long()          # targets are the inputs shifted by one
        loss = loss_fn(Y_pred, Y_true)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # Update running loss and display it in tqdm.
        running_loss += loss.item()
        progress_bar.set_postfix(loss=loss.item())
    print(f"Epoch {epoch+1}, Loss = {running_loss/len(train_dataloader)}")
I read on https://platform.openai.com/docs/api-reference/audio/createTranscription#audio-createtranscription-temperature (mirror):
temperature (number, optional, defaults to 0): The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model will use log probability to automatically increase the temperature until certain thresholds are hit.
What does "use log probability to automatically increase the temperature until certain thresholds are hit" mean when using OpenAI ASR with temperature=0?
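Based on the open-source Whisper implementation (the hosted API may differ), the behavior is roughly: decode at temperature 0 first, and if the result looks bad by certain thresholds (average token log-probability too low, or compression ratio too high, i.e. repetitive output), retry the segment at progressively higher temperatures. A paraphrased sketch, not the hosted service's actual code:

# Constants mirror the open-source Whisper defaults; decode() is a hypothetical stand-in.
TEMPERATURES = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0)
LOGPROB_THRESHOLD = -1.0            # reject if mean token log-probability is below this
COMPRESSION_RATIO_THRESHOLD = 2.4   # reject if the text is too repetitive (too compressible)

def transcribe_segment(decode, audio_segment):
    for t in TEMPERATURES:
        result = decode(audio_segment, temperature=t)
        too_repetitive = result.compression_ratio > COMPRESSION_RATIO_THRESHOLD
        too_unlikely = result.avg_logprob < LOGPROB_THRESHOLD
        if not (too_repetitive or too_unlikely):
            return result           # accept the first decoding that passes the checks
    return result                   # every temperature failed; return the last attempt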
I'm curious to know how different teams handle the handoff from the research phase to production. Specifically, I’d love to learn about:
Any insights into your team’s process and the tools you use would be super helpful. Thanks in advance!
Hello,
I have a background in pure mathematics, and I would like to better understand the dynamics of stochastic gradient descent (SGD), for example speed of convergence, convergence guarantees, and continuous approximations of SGD, but in the genuinely stochastic case, that is, not just classical convex optimization where the objective function is fully known.
Would you have any recent references to get up to date? I would prefer recent papers. Thank you very much
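For concreteness, the object usually studied in that literature is the update below and its diffusion approximation; this is stated here only to fix notation, and the SDE is a small-learning-rate weak approximation (the "stochastic modified equations" line of work), not an exact equivalence:

\theta_{k+1} = \theta_k - \eta \, \nabla \ell(\theta_k; \xi_k), \qquad \xi_k \sim \mathcal{D} \ \text{i.i.d.},

d\theta_t = -\nabla f(\theta_t)\, dt + \sqrt{\eta}\, \Sigma(\theta_t)^{1/2}\, dW_t, \qquad \Sigma(\theta) = \operatorname{Cov}_{\xi}\!\left[\nabla \ell(\theta; \xi)\right],

where f(\theta) = \mathbb{E}_{\xi}\left[\ell(\theta; \xi)\right] is the population objective and \Sigma is the gradient-noise covariance.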
Why do so many resources assume inputs and outputs follow a normal/Gaussian distribution when discussing BatchNorm? My understanding is that there is no guarantee that the distribution of inputs into BatchNorm (or really anywhere else in a network) will be normal. All we're doing is standardizing those inputs, but they could really have almost any distribution, and BatchNorm doesn't change the shape of that distribution.
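That reading is consistent with a quick numerical check: standardizing only shifts and rescales, so higher moments such as skewness survive. The numbers below come from a made-up exponential sample:

import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=100_000)   # a clearly non-Gaussian input

x_std = (x - x.mean()) / x.std()               # what BatchNorm does per feature, before gamma/beta

print(round(skew(x), 2), round(skew(x_std), 2))  # skewness unchanged (~2.0 for an exponential)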
Hello everyone.
I have a question. I am just starting my journey in machine learning, and I have encountered a problem.
I need to make a neural network that determines from an image whether the camera was blocked during shooting (by a hand, a piece of paper, or an ass - it doesn't matter). In other words, I need to make a classifier. I took MobileNet, downloaded different videos from cameras, made a couple of videos with blockages, added augmentations, and retrained MobileNet on my data. It seems to work, but the network periodically misclassifies images.
Question: how can such a classifier be improved? Or is my approach completely wrong?
In recent weeks, I've become interested in creating my own video object detection model. I don’t want to build it entirely from scratch but would like to train it using my own dataset. However, I’m unsure where to start. Could someone guide me on where to begin, what tools I can use to prepare my dataset, and what trainable models are available? Any advice would be greatly appreciated.