/r/learnmachinelearning
A subreddit dedicated to learning machine learning. Feel free to share any educational resources for machine learning.
Also, we are a beginner-friendly subreddit, so don't be afraid to ask questions! This includes questions that are non-technical but still highly relevant to learning machine learning, such as how to approach a machine learning problem systematically.
I'm a college student looking to learn ML and then DL. I know Python, NumPy, pandas, Matplotlib, and the math. I also did Andrew Ng's ML Specialization course.
Now I'm stuck at the point where I don't know what to do next. Should I learn EDA and preprocessing, or start learning ML algorithms? If so, where and how can I learn these? I need your guidance, guys. Please help me out. Thanks in advance!
(Edit: please give some upvotes so this post stays near the top, because it would be helpful for people like me.)
Offer Details:
Starting today, Coursera is offering a 40% discount on its annual Coursera Plus subscription. You can gain unlimited access to over 7,000 courses, including Professional Certificates from top industry leaders like Google, Meta, Microsoft, and IBM, all for $239 (regularly $399) for 12 months. See the main article for details.
I have just completed the Mathematics for Machine Learning book and I am unable to find resources with practice questions. Can anyone suggest some?
Hi everyone! I'm a student working on a machine learning project and I'm in need of a dataset. Ideally, I’m looking for a dataset that has a few thousand samples with around 15 features that I can preprocess and then use for training ML algorithms. I’ve received suggestions on general sources for datasets, but I’m looking for particular datasets that are well-suited for hands-on learning and experimentation. Any specific recommendations would be very appreciated!
Thank you in advance!
I am trying to implement a seq2seq model in PyTorch for translation. The problem is that the model keeps generating the same sequence. My goal is to implement attention for seq2seq and then eventually move to transformers. Can anyone look at my code? (Kaggle notebook also attached.)
class Encoder(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, num_layers):
        super(Encoder, self).__init__()
        self.vocab_size = vocab_size
        self.embedding_dim = embedding_dim
        self.hidden_dim = hidden_dim
        self.num_layers = num_layers
        self.embedding = nn.Embedding(self.vocab_size, self.embedding_dim)
        self.lstm = nn.LSTM(self.embedding_dim, self.hidden_dim, self.num_layers, batch_first=True)

    def forward(self, x):
        x = self.embedding(x)
        output, (hidden_state, cell_state) = self.lstm(x)
        return output, hidden_state, cell_state


class Decoder(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, num_layers):
        super(Decoder, self).__init__()
        self.vocab_size = vocab_size
        self.embedding_dim = embedding_dim
        self.hidden_dim = hidden_dim
        self.num_layers = num_layers
        self.embedding = nn.Embedding(self.vocab_size, self.embedding_dim)
        self.lstm = nn.LSTM(self.embedding_dim, self.hidden_dim, self.num_layers, batch_first=True)
        self.fc = nn.Linear(self.hidden_dim, self.vocab_size)

    def forward(self, x, h, c):
        x = self.embedding(x)
        output, (hidden_state, cell_state) = self.lstm(x)
        output = self.fc(output)
        return output, h, c


class Seq2Seq(nn.Module):
    def __init__(self, encoder, decoder):
        super(Seq2Seq, self).__init__()
        self.encoder = encoder
        self.decoder = decoder

    def forward(self, X, Y):
        output, h, c = encoder(X)
        decoder_input = Y[:, 0].to(torch.int32)
        output_tensor = torch.zeros(Y.shape[0], Y.shape[1], FR_VOCAB_SIZE).to(device)
        # output_tensor[:, 0] = Y[:, 0]  # Set same start token which is "<START>"
        for i in range(1, Y.shape[1]):
            output_d, h, c = decoder(decoder_input, h, c)
            # output shape: (batch_size, fr_vocab_size)
            decoder_input = torch.argmax(output_d, dim=1)
            # output shape: (batch_size, 1)
            output_tensor[:, i] = output_d
        return output_tensor  # output shape: (batch_size, seq_length)


class Seq2Seq2(nn.Module):
    def __init__(self, encoder, decoder):
        super(Seq2Seq2, self).__init__()
        self.encoder = encoder
        self.decoder = decoder

    def forward(self, X, Y):
        output, h, c = encoder(X)
        decoder_input = Y[:, :-1].to(torch.int32)
        output_tensor, h, c = self.decoder(decoder_input, h, c)
        return output_tensor


encoder = Encoder(ENG_VOCAB_SIZE, 32, 64, 1).to(device)
decoder = Decoder(FR_VOCAB_SIZE, 32, 64, 1).to(device)
model = Seq2Seq2(encoder, decoder).to(device)

lr = 0.001
optimizer = torch.optim.Adam(model.parameters(), lr=lr)
loss_fn = nn.CrossEntropyLoss(ignore_index=0)

epochs = 20
for epoch in range(epochs):
    running_loss = 0.0
    progress_bar = tqdm(train_dataloader, desc=f"Epoch {epoch+1}", leave=False)
    for X, Y in progress_bar:
        Y_pred = model(X, Y)
        # Y = Y[:, 1:]
        # Y_pred = Y_pred[:, :-1, :]
        Y_pred = Y_pred.reshape(-1, Y_pred.size(-1))  # Flatten to (batch_size * seq_length, vocab_size)
        Y_true = Y[:, 1:]
        Y_true = Y_true.reshape(-1)  # Flatten to (batch_size * seq_length)
        loss = loss_fn(Y_pred, Y_true)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # Update running loss and display it in tqdm
        running_loss += loss.item()
        progress_bar.set_postfix(loss=loss.item())
    print(f"Epoch {epoch+1}, Loss = {running_loss/len(train_dataloader)}")
I’m planning to learn machine learning, but I’m at the start of my computer science degree and feeling a bit overwhelmed with all the options out there. I’d love some guidance on where to begin, especially since I want to do a master's in machine learning and want to be a standout applicant.
Some questions I have:
Thank you for your answers in advance :)
I'm tracking a value over time with noisy measurements and am interested in knowing both the estimate of the underlying value at a given instant and the estimated error in that value at each instant (essentially, value plus or minus error, both as functions of time).
For example, if the real value did a step function in time, the measured value would have some transition as it jumps to the new value, and during that transition, the error would spike.
I've been trying a Bayesian linear dynamical system with a Kalman filter (it's possible that I've implemented it wrong), but it seems to get increasingly certain, even when it's horribly wrong. Any suggestions for good algorithms for this type of problem?
Also, the measurement noise is Gaussian, and I roughly know its distribution, if that helps at all.
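For concreteness, here is a minimal 1D Kalman filter sketch (a random-walk state model; the process-noise variance q and measurement-noise variance r are made-up illustrative values, not the asker's). The reported variance only stays honest if q admits that the underlying value can move; with q too small the filter becomes increasingly certain and lags badly at step changes.

    import numpy as np

    def kalman_1d(measurements, q=1e-3, r=0.1**2, x0=0.0, p0=1.0):
        """Minimal 1D Kalman filter for a random-walk state model.

        q : process-noise variance (how much the true value may drift per step)
        r : measurement-noise variance (the known Gaussian sensor noise)
        Returns arrays of the state estimate and its variance at each step.
        """
        x, p = x0, p0
        xs, ps = [], []
        for z in measurements:
            # Predict: state stays the same, uncertainty grows by q
            p = p + q
            # Update: blend prediction and measurement via the Kalman gain
            k = p / (p + r)
            x = x + k * (z - x)
            p = (1 - k) * p
            xs.append(x)
            ps.append(p)
        return np.array(xs), np.array(ps)

    # Example: a noisy step function; est +/- sqrt(var) gives value and error over time
    rng = np.random.default_rng(0)
    truth = np.concatenate([np.zeros(50), np.ones(50)])
    z = truth + rng.normal(scale=0.1, size=truth.size)
    est, var = kalman_1d(z, q=1e-3, r=0.1**2)

If a single fixed q can't cover both the flat and the jumping regimes, common next steps are an adaptive filter (inflate q when the innovation z - x stays large) or an interacting-multiple-model filter.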
Hi guys, I got an offer from a small insurance company for a data scientist role, working on predicting customer behavior that feeds into a risk equation (including deployment and monitoring), but I think this role lacks work-life balance. My current role is machine learning engineer, mainly working on research and proofs of concept using generative AI like GPT, at a big insurance company with great work-life balance. The offer I got is about 10% higher than my current salary. Please share some advice, as I'm struggling to make a wise decision.
While training a classification neural network I keep getting very volatile, "jumpy" test accuracy. This is still the early stage of fine-tuning the network, but I'm curious whether this has any well-known implications about the model. How can I get it to stabilize at a higher accuracy? I appreciate any feedback or thoughts on this.
Hello, I am trying to work out the formula for the partial derivatives of the MSE (mean squared error) loss with respect to the parameters of a single-hidden-layer perceptron network. There are m input neurons and n hidden neurons. The weights of the hidden layer form an m x n matrix, and the bias of the hidden layer is an n-dimensional vector. The weights of the output layer form an n x k matrix, and the bias of the output layer is a k-dimensional vector.
Given these dimensions, I define a true output matrix Y that holds the true values for all samples. I have N samples and k output neurons, giving a k x N matrix. The same holds for Ypred, which represents our predicted values.
Define the loss function based on MSE as (1/N) times the sum of (Yi - Ypredi)**2.
One can easily see that d(MSE)/d(Ypred) = (2/N)(Ypred - Y).
Now I am trying to find d(MSE)/d(Wo), where Wo represents the weights of the output layer. I tried to use the chain rule: d(MSE)/d(Wo) = d(MSE)/d(Ypred) x d(Ypred)/d(Wo), with Ypred = Wo^T * Xh + Bo, where Xh is the output of the hidden layer and Bo is the bias vector bo repeated as a column N times. However, I am stuck here, because the derivative of a matrix (Ypred) with respect to a matrix (Wo) is a tensor, right? How do I simplify the above relationships and continue with the other parameters?
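For what it's worth, here is a sketch of the standard shortcut that avoids forming the fourth-order tensor: differentiate one scalar entry of the loss with respect to one entry of Wo and collect the results back into a matrix. Under the conventions above (Wo is n x k, Xh is n x N, Ypred = Wo^T Xh + Bo is k x N), and assuming the hidden layer is Xh = f(Zh) with Zh = Wh^T X + Bh for some activation f (an assumption added here), the gradients come out as:

    \Delta_o = \frac{2}{N}\,(Y_{\mathrm{pred}} - Y) \in \mathbb{R}^{k\times N},\qquad
    \frac{\partial\,\mathrm{MSE}}{\partial W_o} = X_h\,\Delta_o^{\top} \in \mathbb{R}^{n\times k},\qquad
    \frac{\partial\,\mathrm{MSE}}{\partial b_o} = \Delta_o\,\mathbf{1}_N \in \mathbb{R}^{k}

    \Delta_h = \left(W_o\,\Delta_o\right)\odot f'(Z_h) \in \mathbb{R}^{n\times N},\qquad
    \frac{\partial\,\mathrm{MSE}}{\partial W_h} = X\,\Delta_h^{\top} \in \mathbb{R}^{m\times n},\qquad
    \frac{\partial\,\mathrm{MSE}}{\partial b_h} = \Delta_h\,\mathbf{1}_N \in \mathbb{R}^{n}

Here X is the m x N input matrix and 1_N is the all-ones vector of length N; multiplying by it just sums over the samples, since each bias is shared across them.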
Any help, even just the answers, would be appreciated. Thanks.
Does anyone have any good resources that make the math behind diffusion models crystal clear?
I am currently trying to build a simple multi-class image classifier. I want to use a pretrained model for image embeddings. However, to reliably differentiate the classes in my task, the model also needs to take into account the text/numbers displayed in the image. The number of texts per image to be classified is not fixed.
Most vision encoders have a fairly small input size, which makes the text unintelligible to the model, so the required text has to be extracted with a different approach, for example OCR tools.
My idea would be to run a detection + recognition OCR tool, embed the recognized text using a text encoder, and then add positional embeddings based on the bounding-box location in the image.
However, given the n embedded texts plus the embedded image, what would be the best way to combine them and feed them into a classification head, for example?
In general, is the approach I am trying to take feasible, or are there other approaches that would ensure the text in the image is taken into account, in addition to the general image structure?
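One possible way to combine them (a sketch only: the embedding dimensions, the pretrained backbones, and attention pooling as the fusion mechanism are all assumptions, not a definitive recipe) is to project the image embedding and the n text embeddings into a common dimension and let the image embedding attend over the text tokens before concatenating and classifying:

    import torch
    import torch.nn as nn

    class ImageTextFusionClassifier(nn.Module):
        """Hypothetical fusion head: one image embedding plus a variable number
        of OCR text embeddings (already augmented with positional information)."""
        def __init__(self, img_dim=768, txt_dim=384, d=256, num_classes=10):
            super().__init__()
            self.img_proj = nn.Linear(img_dim, d)
            self.txt_proj = nn.Linear(txt_dim, d)
            # The image embedding acts as the query over the n text embeddings
            self.attn = nn.MultiheadAttention(embed_dim=d, num_heads=4, batch_first=True)
            self.head = nn.Sequential(nn.Linear(2 * d, d), nn.ReLU(), nn.Linear(d, num_classes))

        def forward(self, img_emb, txt_embs, txt_mask):
            # img_emb: (B, img_dim); txt_embs: (B, n_max, txt_dim) padded; txt_mask: (B, n_max), True where padded
            q = self.img_proj(img_emb).unsqueeze(1)                           # (B, 1, d)
            kv = self.txt_proj(txt_embs)                                      # (B, n_max, d)
            pooled, _ = self.attn(q, kv, kv, key_padding_mask=txt_mask)      # (B, 1, d)
            fused = torch.cat([q.squeeze(1), pooled.squeeze(1)], dim=-1)     # (B, 2d)
            return self.head(fused)

    # Example with made-up shapes: batch of 2 images, up to 5 OCR snippets each
    model = ImageTextFusionClassifier()
    logits = model(torch.randn(2, 768), torch.randn(2, 5, 384), torch.zeros(2, 5, dtype=torch.bool))

Simpler alternatives that also handle a variable count are mean-pooling the text embeddings, or feeding the image token plus all text tokens into a small transformer encoder; attention pooling just avoids imposing a fixed number of texts.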
Thank you guys in advance!
Hello,
I have a background in pure mathematics, and I would like to better understand the dynamics of stochastic gradient descent (SGD): for example, speed of convergence, guarantees of convergence, continuous approximations of SGD... but in the stochastic case, that is, not just classical convex optimization where the objective function is fully known.
Do you have any references to get up to date? I would prefer recent papers. Thank you very much!
Hello everyone.
I have a question. I am just starting my journey in machine learning, and I have encountered a problem.
I need to make a neural network that would determine from an image whether the camera was blocked during shooting (by a hand, a piece of paper, or an ass - it doesn't matter). In other words, I need to make a classifier. I took mobilenet, downloaded different videos from cameras, made a couple of videos with blockages, added augmentations and retrained mobilenet on my data. It seems to work, but periodically the network incorrectly classifies images.
Question: how can such a classifier be improved? Or is my approach completely wrong?
It's been almost 120 days since I started learning ML. I have only learned basic terminology and basic statistics and am applying ML libraries to do projects, but I want to learn ML properly (the math).
Will it be worth it?
Please also suggest any other resources.
Thank you
Logit reg - Accuracy: 0.90

Confusion matrix:
[[67472   499]
 [ 6679   511]]

True positives: 511
True negatives: 67472
False negatives: 6679
False positives: 499

Sensitivity: 0.07
Specificity: 0.99
Positive predictive value: 0.51
Negative predictive value: 0.91

Classification report:
              precision    recall  f1-score   support
           0       0.91      0.99      0.95     67971
           1       0.51      0.07      0.12      7190
    accuracy                           0.90     75161
I'm performing a Frequent Pattern Mining analysis on a dataframe in pandas.
Suppose I want to find the most frequent patterns for columns A, B and C. I find several patterns; let's pick one: (a, b, c). The problem is that, with high probability, this pattern is frequent just because a is very frequent in column A per se, and the same with b and c. How can I discriminate patterns that are frequent for this trivial reason from ones that are frequent for interesting reasons? I know there are many metrics to do so, like lift, but they are all binary metrics, in the sense that I can only calculate them on two-column patterns, not three or more. Is there a way to do this for a pattern of arbitrary length?
One way would be calculating the lift on all possible subsets of length two:
lift(A, B)
lift((A, B), C)
and so on
but how do I aggregate all the results to make a decision?
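One generalization that works for a pattern of arbitrary length is to compare the observed support of the whole pattern against the support it would have if its items were independent, i.e. the product of the single-column frequencies (essentially lift extended to itemsets). A hedged pandas sketch, where the dataframe, column names and values are placeholders:

    import pandas as pd

    def itemset_lift(df, pattern):
        """pattern: dict like {"A": "a", "B": "b", "C": "c"}.
        Lift much greater than 1 means the combination occurs more often than
        expected under independence; lift near 1 flags patterns that are
        frequent only because each item is frequent on its own."""
        joint = (df[list(pattern)] == pd.Series(pattern)).all(axis=1).mean()
        expected = 1.0
        for col, val in pattern.items():
            expected *= (df[col] == val).mean()
        return joint / expected if expected > 0 else float("nan")

    # Hypothetical usage on a dataframe with columns A, B, C
    # lift = itemset_lift(df, {"A": "a", "B": "b", "C": "c"})

Related whole-itemset measures include leverage (joint support minus the independence product) and all-confidence; they make the same comparison in different ways, so any of them can be used as the single aggregate score for a pattern of arbitrary length.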
Any advice would be really appreciated.
I'm a developer but a newbie in AI, and this is the first question I've ever posted about it.
Our non-profit site hosts data about people, such as biographies. I'm looking to build something like ChatGPT that could help users search through and make sense of this data.
For example, if someone asks, "how many people died of covid and were married in South Carolina" it will be able to tell you.
Basically an AI driven search engine based on our data.
I don't know where to start looking or coding. I gather that I need an LLM and datasets to train the AI, but how do I find the model, how do I install it, and what UI do we use to train the AI on our data? Our site is powered by WordPress.
Basically I need a guide on where to start.
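To make the starting point concrete, here is a hedged sketch of the retrieval half of the usual approach (retrieval-augmented generation): embed every biography once, embed the user's question, pull back the closest biographies, and pass those to an LLM as context to compose the answer. The model name, the toy data, and the pipeline shape are assumptions, not a recommendation; for count-style questions you would eventually want structured fields (marital status, cause of death, state) plus a query layer rather than pure text search.

    # Sketch of semantic search over biography texts (the retrieval step of RAG).
    # Assumes: pip install sentence-transformers; `bios` is exported from the WordPress database.
    from sentence_transformers import SentenceTransformer
    import numpy as np

    bios = [
        {"name": "Person A", "text": "Born in South Carolina, married in 1975, died of COVID-19 in 2021."},
        {"name": "Person B", "text": "Lifelong resident of Oregon, unmarried, passed away in 2019."},
    ]

    model = SentenceTransformer("all-MiniLM-L6-v2")  # small general-purpose embedder
    doc_vecs = model.encode([b["text"] for b in bios], normalize_embeddings=True)

    def search(question, top_k=5):
        q = model.encode([question], normalize_embeddings=True)[0]
        scores = doc_vecs @ q                        # cosine similarity (vectors are normalized)
        best = np.argsort(-scores)[:top_k]
        return [(bios[i]["name"], float(scores[i])) for i in best]

    print(search("people who died of covid and were married in South Carolina"))
    # The retrieved biographies would then be passed to an LLM prompt to compose the answer.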
Thanks in advance!
Hi everyone,
I recently completed my MCA (Master of Computer Applications) and unfortunately, I wasn't able to secure a placement during campus recruitment. I’m now feeling a bit lost, as many of my peers have already landed jobs, and I’m concerned about the impact of this study gap on my job prospects. I’ve decided to focus on building a career in machine learning, but I’m not sure where to start, given that I’m a fresher without prior experience in this field.
Could anyone guide me on how to begin my journey in machine learning from scratch? What are the essential skills I need to acquire, and what resources (books, courses, projects) would be helpful for a beginner like me?
Additionally, in the current job market conditions, do you think it’s realistic to land a job in machine learning? Are there specific strategies I should adopt to stand out in this competitive job market?
Any advice or personal experiences would be greatly appreciated!
Thanks in advance!
That's it; he explains practical concepts so well. Andrew Ng is good too, but mostly for theory.
As the title says: we offer Perplexity AI PRO voucher codes for a one-year plan.
To Order: https://cheapgpts.store/Perplexity
Payments accepted:
New to Machine Learning? Start Here with a Beginner-Friendly Roadmap!😌
Machine learning can seem daunting, but with the right roadmap, anyone can get started. This post lays out a clear, beginner-friendly plan to help newcomers navigate the world of ML. From understanding basic algorithms to working with Python and PyTorch, you’ll find resources to start building and deploying your own models. Say goodbye to confusion and hello to actionable steps toward ML mastery.
Ready to begin your ML journey? Head over to r/learnmachinelearning and start with this guide! 👇🏽
I implemented basic gradient descent for linear regression, first in NumPy and then using PyTorch. However, with the same data, parameter initialization, and learning rate, one converges (NumPy, left) while the other diverges (PyTorch, right).
Here is the code for each:
Numpy:
import math
import matplotlib.pyplot as plt
import numpy as np
n = 50
np.random.seed(1)
x = np.linspace(0, 2*math.pi, n)
y = np.sin(x)
y += np.random.normal(scale=0.1, size=len(y))
alpha = 0.15
m = 0
b = 0
losses = []
fig, axs = plt.subplots(2)
while True:
    axs[0].plot(x, m*x+b)
    axs[0].scatter(x, y)
    axs[1].plot(losses)
    plt.draw()
    plt.waitforbuttonpress()
    for ax in axs:
        ax.clear()
    b -= alpha * 1/n * sum(b + m*x[i] - y[i] for i in range(n))
    m -= alpha * 1/n * sum((b + m*x[i] - y[i]) * x[i] for i in range(n))
    mse = sum((y - (m*x+b))**2)/n
    losses.append(mse)
Pytorch:
import math
import matplotlib.pyplot as plt
import numpy as np
import torch.nn
n = 50
np.random.seed(1)
x = np.linspace(0, 2*math.pi, n)
y = np.sin(x)
y += np.random.normal(scale=0.1, size=len(y))
x = torch.from_numpy(x)
y = torch.from_numpy(y)
x = x.reshape(-1, 1)
y = y.reshape(-1, 1)
alpha = 0.15
m = torch.zeros(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.SGD([m, b], lr=alpha)
losses = []
fig, axs = plt.subplots(2)
while True:
    y_est = m * x + b
    loss = loss_fn(y_est, y)
    losses.append(loss.item())
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    axs[0].plot(x, y_est.detach().numpy())
    axs[0].scatter(x, y)
    axs[1].plot(losses)
    plt.draw()
    plt.waitforbuttonpress()
    for ax in axs:
        ax.clear()
Even when I drop the LR to 0.1 they still behave the same, so I don't think it's a small rounding error or similar.
I've read that cosine similarity isn't usually used as an activation function in practice, and I wonder why that is, specifically when the use case is training for similarity.
I am currently training a sentence-transformer neural net with linear activations, but it is assessed against labelled cosine similarity scores based on doc2vec vectors, so the match doesn't seem too great, as the output values can fall well outside the bounds of the cosine similarity function.
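For what it's worth, one common way to keep predictions inside the [-1, 1] range of the labels is not to use cosine similarity as a hidden-layer activation at all, but to compute it between the two sentence embeddings at the output (the usual bi-encoder setup). A minimal sketch, where the encoder is just a stand-in for whatever sentence transformer is being trained:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CosineSimilarityRegressor(nn.Module):
        """Scores a sentence pair by the cosine similarity of their embeddings,
        so predictions are naturally bounded in [-1, 1] like the labels."""
        def __init__(self, encoder: nn.Module):
            super().__init__()
            self.encoder = encoder  # any module mapping inputs -> (B, d) embeddings

        def forward(self, a, b):
            ea, eb = self.encoder(a), self.encoder(b)
            return F.cosine_similarity(ea, eb, dim=-1)  # (B,)

    # Toy usage with a stand-in encoder and MSE against labelled similarity scores
    enc = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 32))
    model = CosineSimilarityRegressor(enc)
    pred = model(torch.randn(4, 128), torch.randn(4, 128))
    loss = F.mse_loss(pred, torch.tensor([0.9, 0.1, -0.3, 0.5]))

As a hidden-layer activation, the normalization in cosine similarity discards magnitude information and its gradients behave poorly near zero vectors, which is one commonly cited reason it is rarely used there; as the output of a similarity model, it is standard.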
I came across a recent video featuring Geoffrey Hinton where he said (I'm paraphrasing) in the context of humans learning languages, "(...) recent models show us that stochastic gradient descent is really how the brain learns (...)" and I remember him comparing "weights" to "synapses" in the brain. If we were to take this analogy forward - if weights are synapses in the brain, what would the learning rate be?
I just finished Neural Networks: Zero to Hero by Andrej Karpathy. I am trying to revise it again since it is so information-dense.
What other course should I take? I was looking at fast.ai: is it good, or should I go for CS231n? What should I do?
I have tried learning from it multiple times and from multiple versions of it. I just don't get how some people who go on to work at big-tech AI labs attribute their success to Fast.ai. I understand my learning style may differ from the intended audience's, but I'd like to hear from the people it benefited.
Firstly, the notebooks/book have little to do with the videos. Secondly, there is so much abstraction that it roughly doubles your work, since you need to look up how something is actually implemented in PyTorch. Thirdly, everything is a notebook, and I am not a fan of notebooks.