/r/learnmachinelearning
A subreddit dedicated to learning machine learning. Feel free to share any educational resources for machine learning.
Also, we are a beginner-friendly subreddit, so don't be afraid to ask questions! This can include questions that are non-technical but still highly relevant to learning machine learning, such as how to approach a machine learning problem systematically.
I’m building an app where users will upload an image, and my model will need to identify the product style and the print pattern used (pattern ID). Which model is best to use for transfer learning? I appreciate the help!
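A common starting point for this kind of image classification task is to fine-tune a pretrained ImageNet backbone. Below is a minimal PyTorch/torchvision sketch, not a definitive recommendation; the ResNet-50 backbone and the `num_styles` count are placeholder assumptions. A second head (or a second fine-tuned model) can be trained the same way for the pattern ID.
```
import torch
import torch.nn as nn
from torchvision import models

num_styles = 12  # placeholder: number of product-style classes in your data

# Load a backbone pretrained on ImageNet and freeze it
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
for p in model.parameters():
    p.requires_grad = False

# Replace the final layer with a fresh head sized for your classes
model.fc = nn.Linear(model.fc.in_features, num_styles)

# Only the new head is trained at first; unfreeze later layers if needed
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
```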
I'm new to building a PC, and I'm learning various new topics along the way. Any suggestions on which GPU would be a good starting point?
Are the following topics relevant for foundation models and NLP/CV? I am thinking of taking a search and optimization class next quarter. Do you recommend taking this class if I am thinking of pursuing a PhD in computer vision?
Schedule
• Week 1: Numerical Optimization (I)
(first-order and second-order directions, line search, various accelerations)
• Week 2: Stochastic Search
(simulated annealing, cross-entropy methods, search gradient)
• Week 3: Classical Search
(heuristic search, adversarial search, sampling-based planning)
• Week 4: Reinforcement Learning (I)
(MDP, value and policy iteration, temporal-difference, Q-learning)
• Week 5: Reinforcement Learning (II)
(deep Q-learning, policy gradient, policy improvement theorems)
• Week 6: Bandits and Monte Carlo Tree Search
(concentration bounds, upper confidence bound, MCTS, AlphaGo)
• Week 7: Combinatorial Search (I)
(constraint programming, SAT, conflict-driven backtracking)
• Week 8: Combinatorial Search (II)
(integer programming, cutting planes, general nonlinear problems)
• Week 9: Numerical Optimization (II)
(gradient projection, Lagrange duality, interior point methods)
It seems that the “building ML models” part is going to ML engineers, while data scientists, especially at big tech companies, are just analysts doing A/B testing (at least judging from job descriptions).
Is DS still a good path if I like to analyze data and build ML models, or should I switch to ML engineering? I am currently studying for an MS in data science. I could switch to CS, but it would cost me one year; if it's worth it, I'll do it, no problem.
I understand that we can see similarity (forgetting cosine similarity for now) by understanding how much two vectors "align." However, a larger dot product means more "alignment," and this is where I get confused.
If we have vector embeddings a=[10,20]; b=[11,21]; and c=[15,25], visually in the dimensional space, a and b would be "more similar" because they are so close, but the dot product would be lower than that of a and c.
Since the dot product is higher for a and c, my understanding suggests they are more aligned and therefore more similar.
However, we know that a and b are closer. How should I interpret the dot product in this case? Or, more generally, how should I interpret the dot product for vector embeddings and other data structures?
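A quick numeric check with the three vectors above makes the distinction concrete: the raw dot product rewards vector magnitude, while cosine similarity and Euclidean distance both agree that a and b are the closer pair.
```
import numpy as np

a = np.array([10, 20])
b = np.array([11, 21])
c = np.array([15, 25])

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

print(np.dot(a, b), np.dot(a, c))                    # 530 650         -> raw dot product favors c (bigger magnitude)
print(cosine(a, b), cosine(a, c))                    # ~0.9998 ~0.9971 -> cosine favors b
print(np.linalg.norm(a - b), np.linalg.norm(a - c))  # ~1.41 ~7.07     -> distance also favors b
```
In other words, the dot product conflates direction with length; if direction is what matters, normalize the embeddings (or use cosine similarity) before comparing.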
I hot-wired the ANN to itself and had it play 1 million times. Awful results; it fails to learn very well. I'm considering putting it up against a hard-coded bot instead of itself to provide a greater challenge, or maybe starting with human training and then, after 1,000 games, having it play against itself?
ANN details:
This project will utilize an ANN, and ASCII depictions of the Tic-Tac-Toe board. In the ASCII depiction, the Xs and Os will be displayed, but empty spaces will be replaced with a "." It will run locally on the user's CPU. The ANN will be manually programmed, with no libraries.
Idea:
The input layer consists of 9 neurons, each with an input normalized to 1, 0, or -1: 1 will indicate a move by the player, 0 will indicate an empty spot, and -1 will indicate a move made by the ANN. E.g.:
Board visualization:
User is X, ANN is O:
```
X | O | X
---------
O | X | .
---------
X | O | X
```
The input layer will be:
```
{1, -1, 1, -1, 1, 0, 1, -1, 1}
```
This will enable the ANN to start playing either player, and be able to continue previous games.
Output will consist of 9 neurons, each with a value between 1 and -1. The highest output will become the spot where the ANN places its piece. If the spot with the highest output already has a piece in it, the second highest will be considered, then the third, and so on.
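A small sketch of that selection rule (written in Python just for illustration, even though the project itself is planned as library-free C++; `pick_move` is a hypothetical helper name):
```
def pick_move(outputs, board):
    """outputs: the 9 output-neuron values; board: 9 ints (1 = player, 0 = empty, -1 = ANN).
    Returns the index of the highest-output cell that is still empty."""
    ranked = sorted(range(9), key=lambda i: outputs[i], reverse=True)
    for i in ranked:
        if board[i] == 0:   # skip cells that already contain a piece
            return i
    return None             # board is full

# With the example board above, index 5 is the only free cell, so it is chosen
board = [1, -1, 1, -1, 1, 0, 1, -1, 1]
outputs = [0.9, 0.1, 0.3, -0.2, 0.8, -0.5, 0.4, 0.0, 0.2]
print(pick_move(outputs, board))   # 5
```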
# Neurons
Each neuron will have the following properties:
- A weight between 1 and -1
- A bias between 1 and -1
- A value between 1 and -1
- A function to calculate the value of the neuron
- A function to update the weight and bias of the neuron
For an input neuron, the input is multiplied by the weight and the bias is added to the result. The value is then normalized to between 1 and -1 by dividing by the sum of the absolute values of the weight and the bias, and fed into a binary function: if the total is more than 0.5, the neuron activates. Hidden-layer neurons work similarly, except that the input from every connected neuron is multiplied by the weight, summed, and averaged before the same normalization and 0.5-threshold activation are applied. Output neurons have no activation function.
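To make the arithmetic concrete, here is a literal Python transcription of those rules (Python only for illustration, since the project is planned as library-free C++; treating the hidden-layer bias the same way as the input neuron's is an assumption, because the description leaves it implicit):
```
def input_neuron(x, weight, bias):
    # (input * weight + bias), normalized by |weight| + |bias|, then thresholded at 0.5
    value = (x * weight + bias) / (abs(weight) + abs(bias))
    return 1.0 if value > 0.5 else 0.0

def hidden_neuron(inputs, weight, bias):
    # inputs from all connected neurons are scaled by this neuron's weight and averaged;
    # bias handling assumed identical to the input neuron
    avg = sum(x * weight for x in inputs) / len(inputs)
    value = (avg + bias) / (abs(weight) + abs(bias))
    return 1.0 if value > 0.5 else 0.0

def output_neuron(inputs, weight, bias):
    # output layer: same weighted average, but no activation function
    return sum(x * weight for x in inputs) / len(inputs) + bias
```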
# Architecture
Will run locally on the user's CPU.
Input layer: 9 neurons
Hidden layer #1: 9 neurons
Hidden layer #2: 9 neurons
Hidden layer #3: 9 neurons
Output layer: 9 neurons
# Training
ALL VALUES WILL BE NORMALIZED TO C++ LONG DOUBLES.
The ANN will be trained using the following method:
- The ANN will start with random weights and biases between -0.5 and 0.5.
- The ANN will play a game of Tic-Tac-Toe against a human player
During the game, a win/loss/draw detection function will be run after every move.
- The ANN will then add 2/(1+(game#/200)) (minimum 0.1, to prevent stagnation) to the weights, and a quarter of that to the biases, of the neurons that were activated when the ANN won, and subtract 2/(1+(game#/101)) (minimum 0.1, to prevent stagnation) from the weights, and a quarter of that from the biases, of the neurons that were activated when the human player won (see the sketch below).
**NOTE THAT THE BIASES ARE EXEMPT FROM THE 0.0001 RULE; THEY GO TO A MINIMUM OF 0.000025, WHICH IS 1/4th OF THE MINIMUM WEIGHT**.
- The ANN will then play another game of Tic-Tac-Toe against a human player
There are plans to implement ANN vs ANN training.
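For concreteness, a quick sketch of the decaying update amounts from the training list above (Python just to make the schedule visible; the project itself will be C++):
```
def win_update(game_no, floor=0.1):
    # amount added to the weights of neurons that were active when the ANN won;
    # the biases get a quarter of this amount
    return max(2.0 / (1.0 + game_no / 200.0), floor)

def loss_update(game_no, floor=0.1):
    # amount subtracted from the weights of neurons active when the human won;
    # again, biases get a quarter of it
    return max(2.0 / (1.0 + game_no / 101.0), floor)

for g in (1, 100, 1000, 10000):
    print(g, round(win_update(g), 4), round(loss_update(g), 4))
# game 1:     ~1.99   ~1.98
# game 100:   ~1.33   ~1.00
# game 1000:  ~0.33   ~0.18
# game 10000: floored at 0.1 for both
```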
# Win/Loss Detection
- A program will be written to detect if the game has been won, lost, or is a draw
Background: I've taken Andrew Ng's Machine Learning Specialization. Now I want to learn Python libraries like Matplotlib, pandas, and scikit-learn in depth, plus TensorFlow for DL.
PS: If you know better sources, please guide me.
I am looking for help buying a 3090 at a decent price. It's too expensive, and I have to train a model that needs more VRAM. Where can I look for a 3090 at a decent price?
I am working on two health-related datasets. And I use Python.
- One tabular dataset (called A) contains patient-level information (by id) and a bunch of other features which I have already transformed and cleaned. This dataset has around 3000 rows. The dataset contains labels (y) for a classification problem.
- The other data is a collection of dataframes. Each dataframe represents time-series data on a particular patient (by id also). There are around 1000 dataframes (only 1000 patients have available information on this time-series data).
My methods so far:
- For the collection of dataframes, for each dataframe/patient id, I selected only the mean, median, max, and min of each column, then transformed each dataframe into a single row of data: for example, "patient_id", "min_X", "max_X", "median_X", "mean_X" instead of a lengthy timestep-level dataframe. Do you think this is a good way to preserve key information about the time-series data? Otherwise, I'm thinking of using a machine learning model to select the time-series features, but I'm not sure how to do so.
- Now I have this single dataframe (called B) of patient-level time-series features and want to join it with the first cleaned dataframe (A), but the rows are mismatched: A has 3000 rows while B only has 1000, and the patient ids of B are a subset of the patient ids of A. I don't know how to deal with this. I'm thinking of just using the 1000 rows of B and left-joining A onto it, but would that be a lot of data loss? (A sketch of both steps is below.)
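For reference, a rough pandas sketch of both steps (the `ts_frames` container and column names like `X` are placeholders for my actual data); the join here keeps all of A's 3000 rows and flags the patients without time-series data, which is the alternative to shrinking down to B's 1000 rows:
```
import pandas as pd

# ts_frames: dict mapping patient_id -> that patient's time-series dataframe (placeholder structure)
rows = []
for pid, ts in ts_frames.items():
    numeric = ts.drop(columns=["patient_id"], errors="ignore")   # assume remaining columns are numeric
    stats = numeric.agg(["min", "max", "median", "mean"])        # one value per (statistic, column)
    flat = {f"{agg}_{col}": stats.loc[agg, col] for col in stats.columns for agg in stats.index}
    flat["patient_id"] = pid
    rows.append(flat)
B = pd.DataFrame(rows)   # ~1000 rows: patient_id, min_X, max_X, median_X, mean_X, ...

# Left-join onto A so that no patients are dropped; missing time-series
# features become NaN and can be imputed or flagged explicitly.
merged = A.merge(B, on="patient_id", how="left")
merged["has_timeseries"] = merged["patient_id"].isin(B["patient_id"]).astype(int)
```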
Any advice/thoughts are appreciated.
Hey guys, I want to become an AI engineer, and my journey will be self-taught. I am learning Python, then full-stack web dev with Python and Django, then DSA in Python, and then moving on to AI engineering. This is my path. What do you think of it? Am I following the right path?
And I am starting this journey at the age of 21. Am I too late??? What do you think 🤔
For the ResBlocks, the paper says:
the input feature maps go through a convolution layer with a kernel size of 1×1 and a convolution layer with a kernel size of 3×3 to obtain the feature maps F.
But then what do the numbers on the blocks mean? I thought they were input_dims, kernel_size, stride, out_channels, but then why does the paper mention only a 1×1 and then a 3×3?
Figure 2 is below:
This is my first time implementing the paper. So, any help is appreciated.
Link to paper: https://ieeexplore.ieee.org/document/9303478
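Going only by the quoted sentence (not the figure), a block with a 1×1 convolution followed by a 3×3 convolution could look like the PyTorch sketch below; the channel sizes, the ReLUs, and the residual addition are assumptions for illustration, not the paper's exact design:
```
import torch
import torch.nn as nn

class ResBlockSketch(nn.Module):
    def __init__(self, in_channels, mid_channels, out_channels):
        super().__init__()
        self.reduce = nn.Conv2d(in_channels, mid_channels, kernel_size=1)             # 1x1 conv
        self.conv = nn.Conv2d(mid_channels, out_channels, kernel_size=3, padding=1)   # 3x3 conv
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        f = self.conv(self.relu(self.reduce(x)))   # feature maps F from the 1x1 -> 3x3 path
        return self.relu(f + x)                    # residual add (assumes in_channels == out_channels)

x = torch.randn(1, 64, 32, 32)
print(ResBlockSketch(64, 32, 64)(x).shape)   # torch.Size([1, 64, 32, 32])
```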
Hi,
I have a set of data and I'm trying to find a correlation between the input and the label, but by nature, the majority of the data is not supposed to have any correlation. Let's imagine that under certain circumstances, say 1% of the data has a strong correlation between the input and the label.
The problem is that if I train a neural net model, I will get at best 50 to 51% correct predictions, since 99% of the data has no correlation. I need to identify this subset of data.
I have tried k-means clustering, as suggested by ChatGPT, but it didn't improve the prediction percentage for any of the clusters. Any suggestions on whether this is even possible?
Hey everyone,
Hope you’re all doing well! I’m a Senior Software Engineer, and I’ve been really curious about getting into machine learning. Right now, my knowledge of ML is pretty basic, and I don’t know much about its different areas.
I started by learning some math, thinking it would help later. I’ve just finished a Calculus 1 course, but now I’m stuck on what to do next. Should I keep going with math (like Linear Algebra, Calculus 2, or Stats and Probability), or should I start exploring ML concepts, get a general idea, and then dive deeper into the math when needed?
What do you all think? What’s the best way to move forward? Would love some advice!
I just want to know which Linux distro is best for ML and DL development and also supports NVIDIA graphics cards for CUDA and cuDNN. I am open to all the suggestions I can get right now.
I'm an electronics engineer, and I started learning mathematics for DS/ML two months ago, but I found myself tangled in it.
I've decided to unlearn it all and start fresh. Please recommend YouTube playlists/notes for me.
Thank you for reading. I'd be glad if you respond 🫶
Hi guys, any thoughts on the AI engineer course from Hyperskill that's starting in January? Is it also beneficial for someone who may go into network engineering? (I ask because I want to have this as a backup if I can't find a job in IT. I already have a CCNA, and I'm in my third year of a CS degree.)
I'm currently trying to train a sparse neural network and could use some advice. I've experimented with L1 regularization and the pruning techniques available in PyTorch, but neither has given me good results so far.
When I used L1 regularization alone, I found that the resulting neural network didn't show any real sparsity. I suspect this might be due to the optimizer's numerical nature, which introduces small errors that prevent sparsity from emerging. A friend suggested that dropout might help in training sparse neural networks, but I'm a bit skeptical about how effective that would be.
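For concreteness, here's a rough sketch of the kind of setup I mean (toy model and threshold values are placeholders): the L1 penalty is added to the loss, and then either near-zero weights are thresholded explicitly or PyTorch's magnitude pruning is applied, since the optimizer alone rarely drives weights exactly to zero.
```
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))  # placeholder model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
l1_lambda = 1e-4

def train_step(x, y):
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(x), y)
    loss = loss + l1_lambda * sum(p.abs().sum() for p in model.parameters())  # L1 penalty
    loss.backward()
    optimizer.step()

# Option 1: after training, hard-threshold the tiny weights that L1 left near (but not at) zero
with torch.no_grad():
    for p in model.parameters():
        p[p.abs() < 1e-3] = 0.0

# Option 2: magnitude pruning, which keeps an explicit binary mask of zeroed weights
for module in model:
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)  # zero the smallest 50% by |w|
```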
If anyone has practical tips or insights on how to train a sparse neural network effectively, I would greatly appreciate your help!
I'm curious whether child nodes in a decision tree always have less information gain/less entropy reduction, and whether they are in general always less informative than their parent nodes?
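One concrete way to test this is the classic XOR dataset, where the split at the root has zero information gain while the splits one level down are maximally informative, so deeper splits are not always less informative. A small sketch (entropy computed with a hand-written helper):
```
import math

def entropy(labels):
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n) for c in set(labels))

def info_gain(parent_labels, partitions):
    n = len(parent_labels)
    remainder = sum(len(part) / n * entropy(part) for part in partitions)
    return entropy(parent_labels) - remainder

# XOR data: y = x1 XOR x2
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
y_all = [y for _, y in data]

# Root split on x1: both children are still 50/50, so the gain is zero
left  = [y for (x1, _), y in data if x1 == 0]
right = [y for (x1, _), y in data if x1 == 1]
print(info_gain(y_all, [left, right]))      # 0.0

# Splitting the x1 == 0 child on x2 makes both grandchildren pure: gain of 1 bit
left_00 = [y for (x1, x2), y in data if x1 == 0 and x2 == 0]
left_01 = [y for (x1, x2), y in data if x1 == 0 and x2 == 1]
print(info_gain(left, [left_00, left_01]))  # 1.0
```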
Hey, So I am working on a project whereby I have to quantize my model's weights and biases to integers and perform subsequent operations using integers. The output of my model can be either (int8 or int16) values (in this case, logits) and I need to call softmax on this logits output/array. I was able to find an integer implementation of softmax written in C (https://github.com/ARM-software/CMSIS-NN/tree/main/Source/SoftmaxFunctions). The problem I'm having is trying to evaluate that this C implementation is accurate (or more specifically, that I am using it accurately). The way I'm thinking of doing that is detailed below:
**In Python**
Take my integer logits, call an integer python implementation of softmax on the logits, get a result
(**python_integer_prediction_probabilities**).
**In C (using CMSIS-NN)**
Take the same integer logits, call the C softmax implementation on my logits, get a result (**CMSIS_NN_prediction_probabilities**)
Finally, I compare these two results to see if they are close enough. The main problem I'm having is that I assumed there would be information about how to implement a softmax function that takes integer inputs in Python, but I can't find anything online. Does anyone have an idea of how to implement this in Python, or know of resources I could use to figure this out? Thank you.
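One hedged way to build the Python side of the comparison is a float reference rather than a bit-exact re-implementation: dequantize the integer logits with their known scale, compute a standard softmax in float, then re-quantize the probabilities and check that the CMSIS-NN result stays within a small tolerance. The scale value and the unsigned output format below are placeholder assumptions to be replaced with the model's actual quantization parameters.
```
import numpy as np

def reference_int_softmax(int_logits, input_scale, out_bits=8):
    """Float reference for sanity-checking an integer softmax implementation.

    int_logits  : int8/int16 logits from the quantized model
    input_scale : quantization scale of the logits (assumes symmetric quantization,
                  i.e. real_value = input_scale * int_value)
    out_bits    : assumed bit width of the integer probabilities to compare against
    """
    x = input_scale * int_logits.astype(np.float64)   # dequantize to float
    x = x - x.max()                                    # numerical-stability shift
    probs = np.exp(x) / np.exp(x).sum()                # ordinary float softmax
    # Re-quantize to unsigned fixed point in [0, 2**out_bits - 1]; adjust this to
    # whatever output format (signed/offset) the CMSIS-NN call actually produces.
    return np.round(probs * (2**out_bits - 1)).astype(np.int64), probs

# Usage sketch with hypothetical logits:
int_logits = np.array([12, -3, 40, 7], dtype=np.int8)
ref_int, ref_float = reference_int_softmax(int_logits, input_scale=0.1)
# np.max(np.abs(ref_int - cmsis_output)) should stay within a small tolerance,
# where cmsis_output is the array returned by the C implementation.
```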
Hello,
I was wondering how an entry-level machine learning engineer becomes a senior machine learning engineer. Are the skills required to become a Sr. ML engineer learned on the job, or do I have to self-study? If self-studying is the appropriate way to advance, how many hours per week should I dedicate to go from entry level to senior level in 3 years, and how exactly should I self-study? Advice is greatly appreciated!
I’ve noticed job postings for machine learning engineers often fall into two categories:
This makes me wonder: Is prior experience as a software engineer necessary, or is a background in data science (a degree in DS and experience in the field) sufficient for most MLE roles? (By MLE roles I mean roles that build models, not data scientist roles that are actually glorified analyst positions.)
P.S. I know job titles can be misleading, but I hope my question is clear!
DeWave is an EEG-to-text model that uses discrete codex. What I'm struggling to understand is how they could have made a discrete or indexing codex for the model. Figure 3 in the paper mentions a "Codex Transformer" being used to create a codex encoder and decoder, but I don't know what that is and can't find anything online about it. If anyone knows the answer to these questions it would be greatly appreciated.
Embeddings are a fundamental step in a RAG pipeline. Regardless of how we choose to implement RAG, we won't be able to escape the embedding step. When researching for an in-depth video, I found this one:
https://youtu.be/rZnfv6KHdIQ?si=0n9qfUsWWQnEyYTU
Hope it's useful.
Hi guys, I made a video about the connection between denoising autoencoders and the underlying data distribution. If you don't know these topics, they underlie many of the principles of modern generative models such as diffusion models. Anyway, here's the video, hope you enjoy. https://youtu.be/0V96wE7lY4w?si=P45Pz_CmqQgDFSFq