/r/learnmachinelearning

A subreddit dedicated to learning machine learning. Feel free to share any educational resources on machine learning.

Also, we are a beginner-friendly sub-reddit, so don't be afraid to ask questions! This can include questions that are non-technical, but still highly relevant to learning machine learning such as a systematic approach to a machine learning problem.

  • Foster a positive learning environment by being respectful to others. We want everyone to feel welcome and unafraid to participate.
  • Do share your works and achievements, but do not spam. Keep our subreddit fresh by posting your YouTube series or blog at most once a week.
  • Do not share referral links and other purely marketing content. They prioritize commercial interests over intellectual ones.

Chatrooms

Official Discord Server


Wiki

Getting Started with Machine Learning

Resources


Related Subreddits

/r/MachineLearning

/r/MLQuestions

/r/datascience

/r/computervision

Machine Learning Multireddit

/m/machine_learning

418,093 Subscribers

1

As a beginner, should I go for Andrew Ng's machine learning course on YouTube or the Machine Learning Specialization on Coursera?

I have been planning to pursue machine learning, but I don't really know how to get started. Please suggest a basic starting roadmap.

0 Comments
2024/07/13
07:00 UTC

1

Decision tree classifier returning leaf nodes with count 0

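import numpy as np

# Fit the classifier (clf is a DecisionTreeClassifier)
clf.fit(X_train, y_train)

# Get the id of the leaf node that each training sample ends up in
leaf_nodes = clf.apply(X_train)

# Count the training samples per node id (one slot for every id from 0 to max)
current_size = np.bincount(leaf_nodes)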

I am getting results like these for current_size:
[0, 0, 809, 2314, 3412, 0, 89]

Please assist with this. It is very confusing to see a count of 0 at leaf nodes where at least one value of a class should be present.

Reference: https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html
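
One possible reading, offered as a sketch rather than a confirmed diagnosis: `clf.apply` returns the id of the leaf each sample lands in, and `np.bincount` creates one slot for every integer from 0 up to the largest id. Ids that belong to internal (non-leaf) nodes are never returned by `apply`, so they show up with a count of 0. The node ids below are made up for illustration; with a fitted classifier, `clf.tree_.children_left == -1` should mark which ids are leaves.

```python
import numpy as np

# Hypothetical result of clf.apply(X_train): the leaf id for each of 7 samples.
leaf_nodes = np.array([2, 3, 3, 4, 4, 4, 6])

# bincount allocates a slot for every id in 0..max(leaf_nodes), so ids that are
# never returned by apply (here 0, 1 and 5) get a count of 0.
counts_per_id = np.bincount(leaf_nodes)

# Counting only the ids that actually occur avoids the zero entries.
leaf_ids, leaf_counts = np.unique(leaf_nodes, return_counts=True)
samples_per_leaf = dict(zip(leaf_ids.tolist(), leaf_counts.tolist()))

print(counts_per_id.tolist())
print(samples_per_leaf)
```

If this reading is right, the zeros are not empty leaves; they are positions of split nodes in the id range.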
3 Comments
2024/07/13
05:31 UTC

1

How to build AI products

Hey guys, it's my first time posting, so please don't downvote. I have a software engineering background and would love to learn about AI. My end goal is to build AI solutions for business problems. For example: where should a store chain build its next store, and how do you build a system that finds the data needed to answer that question? Another example could be combining smaller models into a system that creates great, human-like content about x. I'm just confused about where I can learn these concepts so that I can build cool stuff leveraging AI. Sorry if I come across as naive lol. Thanks for answering :p

0 Comments
2024/07/13
04:54 UTC

3

Transitioning from academia to industry

I started working as a junior ML researcher after my master's program. The job is okay. It's super flexible but I just feel like I'm not learning enough. I plan on switching to industry as a data scientist but the skills I'm learning here will not be enough.

I'm looking at job descriptions and coming across RAG, LLM Tuning, SQL, AWS, MLOPS, Snowflake, Databricks, Kubernetes etc and feeling overwhelmed because I don't know most of that. Nor will my current job ever involve using most of those tools.

I know I can learn all of this from courses but that's not the point. Companies require "experience" with all these and I don't really know how to get it (apart from projects).

Has anyone here transitioned from academia to industry ML before? Would really appreciate any advice. Thank you.

1 Comment
2024/07/13
03:21 UTC

1

Problem-solving architecture using AI models iteratively with centralized storage and distributed processing

Hi everyone!

I'm building a problem-solving architecture and I'm looking for issues or problems as suggestions so I can battle-test it. I would love it if you could comment an issue or problem you'd like to see solved, or just purely to see if you find any interesting results among the data that will get generated.

The architecture/system will subdivide the issue and generate proposals. A special type of proposal is called an extrapolation, in which I draw solutions from other related or unrelated fields and apply them to the field of the issue being targeted. Innovative proposals, if you will.

If you want to share some info privately, or if you want me to explain how the architecture works in more detail, let me know and I will DM you!

Again, I would greatly appreciate it if you could suggest some genuine issues or problems I can run through the system.

I will then share the generated proposals with you and we'll see if they are of any value or use :)

0 Comments
2024/07/13
02:13 UTC

1

Is there an extension or any other way in VS Code to visualise all my graphs at the same time, like the dedicated window in Spyder? Currently I can see only 1 graph at a time. I have to close the pop-up window and run the code again to see the next graph.

0 Comments
2024/07/13
01:04 UTC

3

ML Frameworks – Large-Scale Study Reveals Catastrophic Performance and Function Loss

This short video explains a large-scale study by Cohere, in collaboration with MIT, that analyzed TensorFlow, PyTorch, and JAX to see how well these frameworks perform across different hardware setups.

https://youtu.be/k9Pp8-z0yho

0 Comments
2024/07/13
01:00 UTC

1

Semantic segmentation IoU needs improvement. Need more experiment ideas

I am currently working on semantic segmentation with 9 classes.

The issue right now is as follows: the mean IoU of the validation reaches 0.8, but if you look at it per class, there are still classes with IoU of 0.5.

When plotted, it turns out that the high IoU is because there are classes that are indeed the majority in the image and the segmentation captures them well.

Another issue is that there are classes that are quite small in the image, so they are often not captured by the segmentation model.
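
The gap between mean IoU and per-class IoU described above can be made visible with a small per-class computation. This is a minimal sketch on made-up flat label lists (not the poster's data or evaluation code); `per_class_iou` is an illustrative helper name.

```python
def per_class_iou(y_true, y_pred, num_classes):
    """Return the IoU of each class for two flat lists of integer labels."""
    ious = []
    for c in range(num_classes):
        intersection = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        union = sum(1 for t, p in zip(y_true, y_pred) if t == c or p == c)
        # A class that appears in neither list has no defined IoU.
        ious.append(intersection / union if union else float("nan"))
    return ious

# Toy labels (made up): class 0 dominates, class 2 is rare.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 0, 0, 0, 1, 1, 2, 0]

ious = per_class_iou(y_true, y_pred, num_classes=3)
mean_iou = sum(ious) / len(ious)
print(ious, mean_iou)
```

A single wrong pixel barely moves the majority class but halves the IoU of the rare class, so reporting the per-class values next to the mean shows which classes need attention.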

The models I have tried so far are FCN, DeepLabV3, and UNet. I used DenseNet121 as the backbone and also tried several ResNet variants.

The training dataset consists of 150 images and the validation set has 70 images. I am addressing this by patching, where the patch size is 128 and the patch step has some overlap, resulting in a total of around 40,000 training patches and around 15,000 validation patches.

Currently, I am using DiceLoss + CrossEntropy for the loss function.

The learning rate is 1e-4. I also tried 1e-3. I have implemented a scheduler so that as the epochs increase, the learning rate decreases.

With the current issues, I want to try other experiments. Has anyone here ever encountered a similar case with a small dataset and imbalanced segmentation classes? Do you have any other ideas to increase the IoU?

0 Comments
2024/07/13
00:53 UTC

0

Have a few questions about going for a MS after graduating in 2019

I graduated back in 2019 with a BS in Statistics, however at the time I was not very passionate about the subject at all, and barely squeezed my way through classes. I ended up graduating with what I think was around a 2.2 GPA, which I know is very bad. However, recently in my life I have found a new passion for all things computer science, and want to go back to school to study it. I want to mainly focus on ML/AI stuff, and I really like the Statistics side of it. My question is: Will I even have a chance of being accepted into any MS program with a poor undergrad like mine? Would I be better off going back to school for a BS in CS, and then going for an MS? Would it be possible for me to just run with my Stats BS I already have, and just teach myself everything I need to know and build projects/portfolios?

2 Comments
2024/07/13
00:38 UTC

1

Seeking Recommendations for resources for learning Computational sciences, Algorithms & Data Structures, for an applied ML scientist

Hello everyone,

I am a 35-year-old applied machine learning scientist with experience in formulating specific problems as learning problems and solving them using both published ML algorithms and by proposing new ones. However, I've realized that my foundational knowledge in computer science, particularly in algorithms and data structures, isn't as strong as I'd like it to be.

I want to deepen my understanding of these fundamentals and learn more about how computer scientists approach problems at a deeper level. Could you recommend any online courses that are particularly effective for building a solid foundation in algorithms and data structures?

I’m looking for courses that are:

  • Comprehensive and start from the basics
  • Well-structured and clear in their explanations
  • Preferably have hands-on coding exercises
  • Ideally from reputable sources or platforms

Additionally, do you think it’s wise to focus on learning algorithms and data structures at this stage in my career, or should I be concentrating on something else?

Thanks in advance for your suggestions!

0 Comments
2024/07/13
00:25 UTC

2

Guidance with neural network forecast

Hello to all who have stopped to read my question,

I am working with a neural network for oxygen time-series regression (TensorFlow and Keras). I have oxygen, saturation, temperature and salinity data every 5 minutes from 2023-08-25 00:02:12 to 2024-06-18 15:38:36.

My question is: how can I forecast 24, 48 or 72 hours beyond my validation data set (in one-hour steps)? Is this possible, or should I look for a network with a different approach?
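
One common way to forecast past the end of the available data is recursive (iterated) forecasting: predict one step, append the prediction to the input window, and repeat. The sketch below assumes a one-step model that takes the last few values of a single variable; `predict_next` is a hypothetical stand-in (a plain average), not the Keras model from the post, and the values are made up.

```python
def predict_next(window):
    """Hypothetical one-step model: here just the mean of the window."""
    return sum(window) / len(window)

def recursive_forecast(history, steps, window_size, model=predict_next):
    """Forecast `steps` values ahead by feeding predictions back as inputs."""
    window = list(history[-window_size:])
    forecast = []
    for _ in range(steps):
        next_value = model(window)
        forecast.append(next_value)
        window = window[1:] + [next_value]  # slide the window forward by one step
    return forecast

hourly_oxygen = [6.0, 6.2, 6.4, 6.2]  # made-up hourly values
forecast_24h = recursive_forecast(hourly_oxygen, steps=24, window_size=4)
print(forecast_24h[:3])
```

Errors tend to compound with this approach as the horizon grows. An alternative is a model with 24, 48 or 72 outputs trained to predict the whole horizon directly.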

Thank you very much for your time.

2 Comments
2024/07/12
22:15 UTC

0

The best course for LLM and GEN AI

Hi all,

I am looking for a good course or YouTube series to learn about LLMs. Do you have any ideas on how I can start? I am working with DL models such as LSTMs and autoencoders, and I know the basics of deep learning. Thank you.

2 Comments
2024/07/12
21:38 UTC

0

AI Meetups in NYC

Are there any communities or meetups in NYC that you recommend?

0 Comments
2024/07/12
21:24 UTC

1

LlamaIndex - Retrieve nodes using query and VectorStoreIndex

Hi,

I am currently working with LlamaIndex. My goal is to have a query (as a string), query my chroma vector database and receive the k nodes closest to the embedding of my query.
I managed to build a VectorStoreIndex based on my PDF documents.

index: VectorStoreIndex = VectorStoreIndex.from_vector_store(vector_store)

I am a bit stuck here. I could run ".as_query_engine" but I do not want to query any kind of LLM. I just want to return the closest embeddings as a list.
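
As far as I know, LlamaIndex indexes expose a retriever for this (`index.as_retriever(similarity_top_k=k)` followed by `retriever.retrieve(query)`), which returns scored nodes without calling an LLM; please check that against the installed version. What such a retriever does conceptually can be sketched in plain Python. The embeddings, node texts and the `top_k_nodes` helper below are made up for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k_nodes(query_embedding, nodes, k):
    """Return the k (text, score) pairs most similar to the query embedding."""
    scored = [(text, cosine_similarity(query_embedding, emb)) for text, emb in nodes]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Made-up node texts with toy 3-dimensional embeddings.
nodes = [
    ("chunk about invoices", [1.0, 0.0, 0.0]),
    ("chunk about shipping", [0.0, 1.0, 0.0]),
    ("chunk about refunds", [0.7, 0.7, 0.0]),
]
closest = top_k_nodes([1.0, 0.1, 0.0], nodes, k=2)
print(closest)
```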

Thanks !

0 Comments
2024/07/12
20:14 UTC

0

Transitioning from Mechanical Engineering to Machine Learning – Is CMU's Master's in AI Engineering (ME) Sufficient?

I'm about to graduate with a Mechanical Engineering degree and I'm keen on transitioning into Machine Learning. I'm considering the Master's in Artificial Intelligence Engineering (Mechanical Engineering) at CMU, which includes 5 core ME courses along with several AI-focused electives.

My question is: Will this program sufficiently prepare me for an ML job, or should I pursue a full master's in Computer Science or dedicated AI/ML instead?

I'm looking to understand if this interdisciplinary approach would be valuable for breaking into the ML field, or if a more focused degree is necessary for job prospects. Any insights or experiences would be greatly appreciated!

Relevant info: I have a few papers on DL and ML for fault diagnosis of mechanical systems published in peer-reviewed (non-predatory) journals. I have a decent amount of experience with Python, which I also want to improve in my free time to increase my chances.

1 Comment
2024/07/12
20:14 UTC

1

Attention for classification

Hello, I have a question on how the attention mechanism can be used for classification (for example classifying an image into one of n categories, where the output is just a label). So far, all the materials I’ve looked at have explained attention with encoder-decoder models for NLP and translation. I have been looking at different papers to see how attention can be applied to models such as CNNs, but what I’ve read so far is not helping me connect the dots, so I wonder if I’m missing something.

And if someone can help explain multi-hop attention I would be grateful as well. Thank you.
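
One pattern that may help connect the dots for the classification case is attention pooling: instead of a decoder, a single query vector scores every region (or token), the softmax of those scores weights the regions, and the weighted sum goes into an ordinary classifier layer. The sketch below uses random NumPy arrays in place of learned parameters and CNN features; all shapes and names are assumptions for illustration, and it does not cover multi-hop attention.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_classify(features, query, w_out):
    """features: (n_regions, d); query: (d,); w_out: (d, n_classes)."""
    scores = features @ query / np.sqrt(features.shape[1])  # one score per region
    weights = softmax(scores)                                # attention weights, sum to 1
    pooled = weights @ features                              # weighted average, shape (d,)
    logits = pooled @ w_out                                  # class scores, shape (n_classes,)
    return int(np.argmax(logits)), weights

rng = np.random.default_rng(0)
features = rng.normal(size=(49, 16))  # e.g. a 7x7 CNN feature map, flattened
query = rng.normal(size=16)           # learned in practice; random here
w_out = rng.normal(size=(16, 5))      # 5 classes
label, weights = attention_classify(features, query, w_out)
print(label, weights.shape)
```

The weights show which regions contributed most to the predicted label, which is often the motivation for adding attention to a CNN classifier.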

0 Comments
2024/07/12
20:01 UTC

63

3D Gradient descent

Hi, I’m looking to generate a figure like this one for demonstration/illustration purposes.

Python or R are welcome but perhaps something a bit more GUI oriented wouldn’t be bad as I could easily adapt the plane.
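
For the Python route, the path itself is simple to compute. This is a minimal sketch that assumes a bowl-shaped surface f(x, y) = x² + 3y² (my choice, not the surface in the figure) and returns the points visited by gradient descent.

```python
def f(x, y):
    """Simple bowl-shaped surface (an assumption, swap in any function)."""
    return x ** 2 + 3 * y ** 2

def grad_f(x, y):
    """Analytic gradient of f."""
    return 2 * x, 6 * y

def descent_path(start, learning_rate=0.1, steps=25):
    """Return the list of (x, y, f(x, y)) points visited by gradient descent."""
    x, y = start
    path = [(x, y, f(x, y))]
    for _ in range(steps):
        gx, gy = grad_f(x, y)
        x, y = x - learning_rate * gx, y - learning_rate * gy
        path.append((x, y, f(x, y)))
    return path

path = descent_path(start=(4.0, 2.0))
print(path[0], path[-1])
```

The resulting points can be drawn on top of the surface with a 3D plotting tool, for example matplotlib's `plot_surface` for the surface and `plot` on the same 3D axes for the path.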

Thanks

5 Comments
2024/07/12
19:57 UTC

1

Seeking Guidance for Hosting a RAG Chatbot on Cloud with any open 7B model or Mistral-7B-Instruct-v0.2

Hello there,

I'm planning to host a Retrieval-Augmented Generation (RAG) chatbot on Cloud using the Mistral-7B-Instruct-v0.2-AWQ model. I’m looking for guidance on the following:

  • Steps: What are the key steps I need to follow to set this up?
  • Resources: Any articles, tutorials, or documentation that can help me through the process?
  • Videos: Are there any video tutorials that provide a walkthrough for deploying similar models on Cloud?

I appreciate any tips or insights you can share. I am willing to pay if you can teach me this. Thanks in advance for your help :)

0 Comments
2024/07/12
19:19 UTC

0

Possibility to MOE qwen2 72B until parameters reach scale of GPT4 level

Is it possible to recreate GPT-4 by taking the latest model weights from Qwen2 72B and multiplying them by 48, making a 48x72B, 4T-parameter, A360B model with 5 active experts, while also implementing PowerInfer-2's 22x performance boost?

Qwen2 72B has a total CMMLU benchmark score of 90. So if it is turned into a MoE with 48 experts, will it reach a 30 percent boost on the benchmark? If there's a 30 percent boost, there will be a score of 91 on the MMMU benchmark, which is better than a human. So would we get an AGI model this way?
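
A quick check of the arithmetic in the question, taking the post's numbers at face value and treating every expert as a full 72B copy. That is an upper bound, since MoE models usually share some layers across experts.

```python
experts = 48
params_per_expert_b = 72  # billions of parameters, from the post
active_experts = 5

total_params_b = experts * params_per_expert_b          # total parameters in billions
active_params_b = active_experts * params_per_expert_b  # parameters used per token

cmmlu_score = 90                       # score quoted in the post, out of 100
boosted_score = cmmlu_score * 130 / 100  # a 30 percent relative gain

print(total_params_b, active_params_b, boosted_score)
```

Under these assumptions the total is about 3.5T rather than 4T, the 360B active figure matches, and a 30 percent relative gain on a score of 90 would exceed the 100-point ceiling of the benchmark. CMMLU and MMMU are also different benchmarks, so a score on one does not carry over to the other.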

4 Comments
2024/07/12
18:34 UTC

1

Need help with choosing project

I am a third-year major in AI and I have to choose a project for college assignment. One is enterprise search assistant which involves building an intelligent chatbot that can interact with enterprise knowledge base and answer user questions. The other project is data science copilot that assists engineers in coding specifically tailored for data science applications. I am unsure of what project to go ahead with. My interests lie more in ML than data science. Help me choose the better one that aligns well with my interests and is also industry relevant.

2 Comments
2024/07/12
18:22 UTC

1

ASR trained for a specific person

Hey, a friend and I are currently planning a project. We want to train an ASR for a person who is speech impaired. Neither of us has previously worked with ASR, but we have the skills to learn it.

I am currently trying to map out a plan for a first database. Normally you need thousands of hours in speech from different people to train an ASR. But that's training an ASR to recognize the speech from as many people as possible. We just want it to be trained on the speech of one singular person.

However, I can't find even an estimate of how much material is needed or what specifically should be spoken for the best learning effect. And saying "As much as possible" just isn't viable. I can always collect more data once it's in use.

My plan up till now entails the phonetic alphabet, the general alphabet in different ways of pronunciation, words, phrases and these in different situations. What I need to know still is how many words and phrases I will need and what I should look out for when selecting them, besides how likely the person is going to need a specific word or phrase.

TL;DR: How much data and what specific kinds of speech would be needed to train an ASR to understand just one person.

Thanks in advance.

0 Comments
2024/07/12
18:03 UTC

0

What model to use?

Hi, I want to train some models with my data, but I'm having trouble finding out what type of data each model needs. My data consists of 2 arrays related to a categorical value. Does anyone know a source where I can look up the data that different models need? Any suggestions on what to use?

Thanks for reading

0 Comments
2024/07/12
17:12 UTC

5

Learned up to transformers, what is next (towards LLMs)?

Just finished learning transformers; of course I will code GPT from scratch (Andrej Karpathy's). I am heading towards LLMs and Gen AI applications.

What should I do next? Read about each LLM and its architecture? Learn LangChain? I am very confused, as different roadmaps suggest different approaches.

Please guide me based on your experience and mistakes.

A flow will help.

Thanks in advance.

6 Comments
2024/07/12
17:08 UTC

1

XGBoost not capturing yearly patterns

I have hourly electrical demand load data with weather parameters for four years.

I have trained XGBoost on this data, which gives me 97% accuracy.

The problem is that I want to forecast the demand load for the next year, but feature importance shows that XGBoost gives 90% of the weight to the Lag-1 feature.

When I forecast the load, it doesn't capture the long-term trend. It revolves around the past 2 months for the whole year.

I have added weekly and monthly rolling and lag features, but nothing solves this problem.

I have added temperature as an input parameter, which has a correlation coefficient of 0.81, but when I train the model it still gives the highest weight to Lag-1.

How can I overcome this problem? Which alternative model is best in my case?
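
One thing worth checking, offered as an assumption about the setup rather than a diagnosis: a Lag-1 feature is only known one hour ahead, so a year-long forecast has to feed the model its own predictions, and those tend to flatten out. The sketch below builds features that are all known a year in advance (calendar fields plus a one-year lag). The function name, field names and data are illustrative.

```python
from datetime import datetime, timedelta

HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap days

def build_long_horizon_features(timestamps, load, lag_hours=HOURS_PER_YEAR):
    """Build rows that use only calendar fields and a lag >= the forecast horizon."""
    rows = []
    for i in range(lag_hours, len(load)):
        ts = timestamps[i]
        rows.append({
            "hour": ts.hour,
            "day_of_week": ts.weekday(),
            "month": ts.month,
            "load_lag_1y": load[i - lag_hours],  # value from one year earlier
            "target": load[i],
        })
    return rows

start = datetime(2020, 1, 1)
timestamps = [start + timedelta(hours=h) for h in range(2 * HOURS_PER_YEAR)]
load = [100.0 + (h % 24) for h in range(2 * HOURS_PER_YEAR)]  # made-up daily cycle
rows = build_long_horizon_features(timestamps, load)
print(len(rows), rows[0])
```

A model trained on rows like these cannot lean on the previous hour, so it has to learn the daily, weekly and seasonal structure instead. Temperature would need the same treatment, since actual temperatures a year ahead are not known either.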

0 Comments
2024/07/12
16:52 UTC

1

How AI Really Works (And Why Open Source Matters)

0 Comments
2024/07/12
16:45 UTC

35

LSTM classification model: loss and accuracy not improving

Hi guys!

I am currently working on a project, where I try to predict whether the price of a specific stock is going up or down the next day using a LSTM implemented in PyTorch. Please note that I am aware that I will not be able to predict the price action 100% accurately using the data and model I chose. But that's not the point, I just need this model to evaluate how adding synthetic data to my dataset will affect the predictions of the model.

So far so good. But my problem right now is that the model doesn't seem to learn anything at all and I already tried everything in my power to fix it, so I thought I'll ask you guys for help. I'll try my best to explain the model and data that I am using:

Data

I am using Apple stock data from Yahoo Finance which I modified to include the following features for a specific day:

  • Volume (scaled between 0 and 1)
  • Closing Price (log scaled between 0 and 1)
  • Percentage difference of the Closing Price to the previous day (scaled between -1 and 1)

To not only use 1 day to make a prediction, I created a sequence by adding lagged data from the previous 14 days. The Input now has the shape (n_samples, sequence_length, n_features), which would be (10000, 14, 3) for my case.

The targets are just whether the stock went down (0) or up (1) the following day and have the shape (10000, 1).
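
The windowing described above can be sketched as follows. This is not the poster's preprocessing code: `make_sequences` is an illustrative helper, the data is random, and it assumes each row's target already says whether the next day closed higher.

```python
import numpy as np

def make_sequences(features, targets, sequence_length=14):
    """features: (n_days, n_features); targets: (n_days,), 1 if the NEXT day closed up."""
    X, y = [], []
    for end in range(sequence_length, len(features) + 1):
        X.append(features[end - sequence_length:end])  # the 14 days ending at index end - 1
        y.append(targets[end - 1])                      # label of the last day in the window
    return np.stack(X), np.array(y, dtype=np.float32).reshape(-1, 1)

rng = np.random.default_rng(0)
features = rng.random((100, 3)).astype(np.float32)  # made-up volume, close, pct change
targets = rng.integers(0, 2, size=100)
X, y = make_sequences(features, targets)
print(X.shape, y.shape)
```

The targets are stored as floats with shape (n_samples, 1) because `nn.BCEWithLogitsLoss` expects float targets with the same shape as the logits.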

I divided the data into train (80%), test (10%) and validation set (10%) and made sure to scale the data solely based on the training set. (Although this also means that closing prices in the test and validation set can be outside of the usual 0-1 range after scaling but I assume that this wouldn't be a big problem?)

Model

As I said in the beginning, I am using a LSTM implemented in PyTorch. I am using the code from this YouTube video right here: https://www.youtube.com/watch?v=q_HS4s1L8UI

*Note that he is using this model for a regression task although I am doing classification in my case. I don't see why this would be a problem, but please correct me if I am wrong!

Code for the model

class LSTMClassification(nn.Module):
    def __init__(self, device, input_size=1, hidden_size=4, num_stacked_layers=1):
        super().__init__()
        self.hidden_size = hidden_size
        self.num_stacked_layers = num_stacked_layers
        self.device = device

        self.lstm = nn.LSTM(input_size, hidden_size, num_stacked_layers, batch_first=True) 
        self.fc = nn.Linear(hidden_size, 1) 

    def forward(self, x):

        batch_size = x.size(0) # get batch size bc input size is 1

        h0 = torch.zeros(self.num_stacked_layers, batch_size, self.hidden_size).to(self.device)

        c0 = torch.zeros(self.num_stacked_layers, batch_size, self.hidden_size).to(self.device)

        out, _ = self.lstm(x, (h0, c0))
        logits = self.fc(out[:, -1, :])
        
        return logits

Code for training (and validating)

model = LSTMClassification(
        device=device,
        input_size=X_train.shape[2], # number of features
        hidden_size=8,
        num_stacked_layers=1
    ).to(device)

optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)
criterion = nn.BCEWithLogitsLoss()


train_losses, train_accs, val_losses, val_accs, model = train_model(model=model,
                        train_loader=train_loader,
                        val_loader=val_loader,
                        criterion=criterion,
                        optimizer=optimizer,
                        device=device)

def train_model(
        model, 
        train_loader, 
        val_loader, 
        criterion, 
        optimizer, 
        device,
        verbose=True,
        patience=10, 
        num_epochs=1000):

    train_losses = []    
    train_accs = []
    val_losses = []    
    val_accs = []
    best_validation_loss = np.inf
    num_epoch_without_improvement = 0
    for epoch in range(num_epochs):
        print(f'Epoch: {epoch + 1}') if verbose else None

        # Train
        current_train_loss, current_train_acc = train_one_epoch(model, train_loader, criterion, optimizer, device, verbose=verbose)

        # Validate
        current_validation_loss, current_validation_acc = validate_one_epoch(model, val_loader, criterion, device, verbose=verbose)

        train_losses.append(current_train_loss)
        train_accs.append(current_train_acc)
        val_losses.append(current_validation_loss)
        val_accs.append(current_validation_acc)
        
        # early stopping
        if current_validation_loss < best_validation_loss:
            best_validation_loss = current_validation_loss
            num_epoch_without_improvement = 0
        else:
            print(f'INFO: Validation loss did not improve in epoch {epoch + 1}') if verbose else None
            num_epoch_without_improvement += 1

        if num_epoch_without_improvement >= patience:
            print(f'Early stopping after {epoch + 1} epochs') if verbose else None
            break

        print(f'*' * 50) if verbose else None

    return train_losses, train_accs, val_losses, val_accs, model

def train_one_epoch(
        model, 
        train_loader, 
        criterion, 
        optimizer, 
        device, 
        verbose=True,
        log_interval=100):
    
    model.train()
    running_train_loss = 0.0
    total_train_loss = 0.0
    running_train_acc = 0.0

    for batch_index, batch in enumerate(train_loader):
        x_batch, y_batch = batch[0].to(device, non_blocking=True), batch[1].to(device, non_blocking=True)  

        train_logits = model(x_batch)

        train_loss = criterion(train_logits, y_batch)
        running_train_loss += train_loss.item()
        total_train_loss += train_loss.item()  # accumulate every batch so trailing batches are not dropped
        running_train_acc += accuracy(y_true=y_batch, y_pred=torch.round(torch.sigmoid(train_logits)))

        optimizer.zero_grad()
        train_loss.backward()
        optimizer.step()

        if (batch_index + 1) % log_interval == 0:

            # log the average training loss over the last log_interval batches
            avg_train_loss_across_batches = running_train_loss / log_interval
            # print(f'Training Loss: {avg_train_loss_across_batches}') if verbose else None

            running_train_loss = 0.0 # reset running loss

    avg_train_loss = total_train_loss / len(train_loader)
    avg_train_acc = running_train_acc / len(train_loader)
    return avg_train_loss, avg_train_acc

def validate_one_epoch(
        model, 
        val_loader, 
        criterion, 
        device, 
        verbose=True):
    
    model.eval()
    running_test_loss = 0.0
    running_test_acc = 0.0

    with torch.inference_mode():
        for _, batch in enumerate(val_loader):
            x_batch, y_batch = batch[0].to(device, non_blocking=True), batch[1].to(device, non_blocking=True)

            test_pred = model(x_batch) # output in logits

            test_loss = criterion(test_pred, y_batch)
            test_acc = accuracy(y_true=y_batch, y_pred=torch.round(torch.sigmoid(test_pred)))
            
            running_test_acc += test_acc
            running_test_loss += test_loss.item()

    # log validation loss
    avg_test_loss_across_batches = running_test_loss / len(val_loader)
    print(f'Validation Loss: {avg_test_loss_across_batches}') if verbose else None

    avg_test_acc_accross_batches = running_test_acc / len(val_loader)
    print(f'Validation Accuracy: {avg_test_acc_accross_batches}') if verbose else None
    return avg_test_loss_across_batches, avg_test_acc_accross_batches

Hyperparameters

They are already included in the code, but for convenience I am listing them here again:

  • learning_rate: 0.0001
  • batch_size: 8
  • input_size: 3
  • hidden_size: 8
  • num_layers: 8

Results after Training

As I said earlier, the training isn't very successful right now. I added plots of the error and accuracy of the model for the training and validation data below:

Loss and accuracy for training and validation data after training

The Loss curves may seem okay at first glance, but they just sit around 0.67 for training data and 0.69 for validation data and barely improve over time. The accuracy is around 50% which further proves that the model is not learning anything currently. Note that the Validation Accuracy always jumps from 48% to 52% during the training. I don't know why that happens.
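
For reference, a loss of about 0.69 is what binary cross-entropy gives when a model predicts a probability of 0.5 for every sample, which is the chance level. A minimal check:

```python
import math

def binary_cross_entropy(p, y):
    """Cross-entropy for one sample with predicted probability p and label y."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

chance_loss = binary_cross_entropy(0.5, 1)  # the same value results for y = 0
print(round(chance_loss, 4))
```

The value equals ln 2 ≈ 0.693, so a validation loss of 0.69 together with roughly 50% accuracy is consistent with the model outputting near-constant predictions.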

Question

As you can see, the model in its current state is unusable for any kind of prediction. I already tried everything I know to solve this problem, but it doesn't seem to work. As I am fairly new to machine learning, I hope that any one of you might be able to help with my problem.

My main question at the moment is the following:

Is there anything I can do to improve the model (more features, different architecture, fix errors while training, ...) or do my results just show that stocks are unpredictable and that there are no patterns in the data that my model (or any model) is able to learn?

Please let me know if you need any more code snippets or whatsoever. I would be really thankful for any kind of information that might help me, thank you!

19 Comments
2024/07/12
16:08 UTC

2

What is Retrieval Augmented Generation (RAG) for LLMs? A 5-minute visual guide. 🧠

TL;DR: RAG overcomes the limitations of LLMs by bringing in external sources of information as relevant context.

RAG functions like a student in an open-book exam. When faced with a question, the student can look up the latest information in textbooks or online resources, ensuring their answer is accurate and up-to-date.
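
The open-book idea can be sketched in a few lines: look up the most relevant text, then put it in front of the question. This toy version ranks documents by shared words instead of embeddings and stops at building the prompt; the documents and function names are made up, and no LLM is called.

```python
def retrieve(question, documents, k=1):
    """Rank documents by how many words they share with the question."""
    question_words = set(question.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(question, documents, k=1):
    """Place the retrieved text ahead of the question as context."""
    context = "\n".join(retrieve(question, documents, k))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

documents = [
    "The library opens at 9 am on weekdays.",
    "Parking is free after 6 pm.",
]
prompt = build_prompt("When does the library open?", documents)
print(prompt)
```

In a real system the word-overlap ranking would be replaced by embedding similarity over a vector store, and the prompt would be sent to the LLM.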

A Visual Guide On RAGs in the Context of LLMs

https://preview.redd.it/8bhes74p24cd1.png?width=1456&format=png&auto=webp&s=c43efd89ea144111ee2a86fa39eca603615ea8e1

0 Comments
2024/07/12
16:04 UTC

18

Learn advanced math for ML

I am a CS student who wants to go to grad school for ML after I graduate (specifically a CS master's with a specialization in ML). The problem is my CS degree only requires discrete math, calc 2, linear algebra, and a prob and stats class. I just finished linear algebra and had a question about going further.

I have gone through different resources like “Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow”, and I learned through yt videos on my own when math I did not understand came up. Is this sufficient or should I keep going to get a deeper understanding like taking multi-variable calculus and differential equations to better prepare myself for grad school? If so should I take it at my college or a local community college to save money?

9 Comments
2024/07/12
15:04 UTC
