/r/MLQuestions

A place for beginners to ask stupid questions and for experts to help them! /r/MachineLearning is a great subreddit, but it is for interesting articles and news related to machine learning. Here, feel free to ask any question regarding machine learning.

What kinds of questions do we want here?

"I've just started with deep nets. What are their strengths and weaknesses?" "What is the current state of the art in speech recognition?" "My data looks like X,Y what type of model should I use?"

If you are well versed in machine learning, please answer any question you feel knowledgeable about, even if it already has answers, and thank you!


Related Subreddits:

/r/MachineLearning
/r/mlpapers
/r/learnmachinelearning

/r/MLQuestions

48,628 Subscribers

0

What would you do in this situation?

People of MLQuestions, if you were in my shoes and the goal was to get a machine learning engineering job (preferably NLP) ASAP, how would you plan the next few months, and in what time frame would you expect to meet that goal?

My shoes: https://www.dropbox.com/scl/fi/k0ruhu6wnri4phfiniuri/Resume-Censored.pdf?rlkey=zq46ltuvu4xjxtrn68d8zyd01&dl=0
(this is enough to give you an idea of what my shoes look like)

The goal is to get a job ASAP, as mentioned above. My university is online, so I have to spend about 2 hrs daily completing lectures and preparing for assignments and quizzes. I spend about 8-12 hrs daily, divided between learning ML (currently taking the CS224n lectures) and working at my internship (fully remote). I live in Kuwait and the ML market here is almost non-existent. I could move back to Pakistan (my home country) and get a full-time job there, where the market is not very mature but there is engineering work (R&D is near non-existent). The best case would be a fully remote job anywhere in the world. My salary expectations are modest: I can work a full-time job if it pays at least $500/mo, because that would cover my needs, and then I could focus on my studies and ML specifically, as I want to get into a master's or PhD after my bachelor's. My long-term goal is to get into academic research.

So for now, getting a job to cover my expenses is the priority and the short-term goal. How would you plan the next few months to meet that goal, if possible?

0 Comments
2024/04/13
18:44 UTC

1

Differentiable rendering framework recommendation?

I’ve been googling for a differentiable rendering framework, but all the ones I have found are quite old. Like, years old.

PyTorch3D seems like my best choice, but I’m curious whether I have overlooked something.

0 Comments
2024/04/13
16:19 UTC

2

How would you go about creating an open-source "Aqua Voice"?

I saw the Launch HN post of Aqua Voice (https://withaqua.com/), which is really nice,

and since such a tool would be really beneficial to the open-source community, I was wondering how to build one.

I have a few ideas, but I'm wondering what other people here think of them, or whether you have better ones? And perhaps some people would like to start an open-source effort to build an open version of such a tool?

First version

My thinking would be to first try a "v0" version which uses no custom model and relies on commercial STT (Whisper) and LLM (ChatGPT) APIs.

It would go this way:

  • record the user and continuously (streaming) convert the audio to text using the STT
  • use some voice activity detection to detect blanks / split the output into sentences to create "blocks" that can be processed incrementally
  • the model would have two states: all the blocks detected so far, and the current "text output"
  • after each block is detected, a first LLM could be used to transform the block into an instruction (e.g. "Make the first bullet point bold")
  • then a second LLM would take both the current "text output" and the "new instruction", and produce a new "text output"
  • the two LLMs could be just calls to ChatGPT with some instructions to prime it (e.g. "the user said this: blablabla. Transform it into instructions to modify an existing text block", or "this is the current state of the text as markdown: blablabla. Apply the following instruction and output the transformed text as markdown: blablabla")
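The loop above can be sketched as a toy pipeline. `call_llm` below is a stub standing in for real Whisper/ChatGPT calls (those names and prompt shapes are illustrative, not any real API), so only the control flow is meaningful:

```python
# Sketch of the proposed "v0" pipeline: each transcribed block is first turned
# into an edit instruction, then applied to the running text output.
# call_llm is a placeholder for a real STT/LLM backend; here it is stubbed so
# the control flow is runnable on its own.

def call_llm(prompt: str) -> str:
    # Placeholder: a real implementation would call a chat-completion API.
    # This stub only handles the two prompt shapes used below.
    if prompt.startswith("INSTRUCT:"):
        return "append: " + prompt.removeprefix("INSTRUCT:").strip()
    if prompt.startswith("APPLY:"):
        _, state, instruction = prompt.split("|", 2)
        return (state + " " + instruction.removeprefix("append: ")).strip()
    raise ValueError("unknown prompt shape")

def process_blocks(blocks):
    """Fold detected speech blocks into a single evolving text output."""
    text_output = ""
    for block in blocks:
        # LLM #1: turn the raw block into an edit instruction.
        instruction = call_llm(f"INSTRUCT: {block}")
        # LLM #2: apply the instruction to the current text output.
        text_output = call_llm(f"APPLY:|{text_output}|{instruction}")
    return text_output

print(process_blocks(["hello world", "make a note"]))
```

The two-prompt split matters: it keeps the "understand the utterance" step separate from the "edit the document" step, so each prompt stays small.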

Second version

A more elaborate version could use custom models (particularly custom-designed LLMs or other NLP models) and work internally on an Abstract Syntax Tree of the markdown documents (e.g. explicitly representing the text as a list of raw-text sections, "styled text" sections, "numbered list" sections, etc.), with the custom LLM applying transforms directly to that representation to make it more efficient.

Happy to hear your thoughts

0 Comments
2024/04/13
13:45 UTC

1

My simple transformer's output is gibberish. Can someone guide?

1 Comment
2024/04/13
12:35 UTC

1

[HELP] Using Flan T5 for prompt engineering

1 Comment
2024/04/13
12:16 UTC

1

How should I make the most of my ML internship?

I'm not a CS major; I mostly got the role because of my experience annotating data (domain knowledge). I also train the deep learning models running in Docker, but I barely understand what's going on underneath (U-Net architecture, Keras, TensorFlow).

Would it be worth investing time into educating myself on deep learning and computer vision? Or should I focus on the MLOps side instead, since I kinda want to be a data engineer? Would it be too far-fetched to think that I could pivot towards becoming an ML engineer despite the lack of a CS degree? Maybe there's a related project I could do and add to my resume, or I could work towards an AWS Machine Learning certification?

A ton of questions, I know. I'm just navigating through a lot of uncertainty right now. Any insights will be greatly appreciated!

1 Comment
2024/04/13
07:23 UTC

2

Questions Regarding Custom Model Trained on Patent Data

Hello all!

My goal is to build a custom model specifically for patent data.

I plan to begin by fine-tuning a Llama 70B model on patent data. My question is: with appropriate fine-tuning, do you think it will be able to generate technical patent verbiage that compares favorably to the output of models like GPT-3.5 or GPT-4?

Additionally, I am considering whether to create separate models for each patent section or to use one model that is fine-tuned on all sections of patent applications. Do you think a single model could handle each section without hallucinating and generating inaccurate or fabricated content?

The prompt will always remain the same; the variable will be the content related to the specific invention input.

I was considering starting with the 13B model to see how it performs to keep training and deployment costs down. However, I'm concerned that the 13B model might struggle with the complex technical patent language.

I am also looking to hire a consultant or someone who can help build the model (apologies in advance if this is not allowed on here).

0 Comments
2024/04/13
06:01 UTC

2

how important is int16 processing performance for LLM?

I read that many LLM models are int16, but you can use fp16 or some other computation to represent int16. The issue then is that your performance goes down significantly. If true, do we need more native int16 compute capability on CPUs, GPUs, and NPUs?
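One nuance worth noting: most released LLM weights are fp16/bf16 (floating point); integer formats like int8/int4 usually come from post-training quantization, where floats are mapped onto an integer grid with a scale factor. A minimal numpy sketch of symmetric quantization, shown here for int16 (the same scheme is used for int8/int4):

```python
import numpy as np

def quantize(w: np.ndarray, dtype=np.int16):
    """Map float weights onto the integer grid; return ints plus the scale."""
    qmax = np.iinfo(dtype).max            # 32767 for int16
    scale = np.abs(w).max() / qmax        # one scale per tensor
    q = np.round(w / scale).astype(dtype)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).normal(size=1000).astype(np.float32)
q, scale = quantize(w)
err = np.abs(dequantize(q, scale) - w).max()
print(f"max roundtrip error: {err:.2e}")
```

The hardware question then becomes whether the matmuls run natively on the integer grid or get dequantized back to floats first; native low-precision integer units avoid that roundtrip.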

1 Comment
2024/04/13
04:45 UTC

1

Graphics Programming: Cloth Deformation, Monte Carlo Raytracing, Denoising

Title,

Just joined up from over in Graphics Programming. I am a grad student in CG and have been writing my own renderers, reading papers, the works. I'd love to know where to look to expand my hardware and software skills in order to start implementing some ML cloth deformation, raytracing, denoising, etc.

As I am already upgrading my PC soon for higher-tier graphics anyway, what is the proposed hardware path if I wanted to implement something like this in house? Apologies for my inexperience, as I'm expanding from a different specialization ;)

P.S., apologies if this duplicate appears on r/learnmachinelearning.

0 Comments
2024/04/13
04:44 UTC

1

Image Segmentation Advice

I am using a Mask R-CNN to predict the layers of a seed. I need it to predict various concentric regions. When we label, we use polygons and only label the outer boundary, assuming the inner boundary is determined by whatever lies inside it. When training, should we train on the original polygons, or should we subtract from each resulting mask the masks of any objects within it?
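One option, if each predicted instance should be a ring rather than a filled disk, is to subtract the nested regions from the training masks so the target matches what you want predicted. A numpy sketch of the subtraction step (boolean masks stand in for the rasterized polygons):

```python
import numpy as np

def to_ring_masks(masks):
    """masks: list of boolean arrays, ordered outermost to innermost.
    Returns ring masks with every nested region carved out."""
    rings = []
    for i, outer in enumerate(masks):
        ring = outer.copy()
        for inner in masks[i + 1:]:
            ring &= ~inner          # carve out every nested region
        rings.append(ring)
    return rings

# Toy example: a 5x5 "seed" with an outer square and an inner square.
outer = np.zeros((5, 5), bool); outer[0:5, 0:5] = True
inner = np.zeros((5, 5), bool); inner[1:4, 1:4] = True
rings = to_ring_masks([outer, inner])
print(rings[0].sum(), rings[1].sum())  # 16 pixels in the ring, 9 inside
```

Training on filled polygons instead also works if the concentric ordering is known at inference time, since the rings can be recovered by the same subtraction as a post-processing step.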

1 Comment
2024/04/13
02:03 UTC

1

Sub for Tensorflow questions?

Is this the place, or is there a more specific sub for Tensorflow implementation and programming questions?

0 Comments
2024/04/12
19:43 UTC

2

How to get a list of interacting features (or quantify interaction)

I'm interested in detecting pairwise interactions between a large set of features. Statistical tests for association between two variables, like chi-square and Fisher's exact test, are not an option due to the large feature count (literally 1000+); the same goes for fitting a model like LR on every pair of features plus an interaction term and checking significance.

While reading about feature interaction I found Friedman's H-statistic in interpretable ML, but I'm afraid that is also computationally expensive. I'm looking for a way to use ML (or DL) and learn interactions without explicitly testing for them, e.g. maybe reading them off the weights of SVM features (I don't know if that would be a valid approach). I also thought about extracting them from an ANN's hidden-layer weights, but I am not sure how to go about it. Is there any resource or approach you know of, or how would you conceptualize this from model weights?
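One cheap screening heuristic (not Friedman's H, and not a significance test): fit a main-effects-only model, then rank feature pairs by how strongly their product correlates with the residual. Genuine interactions leave signal the additive fit cannot explain. A numpy sketch on synthetic data with a planted interaction:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 2000, 8
X = rng.normal(size=(n, p))
# Ground truth: main effects on x0, x1 plus a genuine x2*x3 interaction.
y = X[:, 0] + 0.5 * X[:, 1] + 2.0 * X[:, 2] * X[:, 3] \
    + rng.normal(scale=0.1, size=n)

# Main-effects-only least-squares fit.
A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ beta

# Score every pair by |corr(x_i * x_j, residual)|.
scores = {}
for i in range(p):
    for j in range(i + 1, p):
        prod = X[:, i] * X[:, j]
        scores[(i, j)] = abs(np.corrcoef(prod, resid)[0, 1])

best = max(scores, key=scores.get)
print("top-ranked pair:", best)  # expected: (2, 3)
```

For 1000+ features the pair loop is large (~500k pairs) but each score is a single dot product, so it is vastly cheaper than refitting a model per pair; the top-ranked pairs can then be confirmed with a proper test.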

1 Comment
2024/04/12
19:10 UTC

2

Physics Informed Neural Network

First of all, I know this is a rather complex problem, but I would be grateful for any feedback, because it's a rather new area in ML and it's not easy finding help with it (even at my university). (Maybe you know some community that specializes in this area.)

I want to train a model that takes time as input and outputs the current and voltage in a circuit.

During training I give two batches at each epoch:

  • The first is for the "data loss": it's just the first few points in time, and I calculate the MSE between the model's predictions and the real values at those points. Ideally this is for setting the initial values and could be just a single point. If I were to give the model only this, it would fit the data perfectly but wouldn't be able to generalize beyond the range of time it was given solutions for.

  • So here comes the second batch: it covers a much wider range in time, and the loss is calculated from the residuals of the differential equations that were used to generate the data. Generally they look like this:

dI/dt + a*I + b*U = 0
dU/dt + c*I = 0

For calculating the gradients I use torch.autograd.grad. This approach is described, for example, here: https://benmoseley.blog/my-research/so-what-is-a-physics-informed-neural-network/

Now, my problem is that it doesn't work at all. One of the problems is of course balancing the data and physics losses: for example, if I prioritize the physics loss too much it collapses into the trivial solution, all zeros, because then the differential equations are always satisfied. I'm not sure if this is a problem of model/optimizer selection, an error in my implementation, or whether it's just not easy for the model to converge to the global optimum.
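The trivial-solution failure mode can be reproduced in a few lines. Below is a minimal numpy sketch of the combined loss, with finite differences standing in for torch.autograd.grad (a real PINN differentiates the network output directly); `lam` is the balancing weight being discussed, and the constants a, b, c are placeholders:

```python
import numpy as np

# Circuit ODEs from the post:  dI/dt + a*I + b*U = 0,  dU/dt + c*I = 0.
a, b, c = 1.0, 2.0, 3.0

def physics_loss(t, I, U):
    """Mean squared ODE residual on collocation points t."""
    dI = np.gradient(I, t)      # finite-difference stand-in for autograd
    dU = np.gradient(U, t)
    r1 = dI + a * I + b * U
    r2 = dU + c * I
    return np.mean(r1**2) + np.mean(r2**2)

def data_loss(I_pred, U_pred, I_obs, U_obs):
    return np.mean((I_pred - I_obs)**2) + np.mean((U_pred - U_obs)**2)

def total_loss(t, I, U, I_obs, U_obs, lam=1.0):
    # lam balances data vs physics; too large drives the trivial zero solution.
    n = len(I_obs)              # observed points are the first n times
    return data_loss(I[:n], U[:n], I_obs, U_obs) + lam * physics_loss(t, I, U)

t = np.linspace(0, 1, 101)
zero = np.zeros_like(t)
# The all-zero output satisfies the ODEs exactly (physics loss = 0) but not
# the data: exactly the collapse described above.
print(physics_loss(t, zero, zero))
```

Common mitigations worth trying: anneal `lam` from small to large during training, normalize each loss term by its running magnitude, and hard-code the initial condition into the network output (e.g. predict I(t) = I0 + t * net(t)) so the zero solution is no longer admissible.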

2 Comments
2024/04/12
18:27 UTC

2

My supervisor said I can use testing set for a couple times as long as it is not for validation. Is that correct?

5 Comments
2024/04/12
18:18 UTC

1

Beginner help

Hello, I am new to machine learning. I know Python. For the basics, is there a video suggestion you would recommend?

3 Comments
2024/04/12
17:18 UTC

1

Question Generation

How can you generate MCQs from a textbook, including formulas, tables, and figures, using transformer models? Can anyone tell me what the workflow would be?

1 Comment
2024/04/12
12:45 UTC

1

What are effective feature extraction methods for audio classification besides MFCCs?

Hi everyone,

I'm working on a project involving audio classification using CNNs and currently using MFCCs (Mel Frequency Cepstral Coefficients) as features. However, I'm curious to explore other feature extraction methods that could potentially improve the performance of my CNN model.

Could anyone recommend alternative feature extraction techniques apart from MFCCs that have shown promising results in audio classification tasks? I'm particularly interested in methods suitable for input into a convolutional neural network (CNN).

Thanks in advance for any insights or suggestions!
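A few common alternatives: log-mel spectrograms (often better than MFCCs for CNNs, since the convolutions can learn the decorrelation that MFCC's final DCT performs), chroma features, spectral contrast, and plain log spectrograms. A minimal numpy sketch of a log-mel spectrogram; librosa.feature.melspectrogram is the battle-tested version of the same idea:

```python
import numpy as np

def hz_to_mel(f):  return 2595.0 * np.log10(1.0 + f / 700.0)
def mel_to_hz(m):  return 700.0 * (10.0**(m / 2595.0) - 1.0)

def log_mel_spectrogram(x, sr, n_fft=512, hop=256, n_mels=40):
    # Short-time power spectrum via a hand-rolled STFT.
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win
              for i in range(0, len(x) - n_fft + 1, hop)]
    power = np.abs(np.fft.rfft(frames, axis=1))**2       # (T, n_fft//2+1)

    # Triangular mel filterbank.
    mels = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        lo, cen, hi = bins[m - 1], bins[m], bins[m + 1]
        for k in range(lo, cen):
            fb[m - 1, k] = (k - lo) / max(cen - lo, 1)
        for k in range(cen, hi):
            fb[m - 1, k] = (hi - k) / max(hi - cen, 1)

    return np.log(power @ fb.T + 1e-10)                  # (T, n_mels)

sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t)                          # 1 s of A4
S = log_mel_spectrogram(x, sr)
print(S.shape)
```

The resulting (time, mel) matrix feeds a CNN like a single-channel image; stacking deltas as extra channels is another cheap improvement.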

1 Comment
2024/04/12
10:05 UTC

0

Help!

  1. Gradient Descent: Consider the linear regression model y = 2x + 3. Use gradient descent to find the optimal values of m and b (slope and intercept) that minimize the mean squared error loss function. Gradient Descent Steps: describe the gradient descent steps for minimizing the loss function J(θ) = (1/(2m)) ∑_{i=1}^{m} (h_θ(x^(i)) − y^(i))^2, where h_θ(x) is the hypothesis function, θ are the model parameters, x is the input, and y is the true output.

  2. Underfitting: Given a linear regression model with the equation y = 2x + 1, explain whether it is underfitting or not. If it is underfitting, suggest a better-fitting model.

  3. Overfitting: Given the following polynomial regression model: y = 3x^2 - 2x + 1, determine whether the model is overfitting or not. If it is overfitting, suggest a simpler model.
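For the gradient descent exercise, a worked numpy sketch on data generated from y = 2x + 3, using the exact partial derivatives of the MSE loss with respect to m and b:

```python
import numpy as np

# Gradient descent on J(m, b) = (1/2n) * sum((m*x_i + b - y_i)^2)
# for data from y = 2x + 3.

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=200)
y = 2 * x + 3

m, b, lr = 0.0, 0.0, 0.1
for _ in range(2000):
    err = m * x + b - y
    grad_m = np.mean(err * x)        # dJ/dm
    grad_b = np.mean(err)            # dJ/db
    m -= lr * grad_m
    b -= lr * grad_b

print(round(m, 3), round(b, 3))      # converges to ~2.0 and ~3.0
```

Each step moves (m, b) opposite the gradient; because the loss is convex, any small enough learning rate converges to the true slope and intercept.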

3 Comments
2024/04/12
08:38 UTC

1

Where's the best place to get good insurance and economics data sets?

0 Comments
2024/04/12
07:07 UTC

1

Specific Learning Roadmap of Deep Learning for my research work

Hello,

I have been working on a software engineering research project as part of my thesis. To solve one part of the project, it seems I need deep learning, more specifically graph neural networks (GNNs) and recurrent neural networks (RNNs). However, I am a complete beginner in machine learning and deep learning, and that's making it hard to get to the right solution.

Therefore, I am seeking a specific roadmap for learning ML that will help me solve my problem using GNNs and RNNs. Since I have other areas of the project to work on, I won't be able to dive deep into all the ML concepts given my time constraints, but I do want to properly understand what I am doing.

It would be really helpful to have a roadmap that includes maths basics, statistics topics, ML algorithms, preprocessing techniques (I have JSON data to work with and am getting confused about how to preprocess it for my use case), feature engineering, and so on. I have been looking at ML resources and getting a bit confused about where to start.

Thanks a lot.

3 Comments
2024/04/11
23:14 UTC

2

Extending vocabulary size (LLM)

I recently came across this really interesting paper:

https://arxiv.org/abs/2404.01744

In the paper, the researchers train the 2B Gemma model to perform function calling with roughly the same accuracy as much larger models like GPT-4.

To achieve this state-of-the-art accuracy, one of the novel ideas is to transform function calls from a multi-token prediction problem into a single-token prediction problem. The researchers achieve this by extending the tokenizer's vocabulary with the names of the functions.

I would like to replicate this research, as I think it's incredibly impressive if true, but I'm not sure how to go about adding new tokens to the Gemma model.

The researchers give no indication of how they added the new tokens to the Gemma model.

Are there any resources/guides that tackle this issue?
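For the mechanics: with Hugging Face Transformers the standard recipe is `tokenizer.add_tokens([...])` followed by `model.resize_token_embeddings(len(tokenizer))`. Under the hood that grows the embedding matrix by one row per new token. A numpy sketch of that operation, with new rows initialized to the mean of the existing embeddings (a common heuristic, not necessarily what the paper's authors did):

```python
import numpy as np

def extend_embeddings(emb: np.ndarray, n_new: int) -> np.ndarray:
    """Append n_new rows to an embedding matrix, each set to the mean of
    the existing rows so new tokens start out 'average' rather than random."""
    mean_row = emb.mean(axis=0, keepdims=True)
    new_rows = np.repeat(mean_row, n_new, axis=0)
    return np.vstack([emb, new_rows])

vocab_size, dim = 32000, 64
emb = np.random.default_rng(0).normal(size=(vocab_size, dim))
new_emb = extend_embeddings(emb, n_new=3)   # e.g. 3 function-name tokens
print(new_emb.shape)                        # (32003, 64)
```

Note that the output (LM head) matrix must be resized the same way, and the new rows only become meaningful after fine-tuning on examples that actually use the new tokens.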

3 Comments
2024/04/11
21:15 UTC

0

Convolutional Neural Network (CNN) Research Topics

I am a G11 student doing the IB, and for my IB Extended Essay I would like to do something related to convolutional neural networks. However, I am a beginner in all of the machine learning topics, so I don't know how to form a good research question (like what to test, what to answer). Also, I would most likely be using code from GitHub, as I would be performing an analysis, not programming. Any ideas for good questions, or things to test, that are somewhat beginner-friendly? Here are some examples of titles that scored well:

- To what extent are character-level convolutional neural networks viable for classifying texts by their century of creation?

- Investigating Relationship Between Covid-19 and Spectrograms of Coughing Acoustics Through the Use of Convolutional Neural Network

- How do convolutional neural networks compare to recurrent neural networks in terms of speed and accuracy when performing speech recognition?

Also popular are questions that ask about how useful something like CNNs are in a particular context, like analyzing handwriting, etc.

Thank you very much!

0 Comments
2024/04/11
09:05 UTC

1

🎓📊 Calling all experts in machine learning-based business models! 🤖💼

As a master's student at the Technical University of Darmstadt in Germany, I am conducting research for my thesis on the integration of machine learning into business models. I'm looking to connect with professionals who have insights or experience in this area.
The focus of my research is understanding how organizations respond to machine learning-driven changes in their business models. As someone familiar with the intricacies of business informatics, your insights would be invaluable in contributing to the depth and breadth of my study.
Your contribution will greatly enrich our research and provide valuable insights into the successful implementation of ML-driven strategies. If you can spare a moment, please click on the following link to access the survey: https://ww3.unipark.de/uc/StudierendeMitRechten/3139/

Thank you in advance for your support!

0 Comments
2024/04/10
22:52 UTC

1

What can I do next?

I will be graduating with my Master’s in bioinformatics soon, but I want to potentially pivot into ML/AI. I have experience with ML from my classes, but I’m not sure at what depth compared to what is expected to get a job in this field. What additional credentials would help give me an edge in my job search?

0 Comments
2024/04/10
21:29 UTC

3

Masters in computer science or data science?

Hello everyone! I’m having trouble deciding between the masters in computer science or data science at Georgia tech. Both have a machine learning program in them, but I’m not sure which one would be most beneficial for a career in the machine learning field, especially considering I want to go the machine learning engineer route, with a PhD in Deep Learning in the future. Thanks in advance!

6 Comments
2024/04/10
17:43 UTC

1

Joining masters at CMU this fall. Need help with selecting courses to take.

I'm joining the Masters in Computational Data Science program this fall.

My goal after graduating is to be a research scientist at a top ML lab. I've seen that, given the market, a PhD may be needed. Therefore, I want to take courses that would build a first-principles, deep fundamental understanding for doing research, whether in a PhD or in industry.

Here's my current list:

  • Advanced Introduction to ML
  • Intro to DL (practical applications of DL) or Advanced DL (theoretical foundations of DL)
  • Convex Optimization / Probabilistic Graphical Models
  • A course on LLMs
  • Advanced NLP
  • A course on statistics / statistical ML

I also have some mandatory systems requirements like cloud computing.

Have I covered the topics necessary to get that depth?

Are there any specific courses or CMU specific professors you recommend?
Please DM me if you have done an MS at CMU with a similar goal, and could help me to plan this out.

TIA.

5 Comments
2024/04/10
14:58 UTC

2

Why does my convolution layer lose the details it learned?

I've been testing my CNN on the MNIST dataset. I recently added a visualizer for the convolution layers, and I saw that initially it works as intended: the conv layers return details like edges, and my pooling layers then return a more compressed version. But when the network is close to converging it suddenly loses the edge features and just returns an identical image. The weights also all go positive and high.

Training Video

[Note: squares represent convolution layers (Input image -> conv kernel 1 -> output -> pool 1 -> output -> conv kernel 2 -> output -> pool 2 -> output)]

[Another Note: purple represents negative weights while green represents positive weights]

these are my layers:

let layers = vec![
        Layer::conv(3, Valid, 1, ReLU),
        Layer::pool(2, 2),
        Layer::conv(3, Valid, 1, ReLU),
        Layer::pool(3, 2),
        Layer::dense([25, 32], Sigmoid),
        Layer::dense([32, 10], SoftMax),
    ];

Is this normal? I think it's exploding gradients, but I'm not sure; there might be something wrong with my code.
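If it is exploding gradients, a quick diagnostic is to log the global gradient norm each step; clipping it is the standard mitigation, and it works the same regardless of framework. A framework-agnostic numpy sketch:

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Rescale all gradients together when their combined L2 norm exceeds
    max_norm. Returns the clipped gradients and the pre-clip norm (worth
    logging: a sudden spike pinpoints where training blows up)."""
    total = np.sqrt(sum(np.sum(g**2) for g in grads))
    scale = min(1.0, max_norm / (total + 1e-12))
    return [g * scale for g in grads], total

grads = [np.full((3, 3), 10.0), np.full((5,), -10.0)]  # deliberately huge
clipped, norm_before = clip_by_global_norm(grads, max_norm=1.0)
norm_after = np.sqrt(sum(np.sum(g**2) for g in clipped))
print(round(norm_before, 2), round(norm_after, 2))
```

Clipping by the global norm (rather than per-layer) preserves the relative direction of the update, so the step still points downhill, just shorter.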

1 Comment
2024/04/10
09:44 UTC

3

Too many duplicate samples

Hello everyone. I’m new to machine learning and I have been given the task of predicting cardiovascular disease with this dataset: https://www.kaggle.com/datasets/johnsmith88/heart-disease-dataset

I have found that it contains many duplicate records (about 70% of the data). As a newbie, my instinct is to drop the duplicates, but I need some advice/guidance. Would that be the right approach?
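If they are exact duplicates, dropping them is usually right, and doing it before any train/test split matters: a record copied into both sides of the split leaks labels and inflates test accuracy. A toy pandas sketch (made-up rows standing in for the Kaggle CSV):

```python
import pandas as pd

# Toy frame with exact duplicate rows, mimicking the dataset's problem.
df = pd.DataFrame({
    "age":    [63, 63, 41, 41, 41, 57],
    "chol":   [233, 233, 204, 204, 204, 192],
    "target": [1, 1, 0, 0, 0, 1],
})
print(len(df), "rows,", df.duplicated().sum(), "duplicates")

# Deduplicate BEFORE splitting into train/test to avoid leakage.
df = df.drop_duplicates().reset_index(drop=True)
print(len(df), "rows after drop_duplicates")
```

One caveat: if two genuinely different patients could share identical feature values, the "duplicates" carry real information about class frequencies; with ~70% exact copies that is unlikely, and deduplication is the safer default.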

7 Comments
2024/04/10
05:45 UTC
