/r/MLQuestions
A place for beginners to ask stupid questions and for experts to help them! /r/MachineLearning is a great subreddit, but it is for interesting articles and news related to machine learning. Here, you can feel free to ask any question regarding machine learning.
What kinds of questions do we want here?
"I've just started with deep nets. What are their strengths and weaknesses?" "What is the current state of the art in speech recognition?" "My data looks like X,Y what type of model should I use?"
If you are well versed in machine learning, please answer any question you feel knowledgeable about, even if it already has answers, and thank you!
Hi guys, I need help with something. In Azure ML, can the drag-and-drop context be converted into a notebook context? Will it write the code for me by itself? If someone is willing to teach me and help me finish my project, that would be very helpful. Thanks in advance.
At this timestamp (https://youtu.be/pj9-rr1wDhM?si=NB520QQO5QNe6iFn&t=382), the image shows the later CNN layers on top, with kernels showing higher-level features. As you can see, they are pretty blurry and pixelated, and I know this is caused by each layer shrinking the dimensions.
But at this timestamp (https://youtu.be/pj9-rr1wDhM?si=kgBTgqslgTxcV4n5&t=370), the image shows the same thing, the kernels of the CNN's later layers, yet they don't look low-res or pixelated; they look much higher resolution.
My main question is why is that?
My assumption is that each layer is still shrinking the dimensions, but the resolution of the image and kernels is high enough that you can still see the details?
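The shrinking the post describes follows from the standard output-size formula for a convolution or pooling layer, out = floor((in + 2 * padding - kernel) / stride) + 1. A minimal sketch, assuming a hypothetical stack of layers (the layer sizes below are made up for illustration and are not taken from the video):

```python
def conv_out_size(size, kernel, stride=1, padding=0):
    """Spatial size after one convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


# Hypothetical stack: (kernel, stride, padding) per layer, not from the video.
# Each 3x3 convolution keeps the size; each 2x2 pooling step halves it.
layers = [(3, 1, 1), (2, 2, 0), (3, 1, 1), (2, 2, 0), (3, 1, 1), (2, 2, 0)]


def trace_sizes(input_size, layers):
    """Return the spatial size after each layer, starting with the input."""
    sizes = [input_size]
    for kernel, stride, padding in layers:
        sizes.append(conv_out_size(sizes[-1], kernel, stride, padding))
    return sizes


print(trace_sizes(224, layers))  # ends at 28
print(trace_sizes(28, layers))   # ends at 3
```

With the same stack, a 224-pixel input still leaves a 28x28 map while a 28-pixel input is down to 3x3, which would be consistent with the guess that a higher input resolution keeps later-layer visualizations looking sharper.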
I recently started developing an ASR system, beginning with the acoustic model. When I try to train it, it gives me completely wrong results and the loss becomes negative.
acousticModel.h
#include <torch/torch.h>
#include <vector>

class SpeechRecognitionModelImpl : public torch::nn::Module {
public:
    SpeechRecognitionModelImpl(int input_size, int hidden_size, int num_classes, int num_layers);
    torch::Tensor forward(torch::Tensor x);
    void train(std::vector<torch::Tensor> inputs, std::vector<torch::Tensor> targets,
               std::vector<int> input_lengths, std::vector<int> target_lengths, size_t epochs);
    std::vector<int> decode_greedy(torch::Tensor output);

private:
    torch::nn::LSTM lstm;
    torch::nn::Linear fc;
    torch::nn::CTCLoss ctc_loss;
};

TORCH_MODULE(SpeechRecognitionModel);
acousticModel.cpp
#include "acousticModel/acousticModel.h"

SpeechRecognitionModelImpl::SpeechRecognitionModelImpl(int input_size, int hidden_size, int num_classes, int num_layers)
    : lstm(torch::nn::LSTMOptions(input_size, hidden_size).num_layers(num_layers).batch_first(true)),
      fc(hidden_size, num_classes),
      ctc_loss(torch::nn::CTCLoss()) {
    register_module("lstm", lstm);
    register_module("fc", fc);
    register_module("ctc_loss", ctc_loss);
}

torch::Tensor SpeechRecognitionModelImpl::forward(torch::Tensor x) {
    if (x.dim() == 2) {
        x = x.unsqueeze(0);
    }
    x = x.to(torch::kFloat);
    auto lstm_out = lstm->forward(x);
    auto hidden_states = std::get<0>(lstm_out);
    auto output = torch::log_softmax(fc->forward(hidden_states), 2);
    return output;
}
void SpeechRecognitionModelImpl::train(std::vector<torch::Tensor> inputs, std::vector<torch::Tensor> targets,
                                       std::vector<int> input_lengths, std::vector<int> target_lengths, size_t epochs) {
    if (inputs.size() != targets.size() || inputs.size() != input_lengths.size()) {
        throw std::runtime_error("Inputs, targets, and lengths must have the same size");
    }
    torch::optim::Adam opt(parameters(), 0.001);
    for (size_t i = 0; i < inputs.size(); i++) {
        for (size_t epoch = 0; epoch < epochs; epoch++) {
            std::cout << "\nstart epoch" << std::endl;
            auto output = forward(inputs[i]);
            std::cout << "forward" << std::endl;
            output = output.transpose(0, 1);
            std::cout << "transpose" << std::endl;
            auto loss = ctc_loss(
                output,
                targets[i],
                torch::tensor(input_lengths[i], torch::kInt32),
                torch::tensor(target_lengths[i], torch::kInt32)
            );
            std::cout << "ctc_loss" << std::endl;
            opt.zero_grad();
            std::cout << "zero_grad" << std::endl;
            loss.backward();
            std::cout << "backward" << std::endl;
            opt.step();
            std::cout << "step" << std::endl;
            std::cout << "loss: " << loss.item<double>() << std::endl;
            std::cout << "epoch: " << epoch << std::endl << std::endl;
        }
    }
    /*for (size_t epoch = 0; epoch < epochs; ++epoch) {
        double total_loss = 0.0;
        for (size_t i = 0; i < inputs.size(); ++i) {
            std::cout << "1" << std::endl;
            auto output = forward(inputs[i]);
            std::cout << "2" << std::endl;
            output = output.transpose(0, 1);
            std::cout << "3" << std::endl;
            auto loss = ctc_loss(
                output,
                targets[i],
                torch::tensor(input_lengths[i], torch::kInt32),
                torch::tensor(target_lengths[i], torch::kInt32)
            );
            std::cout << "4" << std::endl;
            opt.zero_grad();
            std::cout << "5" << std::endl;
            loss.backward();
            std::cout << "6" << std::endl;
            opt.step();
            std::cout << "7" << std::endl;
            std::cout << loss.item<double>() << std::endl;
            total_loss += loss.item<double>();
        }
        std::cout << "Epoch [" << epoch + 1 << "/" << epochs << "], Loss: " << total_loss / inputs.size() << std::endl;
    }*/
}
std::vector<int> SpeechRecognitionModelImpl::decode_greedy(torch::Tensor output) {
    output = output.argmax(2);
    std::vector<int> decoded_sequence;
    int prev = -1;
    for (int t = 0; t < output.size(1); ++t) {
        int current = output[0][t].item<int>();
        if (current != prev && current != 0) {
            decoded_sequence.push_back(current);
        }
        prev = current;
    }
    return decoded_sequence;
}
read_audio implementation
std::vector<double> read_audio(const std::string& filename) {
    SF_INFO sfinfo;
    SNDFILE* infile = sf_open(filename.c_str(), SFM_READ, &sfinfo);
    if (!infile) {
        throw std::runtime_error("Unable to open the file: \"" + filename + "\"");
    }
    std::vector<double> audio(sfinfo.frames);
    sf_read_double(infile, audio.data(), sfinfo.frames);
    sf_close(infile);
    return audio;
}
main.cpp
torch::Tensor string_to_tensor(const std::string& str) {
    std::vector<double> data;
    for (auto& c : str) {
        double x = static_cast<double>(c) / 128.0;
        data.push_back(x);
    }
    return torch::tensor(data, torch::kFloat32);
}

std::string tensor_to_string(const torch::Tensor& tensor) {
    std::string result;
    auto normalized_values = tensor.contiguous().data_ptr<float>();
    auto num_elements = tensor.size(0);
    for (size_t i = 0; i < num_elements; i++) {
        char c = static_cast<char>(normalized_values[i] * 128.0);
        result.push_back(c);
    }
    return result;
}
torch::Tensor calculate_spectrogram(const std::vector<double>& audio) {
    int num_frames = (audio.size() - WINDOW_SIZE) / HOP_SIZE + 1;
    auto spectrogram = torch::zeros({ num_frames, WINDOW_SIZE / 2 + 1 }, torch::kDouble);
    fftw_complex* fft_out = fftw_alloc_complex(WINDOW_SIZE);
    fftw_plan fft_plan = fftw_plan_dft_r2c_1d(WINDOW_SIZE, nullptr, fft_out, FFTW_ESTIMATE);
    for (int i = 0; i < num_frames; ++i) {
        std::vector<double> window(WINDOW_SIZE);
        int start = i * HOP_SIZE;
        for (int j = 0; j < WINDOW_SIZE; ++j) {
            if (start + j < audio.size()) {
                window[j] = audio[start + j] * 0.5 * (1 - cos(2 * M_PI * j / (WINDOW_SIZE - 1)));
            }
            else {
                window[j] = 0.0;
            }
        }
        fftw_execute_dft_r2c(fft_plan, window.data(), fft_out);
        for (int k = 0; k < WINDOW_SIZE / 2 + 1; ++k) {
            spectrogram[i][k] = std::log1p(std::sqrt(fft_out[k][0] * fft_out[k][0] + fft_out[k][1] * fft_out[k][1]));
        }
    }
    fftw_destroy_plan(fft_plan);
    fftw_free(fft_out);
    return spectrogram;
}
std::pair<std::vector<torch::Tensor>, std::vector<torch::Tensor>> get_train_data(const std::filesystem::path& path) {
    if (!std::filesystem::exists(path) || !std::filesystem::is_directory(path)) {
        throw std::runtime_error(path.string() + " invalid path");
    }
    std::cout << "-7" << std::endl;
    std::pair<std::vector<torch::Tensor>, std::vector<torch::Tensor>> data;
    rapidcsv::Document doc("data/validated.tsv", rapidcsv::LabelParams(), rapidcsv::SeparatorParams('\t'));
    auto path_column = doc.GetColumn<std::string>("path");
    auto sentence_column = doc.GetColumn<std::string>("sentence");
    std::cout << "-6" << std::endl;
    if (path_column.size() != sentence_column.size()) {
        throw std::out_of_range("path column size not equal sentence column size");
    }
    for (size_t i = 0; i < path_column.size(); i++) {
        for (const auto& entry : std::filesystem::directory_iterator(path)) {
            if (entry.is_regular_file() && entry.path().filename() == path_column[i]) {
                std::string sentence = sentence_column[i];
                data.first.push_back(calculate_spectrogram(read_audio(path.string() + "/" + path_column[i])));
                data.second.push_back(string_to_tensor(sentence));
                std::cout << path_column[i] << " " << sentence << std::endl;
                if (data.first.size() >= 1) {
                    return data;
                }
            }
        }
    }
    return data;
}
int main(int argc, char* argv[]) {
    mi_version();
    try {
        int input_size = WINDOW_SIZE / 2 + 1;
        int hidden_size = 128;
        int num_classes = 30;
        int num_layers = 2;
        std::shared_ptr<SpeechRecognitionModelImpl> model = std::make_shared<SpeechRecognitionModelImpl>(input_size, hidden_size, num_classes, num_layers);
        torch::load(model, "nn/nn2.pt");
        auto data = get_train_data("data/clips");
        std::vector<int> input_lengths, target_lengths;
        for (const auto& input : data.first) input_lengths.push_back(input.size(0));
        for (const auto& target : data.second) target_lengths.push_back(target.size(0));
        int epochs = 10;
        if (argc == 2) {
            epochs = std::stoi(std::string(argv[1]));
            std::cout << "Epochs = " << epochs << std::endl;
        }
        model->train(data.first, data.second, input_lengths, target_lengths, epochs);
        torch::save(model, "nn/nn2.pt");
        std::cout << tensor_to_string(model->forward(calculate_spectrogram(read_audio("data/clips/common_voice_en_41047776.mp3"))));
    }
    catch (const std::exception& ex) {
        std::cout << ex.what() << std::endl;
    }
    return 0;
}

constexpr int WINDOW_SIZE = 1024;
constexpr int HOP_SIZE = 512;
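One thing that may be worth checking in the code above: `string_to_tensor` produces floating-point values in [0, 1), while CTC loss is normally fed integer class indices, with index 0 reserved for the blank by default (which is also what `decode_greedy` assumes). The sketch below, in Python for brevity, shows that kind of label mapping and the matching greedy decode. The alphabet is an assumption (the post only says `num_classes = 30`), the helper names are hypothetical, and whether this is the actual cause of the negative loss has not been verified.

```python
# Hypothetical alphabet: index 0 is reserved for the CTC blank symbol.
ALPHABET = " 'abcdefghijklmnopqrstuvwxyz"
BLANK = 0
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}
INDEX_TO_CHAR = {i: ch for ch, i in CHAR_TO_INDEX.items()}


def encode_text(text):
    """Map a transcript to integer class indices, skipping unknown characters."""
    return [CHAR_TO_INDEX[ch] for ch in text.lower() if ch in CHAR_TO_INDEX]


def greedy_decode(frame_indices):
    """Collapse repeated indices, then drop blanks (same rule as decode_greedy)."""
    decoded = []
    prev = None
    for idx in frame_indices:
        if idx != prev and idx != BLANK:
            decoded.append(INDEX_TO_CHAR[idx])
        prev = idx
    return "".join(decoded)


targets = encode_text("hi")
print(targets)
# Per-frame argmax output for "hi", with repeats and blanks in between.
frames = [0, targets[0], targets[0], 0, targets[1], targets[1], 0]
print(greedy_decode(frames))
```

Under this scheme the target tensor would hold integer indices, and the printed transcript would come from the greedy decode of the per-frame argmax rather than from the raw log-probabilities.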
I have a small team of 10 and I'm thinking of buying a prebuilt solution to get started. Have any of you used these prebuilt AI deployments, or do you have experience customizing them for mining? I'd love to hear about your experiences.
I am new to this. I used the code from the link to train on my custom dataset, and it works. Now I want to use the same code but change the model to EfficientDet D1. The line below is how the config is handled in the default code, but it doesn't support the EfficientDet D1 model. So I downloaded the EfficientDet D1 config file, but I don't know how to reference it. Can anyone help? I would like to keep using the default code, and I don't mind changing the config file parameters manually. Thanks in advance!
exp_config = exp_factory.get_exp_config('retinanet_resnetfpn_coco')
I want to train an audio model. The code:
https://github.com/tsurumeso/vocal-remover
The training/validation datasets consist of pairs: one version is the mix with the vocals and instruments; the other is the same song without the vocals.
Since the datasets should represent real-world scenarios, I have some songs in the training dataset where the vocals are quieter than the instruments.
Should I make the vocals in those mix files louder?
My thought was that the model won't be able to tell the vocals and instruments apart in those songs, because the vocals are too quiet and therefore hard for the model to "find" during training.
But I worry that if the dataset has no songs like this, my model will have trouble separating songs outside of the datasets where the vocals are also quieter than the instruments.
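If the instrumental track is sample-aligned with the mix, the vocal stem can be approximated as mix minus instrumental, and the vocal-to-instrument level can be measured and changed before remixing. A minimal sketch of that arithmetic on plain Python lists; the function names and toy signals are hypothetical, and nothing here comes from the vocal-remover repository:

```python
import math


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def level_ratio_db(vocals, instruments):
    """Vocal level relative to the instruments, in decibels."""
    return 20.0 * math.log10(rms(vocals) / rms(instruments))


def remix(mix, instruments, vocal_gain_db):
    """Rebuild the mix with the vocals raised or lowered by vocal_gain_db."""
    gain = 10.0 ** (vocal_gain_db / 20.0)
    # Assumes the two tracks are sample-aligned, so mix - instruments = vocals.
    vocals = [m - i for m, i in zip(mix, instruments)]
    return [i + gain * v for i, v in zip(instruments, vocals)]


# Toy signals: vocals at a tenth of the instrument amplitude (roughly -20 dB).
instruments = [math.sin(0.05 * n) for n in range(1000)]
vocals = [0.1 * math.sin(0.3 * n) for n in range(1000)]
mix = [i + v for i, v in zip(instruments, vocals)]

print(round(level_ratio_db(vocals, instruments), 1))
louder = remix(mix, instruments, vocal_gain_db=6.0)
```

Drawing the gain at random from a range, instead of applying one fixed boost, would keep quiet-vocal examples in the training data; whether that helps this particular model is something to test rather than assume.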
Hey all! I'm taking a course on machine learning this quarter and one of the assignments given to us is to code a basic fully-connected neural net from scratch using Python. The goal here is to classify digits from the MNIST dataset.
So I built a basic implementation of a network with a hidden layer in between input and output and started training.
Something I noticed early on was that the network converged very differently from training run to training run, and to be honest I expected that, given that my parameters are initialized randomly. However, I've noticed that the vast majority of runs converge to fairly middling accuracies/cost values (e.g. 30-50%), with only a small fraction of training runs ever going above 70%. I get that local minima are a thing here, but I didn't expect there to be so many.
Some measures I took to prevent this/improve performance include:
Is it just normal for the vast majority of training runs to get stuck like this? I was led to believe that the network would usually still converge to a fairly good fitness over time, but that's not been true for me. The implementation seems good, as the network still converges to a lower cost.
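Since the runs differ only in their random initial parameters, the initialization scale is one thing to check. A minimal sketch of Glorot (Xavier) uniform initialization for a 784-input network; this is a commonly used scheme and an assumption here, not necessarily what the assignment or the post's code uses, and the hidden size of 64 is made up:

```python
import math
import random


def glorot_uniform(fan_in, fan_out, rng):
    """Weight matrix (fan_out rows, fan_in columns) drawn from U(-limit, limit)."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_in)]
            for _ in range(fan_out)]


rng = random.Random(0)  # fixed seed so a run can be repeated exactly
hidden_weights = glorot_uniform(784, 64, rng)  # input -> hidden (64 is hypothetical)
output_weights = glorot_uniform(64, 10, rng)   # hidden -> 10 digit classes
hidden_biases = [0.0] * 64
output_biases = [0.0] * 10

print(len(hidden_weights), len(hidden_weights[0]))
```

Fixing the seed also makes it possible to compare a stuck run and a good run with everything else held equal.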
I've been working with AI for a while. I usually build AI frameworks from scratch using PyTorch, but I have zero experience in turning these frameworks into actual products.
I used to think that I only needed to write an API for the backend to call, but now I keep hearing terms like CI/CD, Docker, and MLOps (or even LLMOps). I’m realizing there's a lot I don’t know about serving models in production.
Where can I start learning about these concepts? How do I go from a "toy framework" to something that’s actually deployed and running? Is there any place or course where I can practice with third-party platforms like AWS? Any resources or advice would be greatly appreciated!
I have 8 years of experience as a backend SWE. I have a strong interest in ML, but I wasn't getting any opportunities for it in my job. Recently I worked on a project where I had to build an ML pipeline from scratch and collaborate with analysts. I was able to pull off the project and learn new technologies. Unfortunately, there's no option to transition internally to MLE at my company.
I'm looking for help or advice on whether I can make an external switch to MLE. I know it requires a lot of effort, but would I get any interview calls for MLE roles?
How are these roles similar/different in the current times? Also meant to add Applied Scientist to the list.
Any projects or groups that I could participate in/contribute to?
I really want to get into the field of machine learning. I have fiddled around with ML, and more specifically deep learning, for the past couple of years: a few side projects, a few courses, but nothing more serious than that. If anyone has a project going on, something with structure and purpose that I could contribute to on the software side (for free, of course) in exchange for learning something new, I would be more than grateful. Thank you :D
Nvidia just released the Jetson Orin™ Nano Super and I'm trying to understand how to interpret the specs. Obviously, more FLOPS is better for both training and inference, but I'm looking for resources on how to decode the different levers that affect training and/or inference capability (size, speed, energy, etc.): things like CUDA cores, memory size and bus speed, FLOPS, and any other parameters that might matter.
Anyone know of resources or want to take a quick crack at this?
Hi everyone, I’m developing a custom chatbot focused on Canadian laws (for lawyers). I’ve noticed that a popular option is to use a vector database that indexes the information gathered during a training phase and then passes it as context to the OpenAI model when it receives a prompt. I know there are pre-trained models like GPT-2 and Llama that I can download and fine-tune with my own data, but that seems to be quite costly. In your opinion, what is the best solution for this type of problem in terms of cost/benefit? Thank you.
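The retrieval approach described above (index documents, fetch the closest ones for a question, pass them as context) can be sketched without any external service. The example uses word-count vectors and cosine similarity as a stand-in for learned embeddings and a vector database; the documents, names, and prompt format are hypothetical placeholders:

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(question, documents, top_k=1):
    """Return the top_k documents most similar to the question."""
    q = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:top_k]


# Placeholder snippets, not real legal text.
documents = [
    "Placeholder note about limitation periods for civil claims.",
    "Placeholder note about employment notice requirements.",
    "Placeholder note about residential tenancy deposits.",
]

question = "What notice is required to end employment?"
context = retrieve(question, documents, top_k=1)
# The retrieved text is prepended to the question before it is sent to the model.
prompt = "Context:\n" + "\n".join(context) + "\n\nQuestion: " + question
print(prompt)
```

In this setup the language model itself is not retrained; only the index changes when the source documents change, which is the main cost difference from fine-tuning.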
I've worked on some great projects in computer vision (CV), like image segmentation and depth estimation (stereo vision), and I'm currently in my final year. While LLMs (large language models) are in high demand compared to CV, I believe there could be a potential saturation in the LLM space, as both job seekers and industries seem to be aligning in the same direction. On the other hand, the pool of talent in CV might not be as large, which could create more opportunities in this field. Is this perspective accurate?
#computerVision #LLM #GenAI #MachineLearning #DeepLearning
Hello,
I am currently preprocessing a large dataset for ML purposes, and I am struggling with encoding strings as numbers. The dataset consists of blockchain transactions, and I have the sender and receiver addresses for each transaction. I use PySpark.
I've tried StringIndexer, but it throws out-of-memory errors due to the number of unique values. How should I approach this? Is hashing with SHA-256 and casting to a big integer a good approach? Wouldn't such big numbers influence ML methods too much? (I will try different methods, e.g. random forests, GANs, some distance-based methods, etc.)
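A raw SHA-256 digest cast to an integer would be treated as a magnitude by most models, which is the concern raised above. One common alternative is the hashing trick: map each address to one of a fixed number of buckets and treat the bucket as a categorical ID. A minimal sketch in plain Python; the bucket count and sample addresses are hypothetical, and collisions between addresses are accepted by design:

```python
import hashlib


def address_bucket(address, num_buckets=1 << 20):
    """Deterministically map an address string to a bucket in [0, num_buckets)."""
    digest = hashlib.sha256(address.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce to a bucket.
    return int.from_bytes(digest[:8], "big") % num_buckets


# Made-up address strings, for illustration only.
sender = "0xabc0000000000000000000000000000000000001"
receiver = "0xabc0000000000000000000000000000000000002"

print(address_bucket(sender), address_bucket(receiver))
```

The bucket index carries no order, so distance-based methods would likely still need a sparse or embedded representation on top of it. PySpark's `pyspark.ml.feature` module includes a `FeatureHasher` transformer built on the same idea, which avoids building a full vocabulary of unique values.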
I am given data from a marketing campaign that has been conducted. Unfortunately, the people who were selected for communication are statistically different from the people in the control group. Please suggest ways to take this into account in order to build an uplift model.
At the moment I know approaches based on matching techniques (propensity score, Mahalanobis distance, and coarsened exact matching), but I would like to know other options for solving this problem.
I have several PDFs (presentations saved with a .pdf extension) that consist of images containing a lot of important text. How can I extract the text from all of the images at once, rather than going page by page?
Hey guys,
I’m sure some of you have faced the challenge of dealing with the high costs of renting cloud resources to train large language models. As a machine learning enthusiast living in a third-world country, the cost becomes unsustainable pretty quickly, and it’s hard to justify.
I’m curious: has anyone else run into this issue? How are you handling the cost of training models, or are you finding alternative ways to get the performance you need for ML tasks without breaking the bank?
Would love to hear your thoughts and experiences!
Hi everyone, How do you keep up with the latest publications in your field of interest? For example, my major is NLP and I'm interested in some specific problems like Information Extraction, Graph NN, KG, LLM. Are there any tools, websites, or strategies you use to stay informed?
The question is in the title. I'm a first-year CS student and just learned about Taylor series. Seeing as the point of ML is to learn a function that approximates the real world, if I were to build a model and train it with x as the inputs and e^(x) as the expected outputs, would the neural network just learn the Taylor series for e^(x)?
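For reference, the degree-n Taylor polynomial of e^x around 0 is the sum of x^k / k! for k = 0..n. The sketch below compares it with `math.exp`; it only illustrates that the polynomial is accurate near the expansion point and degrades farther away, and says nothing about what a trained network represents internally:

```python
import math


def taylor_exp(x, degree):
    """Degree-n Taylor polynomial of e^x around 0: sum of x^k / k!."""
    total = 0.0
    term = 1.0  # x^0 / 0!
    for k in range(degree + 1):
        total += term
        term *= x / (k + 1)  # next term: x^(k+1) / (k+1)!
    return total


# Compare a degree-4 polynomial with the true value at increasing distances from 0.
for x in (0.5, 2.0, 5.0):
    approx = taylor_exp(x, degree=4)
    print(x, approx, math.exp(x), abs(approx - math.exp(x)))
```

A network trained on samples from a fixed interval is fitted to minimize error over that interval, whereas a Taylor polynomial is built from derivatives at a single point, so the two need not coincide.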
Are there any solutions, whether open-source and self-hosted or proprietary, free or paid (but preferably free, haha), that would allow automated posting to a WordPress blog or website, or, for example, to a Telegram channel, using neural networks (like ChatGPT or perhaps even a self-hosted Llama)?
That is, solutions that can automatically rewrite material from user-specified sources and create posts from it?
I've seen some websites that look very much like they were written by neural networks. Some even seem not to bother with manual curation of materials. What solutions are they using for these tasks?
I’m a final-year student exploring machine learning jobs. I know basics like supervised learning, regression, and Python libraries, but I’m unsure if I need deep learning or advanced concepts, or if solid projects and practical skills are enough. What do companies expect from freshers?
https://www.youtube.com/watch?v=xUbD3YaDZYM - APPLY ASAP
Hello everyone!
I hope you're all doing well. I have an upcoming interview for a startup for a mid-senior Computer Vision Engineer role in Robotics. The position requires a strong focus on both classical computer vision and 3D point cloud algorithms, in addition to deep learning expertise.
For the classical computer vision and 3D point cloud aspects, I need to review topics like feature extraction and matching, 6D pose estimation, image and point cloud registration, and alignment. Do you have any tips on how to efficiently review these concepts, solve related problems, or practice for this part of the interview? Any specific resources, exercises, or advice would be highly appreciated. Thanks in advance!
Hey all! I'm currently doing a project involving NER and ABSA using text data from social media posts for training (Facebook, Instagram, etc). I know Twitter doesn't provide read access for the free API tier. I've heard that Meta's Graph API can do it but I'm not sure on the exact how-to. Can anyone help a beginner out on this? I only need the text data based on a keyword search. For example, "Apple" to extract all posts containing that word
I am using a neural network to predict network attacks from features extracted from the packets transmitted in each outgoing flow of an IP. The model works well when predicting individual flows. However, when I group the flows of each IP into a data frame, predict on that frame, and vote to decide whether the IP is malicious, performance drops significantly, even though the per-flow predictions are very good. Is there any way to improve the model's performance in this setting?
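The per-IP aggregation step can be made explicit. The sketch below contrasts a hard majority vote over per-flow predictions with averaging the per-flow probabilities, using made-up scores; the 0.5 threshold and the function names are assumptions, and which rule suits this data is something to validate on held-out IPs:

```python
def majority_vote(flow_probs, threshold=0.5):
    """Flag the IP if more than half of its flows are individually malicious."""
    votes = [p >= threshold for p in flow_probs]
    return sum(votes) > len(votes) / 2


def mean_probability(flow_probs, threshold=0.5):
    """Flag the IP if the average malicious probability reaches the threshold."""
    return sum(flow_probs) / len(flow_probs) >= threshold


# Made-up per-flow malicious probabilities for one IP:
# most flows look benign, two are confidently malicious.
flows = [0.1, 0.2, 0.1, 0.95, 0.99, 0.3, 0.2]

print(majority_vote(flows))      # only 2 of 7 flows vote malicious
print(mean_probability(flows))   # the mean is about 0.41
print(max(flows) >= 0.9)         # an "any confident flow" rule
```

If malicious IPs also generate many benign-looking flows, both the majority vote and the mean can miss them even when each flow is classified correctly, so the aggregation rule and its threshold may matter as much as the per-flow model.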