/r/MachineLearning
Beginners please see learnmachinelearning
Please have a look at our FAQ and Link-Collection
Metacademy is a great resource which compiles lesson plans on popular machine learning topics.
For Beginner questions please try /r/LearnMachineLearning , /r/MLQuestions or http://stackoverflow.com/
For career related questions, visit /r/cscareerquestions/
AMAs:
Pluribus Poker AI Team (7/19/2019)
DeepMind AlphaStar team (1/24/2019)
Libratus Poker AI Team (12/18/2017)
DeepMind AlphaGo Team (10/19/2017)
The MalariaSpot Team (2/6/2016)
OpenAI Research Team (1/9/2016)
Andrew Ng and Adam Coates (4/15/2015)
Hey there,
For some context, I am graduating this May with my bachelor's in SE and will be starting a full-time data science role at Raytheon. A master's would essentially be paid for by my company, and managers have asked me whether it is something I would want to pursue.
I keep reading that a master's is useless unless you have a PhD. How does that apply to me, given that I would have about 2-3 years of experience at my current job upon completion?
Thanks
I did object detection a few years ago, when Faster R-CNN, SSD and YOLO were popular, together with backbones like ResNet and VGG.
Coming back to an object detection task today in 2024, I can't find any major improvements or really new architectures. Am I missing something, or is this still the SOTA?
Thanks! :)
I want to implement a service that checks for broken HTML files and pinpoints the exact locations of errors, such as missing tags, excessive tags, unexpected special characters, etc. But I don't know all the ways in which an HTML file can become invalid beforehand.
I have a large dataset containing both valid and invalid HTML files. So far, I’ve chosen an LSTM model, which effectively reconstructs missing tags. Then, I compare the reconstructed text with the original and show the diff.
However, I’m unsure if this model will fulfill all my requirements or if there might be a better option available for my needs. I would appreciate any advice.
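For comparison with the learned approach, a rule-based baseline can already pinpoint some of the error classes mentioned above (unclosed and unexpected tags). The following is only a minimal sketch using Python's standard `html.parser`; the class name `TagBalanceChecker`, the function `find_tag_errors`, and the list of void tags are illustrative assumptions, and it does not replace the LSTM model or cover every way an HTML file can be invalid.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag (assumed subset of the HTML void elements).
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TagBalanceChecker(HTMLParser):
    """Tracks open tags on a stack and records where the balance breaks."""

    def __init__(self):
        super().__init__()
        self.stack = []   # (tag, (line, column)) for every tag still open
        self.errors = []  # (line, column, message)

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append((tag, self.getpos()))

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        line, col = self.getpos()
        if tag not in [open_tag for open_tag, _ in self.stack]:
            self.errors.append((line, col, f"unexpected closing tag </{tag}>"))
            return
        # Pop until the matching opener; everything skipped was never closed.
        while self.stack:
            open_tag, (open_line, open_col) = self.stack.pop()
            if open_tag == tag:
                break
            self.errors.append((open_line, open_col, f"unclosed tag <{open_tag}>"))


def find_tag_errors(html_text):
    """Return a sorted list of (line, column, message) tuples."""
    checker = TagBalanceChecker()
    checker.feed(html_text)
    checker.close()
    # Anything still on the stack at end of input was never closed.
    for open_tag, (line, col) in checker.stack:
        checker.errors.append((line, col, f"unclosed tag <{open_tag}>"))
    return sorted(checker.errors)
```

Calling `find_tag_errors(text)` on a file's contents gives line and column positions that could be shown next to the diff produced by the model.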
Hi everyone. I'm a newcomer to generative AI, and I am currently diving into the inner structure of Stable Diffusion. I noticed that the VAE encoder of SD gradually increases the channels to 512, but then suddenly drops to only 4 when generating the latent vectors. It's like having a very thin bottleneck after a very broad tunnel. Why is it designed like that? Is 64*64*4 really sufficient for expressing so many possible features?
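To put the 64*64*4 figure in perspective, the arithmetic can be sketched as below. This assumes a 512x512 RGB input and an 8x spatial downsampling (which is what a 64x64 latent implies for that input size); the function name and default arguments are illustrative, not taken from any Stable Diffusion code.

```python
def latent_compression(image_size=512, image_channels=3,
                       downsample=8, latent_channels=4):
    """Compare the number of values in the image with the number in the latent."""
    image_values = image_size * image_size * image_channels
    latent_size = image_size // downsample
    latent_values = latent_size * latent_size * latent_channels
    return image_values, latent_values, image_values / latent_values


image_values, latent_values, factor = latent_compression()
print(image_values, latent_values, factor)  # 786432 16384 48.0
```

Under these assumptions the latent holds 48 times fewer values than the input image, which is the size of the bottleneck the question is about.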
I've heard multiple people complain about things breaking down during multi-node training, as well as about trouble getting the most out of their infra. But I haven't heard of a reliable solution yet.
Is a combination of Ray with SLURM an approach that can solve this? I'm guessing it's not enough otherwise this wouldn't be a problem anymore.
I'm interested in learning more about this topic. So feel free to share any learning resource too.
Thanks
I'm a master's student, and for a few weeks now I've been working on a project which is strongly related to the work done by researcher A in a recent paper. The project is going well, and I think it could make a great publication. Unfortunately, I don't know anyone who works on this topic, and some guidance would be useful to defend the project properly. To be clear, the project is already well advanced; it's not just an idea.
I was thinking of trying to cold email researcher A, to see if he'd be interested in collaborating on this project, but I have my doubts:
1/ Is it weird to reach out cold by email? I guess people usually use conferences for this kind of networking, but I haven't had the chance yet.
2/ Should I be cautious about the information I send to this person? He's a well-known researcher, so I imagine it's pretty safe, but I wouldn't want to be scooped because of an email.
I know I'm giving little information about the situation, but I'd be happy to hear any advice or experiences, if you've ever done something similar and whether or not it was successful
Assuming I had a GPU that could load a 7B model without compression, is 4-bit quantization faster for inference? Or do the 4-bit weights need to be decompressed first, making 4-bit quantization slower?
Here is sample code to load Mistral:
import torch  # needed for torch.bfloat16 below
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

device = "cuda"  # the device to load the model onto
model_id = "mistralai/Mistral-7B-Instruct-v0.2"

# 4-bit NF4 quantization with double quantization; matmuls are computed in bfloat16
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map="auto")
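As a rough sense of scale for the two options being compared, the weight storage can be estimated as below. This is only a back-of-the-envelope sketch: it assumes a round 7e9 parameters, counts weights only (no quantization constants, activations, or KV cache), and says nothing about inference speed, which depends on the kernels used. The function name is illustrative.

```python
def weight_memory_gb(n_params, bits_per_weight):
    """Approximate storage for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


n_params = 7e9  # assumed round figure for a "7B" model
print(weight_memory_gb(n_params, 16))  # 14.0 (16-bit weights)
print(weight_memory_gb(n_params, 4))   # 3.5  (4-bit weights)
```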
I would like opinions on several data science certificates. Certificates are a great way to validate your skills, and they can help you get hired.
I will list a few certificates I am interested in enrolling in and finishing.
My main question is:
Which of the following certificates are "legit"? By that I mean: which of them are high-quality, valuable, highly regarded, prestigious, and solid proof of your skills?
I am asking in the context of building a career as a data scientist in the finance domain.
I would also be glad if you shared your experience if you have taken any of them before. If you recommend any other certificates, please share.
I am planning to create a tourism app with a function that helps tourists avoid scams by improving their communication with locals. The app will analyze conversations and detect potential scams.
Do you have any suggestions on how to approach the task of finding a suitable data set and training a model? It seems like a challenging task.
So I was on Twitter (first mistake) and mentioned my neural network in Java, and was ridiculed for using an "outdated and useless language" for the NLP model that I have built.
To be honest, this is my first NLP model. I did, however, create a Python application that uses a GPT-2 pipeline to generate stories for authors, but the rest of the infrastructure was in Java and I just created a Python API to call it.
I love Java. I have eons of code in it going back to 2017. I am a hobbyist and do not expect to get an ML position, especially with the market the way it is now. I do, however, have the opportunity at my Business Analyst job to show off some programming skills and use my very tiny NLP model to perform some basic predictions on some ticketing data, which I am STOKED about by the way.
My question is: Am I a complete loser for using Java going forward? I am learning a bit of robotics and plan on learning a bit of C++, but I refuse to give up on Java, since so far it has taught me a lot and produced great results for me.
I'd like your takes on this. Thanks!
In my project, I am training a model to segment tumors in lung CT scans.
My training dataset comprises 450 patients, each with a tumor. Typically, a patient's data consists of approximately 200 slices, with around 20 slices containing lesions. Consequently, the dataset consists of roughly 90,000 images (512x512), of which only 9,000 contain small tumors.
This is sample data for one slice with a tumor:
For the model, I use a 2D U-Net from torch.hub and train it with both BCEWithLogitsLoss and Dice loss, with weight = 10.0.
Despite my efforts, I am perplexed as to why the loss fails to decrease during training.
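The combination of the two losses can be sketched in plain Python as follows. This is a hedged illustration only: it operates on flat lists of logits and 0/1 targets rather than tensors, the function names are hypothetical, and applying the weight of 10.0 to the Dice term is an assumption (the post does not say which term the weight belongs to).

```python
import math


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy on raw logits, in the numerically stable form."""
    total = 0.0
    for z, t in zip(logits, targets):
        total += max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def soft_dice_loss(logits, targets, eps=1.0):
    """1 - soft Dice coefficient, computed on sigmoid probabilities."""
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    intersection = sum(p * t for p, t in zip(probs, targets))
    denominator = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)


def combined_loss(logits, targets, dice_weight=10.0):
    # Assumption: the weight of 10.0 scales the Dice term.
    return bce_with_logits(logits, targets) + dice_weight * soft_dice_loss(logits, targets)


targets = [0, 0, 0, 1]                       # one tumor pixel out of four
perfect = [-20.0, -20.0, -20.0, 20.0]        # confident and correct
all_background = [-20.0, -20.0, -20.0, -20.0]  # predicts "no tumor" everywhere
print(combined_loss(perfect, targets))
print(combined_loss(all_background, targets))
```

Comparing the two printed values shows how strongly the combined loss penalizes an all-background prediction when a lesion is present, which is the regime that matters with only a small fraction of tumor pixels.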
It doesn't look to be the case, but I also don't have any experience with these libraries, so I don't know exactly. Does one of you know?
So I set up whisperx and I love it, but because of how the dependencies work I have to run it in a conda environment. This is generally not an issue for me, except that the file locations in the error and output messages aren't correct; I guess they assume you aren't working in a conda env. When you load the whisper model right after starting the kernel, Lightning automatically upgrades my loaded checkpoint from v1.5.4 to v2.2.2. That takes 15 minutes, which is honestly way too long to wait every time I start the kernel. The annoying thing is that the upgrade isn't permanent (it is gone again once the kernel is powered off), and I get the following instructions:
To apply the upgrade to your files permanently, run
python -m pytorch_lightning.utilities.upgrade_checkpoint ../.cache/torch/whisperx-vad-segmentation.bin
When I copy and paste the bash command into my terminal, it says the file or directory doesn't exist, so I'm assuming that's the path it would have if I weren't in a conda env. But I cannot for the life of me find the checkpoint file so that I can just swap out the path. I've searched the directory where all the files for that conda env are and can't find it. I also did a whole-disk search and couldn't find it. ChatGPT was super useless, and I can't find anything about this online. Does somebody know how I can find this file?
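One way to script the search is sketched below with the standard library. The helper name `find_file` is hypothetical, and the candidate roots in the usage comment are assumptions (a `~/.cache/torch` directory suggested by the relative path in the message, and the `TORCH_HOME` environment variable that PyTorch uses for its cache); whether whisperx stores the file in either place is not confirmed here.

```python
import os


def find_file(filename, roots):
    """Walk each root directory and return every full path whose name matches."""
    matches = []
    for root in roots:
        root = os.path.expanduser(root)
        # os.walk silently skips directories that are missing or unreadable.
        for dirpath, _dirnames, filenames in os.walk(root):
            if filename in filenames:
                matches.append(os.path.join(dirpath, filename))
    return matches


# Example usage (the roots are guesses, adjust as needed):
# roots = ["~/.cache/torch", os.environ.get("TORCH_HOME", "~"), "~"]
# print(find_file("whisperx-vad-segmentation.bin", roots))
```

Any path it prints can be substituted for the relative path in the upgrade command.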
If your answer is a model, keep data availability and hardware limitations in mind.
Any recommendations on computer cases that fit multiple GPUs (like the RTX 4090) without having to go the watercooling route to slim the GPUs down so they fit?
Can any Bitcoin mining cases be repurposed for ML server rigs?
I've conducted research in NLP for a while now and have been implementing solutions in industry for the past couple of years. Like many other companies, my company's management was also "wowed" by ChatGPT and has pushed for LLM R&D for a while now. Other than using them for text generation, however, I don't see how good the ROI is.
A lot of the tasks that I'm working on tend to focus on semantic search, information extraction (NER, RE), text-image representation learning, etc. These tasks can be handled very well with well-trained BERT and CLIP models and I don't think that the effort put into developing LLMs would be worth it. The recent research also seems to support that for IE tasks traditional supervised methods are still where it's at.
Are there any other use cases that you guys have found LLMs to excel at?
I'm curious about how we can verify the internal computations of a language model when the output does not directly reflect the processed information. For instance, consider a scenario where a model is asked to process a query, but is instructed to output only a "!" instead of the actual answer.
What mechanisms or tools are available to confirm that a language model has executed the right computation, especially when the output is intentionally obfuscated or simplified?
Edit: Attempt to rephrase the question with more clarity.
Hands-On with Cognita: a modular, open-source RAG application for prod, built w/ scalability in mind.
This comprehensive guide provides a hands-on approach to building and deploying Retrieval Augmented Generation (RAG) systems using Cognita, a modular and open-source framework designed with scalability in mind. The article walks you through the process of setting up Cognita, ingesting your knowledge base, and leveraging its core features to develop production-ready RAG applications.
🔹 Seamless Setup: Step-by-step instructions guide you through cloning the Cognita repository, creating a virtual environment, installing dependencies, and configuring the necessary files.
🔹 Knowledge Base Ingestion: Learn how to ingest and process your knowledge base, including chunking documents, generating embeddings, and indexing into a vector database.
🔹 Modular Architecture: Cognita promotes separation of concerns by decoupling the data ingestion process from the query service, enabling independent scaling and efficient updates.
🔹 Scalable Deployment: Explore how Cognita facilitates scalable deployments by integrating with cloud infrastructure, allowing independent scaling of components based on resource requirements.
🔹 Production-Ready: Discover the key challenges addressed by Cognita when transitioning a RAG system from a development environment to production, such as chunking and embedding large datasets, building scalable query services, deploying language models and embedding models as separate services, and setting up robust vector databases.
🔹 Hands-On Example: Follow along with a practical example that demonstrates ingesting a credit card knowledge base, configuring the RAG system, and testing the question-answering capabilities using cURL requests.
Whether you're a seasoned developer or new to RAG systems, this article equips you with the knowledge and tools to build modular, flexible, and scalable production-ready RAG applications using the Cognita framework.
Large language models (LLMs) are general-purpose NLP task solvers. They can handle many NLP tasks through plain English, the earliest "programming language", unlike discriminative models, which need complicated pipelines.
As a PhD student, I have been a little anxious about the future recently. My advisor supports a funded project on knowledge extraction and knowledge graphs. But I think the future must be LLMs. My current plan is to do some RAG work after completing the NER and RE project at hand.
Do you guys have any thoughts about the traditional NLP tasks (text classification, NER, RE, etc.) in the era of LLMs?
TLDR at the bottom
My Background
I am an electrical engineer by trade. I started learning about ML/AI almost a year ago. I have taken the Machine Learning Specialization by Andrew Ng on Coursera, and most of the GAN specialization, also on Coursera. I'm comfortable in Python, but definitely no expert. Most of my coding and AI/ML knowledge is self-taught, so there are definitely huge holes in my knowledge.
My Goal
I am trying to create a video generator model. To that end I found an architecture that supposedly works very well called TGANv2, which is written in Chainer. I want to edit this model and eventually put it on a microcontroller. I am also treating this as a learning exercise, so I copied the model in PyTorch and got similar results to the Chainer model. Here is the PyTorch model, if you are interested.
Now, I was playing around with making edits in PyTorch and found the training slow and sometimes expensive (I don't have a GPU; I am using a runpod VM with a GPU). I tried changing the generator convolutions to depth-wise separable ones, and it considerably increased training time, which is odd considering it cut my generator size down to about 1/10 or so. So I decided to re-create the model in TensorFlow to see if it would train faster, and because I think I will eventually have to write the model in TensorFlow Lite to get it on a microcontroller. But I keep running into a Floating Point Exception error. More on that below.
Here is my TensorFlow code: Model (It's long; I have a snippet of the training loop a bit further down, which I think is more relevant).
Setup
Runpod VM with an NVIDIA A40 GPU running Ubuntu 22.04. Originally running CUDA 11.8, TensorFlow 2.14, and cuDNN 8.7 (though I tried a bunch of different combinations, explained below). I am also running this in a conda virtual environment.
Issue I am running into
When I run my TensorFlow code, it instantiates the model just fine. It starts the training loop and goes through forward propagation just fine. But when it gets to the line that is meant to calculate the discriminator gradients, it keeps throwing a Floating Point Exception error. I have tried multiple different combinations of CUDA, TensorFlow, and cuDNN. Also, when I run this code on my local machine's CPU, it calculates the discriminator gradients and doesn't throw a floating point exception error (it has another error later on, but I will cross that bridge when I get there).
Here is the pertinent part of my training loop (snipped for brevity):
import time
import tensorflow as tf

# Create dataloader
dataset = VideoDataset(directory)
dataloader = tf.data.Dataset.from_generator(
    lambda: iter(dataset),  # from_generator needs a callable that returns an iterator
    output_signature=(
        tf.TensorSpec(shape=(16, 32, 32, 3), dtype=tf.float32),
        tf.TensorSpec(shape=(8, 64, 64, 3), dtype=tf.float32),
        tf.TensorSpec(shape=(4, 128, 128, 3), dtype=tf.float32),
        tf.TensorSpec(shape=(2, 256, 256, 3), dtype=tf.float32)
    )
).batch(batch_size)

print("Starting Training...")
for epoch in range(n_epochs):
    start_time = time.time()
    total_gen_loss, total_disc_loss, total_loss_real = 0, 0, 0
    num_batches = 0
    for batch in dataloader:
        # Adjust learning rate
        iteration += 1
        x_real = batch  # tuple of four tensors, one per resolution
        num_batches += 1
        current_batch_size = x_real[0].shape[0]
        noise = tf.random.uniform((batch_size, z_dim), -1.0, 1.0)
        # Discriminator loss
        print("Generated noise")
        with tf.GradientTape(persistent=True) as gen_tape, tf.GradientTape() as disc_tape:
            x_fake = gen(noise, training=True)
            print("Generated x_fake")
            y_fake = disc(x_fake, training=True)  # Compute once for both G and D updates
            y_real = disc(x_real, training=True)
            print("Generated y_fake and y_real")
            disc_loss = tf.reduce_mean(tf.math.softplus(y_fake)) + tf.reduce_mean(tf.math.softplus(-y_real))
            gen_loss = tf.reduce_mean(tf.math.softplus(-y_fake))
        # Use the checking function instead of tf.debugging.check_numerics
        check_and_report(y_fake, "y_fake")
        check_and_report(y_real, "y_real")
        check_and_report(disc_loss, "disc_loss")
        check_and_report(gen_loss, "gen_loss")
        total_disc_loss += disc_loss
        loss_real = tf.reduce_mean(tf.math.softplus(-y_real))
        total_loss_real += loss_real
        # Apply discriminator loss gradients
        print("Calculating disc gradients")
        disc_gradients = disc_tape.gradient(disc_loss, disc.trainable_variables)
        print("Applying disc gradients")
        try:
            disc_opt.apply_gradients(zip(disc_gradients, disc.trainable_variables))
        except Exception as e:
            print(f"Error applying gradients: {e}")
        print("Applied disc grad")
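For readers following the loss lines in the loop above, the same formulas can be written out in plain Python, which makes it easy to sanity-check values by hand. This is a sketch only: the function names are illustrative, inputs are flat lists of discriminator logits, and it reproduces just the softplus expressions for `disc_loss` and `gen_loss`, not the gradient step where the crash occurs.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mean(values):
    return sum(values) / len(values)


def discriminator_loss(y_fake, y_real):
    # softplus(y_fake) penalizes high scores on fakes,
    # softplus(-y_real) penalizes low scores on real samples.
    return mean([softplus(y) for y in y_fake]) + mean([softplus(-y) for y in y_real])


def generator_loss(y_fake):
    # Non-saturating form: the generator wants high scores on fakes.
    return mean([softplus(-y) for y in y_fake])


# With all logits at 0 the discriminator is maximally unsure.
print(discriminator_loss([0.0, 0.0], [0.0, 0.0]))  # 2 * ln(2)
print(generator_loss([0.0, 0.0]))                  # ln(2)
```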
And here is my terminal output:
(tf_gpu) root@07016ffd12d2:/workspace# python 2TF-TGANv2.py
Num GPUs Available: 1
Tensorflow version: 2.8.0
2024-04-14 19:46:37.328967: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-04-14 19:46:37.920705: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 43453 MB memory: -> device: 0, name: NVIDIA A40, pci bus id: 0000:52:00.0, compute capability: 8.6
Starting Training...
Generated noise
2024-04-14 19:47:03.174697: I tensorflow/stream_executor/cuda/cuda_blas.cc:1786] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
2024-04-14 19:47:03.705363: I tensorflow/stream_executor/cuda/cuda_dnn.cc:368] Loaded cuDNN version 8100
Generated x_fake
Generated y_fake and y_real
No issues in y_fake
No issues in y_real
No issues in disc_loss
No issues in gen_loss
Calculating disc gradients
Floating point exception
The "No issues" print statements come from a function that checks the named tensors for NaN and Inf values, but those tensors are only the discriminator outputs and the losses, not the gradients. Notice that the error "Floating point exception" comes right after "Calculating disc gradients", which means that this line is where the issue is:
disc_gradients = disc_tape.gradient(disc_loss, disc.trainable_variables)
I have been at this for a week and am at a loss.
What I've tried
I have essentially just tried a bunch of different combinations of TensorFlow (TF), CUDA, and cuDNN based on the "Tested Build Configurations" for GPU on TensorFlow's "Build From Source" page. Combinations I've tried (on the NVIDIA A40):
TF = 2.15, CUDA = 12.2, cuDNN = 8.9
TF = 2.14, CUDA = 11.8, cuDNN = 8.7
TF = 2.13, CUDA = 11.8, cuDNN = 8.7
TF = 2.12, CUDA = 11.8, cuDNN = 8.7
TF = 2.4, CUDA = 11.0, cuDNN = 8.0
TF = 2.5, CUDA = 11.0, cuDNN = 8.0
TF = 2.11, CUDA = 11.2, cuDNN = 8.1
TF = 2.10, CUDA = 11.2, cuDNN = 8.1
TF = 2.9, CUDA = 11.2, cuDNN = 8.1
TF = 2.8, CUDA = 11.2, cuDNN = 8.1
TF = 2.3, CUDA = 10.1, cuDNN = 7.6
TF = 2.2, CUDA = 10.1, cuDNN = 7.6
All of these give me the same "Floating point exception" error at the discriminator gradient calculation step. Does anyone have an idea of what is causing this? Again, I do NOT have this issue when running this on my local machine's CPU. It only happens when running on a runpod VM. Is this something in my code that is handled differently on a GPU? Is this caused by a crappy VM setup? If so, what could be causing it? Please help me. I really want to get this to work, and I have spent a ridiculous amount of time just trying to get TensorFlow to work on GPU lol.
TLDR
Trying to copy a video generation model, TGANv2, written in Chainer. Copied and ran it in PyTorch; works fine. When trying to run it in TensorFlow, I keep getting a "Floating point exception" error when it gets to the gradient calculation for the discriminator:
disc_gradients = disc_tape.gradient(disc_loss, disc.trainable_variables)
I am running it on a runpod VM with Ubuntu 22.04 and an NVIDIA A40 GPU. I am also running it in a conda environment. The error does not happen when running it on my local computer's CPU. I have tried 13 different combinations of TensorFlow, CUDA, and cuDNN (see the section immediately before this) based on the "Tested Build Configurations" for GPU on TensorFlow's "Build From Source" page; all have the same issue. Please help me.
Hello everyone, I am working on a project where every day we receive hundreds of lists from users of products whose prices they want to know.
Basically, what I am trying to automate is taking all the different slang terms, market names, and spellings of product names and mapping them to the names in our formal list using ML.
In short, I am trying to understand what the customer means by saying "X" and map it to the same product in a list of formal names "Y".
Can a customized NER model help with this task? The idea would be to tag the user entries containing "product name", "brand", "size", and "color" as a training dataset, and then map each entry to the formal name that has the same tags. Or how else could it work in this situation? For this task I am trying to understand:
1 - What is the shape of the dataset for training?
2 - How can I use NER to do this project?
3 - The user entries contain either both Arabic and English or purely Arabic. Will this cause any problems in the future?
If you have any experience with NER projects, whether in a similar domain or a different one, I would be super happy to hear about it :)
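For the first question, one common shape for NER training data is a list of tokens paired with one BIO tag per token. The sketch below shows such a record and a small helper that groups the tags back into attributes; the example entry, the label names (BRAND, PRODUCT, SIZE, COLOR), and the helper `group_entities` are all hypothetical illustrations, not data from the project.

```python
def group_entities(tokens, tags):
    """Collect BIO-tagged tokens into {label: [entity text, ...]}."""
    entities = {}
    label, words = None, []

    def flush():
        if label is not None:
            entities.setdefault(label, []).append(" ".join(words))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            label, words = tag[2:], [token]
        elif tag.startswith("I-") and label == tag[2:]:
            words.append(token)
        else:  # "O" or an I- tag that does not continue the current entity
            flush()
            label, words = None, []
    flush()
    return entities


# One hypothetical training record: tokens and tags have the same length.
tokens = ["need", "2", "nike", "air", "max", "size", "42", "black"]
tags = ["O", "O", "B-BRAND", "B-PRODUCT", "I-PRODUCT", "O", "B-SIZE", "B-COLOR"]
print(group_entities(tokens, tags))
# {'BRAND': ['nike'], 'PRODUCT': ['air max'], 'SIZE': ['42'], 'COLOR': ['black']}
```

The extracted attributes could then be compared against the same attributes of each formal product name; how that matching is done is a separate design choice.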
Thanks!
Are there currently any machine learning tools that can detect whether audio contains natural human speech or automated/AI speech?
Given my budget, which is better for running AI locally: 2x RTX 3060 12GB (24GB total) or a single RTX 4060 Ti 16GB?
I am trying to train a neural net which takes in two text inputs and produces a single binary classification output. I have a huge dataset (~2 million sentences) which I am processing with a simple LSTM + linear layer model in PyTorch.
However, no matter how long I train it, every example in a batch is assigned the same prediction (https://imgur.com/a/nAZP0Tp).
Any fixes?
I would love to know more about folks in fields like agriculture, health economics
Please provide
Research Field
Problem you are solving
Any interesting paper in this field?
Paper: https://arxiv.org/abs/2404.03622
Abstract:
Large language models (LLMs) have exhibited impressive performance in language comprehension and various reasoning tasks. However, their abilities in spatial reasoning, a crucial aspect of human cognition, remain relatively unexplored. Humans possess a remarkable ability to create mental images of unseen objects and actions through a process known as the Mind's Eye, enabling the imagination of the unseen world. Inspired by this cognitive capacity, we propose Visualization-of-Thought (VoT) prompting. VoT aims to elicit spatial reasoning of LLMs by visualizing their reasoning traces, thereby guiding subsequent reasoning steps. We employed VoT for multi-hop spatial reasoning tasks, including natural language navigation, visual navigation, and visual tiling in 2D grid worlds. Experimental results demonstrated that VoT significantly enhances the spatial reasoning abilities of LLMs. Notably, VoT outperformed existing multimodal large language models (MLLMs) in these tasks. While VoT works surprisingly well on LLMs, the ability to generate mental images to facilitate spatial reasoning resembles the mind's eye process, suggesting its potential viability in MLLMs.
I developed this basic, generic accelerator for CNN and fully connected layers as a university project, and I wanted to share it with you. It uses int8-quantized data and weights, like Coral TPUs. The project includes a script that converts the TensorFlow model into the instructions required by the accelerator. Additionally, there is an example with an MNIST classifier. More info in the README.
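As background on what int8 quantization of data and weights involves, here is a minimal sketch of the common affine scheme (a real value is approximated as `scale * (q - zero_point)` with `q` in [-128, 127]). The function names are illustrative and this is not taken from the project; its README is the reference for what the accelerator actually does.

```python
def choose_params(vmin, vmax):
    """Pick scale and zero point so that [vmin, vmax] (widened to include 0) maps to int8."""
    vmin, vmax = min(vmin, 0.0), max(vmax, 0.0)
    if vmax == vmin:
        return 1.0, 0  # degenerate range: everything is zero
    scale = (vmax - vmin) / 255.0
    zero_point = round(-128 - vmin / scale)
    return scale, zero_point


def quantize(values, scale, zero_point):
    """Map floats to int8 codes, clamping to the representable range."""
    return [max(-128, min(127, round(v / scale) + zero_point)) for v in values]


def dequantize(codes, scale, zero_point):
    """Map int8 codes back to approximate float values."""
    return [(q - zero_point) * scale for q in codes]


weights = [-1.0, -0.5, 0.0, 0.5, 1.0]
scale, zero_point = choose_params(min(weights), max(weights))
codes = quantize(weights, scale, zero_point)
restored = dequantize(codes, scale, zero_point)
print(codes)
print(restored)
```

The round trip loses at most about one quantization step per value, which is the precision trade-off that int8 hardware accepts in exchange for cheaper arithmetic.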