/r/deeplearning
Resources for understanding and implementing "deep learning" (learning data representations through artificial neural networks).
Greetings,
I came across a use case that I wondered could be solved with LLMs. I've written a problem statement that should help you understand it. Any advice would be really helpful.
Here is the problem statement. Your job is to analyse multiple smartphones to determine which one to buy. For each smartphone, you have been given a list of its details and specs. These details describe the configuration and features of each smartphone, and they are only available as textual sentences. The sentences can cover, for example, the RAM, the screen size, the processor configuration, the launch date, the growing popularity, and so on. Say that for each smartphone you have 100 such details, and you have a hundred smartphones in total to compare. Note that the types of details may not be common to all the smartphones: for example, some smartphones may have a folding feature you can compare, while others may not have a folding feature at all. Your job is to create a program using LLMs and AI to analyse these details and compare the smartphones to determine which is the best smartphone to buy. You do not have any other kind of structured data about these items, and you have to use only the LLMs to compare the smartphones.
Explain only the approach you would use.
Thnxxxx
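Not the requested write-up, just a way to make the problem concrete: one shape such a program could take is "summarize each phone into comparable attributes, then run pairwise LLM judgments". This is a sketch under assumptions; call_llm is a hypothetical wrapper for whatever LLM API you choose, and the attribute list in the prompt is illustrative.

    # Sketch: LLM-based smartphone comparison. `call_llm` is a hypothetical helper,
    # and each phone is assumed to be a dict: {"name": str, "details": [str, ...]}.

    def call_llm(prompt: str) -> str:
        """Hypothetical wrapper around whatever LLM API you end up using."""
        raise NotImplementedError

    def summarize_phone(phone):
        # Normalize free-text details into a short, comparable summary.
        prompt = (
            "Summarize the following smartphone details into key attributes "
            "(RAM, display, processor, special features). Say 'not mentioned' "
            "for anything absent:\n" + "\n".join(phone["details"])
        )
        return call_llm(prompt)

    def compare_pair(summary_a, summary_b):
        # Pairwise judgment; could be replaced by per-phone scoring to save calls.
        prompt = (
            "Phone A:\n" + summary_a + "\n\nPhone B:\n" + summary_b +
            "\n\nWhich is the better overall buy? Answer with exactly 'A' or 'B'."
        )
        return call_llm(prompt).strip()

    def rank_phones(phones):
        summaries = {p["name"]: summarize_phone(p) for p in phones}
        wins = {name: 0 for name in summaries}
        names = list(summaries)
        for i in range(len(names)):
            for j in range(i + 1, len(names)):
                a, b = names[i], names[j]
                winner = a if compare_pair(summaries[a], summaries[b]) == "A" else b
                wins[winner] += 1
        return sorted(wins, key=wins.get, reverse=True)

Pairwise comparison handles the "not all phones have the same features" issue naturally, since the LLM sees both summaries at once; the trade-off is the quadratic number of calls.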
So I have a dataset of 300 samples of text mapped to code. The code is limited in the sense that it consists of some basic lines that are not going to change at inference time. To be more precise: given those 300 lines of code with their text descriptions, at inference we expect one of those lines back, filled in with the parameters provided in the text. For example: ("print the first index of this string s", "print(s[0])") is just a hypothetical sample. What we expect is that an inference for something like "display the 0 index of string j" gives "print(j[0])".
There are two ways to go about this. Either fine-tune an LLM, preferably a smaller one, and only scale up if the smaller one works but doesn't capture all of the complexity.
Or we do RAG, because essentially this is knowledge to be memorized: at inference time we're just asking for one of the patterns in the initial dataset, so it's more of a knowledge/memory problem. Am I right about this? A rough sketch of the retrieval side is below.
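If retrieval turns out to be the better fit, here is a minimal sketch of what that side could look like, assuming a sentence-transformers embedding model; the model name and the `pairs` variable are illustrative, not a recommendation.

    # Sketch: retrieve the closest text -> code pair by embedding similarity.
    # `pairs` stands in for the 300-sample dataset: [(description, code), ...].
    from sentence_transformers import SentenceTransformer, util

    pairs = [("print the first index of this string s", "print(s[0])")]  # ... your 300 samples

    model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice
    corpus_emb = model.encode([desc for desc, _ in pairs], convert_to_tensor=True)

    def retrieve_code(query: str) -> str:
        query_emb = model.encode(query, convert_to_tensor=True)
        best = util.cos_sim(query_emb, corpus_emb).argmax().item()
        return pairs[best][1]

    print(retrieve_code("display the 0 index of string j"))
    # Returns the stored "print(s[0])"; you would still need an LLM (or simple
    # substitution) step to swap in the new variable name `j`.

Note the caveat in the last comment: pure retrieval returns the memorized line verbatim, so parameter substitution is where a small fine-tuned model or a generation step still earns its keep.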
I want to detect whether or not someone is done speaking on an open line (endpoint detection). Using hard-coded rules like "if they stop for 5 seconds then they are done" hasn't held up well in testing.
I want to explore other options for how to tell when someone is done talking.
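Not an answer to which method is best, but a minimal energy-based sketch of the idea; the thresholds and frame sizes are arbitrary assumptions, and a proper voice-activity-detection model would replace the is_speech heuristic.

    import numpy as np

    def is_speech(frame: np.ndarray, energy_threshold: float = 1e-3) -> bool:
        # Crude stand-in for a real voice-activity detector: mean energy of the frame.
        return float(np.mean(frame ** 2)) > energy_threshold

    def endpoint_detected(audio: np.ndarray, sample_rate: int = 16000,
                          frame_ms: int = 30, trailing_silence_s: float = 0.8) -> bool:
        # True once the last `trailing_silence_s` seconds contain no speech frames.
        frame_len = int(sample_rate * frame_ms / 1000)
        needed = int(trailing_silence_s * 1000 / frame_ms)
        frames = [audio[i:i + frame_len]
                  for i in range(0, len(audio) - frame_len + 1, frame_len)]
        if len(frames) < needed:
            return False
        return not any(is_speech(f) for f in frames[-needed:])

The usual refinements on top of this are an adaptive noise-floor threshold and, further up, prosody or language-model cues (a trailing "so..." usually means the speaker isn't done even after a pause).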
I'm looking to get a laptop for machine learning/deep learning research projects in the medical domain.
In the US.
I'm not sure whether a Mac would be a better choice, but I'd prefer non-Apple, mainly because of their excessive pricing.
Budget $1500-2000.
Please suggest.
I spent a long time trying to figure out why gradient descent doesn't work correctly for me. Can someone explain how it works, and what the difference is between regular gradient descent and batch gradient descent?

My confusion is about the average error used in backpropagation, not the loss function itself. In the loss function we square the error (or take its absolute value), which amplifies it and keeps the terms from cancelling, but for backpropagation we don't do that, so the + and - errors can sum to 0.

My network right now is: 4 input, 2 hidden, 1 output layer. Here is how I currently understand it (I'm not sure this is right, and I'm spelling it out so you can correct me). Regular gradient descent calculates the error over the entire dataset. For example, take the samples (0, 1, 1, 0), (1, 0, 1, 0), (1, 1, 1, 0), where I expect the value 0 for the first sample, 1 for the second, and 1 for the third. Running with randomly initialized weights, let's say I get 0.6, 0.3, 0.8. For the loss function I calculate (0 - 0.6)^2 + (1 - 0.3)^2 + (1 - 0.8)^2, take the sum and divide by the number of samples: 0.89 / 3. Everything is fine there; the loss comes out normal.

But for backpropagation it comes out differently: -2*(0 - 0.6) + -2*(1 - 0.3) + -2*(1 - 0.8). The sum is -0.6, and dividing by 3 gives -0.2. So we go to correct the weights, but all we have is an average of -0.2 for the whole epoch; the positive and negative errors have partly cancelled, and we keep going down the same path.
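For what it's worth, here is a tiny numpy sketch (a made-up single linear layer, not your 4-2-1 network) showing where the averaging actually happens in batch gradient descent: each sample's error is multiplied by that sample's own inputs first, and only the resulting parameter gradients are averaged, so per-sample errors of opposite sign do not collapse into one scalar before backpropagation.

    import numpy as np

    # Toy batch: 3 samples, 4 inputs each, with targets 0, 1, 1 (as in the post).
    X = np.array([[0, 1, 1, 0],
                  [1, 0, 1, 0],
                  [1, 1, 1, 0]], dtype=float)
    y = np.array([0.0, 1.0, 1.0])

    rng = np.random.default_rng(0)
    w = rng.normal(size=4)      # single linear layer, just to show the bookkeeping
    lr = 0.1

    for epoch in range(3):
        y_hat = X @ w                        # predictions, shape (3,)
        loss = np.mean((y_hat - y) ** 2)     # MSE: mean of squared per-sample errors
        # dL/dy_hat is kept PER SAMPLE (shape (3,)), signs and all:
        dL_dyhat = 2 * (y_hat - y) / len(y)
        # Each sample's error is combined with that sample's inputs BEFORE averaging,
        # so opposite-sign errors on different samples do not cancel into one number:
        grad_w = X.T @ dL_dyhat              # shape (4,), already the batch-mean gradient
        w -= lr * grad_w
        print(f"epoch {epoch}: loss={loss:.4f}, grad={grad_w.round(3)}")

"Batch" gradient descent in the classic sense is exactly this (the whole dataset per update); mini-batch gradient descent does the same computation on a random subset per update.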
There's this issue in ChatGPT where it will randomly start deviating from the format specified in the instructions, adopt its own, and apply it to all following outputs.
I’ve been trying to learn about Explainable AI, and I’m curious about the differences between model-agnostic techniques compared to model-specific ones. How do they actually work, and what are the trade-offs in terms of accuracy and interpretability? Any insights or examples you could share would be super helpful!
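Not a full answer, but a concrete model-agnostic example may help anchor the comparison: permutation importance treats the model as a black box and only needs its predictions, while a model-specific method (reading tree impurity importances, linear coefficients, attention weights) relies on internals. A minimal sklearn sketch on an assumed toy dataset:

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.inspection import permutation_importance
    from sklearn.model_selection import train_test_split

    X, y = load_breast_cancer(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

    # Model-agnostic: shuffle one feature at a time and measure the score drop.
    result = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
    print(result.importances_mean[:5])

    # Model-specific: impurity-based importances, available only because this is a tree ensemble.
    print(model.feature_importances_[:5])

The usual trade-off: model-agnostic methods (permutation importance, LIME, SHAP's kernel variant) work on anything but cost extra model evaluations and only approximate the model's reasoning; model-specific methods are cheaper and often more faithful, but locked to one architecture.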
Hello, I am fairly new to deep learning, and we have to complete a requirement in a course. Can anybody recommend a good alternative to Colab? I am willing to pay for compute units, but others have said Colab is kind of rough. Even so, I haven't heard of any other ways I could train AI models. Thank you for the help!
Hello everyone,
I am working on an image segmentation project. Getting to the point: as the training epochs go by, the metrics and loss values recorded in the TensorBoard logger are significantly different from what is shown in the terminal.
I tried to investigate this issue online, but couldn't find anything. I also added a CSV Logger to compare the results, and those match with the TensorBoard ones.
However, if I look at the values printed in the terminal, I can see, for example, that the mean_iou value is consistently above 0.50, yet the CSV Logger and TensorBoard record a mean_iou of 0.36 for that same epoch.
I cannot see where the issue could come from. I used the same training workflow as always and have never experienced such an issue.
Thanks in advance to anyone who could answer.
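One hedged guess, assuming this is a PyTorch Lightning-style setup (which the TensorBoard/CSV loggers suggest): the terminal/progress bar often shows the running per-step value, while the loggers store the epoch-aggregated value, so it is worth checking how the metric is logged. Roughly along these lines; the metric attribute here is assumed, not from your code:

    import pytorch_lightning as pl

    class SegModule(pl.LightningModule):
        # ... model definition, training_step, configure_optimizers omitted ...

        def validation_step(self, batch, batch_idx):
            images, masks = batch
            preds = self(images)
            iou = self.mean_iou(preds, masks)  # assumed torchmetrics-style metric attribute

            # on_step values feed the console/progress bar; on_epoch=True is what the
            # TensorBoard and CSV loggers record after aggregating over the whole epoch,
            # which is one common reason the two numbers disagree.
            self.log("mean_iou", iou, on_step=True, on_epoch=True, prog_bar=True)

Another common culprit is averaging a ratio metric per batch versus accumulating intersections and unions over the epoch; the two give different mean IoU values.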
This is a beginner question, but I was wondering: what part of a transformer makes it parallel? It still needs to process tokens layer by layer, right?
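To make the usual answer concrete: the layers are indeed applied one after another, but within a single layer, self-attention and the feed-forward block operate on all token positions at once as one big matrix operation, so there is no left-to-right loop over tokens the way an RNN has. A small illustration:

    import torch

    seq_len, d_model = 8, 16
    x = torch.randn(1, seq_len, d_model)   # all 8 token embeddings at once

    Wq = torch.nn.Linear(d_model, d_model)
    Wk = torch.nn.Linear(d_model, d_model)
    Wv = torch.nn.Linear(d_model, d_model)

    q, k, v = Wq(x), Wk(x), Wv(x)          # one matmul each covers every position
    attn = torch.softmax(q @ k.transpose(-2, -1) / d_model ** 0.5, dim=-1)
    out = attn @ v                          # shape (1, 8, 16): every token updated in parallel
    # Contrast: an RNN needs a Python-level loop over the 8 positions, each step
    # waiting for the previous hidden state.

This is what makes training parallel over the whole sequence; at generation time, decoding is still sequential because each new token depends on the previously generated ones.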
Hi guys, I'm building a RAG solution with a Llama 3 model designed to retrieve knowledge from research papers. As in other RAG projects, I started by splitting each document into chunks with a specific number of tokens per chunk. However, I recently began to wonder: what if I instead create chunks that cover entire sections of the paper, such as one chunk for the abstract, another for the methodology, and so on?
I'm not sure this will improve the results, but I'm curious and want to try it, and I don't know where to start. Can anyone suggest a lightweight pretrained model that is good at identifying the sections of a document such as a research paper?
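Before reaching for a pretrained model, a heuristic heading-based splitter can get surprisingly far on papers with standard section names; tools like GROBID are the heavier-weight option for parsing scholarly PDFs. A rough sketch, where the heading patterns are assumptions you would tune for your corpus:

    import re

    # Split a plain-text paper into section chunks by matching common heading lines.
    HEADING = re.compile(
        r"^\s*(abstract|introduction|related work|background|method(?:ology|s)?|"
        r"experiments?|results?|discussion|conclusions?|references)\s*$",
        re.IGNORECASE | re.MULTILINE,
    )

    def split_into_sections(text: str):
        matches = list(HEADING.finditer(text))
        sections = {}
        for i, m in enumerate(matches):
            start = m.end()
            end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
            sections[m.group(1).lower()] = text[start:end].strip()
        return sections  # e.g. {"abstract": "...", "methodology": "...", ...}

Whole sections can exceed your context budget, so a common compromise is section-aware chunking: split by section first, then by token count within each section, and store the section name as chunk metadata.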
Lane Detection using Mask RCNN – An Instance Segmentation Approach
https://debuggercafe.com/lane-detection-using-mask-rcnn/
Lane detection and segmentation have a lot of use cases, especially in self-driving vehicles. With lane detection and segmentation, the vehicle gets to see the different types of lanes, which allows it to plan its route and actions accordingly. Of course, there are several other components involved besides computer vision and deep learning, but this serves as the first step. In this article, we tackle that first step: we will train a Mask RCNN model for lane detection and segmentation, taking an instance segmentation approach to detect and segment various types of lane lines.
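The full walkthrough is in the article; for readers who just want the shape of the setup, here is a sketch of the standard torchvision fine-tuning recipe for Mask R-CNN with a custom number of lane classes (this is the generic recipe, not necessarily the article's exact code, and num_classes is yours to choose).

    import torchvision
    from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
    from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

    def get_lane_maskrcnn(num_classes):
        # num_classes = number of lane-line types + 1 for the background class.
        model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")

        # Replace the box classification head.
        in_features = model.roi_heads.box_predictor.cls_score.in_features
        model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

        # Replace the mask prediction head.
        in_features_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
        model.roi_heads.mask_predictor = MaskRCNNPredictor(in_features_mask, 256, num_classes)
        return model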
Before ChatGPT, when I struggled to understand some key concepts in theory or some details were obscure, I would go to Reddit, StackOverflow, etc.
But people, especially in programming forums, can make an embarrassment out of you if you state something stupid (that you genuinely believed). Even here in this subreddit, if you propose something you have been tinkering with, some people may make fun of you.
ChatGPT changed all that. It became a completely different story. It would explain all the theory to you so patiently, and would not argue with you even if it recognized that your intellectual level is below the desk.
It would explain and explain, and even say sorry to you if it did not understand the idea of the question fully.
With ChatGPT 3 -> 4 -> 4o, its answers became better and better.
I cannot complain about ChatGPT in these terms.
However, the only thing that causes mistrust is the fact that all these conversations are saved and can be misused. At some point I started to share my ideas with ChatGPT, asking it to check them. Some people have started to share their photos and videos with AI. At some point it could accumulate all this information and become... a monster. Some people have already stopped doing intellectual/creative work, relying on it instead.
I personally believe that ChatGPT o1 is the threshold we don't need to go above. It should stay a helper tool for people and not go beyond that.
That's why I was working on the small LLM I posted recently: because it is limited, it is a kind of small unit that can be put somewhere to work as an auxiliary block...
Hey folks,
Are you looking for a military audio dataset? We are happy to announce that we have released MAD (Military Audio Dataset), which contains 7,466 audio samples from 7 classes (communication, gunshot, footsteps, shelling, vehicle (tank), helicopter, and fighter), corresponding to approximately 12 hours of audio and exhibiting distinctive characteristics not present in the academic datasets typically used for machine learning research.
The dataset is available on Kaggle. For more details, please refer to our GitHub repository or paper.
Hey guys,
I'm currently putting together a computer to build my own workstation for my deep learning and machine learning hobby projects. I know there are cloud solutions that are often better, but I want to build my own station first. Here are the planned components:
- MSI B760 GAMING PLUS WIFI motherboard
- Intel i7 of the 13th generation
- 32 GB DDR5 RAM (6000 MHz)
- 2 TB M.2 PCIe Gen4 SSD
The only difficulty I'm having at the moment is deciding between two graphics cards:
- Gigabyte NVIDIA GeForce RTX 3060 GAMING OC V2 (12 GB GDDR6)
- Gigabyte NVIDIA GeForce RTX 4060 Ti GAMING OC (16 GB GDDR6)
I'm aware that VRAM isn't the only thing that matters, as I don't just want to work with LLM and CNN models but also want to develop my own deep learning models. The memory bandwidth of the RTX 3060 is better, but it has less VRAM. On the other hand, the 4060 Ti offers a significantly higher CUDA score, but also costs 200 euros more.
I would be super grateful if you could help me with this decision!
Thanks in advance!
Hi everyone, I have this problem at hand. I have reviews for a number of products, around 10,000 rows, and based on these reviews I want to do hierarchical classification of the products. Just to test things out, I tried building three separate models to predict the three different levels of classes, but as one would expect, I get good accuracy and F1 scores for the top-level category, and the scores worsen as I go deeper into the hierarchy. My plan was to start with simpler models like Naive Bayes classifiers, SVMs, logistic regression, and XGBoost, and then eventually move to more advanced methods like RNNs, LSTMs, and BERT. I want to develop good intuition about how to approach hierarchical classification problems; any suggestions would be helpful. Thanks.
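One common baseline worth having in the toolbox is the "local classifier per parent node" setup: predict level 1 normally, then train a separate level-2 classifier for each level-1 class so the deeper model only has to discriminate among siblings. A rough sklearn sketch under assumed column names (text, level1, level2):

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # df is assumed to be a DataFrame with columns: "text", "level1", "level2".
    def fit_hierarchical(df):
        level1_clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
        level1_clf.fit(df["text"], df["level1"])

        # One level-2 classifier per level-1 class, trained only on that slice of data.
        level2_clfs, single_child = {}, {}
        for parent, group in df.groupby("level1"):
            if group["level2"].nunique() > 1:
                clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
                clf.fit(group["text"], group["level2"])
                level2_clfs[parent] = clf
            else:
                single_child[parent] = group["level2"].iloc[0]
        return level1_clf, level2_clfs, single_child

    def predict_hierarchical(texts, level1_clf, level2_clfs, single_child):
        l1 = level1_clf.predict(texts)
        l2 = [level2_clfs[p].predict([t])[0] if p in level2_clfs else single_child.get(p)
              for t, p in zip(texts, l1)]
        return list(zip(l1, l2))

The same structure carries over when you move to BERT-style encoders; the other main variants to compare against are a flat classifier over leaf labels and a single multi-task model with one head per level.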
Does anyone have any insight into the model architecture or method used to determine the coordinates of the elements on the screen?
They mentioned in their blog post that they had to come up with a new model, but didn't give any details.
So I'm in the mindset of building a home PC with an Nvidia 4080 Super, a Core i9, and so on, for running experiments on LLMs and also building AI applications. Is it worth buying all of this for, say, $3500, or should I go for cloud services like Google Colab, Paperspace, etc.? What do you think, folks?
import tensorflow as tf
from tensorflow.keras import layers, models

# train_images is assumed to be loaded earlier (not shown in this snippet)
if train_images.shape[0] == 0:
    raise ValueError("No images were loaded. Please check the image directory.")

# Create a TensorFlow dataset without batching
full_dataset = tf.data.Dataset.from_tensor_slices(train_images)

# Shuffle the dataset
full_dataset = full_dataset.shuffle(buffer_size=len(train_images))

# Split the dataset into training and validation sets
train_size = int(0.8 * len(train_images))
val_size = len(train_images) - train_size
train_dataset = full_dataset.take(train_size)
val_dataset = full_dataset.skip(train_size)

# Batch the datasets
train_dataset = train_dataset.batch(32)
val_dataset = val_dataset.batch(32)

# Function to build the autoencoder model
def build_autoencoder(input_shape):
    # Encoder
    encoder_input = layers.Input(shape=input_shape)
    x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(encoder_input)
    x = layers.MaxPooling2D((2, 2), padding='same')(x)
    x = layers.Conv2D(16, (3, 3), activation='relu', padding='same')(x)
    encoder_output = layers.MaxPooling2D((2, 2), padding='same')(x)

    # Decoder
    x = layers.Conv2D(16, (3, 3), activation='relu', padding='same')(encoder_output)
    x = layers.UpSampling2D((2, 2))(x)
    x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(x)
    x = layers.UpSampling2D((2, 2))(x)
    decoder_output = layers.Conv2D(3, (3, 3), activation='sigmoid', padding='same')(x)

    # Autoencoder model
    autoencoder = models.Model(encoder_input, decoder_output)
    return autoencoder

# Function to train the autoencoder
def train_autoencoder(train_dataset, val_dataset):
    model = build_autoencoder(input_shape=(224, 224, 3))
    model.compile(optimizer='adam', loss='mse')

    # Define callbacks for saving the model and early stopping
    checkpoint_cb = tf.keras.callbacks.ModelCheckpoint("autoencoder_model.keras", save_best_only=True)
    early_stopping_cb = tf.keras.callbacks.EarlyStopping(patience=5, restore_best_weights=True)

    # Train the model
    history = model.fit(
        train_dataset,
        epochs=50,
        validation_data=val_dataset,
        callbacks=[checkpoint_cb, early_stopping_cb],
        verbose=2
    )

    # Save the training history
    with open('training_history.txt', 'w') as f:
        for key, values in history.history.items():
            f.write(f'{key}: {values}\n')
    print("Training complete, model and history saved.")

# Train the autoencoder
train_autoencoder(train_dataset, val_dataset)
Epoch 1/50
ValueError: None values not supported.
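A hedged guess at the cause, based only on the snippet above: the dataset yields single image tensors, but model.fit with an 'mse' loss expects (input, target) pairs, and for an autoencoder the target is the image itself. Mapping the batched datasets to (x, x) is the usual fix:

    # Possible fix (assumes the pipeline above): give the autoencoder (input, target) pairs.
    train_dataset = train_dataset.map(lambda x: (x, x))
    val_dataset = val_dataset.map(lambda x: (x, x))

    train_autoencoder(train_dataset, val_dataset)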
Essentially, I have a trained model (in PyTorch) that I want to deploy on an edge device (written entirely in C/C++) for inference. For context, I'm working alone on this project, so I don't get much guidance. My understanding is that at deployment the input (inference data) needs to be integers, and my model's parameters (weights and biases/activations) also need to be integers. Because I don't have "inference data", I am currently prototyping by quantizing my validation/test data and comparing the validation/test results I get using the floating-point model parameters against the results I get using the quantized/integer model parameters. To make this more concrete (or succinct), I'm testing two cases:
import torch

def quantize_tensor(tensor, num_bits):
    # Signed integer range for the requested bit width.
    qmin = -(2 ** (num_bits - 1))
    qmax = (2 ** (num_bits - 1)) - 1
    min_val, max_val = tensor.min(), tensor.max()
    # Affine (asymmetric) quantization parameters.
    scale = (max_val - min_val) / (qmax - qmin)
    zero_point = qmin - min_val / scale
    zero_point = torch.round(zero_point).clamp(qmin, qmax)
    q_tensor = torch.round(tensor / scale + zero_point).clamp(qmin, qmax)
    if num_bits == 8:
        q_tensor = q_tensor.type(torch.int8)
    elif num_bits == 16:
        q_tensor = q_tensor.type(torch.int16)
    else:
        q_tensor = q_tensor.type(torch.int)
    return q_tensor, scale, zero_point
Then I quantize the model's weights and the bias using this:
def quantize_model(model, weight_bit_width=16, bias_bit_width=16):
    quantized_state_dict = {}
    scale_zp_dict = {}  # To store scale and zero-point for each parameter
    for name, param in model.state_dict().items():
        if 'weight' in name:
            q_param, scale, zero_point = quantize_tensor(param, weight_bit_width)
            quantized_state_dict[name] = q_param
            scale_zp_dict[name] = (scale, zero_point)
        elif 'bias' in name:
            q_param, scale, zero_point = quantize_tensor(param, bias_bit_width)
            quantized_state_dict[name] = q_param
            scale_zp_dict[name] = (scale, zero_point)
        else:
            # For other parameters, keep them as is or apply appropriate quantization
            quantized_state_dict[name] = param
    return quantized_state_dict, scale_zp_dict
Furthermore, I quantize my model and the data as shown below. However, because my ML problem is multiclass and multioutput, I need to call torch.softmax on the logits coming out of my model to get prediction probabilities, and softmax doesn't support integers (technically, it's not implemented for ints). That makes me worried that my overall quantization approach is wrong (the model's code and the rest of the pipeline follow):
import copy
import torch
import torch.nn as nn

class model(nn.Module):
    def __init__(self, inputs, l1, l2, num_outputs, output_classes=3):
        super().__init__()
        # define the layers
        self.output_classes = output_classes
        self.num_outputs = num_outputs
        self.layers = nn.Sequential(
            nn.Linear(inputs, l1),
            nn.ReLU(),
            nn.Linear(l1, l2),
            nn.ReLU(),
            nn.Linear(l2, num_outputs * output_classes),  # output_classes = number of classes in each output
        )

    def forward(self, x):
        x = self.layers(x)
        x = x.view(-1, self.output_classes, self.num_outputs)  # Reshapes output tensor (logits output).
        return x
model_copy = copy.deepcopy(floating_point_trained_model)

# quantize model params
quantized_state_dict, scale_zp_dict = quantize_model(model_copy, weight_bit_width=16, bias_bit_width=16)
for name, param in model_copy.named_parameters():
    param.requires_grad = False
    param.data = quantized_state_dict[name].to(dtype=torch.float)  # <--- Need help here: casting to float to satisfy softmax requirements

# Quantize data
Quant_X_train, scale, zp = quantize_tensor(X_train, 16)          # X_train from your own split
Quant_X_test, test_scale, test_zp = quantize_tensor(X_test, 16)  # X_test from your own split

# call quantized model on quantized input data
pred_probs = torch.softmax(model_copy(Quant_X_test.to(torch.float)), dim=1)  # <--- Need help: casting to float to get prediction probabilities
predictions = torch.argmax(pred_probs, dim=1)
I'm curious about a few things:
Whether this is the correct way to approach the problem.
Whether I implemented the quantization procedure correctly.
Whether anyone has guidance on how to approach this (or example code/tutorials); that would be great. I have already looked through PyTorch's quantization support.
If it helps, this is an example of what my training data looks like:
0 0.995231 0.996840 1.000000 0.998341 1.000000 1.000000 1.000000 0.998709 ... 0.000024 0.000019 0.000015 0.000016 0.000011 0.000007 0.000007 0.000015
1 0.996407 0.998568 1.000000 0.997889 1.000000 0.999954 0.999738 0.997458 ... 0.000018 0.000013 0.000011 0.000012 0.000008 0.000005 0.000006 0.000009
2 0.996083 0.999702 1.000000 0.999031 1.000000 1.000000 0.999816 0.998727 ... 0.000019 0.000013 0.000012 0.000011 0.000008 0.000006 0.000006 0.000011
3 0.998531 0.999481 0.999199 1.000000 0.999720 1.000000 1.000000 0.998682 ... 0.000015 0.000011 0.000010 0.000010 0.000007 0.000005 0.000004 0.000007
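For what it's worth, one common pattern in simulated ("fake") quantization is to keep the integer weights and inputs for the eventual C/C++ deployment, but dequantize back to float for the forward pass and softmax during this comparison stage, so the float cast isn't a hack but the intended step. A minimal helper under the same scale/zero_point convention as quantize_tensor above:

    def dequantize_tensor(q_tensor, scale, zero_point):
        # Inverse of quantize_tensor: map integers back to approximate float values.
        return (q_tensor.to(torch.float32) - zero_point) * scale

    # Example: run the float equivalent of the quantized weights, then softmax as usual.
    for name, param in model_copy.named_parameters():
        if name in scale_zp_dict:
            s, zp = scale_zp_dict[name]
            param.data = dequantize_tensor(quantized_state_dict[name], s, zp)

    logits = model_copy(dequantize_tensor(Quant_X_test, test_scale, test_zp))
    pred_probs = torch.softmax(logits, dim=1)

This measures how much accuracy the quantization itself loses; fully integer-only inference (where even softmax is replaced or skipped by taking an argmax over integer logits) is a separate step handled on the C/C++ side.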
Hi everyone! I'm working on an AI project that involves several models, and I’m exploring the best cloud service to use for GPU-based model inference. My requirements are as follows:
I've looked into a few options like AWS, vast.ai, runpod, and some specialized providers, but I’m unsure which would work best for this setup. Has anyone here worked with these or other services for similar needs? Any feedback on cost, performance, or ease of setup would be great!
I have used RunPod for text-to-image (the SDXL template), but inference is very slow.
Thanks in advance!
I made a custom model for myself and trained it on a dataset with 120 classes. I then proceeded to fine-tune it on a dataset with 5 classes, which is my target.
However, I get the following error:
RuntimeError: Error(s) in loading state_dict for hybrid_model:
size mismatch for classification_head.3.weight: copying a param with shape torch.Size([120, 256]) from checkpoint, the shape in current model is torch.Size([5, 256]).
size mismatch for classification_head.3.bias: copying a param with shape torch.Size([120]) from checkpoint, the shape in current model is torch.Size([5]).
I used model = hybrid_model(num_classes=120) while training, and am using model = hybrid_model(num_classes=5) for fine-tuning.
Any suggestions?
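In case it helps, the usual pattern for this situation (assuming the checkpoint is an ordinary state_dict and that the head attribute is classification_head, as the error message suggests; the checkpoint path below is illustrative) is to drop the old head's weights before loading and let the new 5-class head keep its random initialization:

    import torch

    model = hybrid_model(num_classes=5)
    state_dict = torch.load("checkpoint_120_classes.pth", map_location="cpu")  # illustrative path

    # Drop the 120-class head so its shapes don't clash with the new 5-class head.
    filtered = {k: v for k, v in state_dict.items()
                if not k.startswith("classification_head.3")}
    missing, unexpected = model.load_state_dict(filtered, strict=False)
    print("Randomly initialized (missing) keys:", missing)

Passing strict=False alone would also skip the mismatched keys in recent PyTorch versions only if the shapes matched, so explicitly filtering the head keys is the safer route.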
I am BEYOND EXCITED to publish our interview with John Yang and Carlos E. Jimenez from SWE-bench, SWE-agent, and SWE-bench Multimodal!
Beyond just solving LeetCode-style programming challenges, this series of works tackles deploying LLM Agents to real GitHub repositories and their respective issues and pull requests!
This was such an interesting discussion, beginning with the data problem of interfacing LLMs with GitHub repositories and then diving into all sorts of things, from Code Execution as a Tool to Agents vs. Compound AI System designs, Multimodal SWE Agents, and more!
If I have a file of floating point numbers, like a CSV file or something, can I safely use Chatgpt and other LLMs to change its format? For example, transforming a csv file into a markdown style table?
My concern is that, because of hallucinations, the actual numerical values might be transformed into something different. I'm thinking of some cases like a table that says "pi values" and I have 3.15 in it, and because most tables would have pi as 3.14 the model might change the numerical value.
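Your concern is reasonable; silent digit edits are exactly the kind of thing LLMs can do. A deterministic alternative that keeps the model away from the numbers entirely (pandas plus the tabulate package; the file name is illustrative):

    import pandas as pd

    # Convert CSV -> Markdown table without an LLM touching the values.
    df = pd.read_csv("values.csv")            # illustrative file name
    markdown = df.to_markdown(index=False)    # requires the `tabulate` package
    print(markdown)

    # If you do use an LLM for the conversion, you can at least verify nothing changed
    # by re-parsing its output and comparing against the original DataFrame.

For pure format changes like CSV to Markdown, a script is both safer and cheaper; LLMs are better reserved for steps that actually need language understanding.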
My model is pretrained on 32-bit audio. I have good data for my niche, but it is 16-bit. Can I convert the 16-bit audio to float32 via normalization and train the model on that, or will it not work properly?
Is there a way I can use the 16-bit audio?
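If it helps, converting 16-bit PCM to float32 is a standard step, and loaders like librosa or torchaudio typically do it for you when reading the files; a minimal sketch of the manual version:

    import numpy as np

    def int16_to_float32(audio_int16: np.ndarray) -> np.ndarray:
        # Scale 16-bit PCM samples (-32768..32767) into the float32 range [-1.0, 1.0).
        return audio_int16.astype(np.float32) / 32768.0

The bit depth of the source files only limits dynamic range; once the samples are in normalized float32 (and resampled to the model's expected sample rate), the model sees the same kind of input it was pretrained on.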
Hey r/deeplearning! I recently built a project called Cells AI to help businesses get more out of their data without requiring a data team. The idea is pretty straightforward: just ask your data questions and get instant answers. Here’s a bit about how it works and what it does:
It’s been an interesting project, especially working out how to make the responses both fast and accurate. If you’re interested, I’ve got a demo that shows it in action.
https://reddit.com/link/1gfjp5t/video/7al4qj0ojvxd1/player
Would love to hear if anyone’s working on similar projects or has tackled similar challenges with NLP for data insights!