/r/deeplearning


Resources for understanding and implementing "deep learning" (learning data representations through artificial neural networks).

/r/deeplearning

168,787 Subscribers

1

A potential use case for an LLM based tool

Greetings,

I came across a use case that I wondered could be solved with LLMs. I've written a problem statement below to help you understand it. Any advice would be really helpful.

Here is the problem statement. Your job is to analyse multiple smartphones to determine which one to buy. For each smartphone you have been given a list of its details and specs, which describe the configuration and features of that smartphone. The details are given only as textual sentences; these sentences can describe, for example, the RAM, the screen size, the processor configuration, the launch date, the growing popularity, and so on. Now say you have about 100 such details for each smartphone and a hundred smartphones in total to compare. Note that the types of details may not be the same across smartphones: for example, some smartphones may have a folding feature that you can compare, while others may not have a folding feature at all. Your job is to create a program using LLMs and AI to analyse these details and compare the smartphones to determine which one is the best smartphone to buy. You do not have any other kind of structured data about these items, and you are only allowed to use LLMs to compare the smartphones.

Explain only the approach you would use.
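For context, the rough direction I've been toying with (just a sketch, assuming the OpenAI Python client; the model name, attribute schema, and prompt are placeholders, not a recommendation): have the LLM map each phone's free-text details onto a common JSON schema, then compare the structured records in a second pass.

import json
from openai import OpenAI  # assumption: any chat-completion client would do

client = OpenAI()
ATTRIBUTES = ["ram_gb", "screen_size_in", "processor", "launch_date", "foldable"]  # placeholder schema

def extract_specs(details: list[str]) -> dict:
    """Ask the LLM to map free-text details onto a fixed schema; missing attributes stay null."""
    prompt = (
        "Extract the following attributes as JSON (use null if an attribute is not mentioned): "
        f"{ATTRIBUTES}\n\nDetails:\n" + "\n".join(details)
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )
    return json.loads(resp.choices[0].message.content)

# A second pass could then rank the per-phone JSON records, either with plain code
# or by asking the LLM to compare pairs of records on the attributes both phones share.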

Thnxxxx

0 Comments
2024/11/02
06:05 UTC

1

Fine-tune or RAG?

So I have a dataset of 300 samples of text mapped to code. The code is limited in the sense that it consists of some basic lines that are not going to change for later inference. To be more precise: given those 300 lines of code with their text descriptions, at inference time we expect one of those lines back, with the parameters provided in the text. For example: ("print the first index of this string s", "print(s[0])") - this is just a hypothetical example. What we expect is that an inference for something like "display the 0 index of string j" gives "print(j[0])".

There are two ways to go about this. One is to fine-tune an LLM, preferably a smaller one, scaling up to a larger model if the smaller one works but doesn't capture all of the complexity.

Or we do RAG, because essentially we have knowledge here to be memorized, and at inference time we're just asking for one of those patterns from the initial dataset, so it's more of a knowledge-memorization task. Am I right about this?
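If the retrieval route makes sense, this is the minimal sketch I have in mind (an assumption on my part, using sentence-transformers with the all-MiniLM-L6-v2 model; the pairs list is a placeholder for the 300 samples): embed the 300 descriptions, embed the query, and return the code of the nearest description.

import numpy as np
from sentence_transformers import SentenceTransformer

# Placeholder for the 300 (description, code) pairs.
pairs = [
    ("print the first index of this string s", "print(s[0])"),
    # remaining samples go here
]

model = SentenceTransformer("all-MiniLM-L6-v2")
desc_emb = model.encode([d for d, _ in pairs], normalize_embeddings=True)

def retrieve_code(query: str) -> str:
    """Return the code whose text description is most similar to the query."""
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = desc_emb @ q          # cosine similarity, since embeddings are normalized
    return pairs[int(np.argmax(scores))][1]

print(retrieve_code("display the 0 index of string j"))

Note that plain retrieval returns the stored line with its original variable name (print(s[0]), not print(j[0])), so either the retrieved snippet is passed to the LLM as context to rewrite with the query's parameters (the usual RAG generation step), or fine-tuning is needed to produce the adapted line directly.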

2 Comments
2024/11/02
02:53 UTC

2

Any APIs or ways to detect whether someone is done speaking?

I want to detect whether or not someone is done speaking on an open line (endpoint detection). Through testing, hard-coded rules like "if they stop for 5 seconds, they are done" haven't turned out to be the best approach.

I want to explore how to tell whether someone is done talking through the options below (a rough sketch for the audio case follows the list):

  1. Video recording of them speaking live
  2. Audio
  3. Both
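For the audio option, here is the rough sketch I mentioned above, assuming 16-bit mono PCM at 16 kHz and the webrtcvad package (the 700 ms silence threshold is just a placeholder, not a recommendation):

import webrtcvad

SAMPLE_RATE = 16000          # webrtcvad supports 8/16/32/48 kHz, 16-bit mono PCM
FRAME_MS = 30                # frames must be 10, 20, or 30 ms long
FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * 2
SILENCE_MS_TO_STOP = 700     # placeholder threshold; tune per application

vad = webrtcvad.Vad(2)       # aggressiveness 0-3

def is_done_speaking(pcm_frames):
    """Naive endpointing: 'done' once trailing non-speech exceeds the threshold."""
    silence_ms = 0
    for frame in pcm_frames:                  # each frame is FRAME_BYTES of raw PCM
        if vad.is_speech(frame, SAMPLE_RATE):
            silence_ms = 0
        else:
            silence_ms += FRAME_MS
        if silence_ms >= SILENCE_MS_TO_STOP:
            return True
    return False

This is still a hard-coded threshold, just applied to speech/non-speech frames rather than raw silence, which is why I'm also interested in learned or model-based endpointing options.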
0 Comments
2024/11/01
21:32 UTC

1

Recommendations for a 2TB SSD/32GB RAM Laptop for ML/DL in Medical Imaging and NLP?

I'm looking to get a laptop for machine learning/deep learning research projects in the medical domain.

In the US.

I'm not sure if a Mac would be a better choice, but I'd prefer non-Apple just due to their excessive pricing.

Budget $1500-2000.

Please suggest.

3 Comments
2024/11/01
20:33 UTC

0

Does anyone know how gradient descent works?

I spent a long time trying to figure out why gradient descent doesn't work correctly for me. Explain how it works. What's the difference between regular gradient descent and batch gradient descent? And how is the average error for gradient descent calculated (I'm not talking about the loss function, but about backpropagation)? In the loss function we take the error squared, thereby amplifying it, or take its absolute value, but we don't do that for backpropagation, so the positive and negative errors add up to 0.

I have a neural network like this now: 4 input, 2 hidden, 1 output layer. I don't really know how gradient descent works, but I'll describe how I assume it works so that you can correct me and point me in the right direction.

For example, regular gradient descent calculates an error over the entire dataset. Here is an example dataset: (0, 1, 1, 0), (1, 0, 1, 0), (1, 1, 1, 0), where for the first sample I expect the value 0, for the second 1, and for the third 1. Running with randomly initialized weights, let's say I get 0.6, 0.3, 0.8. For the loss function I calculate: (0 - 0.6)^2 + (1 - 0.3)^2 + (1 - 0.8)^2. Everything is okay here, the error looks normal. We take the sum and divide it by the number of samples: 0.89 / 3, and that's how the loss is calculated.

But for backpropagation it has to be different: -2*(0 - 0.6) + -2*(1 - 0.3) + -2*(1 - 0.8), and here everything changes. The sum is -0.6, and dividing it by 3 gives -0.2. We sort of go and correct the weights, but the error will stay the same, since we get an average of -0.2 from this epoch's run and we keep going along the same path.
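To make the arithmetic concrete, here is a minimal sketch (my own simplification: the illustrative numbers above, with one linear output neuron standing in for the full network) of how batch gradient descent averages the per-sample errors but still gives every weight its own gradient through the chain rule.

import numpy as np

# The three samples and targets from the example above; y_hat = X @ w for a single
# linear output neuron is used only to show the shapes involved.
X = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 1., 0.]])
y = np.array([0., 1., 1.])
y_hat = np.array([0.6, 0.3, 0.8])          # the assumed outputs from the post

N = len(y)
loss = np.mean((y - y_hat) ** 2)            # 0.89 / 3, as computed above

# Derivative of the mean squared error w.r.t. each prediction: signs are kept, nothing squared.
dL_dyhat = -2.0 * (y - y_hat) / N           # [ 0.4, -0.4667, -0.1333]

# Chain rule: each weight gets its own gradient through its own inputs.
grad_w = X.T @ dL_dyhat                     # shape (4,), one value per weight

print(loss)                 # ~0.2967
print(dL_dyhat.sum())       # -0.2, the averaged scalar from the post
print(grad_w)               # four different corrections, not one shared -0.2

So the -0.2 is only the averaged derivative at the output; backpropagation then multiplies it through each weight's own inputs (and activation derivatives), so different weights receive different updates even though the averaged output error is a single number.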

3 Comments
2024/11/01
18:04 UTC

1

Format deviation issue

There's an issue in ChatGPT where it will randomly start deviating from the format specified in the instructions, adopt its own format, and apply it to all following outputs.

1 Comment
2024/11/01
17:52 UTC

2

Machine Translation of Maharashtri Prakrit (an ancient Indian language) to English by Fine-Tuning M2M100_418M model on custom made Dataset.

1 Comment
2024/11/01
16:31 UTC

1

Explainable AI

I’ve been trying to learn about Explainable AI, and I’m curious about the differences between model-agnostic techniques and model-specific ones. How do they actually work, and what are the trade-offs in terms of accuracy and interpretability? Any insights or examples you could share would be super helpful!
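As an example of the model-agnostic side, here is a minimal sklearn sketch of permutation importance (dataset and model here are just placeholders): it works with any fitted estimator because it only shuffles one feature at a time and measures the drop in score.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

# Model-agnostic: the same call works for any estimator exposing fit/score.
result = permutation_importance(clf, X_te, y_te, n_repeats=10, random_state=0)
print(result.importances_mean.argsort()[::-1][:5])  # indices of the 5 most important features

A model-specific counterpart would be, for example, reading the tree ensemble's own feature_importances_, which is cheaper but tied to that model family; that contrast is roughly the trade-off I'm asking about.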

1 Comment
2024/11/01
13:52 UTC

0

ComSci Major (Badly Need Help)

Hello, I am fairly new to deep learning, and we have to complete a requirement for a course. Can anybody recommend a good alternative to Colab? I am willing to pay for compute units, but others have said Colab is not great. Even so, I haven't heard of any other ways to train AI models. Thank you for the help!

1 Comment
2024/11/01
12:07 UTC

1

Inconsistency between TensorBoard logged values and Terminal outputs

Hello everyone,

I am working on an image segmentation project. Getting to the point: the metrics and loss values recorded by the TensorBoard logger are significantly different from what is shown in the terminal as the training epochs go by.

I tried to investigate this issue online, but couldn't find anything. I also added a CSV Logger to compare the results, and those match with the TensorBoard ones.

However, if I look at the values printed in the terminal, I can see, for example, that mean_iou is consistently above 0.50, yet in the CSV logger and TensorBoard the recorded value for mean_iou in that epoch is 0.36.

I cannot see where the issue could be coming from. I used the same training workflow as always and have never experienced such an issue.

Thanks in advance to anyone who could answer.

0 Comments
2024/11/01
10:11 UTC

2

What makes a transformer parallel?

This is a beginner question, but I was wondering: what part of the transformer makes it parallel? It still needs to process tokens layer by layer, right?
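A minimal sketch of my current understanding of what "parallel" means here (shapes are illustrative): within a layer, self-attention touches every token position in a few batched matrix multiplies, so there is no loop over time steps the way there is in an RNN.

import math
import torch

T, d = 6, 16                              # sequence length and model dimension (illustrative)
x = torch.randn(1, T, d)                  # embeddings for all T tokens at once

Wq, Wk, Wv = (torch.randn(d, d) for _ in range(3))
Q, K, V = x @ Wq, x @ Wk, x @ Wv          # projections for every position in one matmul each

scores = Q @ K.transpose(-2, -1) / math.sqrt(d)   # (1, T, T): all token pairs at once
attn = torch.softmax(scores, dim=-1)
out = attn @ V                                     # (1, T, d): no loop over positions

Layers are still applied one after another, and autoregressive generation is still token by token; the parallelism pays off mainly during training and prompt processing, where all positions are known up front.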

10 Comments
2024/11/01
06:39 UTC

1

How to Identify Document Sections for RAG in Varying Formats?

Hi guys, I'm building a RAG solution with a Llama 3 model designed for retrieving knowledge from research papers. Like other RAG projects, I started by splitting the document into chunks with a specific number of tokens per chunk. However, I recently began to wonder: what if I instead create chunks that cover entire sections of the paper, such as one chunk for the abstract, another for the methodology, and so on?
I'm not sure whether this will improve the results, but I'm curious and want to try, though I don't know where to start. Can anyone suggest a lightweight pretrained model that is good at identifying the sections of a document such as a research paper?
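For what it's worth, the first heuristic I'm thinking of trying is a plain regex over common section headings (a sketch of my own, not a pretrained model; the heading list is an assumption about typical paper structure):

import re

SECTION_HEADINGS = r"(abstract|introduction|related work|methodology|methods|experiments|results|discussion|conclusion|references)"
# Match headings like "2. Methodology" or "METHODS" that sit on their own line.
pattern = re.compile(rf"^\s*(?:\d+\.?\s*)?{SECTION_HEADINGS}\s*$", re.IGNORECASE | re.MULTILINE)

def split_into_sections(paper_text: str) -> dict[str, str]:
    """Return {section name: section text}, one chunk per detected section."""
    matches = list(pattern.finditer(paper_text))
    sections = {}
    for i, m in enumerate(matches):
        start = m.end()
        end = matches[i + 1].start() if i + 1 < len(matches) else len(paper_text)
        sections[m.group(1).lower()] = paper_text[start:end].strip()
    return sections

If headings vary too much for a fixed list, a small classifier over candidate heading lines could replace the regex, which is where the lightweight pretrained model I'm asking about would come in.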

0 Comments
2024/11/01
04:13 UTC

2

[Tutorial] Lane Detection using Mask RCNN – An Instance Segmentation Approach

Lane Detection using Mask RCNN – An Instance Segmentation Approach

https://debuggercafe.com/lane-detection-using-mask-rcnn/

Lane detection and segmentation have a lot of use cases, especially in self-driving vehicles. With lane detection and segmentation, the vehicle gets to see different types of lanes, which allows it to plan its route and actions accordingly. Of course, there are several other components involved along with computer vision and deep learning, but this serves as the first step. In this article, we will tackle that first step involving computer vision and deep learning: we will train a Mask RCNN model for lane detection and segmentation, taking an instance segmentation approach to detect and segment various types of lane lines.

https://preview.redd.it/mle6trzgq6yd1.png?width=1000&format=png&auto=webp&s=c9aae202487a9f8ec770c25e8a06c01e637d24aa

0 Comments
2024/11/01
00:32 UTC

0

About Artificial Intelligence ... (from an ordinary coder)

Before ChatGPT, when I struggled to understand some key concepts in theory or some details were obscure, I would go to Reddit, StackOverflow, etc.

But people, especially on programming forums, can embarrass you if you state something stupid (something you genuinely believed). Even here in this subreddit, if you propose something you were tinkering with, some people may make fun of you.

ChatGPT changed all that. It became a completely different story. It would explain all the theory to you so patiently and would not argue with you even if it recognized that your intellectual level was below the desk.

It would explain and explain, and even say sorry to you if it did not understand the idea of the question fully.


With ChatGPT 3 -> 4 -> 4o, its answers became better and better.

I cannot complain about ChatGPT in these terms.

However, the only thing that causes mistrust is the fact that all these conversations are saved and can be misused. At some point I started to share my ideas with ChatGPT, asking it to check them. Some people started to share their photos and videos with AI. At some point it could accumulate all this information. And become ... a monster. Some people have even, nowadays, stopped doing intellectual/creative work, relying on it instead.

I personally believe that ChatGPT o1 is the threshold that we don't need to go above. It should stay a helper tool for people and not go beyond that.

The reason I was working on the small LLM I posted recently is that it is limited; it is a kind of small unit that can be put somewhere to work as an auxiliary block...


3 Comments
2024/10/31
20:04 UTC

3

Looking for a military audio dataset? Please refer to our dataset.

Hey folks,

Are you looking for a military audio dataset? We are happy to announce that we have deployed MAD (Military Audio Dataset), which contains 7,466 audio samples across 7 classes (communication, gunshot, footsteps, shelling, vehicle (tank), helicopter, and fighter), corresponding to approximately 12 hours of audio and exhibiting distinctive characteristics not present in the academic datasets typically used for machine learning research.

The dataset is available on Kaggle. For more detail, please refer to our GitHub repository or paper.

1 Comment
2024/10/31
17:09 UTC

0

Caching Methods in Large Language Models (LLMs)

https://preview.redd.it/ce3x4oa564yd1.png?width=1200&format=png&auto=webp&s=f85ea78638e93ee72d4bcc8e4abc14f7c8e49201

https://preview.redd.it/o0vcrna564yd1.png?width=1200&format=png&auto=webp&s=89ba1d49f11bb73ae3f082adb2fa461954747341

https://preview.redd.it/829glod564yd1.png?width=1200&format=png&auto=webp&s=1f41931e6e1c9922d76b51fe7c32d562ab1f379e

https://preview.redd.it/j4qgvoa564yd1.png?width=1200&format=png&auto=webp&s=8cbf3f9d55dedfe56b55d1d666719b92aa7bd4ed

https://preview.redd.it/i0mwina564yd1.png?width=1200&format=png&auto=webp&s=dcbf688124e7e405d6345b908843e1ed1ae3ad60

https://preview.redd.it/ehcr1qa564yd1.png?width=1200&format=png&auto=webp&s=91b149c69d62df155a8181b92dc9edf94e4b154e

https://preview.redd.it/mex1rpa564yd1.png?width=1200&format=png&auto=webp&s=9bbc8267879eac14dd690b5a6c978ffa606cf440

https://preview.redd.it/cl5eppa564yd1.png?width=1200&format=png&auto=webp&s=4aff0dcf28031a0f32fb114bf0fde8e08f5016cc

https://preview.redd.it/y2xpfoa564yd1.png?width=1200&format=png&auto=webp&s=609b97383b0c7ca3df0eac3ef91a75ac9e9f3d78

https://preview.redd.it/wt9d5pa564yd1.png?width=1200&format=png&auto=webp&s=b3033745d6c7736b30c673aa266a765af6055379

https://preview.redd.it/2o75mpa564yd1.png?width=1200&format=png&auto=webp&s=429f9c4969db92f9726c4b54f3d508fb2df31404

https://preview.redd.it/ln00roa564yd1.png?width=1200&format=png&auto=webp&s=9dd05cac81f20edfc742b668d9881844e8cdcf92

https://preview.redd.it/bn4y0pa564yd1.png?width=1200&format=png&auto=webp&s=d871b03fde025af31579c1aa0420a9bc62b6b403

https://preview.redd.it/zonrqqa564yd1.png?width=1200&format=png&auto=webp&s=cc0bef81a5d9761dffde92418d3840240fb3e3e9

https://preview.redd.it/i9jskqc564yd1.png?width=1200&format=png&auto=webp&s=00c44fafc89a7581a59c04da208edfa9eefb93d5

https://preview.redd.it/hlb2bud564yd1.png?width=1200&format=png&auto=webp&s=8617aeb071dbdef7720b6f7152afa611dffcf960

0 Comments
2024/10/31
15:55 UTC

0

Advice on graphics cards: RTX 3060 or 4060 Ti for DL/ML

Hey guys,

I'm currently putting together a computer to build my own workstation for my deep learning and machine learning hobby projects. I know there are cloud solutions that are often better, but I want to build my own station first. Here are the planned components:

- MSI B760 GAMING PLUS WIFI motherboard

- Intel i7 of the 13th generation

- 32 GB DDR5 RAM (6000 MHz)

- 2 TB M.2 PCIe Gen4 SSD

The only difficulty I'm having at the moment is deciding between two graphics cards:

- Gigabyte NVIDIA GeForce RTX 3060 GAMING OC V2 (12 GB GDDR6)

- Gigabyte NVIDIA GeForce RTX 4060 Ti GAMING OC (16 GB GDDR6)

I'm aware that VRAM isn't the most important thing, as I don't just want to work with LLM and CNN models but also want to develop my own deep learning models. The memory bandwidth of the RTX 3060 is better, but it has less VRAM. On the other hand, the 4060 Ti offers a significantly higher CUDA score, but also costs 200 euros more.

I would be super grateful if you could help me with this decision!

Thanks in advance!

0 Comments
2024/10/31
15:02 UTC

1

What are some good methods to perform hierarchical classification of text?

Hi everyone, I have this problem in hand. I have reviews of a certain number of products, around 10,000 rows, and based on these reviews I want to do hierarchical classification of the products. Just to test things out, I tried building three different models to predict the three different levels of classes, but as one would expect, I get good accuracy and F1 scores for the top-level category, while the scores worsen as I go deeper into the hierarchy. My plan was to start with simpler models like Naive Bayes classifiers, SVMs, logistic regression, and XGBoost, and then eventually move to more advanced methods like RNNs, LSTMs, and BERT. I want to develop good intuition about how I should solve a hierarchical classification problem; any suggestions would be helpful. Thanks.
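For concreteness, the structure I have in mind is the "local classifier per parent node" setup; a rough sketch with TF-IDF plus logistic regression as stand-in base models (label names are placeholders, and it assumes each parent class has at least two distinct subclasses):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def fit_hierarchy(texts, level1_labels, level2_labels):
    """Train one top-level classifier plus one child classifier per level-1 class."""
    top = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    top.fit(texts, level1_labels)

    children = {}
    for parent in set(level1_labels):
        idx = [i for i, lab in enumerate(level1_labels) if lab == parent]
        child = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
        # Assumes each parent has at least two distinct level-2 subclasses.
        child.fit([texts[i] for i in idx], [level2_labels[i] for i in idx])
        children[parent] = child
    return top, children

def predict_hierarchy(top, children, text):
    parent = top.predict([text])[0]
    return parent, children[parent].predict([text])[0]

The same per-parent structure could later be reused with stronger base models (e.g. BERT) once the simple baselines are in place.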

1 Comment
2024/10/31
14:45 UTC

1

How does Claude Computer Use get mouse coordinates?

Does anyone have any insight into the model architecture / method used to determine the coordinates of the elements on the screen?

They mentioned in their blog post that they had to come up with a new model but didn't give any details.

0 Comments
2024/10/31
12:03 UTC

1

Crucial Python Project | Backpropagation | Classification | Real-World Example

0 Comments
2024/10/31
07:29 UTC

1

For LLMs: cloud or local GPUs?

So I am thinking of building a home PC with an Nvidia 4080 Super, a Core i9, and so on, for doing experiments on LLMs and also building AI applications. Is it worth buying all this for, say, $3500, or should I go for cloud services like Google Colab, Paperspace, etc.? What do you think, folks?

12 Comments
2024/10/31
03:47 UTC

1

I am encountering "ValueError: None values not supported" while doing an anomaly detection project

import tensorflow as tf
from tensorflow.keras import layers, models

# train_images is assumed to be loaded earlier as a float32 NumPy array of shape (N, 224, 224, 3).
if train_images.shape[0] == 0:
    raise ValueError("No images were loaded. Please check the image directory.")

# Create a TensorFlow dataset without batching
full_dataset = tf.data.Dataset.from_tensor_slices(train_images)

# Shuffle the dataset
full_dataset = full_dataset.shuffle(buffer_size=len(train_images))

# Split the dataset into training and validation sets
train_size = int(0.8 * len(train_images))
val_size = len(train_images) - train_size

train_dataset = full_dataset.take(train_size)
val_dataset = full_dataset.skip(train_size)

# Batch the datasets
train_dataset = train_dataset.batch(32)
val_dataset = val_dataset.batch(32)

# Function to build the autoencoder model
def build_autoencoder(input_shape):
    # Encoder
    encoder_input = layers.Input(shape=input_shape)
    x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(encoder_input)
    x = layers.MaxPooling2D((2, 2), padding='same')(x)
    x = layers.Conv2D(16, (3, 3), activation='relu', padding='same')(x)
    encoder_output = layers.MaxPooling2D((2, 2), padding='same')(x)

    # Decoder
    x = layers.Conv2D(16, (3, 3), activation='relu', padding='same')(encoder_output)
    x = layers.UpSampling2D((2, 2))(x)
    x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(x)
    x = layers.UpSampling2D((2, 2))(x)
    decoder_output = layers.Conv2D(3, (3, 3), activation='sigmoid', padding='same')(x)

    # Autoencoder model
    autoencoder = models.Model(encoder_input, decoder_output)
    return autoencoder

# Function to train the autoencoder
def train_autoencoder(train_dataset, val_dataset):
    model = build_autoencoder(input_shape=(224, 224, 3))
    model.compile(optimizer='adam', loss='mse')

    # Define callbacks for saving the model and early stopping
    checkpoint_cb = tf.keras.callbacks.ModelCheckpoint("autoencoder_model.keras", save_best_only=True)
    early_stopping_cb = tf.keras.callbacks.EarlyStopping(patience=5, restore_best_weights=True)

    # Train the model
    history = model.fit(
        train_dataset,
        epochs=50,
        validation_data=val_dataset,
        callbacks=[checkpoint_cb, early_stopping_cb],
        verbose=2
    )

    # Save the training history
    with open('training_history.txt', 'w') as f:
        for key, values in history.history.items():
            f.write(f'{key}: {values}\n')

    print("Training complete, model and history saved.")

# Train the autoencoder
train_autoencoder(train_dataset, val_dataset)

Epoch 1/50

ValueError: None values not supported.
1 Comment
2024/10/31
00:01 UTC

3

PyTorch Quantization of model parameters for deployment on edge device

Essentially, I have a trained model (in PyTorch) that I want to deploy on an edge device (everything written in C/C++) for inference. Just for context: I'm working alone on this project, so I don't get much guidance. My understanding is that, at deployment, the input (the inference data) needs to be integers, and my model's parameters (weights and biases/activations) also need to be integers. Because I don't have "inference data", I am currently testing my implementation/prototype by quantizing my validation/test data and comparing the validation/test results I get using the floating-point model parameters with the results I get using quantized/integer model parameters. To make this more concrete (or succinct), I'm testing two cases:

  • Case 1: floating point model called on floating point train and test data.
  • Case 2: quantized int model parameters called on quantized test data.

            def quantize_tensor(tensor, num_bits): 
                qmin = - (2 ** (num_bits - 1)) 
                qmax = (2 ** (num_bits - 1)) - 1 
                min_val, max_val = tensor.min(), tensor.max() 
                scale = (max_val - min_val) / (qmax - qmin)
                zero_point = qmin - min_val / scale
                zero_point = torch.round(zero_point).clamp(qmin, qmax)

                q_tensor = torch.round(tensor/scale+zero_point).clamp(qmin, qmax)

                if num_bits == 8:
                    q_tensor = q_tensor.type(torch.int8)
                elif num_bits == 16:
                    q_tensor = q_tensor.type(torch.int16)
                else:
                    q_tensor = q_tensor.type(torch.int)
                
                return q_tensor, scale, zero_point

Then I quantize the model's weights and the bias using this:

            def quantize_model(model, weight_bit_width=16, bias_bit_width=16):
                quantized_state_dict = {}
                scale_zp_dict = {}  # To store scale and zero-point for each parameter

                for name, param in model.state_dict().items():
                    if 'weight' in name:
                        q_param, scale, zero_point = quantize_tensor(param, weight_bit_width)
                        quantized_state_dict[name] = q_param
                        scale_zp_dict[name] = (scale, zero_point)
                    elif 'bias' in name:
                        q_param, scale, zero_point = quantize_tensor(param, bias_bit_width)
                        quantized_state_dict[name] = q_param
                        scale_zp_dict[name] = (scale, zero_point)
                    else:
                        # For other parameters, keep them as is or apply appropriate quantization
                        quantized_state_dict[name] = param

                return quantized_state_dict, scale_zp_dict

Furthermore, I quantize my model and the data like so (see the code below). However, because my ML problem is multiclass and multi-output, I need to call torch.softmax on the logits coming out of my model to get prediction probabilities, but the softmax function doesn't support integers (technically, it is not implemented for ints), which makes me worried that my overall quantization approach is wrong (I include the model's code and more below):

import copy
import torch
from torch import nn

class model(nn.Module):
    def __init__(self, inputs, l1, l2, num_outputs, output_classes=3):
        super().__init__()

        # define the layers
        self.output_classes = output_classes
        self.num_outputs = num_outputs

        self.layers = nn.Sequential(
            nn.Linear(inputs, l1),
            nn.ReLU(),
            nn.Linear(l1, l2),
            nn.ReLU(),
            nn.Linear(l2, num_outputs * output_classes),  # output_classes = number of classes in each output
        )

    def forward(self, x):
        x = self.layers(x)
        x = x.view(-1, self.output_classes, self.num_outputs)  # Reshapes output tensor (logits output).
        return x


model_copy = copy.deepcopy(floating_point_trained_model)

# quantize model params
quantized_state_dict, scale_zp_dict = quantize_model(model_copy, weight_bit_width=16, bias_bit_width=16)
for name, param in model_copy.named_parameters():
    param.requires_grad = False
    param.data = quantized_state_dict[name].to(dtype=torch.float) # <--- Need help here: Casting to float to satisfy softmax requirements 

# Quantize data
Quant_X_train, scale, zp = quantize_tensor(X_train, 16)          # quantize your X_train
Quant_X_test, test_scale, test_zp = quantize_tensor(X_test, 16)  # quantize your X_test

# call quantized model on quantized input data
pred_probs = torch.softmax(model_copy(Quant_X_test.to(torch.float)), dim=1)  # <--- Need help: casting to float to get prediction probabilities
predictions = torch.argmax(pred_probs, dim=1) 
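For what it's worth, this is the dequantization helper I suspect I might be missing, consistent with quantize_tensor above (since q = round(x / scale + zero_point), x is approximately (q - zero_point) * scale), so integer tensors can be mapped back to float before softmax:

import torch

def dequantize_tensor(q_tensor, scale, zero_point):
    # Inverse of quantize_tensor above: maps integer codes back to approximate float values.
    return (q_tensor.to(torch.float32) - zero_point) * scale

# e.g. dequantize integer logits (or weights) before the float-only ops:
# logits_fp = dequantize_tensor(int_logits, logit_scale, logit_zero_point)
# pred_probs = torch.softmax(logits_fp, dim=1)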

I'm curious about a few things:

  • Is this the correct process/way to approach this problem? Especially because I am not able to call softmax on my int tensor, I feel I might be doing something wrong.
  • Did I implement the quantization procedures accurately? That is, does my method of verifying make sense (comparing the results between Case 1 and Case 2 above)?
  • If anyone has guidance on how to approach this problem (or example code/tutorials), that would be great. I have perused PyTorch's quantization mode support.

If it helps, this is an example of what my training data looks like:

0      0.995231  0.996840  1.000000  0.998341  1.000000  1.000000  1.000000  0.998709  ...         0.000024         0.000019         0.000015         0.000016         0.000011         0.000007         0.000007         0.000015
1      0.996407  0.998568  1.000000  0.997889  1.000000  0.999954  0.999738  0.997458  ...         0.000018         0.000013         0.000011         0.000012         0.000008         0.000005         0.000006         0.000009
2      0.996083  0.999702  1.000000  0.999031  1.000000  1.000000  0.999816  0.998727  ...         0.000019         0.000013         0.000012         0.000011         0.000008         0.000006         0.000006         0.000011
3      0.998531  0.999481  0.999199  1.000000  0.999720  1.000000  1.000000  0.998682  ...         0.000015         0.000011         0.000010         0.000010         0.000007         0.000005         0.000004         0.000007
6 Comments
2024/10/30
20:29 UTC

2

Seeking Advice on Best Cloud GPU Service for AI Model Inference (Text-to-Speech, Speech-to-Text, Text-to-Image, LLM)

Hi everyone! I'm working on an AI project that involves several models, and I’m exploring the best cloud service to use for GPU-based model inference. My requirements are as follows:

  • Models: I need to deploy text-to-speech, speech-to-text, text-to-image, and a large language model (LLM).
  • Performance: I'm looking for high inference speed and minimal latency, as this will be a real-time or near-real-time application.
  • Scalability: I’d like a solution that can handle scaling with multiple users, ideally without cold starts.
  • Cost Efficiency: Budget is a consideration as I am thinking of bootstrapping, so I'd appreciate any insights into cost-effective services.

I've looked into a few options like AWS, vast.ai, runpod, and some specialized providers, but I’m unsure which would work best for this setup. Has anyone here worked with these or other services for similar needs? Any feedback on cost, performance, or ease of setup would be great!

I have used RunPod for text-to-image (the SDXL template), but inference is very slow.

Thanks in advance!

10 Comments
2024/10/30
17:05 UTC

1

Fine-tuning a model with a different number of classes

I made a custom model and trained it on a dataset with 120 classes. I then proceeded to fine-tune it on a dataset with 5 classes, which is my target.

However, I get the following error:

RuntimeError: Error(s) in loading state_dict for hybrid_model:
size mismatch for classification_head.3.weight: copying a param with shape torch.Size([120, 256]) from checkpoint, the shape in current model is torch.Size([5, 256]).
size mismatch for classification_head.3.bias: copying a param with shape torch.Size([120]) from checkpoint, the shape in current model is torch.Size([5]).

I used model = hybrid_model(num_classes=120) while training and am using
model = hybrid_model(num_classes=5) for fine-tuning.

Any suggestions?
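For reference, one common workaround, sketched here under the assumption that the checkpoint is a plain state_dict saved to a placeholder file name ("pretrained_120cls.pt") and that the mismatched layer is classification_head.3 as in the error above: drop the old head's entries and load the rest with strict=False, so only the new 5-class head starts from scratch.

import torch

# Hypothetical file name; the head layer name comes from the error message above.
state_dict = torch.load("pretrained_120cls.pt", map_location="cpu")

# Drop the 120-class head so its shapes don't clash, then load the rest non-strictly;
# the new 5-class head keeps its fresh initialization and is learned during fine-tuning.
filtered = {k: v for k, v in state_dict.items()
            if not k.startswith("classification_head.3")}

model = hybrid_model(num_classes=5)
missing, unexpected = model.load_state_dict(filtered, strict=False)
print("missing:", missing)        # should list only classification_head.3.weight/bias
print("unexpected:", unexpected)  # should be empty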

5 Comments
2024/10/30
16:08 UTC

1

SWE-bench with John Yang and Carlos E. Jimenez - Weaviate Podcast #107!

I am BEYOND EXCITED to publish our interview with John Yang and Carlos E. Jimenez from SWE-bench, SWE-agent, and SWE-bench Multimodal!

Beyond just solving LeetCode-style programming challenges, this series of works tackles deploying LLM Agents to real GitHub repositories and their respective issues and pull requests!

This was such an interesting discussion beginning with the data problem of interfacing LLMs and GitHub repositories and then diving into all sorts of things from Code Execution as a Tool to Agents vs. Compound AI System designs, Multimodal SWE Agents, and more!

YouTube: https://www.youtube.com/watch?v=8rwHAR4fsFg

Spotify: https://spotifyanchor-web.app.link/e/lHcSCgNr7Nb

0 Comments
2024/10/30
15:12 UTC

0

Can you safely use ChatGPT and other LLMs for data manipulation?

If I have a file of floating-point numbers, like a CSV file or something similar, can I safely use ChatGPT and other LLMs to change its format? For example, transforming a CSV file into a Markdown-style table?

My concern is that, because of hallucinations, the actual numerical values might be changed into something different. I'm thinking of cases like a table labelled "pi values" that contains 3.15: because most tables would have pi as 3.14, the model might change the numerical value.
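For comparison, the CSV-to-Markdown case in particular can be done deterministically, with no model in the loop and therefore no chance of a value drifting (a sketch assuming pandas and a placeholder file name, data.csv):

import pandas as pd

df = pd.read_csv("data.csv")           # hypothetical file name
print(df.to_markdown(index=False))     # requires the optional 'tabulate' package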

9 Comments
2024/10/30
14:31 UTC

2

16bit vs 32bit for training ASR

My model is pretrained on 32-bit audio. I have good data for my niche in 16-bit. Can I convert the 16-bit audio to float32 via normalization and train the model on it, or is that how it normally works anyway?

Is there a way I can use the 16-bit audio files?
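If it helps frame the question, the conversion I have in mind is the usual int16-to-float32 normalization (a sketch, assuming the soundfile package and a placeholder file name), which should make the 16-bit recordings look like any other float32 input to the model:

import numpy as np
import soundfile as sf   # assumption: soundfile is used to read the WAV files

# Read the samples as int16, then scale to float32 in roughly [-1.0, 1.0).
audio_int16, sr = sf.read("clip_16bit.wav", dtype="int16")   # hypothetical file name
audio_f32 = audio_int16.astype(np.float32) / 32768.0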

4 Comments
2024/10/30
11:48 UTC

0

Building an AI for Business Data—How Cells AI Lets You “Talk” to Your Data

Hey r/deeplearning! I recently built a project called Cells AI to help businesses get more out of their data without requiring a data team. The idea is pretty straightforward: just ask your data questions and get instant answers. Here’s a bit about how it works and what it does:

  • Question-Based Queries: Cells AI lets users ask questions in plain language—think, “What were last quarter’s top-selling products?” and it provides an immediate, clear answer.
  • Data Insights Without Manual Analysis: Instead of pulling reports or using spreadsheets, Cells AI automates the analysis, making data insights instantly accessible.
  • Flexible Data Sources: Cells AI can handle multiple formats, from CSVs and Excel sheets to databases, adapting to whatever data the user has on hand.

It’s been an interesting project, especially working out how to make the responses both fast and accurate. If you’re interested, I’ve got a demo that shows it in action.

https://reddit.com/link/1gfjp5t/video/7al4qj0ojvxd1/player

Would love to hear if anyone’s working on similar projects or has tackled similar challenges with NLP for data insights!

3 Comments
2024/10/30
10:53 UTC
