/r/deeplearning


Resources for understanding and implementing "deep learning" (learning data representations through artificial neural networks).

/r/deeplearning

169,436 Subscribers

1

I reverse engineered how WizardMath actually works. The 3-step process is brilliant. [Technical Analysis]

Been reverse engineering WizardMath's architecture (Luo et al., 2023) and honestly, it's beautiful in its simplicity. Everyone's focused on the results, but the 3-step training process is the real breakthrough.

Most "math-solving" LLMs are just doing fancy pattern matching. This approach is different because it's actually learning mathematical reasoning, not just memorizing solution patterns.

I've been implementing something similar in my own work. The results aren't as good as WizardMath yet, but the approach scales surprisingly well to other types of reasoning tasks. You can read more of my analysis here: https://blog.bagel.net/p/train-fast-but-think-slow. If you're experimenting with WizardMath, let me know.

https://preview.redd.it/8zckr1ljrvzd1.png?width=1518&format=png&auto=webp&s=9c784445cd9fbc93325b861bd1e840dda92673f2

1 Comment
2024/11/09
13:47 UTC

1

VAE for different size inputs - KL divergence loss

Hi,

I'm working on a Variational Autoencoder which is fully convolutional, so it can take inputs of different sequence lengths. In calculating the Kullback-Leibler divergence, you would normally use summation to reduce the loss, then divide by the batch size. However, since my inputs have different sequence lengths, I think this makes the KL divergence vary a lot between batches. I could normalize by dividing the KL loss by the sequence length of the given batch, but I'm not sure that's mathematically correct.
I'm unsure what to do.
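For what it's worth, here is a minimal PyTorch sketch of the two reductions being compared, assuming the encoder outputs a latent shaped (batch, channels, seq_len). Dividing by the sequence length amounts to weighting the KL like a per-timestep beta, so whether it is "correct" mostly depends on how the reconstruction term is reduced:

import torch

def kl_loss(mu, logvar, per_timestep=False):
    # mu, logvar: (batch, latent_channels, seq_len) from a fully convolutional encoder
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp())  # elementwise KL against N(0, I)
    kl = kl.sum(dim=(1, 2))                              # sum over latent dims -> (batch,)
    if per_timestep:
        kl = kl / mu.shape[-1]                           # normalize by this batch's sequence length
    return kl.mean()                                     # average over the batch

mu = torch.randn(4, 16, 250)
logvar = torch.zeros(4, 16, 250)
print(kl_loss(mu, logvar), kl_loss(mu, logvar, per_timestep=True))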

0 Comments
2024/11/09
12:56 UTC

0

Driving the Next Wave of AI Power 🚀

Deep Learning Innovations: What’s Driving the Next Wave of AI Power?

The field of deep learning is continuously evolving, with innovations like self-supervised learning, new architectures, and neural networks capable of feats that seemed out of reach only a few years ago. This post explores recent advancements that are pushing the boundaries of what deep learning can achieve, from image recognition to natural language understanding. Learn about the latest tools, research, and trends shaping the future of AI.🪐

Want to stay at the cutting edge of AI? Join r/deeplearning and see what’s new in the world of deep learning! 👇🏽

0 Comments
2024/11/09
11:45 UTC

1

Advice on Fine-Tuning Meta's Segment Anything 2 (SAM) Model — Balancing Error Correction with Generalizability

I've been working with SAM2 and have been trying to figure out the best way to fine-tune it for my specific use case. A few considerations I was hoping to get some insights on:

  1. Error Correction vs Generalization: If I fine-tune the model to perform better on the cases it got wrong most often, can it retain its performance on the examples it was already handling well, i.e. still maintain (or even improve) its prior generalizability? Or do I need to include enough examples it was already doing well on to preserve that performance?
  2. Which Components to Fine-Tune? In terms of the model's architecture, I've seen different advice on whether to fine-tune just the mask decoder, the prompt encoder, or both. In your experience, is fine-tuning just the mask decoder enough to improve performance, or do you need to adjust the prompt encoder as well? Or maybe there's more to it, like the backbone or other parts of the model? Is the computational cost much different, and are there other downsides or considerations? (A rough sketch of decoder-only fine-tuning is below the list.)
  3. Real-World Experiences: For those who have fine-tuned SAM before, how has your experience been? Any tips, tricks, or pitfalls I should watch out for? Also, how did you go about preparing your fine-tuning dataset? Any suggestions on balancing the diversity of data vs focusing on edge cases?
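For reference on question 2, a rough PyTorch sketch of freezing everything except the mask decoder. The config/checkpoint names are placeholders and the parameter-name filter is an assumption about the sam2 repo's attribute naming, so check both against your release:

import torch
from sam2.build_sam import build_sam2  # assumes the official facebookresearch/sam2 package

# Hypothetical config/checkpoint names; substitute your own.
model = build_sam2("sam2_hiera_s.yaml", "sam2_hiera_small.pt")

# Train only the mask decoder; freeze the image encoder and prompt encoder.
for name, param in model.named_parameters():
    param.requires_grad = "mask_decoder" in name   # attribute names may differ between releases

optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=1e-5, weight_decay=1e-4
)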
0 Comments
2024/11/09
08:33 UTC

0

Need advice

Hey there, I hope you're doing well. I'm turning 20 in the next few months and I just dropped out of university for financial reasons; my parents can't really support me, so I'm feeling lost right now. I want to invest my time in something that can earn me some money. I know a bit of electronics repair but I'm not sure it's a good career, and I'm interested in AI and machine learning, but I heard from someone on YouTube that it's not for people without coding skills. Please clear this up for me, or suggest some financial advice.

16 Comments
2024/11/08
21:00 UTC

3

ONNX Runtime Web Greedy/Beam Search

Hello, I have a custom transformer model exported from PyTorch, and I am trying to deploy it as a Chrome extension. For greedy/beam search, what is the best practice? I am in the process of using JavaScript and ort.Tensor to build the attention mask and input sequence at each step, but realized this could be a bit slow. Thanks!
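For context, the loop most people start from looks like the sketch below (shown with the Python onnxruntime API, since the web API's InferenceSession/run mirrors it; the file name, tensor names, and token ids are placeholders for whatever your export uses). Because the whole prefix is re-encoded on every step, it gets slow quickly; exporting the decoder with past key/values and feeding only the newest token each step is the usual fix.

import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("decoder.onnx")      # placeholder model path
bos_id, eos_id, max_new_tokens = 1, 2, 64        # placeholder special tokens / budget

tokens = [bos_id]
for _ in range(max_new_tokens):
    ids = np.asarray([tokens], dtype=np.int64)   # (1, seq_len), rebuilt every step
    mask = np.ones_like(ids)
    logits = sess.run(None, {"input_ids": ids, "attention_mask": mask})[0]
    next_id = int(logits[0, -1].argmax())        # greedy: take the most likely next token
    tokens.append(next_id)
    if next_id == eos_id:
        break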

0 Comments
2024/11/08
18:52 UTC

3

any recommendations for materials for attention mechanisms and transformers

I have been through a great book called Dive into Deep Learning, but I can't understand the intuition behind attention, which means I can't fully comprehend transformers.

So where should I go if I want to fully understand attention mechanisms and transformers?

My second question is: are attention mechanisms a must in order to understand transformers?
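On the intuition point, it can help to see that the core operation is only a few lines: each query scores its similarity to every key, and those scores (after a softmax) weight an average of the values. A minimal self-attention sketch in PyTorch (single head, no masking):

import math
import torch

def attention(q, k, v):
    # scores[i, j]: how much token i attends to token j
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.shape[-1])
    return torch.softmax(scores, dim=-1) @ v     # weighted average of the values

x = torch.randn(1, 5, 8)                         # (batch, tokens, dim)
print(attention(x, x, x).shape)                  # self-attention: q = k = v -> torch.Size([1, 5, 8])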

6 Comments
2024/11/08
18:41 UTC

0

Why are model_q4.onnx and model_q4f16.onnx not 4 times smaller than model.onnx?

I see on https://huggingface.co/HuggingFaceTB/SmolLM2-135M-Instruct/tree/main/onnx:

File Name / Size:

  • model.onnx: 654 MB
  • model_fp16.onnx: 327 MB
  • model_q4.onnx: 200 MB
  • model_q4f16.onnx: 134 MB

I understand that:

  • model.onnx is the fp32 model,
  • model_fp16.onnx is the model whose weights are quantized to fp16

I don't understand the size of model_q4.onnx and model_q4f16.onnx

  1. Why is model_q4.onnx 200 MB instead of 654 MB / 4 = 163.5 MB? I thought model_q4.onnx meant that the weights are quantized to 4 bits.

  2. Why is model_q4f16.onnx 134 MB instead of 654 MB / 4 = 163.5 MB? I thought model_q4f16.onnx meant that the weights are quantized to 4 bits and activations are fp16, since https://llm.mlc.ai/docs/compilation/configure_quantization.html states:

    qAfB(_id), where A represents the number of bits for storing weights and B represents the number of bits for storing activations.

and "Why do activations need more bits (16bit) than weights (8bit) in TensorFlow's neural network quantization framework?" indicates that activations don't count toward the model size (understandably).
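The usual explanation is that "q4" does not mean every byte of the file shrinks by 4x: the 4-bit quantization is applied block-wise to MatMul weights and stores a scale per block, and some tensors (notably the embedding matrix) are typically left in fp32 (q4) or fp16 (q4f16). A back-of-envelope with assumed numbers (default block size 32, embedding kept unquantized) lands close to the observed sizes:

# Rough arithmetic only; the exact tensor breakdown of these files is an assumption.
params_total  = 654e6 / 4            # ~163M fp32 parameters in model.onnx
params_embed  = 49152 * 576          # vocab x hidden embedding matrix, assumed unquantized
params_matmul = params_total - params_embed

block = 32
bits_q4    = 4 + 32 / block          # 4-bit weights + one fp32 scale per block
bits_q4f16 = 4 + 16 / block          # 4-bit weights + one fp16 scale per block

q4_mb    = (params_matmul * bits_q4 / 8 + params_embed * 4) / 1e6     # embedding in fp32
q4f16_mb = (params_matmul * bits_q4f16 / 8 + params_embed * 2) / 1e6  # embedding in fp16
print(round(q4_mb), round(q4f16_mb))  # ~198 and ~133, close to the observed 200 MB and 134 MB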

3 Comments
2024/11/08
17:36 UTC

2

Does onnxruntime support bfloat16?

I want to train a PyTorch model in bfloat16 and convert it to ONNX in bfloat16. Does onnxruntime support bfloat16?

1 Comment
2024/11/08
12:27 UTC

2

Metadata on test set not found?

Hello there, I was just looking at the FOR-species20K dataset and noticed that it provides the metadata (the species, genus, and filename of each point cloud) for the training set, but not for the test set. Is it provided somewhere else, or not at all? Because if I don't know the true labels for the test set, how will I validate my model?

I was interested in making a model for this dataset and was thinking of using PointNet++ or PointNet to do so.

0 Comments
2024/11/08
09:49 UTC

3

[Tutorial] Traffic Sign Detection using DETR

Traffic Sign Detection using DETR

https://debuggercafe.com/traffic-sign-detection-using-detr/

In this article, we will create a small proof of concept for traffic sign detection using the DETR object detection model and a very small dataset, focusing entirely on the practical steps we take to get the best results.

https://preview.redd.it/gn5s5svkokzd1.png?width=1000&format=png&auto=webp&s=a140091b70f3110cfb6c0aacfe0ce150e5b0afa3

0 Comments
2024/11/08
00:31 UTC

0

Interesting theory on how to build AGI

0 Comments
2024/11/07
21:29 UTC

0

when is somebody going to use tokenformer in prompt to video, in chatbots and robots

when is somebody going to use tokenformer in prompt to video, in chatbots and robots ? https://github.com/Haiyang-W/TokenFormer

2 Comments
2024/11/07
20:11 UTC

1

Wandb best practices for training several models in parallel?

0 Comments
2024/11/07
19:34 UTC

2

Research directions in NLP

Hey DL enjoyers, I feel like LLMs have pretty much hit their limit with innovation. A lot can still be done, but nothing so significant that it completely changes the LLM scene. Agents excluded. I did enjoy NLP before the whole LLM thing started. So here I ask, what next? What can a single individual, or an individual with a research team, do to make the NLP and LLM scene more interesting? My eyes are on explainable NLP (along the lines of BertViz and Shapley-value methods like SHAP) and human-in-the-loop NLP. Redditors, show me the way.

Full disclosure: I'm going to use some of these ideas in my PhD proposal.

5 Comments
2024/11/07
17:17 UTC

1

Let Me Speak Freely? with Zhi Rui Tam - Weaviate Podcast #108!

JSON Mode has been one of the biggest enablers for working with Large Language Models! JSON mode is even expanding into Multimodal Foundation models! But how exactly is JSON mode achieved?

There are generally 3 paths to JSON mode:

  1. Constrained Generation (such as Outlines)
  2. Begging the model for a JSON response in the prompt
  3. A two stage process of generate-then-format (or generate-then-retry)

Although most of the field has converged on the first method, Let Me Speak Freely? is a new paper challenging the potential tradeoffs in achieving JSON mode with constrained generation.

I am BEYOND EXCITED to publish the 108th Weaviate Podcast with Zhi Rui Tam, the lead author of Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models!

As the title of the paper suggests, although constrained generation is awesome because of its reliability, we may be sacrificing the performance of the LLM by producing our JSON with this method.

The podcast dives into how the paper's experiments surface this tradeoff, and into all sorts of details about the potential and implementation of Structured Outputs. I particularly love the conversation topic of incredibly complex Structured Outputs, such as generating 10 values in a single inference or, say, HTML templates.

I hope you enjoy the podcast! As always please reach out if you would like to discuss any of these ideas further!

https://www.youtube.com/watch?v=UsVIX9NJ_a4

0 Comments
2024/11/07
16:08 UTC

3

Pricing and recommendation for Online GPU services

I am training different neural network models on an image dataset (40 GB; it may grow later). I have a laptop with an RTX 260, but training is taking too long, so how do you all do it? If any of you are using online GPUs, how much do they cost, and which is the cheapest option that still gets the job done?

15 Comments
2024/11/07
16:05 UTC

41

AI That Can "Smell"?

I've been reading about Osmo, a startup using AI to predict and recreate scents by analyzing the molecular structures of smells, which they believe could impact fields from healthcare to fragrances.

It’s fascinating to think about machines “smelling” with this level of accuracy, but I’m curious — how might this actually change the way we experience the world around us? I guess I'm struggling to see the practical or unexpected ways AI-driven scent technology could affect daily life or specific industries, so I want to hear different perspectives on this.

26 Comments
2024/11/07
13:45 UTC

2

Resume training with optimizers from a checkpoint

Hi!

I am working on a deep learning model training script with checkpointing functionality. I have a question about the order in which to setup things when picking up training from a checkpoint. The checkpoint contains the model weights and optimizer state. Now what I would like to know is whether there is any difference between these two options:

  1. First load the model weights to the model and then setup the optimizer (pass the model parameters as argument to optimizer constructor and load optimizer state)
  2. First set up the optimizer (pass model parameters as argument) and then load model weights and optimizer state.

Thank you in advance for answering.

EDIT: I am using PyTorch
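In case a concrete sketch helps (the file name and state-dict keys below are assumptions about how the checkpoint was saved): because load_state_dict copies tensor data into the existing Parameter objects in place, the optimizer keeps referencing valid parameters either way, so the two orderings end up equivalent. The common pattern is:

import torch

model = torch.nn.Linear(10, 1)                                 # stand-in for the real model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

checkpoint = torch.load("checkpoint.pt", map_location="cpu")   # hypothetical file/keys
model.load_state_dict(checkpoint["model"])                     # copies weights in place
optimizer.load_state_dict(checkpoint["optimizer"])             # restores per-parameter Adam moments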

4 Comments
2024/11/07
11:50 UTC

0

[ONNXRuntimeError] : 7 : INVALID_PROTOBUF while trying to run a Hugging Face repo in WSL (Windows Subsystem for Linux)


I am trying to run Nymbo/Virtual-Try-On (at main) on my local Ubuntu server. I set it up and installed the libraries, yet I am getting [ONNXRuntimeError] : 7 : INVALID_PROTOBUF.

Although I was able to run this repository successfully on Google Colab.

Error in detail:

python app.py

/home/ubuntu/VTON-env/lib/python3.10/site-packages/huggingface_hub/file_download.py:1142: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.

warnings.warn(

The config attributes {'decay': 0.9999, 'inv_gamma': 1.0, 'min_decay': 0.0, 'optimization_step': 37000, 'power': 0.6666666666666666, 'update_after_step': 0, 'use_ema_warmup': False} were passed to UNet2DConditionModel, but are not expected and will be ignored. Please verify your config.json configuration file.

Some weights of the model checkpoint were not used when initializing UNet2DConditionModel:

['add_embedding.linear_1.bias, add_embedding.linear_1.weight, add_embedding.linear_2.bias, add_embedding.linear_2.weight']

Traceback (most recent call last):

File "/home/ubuntu/Virtual-Try-On/app.py", line 93, in <module>

parsing_model = Parsing(0)

File "/home/ubuntu/Virtual-Try-On/preprocess/humanparsing/run_parsing.py", line 20, in __init__

self.session = ort.InferenceSession(os.path.join(Path(__file__).absolute().parents[2].absolute(), 'ckpt/humanparsing/parsing_atr.onnx'),

File "/home/ubuntu/VTON-env/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 419, in __init__

self._create_inference_session(providers, provider_options, disabled_optimizers)

File "/home/ubuntu/VTON-env/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 452, in _create_inference_session

sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)

onnxruntime.capi.onnxruntime_pybind11_state.InvalidProtobuf: [ONNXRuntimeError] : 7 : INVALID_PROTOBUF : Load model from /home/ubuntu/Virtual-Try-On/ckpt/humanparsing/parsing_atr.onnx failed:Protobuf parsing failed.

I will be really thankful if anyone can help me resolve this error.
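One quick sanity check worth running first, since INVALID_PROTOBUF at load time very often means the .onnx file on disk is not actually the model (for example a Git LFS pointer file left behind when the checkpoint wasn't pulled with git lfs, or a truncated download):

import os
import onnx

path = "/home/ubuntu/Virtual-Try-On/ckpt/humanparsing/parsing_atr.onnx"  # path from the traceback
print(os.path.getsize(path))      # a few hundred bytes usually means an LFS pointer, not a model
with open(path, "rb") as f:
    print(f.read(64))             # an LFS pointer starts with b"version https://git-lfs..."
onnx.checker.check_model(onnx.load(path))   # raises if the protobuf really is corrupted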

1 Comment
2024/11/07
11:16 UTC

2

Problem with Precision Loss During Rescaling in 3D Segmentation

I am working with a set of ground truth points in angstrom units, which I need to rescale to match the size of my 3D data grid. However, when I try to scale the points back to their original angstrom units, I notice that I'm losing some precision, and the positions no longer match exactly.

Here’s the approach I’ve implemented so far:

  1. Rescaling to Image Grid: I first rescale the ground truth points (in angstroms) to fit within the image size. This involves calculating the scaling factors for each axis (x, y, z) based on the angstrom range and image size.
  2. Creating a Mask from Rescaled Positions: I then use the rescaled coordinates to create a segmentation mask. This mask has 1s in the positions corresponding to the rescaled points and 0s elsewhere.
  3. Inverse Rescaling from the Mask: To map the points back to their original angstrom units, I perform an inverse rescaling. I retrieve the coordinates from the mask and rescale them back using the inverse scaling factors.

However, when I rescale the points back to angstroms, they don't match the original positions exactly, leading to a loss of precision.
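For intuition, here is a minimal round-trip with made-up numbers (the axis range, grid size, and points are placeholders): the precision is lost at the rounding step, because a binary mask only remembers which voxel a point fell into, so the inverse mapping can recover positions only to within about half a voxel.

import numpy as np

points_ang = np.array([[12.34, 5.67, 8.91]])   # hypothetical ground-truth points (angstroms)
ang_min, ang_max = 0.0, 20.0                   # assumed axis range
grid_size = 64                                 # assumed grid resolution per axis

scale = (grid_size - 1) / (ang_max - ang_min)
idx = np.round((points_ang - ang_min) * scale).astype(int)   # quantization: fractional part discarded

recovered = idx / scale + ang_min
print(recovered - points_ang)                  # residual is bounded by 0.5 / scale per axis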

Let me share my code so you guys understand better

https://github.com/TanetiSanjay/Doubts/blob/main/seg.py

Edit: The code was not readable so I uploaded it in github.

0 Comments
2024/11/07
09:49 UTC

2

Are there any free finance apis that will help me in a real time deep learning project?

I am trying to work on a project that involves fetching real-time data from APIs and feeding it into an autoencoder model, but most of the APIs have extremely limited request allowances. Are there any free resources suited to real-time streaming? If not, can you suggest alternatives that would let me stay within the API limits and still build a robust autoencoder model?

0 Comments
2024/11/07
09:47 UTC

0

How can I optimize multi-batch and parallel inference in TensorRT for faster performance on high-resolution image patches?

Description

I am encountering performance bottlenecks while running multi-threaded inference on high-resolution images using TensorRT. The model involves breaking the image into patches to manage GPU memory, performing inference on each patch, and then merging the results. However, the inference time per patch is still high, even when increasing the batch size. Additionally, loading multiple engines onto the GPU to parallelize the inference does not yield the expected speedup. I am seeking advice on optimizing the inference process for faster execution, either by improving batch processing or enabling better parallelism in TensorRT.

Environment

  • TensorRT Version: 10.5.0
  • GPU Type: RTX 3050TI 4GB
  • Nvidia Driver Version: 535.183.01
  • CUDA Version: 12.2
  • CUDNN Version: N/A
  • Operating System + Version: Ubuntu 20.04
  • Python Version: 3.11

Relevant Files

build_engine.py

def build_engine(onnx_file_path, engine_file_path):
    logger = trt.Logger(trt.Logger.ERROR)
    builder = trt.Builder(logger)
    network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    profile = builder.create_optimization_profile()
    config = builder.create_builder_config()
    parser = trt.OnnxParser(network, logger)

    if not os.path.exists(onnx_file_path):
        print("Failed finding ONNX file!")
        return
    print("Succeeded finding ONNX file!")

    with open(onnx_file_path, 'rb') as model:
        if not parser.parse(model.read()):
            print('Failed parsing the ONNX file')
            for error in range(parser.num_errors):
                print(parser.get_error(error))
            return
    print('Completed parsing of ONNX file')

    # Configure input profile
    input_tensor = network.get_input(0)
    profile.set_shape(input_tensor.name, (min_batch, shape[1], shape[2], shape[3]), shape, (max_batch, shape[1], shape[2], shape[3]))
    config.add_optimization_profile(profile)

    # Build the serialized engine
    engine_string = builder.build_serialized_network(network, config)
    if engine_string is None:
        print("Failed building engine!")
        return
    print("Succeeded building engine!")

    with open(engine_file_path, "wb") as f:
        f.write(engine_string)

inference.py

class TRTModel:
    def __init__(self, trt_path):
        self.trt_path = trt_path
        trt.init_libnvinfer_plugins(None, "")
        self.logger = trt.Logger(trt.Logger.ERROR)
        with open(self.trt_path, "rb") as f:
            engine_data = f.read()
        self.engine = trt.Runtime(self.logger).deserialize_cuda_engine(engine_data)

    def create_execution_context(self):
        return self.engine.create_execution_context()

    def process_async(self, input_data):
        _, stream = cudart.cudaStreamCreate()
        context = self.create_execution_context()

        input_size = input_data.nbytes
        output_size = input_data.nbytes

        input_device = cudart.cudaMallocAsync(input_size, stream)[1]
        output_device = cudart.cudaMallocAsync(output_size, stream)[1]

        input_data_np = input_data.cpu().numpy()

        cudart.cudaMemcpyAsync(input_device, input_data_np.ctypes.data, input_data.nbytes,
                               cudart.cudaMemcpyKind.cudaMemcpyHostToDevice, stream)

        context.set_tensor_address('images', int(input_device))
        context.set_tensor_address('output', int(output_device))
        context.execute_async_v3(stream_handle=int(stream))

        output_host = np.empty_like(input_data_np, dtype=np.float32)
        cudart.cudaMemcpyAsync(output_host.ctypes.data, output_device, output_host.nbytes,
                               cudart.cudaMemcpyKind.cudaMemcpyDeviceToHost, stream)
        cudart.cudaStreamSynchronize(stream)

        cudart.cudaFree(input_device)
        cudart.cudaFree(output_device)
        cudart.cudaStreamDestroy(stream)

        return output_host

Steps To Reproduce

  1. Build the engine: use build_engine to convert an ONNX model into a TensorRT engine.
  2. Run inference: use TRTModel to perform inference on cropped image patches.

Observed Result: even when batch sizes are increased, the inference time per patch remains high. Running multiple engines for parallel inference also does not improve performance.

Profiling Results:

  • Transfer to device: 0.48 ms
  • Inference time: 784.75 ms
  • Transfer to host: 0.67 ms
  • Total time for a single patch (256x256): 19-22 seconds on average

I am seeking optimization suggestions for improving multi-batch processing or multi-threaded parallel inference in TensorRT.
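One pattern worth trying, since process_async above creates a new stream, execution context, and device buffers for every patch: allocate them once and reuse them across patches, so each call only pays for the copies and the kernel launch. An illustrative (untested) variant built on the same cudart calls as the snippet above, assuming every patch fits in a fixed max_bytes buffer:

class TRTModelReuse(TRTModel):
    def setup(self, max_bytes):
        # One-time allocations, reused for every patch.
        _, self.stream = cudart.cudaStreamCreate()
        self.context = self.engine.create_execution_context()
        self.input_device = cudart.cudaMallocAsync(max_bytes, self.stream)[1]
        self.output_device = cudart.cudaMallocAsync(max_bytes, self.stream)[1]

    def process(self, input_data_np):
        # Per-call work: copy in, launch, copy out.
        cudart.cudaMemcpyAsync(self.input_device, input_data_np.ctypes.data, input_data_np.nbytes,
                               cudart.cudaMemcpyKind.cudaMemcpyHostToDevice, self.stream)
        self.context.set_tensor_address('images', int(self.input_device))
        self.context.set_tensor_address('output', int(self.output_device))
        self.context.execute_async_v3(stream_handle=int(self.stream))

        output_host = np.empty_like(input_data_np, dtype=np.float32)
        cudart.cudaMemcpyAsync(output_host.ctypes.data, self.output_device, output_host.nbytes,
                               cudart.cudaMemcpyKind.cudaMemcpyDeviceToHost, self.stream)
        cudart.cudaStreamSynchronize(self.stream)
        return output_host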

0 Comments
2024/11/07
08:11 UTC

4

Why does my model "not work"?

Hey Hive mind,
I am new to deep learning and I am using a CNN to predict a time series.
I am using the model from this paper:
https://arxiv.org/pdf/2211.02024
So it has been done before and seems to work well.

However, for my data the output of the model is super weird. See images...
train/loss_r is the correlation for the training and val/loss_r is the correlation for the validation
and then for each region of interest (ROI) the predicted (blue) vs. real (orange) timeseries.

What is also weird is that it says for some ROIs r = 0.20 (or so) but the predicted signal (blue) is almost flat?
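On the r = 0.20 point specifically: Pearson correlation is invariant to the scale of the prediction, so a nearly flat curve that wiggles slightly in the right direction can still score a modest r. A quick illustration with synthetic numbers:

import numpy as np

rng = np.random.default_rng(0)
target = rng.standard_normal(500)
flat_pred = 0.01 * target + 0.05 * rng.standard_normal(500)   # tiny amplitude, mostly noise
print(np.corrcoef(target, flat_pred)[0, 1])                   # ~0.2 even though the curve looks flat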

What am I doing wrong? Any input?

Edit: code is available here:
https://github.com/kovalalvi/beira/tree/master

https://preview.redd.it/tky9h9cnmfzd1.png?width=1204&format=png&auto=webp&s=8ade30d7a15c8c541ca5d213e18bcbf426826b86

https://preview.redd.it/gwz0v9cnmfzd1.png?width=1206&format=png&auto=webp&s=ea81533202718458b1851587e42c5be4a44bee99

https://preview.redd.it/z1mru9cnmfzd1.png?width=960&format=png&auto=webp&s=49a1db798c009ac320dd53b816b7a256400bd8b1

5 Comments
2024/11/07
07:31 UTC

0

New AI/DL tools

Do you know of any new or old tools or libraries related to AI and deep learning, or to generative AI?

2 Comments
2024/11/07
05:00 UTC

4

Understanding distillation in BYOL, JEPA architectures

I'm currently having trouble understanding why distillation works in JEPA and BYOL. This is how I'm currently thinking about it:

There are 2 encoders: teacher and student. The teacher's weights are updated as an exponential moving average of the student's weights. So essentially a "dumb" encoder teaching a "smart" encoder?
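Concretely, the EMA update is just this (a minimal PyTorch sketch; tau is typically close to 1, e.g. 0.996 in BYOL). The teacher is never trained by gradient descent; it just lags the student, producing slower-moving targets:

import copy
import torch

@torch.no_grad()
def ema_update(teacher, student, tau=0.996):
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(tau).add_(ps, alpha=1.0 - tau)   # teacher <- tau * teacher + (1 - tau) * student

student = torch.nn.Linear(8, 8)                  # stand-in for the student encoder
teacher = copy.deepcopy(student)                 # teacher starts as a copy, then lags via EMA
ema_update(teacher, student)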

It's not intuitive to me why distillation would even work. Hope somebody can give a good explanation!

4 Comments
2024/11/07
04:55 UTC

1

Pdf querying project

I was reading a textbook and found it cumbersome to highlight text in the PDF, copy it, and paste it into ChatGPT to ask questions about it. So I thought I'd build a project, basically an application, that lets us query an LLM directly; all we need to do is select the text in the PDF. Any thoughts for guidance, where to start, or any tools I can use…

2 Comments
2024/11/07
04:44 UTC

1

Neural Network Optimization Problem

I am currently using timm_3d 3D classification models to train a simple binary classification problem with around 200 samples. I have used MONAI DenseNet, ResNet, and other networks and get good train, test, and validation accuracy (above 95% balanced accuracy). But when using the MONAI EfficientNet model or the VGG models from timm_3d, the loss does not decrease and accuracy stays just above 50%. I have tried different learning rates and learning rate schedulers, but none of them work. How can I overcome this issue? Thank you.

1 Comment
2024/11/07
04:08 UTC
