/r/MLQuestions

A place for beginners to ask stupid questions and for experts to help them! /r/MachineLearning is a great subreddit, but it is for interesting articles and news related to machine learning. Here, you can feel free to ask any question regarding machine learning.

What kinds of questions do we want here?

"I've just started with deep nets. What are their strengths and weaknesses?" "What is the current state of the art in speech recognition?" "My data looks like X,Y what type of model should I use?"

If you are well versed in machine learning, please answer any question you feel knowledgeable about, even if it already has answers, and thank you!


Related Subreddits:

/r/MachineLearning
/r/mlpapers
/r/learnmachinelearning

/r/MLQuestions

51,239 Subscribers

0

Resources for AIML

Hey πŸ‘‹, I have completed my BTech 1st year in IIIT Naya Raipur. And I have completed DSA(Graphs a d dynamic programming are left) in python and Java and I am able to solve leetcode easy and medium questions and I also learnt pop and oop in python,c++ and java. And I am pretty much interested in AIML can you give me some resources to excel my skills in AIML.

0 Comments
2024/06/30
13:10 UTC

1

Doubt about how to start machine learning, and should I learn software engineering before starting ML?

Hi, I am in my 2nd year of college and I wanted to start machine learning, but one of my friends who is also doing ML said that I should start with software engineering (as it covers the basics of computers), such as C++ and its DSA. So I have no idea where to start. Can someone guide me with their valuable knowledge on how I can start my ML career?

Thank you for giving your time and for your valuable response.

4 Comments
2024/06/30
05:37 UTC

3

"RuntimeError: BlobWriter not loaded" error when exporting a PyTorch model to CoreML. How to fix it?

I get a "RuntimeError: BlobWriter not loaded" error when exporting a PyTorch model to CoreML. How to fix it?

Same issue with Python 3.11 and Python 3.10. Same issue with torch 2.3.1 and 2.2.0. Tested on Windows 10.

Export script:

# -*- coding: utf-8 -*-
"""Core ML Export
pip install transformers torch coremltools nltk
"""
import os
from transformers import AutoModelForTokenClassification, AutoTokenizer
import torch
import torch.nn as nn
import nltk
import coremltools as ct

nltk.download('punkt')

# Load the model and tokenizer
model_path = os.path.join('model')
model = AutoModelForTokenClassification.from_pretrained(model_path, local_files_only=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, local_files_only=True)

# Modify the model's forward method to return a tuple
class ModifiedModel(nn.Module):
    def __init__(self, model):
        super(ModifiedModel, self).__init__()
        self.model = model
        self.device = model.device  # Add the device attribute

    def forward(self, input_ids, attention_mask, token_type_ids=None):
        outputs = self.model(input_ids=input_ids, attention_mask=attention_mask, token_type_ids=token_type_ids)
        return outputs.logits


modified_model = ModifiedModel(model)

# Export to Core ML
def convert_to_coreml(model, tokenizer):
    # Define a dummy input for tracing
    dummy_input = tokenizer("A French fan", return_tensors="pt")
    dummy_input = {k: v.to(model.device) for k, v in dummy_input.items()}

    # Trace the model with the dummy input
    traced_model = torch.jit.trace(
        model,
        (dummy_input['input_ids'], dummy_input['attention_mask'], dummy_input.get('token_type_ids')),
    )

    # Convert to Core ML
    inputs = [
        ct.TensorType(name="input_ids", shape=dummy_input['input_ids'].shape),
        ct.TensorType(name="attention_mask", shape=dummy_input['attention_mask'].shape)
    ]
    if 'token_type_ids' in dummy_input:
        inputs.append(ct.TensorType(name="token_type_ids", shape=dummy_input['token_type_ids'].shape))

    mlmodel = ct.convert(traced_model, inputs=inputs)

    # Save the Core ML model
    mlmodel.save("model.mlmodel")
    print("Model exported to Core ML successfully")

convert_to_coreml(modified_model, tokenizer)

Error stack:

C:\Users\dernoncourt\anaconda3\envs\coreml\python.exe C:\Users\dernoncourt\PycharmProjects\coding\export_model_to_coreml6_fopr_SE_q.py 
Failed to load _MLModelProxy: No module named 'coremltools.libcoremlpython'
Fail to import BlobReader from libmilstoragepython. No module named 'coremltools.libmilstoragepython'
Fail to import BlobWriter from libmilstoragepython. No module named 'coremltools.libmilstoragepython'
[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\dernoncourt\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
C:\Users\dernoncourt\anaconda3\envs\coreml\lib\site-packages\transformers\modeling_utils.py:4565: FutureWarning: `_is_quantized_training_enabled` is going to be deprecated in transformers 4.39.0. Please use `model.hf_quantizer.is_trainable` instead
  warnings.warn(
When both 'convert_to' and 'minimum_deployment_target' not specified, 'convert_to' is set to "mlprogram" and 'minimum_deployment_target' is set to ct.target.iOS15 (which is same as ct.target.macOS12). Note: the model will not run on systems older than iOS15/macOS12/watchOS8/tvOS15. In order to make your model run on older system, please set the 'minimum_deployment_target' to iOS14/iOS13. Details please see the link: https://apple.github.io/coremltools/docs-guides/source/target-conversion-formats.html
Model is not in eval mode. Consider calling '.eval()' on your model prior to conversion
Converting PyTorch Frontend ==> MIL Ops:   0%|          | 0/127 [00:00<?, ? ops/s]Core ML embedding (gather) layer does not support any inputs besides the weights and indices. Those given will be ignored.
Converting PyTorch Frontend ==> MIL Ops:  99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 126/127 [00:00<00:00, 2043.73 ops/s]
Running MIL frontend_pytorch pipeline: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 5/5 [00:00<00:00, 212.62 passes/s]
Running MIL default pipeline:  37%|β–ˆβ–ˆβ–ˆβ–‹      | 29/78 [00:00<00:00, 289.75 passes/s]C:\Users\dernoncourt\anaconda3\envs\coreml\lib\site-packages\coremltools\converters\mil\mil\ops\defs\iOS15\elementwise_unary.py:894: RuntimeWarning: overflow encountered in cast
  return input_var.val.astype(dtype=string_to_nptype(dtype_val))
Running MIL default pipeline: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 78/78 [00:00<00:00, 137.56 passes/s]
Running MIL backend_mlprogram pipeline: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 12/12 [00:00<00:00, 315.01 passes/s]
Traceback (most recent call last):
  File "C:\Users\dernoncourt\PycharmProjects\coding\export_model_to_coreml6_fopr_SE_q.py", line 58, in <module>
    convert_to_coreml(modified_model, tokenizer)
  File "C:\Users\dernoncourt\PycharmProjects\coding\export_model_to_coreml6_fopr_SE_q.py", line 51, in convert_to_coreml
    mlmodel = ct.convert(traced_model, inputs=inputs)
  File "C:\Users\dernoncourt\anaconda3\envs\coreml\lib\site-packages\coremltools\converters\_converters_entry.py", line 581, in convert
    mlmodel = mil_convert(
  File "C:\Users\dernoncourt\anaconda3\envs\coreml\lib\site-packages\coremltools\converters\mil\converter.py", line 188, in mil_convert
    return _mil_convert(model, convert_from, convert_to, ConverterRegistry, MLModel, compute_units, **kwargs)
  File "C:\Users\dernoncourt\anaconda3\envs\coreml\lib\site-packages\coremltools\converters\mil\converter.py", line 212, in _mil_convert
    proto, mil_program = mil_convert_to_proto(
  File "C:\Users\dernoncourt\anaconda3\envs\coreml\lib\site-packages\coremltools\converters\mil\converter.py", line 307, in mil_convert_to_proto
    out = backend_converter(prog, **kwargs)
  File "C:\Users\dernoncourt\anaconda3\envs\coreml\lib\site-packages\coremltools\converters\mil\converter.py", line 130, in __call__
    return backend_load(*args, **kwargs)
  File "C:\Users\dernoncourt\anaconda3\envs\coreml\lib\site-packages\coremltools\converters\mil\backend\mil\load.py", line 902, in load
    mil_proto = mil_proto_exporter.export(specification_version)
  File "C:\Users\dernoncourt\anaconda3\envs\coreml\lib\site-packages\coremltools\converters\mil\backend\mil\load.py", line 400, in export
    raise RuntimeError("BlobWriter not loaded")
RuntimeError: BlobWriter not loaded

Process finished with exit code 1
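
The import failures at the top of the log point at the likely cause: on Windows, coremltools ships without the compiled libcoremlpython/libmilstoragepython extensions, and the default mlprogram export format needs BlobWriter from the latter to serialize weights. A hedged, untested sketch of one commonly suggested workaround is to target the legacy neuralnetwork format, which does not depend on BlobWriter:

# Sketch: the same conversion, forced onto the older backend. Assumes the
# model's ops are all covered by the legacy neuralnetwork format.
mlmodel = ct.convert(traced_model, inputs=inputs, convert_to="neuralnetwork")
mlmodel.save("model.mlmodel")

Running the export on macOS or Linux, where the compiled extensions are available, is the other common route.
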
0 Comments
2024/06/30
03:32 UTC

1

My model gets created but not trained. Need help

The following is my code:

import json

import numpy as np

from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam

with open('/Users/samuelprovenzano/Desktop/GameDevelopment/Godot4.2.0/Spatialness/Data.json', 'r') as f:
    data = json.load(f)

rows = data  # avoid shadowing the built-in input()

new_input = []   # the 5 features of each row
validation = []  # the target (6th value of each row)

for row in rows:
    new_input.append([])
    for j in range(6):
        if j == 5:
            validation.append(row[j])
        else:
            new_input[-1].append(row[j])

# NOTE: with exactly 20 rows in the file, the test slices below are empty,
# which leaves validation_data empty and can abort fit() right after "Epoch 1/20".
x_train = new_input[:20]
y_train = validation[:20]

x_test = new_input[20:]
y_test = validation[20:]

x_train = np.array(x_train)
y_train = np.array(y_train)
x_test = np.array(x_test)
y_test = np.array(y_test)

opt = Adam(learning_rate=0.00001)
model = Sequential()
model.add(Dense(128, input_dim=5, activation="relu"))
model.add(Dense(64,activation="relu"))
model.add(Dense(1, activation="linear"))
model.compile(loss='mse', optimizer = opt, metrics=['mae'])
model.summary()

history = model.fit(x_train,y_train,epochs = 20 , validation_data = (x_test, y_test))

The problem is that when I start the training, it gets to epoch 1 and stops.

Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Layer (type)                         ┃ Output Shape                ┃         Param # ┃
┑━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
β”‚ dense (Dense)                        β”‚ (None, 128)                 β”‚             768 β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ dense_1 (Dense)                      β”‚ (None, 64)                  β”‚           8,256 β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ dense_2 (Dense)                      β”‚ (None, 1)                   β”‚              65 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
 Total params: 9,089 (35.50 KB)
 Trainable params: 9,089 (35.50 KB)
 Non-trainable params: 0 (0.00 B)
Epoch 1/20

Below is what my data looks like before being converted to numpy arrays.
Training:
[[4.07847213745117, 0.00196698230299392, -0.00566931236143, 0.00346527582201236, 0], [0.556317746639252, -0.0019701067059, -0.03814148104292, 0.00223098284024739, 0], [3.77389144897461, -0.00552960719586, -0.02206867968592, 0.00423209430682622, 0.515625], [2.58137392997742, -0.00264708069455, -0.05483102057413, 0.00195612419991669, 0.484375], [4.32815313339233, -0.00618482284632, -0.04589552838923, -0.00174142023086, 0.328125], [1.4347710609436, -0.00612428846653, 0.025490098914395, -0.00395036229219, 0.703125], [4.4814624786377, 0.000656538388475036, 0.0190668290036113, 0.000375348011132228, 0.359375], [3.67614650726318, -0.00682427331133, -0.03395023557006, -0.00333052065108, 0.671875], [2.80180549621582, 0.00368187846368339, -0.01250583413317, 0.00362652086334095, 0.265625], [3.9584903717041, -0.01420360661018, 0.0243454452301491, -0.00493602916047, 0.546875], [3.62931847572327, -0.00433386971266, -0.05167021454485, -0.00498671452557, 0.4375], [0.723354876041412, 0.00345079982499723, 0.0386537442397906, -0.00564126698649, 0.328125], [3.48066020011902, -0.00692774232116, -0.04859673939339, 0.00509167532173053, 0.578125], [2.12765622138977, -0.00297944319859, 0.0485325530499253, -0.00380111499963, 0.546875], [0.426249265670776, -0.00210969951936, -0.01568306541653, 0.00488972552761576, 0.578125], [1.97426342964172, 0.00133038576358964, 0.0474803921986947, -0.00602926615063, 0.359375], [2.34402370452881, -0.01222090086367, 0.0313411865843805, 0.00246997137946028, 0.640625], [0.34597909450531, -0.0009879550022, -0.0106645203943, -0.00192159010061, 0.5], [1.74070918560028, 0.000779410503417217, -0.01551780378732, -0.00608293352793, 0.4375], [4.21650791168213, -0.00407419174073, -0.01219866242827, 5.84940560677804e-05, 0.53125]]

Validation:
[29.417516708374, 27.3860111236572, 23.0467643737793, 18.0484676361084, 8.7041482925415, 26.8840274810791, 27.6892986297607, 26.578239440918, 22.0586910247803, 7.33460235595703, 15.004940032959, 22.9986820220947, 24.3519477844238, 29.9847106933594, 13.1349277496338, 13.3223991394043, 6.52166795730591, 8.20684814453125, 37.1182098388672, 24.8545188903809]

I can't find my problem here. Thanks
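
For reference, a minimal sketch of a cleaner split using sklearn's train_test_split; besides being less error-prone than hand-sliced indices, it sidesteps the empty-test-slice case when the file holds exactly 20 rows:

# Sketch: assumes new_input holds the 5-feature rows and validation the targets.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.array(new_input, dtype=np.float32)
y = np.array(validation, dtype=np.float32)

# test_size=0.2 guarantees a non-empty validation_data, even with few rows
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)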

0 Comments
2024/06/30
00:24 UTC

2

"Failed to load _MLModelProxy: No module named 'coremltools.libcoremlpython'"when converting an ONNX model to CoreML with onnx-coreml lib. How to fix?

I am trying to use the onnx-coreml lib to convert an ONNX model to CoreML:

import onnx
from onnx_coreml import convert
onnx_model = onnx.load('model.onnx')
coreml_model = convert(onnx_model)
coreml_model.save('model.mlmodel')

Error:

C:\code\export_model_to_coreml.py 
scikit-learn version 1.3.2 is not supported. Minimum required version: 0.17. Maximum required version: 1.1.2. Disabling scikit-learn conversion API.
Failed to load _MLModelProxy: No module named 'coremltools.libcoremlpython'
Fail to import BlobReader from libmilstoragepython. No module named 'coremltools.libmilstoragepython'
Fail to import BlobWriter from libmilstoragepython. No module named 'coremltools.libmilstoragepython'
Traceback (most recent call last):
  File "C:\code\export_model_to_coreml.py", line 2, in <module>
    from onnx_coreml import convert
  File "C:\Users\dernoncourt\anaconda3\envs\py311\Lib\site-packages\onnx_coreml\__init__.py", line 6, in <module>
    from .converter import convert
  File "C:\Users\dernoncourt\anaconda3\envs\py311\Lib\site-packages\onnx_coreml\converter.py", line 35, in <module>
    from coremltools.converters.nnssa.coreml.graph_pass.mlmodel_passes import remove_disconnected_layers, transform_conv_crop
ModuleNotFoundError: No module named 'coremltools.converters.nnssa'

Process finished with exit code 1

How to fix?
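
For context, onnx-coreml imports coremltools.converters.nnssa, a module that no longer exists in current coremltools, so the package only works against old coremltools releases. A hedged sketch of the path that replaced it, the ONNX converter bundled into coremltools itself up to the 4.x series (the exact pin is an assumption to verify):

# Sketch: assumes an environment pinned to coremltools 4.x,
# e.g. pip install coremltools==4.1 onnx, whose bundled converter
# superseded the separate onnx-coreml package.
import coremltools as ct

coreml_model = ct.converters.onnx.convert(model='model.onnx')
coreml_model.save('model.mlmodel')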

0 Comments
2024/06/30
00:00 UTC

2

noob question: why compute the gradient of a NN cost with respect to the input data?

Please see this stanford.edu PDF on computing the gradient of a NN; in particular, see the top of page 6.

At the top of page 6, they compute the gradient of the NN cost with respect to all the parameters $U$, $W$, $b_2$, $b_1$ (all the affine transformations), but they also include the gradient with respect to the input $x$.

What use is there in computing the gradient with respect to $x$ if it is not a parameter? You aren't trying to learn $x$; it's fixed. I was unable to figure this out from online readings.
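
For concreteness: in a deeper stack, the gradient with respect to a layer's input is exactly the quantity backpropagation passes down to the layers below, and input gradients also have standalone uses such as saliency maps and adversarial examples. A minimal PyTorch sketch of the latter idea (names illustrative):

# Sketch: computing dJ/dx for a toy network.
import torch

x = torch.randn(1, 10, requires_grad=True)   # mark the input as differentiable
net = torch.nn.Sequential(torch.nn.Linear(10, 5), torch.nn.ReLU(), torch.nn.Linear(5, 1))
cost = net(x).sum()
cost.backward()
print(x.grad)   # dJ/dx: how sensitive the cost is to each input coordinate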

1 Comment
2024/06/29
22:32 UTC

1

Is there a service I could use to train my data/use case?

I’m a developer but have never worked with AI directly. I have a set of data where each item is a json object with 20 name-value pairs. Each of these items could have an associated score.

The associated score could be expressed in 1 of 2 different ways. Either as an integer, -100 to 100 (like 32, -65 etc) or it could be expressed as a value of -1, 0, 1 (negative, neutral, positive).

Then I’m hoping to feed it additional items and have it return a score and possibly a an optional confidence level.

I’d like to use a service to do this rather than trying to setup an environment from scratch and I don’t have to make this a programmatic task if there’s a UI based way to do it as this is probably something I’m only going to do once.

Looking to see if someone could point me in the right direction.

1 Comment
2024/06/29
21:06 UTC

0

MLOps and Generative AI

I am pursuing AI/ML at IIIT Bangalore. Please let me know which elective subject I should choose; we have two options, MLOps and Generative AI.

4 Comments
2024/06/29
12:22 UTC

5

Anyone want to join an ML group outside of reddit to share resources and help with learning?

2 Comments
2024/06/29
11:20 UTC

2

Brainstorming: Generative model knows what to do, but does it incorrectly

Hi everyone,

I am trying to train a Flow Matching model that generates rotated versions of images. In my simple example, I try to learn how to rotate 9s by 180 degrees.

The following results are visualized after training:

Left: a randomly sampled 9 from the source distribution. From left to right, I forward-integrate the image in time using my learned vector field, via the Euler update $x_{t+dt} = x_t + dt \cdot v(x_t, t)$.

My model seems to understand that it needs to "remove" the 9 and add the rotated version (right plot), but it never produces good results. I played with the lr, optimizer, network, #epochs, number of samples, stepsize dt, variance of the target normal distribution, etc.

Does anyone have an idea what could be the reason, why I see these results?

I don't know if this is related, but due to the forward integration, the values become >1 over time.

Thanks in advance!

https://preview.redd.it/qvl1nqur8g9d1.png?width=1608&format=png&auto=webp&s=3b909d9209b8a4dc7e83023055c3ad3044e25d50
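
For reference, a minimal sketch of the Euler integration loop described above (names are illustrative; it assumes the learned vector field takes the current state and the time):

# Sketch: forward Euler integration, x_{t+dt} = x_t + dt * v(x_t, t).
import torch

def integrate(v, x0, steps=100):
    x, dt = x0, 1.0 / steps
    for i in range(steps):
        t = torch.full((x.shape[0],), i * dt, device=x.device)
        x = x + dt * v(x, t)
    return x

If the state drifts well outside [0, 1], it may be worth double-checking that t is actually fed to the network and that source and target images share the same scaling.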

4 Comments
2024/06/29
05:52 UTC

1

what could be wrong with this model

https://preview.redd.it/dzr6fpzekd9d1.png?width=1519&format=png&auto=webp&s=491bca65650ecaf7eeba2b8c8e3bc0c996cf37e4

Why are those spikes appearing? I used a CNN model with the AdamW optimizer, a CE loss function, lr = 0.001, and dropout = 0.3.

5 Comments
2024/06/28
20:48 UTC

1

opencv/mediapipe questions, training sign languages

Hello! I have recently finished the ML specialization by Andrew Ng and am working to create a hand gesture recognition model. While working with OpenCV and MediaPipe, I came across another GitHub repo where the person seems to have trained the model in a way that only hand landmarks are shown (no color except white and green). Is this the preferable approach when it comes to training the model? Sorry, I am really new to ML. Also, when I train the model, should I create a neural network for multi-class classification?
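
For what it's worth, the "white and green" images in such repos are rendered hand skeletons: training on MediaPipe landmarks rather than raw pixels removes skin tone, lighting, and background as confounders, which is why it is a popular choice. A hedged sketch of the landmark extraction (the file name is a placeholder):

# Sketch: extract the 21 MediaPipe hand landmarks as a 63-value feature vector.
import cv2
import mediapipe as mp

hands = mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=1)
image = cv2.imread("gesture.jpg")   # placeholder input image
results = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
if results.multi_hand_landmarks:
    landmarks = results.multi_hand_landmarks[0].landmark
    features = [coord for p in landmarks for coord in (p.x, p.y, p.z)]

A small network with a softmax output over the gesture labels (i.e. multi-class classification) on top of these features is the usual setup.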

0 Comments
2024/06/28
20:21 UTC

1

Detection of musical instruments using Yamnet

My goal is to detect musical instruments with AI (machine learning).

I'm currently using the YAMNet model to make inferences, but it has a very wide range of categories, for example "Growling", "Printer", and "Piano". I wonder if that causes it to be less precise in detecting instruments, since instrument classes are only a fraction of the total classes.

The description of the Yamnet model on Kaggle states that:

You should expect to do some amount of fine-tuning and calibration to make YAMNet usable in any system that you build.

There is another model called NSynth, with a large dataset of musical instrument samples, but it is used for synthesizing new sounds, rather than classifying/detecting instruments.

Would fine-tuning the YAMNet model with NSynth make sense in that case?
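
A hedged sketch of the usual recipe: run YAMNet for its per-frame scores (or embeddings) and restrict attention to instrument classes, optionally training a small classifier on the embeddings with labeled instrument audio such as NSynth rather than fine-tuning YAMNet end to end. The instrument names below are assumptions; the authoritative list is in the model's class map:

# Sketch: score a waveform with YAMNet and report instrument classes only.
import csv
import numpy as np
import tensorflow_hub as hub

yamnet = hub.load('https://tfhub.dev/google/yamnet/1')
waveform = np.zeros(16000, dtype=np.float32)   # placeholder: 1 s of 16 kHz mono audio
scores, embeddings, spectrogram = yamnet(waveform)

with open(yamnet.class_map_path().numpy().decode('utf-8')) as f:
    class_names = [row['display_name'] for row in csv.DictReader(f)]

instruments = {'Piano', 'Acoustic guitar', 'Violin, fiddle', 'Trumpet'}   # assumed subset
mean_scores = scores.numpy().mean(axis=0)
for i, name in enumerate(class_names):
    if name in instruments:
        print(name, float(mean_scores[i]))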

0 Comments
2024/06/28
19:47 UTC

1

Problems with GFP-GAN TencentARC

A few months ago, I think it was in March, GFP-GAN (TencentARC) stopped working. When installing the model, it tells me that there are folders missing within the results folder. I tried to create them, but when starting the enhancement procedure it gives me the same error; I don't know what happened, but that error deleted the folders that I created. Some methods of this AI do not delete the folders, but give me an error like this:

Traceback (most recent call last):
  File "/content/GFPGAN/inference_gfpgan.py", line 7, in <module>
    from basicsr.utils import imwrite
  File "/usr/local/lib/python3.10/dist-packages/basicsr/__init__.py", line 4, in <module>
    from .data import *
  File "/usr/local/lib/python3.10/dist-packages/basicsr/data/__init__.py", line 22, in <module>
    _dataset_modules = [importlib.import_module(f'basicsr.data.{file_name}') for file_name in dataset_filenames]
  File "/usr/local/lib/python3.10/dist-packages/basicsr/data/__init__.py", line 22, in <listcomp>
    _dataset_modules = [importlib.import_module(f'basicsr.data.{file_name}') for file_name in dataset_filenames]
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/usr/local/lib/python3.10/dist-packages/basicsr/data/realesrgan_dataset.py", line 11, in <module>
    from basicsr.data.degradations import circular_lowpass_kernel, random_mixed_kernels
  File "/usr/local/lib/python3.10/dist-packages/basicsr/data/degradations.py", line 8, in <module>
    from torchvision.transforms.functional_tensor import rgb_to_grayscale
ModuleNotFoundError: No module named 'torchvision.transforms.functional_tensor'

And this is the error I get when there are missing folders:


FileNotFoundError                         Traceback (most recent call last)
<ipython-input-3-186fc93c6a8f> in <cell line: 53>()
    147 fps = 25.0 #change this to FPS of your source video
    148
--> 149 convert_frames_to_video(pathIn, pathOut, fps)
    150
    151 #after processing frames converted to .avi video , delete upscaled frames from previous video

<ipython-input-3-186fc93c6a8f> in convert_frames_to_video(pathIn, pathOut, fps)
     29 def convert_frames_to_video(pathIn,pathOut,fps):
     30     frame_array = []
---> 31     files = [f for f in os.listdir(pathIn) if isfile(join(pathIn, f))]
     32     #for sorting the file names properly
     33     files.sort(key = lambda x: int(x[5:-4]))

FileNotFoundError: [Errno 2] No such file or directory: '/content/GFPGAN/results/restored_imgs

Well, as you can see, I get these two errors. I tried to update everything, but no luck. I hope I can receive help from this community. Here is the notebook I always use, in case you are curious enough to try it.

https://colab.research.google.com/drive/1e29WixKpCvtCeieHamfaW5hSKuzzVxCi?usp=sharing#scrollTo=XTVL4m_zXqBU
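
On the first traceback: recent torchvision releases removed torchvision.transforms.functional_tensor, which older basicsr builds still import. Two commonly suggested fixes, both hedged and untested here, are pinning torchvision to a pre-0.17 release or shimming the old module path before basicsr is imported:

# Sketch: alias the removed module to its successor so basicsr's
# "from torchvision.transforms.functional_tensor import rgb_to_grayscale" resolves.
import sys
import torchvision.transforms.functional as F

sys.modules['torchvision.transforms.functional_tensor'] = F   # rgb_to_grayscale lives here now

The FileNotFoundError is plausibly downstream of this: inference aborts before anything is written to results/restored_imgs, so the frame-to-video cell finds no folder.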

1 Comment
2024/06/28
17:39 UTC

2

How to fix low performance of model due to one class underperforming?

I am trying to perform a multi-class classification (3 classes) as part of a Kaggle competition (not a monetary-prize competition).

I have tried Random Forest, XGBoost, SVMs, and MLPs. They all yield more or less similar accuracy scores. The precision, recall, and accuracy are reasonable for two classes, but low for one of the classes, which lowers the overall accuracy score. The dataset is imbalanced, with the low-performing class having the fewest samples. I have tried weighting the loss function based on the number of samples in each class, but this does not lead to much improvement.

What can be done to improve the accuracy score? Are there specific methods that deal with data where one of the classes is significantly more difficult to classify?
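
Beyond loss weighting, resampling the training split is a common next step. A hedged sketch with imbalanced-learn's SMOTE, shown on synthetic data since the competition data isn't reproduced here:

# Sketch: oversample minority classes in the training split only.
from collections import Counter
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

X_train, y_train = make_classification(n_samples=1000, n_classes=3, n_informative=6,
                                       weights=[0.6, 0.3, 0.1], random_state=0)
X_res, y_res = SMOTE(random_state=0).fit_resample(X_train, y_train)
print(Counter(y_train), '->', Counter(y_res))

Per-class metrics such as macro-F1 and the weak class's row of the confusion matrix usually say more here than overall accuracy.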

1 Comment
2024/06/28
15:40 UTC

0

Any good in depth tutorial to learn LLM in python using Openai?

I am working on an office project and I need to learn this ASAP. I can't find anything good on YouTube. Anything helps. Thank you.
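
As a starting point while hunting for tutorials, the official openai Python library is small enough to learn from its own docs; a minimal sketch, assuming the v1-style client and OPENAI_API_KEY set in the environment:

# Sketch: one chat completion round-trip with the openai v1 client.
from openai import OpenAI

client = OpenAI()   # reads OPENAI_API_KEY from the environment
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarize what an LLM is in two sentences."}],
)
print(resp.choices[0].message.content)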

1 Comment
2024/06/28
08:36 UTC

2

Can Jax be unsuitable for something?

Sup ML, I'm considering creating a JAX-only RL loop for maximum speed and parallelization. My environment depends on a large piece of code, namely a 3D geometry kernel (CAD kernel). That's millions of lines of code, at least 30% of which is pure math.

Right now all such kernels are in C++, with a few in Rust and other languages. From what I know, they are heavily reliant on the CPU. Further, a team member of one such kernel project told me that running everything on the GPU would be impossible and largely impractical.

I want to rewrite the subset that I need to gain the speed boost; as everyone knows, it could be some 1000x faster off the CPU. But to avoid a mistake that could cost me weeks, if not 1.5-2 months: could it be that some programs are simply unsuitable for JAX, or is there no such thing as unsuitable at all?

Some description:

  1. Computation is generally done on doubles, not single floats, due to increased precision requirements (this may be somewhat waived in my use case).

  2. Computation must be fast and there will be a lot of it. 3D "models" are involved, although there will be no explicit rendering until the very end.

  3. I am considering using Google TPUs exclusively for the simulation.

So, is "not suitable for GPU" just legacy-thinking BS, or is it actually possible?

2 Comments
2024/06/28
08:02 UTC

2

MLOPS Question: Deploying Model Ensembles BentoML to Sagemaker Alternatives

Hi, we are using BentoML and serving our model endpoint in an ECS, accessible through an API call. The endpoint consists of multiple computer vision models (e.g. YOLO models) working together. Our use case is as follows:

  • should use only CPU VMs for now (GPUs are quite expensive, but it depends on the MLOps design)
  • API calls can be sporadic (i.e. mostly single calls, random, etc.), but there are rare times within the day when the API will be called maybe 4-5x simultaneously
  • when an API request will arrive can be somewhat predicted from when someone logs in to the application, but this is not foolproof
  • the endpoint can take 1-2 minutes to give a full prediction for an API call (since it's on CPU), but could be faster with a cheap GPU endpoint
  • the user does not need the results to be instant, but it shouldn't take longer than 2-5 minutes
  • should not crash when you send concurrent requests
  • cheap

Currently, BentoML on ECS (probably because of a very small machine) crashes when it receives multiple concurrent requests (maybe the RAM is getting flooded).

Is Sagemaker a good answer to this use case? Are there some alternatives to this?

3 Comments
2024/06/28
06:29 UTC

5

My loss gets a big spike downward after every epoch.

Hi everyone,

I'm fine-tuning a DistilBERT model on a classification task using the IMDB dataset. After every epoch, however, my loss drops to a really low value, then starts rising again to a value which it maintains for the whole epoch. This behavior repeats itself.

I created a Trainer class, and the training loop for one epoch is handled in the following way:

def _train_one_epoch(self, model, trainloader, testloader):
        """Train the model for one epoch."""
        model.train()
        training_loss = 0.0
        accumulated_loss = 0.0
        total_steps = len(trainloader) // self.gradient_accumulation_steps
        with tqdm(total=total_steps) as pbar:
            for i, data in enumerate(trainloader):
                
                # Move data to the device
                if isinstance(data, list):
                    data = [item.to(self.device) for item in data]
                elif isinstance(data, dict):
                    data = {key: value.to(self.device) for key, value in data.items()}
                else:
                    data = data.to(self.device)
                
                # Forward pass
                with autocast(enabled=self.use_mixed_precision, dtype=torch.float16):
                    output = model.train_step(data)
                    loss = output[0] if isinstance(output, tuple) else output
                    loss /= self.gradient_accumulation_steps
                    
                # Backward pass
                if self.use_mixed_precision:
                    self.scaler.scale(loss).backward()
                else:
                    loss.backward()
    
                accumulated_loss += loss.item() * self.gradient_accumulation_steps
            
                total_norm = 0.0
                for param in model.parameters():
                    if param.grad is None:
                        continue  # grads can be None, e.g. after zero_grad(set_to_none=True)
                    param_norm = param.grad.detach().norm(2)
                    total_norm += param_norm.item() ** 2
                total_norm = total_norm ** 0.5
                self._log_metrics({"Total Gradient Norm": total_norm})
                
                # Update the weights
                if (i + 1) % self.gradient_accumulation_steps == 0:
                    if self.use_mixed_precision:
                        self.scaler.unscale_(self.optimizer)
                        if self.max_grad_norm is not None:
                            nn_utils.clip_grad_norm_(model.parameters(), self.max_grad_norm)
                        self.scaler.step(self.optimizer)
                        self.scaler.update()
                    else:
                        if self.max_grad_norm is not None:
                            nn_utils.clip_grad_norm_(model.parameters(), self.max_grad_norm)
                        self.optimizer.step()
                    self.optimizer.zero_grad()
                    pbar.update(1)
                      
                    if self.scheduler:
                        self.scheduler.step()

                    if self.log:
                        self._log_metrics({"Training Loss": accumulated_loss / (i + 1), 
                                            "Learning Rate": self.optimizer.param_groups[0]['lr']})
                pbar.set_postfix({'Training Loss': accumulated_loss / (i + 1)})

            training_loss = accumulated_loss / len(trainloader)
        self.optimizer.zero_grad()
        return training_loss

This behaviour happens every time, independently of the learning rate (I also use scheduling). It also happens on tasks such as question answering, and on classification with different datasets such as Hyperpartisan. You can also see the loss behaviour when training for the QA task on WikihopQA.

[Image: training loss on imdb]
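
One hedged reading of the curve: the logged "Training Loss" is accumulated_loss / (i + 1), a within-epoch running average that resets at every epoch boundary, so just after a reset it averages only a few recent (already lower) batches and spikes down, then climbs back as the epoch-long average fills in. Logging the raw per-step loss or an exponential moving average keeps the curve continuous across epochs:

# Sketch: an EMA of the batch loss instead of a per-epoch running mean.
batch_losses = [2.0, 1.5, 1.2, 1.0]   # illustrative per-batch losses
ema_loss = None
for batch_loss in batch_losses:
    ema_loss = batch_loss if ema_loss is None else 0.98 * ema_loss + 0.02 * batch_loss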

1 Comment
2024/06/27
17:41 UTC

1

How to estimate ML project

I am trying to build an in-house ML department. We are getting clients who want ML solutions, and the sales team asks for estimates. If they were standard ML projects, I could easily estimate them. However, most of the time the problem requires a good amount of research and data exploration before we can give estimates. They also require estimates in the form of sprints/weeks, which is very difficult to do. How do you make these kinds of estimates for ML projects that require research and experimentation? How do you create a pricing model around this? I am not sure if this is the right subreddit, so please point me to the right one in that case.

3 Comments
2024/06/27
12:28 UTC

0

What TTS voice is this?

0 Comments
2024/06/27
12:10 UTC

1

Deep Learning Project Hardware Requirements with $2K budget: large and complex dataset

Hello everyone.

Although it's been more than 8 months since I got into the field of applied machine learning (and deep learning in particular) for the sake of defending my thesis on an ECG analysis algorithm, I have yet to figure out the hardware requirements for an optimal setup that would take into consideration an intelligent use of the research grant of two thousand dollars.

I'm not a US citizen, and our country does not have Nvidia suppliers. My laptop is weak, with an Intel Core i3 processor and 4GB of RAM. My options within the country are to either buy a new laptop or get a workstation for a little less than twice the price of a 16GB-RAM, Core i7 laptop. But I have read elsewhere that laptops aren't a great option for heavy DL projects, although I was thinking about the possibility of using an SSD to improve memory and time efficiency. Google Colaboratory seemed like a good option at first, but it has limitations when tackling such large projects, especially in the processing of data.

I have to apply deep learning to the complex dataset of electrocardiogram signals, and my field of study is biomedical engineering, which takes little account of these topics. An insightful response would be appreciated so that I don't blunder with the money. Many thanks for your time and consideration in reading this far.

2 Comments
2024/06/27
10:35 UTC

1

multi gpu training for large data set

I am trying to implement this AE-CNN model to train on 2 GPUs. The data is stored as ~150 .npy files, each containing 60k instances of 100k bytes. I am new to working with huge datasets like this and to GPU training. I implemented this, but it isn't utilizing the GPUs. I cannot load the entire dataset into memory, so I do it in batches. Please guide me on this.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-SXM...  On   | 00000000:00:04.0 Off |                    0 |
| N/A   36C    P0    62W / 400W |  38819MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-SXM...  On   | 00000000:00:05.0 Off |                    0 |
| N/A   33C    P0    62W / 400W |  38819MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     10175      C   python                          38817MiB |
|    1   N/A  N/A     10175      C   python                          38817MiB |

import os
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Conv1D, MaxPooling1D, UpSampling1D, Dropout
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.losses import MeanSquaredError
import logging
import random

# Set up logging
logging.basicConfig(filename='app.log', filemode='w', format='%(name)s - %(levelname)s - %(message)s', level=logging.INFO)

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "key.json"
BASE_DIR = "/mnt/data_disk/exp"


class ConvAutoEncoder:
    def __init__(self, input_dim=100000):
        self.input_dim = input_dim
        logging.info("Creating Convolutional AutoEncoder model...")
        self.autoencoder, self.encoder = self.create_model()

    def create_model(self):
        inputs = Input(shape=(self.input_dim, 1))

        # Encoder
        x = Conv1D(64, 8, activation='relu', padding='same')(inputs)
        x = MaxPooling1D(2, padding='same')(x)
        x = Dropout(0.25)(x)
        x = Conv1D(32, 8, activation='relu', padding='same')(x)
        x = MaxPooling1D(2, padding='same')(x)
        encoded = MaxPooling1D(2, padding='same')(x)

        # Defining the encoder model for the encoded representation
        encoder = Model(inputs, encoded)

        # Decoder
        x = Conv1D(32, 8, activation='relu', padding='same')(encoded)
        x = UpSampling1D(2)(x)
        x = Dropout(0.25)(x)
        x = Conv1D(64, 8, activation='relu', padding='same')(x)
        x = UpSampling1D(2)(x)
        x = UpSampling1D(2)(x)  # Ensure this line matches the dimensionality reduction caused by pooling
        decoded = Conv1D(1, 3, activation='sigmoid', padding='same')(x)

        # Autoencoder
        autoencoder = Model(inputs, decoded)

        autoencoder.compile(optimizer='adam', loss='mean_squared_error')

        return autoencoder, encoder




def get_gcs_files(benign_bucket, malicious_bucket):
    benign_files = tf.io.gfile.glob(benign_bucket + "/*.npy")
    malicious_files = tf.io.gfile.glob(malicious_bucket + "/*.npy")
    all_files = benign_files + malicious_files
    random.shuffle(all_files)

    split_index = int(len(all_files) * 0.8)
    train_files = all_files[:split_index]
    validation_files = all_files[split_index:]

    train_ben_files = sum('ben' in os.path.basename(file) for file in train_files)
    train_mal_files = sum('mal' in os.path.basename(file) for file in train_files)
    validation_ben_files = sum('ben' in os.path.basename(file) for file in validation_files)
    validation_mal_files = sum('mal' in os.path.basename(file) for file in validation_files)

    logging.info(f'Training set: {train_ben_files} benign files, {train_mal_files} malicious files')
    logging.info(f'Validation set: {validation_ben_files} benign files, {validation_mal_files} malicious files')

    return train_files, validation_files


def load_npy_file(file_path, batch_size):
    data = np.load(file_path, mmap_mode='r')
    num_batches = len(data) // batch_size  # This ensures all batches have the same size
    for i in range(num_batches):
        batch = data[i*batch_size:(i+1)*batch_size]
        batch = np.expand_dims(batch, axis=-1)  # Add a channel dimension
        logging.info(f'Processing batch from {file_path} starting at index {i*batch_size}')
        yield batch, batch
    remainder = len(data) % batch_size
    if remainder != 0:
        final_batch = data[-remainder:]
        final_batch = np.expand_dims(final_batch, axis=-1)  # Add a channel dimension
        logging.info(f'Processing final smaller batch from {file_path}')
        yield final_batch, final_batch



def npy_generator(file_list, local_dir, batch_size):
    while True:
        for gcs_file in file_list:
            local_file = os.path.join(local_dir, os.path.basename(gcs_file))
            # logging.info(f'Copying file {gcs_file} to {local_file}')
            # tf.io.gfile.copy(gcs_file, local_file, overwrite=True)
            yield from load_npy_file(local_file, batch_size)
            logging.info(f'Removing local file {local_file}')
            tf.io.gfile.remove(local_file)  # NOTE: with the copy above commented out, this deletes the only local copy; the next pass of the infinite loop will then fail


def train_model():
    gpus = tf.config.experimental.list_physical_devices('GPU')
    logging.info(f"gpu #: {gpus}")
    devices = [f"/gpu:{i}" for i in range(len(gpus))]
    logging.info(f"devices : {devices}")
    mirrored_strategy = tf.distribute.MirroredStrategy(devices=devices)

    train_files, validation_files = get_gcs_files('gs://data-malware-gen/unsupervised_data/ben_npy', 'gs://data-malware-gen/unsupervised_data/mal_npy')
    train_files = ["gs://data-malware-gen/unsupervised_data/ben_npy/preprocessed_ben_data_batch_1_2024-06-11.npy"]
    validation_files = ["gs://data-malware-gen/unsupervised_data/ben_npy/preprocessed_ben_data_batch_1_2024-06-11.npy"]

    local_dir = BASE_DIR
    batch_size = 64
    
    with mirrored_strategy.scope():
        cae = ConvAutoEncoder()   # keep a handle on the wrapper so its encoder stays reachable
        model = cae.autoencoder
        model.compile(optimizer=Adam(learning_rate=0.001), loss=MeanSquaredError())
    model.summary()
    
    # Define the options for the dataset
    options = tf.data.Options()
    options.experimental_distribute.auto_shard_policy = tf.data.experimental.AutoShardPolicy.AUTO

    
    # Create datasets
    train_dataset = tf.data.Dataset.from_generator(
        lambda: npy_generator(train_files, local_dir, batch_size),
        output_signature=(
            tf.TensorSpec(shape=(None, 100000, 1), dtype=tf.float32),
            tf.TensorSpec(shape=(None, 100000, 1), dtype=tf.float32)
        )
    )

    validation_dataset = tf.data.Dataset.from_generator(
        lambda: npy_generator(validation_files, local_dir, batch_size),
        output_signature=(
            tf.TensorSpec(shape=(None, 100000, 1), dtype=tf.float32),
            tf.TensorSpec(shape=(None, 100000, 1), dtype=tf.float32)
        )
    )

    # Optimize datasets
    train_dataset = train_dataset.with_options(options).prefetch(tf.data.experimental.AUTOTUNE)
    validation_dataset = validation_dataset.with_options(options).prefetch(tf.data.experimental.AUTOTUNE)

    
    model.fit(train_dataset, validation_data=validation_dataset, epochs=5)

    model.save_weights('autoencoder_weights.h5')
    logging.info("Autoencoder weights saved!")

    encoder_model = cae.encoder   # a compiled Keras Model has no .encoder attribute; use the wrapper's encoder
    encoder_model.save_weights('encoder_weights.h5')
    logging.info("Encoder weights saved!")

    # Generate embeddings for the first batch of the validation data
    batch_data = next(iter(validation_dataset))[0]
    embeddings = encoder_model.predict(batch_data)
    logging.info("Generated embeddings!")

    # Save embeddings
    np.save('embeddings.npy', embeddings)
    logging.info("Embeddings saved!")

if __name__ == '__main__':
    train_model()
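
One likely culprit, sketched below: a single Python generator hands batches to both replicas serially, so the GPUs idle while np.load and the per-row Python loop run. Letting tf.data read several files in parallel is the usual fix (shapes and dtypes mirror the script; the rest is illustrative and assumes the .npy files are already staged on local disk):

# Sketch: parallel file interleaving with tf.data instead of one generator.
import numpy as np
import tensorflow as tf

def load_file(path):
    data = np.load(path.decode('utf-8'), mmap_mode='r')   # path arrives as bytes
    for row in data:
        yield row.astype(np.float32)[:, None]             # (100000,) -> (100000, 1)

def make_dataset(files, batch_size):
    ds = tf.data.Dataset.from_tensor_slices(files)
    ds = ds.interleave(
        lambda f: tf.data.Dataset.from_generator(
            load_file, args=(f,),
            output_signature=tf.TensorSpec(shape=(100000, 1), dtype=tf.float32)),
        cycle_length=4, num_parallel_calls=tf.data.AUTOTUNE)
    ds = ds.map(lambda x: (x, x))   # autoencoder target = input
    return ds.batch(batch_size).prefetch(tf.data.AUTOTUNE)
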
1 Comment
2024/06/27
07:25 UTC

1

Confused on roc_curve() input (Python)

In sklearn.metrics: the inputs to roc_curve() are my test labels and my label predictions from predict(), which uses a threshold of 0.5. But from what I understand, the ROC curve computes values across all thresholds, so my question is: why do I need to input predictions at one threshold? Does it expect this input to have a threshold of 0.5 and base things off of that? Because otherwise I don't see why it's needed.
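
For reference, a sketch of what roc_curve() expects: continuous scores, not thresholded labels. predict() collapses the probabilities at 0.5 first, which leaves the curve only one effective operating point, whereas scores let roc_curve sweep every threshold itself:

# Sketch on synthetic data: pass positive-class scores to roc_curve.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

scores = clf.predict_proba(X_test)[:, 1]           # positive-class probability
fpr, tpr, thresholds = roc_curve(y_test, scores)   # sweeps all score thresholds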

2 Comments
2024/06/27
06:55 UTC

1

GridSearchCV with data removal

Hello,

I'm creating a model that predicts a song's streams in one week, using the number of streams from the previous seven days. Some of the rows have 0s in all of these columns, and I want to remove them. However, I'm not sure how I can perform GridSearchCV such that it handles this data removal in only the specific training set that is unique to each CV iteration. Any advice?

Thanks!
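
A hedged sketch of one way to do this with imbalanced-learn's FunctionSampler: unlike plain sklearn transformers, samplers may drop rows, and an imblearn Pipeline runs them only on the training folds (data below is illustrative):

# Sketch: fold-local row removal inside GridSearchCV.
import numpy as np
from imblearn import FunctionSampler
from imblearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

def drop_all_zero_rows(X, y):
    mask = ~(X == 0).all(axis=1)   # drop rows whose lag columns are all zero
    return X[mask], y[mask]

rng = np.random.default_rng(0)
X = rng.integers(0, 100, size=(200, 7)).astype(float)   # 7 days of stream counts
X[:20] = 0.0                                            # some all-zero rows
y = X.sum(axis=1) + rng.normal(size=200)

pipe = Pipeline([
    ('dropzeros', FunctionSampler(func=drop_all_zero_rows, validate=False)),
    ('model', RandomForestRegressor(random_state=0)),
])
grid = GridSearchCV(pipe, {'model__n_estimators': [100, 300]}, cv=5)
grid.fit(X, y)   # row removal happens per training fold; test folds stay intact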

0 Comments
2024/06/27
06:09 UTC

1

A DSA certificate wanted!!

I want a DSA-in-Python certification. I want a certificate only in DSA, to have verifiable proof. Let's talk about the price range later. Oh, and I'm an AI and ML student, but I need a certificate that is more credible. Can anyone suggest some?

8 Comments
2024/06/27
05:47 UTC

0

Does the rate at which models learn to translate reflect the rate at which humans acquire foreign languages, and/or vice versa?

The US government has a list ranking languages by difficulty to learn, provided that English is one’s mother tongue. For example, learning Spanish from English is on average much easier than learning Turkish or Russian from English, since things like agglutination, word order, and declension are rather foreign if you’re a monolingual English speaker. If a model is trained to translate between English and Spanish, for example, would this take less time than training the same model to learn to translate from English to Turkish? Is there any objective way to measure this, since available corpora are of different sizes and richness and in-person contextual information (e.g. different pronouns in Vietnamese) is not necessarily available in paired texts?

1 Comment
2024/06/26
21:42 UTC

5

Data Science without a degree, possible?

Is it possible to get into data science without a dedicated master's degree/PhD? If yes, then how?

11 Comments
2024/06/26
16:14 UTC

1

LLM finetuning help

I've been studying how to fine-tune LLMs, and I have come to the conclusion that the code to actually fine-tune the model is almost always the same (use trl and a model from Hugging Face). The main aspect is the dataset that we use and how we evaluate it.

In the same context, I would like to fine-tune a model on a particular domain, for example, let's say medical. I did SFT on an instruction dataset but then got lost searching for a preference dataset to further fine-tune using DPO.

Please help me with where I can find datasets (I've searched for DPO-based medical datasets on Hugging Face). Also, correct me if my process of thinking is wrong.
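
For reference, a rough sketch of the DPO stage with trl; the API moves between trl versions, so treat argument names as approximate, and the model/dataset paths are placeholders:

# Sketch: DPO on top of an SFT checkpoint. DPO expects preference triples
# with "prompt", "chosen", and "rejected" columns.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model = AutoModelForCausalLM.from_pretrained("path/to/sft-medical-model")    # placeholder
tokenizer = AutoTokenizer.from_pretrained("path/to/sft-medical-model")       # placeholder
dataset = load_dataset("path/to/medical-preference-dataset", split="train")  # placeholder

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="dpo-medical", beta=0.1),
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()

If no ready-made preference set exists for the domain, a common route is building one: sample several answers per prompt from the SFT model and rank them with human annotators or a stronger judge model.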

0 Comments
2024/06/26
14:23 UTC
