/r/MLQuestions

A place for beginners to ask stupid questions and for experts to help them! /r/MachineLearning is a great subreddit, but it is for interesting articles and news related to machine learning. Here, you can feel free to ask any question regarding machine learning.

What kinds of questions do we want here?

"I've just started with deep nets. What are their strengths and weaknesses?" "What is the current state of the art in speech recognition?" "My data looks like X,Y what type of model should I use?"

If you are well versed in machine learning, please answer any question you feel knowledgeable about, even if they already have answers, and thank you!


Related Subreddits:

/r/MachineLearning
/r/mlpapers
/r/learnmachinelearning

49,201 Subscribers

1

I'm having a hard time grasping a behavior I observed in the SimpleImputer in scikit-learn where it seems to update derived values correctly. Can someone explain this to me?

I'm going through the hands on machine learning book (3rd edition) and I'm at the part where you are preparing the data in the first project.

You need to use the SimpleImputer from scikit to fill in the median for total_bedrooms for some rows that have it missing.

What is confusing me is that there is also a column that relies on total_bedrooms. data['bedrooms_ratio'] = data['total_bedrooms'] / data['total_rooms']

My expectation was that data['bedrooms_ratio'] would get filled in with the median, since it was also NaN, however, after looking through the data, it was set to the correct value on all the rows where total_bedrooms was filled in by the imputer. Here is the code.

import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

imputer = SimpleImputer(strategy='median')
housing_num = housing.select_dtypes(include=[np.number])
imputer.fit(housing_num)

X = imputer.transform(housing_num)
housing_tr = pd.DataFrame(X, columns=housing_num.columns,
                          index=housing_num.index)

and this is what is loading the data

def get_training_data(housing_in, test_ratio=0.2, random_state=42):
    train_set, _ = split_train_test(housing_in, test_ratio=test_ratio, random_state=random_state)
    train_set['rooms_per_house'] = train_set["total_rooms"] / train_set['households']
    train_set['bedrooms_ratio'] = train_set['total_bedrooms'] / train_set['total_rooms']
    train_set['people_per_home'] = train_set['population'] / train_set['households']
    return train_set

Can someone explain this behavior to me, it is not at all what I expected. I mean, it's great that it works that way. I'm just not sure what the "rules" for this behavior are and I was unable to find anything on it in the documentation.
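
For reference, SimpleImputer learns one statistic per column during fit and fills each column's missing entries with that column's own value; it does not recompute derived columns. Under that rule, a NaN in bedrooms_ratio would be expected to receive the median of bedrooms_ratio, which may simply be close to the recomputed ratio. A minimal plain-Python sketch of the column-wise rule follows. It does not call scikit-learn, and the toy numbers and the helper name impute_median are made up for illustration.

```python
import math
import statistics


def impute_median(columns):
    """Fill NaNs in each column with that column's own median (hypothetical helper)."""
    filled = {}
    for name, values in columns.items():
        observed = [v for v in values if not math.isnan(v)]
        median = statistics.median(observed)
        filled[name] = [median if math.isnan(v) else v for v in values]
    return filled


nan = float("nan")
# Toy data, not from the housing dataset.
toy = {
    "total_rooms": [10.0, 20.0, 40.0, 50.0],
    "total_bedrooms": [2.0, nan, 8.0, 5.0],
}
# The derived column is NaN wherever total_bedrooms is NaN.
toy["bedrooms_ratio"] = [b / r for b, r in zip(toy["total_bedrooms"], toy["total_rooms"])]

result = impute_median(toy)
imputed_ratio = result["bedrooms_ratio"][1]
recomputed_ratio = result["total_bedrooms"][1] / result["total_rooms"][1]
print(imputed_ratio, recomputed_ratio)
```

In this toy case the imputed ratio and the recomputed ratio differ, so comparing the two on the affected rows of housing_tr is one way to check which value actually ended up in the column.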

0 Comments
2024/05/01
04:27 UTC

1

AI for speech to phonetic/text without the cloud/internet

Hello guys!

I know it is probably possible, but I was wondering whether there is a way to have a remote device run an AI that figures out the phonetics of what someone is saying and then simulates the jaw movements for those phones using piezoelectric crystals.

Hyper specific, I know.

I've been considering allosaurus or some of Meta's Wav2Vec2Phoneme stuff-- could I, like, copy the software over to a GPU/Raspberry Pi or something and use it from there?

I was also thinking of just resorting to speech-to-text tech such as Arduino's speech recognition software, but I am not sure it does anything other than recognise pre-written commands. I need real-time simulation of each letter, and that software also seems slow...

Has anyone done this type of thing before?

Am I putting this in the wrong subreddit? Where can I get more technical suggestions?

(I am gonna need some response today, my group project depends on it)

0 Comments
2024/05/01
02:20 UTC

1

Need help understanding training a model for use as pretrained weights

I have trained a segmentation model (model A) using dataset A. I now want to train a new model (model B) on dataset B, using the same model definition as model A, and which uses the trained model A as pretrained weights. Using pytorch, my understanding was that I could simply train model B using the following:

weight_file = 'weights/model_a.pth'
weights = torch.load(weight_file)
self.load_state_dict(weights)

However, this does not work, I get errors "Missing key(s) in state_dict" and "Unexpected key(s) in state_dict". If both models were trained with exactly the same model definition, then why can't the state_dict be mapped? I feel like I am missing something fundamental here, can anyone help?
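
One way to narrow this down is to compare the key names stored in the checkpoint with the key names the model expects. The sketch below uses plain dicts in place of real state_dicts. The key names and the "module." prefix (which appears when a model is saved while wrapped in nn.DataParallel) are assumptions for illustration, not details taken from the post.

```python
def diff_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) key lists, mirroring the two error messages."""
    missing = sorted(set(model_keys) - set(checkpoint_keys))
    unexpected = sorted(set(checkpoint_keys) - set(model_keys))
    return missing, unexpected


def strip_prefix(state, prefix="module."):
    """Drop a leading prefix from every key that has it."""
    return {(k[len(prefix):] if k.startswith(prefix) else k): v for k, v in state.items()}


# Hypothetical key names standing in for model.state_dict().keys().
model_keys = ["encoder.weight", "encoder.bias", "head.weight"]
# Hypothetical checkpoint whose keys carry an extra prefix.
checkpoint = {"module.encoder.weight": 1, "module.encoder.bias": 2, "module.head.weight": 3}

missing, unexpected = diff_keys(model_keys, checkpoint)
fixed = strip_prefix(checkpoint)
missing_after, unexpected_after = diff_keys(model_keys, fixed)
print(missing, unexpected, missing_after, unexpected_after)
```

With real objects, the same comparison would be between model.state_dict().keys() and the keys of whatever torch.load returned. If that object is a dict that holds the weights under another key, the nested dict is what load_state_dict needs.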

3 Comments
2024/04/30
22:31 UTC

2

What are some good places to learn how to use "data for good"?

0 Comments
2024/04/30
21:49 UTC

1

[P] Persisting Overfitting Problem

Hello, I'm somewhat of a beginner in machine learning, but I have a good foundation. I started a school project which involves developing a model to predict the employability of data scientists in the United States. The goal is to help job seekers determine whether they are employable in the current U.S. market. I scraped my data from Indeed and developed the model, but I am still facing issues with overfitting. Initially, there was a problem of data leakage, but even after resolving that, I still have an overfitting problem. I consistently get a perfect score of 100% on the training data and a score of 9% on the test data. With data leakage, it was 100% on both. Now, I'm unsure how to fix this. I've tried everything I can think of, from data balancing to feature selection, and used several algorithms with grid search. I don't know what to do now, even though I think I've done good preprocessing of the raw data. Could someone help me identify the problem?

2 Comments
2024/04/30
21:13 UTC

1

How do I get team features based on player features?

I am making an ML model that is going to predict the outcome of Counter Strike 2 matches based on recent stats for each player in each team. For anyone not familiar with Counter Strike 2, matches are played 5 v 5. This would be binary classification where the model predicts whether team1 or team2 is going to win. It will also return the probability of each team winning.

For every match used for training I have player statistics for the 20 latest matches (the 20 matches before the given match). I am not sure what would be good to use as features. I was thinking about taking the average, max and min over the last 20 matches for each of the 10 players and using those as features. But I believe (correct me if I am wrong) that the order of the stats for the 10 players would matter in predictions. Thus, I would like something better. I would like to have team features calculated from the players' stats for the last 20 matches for each team and use those as features.
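
One option that avoids the ordering concern is to aggregate over the five players with functions that do not depend on order, such as mean, max and min. A minimal sketch, assuming each player is already summarised by per-stat averages over the last 20 matches; the stat names "kills" and "deaths" and the numbers are hypothetical.

```python
import statistics


def team_features(players, stats=("kills", "deaths")):
    """Aggregate per-player averages into team-level features that ignore player order."""
    features = {}
    for stat in stats:
        values = [p[stat] for p in players]
        features[f"{stat}_mean"] = statistics.mean(values)
        features[f"{stat}_max"] = max(values)
        features[f"{stat}_min"] = min(values)
    return features


# Hypothetical per-player averages over the last 20 matches.
team1 = [
    {"kills": 20, "deaths": 15},
    {"kills": 18, "deaths": 17},
    {"kills": 15, "deaths": 16},
    {"kills": 22, "deaths": 14},
    {"kills": 10, "deaths": 18},
]
shuffled = list(reversed(team1))

features = team_features(team1)
same_when_shuffled = features == team_features(shuffled)
print(features, same_when_shuffled)
```

The same function applied to team2 gives a second feature dict; the two can be concatenated or differenced to form one row per match.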

https://preview.redd.it/4j4yesspqnxc1.png?width=1043&format=png&auto=webp&s=b0d98a2f27cc1a33c26c2e9189a560b7ebc27c3a

Here is the picture of how my data is modeled in database.

3 Comments
2024/04/30
18:11 UTC

2

How can I level up my technical knowledge and skills?

I am doing an internship in data science with 13 other interns. All of us are beginners in data science, and most of us have transitioned from different tech stacks, but I still feel like I have less technical knowledge and weaker skills than the others. What should I do to build my knowledge and skills across a wider range of data science concepts and become more technically sound? Kindly mention any resources for practice and learning that you know of.

2 Comments
2024/04/30
10:23 UTC

2

What's the "meta" for training vision models?

Hello everyone,

I am about to train my first model on my first "big" dataset, which I assume will require a bit more than my M1 laptop or 4070Ti desktop.

The model would be YOLOv8 on 7-21 GB datasets.

I don't mind paying 10-100€, but I'd rather make sure I'm putting that money in the right place, where it's worth it.

Will Colab be enough or should I point towards something else?

2 Comments
2024/04/30
08:23 UTC

1

Sensitivity of JAX multi-node training to network speed

I see documentation indicating that JAX's multi-node setup requires a Mellanox or similar fiber network.

I'm wondering about how sensitive to expect training speed to be to network speed. Would a 10 gigabit ethernet network be good enough to make the extra node worthwhile? Or do I really need fiber-like speeds such as 100, 200, 400, or 800 gigabits? (QSFP28, QSFP56, QSFP-DD, OSFP800)

Thanks for any thoughts.

3 Comments
2024/04/30
06:02 UTC

0

Machine Learning internships..

Hello, I'm seeking a summer internship, but I'm struggling to find one suitable for my level since this will be my first internship. Can anyone assist me in finding an internship that offers training in machine learning? While I have some knowledge in ML, it's not yet in-depth, and I hope to work on projects during the internship.

0 Comments
2024/04/30
05:48 UTC

3

Which course better for ML research career

Bachelor of Computer Science with Honours, or Bachelor of Computer Science (Data Science) with Honours? I am interested in deep knowledge of AI: transformers, deep learning, CV, mathematical foundations, and perhaps even the physics industry. Which should I go with?

0 Comments
2024/04/30
05:46 UTC

1

Fine-tuning object detection models to identify multiple classes in aerial images

Guys, I've chosen this as my bachelor's topic. Please let me know if this is good enough or if I should pick something else. If I should, please suggest a topic if you can. Thank you

0 Comments
2024/04/30
05:11 UTC

1

Help With Model for Sentiment Analysis (LSTM vs GRU)

Hi guys!

I am training this small model with approximately 4000 sentences for sentiment analysis (multi-class, pos, neut, neg):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Embedding, Conv1D, BatchNormalization,
                                     Bidirectional, LSTM, Dropout, Dense)
from tensorflow.keras.regularizers import l1_l2
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau

# V (vocabulary size), max_length and the padded train/val arrays are defined elsewhere.
model = Sequential(name='Sentiment_Analysis_MonoLSTM')
model.add(Embedding(V+1, 500, input_length=max_length, trainable=True, name='Embedding_layer'))
model.add(Conv1D(filters=64, kernel_size=4, padding='same', activation='relu', kernel_regularizer=l1_l2(0.005, 0.01), name='Conv1D_layer'))

model.add(BatchNormalization(name='BatchNormalization'))

model.add(Bidirectional(LSTM(64, return_sequences=False), name='Bidirectional_LSTM_layer_1'))
model.add(Dropout(0.2))
model.add(Dense(32, activation='relu', kernel_regularizer=l1_l2(0.005, 0.01)))
model.add(Dropout(0.5))
model.add(Dense(3, activation='softmax'))
optimizer = Adam(learning_rate=1e-4)
model.compile(optimizer=optimizer,
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

early_stopping = EarlyStopping(monitor='val_loss',
                               patience=4,
                               restore_best_weights=True)
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.15, patience=1, min_lr=1e-7)
batch_size = 16
model.fit(train_pad, y_train, epochs=50, batch_size=batch_size, validation_data=(val_pad, y_val), callbacks=[reduce_lr, early_stopping])

I get stuck at about 50-60% validation accuracy. I honestly have tried a bunch of different things to add and decrease model complexity, but nothing seems to work. There is enough data to get better performance, so I don't know whether I should think about using GRUs instead of LSTMs.

Let me know what you think! Any advice works!

🤗

7 Comments
2024/04/30
01:37 UTC

1

[D] Reporting Overfitted ML Model Results in Research Paper Discussion

I have a tabular dataset of ~500 lines that I've collected for a research project. I want to write a paper after building a binary classification model for it. I do a bunch of feature engineering and the purpose of the model is interesting.

I have some missing data, missing not at random, and the choice of imputation also impacts the model highly. I apply column mean imputation.

I am fitting a gradient boosting machine to the data, with standard feature selection and hyperparameter tuning. CV splits are done properly with no info leakage, and I can achieve F1=0.8, AUROC=0.85, which is pretty good FWIW.

However, the model is pretty unstable, and making very small changes to the processing, a different model seed, column subsampling etc. will result in big changes in performance. Even the same feature set with the same set of hyperparameters can give wildly different results between experiments. It can be a F1=0.8 in one run, and F1=0.63 in the next, etc.

I am bootstrapping everything with 50 runs and reporting the avg/std so that it's not completely misleading. However, certain steps in my preprocessing are done intentionally so that I can introduce more bias to the model and see a high number on my screen (like the choice of missing data imputation).

Also, I apply a very harsh feature selection with two steps, so that I can actually be left with a small amount of features (15-20 features selected out of 90). If I leave one of the steps and leave 30 of the features, I could even get F1=0.88, AUROC =0.90. But I want to mitigate overfitting as much as possible while also being left with reportable results.

I am planning to report an average of F1=0.72, AUROC= 0.80, and report the standard deviations. Also, explain every step of my preprocessing, but I cannot mention how sensitive the model is to small changes.

What else can be done so that the results are reported in an honest manner? What are your opinions on applied ML in research? Is this acceptable?

Edit: Also, if for example I apply, kNN imputation or some other more sophisticated method, my results drastically decrease to 0.45-0.50 F1, etc. So, I have to apply column mean imputation, to intentionally make the model biased. Otherwise, no paper.

1 Comment
2024/04/30
01:18 UTC

0

ELI5: ML regression vs "regular" regression

Hoping to ask a very naive question without eliciting the ire of people who really know wtf they’re talking about! I’d appreciate any genuine and kind responses. I'm taking a beginner stats class and just learned that regression is something you can do in ML. Explain it like I’m five: how does ML regression (let’s say using sklearn) differ from a regular old linear regression you’d plug into Stata/R (like I’m doing in my class now)? Is it the multiple iterations that you get with sklearn to minimize RSS? (My understanding is that it runs a bunch of models, some with different subsets of variables, maybe even other hyperparameter differences like robust SE?) Or is it more the practice of splitting up the data into train/test so you can see how it actually performs on new data? Just trying to understand the fundamentals of what’s going on under the hood with my very basic stats/DS knowledge.

3 Comments
2024/04/30
00:37 UTC

0

I'm lost, guide me.

I apologize for the dumb question, I'm new to AI.

I'm trying to extract questions, answers, and explanations of the answers from Arabic textbooks.

The previous developer used ChatGPT. It worked well, but not all the time, and we're changing it because it's expensive.
So far what I understood is that I should use a tool like OpenNLP instead of ChatGPT or Gemini.

GPT, Gemini and Google all confused the hell out of me. I just need to know where to start, or which tools I should use, nothing more.

Thanks.

0 Comments
2024/04/29
20:55 UTC

2

CNN model doubts

Greetings to the community, I have recently taken up a course in ML with a final project on binary classification of audio snippets. I split my dataset of songs 80:20 and further split every song into 10-second snippets at an 11025 Hz sampling rate. After selecting features like spectrograms, chromagrams and MFCCs together, I am training various CNN models. The problem is my performance on the validation data. The training accuracy goes up as I train for more epochs, but my validation accuracy and loss clearly show overfitting. In fact, the graph spikes up and down over all the epochs. Training accuracies: 87%-95% depending on the number of epochs, learning rate and model architecture. Validation accuracies: 60%-70%.

There is enough data to train (501 songs-142 class1 and 359 class 2). I am using BCELoss from Pytorch and Adam Optimizer. Until now I have tried out all techniques to reduce overfitting(L2 Regularization, Dropout, LR scheduler, etc)

What else can I do to overcome the overfitting problem? How should I improve my model architecture, if that's the issue? Please drop in any insightful suggestions that can help. Thank you!!

7 Comments
2024/04/29
19:09 UTC

1

Markov-Decision-Process

Has anyone here done a project on MDPs using Q-learning? I need help with this. If so, can we please connect? The help would mean a lot.

Edit: Here is my formulation. PS: If anyone has enough karma, can you please post it in r/machinelearning?

The Figure below will give the readers a brief idea.

https://preview.redd.it/g3s5ut9g3kxc1.png?width=982&format=png&auto=webp&s=17ae7b7a52efe71439cd1fe91651bc1215582f83

1. States: The state of the MDP can be represented by the current inventory levels of each stakeholder (farmers, local traders, PPCs, wholesalers, ripening and storage facilities, local markets, farmers markets, and retailers). The state also includes additional information such as transportation costs and handling costs.

2. Actions: The actions in this MDP represent the decisions of each stakeholder regarding where to supply or procure bananas. For example, a farmer can choose to supply to a local trader, PPC, or wholesaler. A wholesaler can choose to procure from a PPC, ripening and storage facility, or local trader.

3. Rewards: The reward function should be designed to minimize the total cost of the supply chain. This could include factors like revenue from sales, transportation costs, handling costs, and margins for each stakeholder.

4. Transition Function: The transition function describes how the state changes based on the current state and the actions taken by the stakeholders. This can include updating inventory levels and other relevant information (if any domain experts can help).

5. To use Q-learning to solve this problem:
   a. Initialize the Q-table.
   b. Define the Q-learning algorithm: The Q-learning algorithm iteratively updates the Q-values based on the observed rewards and transitions between states. The Q-value update rule is:
      Q(s, a) = Q(s, a) + α * (r + γ * max(Q(s', a')) - Q(s, a))
      where s is the current state, a is the action taken, r is the immediate reward received, s' is the next state, α is the learning rate, and γ is the discount factor.

6. Generate Training Episodes: To train the Q-learning algorithm, generate episodes (sequences of states, actions, and rewards) by simulating the supply chain process.

7. Update the Q-values: During each episode, update the Q-values based on the observed transitions and rewards using the Q-learning update rule.

8. Balance exploration (trying new actions to discover better solutions) and exploitation (choosing actions with the highest expected rewards). Use techniques like ε-greedy.

9. Train the Q-learning algorithm by generating multiple episodes and updating the Q-values until convergence. Convergence can be determined by monitoring the changes in the Q-values or the average rewards obtained during episodes.

10. Deployment and Decision Making: ??? fill in please
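
The update rule and the ε-greedy choice from the steps above can be sketched in a few lines of tabular Python. This is only an illustration under stated assumptions: the state names, the action names, and the convention of using a negative cost as the reward are all hypothetical, and a real supply chain state would be far larger than a lookup table comfortably handles.

```python
import random
from collections import defaultdict


def epsilon_greedy(q, state, actions, epsilon, rng):
    """Pick a random action with probability epsilon, else the best-known action."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def q_update(q, s, a, r, s_next, actions, alpha, gamma):
    """Apply Q(s, a) = Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))."""
    best_next = max(q[(s_next, a2)] for a2 in actions)
    q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])


q = defaultdict(float)  # Q-table with every entry starting at 0.0
actions = ["local_trader", "ppc", "wholesaler"]  # hypothetical farmer actions

# One observed transition: supplying the PPC cost 4.0, so the reward is -4.0 (assumed convention).
q_update(q, "farmer", "ppc", r=-4.0, s_next="ppc_stock", actions=actions, alpha=0.5, gamma=0.9)
updated_value = q[("farmer", "ppc")]

# With epsilon = 0 the choice is purely greedy.
greedy = epsilon_greedy(q, "farmer", actions, epsilon=0.0, rng=random.Random(0))
print(updated_value, greedy)
```

For the deployment step, one common choice is to act greedily (epsilon = 0) with respect to the learned table, as in the last call.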

7 Comments
2024/04/29
16:19 UTC

1

TensorFlow Lite model throws SEGFAULT error when run in Flutter

Hello!

I am pretty new to ML and I am creating a mobile application that recognizes local sign language. I've trained a ConvLSTM model, saved it as an .h5 file and then converted it into a TFLite model. I checked the TFLite model in Google Colab and it works there. When I use the model in my Flutter application, it throws the following error:

signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x0
Cause: null pointer dereference

I am guessing that the error is with the model that I created since when I tried using models that I downloaded from the internet, it works.

I would greatly appreciate if someone could help me with this issue. Thank you!!

0 Comments
2024/04/29
06:52 UTC

1

Stochastic MuZero Chance outcomes training values question

I recently stumbled on the Stochastic MuZero paper. I really like the results of the paper and the overall process. I understand the inference of the network and the MCTS planning. However, I don't understand the training of the chance outcomes. Could someone explain? In the MCTS, the sigma variable represents the distribution over chance outcomes in that state. What is this distribution trained against? In the paper they mention that it's trained against some encoder. Is there an additional encoder in the network that is used for this, or how do they know which chance outcome actually occurred?

0 Comments
2024/04/28
23:03 UTC

4

Tensorflow Strided Slice Error. Need help.

TLDR at the bottom

My Full Tensorflow Code: Link. Please excuse all the different commented-out parts of the code; I've had a long road of troubleshooting this code.

Hardware and Software Setup

-Virtual Machine on Runpod

-NVIDIA A100 GPU

-Tensorflow 2.15

-CUDA 12.2

-cuDNN 8.9

What I'm doing and the issue I'm facing

I am trying to create a visual generator AI, and to that end I am trying to implement the TGANv2 architecture in Tensorflow. The TGANv2 model I am following was originally written in Chainer by some researchers. I also implemented it in Pytorch (here is my PyTorch code if you are interested) and also ran it in Chainer. It works fine in both. But when I try to implement it in Tensorflow I start running into this error:

Traceback (most recent call last):

  File "/root/anaconda3/envs/tf_gpu/lib/python3.11/site-packages/tensorflow/python/ops/script_ops.py", line 270, in __call__
    ret = func(*args)
          ^^^^^^^^^^^

  File "/root/anaconda3/envs/tf_gpu/lib/python3.11/site-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^

  File "/root/anaconda3/envs/tf_gpu/lib/python3.11/site-packages/tensorflow/python/data/ops/from_generator_op.py", line 198, in generator_py_func
    values = next(generator_state.get_iterator(iterator_id))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/workspace/3TF-TGANv2.py", line 140, in __iter__
    yield self[idx]
          ~~~~^^^^^

  File "/workspace/3TF-TGANv2.py", line 126, in __getitem__
    x2 = self.sub_sample(x1)
         ^^^^^^^^^^^^^^^^^^^

  File "/workspace/3TF-TGANv2.py", line 99, in sub_sample
    x = tf.strided_slice(x, begin, end, strides)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/root/anaconda3/envs/tf_gpu/lib/python3.11/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None

  File "/root/anaconda3/envs/tf_gpu/lib/python3.11/site-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
    except TypeError as e:

tensorflow.python.framework.errors_impl.InvalidArgumentError: {{function_node __wrapped__StridedSlice_device_/job:localhost/replica:0/task:0/device:GPU:0}} Expected begin and size arguments to be 1-D tensors of size 2, but got shapes [4] and [2] instead. [Op:StridedSlice]

What's important to note about this issue is that it does not come up right away. It can go through dozens of batches before this issue pops up. This error was generated with a batch size of 16, but if I lower my batch size to 8 I can even get it to run for 5 epochs (longest I've tried). The outputs of the Generator are not what I saw with Chainer or Pytorch after 5 epochs (it's mostly just videos of a giant black blob), though I am unsure if this is related to the issue. So with a batch size of 8 sometimes the issue comes up and sometimes it doesn't. If I lower the batch size to 4, the issue almost never comes up. The fact that this is batch size driven really perplexes me. I've tried it with multiple different GPUs.

Description of relevant parts of model and code

The way the Generator works is as follows. There is a CLSTM layer that generates 16 feature maps that have a 4x4 resolution and 1024 channels each. Each feature map corresponds to a frame of the output video (the output video has 16 frames and runs at 8fps, so it's a 2 second long gif).

During inference each feature map passes through 6 upsampling blocks, with each upsampling block doubling the resolution and halving the channels. So after 6 blocks the shape of each frame is (256, 256, 16), so it has a 256p resolution and 16 channels. Each frame then gets rendered by a rendering block into a 3-channel image of shape (256, 256, 3). So the final shape of the output video is (16, 256, 256, 3) = (T, H, W, C), where T is the number of frames, H is the height, W the width, and C the number of channels. This output is a single tensor.
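
The shape arithmetic in the paragraph above can be checked with a few lines of plain Python. This is only a bookkeeping sketch; the function name upsample_shapes is made up and no TensorFlow is involved.

```python
def upsample_shapes(height=4, width=4, channels=1024, blocks=6):
    """List the (H, W, C) shape after each block that doubles H and W and halves C."""
    shapes = [(height, width, channels)]
    for _ in range(blocks):
        height, width, channels = height * 2, width * 2, channels // 2
        shapes.append((height, width, channels))
    return shapes


shapes = upsample_shapes()
for block_index, shape in enumerate(shapes):
    print(block_index, shape)
```

The last entry is the per-frame shape before rendering, and the entry after three blocks is the 32x32 level that appears again in the training setup.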

During training the setup is a bit different. The generated output video will be split up into 4 "sub-videos", each with a different resolution and number of frames. This will output a tuple of tensors: (tensor1, tensor2, tensor3, tensor4). The shapes of the tensors (after going through a rendering block to reduce the number of channels to 3) are tensor1=(16, 32, 32, 3), tensor2=(8, 64, 64, 3), tensor3=(4, 128, 128, 3), tensor4=(2, 256, 256, 3). As you can see, as you go from tensor1 to tensor4 the number of frames gets halved each time while the resolution doubles. The real video examples also get split up into 4 sub-video tensors of the same shapes. These sub-videos are what are fed into the discriminator. The functionality that halves the number of frames is called sub-sampling. The function starts at either the first or second frame (this is supposed to be random) and then selects every other frame. There is a sub-sample function in both the VideoDataset class (which takes the real videos and generates 4 sub-video tensors) and in the Generator class. The VideoDataset class outputs 4-D tensors (T, H, W, C), while the Generator class outputs 5-D tensors because it has a batch dimension N.
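
As a sanity check on the description above, here is a plain-Python sketch of the sub-sampling rule and the four expected sub-video shapes. It works on lists of frame indices instead of tensors, and the random offset is an assumption taken from the description ("this is supposed to be random"); the functions shown in this post use a fixed offset of 0.

```python
import random


def sub_sample(frames, frame=2, rng=None):
    """Keep every `frame`-th item, starting at a random offset if rng is given, else 0."""
    offset = rng.randrange(frame) if rng is not None else 0
    return frames[offset::frame]


def sub_video_shapes(t=16, size=256, levels=4):
    """Expected (T, H, W, C) shapes: frames halve while resolution doubles per level."""
    shapes = []
    for level in range(levels):
        scale = 2 ** (levels - 1 - level)  # spatial pooling factor at this level
        shapes.append((t // 2 ** level, size // scale, size // scale, 3))
    return shapes


frame_ids = list(range(16))
fixed = sub_sample(frame_ids)                          # offset 0
randomised = sub_sample(frame_ids, rng=random.Random(0))  # offset 0 or 1
expected = sub_video_shapes()
print(fixed, len(randomised), expected)
```

With an even number of frames both offsets give the same length, so a length check like the one in sub_sample would pass either way.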

This is the sub-sample function in the VideoDataset class:

    def sub_sample(self, x, frame=2):
        original_shape = x.shape  # Logging original shape
        offset = 0  
        begin = [offset, 0, 0, 0]  # start from index 'offset' in the frame dimension
        end = [original_shape[0], original_shape[1], original_shape[2], original_shape[3]]
        strides = [frame, 1, 1, 1]  # step 'frame' in the Frame dimension
        x = tf.strided_slice(x, begin, end, strides)
        expected_frames = (original_shape[0]) // frame
        #print(f"VD Expected frames after sub-sampling: {expected_frames}, Actual frames: {x.shape[0]}")
        if x.shape[0] != expected_frames:
            raise ValueError(f"Expected frames: {expected_frames}, but got {x.shape[0]}")
        return x

This is the sub-sample function in the Generator class:

    def sub_sample(self, x, frame=2):
        original_shape = x.shape  # Logging original shape
        offset = 0  # 
        begin = [0, offset, 0, 0, 0]  # start from index 'offset' in the second dimension
        end = [original_shape[0], original_shape[1], original_shape[2], original_shape[3], original_shape[4]]
        strides = [1, frame, 1, 1, 1]  # step 'frame' in the second dimension
        x = tf.strided_slice(x, begin, end, strides)
        expected_frames = (original_shape[1]) // frame
        #print(f"Gen Expected frames after sub-sampling: {expected_frames}, Actual frames: {x.shape[1]}")
        if x.shape[1] != expected_frames:
            raise ValueError(f"Expected frames: {expected_frames}, but got {x.shape[1]}")
        return x

You'll notice I am using tf.strided_slice(). I originally tried slicing/sub-sampling using the same notation you would do for slicing a numpy array: x = x[:,offset::frame,:,:,:]. I changed it because I thought maybe that was causing some sort of issue.

Below is a block diagram of the Generator and VideoDataset (labeled "Dataset" in the block diagram) functionalities.

https://preview.redd.it/2vh7yx2g09xc1.png?width=1862&format=png&auto=webp&s=143d5c4c8df91fc71b9da1d3858feaae28c4605a

A point of note about the block diagram, the outputs of Dataset are NOT combined with the outputs of the Generator, as might be mistakenly deduced based on the drawing. The discriminator outputs predictions on the Generator outputs and the Dataset outputs separately.

I don't think this issue is happening in the backward pass because I put in a bunch of print statements and based on those print statements the error does not occur in the middle of a gradient calculation or backward pass.

My Dataloader and VideoDataset class

Below is how I am actually fetching data from my VideoDataset class:

    #Create dataloader
    dataset = VideoDataset(directory)
    dataloader = tf.data.Dataset.from_generator(
        lambda: iter(dataset),  # Corrected to use iter() to clearly return an iterator from the dataset
        output_signature=(
            tf.TensorSpec(shape=(16, 32, 32, 3), dtype=tf.float32),
            tf.TensorSpec(shape=(8, 64, 64, 3), dtype=tf.float32),
            tf.TensorSpec(shape=(4, 128, 128, 3), dtype=tf.float32),
            tf.TensorSpec(shape=(2, 256, 256, 3), dtype=tf.float32)
        )
    ).batch(batch_size)

and here is my VideoDataset class:

class VideoDataset():
    def __init__(self, directory, fraction=0.2, sub_sample_rate=2):
        print("Initializing VD")
        self.directory = directory
        self.fraction = fraction
        self.sub_sample_rate = sub_sample_rate
        all_files = [os.path.join(self.directory, file) for file in os.listdir(self.directory)]

        valid_files = []
        for file in all_files:
            try:
                # Read the serialized tensor from file
                serialized_tensor = tf.io.read_file(file)
                # Deserialize the tensor
                tensor = tf.io.parse_tensor(serialized_tensor, out_type=tf.float32)  # Adjust dtype if necessary
                # Validate the shape of the tensor
                if tensor.shape == (16, 256, 256, 3):
                    valid_files.append(file)
            except Exception as e:
                print(f"Error loading file {file}: {e}")

        # Randomly select a fraction of the valid files
        selected_file_count = int(len(valid_files) * fraction)
        print(f"Selected {selected_file_count} files")
        self.files = random.sample(valid_files, selected_file_count)

    def sub_sample(self, x, frame=2):
        original_shape = x.shape  # Logging original shape
        offset = 0  
        begin = [offset, 0, 0, 0]  # start from index 'offset' in the frame dimension
        end = [original_shape[0], original_shape[1], original_shape[2], original_shape[3]]
        strides = [frame, 1, 1, 1]  # step 'frame' in the Frame dimension
        x = tf.strided_slice(x, begin, end, strides)
        expected_frames = (original_shape[0]) // frame
        #print(f"VD Expected frames after sub-sampling: {expected_frames}, Actual frames: {x.shape[0]}")
        if x.shape[0] != expected_frames:
            raise ValueError(f"Expected frames: {expected_frames}, but got {x.shape[0]}")
        return x
    
    def pooling(self, x, ksize):
        if ksize == 1:
            return x
        T, H, W, C = x.shape
        Hd = H // ksize
        Wd = W // ksize
        # Reshape the tensor to merge the spatial dimensions into the pooling blocks
        x_reshaped = tf.reshape(x, (T, Hd, ksize, Wd, ksize, C))
        # Take the mean across the dimensions 3 and 5, which are the spatial dimensions within each block
        pooled_x = tf.reduce_mean(x_reshaped, axis=[2, 4])
        return pooled_x
    
    def __len__(self):
        return len(self.files)
    
    def __getitem__(self, idx):
        #print("Calling VD getitem method")
        serialized_tensor = tf.io.read_file(self.files[idx])
        video_tensor = tf.io.parse_tensor(serialized_tensor, out_type=tf.float32)
        x1 = video_tensor
        x2 = self.sub_sample(x1)
        x3 = self.sub_sample(x2)
        x4 = self.sub_sample(x3)
        #print("\n")
        x1 = self.pooling(x1, 8)
        x2 = self.pooling(x2, 4)
        x3 = self.pooling(x3, 2)
        #print(f"Shapes of VD output = {x1.shape}, {x2.shape}, {x3.shape}, {x4.shape}")
        return (x1, x2, x3, x4)
    
    def __iter__(self):
        print(f"Calling VD iter method, len self = {len(self)}")
        #Make the dataset iterable, allowing it to be used directly with tf.data.Dataset.from_generator.
        for idx in range(len(self)):
            yield self[idx]

I think the issue occurs at some point while the dataloader is fetching examples from Videodataset, but I can't figure out what is causing it.

TLDR

I am using a RunPod VM with an NVIDIA A100 GPU. I am trying to train a GAN that outputs 2-second-long GIFs made up of 16 frames. One of the training steps involves splitting the output video (either real or fake) into 4 sub-videos of different frame lengths and resolutions. The reduction in frames is achieved by a sub-sample function (which you can find earlier in my post; it is bolded) that starts at the first or second frame of the video (chosen at random) and then selects every other frame, so it halves the number of frames. So I am essentially doing a strided slice on a tensor, using tf.strided_slice(). I tried regular slicing notation (like you would use in NumPy) and got the same error.

The weird thing is that the issue does NOT come up immediately in training and depends on batch size. With a batch size of 16, training goes through several batch iterations (and sometimes a few epochs) just fine. If I lower the batch size to 8, it is able to go through even more iterations, up to 5 epochs (I didn't test for longer), although the outputs are not what I would expect after some epochs (based on how this model ran in PyTorch and Chainer, I expect a specific type of noisy image, but instead I get a video that is mostly a black blob across most of the frame, with just a bit of color on the edges). If I go down to a batch size of 4, the issue mostly goes away. See below for the error I am seeing:

Error:

Expected begin and size arguments to be 1-D tensors of size 2, but got shapes [4] and [2] instead. [Op:StridedSlice]
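
For reference, the frame and resolution bookkeeping described above can be checked with plain Python, independent of TensorFlow. This is only a sketch under assumptions: the helper names are hypothetical, the 64x64 RGB resolution is made up for illustration, and the offset is fixed at 0 as in the posted sub_sample.

```python
def strided_length(total, offset, stride):
    """Number of indices visited by offset, offset + stride, ... below total."""
    return len(range(offset, total, stride))


def sub_video_shapes(shape, ksizes=(8, 4, 2, 1), frame=2, offset=0):
    """Shapes of the four sub-videos for an input of shape (T, H, W, C).

    Level i keeps the frames left after i rounds of halving and is
    mean-pooled spatially by ksizes[i], matching the order in __getitem__.
    """
    t, h, w, c = shape
    shapes = []
    for ksize in ksizes:
        shapes.append((t, h // ksize, w // ksize, c))
        # Halve the frame count for the next level.
        t = strided_length(t, offset, frame)
    return shapes


# Hypothetical input: 16 frames of 64x64 RGB (the resolution is an assumption).
for level_shape in sub_video_shapes((16, 64, 64, 3)):
    print(level_shape)
```

With a rank-4 input of 16 frames, the four levels keep 16, 8, 4 and 2 frames. The arithmetic only holds when every loaded tensor really has four dimensions, so comparing these shapes against what is read from disk may be a useful sanity check.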

9 Comments
2024/04/28
20:37 UTC

1

Help me structure my plan.

Hi, I'm a CSE undergrad taking part in my first hackathon, and I want to build a webapp that people can use for their mental health. My idea is to create a webapp and integrate some ML models to predict users' mental health, but I can't structure my thoughts, so please help me. I think I can describe it as "a webapp that can be used by people with poor mental health or people who are unaware of their mental health", but I'm not sure which models I should build to achieve this. I know how to make a song recommender system, a movie recommender system, and an emotion-based song recommender, but I still don't know how to use these, or which other model would be a good fit for my webapp.

PS: Can you also tell me how to improve my webapp?

1 Comment
2024/04/28
20:28 UTC

1

Getting into ML with an uncommon major?

I recently got admitted into a pretty good engineering school for undergrad, but I have only recently begun learning about ML/AI engineering. I was thinking of majoring in Computational and Systems Neuroscience, since my primary fascination is with the brain and the overall course plan combines courses in math, engineering, physics, neuroscience, etc.

I’ve been told it’d be a good option because of how I’d learn about neural networks & deep learning, computational modeling, CS (etc.) & there’d be some relevant math but I think I’d probably have to take additional math classes as electives. I don’t feel like it’d be necessary since there’s also a bunch of courses online with which I can learn that math & coding knowledge if needed. If anything I’d hope to take electives that are closer to ML/Neuroscience. I’ve heard people talk more about how a CS degree is more about understanding the math behind the processes and not much of coding. (But again idk) I also asked models like ChatGPT and Google Gemini just to see another perspective but I’m sure the information isn’t 100% reliable. Either way, the neuroscience and being able to create such language models is what I find interesting right now.

There’s also talk about how it’s a little risky, as it’s a rapidly changing field in which you’re always having to learn, which I don’t see as a problem; I’m sure that keeps things interesting! I also once saw someone say that they’d rather have someone who enjoyed the job to the point where it didn’t feel like work anymore over someone who hated every day of working. It’s not a matter of not wanting to put in the work for an Engineering/CS degree but more about being able to stay interested in what I’m learning, if that makes sense. (Like having that link between something I enjoy learning and something I’m motivated to learn about?) It’s VERY important to me that I don’t cut corners, so I’d like to hear some opinions on pursuing a career in MLE with this major.

Sorry if this is all over the place, I currently know nothing about ML & AI but find it really interesting.

3 Comments
2024/04/28
15:37 UTC

1

Machine Learning Interview

Hello everyone. Are AI/ML positions abundant nowadays? If not, what are those of you who planned to become AI/ML engineers doing instead? If yes, how was your interview? Were you asked DSA/system design questions or just AI/ML-related topics? There are not many AI/ML jobs available here, so I'm just upskilling and hoping for the best. Keen to hear from you all.

1 Comment
2024/04/28
14:50 UTC

1

Is ordinal classification appropriate for my data?

I have a number of unstructured text documents, each of which is a list of responsibilities. I plan to use some embedding to vectorise them.

My target variables (I have multiple) represent aspects that each document should be graded on, such as complexity, agency, and skills required. They are numeric, but the difference between them is not meaningful. For example, for "agency", the values might be interpreted as follows:

  1. Solely following fixed instructions that never change, with no room for judgement or interpretation
  2. Solely following fixed instructions that never change, with some room for judgement or interpretation
  3. Solely following fixed instructions that may occasionally change, with some room for judgement or interpretation

Each value is in some sense "greater" than the one before, which leads me to believe that ordinal classification is appropriate here.

Is this a good approach? What else should I consider? I was thinking of using gradient-boosted trees (like XGBoost) - should I look at deep learning instead?
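
One common way to make a standard binary classifier respect the ordering is to recast level k as a set of cumulative "is the grade greater than j?" targets, training one binary model per question. The sketch below only illustrates that encoding and its decoding; the function names are hypothetical, and three levels are assumed to match the example list above.

```python
def encode_ordinal(label, num_levels):
    """Turn a 1-based ordinal label into num_levels - 1 binary targets.

    Target j answers "is the label greater than j?" for j = 1 .. num_levels - 1.
    """
    return [1 if label > j else 0 for j in range(1, num_levels)]


def decode_ordinal(probabilities, threshold=0.5):
    """Map per-question probabilities back to a 1-based ordinal label.

    Counts how many "greater than" questions are answered yes, then adds 1.
    """
    return 1 + sum(1 for p in probabilities if p > threshold)


# Hypothetical three-level "agency" scale.
for level in (1, 2, 3):
    print(level, encode_ordinal(level, num_levels=3))

# Hypothetical classifier outputs for the two "greater than" questions.
print(decode_ordinal([0.9, 0.2]))
```

Each of the binary targets could then be fit with any classifier, including gradient-boosted trees, which is why this framing is often tried before reaching for a dedicated ordinal model.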

I'm primarily an SWE with some ML experience, but I haven't touched it for a few years.

0 Comments
2024/04/28
14:20 UTC

3

Recommend me some courses for LLM

I recently tried to make a chatbot, and it was really frustrating: ChatGPT couldn't answer LangChain questions (idk why, maybe the training cutoff date), and the docs are not well arranged. Even if I do somehow get the code to work, it doesn't perform very well because I don't know much in the first place. I have a theoretical understanding of ML, but idk what the different kinds of chains, retrievers, and agents are. It all feels like a lot of things scattered all over the place.

So, can someone please recommend a course on LangChain that consolidates all the different techniques (chains, agents, vector DBs, etc.) and goes a bit in depth on everything, like how a given chain works or the different methods of querying the vector DB? Also feel free to recommend courses on frameworks other than LangChain; it's just that LangChain is the only LLM framework I know.

2 Comments
2024/04/28
12:36 UTC

1

Using ML to help me run my business

I'm curious about using ML to help me run my business. I have no idea where to start but want to start somewhere. Can I, for example, have it look at all the email conversations I've had and pinpoint the top questions I get from clients? Another example: my wife likes writing a lot (for her business), so can we feed everything she's written to it and build some kind of AI from her knowledge? What about using ML to actually run the business? Is that possible yet?

If yes, is there some kind of platform I need to use? I'm guessing it would be cloud-based, but I'd prefer offline. What courses would you recommend to get started (YouTube, Udemy, Coursera, etc.)? Thanks!

10 Comments
2024/04/28
12:27 UTC

1

Newbie: string classification question

First, apologies for the potentially poorly worded or plain stupid question. I have many years of experience in programming, but I've only just started playing with ML in Python.

I thought the best way to learn some core ML concepts is by trying to solve a real problem. I want to classify strings into "real" words / usernames vs. fake / random / keyboard bashes.

I created CSV training data like this:

real.username,1
hgasdhgg,0
gaspyy,1
assdasasd,0

... and so on

Next, I'm tokenizing by character, like this:

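from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
# Map each character to an integer id, then pad every sequence to length 20
tokenizer = Tokenizer(char_level=True)
tokenizer.fit_on_texts(strings)
sequences = tokenizer.texts_to_sequences(strings)
padded = pad_sequences(sequences, maxlen=20, padding='post')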
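
To make it concrete what the model receives after this step, here is a rough pure-Python approximation of character-level tokenizing followed by post-padding. It is a sketch, not the Keras implementation: the helper names are hypothetical, ids are assigned in first-seen order rather than by frequency, and over-long strings are cut at the end, whereas pad_sequences by default truncates from the front.

```python
def fit_char_index(strings):
    """Assign each distinct lowercase character an integer id, starting at 1."""
    index = {}
    for s in strings:
        for ch in s.lower():
            if ch not in index:
                index[ch] = len(index) + 1  # 0 is reserved for padding
    return index


def encode(s, index, maxlen=20):
    """Convert a string to a fixed-length list of ids, zero-padded at the end."""
    ids = [index[ch] for ch in s.lower() if ch in index][:maxlen]
    return ids + [0] * (maxlen - len(ids))


# The four example rows from the CSV above.
strings = ["real.username", "hgasdhgg", "gaspyy", "assdasasd"]
index = fit_char_index(strings)
for s in strings:
    print(s, encode(s, index))
```

Each string becomes a row of 20 arbitrary integer ids, so the numeric distance between two ids carries no meaning about the characters they stand for.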

I'm generating the training/testing data with train_test_split and then I'm training it like this:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([
    Dense(128, activation='relu', input_shape=(20,)),
    Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=50, batch_size=50, validation_data=(X_val, y_val))

The problem is the results are pretty poor. model.evaluate gives a 70% accuracy and when I use model.predict on different entries, the results are all over the place.

My training data has about 1000 entries, evenly split. Is the dataset too small or is there a fundamental error I'm making?

2 Comments
2024/04/27
12:19 UTC

5

Do people still use classical Boltzmann Machines?

I have been learning about energy-based models and taking a look at the code implementation to get a sense of how they work. It appears that resources are limited regarding classical Boltzmann Machines (not RBM and DBM). Are there any good resources to learn more about the technical implementation of Boltzmann Machines?

3 Comments
2024/04/27
02:45 UTC

4

Is Machine learning an engineering career?

I am a mechanical engineer with a huge interest in ML, and I am building my foundation in data science. The course I'm taking says that engineers solve complex problems with ML using ANNs. I am not in the field myself; most of my coursework is in mechanical engineering. So, experts in the group: do these jobs exist, and how common are they?

4 Comments
2024/04/26
20:52 UTC
