/r/MLQuestions

A place for beginners to ask stupid questions and for experts to help them! /r/MachineLearning is a great subreddit, but it is for interesting articles and news related to machine learning. Here, you can feel free to ask any question about machine learning.

What kinds of questions do we want here?

"I've just started with deep nets. What are their strengths and weaknesses?" "What is the current state of the art in speech recognition?" "My data looks like X,Y what type of model should I use?"

If you are well versed in machine learning, please answer any question you feel knowledgeable about, even if it already has answers, and thank you!


Related Subreddits:

/r/MachineLearning
/r/mlpapers
/r/learnmachinelearning

/r/MLQuestions

56,605 Subscribers

2

Why do DDPMs implement a different sinusoidal positional encoding from transformers?

Hi,

I'm trying to implement a sinusoidal positional encoding for DDPM. I found two solutions that compute different embeddings for the same position/timestep with the same embedding dimensions. I am wondering if one of them is wrong or if both are correct. DDPM's official source code does not use the original sinusoidal positional encoding from the transformer paper... why?

1) Original sinusoidal positional encoding from "Attention is all you need" paper.

Original sinusoidal positional encoding

https://preview.redd.it/569mjvbnqyvd1.png?width=706&format=png&auto=webp&s=c4ef0141668b5c80e835a5bef2631c35a4b57fba

2) Sinusoidal positional encoding used in the official code of DDPM paper

Sinusoidal positional encoding used in official DDPM code. Based on tensor2tensor.

https://preview.redd.it/m6juj22iqyvd1.png?width=702&format=png&auto=webp&s=5e7f6da6d3a281895366d197a13d81179e7e0c22

Why does the official code for DDPMs use a different encoding (option 2) than the original sinusoidal positional encoding used in the transformer paper? Is the second option better for DDPMs?

I noticed the sinusoidal positional encoding used in the official DDPM code was borrowed from tensor2tensor. The difference between the implementations was even highlighted in one of the PRs to the official tensor2tensor repository. Why did the authors of DDPM use this implementation (option 2) rather than the original from transformers (option 1)?
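
For reference, here is a minimal sketch of the two variants (my own reimplementation with a shared frequency schedule; the function names are mine, and the official codebases differ slightly in how they compute the frequencies):

import math
import torch

def pe_interleaved(timesteps, dim):
    # Transformer-style: sin and cos interleaved along the embedding axis
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half) / half)
    args = timesteps[:, None].float() * freqs[None, :]      # (N, dim/2)
    emb = torch.zeros(len(timesteps), dim)
    emb[:, 0::2] = torch.sin(args)                          # even indices
    emb[:, 1::2] = torch.cos(args)                          # odd indices
    return emb

def pe_concatenated(timesteps, dim):
    # tensor2tensor/DDPM-style: all sines first, then all cosines
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half) / half)
    args = timesteps[:, None].float() * freqs[None, :]
    return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)

t = torch.arange(4)
print(torch.allclose(pe_interleaved(t, 8).sort(-1).values,
                     pe_concatenated(t, 8).sort(-1).values))  # True

With a shared frequency schedule, the two encodings contain exactly the same values for each timestep, just permuted along the embedding axis. Since both architectures immediately pass the embedding through learned layers, a fixed permutation of the dimensions is typically absorbed by the first weight matrix.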

ps: If you want to check the code it's here https://stackoverflow.com/questions/79103455/should-i-interleave-sin-and-cosine-in-sinusoidal-positional-encoding

0 Comments
2024/10/20
19:55 UTC

2

A generalisation of trees that replaces each split with a one-knot cubic spline fit. Has anyone tried this? Does this approach have a name? It seems like a pretty obvious idea to me, but AI says no one's tried it and a cursory Google search didn't return any results

You know how tree-based algorithms just do a split? If you think about algorithms like XGBoost, every time you split you are just creating another step in a step function. Step functions have discontinuities and so are not differentiable, which makes them a bit harder to optimise.

So I have been thinking: how can I make a tree-based algorithm differentiable? Then I thought, why not replace the step function with a differentiable one? One idea is a cubic spline with only one knot. As we know, at the ends of a cubic spline the value just flatlines - this is just like a step function. Also, a cubic spline can smooth the transition between the left and right split.

So here's my rough sketch of an XGBoost-like algorithm to build ONE TREE

  1. For each feature, fit a one-knot cubic spline to the pseudo-residuals, where the end values are parameters too (see the sketch after this list).
  2. "Split" the node using the best feature, with the knot's location as the split point.
  3. Repeat 1-2 for the samples before the knot and for the samples after the knot.
  4. Optimise all parameters at once instead of fixing them, so earlier splits can be refined as the algorithm goes along.
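
Here is a minimal sketch of step 1 (my own toy construction: the smoothstep-style cubic parameterisation and all names are assumptions, not an established algorithm). A one-knot cubic "soft split" whose knot location and end values are fitted to pseudo-residuals by gradient descent, which is exactly what makes the whole structure differentiable:

import torch

def soft_split(x, knot, width, left_val, right_val):
    # Cubic easing that flatlines outside [knot - width, knot + width]:
    # behaves like a step far from the knot, differentiable everywhere
    t = torch.clamp((x - knot) / (2 * width.abs().clamp_min(1e-6)) + 0.5, 0.0, 1.0)
    s = 3 * t**2 - 2 * t**3
    return left_val + (right_val - left_val) * s

# Toy pseudo-residuals: a noisy step at x = 0.3
x = torch.rand(512)
r = torch.where(x < 0.3, torch.tensor(-1.0), torch.tensor(1.0)) + 0.1 * torch.randn(512)

# Knot location, transition width, and both end values are all learnable
params = {name: torch.tensor(val, requires_grad=True)
          for name, val in [("knot", 0.5), ("width", 0.2),
                            ("left", 0.0), ("right", 0.0)]}
opt = torch.optim.Adam(params.values(), lr=0.05)
for _ in range(300):
    opt.zero_grad()
    pred = soft_split(x, params["knot"], params["width"],
                      params["left"], params["right"])
    loss = ((pred - r) ** 2).mean()
    loss.backward()
    opt.step()

print({k: round(v.item(), 3) for k, v in params.items()})  # expect knot near 0.3

Because every split is a smooth function of its parameters, the same optimiser can keep refining knots higher up the tree after deeper splits are added.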

This algorithm is novel in that it keeps growing the tree from a simple model, unlike a neural network where the architecture is fixed at the beginning. With this structure, it organically grows (of course you need a stopping criterion of some kind).

Also, because the whole "tree" is differentiable, one can keep optimising parameters further up the tree at any step. This helps alleviate the greediness of algorithms like XGBoost, where once you've chosen a split point, that split point is there permanently; in my cubic spline approach the whole tree's parameters can still be optimised (although it will be a pain to use so many indicator functions).

Also, by making the whole tree differentiable, one can apply lots of techniques from neural networks, like using the RAdam optimiser or sending batches of data through the structure, etc.

3 Comments
2024/10/20
11:56 UTC

4

If I add a randomly generated feature to a tabular dataframe, call XGBoost on it, and stop the growth of a node whenever that feature is selected, using that as my stop-growth criterion - is this a known approach?

I would find it hard to believe that this is a new approach I came up with, but it occurred to me that it's a pretty cute way to say "well, even a random feature is doing better than everything else, so stop growing this node any further".
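
A minimal sketch of the idea (my own illustration: XGBoost's public API doesn't expose a per-node "stop if this feature is chosen" hook, so this version uses the random probe post hoc, comparing learned importances against it rather than halting growth mid-tree):

import numpy as np
import pandas as pd
import xgboost as xgb

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.standard_normal((1000, 5)),
                 columns=[f"f{i}" for i in range(5)])
y = (X["f0"] + 0.5 * X["f1"] > 0).astype(int)

# The random probe: pure noise, uncorrelated with the target
X["random_probe"] = rng.standard_normal(len(X))

model = xgb.XGBClassifier(n_estimators=50).fit(X, y)
imp = pd.Series(model.feature_importances_, index=X.columns)
print(imp.sort_values(ascending=False))
# Any feature scoring below random_probe is "doing worse than noise"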

Is this a well-known idea, and does it have a name?

AI (Gemini, specifically) tells me that it's a good idea and that it's not aware of a name for it.

What do you think? Do you think it's a good idea or a bad one?

2 Comments
2024/10/20
11:41 UTC

1

Fine-tuning for segmenting LEGO pieces from video?

Right now I'm looking for a baseline solution, starting with video or images of spread-out LEGO pieces.

Any suggestions on a base model and the best way to fine-tune?

0 Comments
2024/10/20
09:33 UTC

8

After making dozens of projects, publishing 2 papers, and doing 3 internships in machine learning, I want to fulfill my childhood dream of sharing my knowledge with the community through YouTube. Can you suggest what you might want to watch?

I was told this is the right place for this question, so I'm posting here. After gaining my own perspective on ML and working with industry leaders, I feel that I am now ready to make in-depth YouTube videos telling the overall new story of the same old classical ML, then taking the journey from there to learning by doing projects and comparing different approaches, resulting overall in a community of learners. Teaching is my passion, and giving back to the community is what I have always learned from. While doing my research on what the competition looks like and how I can thrive as a helping_buddy, I feel I might require a lot of video-editing skill, or maybe knowledge of memes, as they are quite popular in teaching videos. As a reader who has read this far, can you tell me what content you usually watch for ML?

15 Comments
2024/10/20
08:37 UTC

1

Ensemble Modeling for Predicting Dengue Cases Based on Climate Factors, Population, and Demographics

Hi! I have an idea of using stacking ensemble learning for predicting dengue cases. My dataset contains dates (temporal) and geospatial data (geography of barangays). I am also going to use climate factors, demographics like population and age group, and historic dengue cases. For this ensemble I want to use an LSTM first, since my data is sequential. My initial base models are LSTM, random forest, and SARIMA, with XGBoost as my meta-model. My problem is: are the models I initially chose a good combination, and if not, what other models should I incorporate? I really need help.
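
A minimal stacking sketch (my own toy setup: synthetic features, and sklearn regressors standing in for the LSTM/SARIMA base learners, whose out-of-fold forecasts would be stacked in exactly the same way):

import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import Ridge
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 6))            # climate + demographic features
y = 2 * X[:, 0] + rng.standard_normal(500)   # stand-in for weekly case counts

stack = StackingRegressor(
    estimators=[("rf", RandomForestRegressor(n_estimators=100)),
                ("ridge", Ridge())],
    final_estimator=XGBRegressor(n_estimators=200),
    cv=5,  # NB: plain K-fold; see the caveat below for temporal data
)
stack.fit(X, y)
print(stack.predict(X[:5]))

One caveat: sklearn's built-in stacking builds the meta-model's training set with ordinary K-fold predictions, which leaks future information when the rows are ordered in time. With dates involved, you may want to generate the base learners' out-of-fold forecasts manually with a rolling-origin scheme.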

3 Comments
2024/10/20
08:11 UTC

1

Weird loss issue with different validation/training split sizes?

Hello, I've been trying to build a transformer for predicting certain values from sequences of time series data.

The input features are a sequence of time series data, but divided into "time windows" of a certain sequence length. So 1 input into the network would be like 8 or so features, but ~168 rows of those features in a time series sequence.

The output is just a couple scalar values.

It is set up in pytorch. My question isn't so much about transformers themselves or programming or machine learning architecture, but about a specific phenomenon/problem I keep noticing with the way I organize the data.

The code starts by splitting the data into training, validation, and test sets. Because it's time series data, I can't just shuffle all points and sample, as that would leak parts of windows into other sets. I have to first split the data into 3 contiguous segments for training, validation, and testing. After that, it creates the windows isolated within their segments, then shuffles the windows.
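
For concreteness, a minimal sketch of that procedure (sizes and names are my assumptions, mirroring the 60/15/15 split described below):

import numpy as np

def make_windows(segment, seq_len=168):
    # One window per starting row, entirely inside the segment
    return np.stack([segment[i:i + seq_len]
                     for i in range(len(segment) - seq_len + 1)])

data = np.random.randn(2000, 8)                 # time steps x features
n = len(data)
train_seg = data[:int(0.60 * n)]                # contiguous segments first,
val_seg = data[int(0.60 * n):int(0.75 * n)]     # so no window straddles sets
test_seg = data[int(0.75 * n):int(0.90 * n)]

train_w = make_windows(train_seg)
rng = np.random.default_rng(0)
rng.shuffle(train_w)                            # shuffle windows, not raw rows
print(train_w.shape)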

During training, I've noticed that the validation loss is always lower than the training loss on epoch 1. Now, I know this can be normal, especially when reporting training loss during an epoch and validation loss at the end of it, since the validation loss is effectively measured half an epoch "later" in training, but this is different.

If I run the code at a learning rate of like 0.00000001 (so that training won't influence the comparison), the validation loss will be like half the training loss (for example, validation at 0.4 and training at 0.7 or so). If I run it 100 times, the validation loss will ALWAYS be significantly lower than the training loss, which seems like an impossible coincidence, especially given that I took training out of the equation.

All of the above happens when I have the data split 60% training, 15% validation, and 15% test. If I change the split to 40% training and 40% validation, the losses instantly start at around the same value. Every time.

Now, this would be fine - I could just make the splits even - however, the fact that this happens makes me think the data splitting or size is somehow influencing the way my code treats the training and validation sets.

I've tried everything to make training and validation behave exactly the same, to isolate the issue. I've compared the model's forward behavior in train and eval mode, and it gives the same output for the same inputs, so that's not it. I've made sure the batch size is identical for both training and evaluating; if the set is split differently, only the number of batches differs, and I make sure the split sizes are divisible by the batch size.

It's just hard for me to move on and develop other parts of the code when I feel like this problem will make all of that not work properly, so it doesn't seem like any work I do on it matters unless I figure this out. Does anyone know what can cause this?

I'm generally new to ML. I understand machine learning algorithms and architecture to an intermediate degree. I have intermediate proficiency in Python; however, I'm not good enough to implement the entire codebase myself, so I use Claude for assistance, but I understand what each part of the code does conceptually (I just can't write it all myself).

0 Comments
2024/10/20
07:33 UTC

1

What's the status of applications with external complexity versus internal complexity nowadays for artificial neural networks?

I've been learning about ANNs. It seems to me that there's a significant difference between the behaviour of external complexity and internal complexity for them. I couldn't find any online resources summarising how these differences are currently being exploited for either academic or commercial purposes. Please help me understand this topic.

0 Comments
2024/10/20
05:42 UTC

1

Vision Transformer Models not generalizing well on independent validation dataset

Hi everyone,

I am training a Wave Vision Transformer model. The code for WaveViT is available at the link below.

https://github.com/YehLi/ImageNetModel/blob/main/classification/wavevit.py

I did not change the code in the wavevit.py and torch_wavelets.py files. The only change I made is in the pipeline for feeding data to the model. My original dataset contains around 38,000 MRI images of size 256x256 in RGB format. I augmented this dataset by rotating each image by 90, 180, and 270 degrees from its original angle and saving those images, so each image has 3 rotated copies. Hence, my dataset increased to about 156,000 images of the same size and format.

I further saved those images with labels in numpy.memmap format as uint8, as my code was giving me an OOM error when I tried to load them all into a numpy array at once.

I load my memmap train and test images with labels like this:

def load_memmap_data(train_memmap_file, train_label_memmap_file,
                     test_memmap_file, test_label_memmap_file,
                     num_train_images, num_test_images):
    train_images = np.memmap(train_memmap_file, dtype='uint8', mode='r',
                             shape=(num_train_images, 256, 256, 3))
    train_labels = np.memmap(train_label_memmap_file, dtype='int32', mode='r',
                             shape=(num_train_images,))
    test_images = np.memmap(test_memmap_file, dtype='uint8', mode='r',
                            shape=(num_test_images, 256, 256, 3))
    test_labels = np.memmap(test_label_memmap_file, dtype='int32', mode='r',
                            shape=(num_test_images,))
    return train_images, train_labels, test_images, test_labels


# Create memory-mapped files for train/test datasets
train_memmap_file = 'train_images.dat'
train_label_memmap_file = 'train_labels.dat'
test_memmap_file = 'test_images.dat'
test_label_memmap_file = 'test_labels.dat'


train_images, train_labels, test_images, test_labels = load_memmap_data( 
    train_memmap_file=train_memmap_file, 
    train_label_memmap_file=train_label_memmap_file, 
    test_memmap_file=test_memmap_file, 
    test_label_memmap_file=test_label_memmap_file,
    num_train_images=num_train_images,
    num_test_images=num_test_images
    )

My optimizer and the call to the train function in the Trainer class look like this.

model = WaveViT()
model = nn.DataParallel(model)

optimizer = optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-4)
loss_fn = nn.CrossEntropyLoss()
trainer = Trainer(model, optimizer, loss_fn, exp_name="waveViT-256-aug", device=device)
trainer.train(train_images, train_labels, test_images, test_labels, epochs=100, config=None, steps_per_epoch=steps_per_epoch, augment=False)

My Trainer class looks like this. It takes the images and labels, applies the augmentation chosen for the epoch, and trains and evaluates the model.

class Trainer:
    def __init__(self, model, optimizer, loss_fn, exp_name, device):
        self.model = model.to(device)
        self.optimizer = optimizer
        self.loss_fn = loss_fn
        self.exp_name = exp_name
        self.device = device

    def train(self, train_images, train_labels, test_images, test_labels,
              epochs, config=None, steps_per_epoch=0, augment=False):
        (train_losses, test_losses, test_accuracies, train_accuracies,
         test_precision, train_precision, test_recall, train_recall,
         test_f1, train_f1) = [], [], [], [], [], [], [], [], [], []
        best_test_loss = float('inf')  # Initialize with a large value
        best_accuracy = 0.0  # Initialize with the worst possible accuracy
        scaler = GradScaler()
        # Early stopping variables
        best_epoch = 0
        epochs_no_improvement = 0  # Counter for epochs without improvement
        # Augmentation pipelines
        transform_1 = transforms.Compose([
            # Rotate by up to +/-40 degrees with 15-degree shear
            transforms.RandomAffine(degrees=(-40, 40), shear=15),
            transforms.RandomVerticalFlip(p=0.5),
            transforms.ToTensor(),
        ])
        
        transform_2 = transforms.Compose([
            transforms.RandomResizedCrop(size=224, scale=(0.95, 1.0)),
            transforms.RandomVerticalFlip(p=0.5),
            transforms.RandomAffine(degrees=0, translate=(0.15, 0.15)),
            transforms.RandomApply([transforms.ElasticTransform(alpha=30.0)], p=0.3),
            transforms.ToTensor(),
        ])

        transform_3 = transforms.Compose([transforms.ToTensor()])
        for i in range(epochs):
            print("\nTraining epoch\n")
            print("Preparing data loaders...")
            # Check the combined condition first; otherwise the original
            # `elif i % 2 == 0 and i % 7 == 0` branch is unreachable
            if i % 2 == 0 and i % 7 == 0:
                transform = transform_3  # occasionally, no augmentation
            elif i % 2 == 0:
                transform = transform_1
            else:
                transform = transform_2
            trainloader, testloader = prepare_data(
                batch_size=64,
                x_train=train_images,
                y_train=train_labels,
                x_test=test_images,
                y_test=test_labels,
                transform=transform,
            )
           
            accuracy_train,train_loss,precision_train,recall_train,f1_train = self.train_epoch(trainloader,steps_per_epoch,augment,scaler)
            
            accuracy_test, test_loss,precision_test,recall_test,f1_test = self.evaluate(testloader)
            print("\nEvaluation Completed\n")
            train_losses.append(train_loss)
            test_losses.append(test_loss)
            test_accuracies.append(accuracy_test)
            train_accuracies.append(accuracy_train)
            test_precision.append(precision_test)
            train_precision.append(precision_train)
            test_recall.append(recall_test)
            train_recall.append(recall_train)
            test_f1.append(f1_test)
            train_f1.append(f1_train)
            is_best_loss = test_loss < best_test_loss
            is_best_accuracy = accuracy_test > best_accuracy
            if is_best_loss:
                best_test_loss = test_loss
                best_epoch = i + 1
                epochs_no_improvement = 0  # Reset counter
                save_checkpoint(self.exp_name + "-Best-Test-Loss", self.model, best_epoch)
            else:
                epochs_no_improvement += 1
                  
            if is_best_accuracy:
                best_accuracy = accuracy_test  # Update best accuracy
                save_checkpoint(self.exp_name + "-Best-Test-Accuracy", self.model, i + 1)
            
            if epochs_no_improvement >= 10:
                print(f"Early stopping triggered after {i + 1} epochs without improvement.")
                break  # Stop training if no improvement
   
        save_experiment(self.exp_name, config, self.model, train_losses, test_losses, test_accuracies,train_accuracies,test_precision,train_precision,test_recall,train_recall,test_f1,train_f1)
        plot_metrics(train_losses, test_losses, train_accuracies, test_accuracies,
                 train_precision, test_precision, train_recall, test_recall,
                 train_f1, test_f1, self.exp_name)

            
    def train_epoch(self, trainloader, steps_per_epoch, augment,scaler):
        self.model.train()
        total_loss = 0
        trainloader_iter = itertools.cycle(trainloader)
        correct = 0
        y_true = []
        y_pred = []      # To store all true labels
    # Wrap the range with tqdm for the progress bar
        with tqdm(total=steps_per_epoch, desc='Training', unit='step') as pbar:
            for i in range(steps_per_epoch):
                batch = next(trainloader_iter)
                images, labels = batch
                # Move to device once and cast images to float32
                images = images.to(self.device, dtype=torch.float32)
                labels = labels.to(self.device)
                with autocast():
                    result = self.model(images)
                    loss = self.loss_fn(result, labels)
                self.optimizer.zero_grad()
                
                scaler.scale(loss).backward()
            # Update the model's parameters
                scaler.step(self.optimizer)
                scaler.update()
                
                total_loss += loss.item() * len(images)
                predictions = torch.argmax(result, dim=1)
                y_pred.extend(predictions.cpu().numpy())
                y_true.extend(labels.cpu().numpy())
                correct += torch.sum(predictions == labels).item()
                # Update the progress bar in 25% increments
                if (i + 1) % (steps_per_epoch // 4) == 0:
                    pbar.update(steps_per_epoch // 4)  # advance a full quarter, not 1 step
            
        # Convert lists to tensors for calculation
        y_true_tensor = torch.tensor(y_true)
        y_pred_tensor = torch.tensor(y_pred)

        # Calculating precision, recall, and F1 score using PyTorch
        TP = ((y_pred_tensor == 1) & (y_true_tensor == 1)).sum().item()
        FP = ((y_pred_tensor == 1) & (y_true_tensor == 0)).sum().item()
        FN = ((y_pred_tensor == 0) & (y_true_tensor == 1)).sum().item()

        precision = TP / (TP + FP) if TP + FP > 0 else 0
        recall = TP / (TP + FN) if TP + FN > 0 else 0
        f1 = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0    
        # NB: assumes steps_per_epoch * batch_size == len(trainloader.dataset)
        avg_loss = total_loss / len(trainloader.dataset)
        accuracy = correct / len(trainloader.dataset)  # fraction in [0, 1], not a percentage

        return accuracy, avg_loss, precision, recall, f1

    @torch.no_grad()
    def evaluate(self, testloader):
        self.model.eval()
        total_loss = 0
        correct = 0
        y_true = []
        y_pred = []
        with torch.no_grad():
            for batch in testloader:
                # Move the batch to the device
                batch = [t.to(self.device) for t in batch]
                images, labels = batch
                images = images.to(torch.float32)
                with autocast():
                    result = self.model(images)
                    loss = self.loss_fn(result, labels)
                
                total_loss += loss.item() * len(images)
                predictions = torch.argmax(result, dim=1)
                y_pred.extend(predictions.cpu().numpy())
                y_true.extend(labels.cpu().numpy())
                correct += torch.sum(predictions == labels).item()
        # Convert lists to tensors for calculation
        y_true_tensor = torch.tensor(y_true)
        y_pred_tensor = torch.tensor(y_pred)

        # Calculating precision, recall, and F1 score using PyTorch
        TP = ((y_pred_tensor == 1) & (y_true_tensor == 1)).sum().item()
        FP = ((y_pred_tensor == 1) & (y_true_tensor == 0)).sum().item()
        FN = ((y_pred_tensor == 0) & (y_true_tensor == 1)).sum().item()

        precision = TP / (TP + FP) if TP + FP > 0 else 0
        recall = TP / (TP + FN) if TP + FN > 0 else 0
        f1 = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0
        accuracy = correct / len(testloader.dataset)
        avg_loss = total_loss / len(testloader.dataset)
        return accuracy, avg_loss, precision, recall, f1

The model does very well while training and testing on splits of the same dataset. Here are metrics from the run:

Epoch: 1 
Training Metrics: Accuracy: 0.7357, Loss: 0.5236, Precision: 0.6928, Recall: 0.8479, F1 Score: 0.7625 
Testing Metrics: Accuracy: 0.7672, Loss: 0.4838, Precision: 0.7271, Recall: 0.8556, F1 Score: 0.7861
....
Epoch: 4 
Training Metrics: Accuracy: 0.8031, Loss: 0.4078, Precision: 0.7644, Recall: 0.8772, F1 Score: 0.8169 
Testing Metrics: Accuracy: 0.7494, Loss: 0.4712, Precision: 0.8186, Recall: 0.6408, F1 Score: 0.7189
...
Epoch: 8 
Training Metrics: Accuracy: 0.8529, Loss: 0.3148, Precision: 0.8324, Recall: 0.8845, F1 Score: 0.8577 
Testing Metrics: Accuracy: 0.8280, Loss: 0.4027, Precision: 0.8015, Recall: 0.8720, F1 Score: 0.8352
...
Epoch: 18 
Training Metrics: Accuracy: 0.9284, Loss: 0.1706, Precision: 0.9237, Recall: 0.9346, F1 Score: 0.9292 
Testing Metrics: Accuracy: 0.8008, Loss: 0.5767, Precision: 0.8357, Recall: 0.7488, F1 Score: 0.7899

The model did not give me better accuracy or loss after that, so I saved the model at epoch 8; its test accuracy stays around 79-80%.

When validated on an independent dataset, this model performed poorly:

Validation Metrics:
 - Accuracy: 0.4890
 - Precision: 0.4878
 - Recall: 0.4416
 - F1 Score: 0.4636
 - Confusion Matrix:
[[1341 1159]
 [1396 1104]]

I have also run validation on the same dataset used for training, and the accuracy stays the same (even though I gave it the exact images used in training). I have also tried ImageNet pretrained weights for this WaveViT (the original WaveViT was trained on ImageNet, and the saved model is on GitHub), but the result is the same.
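
One sanity check worth running (my suggestion, not from the original post): confirm that the standalone validation pipeline produces tensors in the same value range and dtype the model saw in training. The training loop above uses transforms.ToTensor(), which rescales uint8 images to [0, 1], while a validation script that feeds the raw memmap arrays would pass 0-255 values:

import numpy as np
import torch
from torchvision import transforms

img_uint8 = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)

scaled = transforms.ToTensor()(img_uint8)                    # what training saw
raw = torch.from_numpy(img_uint8).permute(2, 0, 1).float()   # a naive loader

print(scaled.max().item())   # ~1.0
print(raw.max().item())      # ~255.0 - a 255x input scale mismatch

A mismatch like this typically produces exactly the symptom described: chance-level accuracy even on the training images.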

It would be a great help if someone could help me resolve this behavior of the model.

Why doesn't the validation accuracy improve even on the same dataset used for training and testing?

I hope I have explained everything. Please let me know if you need more clarifications.

1 Comment
2024/10/20
05:38 UTC

1

How can my loss and F1 be correlated? As in, not inversely correlated

https://preview.redd.it/fmv9uxm7fuvd1.png?width=1046&format=png&auto=webp&s=e101196d4a498d058eede3c0d2f5b5f8e26772ca

The image above is my data from learning-rate tuning. As you can see, while the differences in F1 are very small, the differences in val loss are quite big: the best F1 is at 1e-5 with the worst val loss, while 1e-6 has the worst F1 with the best val loss. The same pattern can be seen in another one of my runs, with RoBERTa instead of XLNet.

https://preview.redd.it/jfgtl0w3guvd1.png?width=1041&format=png&auto=webp&s=54cb83cf96a7a345801c89a5f06153ff2be8fe9f

For context, the loss function used here is cross-entropy, with 10 epochs of training and the AdamW optimizer, if that matters.
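
This pattern is possible because cross-entropy measures confidence while F1 only looks at the argmax decisions. A toy illustration (my own numbers):

import torch
import torch.nn.functional as F

labels = torch.tensor([1, 1, 1, 0])

# Model A: every argmax is correct (perfect accuracy/F1) but the margins
# are tiny, so cross-entropy stays high
logits_a = torch.tensor([[0.00, 0.05], [0.00, 0.05], [0.00, 0.05], [0.05, 0.00]])

# Model B: one sample is misclassified (worse F1), but the model is
# confidently right on the rest, so the *average* cross-entropy is lower
logits_b = torch.tensor([[0.0, 5.0], [0.0, 5.0], [0.5, 0.0], [5.0, 0.0]])

for name, logits in [("A", logits_a), ("B", logits_b)]:
    loss = F.cross_entropy(logits, labels).item()
    acc = (logits.argmax(dim=1) == labels).float().mean().item()
    print(f"{name}: loss={loss:.3f} acc={acc:.2f}")
# A: loss~0.669 acc=1.00   B: loss~0.249 acc=0.75

A model trained at one learning rate can drift into the opposite regime: a few badly wrong, highly confident predictions inflate val loss while the argmax-based F1 keeps improving.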

As this whole process is part of my hyperparameter tuning, I don't know which learning rate I should use: should I focus on loss or on F1?

There might be some problem in my code causing this, or maybe just wrong methodology. I am quite new to machine learning, so it could just be my mistake.

0 Comments
2024/10/20
05:14 UTC

2

Doubt with PPO

I'm working on a reinforcement learning AI for a car agent, currently using PPO (Proximal Policy Optimization). The car agent needs to navigate toward a target point in a 2D environment, while optimizing for speed, alignment, and correct steering. The project includes a custom physics engine using the Vector2 math class.

Inputs (11):

  1. CarX: Car's X position
  2. CarY: Car's Y position
  3. CarVelocity: Normalized car speed
  4. CarRotation: Normalized car orientation
  5. CarSteer: Normalized steering angle
  6. TargetX: Target point's X position
  7. TargetY: Target point's Y position
  8. TargetDistance: Distance to the target
  9. TargetAngle: Normalized angle between the car's direction and the target
  10. LocalX: Target's relative X position (left/right of the car)
  11. LocalY: Normalized target's relative Y position (front/behind the car)

Outputs (2):

  • Steering angle (left/right)
  • Acceleration (forward)

Current Reward System:

  • Positive rewards for good alignment with the target.
  • Positive rewards for speed and avoiding reverse.
  • Positive rewards for being close to the target.
  • Positive rewards for steering in the correct direction based on the target's relative position.
  • Special cases to discourage wrong turns and terminate episodes after 1000 steps or if the distance exceeds 2000 units.

Problems I'm Facing:

  1. No Reverse: The setup prevents the car from reversing (the action space only allows forward acceleration), even when reversing would be optimal. I'd like to allow reverse if the target is behind the car (see the sketch after this list).
  2. Reward Tuning: Struggling to balance the reward function. The agent tends to favor speed over precision or gets stuck in certain situations due to conflicting rewards.
  3. Steering Issues: Sometimes the agent struggles to steer correctly, especially when the target is at odd angles (left or right).
  4. Generalization: The model works well in specific scenarios but struggles when I introduce more variability in the target's position and distance.
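
For problem 1, the usual fix is in the action space rather than in PPO itself: make acceleration a signed continuous action so the policy can learn to reverse. For problem 2, potential-based shaping (rewarding progress rather than raw proximity) avoids the "stay close but never arrive" local optimum. A minimal sketch, assuming a gymnasium-style environment (the API and constants here are my assumptions):

import numpy as np
import gymnasium as gym

# Signed acceleration: negative values mean reverse
action_space = gym.spaces.Box(
    low=np.array([-1.0, -1.0], dtype=np.float32),   # [steering, accel]
    high=np.array([1.0, 1.0], dtype=np.float32),
)

def shaping_reward(prev_dist: float, curr_dist: float, k: float = 1.0) -> float:
    # Potential-based shaping: positive only when the car actually gets
    # closer, so it can't be farmed by hovering near the target
    return k * (prev_dist - curr_dist)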

Any advice on how to improve the reward system or tweak the model to better handle steering and reversing would be greatly appreciated!

0 Comments
2024/10/20
04:52 UTC

2

Is black box optimization considered ML?

I am working on a project where I optimize what I am considering a black-box function with PSO (pyswarm, to be specific). Whether or not it really is a black-box function is another story; it can probably be solved analytically by someone who is better at math than I am. Anyway, I have seen people refer to PSO and SCO algorithms as "machine learning algorithms". Is this correct? There is no model being made, no training, nothing really being "learned". I guess the algorithm does "learn" the topology of the function as it wanders around, but this just doesn't seem to be what is usually meant by machine learning.
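
For reference, the pattern in question, as a minimal pyswarm example (the toy objective is mine): the optimizer only ever sees inputs and outputs, which is what makes the function a "black box" to it:

from pyswarm import pso

def objective(x):
    # Rosenbrock function: smooth, but its narrow curved valley makes it
    # a classic test case for derivative-free optimizers
    return (1 - x[0]) ** 2 + 100 * (x[1] - x[0] ** 2) ** 2

xopt, fopt = pso(objective, lb=[-2, -2], ub=[2, 2], swarmsize=50)
print(xopt, fopt)  # should approach [1, 1] and 0

As the post says, nothing here fits the usual "fit a model to data, then generalize" framing; the swarm just samples the function and keeps its best-so-far positions.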

3 Comments
2024/10/20
03:28 UTC

0

Various experts in the sector, plus Hinton (Nobel Prize), have been talking about AGI and ASI being achieved very soon. How realistic are these predictions?

By very soon I mean 5-10 years.

The general mood I see on machine learning subreddits is generally less excited. I could understand corporations marketing it out of self-interest; what's conflicting, however, is that Hinton says similar things. Not only him, but also Bill Gates, who no longer has a stake in this, and a couple more figures.

How could I learn more about machine learning, both to practice with the tools myself and to do some conceptual learning about the field?

4 Comments
2024/10/20
01:39 UTC

1

Getting ValueError: The model did not return a loss from the inputs while training flan-t5-small

Please help me, as I am new to this. I am training the code below and getting a ValueError, and I'm unable to understand why. Any help is appreciated!

Github repo link: https://github.com/VanekPetr/flan-t5-text-classifier (I cloned it and tried to train it)

The error I'm getting:

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\username\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
  0%|                                                                                                                                        | 0/8892 [00:00<?, ?it/s]Traceback (most recent call last):
  File "C:\projects\flan-t5-text-classifier\classifier\AutoModelForSequenceClassification\flan-t5-finetuning.py", line 122, in <module>
    train()
  File "C:\projects\flan-t5-text-classifier\classifier\AutoModelForSequenceClassification\flan-t5-finetuning.py", line 112, in train
    trainer.train()
  File "C:\Users\username\AppData\Local\Programs\Python\Python312\Lib\site-packages\transformers\trainer.py", line 2043, in train
    return inner_training_loop(
           ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\username\AppData\Local\Programs\Python\Python312\Lib\site-packages\transformers\trainer.py", line 2388, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\username\AppData\Local\Programs\Python\Python312\Lib\site-packages\transformers\trainer.py", line 3485, in training_step
    loss = self.compute_loss(model, inputs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\username\AppData\Local\Programs\Python\Python312\Lib\site-packages\transformers\trainer.py", line 3550, in compute_loss
    raise ValueError(

ValueError: The model did not return a loss from the inputs, only the following keys: logits,past_key_values,encoder_last_hidden_state. For reference, the inputs it received are input_ids,attention_mask.

my python script is below:

import nltk
import numpy as np
from huggingface_hub import HfFolder
from sklearn.metrics import precision_recall_fscore_support
from transformers import (
    AutoConfig,
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

import os

import pandas as pd
from datasets import Dataset

ROOT_DIR = os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

label2id = {"Books": 0, "Clothing & Accessories": 1, "Electronics": 2, "Household": 3}
id2label = {id: label for label, id in label2id.items()}

print(ROOT_DIR)
def load_dataset(model_type: str = "") -> Dataset:
    """Load dataset."""
    dataset_ecommerce_pandas = pd.read_csv(
        ROOT_DIR + "/data/test-train.csv",
        header=None,
        names=["label", "text"],
    )

    dataset_ecommerce_pandas["label"] = dataset_ecommerce_pandas["label"].astype(str)
    if model_type == "AutoModelForSequenceClassification":
        # Convert labels to integers
        dataset_ecommerce_pandas["label"] = dataset_ecommerce_pandas["label"].map(
            label2id
        )

    dataset_ecommerce_pandas["text"] = dataset_ecommerce_pandas["text"].astype(str)
    dataset = Dataset.from_pandas(dataset_ecommerce_pandas)
    dataset = dataset.shuffle(seed=42)
    dataset = dataset.train_test_split(test_size=0.2)
    print(' this is dataset: ', dataset)
    return dataset

MODEL_ID = "google/flan-t5-small"
REPOSITORY_ID = f"{MODEL_ID.split('/')[1]}-ecommerce-text-classification"

config = AutoConfig.from_pretrained(
    MODEL_ID, num_labels=len(label2id), id2label=id2label, label2id=label2id
)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, config=config)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

training_args = TrainingArguments(
    num_train_epochs=2,
    output_dir=REPOSITORY_ID,
    logging_strategy="steps",
    logging_steps=100,
    report_to="tensorboard",
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    fp16=False,  # Overflows with fp16
    learning_rate=3e-4,
    save_strategy="epoch",
    save_total_limit=2,
    load_best_model_at_end=False,
    push_to_hub=True,
    hub_strategy="every_save",
    hub_model_id=REPOSITORY_ID,
    hub_token="hf_token",
)


def tokenize_function(examples) -> dict:
    """Tokenize the text column in the dataset"""
    return tokenizer(examples["text"], padding="max_length", truncation=True)


def compute_metrics(eval_pred) -> dict:
    """Compute metrics for evaluation"""
    logits, labels = eval_pred
    if isinstance(
        logits, tuple
    ):  # if the model also returns hidden_states or attentions
        logits = logits[0]
    predictions = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        # NB: average="binary" would fail with the 4 classes defined above;
        # this multiclass setup needs "macro" or "weighted"
        labels, predictions, average="macro"
    )
    return {"precision": precision, "recall": recall, "f1": f1}


def train() -> None:
    """
    Train the model and save it to the Hugging Face Hub.
    """
    dataset = load_dataset("AutoModelForSequenceClassification")
    tokenized_datasets = dataset.map(tokenize_function, batched=True)

    nltk.download("punkt")

    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=tokenized_datasets["train"],
        eval_dataset=tokenized_datasets["test"],
        compute_metrics=compute_metrics,
    )

    # TRAIN
    trainer.train()

    # SAVE AND EVALUATE
    tokenizer.save_pretrained(REPOSITORY_ID)
    trainer.create_model_card()
    trainer.push_to_hub()
    print(trainer.evaluate())


if __name__ == "__main__":
    train()

4 Comments
2024/10/19
23:25 UTC

3

Bachelor's thesis ideas

Hello! I am a senior-year undergraduate student in Applied Mathematics and Artificial Intelligence. For my bachelor's thesis, I want to try developing a machine learning model capable of analyzing medical images and predicting the progression of diseases, such as tumor growth. I was initially considering a CNN+LSTM architecture.

I'm having difficulty selecting a suitable medical dataset that contains sequential images of patients (e.g., series of MRI or CT scans, retinal images, X-rays of knee joints, etc.) that would allow tracking changes over time. Could you recommend any open medical datasets for such a task?

Alternatively, I had another idea for my thesis: to develop a machine learning-based system that analyzes annotated cranial CT exams using the RSNA Intracranial Hemorrhage Detection dataset, because it seems more feasible, but I don't know what model or architecture I could use to bring at least a bit of novelty into my research. That is the option I suggested to my research supervisor.

Also, there's been an idea to develop a machine learning-based system that analyzes a vocalist's data (timbre, range, voice type) and suggests (predicts) songs that match their style, range, and vocal characteristics. How feasible is this?

Perhaps there are simpler ideas for a thesis related to machine learning or computer vision that are suitable for someone starting out in this field?
Thanks in advance!

0 Comments
2024/10/19
20:01 UTC

1

Neural Network - Times Series

I am trying to predict the FFER (federal funds effective rate). I am getting an error when trying to print the mean squared error. It states:

ValueError: Found input variables with inconsistent numbers of samples: [5975, 4780]

However, I have a bigger issue: my code is not predicting correctly, and the graph at the bottom of the code shows two linear, parallel lines. Since the predictions are wrong, so is this graph. If someone could help me and look at my code, that would be much appreciated.

Code: https://github.com/bmccoy002/Federal_Funds_Rate
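
A note on the error itself (my inference from the numbers, not from the repo): sklearn's mean_squared_error raises this when its two arguments have different lengths, and 4780 is exactly 80% of 5975, which suggests the full target series is being compared against predictions made on an 80% training split. A self-contained reproduction:

import numpy as np
from sklearn.metrics import mean_squared_error

y_full = np.zeros(5975)        # stand-in for the full target series
y_pred_train = np.zeros(4780)  # stand-in for predictions on the 80% split

try:
    mean_squared_error(y_full, y_pred_train)
except ValueError as e:
    print(e)  # inconsistent numbers of samples: [5975, 4780]

# Fix: score predictions against the matching slice of the target
print(mean_squared_error(y_full[:4780], y_pred_train))  # 0.0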
0 Comments
2024/10/19
18:12 UTC

3

In video synthesis, how is video represented as a sequence of time and images? Like, how is the time axis represented?

Title

I know 3D convolution works with depth (time, in our case), width, and height (which are spatial, ideal for images).

It's easy to understand how an image is represented as width and height. But how is time represented in videos?
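
For convolutional models, time is literally just one more axis of the tensor, with no special encoding required; a minimal PyTorch sketch:

import torch
import torch.nn as nn

# (batch, channels, time, height, width): 16 RGB frames of 64x64
video = torch.randn(2, 3, 16, 64, 64)

conv = nn.Conv3d(in_channels=3, out_channels=8, kernel_size=3, padding=1)
out = conv(video)
print(out.shape)  # torch.Size([2, 8, 16, 64, 64]) - time axis preserved

The 3D kernel slides along time exactly as it slides along height and width, so temporal order is captured by locality rather than by an explicit encoding.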

Like, is it like positional encodings, where you use sinusoidal encoding (which also gives you unique embeddings, right)?

I read video synthesis papers (started with VideoGPT; I have a solid understanding of image synthesis; this is for my thesis), but I need to understand the basics first.

3 Comments
2024/10/19
18:10 UTC

2

Question about input embedding in Transformers

I've recently been learning about transformer architectures, and while there are a lot of things I still don't understand, one that stands out to me is how training actually works in the input embedding process. For instance, let's assume we are talking about an LLM. Each word is initially encoded using essentially a lookup table, and this encoded vector is then embedded in a larger abstract vector space with a dimension of our choosing. The dimensions do not have any inherent meaning, which I am totally fine accepting. The locations of each word in this vector space are initially random, and as the model trains, words that share similarities are supposed to get grouped closer together in the vector space.

My confusion is how this training is actually done during backpropagation. For instance, the attention mechanism can observe which words are often used together or even used interchangeably, and therefore learn their similarity; however, the attention weights are a separate set of weights from the input embedding weights. How is this then propagated to the input embeddings so that they also learn what was deduced by the attention mechanism? Am I perhaps just misunderstanding how backpropagation is performed here?

To word this differently: I understand that during gradient descent the contribution of each weight to the overall loss function is calculated, and then the weights are updated using the step size and the descent value. But since the dimensions in the abstract vector space have no inherent meaning, how does one make sense of what "direction" each word needs to move? Does it just move toward the target word or something?
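
A small demonstration of the mechanics (my own toy setup): backpropagation doesn't need the dimensions to mean anything; the chain rule simply pushes gradients through the attention weights into whichever embedding rows were used, and each row moves in whatever direction locally reduces the loss:

import torch
import torch.nn as nn

emb = nn.Embedding(100, 16)                    # vocab 100, embedding dim 16
attn = nn.MultiheadAttention(embed_dim=16, num_heads=2, batch_first=True)

tokens = torch.tensor([[1, 5, 7]])             # one 3-token "sentence"
x = emb(tokens)                                # (1, 3, 16)
out, _ = attn(x, x, x)                         # self-attention
loss = out.sum()                               # stand-in for a real loss
loss.backward()

# Only the rows for tokens 1, 5, 7 received gradient
print(emb.weight.grad.abs().sum(dim=1)[:10])

So the "direction" each word moves is just the negative gradient of the loss with respect to its own embedding row; there is no geometric target, and the useful structure (similar words clustering together) emerges because that is what keeps lowering the loss.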

0 Comments
2024/10/19
17:51 UTC

4

Should I interleave sine and cosine embeddings in sinusoidal positional encoding?

I'm trying to implement a sinusoidal positional encoding. I found two solutions that give different encodings, and I am wondering if one of them is wrong or if both are correct. The only difference is that the second solution interleaves the sine and cosine embeddings. I show visual figures of the resulting encodings for both options below.

Note: The first solution is used in DDPMs and the second in transformers. Why? Does it matter?

Solution (1):

Non-interleaved

Solution (2):

Interleaved

ps: If you want to check the code it's here https://stackoverflow.com/questions/79103455/should-i-interleave-sin-and-cosine-in-sinusoidal-positional-encoding

3 Comments
2024/10/19
16:51 UTC

1

CNN Hyperparameter Tuning and K-Fold

Hey y'all, I'm currently creating a custom CNN model to classify images. I want to do hyperparameter tuning (like kernel size and filter count) with KerasTuner. I also want to cross-validate the model using K-fold.

My question is: how do I do this? Do I do the tuning first and then K-fold separately, or do I do K-fold within each trial of the tuning?
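
A minimal sketch of the second option (my own toy example: random data, two hyperparameters searched with plain grid loops; KerasTuner can do the same by overriding Tuner.run_trial, but the loop makes the structure explicit). Each candidate configuration is scored by its mean accuracy across folds:

import numpy as np
import keras
from sklearn.model_selection import KFold

def build_model(filters, kernel):
    model = keras.Sequential([
        keras.Input(shape=(28, 28, 1)),
        keras.layers.Conv2D(filters, kernel, activation="relu"),
        keras.layers.Flatten(),
        keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

x = np.random.rand(256, 28, 28, 1).astype("float32")
y = np.random.randint(0, 10, 256)

results = {}
for filters in [16, 32]:
    for kernel in [3, 5]:
        scores = []
        for tr, va in KFold(n_splits=3, shuffle=True, random_state=0).split(x):
            model = build_model(filters, kernel)   # fresh model per fold
            model.fit(x[tr], y[tr], epochs=2, verbose=0)
            scores.append(model.evaluate(x[va], y[va], verbose=0)[1])
        results[(filters, kernel)] = float(np.mean(scores))

best = max(results, key=results.get)
print(best, results[best])

Tuning first and then running K-fold once on the chosen configuration is the cheaper alternative; K-fold inside every trial is slower but gives less noisy comparisons between candidates.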

0 Comments
2024/10/19
16:34 UTC

2

Exploring New Tools for My Machine Learning Project

Are there any recent preprocessing techniques, visualization libraries, or classification algorithms that are not yet widely adopted? I'm looking to incorporate cutting-edge methods into my project.

0 Comments
2024/10/19
14:26 UTC

0

Best and appropriate definition of GenAI

Hello Geeks!

Just heading toward GenAI after ML. I'm wondering if I can get a simple, accurate definition of GenAI that is interpretable by almost everyone while also being technically sound. Let me know from your experience.

Thanks in advance

0 Comments
2024/10/19
06:49 UTC

1

Is the double-descent interpolation threshold based on parameters or linear regions?

I'm a bit confused in this part of my college class. In online explanations and textbooks people say that the interpolation threshold tends to be when the number of model parameters equals the number of datapoints, but then they will show a visual aid which shows a simple model that has the same number of linear regions as datapoints... but I know that at least in simple models, each linear region usually corresponds to multiple parameters. Do we know which it is and why that's where the threshold is? Or what I might be misunderstanding?

0 Comments
2024/10/19
04:12 UTC

0

Any feedback ML in cybersecurity

Guys, I have an academic project about machine learning for detecting incidents, and I'm lost.

I'm trying to create a module for risk analysis and attack detection. Any feedback please...

3 Comments
2024/10/18
20:34 UTC

1

What is the difference between cross attention and multi-head attention?

3 Comments
2024/10/18
20:34 UTC

1

How do I develop weights?

I'm currently working on a ML algorithm for providing user content based on certain features. I'm not measuring any implicit interaction, but I can't find any resources on how to actually 'weigh' the explicit features' impacts. Any resources or recommendations would be great (I could also elaborate or provide code, just not sure if we're allowed to do so).

2 Comments
2024/10/18
20:19 UTC

1

Split same objects with different colors into multiple classes?

I want to predict chess pieces on a custom dataset. Should I have a class for each piece regardless of color (e.g. pawn, rook, bishop, etc) and then predict the color separately with a simple architecture or should I just have a class for each piece with its color (e.g. w-pawn, b-pawn, w-rook, b-rook, etc)?

I feel like the actual object detection model should focus on the features of the object rather than the color, but the color might be so trivial to learn that I could just split each piece into 2 color classes.

4 Comments
2024/10/18
19:17 UTC

0

Seeking Feedback on My Paper After Rejection from arXiv

[Cross-posted: https://www.reddit.com/r/MachineLearning/comments/1g2fmfw/comment/lsjul5v/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button ]

Hello,

A few days ago, I posted seeking guidance and collaboration in ML research: Seeking Guidance on Breaking into ML Research. Unfortunately, due to a lack of time and of researchers willing to collaborate, I decided to write a paper myself. Although the paper was rejected by arXiv, I'd like to ask the community for feedback so I can correct it and learn more about the research process.

If anyone has some time to check a short paper (10 pages) and is willing to help me, I'm providing the paper along with the code. Your feedback would be greatly appreciated!

Paper: Scaling Down Transformers: Investigating Emergent Phenomena in Tiny Models

Code: GitHub Repository

This is a simple attempt to write a paper for publishing, and once I understand how scientific literature is written, I hope to produce better and more advanced work in the future. Thank you in advance for your help!

A paper for feedback from the community. First page only.

7 Comments
2024/10/18
16:44 UTC
