/r/MLQuestions
Hi,
I'm trying to implement a sinusoidal positional encoding for DDPM. I found two solutions that compute different embeddings for the same position/timestep with the same embedding dimension. I am wondering whether one of them is wrong or both are correct. The official DDPM source code does not use the original sinusoidal positional encoding from the transformer paper... why?
1) Original sinusoidal positional encoding from "Attention is all you need" paper.
Original sinusoidal positional encoding
2) Sinusoidal positional encoding used in the official code of DDPM paper
Sinusoidal positional encoding used in official DDPM code. Based on tensor2tensor.
Why does the official code for DDPMs use a different encoding (option 2) than the original sinusoidal positional encoding used in the transformer paper? Is the second option better for DDPMs?
I noticed the sinusoidal positional encoding used in the official DDPM code was borrowed from tensor2tensor. The difference in implementations was even highlighted in one of the PR submissions to the official tensor2tensor implementation. Why did the authors of DDPM use this implementation (option 2) rather than the original from transformers (option 1)?
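For anyone who wants to reproduce the two layouts, here is a minimal sketch I wrote to illustrate them (my own code, not the official implementations): the transformer-paper version interleaves sin/cos across dimensions, while the tensor2tensor/DDPM-style version concatenates all the sines followed by all the cosines, so both contain the same values in a different dimension order (the implementations also differ slightly in how the frequency scale is computed, which I'm ignoring here).

import math
import torch

def pos_enc_interleaved(t, d_model):
    # "Attention Is All You Need" layout: PE[:, 2i] = sin, PE[:, 2i+1] = cos
    half = d_model // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half, dtype=torch.float32) / half)
    args = t[:, None].float() * freqs[None, :]      # shape (len(t), half)
    pe = torch.zeros(len(t), d_model)
    pe[:, 0::2] = torch.sin(args)
    pe[:, 1::2] = torch.cos(args)
    return pe

def pos_enc_concatenated(t, d_model):
    # tensor2tensor/DDPM-style layout: [sin(args) | cos(args)] concatenated
    half = d_model // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half, dtype=torch.float32) / half)
    args = t[:, None].float() * freqs[None, :]
    return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)

t = torch.arange(8)
a, b = pos_enc_interleaved(t, 64), pos_enc_concatenated(t, 64)
# Same values per row, just a fixed permutation of the columns:
print(torch.allclose(a.sort(dim=1).values, b.sort(dim=1).values))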
ps: If you want to check the code it's here https://stackoverflow.com/questions/79103455/should-i-interleave-sin-and-cosine-in-sinusoidal-positional-encoding
You know how tree-based algorithms just do a split? If you think about algorithms like XGBoost, every time you split you are just creating another step in a step function. Step functions have discontinuities and so are not differentiable, which makes them a bit harder to optimise.
So I have been thinking: how can I make a tree-based algorithm differentiable? Then I thought, why not replace the step function with a differentiable one? One idea is a cubic spline with only one knot. As we know, at the ends of a cubic spline the value just flatlines - this is just like a step function. A cubic spline can also smooth the transition between the left and right splits.
So here's my rough sketch of an XGBoost-like algorithm to build ONE TREE
This algorithm is novel in that it keeps growing the tree from a simple model, unlike a neural network where the architecture is fixed at the beginning. With this structure, it grows organically (of course you need a stopping criterion of some kind).
Also, because the whole "tree" is differentiable, one can keep optimising parameters further up the tree at any step, which helps alleviate the greediness of algorithms like XGBoost, where once you've chosen a split point, that split point is there permanently. In my cubic spline approach the whole tree's parameters can still be optimised (although it will be a pain to use so many indicator functions).
Also, by making the whole tree differentiable, one can apply lots of techniques from neural networks, like RAdam-style optimisers or sending batches of data through the network, etc.
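To make the idea concrete, here is a rough PyTorch toy sketch of a single smooth split (my own illustration, not a worked-out algorithm): a cubic smoothstep gate that flatlines on both sides blends a left and a right leaf value, so the split location, transition width and leaf values are all trainable by gradient descent.

import torch

class SmoothSplit(torch.nn.Module):
    # One soft "split": blends two leaf values with a cubic smoothstep gate.
    def __init__(self):
        super().__init__()
        self.split = torch.nn.Parameter(torch.tensor(0.0))   # split location
        self.width = torch.nn.Parameter(torch.tensor(1.0))   # transition width
        self.left = torch.nn.Parameter(torch.tensor(0.0))    # leaf value on the left
        self.right = torch.nn.Parameter(torch.tensor(1.0))   # leaf value on the right

    def forward(self, x):
        # Map x into [0, 1] around the split, then apply the cubic smoothstep 3t^2 - 2t^3,
        # which has zero derivative at both ends -- a smoothed step function.
        t = torch.clamp((x - self.split) / self.width.abs().clamp(min=1e-6) + 0.5, 0.0, 1.0)
        gate = 3 * t**2 - 2 * t**3
        return (1 - gate) * self.left + gate * self.right

# Fit one smooth split to a noisy step function by gradient descent.
torch.manual_seed(0)
x = torch.linspace(-3, 3, 256)
y = (x > 0.7).float() + 0.05 * torch.randn_like(x)
model = SmoothSplit()
opt = torch.optim.Adam(model.parameters(), lr=0.05)
for _ in range(500):
    opt.zero_grad()
    loss = torch.mean((model(x) - y) ** 2)
    loss.backward()
    opt.step()
print(model.split.item(), loss.item())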
I would find it hard to believe that this is a new approach I came up with, but it occurred to me that it's a pretty cute way to say "well, even a random feature is doing better than everything else, so stop growing this node any further".
Is this a well known idea and has a name?
AI (Gemini, specifically) tells me that it's a good idea and that it's not aware of a name for it.
What do you think? Do you think it's a good idea or a bad one?
Right now I'm looking for a baseline solution, starting with video or images of spread-out LEGO pieces.
Any suggestions on a base model and the best way to fine-tune?
I was told this is the right place for this question, so I'm posting here. After gaining my own perspective on ML and working with industry leaders, I feel I am now ready to make an in-depth YouTube video telling the overall new story of the same old classical ML, and then take the journey from there to learning by doing projects and comparing different approaches, ultimately building a community of learners. Teaching is my passion, and giving back to the community is what I have always learned from. While researching the competition and how I can thrive as a helping_buddy, I feel I might require a lot of video editing skill, or maybe knowledge of memes, as they are quite popular in teaching videos. Having read this far, can you tell me what content you usually watch for ML?
Hi! I have an idea of using a stacking ensemble for predicting dengue cases. My dataset contains dates (temporal) and geospatial data (the geography of barangays). I am also going to use climate factors, demographics like population and age group, and historical dengue cases. For this ensemble I want to use an LSTM first, since my data is sequential. My initial picks are LSTM, random forest, and SARIMA as base models, with XGBoost as my meta-model. My problem is whether the models I initially chose are a good combination, and if not, what other models I should incorporate. I really need help.
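For reference, this is roughly the stacking wiring I have in mind (a sketch with made-up data; generic sklearn regressors stand in for the LSTM/SARIMA base learners, whose out-of-fold forecasts would be collected the same way):

import numpy as np
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import TimeSeriesSplit

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))                     # placeholder weekly climate/demographic features
y = rng.poisson(lam=20, size=300).astype(float)   # placeholder weekly dengue case counts

# Base learners; an LSTM or SARIMA slots in the same way, as long as it can
# produce predictions for each held-out fold.
base_models = {
    "rf": RandomForestRegressor(n_estimators=200, random_state=0),
    "ridge": Ridge(alpha=1.0),
}

# Out-of-fold predictions with a time-ordered split (no shuffling), so the
# meta-model never sees base predictions made on their own training data.
tscv = TimeSeriesSplit(n_splits=5)
meta_X = np.full((len(y), len(base_models)), np.nan)
for train_idx, val_idx in tscv.split(X):
    for j, model in enumerate(base_models.values()):
        model.fit(X[train_idx], y[train_idx])
        meta_X[val_idx, j] = model.predict(X[val_idx])

# Meta-model (XGBoost in my plan; a sklearn booster is used here to keep imports minimal).
mask = ~np.isnan(meta_X).any(axis=1)   # the earliest chunk has no out-of-fold predictions
meta_model = GradientBoostingRegressor(random_state=0)
meta_model.fit(meta_X[mask], y[mask])
print("stacked prediction for the last week:", meta_model.predict(meta_X[[-1]]))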
Hello, I've been trying to build a transformer for predicting certain values from sequences of time series data.
The input features are a sequence of time series data, but divided into "time windows" of a certain sequence length. So 1 input into the network would be like 8 or so features, but ~168 rows of those features in a time series sequence.
The output is just a couple scalar values.
It is set up in pytorch. My question isn't so much about transformers themselves or programming or machine learning architecture, but about a specific phenomenon/problem I keep noticing with the way I organize the data.
The code starts by splitting the data into training, validation, and test sets. Because it's time series data, I can't just shuffle all points and sample, as that would leak parts of windows into other sets. I have to first split the data into 3 contiguous segments for training, validation, and testing. After that, it creates the windows isolated within their segments, then shuffles the windows.
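Roughly like this (a simplified sketch, not my exact code; data is the full feature array in time order and targets are the values to predict):

import numpy as np

def chronological_split_and_window(data, targets, window=168, splits=(0.6, 0.15, 0.15)):
    # 1) Split the raw series into contiguous train/val/test segments first,
    #    so no window ever straddles two sets.
    n = len(data)
    n_train = int(splits[0] * n)
    n_val = int(splits[1] * n)
    segments = [(0, n_train), (n_train, n_train + n_val), (n_train + n_val, n)]
    out = []
    for start, end in segments:
        xs, ys = [], []
        # 2) Build windows strictly inside each segment.
        for i in range(start, end - window):
            xs.append(data[i:i + window])
            ys.append(targets[i + window])
        xs, ys = np.stack(xs), np.stack(ys)
        # 3) Shuffling whole windows within a segment is fine; shuffling raw rows is not.
        perm = np.random.permutation(len(xs))
        out.append((xs[perm], ys[perm]))
    return out  # [(x_train, y_train), (x_val, y_val), (x_test, y_test)]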
During training, I've noticed that the validation loss is always lower than the training loss on epoch 1. Now, I know this can be normal, especially when reporting training loss during an epoch and validation loss at the end of the epoch, since the validation set sees a model that is effectively half an epoch better trained, but this is different.
If I run the code at a learning rate like 0.00000001 (so that training won't influence the comparison), the validation loss will be around half the training loss (for example, validation at 0.4 and training at 0.7 or so). If I run it 100 times, the validation loss will ALWAYS be significantly lower than the training loss, which seems like an impossible coincidence, especially given that I took training out of the equation.
All of the above happens when I have the data split 60% training, 15% validation, and 15% test. If I change the split to 40% training and 40% validation, the losses instantly start at around the same value. Every time.
Now this would be fine, I could just make the splits even, however just the fact that that happens makes me think that somehow the data splitting or size is influencing the way my code treats the training and validation.
I've tried everything to make training and validation behave exactly the same in order to isolate the issue. I've compared the model's forward behavior in train and eval mode, and it gives the same output for the same inputs, so that's not it. I've made sure the batch size is identical for training and evaluation; if the set is split differently, only the number of batches differs, and I make sure the set sizes are divisible by the batch size.
It's hard for me to move on and develop other parts of the code when I feel this problem will keep any of it from working properly, so nothing I do on it seems to matter unless I figure this out. Does anyone know what can cause this?
I'm generally new to ML. I understand machine learning algorithms and architectures to an intermediate degree. I have intermediate proficiency in Python; I'm not good enough to implement the entire code myself, so I use Claude for assistance, but I understand what each part of the code does conceptually (I just can't write it all myself).
I've been learning about ANNs. It seems to me that there's a significant difference between the behaviour of external complexity and internal complexity for them. I couldn't find any online resources summarising how these differences are currently being exploited for either academic or commercial purposes. Please help me understand this topic.
Hi everyone,
I am training a Wave Vision Transformer (WaveViT) model. The code for WaveViT is available at the link below.
https://github.com/YehLi/ImageNetModel/blob/main/classification/wavevit.py
I did not change the code in the wave_ViT.py and torch_wavelets.py files. The only change I made is in the pipeline that provides data to the model. My original dataset contains around 38,000 MRI images of size 256x256 in RGB format. I augmented this dataset by rotating each image by 90, 180, and 270 degrees from its original angle and saving those images, so each image has 3 rotated copies. Hence, my dataset increased to about 156,000 images of the same size and format.
I then saved those images with labels in numpy.memmap format as uint8, because my code was giving an OOM error when I tried to load them into a numpy array all at once.
I load my memmaps into train and test images with labels like this.
def load_memmap_data(train_memmap_file, train_label_memmap_file, test_memmap_file, test_label_memmap_file, num_train_images, num_test_images):
    train_images = np.memmap(train_memmap_file, dtype='uint8', mode='r', shape=(num_train_images, 256, 256, 3))
    train_labels = np.memmap(train_label_memmap_file, dtype='int32', mode='r', shape=(num_train_images,))
    test_images = np.memmap(test_memmap_file, dtype='uint8', mode='r', shape=(num_test_images, 256, 256, 3))
    test_labels = np.memmap(test_label_memmap_file, dtype='int32', mode='r', shape=(num_test_images,))
    return train_images, train_labels, test_images, test_labels
# Create memory-mapped files for train/test datasets
train_memmap_file = 'train_images.dat'
train_label_memmap_file = 'train_labels.dat'
test_memmap_file = 'test_images.dat'
test_label_memmap_file = 'test_labels.dat'

train_images, train_labels, test_images, test_labels = load_memmap_data(
    train_memmap_file=train_memmap_file,
    train_label_memmap_file=train_label_memmap_file,
    test_memmap_file=test_memmap_file,
    test_label_memmap_file=test_label_memmap_file,
    num_train_images=num_train_images,
    num_test_images=num_test_images
)
My optimizer and the call to the train function of the Trainer class look like this.
model = WaveViT()
model = nn.DataParallel(model)
optimizer = optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-4)
loss_fn = nn.CrossEntropyLoss()
trainer = Trainer(model, optimizer, loss_fn, exp_name="waveViT-256-aug", device=device)
trainer.train(train_images, train_labels, test_images, test_labels, epochs=100, config=None, steps_per_epoch=steps_per_epoch, augment=False)
My Trainer class looks like this. It takes the images and labels, applies the transformation chosen for the epoch, and trains and evaluates the model.
class Trainer:
    def __init__(self, model, optimizer, loss_fn, exp_name, device):
        self.model = model.to(device)
        self.optimizer = optimizer
        self.loss_fn = loss_fn
        self.exp_name = exp_name
        self.device = device

    def train(self, train_images, train_labels, test_images, test_labels, epochs, config=None, steps_per_epoch=0, augment=False):
        train_losses, test_losses, test_accuracies, train_accuracies, test_precision, train_precision, test_recall, train_recall, test_f1, train_f1 = [], [], [], [], [], [], [], [], [], []
        best_test_loss = float('inf')  # Initialize with a large value
        best_accuracy = 0.0  # Initialize with the worst possible accuracy
        scaler = GradScaler()
        # Early stopping variables
        best_epoch = 0
        epochs_no_improvement = 0  # Counter for epochs without improvement
        # Augmentations cycled across epochs
        transform_1 = transforms.Compose([
            # Rotate the image by up to +/-40 degrees and shear by 15 degrees
            transforms.RandomAffine(degrees=(-40, 40), shear=15),
            transforms.RandomVerticalFlip(p=0.5),
            transforms.ToTensor(),
        ])
        transform_2 = transforms.Compose([
            transforms.RandomResizedCrop(size=224, scale=(0.95, 1.0)),
            transforms.RandomVerticalFlip(p=0.5),
            transforms.RandomAffine(degrees=0, translate=(0.15, 0.15)),
            transforms.RandomApply([transforms.ElasticTransform(alpha=30.0)], p=0.3),
            transforms.ToTensor(),
        ])
        transform_3 = transforms.Compose([transforms.ToTensor()])
        # Train the model
        for i in range(epochs):
            print("\nTraining epoch\n")
            print("Preparing data loaders...")
            if i % 2 == 0:
                transform = transform_1
            elif i % 2 == 0 and i % 7 == 0:  # note: unreachable, the branch above already matches i % 2 == 0
                transform = transform_3
            else:
                transform = transform_2  # Otherwise
            trainloader, testloader = prepare_data(
                batch_size=64,
                x_train=train_images,
                y_train=train_labels,
                x_test=test_images,
                y_test=test_labels,
                transform=transform
            )
            accuracy_train, train_loss, precision_train, recall_train, f1_train = self.train_epoch(trainloader, steps_per_epoch, augment, scaler)
            accuracy_test, test_loss, precision_test, recall_test, f1_test = self.evaluate(testloader)
            print("\nEvaluation Completed\n")
            train_losses.append(train_loss)
            test_losses.append(test_loss)
            test_accuracies.append(accuracy_test)
            train_accuracies.append(accuracy_train)
            test_precision.append(precision_test)
            train_precision.append(precision_train)
            test_recall.append(recall_test)
            train_recall.append(recall_train)
            test_f1.append(f1_test)
            train_f1.append(f1_train)
            is_best_loss = test_loss < best_test_loss
            is_best_accuracy = accuracy_test > best_accuracy
            if is_best_loss:
                best_test_loss = test_loss  # Update best test loss
                best_epoch = i + 1
                epochs_no_improvement = 0  # Reset counter
                save_checkpoint(self.exp_name + "-Best-Test-Loss", self.model, best_epoch)
            else:
                epochs_no_improvement += 1
            if is_best_accuracy:
                best_accuracy = max(accuracy_test, best_accuracy)  # Update best accuracy
                save_checkpoint(self.exp_name + "-Best-Test-Accuracy", self.model, i + 1)
            if epochs_no_improvement >= 10:
                print(f"Early stopping triggered after {i + 1} epochs without improvement.")
                break  # Stop training if no improvement
        save_experiment(self.exp_name, config, self.model, train_losses, test_losses, test_accuracies, train_accuracies, test_precision, train_precision, test_recall, train_recall, test_f1, train_f1)
        plot_metrics(train_losses, test_losses, train_accuracies, test_accuracies,
                     train_precision, test_precision, train_recall, test_recall,
                     train_f1, test_f1, self.exp_name)
    def train_epoch(self, trainloader, steps_per_epoch, augment, scaler):
        self.model.train()
        total_loss = 0
        trainloader_iter = itertools.cycle(trainloader)
        correct = 0
        y_true = []  # To store all true labels
        y_pred = []  # To store all predictions
        # Wrap the range with tqdm for the progress bar
        with tqdm(total=steps_per_epoch, desc='Training', unit='step') as pbar:
            for i in range(steps_per_epoch):
                batch = next(trainloader_iter)
                batch = [t.to(self.device) for t in batch]
                images, labels = batch
                images = images.to(torch.float32)
                with autocast():
                    result = self.model(images)
                    loss = self.loss_fn(result, labels)
                self.optimizer.zero_grad()
                scaler.scale(loss).backward()
                # Update the model's parameters
                scaler.step(self.optimizer)
                scaler.update()
                total_loss += loss.item() * len(images)
                predictions = torch.argmax(result, dim=1)
                y_pred.extend(predictions.cpu().numpy())
                y_true.extend(labels.cpu().numpy())
                correct += torch.sum(predictions == labels).item()
                # Update the progress bar at every 25% of the total steps
                if (i + 1) % (steps_per_epoch // 4) == 0:
                    pbar.update(1)
        # Convert lists to tensors for calculation
        y_true_tensor = torch.tensor(y_true)
        y_pred_tensor = torch.tensor(y_pred)
        # Calculating precision, recall, and F1 score using PyTorch
        TP = ((y_pred_tensor == 1) & (y_true_tensor == 1)).sum().item()
        FP = ((y_pred_tensor == 1) & (y_true_tensor == 0)).sum().item()
        FN = ((y_pred_tensor == 0) & (y_true_tensor == 1)).sum().item()
        precision = TP / (TP + FP) if TP + FP > 0 else 0
        recall = TP / (TP + FN) if TP + FN > 0 else 0
        f1 = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0
        avg_loss = total_loss / len(trainloader.dataset)
        accuracy = correct / len(trainloader.dataset)  # Accuracy as a fraction
        return accuracy, avg_loss, precision, recall, f1
    @torch.no_grad()
    def evaluate(self, testloader):
        self.model.eval()
        total_loss = 0
        correct = 0
        y_true = []
        y_pred = []
        with torch.no_grad():
            for batch in testloader:
                # Move the batch to the device
                batch = [t.to(self.device) for t in batch]
                images, labels = batch
                images = images.to(torch.float32)
                with autocast():
                    result = self.model(images)
                    loss = self.loss_fn(result, labels)
                total_loss += loss.item() * len(images)
                predictions = torch.argmax(result, dim=1)
                y_pred.extend(predictions.cpu().numpy())
                y_true.extend(labels.cpu().numpy())
                correct += torch.sum(predictions == labels).item()
        # Convert lists to tensors for calculation
        y_true_tensor = torch.tensor(y_true)
        y_pred_tensor = torch.tensor(y_pred)
        # Calculating precision, recall, and F1 score using PyTorch
        TP = ((y_pred_tensor == 1) & (y_true_tensor == 1)).sum().item()
        FP = ((y_pred_tensor == 1) & (y_true_tensor == 0)).sum().item()
        FN = ((y_pred_tensor == 0) & (y_true_tensor == 1)).sum().item()
        precision = TP / (TP + FP) if TP + FP > 0 else 0
        recall = TP / (TP + FN) if TP + FN > 0 else 0
        f1 = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0
        accuracy = correct / len(testloader.dataset)
        avg_loss = total_loss / len(testloader.dataset)
        return accuracy, avg_loss, precision, recall, f1
The model does very well while training and testing on the same dataset during execution:
Epoch: 1
Training Metrics: Accuracy: 0.7357, Loss: 0.5236, Precision: 0.6928, Recall: 0.8479, F1 Score: 0.7625
Testing Metrics: Accuracy: 0.7672, Loss: 0.4838, Precision: 0.7271, Recall: 0.8556, F1 Score: 0.7861
....
Epoch: 4
Training Metrics: Accuracy: 0.8031, Loss: 0.4078, Precision: 0.7644, Recall: 0.8772, F1 Score: 0.8169
Testing Metrics: Accuracy: 0.7494, Loss: 0.4712, Precision: 0.8186, Recall: 0.6408, F1 Score: 0.7189
...
Epoch: 8
Training Metrics: Accuracy: 0.8529, Loss: 0.3148, Precision: 0.8324, Recall: 0.8845, F1 Score: 0.8577
Testing Metrics: Accuracy: 0.8280, Loss: 0.4027, Precision: 0.8015, Recall: 0.8720, F1 Score: 0.8352
...
Epoch: 18
Training Metrics: Accuracy: 0.9284, Loss: 0.1706, Precision: 0.9237, Recall: 0.9346, F1 Score: 0.9292
Testing Metrics: Accuracy: 0.8008, Loss: 0.5767, Precision: 0.8357, Recall: 0.7488, F1 Score: 0.7899
The model does not improve much beyond this; the best test accuracy and loss come at around epoch 8. I saved the model at epoch 8, where its test accuracy stays between 79-80%.
This model performed poorly when validated on an independent dataset.
Validation Metrics:
- Accuracy: 0.4890
- Precision: 0.4878
- Recall: 0.4416
- F1 Score: 0.4636
- Confusion Matrix:
[[1341 1159]
[1396 1104]]
I have also validated it on the same dataset used for training, and the accuracy stays the same (even though I gave it the very same images I used in training). I also used ImageNet pretrained weights for this WaveViT (the original WaveViT was trained on ImageNet and the saved model is available on GitHub), but the result is the same.
It would be a great help if someone could help me resolve this behavior of the model.
Why does the validation accuracy not improve, even on the same dataset used for training and testing?
I hope I have explained everything. Please let me know if you need more clarifications.
The image above is my data on learning rate tuning. As you can see, the differences in F1 are very small while the differences in validation loss are quite big: 1e-5 gives the best F1 but the worst val loss, while 1e-6 gives the worst F1 but the best val loss. The same pattern shows up in another run of mine with RoBERTa instead of XLNet.
For context, the loss function used here is Cross Entropy, with 10 epochs of training, and AdamW optimizer, if that matters.
As this whole process is part of my hyperparameter tuning, I don't know which learning rate I should use. Should I focus on loss or on F1?
There might be a problem in my code causing this, or maybe just a wrong methodology. I am quite new to machine learning, so it could just be my mistake.
I'm working on a reinforcement learning AI for a car agent, currently using PPO (Proximal Policy Optimization). The car agent needs to navigate toward a target point in a 2D environment, while optimizing for speed, alignment, and correct steering. The project includes a custom physics engine using the Vector2 math class.
Inputs (11):
Outputs (2):
Current Reward System:
Problems I'm Facing:
Any advice on how to improve the reward system or tweak the model to better handle steering and reversing would be greatly appreciated!
I am working on a project where I optimize what I am treating as a black-box function with PSO (pyswarm, to be specific). Whether it really is a black-box function is another story; it can probably be solved by someone who is better at math than I am. Anyway, I have seen people refer to PSO and SCO algorithms as "machine learning algorithms". Is this correct? There is no model being made, no training, nothing really being "learned". I guess the algorithm does "learn" the topology of the function as it wanders around, but this just doesn't seem to be what is usually meant by machine learning.
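For context, my usage is basically just this shape (a sketch with a toy objective, not my actual function; pyswarm's pso returns only the best particle position and objective value, so nothing like a fitted model is left over afterwards):

import numpy as np
from pyswarm import pso

def objective(x):
    # Toy black-box function: nothing is "trained", PSO only searches for a minimizer.
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2 + np.sin(3 * x[0])

lb = [-5.0, -5.0]   # lower bounds for each dimension
ub = [5.0, 5.0]     # upper bounds for each dimension
xopt, fopt = pso(objective, lb, ub, swarmsize=50, maxiter=100)
print("best point:", xopt, "best value:", fopt)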
Edit: these predictions* (plural).
By "very soon" I mean 5-10 years.
The general mood I see on machine learning subreddits is noticeably less excited. I could understand corporate interests marketing it, but what's conflicting is that Hinton says similar things, and not only him but also Bill Gates, who no longer has a stake in this, plus a couple more figures.
How could I learn more about machine learning, both by practising the tools myself and by doing some conceptual learning about the field?
Please help me as I am new to this. I am training the code below and getting a ValueError, and I'm unable to understand why. Any help is appreciated!
Github repo link: https://github.com/VanekPetr/flan-t5-text-classifier (I cloned it and tried to train it)
Getting error:
[nltk_data] Downloading package punkt to
[nltk_data] C:\Users\username\AppData\Roaming\nltk_data...
[nltk_data] Package punkt is already up-to-date!
0%| | 0/8892 [00:00<?, ?it/s]
Traceback (most recent call last):
File "C:\projects\flan-t5-text-classifier\classifier\AutoModelForSequenceClassification\flan-t5-finetuning.py", line 122, in <module>
train()
File "C:\projects\flan-t5-text-classifier\classifier\AutoModelForSequenceClassification\flan-t5-finetuning.py", line 112, in train
trainer.train()
File "C:\Users\username\AppData\Local\Programs\Python\Python312\Lib\site-packages\transformers\trainer.py", line 2043, in train
return inner_training_loop(
^^^^^^^^^^^^^^^^^^^^
File "C:\Users\username\AppData\Local\Programs\Python\Python312\Lib\site-packages\transformers\trainer.py", line 2388, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\username\AppData\Local\Programs\Python\Python312\Lib\site-packages\transformers\trainer.py", line 3485, in training_step
loss = self.compute_loss(model, inputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\username\AppData\Local\Programs\Python\Python312\Lib\site-packages\transformers\trainer.py", line 3550, in compute_loss
raise ValueError(
ValueError: The model did not return a loss from the inputs, only the following keys: logits,past_key_values,encoder_last_hidden_state. For reference, the inputs it received are input_ids,attention_mask.
My Python script is below:
import nltk
import numpy as np
from huggingface_hub import HfFolder
from sklearn.metrics import precision_recall_fscore_support
from transformers import (
    AutoConfig,
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)
import os
import pandas as pd
from datasets import Dataset

ROOT_DIR = os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

label2id = {"Books": 0, "Clothing & Accessories": 1, "Electronics": 2, "Household": 3}
id2label = {id: label for label, id in label2id.items()}

print(ROOT_DIR)

def load_dataset(model_type: str = "") -> Dataset:
    """Load dataset."""
    dataset_ecommerce_pandas = pd.read_csv(
        ROOT_DIR + "/data/test-train.csv",
        header=None,
        names=["label", "text"],
    )
    dataset_ecommerce_pandas["label"] = dataset_ecommerce_pandas["label"].astype(str)
    if model_type == "AutoModelForSequenceClassification":
        # Convert labels to integers
        dataset_ecommerce_pandas["label"] = dataset_ecommerce_pandas["label"].map(
            label2id
        )
    dataset_ecommerce_pandas["text"] = dataset_ecommerce_pandas["text"].astype(str)
    dataset = Dataset.from_pandas(dataset_ecommerce_pandas)
    dataset = dataset.shuffle(seed=42)
    dataset = dataset.train_test_split(test_size=0.2)
    print(' this is dataset: ', dataset)
    return dataset

MODEL_ID = "google/flan-t5-small"
REPOSITORY_ID = f"{MODEL_ID.split('/')[1]}-ecommerce-text-classification"

config = AutoConfig.from_pretrained(
    MODEL_ID, num_labels=len(label2id), id2label=id2label, label2id=label2id
)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, config=config)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

training_args = TrainingArguments(
    num_train_epochs=2,
    output_dir=REPOSITORY_ID,
    logging_strategy="steps",
    logging_steps=100,
    report_to="tensorboard",
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    fp16=False,  # Overflows with fp16
    learning_rate=3e-4,
    save_strategy="epoch",
    save_total_limit=2,
    load_best_model_at_end=False,
    push_to_hub=True,
    hub_strategy="every_save",
    hub_model_id=REPOSITORY_ID,
    hub_token="hf_token",
)

def tokenize_function(examples) -> dict:
    """Tokenize the text column in the dataset"""
    return tokenizer(examples["text"], padding="max_length", truncation=True)

def compute_metrics(eval_pred) -> dict:
    """Compute metrics for evaluation"""
    logits, labels = eval_pred
    if isinstance(
        logits, tuple
    ):  # if the model also returns hidden_states or attentions
        logits = logits[0]
    predictions = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, predictions, average="binary"
    )
    return {"precision": precision, "recall": recall, "f1": f1}

def train() -> None:
    """
    Train the model and save it to the Hugging Face Hub.
    """
    dataset = load_dataset("AutoModelForSequenceClassification")
    tokenized_datasets = dataset.map(tokenize_function, batched=True)
    nltk.download("punkt")
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=tokenized_datasets["train"],
        eval_dataset=tokenized_datasets["test"],
        compute_metrics=compute_metrics,
    )
    # TRAIN
    trainer.train()
    # SAVE AND EVALUATE
    tokenizer.save_pretrained(REPOSITORY_ID)
    trainer.create_model_card()
    trainer.push_to_hub()
    print(trainer.evaluate())

if __name__ == "__main__":
    train()
Hello! I am a senior-year undergraduate student in Applied Mathematics and Artificial Intelligence. For my bachelor's thesis, I want to try developing a machine learning model capable of analyzing medical images and predicting the progression of diseases, such as tumor growth. I was initially considering a CNN+LSTM architecture.
I'm having difficulty selecting a suitable medical dataset that contains sequential images of patients (e.g., series of MRI or CT scans, retinal images, X-rays of knee joints, etc.) that would allow tracking changes over time. Could you recommend any open medical datasets for such a task?
Alternatively, I had another idea for my thesis: to develop a machine learning-based system that analyzes annotated cranial CT exams using the RSNA Intracranial Hemorrhage Detection dataset, because it seems more feasible; but I do not know what model or architecture I could use to bring at least a bit of novelty into my research. That is the option I suggested to my research supervisor.
There has also been an idea to develop a machine learning-based system that analyzes a vocalist's data (timbre, range, voice type) and suggests (predicts) songs that match their style, range, and vocal characteristics. How feasible is this?
Perhaps there are simpler ideas for a thesis related to machine learning or computer vision that are suitable for someone starting out in this field?
Thanks in advance!
I am trying to predict the FFER. I am getting an error when trying to print the mean squared error:
ValueError: Found input variables with inconsistent numbers of samples: [5975, 4780]. However, I have a bigger issue: my code is not predicting correctly, and the graph at the bottom of the code shows two linear, parallel lines. Since the predictions are wrong, so is this graph. If someone could help me and look at my code, that would be much appreciated.
Code: https://github.com/bmccoy002/Federal_Funds_Rate
Title
I know 3D convolution works with depth (time, in our case), width, and height (which are spatial, ideal for images).
It's easy to understand how an image is represented by width and height. But how is time represented in videos?
Is it like positional encodings, where you use a sinusoidal encoding (which also gives you unique embeddings, right)?
I've been reading video synthesis papers (starting with VideoGPT; I have a solid understanding of image synthesis, and this is for my thesis), but I need to understand the basics first.
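For what it's worth, here's a tiny PyTorch check I put together of how a plain 3D convolution sees time: the video is just frames stacked along an extra axis of the input tensor (batch, channels, time, height, width), and the kernel slides along that time axis exactly as it slides over height and width; no positional encoding is involved there (sinusoidal or learned encodings come in when frames/patches are flattened into a token sequence for a transformer).

import torch
import torch.nn as nn

# A "video" batch: 2 clips, 3 colour channels, 16 frames, 64x64 pixels.
video = torch.randn(2, 3, 16, 64, 64)   # (N, C, T, H, W)

# A 3D convolution with a 3x3x3 kernel looks at 3 consecutive frames and a
# 3x3 spatial neighbourhood at every output position.
conv = nn.Conv3d(in_channels=3, out_channels=8, kernel_size=3, padding=1)
out = conv(video)
print(out.shape)   # torch.Size([2, 8, 16, 64, 64]); the time axis is preserved by the padding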
I've recently been learning about transformer architectures, and while there are a lot of things I still don't understand, one that stands out to me is how training actually works in the input embedding step. For instance, let's assume we are talking about an LLM. Each word is initially encoded using what is essentially a lookup table, and this encoded vector is then embedded in a larger abstract vector space with a dimension of our choosing. The dimensions do not have any inherent meaning, which I am totally fine accepting. The locations of the words in this vector space are initially random, and as the model trains, words that share similarities are supposed to get grouped closer together in the vector space.

My confusion is how this training is actually done during backpropagation. For instance, the attention mechanism can observe which words are often used together or even used interchangeably and therefore learn their similarity, but the attention weights are a separate set of weights from the input embedding weights. How is this then propagated back to the input embeddings so that they also learn what was deduced by the attention mechanism? Am I perhaps just misunderstanding how backpropagation is performed here?

To word this differently: I understand that during gradient descent the contribution of each weight to the overall loss is calculated, and the weights are then updated using the step size and the descent value, but since the dimensions of the abstract vector space have no inherent meaning, how does one make sense of what "direction" each word needs to move? Does it just move towards the target word or something?
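To make the question concrete, here is a tiny sketch I played with (toy vocabulary, made-up next-token objective): the embedding table is just another weight matrix, so after backprop each looked-up row has its own gradient, computed by the chain rule through the attention and output layers, and the update direction is simply the negative of that gradient in the abstract space.

import torch
import torch.nn as nn

torch.manual_seed(0)
vocab_size, d_model = 10, 4
emb = nn.Embedding(vocab_size, d_model)          # the lookup table / input embedding
attn = nn.MultiheadAttention(d_model, num_heads=1, batch_first=True)
out_proj = nn.Linear(d_model, vocab_size)        # predicts the next token

tokens = torch.tensor([[1, 2, 3]])               # a toy "sentence"
target = torch.tensor([4])                       # toy next-token target

x = emb(tokens)                                  # (1, 3, d_model)
h, _ = attn(x, x, x)                             # self-attention over the sentence
logits = out_proj(h[:, -1])                      # predict from the last position
loss = nn.functional.cross_entropy(logits, target)
loss.backward()

# Only the rows that were looked up get a gradient; its direction comes from
# backprop through the attention and output weights, not from any explicit
# "move toward this word" rule.
print(emb.weight.grad[1:4])   # non-zero rows for tokens 1, 2, 3
print(emb.weight.grad[0])     # zero row for an unused token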
I'm trying to implement a sinusoidal positional encoding. I found two solutions that give different encodings. I am wondering if one of them is wrong or both are correct. The only difference is that the second solution interleaves the sine and cosine embeddings. I showcase visual figures of the resulting encodings for both options.
Note: The first solution is used in DDPMs and the second in transformers. Why? Does it matter?
Solution (1):
Solution (2):
ps: If you want to check the code it's here https://stackoverflow.com/questions/79103455/should-i-interleave-sin-and-cosine-in-sinusoidal-positional-encoding
Hey y'all, I'm currently creating a custom CNN model to classify images. I want to do hyperparameter tuning (e.g. kernel size and filter count) with Keras Tuner. I also want to cross-validate the model using k-fold CV.
My question is: how do I do this? Do I do the tuning first and then k-fold separately, or do I do k-fold inside each trial of the tuning?
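The "k-fold inside each trial" version I was picturing looks roughly like this (an untested sketch; it assumes KerasTuner's documented option of overriding run_trial and returning the objective value, and build_model is a made-up stand-in for my CNN):

import numpy as np
import keras
from keras import layers
import keras_tuner as kt
from sklearn.model_selection import KFold

def build_model(hp):
    # Hypothetical CNN; the filter count and kernel size are the tuned hyperparameters.
    model = keras.Sequential([
        layers.Input(shape=(32, 32, 3)),
        layers.Conv2D(hp.Choice("filters", [16, 32, 64]),
                      hp.Choice("kernel_size", [3, 5]), activation="relu"),
        layers.GlobalAveragePooling2D(),
        layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    return model

class CVTuner(kt.RandomSearch):
    # Score each hyperparameter combination by k-fold CV and return the mean
    # validation loss as the objective the oracle minimizes.
    def run_trial(self, trial, x, y, **kwargs):
        hp = trial.hyperparameters
        val_losses = []
        for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(x):
            model = self.hypermodel.build(hp)
            model.fit(x[train_idx], y[train_idx], epochs=5, verbose=0)
            val_losses.append(model.evaluate(x[val_idx], y[val_idx], verbose=0)[0])
        return float(np.mean(val_losses))

x = np.random.rand(200, 32, 32, 3).astype("float32")
y = np.random.randint(0, 10, size=200)
tuner = CVTuner(hypermodel=build_model, max_trials=5, overwrite=True)
tuner.search(x, y)
print(tuner.get_best_hyperparameters(1)[0].values)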
Are there any recent preprocessing techniques, visualization libraries, or classification algorithms that are not yet widely adopted? I'm looking to incorporate cutting-edge methods into my project.
Hello Geeks!
I'm just heading toward GenAI after ML. I'm wondering if I can get a simple and accurate definition of GenAI that is understandable by almost everyone while still being technically sound. Let me know from your experience.
Thanks in advance
I'm a bit confused in this part of my college class. In online explanations and textbooks people say that the interpolation threshold tends to be when the number of model parameters equals the number of datapoints, but then they will show a visual aid which shows a simple model that has the same number of linear regions as datapoints... but I know that at least in simple models, each linear region usually corresponds to multiple parameters. Do we know which it is and why that's where the threshold is? Or what I might be misunderstanding?
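A small numerical check that helped me think about the "parameters equals datapoints" claim (my own sketch using plain least squares with random features, where each feature contributes exactly one parameter): the training error collapses to about zero exactly when the number of parameters reaches the number of datapoints, because that is when the linear system can be solved exactly.

import numpy as np

rng = np.random.default_rng(0)
n = 30                                   # number of datapoints
x = rng.uniform(-1, 1, size=n)
y = np.sin(3 * x) + 0.1 * rng.normal(size=n)

for p in [5, 15, 30, 60]:                # number of parameters (random cosine features)
    w = rng.normal(size=p)
    b = rng.uniform(0, 2 * np.pi, size=p)
    Phi = np.cos(np.outer(x, w) + b)     # n x p design matrix: one fitted weight per column
    coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    train_mse = np.mean((Phi @ coef - y) ** 2)
    print(f"p = {p:3d}  train MSE = {train_mse:.2e}")
# Train MSE drops to ~0 once p >= n = 30: the model can interpolate the data exactly.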
Guys, I have an academic project about machine learning for detecting incidents and I'm lost.
I'm trying to create a module for risk analysis and attack detection. Any feedback, please?
I'm currently working on a ML algorithm for providing user content based on certain features. I'm not measuring any implicit interaction, but I can't find any resources on how to actually 'weigh' the explicit features' impacts. Any resources or recommendations would be great (I could also elaborate or provide code, just not sure if we're allowed to do so).
I want to predict chess pieces on a custom dataset. Should I have a class for each piece regardless of color (e.g. pawn, rook, bishop, etc) and then predict the color separately with a simple architecture or should I just have a class for each piece with its color (e.g. w-pawn, b-pawn, w-rook, b-rook, etc)?
I feel like the actual object detection model should focus on the features of the object rather than the color, but the color might be so trivial that I could just split each piece into the two color classes.
Hello,
A few days ago, I posted seeking guidance and collaboration in ML research: Seeking Guidance on Breaking into ML Research. Unfortunately, due to a lack of time and of researchers willing to collaborate, I decided to write a paper myself. Although the paper was rejected by arXiv, I would like to ask the community for feedback so I can correct it and learn more about the research process.
If anyone has some time to check a short paper (10 pages) and is willing to help me, I'm providing the paper along with the code. Your feedback would be greatly appreciated!
Paper: Scaling Down Transformers: Investigating Emergent Phenomena in Tiny Models
Code: GitHub Repository
This is a simple attempt to write a paper for publishing, and once I understand how scientific literature is written, I hope to produce better and more advanced work in the future. Thank you in advance for your help!