/r/MachineLearning

2,930,362 Subscribers

6

[D] Monthly Who's Hiring and Who wants to be Hired?

For job postings, please use this template

Hiring: [Location], Salary:[], [Remote | Relocation], [Full Time | Contract | Part Time] and [Brief overview, what you're looking for]

For those looking for jobs, please use this template

Want to be Hired: [Location], Salary Expectation:[], [Remote | Relocation], [Full Time | Contract | Part Time] Resume: [Link to resume] and [Brief overview, what you're looking for]

Please remember that this community is geared towards those with experience.

0 Comments
2024/10/31
02:30 UTC

2

[D] good second tier conference/journal on medical image analysis?

I think the method is solid, or at least borderline for a TMI (Transactions on Medical Imaging) level journal, but the experiments are not thorough enough. It got rejected by TMI, and since I'm graduating soon I'm trying not to put too much extra effort into the paper. I'm only familiar with the top-tier journals and conferences and a few of the second-tier ones, but none seems to be a good fit for the paper.

So I'm looking for a journal that takes medical imaging papers and has a reasonable turnaround time, or a conference with an upcoming deadline, that focuses on technical novelty but is not too rigorous about the experiments.

Thanks very much in advance!

6 Comments
2024/10/30
21:46 UTC

2

[P] PyTorch Quantization of model parameters for deployment on edge device

Essentially, I have a trained PyTorch model that I want to deploy on an edge device (all written in C/C++) for inference. For context, I'm working alone on this project, so I don't get much guidance. My understanding is that, at deployment, the input (the inference data) needs to be integers, and my model's parameters (weights and biases/activations) also need to be integers. Because I don't have "inference data", I am currently testing out my implementation/prototyping by quantizing my validation/test data and comparing the validation/test results I get using the floating-point model parameters against the results I get using quantized/integer model parameters. To make this more concrete (or succinct), I'm testing with two cases:

  • Case 1: floating point model called on floating point train and test data.

  • Case 2: quantized int model parameters called on quantized test data.

def quantize_tensor(tensor, num_bits):
    # Signed integer range for the given bit width
    qmin = -(2 ** (num_bits - 1))
    qmax = (2 ** (num_bits - 1)) - 1
    min_val, max_val = tensor.min(), tensor.max()

    # Affine quantization: q = round(x / scale + zero_point)
    scale = (max_val - min_val) / (qmax - qmin)
    zero_point = qmin - min_val / scale
    zero_point = torch.round(zero_point).clamp(qmin, qmax)

    q_tensor = torch.round(tensor / scale + zero_point).clamp(qmin, qmax)

    if num_bits == 8:
        q_tensor = q_tensor.type(torch.int8)
    elif num_bits == 16:
        q_tensor = q_tensor.type(torch.int16)
    else:
        q_tensor = q_tensor.type(torch.int)

    return q_tensor, scale, zero_point
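Since the forward mapping above is q = round(x / scale + zero_point), the float value can be recovered as x ≈ scale * (q - zero_point). A minimal sketch of the inverse helper (a hypothetical addition, not part of my current code) that would let integer outputs be mapped back to float before softmax:

def dequantize_tensor(q_tensor, scale, zero_point):
    # Inverse of quantize_tensor: x ≈ scale * (q - zero_point)
    return scale * (q_tensor.to(torch.float32) - zero_point)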

Then I quantize the model's weights and biases using this:

def quantize_model(model, weight_bit_width=16, bias_bit_width=16):
    quantized_state_dict = {}
    scale_zp_dict = {}  # To store scale and zero-point for each parameter

    for name, param in model.state_dict().items():
        if 'weight' in name:
            q_param, scale, zero_point = quantize_tensor(param, weight_bit_width)
            quantized_state_dict[name] = q_param
            scale_zp_dict[name] = (scale, zero_point)
        elif 'bias' in name:
            q_param, scale, zero_point = quantize_tensor(param, bias_bit_width)
            quantized_state_dict[name] = q_param
            scale_zp_dict[name] = (scale, zero_point)
        else:
            # For other parameters, keep them as is or apply appropriate quantization
            quantized_state_dict[name] = param

    return quantized_state_dict, scale_zp_dict

Furthermore, I quantize my model and the data like so (see code below). However, because my ML problem is multiclass and multioutput, I need to call torch.softmax on the logits my model produces to get prediction probabilities, but softmax is not implemented for integer tensors. This makes me worried that my overall quantization approach is wrong (I add the model's code and extra detail below):

import copy 

class model(nn.Module):
    def __init__(self, inputs, l1, l2, num_outputs, output_classes=3):
        super().__init__()

        # define the layers
        self.output_classes = output_classes
        self.num_outputs = num_outputs

        self.layers = nn.Sequential(
            nn.Linear(inputs, l1),
            nn.ReLU(),
            nn.Linear(l1, l2),
            nn.ReLU(),
            nn.Linear(l2, num_outputs * output_classes),  # output_classes = number of classes in each output
        )

    def forward(self, x):
        x = self.layers(x)
        x = x.view(-1, self.output_classes, self.num_outputs)  # Reshapes output tensor (logits output).
        return x


model_copy = copy.deepcopy(floating_point_trained_model)

# quantize model params
quantized_state_dict, scale_zp_dict = quantize_model(model_copy, weight_bit_width=16, bias_bit_width=16)
for name, param in model_copy.named_parameters():
    param.requires_grad = False
    param.data = quantized_state_dict[name].to(dtype=torch.float) # <--- Need help here: Casting to float to satisfy softmax requirements 

# Quantize data
Quant_X_train, scale, zp = quantize_tensor(X_train, 16)  # quantize X_train
Quant_X_test, test_scale, test_zp = quantize_tensor(X_test, 16)  # quantize X_test

# Call quantized model on quantized input data
pred_probs = torch.softmax(model_copy(Quant_X_test.to(torch.float)), dim=1)  # <--- Need help: casting to float to get prediction probabilities
predictions = torch.argmax(pred_probs, dim=1)

I'm curious about a few things:

  • Whether this is the correct process/way to approach this problem.
    • Especially because I am not able to call softmax on my int tensor, I feel I might be doing something wrong.
  • Whether I implemented the quantization procedures accurately.
    • i.e., does my method of verifying make sense (comparing the results between case 1 and case 2 above)?
  • If anyone has guidance on how to approach this problem (or example tutorials), that would be great. I have perused PyTorch's documentation on quantization modes.

If it helps, this is an example of what my training data looks like:

0      0.995231  0.996840  1.000000  0.998341  1.000000  1.000000  1.000000  0.998709  ...         0.000024         0.000019         0.000015         0.000016         0.000011         0.000007         0.000007         0.000015
1      0.996407  0.998568  1.000000  0.997889  1.000000  0.999954  0.999738  0.997458  ...         0.000018         0.000013         0.000011         0.000012         0.000008         0.000005         0.000006         0.000009
2      0.996083  0.999702  1.000000  0.999031  1.000000  1.000000  0.999816  0.998727  ...         0.000019         0.000013         0.000012         0.000011         0.000008         0.000006         0.000006         0.000011
3      0.998531  0.999481  0.999199  1.000000  0.999720  1.000000  1.000000  0.998682  ...         0.000015         0.000011         0.000010         0.000010         0.000007         0.000005         0.000004         0.000007
1 Comment
2024/10/30
20:34 UTC

73

[D] I’m an ML/programming educator - I was invited as ceo of codesmith to Berlin Global Dialogue (tech/AI insider conference) - see what they said behind closed doors - AMA

Edit (5pm PT): Thanks so much all for really great questions - I'm going to pause now but will take a look over next 24 hours and try to answer any more questions. V grateful for chance to do this and to others who helped answer some of the Qs too from their perspective (shoutout u/Rebeleleven)

--

I recently had the opportunity to attend the Berlin Global Dialogue, which has been likened to Davos but with a stronger focus on technology and AI. The lineup was impressive: Hermann Hauser, the founder of ARM, executives from OpenAI and ASML, and a mix of founders from emerging startups tackling everything from quantum ML to supply chain optimization. Even leaders like President Macron and the German Vice Chancellor were there, engaging with critical tech issues that impact us all.

As the CEO of Codesmith – a small, independent tech school with a data science and machine learning research group (last year we contributed to TensorFlow) – I was invited to announce our latest endeavor: Codesmith’s AI & ML Technical Leadership Program.

I shared this experience in an AMA on r/technology and had a great conversation—but the depth of questions around ML/AI didn’t quite match what I’d hoped to explore. I spoke to the mods here and am grateful for them supporting this AMA. 

Proof: https://imgur.com/a/bYkUiE7

My real passion, inherited from my parents who were both educators, is teaching and making ML more accessible to a broader audience. I’m currently developing an AI/ML workshop for Frontend Masters, and I want to hear from those navigating the ML field. What’s the biggest challenge you're facing in this space?

A few of my takeaways from the event:

  • Chip manufacturers are shifting to new architectures rather than further miniaturization due to physical limits. High-bandwidth memory (HBM) is a central focus for future roadmaps.
  • Europe is fixated on finding a ‘tech champion,’ but there's a distinct emphasis on core industries rather than consumer internet—think ASML and ARM.
  • Quantum ML is gaining momentum and receiving government support, particularly for applications like climate forecasting (e.g., Germany’s Klim-QML initiative). While promising, these efforts are still in the prototype phase.
  • There was also, candidly, a lot of talk without much substance. Even OpenAI execs demonstrated a need for more leaders with deep technical insights.

Looking forward to diving deeper into these issues and the broader challenges in ML/AI in an AMA!

43 Comments
2024/10/30
19:31 UTC

2

[D] Is there a preferred data distribution for CNN models to handle?

Hi everyone,
Do you know of any work that analyzes the performance of a CNN-based model w.r.t. the training data distribution? Meaning, are some distributions easier for the model to learn its task on than others?
For example, say I'm training a model to do object detection on images, and I see that day images get better performance than night images (with the same amount of data). I wonder if I can explain this in some analytical way.
Thanks!

4 Comments
2024/10/30
19:24 UTC

2

[D] Local LLaMA based LLM for Technical Document Search | Help!

I wanted to make an LLM that could search through around 60k technical documents (about 50,000 characters each) and retrieve information from them semantically. The final model I envisioned would know those technical documents, so I could prompt it to find something similar to the information it already knew, or something exact.

  • My initial approach was to fine-tune the LLM on these documents and then query it. But after researching, I learned that this is very resource-heavy and the model often hallucinates a lot.
  • I came across RAG and semantic RAG recently and I'm currently reading about them. Could this work for my use case? Or is there anything else you can suggest? One issue in my mind with RAG: say I ask the model something vague and the vector database returns the top-k nearest-neighbor vectors, which I pass to my LLM along with the original prompt. What if the information is not completely contained in the top-k nearest neighbors, or the LLM's context window is not big enough?
  • Another concern is that with RAG, wouldn't LLM inference become a lot more resource-heavy due to the large input token count? (A minimal retrieval sketch is below.)
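For reference, the retrieval half of a RAG pipeline can be quite lightweight. A minimal sketch using sentence-transformers and FAISS; the model name and chunk sizes are illustrative assumptions, and `documents` is a placeholder for the 60k texts:

from sentence_transformers import SentenceTransformer
import faiss
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed small local embedding model

# Chunk each document so a hit returns a focused passage, not 50k characters
def chunk(text, size=1000, overlap=200):
    return [text[i:i + size] for i in range(0, len(text), size - overlap)]

chunks = [c for doc in documents for c in chunk(doc)]  # `documents` = your corpus
embeddings = model.encode(chunks, normalize_embeddings=True)

index = faiss.IndexFlatIP(embeddings.shape[1])  # inner product == cosine on normalized vectors
index.add(np.asarray(embeddings, dtype="float32"))

# Retrieve top-k chunks for a query, then pass them to the LLM with the prompt
q = model.encode(["my vague question"], normalize_embeddings=True)
scores, ids = index.search(np.asarray(q, dtype="float32"), k=5)
context = "\n\n".join(chunks[i] for i in ids[0])

Note that only the retrieved chunks (not whole documents) go into the prompt, which keeps the input token count bounded.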

Could you guys comment on any of this?

PS: I know this is a large question. I'm a bit new to ML and NLP and learning about it. Also sorry about my English, I'm not a native speaker.

1 Comment
2024/10/30
18:22 UTC

0

[R] Our results experimenting with different training objectives for an AI evaluator

Hey r/MachineLearning!

Lots of research has been published around LLM-as-a-judge, as it's becoming a popular approach for cheap, fast evaluation.

A pretty cool paper recently came out from the Salesforce AI Research team; tl;dr: they found that preference optimisation techniques like DPO and RPO could yield better results than supervised fine-tuning (SFT) alone as a training objective for LLM-as-a-judge models. We wanted to test this hypothesis, as it's not yet clear which training objective performs best for aligning eval models.

Our experiments

We trained a Llama-3.1-70B-Instruct with SFT and compared it to base Llama-3.1-70B-Instruct on core benchmarks to see how SFT fares alone.

We also trained a Llama-3.1-8B-Instruct model on two training datasets with:

  1. Purely SFT
  2. DPO
  3. RPO (a compound loss objective that incorporates both SFT and DPO)

and compared their performance against the base model across four core benchmarks.
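For context, the standard DPO objective (Rafailov et al., 2023) and an RPO-style compound loss look roughly like this; the λ weighting shown is a common formulation, not necessarily the exact one used in the paper:

\mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]

\mathcal{L}_{\mathrm{RPO}} = \mathcal{L}_{\mathrm{DPO}} + \lambda\,\mathcal{L}_{\mathrm{SFT}}, \qquad \mathcal{L}_{\mathrm{SFT}} = -\,\mathbb{E}_{(x,\,y_w)}\left[\log \pi_\theta(y_w \mid x)\right]

Here y_w and y_l are the preferred and rejected judgments, and π_ref is the frozen reference policy.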

Here's a summary of our key findings:

https://preview.redd.it/755s8f3rnjxd1.png?width=1423&format=png&auto=webp&s=e7841d170d27629b5f347dc64449250df6a12614

  • DPO performed best on PreferenceCollection with 98.89% accuracy
  • RPO performed best on RewardBench with 81.96% accuracy
  • RPO outperformed both SFT and DPO on UltraFeedback (No CoT), with a score of 0.57
  • RPO achieved the highest average Pearson correlation on evaluation scores (0.49), compared to SFT (0.43) and DPO (0.43)

https://preview.redd.it/ic9fjvlsojxd1.png?width=1453&format=png&auto=webp&s=46b225f6750f6be97f0abca558b020dcbcd13963

  • SFT showed improvements on in-distribution tasks whereas quality dropped on out-of-distribution tasks, underperforming base Llama-70B on aggregate metrics

If you want the details, here's our blog post with extra information on why we think this works. We're working on scaling this up and seeing how far we can push this thing now :)

Open questions for you all

  • Will this trend hold for larger models?
  • What kind of data might be particularly useful for training an LLM-as-a-judge?
0 Comments
2024/10/30
18:09 UTC

30

[D] How do you manage your (read and to-read) research papers?

I'm kind of new to the field of research, and over the past year I've probably read over 100 research papers, but I feel as though I don't retain a lot of the information and I forget a lot of the papers that I've read. I'm curious what people who have been in the industry longer use for organization.

I've tried Zotero, but I haven't really been a big fan

21 Comments
2024/10/30
18:01 UTC

5

[R] Torchtune - How to finetune custom models?

I'm wondering how I can get started finetuning my custom model with torchtune LoRA. Does anyone have any documentation or suggestions?
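Torchtune's built-in recipes revolve around its own model builders, so a custom model typically means wiring LoRA into the model yourself. As a rough illustration of what a LoRA adapter does (a plain-PyTorch sketch, not torchtune's actual API):

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update (illustrative sketch)."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # freeze the pretrained weight and bias
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)  # adapter starts as a no-op
        self.scaling = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scaling * self.lora_b(self.lora_a(x))

# Usage: swap selected nn.Linear layers in the custom model for LoRALinear(base_layer),
# then train only the lora_a / lora_b parameters.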

2 Comments
2024/10/30
15:29 UTC

10

[R] Riemannian Generative Models

Hi everyone,

I'm currently interested in exploring generative models defined over Riemannian manifolds. Though the idea is theoretically appealing, I have trouble understanding the practical motivation behind this approach, and whether any useful or large-scale model has been developed based on it lately.

To be more precise, I am looking at the following set of papers.

Generalizing diffusion models to the Riemannian setting :

Riemannian Diffusion Models, Riemannian Score-Based Generative Modelling

Scaling these models:

Scaling Riemannian Diffusion Models

I don't understand how impactful the experimental results really are, or what the interest in these models is, whether in industry or in the research community.

If anyone has thoughts on these questions, I'd be happy to start a discussion here. I'd be extremely grateful for your insights! Thanks for any help.

4 Comments
2024/10/30
14:23 UTC

13

[D] Is INRIA (France) a good place for UG to do ML research internship?

I am a student conducting research on MAB/online algorithms, and I see there are very few people doing this in the USA. However, I found there is a noticeable number of researchers doing this at INRIA, the one in France if you don't know it. Is anyone familiar with this institution? As an undergraduate from a non-EU country, is it possible for me to intern there on a voluntary basis during summer break, if my goal is to get a recommendation letter and publish a paper?

8 Comments
2024/10/30
14:08 UTC

5

[D] Classification approaches for short text, many categories?

Hi - I am dealing with an issue where I will likely have many thousands of short text snippets (think 2-4 sentences each), and I need to assess the extent to which each snippet is consistent with each of about ~200 categories (that is, a piece of text may fit "best" into one category, but it's also possible that a few other categories are "reasonable"). Getting huge amounts of text labeled may be an effort, so I'm especially interested in things like few-shot approaches. (Or maybe even a bootstrap approach -- not the statistical technique, the concept -- where we develop a quick-and-dirty classification model and use it to help raters label another, larger tranche faster. Which obviously has potential drawbacks in terms of bias, etc., but may have offsetting practical benefits.)

My background is mostly in traditional/Bayesian statistics (think linear models and factor analysis), so I'm a little out of the loop on good approaches to a task like this. The environment where this analysis will happen will not have any fancy LLMs, and no access to internet-based platforms (Huggingface, OpenAI, etc.). No GPUs either, so any fine-tuning that might be needed has to take that into consideration. The obvious (to me, a not-NLP person) starting point seems like BERT with a normal classifier. But there are so many variations of BERT and similar models (Universal Sentence Encoders?), and I'm not sure which ones are better for short text. I am aware of the huggingface leaderboards, which I've looked over, but it wasn't immediately clear to me which models are best for short text classification.

So if anyone has suggestions for thoughts on potential approaches to look into, I'd really appreciate it.
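One cheap baseline that fits the no-GPU, offline constraint: embed both snippets and short category descriptions with a small sentence encoder (downloaded on a connected machine and copied over), and score by cosine similarity. A sketch, with the model name and paths as illustrative assumptions:

from sentence_transformers import SentenceTransformer, util

# Assumes the model files were fetched elsewhere and copied to this machine
model = SentenceTransformer("/path/to/local/all-MiniLM-L6-v2")

category_descriptions = ["...", "..."]  # ~200 short descriptions, one per category
snippets = ["...", "..."]               # the 2-4 sentence texts

cat_emb = model.encode(category_descriptions, normalize_embeddings=True)
snip_emb = model.encode(snippets, normalize_embeddings=True)

# Cosine similarity matrix: rows = snippets, cols = categories;
# the top few columns per row give the "reasonable" categories
scores = util.cos_sim(snip_emb, cat_emb)
top_categories = scores.topk(k=5, dim=1).indices

With a small labeled set, the same embeddings can feed a logistic-regression head, which trains comfortably on CPU.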

6 Comments
2024/10/30
13:36 UTC

14

[D] COLING 2025 Results / rebuttals

I'll go first.

Soundness: 3,3,4

Overall: 2,2,3

🥺

14 Comments
2024/10/30
11:01 UTC

35

[D] Does anyone here work in healthcare?

I'm curious about the cool things people around the world are doing with data in this area of work at the moment.

15 Comments
2024/10/30
10:50 UTC

1

[P] Opik 1.0: Open source LLM evaluations

Hey all!

My colleagues and I have released version 1.0 of our open source LLM evaluation framework, and I wanted to share it here for feedback/visibility. With this first major release, we've focused on a few key areas:

  • Out-of-the-box implementations of popular LLM-as-a-judge metrics, as well as "traditional" heuristic metrics, along with a clean API for defining custom metrics.
  • Configurable LLM tracing, with a nice UI for visualizing traces/spans. Also supports automatic tracing for OpenAI and LiteLLM.
  • Version-controlled datasets for running eval experiments.

If you have time to check out the repo and share any feedback or questions, I'd really appreciate it. It's still early days, but we've been blown away by the community response so far, and we're excited to get more input as we continue to work on the project.

Repo Link: https://github.com/comet-ml/opik

0 Comments
2024/10/29
19:37 UTC

1

[D] Predicting happiness from survey data

I have a dataset containing survey data with 39 variables, e.g. perfect.physical.health scored -2, -1, 0, 1, 2. Now I want to predict happiness, which is a decimal value. How do I approach this problem?
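Since the target is continuous, this is a standard regression setup. A minimal baseline sketch with scikit-learn; the file path and column name are placeholders:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

df = pd.read_csv("survey.csv")          # placeholder path
X = df.drop(columns=["happiness"])      # the 39 ordinal survey variables
y = df["happiness"]                     # continuous target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = RandomForestRegressor(n_estimators=300, random_state=0)
model.fit(X_train, y_train)
print("MAE:", mean_absolute_error(y_test, model.predict(X_test)))

A plain linear regression is worth running first as a sanity check, since the ordinal features may relate to happiness roughly linearly.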

2 Comments
2024/10/30
09:12 UTC

0

[D] Problem with graph based-VAE on molecular dynamics trajectory.

Recently I saw someone post a query regarding graph-based VAE construction on MD trajectory data. I am facing a similar problem as well. This is the code I have generated so far. As I am not a professional coder myself, coming from a chemistry background, I mostly relied on chatbots to generate the code, but the problem is that the model has some serious issues with dimensionality.

import numpy as np
import random
import MDAnalysis as mda
import networkx as nx
import torch
import torch.nn as nn
import torch.optim as optim
from torch_geometric.data import Data
from torch_geometric.loader import DataLoader  # moved here from torch_geometric.data in recent PyG versions
from torch_geometric.nn import GCNConv
from Bio.PDB import PDBIO, Structure, Model, Chain, Residue, Atom
import matplotlib.pyplot as plt
from sklearn.model_selection import ParameterGrid
from tqdm import tqdm
import pandas as pd

# Load MD trajectory and select C-alpha atoms
u = mda.Universe('synuclein.top', 'short.nc')
ca_atoms = u.select_atoms("name CA")

# Define the amino acid sequence in three-letter code
sequence_one_letter = "MDVFMKGLSKAKEGVVAAAEKTKQGVAEAAGKTKEGVLYVGSKTKEGVVHGVATVAEKTKEQVTNVGGAVVTGVTAVAQKTVEGAGSIAAATGFVKK"

amino_acid_1_to_3 = {
    'A': 'ALA', 'C': 'CYS', 'D': 'ASP', 'E': 'GLU', 'F': 'PHE',
    'G': 'GLY', 'H': 'HIS', 'I': 'ILE', 'K': 'LYS', 'L': 'LEU',
    'M': 'MET', 'N': 'ASN', 'P': 'PRO', 'Q': 'GLN', 'R': 'ARG',
    'S': 'SER', 'T': 'THR', 'V': 'VAL', 'W': 'TRP', 'Y': 'TYR'
}

sequence = [amino_acid_1_to_3[aa] for aa in sequence_one_letter]

# One-hot encoding for amino acids
amino_acid_types = {
    'ALA': 0, 'CYS': 1, 'ASP': 2, 'GLU': 3, 'PHE': 4,
    'GLY': 5, 'HIS': 6, 'ILE': 7, 'LYS': 8, 'LEU': 9,
    'MET': 10, 'ASN': 11, 'PRO': 12, 'GLN': 13, 'ARG': 14,
    'SER': 15, 'THR': 16, 'VAL': 17, 'TRP': 18, 'TYR': 19
}

# Function to convert an amino acid sequence to one-hot encoding
def one_hot_encode(sequence):
    num_amino_acids = len(amino_acid_types)
    features = np.zeros((len(sequence), num_amino_acids))
    for i, aa in enumerate(sequence):
        if aa in amino_acid_types:
            features[i, amino_acid_types[aa]] = 1
    return features

# Generate node features for the amino acid sequence
node_features = one_hot_encode(sequence)

# Define the contact map based on CA distances
threshold_distance = 8.0  # Distance threshold in angstroms
num_amino_acids = len(sequence)

# Prepare data for PyTorch Geometric for all frames
data_list = []
num_frames = len(u.trajectory)
for frame in tqdm(range(num_frames), desc="Processing Frames"):
    u.trajectory[frame]
    ca_atoms = u.select_atoms("name CA")

    # Create a contact graph
    contact_graph = nx.Graph()
    for i in range(num_amino_acids):
        contact_graph.add_node(i, features=node_features[i])

    # Add edges based on CA distances
    for i in range(num_amino_acids):
        for j in range(i + 1, num_amino_acids):
            distance = np.linalg.norm(ca_atoms.positions[i] - ca_atoms.positions[j])
            if distance <= threshold_distance:
                contact_graph.add_edge(i, j)

    # Prepare data for PyTorch Geometric
    edge_index = torch.tensor(list(contact_graph.edges), dtype=torch.long).t().contiguous()
    x = torch.tensor(node_features, dtype=torch.float)
    data = Data(x=x, edge_index=edge_index)
    data_list.append(data)

    # Plot and save the contact map for every 500th frame
    if frame % 500 == 0:
        contact_map = np.zeros((num_amino_acids, num_amino_acids))
        for i, j in contact_graph.edges:
            contact_map[i, j] = 1
            contact_map[j, i] = 1
        plt.imshow(contact_map, cmap='binary')
        plt.title(f"Contact Map for Frame {frame}")
        plt.xlabel("Residue Index")
        plt.ylabel("Residue Index")
        plt.savefig(f"contact_map_frame_{frame}.png")
        pd.DataFrame(contact_map).to_csv(f"contact_map_frame_{frame}.csv", index=False)

class GCNEncoder(nn.Module):
    def __init__(self, in_channels, hidden_channels, num_layers):
        super(GCNEncoder, self).__init__()
        self.convs = nn.ModuleList()
        self.fc_mu = nn.Linear(hidden_channels, hidden_channels)
        self.fc_logvar = nn.Linear(hidden_channels, hidden_channels)
        # Create multiple GCN layers
        for _ in range(num_layers):
            self.convs.append(GCNConv(in_channels, hidden_channels))
            in_channels = hidden_channels  # Update input channels for the next layer

    def forward(self, x, edge_index):
        for conv in self.convs:
            x = conv(x, edge_index)
            x = torch.relu(x)  # Activation function
        mu = self.fc_mu(x)
        logvar = self.fc_logvar(x)
        return mu, logvar

class GCNDecoder(nn.Module):
    def __init__(self, hidden_channels, out_channels):
        super(GCNDecoder, self).__init__()
        self.fc = nn.Linear(hidden_channels, out_channels)

    def forward(self, z):
        return torch.sigmoid(self.fc(z))

class GCNVAE(nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels, num_layers):
        super(GCNVAE, self).__init__()
        self.encoder = GCNEncoder(in_channels, hidden_channels, num_layers)
        self.decoder = GCNDecoder(hidden_channels, out_channels)

    def reparameterize(self, mu, logvar):
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return mu + eps * std

    def forward(self, x, edge_index):
        mu, logvar = self.encoder(x, edge_index)
        z_sample = self.reparameterize(mu, logvar)
        return self.decoder(z_sample), mu, logvar

def loss_function(recon_x, x, mu, logvar):
    BCE = nn.functional.binary_cross_entropy(recon_x, x, reduction='sum')
    KLD = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return BCE, KLD, BCE + KLD  # Return BCE, KLD, and total loss

def train_model(model, data_loader, optimizer, epochs, early_stopping_patience=5):
    model.train()
    best_loss = float('inf')
    patience_counter = 0
    for epoch in range(epochs):
        total_loss = 0
        total_bce = 0
        total_kld = 0
        for data in tqdm(data_loader, desc=f"Training Epoch {epoch+1}/{epochs}"):
            optimizer.zero_grad()
            recon_batch, mu, logvar = model(data.x, data.edge_index)
            bce, kld, total = loss_function(recon_batch, data.x, mu, logvar)
            total_loss += total.item()
            total_bce += bce.item()
            total_kld += kld.item()
            total.backward()
            optimizer.step()
        avg_loss = total_loss / len(data_loader)
        avg_bce = total_bce / len(data_loader)
        avg_kld = total_kld / len(data_loader)
        print(f"Epoch {epoch+1}/{epochs} - Total Loss: {avg_loss:.4f}, BCE Loss: {avg_bce:.4f}, KLD Loss: {avg_kld:.4f}")
        # Early stopping
        if avg_loss < best_loss:
            best_loss = avg_loss
            patience_counter = 0
        else:
            patience_counter += 1
            if patience_counter >= early_stopping_patience:
                print("Early stopping triggered.")
                break

# Create a DataLoader
data_loader = DataLoader(data_list, batch_size=1, shuffle=True)

# Hyperparameter grid
# NOTE: activation_function, batch_size and latent_dimensions are listed here
# but never actually used when building the model or DataLoader below.
param_grid = {
    'hidden_channels': [16, 32, 64],
    'num_layers': [2, 3, 4],
    'activation_function': ['relu', 'tanh', 'sigmoid'],
    'batch_size': [1, 2, 4],
    'latent_dimensions': [16, 32, 64],
    'learning_rate': [0.001, 0.01, 0.1],
    'epochs': [50, 100, 200]
}

# Perform hyperparameter tuning
best_loss = float('inf')
best_params = {}
for params in ParameterGrid(param_grid):
    model = GCNVAE(in_channels=20, hidden_channels=params['hidden_channels'],
                   out_channels=20, num_layers=params['num_layers'])
    optimizer = optim.Adam(model.parameters(), lr=params['learning_rate'])
    print(f"Training with parameters: {params}")
    train_model(model, data_loader, optimizer, params['epochs'], early_stopping_patience=5)

    # Evaluate the model (using training loss as a proxy)
    model.eval()
    total_loss = 0
    total_bce = 0
    total_kld = 0
    with torch.no_grad():
        for data in data_loader:
            recon_batch, mu, logvar = model(data.x, data.edge_index)
            bce, kld, total = loss_function(recon_batch, data.x, mu, logvar)
            total_loss += total.item()
            total_bce += bce.item()
            total_kld += kld.item()
    avg_loss = total_loss / len(data_loader)
    avg_bce = total_bce / len(data_loader)
    avg_kld = total_kld / len(data_loader)
    print(f"Average loss: {avg_loss:.4f}, BCE Loss: {avg_bce:.4f}, KLD Loss: {avg_kld:.4f}")
    if avg_loss < best_loss:
        best_loss = avg_loss
        best_params = params

print(f"Best parameters: {best_params} with loss: {best_loss}")

# Final training with the best parameters
final_model = GCNVAE(in_channels=20, hidden_channels=best_params['hidden_channels'],
                     out_channels=20, num_layers=best_params['num_layers'])
final_optimizer = optim.Adam(final_model.parameters(), lr=best_params['learning_rate'])
train_model(final_model, data_loader, final_optimizer, best_params['epochs'], early_stopping_patience=5)

I know the code is quite long, but I want to know whether it is correct. I have a trajectory of 500 frames and 97 residues (corresponding to 97 C-alpha atoms). Once this code is done, I want to generate protein configurations from the latent space, so I want to ensure the code is running fine. Thanks a lot in advance.
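For the "generate configurations from the latent space" step, the usual pattern is to sample z from the prior and run it through the decoder; a minimal sketch against the GCNVAE above:

final_model.eval()
with torch.no_grad():
    # One latent vector per residue; the size must match the trained hidden_channels
    z = torch.randn(97, best_params['hidden_channels'])
    generated = final_model.decoder(z)  # shape (97, 20): per-residue feature probabilities

# Caveat: as written, the decoder reconstructs one-hot amino-acid features, not 3D
# coordinates, so by itself it cannot produce new structures to seed MD runs; a decoder
# that outputs coordinates or a distance/contact map would be needed for that.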

3 Comments
2024/10/30
07:26 UTC

5

[R] What's there yet to improve in speech technologies? What's there left in speech research?

Hi everyone, I am currently researching speech technologies as an undergrad, mainly focusing on improving applications for the visually challenged. I am new to this niche area of research, so I want to pick a research topic that addresses some of the existing issues in current tech. So far, ElevenLabs seems to be the SOTA. I would like to know whether there is anything left to improve in TTS, speech-to-speech, voice cloning, deepfake audio detection, etc. Any insights on ethical issues or the need for guardrails in the future would also be helpful. Also, since my university provides only limited compute, I cannot take on research involving scaling or multilingual models.

10 Comments
2024/10/30
06:43 UTC

90

[D] How do you structure your codebase and workflow for a new research project?

Suppose you have got a new idea about a solution to a problem in the domain you are working in. How do you go about implementing the thing from the ground up?

What is the general structure of the codebase you construct for your project?

How do you go about iteratively training and testing your solution until you arrive at a final solution where you can write a paper for publication?

Is there any design recipe you follow? Where did you learn it from?
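For concreteness, one common layout (an illustration only; conventions vary by lab and project):

project/
    configs/          # YAML/JSON experiment configs, one per run
    data/             # raw and processed datasets (usually gitignored)
    src/
        datasets.py   # dataset and dataloader definitions
        models.py     # model architectures
        train.py      # training loop, checkpointing, logging
        evaluate.py   # metrics and test-time evaluation
    scripts/          # one-off plotting / preprocessing scripts
    experiments/      # saved checkpoints, logs, results per run
    tests/            # quick sanity checks for data and model shapes

The key design choice is separating configs from code so every paper result maps to a config file plus a saved run directory.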

16 Comments
2024/10/30
05:43 UTC

11

[D] Voices Separation Pipeline

Let's suppose I have audio from karaoke with

  1. Music
  2. Several voices singing (A, B, C)
  3. Random noise

Let's suppose I know exactly how many main sources are on the tape, and I want to

  1. Clear the noise
  2. Extract voice B from the tape and return audio with music and A and B vocals.

I have several questions and appreciate any help.

  1. Are there any models that can help me with such separation (pre-trained, or not needing training)?

  2. If not, I have some ideas about a possible solution pipeline and would appreciate any comments:

2.1. Separate the instrumental music from everything else (what model can I use to do that?)
2.2. Clear the noise from the audio without music (what model can I use for that?)
2.3. Separate the voices (how?) and delete the waveform I don't need.
2.4. Put everything I need back together.
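For step 2.1, source-separation models like Demucs are a usual starting point. A sketch calling its documented CLI from Python, assuming demucs is pip-installed (flags worth double-checking against the repo):

import subprocess

# Split "karaoke.wav" into vocals and accompaniment ("no_vocals") stems.
# --two-stems=vocals is Demucs's documented option for a vocals/rest split.
subprocess.run(
    ["demucs", "--two-stems=vocals", "-o", "separated", "karaoke.wav"],
    check=True,
)
# Stems land under separated/<model_name>/karaoke/{vocals.wav,no_vocals.wav}.
# Separating the individual singers (A/B/C) afterwards is a multi-speaker
# separation/diarization problem, which Demucs does not solve by itself.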

1 Comment
2024/10/30
02:40 UTC

10

[P] Open-Source AI Tool for PII Masking

Privacy has always been a challenge for technology and will continue to be into the future, especially with AI! AI and privacy are contradictory in nature: AI needs data to learn, but the more data, the bigger the risk...

Curious what everyone's thoughts about this are and also sharing a new open-source tool called PII Masker that detects and masks personally identifiable information in text: https://github.com/HydroXai/pii-masker-v1. It’s fairly simple to use and makes protecting sensitive data a bit easier.

Would appreciate any feedback!

0 Comments
2024/10/30
00:32 UTC

11

[D] M4 chips for training ML? (MPS)

Apple is (purposefully) creating a lot of buzz regarding their “Apple Intelligence”, stating that their M4 chips are built for AI.

My question is this: will this only be helpful for running the built-in Apple Intelligence, or is it supposed to vastly improve MPS when actually training large transformer models etc.? I haven't heard them mention any improvements to MPS.

30 Comments
2024/10/29
20:10 UTC

5

[R] Bayesian Nonparametrics - Master Thesis Proposal

Hi everyone,

I'm starting to plan my Master's thesis in my Data Science and ML program and could really use some advice on narrowing down my topic. My undergrad thesis was on Bayesian nonparametrics, covering concepts like Dirichlet processes, hierarchical Dirichlet processes, dependent Dirichlet processes, HDP topic models, and Gaussian process regression. Out of everything, I really enjoyed implementing (albeit straightforward) applications of HDP topic modeling; getting hands-on was a highlight for me.

For my Master's, I'm hoping to build on this Bayesian foundation but apply it to something new, ideally in time series analysis or NLP. I want the topic to feel relevant to the field right now and would love suggestions on where Bayesian nonparametrics might add unique value, especially in practically relevant applications.

One important thing to note is that I’ll be doing most of this work independently, as my department and supervisor aren't particularly relevant to my chosen areas of interest.

If anyone has thoughts on specific areas in NLP or time series that could benefit from a Bayesian approach, or if there are other areas where the Bayesian framework could be effectively utilized, I’d be incredibly grateful for your insights. Thanks so much for any guidance or ideas!

8 Comments
2024/10/29
18:25 UTC

0

[D] "Problem with Graph Based VAE. P.S. I am not a very good programmer !!!"

So, I am trying to build a graph-based Variational Autoencoder (VAE), using smaller trajectories of my protein as input (I have generated multiple small trajectories of my protein at different random seeds). My goal is to see the latent space from the observed trajectories, generate new structures from the regions that are less explored, and start MD simulations from those regions.
I have used the protein's C-alpha atoms as input and calculated an adjacency matrix based on the contact distance between two C-alpha atoms, with a cutoff of 8 angstroms. However, I am facing a lot of issues with the dimensionality of the model. I have 97 residues in my protein, and for the test trajectory there are 2500 frames; with an 80:20 split, I have a training set of (2000, 97, 97) and a validation set of (500, 97, 97). But when I tried to decode a latent point, the decoded dimension was (194, 97), which is confusing me. I am attaching the architecture of the model that I am using. The hyperparameters obtained in my case were:

Best Hyperparameters: {'activation_fn': ReLU(), 'batch_size': 2, 'dropout_rate': 0.1, 'epochs': 50, 'hidden_dim': 16, 'latent_dim': 2, 'learning_rate': 0.001, 'num_layers': 2, 'optimizer_type': 'adam', 'weight_decay': 1e-05}

Please check them and let me know where I am going wrong. Thanks a lot in advance.

GraphVAE(
  (gcn_layers): ModuleList(
    (0): GCNConv(97, 16)
    (1): GCNConv(16, 16)
  )
  (fc_mu): Linear(in_features=16, out_features=2, bias=True)
  (fc_logvar): Linear(in_features=16, out_features=2, bias=True)
  (decoder_layers): ModuleList(
    (0): GCNConv(2, 16)
    (1): GCNConv(16, 16)
  )
  (decoder_output): GCNConv(16, 97)
  (activation): ReLU()
)
2 Comments
2024/10/29
14:21 UTC

153

[R] "How to train your VAE" substantially improves the reported results for standard VAE models (ICIP 2024)

https://preview.redd.it/b1dmh67uroxd1.png?width=1025&format=png&auto=webp&s=3d42a65e2c0a946aa307f01886aebedfc4b88b8e

The proposed method redefines the Evidence Lower Bound (ELBO) with a mixture of Gaussians for the posterior, introduces a regularization term to prevent variance collapse, and employs a PatchGAN discriminator to enhance texture realism. The main contribution of this work is an ELBO that reduces the collapse of the posterior towards the prior (observed as the generation of very similar, blurry images).
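For reference, the standard single-Gaussian ELBO that this work modifies is

\log p_\theta(x) \,\ge\, \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right] \,-\, \mathrm{KL}\!\left(q_\phi(z \mid x)\,\|\,p(z)\right)

Posterior collapse is the regime where the KL term drives q_\phi(z|x) to match the prior p(z), so the decoder effectively ignores z and produces averaged, blurry samples.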

https://arxiv.org/abs/2309.13160
https://github.com/marianorivera/How2TrainUrVAE

16 Comments
2024/10/29
12:08 UTC

2

[D] Exploring Serverless Solutions for Whisper V3 Turbo Integration

Currently, the serverless solution from Runpod meets my needs in terms of cost and features: https://github.com/runpod-workers/worker-faster_whisper

However, I'm interested in using https://huggingface.co/openai/whisper-large-v3-turbo due to its reported speed.

I'm uncertain about how to set up and run Whisper V3 Turbo on Runpod’s serverless infrastructure.
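(For local testing in the meantime, the model card shows whisper-large-v3-turbo running with the standard transformers ASR pipeline; a minimal sketch, assuming a GPU and a recent transformers install:)

import torch
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3-turbo",
    torch_dtype=torch.float16,
    device="cuda:0",
)
result = asr("sample.wav")  # placeholder audio file
print(result["text"])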

It seems we might need to wait until the upstream project https://github.com/SYSTRAN/faster-whisper/issues/1030 is updated with Turbo and published on https://pypi.org/project/faster-whisper/.

Only then will this feature be available, and at that point, we could fork https://github.com/runpod-workers/worker-faster_whisper to update it accordingly.

In the meantime, do you know of any cost-effective serverless solutions for using Whisper V3 Turbo?

Thanks.

P.S.

Groq offers this service: https://groq.com/whisper-large-v3-turbo-now-available-on-groq-combining-speed-quality-for-speech-recognition/

However, they currently don't accept payments from developers and haven't provided an estimated timeframe for when this might be available.

1 Comment
2024/10/29
05:26 UTC

108

[R] SpotDiffusion: A Fast Approach For Seamless Panorama Generation Over Time

I am very happy to announce that our paper "SpotDiffusion: A Fast Approach For Seamless Panorama Generation Over Time" got accepted for WACV2025: https://arxiv.org/abs/2407.15507
Project-Page: https://spotdiffusion.github.io
Code: https://github.com/stanifrolov/spotdiffusion

Our method shifts non-overlapping denoising windows over time, ensuring that seams in one timestep are corrected in the next. This results in coherent, high-resolution images with fewer overall steps. We demonstrate the effectiveness of our approach through qualitative and quantitative evaluations, comparing it with MultiDiffusion, SyncDiffusion, and StitchDiffusion. Our method offers several key benefits, including improved computational efficiency and faster inference times while producing comparable or better image quality.
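A toy sketch of the shifted-window idea (my illustration, not the authors' code): at each denoising step the non-overlapping window grid gets a fresh random offset, so a seam location in one step falls inside a window interior in later steps.

import numpy as np

def window_bounds(width, win, rng):
    """Non-overlapping windows covering [0, width), with a random per-step shift."""
    offset = int(rng.integers(0, win))
    starts = range(-offset, width, win)
    return [(max(s, 0), min(s + win, width)) for s in starts if min(s + win, width) > max(s, 0)]

rng = np.random.default_rng(0)
for t in range(3):  # three denoising steps
    print(t, window_bounds(width=16, win=4, rng=rng))  # window seams move between steps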

4 Comments
2024/10/29
02:33 UTC

179

[R] Dynamic Attention-Guided Diffusion for Image Super-Resolution

I'm glad to share that our paper "Dynamic Attention-Guided Diffusion for Image Super-Resolution" got accepted for WACV2025:
https://arxiv.org/abs/2308.07977

The goal of this work was to introduce a new attention-guided diffusion mechanism to focus image refinement on essential areas that benefit the most from deep refinement :)

4 Comments
2024/10/29
02:31 UTC

2

[R] Model suggestion for variable-length output in ML thesis

Hi all, I'm starting my thesis and have basic ML/DL knowledge. I need a model that can take a fixed set of inputs (a snapshot) and output a variable-length vector with real and complex values. I've read that LSTMs might work, but I'm unsure given the fixed-size input.
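One common pattern for fixed input to variable-length output: an encoder maps the snapshot to a context vector, and a recurrent decoder emits one element per step plus a "stop" probability; complex values can be represented as (real, imaginary) pairs. A rough sketch with placeholder sizes:

import torch
import torch.nn as nn

class Snapshot2Seq(nn.Module):
    def __init__(self, in_dim, hidden=128, max_len=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
        self.decoder = nn.GRUCell(2, hidden)  # input: previous (real, imag) pair
        self.out = nn.Linear(hidden, 2)       # emit (real, imag)
        self.stop = nn.Linear(hidden, 1)      # emit stop logit
        self.max_len = max_len

    def forward(self, x):
        h = self.encoder(x)                   # context vector initializes decoder state
        prev = torch.zeros(x.size(0), 2, device=x.device)
        outputs, stops = [], []
        for _ in range(self.max_len):
            h = self.decoder(prev, h)
            prev = self.out(h)
            outputs.append(prev)
            stops.append(self.stop(h))
        # At inference, truncate each sequence at the first sigmoid(stop) > 0.5
        return torch.stack(outputs, dim=1), torch.stack(stops, dim=1)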

Does anyone have recommendations for models or architectures that could work well for this kind of task? Any advice on where to start or resources to check out would be super helpful. Thanks in advance!

4 Comments
2024/10/28
20:58 UTC

64

[R] Beyond Autoregression: Discrete Diffusion for Complex Reasoning and Planning

Paper: https://arxiv.org/abs/2410.14157

I'd be curious to hear expert perspectives on this.

It relates to ideas I find attractive:

  1. Autoregressive generation is limiting in compositional domains, such as reasoning, planning, math.
  2. This explains many of the challenges LLMs have in these domains.
  3. Diffusion might be more efficient in these domains: it learns to generate from the general to the specific (more like an energy-based-model perspective).
  4. It's less likely to get stuck by making specific poor choices early in its generation process.
19 Comments
2024/10/28
19:42 UTC
