/r/MachineLearning
Beginners -> /r/MLQuestions, AGI -> /r/singularity, career advice -> /r/cscareerquestions, datasets -> /r/datasets
Please have a look at our FAQ and Link-Collection
Metacademy is a great resource which compiles lesson plans on popular machine learning topics.
For beginner questions, please try /r/LearnMachineLearning, /r/MLQuestions, or http://stackoverflow.com/
For career related questions, visit /r/cscareerquestions/
AMAs:
Pluribus Poker AI Team 7/19/2019
DeepMind AlphaStar team (1/24/2019)
Libratus Poker AI Team (12/18/2017)
DeepMind AlphaGo Team (10/19/2017)
The MalariaSpot Team (2/6/2016)
OpenAI Research Team (1/9/2016)
Andrew Ng and Adam Coates (4/15/2015)
Related Subreddit:
/r/MachineLearning
For Job Postings please use this template
Hiring: [Location], Salary:[], [Remote | Relocation], [Full Time | Contract | Part Time] and [Brief overview, what you're looking for]
For those looking for jobs, please use this template
Want to be Hired: [Location], Salary Expectation:[], [Remote | Relocation], [Full Time | Contract | Part Time] Resume: [Link to resume] and [Brief overview, what you're looking for]
Please remember that this community is geared towards those with experience.
I think the method is solid, or at least borderline, for a TMI (Transactions on Medical Imaging) level journal, but the experiments are not thorough enough. It got rejected by TMI, and since I’m gonna graduate soon, I’m trying not to put too much extra effort into the paper. I’m only familiar with the top-tier journals and conferences and a few of the second-tier ones, but none seems to be a good fit for the paper.
So I’m looking for a journal that takes medical-imaging-related papers with a reasonable turnaround time, or a conference with a deadline coming up, that focuses on technical novelty but is not too rigorous about the experiments.
Thanks very much in advance!
Essentially, I have a trained model (in PyTorch) that I want to deploy on an edge device (all written in C/C++) for inference. For context, I'm working alone on this project, so I don't get much guidance. My understanding is that, at deployment, the inputs (inference data) need to be integers, and my model's parameters (weights and biases/activations) also need to be integers. Because I don't have "inference data", I am currently prototyping by quantizing my validation/test data and comparing the validation/test results I get using the floating-point model parameters against the results I get using quantized/integer model parameters. To make this more concrete (or succinct), I'm testing with two cases:
Case 1: floating point model called on floating point train and test data.
Case 2: quantized int model parameters called on quantized test data.
def quantize_tensor(tensor, num_bits):
qmin = - (2 ** (num_bits - 1))
qmax = (2 ** (num_bits - 1)) - 1
min_val, max_val = tensor.min(), tensor.max()
scale = (max_val - min_val) / (qmax - qmin)
zero_point = qmin - min_val / scale
zero_point = torch.round(zero_point).clamp(qmin, qmax)
q_tensor = torch.round(tensor/scale+zero_point).clamp(qmin, qmax)
if num_bits == 8:
q_tensor = q_tensor.type(torch.int8)
elif num_bits == 16:
q_tensor = q_tensor.type(torch.int16)
else:
q_tensor = q_tensor.type(torch.int)
return q_tensor, scale, zero_point
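For reference, here is a minimal dequantization helper that inverts the scheme above (not from the original post; it just mirrors the same scale/zero-point convention, so treat it as a sketch):
def dequantize_tensor(q_tensor, scale, zero_point):
    # Inverts q = round(x / scale + zero_point), so x ≈ (q - zero_point) * scale
    return (q_tensor.to(torch.float) - zero_point) * scale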
Then I quantize the model's weights and biases using this:
def quantize_model(model, weight_bit_width=16, bias_bit_width=16):
quantized_state_dict = {}
scale_zp_dict = {} # To store scale and zero-point for each parameter
for name, param in model.state_dict().items():
if 'weight' in name:
q_param, scale, zero_point = quantize_tensor(param, weight_bit_width)
quantized_state_dict[name] = q_param
scale_zp_dict[name] = (scale, zero_point)
elif 'bias' in name:
q_param, scale, zero_point = quantize_tensor(param, bias_bit_width)
quantized_state_dict[name] = q_param
scale_zp_dict[name] = (scale, zero_point)
else:
# For other parameters, keep them as is or apply appropriate quantization
quantized_state_dict[name] = param
return quantized_state_dict, scale_zp_dict
Furthermore, I quantize my model and the data like so (see code below). However, because my ML problem is multiclass and multi-output, I need to call torch.softmax on the logits I get out of my model to get prediction probabilities, but the softmax function doesn't support integers (or technically is not implemented for ints), which makes me worried that my overall quantization approach is wrong (I add the model's code and extras below):
import copy
class model(nn.Module):
def __init__(self, inputs, l1, l2, num_outputs, output_classes=3):
super().__init__()
# define the layers
self.output_classes = output_classes
self.num_outputs = num_outputs
self.layers = nn.Sequential(
nn.Linear(inputs, l1),
nn.ReLU(),
nn.Linear(l1, l2),
nn.ReLU(),
nn.Linear(l2, num_outputs * output_classes), # output_classes = number of classes in each output
)
def forward(self, x):
x = self.layers(x)
x = x.view(-1, self.output_classes, self.num_outputs) # Reshapes output tensor (logits output).
return x
model_copy = copy.deepcopy(floating_point_trained_model)
# quantize model params
quantized_state_dict, scale_zp_dict = quantize_model(model_copy, weight_bit_width=16, bias_bit_width=16)
for name, param in model_copy.named_parameters():
param.requires_grad = False
param.data = quantized_state_dict[name].to(dtype=torch.float) # <--- Need help here: Casting to float to satisfy softmax requirements
# Quantize data
Quant_X_train, scale, zp = quantize_tensor(X_train, 16) # can make your X_train
Quant_X_test, test_scale, test_zp = quantize_tensor(X_test, 16) # can make your X_test
# call quantized model on quantized input data
pred_probs = torch.softmax(model_copy(Quant_X_test.to(torch.float)), dim=1) # <--- Need Help: Casting to float to get prediction probabilities
predictions = torch.argmax(pred_probs, dim=1)
I'm curious about a few things:
If it helps, this is an example of what my training data looks like:
0 0.995231 0.996840 1.000000 0.998341 1.000000 1.000000 1.000000 0.998709 ... 0.000024 0.000019 0.000015 0.000016 0.000011 0.000007 0.000007 0.000015
1 0.996407 0.998568 1.000000 0.997889 1.000000 0.999954 0.999738 0.997458 ... 0.000018 0.000013 0.000011 0.000012 0.000008 0.000005 0.000006 0.000009
2 0.996083 0.999702 1.000000 0.999031 1.000000 1.000000 0.999816 0.998727 ... 0.000019 0.000013 0.000012 0.000011 0.000008 0.000006 0.000006 0.000011
3 0.998531 0.999481 0.999199 1.000000 0.999720 1.000000 1.000000 0.998682 ... 0.000015 0.000011 0.000010 0.000010 0.000007 0.000005 0.000004 0.000007
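A common way to run this comparison without ever calling softmax on integer tensors (a sketch of the "fake quantization" idea, using the hypothetical dequantize_tensor helper above, and not necessarily what the OP intends) is to dequantize the integer parameters and inputs back to float and run the model as usual:
# Load dequantized (float) parameters back into the copy before inference
for name, param in model_copy.named_parameters():
    param.requires_grad = False
    scale, zp = scale_zp_dict[name]
    param.data = dequantize_tensor(quantized_state_dict[name], scale, zp)

# Dequantize the quantized test inputs with their own scale/zero-point
X_test_deq = dequantize_tensor(Quant_X_test, test_scale, test_zp)
pred_probs = torch.softmax(model_copy(X_test_deq), dim=1)
predictions = torch.argmax(pred_probs, dim=1)
Comparing these predictions against Case 1 then measures the accuracy lost to quantization, while the true integer-only inference path would live in the C/C++ deployment code.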
Edit (5pm PT): Thanks so much all for really great questions - I'm going to pause now but will take a look over next 24 hours and try to answer any more questions. V grateful for chance to do this and to others who helped answer some of the Qs too from their perspective (shoutout u/Rebeleleven)
--
I recently had the opportunity to attend the Berlin Global Dialogue, which has been likened to Davos but with a stronger focus on technology and AI. The lineup was impressive: Hermann Hauser, the founder of ARM, executives from OpenAI and ASML, and a mix of founders from emerging startups tackling everything from quantum ML to supply chain optimization. Even leaders like President Macron and the German Vice Chancellor were there, engaging with critical tech issues that impact us all.
As the CEO of Codesmith – a small, independent tech school with a data science and machine learning research group (last year we contributed to TensorFlow) – I was invited to announce our latest endeavor: Codesmith’s AI & ML Technical Leadership Program.
I shared this experience in an AMA on r/technology and had a great conversation—but the depth of questions around ML/AI didn’t quite match what I’d hoped to explore. I spoke to the mods here and am grateful for them supporting this AMA.
Proof: https://imgur.com/a/bYkUiE7
My real passion, inherited from my parents who were both educators, is teaching and making ML more accessible to a broader audience. I’m currently developing an AI/ML workshop for Frontend Masters, and I want to hear from those navigating the ML field. What’s the biggest challenge you're facing in this space?
A few of my takeaways from the event:
Looking forward to diving deeper into these issues and the broader challenges in ML/AI in an AMA!
Hi everyone,
Do you know of any works that analyzed the performance of a CNN-based model w.r.t. the training data distribution? Meaning, are some distributions easier for the model to learn its task on than others?
For example, let's say I'm training a model to do object detection on images. I see that day images get better performance than night images (same amount of data). I wonder if I can explain this in some analytical way.
Thanks!
I wanted to make an LLM that could search through around 60k technical documents (about 50,000 characters each) and retrieve information from them semantically. The final model I envision would know those technical documents, and I could just prompt the model to find me something similar to the information it already knew, or something exact.
Could you guys comment on anything in it?
PS: I know this is a large question. I'm a bit new to ML and NLP and learning about it. Also sorry about my English, I'm not a native speaker.
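One common starting point for this kind of semantic retrieval (a rough sketch, not from the original post; the model name, chunk size, and the documents list are assumptions) is to chunk the documents, embed the chunks with a sentence-embedding model, and rank them by similarity to a query:
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model; any sentence encoder works

# documents is assumed to be a list of the ~60k document strings; split each into smaller chunks
chunks = [doc[i:i + 2000] for doc in documents for i in range(0, len(doc), 2000)]
chunk_embeddings = model.encode(chunks, convert_to_tensor=True, show_progress_bar=True)

query = "thermal behaviour of component X"  # hypothetical query
query_embedding = model.encode(query, convert_to_tensor=True)
hits = util.semantic_search(query_embedding, chunk_embeddings, top_k=5)[0]
for hit in hits:
    print(hit["score"], chunks[hit["corpus_id"]][:200])
An LLM is then only needed on top of this retrieval step if you want generated answers rather than the retrieved passages themselves (i.e. a RAG setup).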
Hey r/MachineLearning!
Lots of research has been published around LLM-as-a-judge as it's becoming a popular approach to evaluate cheap + fast.
A pretty cool paper that recently came out was from the Salesforce AI Research team; tldr: they found preference optimisation techniques like DPO and RPO could yield better results than supervised fine-tuning (SFT) alone as a training objective for LLM-as-a-judge models. We wanted to test this hypothesis, as it's not yet clear which training objective performs best for aligning eval models.
We trained a Llama-3.1-70B-Instruct with SFT and compared it to base Llama-3.1-70B-Instruct on core benchmarks to see how SFT fares alone.
We also trained a Llama-3.1-8B-Instruct model on two training datasets with
and compared their performance against the base model across four core benchmarks.
If you want the details, here's our blog post with extra information on why we think this works. We're working on scaling this up and seeing how far we can push this thing now :)
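For readers who want to see what preference-optimisation training looks like in practice, here is a rough sketch using Hugging Face TRL's DPOTrainer. This is illustrative only and not the exact setup from the blog post: argument names vary across trl versions, and the dataset rows are made up.
# Illustrative sketch only: preference optimisation with Hugging Face TRL
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "meta-llama/Llama-3.1-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Each row pairs a judging prompt with a preferred and a rejected judgement (made-up examples)
train_dataset = Dataset.from_list([
    {"prompt": "Rate the helpfulness of this answer...", "chosen": "Score: 5. The answer ...", "rejected": "Score: 1."},
])

args = DPOConfig(output_dir="judge-dpo", beta=0.1, per_device_train_batch_size=1)
trainer = DPOTrainer(model=model, args=args, train_dataset=train_dataset, processing_class=tokenizer)  # older trl versions use tokenizer= instead
trainer.train()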
I'm kind of new to the field of research, and over the past year I've probably read over 100 research papers, but I feel as though I don't retain a lot of the information and I forget a lot of the papers that I've read. I'm curious what people who have been in the industry longer use for organization.
I've tried Zotero, but I haven't really been a big fan
I'm wondering how I can get started to finetune my custom model with torchtune lora. Does anyone have any documentation or suggestions?
Hi everyone,
I’m currently interested in exploring generative models defined over Riemannian manifolds. Though the idea is theoretically appealing, I have trouble understanding the practical motivation behind this approach, and whether any useful/large scale model has been developed lately based on it.
To be more precise, I am looking at the following set of papers.
Generalizing diffusion models to the Riemannian setting :
Riemannian Diffusion Models, Riemannian Score-Based Generative Modelling
Scaling these models:
Scaling Riemannian Diffusion Models
I don’t understand how impactful the experimental results really are, and what the interest in these models is, whether in industry or in the research community.
If anyone has any thoughts on these questions, I’d be happy to start a discussion here. I’d be extremely grateful for your insights! Thanks for any help.
I am a student conducting research related to MAB/online algorithms, and I see there are very few people doing this in the USA. However, I found there is a noticeable number of researchers doing this at INRIA (the one in France, if you don't know it). Is anyone familiar with this institution? As an undergraduate from a non-EU country, is it possible for me to intern there on a voluntary basis during the summer break, if my goal is to get a recommendation letter and publish a paper?
Hi - I am dealing with an issue where I will likely have many thousands of short text snippets (think 2-4 sentences each), and need to assess the extent to which each snippet is consistent with each of about ~200 categories (that is, a piece of text may fit "best" into one category, but it's also possible that a few other categories are "reasonable"). Getting huge amounts of text labeled may be an effort, so I'm especially interested in things like few-shot approaches. (Or maybe even a bootstrap approach -- not the statistical technique, the concept -- where we develop a quick-and-dirty classification model and use it to assist raters in doing another, larger tranche of labelling faster. Which obviously has potential drawbacks in terms of bias, etc., but may be worth it.)
My background is mostly in traditional/Bayesian statistics (think linear models and factor analysis), so I'm a little out of the loop on good approaches to a task like this. The environment where this analysis will take place won't have any fancy LLMs, and no access to internet-based platforms (Huggingface, OpenAI, etc.). No GPUs, so any fine-tuning that might be needed has to take that into consideration. The obvious (to me, a not-NLP person) starting point seems like BERT with a normal classifier, but there are so many variations of BERT and similar models (Universal Sentence Encoders?), and I'm not sure which ones are better for short text. I am aware of the Huggingface leaderboards, which I've looked over, but it wasn't immediately clear to me which models are best for short text classification.
So if anyone has suggestions for thoughts on potential approaches to look into, I'd really appreciate it.
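For what it's worth, one fully offline, CPU-friendly direction in the spirit of "embeddings plus a simple classifier" is to embed both the snippets and short category descriptions and rank categories by cosine similarity. A minimal sketch, assuming a locally cached sentence-embedding model and hypothetical category names:
from sentence_transformers import SentenceTransformer, util

# The model must be downloaded elsewhere and copied into the offline environment
model = SentenceTransformer("/path/to/local/all-MiniLM-L6-v2")

categories = {  # hypothetical category names and one-line descriptions
    "billing": "Questions or complaints about invoices and payments",
    "safety": "Reports of hazards or unsafe working conditions",
}
cat_names = list(categories)
cat_embs = model.encode(list(categories.values()), convert_to_tensor=True)

snippet = "The invoice I received last month charged me twice for the same service."
snip_emb = model.encode(snippet, convert_to_tensor=True)

scores = util.cos_sim(snip_emb, cat_embs)[0]  # similarity to every category
ranked = sorted(zip(cat_names, scores.tolist()), key=lambda t: -t[1])
print(ranked[:3])  # the top few "reasonable" categories for this snippet
A handful of labeled examples per category can later replace the one-line descriptions (e.g. by averaging their embeddings), which is one way to get a few-shot version of the same idea.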
I'll go first.
Soundness: 3,3,4
Overall: 2,2,3
🥺
I'm curious about the cool things people around the world are doing related to data in this area of work at the moment
Hey all!
My colleagues and I have released version 1.0 of our open source LLM evaluation framework, and I wanted to share it here for feedback/visibility. With this first major release, we've focused on a few key areas:
If you have time to check out the repo and share any feedback or questions, I'd really appreciate it. It's still early days, but we've been blown away by the community response so far, and we're excited to get more input as we continue to work on the project.
Repo Link: https://github.com/comet-ml/opik
I have a dataset containing survey data with 39 variables, such as perfect.physical.health with scores -2, -1, 0, 1, 2. Now I want to predict happiness, which is a decimal value. How do I approach this problem?
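A straightforward baseline here (a sketch with a hypothetical file and column name, treating the ordinal survey items as numeric features and the decimal happiness score as a regression target):
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

df = pd.read_csv("survey.csv")              # hypothetical file name
X = df.drop(columns=["happiness"])          # the 39 ordinal predictors
y = df["happiness"]                         # decimal target -> regression, not classification

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
reg = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_train, y_train)
print("MAE:", mean_absolute_error(y_test, reg.predict(X_test)))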
Recently I saw someone post a query regarding graph-based VAE construction on MD trajectory data. I am facing a similar problem as well. This is the code I have generated so far. As I am not a professional coder myself, coming from a chemistry background, I mostly relied on chatbots to generate the code for me, but the model has some serious problems with dimensionality.
import numpy as np
import random
import MDAnalysis as mda
import networkx as nx
import torch
import torch.nn as nn
import torch.optim as optim
from torch_geometric.data import Data
from torch_geometric.loader import DataLoader  # DataLoader moved out of torch_geometric.data in recent PyG versions
from torch_geometric.nn import GCNConv
from Bio.PDB import PDBIO, Structure, Model, Chain, Residue, Atom
import matplotlib.pyplot as plt
from sklearn.model_selection import ParameterGrid
from tqdm import tqdm
import pandas as pd
# Load MD trajectory and select C-alpha atoms
u = mda.Universe('synuclein.top', 'short.nc')
ca_atoms = u.select_atoms("name CA")
# Amino acid sequence in one-letter code (converted to three-letter codes below)
sequence_one_letter = "MDVFMKGLSKAKEGVVAAAEKTKQGVAEAAGKTKEGVLYVGSKTKEGVVHGVATVAEKTKEQVTNVGGAVVTGVTAVAQKTVEGAGSIAAATGFVKK"
amino_acid_1_to_3 = {
'A': 'ALA', 'C': 'CYS', 'D': 'ASP', 'E': 'GLU', 'F': 'PHE',
'G': 'GLY', 'H': 'HIS', 'I': 'ILE', 'K': 'LYS', 'L': 'LEU',
'M': 'MET', 'N': 'ASN', 'P': 'PRO', 'Q': 'GLN', 'R': 'ARG',
'S': 'SER', 'T': 'THR', 'V': 'VAL', 'W': 'TRP', 'Y': 'TYR'
}
sequence = [amino_acid_1_to_3[aa] for aa in sequence_one_letter]
# One-hot encoding for amino acids
amino_acid_types = {
'ALA': 0, 'CYS': 1, 'ASP': 2, 'GLU': 3, 'PHE': 4,
'GLY': 5, 'HIS': 6, 'ILE': 7, 'LYS': 8, 'LEU': 9,
'MET': 10, 'ASN': 11, 'PRO': 12, 'GLN': 13, 'ARG': 14,
'SER': 15, 'THR': 16, 'VAL': 17, 'TRP': 18, 'TYR': 19
}
# Function to convert amino acid sequence to one-hot encoding
def one_hot_encode(sequence):
num_amino_acids = len(amino_acid_types)
features = np.zeros((len(sequence), num_amino_acids))
for i, aa in enumerate(sequence):
if aa in amino_acid_types:
features[i, amino_acid_types[aa]] = 1
return features
# Generate node features for the amino acid sequence
node_features = one_hot_encode(sequence)
# Define the contact map based on CA distances
threshold_distance = 8.0 # Distance threshold in angstroms
num_amino_acids = len(sequence)
# Prepare data for PyTorch Geometric for all frames
data_list = []
num_frames = len(u.trajectory)
for frame in tqdm(range(num_frames), desc="Processing Frames"):
u.trajectory[frame]
ca_atoms = u.select_atoms("name CA")
# Create a contact graph
contact_graph = nx.Graph()
for i in range(num_amino_acids):
contact_graph.add_node(i, features=node_features[i])
# Add edges based on CA distances
for i in range(num_amino_acids):
for j in range(i + 1, num_amino_acids):
distance = np.linalg.norm(ca_atoms.positions[i] - ca_atoms.positions[j ])
if distance <= threshold_distance:
contact_graph.add_edge(i, j)
# Prepare data for PyTorch Geometric
edge_index = torch.tensor(list(contact_graph.edges), dtype=torch.long).t().contiguous()
x = torch.tensor(node_features, dtype=torch.float)
data = Data(x=x, edge_index=edge_index)
# print(data)
data_list.append(data)
# Plot and save contact map for every 500th frame
if frame % 500 == 0:
contact_map = np.zeros((num_amino_acids, num_amino_acids))
for i, j in contact_graph.edges:
contact_map[i, j] = 1
contact_map[j, i] = 1
plt.imshow(contact_map, cmap='binary')
plt.title(f"Contact Map for Frame {frame}")
plt.xlabel("Residue Index")
plt.ylabel("Residue Index")
plt.savefig(f"contact_map_frame_{frame}.png")
pd.DataFrame(contact_map).to_csv(f"contact_map_frame_{frame}.csv", index=False)
class GCNEncoder(nn.Module):
def __init__(self, in_channels, hidden_channels, num_layers):
super(GCNEncoder, self).__init__()
self.convs = nn.ModuleList()
self.fc_mu = nn.Linear(hidden_channels, hidden_channels)
self.fc_logvar = nn.Linear(hidden_channels, hidden_channels)
# Create multiple GCN layers
for _ in range(num_layers):
self.convs.append(GCNConv(in_channels, hidden_channels))
in_channels = hidden_channels # Update input channels for the next layer
def forward(self, x, edge_index):
for conv in self.convs:
x = conv(x, edge_index)
x = torch.relu(x) # Activation function
mu = self.fc_mu(x)
logvar = self.fc_logvar(x)
return mu, logvar
class GCNDecoder(nn.Module):
def __init__(self, hidden_channels, out_channels):
super(GCNDecoder, self).__init__()
self.fc = nn.Linear(hidden_channels, out_channels)
def forward(self, z):
return torch.sigmoid(self.fc(z))
class GCNVAE(nn.Module):
def __init__(self, in_channels, hidden_channels, out_channels, num_layers):
super(GCNVAE, self).__init__()
self.encoder = GCNEncoder(in_channels, hidden_channels, num_layers)
self.decoder = GCNDecoder(hidden_channels, out_channels)
def reparameterize(self, mu, logvar):
std = torch.exp(0.5 * logvar)
eps = torch.randn_like(std)
return mu + eps * std
def forward(self, x, edge_index):
mu, logvar = self.encoder(x, edge_index)
z_sample = self.reparameterize(mu, logvar)
return self.decoder(z_sample), mu, logvar
def loss_function(recon_x, x, mu, logvar):
BCE = nn.functional.binary_cross_entropy(recon_x, x, reduction='sum')
KLD = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
return BCE, KLD, BCE + KLD # Return BCE, KLD, and Total Loss
def train_model(model, data_loader, optimizer, epochs, early_stopping_patience=5):
model.train()
best_loss = float('inf')
patience_counter = 0
for epoch in range(epochs):
total_loss = 0
total_bce = 0
total_kld = 0
for data in tqdm(data_loader, desc=f"Training Epoch {epoch+1}/{epochs}"):
optimizer.zero_grad()
recon_batch, mu, logvar = model(data.x, data.edge_index)
bce, kld, total = loss_function(recon_batch, data.x, mu, logvar)
total_loss += total.item()
total_bce += bce.item()
total_kld += kld.item()
total.backward()
optimizer.step()
avg_loss = total_loss / len(data_loader)
avg_bce = total_bce / len(data_loader)
avg_kld = total_kld / len(data_loader)
print(f"Epoch {epoch+1}/{epochs} - Total Loss: {avg_loss:.4f}, BCE Loss: {avg_bce:.4f}, KLD Loss: {avg_kld:.4f}")
# Early stopping
if avg_loss < best_loss:
best_loss = avg_loss
patience_counter = 0
else:
patience_counter += 1
if patience_counter >= early_stopping_patience:
print("Early stopping triggered.")
break
# Create a DataLoader
data_loader = DataLoader(data_list, batch_size=1, shuffle=True)
# Hyperparameter grid
param_grid = {
'hidden_channels': [16, 32, 64],
'num_layers': [2, 3, 4],
'activation_function': ['relu', 'tanh', 'sigmoid'],
'batch_size': [1, 2, 4],
'latent_dimensions': [16, 32, 64],
'learning_rate': [0.001, 0.01, 0.1],
'epochs': [50, 100, 200]
}
# Perform hyperparameter tuning
best_loss = float('inf')
best_params = {}
for params in ParameterGrid(param_grid):
model = GCNVAE(in_channels=20, hidden_channels=params['hidden_channels'], out_channels=20, num_layers=params['num_layers'])
optimizer = optim.Adam(model.parameters(), lr=params['learning_rate'])
print(f"Training with parameters: {params}")
train_model(model, data_loader, optimizer, params['epochs'], early_stopping_patience=5)
# Evaluate the model (using training loss as a proxy)
model.eval()
total_loss = 0
total_bce = 0
total_kld = 0
with torch.no_grad():
for data in data_loader:
recon_batch, mu, logvar = model(data.x, data.edge_index)
bce, kld, total = loss_function(recon_batch, data.x, mu, logvar)
total_loss += total.item()
total_bce += bce.item()
total_kld += kld.item()
avg_loss = total_loss / len(data_loader)
avg_bce = total_bce / len(data_loader)
avg_kld = total_kld / len(data_loader)
print(f"Average loss: {avg_loss:.4f}, BCE Loss: {avg_bce:.4f}, KLD Loss: {avg_kld:.4f}")
if avg_loss < best_loss:
best_loss = avg_loss
best_params = params
print(f"Best parameters: {best_params} with loss: {best_loss}")
# Final training with best parameters
final_model = GCNVAE(in_channels=20, hidden_channels=best_params['hidden_channels'], out_channels=20, num_layers=best_params['num_layers'])
final_optimizer = optim.Adam(final_model.parameters(), lr=best_params['learning_rate'])
train_model(final_model, data_loader, final_optimizer, best_params['epochs'], early_stopping_patience=5)
I know the code is quite long, but I want to know: is the code correct? I have a trajectory of 500 frames and 97 residues (corresponding to 97 C-alpha atoms). Once this code is done, I want to generate protein configurations from the latent space, so I want to ensure that the code is running fine. Thanks a lottt in advance.
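As a side note on the "generate configurations from the latent space" step, the usual recipe with a VAE like the GCNVAE above (a sketch, not part of the original code) is to sample latent vectors from the standard-normal prior and pass them through the trained decoder. Note that this particular decoder returns reconstructed node features, not 3D coordinates, so an extra step would still be needed to turn samples into structures:
# Sample new latent node embeddings from the prior and decode them
final_model.eval()
with torch.no_grad():
    z = torch.randn(num_amino_acids, best_params['hidden_channels'])  # one latent vector per residue
    generated_features = final_model.decoder(z)  # shape (97, 20): per-residue feature reconstructions
print(generated_features.shape)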
Hi everyone, I am currently researching speech technologies as an undergrad, mainly focusing on improving applications for the visually challenged. I am new to this niche area of research, so I want to pick a research topic that will address some of the existing issues with current tech. So far, ElevenLabs seems to be the SOTA. I would like to know whether there is anything left to improve in TTS, speech-to-speech, voice cloning, deepfake audio detection, etc. Any insights on ethical issues or the need for guardrails in the future would also be helpful. Also, due to the limited compute resources available from my uni, I cannot take on research involving scaling or multilingual models.
Suppose you have got a new idea about a solution to a problem in the domain you are working in. How do you go about implementing the thing from the ground up?
What is the general structure of the codebase you construct for your project?
How do you go about iteratively training and testing your solution until you arrive at a final solution where you can write a paper for publication?
Is there any design recipe you follow? Where did you learn it from?
Let's suppose I have audio from karaoke with
Let's suppose I know exactly how many main sources I have on the tape and I want to
I have several questions and appreciate any help.
Are there any models that can help me with such separation (pre-trained / needn’t to be trained)?
If not, I have some ideas about possible solution pipeline and appreciate any comments:
2.1. Separate the instrumental music from everything else (what model can I use to do that?)
2.2. Clean the noise from the audio without music (what model can I use for that?)
2.3. Separate the voices (how?) and delete the waveforms I don't need.
2.4. Put everything I need back together.
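Not from the original post, but for step 2.1 one commonly used pretrained option is a source-separation model such as Spleeter; a minimal sketch, assuming spleeter is installed and the file paths are placeholders:
from spleeter.separator import Separator

# "spleeter:2stems" splits a mix into vocals + accompaniment; 4stems/5stems give finer splits
separator = Separator("spleeter:2stems")
separator.separate_to_file("karaoke_mix.wav", "separated_output")
# writes separated_output/karaoke_mix/vocals.wav and accompaniment.wav to disk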
Privacy has always been, and will continue to be, a major concern for the future of technology, especially with AI! AI and privacy are contradictory in nature: AI needs data to learn, but the more data, the bigger the risk...
Curious what everyone's thoughts about this are and also sharing a new open-source tool called PII Masker that detects and masks personally identifiable information in text: https://github.com/HydroXai/pii-masker-v1. It’s fairly simple to use and makes protecting sensitive data a bit easier.
Would appreciate any feedback!
Apple is (purposefully) creating a lot of buzz regarding their “Apple Intelligence”, stating that their M4 chips are built for AI.
My question is this: will this only be helpful for running the built-in Apple Intelligence, or is it supposed to vastly improve MPS when actually training large transformer models etc.? I haven’t heard them mention any improvements to MPS.
Hi everyone,
I’m starting planning my Master’s thesis in my Data Science and ML program and could really use some advice on narrowing down my topic. My undergrad thesis was on Bayesian nonparametrics, covering concepts like Dirichlet processes, hierarchical Dirichlet processes, dependent Dirichlet processes, HDP topic models, and Gaussian process regression. Out of everything, I really enjoyed implementing (albeit straightforward) applications of HDP topic modeling—getting hands on was a highlight for me.
For my Master’s, I’m hoping to build on this Bayesian foundation but apply it to something new, ideally in time series analysis or NLP. I want the topic to feel relevant to the field right now and would love suggestions on where Bayesian nonparametrics might add unique value, especially in practically relevant applications.
One important thing to note is that I’ll be doing most of this work independently, as my department and supervisor aren't particularly relevant to my chosen areas of interest.
If anyone has thoughts on specific areas in NLP or time series that could benefit from a Bayesian approach, or if there are other areas where the Bayesian framework could be effectively utilized, I’d be incredibly grateful for your insights. Thanks so much for any guidance or ideas!
So, I am trying to build a graph-based Variational Autoencoder (VAE), using smaller trajectories of my protein as input (I have generated multiple small trajectories of my protein at different random seeds). My goal is to see the latent space from the observed trajectories, generate new structures from the regions that are less explored, and start MD simulations from those regions.
I have used the protein's C-alpha atoms as input and calculated an adjacency matrix based on the contact distance between two C-alpha atoms, with a cutoff of 8 angstroms. However, I am facing a lot of issues with the dimensionality of the model: I have 97 residues in my protein, the test trajectory has 2500 frames, and with an 80:20 split I have a training set of shape (2000, 97, 97) and a validation set of shape (500, 97, 97). But when I tried to decode a latent point, the decoded dimension was (194, 97), which is creating confusion for me. I am attaching the architecture of the model that I am using. The hyperparameters obtained in my case were:
Best Hyperparameters: {'activation_fn': ReLU(), 'batch_size': 2, 'dropout_rate': 0.1, 'epochs': 50, 'hidden_dim': 16, 'latent_dim': 2, 'learning_rate': 0.001, 'num_layers': 2, 'optimizer_type': 'adam', 'weight_decay': 1e-05}
please check them and let me know where am I going wrong. Thanks a lottt in advance.
GraphVAE(
(gcn_layers): ModuleList(
(0): GCNConv(97, 16)
(1): GCNConv(16, 16)
)
(fc_mu): Linear(in_features=16, out_features=2, bias=True)
(fc_logvar): Linear(in_features=16, out_features=2, bias=True)
(decoder_layers): ModuleList(
(0): GCNConv(2, 16)
(1): GCNConv(16, 16)
)
(decoder_output): GCNConv(16, 97)
(activation): ReLU()
)
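One thing worth noting about the (194, 97) decoded shape (an illustrative sketch, assuming torch_geometric, not part of the original post): PyG's DataLoader batches graphs by concatenating their nodes, so a batch of 2 graphs with 97 nodes each comes back as 194 rows, and data.batch (or to_dense_batch) is needed to split it per graph again:
import torch
from torch_geometric.data import Data
from torch_geometric.loader import DataLoader
from torch_geometric.utils import to_dense_batch

graphs = [Data(x=torch.randn(97, 97), edge_index=torch.empty((2, 0), dtype=torch.long)) for _ in range(4)]
loader = DataLoader(graphs, batch_size=2)
batch = next(iter(loader))
print(batch.x.shape)                         # torch.Size([194, 97]) -- nodes of both graphs concatenated
dense, mask = to_dense_batch(batch.x, batch.batch)
print(dense.shape)                           # torch.Size([2, 97, 97]) -- split back per graph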
The proposed method redefines the Evidence Lower Bound (ELBO) with a mixture of Gaussians for the posterior probability, introduces a regularization term to prevent variance collapse, and employs a PatchGAN discriminator to enhance texture realism. The main contribution of this work is an ELBO that reduces the collapse of the posterior towards the prior (observed as the generation of very similar, blurry images).
https://arxiv.org/abs/2309.13160
https://github.com/marianorivera/How2TrainUrVAE
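For reference, the standard single-Gaussian ELBO that such methods start from (textbook material, not specific to this paper) is:
\mathcal{L}(\theta, \phi; x) = \mathbb{E}_{q_\phi(z \mid x)}\left[\log p_\theta(x \mid z)\right] - D_{\mathrm{KL}}\left(q_\phi(z \mid x) \,\|\, p(z)\right)
The paper replaces the single-Gaussian posterior q_\phi(z|x) with a Gaussian mixture and adds a regularization term on the variances, per the description above.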
Currently, the serverless solution from Runpod meets my needs in terms of cost and features: https://github.com/runpod-workers/worker-faster_whisper
However, I'm interested in using https://huggingface.co/openai/whisper-large-v3-turbo due to its reported speed.
I'm uncertain about how to set up and run Whisper V3 Turbo on Runpod’s serverless infrastructure.
It seems we might need to wait until the upstream project https://github.com/SYSTRAN/faster-whisper/issues/1030 is updated with Turbo and published on https://pypi.org/project/faster-whisper/.
Only then will this feature be available, and at that point, we could fork https://github.com/runpod-workers/worker-faster_whisper to update it accordingly.
In the meantime, do you know of any cost-effective serverless solutions for using Whisper V3 Turbo?
Thanks.
p/s
Groq offers this service: https://groq.com/whisper-large-v3-turbo-now-available-on-groq-combining-speed-quality-for-speech-recognition/
However, they currently don't accept payments from developers and haven't provided an estimated timeframe for when this might be available.
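In the meantime, one stopgap that doesn't depend on faster-whisper is to run the Turbo checkpoint directly with the plain Hugging Face transformers pipeline (a sketch; the device and dtype settings assume a GPU worker):
import torch
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3-turbo",
    torch_dtype=torch.float16,
    device=0,  # assumes a GPU worker; use device="cpu" otherwise
)
result = asr("sample.wav", chunk_length_s=30, return_timestamps=True)
print(result["text"])
This will generally be slower than a CTranslate2-based faster-whisper backend, but it works today and should be wrappable in a custom Runpod serverless handler.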
I am very happy to announce that our paper "SpotDiffusion: A Fast Approach For Seamless Panorama Generation Over Time" got accepted for WACV2025: https://arxiv.org/abs/2407.15507
Project-Page: https://spotdiffusion.github.io
Code: https://github.com/stanifrolov/spotdiffusion
Our method shifts non-overlapping denoising windows over time, ensuring that seams in one timestep are corrected in the next. This results in coherent, high-resolution images with fewer overall steps. We demonstrate the effectiveness of our approach through qualitative and quantitative evaluations, comparing it with MultiDiffusion, SyncDiffusion, and StitchDiffusion. Our method offers several key benefits, including improved computational efficiency and faster inference times while producing comparable or better image quality.
I'm glad to share that our paper "Dynamic Attention-Guided Diffusion for Image Super-Resolution" got accepted for WACV2025:
https://arxiv.org/abs/2308.07977
The goal of this work was to introduce a new attention-guided diffusion mechanism to focus image refinement on essential areas that benefit the most from deep refinement :)
Hi all, I’m starting my thesis and have basic ML/DL knowledge. I need a model that can take a fixed set of inputs (a snapshot) and output a variable-length vector with real and complex values. I’ve read that an LSTM might work, but I’m unsure given the fixed-size input.
Does anyone have recommendations for models or architectures that could work well for this kind of task? Any advice on where to start or resources to check out would be super helpful. Thanks in advance!
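One architecture pattern that fits "fixed-size input, variable-length output" (a sketch, not a recommendation tied to the thesis: an MLP encoder feeding an LSTM decoder that emits one complex value per step as a (real, imaginary) pair plus a stop probability; all sizes are made up):
import torch
import torch.nn as nn

class SnapshotToSequence(nn.Module):
    def __init__(self, input_dim, hidden_dim=128, max_len=50):
        super().__init__()
        self.max_len = max_len
        self.encoder = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        self.decoder = nn.LSTMCell(hidden_dim, hidden_dim)
        self.to_value = nn.Linear(hidden_dim, 2)  # real and imaginary part of one output element
        self.to_stop = nn.Linear(hidden_dim, 1)   # probability that the sequence ends at this step

    def forward(self, snapshot):
        h = self.encoder(snapshot)         # fixed-size snapshot -> initial hidden state
        c = torch.zeros_like(h)
        step_input = torch.zeros_like(h)
        values, stops = [], []
        for _ in range(self.max_len):
            h, c = self.decoder(step_input, (h, c))
            values.append(self.to_value(h))
            stops.append(torch.sigmoid(self.to_stop(h)))
            step_input = h                 # feed the hidden state back as the next input
        return torch.stack(values, dim=1), torch.stack(stops, dim=1)

model = SnapshotToSequence(input_dim=16)
vals, stop_probs = model(torch.randn(4, 16))  # vals: (4, 50, 2), stop_probs: (4, 50, 1)
At training time the stop probabilities can be supervised with the true sequence lengths, and at inference time generation halts once the stop probability crosses a threshold, which is how the variable length is recovered from a fixed-size input.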
Paper: https://arxiv.org/abs/2410.14157
I'd be curious to hear expert perspectives on this.
It relates to ideas I find attractive: