/r/MachineLearning
Beginners -> /r/mlquestions, AGI -> /r/singularity, career advice -> /r/cscareerquestions, datasets -> /r/datasets
Please have a look at our FAQ and Link-Collection
Metacademy is a great resource which compiles lesson plans on popular machine learning topics.
For Beginner questions please try /r/LearnMachineLearning , /r/MLQuestions or http://stackoverflow.com/
For career related questions, visit /r/cscareerquestions/
AMAs:
Pluribus Poker AI Team (7/19/2019)
DeepMind AlphaStar team (1/24/2019)
Libratus Poker AI Team (12/18/2017)
DeepMind AlphaGo Team (10/19/2017)
The MalariaSpot Team (2/6/2016)
OpenAI Research Team (1/9/2016)
Andrew Ng and Adam Coates (4/15/2015)
AI Agent for Digital Adoption.
Problem
Solution
An OpenAI Vision-enabled AI agent that provides guidance.
Customers
B2B: SaaS companies, for their employees and their customer companies.
D2C: People who are starting to learn new things or use new software.
Competitors?
Future Market Evolution?
Differentiator?
[D], [P]
Last Week in Medical AI: Top LLM Research Papers/Models (December 7 - December 14, 2024)
Medical LLM & Other Models
Frameworks & Methodologies
- TOP-Training: Medical Q&A Framework
- Hybrid RAG: Secure Medical Data Management
- Zero-Shot ATC Clinical Coding
- Chest X-Ray Diagnosis Architecture
- Medical Imaging AI Democratization
Benchmarks & Evaluations
- KorMedMCQA: Korean Healthcare Licensing Benchmark
- Large Language Model Medical Tasks
- Clinical T5 Model Performance Study
- Radiology Report Quality Assessment
- Genomic Analysis Benchmarking
LLM Applications
- TCM-FTP: Herbal Prescription Prediction
- LLaSA: Activity Analysis via Sensors
- Emergency Department Visit Predictions
- Neurodegenerative Disease AI Diagnosis
- Kidney Disease Explainable AI Model
Ethical AI & Privacy
- Privacy-Preserving LLM Mechanisms
- AI-Driven Digital Organism Modeling
- Biomedical Research Automation
- Multimodality in Medical Practice
Full thread in detail: https://x.com/OpenlifesciAI/status/1867999825721242101
I am trying to implement the contrastive loss function, but I am unsure if it is correct: my loss seems to explode to infinity. Another set of eyes on this would be appreciated. Does this look correct?
class ContrastiveLoss(nn.Module):
    def __init__(self, temperature=0.9):
        super(ContrastiveLoss, self).__init__()
        self.temperature = temperature

    def forward(self, projections_1, projections_2):
        z_i = projections_1
        z_j = projections_2
        z_i_norm = F.normalize(z_i, dim=1)
        z_j_norm = F.normalize(z_j, dim=1)
        cosine_num = torch.matmul(z_i, z_j.T)
        cosine_denom = torch.matmul(z_i_norm, z_j_norm.T)
        cosine_similarity = cosine_num / cosine_denom
        numerator = torch.exp(torch.diag(cosine_similarity) / self.temperature)
        denominator = cosine_similarity
        diagonal_indices = torch.arange(denominator.size(0))
        denominator[diagonal_indices, diagonal_indices] = 0
        denominator = torch.sum(torch.exp(cosine_similarity), dim=1)
        loss = -torch.log(numerator / denominator).mean()
        return loss
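A possible corrected version, as a minimal sketch. It assumes the intended loss is the InfoNCE / NT-Xent form with in-batch negatives (row i of the first view is the positive for row i of the second), which the post does not state. Three things in the code above look like likely causes of the blow-up: dividing the raw dot product by the cosine similarity gives the product of the two vector norms rather than a cosine similarity, so exp() of it can overflow; zeroing the diagonal before exp() contributes exp(0) = 1 per row instead of excluding the positive; and the temperature is applied only in the numerator. The class name InfoNCELoss and the default temperature are illustrative choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class InfoNCELoss(nn.Module):
    """In-batch contrastive loss: row i of each view forms the positive pair."""

    def __init__(self, temperature=0.5):
        super().__init__()
        self.temperature = temperature

    def forward(self, projections_1, projections_2):
        # Unit-normalise so the dot product below is a true cosine similarity.
        z_i = F.normalize(projections_1, dim=1)
        z_j = F.normalize(projections_2, dim=1)

        # (N, N) similarity matrix, scaled by the temperature everywhere.
        logits = z_i @ z_j.T / self.temperature

        # The positive for row k sits on the diagonal, i.e. at column k.
        targets = torch.arange(logits.size(0), device=logits.device)

        # cross_entropy applies a numerically stable log-softmax per row.
        return F.cross_entropy(logits, targets)
```

Because the cosine similarities are bounded in [-1, 1], the logits stay within plus or minus 1/temperature, and letting cross_entropy handle the log-softmax avoids taking exp() and log() by hand.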
https://github.com/mikayahlevi/mru-lm
Hi, I'm posting here to share a project I just published on GitHub. I'll start with a description, some of which will be copy/pasted from the GitHub repo.
The idea of a matrix recurrent unit is dictated by the update rule H_t = H_{t-1} X_{t-1} and H_1 = X_1 where X and H are s×n×n sequences of square matrices. The primary difference between this and a traditional RNN is that no initial vector is passed through the linear layers; instead the first state is a matrix, so the output is also a matrix. My motivations for coming up with this idea are the following:
I tried generating the matrix X by different methods in the different branches. All of the ways to generate X and fold the output hidden state back into a vector are arbitrary combinations of linear layers and reshapes, based only on what I found worked well.
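To make the recurrence concrete, here is a minimal pure-Python sketch. It reads the update rule as a running matrix product (H_1 = X_1, and each later state is the previous state right-multiplied by the next input matrix); that reading of the indexing is an assumption, and the helper names matmul and mru_states are illustrative rather than taken from the repo.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mru_states(xs):
    """Return the hidden states [H_1, ..., H_s] for input matrices xs.

    Assumed reading of the recurrence: H_1 = X_1 and each later state is
    the previous state right-multiplied by the next input matrix.
    """
    states = [xs[0]]
    for x in xs[1:]:
        states.append(matmul(states[-1], x))
    return states


# Toy sequence of three 2x2 matrices.
xs = [
    [[1, 1], [0, 1]],
    [[2, 0], [0, 2]],
    [[0, 1], [1, 0]],
]
states = mru_states(xs)
```

Since matrix multiplication is associative, a running product of this kind can in principle be computed with a parallel scan instead of a sequential loop.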
Loss vs Steps for a Transformer and an MRU-LM on shakespeare-char
This approach seems to work pretty well based on the toy dataset shakespeare-char, so if anyone wants to help me out, I would like to benchmark it on more informative datasets and see how it works out.
Disclaimer: I posted this in r/learnmachinelearning first, but the sub seems to be more concerned with very basic questions, courses and hiring, so feel free to remove it if it doesn't fit here (tho I think that also fits this sub as a discussion).
I now have a few years of experience building and training different model architectures. I know most of the basic theory and am able to follow most papers, so my question goes in a more methodological direction. While I am able to successfully build models for a number of applications, a lot of the time this is to a large extent guesswork: I try out different things and see what sticks. I know there is a lot of research on interpretability going on, but that is not directly the direction I want to go with this. Instead I want to ask you all what general advice you have on the training process: what are some practical observations, rules of thumb, and approaches you take that are not described in a paper or theoretical ML class? For example:
How do you analyze gradients in your model? I know how to do some very basic plots in this regard, but would be interested in your methods and how you read them from a practical perspective.
How do you visualize temporal instabilities between optimizer steps resulting from e.g. a too large learning rate?
How do you determine appropriate regularization?
What are your rules of thumb for diminishing returns during a training run?
How do you tune your hyperparameters? I eyeballed them more or less and also used optuna for this in the past.
What are some important intuitions, unwritten rules and pitfalls during training in your opinion?
What are your debugging steps when a model does not perform as expected?
What tricks do you actually use? There are lots of small tricks (EMA, obscure activation functions, ...) that promise some gains, but what do you actually use?
How does your approach differ when you do a transformer, CNN, diffusion model, ...?
Some general opinions or tips that I might have missed above.
University classes and online resources mostly teach the basics or theoretical foundation, which is very important, but in practice only part of the story. Real world experience also helps, but you only get so far with trial and error and might miss something useful. I am aware of the blog posts by Karpathy on the training of neural networks and look for more resources in this direction.
I am happy to hear your replies on this arguably broad topic.
We are university students and we're conducting a quick survey on students’ motivation to learn Artificial Intelligence and Modeling. The survey will take less than 10 minutes to complete.
Here's the link to the survey: https://docs.google.com/forms/d/e/1FAIpQLSdS-xy53N9lDRlC_835A_E59VMjCPql0_HuihPYqaQ_nINSsw/viewform?usp=sf_link
Your input would mean a lot to us! Thank you so much for your support and time.
I'm dealing with a clustering-over-time issue. Our company is a sort of PayPal. We are trying to implement an antifraud process that triggers alerts when a client makes excessive payments compared to their historical behavior. To do so, I've come up with seven clustering features, all of which are 365-day moving averages of different KPIs (payment frequency, payment amount, etc.). So it goes without saying that these indicators evolve very slowly from one day to the next. I have about 15k clients and several years of data. I remove outliers (basically, anything above the 99th percentile on each date) and put them in a default cluster 0. Then the idea is to come up with 8 clusters for each date. I've used Gaussian mixture model (GMM) clustering but, weirdly enough, my clients' clusters vary wildly from one day to the next. I have tried seeding each day's clustering with the previous day's centroids, but the results still vary a lot. I've read a bit about DynamicC and it seemed like the way to address the issue, but it doesn't help.
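One possible contributor, which the post does not confirm, is that a mixture model assigns component indices arbitrarily on each fit, so "cluster 3" on one day need not be "cluster 3" on the next even when the partition itself is nearly identical. Below is a minimal sketch of relabeling today's clusters to match yesterday's by centroid distance. The function align_labels and the toy centroids are illustrative and not part of any library; the brute-force search over permutations is only reasonable for a handful of clusters, such as the 8 here.

```python
from itertools import permutations
from math import dist


def align_labels(prev_centroids, new_centroids):
    """Map each new cluster index to the previous-day index it matches best.

    Tries every one-to-one assignment and keeps the one with the smallest
    total centroid distance.
    """
    k = len(prev_centroids)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(k)):
        cost = sum(dist(new_centroids[i], prev_centroids[perm[i]])
                   for i in range(k))
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return dict(enumerate(best_perm))


# Toy example: same three clusters, but today's fit returned them reordered.
yesterday = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
today = [(0.2, 9.8), (0.1, -0.1), (9.9, 0.3)]
mapping = align_labels(yesterday, today)
```

Applying the mapping to each client's label before comparing days separates genuine cluster changes from pure relabeling.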
In AI and professional workstations, NVIDIA's dominance feels like a suffocating monopoly. Their segmented product lines widen the gap between consumer and professional GPUs, particularly in VRAM, performance, and price.
AI enthusiasts struggle with prohibitive costs for GPUs equipped with sufficient VRAM. The reliance on CUDA, a proprietary platform, further locks developers into NVIDIA's ecosystem, stifling competition and innovation.
NVIDIA’s control extends beyond hardware, as their CUDA platform discourages adoption of open, competitive solutions. This feeds a cyberpunk dystopia where corporations consolidate power, leaving consumers and developers with few choices.
Why does the tech world remain complicit? Why aren’t we pursuing alternative hardware architectures or broader software compatibility beyond CUDA? AMD’s ROCm is a start, but more aggressive development and policy interventions are needed to challenge NVIDIA’s grip.
How long will this continue? Who will stand up for the end consumer?
This paper introduces a framework for analyzing and visualizing the branching decisions language models make during text generation. The key methodology involves tracking probability distributions across different sampling paths to understand how early choices affect downstream generation.
Main technical points:
Key results:
I think this work provides important insights into how we might better control text generation. The ability to map and understand generation paths could help develop more reliable sampling methods and better uncertainty estimates.
I think the clustering of generation paths is particularly interesting - it suggests there may be ways to guide generation toward desired trajectory groups. This could be valuable for applications needing more predictable outputs.
The methodology also reveals some concerning aspects about current sampling methods. The strong dependence on early decisions suggests we may need new approaches that better preserve generation flexibility throughout the sequence.
TLDR: New framework for analyzing how language models make text generation choices. Shows that generation paths cluster into distinct groups and early decisions heavily influence outcomes. Could help develop better sampling methods and uncertainty estimates.
Full summary is here. Paper here.
About a year ago, research papers talked about model collapse when dealing with synthetic data. Recently I've been hearing about some progress in this regard. I am not an expert and would welcome your views on what's going on. Thank you and have a fantastic day.
I was looking into design patterns for agentic AI and I could use some help grasping the concepts.
I read about ReAct and ReWOO.
From ReWOO, I really liked the idea of having a planner that creates a blueprint of the work that needs to be done. I can imagine that this works well for a lot of tasks, and it optimizes token usage compared to ReAct.
From ReAct, I like that it has a reflection/observation LLM, to decide whether the output is good enough or needs another pass through the agents.
What I don't understand: why does ReWOO not have a reflection component?
Wouldn't it be the best of both worlds to have the planner and the reflection?
This was the first draft for my agentic AI prototype, and I think it has pretty obvious advantages.
I think I am missing something here.
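The combination asked about above can be sketched as a plain control loop. This is a hypothetical illustration, not the reference implementation of ReWOO or ReAct: planner, worker and reflector are stand-in callables (in practice they would wrap LLM or tool calls), and max_rounds is an assumed safeguard against endless retries.

```python
def run_with_plan_and_reflection(task, planner, worker, reflector, max_rounds=3):
    """Plan once per round, execute every step, then accept or retry.

    planner(task, feedback)  -> list of step descriptions
    worker(step, evidence)   -> result of one step (sees earlier results)
    reflector(task, results) -> (accepted: bool, feedback: str)
    """
    feedback = ""
    results = []
    for _ in range(max_rounds):
        plan = planner(task, feedback)      # blueprint for this round
        results = []
        for step in plan:
            results.append(worker(step, results))
        accepted, feedback = reflector(task, results)
        if accepted:
            break                           # good enough, stop here
    return results
```

Note that each extra round repeats the planning and execution cost, which trades away some of the token savings mentioned above.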
I've noticed that the time spent on hyperparameter optimization varies significantly, not just between industry and academia but also across different fields like NLP, computer vision, or reinforcement learning. I'm curious: what's your experience?
Would love to hear your experiences! Thanks
I am having an issue evaluating my model: model.evaluate() returns an okay overall accuracy, but the confusion matrix and classification report show 100% for one class and 0% for the other. I am using CIFAR-10, but only 2 classes from it. Does anyone know why this happens? Is this overfitting? I am not sure, because my training accuracy is similar to the model.evaluate() score, and the same goes for the loss (which is almost as high as the accuracy).
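One common cause of this pattern, which the post does not confirm: if the model ends in a single sigmoid unit, taking argmax over each prediction row always returns index 0, so every sample lands in one class of the confusion matrix even though a binary accuracy metric (which thresholds the output) reports a reasonable score. A minimal pure-Python sketch of the difference; the prediction values are made up for illustration.

```python
def argmax_labels(preds):
    """Index of the largest value per row; always 0 when rows have one column."""
    return [max(range(len(row)), key=row.__getitem__) for row in preds]


def threshold_labels(preds, threshold=0.5):
    """Class 1 when the single sigmoid output exceeds the threshold, else 0."""
    return [int(row[0] > threshold) for row in preds]


# Hypothetical outputs of a binary model with one sigmoid unit: shape (n, 1).
preds = [[0.91], [0.08], [0.77], [0.30]]

collapsed = argmax_labels(preds)     # every sample assigned to class 0
recovered = threshold_labels(preds)  # labels follow the predicted probability
```

If the model instead ends in a two-unit softmax, argmax is the right call, and the labels passed to the confusion matrix are the next thing to check.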
I can't find this information. If both are open source, a compatibility layer would make sense. Has either of the two already been ported to the other platform? If you can share info about NVIDIA too, that would be cool.
Hello, I have a project in which I detect anomalies in transaction data from the Ethereum blockchain. I have performed aggregated calculations on each wallet address (e.g. minimum, maximum, median, sum, and mode of transaction values) and created a separate data file with them. I have joined this data onto all the transactions. Now I have to standardize the data (I have chosen robust scaling) before machine learning, but I have the following questions regarding this topic:
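For reference, robust scaling as mentioned above is usually defined per feature as (x - median) / IQR, where IQR is the gap between the 75th and 25th percentiles. A minimal standard-library sketch follows; the function name robust_scale and the sample values are illustrative, and falling back to a divisor of 1 when the IQR is zero is an assumed convention.

```python
from statistics import median, quantiles


def robust_scale(values):
    """Centre on the median and divide by the interquartile range.

    If the IQR is zero (a near-constant feature), values are only centred.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    center = median(values)
    iqr = q3 - q1
    if iqr == 0:
        iqr = 1.0
    return [(v - center) / iqr for v in values]


# One extreme transaction value barely affects how the others are scaled.
scaled = robust_scale([1, 2, 3, 4, 100])
```

The median and quartiles ignore how far the extreme value lies from the rest, which is why this scaling is often preferred over mean and standard deviation for heavy-tailed transaction values.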
For a project I am currently trying to integrate an autoencoder (AE) for feature extraction and an LSTM for classification of the reduced feature space. The problem I am encountering is how to train the LSTM network. The AE produces 5 data points, which are fed into the LSTM network. The tricky part is the training of the LSTM network and how the LSTM works. I want the LSTM to take into account the 5 parameters from the AE at time t as well as the parameters at t-1 and t-2. As far as I understand, the LSTM does this automatically. Or should the LSTM instead take in a total of 15 parameters, with each group of 5 corresponding to one timestep of the AE?
Any advice on LSTMs, or on how such training can be done efficiently, would be great. The AE is processing a time-series signal.
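One way to arrange the data, as a sketch: recurrent layers in common frameworks expect input shaped (batch, timesteps, features), so the three AE outputs would be passed as a sequence of length 3 with 5 features per step rather than as one flat vector of 15 values. The helper make_windows below is illustrative and only builds the windows; it assumes the AE outputs are already collected in time order.

```python
def make_windows(features, window=3):
    """Turn per-timestep feature vectors into overlapping windows.

    features: list of length T, each item a list of 5 AE outputs.
    Returns T - window + 1 windows, each shaped (window, 5) and ordered
    oldest to newest, i.e. [t-2, t-1, t] for window=3.
    """
    return [features[i:i + window] for i in range(len(features) - window + 1)]


# Six timesteps of fake AE outputs, 5 values each.
ae_outputs = [[float(t)] * 5 for t in range(6)]
windows = make_windows(ae_outputs)
```

Each window then becomes one training sample, with the label taken from its last timestep if the classification target is defined at time t.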
We recently had a paper accepted to a conference (AAAI). We found out that the conference does not publish appendices so they recommend we upload the full paper (with appendix) to arXiv. This is something we were considering doing anyway since the paper would be available before the conference proceedings come out.
My concern is that if someone decides to cite our work, they may either become confused or cite the arXiv rather than AAAI "version".
Is there a "correct" or common way to handle this? Do arXiv uploads with the same title get indexed to "one manuscript" on Google Scholar?
Also, are we allowed to use the conference template to upload? (This part might be conference dependent I suppose).
I know it is common these days to upload to arXiv before hearing back from a conference (usually with a different title) but I think this is a slightly different situation as the paper is accepted and the uploaded version will be identical to the conference paper (though with an Appendix).
Thanks in advance!
Allegedly, the winner of the NeurIPS 2024 Best Paper Award (a guy from ByteDance, the creators of TikTok) sabotaged the other teams to derail their research and redirect their resources to his own. Plus he was at meetings debugging his colleagues' code, so he was always one step ahead. There's a call to withdraw his paper.
https://var-integrity-report.github.io/
I have not checked the facts myself, so it would be good if someone could verify what is asserted and confirm whether it is true.
So I submitted a paper and it got accepted around 2 months ago. Today was my conference presentation, in online mode, and it didn't go well. I think he was in a hurry: he didn't listen much, disagreed a bit, and then closed the meeting on me. So my question is, how bad is it? Will the paper still be published since I have the acceptance, or is it now a no?
Hi, I am sharing my recent work which allows arbitrary images to be positive pairs. Our finding is quite astonishing that two disparate images, e.g., a snake and a lamp, can be positive. Our work potentially broadens the applications of contrastive learning to deal with the "false positive" in which two views are not similar.
We challenge the conventional wisdom in contrastive learning that positive pair design is critical. Our results suggest that feature selection is the key!
General Discussion - now that TikTok is about to be banned in the US, I'm becoming fascinated by the strength of its For You recommendations. To put some guard rails on what I mean: TikTok has shown itself able to match content to a relevant audience at greater frequency and scale than any other app (YouTube included). Many creators can join the platform, post a single video, and have millions of views in 24 hours. This does happen on other apps, but TikTok seems to be the most consistent at scaling audience incredibly fast.
What models might they be basing their system on? What about their models creates their competitive advantage?
Hi, I am learning ML and this is my first project. I did a simple 100 LoC implementation of the Neural Style Transfer paper by Gatys et al. See https://github.com/TAOGenna/pytorch-neural-style-transfer
I'm working on an application that requires ambient sounds/ music. For example:
I've had a look at Hugging Face and found the Text-To-Audio section. However it appears the top models have very few downloads:
This makes me think the field is immature, and there's no clear best model. Is this a fair appraisal of the field, or are there models outside of Hugging Face that perform well for this use case?
I’ve been exploring the architecture of ResNet and its ability to train very deep neural networks effectively. While I understand that residual connections help mitigate issues like vanishing gradients and make training deeper networks feasible, I’m curious about the limitations of this approach when scaling to extremely deep networks, such as those with 1000 layers or more.
From my understanding, a ResNet with, say, 100 layers might effectively function like a much smaller network due to the residual connections, which essentially "skip" layers and add outputs. However, wouldn’t this also mean that if a regular MLP struggles to scale beyond 15 layers, a ResNet might just shift this limit proportionally (e.g., struggling beyond 150 layers)? In other words, does ResNet fundamentally solve the problem of training extremely deep networks, or does it merely extend the depth at which issues start to reappear?
I’d appreciate any insights you might have! TYSM!
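For the gradient side of this question, the standard argument can be written out as follows. It assumes pure identity shortcuts with no nonlinearity applied after the addition, which is an idealization of the original ResNet block; F_l denotes the learned function of block l and E the loss.

```latex
\begin{align*}
x_{l+1} &= x_l + F_l(x_l) \\
x_L &= x_l + \sum_{i=l}^{L-1} F_i(x_i) \\
\frac{\partial E}{\partial x_l}
  &= \frac{\partial E}{\partial x_L}
     \left( 1 + \frac{\partial}{\partial x_l} \sum_{i=l}^{L-1} F_i(x_i) \right)
\end{align*}
```

The additive 1 means part of the gradient at a deep layer L reaches layer l directly, without being multiplied through every intermediate weight matrix. This explains why the signal does not vanish in the way it does in a plain stack, but it does not by itself settle how much a 1000-layer network gains over a shallower one.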
This paper presents a grounded theory study of how red-teaming is conducted on Large Language Models (LLMs), based on interviews with practitioners. The researchers systematically analyzed practitioner approaches to identify common patterns, strategies and motivations in LLM red-teaming.
Key technical points:
Main results:
I think this work provides an important foundation for developing more structured approaches to LLM safety testing. The taxonomy they've developed could help standardize how we evaluate and secure these systems. Their finding that manual testing remains superior to automation suggests we need much more work on automated testing approaches.
I think the emphasis on non-malicious intent and safety motivations is particularly relevant as these systems become more widely deployed. Understanding how and why people conduct these tests helps distinguish legitimate security research from attacks.
TLDR: First systematic study of LLM red-teaming practices, providing taxonomy of strategies and techniques based on practitioner interviews. Shows importance of manual testing and team collaboration, while establishing red-teaming as legitimate security research.
Full summary is here. Paper here.