/r/neuralnetworks


Subreddit about Artificial Neural Networks, Deep Learning and Machine Learning.


27,111 Subscribers

1

Convex Optimization Theory Predicts Optimal Learning Rate Schedules for Large Language Models

This paper makes a key connection between classical convex optimization theory and empirically successful learning rate schedules used in modern deep learning. The researchers derive mathematical proofs showing that cosine learning rate decay emerges naturally from optimization bounds.

Main technical points:

  • Developed theoretical framework connecting classical optimization with deep learning scheduling
  • Proved that cosine decay schedules minimize convergence bounds for convex problems
  • Showed linear warmup has theoretical justification through optimization lens
  • Validated results on ImageNet, language models, and other standard benchmarks
  • Found 10-15% improvement in final model performance using theoretically optimal schedules

I think this work provides valuable mathematical grounding for practices that were mainly developed through trial and error. While the analysis focuses on convex cases, the alignment with empirical results suggests the insights transfer well to deep learning. The proofs could help develop better automated scheduling methods.
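
As a concrete illustration of the schedules under discussion, here is a minimal Python sketch of linear warmup followed by cosine decay. This is not code from the paper; base_lr, warmup_steps and total_steps are hypothetical values you would set for your own run:

    import math

    def lr_at_step(step, base_lr=3e-4, warmup_steps=1_000, total_steps=100_000, min_lr=0.0):
        """Linear warmup to base_lr, then cosine decay down to min_lr."""
        if step < warmup_steps:
            return base_lr * (step + 1) / warmup_steps               # linear warmup
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        cosine = 0.5 * (1.0 + math.cos(math.pi * progress))          # goes 1 -> 0 over training
        return min_lr + (base_lr - min_lr) * cosine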

I think the framework could be extended to analyze other training components like momentum and weight decay. The connection to classical optimization theory opens up possibilities to leverage decades of theoretical work.

TLDR: Research proves popular learning rate schedules (cosine decay, linear warmup) are theoretically optimal under convex optimization, matching empirical findings. Results validate current practices and provide foundation for improving training methods.

Full summary is here. Paper here.

1 Comment
2025/02/04
12:22 UTC

2

Hyperdimensional Computing (HDC) with Peter Sutor Part 1 (Interview)

0 Comments
2025/02/03
16:37 UTC

1

Calculating batch norm for hidden layers

I am trying to understand the details of performing batch norm for hidden layers. I understand that for a given neuron, say, X^(l) in layer l, we need to calculate mean and variance over all mini-batch samples to standardize its activation before feeding it to the next layer.

I would like to understand how exactly the above calculation is done. One way might be to process each element of the mini-batch and collect stats for the neurons in layer l, ignoring the subsequent layers. Once the means and variances for all neurons in layer l have been calculated, process the mini-batch elements again for layer l+1, and so on. This seems rather wasteful. Is this correct?

If not, please share a description of the exact calculation being performed. The root of my confusion is that standardization in layer l affects the values going into layer l+1, so unless we know the mean and variance for layer l, how can we standardize the next layer? Thank you in advance.
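
For what it's worth, here is a minimal NumPy sketch of the standard computation (assuming fully connected layers and ignoring the running statistics kept for inference): the whole mini-batch is pushed through the network layer by layer, so the per-neuron mean and variance for layer l are computed from that layer's pre-activations over the entire batch in the same forward pass, and only then does the batch move on to layer l+1. No second pass over the data is needed.

    import numpy as np

    def forward_with_batchnorm(X, weights, gammas, betas, eps=1e-5):
        """Forward pass of a whole mini-batch through fully connected layers with batch norm.

        X: (batch_size, n_in). The full batch moves through the network together,
        so the mean/variance for layer l are computed from that layer's
        pre-activations over the whole batch before layer l+1 is ever touched.
        """
        A = X
        for W, gamma, beta in zip(weights, gammas, betas):
            Z = A @ W                                    # pre-activations, shape (batch, n_l)
            mu = Z.mean(axis=0)                          # per-neuron mean over the mini-batch
            var = Z.var(axis=0)                          # per-neuron variance over the mini-batch
            Z_hat = (Z - mu) / np.sqrt(var + eps)        # standardize
            A = np.maximum(0.0, gamma * Z_hat + beta)    # scale/shift, then ReLU
        return A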

0 Comments
2025/02/03
06:09 UTC

1

Curvature-guided Langevin Monte Carlo for Multi-chirp Parameter Estimation

This paper introduces a new approach for estimating parameters in multi-chirp signals using Curvature-guided Langevin Monte Carlo (CLMC). The key innovation is combining geometric information from the parameter space with stochastic sampling to better handle overlapping frequency components.

Main technical contributions:

  • Integration of curvature information into the Langevin Monte Carlo framework
  • Adaptive step size mechanism based on local geometric properties
  • Novel approach to handling multi-modal distributions in parameter space
  • Implementation of second-order information for guided sampling

Results showed:

  • Improved accuracy in parameter estimation compared to standard methods
  • Better performance in low SNR conditions (demonstrated down to -5 dB)
  • More reliable separation of closely spaced frequency components
  • Faster convergence compared to traditional LMC
  • Successful handling of up to 4 overlapping chirp components

I think this work opens up new possibilities for applications like radar and sonar where precise frequency analysis is crucial. The ability to better separate overlapping components could be particularly valuable for wireless communications and medical imaging applications where signal clarity is essential.

I think the main limitation is computational complexity scaling with the number of components, which might restrict real-time applications. The method also requires careful parameter tuning, which could make practical deployment challenging.
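
For intuition only, here is a minimal sketch of a curvature-preconditioned (unadjusted) Langevin step in Python. It is a generic illustration of scaling both the drift and the noise by local curvature, not the paper's CLMC algorithm; grad_U and hess_diag_U are hypothetical user-supplied functions for the negative log-posterior of the chirp parameters.

    import numpy as np

    def preconditioned_langevin(grad_U, hess_diag_U, theta0, step=1e-3, n_steps=5000, seed=0):
        """Curvature-preconditioned (unadjusted) Langevin sampling.

        grad_U:      gradient of the negative log-posterior of the chirp parameters
        hess_diag_U: positive diagonal curvature estimate at theta
        (A position-dependent preconditioner formally needs extra correction terms; omitted here.)
        """
        rng = np.random.default_rng(seed)
        theta = np.asarray(theta0, dtype=float)
        samples = []
        for _ in range(n_steps):
            M = np.maximum(hess_diag_U(theta), 1e-6)         # local curvature as a preconditioner
            noise = rng.standard_normal(theta.shape)
            theta = (theta
                     - 0.5 * step * grad_U(theta) / M        # curvature-scaled drift
                     + np.sqrt(step / M) * noise)            # curvature-scaled diffusion
            samples.append(theta.copy())
        return np.array(samples)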

TLDR: New method combines curvature information with Langevin Monte Carlo for better multi-chirp parameter estimation, showing improved accuracy and robustness in handling overlapping frequency components.

Full summary is here. Paper here.

1 Comment
2025/02/02
20:11 UTC

1

Elman networks - can you explain what they are and how they work?

0 Comments
2025/02/01
22:08 UTC

0

ChatGPT is made from 100 million of these [The Perceptron]

1 Comment
2025/02/01
20:51 UTC

7

Giving ppl access to free GPUs - would love beta feedback🦾

Hi all! I’m the founder of a YC backed company, and we’re trying to make it very easy and very cheap to train ML models. For the next 2 weeks we’re running a *free* beta and would love some of your feedback. 

If it sounds interesting feel free to check us out here: https://github.com/tensorpool/tensorpool

TLDR; free GPUs😂

4 Comments
2025/02/01
20:26 UTC

1

I need to label your data for my project

Hello!

I'm working on a private project involving machine learning, specifically in the area of data labeling.

Currently, my team is undergoing training in labeling and needs exposure to real datasets to understand the challenges and nuances of labeling real-world data.

We are looking for people or projects with datasets that need labeling, so we can collaborate. We'll label your data, and the only thing we ask in return is for you to complete a simple feedback form after we finish the labeling process.

You could be part of a company, working on a personal project, or involved in any initiative—really, anything goes. All we need is data that requires labeling.

If you have a dataset (text, images, audio, video, or any other type of data) or know someone who does, please feel free to send me a DM so we can discuss the details.

0 Comments
2025/01/28
21:18 UTC

1

Hi guys, I just did some work on making a recommendation system using a knowledge-aware coupled graph neural network and a transformer. If someone can help me with it, please message me; I have some references but need help understanding them.

0 Comments
2025/01/28
11:32 UTC

0

Free Beginner Course to Get Into Deep Learning

What's up yall!

Hey I just started writing a newsletter that I hope will help people understand the basics of deep learning and may clarify some things I found hard to understand while I was learning. I spend quite a bit of time on each one, so I figured I'd share it here if anyone is looking to start from the basics.

https://www.linkedin.com/newsletters/neural-notes-understanding-ai-7282889158631534592/

0 Comments
2025/01/28
05:55 UTC

2

Hello guys, so I started learning CNNs and I want to make a model that will remove these black spots and can also reconstruct the damaged text. For now I have 70 images like this, and I have cleaned them using Photoshop. If anyone can give me some guidance on how to start doing it. Thank you.

6 Comments
2025/01/27
11:03 UTC

2

Combining XGBoost with Pytorch.

I've been experimenting with combining XGBoost and PyTorch to see how they can complement each other. The idea is to use XGBoost's predictions and feed its output into PyTorch for deep learning, creating a sort of hybrid model. The results have been pretty interesting—seems like this approach can really improve performance in certain cases.
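
For anyone curious what such a hybrid can look like, here is a rough sketch of one common wiring (not necessarily the blog's exact approach): train XGBoost first, then append its predicted class probabilities to the features fed to a small PyTorch head. It assumes a binary problem with a NumPy array X_train and integer labels y_train in {0, 1}, both hypothetical names.

    import numpy as np
    import torch
    import torch.nn as nn
    import xgboost as xgb

    # Stage 1: gradient-boosted trees on the raw tabular features.
    # X_train (n, d) and y_train with values in {0, 1} are assumed to exist.
    booster = xgb.XGBClassifier(n_estimators=200, max_depth=6)
    booster.fit(X_train, y_train)

    # Stage 2: append the boosted class probabilities to the inputs of a small PyTorch head.
    X_aug = np.hstack([X_train, booster.predict_proba(X_train)])
    net = nn.Sequential(nn.Linear(X_aug.shape[1], 64), nn.ReLU(), nn.Linear(64, 2))
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    xb = torch.tensor(X_aug, dtype=torch.float32)
    yb = torch.tensor(y_train, dtype=torch.long)
    for _ in range(50):                  # toy training loop
        opt.zero_grad()
        loss_fn(net(xb), yb).backward()
        opt.step()

Leaf indices from booster.apply(X_train), one-hot encoded, are another common choice of stacked feature.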

Curious if anyone else has tried something similar or has insights on this combo? Would love to hear your thoughts or suggestions!

https://machinelearningsite.com/machine-learning-using-xgboost/

0 Comments
2025/01/26
12:11 UTC

1

Loading model from pickle - no module named "ModuleName"

I have 2 projects: one is meant to train all sorts of neural network models and LLMs, which should then either be called via API (for the LLM) or loaded via pickle from the second project, which is a text analytics algorithm (incl. parsing large PDFs and other simpler NLP tasks).

The problem I'm having is that when I pickle my neural network and try to load it in the second workspace, I get a ModuleNotFoundError: "No module named 'neuralnet'", neuralnet being the file (neuralnet.py) where the actual neural net logic is contained and the model is trained. I've tried to copy the file to the first workspace, but I'm still running into the same error.

Clearly I'm doing something wrong in terms of saving and loading the model? Has anyone encountered a similar struggle?
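
In case it helps, the usual cause is that pickle stores the import path of the class (e.g. neuralnet.MyNet), so the loading environment must be able to import a module named exactly neuralnet. A minimal sketch of two common workarounds (paths and filenames below are hypothetical):

    # Option A: make the defining module importable under the same name before unpickling.
    # (pickle stores the path "neuralnet.<ClassName>", so Python must be able to import "neuralnet".)
    import sys
    sys.path.append("/path/to/training_project")   # hypothetical: directory containing neuralnet.py
    import neuralnet                               # registers the module so pickle can resolve the class

    import pickle
    with open("model.pkl", "rb") as f:             # hypothetical filename
        model = pickle.load(f)

    # Option B (often more robust): in the training project save only the weights
    # (e.g. torch.save(model.state_dict(), "weights.pt")) and in the consuming project
    # rebuild the architecture from code and call model.load_state_dict(...).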

0 Comments
2025/01/26
10:47 UTC

2

Leveraging LLM Hallucinations to Enhance Drug Discovery Performance: A Multi-Model Analysis

The researchers explored how controlled hallucinations in LLMs might actually benefit drug discovery by enabling novel molecular generation. They developed methods to tune GPT-4's hallucination rates when generating molecular structures and analyzed the relationship between hallucination levels and drug-like compound novelty.

Key technical points:

  • Implemented temperature scaling and nucleus sampling to control hallucination rates
  • Evaluated generated molecules using standard metrics (validity, drug-likeness, novelty)
  • Tested different hallucination levels and their impact on molecular properties
  • Analyzed trade-offs between molecular novelty and chemical feasibility
  • Developed prompt engineering techniques to guide molecular generation
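
As a side note, the two sampling controls listed above (temperature scaling and nucleus sampling) boil down to a few lines; this is a generic illustration over a vector of logits, not the authors' code:

    import numpy as np

    def sample_token(logits, temperature=1.0, top_p=0.9, rng=None):
        """Temperature scaling + nucleus (top-p) sampling over a vector of logits."""
        rng = rng or np.random.default_rng()
        scaled = logits / temperature                        # temperature controls sharpness
        probs = np.exp(scaled - np.max(scaled))
        probs /= probs.sum()
        order = np.argsort(probs)[::-1]                      # most to least likely
        cumulative = np.cumsum(probs[order])
        keep = order[: np.searchsorted(cumulative, top_p) + 1]   # smallest set with mass >= top_p
        kept = probs[keep] / probs[keep].sum()
        return rng.choice(keep, p=kept)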

Results showed:

  • Moderate hallucination rates (0.4-0.6) produced the most promising molecules
  • Generated compounds maintained basic chemical validity
  • Higher novelty correlated with increased hallucination rates
  • Model demonstrated ability to create previously unknown structures
  • Output quality varied significantly with sampling parameters

I think this could transform early-stage drug discovery by providing a new source of candidate molecules. While computational feasibility doesn't guarantee real-world viability, the ability to rapidly generate novel structures could accelerate initial screening processes. The key challenge will be validating these compounds experimentally and ensuring safety.

The approach needs more work on:

  • Physical synthesis validation
  • Toxicity screening
  • Integration with existing pipelines
  • Reproducibility standards
  • Regulatory compliance

TLDR: Researchers found that controlled LLM hallucinations can generate novel, chemically valid drug candidates. By tuning hallucination rates, they balanced molecular novelty with chemical feasibility.

Full summary is here. Paper here.

1 Comment
2025/01/25
17:07 UTC

2

Dreaming Learning

A new method for incorporating novelties into neural networks and preparing the network for paradigm shifts in time series: https://arxiv.org/abs/2410.18156

1 Comment
2025/01/24
17:54 UTC

2

Understanding Sequence Models Through Test-Time Regression: A Framework for Associative Memory in Neural Architectures

This paper introduces a test-time regression framework that approaches sequence modeling in a novel way - instead of relying on standard attention mechanisms, it performs regression during inference to build associative memory connections.

Key technical points:

  • The model performs dynamic memory updates during inference time rather than just during training
  • Uses a bilinear projection technique to map between sequence elements and memory states
  • Achieves O(n) complexity while maintaining competitive performance with O(n²) attention models
  • Demonstrates strong results on long-range dependency tasks
  • Shows consistent improvement on sequence lengths >1000 tokens

Main empirical findings:

  • 15-20% speedup compared to standard attention mechanisms
  • Memory usage scales linearly with sequence length
  • Maintains 98% accuracy compared to full attention baseline
  • Particularly strong on tasks requiring associative recall
  • Effective across multiple architectures (Transformers, RNNs)

I think this approach could lead to meaningful improvements in how we handle long sequences in practice. The linear scaling properties make it particularly relevant for processing longer documents or time series. While the memory trade-offs need careful consideration, the ability to build associative connections during inference opens up new possibilities for adaptive models.
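
To give a feel for the idea (this is a toy illustration, not the paper's exact formulation), an associative memory built by a regression-style update at inference time can be as simple as a delta-rule fast-weight matrix, which costs O(n·d²) rather than full attention's O(n²·d):

    import numpy as np

    def fast_weight_readout(keys, values, queries, lr=1.0):
        """Toy associative memory updated by a regression-style (delta-rule) step per token.

        keys, values, queries: (seq_len, d) arrays. Cost is O(seq_len * d^2),
        i.e. linear in sequence length, unlike full attention's O(seq_len^2 * d).
        """
        d = keys.shape[1]
        M = np.zeros((d, d))                       # memory matrix mapping keys -> values
        outputs = []
        for k, v, q in zip(keys, values, queries):
            pred = M @ k                           # what the memory currently recalls for this key
            M = M + lr * np.outer(v - pred, k)     # one regression step toward storing (k, v)
            outputs.append(M @ q)                  # read out with the query
        return np.array(outputs)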

I suspect we'll see this framework adapted for specific domains like document QA and time series forecasting where the associative memory aspects could be particularly valuable. The compatibility with existing architectures makes it quite practical to adopt.

TLDR: New framework performs regression at inference time to build associative memory, achieving linear complexity while maintaining strong performance. Shows particular promise for long sequence tasks.

Full summary is here. Paper here

1 Comment
2025/01/24
15:56 UTC

3

Learning experience

Hello. I am a grad student who has been working mainly in research companies and public institutions for the last 2 years after college. Unfortunately most of my work has consisted of building NNs with TensorFlow, Keras or PyTorch and finding the best hyperparameters that fit my data. So I've mostly been trying all different kinds of hyperparameters for hours on end for probably two years (I'm exaggerating, but you get the idea; I haven't really "built" a network from scratch or anything like that). And unfortunately I've also been in pretty solitary positions, which doesn't allow me to learn much from my peers. It seems to me that there's so much more to NNs, and without being a math wizard, I'd like to start working on building my own NNs. For that I would need some kind of resource that gives you the "intuition" for choosing a certain layer, etc., and not just brute-forcing your way to a good NN. Would you have anything to recommend? Thanks a lot.

5 Comments
2025/01/24
11:40 UTC

2

Doubt for extremely unbalanced data

I have been trying for the last few days to train a neural network on an extremely unbalanced dataset, but the results have not been good enough: there are 10 classes, and for 4 or 5 of them it does not obtain good results. I could start grouping them, but I want to try to get at least decent results for the minority classes.

This is the dataset

Kaggle dataset

The preprocessing I did was the following:

- Obtain temporal features from how long the loan has been active:

datos_crudos['loan_age_years'] = (reference_date - datos_crudos['issue_d']).dt.days / 365
datos_crudos['credit_history_years'] = (reference_date - datos_crudos['earliest_cr_line']).dt.days / 365
datos_crudos['days_since_last_payment'] = (reference_date - datos_crudos['last_pymnt_d']).dt.days
datos_crudos['days_since_last_credit_pull'] = (reference_date - datos_crudos['last_credit_pull_d']).dt.days

- Drop columns which have 40% or more NaN

- Imputation for categorical and numerical data

categorical_imputer = SimpleImputer(strategy='constant', fill_value='Missing')
numerical_imputer = IterativeImputer(max_iter=10, random_state=42)

- One Hot Encoding, Label Encoder and Ordinal Encoder

I also did the following:

- Feature selection through random forest

- Oversampling and undersampling techniques, using SMOTE

The class distribution is:

Current                                                361097
Fully Paid                                             124722
Charged Off                                             27114
Late (31-120 days)                                       6955
Issued                                                   5062
In Grace Period                                          3748
Late (16-30 days)                                        1357
Does not meet the credit policy. Status:Fully Paid       1189
Default                                                   712
Does not meet the credit policy. Status:Charged Off       471

undersample_strategy = {
    'Current': 100000,
    'Fully Paid': 80000
}

oversample_strategy = {
    'Charged Off': 50000,
    'Default': 30000,
    'Issued': 50000,
    'Late (31-120 days)': 30000,
    'In Grace Period': 30000,
    'Late (16-30 days)': 30000,
    'Does not meet the credit policy. Status:Fully Paid': 30000,
    'Does not meet the credit policy. Status:Charged Off': 30000
}

- Computed class weights

- Focal loss function

- I am watching F1 Macro because of the unbalanced data
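
For reference, a per-class-weighted categorical focal loss of the kind listed above might look like the sketch below (assuming one-hot labels and a softmax output; the alpha values passed in are hypothetical per-class weights):

    import tensorflow as tf

    def weighted_focal_loss(alpha, gamma=2.0):
        """Categorical focal loss with per-class weights (alpha: length-10 list of weights)."""
        alpha = tf.constant(alpha, dtype=tf.float32)
        def loss(y_true, y_pred):
            y_true = tf.cast(y_true, tf.float32)             # assumes one-hot labels
            y_pred = tf.clip_by_value(y_pred, 1e-7, 1.0 - 1e-7)
            ce = -y_true * tf.math.log(y_pred)               # per-class cross entropy
            weight = alpha * tf.pow(1.0 - y_pred, gamma)     # down-weight easy examples
            return tf.reduce_sum(weight * ce, axis=-1)
        return loss

    # e.g. model.compile(optimizer="adam", loss=weighted_focal_loss(alpha=per_class_weights)),
    # where per_class_weights is a hypothetical list of 10 class weights.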

This is the architecture

model = Sequential([
    Dense(1024, activation="relu", input_dim=X_train.shape[1]),
    BatchNormalization(),
    Dropout(0.4),
    Dense(512, activation="relu"),
    BatchNormalization(),
    Dropout(0.3),
    Dense(256, activation="relu"),
    BatchNormalization(),
    Dropout(0.3),
    Dense(128, activation="relu"),
    BatchNormalization(),
    Dropout(0.2),
    Dense(64, activation="relu"),
    BatchNormalization(),
    Dropout(0.2),
    Dense(10, activation="softmax")  # 10 classes
])

And here is the classification report; the biggest problems are classes 3, 6 and 8, for which some epochs obtain really low metrics:

Epoch 7: F1-Score Macro = 0.5840
5547/5547 [==============================] - 11s 2ms/step
              precision    recall  f1-score   support

           0       1.00      0.93      0.96      9125
           1       0.99      0.85      0.92    120560
           2       0.94      0.79      0.86       243
           3       0.20      0.87      0.33       141
           4       0.14      0.88      0.24       389
           5       0.99      0.95      0.97     41300
           6       0.02      0.00      0.01      1281
           7       0.48      1.00      0.65      1695
           8       0.02      0.76      0.04       490
           9       0.96      0.78      0.86      2252

    accuracy                           0.87    177476
   macro avg       0.58      0.78      0.58    177476
weighted avg       0.98      0.87      0.92    177476

Any idea what could be missing to obtain better results?

1 Comment
2025/01/24
06:54 UTC

3

Medical Melanoma Detection | TensorFlow U-Net Tutorial using Unet


This tutorial provides a step-by-step guide on how to implement and train a U-Net model for Melanoma detection using TensorFlow/Keras.

 🔍 What You’ll Learn 🔍: 

Data Preparation: We’ll begin by showing you how to access and preprocess a substantial dataset of Melanoma images and corresponding masks. 

Data Augmentation: Discover the techniques to augment your dataset; this will increase your dataset size and improve your model's results.

Model Building: Build a U-Net, and learn how to construct the model using TensorFlow and Keras.

Model Training: We’ll guide you through the training process, optimizing your model to distinguish Melanoma from non-Melanoma skin lesions. 

Testing and Evaluation: Run the pre-trained model on fresh new images. Explore how to generate masks that highlight Melanoma regions within the images.

Visualizing Results: See the results in real-time as we compare predicted masks with actual ground truth masks.

 

You can find the link to the code in the blog: https://eranfeit.net/medical-melanoma-detection-tensorflow-u-net-tutorial-using-unet/

Full code description for Medium users: https://medium.com/@feitgemel/medical-melanoma-detection-tensorflow-u-net-tutorial-using-unet-c89e926e1339

You can find more tutorials and join my newsletter here: https://eranfeit.net/

Check out our tutorial here: https://youtu.be/P7DnY0Prb2U&list=UULFTiWJJhaH6BviSWKLJUM9sg

Enjoy

Eran

0 Comments
2025/01/23
17:17 UTC

1

Regarding a project query

Currently I am working on a dataset named “Large scale annotation dataset for fetal head biometry in ultrasound images”, available on Zenodo. The dataset has ultrasound images with corresponding segmentation masks. I overlaid the masks on the images; now I want to calculate head circumference, biparietal diameter, etc., and using about 20 such features I want to do a correlation analysis. But unfortunately I don't have any domain expert. In this case, if I target a Q1 journal, what can I do for validation? My dataset does not have any existing work on it. Can anyone help?
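
For the measurements themselves, a common recipe (a sketch under the assumption that the mask is a binary head segmentation and the pixel spacing is known; it is not specific to this dataset) is to fit an ellipse to the head contour, take the minor/major axes as BPD/OFD, and approximate HC with Ramanujan's ellipse-perimeter formula:

    import cv2
    import numpy as np

    def head_biometry_from_mask(mask, pixel_spacing_mm=1.0):
        """Fit an ellipse to a binary head mask and derive HC, BPD and OFD (in mm)."""
        contours, _ = cv2.findContours(mask.astype(np.uint8),
                                       cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        cnt = max(contours, key=cv2.contourArea)            # largest region = head
        (cx, cy), (d1, d2), angle = cv2.fitEllipse(cnt)     # d1, d2 are full axis lengths in pixels
        a, b = max(d1, d2) / 2.0, min(d1, d2) / 2.0         # semi-axes
        h = ((a - b) ** 2) / ((a + b) ** 2)                 # Ramanujan perimeter approximation
        hc_px = np.pi * (a + b) * (1 + 3 * h / (10 + np.sqrt(4 - 3 * h)))
        return {"HC_mm": hc_px * pixel_spacing_mm,          # head circumference
                "BPD_mm": 2 * b * pixel_spacing_mm,         # biparietal diameter (minor axis)
                "OFD_mm": 2 * a * pixel_spacing_mm}         # occipitofrontal diameter (major axis)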

0 Comments
2025/01/19
18:09 UTC

5

Practical Lessons and Threat Models from Large-Scale AI Red Teaming Operations

This paper presents a systematic analysis of red teaming 100 generative AI products, developing a comprehensive threat model taxonomy and testing methodology. The key technical contribution is the creation of a structured framework for identifying and categorizing AI system vulnerabilities through hands-on testing.

Main technical points:

  • Developed an attack taxonomy covering prompt injection, data extraction, and system manipulation
  • Created standardized testing procedures that combine automated and manual probing
  • Documented attack patterns and defense mechanisms across different AI architectures
  • Quantified success rates of various attack vectors across system types
  • Mapped common vulnerability patterns and defense effectiveness

Key results:

  • 80% of tested systems showed vulnerability to at least one form of prompt injection
  • Multi-step attacks proved more successful than single-step attempts
  • System responses to identical attacks varied significantly based on prompt construction
  • Manual testing revealed 2.3x more vulnerabilities than automated approaches
  • Defense effectiveness decreased by 35% when combining multiple attack vectors

I think this work provides an important baseline for understanding AI system vulnerabilities at scale. While individual red teaming efforts have been done before, having data across 100 systems allows us to identify systemic weaknesses and patterns that weren't visible in smaller studies.

I think the methodology could become a standard framework for AI security testing, though the rapid pace of AI development means the specific attack vectors will need constant updating. The finding about manual testing effectiveness suggests we can't rely solely on automated security measures.

TLDR: Analysis of red teaming 100 AI systems reveals common vulnerability patterns and establishes a framework for systematic security testing. Manual testing outperforms automated approaches, and multi-vector attacks show increased success rates.

Full summary is here. Paper here.

1 Comment
2025/01/16
14:35 UTC

2

Dynamic LLM Adaptation Through Selective Weight Matrix Updates: A Task-Specific Self-Adaptive Framework

The core contribution is a self-adaptive learning mechanism that allows transformers to modify their weights during inference without additional training. This "Transformer²" approach introduces a dual-attention system that processes both content and meta-learning patterns simultaneously.

Key technical points:

  • Dynamic weight adjustment using gradient approximation during inference
  • Meta-learning layer that enables real-time parameter updates
  • Dual attention mechanism combining standard and adaptive self-attention
  • Efficient memory management through selective weight updates
  • Maintains base weights while generating task-specific adaptations

Results show notable improvements:

  • 15% increase in performance on complex reasoning benchmarks
  • Better handling of edge cases and novel inputs
  • Minimal computational overhead (1.2x standard transformer)
  • More consistent responses across varied task types
  • Improved performance on long-sequence tasks

I think this could meaningfully change how we approach model adaptation. Instead of fine-tuning or prompt engineering, having models that can self-modify during inference opens up some fun possibilities for adaptation. The computational efficiency is particularly noteworthy - previous attempts at adaptive models often had significant overhead.

I also think the dual-attention mechanism could influence how we design future transformer architectures. The ability to process both content and meta-learning patterns simultaneously seems like a valuable architectural pattern that could be applied more broadly.

TLDR: New transformer architecture that can adapt its weights during inference using an efficient dual-attention mechanism. Shows 15% better performance with minimal computational overhead.

Full summary is here. Paper here.

2 Comments
2025/01/15
13:36 UTC

3

Image Classification for Thermal Images

Got a small task from my high school: make a small neural network for image classification. I made it with VGG and a small database (around 2k images). Everything was alright until the neural network just started making weird predictions on test data: it was saying that every single picture in the test dataset belonged to class 0 (which is human)... And now I'm stuck with it. If anyone could help me, I would really appreciate it, and I can provide any needed information about my NN.

2 Comments
2025/01/14
18:52 UTC

2

Why L1 Regularization Produces Sparse Weights

0 Comments
2025/01/12
09:20 UTC

2

U-net Image Segmentation | How to segment persons in images 👤


This tutorial provides a step-by-step guide on how to implement and train a U-Net model for persons segmentation using TensorFlow/Keras.

The tutorial is divided into four parts:

 

Part 1: Data Preprocessing and Preparation

In this part, you load and preprocess the persons dataset, including resizing images and masks, converting masks to binary format, and splitting the data into training, validation, and testing sets.

 

Part 2: U-Net Model Architecture

This part defines the U-Net model architecture using Keras. It includes building blocks for convolutional layers, constructing the encoder and decoder parts of the U-Net, and defining the final output layer.
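
As a rough sketch of the kind of building blocks this part describes (not the tutorial's exact code; filter counts and layer choices are illustrative):

    from tensorflow.keras import layers

    def conv_block(x, filters):
        """Two 3x3 convolutions with batch norm and ReLU, the basic U-Net unit."""
        for _ in range(2):
            x = layers.Conv2D(filters, 3, padding="same")(x)
            x = layers.BatchNormalization()(x)
            x = layers.Activation("relu")(x)
        return x

    def encoder_step(x, filters):
        f = conv_block(x, filters)                  # features kept for the skip connection
        return f, layers.MaxPooling2D(2)(f)         # downsampled tensor continues deeper

    def decoder_step(x, skip, filters):
        x = layers.Conv2DTranspose(filters, 2, strides=2, padding="same")(x)
        x = layers.Concatenate()([x, skip])         # merge upsampled features with encoder features
        return conv_block(x, filters)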

 

Part 3: Model Training

Here, you load the preprocessed data and train the U-Net model. You compile the model, define training parameters like learning rate and batch size, and use callbacks for model checkpointing, learning rate reduction, and early stopping.

 

Part 4: Model Evaluation and Inference

The final part demonstrates how to load the trained model, perform inference on test data, and visualize the predicted segmentation masks.

 

You can find the link to the code in the blog: https://eranfeit.net/u-net-image-segmentation-how-to-segment-persons-in-images/

Full code description for Medium users: https://medium.com/@feitgemel/u-net-image-segmentation-how-to-segment-persons-in-images-2fd282d1005a

You can find more tutorials and join my newsletter here: https://eranfeit.net/

Check out our tutorial here: https://youtu.be/ZiGMTFle7bw&list=UULFTiWJJhaH6BviSWKLJUM9sg

 

Enjoy

Eran

1 Comment
2025/01/11
20:46 UTC

3

Agent Laboratory: An LLM-Based Framework for Autonomous Scientific Research

A new framework introduces an automated research pipeline using LLM agents to conduct scientific research with human oversight. The system implements a three-stage process: literature review, experimentation, and report writing.

Key technical components:

  • Hierarchical agent structure with specialized roles for different research tasks
  • Integration of human feedback loops at critical decision points
  • Code generation capabilities for implementing experiments
  • Automated paper synthesis combining literature and experimental results
  • Custom prompting system to maintain research coherence across stages

Results from their evaluation:

  • 84% cost reduction compared to baseline automated research methods
  • Generated code matched quality of human ML practitioners in blind review
  • Successfully reproduced results from existing ML papers
  • Human reviewers rated output quality comparable to graduate-level research

I think this could significantly impact how we conduct ML research, particularly for tasks like hyperparameter optimization and architecture search. The ability to automate literature reviews while maintaining quality could help researchers focus on novel directions rather than background work.

I see the main limitation being the system's reliance on existing literature - it may struggle with truly novel research directions. The framework seems better suited for systematic exploration of known areas rather than groundbreaking new concepts.

TLDR: LLM-based research automation framework shows promising results in conducting end-to-end ML research with human oversight, achieving significant cost reductions while maintaining research quality.

Full summary is here. Paper here.

1 Comment
2025/01/11
14:57 UTC

2

Meta Chain-of-Thought: Teaching LLMs to Model Reasoning Processes Behind Chain-of-Thought

This work introduces Meta Chain-of-Thought (Meta-CoT), which extends regular chain-of-thought prompting by explicitly modeling the meta-reasoning process - how models decide which reasoning steps to take and why. The key innovation is combining process supervision (tracking reasoning paths), synthetic data generation, and search algorithms to help models learn better reasoning strategies.

Key technical points:

  • Uses process supervision to track how models explore different solution paths
  • Generates synthetic training data by observing successful reasoning patterns
  • Implements both instruction tuning and RL-based optimization
  • Develops verification methods for meta-reasoning explanations
  • Studies scaling behavior across model sizes and architectures

Results:

  • Models show improved performance on reasoning tasks compared to standard CoT
  • Generated explanations align better with human reasoning patterns
  • Training pipeline successfully combines instruction tuning with RL
  • Framework demonstrates ability to handle multiple reasoning strategies
  • Shows correlation between model size and meta-reasoning capabilities

I think this approach could help create more transparent AI systems that can better explain their decision-making process. The combination of process supervision and synthetic data seems like a practical way to improve reasoning capabilities without requiring massive amounts of human-labeled data.

I think the key challenge will be validating the quality of meta-reasoning explanations and ensuring they truly reflect the model's internal process rather than post-hoc rationalizations. The computational overhead may also limit practical applications.

TLDR: New framework helps language models learn not just what reasoning steps to take, but why those steps make sense, by combining process supervision, synthetic data, and search algorithms.

Full summary is here. Paper here.

1 Comment
2025/01/10
19:31 UTC
