/r/MachineLearning
Hello, I am working on an e-commerce project and I need a text-to-image model. I want to deploy this model on Google Cloud Platform (GCP), but this process seems quite new and complicated to me. Since I have limited time, I would like to know which of the following scenarios is more suitable:
Using ready-made GitHub models: For example, pre-trained models like Stable Diffusion. Can I import and use these models on GCP? If possible, can you share the recommended steps for this?
Google Cloud Marketplace: Would it be easier to buy a ready-made solution from GCP Marketplace? If so, what are the recommended APIs or services?
My goal:
To take inputs from user data (e.g. a string array) in the backend and return output via a text-to-image API.
Since I have an e-commerce project, I need a scalable solution for high traffic.
Information:
Backend: Requests will come via REST API.
My project allows users to create customized visuals (e.g. product designs).
Instead of training a model from scratch, I prefer ready-made solutions that will save time.
My questions:
Which way is more practical and faster? A ready-made model from GitHub or a solution from Google Cloud Marketplace?
If I prefer a model from GitHub, what steps should I follow to import these models to GCP?
How can I optimize a scalable text-to-image solution on GCP for a high-traffic application?
Who I'd like to hear from:
If you have experience with Stable Diffusion or similar models, could you share it?
I would especially like to get suggestions from those who have started such a project on Google Cloud.
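For reference on the GitHub-model route, here is a rough sketch of serving a pre-trained checkpoint behind a REST endpoint with diffusers and FastAPI (the model id and request schema are placeholders, and a GPU is assumed); a container built around something like this could be deployed to Cloud Run or a GPU-backed Vertex AI endpoint:

```python
# Minimal sketch (not production code): wrap a pre-trained diffusers checkpoint
# in a REST endpoint. Model id and request fields are placeholders; assumes a GPU.
import base64
import io

import torch
from diffusers import StableDiffusionPipeline
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",   # swap for whichever checkpoint you choose
    torch_dtype=torch.float16,
).to("cuda")

class GenerateRequest(BaseModel):
    prompts: list[str]                  # e.g. the user's string array from the backend

@app.post("/generate")
def generate(req: GenerateRequest):
    images = pipe(req.prompts, num_inference_steps=30).images
    encoded = []
    for img in images:
        buf = io.BytesIO()
        img.save(buf, format="PNG")
        encoded.append(base64.b64encode(buf.getvalue()).decode())
    return {"images": encoded}          # base64 PNGs back to the caller
```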
A year ago I started trying to use PPO to play the original Legend of Zelda, and I was able to train a model to beat the first boss after a few months of work. I wanted to share the project just for show and tell. I'd love to hear feedback and suggestions, as this is just a hobby project and I don't do this for a living. The code for that lives in the original-design branch of my Triforce repo. I'm currently tinkering with new designs, so the main branch is much less stable.
Here's a video of the agent beating the first dungeon, which was trained with 5,000,000+ steps. At 38 seconds, it has learned that it's invulnerable at the screen edge, so it exploits that to avoid damage from a projectile. At 53 seconds it steps up to avoid damage, even though it takes a -0.06 penalty for moving the wrong way (taking damage would be a larger penalty). At 55 seconds it walks towards the rock projectile to block it. And so on; lots of the little things the model does are easy to miss if you don't know the game inside and out.
As a TLDR, here's an early version of my new (single) model. This doesn't make it quite as far, but if you watch closely its combat is already far better, and it was only trained on 320,000 steps (roughly 6% of the first model's training steps).
This is pretty far along from my very first model.
I got the original project working using stable-baselines' PPO and default neural network (the shared NatureCNN, I believe). SB was great to get started with but ultimately stifling. In the new version of the project I've implemented PPO from scratch in PyTorch with my own simple neural network, similar to stable-baselines' default. I'm playing with all kinds of changes and designs now that I have more flexibility and control. Here is my rough original design:
My first pass through this project was basically "imagine playing Zelda with your older sibling telling you where to go and what to do". I give the model an objective vector which points to where I want it to go on the screen (as the bird flies; the agent still had to learn pathfinding to avoid damage and navigate around the map). This vector either points at the nearest enemy I want it to kill or is a N/S/E/W direction if it's supposed to move to the next room.
Due to a few limitations with stable-baselines (especially around action masking), I ended up training unique models for traversing the overworld vs the dungeon (since they have entirely different tilesets). I also trained a different model for when we have sword beams vs not. In the video above you can see which model is being used onscreen.
In my current project I've removed this objective vector as it felt too much like cheating. Instead I give it a one-hot encoded objective (move north to the next room, pickup items, kill enemies, etc). So far it's working quite well without that crutch. The new project also does a much better job of combat even without multiple models to handle beams vs not.
Image - The standard neural network had a really tough time being fed the entire screen. No amount of training seemed to help. I solved this by creating a viewport around Link that keeps him centered. This REALLY helped the model learn.
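A sketch of the kind of egocentric crop described here (window size and padding mode are assumptions, not the repo's exact values):

```python
import numpy as np

# Crop a fixed window centered on the agent so the network always sees Link
# in the middle. Assumes an HWC frame; size/padding are my guesses.
def centered_viewport(frame: np.ndarray, link_x: int, link_y: int, size: int = 64) -> np.ndarray:
    half = size // 2
    # Pad so crops near the screen edge keep the same shape.
    padded = np.pad(frame, ((half, half), (half, half), (0, 0)), mode="edge")
    # Padding shifts coordinates by `half`, so the original (link_y, link_x)
    # is the top-left corner of the centered window in padded coordinates.
    return padded[link_y:link_y + size, link_x:link_x + size]
```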
I also had absolutely zero success with stacking frames to give Link a way to see enemy/projectile movement. The model simply never trained with stable-baselines when I implemented frame stacking and I never figured out why. I just added it to my current neural network and it seems to be working...
Though my early experiments show that giving it 3 frames (skipping two in between, so frames curr, curr-3, curr-6) doesn't really give us that much better performance. It might if I took away some of the vectors. We'll see.
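A sketch of that skipped-frame stacking scheme (buffer length and stacking axis are assumptions):

```python
from collections import deque

import numpy as np

# Keep a short history and stack frames t, t-3, t-6 along the channel axis,
# mirroring the "curr, curr-3, curr-6" scheme described above.
class SkippedFrameStack:
    def __init__(self, skip: int = 3, count: int = 3):
        self.skip, self.count = skip, count
        self.frames = deque(maxlen=skip * (count - 1) + 1)

    def add(self, frame: np.ndarray) -> np.ndarray:
        self.frames.append(frame)
        picks = []
        for i in range(self.count):
            idx = len(self.frames) - 1 - i * self.skip
            picks.append(self.frames[max(idx, 0)])   # repeat oldest frame early on
        return np.concatenate(picks[::-1], axis=-1)  # oldest first, channels stacked
```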
Vectors - Since the model cannot see beyond its little viewport, I gave the model a vector to the closest item, enemy, and projectile onscreen. This made it so the model can shoot enemies across the room outside of its viewport. My new model gives it multiple enemies/items/projectiles and I plan to try to use an attention mechanism as part of the network to see if I can just feed it all of that data.
Information - It also gets a couple of one-off datapoints like whether it currently has sword beams. The new model also gives it a "source" room (to help better understand dungeons where we have to backtrack), and a one-hot encoded objective.
Action Space
My original project just had a few actions: 4 for moving in the cardinal directions and 4 for attacking in each direction (I also added bombs but never spent any time training with them). I had an idea to use masking to help speed up training, i.e. if Link bumps into a wall, don't let him move in that direction again until he moves elsewhere, as the model would often spend an entire memory buffer running headlong straight into a wall before an update... better to do it once and get a huge negative penalty, which is essentially the same result but faster.
Unfortunately SB made it really annoying architecturally to pass that info down to the policy layer. I could have hacked it together, but eventually I just reimplemented PPO and my own neural network so I could properly mask actions in the new version. For example, when we start training a fresh model, it cannot attack when there aren't enemies on screen and I can disallow it from leaving certain areas.
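A common way to implement this kind of invalid-action masking in a hand-rolled PPO is to push the masked logits to -inf before building the action distribution; a generic sketch (not the repo's code):

```python
import torch
from torch.distributions import Categorical

# Set the logits of disallowed actions to -inf before building the
# distribution, so they get zero probability and contribute no gradient.
def masked_action_dist(logits: torch.Tensor, action_mask: torch.Tensor) -> Categorical:
    # logits: (batch, n_actions); action_mask: same shape, True = allowed
    masked_logits = logits.masked_fill(~action_mask, float("-inf"))
    return Categorical(logits=masked_logits)

# usage inside the rollout loop (sketch):
# dist = masked_action_dist(policy_net(obs), mask_from_game_state(obs))
# action = dist.sample(); logprob = dist.log_prob(action)
```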
The new model actually splits swinging the sword at short range vs firing sword beams into two different actions, though I haven't had a chance to fully train with the split yet.
Frameskip/Cooldowns - In the game I don't use a fixed frame skip for actions. Instead I use the internal RAM state of the game to know when Link is animation-locked or not, and only allow the agent to take actions when it's actually possible to give meaningful input to the game. This greatly sped up training. We also force movement to be between tiles on the game map. This means that when the agent decides to move it loses control for longer than a player would... a player can make more split-second decisions. This made it easier to implement movement rewards, though, and might be something to clean up in the future.
Pathfinding - To facilitate rewards, the original version of this project used A* to pathfind from Link to what he should be doing. Here's a video of it in action. This information wasn't given to the model directly; instead, the agent would only be given the rewards if it exactly followed that path or the transposed version of it. It would also pathfind around enemies and not walk through them.
This was a nightmare though. The corner cases were significant, and pushing Link towards enemies but not into them was really tricky. The new version just uses a wavefront algorithm: I calculate a wave from the tiles we want to get to outwards, then make sure we are following the gradient. Also, calculating the A* path around enemies every frame (even with caching) was super slow. Wavefront was faster, especially because I give the new model no special rewards for walking around enemies... faster to compute, and it has to learn from taking damage or not.
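A sketch of the wavefront idea as a BFS distance map (the grid encoding is an assumption; the dense reward then just checks that each move steps down the gradient):

```python
from collections import deque

import numpy as np

# BFS outward from the goal tiles over a walkability grid, producing a
# distance map. Reward ~ dist[old_tile] - dist[new_tile] for each move.
def wavefront(walkable: np.ndarray, goal_tiles: list[tuple[int, int]]) -> np.ndarray:
    dist = np.full(walkable.shape, np.inf)
    queue = deque()
    for gy, gx in goal_tiles:
        dist[gy, gx] = 0
        queue.append((gy, gx))
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if (0 <= ny < walkable.shape[0] and 0 <= nx < walkable.shape[1]
                    and walkable[ny, nx] and dist[ny, nx] == np.inf):
                dist[ny, nx] = dist[y, x] + 1
                queue.append((ny, nx))
    return dist
```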
Either way, both the old and new models successfully learned how to pathfind around danger and obstacles, with or without the cheaty objective vector.
Rewards - I programmed very dense rewards in both the old and new model. At basically every step, the model is getting rewarded or punished for something. I actually have some ideas I can't wait to try out to make the rewards more sparse. Or maybe we start with dense rewards for the first training, then fine-tune the model with sparser rewards. We'll see.
Predicting the Future - Speaking of rewards, one interesting wrinkle is that the agent can do a lot of things that will eventually deal damage, but not on that frame. For example, when Link sets a bomb it takes several seconds before it explodes, killing things. This can be a massive reward or penalty, since he spent an extremely valuable resource but may have done massive damage. PPO and other RL algorithms propagate rewards backwards, of course, but that spike in reward could land on a weird frame where we took damage or moved in the wrong direction.
I probably could have just not solved that problem and let it shake out over time, but instead I used the fact that we are in an emulator to just see what the outcome of every decision is. When planting a bomb, shooting sword beams, etc, we let the game run forward until impact, then rewind time and reward the agent appropriately, continuing on from when we first paused. This greatly speeds up training, even if it's expensive to do this savestate, play forward, restore state.
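Roughly, the rewind trick looks like this (the em.get_state()/set_state() calls reflect my understanding of stable-retro's savestate API, and NOOP_ACTION, bomb_resolved, and damage_dealt are placeholders for game-specific details):

```python
# Sketch only: peek at the outcome of a delayed-effect action, then rewind.
def delayed_action_reward(env, action, max_lookahead=120):
    saved = env.em.get_state()                  # snapshot emulator state
    obs, reward, done, info = env.step(action)  # the frame that plants the bomb
    total_damage = 0
    for _ in range(max_lookahead):              # roll forward until it resolves
        obs, _, done, info = env.step(NOOP_ACTION)      # placeholder no-op action
        total_damage += damage_dealt(info)              # placeholder RAM check
        if bomb_resolved(info) or done:                 # placeholder RAM check
            break
    env.em.set_state(saved)                     # rewind; training continues from here
    env.step(action)                            # re-take the real action for real
    return reward + bomb_reward(total_damage)   # credit lands on the planting frame
```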
Neural Networks - When I first started this project (knowing very little about ML and RL), I thought most of my time would be spent tuning the shape of the neural network we are using. In reality, the default provided by stable-baselines and my eventual reimplementation has been enough to make massive progress. Now that I have a solid codebase, though, I really want to revisit this. I'd like to see if trying CoordConvs and similar networks might make the viewport unnecessary.
Hyperparameters - Setting the entropy coefficient way lower helped a TON in training stable models. My new PPO implementation is way less stable than stable-baselines (ha, imagine that), but still converges most of the time.
Infinite Rewards - As with all reinforcement learning, if you give the model some way to get infinite rewards, it will do just that and nothing else. I spent days, maybe weeks, tweaking reward functions just to get it to train and not find a spot on the wall it could hump for infinite rewards. Even just neutral rewards, like +0.5 for moving forward and -0.5 for moving backwards, would often result in a model that just stepped left, then right, infinitely. There has to be a real reward or punishment (non-neutral) for forward progress.
Debugging Rewards - In fact, building a rewards debugger was the only way I made progress in this project. If you are tackling something this big, do that very early.
Stable-Retro is pretty great - Couldn't be happier with the clean design for implementing emulation for AI.
Torch is Awesome - My early versions heavily used numpy and relied on stable-baselines with its multiproc parallelization support. It worked great. Moving the project over to torch was night and day, though. It gave me so much more flexibility and instant multithreading for matrix operations. I have a pretty beefy computer and I'm almost at the same steps per second as the 20-process stable-retro/numpy setup.
This has already gone on too long. I have some ideas for future projects, but maybe I'll just make them another post when I actually do them.
A special thanks to Brad Flaugher for help with the early version of this, Fiskbit from the Zelda1 speedrunning community for help pulling apart the raw assembly to build this thing, and MatPoliquin for maintaining Stable-Retro.
Happy to answer any questions, really I just love nerding out about this stuff.
Hi all!
I'm working on a project about Multitouch Attribution Modeling using TensorFlow to predict conversion over different channels.
In the project, we are using this dataset (https://www.kaggle.com/code/hughhuyton/multitouch-attribution-modelling). However, we cannot find any formal reference (published paper or something similar) to make a proper citation. I have searched on Google a lot… really, a lot.
Does anyone know the origin of the data, or whether it is referenced somewhere?
Thanks for the help.
For the past few years I've had a job with the official title "machine learning engineer", but as I hunt for other jobs online, I wonder if that's actually accurate. Based on the experience requirements and responsibilities listed, it doesn't seem to match up with what I do.
I have a master's with a focus in ML (though that was pre-LLM-boom, so things have changed a lot) but struggled to find work in my area pertaining to that out of college. Post-COVID, when everyone went remote, I got my current job. In it, I work on a team building and deploying software that utilizes machine learning to accomplish tasks. However, I'm never the one actually building the learning models (there's a researcher on our team who does that); I just create the systems around them. I'm actually pretty happy in my "machine learning adjacent" role, but should I be searching for different job titles to find something similar?
I’ve been doing web development for a year now and want to explore a new domain. My plan is to work for a company for a year or two and then aim for remote jobs in a niche field. However, I’m really confused about what to learn next—AI/ML or Web3.0.
Some people say Web3.0 is dead, has no future, and there are no jobs in that space. On the other hand, AI/ML is the hot topic right now, and I’d love to explore it too.
That said, I’m more inclined towards Web3.0 but don’t really mind learning AI/ML either. I just need some guidance on which direction to take. For context, I’m a second-year undergraduate student in India.
Hi, if anyone who got the invitation mail for the information session wants to prepare for the interview rounds of the Apple AI Residency Program, please DM me. Only serious candidates, please.
Join the upcoming Open Science AI & Data Challenge Virtual Orientation session on January 22nd 2025. Let's work together to cool down our cities and create healthier, more sustainable urban environments. Learn how the 2025 EY Open Science AI & Data Challenge will help tackle the problem of urban heat islands through the application of AI and technology-based solutions. Winners are eligible for cash prizes and attendance at an exciting awards ceremony. Register today!
So I have been working on a deep learning project whose aim is to detect objects. My main goal is to detect plastic in water and pick it up using a conveyor belt attached to a boat. I took code from GitHub, made sufficient changes, and now the model is working, but one problem remains: I have to manually add a photo and rename it to test.jpeg (the name I hard-coded). In my setup the boat has a camera, so how do I make the system take a photo automatically when it detects an object and load it into my already-made model? And for all of this, which development board would be sufficient? I hope someone answers my question 🙂
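For reference, a rough sketch of the automatic-capture loop with OpenCV (detect_objects is a placeholder for the detector you already have):

```python
import cv2

# Replace the manual "rename to test.jpeg" step: grab frames from the boat
# camera in a loop and hand each one straight to the existing detector.
cap = cv2.VideoCapture(0)                  # 0 = first attached camera
while True:
    ok, frame = cap.read()
    if not ok:
        break
    detections = detect_objects(frame)     # placeholder: your existing inference function
    if detections:
        cv2.imwrite("capture.jpg", frame)  # keep a copy whenever something is found
cap.release()
```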
I submitted a paper to TPAMI on June 25, 2024. It was a significant extension of our work that was accepted as an oral presentation at AAAI 2023. I know the reviews at TPAMI are rigorous and can take months, but I was just wondering what the longest time it has taken in your experience, since it has been 6 months and 3 days with no news. Also, would the reviewers take into account works that were published after the submission date? I am just worried that with the (understandably) slow reviews, I will be asked by the reviewer why I am not comparing against method XYZ, and asked to compare against said method, which could potentially outperform mine due to how fast the field progresses, and make revision and acceptance complicated.
Hi, I'm putting together a list of papers to recommend to students just starting out in compsci.
What are some must-read papers to give them that are not too deep?
These days all the statistical learning theory is within reach via online courses, but I want them to grow into reading academic papers.
I'm starting off with Ilya Sutskever's reading list.
A brief explanation of why you’re recommending the paper would be welcome too!
Hey all,
I’m curious about the most common embarrassingly parallel tasks you encounter in the wild. In the ML and DS world, I’ve noticed many workflows tend to follow this general pattern:
What workloads do you have that follow this process or something similar? I’ve been tinkering with a cloud abstraction to make large-scale parallel processing easier, and I’m trying to identify common use cases to build tutorials around.
Any ideas, advice, or feedback would be super helpful
The FlashAttention paper shows that "most operations in Transformers are bottlenecked by memory accesses".
The Cut Cross-Entropy paper shows that "The cross-entropy loss is responsible for up to 90% of the memory footprint of modern LLM training".
How do I get this kind of data? Is there a tool or platform that can show the cost by component in an LLM, e.g. embedding, attention, layer normalization, loss computation?
Purpose: once we know that, we will know which part to accelerate first and can pay more attention to it.
Thanks for any suggestions.
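For reference, one way to get a per-component breakdown is the PyTorch profiler with labeled regions; a toy-scale sketch (swap the toy modules for your real model's embedding, blocks, and head):

```python
import torch
import torch.nn as nn
from torch.profiler import ProfilerActivity, profile, record_function

# Label each stage of a transformer forward pass and let the profiler report
# time and memory per label. The toy modules stand in for a real LLM.
vocab, d_model, seq = 32_000, 512, 256
embed = nn.Embedding(vocab, d_model)
blocks = nn.ModuleList(nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
                       for _ in range(4))
lm_head = nn.Linear(d_model, vocab)

input_ids = torch.randint(0, vocab, (2, seq))
labels = torch.randint(0, vocab, (2, seq))

with profile(activities=[ProfilerActivity.CPU], profile_memory=True) as prof:
    with record_function("embedding"):
        h = embed(input_ids)
    with record_function("attention_blocks"):
        for block in blocks:
            h = block(h)
    with record_function("loss"):
        logits = lm_head(h)
        loss = nn.functional.cross_entropy(logits.view(-1, vocab), labels.view(-1))

print(prof.key_averages().table(sort_by="self_cpu_memory_usage", row_limit=15))
```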
Hey y'all! I'm curious: how often are you kicking off new training runs?
Once a week? Twice a week? Everyday?
Would love to hear about your experience!
Grokking, the sudden generalization that occurs after prolonged overfitting, is a surprising phenomenon challenging our understanding of deep learning. Although significant progress has been made in understanding grokking, the reasons behind the delayed generalization and its dependence on regularization remain unclear. In this work, we argue that without regularization, grokking tasks push models to the edge of numerical stability, introducing floating point errors in the Softmax function, which we refer to as Softmax Collapse (SC). We demonstrate that SC prevents grokking and that mitigating SC enables grokking without regularization. Investigating the root cause of SC, we find that beyond the point of overfitting, the gradients strongly align with what we call the naïve loss minimization (NLM) direction. This component of the gradient does not alter the model's predictions but decreases the loss by scaling the logits, typically by scaling the weights along their current direction. We show that this scaling of the logits explains the delay in generalization characteristic of grokking and eventually leads to SC, halting further learning. To validate our hypotheses, we introduce two key contributions that address the challenges in grokking tasks: StableMax, a new activation function that prevents SC and enables grokking without regularization, and ⊥Grad, a training algorithm that promotes quick generalization in grokking tasks by preventing NLM altogether. These contributions provide new insights into grokking, elucidating its delayed generalization, reliance on regularization, and the effectiveness of existing grokking-inducing methods.
Paper: https://arxiv.org/abs/2501.04697
(not my paper, just something that was recommended to me)
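For intuition (my own toy example, not code from the paper), here is how scaling logits along a fixed direction saturates float32 softmax and zeroes the cross-entropy gradient, which is my reading of what the abstract calls Softmax Collapse:

```python
import torch

# Scaling logits along their current direction never changes the argmax, but
# in float32 the softmax eventually becomes exactly one-hot, the loss underflows
# to 0.0, and the gradient becomes exactly zero, so learning halts.
logits = torch.tensor([5.0, 2.0, 1.0])
for scale in (1.0, 10.0, 100.0):
    x = (scale * logits).requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(x.unsqueeze(0), torch.tensor([0]))
    loss.backward()
    print(f"scale={scale:5.0f}  loss={loss.item():.3e}  grad={x.grad.tolist()}")
# By scale=100 the loss is 0.0 and the gradient is all zeros.
```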
Hello,
I am conducting research that I plan to submit to the AHLI Conference on Health, Inference, and Learning (CHIL) (H5-index 26, h5-median 43). However, the submission deadline is approaching quickly—February 10.
My advisor has suggested adding other professors as co-authors, but they would primarily review and provide feedback rather than directly contributing to the writing. Therefore, I am reaching out to see if anyone with expertise in time series foundation models would be interested in collaborating as a co-author.
The research involves comparing time series foundation models across different datasets. The experiments are nearly complete, but I need support in writing the theoretical foundation for each model. If you have the necessary knowledge, time, and interest in contributing meaningfully to this work, please send me a private message so we can discuss this opportunity further.
Thank you!
Here is an example which uses simpler language, to test whether it is the confusing language that causes a model to fail.
Edit: Detailed post keeps getting removed. Please ask questions, hope someone finds this tool helpful.
What do you guys use to upload a multimodal dataset?
I want it to be convenient for the people who use it. For text, a Hugging Face dataset is the most convenient solution, but I can't find any comparably convenient solution for a multimodal (image + video + audio + text) dataset.
Thanks in advance.
TL;DR: A reasoning multimodal model built from Qwen2-VL-72B. Surprisingly, beats QVQ in evals.
Paper: https://arxiv.org/pdf/2501.01904
Abstract:
Recently, slow-thinking reasoning systems, built upon large language models (LLMs), have garnered widespread attention by scaling the thinking time during inference. There is also growing interest in adapting this capability to multimodal large language models (MLLMs). Given that MLLMs handle more complex data semantics across different modalities, it is intuitively more challenging to implement multimodal slow-thinking systems.
To address this issue, in this paper, we explore a straightforward approach by fine-tuning a capable MLLM with a small amount of textual long-form thought data, resulting in a multimodal slow-thinking system, Virgo (Visual reasoning with long thought). We find that these long-form reasoning processes, expressed in natural language, can be effectively transferred to MLLMs. Moreover, it seems that such textual reasoning data can be even more effective than visual reasoning data in eliciting the slow-thinking capacities of MLLMs. While this work is preliminary, it demonstrates that slow-thinking capacities are fundamentally associated with the language model component, which can be transferred across modalities or domains. This finding can be leveraged to guide the development of more powerful slow-thinking reasoning systems. We release our resources at this https URL.
Highlights:
[W]e obtain approximately 5K long thought instruction instances distilled from two open slow-thinking reasoning systems: DeepSeek-R1-Lite-Preview [2] (abbreviated as R1) and QwQ-32B-preview [3] (abbreviated as QwQ). The statistics of the collected instruction data are categorized by domain as follows: math (3.7K), science (0.9K), code (0.2K) and puzzle (0.1K). [...]
After collecting instruction data for long-form reasoning, we fine-tune the base MLLM to emulate slow-thinking reasoning behavior. [...]
The second approach we explore is the direct distillation of multimodal long thought data from slow-thinking MLLMs (e.g., QVQ). [...]
As another alternative approach, we design a multi-stage tuning method for self-distillation. Specifically, we first fine-tune the selected MLLM (i.e., Qwen2-VL-72B-Instruct) on the textual long thought instruction set DT, obtaining model M0. Next, we use M0 to generate the visual long thought instruction set by self-distillation DSD, which can be subsequently used for fine-tuning the original MLLM.
Visual Highlights:
The key innovation here is combining large language models with image generation to create a system that can "visually think" while solving problems. The approach, called Multimodal Visualization-of-Thought (MVoT), generates relevant visualizations during its reasoning process, similar to how humans might sketch diagrams to better understand a problem.
Main technical points:
Results:
I think this approach could meaningfully improve AI systems' ability to reason about physical and spatial problems. By incorporating visual thinking into the reasoning process, we might see better performance on tasks that humans typically solve through visualization - from physics problems to architectural design. However, the computational overhead of generating images during reasoning could limit practical applications.
I think the most interesting aspect is how this mimics human cognitive processes - we often sketch or visualize to understand complex problems. This could lead to AI systems that reason in more intuitive and interpretable ways.
TLDR: New method combines language models with image generation to create AI systems that can "think visually" while reasoning, showing 12% improvement on visual reasoning tasks.
Full summary is here. Paper here.
I recently took part in a hackathon where I was tasked with achieving high accuracy without using convolutional or transformer models. Even though MLP-Mixers can be argued to be similar to convolutions, they were allowed. Even after a lot of tries I could not get the accuracy above 60 percent. Is there a way, either with MLPs or with anything else, to reach somewhere near the 90s?
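For reference, a minimal MLP-Mixer block in PyTorch looks roughly like this (dimensions are placeholders, not a tuned recipe; you'd patchify the images and stack several of these blocks before a classification head):

```python
import torch
import torch.nn as nn

# Token-mixing MLP over the patch dimension, then channel-mixing MLP over
# features, each with a residual connection.
class MixerBlock(nn.Module):
    def __init__(self, num_patches: int, dim: int, token_hidden: int = 256, channel_hidden: int = 1024):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.token_mlp = nn.Sequential(
            nn.Linear(num_patches, token_hidden), nn.GELU(), nn.Linear(token_hidden, num_patches))
        self.norm2 = nn.LayerNorm(dim)
        self.channel_mlp = nn.Sequential(
            nn.Linear(dim, channel_hidden), nn.GELU(), nn.Linear(channel_hidden, dim))

    def forward(self, x):                        # x: (batch, num_patches, dim)
        y = self.norm1(x).transpose(1, 2)        # mix across patches
        x = x + self.token_mlp(y).transpose(1, 2)
        return x + self.channel_mlp(self.norm2(x))

x = torch.randn(8, 64, 128)                      # 8 images, 64 patches, 128-dim
print(MixerBlock(num_patches=64, dim=128)(x).shape)   # torch.Size([8, 64, 128])
```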
Hi there! I've been looking around for an MIT-licensed (commercially usable) model for Text-to-Sound-Effects (Text-to-Audio) and haven't found much besides the usual Stable Audio Open (with its special license).
Do you know of any others?
How do you deal with multiple adapters created for different tasks? I understand that task-ID-based dynamic loading of the appropriate adapter is the obvious approach, but is there a better way? I am asking especially about Whisper.
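One common pattern is to load several LoRA adapters onto a single base model with peft and switch by task id; a rough sketch (API usage as I understand it, so double-check against your peft version; adapter paths and task names are placeholders):

```python
from peft import PeftModel
from transformers import WhisperForConditionalGeneration

# Keep one copy of the base model in memory and attach multiple adapters to it.
base = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
model = PeftModel.from_pretrained(base, "adapters/transcribe_en", adapter_name="transcribe_en")
model.load_adapter("adapters/translate_de", adapter_name="translate_de")

def run(task_id: str, input_features):
    model.set_adapter(task_id)          # activate the adapter for this task
    return model.generate(input_features=input_features)
```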
Hello, everyone
I recently developed a new open-source LLM-driven research automation tool, called AutoResearch. It can automatically conduct various tasks related to machine learning research, the key function is:
Topic-to-Survey Automation - In one sentence, it converts a topic or research question into a comprehensive survey of relevant papers. It generates keywords, retrieves articles for each keyword, merges duplicate articles, ranks articles based on their impacts, summarizes the articles from the topic, method, to results, and optionally checks code availability. It also organizes and zips results for easy access.
When searching for research papers, the results from a search engine can vary significantly depending on the specific keywords used, even if those keywords are conceptually similar. For instance, searching for "LLMs" versus "Large Language Models" may yield different sets of papers. Additionally, when experimenting with new keywords, it can be challenging to remember whether a particular paper has already been checked. Furthermore, the process of downloading papers and organizing them with appropriate filenames can be tedious and time-consuming.
This tool streamlines the entire process by automating several key tasks. It suggests multiple related keywords to ensure comprehensive coverage of the topic, merges duplicate results to avoid redundancy, and automatically names downloaded files using the paper titles for easy reference. Moreover, it leverages LLMs to generate summaries of each paper, sparing researchers the repetitive process of uploading each paper to ChatGPT and conversing with it.
Additionally, there are some basic functionalities:
This tool is still under active development; I will add many more functionalities later on.
I know there are many existing tools for it. But here are the key distinctions and advantages of the tool:
------Here is a quick installation-free Google Colab demo------
Here is the official website of AutoResearch.
Here is the GitHub link to AutoResearch.
------Please star the repository and share it if you like the tool!------
Please DM me or reply in the post if you are interested in collaborating to develop this project!
I'm working on a project where I need to classify text as either NSFW or SFW. I know there are some BERT-based classifiers out there that are specifically trained for this kind of task. I've also seen people using smaller LLMs.
What's the best approach for this? Since the underlying complexity of detecting NSFW text isn't that high, I'm thinking a full-blown LLM is overkill. What are your recommendations?
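For reference, the lightweight route can be as small as this (the model id is a placeholder for whichever NSFW-tuned checkpoint, or your own fine-tune, you settle on):

```python
from transformers import pipeline

# A fine-tuned encoder classifier behind the text-classification pipeline.
# The model id below is a placeholder; substitute the checkpoint you choose.
classifier = pipeline("text-classification", model="your-org/nsfw-text-classifier")

texts = ["totally harmless product review", "something much less safe for work"]
for text, pred in zip(texts, classifier(texts)):
    print(f"{pred['label']:>6} ({pred['score']:.2f})  {text}")
```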
I have a simple dataset that I want to train a prediction model on for a pretty low stakes project (more for fun), but I have no experience training ML models. Simple linear regression didn't have great performance when I tried it and I suspect there is a more complex interaction between the variables.
Training dataset: 25K observations of 5 numerical predictor variables with one numerical outcome variable.
What is the best AutoML platform that I can run this with minimal code, just to see if ML models can perform better than simple regression can? Thanks!
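For reference, a quick pre-AutoML sanity check with scikit-learn might look like this (file and column names are placeholders for the 25K x 6 dataset described above); gradient-boosted trees pick up nonlinear interactions automatically, so they are a fast way to see whether anything beats the linear baseline:

```python
import pandas as pd
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Compare plain linear regression against boosted trees with 5-fold CV.
df = pd.read_csv("data.csv")                     # placeholder file name
X, y = df.drop(columns=["target"]), df["target"] # placeholder column name

for name, model in [("linear", LinearRegression()),
                    ("boosted trees", HistGradientBoostingRegressor())]:
    r2 = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"{name:>13}: R^2 = {r2.mean():.3f} +/- {r2.std():.3f}")
```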
Abstract:
Chain-of-Thought (CoT) prompting has proven highly effective for enhancing complex reasoning in Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs). Yet, it struggles in complex spatial reasoning tasks. Nonetheless, human cognition extends beyond language alone, enabling the remarkable capability to think in both words and images. Inspired by this mechanism, we propose a new reasoning paradigm, Multimodal Visualization-of-Thought (MVoT). It enables visual thinking in MLLMs by generating image visualizations of their reasoning traces. To ensure high-quality visualization, we introduce token discrepancy loss into autoregressive MLLMs. This innovation significantly improves both visual coherence and fidelity. We validate this approach through several dynamic spatial reasoning tasks. Experimental results reveal that MVoT demonstrates competitive performance across tasks. Moreover, it exhibits robust and reliable improvements in the most challenging scenarios where CoT fails. Ultimately, MVoT establishes new possibilities for complex reasoning tasks where visual thinking can effectively complement verbal reasoning.
Arxiv link: https://arxiv.org/pdf/2501.07542
Hey r/MachineLearning! Last week, Microsoft released Phi-4, a 14B open-source model that rivals OpenAI's GPT-4-o-mini. I managed to find & fix 4 bugs impacting its output quality. You might remember me previously from fixing 8 bugs in Google's Gemma model! :)
I'm going to walk you through how I found & fixed the bugs. Phi-4's benchmarks were amazing, however many users reported weird or just wrong outputs. Since I maintain the open-source project called 'Unsloth' (fine-tuning LLMs 2x faster with 70% less VRAM) with my brother, I firstly tested Phi-4 for inference and found many errors. Our GitHub repo: https://github.com/unslothai/unsloth
This time, the model had no implementation issues (unlike Gemma 2) but did have problems in the model card. For my first inference run, I randomly found an extra EOS token appended, which is obviously incorrect (2 EOS tokens is never a good idea). Also, during more runs, I found there was an extra assistant prompt, which is once again incorrect. And lastly, from past experience with Unsloth's bug fixes, I already knew fine-tuning was wrong when I read the code.
These bugs caused Phi-4 to have some drop in accuracy and also broke fine-tuning runs. Our fixes are now under review by Microsoft to be officially added to Hugging Face. We uploaded the fixed versions to https://huggingface.co/unsloth/phi-4-GGUF
Here’s a breakdown of the bugs and their fixes:
1. Tokenizer bug fixes
The Phi-4 tokenizer interestingly uses <|endoftext|> as the BOS (beginning of sentence), EOS (end of sentence) and PAD (padding) tokens. The main issue is the EOS token is wrong - it should be <|im_end|>. Otherwise, you will get <|im_end|><|endoftext|> in generations.
2. Fine-tuning bug fixes
The padding token should be a designated pad token like in Llama (<|finetune_right_pad_id|>) or we can use an untrained token - for example we use <|dummy_87|>, fixing infinite generations and outputs.
3. Chat template issues
The Phi-4 tokenizer always adds an assistant prompt - it should only do this if prompted by add_generation_prompt. Most LLM serving libraries expect no automatic assistant additions, and this might cause issues during serving.
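For anyone applying the tokenizer-side fixes by hand, the general shape looks like this (a rough sketch; the properly fixed configs are in the uploads linked above):

```python
from transformers import AutoTokenizer

# Point EOS at the chat end-of-turn token and use an untrained token as a
# dedicated pad, per the fixes described above.
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-4")
tokenizer.eos_token = "<|im_end|>"     # stop on the chat end-of-turn token
tokenizer.pad_token = "<|dummy_87|>"   # untrained token as a dedicated pad
print(tokenizer.eos_token_id, tokenizer.pad_token_id)
```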
We dive deeper into the bugs in our blog: https://unsloth.ai/blog/phi4
Yes! Our fixed Phi-4 uploads show clear performance gains, with even better scores than Microsoft's original uploads on the Open LLM Leaderboard.
Some redditors even tested our fixes to show greatly improved results in:
We also made a Colab notebook to fine-tune Phi-4 completely for free using Google's free Tesla T4 (16GB) GPUs: https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Phi_4-Conversational.ipynb
Thank you for reading this long post and hope you all found this insightful! If you have any questions, please feel free to ask! :)
How I found the bugs:
I saw <|im_start|>assistant<|im_sep|> being appended at the end even with add_generation_prompt = False in Hugging Face, so I theorized there was a chat template problem. Adding assistant prompts by default can break serving libraries.
I also saw <|endoftext|> being used for the BOS, EOS and PAD tokens, which is a common issue amongst models - I ignored the BOS, since Phi-4 did not have one anyways, but changed the PAD token to <|dummy_87|>. You can select any of the tokens since they're empty and not trained. This counteracts issues of infinite generations during finetuning.
Dear All,
I am a UG student and I want to submit my manuscript to one of these two journals; the work is on the interplay of privacy and explainability in machine learning (I would be more than happy to send you the arXived version on request). I have previously published in a very reputed workshop of EMNLP and came to know that ML nowadays is mostly a conference-centric discipline. I want to know which of these two will be better to submit my work to (due to the length and scope, I am unable to submit to conferences this time). I cannot submit it to TMLR until it's Scopus-indexed, and I am not considering AIJ or the Machine Learning Journal at this moment.
If the paper gets accepted, I want the venue to be at least comparable with a borderline A* conference (in terms of the so-called prestige of the venue). Also, let me know if you have any other suggestions; I am new to journals and I appreciate your opinion.
P.S.: My guide slightly prefers PR to JAIR due to its higher IF, but nevertheless he is open to JAIR or any other Scopus-indexed journal as long as it is comparable with at least a borderline A* or very strong A conference paper, as said.
Hello folks,
Just started contributing to the writing for research; previously I just used to experiment and work on results, tables and plots.
Obviously, using AI to generate content for a paper is unethical and wrong in many respects. But what about using it to correct your grammar and readability? Technically it would also be considered AI-written, but is it okay to do this, at least in the literature review, introduction, and description of the experiment?
To be honest, I like writing, and when I ask AI (ChatGPT and others) to polish it, I see that the result is much easier to read and interpret, which I think is good for the community; on the other hand, it may be considered unethical by many.
When I run an 'AI-text detector' on many of the papers I'm using as references from the last year or so, I usually get a 50-70% score.
What do you all think?