/r/mlpapers


A subreddit for weekly machine learning paper discussions. Started by the people from /r/MachineLearning

If you want to get started with Machine Learning, try /r/LearnMachineLearning


7,618 Subscribers

8

Google announces 2.2M new materials discovered using GNN

Materials discovery is critical but tough. New materials enable big innovations like batteries or LEDs. But there are ~infinitely many combinations to try. Testing for them experimentally is slow and expensive.

So scientists and engineers want to simulate and screen materials on computers first. This can check way more candidates before real-world experiments. However, models historically struggled at accurately predicting if materials are stable.

Researchers at DeepMind made a system called GNoME that uses graph neural networks and active learning to push past these limits.

GNoME models materials' crystal structures as graphs and predicts formation energies. It actively generates and filters candidates, evaluating the most promising with simulations. This expands its knowledge and improves predictions over multiple cycles.
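The generate, filter, evaluate, retrain cycle described above can be sketched as a toy active-learning loop. This is only an illustration under loud assumptions: candidates are single numbers rather than crystal graphs, `true_energy` is a made-up stand-in for an expensive simulation, and `fit_nearest` is a nearest-neighbour surrogate, not a graph neural network. None of these names come from the paper.

```python
import random

def true_energy(x):
    # Hypothetical stand-in for an expensive simulation of a candidate.
    return (x - 0.3) ** 2

def fit_nearest(labeled):
    # Toy surrogate model: predict the energy of the closest labeled point.
    def predict(x):
        nearest = min(labeled, key=lambda pair: abs(pair[0] - x))
        return nearest[1]
    return predict

def active_learning(rounds=5, pool_size=200, batch=5, seed=0):
    rng = random.Random(seed)
    # Start from a few labeled examples.
    labeled = [(x, true_energy(x)) for x in (rng.random() for _ in range(3))]
    for _ in range(rounds):
        model = fit_nearest(labeled)
        # Generate candidates, then keep those the model predicts to be lowest in energy.
        candidates = [rng.random() for _ in range(pool_size)]
        promising = sorted(candidates, key=model)[:batch]
        # Only the filtered candidates get the expensive evaluation;
        # their results become training data for the next round.
        labeled.extend((x, true_energy(x)) for x in promising)
    return labeled

labeled = active_learning()
best_x, best_e = min(labeled, key=lambda pair: pair[1])
print(len(labeled), round(best_e, 4))
```

The point of the loop is that the expensive evaluation runs on only `batch` candidates per round, while the cheap model screens the whole pool.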

The authors introduced new ways to generate derivative structures that respect symmetries, further diversifying discoveries.

The results:

  1. GNoME found 2.2 million new stable materials - equivalent to 800 years of normal discovery.
  2. Of those, 380k were the most stable and candidates for validation.
  3. 736 were validated in external labs. These include a totally new diamond-like optical material and another that may be a superconductor.

Overall this demonstrates how scaling up deep learning can massively speed up materials innovation. As data and models improve together, it'll accelerate solutions to big problems needing new engineered materials.

TLDR: DeepMind made an AI system that uses graph neural networks to discover possible new materials. It found 2.2 million candidates, of which about 380k are the most stable. Over 700 have already been synthesized.

Full summary available here. Paper is here.

1 Comment
2023/11/30
02:30 UTC

1

PubDef: Defending Against Transfer Attacks Using Public Models

Adversarial attacks pose a serious threat to ML models. But most proposed defenses hurt performance on clean data too much to be practical.

To address this, researchers from UC Berkeley developed a new defense called PubDef. It focuses on defending against a very plausible type of attack - transfer attacks using publicly available surrogate models.

They model the attack/defense game with game theory. This lets PubDef train against diverse attacks simultaneously.

PubDef picks source models covering different training methods - standard, adversarial, corruption robust, etc. This gives broad coverage.
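One way to picture "training against diverse attacks simultaneously" is a weighted sum of the clean loss and the losses on adversarial examples from each source-model/attack pair. The sketch below is an assumption for illustration, not the paper's actual objective; `combined_loss`, the pair names, and the uniform default weights are all hypothetical.

```python
def combined_loss(clean_loss, transfer_losses, weights=None):
    # transfer_losses maps a (hypothetical) "source model + attack" name to the
    # loss measured on adversarial examples transferred from that source.
    if not transfer_losses:
        return clean_loss
    if weights is None:
        # Assumption: weight every attack equally when no weights are given.
        weights = {name: 1.0 / len(transfer_losses) for name in transfer_losses}
    adversarial = sum(weights[name] * loss for name, loss in transfer_losses.items())
    return clean_loss + adversarial

losses = {"standard_model+pgd": 1.0, "robust_model+pgd": 3.0}
total = combined_loss(0.2, losses)
print(total)
```

Changing `weights` shifts the emphasis between source models, which is the knob a game-theoretic formulation would tune.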

Against 264 transfer attacks on CIFAR and ImageNet, PubDef smashed previous defenses:

  • 89% vs 69% on CIFAR-10
  • 51% vs 33% on CIFAR-100
  • 62% vs 36% on ImageNet

Even better - it did this with minimal drop in accuracy on clean data.

  • On CIFAR-10, accuracy only dropped from 96.3% to 96.1%
  • On CIFAR-100, 82% to 76%
  • On ImageNet, 80% to 79%

By targeting a very real threat, PubDef made big robustness gains without hurting the ability to work with clean data.

TLDR: New defense PubDef achieves much higher robustness against transfer attacks with barely any drop in standard accuracy.

Full summary here. Paper is here.

1 Comment
2023/10/29
14:52 UTC

1

Meta, INRIA researchers discover that explicit registers eliminate ViT attention spikes

When visualizing the inner workings of vision transformers (ViTs), researchers noticed weird spikes of attention on random background patches. This didn't make sense since the models should focus on foreground objects.

By analyzing the output embeddings, they found a small number of tokens (2%) had super high vector norms, causing the spikes.

The high-norm "outlier" tokens occurred in redundant areas and held less local info but more global info about the image.

Their hypothesis is that ViTs learn to identify unimportant patches and recycle them as temporary storage instead of discarding. This enables efficient processing but causes issues.

Their fix is simple - just add dedicated "register" tokens that provide storage space, avoiding the recycling side effects.
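A minimal sketch of the register idea, assuming tokens are plain lists of floats and using a toy mixing layer in place of real attention: extra tokens are appended to the patch sequence, take part in the computation, and are thrown away at the output. The function names are hypothetical, and in a real model the register values would be learned parameters.

```python
def add_registers(patch_tokens, register_tokens):
    # Append the register tokens to the patch sequence before the transformer layers.
    return patch_tokens + register_tokens

def drop_registers(output_tokens, num_registers):
    # Registers are scratch space only, so discard them at the output.
    return output_tokens[:len(output_tokens) - num_registers]

def toy_layer(tokens):
    # Stand-in for attention: add the sequence mean to every token,
    # so all tokens (registers included) exchange information.
    dim = len(tokens[0])
    mean = [sum(t[i] for t in tokens) / len(tokens) for i in range(dim)]
    return [[t[i] + mean[i] for i in range(dim)] for t in tokens]

patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 patch tokens, dimension 2
registers = [[0.0, 0.0], [0.0, 0.0]]            # 2 register tokens (learned in practice)

hidden = toy_layer(add_registers(patches, registers))
outputs = drop_registers(hidden, len(registers))
print(len(hidden), len(outputs))
```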

Models trained with registers have:

  • Smoother and more meaningful attention maps
  • Small boosts in downstream performance
  • Way better object discovery abilities

The registers give ViTs a place to do their temporary computations without messing stuff up. Just a tiny architecture tweak improves interpretability and performance. Sweet!

I think it's cool how they reverse-engineered this model artifact and fixed it with such a small change. More work like this will keep incrementally improving ViTs.

TLDR: Vision transformers recycle useless patches to store data, causing problems. Adding dedicated register tokens for storage fixes it nicely.

Full summary. Paper is here.

1 Comment
2023/10/01
16:00 UTC

4

[P] Will Tsetlin machines reach state-of-the-art accuracy on CIFAR-10/CIFAR-100 anytime soon?

0 Comments
2023/09/13
13:23 UTC

3

Voicebox From Meta AI Gonna Change Voice Generation & Editing Forever - Can Eliminate ElevenLabs

1 Comment
2023/06/16
23:02 UTC

3

AI Learns How To Play Physically Simulated Tennis At Grandmaster Level By Watching Tennis Matches - By Researchers from Stanford University, NVIDIA, University of Toronto, Vector Institute, Simon Fraser University

0 Comments
2023/05/03
22:19 UTC

6

Hello. I am looking for a way to improve audio quality of older videos - perhaps audio super resolution - or any other ways

Hello everyone. I am a software engineering assistant professor at a private university. I have got lots of older lecture videos on my channel.

I am using NVIDIA broadcast to remove noise and it works very well.

However, I want to improve audio quality as well.

After doing a lot of research, I found that audio super-resolution is the way to go.

The only GitHub repo I have found so far is not working.

Any help is appreciated

How can I improve speech quality?

Here is my example lecture video (noise already removed and reuploaded, but the sound is still not good):

C# Programming For Beginners - Lecture 2: Coding our First Application in .NET Core Console

https://youtu.be/XLsrsCCdSnU

0 Comments
2023/02/15
21:26 UTC

2

Help needed in interpretation of a paper's data preparation.

I'm trying to build a neural network for unsupervised anomaly detection in log files and found an interesting paper, but I'm not sure how to prepare the data. Maybe that's because I am not a native English speaker.

[Unsupervised log message anomaly detection]

https://www.sciencedirect.com/science/article/pii/S2405959520300643

I will write it down in chunks and try to interpret each one.

Under 2.3 Proposed model (bottom of page 3), it says the following:

  1. Tokenize and change letters to lower case - Meaning: split into words and convert to lower case
  2. Sentences are padded into 40 words - If a row has fewer than 40 words, we add a special token (like '0') as a placeholder for the remaining positions.
  3. Sentences below 5 words are eliminated - Trivial
  4. Word frequency is then calculated and the data is shuffled - ????
  5. Data normalized between 0 and 1 - I don't really understand what "the data" is here

I cannot really follow step 4. It would be great if you could help me!
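One possible reading of the five steps, written out as a sketch. The interpretation of steps 4 and 5 is an assumption, not something the paper confirms: here each word is replaced by its frequency rank, and those integer ids are divided by the largest id to land in [0, 1]. The `<pad>` token and the function name are hypothetical. Short sentences are dropped before padding, since after padding every row would have 40 entries.

```python
import random
from collections import Counter

MAX_LEN = 40   # pad/truncate length (step 2)
MIN_LEN = 5    # minimum sentence length (step 3)
PAD = "<pad>"  # hypothetical placeholder token

def preprocess(lines, seed=0):
    # Step 1: tokenize on whitespace and lower-case.
    sentences = [line.lower().split() for line in lines]
    # Step 3: drop sentences with fewer than 5 words.
    sentences = [s for s in sentences if len(s) >= MIN_LEN]
    # Step 4 (assumed reading): count word frequencies and give each word
    # an integer id by frequency rank; id 0 is reserved for padding.
    counts = Counter(w for s in sentences for w in s)
    vocab = {w: i + 1 for i, (w, _) in enumerate(counts.most_common())}
    vocab[PAD] = 0
    # Step 2: pad (or truncate) every sentence to exactly 40 tokens.
    padded = [(s + [PAD] * MAX_LEN)[:MAX_LEN] for s in sentences]
    # Step 4, second half: shuffle the rows.
    random.Random(seed).shuffle(padded)
    # Step 5 (assumed reading): scale the ids into [0, 1].
    top = max(vocab.values()) or 1
    return [[vocab[w] / top for w in s] for s in padded]

logs = [
    "ERROR Disk full on node 7 rebooting",
    "ok",
    "INFO User login succeeded for admin account",
]
rows = preprocess(logs)
print(len(rows), len(rows[0]))
```

Under this reading, "the data" in step 5 is the matrix of word ids, one row per log line.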

0 Comments
2023/01/12
10:01 UTC

5

[R] Do we really need 300 floats to represent the meaning of a word? Representing words with words - a logical approach to word embedding using a self-supervised Tsetlin Machine Autoencoder.

0 Comments
2023/01/03
18:57 UTC

5

[R] New paper on autonomous driving and multi-task: "HybridNets: End-to-End Perception Network"

0 Comments
2022/03/18
21:52 UTC

3

Fully interpretable logical learning and reasoning for board game winner prediction with Tsetlin Machine obtains 92.1% accuracy on 6x6 Hex boards.

Logical learning of strong and weak board game positions

The approach learns what strong and weak board positions look like with simple logical patterns, facilitating both global and local interpretability, as well as explaining the learning steps. Our end-goal in this research project is to enable state-of-the-art human-AI-collaboration in board game playing through transparency. Paper: https://arxiv.org/abs/2203.04378

1 Comment
2022/03/10
08:18 UTC

9

NeurIPS 2021 - Curated papers - Part 2

In Part 2, I discuss the following papers:

  1. Probing Inter-modality: Visual Parsing with Self-Attention for Vision-Language Pre-training
  2. Attention Bottlenecks for Multimodal Fusion
  3. AugMax: Adversarial Composition of Random Augmentations for Robust Training
  4. Revisiting Model Stitching to Compare Neural Representations

https://rakshithv-deeplearning.blogspot.com/2021/12/neurips-2021-curated-papers-part2.html

0 Comments
2021/12/28
17:28 UTC

1

NeurIPS 2021 — Curated papers — Part 1

I tried to curate a list of a few papers from #neurips2021.

In the following blog post, the goal is to briefly and crisply describe what each paper is about and how it works; it is not a detailed explanation.

In Part 1, I discuss the following papers:

  a. UniDoc: multi-modal interactions between text and image from a document-understanding point of view.
  b. Few-shot learning for multi-modal data using a frozen auto-regressive language model.
  c. Adversarial methods to avoid manipulation of counterfactual explanations.

https://rakshithv-deeplearning.blogspot.com/2021/12/neurips-2021-curated-papers-part-1.html

0 Comments
2021/12/18
17:10 UTC

5

Steerable discovery of neural audio effects

Paper: https://arxiv.org/abs/2112.02926

Abstract:

Applications of deep learning for audio effects often focus on modeling analog effects or learning to control effects to emulate a trained audio engineer. However, deep learning approaches also have the potential to expand creativity through neural audio effects that enable new sound transformations. While recent work demonstrated that neural networks with random weights produce compelling audio effects, control of these effects is limited and unintuitive. To address this, we introduce a method for the steerable discovery of neural audio effects. This method enables the design of effects using example recordings provided by the user. We demonstrate how this method produces an effect similar to the target effect, along with interesting inaccuracies, while also providing perceptually relevant controls.

Repo with video demo & Colab examples: https://github.com/csteinmetz1/steerable-nafx

Submission statement: This has already been making the rounds on a few other subs, but I thought that this was an interesting conference abstract and project. I'm personally interested in the potential for driving a similar process in reverse, i.e., removing distortion rather than adding it. If anyone else has read any good papers pertaining to audio restoration recently, let me know! (I have a pet project to eventually restore some very low-quality audio of a deceased relative, so I've been loosely keeping tabs on ML audio processing, but it's not my primary area.)

1 Comment
2021/12/16
04:40 UTC

7

BEIT: BERT Pre-Training of Image Transformers

https://rakshithv.medium.com/beit-bert-pre-training-of-image-transformers-e43a9884ec2f

A BERT-like architecture for training vision models. Vision transformers treat an image patch as the analogue of a text token.
BEiT also formulates an objective similar to masked language modeling (MLM), but directly predicting a masked 16x16 image patch, where every pixel can take a value from 0 to 255, is challenging.
Hence they use an image tokenizer and predict discrete visual tokens instead of the raw pixels of the whole patch.
BEiT needs relatively less pre-training data than vision transformers.
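A rough sketch of why predicting tokens is easier than predicting pixels, and of what the masked-patch targets look like. The 224x224 input size, the 8192-entry token vocabulary, and the 40% mask ratio are assumptions for illustration. The token ids are random stand-ins for a real image tokenizer's output, and the masking here is uniformly random for simplicity.

```python
import random

IMAGE_SIZE = 224    # assumed input resolution
PATCH_SIZE = 16     # patch side length from the post
VOCAB_SIZE = 8192   # assumed size of the visual-token vocabulary
MASK_RATIO = 0.4    # assumed fraction of patches to mask

def num_patches(image_size, patch_size):
    per_side = image_size // patch_size
    return per_side * per_side

# Raw-pixel target: 16 * 16 pixels * 3 channels, each a value in 0..255.
VALUES_PER_PATCH = PATCH_SIZE * PATCH_SIZE * 3

def sample_mask(n, ratio, rng):
    # Pick which patch positions to hide (uniformly at random in this sketch).
    return sorted(rng.sample(range(n), int(n * ratio)))

rng = random.Random(0)
n = num_patches(IMAGE_SIZE, PATCH_SIZE)
# Stand-in for the tokenizer: one discrete token id per patch.
token_ids = [rng.randrange(VOCAB_SIZE) for _ in range(n)]
masked = sample_mask(n, MASK_RATIO, rng)
# Training target: for each masked position, a single class id out of VOCAB_SIZE,
# instead of VALUES_PER_PATCH separate pixel values.
targets = {pos: token_ids[pos] for pos in masked}
print(n, VALUES_PER_PATCH, len(targets))
```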

In this blog, I tried to put together my understanding of the paper.

0 Comments
2021/09/12
15:53 UTC

5

What are some good review articles to start learning about ML application in Biomedical disciplines?

I have been working in ML for some time now, and want to start learning about its applications in the biomedical domain. What would be some good starting points?

1 Comment
2021/08/23
09:28 UTC

4

[D] Charformer Paper Explained and Visualized: Fast Character Transformers via Gradient-based Subword Tokenization

1 Comment
2021/06/30
15:16 UTC

10

ProteinBERT: A universal deep-learning model of protein sequence and function

0 Comments
2021/05/30
09:02 UTC

1

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

0 Comments
2021/05/23
14:26 UTC

2

MLP-Mixer: An all-MLP Architecture for Vision

0 Comments
2021/05/23
14:25 UTC

5

Emerging Properties in Self-Supervised Vision Transformers (DINO)

0 Comments
2021/05/23
14:24 UTC

5

[R] A Review of "Neural Anisotropy Directions" (2020)

1 Comment
2021/05/21
11:49 UTC

6

PET, iPET, ADAPET papers explained! “Small language models are also few-shot learners”. Paper links in the comment section and as always, in the video description.

1 Comment
2021/04/02
11:54 UTC

2

New Pre-Print: Bio-Inspired Robustness: A Review

Hello everyone,

We recently added a new pre-print on how human visual system-inspired components can help with adversarial robustness. We study recent attempts in the area and analyze their properties and evaluation criteria for robustness. Please let us know what you think of the paper and any feedback is highly appreciated!!! :)

P.S Please forgive the word format TT TT, first and last time I do this in my life. Else it's Latex all the way.

Title: 'Bio-Inspired Robustness: A Review '

Arxiv link: https://arxiv.org/abs/2103.09265

Abstract: Deep convolutional neural networks (DCNNs) have revolutionized computer vision and are often advocated as good models of the human visual system. However, there are currently many shortcomings of DCNNs, which preclude them as a model of human vision. For example, in the case of adversarial attacks, where adding small amounts of noise to an image, including an object, can lead to strong misclassification of that object. But for humans, the noise is often invisible. If vulnerability to adversarial noise cannot be fixed, DCNNs cannot be taken as serious models of human vision. Many studies have tried to add features of the human visual system to DCNNs to make them robust against adversarial attacks. However, it is not fully clear whether human vision-inspired components increase robustness because performance evaluations of these novel components in DCNNs are often inconclusive. We propose a set of criteria for proper evaluation and analyze different models according to these criteria. We finally sketch future efforts to make DCNNs one step closer to the model of human vision.

0 Comments
2021/03/25
14:24 UTC

3

Animating facial expressions and body gestures directly from speech!

0 Comments
2021/02/17
18:45 UTC

1

Create a Game Character Face from a Single Portrait!

0 Comments
2021/02/06
22:44 UTC

7

[N] NVIDIA, UToronto, McGill & Vector Study Delivers Real-Time SDF Rendering & SOTA Complex Geometry Reconstruction

A new study by NVIDIA, University of Toronto, McGill University and the Vector Institute introduces an efficient neural representation that enables real-time rendering of high-fidelity neural SDFs for the first time while delivering SOTA quality geometric reconstruction.

Here is a quick read: NVIDIA, UToronto, McGill & Vector Study Delivers Real-Time SDF Rendering & SOTA Complex Geometry Reconstruction

The paper Neural Geometric Level of Detail: Real-time Rendering with Implicit 3D Surfaces is on arXiv.

1 Comment
2021/01/29
01:07 UTC
