/r/mlpapers
A subreddit for weekly machine learning paper discussions. Started by the people from /r/MachineLearning
If you want to get started with Machine Learning, try /r/LearnMachineLearning
Materials discovery is critical but tough. New materials enable big innovations like batteries or LEDs. But there are practically infinitely many combinations to try, and testing them experimentally is slow and expensive.
So scientists and engineers want to simulate and screen materials on computers first. This lets them check far more candidates before real-world experiments. However, models have historically struggled to accurately predict whether materials are stable.
Researchers at DeepMind made a system called GNoME that uses graph neural networks and active learning to push past these limits.
GNoME models materials' crystal structures as graphs and predicts formation energies. It actively generates and filters candidates, evaluating the most promising with simulations. This expands its knowledge and improves predictions over multiple cycles.
The authors introduced new ways to generate derivative structures that respect symmetries, further diversifying discoveries.
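To make the generate, filter, evaluate, retrain cycle a bit more concrete, here is a minimal toy sketch of an active learning loop. It is not GNoME's code: the "material" is just a number, `expensive_energy` is a hypothetical stand-in for a costly simulation, and the surrogate is a nearest-neighbour lookup rather than a graph neural network. All names and parameters are made up for illustration.

```python
import random

def expensive_energy(x):
    """Hypothetical stand-in for a costly simulation of a candidate."""
    return (x - 0.3) ** 2

def predict(x, labeled):
    """Toy surrogate: reuse the energy of the nearest evaluated candidate."""
    nearest = min(labeled, key=lambda pair: abs(pair[0] - x))
    return nearest[1]

def active_learning(rounds=5, pool_size=200, batch=5, seed=0):
    rng = random.Random(seed)
    # Start from two candidates that have already been evaluated.
    labeled = [(x, expensive_energy(x)) for x in (0.0, 1.0)]
    for _ in range(rounds):
        # 1. Generate a pool of new candidates.
        pool = [rng.random() for _ in range(pool_size)]
        # 2. Filter: rank by the surrogate's predicted energy, lowest first.
        pool.sort(key=lambda x: predict(x, labeled))
        chosen = pool[:batch]
        # 3. Evaluate only the chosen few with the expensive step,
        #    then add them to the data the surrogate relies on.
        labeled.extend((x, expensive_energy(x)) for x in chosen)
    return labeled

labeled = active_learning()
best_x, best_e = min(labeled, key=lambda pair: pair[1])
```

The point of the sketch is the budget: each round looks at 200 candidates but only pays for 5 expensive evaluations, and every evaluation feeds back into the next round's predictions.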
The results:
Overall this demonstrates how scaling up deep learning can massively speed up materials innovation. As data and models improve together, it'll accelerate solutions to big problems needing new engineered materials.
TLDR: DeepMind made an AI system that uses graph neural networks to discover possible new materials. It found 2.2 million candidates, over 300k of which are among the most stable. Over 700 have already been synthesized.
Full summary available here. Paper is here.
Adversarial attacks pose a serious threat to ML models. But most proposed defenses hurt performance on clean data too much to be practical.
To address this, researchers from UC Berkeley developed a new defense called PubDef. It focuses on defending against a very plausible type of attack - transfer attacks using publicly available surrogate models.
They model the attack/defense game with game theory. This lets PubDef train against diverse attacks simultaneously.
PubDef picks source models covering different training methods - standard, adversarial, corruption robust, etc. This gives broad coverage.
Against 264 transfer attacks on CIFAR and ImageNet, PubDef smashed previous defenses:
Even better - it did this with minimal drop in accuracy on clean data.
By targeting a very real threat, PubDef made big robustness gains without hurting the ability to work with clean data.
TLDR: New defense PubDef achieves much higher robustness against transfer attacks with barely any drop in standard accuracy.
Full summary here. Paper is here.
When visualizing the inner workings of vision transformers (ViTs), researchers noticed weird spikes of attention on random background patches. This didn't make sense since the models should focus on foreground objects.
By analyzing the output embeddings, they found a small number of tokens (2%) had super high vector norms, causing the spikes.
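A rough sketch of how one could flag such tokens, assuming you already have the output embeddings as plain lists. The "3x the median norm" rule and the toy numbers are my own assumptions for illustration, not the cutoff used in the paper.

```python
import math

def l2_norm(vec):
    """Euclidean norm of one token embedding."""
    return math.sqrt(sum(v * v for v in vec))

def high_norm_tokens(embeddings, factor=3.0):
    """Indices of tokens whose norm exceeds `factor` times the median norm."""
    norms = [l2_norm(e) for e in embeddings]
    median = sorted(norms)[len(norms) // 2]
    return [i for i, n in enumerate(norms) if n > factor * median]

# Hypothetical embeddings: 49 ordinary tokens plus one outlier (2% of 50).
tokens = [[1.0, 0.5, -0.5, 1.0] for _ in range(49)]
tokens.insert(10, [40.0, -30.0, 20.0, 10.0])

outliers = high_norm_tokens(tokens)
print(outliers)  # [10]
```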
The high-norm "outlier" tokens occurred in redundant areas and held less local info but more global info about the image.
Their hypothesis is that ViTs learn to identify unimportant patches and recycle them as temporary storage instead of discarding. This enables efficient processing but causes issues.
Their fix is simple - just add dedicated "register" tokens that provide storage space, avoiding the recycling side effects.
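The mechanics can be sketched in a few lines: extra tokens are appended to the patch sequence on the way in and thrown away on the way out. This is a toy in plain Python, with `toy_layer` as a hypothetical placeholder for the transformer and zero-valued registers standing in for what would be learned parameters in a real model.

```python
def add_registers(patch_tokens, registers):
    """Append register tokens to the patch sequence before the transformer."""
    return patch_tokens + registers

def drop_registers(output_tokens, num_registers):
    """Discard register outputs; only patch outputs are used downstream."""
    return output_tokens[:len(output_tokens) - num_registers]

def toy_layer(tokens):
    """Placeholder for the transformer: each token gets the sequence mean
    added to it, standing in for attention mixing information."""
    dim = len(tokens[0])
    mean = [sum(t[d] for t in tokens) / len(tokens) for d in range(dim)]
    return [[t[d] + mean[d] for d in range(dim)] for t in tokens]

patches = [[float(i), 1.0] for i in range(4)]   # 4 patch tokens, dim 2
registers = [[0.0, 0.0] for _ in range(2)]      # 2 registers (learned in practice)

hidden = toy_layer(add_registers(patches, registers))
outputs = drop_registers(hidden, num_registers=len(registers))
```

The sequence is 6 tokens long inside the model and 4 tokens long at the output, so downstream code sees the same shape as before.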
Models trained with registers have:
The registers give ViTs a place to do their temporary computations without messing stuff up. Just a tiny architecture tweak improves interpretability and performance. Sweet!
I think it's cool how they reverse-engineered this model artifact and fixed it with such a small change. More work like this will keep incrementally improving ViTs.
TLDR: Vision transformers recycle useless patches to store data, causing problems. Adding dedicated register tokens for storage fixes it nicely.
Full summary. Paper is here.
Hello everyone. I am a software engineering assistant professor at a private university. I have got lots of older lecture videos on my channel.
I am using NVIDIA Broadcast to remove noise and it works very well.
However, I want to improve audio quality as well.
After doing a lot of research, I found that audio super-resolution is the way to go.
The only GitHub repo I have found so far is not working.
Any help is appreciated.
How can I improve speech quality?
Here is my example lecture video (noise already removed and reuploaded, but the sound is still not good):
C# Programming For Beginners - Lecture 2: Coding our First Application in .NET Core Console
I'm trying to build a neural network for unsupervised anomaly detection in logfiles and found an interesting paper, but I'm not sure how to prepare the data. Maybe that's because I am not a native English speaker.
[Unsupervised log message anomaly detection]
https://www.sciencedirect.com/science/article/pii/S2405959520300643
I will write it down in chunks and try to interpret it.
Under 2.3 Proposed model (bottom of page 3) it says the following:
I cannot really follow at step 4. It would be great if you could help me!
Logical learning of strong and weak board game positions
The approach learns what strong and weak board positions look like with simple logical patterns, facilitating both global and local interpretability, as well as explaining the learning steps. Our end-goal in this research project is to enable state-of-the-art human-AI-collaboration in board game playing through transparency. Paper: https://arxiv.org/abs/2203.04378
In Part 2, I have discussed the following papers:
https://rakshithv-deeplearning.blogspot.com/2021/12/neurips-2021-curated-papers-part2.html
I tried to curate a list of a few papers from #neurips2021.
In the following blog post, the goal is to briefly describe what each paper is about and how it works; this is not a detailed explanation.
In Part 1, I have discussed the following papers:
a. UniDoc: Multi-modal interactions between text and image from a document understanding point of view.
b. Few-shot learning for multi-modal data using a frozen auto-regressive language model.
c. Adversarial methods to avoid manipulation of counterfactual explanations.
https://rakshithv-deeplearning.blogspot.com/2021/12/neurips-2021-curated-papers-part-1.html
Paper: https://arxiv.org/abs/2112.02926
Abstract:
Applications of deep learning for audio effects often focus on modeling analog effects or learning to control effects to emulate a trained audio engineer. However, deep learning approaches also have the potential to expand creativity through neural audio effects that enable new sound transformations. While recent work demonstrated that neural networks with random weights produce compelling audio effects, control of these effects is limited and unintuitive. To address this, we introduce a method for the steerable discovery of neural audio effects. This method enables the design of effects using example recordings provided by the user. We demonstrate how this method produces an effect similar to the target effect, along with interesting inaccuracies, while also providing perceptually relevant controls.
Repo with video demo & Colab examples: https://github.com/csteinmetz1/steerable-nafx
Submission statement: This has already been making the rounds on a few other subs, but I thought that this was an interesting conference abstract and project. I'm personally interested in the potential for driving a similar process in reverse, i.e., removing distortion rather than adding it. If anyone else has read any good papers pertaining to audio restoration recently, let me know! (I have a pet project to eventually restore some very low-quality audio of a deceased relative, so I've been loosely keeping tabs on ML audio processing, but it's not my primary area.)
https://rakshithv.medium.com/beit-bert-pre-training-of-image-transformers-e43a9884ec2f
A BERT-like architecture for training vision models. Vision transformers treat an image patch as the analogue of a text token.
BEiT also formulates an objective similar to masked language modeling (MLM), but directly predicting a masked 16×16 image patch, where every pixel can take a value from 0 to 255, is challenging.
Hence they use an image tokenizer and predict its discrete tokens instead of predicting the whole patch.
BEiT needs relatively less pre-training data than vision transformers.
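Here is a small sketch of what the training targets look like under this setup. The tokenizer is faked with random ids, and the 224-pixel image size, 8192-entry vocabulary, 40% mask ratio and uniform random masking are all assumptions for illustration, so treat it as a picture of the idea rather than the paper's exact recipe.

```python
import random

def patch_grid(image_size=224, patch_size=16):
    """Number of patches when the image is cut into square tiles."""
    per_side = image_size // patch_size
    return per_side * per_side

def make_mim_example(visual_tokens, mask_ratio=0.4, seed=0):
    """Pick patches to mask; targets are the tokenizer ids at those positions."""
    rng = random.Random(seed)
    n = len(visual_tokens)
    masked = sorted(rng.sample(range(n), int(n * mask_ratio)))
    targets = {pos: visual_tokens[pos] for pos in masked}
    return masked, targets

num_patches = patch_grid()   # 14 * 14 = 196 patches
vocab_size = 8192            # assumed tokenizer vocabulary size

# Stand-in for the image tokenizer's output: one discrete id per patch.
rng = random.Random(1)
visual_tokens = [rng.randrange(vocab_size) for _ in range(num_patches)]

masked, targets = make_mim_example(visual_tokens)
```

So for each masked position the model solves a classification problem over the vocabulary (one id per patch) instead of regressing every pixel value in the patch, which is what makes the objective look like MLM.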
In this blog, I tried to put together my understanding of the paper.
I have been working in ML for some time now, and want to start learning about its applications in the biomedical domain. What would be some good starting points?
Quick summary of the paper https://rakshithv.medium.com/mlp-mixer-an-all-mlp-architecture-for-vision-70ad2cea545f
Quick summary of the paper https://rakshithv.medium.com/emerging-properties-in-self-supervised-vision-transformers-dino-e9cd2126c05b
Hello everyone,
We recently added a new pre-print on how human visual system-inspired components can help with adversarial robustness. We study recent attempts in the area and analyze their properties and evaluation criteria for robustness. Please let us know what you think of the paper and any feedback is highly appreciated!!! :)
P.S. Please forgive the Word format TT TT, first and last time I do this in my life. Otherwise it's LaTeX all the way.
Title: 'Bio-Inspired Robustness: A Review'
Arxiv link: https://arxiv.org/abs/2103.09265
Abstract: Deep convolutional neural networks (DCNNs) have revolutionized computer vision and are often advocated as good models of the human visual system. However, there are currently many shortcomings of DCNNs, which preclude them as a model of human vision. For example, in the case of adversarial attacks, where adding small amounts of noise to an image, including an object, can lead to strong misclassification of that object. But for humans, the noise is often invisible. If vulnerability to adversarial noise cannot be fixed, DCNNs cannot be taken as serious models of human vision. Many studies have tried to add features of the human visual system to DCNNs to make them robust against adversarial attacks. However, it is not fully clear whether human vision-inspired components increase robustness because performance evaluations of these novel components in DCNNs are often inconclusive. We propose a set of criteria for proper evaluation and analyze different models according to these criteria. We finally sketch future efforts to make DCNNs one step closer to the model of human vision.
A new study by NVIDIA, University of Toronto, McGill University and the Vector Institute introduces an efficient neural representation that enables real-time rendering of high-fidelity neural SDFs for the first time while delivering SOTA quality geometric reconstruction.
Here is a quick read: NVIDIA, UToronto, McGill & Vector Study Delivers Real-Time SDF Rendering & SOTA Complex Geometry Reconstruction
The paper Neural Geometric Level of Detail: Real-time Rendering with Implicit 3D Surfaces is on arXiv.