/r/neuralnetworks


Subreddit about Artificial Neural Networks, Deep Learning and Machine Learning.


26,768 Subscribers

5

Fractal-like Basins of attraction in Hopfield Neural Networks.

1 Comment
2024/12/05
03:12 UTC

2

PointNet Ensemble Improves Antimatter Annihilation Position Reconstruction at CERN

The researchers developed a deep learning approach for detecting and classifying antihydrogen annihilation events in CERN's ALPHA experiment. The key innovation is combining CNN architectures with custom physics-informed layers specifically designed for antimatter signature detection.

Key technical points:

  • Custom neural network architecture processes raw detector data from silicon vertex detectors
  • Model trained on both real and simulated antihydrogen annihilation events
  • Implements physics-informed regularization based on known antimatter behavior
  • Uses data augmentation to handle limited training examples
  • Achieves real-time processing (<1ms per event)

Results:

  • 99.9% accuracy on test set
  • False positive rate of 0.1%
  • Performance matches human expert analysis
  • Validated against traditional reconstruction methods
  • Maintains accuracy across different experimental conditions

I think this work opens up interesting possibilities for applying ML to other rare physics events. The ability to process events in real-time could enable new types of experiments that weren't feasible with traditional analysis pipelines. The physics-informed architecture approach might also transfer well to other particle physics problems.

I'm particularly interested in how they handled the limited training data challenge - antimatter events are extremely rare and expensive to produce. Their data augmentation and physics-based regularization techniques could be valuable for other domains with similar constraints.
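Symmetry-based augmentation of the kind described can be sketched in a few lines. This is a toy illustration, not the paper's pipeline: the cylindrical-detector geometry assumed here is hypothetical.

```python
import numpy as np

def augment_hits(hits, rng):
    # Random rotation about the beam (z) axis: a cylindrical detector is
    # symmetric under it, so the event label is preserved.
    theta = rng.uniform(0.0, 2.0 * np.pi)
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s, 0.0],
                    [s,  c, 0.0],
                    [0.0, 0.0, 1.0]])
    return hits @ rot.T

rng = np.random.default_rng(0)
event = rng.normal(size=(12, 3))       # 12 detector hits, (x, y, z) each
augmented = augment_hits(event, rng)   # same label, "new" event
```

Because the label is invariant under the symmetry, each real event yields many training examples at no extra labeling cost, which is what makes this attractive when events are rare and expensive.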

TLDR: Deep learning system achieves 99.9% accuracy detecting antimatter annihilation events at CERN, reducing analysis time from hours to milliseconds using physics-informed neural networks.

Full summary is here. Paper here.

0 Comments
2024/12/04
12:59 UTC

4

Auto-Annotate Datasets with LVMs

7 Comments
2024/12/04
08:17 UTC

5

Can the lessons learned with the "split brain experiment" help develop smarter neural networks/machine learning software?

If you don't know, a corpus callosotomy was a last-resort surgery used to treat patients with severe epilepsy. Well, a side effect is that it also splits the brain's consciousness in two.

Meaning that one side of the brain would control half of the body without the person willing it: their hand grabbing things outside their control, and other similar things. Although this may sound extreme, both consciousnesses were still somewhat connected and still a single person, not an "evil version" of yourself or anything like that.

There are a lot of videos on the subject, but in essence:

From all the research that has been done, it is believed (or proved, I'm no neuroscientist) that the brain is made out of several "black boxes" of processing compartments and semi-independent consciousnesses that all work together in sync.

However, each "compartment" is specialized for specific tasks, like visual information, motion control, communication etc.

And as such, could a neural network that somewhat resembles/mimics this compartmentalization of the human brain allow for smarter artificial intelligences?

1 Comment
2024/12/03
17:07 UTC

0

Control image generation

Hello guys, is there a way to control image generation using items from a local database? Example:

  • I input a prompt, an image of a room, or both.
  • The model generates the room with all of its items taken from the local database (MongoDB or SQL).

Now my questions:

  • How can this be done?
  • If it's possible, how do I build it?
  • How should the database structure be set up?

0 Comments
2024/12/03
14:21 UTC

10

Hopfield Neural Networks

John Hopfield won the Nobel Prize in Physics this year with G. Hinton. Has anyone played around with the Hopfield Neural Network systems? I have and they have some interesting properties for such a simple system. I mapped the basins as a function of the number of memories stored. They look fractal-like. I would be happy to post and share if anyone is interested.
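For anyone who wants to try this themselves, a minimal Hopfield setup (Hebbian storage plus sign-threshold recall; update order and other details are choices, not the only option) looks like:

```python
import numpy as np

def train_hopfield(patterns):
    # Hebbian storage: sum of outer products, zero self-connections.
    n = patterns.shape[1]
    w = patterns.T @ patterns / n
    np.fill_diagonal(w, 0.0)
    return w

def recall(w, state, steps=50):
    # Synchronous sign updates until a fixed point (or the step limit).
    s = state.copy()
    for _ in range(steps):
        nxt = np.sign(w @ s)
        nxt[nxt == 0] = 1.0
        if np.array_equal(nxt, s):
            break
        s = nxt
    return s

rng = np.random.default_rng(1)
memories = rng.choice([-1.0, 1.0], size=(3, 64))  # 3 stored patterns, 64 units
w = train_hopfield(memories)

# Flip a few bits of a stored memory and let the dynamics settle;
# sweeping many such probes is how the basins of attraction get mapped.
probe = memories[0].copy()
probe[rng.choice(64, size=5, replace=False)] *= -1
recovered = recall(w, probe)
```

Mapping basins then amounts to sampling many initial states, recording which memory each one settles into, and plotting that assignment as the number of stored memories grows.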

3 Comments
2024/12/03
05:30 UTC

8

L1 vs L2 Regularization
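For context on the linked post, the practical difference between the two penalties fits in a few lines of numpy (an illustrative sketch, not taken from the link): L2 shrinks every weight proportionally, while L1 subtracts a fixed amount and zeroes small weights.

```python
import numpy as np

def l2_step(w, lam, lr):
    # L2 (weight decay): gradient of (lam/2)*||w||^2 is lam*w,
    # so each step shrinks every weight multiplicatively.
    return w - lr * lam * w

def l1_step(w, lam, lr):
    # L1: the proximal update subtracts a constant amount,
    # driving small weights exactly to zero (soft threshold).
    return np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)

w = np.array([1.0, 0.05, -0.05, -1.0])
w_l2 = l2_step(w, lam=1.0, lr=0.1)   # all weights scaled by 0.9
w_l1 = l1_step(w, lam=1.0, lr=0.1)   # small weights snapped to 0
```

That soft-thresholding behavior is why L1 produces sparse weight vectors and L2 does not.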

0 Comments
2024/12/02
10:15 UTC

2

Update to Dense Layered NN in C

Hello! About two weeks ago, I posted about a dense layered neural net I created in C from scratch. I wanted to make a post about some updates to the work I've done. The network currently supports classification tasks, and the GitHub repo has been cleaned up for viewing. Any feedback would be appreciated.
https://github.com/Asu-Ghi/Personal_Projects/tree/main/MiniNet
Thank you for your time

0 Comments
2024/12/02
07:25 UTC

0

Would it be possible to train a model to replace all shoes in videos with crocs?

And how difficult would that be for a newbie(me)

2 Comments
2024/11/28
03:03 UTC

2

VanceNet Neural Network

I've made a neural network called VanceNet. It is designed to identify and analyze patterns within complex systems, and it uses dynamic energy-based neurons, evolutionary updates, and fractal analysis to adapt and evolve over time. By tracking metrics like entropy and fractal dimension, VanceNet generates increasingly sophisticated patterns, making it useful for applications like generative art, chaotic-system modeling, and scientific research. If you're curious to learn more, check the research paper here: VanceNet

9 Comments
2024/11/28
02:48 UTC

3

Transformer based anomaly detection

I am trying to build an anomaly detection model based on a transformer autoencoder architecture that will flag anomalies in stock prices based on reconstruction errors. I will be using minute-by-minute OHLCV historical data from the past 5 years for 15 to 20 stocks to train the model, and real-time APIs ingested through Kafka to test it.

This would be my first project working with a transformer-based architecture. Can anyone familiar with these concepts let me know what kind of roadblocks I would face in this project, and please mention any valuable resources that would help me build it.
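The core reconstruction-error idea can be prototyped before any transformer is involved. Here is a toy sketch with a linear autoencoder (PCA) standing in for the trained model, and made-up data in place of OHLCV bars:

```python
import numpy as np

def fit_linear_ae(x, k):
    # PCA as a stand-in for a trained autoencoder: the top-k principal
    # components define the encode/decode maps.
    mu = x.mean(axis=0)
    _, _, vt = np.linalg.svd(x - mu, full_matrices=False)
    return mu, vt[:k]

def reconstruction_error(x, mu, comps):
    z = (x - mu) @ comps.T      # encode
    x_hat = z @ comps + mu      # decode
    return np.mean((x - x_hat) ** 2, axis=1)

rng = np.random.default_rng(0)
normal = rng.normal(size=(200, 5))                    # 200 "normal" bars, 5 features
spike = normal[0] + np.array([0., 0., 0., 0., 25.])   # one bar with a huge volume spike
data = np.vstack([normal, spike])

mu, comps = fit_linear_ae(normal, k=2)
errors = reconstruction_error(data, mu, comps)
threshold = errors[:200].mean() + 3 * errors[:200].std()
flags = errors > threshold                            # the spiked bar gets flagged
```

The `errors > threshold` logic stays the same once the linear encode/decode is swapped for the transformer autoencoder; choosing the threshold robustly on non-stationary price data is one of the real roadblocks.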

1 Comment
2024/11/26
12:02 UTC

1

Large-Scale Evaluation of a Physician-Supervised LLM for Medical Chat Support Shows Enhanced Patient Satisfaction

This paper presents a real-world deployment of a medical LLM assistant that helps triage and handle patient inquiries at scale. The system uses a multi-stage architecture combining medical knowledge injection, conversational abilities, and safety guardrails.

Key technical components:

  • Custom medical knowledge base integrated with LLM
  • Multi-stage pipeline for query understanding and response generation
  • Safety classification system to detect out-of-scope requests
  • Synthetic patient testing framework for validation
  • Human-in-the-loop monitoring system

Results from deployment:

  • 200,000+ users served in France
  • 92% user satisfaction rate
  • Statistically significant reduction in doctor workload
  • 99.9% safety score on held-out test cases
  • Average response time under 30 seconds

I think this demonstrates that carefully constrained LLMs can be safely deployed for basic medical triage and information provision. The multi-stage architecture with explicit safety checks seems like a promising approach for high-stakes domains. However, the system's limitation to text-only interaction and reliance on accurate symptom reporting by patients suggests we're still far from fully automated medical care.

The synthetic testing framework is particularly interesting - it could be valuable for developing similar systems in other regulated domains where real-world testing is risky.

TLDR: Production medical LLM assistant using multi-stage architecture with safety guarantees shows promising results in real-world deployment, handling 200k+ users with 92% satisfaction while reducing doctor workload.

Full summary is here. Paper here.

2 Comments
2024/11/23
20:23 UTC

1

Does anyone know how to make a realistic rim light in Stable Diffusion?

I've seen people do something similar: they took a person, drew the rim light only roughly, and after ST everything came out realistic. I can't get it to work very well. Can you tell me which model I can use, and the settings for it?

0 Comments
2024/11/22
17:29 UTC

2

Design2Code: Evaluating Multimodal LLMs for Screenshot-to-Code Generation in Web Development

This paper introduces a systematic benchmark called Design2Code for evaluating how well multimodal LLMs can convert webpage screenshots into functional HTML/CSS code. The methodology involves testing models like GPT-4V, Claude 3, and Gemini across 484 real-world webpage examples using both automatic and human evaluation.

Key technical points:

  • Created a diverse dataset of webpage screenshots paired with ground-truth code
  • Developed automatic metrics to evaluate visual element recall and layout accuracy
  • Tested different prompting strategies including zero-shot and few-shot approaches
  • Compared model performance using both automated metrics and human evaluation
  • Found that current models achieve ~70% accuracy on visual element recall but struggle with precise layouts

Main results:

  • GPT-4V performed best overall, followed by Claude 3 and Gemini
  • Models frequently miss smaller visual elements and struggle with exact positioning
  • Layout accuracy drops significantly as webpage complexity increases
  • Few-shot prompting with similar examples improved performance by 5-10%
  • Human evaluators rated only 45% of generated code as fully functional

I think this benchmark will be valuable for measuring progress in multimodal code generation, similar to how BLEU scores help track machine translation improvements. The results highlight specific areas where current models need improvement, particularly in maintaining visual fidelity and handling complex layouts. This could help focus research efforts on these challenges.

I think the findings also suggest that while automatic webpage generation isn't ready for production use, it could already be useful as an assistive tool for developers, particularly for simpler layouts and initial prototypes.

TLDR: New benchmark tests how well AI can convert webpage designs to code. Current models can identify most visual elements but struggle with precise layouts. GPT-4V leads but significant improvements needed for production use.

Full summary is here. Paper here.

1 Comment
2024/11/22
13:48 UTC

2

Greener Supply Chains Through AI? Share Your Expertise!

Supply chains are evolving faster than ever, and Artificial Intelligence (AI) is becoming the go-to ingredient for driving sustainability. From inventory systems that seem to know what we need before we do, to HR tools that streamline operations, AI is changing the game.

I’m diving into the question: How does AI adoption really impact environmental performance in supply chains? To answer it, I need your expertise (and maybe a bit of your time).

If you’ve got 10 minutes to spare, I’d love for you to share your insights via this survey: https://nyenrode.eu.qualtrics.com/jfe/form/SV_dmPtjoM1s9mwZ38

0 Comments
2024/11/22
00:12 UTC

3

Building a NN that predicts a specific stock

I’m currently in my final year of a computer science degree, building a CNN for my final project.

I’m interested in investing etc so I thought this could be a fun side project. How viable do you guys think it would be?

Obviously it’s not going to predict it very well but hey, side projects aren’t supposed to be million dollar inventions.

7 Comments
2024/11/21
20:11 UTC

2

Prompt-in-Decoder: Efficient Parallel Decoding for Transformer Models on Decomposable Tasks

The key technical advance in this paper is a method called "Encode Once and Decode in Parallel" (EODP) that enables transformers to process multiple output sequences simultaneously during decoding. This approach caches encoder outputs and reuses them across different prompts, reducing computational overhead.

Main technical points:

  • Encoder computations are decoupled from decoder operations, allowing single-pass encoding
  • Multiple prompts can be decoded in parallel through cached encoder states
  • Memory usage is optimized through efficient caching strategies
  • Method maintains output quality while improving computational efficiency
  • Tested on machine translation and text summarization tasks
  • Reports 2-3x speedup compared to traditional sequential decoding

Results:

  • Machine translation: 2.4x speedup with minimal BLEU score impact (<0.1)
  • Text summarization: 2.1x speedup while maintaining ROUGE scores
  • Memory overhead scales linearly with number of parallel sequences
  • Works with standard encoder-decoder transformer architectures
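The encode-once idea is simple to sketch. The encoder and decoder below are trivial stand-ins (hypothetical, not the paper's code); only the caching pattern matters:

```python
import numpy as np

def encode(document, d_model=8):
    # Stand-in encoder: one pseudo-random vector per input token.
    # (A real encoder is a transformer stack.)
    tokens = document.split()
    rng = np.random.default_rng(len(tokens))
    return rng.normal(size=(len(tokens), d_model))

def decode(encoder_states, prompt):
    # Stand-in decoder: "attends" (mean-pools) over the cached encoder
    # states instead of re-encoding the document.
    context = encoder_states.mean(axis=0)
    return f"{prompt}: ctx_norm={np.linalg.norm(context):.3f}"

# Encode the shared input once ...
doc = "the quick brown fox jumps over the lazy dog"
cache = encode(doc)

# ... then decode many prompts against the same cached states, rather
# than re-running the encoder once per prompt.
prompts = ["summarize", "translate", "extract entities"]
outputs = [decode(cache, p) for p in prompts]
```

The speedup comes from amortizing the encoder pass across prompts: the decoder calls can also be batched, which is where the reported 2-3x gain would show up in practice.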

I think this could be important for deploying large language models more efficiently, especially in production environments where latency and compute costs matter. The ability to batch decode multiple prompts could make transformer-based systems more practical for real-world applications.

I think the main limitation is that it's currently only demonstrated on standard encoder-decoder architectures - it would be interesting to see if/how this extends to more complex transformer variants with cross-attention or dynamic computation.

TLDR: New method enables parallel decoding of multiple prompts in transformer models by caching encoder states, achieving 2-3x speedup without sacrificing output quality.

Full summary is here. Paper here.

1 Comment
2024/11/21
12:57 UTC

3

Transformer-Based Sports Simulation Engine for Generating Realistic Multi-Player Gameplay and Strategic Analysis

I've been reviewing this new paper on generating sustained sports gameplay sequences using a multi-agent approach. The key technical contribution is a framework that combines positional encoding, action generation, and a novel coherence discriminator to produce long-duration, realistic multi-player sports sequences.

Main technical components:

  • Multi-scale transformer architecture that processes both local player interactions and global game state
  • Hierarchical action generation that decomposes complex gameplay into coordinated individual actions
  • Physics-aware constraint system to ensure generated movements follow realistic game rules
  • Novel coherence loss that penalizes discontinuities between generated sequences
  • Curriculum training approach starting with short sequences and gradually increasing duration

Results from their evaluation:

  • Generated sequences maintain coherence for up to 30 seconds (significantly longer than baselines)
  • Human evaluators rated generated sequences as realistic 72% of the time
  • System successfully captures team-level strategies and formations
  • Computational requirements scale linearly with sequence length

The implications are significant for sports simulation, training, and analytics. This could enable better AI-driven sports game development and automated highlight generation. The framework could potentially extend to other multi-agent scenarios requiring sustained, coordinated behavior.

TLDR: New multi-agent framework generates extended sports gameplay sequences by combining transformers, hierarchical action generation, and coherence constraints. Shows strong results for sequence length and realism.

Full summary is here. Paper here.

0 Comments
2024/11/20
19:53 UTC

1

Book recommendations for learning tricks and techniques

Looking for books similar to Neural Networks: Tricks of the Trade, except newer and/or different.

0 Comments
2024/11/20
15:14 UTC

4

Large Language Models Enable High-Fidelity Behavioral Simulation of 1,000+ Individuals

I found this paper interesting for its technical approach to creating behavioral simulations using LLMs. The researchers developed a system that generates digital agents based on interview data from real people, achieving high fidelity in replicating human behavior patterns.

Key technical aspects:

  • Architecture combines LLM-based agents with structured interview processing
  • Agents are trained on personal narratives to model decision-making
  • Validation against General Social Survey responses
  • Tested on 1,052 individuals across diverse demographic groups

Main results:

  • 85% accuracy in replicating survey responses compared to human consistency
  • Maintained performance across different racial and ideological groups
  • Successfully reproduced experimental outcomes from social psychology studies
  • Reduced demographic bias compared to traditional simulation approaches

The implications for social science research are significant. This methodology could enable more accurate policy testing and social dynamics research by:

  • Creating representative populations for simulation studies
  • Testing interventions across diverse groups
  • Modeling complex social interactions
  • Reducing demographic biases in research

Technical limitations to consider:

  • Current validation limited to survey responses and controlled experiments
  • Long-term behavioral consistency needs further study
  • Handling of evolving social contexts remains uncertain
  • Privacy considerations in creating digital representations

TLDR: New methodology creates digital agents that accurately simulate human behavior using LLMs and interview data, achieving 85% accuracy in replicating survey responses. Shows promise for social science research while reducing demographic biases.

Full summary is here. Paper here.

1 Comment
2024/11/19
14:49 UTC

2

Neural Net Framework in C

Hello! This is one of my first posts ever, but I'd like feedback on a Neural Network Framework I've been working on recently. It's fully implemented in C, and any input would be appreciated. This is just a side project I've been working on, and the process has been rewarding so far.

Files of relevance are, main.c, network.c, forward.c, backward.c, and utils.c

https://github.com/Asu-Ghi/Personal_Projects/tree/main/C_Projects/Neural

Thanks for your time!

0 Comments
2024/11/19
06:45 UTC

1

Memoripy: Bringing Memory to AI with Short-Term & Long-Term Storage

Hey r/neuralnetworks!

I’ve been working on Memoripy, a Python library that brings real memory capabilities to AI applications. Whether you’re building conversational AI, virtual assistants, or projects that need consistent, context-aware responses, Memoripy offers structured short-term and long-term memory storage to keep interactions meaningful over time.

Memoripy organizes interactions into short-term and long-term memory, prioritizing recent events while preserving important details for future use. This ensures the AI maintains relevant context without being overwhelmed by unnecessary data.

With semantic clustering, similar memories are grouped together, allowing the AI to retrieve relevant context quickly and efficiently. To mimic how we forget and reinforce information, Memoripy features memory decay and reinforcement, where less useful memories fade while frequently accessed ones stay sharp.
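The decay-and-reinforcement idea can be illustrated in a few lines. This is a generic sketch of the concept, not Memoripy's actual API:

```python
class MemoryItem:
    # Toy decay/reinforcement scoring: exponential decay since last
    # access, with reinforcement raising the base strength.
    def __init__(self, text, now, half_life=3600.0):
        self.text = text
        self.last_access = now
        self.strength = 1.0
        self.half_life = half_life

    def score(self, now):
        age = now - self.last_access
        return self.strength * 0.5 ** (age / self.half_life)

    def reinforce(self, now):
        self.strength += 1.0
        self.last_access = now

t0 = 0.0
m = MemoryItem("user prefers metric units", now=t0)
stale = m.score(now=t0 + 7200)   # two half-lives later: 1.0 -> 0.25
m.reinforce(now=t0 + 7200)
fresh = m.score(now=t0 + 7200)   # reinforced and just accessed: 2.0
```

Retrieval would then rank memories by `score(now)`, so stale items fade out of context unless they keep being reinforced.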

One of the key aspects of Memoripy is its focus on local storage. It’s designed to work seamlessly with locally hosted LLMs, making it a great fit for privacy-conscious developers who want to avoid external API calls. Memoripy also integrates with OpenAI and Ollama.

If this sounds like something you could use, check it out on GitHub! It’s open-source, and I’d love to hear how you’d use it or any feedback you might have.

0 Comments
2024/11/19
01:16 UTC

4

TSMamba: SOTA time series model based on Mamba

TSMamba is a Mamba-based (an alternative to transformers) time series forecasting model generating state-of-the-art results for time series. The model uses bidirectional encoders and even supports zero-shot predictions. Check out more details here: https://youtu.be/WvMDKCfJ4nM

0 Comments
2024/11/18
09:38 UTC

18

Using a Neural Network to teach Snake to win

#neuralnetwork #machinelearning

6 Comments
2024/11/18
06:33 UTC

3

I'm overwhelmed and I need help.

So, I'm in a Ph.D. programme that I started in August, and my main research revolves around deep learning, neural networks, and activation functions. My supervisor gave me certain materials to read that could help me get into neural networks and activation functions. However, the introductory materials were vast, and I'd need more time to learn the basic concepts. But my supervisor overwhelmed me with the responsibility of reading 200 papers on activation functions, each within one week, even before I could finish the basics. I just learned about gradient descent, and the basic materials need a good amount of time for me to comprehend. I am really having a hard time understanding the research papers I'm reading right now, because I didn't get the time to fully cover the basics. But my supervisor expects a weekly report on the papers I have read. So far, I have read 4 papers, but I couldn't understand any of them; they were like Classical Greek to me. I told my supervisor that I'm having a hard time comprehending those papers because my basics haven't been covered, but my supervisor didn't seem to mind.

Now, I'm in a rut. On one hand, I have to write reports on incomprehensible papers, which is really draining me; on the other hand, I still need more time to cover the basics of neural networks. I really don't know what I should do.

13 Comments
2024/11/17
15:15 UTC

4

I Like Working With Model Architecture Visually. How About You?

I don’t know about you, but I feel like visual representations of CNNs (and models in general) are seriously underrated. In my experience, it’s so much easier to work on a project when you can mentally “walk around” the model.

Maybe that’s just me. I’d definitely describe myself as a visual learner. But I’m curious, have you had a similar experience? Do you visualize the structure of your models when working on your projects?

Over the past month, I’ve been working on visualizing a (relatively simple) model. (Link to project: https://youtu.be/zLEt5oz5Mr8 ).

What’s your take on this?

3 Comments
2024/11/17
13:19 UTC

2

Help with Project for Damage Detection

Hey guys,

I am currently working on a project that detects damage/dents on rented construction machinery (excavators, cement mixers, etc.). After a machine is returned to the rental company, a machine learning model is used to detect damage and 'penalise the renters' accordingly. We are expected to have images of the machines pre-rental, so there is a benchmark to compare against.

What would you all suggest for this? Which models should I train/fine-tune? What data should I collect? Any other suggestions?

If you have any follow-up questions, please ask.

3 Comments
2024/11/17
12:43 UTC

1

Model loss is too sensitive to one parameter count

Hi everyone, I'm training a translation (en → hi) model with my own transformer implementation. I trained one with 15M parameters and it achieved a loss below 1; the learning rate started at 0.001 and I lowered it as training progressed, ending at 0.0001. The problem is that when I increase the model size even slightly (to 30M), the loss just stagnates around 5.3. What is happening? I know the learning rate should depend on model and dataset size, but the dataset is the same, and 15M to 30M doesn't seem like a big jump; they are both small models. Should I use a learning rate scheduler?

edit: smaller models seem to be doing better; an 8.5M model doesn't get stuck at 5.3

here is the transformer implementation if you want to check that: https://github.com/n1teshy/transformer
the notebook I used to train : https://github.com/n1teshy/transformer/blob/main/notebooks/transformer.colab.ipynb
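A scheduler is a reasonable first thing to try: larger transformers are often trained with linear warmup followed by cosine decay, since a fixed LR that works for a small model can destabilize a bigger one early in training. A minimal sketch (the numbers are illustrative, not tuned for this model):

```python
import math

def lr_at(step, total_steps, base_lr=1e-3, warmup=1000, floor=1e-5):
    # Linear warmup to base_lr, then cosine decay down to floor.
    if step < warmup:
        return base_lr * (step + 1) / warmup
    progress = (step - warmup) / max(1, total_steps - warmup)
    return floor + 0.5 * (base_lr - floor) * (1.0 + math.cos(math.pi * progress))

schedule = [lr_at(s, 10_000) for s in range(10_001)]
```

Warmup keeps the early updates small while the 30M model's layers are still poorly conditioned, which is a common reason a fixed 0.001 works at 15M but stalls at 30M.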

2 Comments
2024/11/17
06:51 UTC

2

MobileNetV2 not going past 50% accuracy no matter what I try

So for context, I'm trying to create a CNN that recognizes emotions from images of faces, using the FER-2013 dataset. Initially I tried to construct a CNN on my own but didn't achieve good enough accuracy, so I decided to use the pre-trained model MobileNetV2. The model doesn't overfit, but nothing I've tried to increase model capacity, like data augmentation or training the last few layers of the pre-trained model, has worked. I've trained for 30 epochs, but the accuracy and validation loss plateau at just under 50% and 1.3 respectively. What else can I do to improve the model's accuracy?
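One cheap thing to check before touching the architecture: FER-2013 is class-imbalanced (the 'disgust' class is rare), which can pin accuracy regardless of the backbone. Inverse-frequency loss weights take a few lines; the label counts below are made up for illustration:

```python
import numpy as np

def class_weights(labels, n_classes):
    # Inverse-frequency weights: rare classes get proportionally larger
    # weight in the loss, normalized so an average-frequency class gets ~1.
    counts = np.bincount(labels, minlength=n_classes).astype(float)
    return counts.sum() / (n_classes * np.maximum(counts, 1.0))

# Made-up 3-class label distribution with one rare class (class 1)
labels = np.array([0] * 50 + [1] * 5 + [2] * 45)
w = class_weights(labels, n_classes=3)
```

Most training APIs accept such weights directly in the loss (e.g. a class_weight argument in Keras `fit`), so this costs nothing to try alongside unfreezing more layers.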

2 Comments
2024/11/16
14:10 UTC
