/r/neuralnetworks
Subreddit about Artificial Neural Networks, Deep Learning and Machine Learning.
The researchers developed a deep learning approach for detecting and classifying antihydrogen annihilation events in CERN's ALPHA experiment. The key innovation is combining CNN architectures with custom physics-informed layers specifically designed for antimatter signature detection.
Key technical points:
Results:
I think this work opens up interesting possibilities for applying ML to other rare physics events. The ability to process events in real-time could enable new types of experiments that weren't feasible with traditional analysis pipelines. The physics-informed architecture approach might also transfer well to other particle physics problems.
I'm particularly interested in how they handled the limited training data challenge - antimatter events are extremely rare and expensive to produce. Their data augmentation and physics-based regularization techniques could be valuable for other domains with similar constraints.
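Since the summary doesn't reproduce the paper's actual layers, here's a rough sketch of what a physics-informed penalty can look like in practice. The conservation constraint below is an invented stand-in for whatever invariants the authors actually encode:

```python
import torch
import torch.nn as nn

# Hypothetical sketch: a physics-based regularizer added to the usual
# classification loss. The constraint here (predicted energy should match
# the measured calorimeter sum) is an invented stand-in, not the paper's.
class PhysicsRegularizedLoss(nn.Module):
    def __init__(self, lam=0.1):
        super().__init__()
        self.ce = nn.CrossEntropyLoss()
        self.lam = lam

    def forward(self, logits, labels, predicted_energy, measured_energy):
        task_loss = self.ce(logits, labels)
        # Penalize violations of the (assumed) conservation constraint.
        physics_penalty = torch.mean((predicted_energy - measured_energy) ** 2)
        return task_loss + self.lam * physics_penalty
```

The appeal of this pattern for low-data regimes is that the penalty acts like extra supervision: the model is discouraged from fits that are statistically plausible but physically impossible.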
TLDR: Deep learning system achieves 99.9% accuracy detecting antimatter annihilation events at CERN, reducing analysis time from hours to milliseconds using physics-informed neural networks.
Full summary is here. Paper here.
If you don't know, corpus callosotomy was a last-resort surgery used to treat patients with severe epilepsy. Well, a side effect is that it also splits the consciousness of the brain in two.
Meaning that one side of the brain could control half of the body without the person intending it: their hands grabbing things outside their control, and other similar things. Although this may sound extreme, both consciousnesses were still somewhat connected and still a single person, not an "evil version" of yourself or something like that.
There are a lot of videos on the subject, but in essence:
From all the research that has been done, it is believed (or proven, I'm no neuroscientist) that the brain is made up of several "black boxes" of processing compartments and semi-independent consciousnesses that all work together in sync.
However, each "compartment" is specialized for specific tasks, like visual information, motion control, communication etc.
And as such, could a neural network that somewhat resembles/mimics this compartmentalization of the human brain allow for smarter artificial intelligences?
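For what it's worth, mixture-of-experts architectures are an existing rough analog of this idea: a gating network routes inputs to specialized expert subnetworks. A toy sketch (illustrative only, not a brain model):

```python
import torch
import torch.nn as nn

# Toy "compartmentalized" network: a gate softly routes each input across
# specialized expert subnetworks, loosely analogous to task-specific
# brain modules. This is the standard mixture-of-experts pattern.
class TinyMoE(nn.Module):
    def __init__(self, d_in=32, d_hidden=64, d_out=10, n_experts=4):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU(),
                          nn.Linear(d_hidden, d_out))
            for _ in range(n_experts)
        )
        self.gate = nn.Linear(d_in, n_experts)

    def forward(self, x):
        weights = torch.softmax(self.gate(x), dim=-1)            # (batch, n_experts)
        outs = torch.stack([e(x) for e in self.experts], dim=1)  # (batch, n_experts, d_out)
        return (weights.unsqueeze(-1) * outs).sum(dim=1)         # weighted blend
```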
Hello guys, is there a way to control image generation with items from a local database? Example:
Now, my questions:
John Hopfield won the Nobel Prize in Physics this year with G. Hinton. Has anyone played around with the Hopfield Neural Network systems? I have and they have some interesting properties for such a simple system. I mapped the basins as a function of the number of memories stored. They look fractal-like. I would be happy to post and share if anyone is interested.
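For anyone who wants to play with one, a minimal Hopfield network is only a few lines of numpy: Hebbian storage plus asynchronous updates. (This is the textbook construction, not OP's basin-mapping code.)

```python
import numpy as np

# Minimal Hopfield network: Hebbian weight storage, asynchronous recall.
# Patterns are +/-1 vectors; recall converges to a stored attractor,
# which is what makes mapping the basins of attraction interesting.
def store(patterns):
    n = patterns.shape[1]
    W = np.zeros((n, n))
    for p in patterns:
        W += np.outer(p, p)
    np.fill_diagonal(W, 0)          # no self-connections
    return W / patterns.shape[0]

def recall(W, state, steps=500, seed=0):
    rng = np.random.default_rng(seed)
    s = state.copy()
    for _ in range(steps):
        i = rng.integers(len(s))    # asynchronous: one random unit per step
        s[i] = 1.0 if W[i] @ s >= 0 else -1.0
    return s

patterns = np.sign(np.random.default_rng(1).standard_normal((3, 64)))
W = store(patterns)
noisy = patterns[0] * np.where(np.random.default_rng(2).random(64) < 0.2, -1, 1)
print(np.mean(recall(W, noisy) == patterns[0]))  # overlap with the stored memory
```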
Hello! About two weeks ago, I posted about a dense layered neural net I created in C from scratch. I wanted to make a post about some updates to the work I've done. The network currently supports a classification-related NN, and the GitHub has been cleaned up for viewing. Any feedback would be appreciated.
https://github.com/Asu-Ghi/Personal_Projects/tree/main/MiniNet
Thank you for your time
And how difficult would that be for a newbie (me)?
I've made a neural network called VanceNet that is designed to identify and analyze patterns within complex systems. It uses dynamic energy-based neurons, evolutionary updates, and fractal analysis to adapt and evolve over time. By tracking metrics like entropy and fractal dimensions, VanceNet generates increasingly sophisticated patterns, making it useful for applications like generative art, chaotic-system modeling, and scientific research. If you're curious to learn more, check the research paper here: VanceNet
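As a purely illustrative example (VanceNet's actual implementation may differ), tracking the Shannon entropy of a layer's activations, one of the metrics mentioned above, might look like this:

```python
import numpy as np

# Illustrative only: one way to track the Shannon entropy of a layer's
# activations over time, the kind of metric the post describes.
def activation_entropy(acts, bins=32):
    hist, _ = np.histogram(acts, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]                       # drop empty bins before taking logs
    return -np.sum(p * np.log2(p))

print(activation_entropy(np.random.default_rng(0).standard_normal(10_000)))
```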
I am trying to build an anomaly detection model based on a transformer autoencoder architecture that will detect anomalies in stock prices from reconstruction errors. I will be using minute-by-minute OHLCV historical data from the past 5 years for preferably 15 to 20 stocks to train the model, then use real-time APIs ingested through Kafka to test it.
This would be my first project working on a transformer-based architecture. Can anyone familiar with these concepts let me know what kind of roadblocks I would face in this project? And please do mention any valuable resources that would help me build it.
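To make the idea concrete, a minimal sketch of the reconstruction-error setup might look like this, assuming windows of normalized OHLCV data shaped (batch, seq_len, 5); the hyperparameters are placeholders, not recommendations:

```python
import torch
import torch.nn as nn

# Rough sketch of reconstruction-error anomaly detection with a
# transformer encoder. Real use needs per-feature normalization,
# positional encodings, and a validation-calibrated threshold.
class TSAutoencoder(nn.Module):
    def __init__(self, n_features=5, d_model=64, nhead=4, nlayers=2):
        super().__init__()
        self.embed = nn.Linear(n_features, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, nlayers)
        self.decoder = nn.Linear(d_model, n_features)

    def forward(self, x):
        return self.decoder(self.encoder(self.embed(x)))

model = TSAutoencoder()
window = torch.randn(8, 60, 5)                        # 8 windows of 60 minutes
recon = model(window)
errors = ((recon - window) ** 2).mean(dim=(1, 2))     # per-window reconstruction error
anomalous = errors > errors.mean() + 3 * errors.std() # naive threshold for illustration
```

A roadblock worth anticipating: an autoencoder trained on raw prices will mostly learn trend, so most practitioners reconstruct returns or otherwise detrended features instead.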
This paper presents a real-world deployment of a medical LLM assistant that helps triage and handle patient inquiries at scale. The system uses a multi-stage architecture combining medical knowledge injection, conversational abilities, and safety guardrails.
Key technical components:
Results from deployment:
I think this demonstrates that carefully constrained LLMs can be safely deployed for basic medical triage and information provision. The multi-stage architecture with explicit safety checks seems like a promising approach for high-stakes domains. However, the system's limitation to text-only interaction and reliance on accurate symptom reporting by patients suggests we're still far from fully automated medical care.
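For intuition, the general "staged gates" pattern might look something like the skeleton below; the paper's actual stages aren't detailed in this summary, so the gates, thresholds, and trigger terms are invented placeholders:

```python
from dataclasses import dataclass

# Illustrative skeleton of a multi-stage triage pipeline with explicit
# safety gates. Everything here is a placeholder for the general pattern,
# not the paper's architecture.
@dataclass
class LLMAnswer:
    text: str
    confidence: float

EMERGENCY_TERMS = {"chest pain", "cannot breathe", "suicidal"}

def triage(patient_message: str, answer: LLMAnswer) -> str:
    # Stage 1: hard safety gate before any model output is surfaced.
    if any(term in patient_message.lower() for term in EMERGENCY_TERMS):
        return "This may be an emergency. Please contact emergency services now."
    # Stage 2: the LLM-generated answer (produced upstream, passed in here).
    # Stage 3: post-hoc confidence check with human escalation.
    if answer.confidence < 0.8:
        return "Your question has been forwarded to a clinician."
    return answer.text
```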
The synthetic testing framework is particularly interesting - it could be valuable for developing similar systems in other regulated domains where real-world testing is risky.
TLDR: Production medical LLM assistant using multi-stage architecture with safety guarantees shows promising results in real-world deployment, handling 200k+ users with 92% satisfaction while reducing doctor workload.
Full summary is here. Paper here.
I've seen people do something similar: they took a person, didn't carefully draw the rim light, and after ST everything came out realistic. But I can't do it very well. Tell me, what model can I use, and what settings for it?
This paper introduces a systematic benchmark called Design2Code for evaluating how well multimodal LLMs can convert webpage screenshots into functional HTML/CSS code. The methodology involves testing models like GPT-4V, Claude 3, and Gemini across 484 real-world webpage examples using both automatic and human evaluation.
Key technical points:
Main results:
I think this benchmark will be valuable for measuring progress in multimodal code generation, similar to how BLEU scores help track machine translation improvements. The results highlight specific areas where current models need improvement, particularly in maintaining visual fidelity and handling complex layouts. This could help focus research efforts on these challenges.
I think the findings also suggest that while automatic webpage generation isn't ready for production use, it could already be useful as an assistive tool for developers, particularly for simpler layouts and initial prototypes.
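As a concrete illustration of the task being benchmarked, prompting a vision-capable model to reproduce a screenshot might look like the sketch below, using OpenAI's chat API; the prompt wording and model choice are assumptions, not the paper's exact setup:

```python
import base64
from openai import OpenAI

# Illustrative sketch of the screenshot-to-code task. The paper evaluated
# GPT-4V, Claude 3, and Gemini; any vision-capable model you have access
# to works for trying the idea.
client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("screenshot.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Reproduce this webpage as a single self-contained "
                     "HTML file with inline CSS."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)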
TLDR: New benchmark tests how well AI can convert webpage designs to code. Current models can identify most visual elements but struggle with precise layouts. GPT-4V leads but significant improvements needed for production use.
Full summary is here. Paper here.
Supply chains are evolving faster than ever, and Artificial Intelligence (AI) is becoming the go-to ingredient for driving sustainability. From inventory systems that seem to know what we need before we do, to HR tools that streamline operations, AI is changing the game.
I’m diving into the question: How does AI adoption really impact environmental performance in supply chains? To answer it, I need your expertise (and maybe a bit of your time).
If you’ve got 10 minutes to spare, I’d love for you to share your insights via this survey: https://nyenrode.eu.qualtrics.com/jfe/form/SV_dmPtjoM1s9mwZ38
I’m currently in my final year of a computer science degree, building a CNN for my final project.
I'm interested in investing etc., so I thought this could be a fun side project. How viable do you guys think it would be?
Obviously it's not going to predict very well, but hey, side projects aren't supposed to be million-dollar inventions.
The key technical advance in this paper is a method called "Encode Once and Decode in Parallel" (EODP) that enables transformers to process multiple output sequences simultaneously during decoding. This approach caches encoder outputs and reuses them across different prompts, reducing computational overhead.
Main technical points:
Results:
I think this could be important for deploying large language models more efficiently, especially in production environments where latency and compute costs matter. The ability to batch decode multiple prompts could make transformer-based systems more practical for real-world applications.
I think the main limitation is that it's currently only demonstrated on standard encoder-decoder architectures - it would be interesting to see if/how this extends to more complex transformer variants with cross-attention or dynamic computation.
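To make the caching idea concrete, here's a minimal sketch with HuggingFace T5 that runs the encoder exactly once and reuses its states across several parallel decodes; this illustrates the general pattern, not the paper's EODP implementation:

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Sketch of "encode once, decode many": cache the encoder states and
# reuse them for several parallel sampled decodes of the same input.
tok = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

inputs = tok("translate English to German: The cat sat on the mat.",
             return_tensors="pt")
encoder_outputs = model.get_encoder()(**inputs)  # run the encoder exactly once

# Broadcast the cached states across a batch of 4 parallel decodes.
n = 4
encoder_outputs.last_hidden_state = encoder_outputs.last_hidden_state.expand(n, -1, -1)
out = model.generate(
    encoder_outputs=encoder_outputs,
    attention_mask=inputs.attention_mask.expand(n, -1),
    do_sample=True,
    max_new_tokens=32,
)
print(tok.batch_decode(out, skip_special_tokens=True))
```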
TLDR: New method enables parallel decoding of multiple prompts in transformer models by caching encoder states, achieving 2-3x speedup without sacrificing output quality.
Full summary is here. Paper here.
I've been reviewing this new paper on generating sustained sports gameplay sequences using a multi-agent approach. The key technical contribution is a framework that combines positional encoding, action generation, and a novel coherence discriminator to produce long-duration, realistic multi-player sports sequences.
Main technical components:
Results from their evaluation:
The implications are significant for sports simulation, training, and analytics. This could enable better AI-driven sports game development and automated highlight generation. The framework could potentially extend to other multi-agent scenarios requiring sustained, coordinated behavior.
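Since the post doesn't reproduce the discriminator's design, here's a generic sketch of what a sequence coherence discriminator can look like; the architecture and feature sizes below are all illustrative:

```python
import torch
import torch.nn as nn

# Generic sketch: a small recurrent network scoring whether a window of
# multi-player positions/actions looks temporally consistent. Trained
# with real vs. generated windows (standard discriminator setup).
class CoherenceDiscriminator(nn.Module):
    def __init__(self, n_players=10, feat=4, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(n_players * feat, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, seq):                      # seq: (batch, time, players*features)
        _, h = self.rnn(seq)
        return torch.sigmoid(self.head(h[-1]))   # probability the window is coherent

disc = CoherenceDiscriminator()
fake = torch.randn(2, 50, 40)   # 2 windows, 50 steps, 10 players x 4 features
print(disc(fake).shape)         # -> torch.Size([2, 1])
```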
TLDR: New multi-agent framework generates extended sports gameplay sequences by combining transformers, hierarchical action generation, and coherence constraints. Shows strong results for sequence length and realism.
Full summary is here. Paper here.
Looking for books similar to Neural Networks: Tricks of the Trade, except newer and/or different.
I found this paper interesting for its technical approach to creating behavioral simulations using LLMs. The researchers developed a system that generates digital agents based on interview data from real people, achieving high fidelity in replicating human behavior patterns.
Key technical aspects:
Main results:
The implications for social science research are significant. This methodology could enable more accurate policy testing and social dynamics research by:
Technical limitations to consider:
TLDR: New methodology creates digital agents that accurately simulate human behavior using LLMs and interview data, achieving 85% accuracy in replicating survey responses. Shows promise for social science research while reducing demographic biases.
Full summary is here. Paper here.
Hello! This is one of my first posts ever, but I'd like feedback on a Neural Network Framework I've been working on recently. It's fully implemented in C, and any input would be appreciated. This is just a side project I've been working on, and the process has been rewarding so far.
Files of relevance are main.c, network.c, forward.c, backward.c, and utils.c.
https://github.com/Asu-Ghi/Personal_Projects/tree/main/C_Projects/Neural
Thanks for your time!
Hey r/neuralnetworks!
I’ve been working on Memoripy, a Python library that brings real memory capabilities to AI applications. Whether you’re building conversational AI, virtual assistants, or projects that need consistent, context-aware responses, Memoripy offers structured short-term and long-term memory storage to keep interactions meaningful over time.
Memoripy organizes interactions into short-term and long-term memory, prioritizing recent events while preserving important details for future use. This ensures the AI maintains relevant context without being overwhelmed by unnecessary data.
With semantic clustering, similar memories are grouped together, allowing the AI to retrieve relevant context quickly and efficiently. To mimic how we forget and reinforce information, Memoripy features memory decay and reinforcement, where less useful memories fade while frequently accessed ones stay sharp.
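As a toy illustration of that decay-plus-reinforcement mechanism (not Memoripy's actual API), the scoring dynamics might look like:

```python
import time

# Toy sketch of decay-plus-reinforcement scoring: memory strength decays
# exponentially with age and gets bumped on each access. Illustrative
# only; Memoripy's real implementation may differ.
class MemoryItem:
    def __init__(self, text, half_life=3600.0):
        self.text = text
        self.half_life = half_life      # seconds until strength halves
        self.strength = 1.0
        self.last_access = time.time()

    def score(self, now=None):
        now = now or time.time()
        age = now - self.last_access
        return self.strength * 0.5 ** (age / self.half_life)

    def reinforce(self):
        self.strength = self.score() + 1.0  # accessing a memory boosts it
        self.last_access = time.time()
```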
One of the key aspects of Memoripy is its focus on local storage. It’s designed to work seamlessly with locally hosted LLMs, making it a great fit for privacy-conscious developers who want to avoid external API calls. Memoripy also integrates with OpenAI and Ollama.
If this sounds like something you could use, check it out on GitHub! It’s open-source, and I’d love to hear how you’d use it or any feedback you might have.
TSMamba is a Mamba-based (an alternative to transformers) time-series forecasting model generating state-of-the-art results for time series. The model uses bidirectional encoders and even supports zero-shot predictions. Check out more details here: https://youtu.be/WvMDKCfJ4nM
So, I'm in a Ph.D. programme that I started in August, and my main research revolves around deep learning, neural networks, and activation functions. My supervisor gave me certain materials to read that could help me get into neural networks and activation functions. However, the introductory materials were vast, and I'd need more time to learn the basic concepts. But my supervisor overwhelmed me with the responsibility of reading 200 papers on activation functions, reporting each week, even before I could finish the basics. I just learned about gradient descent, and the basic materials need a good amount of time for me to comprehend. I'm really having a hard time understanding the research papers I'm reading right now because I didn't get time to fully cover the basics. But my supervisor expects a weekly report on the papers I've read. So far, I have read 4 papers, but I couldn't understand any of them; they were like Classical Greek to me. I told my supervisor that I'm having a hard time comprehending those papers because my basics haven't been covered, but my supervisor didn't seem to mind.
Now I'm in a rut. On one hand, I have to write reports on incomprehensible papers, which is really draining me, and on the other hand, I still need more time to cover the basics of neural networks. I really don't know what I should do in this case.
I don’t know about you, but I feel like visual representations of CNNs (and models in general) are seriously underrated. In my experience, it’s so much easier to work on a project when you can mentally “walk around” the model.
Maybe that’s just me. I’d definitely describe myself as a visual learner. But I’m curious, have you had a similar experience? Do you visualize the structure of your models when working on your projects?
Over the past month, I've been working on visualizing a (relatively simple) model (link to the project: https://youtu.be/zLEt5oz5Mr8).
What’s your take on this?
Hey guys,
I am currently working on a project that detects damage/dents on rented construction machinery (excavators, cement mixers, etc.). A machine learning model is used after the machine is returned to the rental company to detect damage and 'penalise the renters' accordingly. We are expected to have images of the machines pre-rental, so there is a benchmark comparison to look at.
What would you all suggest for this? Which models should I train/finetune? What data should I collect? Any other suggestions?
If you have any follow-up questions, please ask.
Hi everyone, I'm training a translation (en -> hi) model with my own transformer implementation. I trained one with 15 million parameters and it achieved a loss of less than 1; the learning rate was initially set to 0.001 and I lowered it as the model progressed, ending at 0.0001. The problem is that when I change the model size even slightly (to 30M), the loss just stagnates around 5.3. What is happening? I know the learning rate should depend on model and dataset size, but the dataset is the same, and 15M to 30M doesn't look like that big a difference; they are both small models. Should I use a learning rate scheduler?
Edit: smaller models seem to be doing better; an 8.5M model doesn't get stuck at 5.3.
Here is the transformer implementation if you want to check it: https://github.com/n1teshy/transformer
The notebook I used to train: https://github.com/n1teshy/transformer/blob/main/notebooks/transformer.colab.ipynb
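Regarding the scheduler question: the warmup-then-decay schedule from "Attention Is All You Need" is a common fix here, since larger transformers often stall or diverge without warmup at a flat 1e-3. A minimal version with LambdaLR (d_model and warmup_steps below are placeholders):

```python
import torch

# Noam schedule: lr ramps up linearly for warmup_steps, then decays as
# 1/sqrt(step), scaled by 1/sqrt(d_model). Base lr is set to 1.0 so the
# lambda fully controls the effective rate.
d_model, warmup_steps = 512, 4000

def noam(step):
    step = max(step, 1)
    return (d_model ** -0.5) * min(step ** -0.5, step * warmup_steps ** -1.5)

model = torch.nn.Linear(10, 10)  # stand-in for your transformer
optimizer = torch.optim.Adam(model.parameters(), lr=1.0,
                             betas=(0.9, 0.98), eps=1e-9)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=noam)

# in the training loop, after each batch:
# optimizer.step(); scheduler.step(); optimizer.zero_grad()
```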
So for context, I'm trying to create a CNN which can recognize emotions from images of faces, using the FER-2013 dataset. Initially I tried to construct a CNN on my own but didn't achieve good enough accuracy, so I decided to use the pre-trained model MobileNetV2. The model doesn't overfit, but whatever I've tried, like data augmentation and training the last few layers of the pre-trained model, hasn't worked. I've trained the model for 30 epochs, but the accuracy plateaus at just under 50% and the validation loss at about 1.3. What else can I do to improve the accuracy of the model?
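One common next step, sketched below with Keras: unfreeze only the top of MobileNetV2 and fine-tune it at a much lower learning rate. The layer count, input size, and LR are typical starting points, not guaranteed fixes for FER-2013 specifically (note its images are 48x48 grayscale, so they need resizing and channel replication first):

```python
import tensorflow as tf

# Fine-tuning sketch: freeze most of MobileNetV2, train the last ~30
# layers plus a fresh head, using a small LR so the pretrained weights
# aren't destroyed.
base = tf.keras.applications.MobileNetV2(
    input_shape=(96, 96, 3), include_top=False, weights="imagenet")
base.trainable = True
for layer in base.layers[:-30]:   # keep everything below the top frozen
    layer.trainable = False

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(7, activation="softmax"),  # 7 FER-2013 classes
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),  # low LR for fine-tuning
              loss="categorical_crossentropy", metrics=["accuracy"])
```

Beyond this, class weighting helps on FER-2013 (the classes are imbalanced), and the commonly reported ceiling on this dataset is around 65-75% even for strong models, so sub-50% has real headroom but 99% does not.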