/r/MachineLearning
Hello AI Community,
I’m working on a project to streamline the processing of a large volume of invoices from various suppliers. Each invoice may have a unique layout and design, depending on the supplier, and I want to train an AI model to automatically identify specific fields like article numbers, gross amounts, unit prices, etc., across these invoices. I’ll outline my situation below and would appreciate any advice on the best approach, relevant models, or practical considerations to help automate this process.
I have a substantial collection of PDF invoices from different suppliers. Some of these PDFs contain machine-readable text, while others are scanned images requiring OCR processing. Each invoice has a similar set of fields I need to extract, including article numbers, gross amounts, unit prices, and so on (see the XML example below).
Additionally, I have corresponding XML files for each invoice that list the correct field values as structured data. This XML data serves as my “ground truth” and is accurate in labeling each field with the correct values.
Goal: Train an AI model that can automatically parse and map values from new invoices to these field labels without needing manual bounding boxes or annotations on each new layout. My ideal solution would learn from the XML data and understand where each value is likely located on any invoice.
I’ve looked into some potential approaches and models that might be suitable, but I’m unsure which is best given my requirements.
To give you an idea of what I’m working with, here’s a basic breakdown:
<invoice>
  <orderDetails>
    <positions>
      <position>
        <positionNumber>0010</positionNumber>
        <articleNumber>EDK0000379</articleNumber>
        <description>Sensorcable, YF1234-100ABC3EEAX</description>
        <quantity>2</quantity>
        <unit>ST</unit>
        <unitPrice>23.12</unitPrice>
        <netAmount>46.24</netAmount>
      </position>
    </positions>
  </orderDetails>
</invoice>
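For reference, this is roughly how I am turning the XML ground truth into label records at the moment (a minimal sketch using the tag names from the example above; the file name is hypothetical):

```python
import xml.etree.ElementTree as ET

def parse_invoice_xml(path):
    """Extract the line-item fields from one ground-truth XML file."""
    root = ET.parse(path).getroot()
    items = []
    for pos in root.iter("position"):
        items.append({
            "positionNumber": pos.findtext("positionNumber"),
            "articleNumber": pos.findtext("articleNumber"),
            "description": pos.findtext("description"),
            "quantity": pos.findtext("quantity"),
            "unit": pos.findtext("unit"),
            "unitPrice": pos.findtext("unitPrice"),
            "netAmount": pos.findtext("netAmount"),
        })
    return items

# e.g. parse_invoice_xml("invoice_0001.xml")
```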
Thanks in advance for your insights! I’d be especially grateful for any step-by-step advice on setting up and training such a model, as well as practical tips or pitfalls you may have encountered in similar projects.
Hi all, I’m working on a self-supervised learning approach to estimate missing or uncertain data in a freeway traffic density dataset, inspired by matrix completion methods.
The dataset is generated from simulated freeway traffic, discretized in time and space to form a grid of cells. Each cell reflects a traffic density value observed from mobile sensors. I have three core arrays:
with dimensions (T, E, S, L), where:
Goal
The goal here is to build a model that can improve the estimation for cells where the certainty is less than 1. I want the model to capture dependencies over time and space, using self-supervision to “fill in” unobserved or uncertain values more accurately.
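Concretely, the kind of objective I have in mind is mask-and-reconstruct on the density grid, weighted by the certainty values, along the lines of this sketch (array shapes simplified to (T, S), the model is a stand-in, and the 0.9 confidence threshold is arbitrary):

```python
import torch
import torch.nn as nn

# Toy shapes: a (T, S) density grid plus a matching certainty grid in [0, 1].
T, S = 64, 32
density = torch.rand(T, S)
certainty = torch.rand(T, S)

model = nn.Sequential(            # stand-in for the real spatio-temporal model
    nn.Linear(S, 128), nn.ReLU(), nn.Linear(128, S)
)

def training_step(density, certainty, mask_ratio=0.3):
    # Hide a random subset of the *confident* cells and learn to reconstruct them.
    hide = (torch.rand_like(density) < mask_ratio) & (certainty > 0.9)
    corrupted = density.masked_fill(hide, 0.0)
    pred = model(corrupted)
    # Reconstruction loss only on the artificially hidden, originally confident cells.
    loss = ((pred - density) ** 2 * hide.float()).sum() / hide.float().sum().clamp(min=1)
    return loss

print(training_step(density, certainty))
```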
Proposed Approach
Here’s what I’m thinking in terms of architecture:
Question
Hi Folks,
I come from a traditional electrical engineering background, doing things like industrial automation and computer vision. I decided to pursue a PhD in ML because I thought it would be a good field to enter given my past experience. I have now been doing the PhD for three years. While I like my group and research, I am getting discouraged/depressed by (1) the publication rat race, (2) post-graduation opportunities being mostly coding-heavy, and (3) the inability to carve out a name for myself given how crowded the field has become.
Ideally, I would like to complete my PhD and move into a more relaxed-paced, technical but not coding-heavy job (even if it is not as high-paying as ML jobs), where I do not have to constantly up-skill. Do you folks have any suggestions on what jobs I could look into, or would you suggest dropping the PhD and doing something else?
TLDR: 4th-year ML PhD student unsure about sticking with the PhD, as they want a non-coding-heavy technical industry job post-graduation. Seeking advice on what to do.
As AI models continue to scale in both complexity and size, I'm interested in how the field of matrix computations is evolving to meet these new challenges. What are some of the latest advancements or strategies in matrix computation that are improving efficiency and adaptability for modern AI systems? Are there any recent breakthroughs or shifts in our approach to these computations that are making a significant impact in AI research and applications?
Hi, I am currently trying to use LLMs to help people query technical documents in French. For my tests, the documents cover the maintenance of water heaters and come as PDF files (clean PDFs, scans, PDFs with tables, ...), with one PDF file per water heater reference.
I would like the LLM to be able to answer questions like "very little hot water, what could be the problem?" or "how to drain the heater?"
For the information retrieval module of the RAG part, I am currently using FAISS and SentenceTransformer with the model sentence-camembert-large (I will probably use mixtral-8x7b-instruct-v0.1 for answer generation).
In some use cases, I know the reference of the water heater the technician is working on (e.g. "1N11001"), and I am able to pass this information to my agent.
In such cases, do you have any idea how I can constrain the search to the related notice file (e.g. "1N11001.pdf")?
I noticed that the `search` function on the FAISS index offers a `labels` parameter. Do you think I could use it in my case? Could anyone explain how I can attach labels with the names of my files during indexing, so that I can search only the subset of vectors related to a given file?
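In the meantime, the fallback I am considering is to keep a parallel list mapping each vector's position to its source PDF and post-filter an over-sampled search (a rough sketch; the model id, chunks, and file names are just illustrative):

```python
import faiss
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("dangvantuan/sentence-camembert-large")

# Text chunks and the PDF each chunk came from (toy example).
chunks = ["Vidange du chauffe-eau ...", "Peu d'eau chaude ..."]
sources = ["1N11001.pdf", "2B22002.pdf"]          # parallel to `chunks`

emb = model.encode(chunks, normalize_embeddings=True).astype("float32")
index = faiss.IndexFlatIP(emb.shape[1])
index.add(emb)

def search_in_file(query, filename, k=5, oversample=50):
    q = model.encode([query], normalize_embeddings=True).astype("float32")
    scores, ids = index.search(q, min(oversample, index.ntotal))
    # Keep only hits whose source PDF matches the known reference.
    hits = [(s, i) for s, i in zip(scores[0], ids[0]) if sources[i] == filename]
    return hits[:k]

print(search_in_file("comment vidanger le chauffe-eau ?", "1N11001.pdf"))
```

The other easy option would be one small FAISS index per PDF, selected by reference before searching.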
Any help, ideas, or advice would be very much appreciated :)
Thanks a lot
Is there any reason to train LLMs to predict only one token at a time? Wouldn't inference be roughly twice as fast if the model were trained to predict two tokens per step? That's a huge gain. Sure, there could be some performance loss, but for inference we already use quantization to increase speed, which hurts quality anyway. Would having the LLM predict more than one token degrade it much further?
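Concretely, the kind of thing I mean is adding a second output head on top of the same hidden states, along the lines of this rough sketch (shapes and names are made up):

```python
import torch.nn as nn

# Rough sketch (not a full LLM): two output heads on the same trunk.
# head1 predicts token t+1, head2 predicts token t+2 from the same hidden state.
class TwoTokenHead(nn.Module):
    def __init__(self, d_model: int, vocab_size: int):
        super().__init__()
        self.head1 = nn.Linear(d_model, vocab_size)  # next token
        self.head2 = nn.Linear(d_model, vocab_size)  # token after next

    def forward(self, hidden):                       # hidden: (batch, seq, d_model)
        return self.head1(hidden), self.head2(hidden)

# Training targets are shifted by one and two positions respectively.
def loss_fn(logits1, logits2, tokens):
    ce = nn.CrossEntropyLoss()
    l1 = ce(logits1[:, :-2].reshape(-1, logits1.size(-1)), tokens[:, 1:-1].reshape(-1))
    l2 = ce(logits2[:, :-2].reshape(-1, logits2.size(-1)), tokens[:, 2:].reshape(-1))
    return l1 + l2
```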
Being a PhD student, much of my time is spent on supervising students, project management and writing "quick and dirty" code for prototyping. I intend to move to industry after the PhD, but I feel like I'm missing out on key software engineering skills and good coding practices. Does anyone else feel this way? How do you upskill yourself to be industry-ready while doing a PhD?
I've been building LLM-based applications and was super frustrated with all the major frameworks: langchain, autogen, crewAI, etc. They seem to introduce a pile of unnecessary abstractions, and it becomes really hard to understand what's going on behind the curtain, even for very simple things.
So I just published this open-source framework, GenSphere. The idea is to have something like Docker for LLMs. You build applications with YAML files that define an execution graph. Nodes can be LLM API calls, regular function executions, or other graphs themselves. Because you can nest graphs easily, building complex applications is not an issue, but at the same time you don't lose control.
You basically code in YAML, stating what tasks need to be done and how they connect. Other than that, you only write the individual Python functions to be called during execution. There are no new classes or abstractions to learn.
It's all open-source. I'm now looking for contributors to adapt the framework for cycles and conditional nodes, which would allow full-fledged agentic system building! Please reach out if you want to contribute; there are tons of things to do!
PS: you can read the detailed docs here, and go over this quick Google Colab tutorial.
Hi, I had a research idea and applied it to the nanoGPT repo for LM training. I validated that the modified transformer generalizes better in terms of validation loss at the cost of slightly worse training loss, so it seems less prone to overfitting. I only applied it to the full shakespeare_char dataset and to a subset of OpenWebText, because 10 USD on RunPod only allows me to do that much. I am still planning to release a paper, since I got good results and have done some math work. Should I do it?
I just recently saw Autograd (the library, by Google people), which thinly wraps NumPy to offer backprop. JAX also does this, but basically rewrites NumPy. What's the difference? Is it JAX's GPU/TPU support? Is Autograd meant for smaller models?
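For what it's worth, the user-facing API looks almost identical in both; here is the same gradient computed with each (assuming both packages are installed):

```python
import autograd.numpy as anp
from autograd import grad as a_grad

import jax.numpy as jnp
from jax import grad as j_grad, jit

def f_autograd(x):
    return anp.sum(anp.tanh(x) ** 2)

def f_jax(x):
    return jnp.sum(jnp.tanh(x) ** 2)

g_autograd = a_grad(f_autograd)     # plain Python on top of NumPy, CPU only
g_jax = jit(j_grad(f_jax))          # traced and compiled via XLA, can target GPU/TPU

print(g_autograd(anp.array([0.5, 1.0])))
print(g_jax(jnp.array([0.5, 1.0])))
```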
Just read that Kinetic Seas launched a new AI-specific data center—sounds like they’re aiming to make model training and fine-tuning less of a headache. Their setup includes specialized GPUs and CPUs, supposedly built to handle the demands of large, complex models. If traditional data centers feel like running uphill, maybe these AI-specific centers are the downhill version?
With machine learning models becoming more resource-hungry, I wonder if optimized infrastructure like this might change the game. Think about it: training models faster and with fewer limitations could really boost productivity for researchers and data scientists. Kinetic Seas seems to believe it’s worth building infrastructure just for AI, which feels like a pretty interesting bet.
Has anyone here worked with AI-specific setups like this? Curious to know if it’s really as smooth as it sounds!
Hi guys! I've been thinking that if we could dynamically merge LLM fine-tuning LoRAs depending on the type of task at hand, we could fix catastrophic forgetting and maybe even get transformers that generalize better. The problem is that, because attention layers are highly nonlinear in their weights, transformers don't show good LMC (linear mode connectivity).
Are you aware of the computational complexity of exact LoRA merging? I have seen quite a lot of papers on LoRA merging, but they seem to be of poor quality and purely empirical, with little mathematical grounding.
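To make concrete what I mean by exact merging being non-trivial even before the attention nonlinearity, here is a tiny NumPy sketch: averaging the LoRA factors is not the same as averaging the low-rank updates they represent:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 16, 4
A1, B1 = rng.normal(size=(r, d)), rng.normal(size=(d, r))
A2, B2 = rng.normal(size=(r, d)), rng.normal(size=(d, r))

# Averaging the low-rank weight updates themselves:
avg_update = 0.5 * (B1 @ A1) + 0.5 * (B2 @ A2)

# Averaging the factors and then multiplying (the "naive" merge):
naive_merge = (0.5 * (B1 + B2)) @ (0.5 * (A1 + A2))

print(np.linalg.norm(avg_update - naive_merge))  # generally nonzero
```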
So if you guys have thought of it, I'd be glad to hear about it!
Hi all,
Apologies if this is the wrong place to post. I'm looking for tools that can help me support my partner, who has been harassed for a number of years by her ex and father of her child.
She is trying to compile evidence for a restraining order, but going back through years of emails and other messages is psychologically draining for her. Are there any tools well suited to analysing and classifying emails, either individually or in bulk, so that I can support her by taking this work over for her?
Hi everyone! 😊 I just published an article: Mastering LLM Testing: Ensuring Accuracy, Ethics, and Future-Readiness for Next-Gen AI Models. I hope I didn’t miss anything important in there!
I’m planning to turn this into a series on AI model testing and testing in general. Hope you enjoy it, and I’m always open for feedback and discussion! 😄
Hey r/MachineLearning! 👋
I’m doing some research to understand the key challenges people face when managing multiple AI models—particularly around scaling, monitoring performance, and handling model failures. I’d love to hear from the community to get a better sense of where the pain points are.
Here are a few questions to start:
Thanks so much for sharing your experiences—I’m excited to hear your insights!
Hi!
So ElevenLabs has a pretty good audio isolation API but it is really expensive. Are there any opensource models that can be self-hosted and get near the same quality?
LLMs are usually evaluated on benchmarks that aim to measure broad abilities. However, most publishers of foundational models do not publish the actual cross-entropy loss value that the model achieves at the end of training. I couldn't find any sources on this, but I would like to know what loss value the LLMs can achieve on human language. Is there anyone who knows more about this? Might there be some lower bound?
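For comparing any numbers that do get reported, I have been converting between nats per token, bits per token, and perplexity like this (the loss value here is made up):

```python
import math

loss_nats_per_token = 2.0                       # hypothetical cross-entropy in nats/token
perplexity = math.exp(loss_nats_per_token)      # e^loss
bits_per_token = loss_nats_per_token / math.log(2)

print(f"perplexity: {perplexity:.2f}, bits/token: {bits_per_token:.2f}")
```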
https://arxiv.org/pdf/2310.02980
The authors show that when transformers are pre-trained, they can match the performance of S4 on the Long Range Arena benchmark.
Hi everyone,
I'm facing a frustrating issue with my Python script. I'm processing prices and quantities in a DataFrame, using them to calculate unit prices, and saving the result to a CSV file. Everything seems perfect in Python (correct calculations, high precision), but when I open the CSV file, the values—particularly in the "Unit Prices"
column—are incorrect (usually divided by 1000) or rounded, even though I specified high precision.
A few details:

- I'm using pd.to_csv() with decimal='.' to ensure dot-based decimal formatting.
- I'm also setting float_format, aiming to retain maximum precision for Unit Prices.

Example output. Here's an example of what I'm seeing:

- Python output (before saving to CSV): Unit Prices = 0.696
- CSV output (opened in Excel): Unit Prices = 696
The weird thing is that this does not happen consistently. In some cases, rows are correct.
Has anyone faced this issue before? Any tips on ensuring that the CSV retains the exact precision and format as seen in Python?
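For reference, here is a minimal version of the saving code (simplified; the column name, separator, and precision are illustrative rather than my exact script):

```python
import pandas as pd

df = pd.DataFrame({"Unit Prices": [0.696, 12.5, 0.004]})

# Explicit decimal, separator, and float formatting when writing the CSV.
df.to_csv(
    "prices.csv",
    sep=";",                 # avoids clashing with ',' decimals in some Excel locales
    decimal=".",
    float_format="%.6f",
    index=False,
)
```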
I need to make a text classifier at work. I have 200 examples for each of the 5 categories. Each example is an email. Two approaches:
Which approach is best?
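Whichever of the two I end up with, I will probably sanity-check it against a simple baseline like this (a rough sketch assuming scikit-learn; the example data is a placeholder):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Placeholder data; in practice: 1000 emails across 5 categories.
emails = ["invoice attached, please pay", "meeting moved to Friday"] * 50
labels = ["billing", "scheduling"] * 50

baseline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=2),
    LogisticRegression(max_iter=1000),
)
scores = cross_val_score(baseline, emails, labels, cv=5, scoring="f1_macro")
print(scores.mean())
```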
Hi, I am going for a PhD in theoretical deep learning and I am looking to buy a new laptop. I am unsure how readily the remote servers will be available (I have not been admitted into a program yet), so I am looking for enough compute power to simply test my code before running it on my lab's servers. I am currently contemplating between buying
I understand it would be better to go for a Nvidia GPU, and that neither of these laptops have a GPU, but I am not looking to invest in one.
My thoughts right now are that the Zenbook 14 has a slightly better processor and much higher RAM than the MBA. I don't care about the SSD; 512GB is enough for me. However, I frequently see academics use the MBA, which could simply be a fad, but I am not sure. I am also wondering if I am missing something by not jumping on the MBA train. They are about the same price, so that's not much of a deciding factor.
I am also not sure if I should look at the cheaper 16GB options. I am currently using a 16GB Zenbook 13 bought 5 years back, but the RAM was limiting me in my Master's thesis project. The processors have improved since then, so I am not sure if 16GB is enough now. Also, I know it would be ideal to wait to learn more about the compute resources available at the lab I join, but my current laptop is in a very poor state, so much so that I cannot carry it anywhere (hardware damage), the screen flickers all the time, and I worry that it will turn off any second and leave my data inaccessible.
Does anyone have any thoughts or suggestions?
I can't guarantee that the tag is appropriate.
I got tired of searching for the WSJ0Mix dataset. I want to separate multiple speakers, and SpeechBrain's separator model doesn't give me the results I want, so I wanted to build a model with the dataset I have. However, no matter how much I searched for the WSJ0Mix dataset, nothing came up. I only found the *.m files, and I can't find what is included in the dataset or what is written in the *.csv file.
https://speechbrain.readthedocs.io/en/latest/tutorials/tasks/source-separation.html
The link above doesn't have the information I want either.
I'm very curious how you built the model.
I think I have come across industry jobs that required applicants to have a top-tier paper (NIPS/ICML/ICLR/CVPR/ICCV/ECCV). My question is: do papers from less prestigious conferences (AAAI/IJCAI/WACV/BMVC, ...) or journals have any value when applying for these jobs? Additionally, do metrics like h-index or citation count matter?
I'm on a team that's launching a large project to examine how an ML pipeline behaves in response to variations in data.
This is the first time I'm doing a sensitivity analysis this large and complex in a while, so I'm looking for help to identify the most up-to-date resources on:
Simulated data, and especially any Python tools and how they compare with the best that R has to offer
Evaluation tooling
Elasticity
Any good resources on sensitivity analysis overall, particularly newer ones from the past couple of years
What are the best resources you've found?
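For context, the kind of thing I have used before is a basic variance-based (Sobol) analysis on a toy stand-in for the pipeline, assuming the SALib package:

```python
from SALib.sample import saltelli
from SALib.analyze import sobol

problem = {
    "num_vars": 3,
    "names": ["x1", "x2", "x3"],
    "bounds": [[0.0, 1.0]] * 3,
}

X = saltelli.sample(problem, 1024)                 # (N*(2D+2), 3) input samples
Y = X[:, 0] ** 2 + 2.0 * X[:, 1] + 0.1 * X[:, 2]   # stand-in for the ML pipeline's metric
Si = sobol.analyze(problem, Y)
print(Si["S1"], Si["ST"])                          # first-order and total-order indices
```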
I’m preparing for a machine learning interview with Google, and the recruiter shared the main areas they’ll focus on:
- Theoretical ML concepts and practical applications – including problem definition, model selection, model tuning, and evaluation.
- Industry-Scale ML – covering performance and cost optimization, data handling, and production-oriented experimentation & debugging.
If anyone has insights on what to expect in these areas or tips on what to focus on, I’d really appreciate it! I’m especially struggling to understand what “Industry-Scale ML” questions could actually be.
Thanks in advance for any advice or resources!
edit: for context: I've already done my two LC style interviews. The first interview was an easy-medium I would say, and the second interview was definitely hard. I think I did well on both but only the second interviewer let me know how I did (I did well apparently). I also did the Googlyness interview which I think went well also. We had some good conversation.
What My Project Does
OpenSceneSense-Ollama is a powerful Python package designed for privacy-focused video analysis directly on your local machine. With this tool, you can leverage Ollama’s local models to analyze frames, transcribe audio, dynamically select key frames, and generate detailed summaries — all without relying on cloud-based APIs. It’s ideal for those needing rich, insightful analysis of video content while ensuring data privacy and minimizing usage costs.
Target Audience
This project is tailored for developers, researchers, data scientists, and privacy-conscious users who require in-depth, locally processed video analysis. It's perfect for applications where data security is critical, including:
- Content creation workflows that need automatic video summarization
- Researchers building labeled datasets for machine learning
- Platforms needing context-rich content moderation
- Offline projects in remote or restricted environments
Comparison
OpenSceneSense-Ollama goes beyond traditional video analysis tools that often separate frame and audio analysis. Instead, it integrates both visual and audio elements, allowing users to prompt the models to produce comprehensive summaries and in-depth contextual insights. Where most tools might identify objects or transcribe audio separately, OpenSceneSense-Ollama unifies these components into narrative summaries, making it ideal for richer datasets or more nuanced content moderation.
Getting Started
To begin using OpenSceneSense-Ollama:
Feel free to dive in, try it out, and share your feedback, especially if you're working in AI, privacy-focused applications, or video content moderation. Let's build a powerful, local solution for meaningful video analysis!
I have had this idea for some time, and I have created all the functions for creating data as well as the entire architecture. The problem is that I only have two years of experience in deep learning, this is a GAN-style network, and GANs are known to be very hard to train. I would like your opinions on the idea, as well as tips, suggestions, and things to change. Also, if someone finds this interesting, I would love to work with them on this project.
The objective is to create a model that generates optimal camouflage color patterns by training a generator model and using a segmentation model as a discriminator to assess the effectiveness of the generated camouflage. Both the generator and discriminator are trained simultaneously.
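Roughly, one training step would look like the sketch below (how the generated pattern gets composited onto the soldier image is my assumption, not a fixed design choice); the individual pieces are described after the code.

```python
import torch
import torch.nn.functional as F

def train_step(generator, segmenter, scene, soldier_mask, g_opt, d_opt, n_embed=128):
    # scene: (B, 3, H, W) image, soldier_mask: (B, 1, H, W) with soldier=1, background=0
    z = torch.randn(scene.size(0), n_embed)
    pattern = generator(z)                                   # (B, 3, 32, 32) camouflage
    patch = F.interpolate(pattern, size=scene.shape[-2:])    # stretch over the scene
    composed = scene * (1 - soldier_mask) + patch * soldier_mask

    # Discriminator/segmenter step: learn to find the soldier (0=background, 1=soldier).
    d_opt.zero_grad()
    pred = segmenter(composed.detach())                      # (B, 2, H, W) logits
    d_loss = F.cross_entropy(pred, soldier_mask.squeeze(1).long())
    d_loss.backward()
    d_opt.step()

    # Generator step: push every pixel toward the background class.
    g_opt.zero_grad()
    pred = segmenter(composed)
    bg_target = torch.zeros_like(soldier_mask.squeeze(1), dtype=torch.long)
    g_loss = F.cross_entropy(pred, bg_target)
    g_loss.backward()
    g_opt.step()                                             # only generator params updated
    return g_loss.item(), d_loss.item()
```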
- The generator takes an input of size n_embed = 128 and outputs a 3x32x32 camouflage color pattern.
- The segmentation mask has shape (1, W, H), with the values inverted so the soldier appears in white (foreground) and the background is black.

Two loss functions are used, each with separate backpropagation processes:

- For the generator: CrossEntropyLoss(output, 0), where the output is the predicted segmentation map from the discriminator and 0 represents the background class.
- For the discriminator: CrossEntropyLoss(output, label_mask), where the label mask has two classes: background and soldier.

This setup resembles a Generative Adversarial Network (GAN) but differs in that it uses no "real" camouflage data, only generated samples. Additionally:
While there's growing skepticism about the AI hype cycle, particularly around chatbots and RAG systems, I'm interested in identifying specific problems where LLMs demonstrably outperform traditional methods in terms of accuracy, cost, or efficiency. Problems I can think of are:
- word categorization
- sentiment analysis of short (not large) bodies of text
- image recognition (to some extent)
- writing style transfer (to some extent)
what else?
I want to train new cross attention layers feeding into a pretrained transformer (maybe a small llama model) while keeping the rest of the model constant.
What are some resources that might be helpful?
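For what it's worth, here is a minimal sketch of the freezing part plus one new cross-attention block (the model id is a placeholder, and how the block gets spliced into the decoder layers depends on the model implementation):

```python
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM

name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"   # placeholder small model
model = AutoModelForCausalLM.from_pretrained(name)
for p in model.parameters():
    p.requires_grad = False                    # keep the pretrained weights constant

d_model = model.config.hidden_size

class CrossAttnAdapter(nn.Module):
    def __init__(self, d_model, n_heads=8):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, hidden, memory):
        # hidden: (B, T, d_model) decoder states, memory: (B, S, d_model) external inputs
        out, _ = self.attn(self.norm(hidden), memory, memory)
        return hidden + out                    # residual connection around the new block

adapter = CrossAttnAdapter(d_model)            # only these parameters get optimized
optim = torch.optim.AdamW(adapter.parameters(), lr=1e-4)
```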