/r/Rag
Welcome to r/RAG! This is the go-to community for everything related to Retrieval-Augmented Generation (RAG). Join us to discuss, share, and explore cutting-edge RAG techniques, research, projects, and tools. Whether you're a researcher, developer, or AI enthusiast, you'll find tips, tutorials, and support to help you innovate with RAG. Let's collaborate to push the boundaries of AI's potential!
🔍 Inside this Issue:
Check our Publication: https://medium.com/aiguys
This article covers the challenges of building a production-grade RAG system, the deterministic nature of its processing steps, and a deep dive into advanced RAG components. We cover everything from reranking to repacking, and from query classification to query expansion, along with many other techniques that form the backbone of a modern RAG system.
Why Scaling RAGs For Production Is So Hard?
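To make one of those components concrete, here is a minimal sketch of a reranking step using a sentence-transformers cross-encoder (the checkpoint name is a common public model, not necessarily the one the article uses):

from sentence_transformers import CrossEncoder

# A cross-encoder scores each (query, chunk) pair jointly, which is slower but
# more accurate than the bi-encoder used for first-stage retrieval.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, chunks: list[str], top_k: int = 5) -> list[str]:
    scores = reranker.predict([(query, c) for c in chunks])
    ranked = sorted(zip(chunks, scores), key=lambda pair: pair[1], reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]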
Don’t worry, I’m not going to give you a list of the top 50 prompts to try; that approach simply doesn’t work at scale. Instead, we’re going to talk about different prompting techniques.
The Six Major Prompting Categories
Across the 58 prompting techniques surveyed, there are 6 top-level categories.
The Prompt Report: Prompt Engineering Techniques
A brand-new paper from Google and Apple that looks at LLMs’ internal representations to understand the nature of hallucinations. The authors show that internal representations can also be used to predict the types of errors a model is likely to make, facilitating the development of tailored mitigation strategies.
They also reveal a discrepancy between LLMs’ internal encoding and external behavior: they may encode the correct answer, yet consistently generate an incorrect one. Taken together, these insights deepen our understanding of LLM errors from the model’s internal perspective, which can guide future research on enhancing error analysis and mitigation.
Apple says: “we found no evidence of formal reasoning in language models… Their behavior is better explained by sophisticated pattern matching — so fragile, in fact, that changing names can alter results by ~10%!”
Apple Says LLMs Are Really Not That Smart
This year’s two Nobel Laureates in Physics have used tools from physics to develop methods that are the foundation of today’s powerful machine learning. John Hopfield created an associative memory that can store and reconstruct images and other types of patterns in data. Geoffrey Hinton invented a method that can autonomously find properties in data, and so perform tasks such as identifying specific elements in pictures.
Press release: Click here
The Nobel Prize in Chemistry 2024 is about proteins, life’s ingenious chemical tools. David Baker has succeeded with the almost impossible feat of building entirely new kinds of proteins. Demis Hassabis and John Jumper have developed an AI model to solve a 50-year-old problem: predicting proteins’ complex structures. These discoveries hold enormous potential.
Press release: Click here
OpenAI has introduced a search capability within ChatGPT, enabling real-time web browsing to provide up-to-date information. This feature positions ChatGPT as a direct competitor to traditional search engines like Google.
News Article: Click here
Elon Musk’s xAI Seeks $40 Billion Valuation: Elon Musk’s AI startup, xAI, is in talks to raise funding at a valuation of $40 billion, up from $24 billion five months prior. The company is developing an AI chatbot named Grok, available on Musk’s social media platform X.
News Article: Click here
Both Microsoft’s and Google’s AI-driven investments lead to profit surges:
Microsoft’s substantial investments in AI have resulted in a 16% increase in quarterly sales, reaching $65.6 billion. The Azure cloud computing division saw a 33% revenue rise, highlighting the impact of AI on business processes.
Google’s parent company, Alphabet, reported a 34% increase in profit, earning $26.3 billion in the July-September quarter. This growth is attributed to AI investments and a 15% revenue surge to $88.27 billion.
News Article: Click here
News Article: Click here
Hey everyone! I’m new to RAG and I wouldn't call myself a programmer by trade, but I’m intrigued by the potential and wanted to build a proof of concept for my company. We store a lot of data in .docx and .pptx files on Google Drive, and the built-in search just doesn’t cut it. Here’s what I’m working on:
We need a system that can serve as a knowledge base for specific projects, answering queries like:
Here’s what I’ve looked into so far:
SimpleDirectoryReader
LlamaParse
use_vendor_multimodal_model
I’m running experiments from the simplest approach to more complex ones, eliminating what doesn’t work. For now, I’ve been testing with a single .pptx file containing text, images, and graphs.
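In case it helps anyone following the same path, here is roughly how the pieces named above wire together. This is a sketch, with the API key and multimodal model name as placeholders rather than recommendations:

from llama_parse import LlamaParse
from llama_index.core import SimpleDirectoryReader

parser = LlamaParse(
    api_key="llx-...",                            # placeholder
    result_type="markdown",
    use_vendor_multimodal_model=True,             # route pages through a vision model
    vendor_multimodal_model_name="openai-gpt4o",  # assumed model identifier
)

# Parse the .pptx (text, images, graphs) into text documents for indexing.
docs = SimpleDirectoryReader(
    input_files=["deck.pptx"],
    file_extractor={".pptx": parser},
).load_data()
print(docs[0].text[:500])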
.pdf would be an awkward solution. Generally, I'm not a fan of the solutions I called "Enterprise".
It's probably only a matter of time before Google or one of the other major tech companies launches a tool like NotebookLM at a reasonable price, or integrates proper reasoning / vector search into Google Drive, right? So does it actually make sense to dig into RAG right now, or, as a user, should I just wait a couple more months until a solution has been developed? Also, I feel like the whole augmented-generation part might not be necessary for my use case at all, since the main productivity boost for my company would be finding things faster (or at all ;)
Thanks for reading this far! I’d love to hear your thoughts on the current state of RAG or any insights on building an efficient search system. Cheers!
Hey, I want to build a RAG system to look through slide decks (think lecture notes) and answer some basic questions about them. There may be a mix of text, diagrams, and images, so it would be great to be able to parse the text in those diagrams and images. Does anyone have suggestions on what models/tools work well for this? PaperQA looks good, but I'm not sure it would fit my use case with slide decks. It would be great to host it locally too, if possible.
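Before reaching for a heavier framework, the raw slide content is easy to get at with python-pptx; text comes out directly, and images come out as bytes you can hand to OCR or a vision model. A minimal sketch:

from pptx import Presentation
from pptx.enum.shapes import MSO_SHAPE_TYPE

def extract_slide_content(path: str):
    """Yield (slide number, text, image blobs) for each slide in a deck."""
    prs = Presentation(path)
    for i, slide in enumerate(prs.slides, start=1):
        texts, images = [], []
        for shape in slide.shapes:
            if shape.has_text_frame:
                texts.append(shape.text_frame.text)
            elif shape.shape_type == MSO_SHAPE_TYPE.PICTURE:
                images.append(shape.image.blob)  # raw bytes, ready for OCR or a vision model
        yield i, "\n".join(texts), images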
I just fell in love with this new RAG tool (Vectorize) that I'm playing with, and I created a simple tutorial on how to build RAG pipelines in minutes and find the best embedding model, chunking strategy, and retrieval approach for getting the most accurate results from an LLM-powered RAG application.
I'm trying to build a voice-assistant RAG with the 11labs API. I don't have any ML or AI experience, nor any exposure to conversational/realtime apps or systems.
I understand things fine as long as I'm running something locally, i.e. a Python program with the 11labs SDK that uses my local PC microphone and speaker.
Things get super alien once I try to convert this into a proper backend with a frontend wrapper so it's actually usable. I don't just want to copy-paste code from GPT; I want to genuinely understand the basic concepts behind these websocket apps, audio chunks, and whatever else is needed here.
What do I study? What topics do I Google or search YouTube for? There's so much going on that I don't know where to begin to implement my use case on my own. (Nothing too fancy, just the basics.)
TLDR: Newbie in AI conversational apps. Where do I start learning to build a frontend/backend wrapper for conversational AI APIs?
As the title suggests, I'm curious how you set up your system prompt to ensure your solution only answers within the limits of its knowledge base.
Since negative prompting gets mixed reviews, how do you manage this, and what success rates have you seen so far?
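For what it's worth, one widely used pattern is to rely on positive instructions plus an explicit fallback answer rather than pure negative prompting. A sketch of such a system prompt (the wording is illustrative, not a tested recipe):

SYSTEM_PROMPT = """\
You are an assistant that answers strictly from the provided context.
Rules:
- Use ONLY the context between the <context> tags to answer.
- If the context does not contain the answer, reply exactly:
  "I don't have enough information in my knowledge base to answer that."
- Never use outside knowledge, even if you are confident.

<context>
{retrieved_chunks}
</context>
"""

The exact fallback string also makes it easy to measure refusal rates automatically, since you can grep for it in logs.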
Hey folks!
I’m working on a demo for Retrieval-Augmented Generation (RAG) and want to showcase each step of the pipeline to my team in a clear and interactive way. My goal is to walk them through the entire RAG process, showing live outputs for each of the steps below (see the sketch after the list for what they look like in code):
1. Document Upload - Loading documents into the system.
2. Embedding Creation - Converting documents into embeddings.
3. Storage in Vector Database - Storing these embeddings in a vector database.
4. Similarity Search - Performing similarity searches and showing results.
5. Answer Generation - Displaying the final generated answers based on retrieved information.
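For the demo narration, it may help to see how little code the five steps amount to. A minimal sketch (model names and sample texts are placeholders; the LLM call is left abstract):

import chromadb
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")
client = chromadb.Client()
col = client.get_or_create_collection("demo")

# Steps 1-3: upload documents, create embeddings, store them in the vector DB.
docs = ["RAG retrieves context before generating.", "Chroma stores embeddings."]
col.add(ids=["d1", "d2"], documents=docs,
        embeddings=embedder.encode(docs).tolist())

# Step 4: similarity search; show these hits live during the demo.
query = "What does RAG do?"
hits = col.query(query_embeddings=embedder.encode([query]).tolist(), n_results=2)
context = "\n".join(hits["documents"][0])

# Step 5: answer generation from the retrieved context.
prompt = f"Answer using only this context:\n{context}\n\nQ: {query}"
# answer = my_llm(prompt)  # hypothetical call to whatever LLM you demo with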
Ideally, I’m looking for a UI-based tool that can help display each step’s output so I can show it live during the demo. I’d like to avoid heavy customizations or coding if possible.
I experimented with Kotaemon, but it doesn’t seem to have this functionality built-in.
Does anyone know of any RAG UI tools or platforms that might meet this need?
Hi everyone,
I built an open-source chatbot to make RAG seamless. It collects the text visible on your screen into a local DB using OCR and accessibility APIs, then finds the relevant context at query time.
Compared to something like Rewind or Recall, it captures much higher-quality text and is more resource-efficient, since nothing beyond text is stored. I'd love your feedback on improving the retrieval performance, on features you'd like to see added, or on anything else.
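For anyone curious what the capture side of something like this looks like, here is a rough sketch assuming Tesseract OCR and Pillow's screen grab (the real project also uses accessibility APIs, which this omits):

import sqlite3, time
import pytesseract
from PIL import ImageGrab

conn = sqlite3.connect("screen_text.db")
conn.execute("CREATE TABLE IF NOT EXISTS captures (ts REAL, text TEXT)")

while True:
    shot = ImageGrab.grab()                    # current screen contents
    text = pytesseract.image_to_string(shot)   # OCR the screenshot to plain text
    conn.execute("INSERT INTO captures VALUES (?, ?)", (time.time(), text))
    conn.commit()
    time.sleep(10)                             # capture every 10 seconds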
Hi everyone,
I’m currently working in my spare time on a RAG project to develop a chatbot for users of a French software product, and I’m facing an issue with the Llava model (13B; I also tried the 34B version, but it's very slow and the problem persists).
For full details on the architecture and pipeline, you can check out this document: https://github.com/PaulAero/OpaleAI/blob/master/README.md
(Note: The repo version is not updated but will be soon; however, the README is up-to-date!)
One of the key parts of this project is converting visual information from PDFs (tutorials with annotated screenshots) into step-by-step text instructions. However, Llava is generating completely unusable descriptions, often inventing procedures that don’t exist.
Here’s what I’ve done so far:
Pipeline: I use Llava to transcribe screenshots from each page of the PDF tutorials. The response is then vectorized and stored for future retrieval.
Prompting: I created a detailed prompt instructing Llava to focus on actionable steps and avoid irrelevant details.
Example Issue: In one of the documents, Llava completely misinterpreted a set of instructions by creating steps that were not present in the image, resulting in misleading outputs.
My main questions:
- Has anyone else faced similar issues with Llava when processing screenshots with annotations?
- Has anyone experienced this issue with documents in languages other than English? Could this be a limitation of the model itself, or should I consider modifying the pipeline?
- Are there any alternative models or approaches for handling screenshots that might work better?
- If I need to use a paid API, which model would you recommend, and what would you consider the minimum image quality required to achieve good results and ensure cost-effectiveness?
Additional context:
I’m using the Mistral-Nemo model for text chunks, with embedding handled by SentenceTransformer('all-mpnet-base-v2'). Document retrieval and vector storage are managed by ChromaDB.
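For reference, here is a hedged sketch of the transcription-and-store step described in the pipeline above, assuming the ollama Python client for Llava and the same embedding/storage stack as this post (the prompt wording, paths, and collection name are illustrative):

import base64
import ollama
import chromadb
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-mpnet-base-v2")
col = chromadb.PersistentClient(path="db").get_or_create_collection("tutorials")

def transcribe_and_store(image_path: str, doc_id: str):
    """Transcribe one screenshot with Llava, then embed and store the text."""
    with open(image_path, "rb") as f:
        img = base64.b64encode(f.read()).decode()
    resp = ollama.generate(
        model="llava:13b",
        prompt="Describe ONLY the annotated steps visible in this screenshot, "
               "in order. Do not invent steps that are not shown.",
        images=[img],
    )
    text = resp["response"]
    col.add(ids=[doc_id], documents=[text],
            embeddings=[embedder.encode(text).tolist()])

Constraining the prompt to "only what is visible" sometimes reduces invented steps, but it is not a reliable fix for a model that hallucinates on annotated screenshots.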
I’d appreciate any suggestions or insights!
Thanks in advance!
PS: I have several additional questions about handling mixed French and English documents and ways to improve the retrieval module—maybe by adding a maximum distance parameter. :)
Hey everyone. Many start-ups are developing no-code RAG applications nowadays. I haven't actually tried most of them, but I wonder: can no-code RAG applications generally be an alternative to custom RAG pipelines?
Recently tried the long-term memory feature in OpenAGI for autonomous agents; it works super well. Check it out: https://github.com/aiplanethub/openagi
Hey mates. I'm completely new to RAG and LlamaIndex. I'm trying to build a RAG system that takes PDF resumes and answers questions like "give me the best 3 candidates for an IT job".
I ran into an issue trying to use ChromaDB: I wrote one function to save embeddings into a database and another to load them. But whenever I ask a question, it just says things like "I don't have information about this" or "I don't have context about this document"...
Here is the code:
import chromadb
from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.vector_stores.chroma import ChromaVectorStore

# `parser` (the PDF file extractor, e.g. LlamaParse) and `llm` are assumed to
# be defined elsewhere in the original script.

chroma_storage_path = "chromadb"

#@app.post("/submissao/")
def save_to_db(document):
    """Parse a document and persist its embeddings to ChromaDB."""
    file_extractor = {".pdf": parser}
    documents = SimpleDirectoryReader(input_files=[document], file_extractor=file_extractor).load_data()
    db = chromadb.PersistentClient(path=chroma_storage_path)
    chroma_collection = db.get_or_create_collection("candidaturas")
    vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
    storage_context = StorageContext.from_defaults(vector_store=vector_store)
    # Embeds the documents and writes them into the Chroma collection.
    chroma_index = VectorStoreIndex.from_documents(documents, storage_context=storage_context, show_progress=True)
    return {"message": "Document saved successfully."}

#@app.get("/query/")
def query_op(query_text: str):
    """Query the index built on top of the existing ChromaDB collection."""
    db = chromadb.PersistentClient(path=chroma_storage_path)
    chroma_collection = db.get_or_create_collection("candidaturas")
    chroma_vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
    # Rebuild the index from the stored vectors. Note: the embedding model used
    # here must match the one used at indexing time, or retrieval returns nothing.
    chroma_index = VectorStoreIndex.from_vector_store(vector_store=chroma_vector_store)
    query_engine = chroma_index.as_query_engine(llm=llm)
    response = query_engine.query(query_text)
    return {"response": response}

if __name__ == "__main__":
    save_to_db("cv1.pdf")
    query_op("Is this candidate fit for an IT Job?")
Hi, I'm building an agentic RAG system using ColPali. I've seen a lot of posts about indexing and retrieval strategies, but not much about how folks do load testing on inference speed. What tools are you using, and what are the pros and cons of each? Thanks in advance.
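Not an answer on tooling (Locust and k6 are the usual suspects), but for a quick first measurement you can get latency percentiles with a few lines of asyncio. A sketch against a hypothetical /query endpoint:

import asyncio, statistics, time
import httpx

URL = "http://localhost:8000/query"   # hypothetical endpoint

async def one_request(client: httpx.AsyncClient) -> float:
    t0 = time.perf_counter()
    await client.post(URL, json={"q": "test query"})
    return time.perf_counter() - t0

async def main(n: int = 100, concurrency: int = 10):
    async with httpx.AsyncClient(timeout=60) as client:
        sem = asyncio.Semaphore(concurrency)     # cap in-flight requests
        async def bounded():
            async with sem:
                return await one_request(client)
        latencies = sorted(await asyncio.gather(*[bounded() for _ in range(n)]))
    print(f"p50={statistics.median(latencies):.2f}s "
          f"p95={latencies[int(0.95 * n) - 1]:.2f}s")

asyncio.run(main())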
Basically what the title says:
What is the most adopted open-source observability tool out there? I mean the industry standard: not necessarily the best, but the most widely adopted one.
Phoenix Arize? LangFuse?
I need to choose a tool for the AI projects at my company, and your insights could be gold for this research!
Struggling to manually verify your model's answers every time you tweak it? We’ve been there. That’s why we built EvalMy.AI, a simple, easy-to-integrate service that automates this process using our C3-score metric (correctness, completeness, contradiction). It helps you quickly spot where your AI might fall short, reducing friction and speeding up testing.
We’re now in beta, and we’d love your feedback. Try it out for free at evalmy.ai and let us know what you think! Connect with me for any questions. Let’s build smarter together! 💡
Hi All,
Been dabbling with Ollama to create a custom RAG hosted on a local server (for security reasons). Now the client wants a Chain-of-Thought (CoT) capability as well; basically, the client wants basic numerical functionality. For example: "I am doing 80 mph on I-80. What is the average speed here, and how much slower or faster am I?"
The data has details about the average speed on I-80, e.g. 90 mph. So the RAG application should answer along the lines of "You are 10 mph slower than the average speed."
Are there any CoT-capable local LLMs? If not, any ideas on how to solve the above problem?
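For what it's worth, most local instruction-tuned models can handle this kind of arithmetic when the prompt forces the steps to be written out. A hedged sketch using Ollama's Python client (the model name is an assumption; swap in whatever you host):

import ollama

# Illustrative retrieved context; in practice this comes from your vector store.
context = "Average speed on I-80: 90 mph."
question = "I am doing 80 mph on I-80. How much slower or faster am I than average?"

prompt = f"""Answer using only the context. Think step by step:
1. State the average speed given in the context.
2. Compute: my speed minus the average speed.
3. Conclude whether I am slower or faster, and by how much.

Context: {context}
Question: {question}
"""

# Expected chain: 80 - 90 = -10, i.e. 10 mph slower than average.
print(ollama.generate(model="llama3.1", prompt=prompt)["response"])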
Hi everyone, I've been learning RAG for months and have created some question-answering applications using LangChain to add to my resume. But I'm wondering: in real life, how do production-level RAG applications differ from my simple local RAG project? Which vector store do you use for your projects, and which embedding model? Open source or an API?
What are the biggest differences between production-level RAG applications and the simple RAG projects on GitHub? Are your documents usually PDF or CSV?
Thank you.
My company has built an AI platform that acts as a co-pilot for recruiters. We are looking to add another member to our team as an AI/Backend Engineer.
Role Description:
The AI/Backend Engineer will assist with all backend and RAG needs, such as database systems, the RAG chatbot, Node.js crons, server-side code for our web application, and relational DB design and management.
DM me if you are interested!
I'm trying to better understand how implementations are typically done. My own implementations so far have taken matches from a vector DB and pushed them into the prompt for the chat LLM, but for large result sets this obviously becomes expensive in terms of tokens.
I understand that with e.g. OpenAI I have other options, but I'm wondering if someone can explain alternative implementation methods where it's not simply a matter of massaging the prompt with the stuff I fetch via embeddings...
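One common alternative to stuffing everything in: keep top-k as a ceiling, but prune by a similarity threshold so only strong matches spend tokens. A sketch against a chromadb-style collection (the threshold value is illustrative and depends on your embedding space):

def select_context(collection, query_embedding, max_chunks=10, max_distance=0.5):
    """Return only the retrieved chunks whose distance beats a cutoff."""
    hits = collection.query(query_embeddings=[query_embedding],
                            n_results=max_chunks)
    # chromadb reports distances (lower = closer); drop weak matches so they
    # never reach the prompt and never cost tokens.
    return [doc for doc, dist in zip(hits["documents"][0], hits["distances"][0])
            if dist <= max_distance]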
Also, what kind of businesses are you approaching? Are they technical/non-technical? How are you convincing them of your value prop? Are you using any qualifying questions to filter businesses that are more open to your solution?
I just wanted to write a quick message to let everyone here know how much HITL has been a powerful tool for me!
This approach allows us at https://quepasa.ai to achieve target accuracy quickly when developing both high-accuracy SaaS and on-premises solutions.
I'd say that HITL is definitely one of the core tools to improve the accuracy of RAG systems. But I also want to point out that it's just one of the tools — not the only one, and not necessarily the key one.
As is often the case in reality, there's no single silver bullet here. You choose the best solution based on the specific task and available resources.
From my experience building RAG systems and end-to-end solutions for our clients, here are a few key points when it comes to using HITL:
Also, for client-specific cases, we always bring in domain experts to evaluate the accuracy of our RAG systems. These experts literally review the generated answers and assess whether they'd have answered the same way or differently.
What about you? Do you use HITL? If yes, for what and how? What are your experiences with it?
Hi r/RAG,
I'm a software engineer on the AI team at Timescale.
We've just released a significant upgrade to our open-source postgres extension pgai, which builds upon pgvector and our own vector index extension pgvectorscale. The new update enables automatic embedding synchronization, meaning you can simply insert/update/delete on a source table and it will synchronize the embeddings to an embeddings table in the background. A feature we've seen a lot of customers build from scratch.
Of course, you keep all the beautiful features of postgres alive: ACID compliance, filtering capabilities, and so on.
How does it work?
To configure a table with embeddings, you simply create a so-called vectorizer with a SQL function provided by the extension:
SELECT ai.create_vectorizer(
<table_name>::regclass,
destination => <embedding_table_name>,
embedding => ai.embedding_openai(<model_name>, <dimensions>),
chunking => ai.chunking_recursive_character_text_splitter(<column_name>)
);
The actual synchronization runs in a background worker that ships as a simple Docker container. You only have to configure your LLM provider's API key and a Postgres connection string; everything else is done by the worker automatically. (We also have a cloud offering if you're too lazy to host the worker yourself.) We make use of triggers to set up a queue for any changes made to the source table.
Combined with already existing features of the extension, this lets you, for example, build semantic search directly in postgres. The following statement embeds a query string and finds the closest embeddings from the source table:
SELECT
chunk,
embedding <=> ai.openai_embed(<embedding_model>, 'some-query') as distance
FROM <table_name>
ORDER BY distance
LIMIT 5;
If you like it, head over to Github and try it out: https://github.com/timescale/pgai
(and leave a star while you're at it)
We'd love to hear your thoughts and feedback. We're committed to open-source and the postgres ecosystem and want to make Postgres the best database for any sort of LLM powered application. So if you have any ideas, feel free to reach out.
PS: I know I'm slightly violating the 10/90 rule of self promotion here. I've only recently joined the team and plan to be a lot more active. Since the project is open source and highly relevant to RAG, I hope you can give me the benefit of the doubt for now.
Hey guys, I’m thinking about extending my RAG system to work with SQL DBs. The typical RAG setup doesn’t really fit here, since we’re dealing with around 50,000 items that are frequently updated.
I’m considering an approach where I gather filters/parameters through a conversation with the user and then turn that into an SQL query. Not sure if that’s the best path, though.
Does anyone have experience with similar tasks? Any tips or better approaches you’d suggest?
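One way to frame the filters-to-SQL idea safely: have the LLM emit structured filters as JSON, then build a parameterized query yourself rather than letting the model write raw SQL. A sketch with hypothetical table and column names:

import json
import sqlite3

def filters_to_query(filters: dict):
    """Translate extracted filters into a parameterized SQL query."""
    clauses, params = [], []
    if "category" in filters:                 # hypothetical columns
        clauses.append("category = ?")
        params.append(filters["category"])
    if "max_price" in filters:
        clauses.append("price <= ?")
        params.append(filters["max_price"])
    where = " AND ".join(clauses) or "1=1"
    return f"SELECT * FROM items WHERE {where} LIMIT 50", params

# Suppose the conversation step produced: {"category": "laptop", "max_price": 1200}
sql, params = filters_to_query(json.loads('{"category": "laptop", "max_price": 1200}'))
rows = sqlite3.connect("items.db").execute(sql, params).fetchall()

Parameterized queries keep the model's output out of the SQL string itself, which also sidesteps injection issues with frequently updated data.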
I’m using Vectara’s solution, and I received this email today:
An Important Message from Vectara
We hope you’ve been enjoying the features of the Vectara Platform as part of our Growth plan. As we continue to enhance our platform, we’re making an adjustment to help you get even more out of the experience.
When we first launched our free plan, it focused primarily on semantic search capabilities with 15,000 free queries per month. Since then, we’ve launched a ton of new features: generative AI capabilities, new models, hybrid search, structured metadata storage and retrieval, new cross-attentional rankers, our automated hallucination evaluation model, an automated chat API, and many more. Users who have used these features report that they provide a ton of value but also require a lot more storage and compute from Vectara than we had ever built the free plan to handle.
For that reason, starting October 28th, 2024, our free plan will shift to a 30-day trial. However, that trial will have the full power of Vectara behind it, as opposed to the limitations of our Growth plan. For example, you’ll get access to our most powerful multi-lingual reranker, the ability to modify prompts, our most powerful generative models, the ability to ask for more results per query and much more.
We understand changes can be unexpected, and we want you to know this will help us better serve you with richer functionality and continued improvements. If you want to continue using the platform, you can input your credit card details to move to a Pro plan or reach out to our team to discuss a Scale plan agreement, including business SLAs, premium support, custom dimensions, and more. The new plans start at $100/mo.
We hope that you’ll take up one of these offerings. However, we will delete accounts and associated data for those who have not upgraded to a Pro plan or above by November 28th, 2024.
If you have any questions, please feel free to reach out.
——-
This will take my cost from ~$110 USD to $300 USD, with 30 days' notice, or else they delete all my data (knowing that there is no way to export parsed data).
I’m not happy, and I would not recommend that anyone work with them in the future.
Now, could anyone recommend a good alternative?
I need a SaaS/self service RAG solution with API to add documents and search in them.
I know RAGIE, but they ask for a minimum commitment of $500 USD.
LlamaIndex has a SaaS solution, but it's not open to the public yet.
I'm looking for the best open-source PDF parsing solution for Node.js/TypeScript, specifically with features that support Retrieval-Augmented Generation (RAG) workflows. My ideal parser should be capable of handling the following:
Does anyone have experience with a parser that meets these criteria, or could recommend any libraries or tools that work well for RAG setups?
Hey RAGers!
We've been working on bringing data engineering to AI LLM workflows; the project is trustgraph.ai. To give you a flavor, it pulls together a variety of capabilities: SLMs, commercial LLM invocation, chunking, prompt engineering, and Graph RAG. There's a UI that can generate deployment specs for Docker Compose, GCP K8s, and Minikube. It deploys on Pulsar to connect processing units, with infrastructure to scale, handle errors, and retry.