/r/Rag


Welcome to r/RAG! This is the go-to community for everything related to Retrieval-Augmented Generation (RAG). Join us to discuss, share, and explore cutting-edge RAG techniques, research, projects, and tools. Whether you're a researcher, developer, or AI enthusiast, you'll find tips, tutorials, and support to help you innovate with RAG. Let's collaborate to push the boundaries of AI's potential!


7,697 Subscribers

2

Last Month In AI | AIGuys Newsletter

🔍 Inside this Issue:

  • 🤖 Latest Breakthroughs: This month it’s all about scaling RAGs for production, The Prompt Report, and LLMs’ black-box nature.
  • 🌐 AI Monthly News: Discover how these stories revolutionize industries and impact everyday life: AI scientists winning the Nobel Prizes in Chemistry and Physics, OpenAI challenging Google Search, and Big Tech making big money.
  • 📚 Editor’s Special: This covers the interesting talks, lectures, and articles we came across recently.

Check our Publication: https://medium.com/aiguys

Latest Breakthroughs

This article covers different issues with creating a production-grade RAG system, understanding the deterministic nature of processes, and delving deep into the advanced RAG components. We will cover everything from reranking to repacking, from query classification to query expansion, and many more such techniques that form the backbone of a modern RAG system.

Why Scaling RAGs For Production Is So Hard?

Don’t worry, I’m not going to give you a list of the top 50 prompts to try; that approach simply doesn’t work at scale. Instead, we are going to talk about different prompting techniques.

The Six Major Prompting Categories

Within the 58 prompting techniques surveyed, there are 6 top-level categories.

  1. Zero-Shot
  2. Few-Shot
  3. Thought Generation
  4. Decomposition
  5. Ensembling
  6. Self-Criticism
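The first few categories can be illustrated as simple prompt templates. A minimal sketch follows; the category names come from the report, but the template wording and function names are my own (Ensembling and Self-Criticism operate over multiple model responses and are omitted here):

```python
def zero_shot(question):
    """Zero-Shot: ask directly, with no examples."""
    return f"Q: {question}\nA:"

def few_shot(question, examples):
    """Few-Shot: prepend worked examples before the real question."""
    shots = "\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    return f"{shots}\nQ: {question}\nA:"

def thought_generation(question):
    """Thought Generation: elicit intermediate reasoning (e.g. chain-of-thought)."""
    return f"Q: {question}\nLet's think step by step."

def decomposition(question, sub_questions):
    """Decomposition: break the problem into sub-questions first."""
    subs = "\n".join(f"- {s}" for s in sub_questions)
    return f"To answer '{question}', first answer:\n{subs}"

prompt = few_shot("What is 2+3?", [("What is 1+1?", "2")])
print(prompt)
```

Each template is just string assembly; the differences between categories are in what context and instructions surround the question.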

The Prompt Report: Prompt Engineering Techniques

A brand-new paper from Google and Apple that looks into LLMs’ internal representations to understand the nature of hallucinations. They showed that internal representations can also be used to predict the types of errors the model is likely to make, facilitating the development of tailored mitigation strategies.

They also reveal a discrepancy between LLMs’ internal encoding and external behavior: they may encode the correct answer, yet consistently generate an incorrect one. Taken together, these insights deepen our understanding of LLM errors from the model’s internal perspective, which can guide future research on enhancing error analysis and mitigation.
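The core mechanism behind such findings is usually a "probe": a small classifier trained on hidden-state vectors to predict whether the eventual answer will be correct. A hedged sketch on synthetic data (the hidden states and separability here are fabricated for illustration; real probes are trained on activations extracted from an actual model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for LLM hidden states: "correct" examples cluster
# around +mu, "error" examples around -mu, mimicking a separable signal.
mu = rng.normal(size=16)
X = np.vstack([rng.normal(size=(100, 16)) + mu,
               rng.normal(size=(100, 16)) - mu])
y = np.array([1] * 100 + [0] * 100)

# Train a logistic-regression probe with plain gradient descent.
w, b = np.zeros(16), 0.0
for _ in range(500):
    p = 1 / (1 + np.exp(-(X @ w + b)))
    w -= 0.5 * (X.T @ (p - y)) / len(y)
    b -= 0.5 * np.mean(p - y)

preds = (1 / (1 + np.exp(-(X @ w + b))) > 0.5).astype(int)
accuracy = (preds == y).mean()
print(f"probe accuracy: {accuracy:.2f}")
```

If a probe like this performs well above chance, the hidden states demonstrably encode information about upcoming errors, which is the paper's central observation.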

LLMs Know More Than They Show

Apple says: “we found no evidence of formal reasoning in language models …. Their behavior is better explained by sophisticated pattern matching — so fragile, in fact, that changing names can alter results by ~10%!”

Apple Says LLMs Are Really Not That Smart

AI Monthly News

Computer Scientists Win Nobel Prizes in Both Physics and Chemistry

This year’s two Nobel Laureates in Physics have used tools from physics to develop methods that are the foundation of today’s powerful machine learning. John Hopfield created an associative memory that can store and reconstruct images and other types of patterns in data. Geoffrey Hinton invented a method that can autonomously find properties in data, and so perform tasks such as identifying specific elements in pictures.

Press release: Click here

The Nobel Prize in Chemistry 2024 is about proteins, life’s ingenious chemical tools. David Baker has succeeded with the almost impossible feat of building entirely new kinds of proteins. Demis Hassabis and John Jumper have developed an AI model to solve a 50-year-old problem: predicting proteins’ complex structures. These discoveries hold enormous potential.

Press release: Click here

OpenAI Challenges Google’s Search Monopoly

OpenAI has introduced a search capability within ChatGPT, enabling real-time web browsing to provide up-to-date information. This feature positions ChatGPT as a direct competitor to traditional search engines like Google.

News Article: Click here

Big Tech Makes Big Money

Elon Musk’s xAI Seeks $40 Billion Valuation: Elon Musk’s AI startup, xAI, is in talks to raise funding at a valuation of $40 billion, up from $24 billion five months prior. The company is developing an AI chatbot named Grok, available on Musk’s social media platform X.

News Article: Click here

Both Microsoft’s and Google’s AI-driven investments lead to profit surges:

Microsoft’s substantial investments in AI have resulted in a 16% increase in quarterly sales, reaching $65.6 billion. The Azure cloud computing division saw a 33% revenue rise, highlighting the impact of AI on business processes.

Google’s parent company, Alphabet, reported a 34% increase in profit, earning $26.3 billion in the July–September quarter. This growth is attributed to AI investments and a 15% revenue surge to $88.27 billion.

News Article: Click here

News Article: Click here

Editor’s Special

  • The Elegant Math Behind Machine Learning Click here
  • AI RISING: Risk vs Reward — The Hinton Lectures™: Click here
  • A fireside chat with Sam Altman OpenAI CEO at Harvard University: Click here
  • NVIDIA’s New Ray Tracing Tech Should Be Impossible!: Click here
1 Comment
2024/11/04
08:23 UTC

11

Investigating RAG for improved document search and a company knowledge base

Hey everyone! I’m new to RAG and I wouldn't call myself a programmer by trade, but I’m intrigued by the potential and wanted to build a proof-of-concept for my company. We store a lot of data in .docx and .pptx files on Google Drive, and the built-in search just doesn’t cut it. Here’s what I’m working on:

Use Case

We need a system that can serve as a knowledge base for specific projects, answering queries like:

  • “Have we done Analysis XY in the past? If so, what were the key insights?”

Requirements

  • Precision & Recall: Results should be relevant and accurate.
  • Citation: Ideally, citations should link directly to the document, not just display the used text chunks.

Dream Features

  • Automatic Updates: A vector database that automatically updates as new files are added, embedding only the changes.
  • User Interface: Simple enough for non-technical users.
  • Network Accessibility: Everyone on the network should be able to query the same system from their own machine.
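The "automatic updates, embedding only the changes" wish is usually implemented with content fingerprints: hash each file and re-embed only files whose hash changed. A minimal sketch under those assumptions (the `embed` callback and file names are placeholders for your real embedding + vector-store upsert):

```python
import hashlib
import os
import tempfile
from pathlib import Path

def file_fingerprint(path):
    """Hash the file's bytes so unchanged files can be skipped."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def sync_folder(paths, seen, embed):
    """Embed only new or changed files; `seen` maps path -> last known hash."""
    updated = []
    for p in paths:
        h = file_fingerprint(p)
        if seen.get(p) != h:
            embed(p)          # placeholder for real chunking/embedding/upsert
            seen[p] = h
            updated.append(p)
    return updated

# Demo on a throwaway folder.
tmp = tempfile.mkdtemp()
doc = os.path.join(tmp, "report.docx")
Path(doc).write_text("v1")
seen, embedded = {}, []
first = sync_folder([doc], seen, embedded.append)    # new file -> embedded
second = sync_folder([doc], seen, embedded.append)   # unchanged -> skipped
Path(doc).write_text("v2")
third = sync_folder([doc], seen, embedded.append)    # changed -> re-embedded
print(first, second, third)
```

For Google Drive specifically, the Drive API's file `md5Checksum`/`modifiedTime` metadata could play the role of the local hash, avoiding a full download for unchanged files.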

Initial Investigations

Here’s what I looked into so far:

  1. DIY Solutions - LlamaIndex with different readers:
    • SimpleDirectoryReader
    • LlamaParse
    • use_vendor_multimodal_model
  2. Open-Source Options
  3. Enterprise Solutions

Test Setup

I’m running experiments from the simplest approach to more complex ones, eliminating what doesn’t work. For now, I’ve been testing with a single .pptx file containing text, images, and graphs.

Findings So Far

  • Data Loss: A lot of metadata is lost when downloading Google Drive slides.
  • Vision Embeddings: Essential for my use case. I found vision embeddings to be more valuable when images are detected and summarized by an LLM, which is then used for embedding.
  • Results: H2O significantly outperformed the other options, particularly in processing images with text. Using vision embeddings from GPT-4o and Claude Haiku, H2O gave perfect answers to my test queries. Some solutions don't support .pptx files out of the box; transforming them to .pdf first feels like an awkward workaround.
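The "images detected and summarized by an LLM, with the summary used for embedding" pattern from the findings above can be sketched as follows. The `caption_image` and `embed_text` functions are stand-ins: in practice the first would be a vision-LLM API call (e.g. GPT-4o) and the second a real embedding model; the character-frequency vector here exists only so the sketch runs:

```python
def caption_image(image_bytes):
    """Stand-in for a vision-LLM call that returns a text summary of the image."""
    return "Bar chart of quarterly revenue, Q3 highest."

def embed_text(text):
    """Stand-in embedding: a character-frequency vector, purely illustrative."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - 97] += 1.0
    return vec

def index_slide(slide):
    """Embed slide text directly; route images through captioning first."""
    chunks = []
    for text in slide.get("texts", []):
        chunks.append({"text": text, "vector": embed_text(text)})
    for img in slide.get("images", []):
        caption = caption_image(img)
        chunks.append({"text": caption, "vector": embed_text(caption)})
    return chunks

chunks = index_slide({"texts": ["Revenue summary"], "images": [b"\x89PNG..."]})
```

The payoff is that images become searchable through ordinary text retrieval, and the caption can double as citation context in answers.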

Considerations & Concerns

Generally, I am not a fan of the solutions I called "Enterprise".

  • Vertex AI is way too expensive, because Google charges per user.
  • NotebookLM is in beta, and I have no clue what they are actually doing under the hood (is this even RAG, or does everything just get fed into Gemini?).
  • H2O.ai themselves claim not to use private / sensitive / internal documents / knowledge. Plus, I am not sure it is really RAG that they are doing: changing models and parameters doesn't change the answer to my queries in the slightest, and looking at the citations, the whole document seems to be used.

Obviously, a DIY solution offers the best control over everything and also lets me chunk and semantically enrich exactly the way I want to. BUT it is also very hard (at least for me) to build such a tool, and to actually use it within my company it would need maintenance, a UI, and a way to distribute it to all employees. I am a bit lost right now about which path I should further investigate.

Is RAG even worth it?

It is probably only a matter of time before Google or another major tech company launches a tool like NotebookLM for a reasonable price, or integrates proper reasoning / vector search into Google Drive, right? So would it actually make sense to dig into RAG more right now? Or, as a user, should I just wait a couple more months until a solution has been developed? Also, I feel like the whole augmented-generation part might not be necessary for my use case at all, since the main productivity boost for my company would be to find things faster (or at all ;)

Thanks for reading this far! I’d love to hear your thoughts on the current state of RAG or any insights on building an efficient search system. Cheers!

7 Comments
2024/11/04
00:08 UTC

4

Making a RAG to look through PDFs of slide decks

Hey, I want to make a RAG to look through slide decks (think lecture notes) and answer some basic questions about them. There may be a mix of text, diagrams, and images, so it would be great to be able to parse the text in those diagrams and images. Anyone have suggestions on what models/tools work well for this? PaperQA looks good, but I'm not sure it would fit my use case with slide decks. It would be great to host it locally too, if possible.

1 Comment
2024/11/03
22:19 UTC

0

Building RAG pipelines so seamlessly? I never thought it would be possible

I just fell in love with this new RAG tool (Vectorize) I am playing with, and I created a simple tutorial on how to build RAG pipelines in minutes and find the best embedding model, chunking strategy, and retrieval approach to get the most accurate results from an LLM-powered RAG application.

4 Comments
2024/11/03
16:34 UTC

3

RAG voice assistant newbie

I am trying to build a voice-assistant RAG with the 11labs API. I don't have any ML or AI experience, nor any exposure to conversational/realtime apps/systems.

I understand things fine as long as I am running something locally, i.e. a Python program with the 11labs SDK using my local PC microphone and speaker.

Now things get super alien once I try to convert this into a proper backend with a frontend wrapper to make it usable. I don't just want to copy-paste code from GPT; I want to genuinely understand the basic concepts behind these websocket apps, audio chunks, and whatever else is needed here.

What do I study? What topics do I search on Google or YouTube? There's so much going on that I don't know where to begin to actually implement my use case on my own. (Nothing too fancy, just the basics.)

TLDR: Newbie in AI conversational apps. Where do I start learning to build a frontend/backend wrapper for conversational AI APIs?

5 Comments
2024/11/02
13:56 UTC

8

Techniques to keep answers RAG-only? (Help)

As the title suggests, I am curious how you set up your system prompt to ensure your solution only answers within the limits of its knowledge base.

Since negative prompting has mixed reviews, how do you manage this, and what success rates have you achieved so far?
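One common pattern, beyond the system prompt itself, is to gate generation on retrieval confidence: if no chunk scores above a similarity threshold, refuse before the LLM is ever called. A minimal sketch, assuming precomputed vectors and a `generate` callback standing in for your real LLM call (names, threshold, and refusal text are illustrative):

```python
import math

REFUSAL = "I can't answer that from the provided documents."

SYSTEM_PROMPT = (
    "Answer ONLY from the context below. If the context does not contain "
    "the answer, reply exactly: " + REFUSAL
)

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def guarded_answer(query_vec, chunks, generate, threshold=0.75):
    """Refuse before generation if no chunk is similar enough to the query."""
    scored = [(cosine(query_vec, c["vector"]), c) for c in chunks]
    if not scored:
        return REFUSAL
    best_score, best = max(scored, key=lambda t: t[0])
    if best_score < threshold:
        return REFUSAL
    return generate(SYSTEM_PROMPT, best["text"])

docs = [{"text": "Our refund window is 30 days.", "vector": [1.0, 0.0]}]
fake_llm = lambda system, context: f"[{context}]"
on_topic = guarded_answer([0.95, 0.05], docs, fake_llm)
off_topic = guarded_answer([0.0, 1.0], docs, fake_llm)
```

The threshold gate means out-of-scope questions never reach the model at all, so you're not relying solely on negative prompting to hold the line.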

10 Comments
2024/11/02
13:11 UTC

8

Tool to demonstrate step by step output of RAG pipeline

Hey folks!

I’m working on a demo for Retrieval-Augmented Generation (RAG) and want to showcase each step of the pipeline to my team in a clear and interactive way. My goal is to walk them through the entire RAG process, showing live outputs for:

1.	Document Upload - Loading documents into the system.
2.	Embedding Creation - Converting documents into embeddings.
3.	Storage in Vector Database - Storing these embeddings in a vector database.
4.	Similarity Search - Performing similarity searches and showing results.
5.	Answer Generation - Displaying the final generated answers based on retrieved information.
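For anyone who does end up scripting their own walkthrough, the five steps can be shown end-to-end in a few dozen lines. This is a toy sketch: the bag-of-words "embedding", in-memory "vector database", and template "generation" are stand-ins for real components, chosen so each step's output can be printed live:

```python
import math
from collections import Counter

def embed(text):
    """Step 2: toy bag-of-words embedding (stand-in for a real model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

store = []  # Step 3: in-memory stand-in for a vector database

def upload(doc):
    """Step 1: load a document and store its embedding."""
    store.append({"text": doc, "vector": embed(doc)})

def search(query, k=1):
    """Step 4: similarity search over the store."""
    ranked = sorted(store, key=lambda d: cosine(embed(query), d["vector"]),
                    reverse=True)
    return ranked[:k]

def answer(query):
    """Step 5: template 'generation' from the retrieved text."""
    hits = search(query)
    return f"Based on: '{hits[0]['text']}'"

upload("RAG combines retrieval with generation.")
upload("Postgres is a relational database.")
result = answer("rag retrieval")
print(result)
```

Printing `store`, `search(...)`, and `answer(...)` after each call gives exactly the per-step visibility described above, without any framework.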

Ideally, I’m looking for a UI-based tool that can help display each step’s output so I can show it live during the demo. I’d like to avoid heavy customizations or coding if possible.

I experimented with Kotaemon, but it doesn’t seem to have this functionality built-in.

Does anyone know of any RAG UI tools or platforms that might meet this need?

6 Comments
2024/11/02
10:46 UTC

18

Heelix - Open Source RAG Chatbot with Seamless Local Data Collection

Hi everyone,

I built an open source chatbot to make RAG seamless. It collects text data from what's visible on your screen into a local DB using OCR and accessibility APIs, and then finds the relevant context upon query.

  • Privacy first: all data stays local on your machine, apart from what is sent to the LLM of your choice.
  • Context retrieval: a local vector DB identifies the top-K relevant documents, with filtering through a cheap LLM and the ability to manually attach documents.
  • Your choice of LLM: use your own API key with Anthropic or OpenAI.
  • Works on both Mac and PC.
  • Built with Rust and Tauri for low resource consumption.

Compared to something like Rewind or Recall, it's much higher quality in terms of text-data capture and more resource-efficient, as there's no storage beyond text. I'd love your feedback on improving retrieval performance, what features you'd like to see added, or anything else.

Github: https://github.com/stritefax/heelixchat

4 Comments
2024/11/01
20:13 UTC

3

Help needed with Llava model in simple RAG project (issues with generated descriptions)

Hi everyone,

I’m currently working in my spare time on a RAG project to develop a chatbot for users of a French software, and I’m facing an issue with the Llava model (13B; I also tried the 34B version, but it's very slow, and the problem persists).

For full details on the architecture and pipeline, you can check out this document: https://github.com/PaulAero/OpaleAI/blob/master/README.md

(Note: The repo version is not updated but will be soon; however, the README is up-to-date!)

One of the key parts of this project is converting visual information from PDFs (tutorials with annotated screenshots) into step-by-step text instructions. However, Llava is generating completely unusable descriptions, often inventing procedures that don’t exist.

Here’s what I’ve done so far:

Pipeline: I use Llava to transcribe screenshots from each page of the PDF tutorials. The response is then vectorized and stored for future retrieval.

Prompting: I created a detailed prompt instructing Llava to focus on actionable steps and avoid irrelevant details.

Example Issue: In one of the documents, Llava completely misinterpreted a set of instructions by creating steps that were not present in the image, resulting in misleading outputs.

My main questions:

- Has anyone else faced similar issues with Llava when processing screenshots with annotations?
- Has anyone experienced this issue with documents in languages other than English? Could this be a limitation of the model itself, or should I consider modifying the pipeline?
- Are there any alternative models or approaches for handling screenshots that might work better?
- If I need to use a paid API, which model would you recommend, and what would you consider the minimum image quality required to achieve good results and ensure cost-effectiveness?

Additional context:

I’m using the Mistral-Nemo model for text chunks, with embedding handled by SentenceTransformer('all-mpnet-base-v2'). Document retrieval and vector storage are managed by ChromaDB.

I’d appreciate any suggestions or insights!

Thanks in advance!

PS: I have several additional questions about handling mixed French and English documents and ways to improve the retrieval module—maybe by adding a maximum distance parameter. :)

1 Comment
2024/11/01
10:05 UTC

0

Can No-Code RAG Applications Be an Alternative to Custom RAG Pipelines?

Hey everyone, many start-ups are developing no-code RAG applications nowadays. I haven't tried most of them, but I wonder: can no-code RAG applications generally be an alternative to custom RAG pipelines?

5 Comments
2024/11/01
08:37 UTC

6

long term memory for agents

Recently tried the long term memory feature in OpenAGI for autonomous agents—works super well. Check it out: https://github.com/aiplanethub/openagi

1 Comment
2024/11/01
07:56 UTC

4

I need help with my RAG Resume Analyser

Hey mates. I'm completely new to RAG and LlamaIndex. I'm trying to make a RAG system that takes PDF documents of resumes and answers questions like "give me the best 3 candidates for an IT job".

I ran into an issue trying to use ChromaDB. I made a function that saves embeddings into a database, and another that loads them. But whenever I ask a question, it just says things like "I don't have information about this" or "I don't have context about this document"...

Here is the code:

import chromadb
from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.vector_stores.chroma import ChromaVectorStore

# `parser` (e.g. LlamaParse) and `llm` are assumed to be configured elsewhere.

chroma_storage_path = "chromadb"

# @app.post("/submissao/")
def save_to_db(document):
    """Save a document's embeddings to the database."""
    file_extractor = {".pdf": parser}
    documents = SimpleDirectoryReader(
        input_files=[document], file_extractor=file_extractor
    ).load_data()
    db = chromadb.PersistentClient(path=chroma_storage_path)
    chroma_collection = db.get_or_create_collection("candidaturas")
    vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
    storage_context = StorageContext.from_defaults(vector_store=vector_store)
    VectorStoreIndex.from_documents(
        documents, storage_context=storage_context, show_progress=True
    )
    return {"message": "Document saved successfully."}

# @app.get("/query/")
def query_op(query_text: str):
    """Query the index with the provided text using documents from ChromaDB."""
    # Load the existing collection from ChromaDB
    db = chromadb.PersistentClient(path=chroma_storage_path)
    chroma_collection = db.get_or_create_collection("candidaturas")
    chroma_vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
    chroma_index = VectorStoreIndex.from_vector_store(vector_store=chroma_vector_store)  # new addition
    query_engine = chroma_index.as_query_engine(llm=llm)
    response = query_engine.query(query_text)
    return {"response": response}

save_to_db("cv1.pdf")
query_op("Is this candidate fit for an IT Job?")

1 Comment
2024/11/01
03:42 UTC

2

How do you do load testing on a RAG system?

Hi, I'm building an agentic RAG system using ColPali. Though I've seen a lot of posts asking about indexing and retrieval strategies, I haven't seen much about how folks go about load testing inference speed. What tools are you using, and what are the pros and cons of each? Thanks in advance.
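Before reaching for a dedicated tool, a lot of teams start with a small async script: fire N requests at a fixed concurrency and report latency percentiles. A hedged sketch, where `fake_inference` is a stand-in for an HTTP call to your real RAG endpoint and the numbers are simulated:

```python
import asyncio
import statistics
import time

async def fake_inference(query):
    """Stand-in for a real RAG endpoint call; swap in an HTTP request."""
    await asyncio.sleep(0.01)  # simulated model latency
    return f"answer to {query}"

async def load_test(n_requests=50, concurrency=10):
    sem = asyncio.Semaphore(concurrency)
    latencies = []

    async def one(i):
        async with sem:
            t0 = time.perf_counter()
            await fake_inference(f"q{i}")
            latencies.append(time.perf_counter() - t0)

    await asyncio.gather(*(one(i) for i in range(n_requests)))
    latencies.sort()
    return {
        "p50": statistics.median(latencies),
        "p95": latencies[int(0.95 * len(latencies)) - 1],
        "count": len(latencies),
    }

stats = asyncio.run(load_test())
print(stats)
```

Tools like Locust or k6 do the same thing with nicer reporting and ramp-up control, but a script like this is often enough to spot whether retrieval or generation is the bottleneck.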

2 Comments
2024/10/31
16:21 UTC

10

Industry standard observability tool

Basically what the title says:

What is the most adopted open-source observability tool out there? I mean the industry standard: not the best, but the most adopted one.

Phoenix Arize? LangFuse?

I need to choose a tool for the AI projects at my company, and your insights could be gold for this research!

12 Comments
2024/10/31
15:57 UTC

11

Caching Methods in Large Language Models (LLMs)


https://www.masteringllm.com/course/llm-interview-questions-and-answers?previouspage=home&isenrolled=no#/home

https://www.masteringllm.com/course/agentic-retrieval-augmented-generation-agenticrag?previouspage=home&isenrolled=no#/home

3 Comments
2024/10/31
15:51 UTC

0

🚀 Say goodbye to manual testing of your LLM-based apps – automate with EvalMy.AI beta! 🚀

Struggling to manually verify your model's answers every time you tweak it? We’ve been there. That’s why we built EvalMy.AI, a simple, easy-to-integrate service that automates this process using our C3-score metric (correctness, completeness, contradiction). It helps you quickly spot where your AI might fall short, reducing friction and speeding up testing.

We’re now in beta, and we’d love your feedback. Try it out for free at evalmy.ai and let us know what you think! Connect with me for any questions. Let’s build smarter together! 💡

1 Comment
2024/10/31
13:51 UTC

2

Are there any Local LLMs with COT capabilities?

Hi All,

Been dabbling with Ollama to create a custom RAG hosted on a local server (for security reasons). Now the client wants Chain-of-Thought (CoT) capability as well. Basically, the client wants basic numerical functionality, e.g. "I am doing 80 mph on I-80. What is the average speed here, and how much slower or faster am I?"

The data has details about the average speed on I-80, for example 90 mph. So the RAG application should say "You are 10 mph slower than the average speed."

Are there any CoT-capable local LLMs? If not, any idea how to solve the above problem?
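One workaround when a local model is unreliable at arithmetic, regardless of CoT prompting, is to do the arithmetic in code: have the pipeline extract the numbers from the query and the retrieved chunk, compute the difference, and feed the result back into the answer. A hedged sketch under that assumption (the regex, phrasing, and function name are illustrative, not a prescribed design):

```python
import re

def compare_speed(user_text, kb_text):
    """Pull mph figures out of the texts and subtract in code,
    instead of trusting the LLM to do the math."""
    user_speed = float(re.search(r"(\d+(?:\.\d+)?)\s*mph", user_text).group(1))
    avg_speed = float(re.search(r"(\d+(?:\.\d+)?)\s*mph", kb_text).group(1))
    diff = user_speed - avg_speed
    direction = "faster" if diff > 0 else "slower"
    return (f"The average speed is {avg_speed:g} mph; "
            f"you are {abs(diff):g} mph {direction} than average.")

result = compare_speed("I am doing 80 mph on I 80.",
                       "Average speed on I 80: 90 mph.")
print(result)
```

More generally, this is the "tool use" pattern: the LLM routes the question and fills a template, while deterministic code computes the number, which sidesteps the CoT requirement entirely.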

4 Comments
2024/10/31
11:34 UTC

27

Prod Level RAG Applications

Hi everyone, I have been learning RAG for months, and I have created some question-answering applications using LangChain to add to my resume. But I am wondering: in real-life, production-level RAG applications, what is the difference from my simple local RAG project? Which vector store do you use for your projects, and which embedding model: open source or API?

What are the biggest differences between production-level RAG applications and simple RAG projects on GitHub? Are your documents usually PDF or CSV?

Thank you.

31 Comments
2024/10/31
11:27 UTC

5

AI Startup looking for devs

My company has built an AI platform that acts as a co-pilot for recruiters. We are looking to add another member to our team as an AI/Backend Engineer.

Role Description:

AI/Backend Engineer will assist with all backend + RAG needs such as database systems, RAG chatbot, node.js crons, server-side code for our web application, and relational DB design + management.

  • Design and Implement RAG Pipelines with Relational Database Support:
    • Develop RAG systems for AI-driven applications, integrating external knowledge sources with language models.
    • Configure and optimize retrieval components (e.g., vector databases) for high relevance and efficiency.
    • Fine-tune language models for specific retrieval-augmented tasks, such as question answering or document summarization.
  • Embedding Storage and Retrieval in a Relational Database:
    • Design tables for storing embeddings, splitting them across rows or encoding them into JSON/BLOB format as needed.
    • Implement distance or similarity functions within SQL (e.g., cosine similarity) for embedding searches, using database-specific features like PostgreSQL’s cube extension or MySQL’s spatial indexing.
  • Node.js API Development with SQL Support:
    • Build APIs in Node.js that query and manipulate relational databases for embedding-based tasks.
    • Design efficient SQL queries to retrieve and filter relevant data in real-time for RAG workflows.
    • Use ORM libraries (e.g., Sequelize, TypeORM) to streamline interaction with relational databases and maintain scalability.
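The "similarity functions within SQL" bullet can be prototyped without any extension by registering a custom function on the connection; this SQLite sketch shows the JSON-encoded-embedding pattern the role describes (in Postgres you would more likely use pgvector's `<=>` operator or the `cube` extension mentioned above; the table, vectors, and data here are made up):

```python
import json
import math
import sqlite3

def cosine_sim(a_json, b_json):
    """Cosine similarity over JSON-encoded vectors, exposed as a SQL function."""
    a, b = json.loads(a_json), json.loads(b_json)
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

con = sqlite3.connect(":memory:")
con.create_function("cosine_sim", 2, cosine_sim)
con.execute("CREATE TABLE embeddings (id INTEGER PRIMARY KEY, doc TEXT, vec TEXT)")
con.executemany("INSERT INTO embeddings VALUES (?, ?, ?)", [
    (1, "intro to RAG", json.dumps([1.0, 0.0])),
    (2, "SQL basics", json.dumps([0.0, 1.0])),
])

query_vec = json.dumps([0.9, 0.1])
top = con.execute(
    "SELECT doc, cosine_sim(vec, ?) AS sim FROM embeddings "
    "ORDER BY sim DESC LIMIT 1",
    (query_vec,),
).fetchone()
print(top)
```

Note that a scalar SQL function scans every row; at production scale you would want a real vector index rather than this brute-force ranking.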

DM me if you are interested!

8 Comments
2024/10/31
03:34 UTC

11

How to perform RAG without spending tokens

I'm trying to better understand how implementations are typically done. My own implementations so far have taken matches from a vector DB and pushed them into the prompt for the chat LLM, but for large result sets this obviously becomes expensive in terms of tokens.

I understand that with e.g. OpenAI I have other options, but I'm wondering if someone can explain alternative implementation methods where it's not simply a matter of massaging the prompt with the stuff I fetch via embeddings...
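There's no way to get retrieved context to the model for free, but the usual mitigation is to cap how much of the result set reaches the prompt: rank the chunks, then pack them best-first under a token budget. A minimal sketch; the ~4-characters-per-token estimate is a rough heuristic (a real tokenizer such as tiktoken would be more accurate), and the budget is illustrative:

```python
def approx_tokens(text):
    """Rough token estimate (~4 chars/token for English text)."""
    return max(1, len(text) // 4)

def pack_context(ranked_chunks, budget=1000):
    """Take retrieval hits best-first and stop before exceeding the budget,
    instead of stuffing the whole result set into the prompt."""
    picked, used = [], 0
    for chunk in ranked_chunks:
        cost = approx_tokens(chunk)
        if used + cost > budget:
            break
        picked.append(chunk)
        used += cost
    return picked, used

chunks = ["A" * 400, "B" * 400, "C" * 400]   # ~100 tokens each
picked, used = pack_context(chunks, budget=250)
print(len(picked), used)
```

Combined with a reranker (so the chunks that survive the budget are the most relevant ones) and smaller chunk sizes, this keeps token spend roughly constant regardless of how many matches the vector DB returns.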

11 Comments
2024/10/30
20:49 UTC

29

For those of you doing RAG-based startups: How are you approaching businesses?

Also, what kind of businesses are you approaching? Are they technical/non-technical? How are you convincing them of your value prop? Are you using any qualifying questions to filter businesses that are more open to your solution?

31 Comments
2024/10/30
12:28 UTC

0

Why Isn't HITL Getting the Hype It Deserves?

I just wanted to write a quick message to let everyone here know how much HITL has been a powerful tool for me!

This approach allows us at https://quepasa.ai to achieve target accuracy quickly when developing both high-accuracy SaaS and on-premises solutions.

I'd say that HITL is definitely one of the core tools to improve the accuracy of RAG systems. But I also want to point out that it's just one of the tools — not the only one, and not necessarily the key one.

As is often the case in reality, there's no single silver bullet here. You choose the best solution based on the specific task and available resources.

From my experience building RAG systems and end-to-end solutions for our clients, here are a few key points when it comes to using HITL:

  1. Human in the Loop is crucial when the data:
    • Is niche-specific, with lots of jargon or technical terms
    • Is super fresh, highly specific, or non-public. LLMs just can't match the depth of knowledge that experts have in those cases. Plus, there might be gaps in the knowledge base.
  2. At the end of the day, we’re building RAG systems for people, and they’re the ones who'll interact with them. So it makes total sense to add evaluation tools to gather feedback from users on how well the product is performing.
  3. As for evaluation, we use both explicit and implicit signals:
    • Explicit UGC (User-Generated Content) — like thumbs up/down on responses
    • Implicit — like tracking visits to certain knowledge base pages or clicks on document search results.

Also, for client-specific cases, we always bring in domain experts to evaluate the accuracy of our RAG systems. These experts literally review the generated answers and assess whether they'd have answered the same way or differently.

What about you? Do you use HITL? If yes, for what and how? What are your experiences with it?

3 Comments
2024/10/29
19:54 UTC

17

pgai 0.4.0: Sync Embeddings, build semantic search and develop RAG applications directly in PostgreSQL

Hi r/RAG,
I'm a software engineer on the AI team at timescale.

We've just released a significant upgrade to our open-source Postgres extension pgai, which builds upon pgvector and our own vector index extension pgvectorscale. The new update enables automatic embedding synchronization: you can simply insert/update/delete on a source table, and the embeddings are synchronized to an embeddings table in the background, a feature we've seen a lot of customers build from scratch.

Of course you keep all the beautiful features of postgres with ACID compliance, filtering capabilities, etc. alive.

How does it work?
To configure a table with embeddings, you simply create a so-called vectorizer with an SQL function provided by the extension:

SELECT ai.create_vectorizer( 
    <table_name>::regclass,
    destination => <embedding_table_name>,
    embedding => ai.embedding_openai(<model_name>, <dimensions>),
    chunking => ai.chunking_recursive_character_text_splitter(<column_name>)
);

The actual synchronization then runs in a background worker that ships as a simple Docker container. You only have to configure your LLM provider's API key and a Postgres connection string. Everything else is done by the worker automatically. (We also have a cloud offering if you're too lazy to host the worker yourself.) We make use of triggers to set up a queue for any changes made to the source table.

Combined with already-existing features from the extension, this allows you to e.g. build semantic search directly in Postgres. For example, this statement embeds a query string and finds the closest embeddings from the source table:

SELECT 
   chunk,
   embedding <=> ai.openai_embed(<embedding_model>, 'some-query') as distance
FROM <table_name>
ORDER BY distance
LIMIT 5;

If you like it, head over to Github and try it out: https://github.com/timescale/pgai
(and leave a star while you're at it)

We'd love to hear your thoughts and feedback. We're committed to open-source and the postgres ecosystem and want to make Postgres the best database for any sort of LLM powered application. So if you have any ideas, feel free to reach out.

PS: I know I'm slightly violating the 10/90 rule of self promotion here. I've only recently joined the team and plan to be a lot more active. Since the project is open source and highly relevant to RAG, I hope you can give me the benefit of the doubt for now.

2 Comments
2024/10/29
13:16 UTC

24

SQL and RAG system – looking for efficient integration ideas

Hey guys, I’m thinking about extending my RAG system to work with SQL DBs. The typical RAG setup doesn’t really fit here since we’re dealing with around 50,000 items that are frequently updated.

I’m considering an approach where I gather filters/parameters through a conversation with the user and then turn that into an SQL query. Not sure if that’s the best path, though.

Anyone have experience with similar tasks? Any tips or better approaches you’d suggest?
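The "gather filters through conversation, then turn them into SQL" approach described above is usually safest with a column whitelist and parameterized queries, so the LLM only ever supplies values, never SQL text. A hedged sketch with an SQLite demo (the `items` schema, column names, and filter keys are invented for illustration):

```python
import sqlite3

ALLOWED = {"category", "price_max", "in_stock"}

def filters_to_sql(filters):
    """Turn {'column': value} filters gathered from the conversation into a
    parameterized query; never interpolate user text into SQL directly."""
    where, params = [], []
    for col, val in filters.items():
        if col not in ALLOWED:
            raise ValueError(f"unexpected filter: {col}")
        where.append("price <= ?" if col == "price_max" else f"{col} = ?")
        params.append(val)
    sql = "SELECT name FROM items"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE items (name TEXT, category TEXT, price REAL, in_stock INTEGER)")
con.executemany("INSERT INTO items VALUES (?, ?, ?, ?)", [
    ("widget", "tools", 9.5, 1),
    ("gizmo", "tools", 25.0, 0),
    ("gadget", "toys", 5.0, 1),
])

sql, params = filters_to_sql({"category": "tools", "price_max": 10})
rows = con.execute(sql, params).fetchall()
print(rows)
```

Because the LLM's job shrinks to slot-filling a dict, frequent row updates are a non-issue: the query always runs against the live table, with no embeddings to keep in sync for the structured part of the data.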

25 Comments
2024/10/29
12:31 UTC

14

Vectara price increase

I’m using Vectara solution and I received this email today :

An Important Message from Vectara

We hope you’ve been enjoying the features of the Vectara Platform as part of our Growth plan. As we continue to enhance our platform, we’re making an adjustment to help you get even more out of the experience.

When we first launched our free plan, it focused primarily on semantic search capabilities with 15,000 free queries per month. Since then, we’ve launched a ton of new features: generative AI capabilities, new models, hybrid search, structured metadata storage and retrieval, new cross-attentional rankers, our automated hallucination evaluation model, an automated chat API, and many more. Users who have used these features report that they provide a ton of value but also require a lot more storage and compute from Vectara than we had ever built the free plan to handle.

For that reason, starting October 28th, 2024, our free plan will shift to a 30-day trial. However, that trial will have the full power of Vectara behind it, as opposed to the limitations of our Growth plan. For example, you’ll get access to our most powerful multi-lingual reranker, the ability to modify prompts, our most powerful generative models, the ability to ask for more results per query and much more.

We understand changes can be unexpected, and we want you to know this will help us better serve you with richer functionality and continued improvements. If you want to continue using the platform, you can input your credit card details to move to a Pro plan or reach out to our team to discuss a Scale plan agreement, including business SLAs, premium support, custom dimensions, and more. The new plans start at $100/mo.

We hope that you’ll take up one of these offerings. However, we will delete accounts and associated data for those who have not upgraded to a Pro plan or above by November 28th, 2024.

If you have any questions, please feel free to reach out.

——-

This will make my cost go from ~$110 USD to $300 USD with 30 days' notice, or else they will delete all my data. (Knowing that there is no way to export the parsed data.)

I’m not happy and I would not recommend anyone to work with them in the future.

Now, could anyone recommend a good alternative?

I need a SaaS/self service RAG solution with API to add documents and search in them.

I know RAGIE, but they ask for a minimum commitment of $500 USD.

LlamaIndex has a SaaS solution, but it is not open to the public yet.

18 Comments
2024/10/29
08:43 UTC

10

Recommendations for an Advanced PDF Parser with Image and Layout Recognition for Node.js/TypeScript (Open Source)

I'm looking for the best open-source PDF parsing solution for Node.js/TypeScript, specifically with features that support Retrieval-Augmented Generation (RAG) workflows. My ideal parser should be capable of handling the following:

  1. Image Recognition and Extraction: Extract images embedded in PDFs and identify their context within the document.
  2. Layout Awareness: Accurately interpret complex PDF layouts, tables, and multi-column structures, preserving the document’s structure.
  3. Data Labeling and Annotation: Enable labeling of extracted data for downstream use in AI/ML tasks, ideally allowing for some customization or integration with data labeling tools.
  4. Reliability and Community Support: A well-maintained library with active contributors and comprehensive documentation would be ideal.

Does anyone have experience with a parser that meets these criteria, or could recommend any libraries or tools that work well for RAG setups?

7 Comments
2024/10/29
05:55 UTC

4

TrustGraph: LLM processing engine infrastructure

Hey RAGers!

We've been working on bringing data engineering to AI LLM workflows; the project is trustgraph.ai. To give you a flavor, the project pulls together a variety of capabilities: SLMs, commercial LLM invocation, chunking, prompt engineering, and Graph RAG. There's a UI which can generate deployment specs for Docker Compose, GCP K8s, and Minikube. It deploys on Pulsar to connect processing units, with infrastructure to scale, handle errors, and retry.

https://github.com/trustgraph-ai/trustgraph

2 Comments
2024/10/28
21:06 UTC
