RAG sucks
Think about it for a second: sending back text chunks from documents, based on a query that sometimes doesn't even make sense to a search engine. Now add a hundred documents, thousands, millions. It will suck even more.
Search is a hard problem to solve, and when you add the non-deterministic behavior of current models, well, you're in for some pretty bad retrieval.
There are ways to make your RAG suck less. You could try hybrid search, bigger chunks, semantic chunking, or you could ask your LLM to handle query expansion and filtering to improve relevance. But guess what? Your RAG will still suck.
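To be concrete about what "hybrid search" even means here: you run a keyword search and a vector search in parallel, then fuse the two ranked lists. A common way to fuse them is reciprocal rank fusion (RRF). Here's a minimal sketch; the doc IDs, the two ranked lists, and the `k=60` constant are all illustrative, not taken from any particular system.

```python
# Minimal sketch of hybrid search result fusion via reciprocal rank
# fusion (RRF). The rankings below are made up for illustration.

def rrf(rankings, k=60):
    """Fuse several ranked lists of doc IDs into one ordering.

    Each doc gets 1 / (k + rank + 1) from every list it appears in;
    docs ranked high in multiple lists float to the top.
    """
    scores = {}
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical results: one list from keyword (BM25-style) search,
# one from vector similarity search.
keyword_hits = ["doc3", "doc1", "doc7"]
vector_hits = ["doc1", "doc5", "doc3"]

fused = rrf([keyword_hits, vector_hits])
print(fused[0])  # → doc1, since it ranks high in both lists
```

The fused list still just feeds chunks to your LLM, of course, which is exactly the part that stays broken.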
And there is no way around that. Retrieving chunks and hoping they contain enough content for your LLM to extract the useful information is just that: hope.
Sometimes it is easier to identify a master document (or a few) and just copy them into your prompt.
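In code, the master-document approach is almost embarrassingly simple: no index, no embeddings, just concatenate a few curated documents into the prompt up to a size budget. This is a sketch; the document names, the budget, and the prompt layout are assumptions, not a recipe.

```python
# Minimal sketch of the "master document" approach: skip chunk
# retrieval entirely and paste hand-picked documents into the prompt.
# Doc names and the character budget are illustrative.

def build_prompt(question, master_docs, budget_chars=12000):
    """Concatenate curated (name, text) docs into a prompt, up to a budget."""
    context, used = [], 0
    for name, text in master_docs:
        if used + len(text) > budget_chars:
            break  # stop before blowing the context window
        context.append(f"## {name}\n{text}")
        used += len(text)
    return "\n\n".join(context) + f"\n\nQuestion: {question}"

docs = [
    ("onboarding-guide", "How to set up the dev environment..."),
    ("style-guide", "Naming conventions and review rules..."),
]
prompt = build_prompt("How do I set up my environment?", docs)
```

The curation is the real work: someone has to decide which documents are the masters and keep them up to date.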
Don’t do RAG, and if you have to… well, do it on heavily curated documents.