📌 What do you think is an appropriate number of QA datasets?

I think you need around 100 or so.

In some cases, if your RAG questions are diverse, you may need many more than that.


📌 I'm curious about your favorite LLMs to experiment with.

Our team is using SOTA models within acceptable security boundaries.

Currently, we are primarily using GPT-4o.


📌 Instead of embedding the chunked text directly, do you sometimes embed it with metadata (document title, etc.) prepended to the text? I'm wondering whether this is a meaningful preprocessing step for retrieval.

Instead of using the chunked text directly as the contents of the corpus, you can embed (title) + (summary or metadata) + (chunked text).

It's a good trick to try when you're working on improving retrieval performance :)
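Here is a minimal sketch of that idea, assuming each chunk carries a document title and an optional summary and using sentence-transformers for embedding. The field names, example texts, and model are illustrative only, not AutoRAG's actual schema.

```python
# Sketch: prepend (title) + (summary or metadata) to each chunk before embedding.
from sentence_transformers import SentenceTransformer

chunks = [
    {
        "title": "Payment API Guide",
        "summary": "How refunds are processed and reconciled.",
        "text": "A refund request is queued and settled within 3 business days...",
    },
    # ... more chunks
]

# Build the corpus contents as (title) + (summary) + (chunked text).
contents = [
    f"{c['title']}\n{c.get('summary', '')}\n{c['text']}" for c in chunks
]

model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(contents)  # one vector per metadata-augmented chunk
```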

❗But remember, different data will perform differently. ❗

Some preprocessing may improve retrieval performance significantly on certain data, while on other data it may help only marginally or even hurt.

In the end, you'll need to experiment to find the best method for your data.

AutoRAG was created to make these experiments easy and fast, so we recommend using AutoRAG to do some quick experiments 😁.


📌 Share your tricks for boosting RAG performance in domain-specific, jargon-laden documentation!

The more jargon-heavy a domain is, the more important it is to construct a realistic evaluation QA dataset.

Non-experts are less familiar with the jargon and tend to ask vague questions rather than precise ones, so retrieval with a VectorDB based on semantic similarity may perform better (see the sketch below).
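A minimal sketch of what that looks like, again using sentence-transformers; the model name and the example documents/query are illustrative only:

```python
# Sketch: semantic-similarity retrieval for a vague, non-expert query
# that does not use the exact jargon found in the documents.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

corpus = [
    "Hypotension management protocol for post-operative patients.",
    "Dosage guidelines for beta-blockers in cardiac care.",
]
corpus_emb = model.encode(corpus, normalize_embeddings=True)

# Vague phrasing: no mention of "hypotension" or "post-operative".
query = "What should I do when a patient's blood pressure drops after surgery?"
query_emb = model.encode(query, normalize_embeddings=True)

# Cosine similarity (vectors are normalized, so a dot product suffices).
scores = corpus_emb @ query_emb
best = int(np.argmax(scores))
print(corpus[best], float(scores[best]))
```

Because the embeddings capture meaning rather than exact terms, the vague query can still land on the jargon-laden document, which is exactly where lexical matching tends to struggle.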