RAG Evaluation Checklist
A practical checklist for evaluating RAG retrieval quality, source faithfulness, citations, no-answer behavior, latency, and human review effort.
Workflow
An internal knowledge QA workflow with approved sources, retrieval checks, no-answer behavior, escalation, and review-ready answer traces for teams.
An internal knowledge QA workflow helps employees answer policy, product, process, support, and technical questions from approved company knowledge. It can reduce repeated questions and speed onboarding, but it also touches sensitive boundaries: permissions, stale docs, internal policy, and unsupported answers that may spread quickly inside a team.
The workflow should be designed as a grounded retrieval system first and an answer generator second. If the approved source set cannot answer the question, the correct output is a refusal, escalation, or unresolved-question record. The RAG no-answer testing guide is central to making this behavior reliable.
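The "retrieval first, generation second" ordering can be sketched as a gate that runs before any answer is generated. This is a minimal illustration, not a definitive design: the names `Passage`, `Outcome`, and `decide`, the 0 to 1 score scale, and the `min_score` threshold are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    ANSWER = "answer"
    REFUSE = "refuse"
    ESCALATE = "escalate"


@dataclass(frozen=True)
class Passage:
    doc_id: str
    score: float  # retriever relevance in [0, 1]; the scale is an assumption


def decide(passages: list[Passage], min_score: float = 0.5,
           has_escalation_path: bool = True) -> Outcome:
    """Decide whether to generate at all, based only on retrieved evidence."""
    supported = [p for p in passages if p.score >= min_score]
    if supported:
        return Outcome.ANSWER
    # No approved passage clears the bar, so the generator is never called.
    return Outcome.ESCALATE if has_escalation_path else Outcome.REFUSE
```

In practice the threshold would need tuning against no-answer test questions rather than being fixed at an arbitrary value.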
Inputs include approved documents, user permissions, the query, the workspace or department, an escalation policy, and a feedback channel. Each source should have an owner, access label, last-reviewed date, and audience. The workflow should not use documents the user is not allowed to see.
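One way to make the per-source requirements checkable is a small record type with a completeness check. The sketch below assumes hypothetical names (`SourceDoc`, `missing_metadata`) and treats any empty field as missing.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class SourceDoc:
    doc_id: str
    owner: Optional[str]
    access_label: Optional[str]
    last_reviewed: Optional[date]
    audience: Optional[str]


REQUIRED_FIELDS = ("owner", "access_label", "last_reviewed", "audience")


def missing_metadata(doc: SourceDoc) -> list[str]:
    """Return the names of required metadata fields that are empty or None."""
    return [name for name in REQUIRED_FIELDS if not getattr(doc, name)]
```

A source with a non-empty result from `missing_metadata` could be held back from indexing until its owner fills in the gaps.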
Outputs include a cited answer, source list, confidence or evidence label, unresolved question, and suggested doc update. For internal use, the answer can include links to source documents, but it should still avoid exposing restricted content to users without permission.
A practical stack includes a document index, a retriever, an access-control filter, an LLM answer generator, a citation checker, a feedback queue, and a stale-doc report. If the workflow can create tickets or tasks, classify that tool under the agent permission design model.
The retrieval and answer layers should be tested with the RAG evaluation checklist. Test both answerable questions and questions where the right behavior is to say the knowledge base is incomplete.
First, define the approved corpus. Include only documents with clear ownership and access labels. Remove drafts, duplicates, and outdated pages where possible. If stale pages must remain, label them so the workflow does not treat them as current truth.
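The corpus rules in this step can be sketched as a single filtering pass. The field names, the `"published"`/`"draft"` status values, the use of a content hash as the duplicate key, and the one-year staleness window are all assumptions for illustration.

```python
from dataclasses import dataclass, replace
from datetime import date, timedelta


@dataclass(frozen=True)
class Page:
    doc_id: str
    owner: str
    access_label: str
    status: str          # "published" or "draft" (hypothetical values)
    last_reviewed: date
    content_hash: str    # used here as a simple duplicate key
    stale: bool = False


def build_corpus(pages: list[Page], today: date,
                 max_age: timedelta = timedelta(days=365)) -> list[Page]:
    """Drop unowned, unlabeled, draft, and duplicate pages; label stale ones."""
    seen_hashes: set[str] = set()
    approved: list[Page] = []
    # Newest review first, so the most recently reviewed duplicate is kept.
    for page in sorted(pages, key=lambda p: p.last_reviewed, reverse=True):
        if not page.owner or not page.access_label:
            continue
        if page.status != "published":
            continue
        if page.content_hash in seen_hashes:
            continue
        seen_hashes.add(page.content_hash)
        is_stale = today - page.last_reviewed > max_age
        approved.append(replace(page, stale=is_stale))
    return approved
```

Stale pages stay in the result with `stale=True`, so downstream steps can down-rank or caveat them instead of treating them as current truth.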
Second, enforce permissions before retrieval. Do not retrieve first and filter later if snippets could leak restricted context into the model. The user should only receive answers grounded in sources they are allowed to access.
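A minimal sketch of filter-then-retrieve follows. It assumes group-based access labels and uses naive term overlap as a stand-in for a real retriever; `Doc`, `visible_docs`, and `retrieve` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    text: str
    allowed_groups: frozenset[str]


def visible_docs(docs: list[Doc], user_groups: set[str]) -> list[Doc]:
    """Access-control filter: keep only docs the user may read."""
    return [d for d in docs if d.allowed_groups & user_groups]


def retrieve(query: str, docs: list[Doc], user_groups: set[str],
             top_k: int = 3) -> list[Doc]:
    """Filter by permission first, then rank by naive term overlap."""
    # Restricted text is removed before scoring, so it never reaches the model.
    candidates = visible_docs(docs, user_groups)
    terms = set(query.lower().split())
    scored = [(len(terms & set(d.text.lower().split())), d) for d in candidates]
    ranked = sorted((s for s in scored if s[0] > 0),
                    key=lambda s: s[0], reverse=True)
    return [d for _, d in ranked[:top_k]]
```

With a vector store, the same ordering would typically be expressed as a metadata filter applied inside the search call, not as a post-processing step on returned snippets.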
Third, generate answers with citations. The answer should be concise, point to supporting sources, and avoid claims not present in the retrieved context. If sources conflict, the workflow should say so and escalate.
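A simple structural citation check can catch the most obvious violations. The sketch assumes answers carry inline markers such as `[doc-id]` placed before the sentence-ending punctuation; that marker format and the function name `check_citations` are assumptions, not a standard.

```python
import re

CITATION = re.compile(r"\[([A-Za-z0-9_-]+)\]")


def check_citations(answer: str, retrieved_ids: set[str]) -> dict:
    """Flag sentences without a citation and citations to unretrieved sources."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    uncited = [s for s in sentences if not CITATION.search(s)]
    cited_ids = set(CITATION.findall(answer))
    unknown = sorted(cited_ids - retrieved_ids)
    return {
        "uncited_sentences": uncited,
        "unknown_sources": unknown,
        "ok": not uncited and not unknown,
    }
```

This only verifies that citations exist and point at retrieved sources. It does not verify that the cited passage supports the claim, which still needs a faithfulness check or human review.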
Fourth, capture unresolved questions. A missing answer is valuable information. Create a queue for the knowledge owner that includes the question, attempted retrieval, missing source category, and suggested owner.
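The queue item described in this step could be modeled as below. The class names, the `status` values, and the in-memory list are illustrative assumptions; a real deployment would more likely write to a ticketing system or database.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class UnresolvedQuestion:
    question: str
    attempted_doc_ids: list[str]      # what retrieval returned, if anything
    missing_source_category: str      # e.g. "travel policy"
    suggested_owner: str
    status: str = "open"
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))


class UnresolvedQueue:
    """In-memory stand-in for a ticket queue or database table."""

    def __init__(self) -> None:
        self._items: list[UnresolvedQuestion] = []

    def add(self, item: UnresolvedQuestion) -> None:
        self._items.append(item)

    def for_owner(self, owner: str) -> list[UnresolvedQuestion]:
        """Return the open items routed to one knowledge owner."""
        return [i for i in self._items
                if i.suggested_owner == owner and i.status == "open"]
```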
Fifth, close the loop. When a knowledge owner updates a document, the workflow should retest the original question and mark the unresolved item as fixed, still ambiguous, or out of scope. This prevents the queue from becoming a passive list of complaints and turns employee questions into maintenance signals.
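The retest step can be sketched as a function that reruns the original question and maps the result to one of the three states named above. The `answer_fn` hook and the shape of its return value are assumptions, as is the rule that a refusal, a missing citation, or a reported source conflict all count as still ambiguous.

```python
from enum import Enum
from typing import Callable, Optional


class Resolution(Enum):
    FIXED = "fixed"
    STILL_AMBIGUOUS = "still ambiguous"
    OUT_OF_SCOPE = "out of scope"


def retest(question: str,
           answer_fn: Callable[[str], Optional[dict]],
           in_scope: bool = True) -> Resolution:
    """Re-run the original question after a doc update and classify the result.

    `answer_fn` is a hypothetical hook into the QA workflow. It is assumed to
    return None on refusal, or a dict with "citations" and "conflict" keys.
    """
    if not in_scope:
        return Resolution.OUT_OF_SCOPE
    result = answer_fn(question)
    if result is None or not result.get("citations") or result.get("conflict"):
        return Resolution.STILL_AMBIGUOUS
    return Resolution.FIXED
```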
Answers must cite accessible sources, and the workflow must refuse when permission or evidence is missing. The reviewer should be able to see the query, retrieved sources, excluded sources if logged safely, final answer, and reason for any refusal.
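A review-ready trace might look like the record below. The field names are hypothetical, and "logged safely" is interpreted here as recording only an identifier and an exclusion reason for each excluded source, never its text.

```python
import json
from dataclasses import dataclass, asdict, field
from typing import Optional


@dataclass
class AnswerTrace:
    query: str
    retrieved_ids: list[str]
    final_answer: Optional[str]
    refusal_reason: Optional[str] = None
    # Log only IDs and reasons for excluded sources, never their text.
    excluded: list[dict] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize the trace for a review queue or log store."""
        return json.dumps(asdict(self), sort_keys=True)
```

Whether even a document ID is safe to show a reviewer depends on the organization's access rules, so that choice should be confirmed with the source owners.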
Run fixtures for restricted documents, stale policies, duplicate pages, ambiguous acronyms, newly changed procedures, and unsupported questions. The workflow should not reward confident answers that lack citations.
Permission fixtures are especially important. A user without access to an HR policy, customer contract, or security runbook should not receive a summarized version through the QA tool. The expected answer should say that the user lacks access or should contact the owner, without leaking the restricted content.
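A permission fixture can be expressed as data plus a small runner. In this sketch, `Fixture`, `run_fixture`, and the `answer_fn(question, user_groups)` hook are hypothetical, and leakage is approximated by searching the answer for phrases known to appear only in the restricted source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fixture:
    name: str
    user_groups: frozenset[str]
    question: str
    expect_refusal: bool
    forbidden_phrases: tuple[str, ...] = ()


def run_fixture(fixture: Fixture, answer_fn) -> list[str]:
    """Return a list of failure messages; empty means the fixture passed.

    `answer_fn(question, user_groups)` is a hypothetical hook assumed to return
    a dict with "refused" (bool) and "text" (str).
    """
    result = answer_fn(fixture.question, fixture.user_groups)
    failures = []
    if result["refused"] != fixture.expect_refusal:
        failures.append(f"{fixture.name}: refusal mismatch")
    lowered = result["text"].lower()
    for phrase in fixture.forbidden_phrases:
        if phrase.lower() in lowered:
            failures.append(f"{fixture.name}: leaked '{phrase}'")
    return failures
```

Phrase matching will miss paraphrased leaks, so a fixture like this is a floor for permission testing and not a substitute for reviewing sampled refusals.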
Knowledge owners review unresolved questions, stale docs, and high-impact answers. Early rollout should sample accepted and rejected answers weekly. The agent observability guide gives the trace fields needed for this review.
Internal knowledge QA fails by leaking restricted information, citing inaccessible docs, hiding knowledge gaps, over-trusting stale docs, or answering from a source that is not authoritative. It can also fail socially: if employees learn that answers are often wrong, they stop using the system and return to private chat threads.
It can also fail operationally when no one owns the corpus. Retrieval quality drops quickly if old docs, duplicate pages, and abandoned policies remain in the index without labels.
The main risks of an internal knowledge QA workflow are restricted-source leakage, stale internal docs, unsupported answers, and answers that hide gaps in the knowledge base.
When no approved source answers the question, the workflow should refuse, escalate, or create an unresolved-question item for the knowledge owner instead of inventing an answer.
Use the RAG Evaluation Template to test retrieval relevance, faithfulness, citation support, access boundaries, and no-answer handling before rollout.