AI-Powered GRC Knowledge Base

What this is

§01

The cyber team had become a help desk. Procurement wanted to know whether a vendor had passed third-party risk review before signing the contract. Legal wanted the precise wording of the data-processing clause in policy v4.3. The product team wanted to confirm whether their new integration needed a DPIA. The same questions came in thirty times a week, answered by analysts whose actual job was assessing risk, not retrieving policy quotes.

The policies were open to anyone in the company, but nobody could find them. The intranet search returned page titles in alphabetical order, not by relevance. In effect, the cyber team had become the intranet search.

The brief

§02

Audience: Procurement, legal, product, and engineering teams; in other words, anyone who occasionally needs to consult GRC policy without becoming a GRC specialist.
Non-negotiable: Every answer must cite the specific policy section it came from. Hallucinating policy is worse than not answering — auditability is the whole point.
Security posture: Inside the enterprise SSO perimeter. No external API calls leaking document content. Self-hosted vector store; LLM inference via a vetted enterprise endpoint.
Maintenance ceiling: The team that owns this can't be the cyber team, because reducing their load is the whole point. Document updates have to be drop-in: upload the PDF, re-embed, done.

The shape of it

§03

A Dify-orchestrated chat surface backed by a retrieval-augmented LLM. Policy content is sourced directly from the enterprise GRC libraries: OneTrust's policy register for the privacy and third-party risk material, and RSA Archer for the broader control framework. A scheduled job re-embeds a document whenever its source system publishes a version bump. Three rules are baked in via the system prompt and the orchestration layer:

Always cite. Every answer carries inline references to its source documents, each with a quoted passage and a deep link. If retrieval returns nothing above the similarity threshold, the response is “I don't know — here are the closest documents”, not a guess.
Scope guardrails. The system prompt encodes what's in scope (third-party risk policy, data classification, vendor onboarding). Out-of-scope questions get a redirect, not an attempt.
Document freshness. The vector store is keyed to document versions. Each answer is annotated with the version of the policy it cites, so a response drawn from a superseded policy is immediately visible as such.
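The always-cite and freshness rules can be sketched as a gating step that runs after retrieval and before generation. This is a minimal sketch, assuming each retrieved chunk carries its document ID, section, version, deep link, similarity score and a superseded flag; the `Hit` type, `build_grounding`, and the 0.75 threshold are hypothetical illustrations, not Dify's API or the deployed values.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical shape of one retrieved chunk; field names are illustrative,
# not Dify's retrieval schema.
@dataclass(frozen=True)
class Hit:
    doc_id: str
    section: str
    version: str
    passage: str
    url: str
    score: float       # similarity score in [0, 1]
    superseded: bool   # True if a newer version of this document is indexed


SIMILARITY_THRESHOLD = 0.75  # assumed value; tune against an eval set


def format_citation(hit: Hit) -> str:
    """Inline citation: quoted passage, section, policy version, deep link."""
    flag = ", SUPERSEDED" if hit.superseded else ""
    return f'"{hit.passage}" ({hit.doc_id} §{hit.section}, v{hit.version}{flag}) {hit.url}'


def build_grounding(hits: list[Hit], threshold: float = SIMILARITY_THRESHOLD,
                    n_closest: int = 3) -> dict:
    """Gate generation on retrieval quality.

    If any hit clears the threshold, the LLM may answer using only those
    citations. Otherwise return the closest documents for an
    "I don't know" reply instead of letting the model guess.
    """
    ranked = sorted(hits, key=lambda h: h.score, reverse=True)
    grounded = [h for h in ranked if h.score >= threshold]
    if grounded:
        return {"mode": "answer",
                "citations": [format_citation(h) for h in grounded]}
    closest = [f"{h.doc_id} v{h.version}: {h.url}" for h in ranked[:n_closest]]
    return {"mode": "dont_know", "closest": closest}
```

Doing the threshold check in orchestration code rather than only in the prompt means a low-scoring retrieval never reaches the model as grounding, and the version annotation travels with every citation.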

What it ships, qualitatively

§04

24/7 self-service for non-technical stakeholders
100% of answers cite a source clause
Sharply fewer inbound queries to the cyber team

The quantitative win is the reduction in ad-hoc questions hitting the cyber team. The qualitative win is the cultural shift — teams started reading the cited clauses themselves and asking second-order questions, instead of treating policy as opaque. The tool stopped being a chatbot and started being a teaching aid.

Stack

§05

Frontend

Chat UI with inline citation deep-links
Streaming token rendering
Enterprise SSO gating

Sources

OneTrust — third-party risk + privacy policies
RSA Archer — control framework + risk register
Scheduled re-embed on upstream version bumps
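The scheduled re-embed on upstream version bumps amounts to a diff between what each source system has published and what the index holds. This is a minimal sketch, assuming a `{doc_id: version}` catalog can be pulled from OneTrust and Archer; `VersionedIndex`, `sync`, `fetch_text`, and `embed_document` are hypothetical names, and neither product's real API is shown.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class VersionedIndex:
    """Toy in-memory stand-in for the vector store, keyed by (doc_id, version).

    Hypothetical: a real deployment writes to the self-hosted vector DB.
    """
    entries: dict[tuple[str, str], list] = field(default_factory=dict)
    current: dict[str, str] = field(default_factory=dict)  # doc_id -> live version

    def upsert(self, doc_id: str, version: str, embedded_chunks: list) -> None:
        # Older versions are kept, not deleted, so past answers stay traceable;
        # only the "current" pointer moves.
        self.entries[(doc_id, version)] = embedded_chunks
        self.current[doc_id] = version

    def is_superseded(self, doc_id: str, version: str) -> bool:
        return self.current.get(doc_id) != version


def find_version_bumps(upstream: dict[str, str], index: VersionedIndex) -> list[str]:
    """Doc IDs whose published upstream version differs from the indexed one."""
    return sorted(d for d, v in upstream.items() if index.current.get(d) != v)


def sync(upstream: dict[str, str],
         fetch_text: Callable[[str, str], str],
         embed_document: Callable[[str], list],
         index: VersionedIndex) -> list[str]:
    """One scheduled run: re-embed only documents that had a version bump."""
    bumped = find_version_bumps(upstream, index)
    for doc_id in bumped:
        version = upstream[doc_id]
        text = fetch_text(doc_id, version)  # hypothetical OneTrust/Archer export
        index.upsert(doc_id, version, embed_document(text))
    return bumped
```

A run with no version bumps is a no-op, so the schedule can be frequent without re-embedding the whole library each time.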

Retrieval & reasoning

Dify orchestration (prompt + flow + tool calls)
Self-hosted vector store, embeddings keyed by doc version
Top-K with rerank + similarity threshold
Enterprise-vetted LLM endpoint with scope guardrails
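Top-K with rerank and a similarity threshold is a two-stage retrieval: a cheap vector-similarity pass narrows the corpus, weak matches are dropped, and a more expensive reranker orders what remains. A minimal sketch in pure Python; `retrieve` and its defaults are hypothetical, and `rerank` stands in for whatever reranking model the flow uses (for example a cross-encoder).

```python
from __future__ import annotations

import heapq
import math
from typing import Callable, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query_vec: Sequence[float], query_text: str,
             corpus: list[tuple[str, str, Sequence[float]]],
             rerank: Callable[[str, str], float],
             k: int = 20, threshold: float = 0.75,
             n: int = 5) -> list[tuple[str, float]]:
    """Top-K by vector similarity, drop weak matches, rerank, keep the best n.

    corpus holds (chunk_id, text, embedding) tuples. rerank is a hypothetical
    callable scoring (query, passage) pairs.
    Returns (chunk_id, similarity) pairs in reranked order.
    """
    scored = [(cosine(query_vec, vec), cid, text) for cid, text, vec in corpus]
    top_k = heapq.nlargest(k, scored)               # cheap first stage
    kept = [t for t in top_k if t[0] >= threshold]  # similarity threshold
    # Expensive second stage: reorder survivors by reranker score.
    kept.sort(key=lambda t: rerank(query_text, t[2]), reverse=True)
    return [(cid, sim) for sim, cid, _ in kept[:n]]
```

Thresholding on the first-stage similarity and then reranking is one ordering among several; the section doesn't state which score the deployed flow applies its threshold to. An empty result is the signal for the “I don't know” path.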

Ops

Drag-and-drop document re-embed flow
Audit log of every query + cited sources
No data leaves the enterprise perimeter
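An audit log of every query and its cited sources can be as simple as an append-only JSON Lines file. A minimal sketch; the field names, `audit_record`, and `append_audit` are hypothetical, and the user identity is assumed to come from the SSO layer.

```python
import json
from datetime import datetime, timezone


def audit_record(user: str, query: str, citations: list, answered: bool) -> str:
    """Serialise one audit entry as a single JSON line."""
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "user": user,             # SSO subject, assumed available from the gateway
        "query": query,
        "answered": answered,     # False when the "I don't know" path fired
        "citations": citations,   # e.g. dicts with doc_id, section, version
    }
    return json.dumps(entry, sort_keys=True, ensure_ascii=False)


def append_audit(path: str, line: str) -> None:
    """Append one entry; the file is only ever appended to, never rewritten."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(line + "\n")
```

Recording the cited version alongside each query means an auditor can later reconstruct exactly which policy text an answer relied on, even after that policy has been superseded.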

What I'd do differently

§06

Build the document-versioning model into the embeddings pipeline from the start, rather than retrofitting it. Versioned retrieval is the difference between “here's a citation” and “here's a defensible citation”.
Add a feedback loop on responses (thumbs up/down with an optional ‘why’) to feed a continuous-eval set for future retrieval tuning.
Treat the system prompt as code, with reviews and version history. We did this informally; we should have done it formally.
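A feedback loop that feeds a continuous-eval set could turn each rating into a retrieval test case. A minimal sketch under stated assumptions: `Feedback`, `to_eval_cases`, and `recall_at_k` are hypothetical, thumbs-up is treated as confirming the cited chunks, and thumbs-down is queued for human review rather than trusted as a label.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Feedback:
    query: str
    cited_chunk_ids: tuple[str, ...]
    thumbs_up: bool
    why: Optional[str] = None  # optional free-text reason


def to_eval_cases(feedback: list[Feedback]) -> list[dict]:
    """Thumbs-up: cited chunks become expected results for that query.
    Thumbs-down: kept unlabelled until a reviewer supplies the right chunks."""
    return [{
        "query": fb.query,
        "expected_chunk_ids": list(fb.cited_chunk_ids) if fb.thumbs_up else [],
        "needs_review": not fb.thumbs_up,
        "note": fb.why or "",
    } for fb in feedback]


def recall_at_k(retrieved: list[str], expected: list[str], k: int) -> float:
    """Fraction of expected chunk IDs found in the top-k retrieved IDs."""
    if not expected:
        raise ValueError("unlabelled case: review it before scoring")
    return len(set(retrieved[:k]) & set(expected)) / len(expected)
```

Re-running the labelled cases after each retrieval or prompt change gives a regression check, which pairs naturally with versioning the system prompt.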