The challenge

Energy infrastructure professionals were drowning in thousands of pages of complex regulatory documentation. Grid codes, planning regulations, environmental assessments, compliance requirements - dense, interconnected legal text that governed every decision they made.

These professionals stake their careers on getting the details right. A missed clause or an outdated cross-reference doesn't just waste time - it can derail a project worth millions. But there was no efficient way to search, cross-reference, or query this information. They spent hours manually reading through documents to find the specific condition they needed. When regulations changed, tracking the implications across related documents was a manual, error-prone process.

The startup saw the opportunity. But building AI that professionals would trust with their regulatory decisions - that was the hard part.

The approach

As the startup's technical co-founder, I built the entire AI system from scratch.

Custom OCR pipeline. The regulatory documents came in various formats - scanned PDFs, legacy Word documents, HTML from government websites. Before any AI could work with this content, it needed to be accurately extracted and structured. I built a custom OCR pipeline that handled the diversity of source formats while preserving the document structure (numbered clauses, sub-clauses, and headings) that was critical for understanding relationships between clauses.
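
To illustrate what "preserving the document structure" can mean in practice, here is a minimal sketch, not the production pipeline. It assumes the extracted text uses dotted clause numbers such as "4.2.1" at the start of a line; the `Clause` type, the `parse_clauses` function, and the numbering pattern are all hypothetical names chosen for this example.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumed numbering convention: "4", "4.2", "4.2.1" followed by the clause text.
CLAUSE_RE = re.compile(r"^(\d+(?:\.\d+)*)\s+(.+)$")


@dataclass
class Clause:
    number: str
    text: str
    parent: Optional[str]  # number of the enclosing clause, None at top level


def parse_clauses(lines: List[str]) -> List[Clause]:
    """Turn extracted lines into clauses that remember their parent clause."""
    clauses: List[Clause] = []
    for raw in lines:
        line = raw.strip()
        match = CLAUSE_RE.match(line)
        if match:
            number, text = match.groups()
            # "4.2.1" sits under "4.2"; a number without dots has no parent.
            parent = number.rsplit(".", 1)[0] if "." in number else None
            clauses.append(Clause(number, text, parent))
        elif clauses and line:
            # An unnumbered, non-empty line continues the previous clause.
            clauses[-1].text += " " + line
    return clauses


extracted = [
    "4 Connection conditions",
    "4.2 Frequency response",
    "4.2.1 The generator shall remain connected",
    "within the stated frequency range.",
    "",
    "5 Compliance",
]
clauses = parse_clauses(extracted)
```

Keeping the parent link at extraction time means later stages do not have to guess which section a sentence belonged to. Real regulatory documents use messier numbering than this pattern handles, so treat it as a starting point only.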

GraphRAG retrieval system. Regulatory text is inherently relational. A clause in one document might modify, override, or depend on clauses in other documents. A standard vector search approach would find semantically similar text, but miss these critical legal relationships.

I built a knowledge graph in Neo4j that explicitly modelled the relationships between regulatory concepts, documents, clauses, and conditions. The retrieval system could then traverse these relationships, ensuring that when a user asked about a specific regulation, they got the full picture - including related conditions, exceptions, and cross-references.
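
The traversal idea can be sketched without a database. The example below is an in-memory stand-in, not the Neo4j implementation: the clause identifiers, the relation names (`DEPENDS_ON`, `OVERRIDES`, `REFERENCES`), and the `related_clauses` function are assumptions made for illustration. It walks relationships in both directions, because a clause that overrides the one you asked about matters as much as a clause it depends on.

```python
from collections import deque

# Hypothetical edges: (source clause, relation, target clause).
EDGES = [
    ("GC-4.2", "DEPENDS_ON", "GC-2.1"),
    ("PR-7.3", "OVERRIDES", "GC-4.2"),
    ("GC-2.1", "REFERENCES", "EA-1.5"),
    ("EA-9.9", "REFERENCES", "EA-9.8"),
]


def related_clauses(start, edges, max_hops=2):
    """Breadth-first walk returning (clause, relation, direction, hops)."""
    # Index every edge from both ends so the walk ignores edge direction
    # but still reports it ("out" = start side is the source).
    neighbours = {}
    for src, rel, dst in edges:
        neighbours.setdefault(src, []).append((rel, dst, "out"))
        neighbours.setdefault(dst, []).append((rel, src, "in"))

    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for rel, other, direction in neighbours.get(node, []):
            if other in seen:
                continue
            seen.add(other)
            found.append((other, rel, direction, hops + 1))
            queue.append((other, hops + 1))
    return found


context = related_clauses("GC-4.2", EDGES)
```

A plain similarity search over clause text would have no reason to return "PR-7.3" for a question about "GC-4.2" unless the wording happened to overlap; following the `OVERRIDES` edge surfaces it regardless. The hop limit keeps the retrieved context bounded.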

LLMOps with Langfuse. From day one, we tracked every query, every retrieval, and every generated response. Langfuse gave us visibility into what users were asking, where the system was performing well, and where it was falling short. This data drove rapid iteration on both the retrieval logic and the knowledge graph structure.
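
As a rough picture of what gets recorded per query, here is a standard-library sketch. It deliberately does not use Langfuse's SDK, and the `QueryTrace` fields, `traced_answer`, and `empty_retrievals` are hypothetical names for this example; a tracing tool captures the same kind of information with far more detail.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class QueryTrace:
    query: str
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    retrieved_ids: List[str] = field(default_factory=list)
    response: str = ""
    latency_s: float = 0.0


def traced_answer(
    query: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
    log: List[QueryTrace],
) -> str:
    """Run retrieval and generation, recording both steps in one trace."""
    trace = QueryTrace(query=query)
    started = time.perf_counter()
    trace.retrieved_ids = retrieve(query)
    trace.response = generate(query, trace.retrieved_ids)
    trace.latency_s = time.perf_counter() - started
    log.append(trace)
    return trace.response


def empty_retrievals(log: List[QueryTrace]) -> List[str]:
    """Queries where retrieval found nothing: candidates for graph fixes."""
    return [t.query for t in log if not t.retrieved_ids]


# Stub retriever and generator, standing in for the real components.
log: List[QueryTrace] = []
first = traced_answer(
    "What frequency range applies?",
    lambda q: ["GC-4.2"],
    lambda q, ids: "See " + ids[0],
    log,
)
traced_answer(
    "Is a bat survey required?",
    lambda q: [],
    lambda q, ids: "No source found",
    log,
)
gaps = empty_retrievals(log)
```

Recording the retrieved clause identifiers alongside the response is what makes it possible to tell a retrieval failure from a generation failure when reviewing a poor answer.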

The outcome

The product went from initial concept to paying customers. Energy infrastructure professionals could now ask natural language questions about regulatory requirements and get accurate, sourced answers in seconds instead of hours.

The knowledge graph approach was critical for building trust with users. Every answer included references to the specific clauses and documents it drew from. Users could verify the AI's responses against the source material - which they did, frequently, until they built confidence in the system's accuracy.

When you're building AI for professionals who stake their careers on getting the details right, accuracy isn't a feature - it's the product. The knowledge graph was what made that possible.
