Research

What I work on, and why.

Four themes run through my published work and the systems I have built. They are the questions I have kept returning to, and the ones that have tended to matter most once a system has to operate in production.

Scalable retrieval and representation systems

How to build representations that let you find the right thing in a large collection — quickly and reliably. This was the topic of my PhD at EdinburghNLP, and it has remained relevant as systems have grown larger.

The hashing line, comprising Variable Bit Quantisation (ACL 2013), Neighbourhood-Preserving Quantisation (SIGIR 2013), and a 2019 monograph, replaced heavy lookups with a few learned bits at comparable accuracy. In parallel, "Real-time Detection, Tracking, and Monitoring of Automatically Discovered Events in Social Media" (ACL 2014) and "Enhancing First Story Detection using Word Embeddings" (SIGIR 2016) applied related ideas to noisy text streams. "Sparse Kernel Learning for Image Annotation" (ICMR 2014, Best Student Paper) extended the work into multi-modal retrieval.

In production, the line has continued in code-to-code retrieval (Senatus, De-Skew LSH), source-code understanding via spatial representations, and federated secure vocabulary learning — variations on the same question at larger scale and under tighter constraints. Recent essays on the bits-over-random metric and on the limits of vector search for some RAG queries extend the same thinking into the LLM setting.

Operational GenAI in regulated environments

Much generative-AI research is evaluated on held-out benchmarks. A separate set of problems, the ones I spend most of my time on, concerns the behaviour of those same models in regulated, audit-heavy, operationally complex settings: latency budgets, governance, hallucination control, model risk management, and the failure modes that surface only in production.

Published work in this area includes SpamT5 (FinLLM @ IJCAI 2023) on few-shot LLM email-spam detection, CodeQUEST (ISSREW 2025) on iterative LLM-based code-quality evaluation, and a number of systems — API-Miner, Senatus, Ledgit, DeepClean — covering code intelligence, anomaly detection, federated training on sensitive data, and machine unlearning. The associated patents (25+ granted US patents to date) sit mostly in code intelligence, federated learning, and secure retrieval.

Efficient, interpretable architectures

A consistent preference in this work is for small, interpretable modules that recover most of the benefit of heavier networks. DeepLPF (CVPR 2020) introduced learnable local parametric filters for image enhancement. CURL (ICPR 2020) introduced neural curve layers — a differentiable colour-curve module that performed competitively with much larger image-enhancement networks on three benchmarks. SIDGAN (ECCV 2020) introduced a synthetic-data pipeline for training low-light video models where real labelled data is unavailable.

The throughline is to prefer the right inductive bias over additional scale, where the problem allows. This matters in deployment: smaller, interpretable modules tend to be easier to ship, audit, and operate under distribution shift.

Decision systems and infrastructure-aware ML

More recent work has concerned systems that make decisions under uncertainty: agents with calibrated self-evaluation, anomaly detection over network alarms and cryptocurrency transactions, biased sampling and graph-feedback methods for streaming classification, and calibrated code-quality scoring. The unifying question is how to engineer reliability into systems that are statistical by nature, with awareness of the infrastructure those systems actually run on.

How I operate

A preference for systems that ship. Production work has made me cautious about evaluations that only cover held-out test sets. The most useful benchmarks tend to resemble the eventual deployment environment.
Small, cross-functional teams. Much of the work I am proudest of has come from compact teams that combine research, engineering, and product judgment in the same room — not from organisational scale alone.
Inductive bias before scale. DeepLPF and CURL are ~600-line modules. Several of the hashing papers replace a heavy lookup with a few learned bits. Where the problem rewards careful structure over more parameters, I prefer to take that route.
Writing as a way of thinking. I publish technical essays on Towards Data Science and Medium because writing for practitioners is a reliable way to surface gaps in my own understanding.

A full list of papers and patents is on the home page. Several names recur in the bibliography: Victor Lavrenko (originator of relevance models), Miles Osborne, Charles Sutton, and Greg Slabaugh. Each shaped my thinking in different ways over the years.