Trust, confidence, and verifiable data audit
Data can be a powerful force for social progress, helping our most important institutions improve how they serve the people who depend on them. As cities, hospitals, and transport systems find new ways to understand what people need from them, they're discovering opportunities to change how they work today and identifying exciting ideas for the future.
Data can only benefit society if it has society's trust and confidence, and on this point we all face a challenge. Now that data can be used for so many more purposes, people aren't just asking who holds their data and whether it's being kept securely. They also want greater assurance about what is actually being done with it.
That makes auditability an increasingly important property. Any well-built digital tool will already keep logs of how it uses data, and should be able to produce and justify those logs if challenged. But the more powerful and secure we can make that audit process, the easier it becomes to establish real confidence in how data is being used in practice.
Imagine a service that could give mathematical assurance about what is happening with each individual piece of personal data, with no possibility of falsification or omission. Imagine the inner workings of that system could be checked in real time, to make sure data is only being used as it should be. Imagine the infrastructure behind it were open source and freely available, so that any organisation in the world could implement its own version if it wished.
The working title for this project is "Verifiable Data Audit", and it is being built for DeepMind Health – DeepMind's effort to give the health service technology that helps clinicians predict, diagnose, and prevent serious illness, a key part of DeepMind's mission to deploy technology for social benefit.
Given the sensitive nature of health data, DeepMind has always believed it should aim to be as innovative with governance as it is with the technology itself. A panel of unpaid Independent Reviewers has been appointed to scrutinise its healthcare work, commission audits, and publish an annual report of their findings.
Verifiable Data Audit is intended as a powerful complement to this scrutiny, giving DeepMind's partner hospitals an additional real-time and fully proven mechanism to check how data is being processed. The approach should be especially useful in health, given the sensitivity of personal medical data and the need for every interaction with data to be properly authorised and consistent with rules on patient consent. For example, an organisation holding health data can't simply decide to start carrying out research on the patient records being used to provide care, or repurpose a research dataset for some other unapproved use. In other words: it's not just where data is stored that matters, it's what's being done with it.
How will it work?
DeepMind acts as a data processor for its hospital partners, meaning its role is to provide secure data services under their instructions, with the hospital remaining in full control throughout. At present, whenever its systems receive or touch that data, a log of the interaction is created that can be audited later if needed.
Verifiable Data Audit builds on this foundation. Every time there's any interaction with data, an entry will be added to a special digital ledger. That entry will record the fact that a particular piece of data has been used, and also the reason why – for example, that blood test data was checked against the NHS national algorithm to detect possible acute kidney injury.
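As a rough sketch of what such a ledger entry might contain – the field names and structure here are illustrative assumptions, not DeepMind's actual schema – each record could capture which data was touched, by which system, when, and for what purpose, serialised deterministically so it can be hashed and verified later:

```python
from dataclasses import dataclass, asdict
import hashlib, json, time

@dataclass(frozen=True)
class LedgerEntry:
    """Illustrative append-only audit record (field names are hypothetical)."""
    record_id: str    # opaque reference to the piece of data that was touched
    accessed_by: str  # the system or service that used the data
    purpose: str      # why the data was used
    timestamp: float  # when the interaction happened

    def canonical_bytes(self) -> bytes:
        # Serialise deterministically so the entry can be hashed and checked later.
        return json.dumps(asdict(self), sort_keys=True).encode()

entry = LedgerEntry(
    record_id="blood-test-123",
    accessed_by="aki-detection-service",
    purpose="checked against the NHS national AKI algorithm",
    timestamp=time.time(),
)
print(hashlib.sha256(entry.canonical_bytes()).hexdigest())
```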
The ledger and the entries in it will share some of the properties of blockchain, the idea underlying Bitcoin and other projects. Like blockchain, the ledger will be append-only, so once a record of data use has been added, it can't later be erased or altered. And like blockchain, the ledger will make it possible for third parties to verify that nobody has tampered with any of the entries.
But it will also differ from blockchain in some important ways. Blockchain is decentralised, so verification of the ledger is decided by consensus among a wide group of participants. To prevent abuse, most blockchains require participants to repeatedly carry out complex calculations, with enormous associated costs (by some estimates, the total energy usage of blockchain participants could be as much as the power consumption of Cyprus). This isn't necessary for the health service, because there are already trusted institutions – hospitals or national bodies – that can be relied on to verify the integrity of ledgers, avoiding some of the wastefulness of blockchain.
Efficiency can be improved further by replacing the chain part of blockchain with a tree-like structure (for more on Merkle trees, the UK Government Digital Service has written several blog posts about them, which are worth a read). The overall effect is much the same. Each time an entry is added to the ledger, a value known as a 'cryptographic hash' is generated. This hashing process is special because it summarises not only the latest entry, but all of the previous values in the ledger too. That makes it effectively impossible for anyone to go back and quietly alter one of the entries, since doing so would change not only the hash of that entry but of the whole tree.
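To make the idea concrete, here is a minimal sketch – not the production design – of computing a Merkle root over a list of ledger entries in Python. Changing any single entry changes the root, so a verifier who holds an earlier root can detect that the history has been altered:

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    """Compute a Merkle root over hashed ledger entries (simplified sketch)."""
    if not leaves:
        return sha256(b"")
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd-sized levels
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

entries = [b"entry-1: blood test checked for AKI", b"entry-2: record viewed by clinician"]
root_before = merkle_root(entries)

# A tampered entry produces a completely different root, so a verifier
# holding the earlier root can tell that the history has been altered.
entries[0] = b"entry-1: (quietly modified)"
assert merkle_root(entries) != root_before
```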
In plain terms, you can think of it a bit like the final move in a game of Jenga. You might try to gently remove or shift one of the pieces, but because of the overall structure, that's going to make a big noise!
So now we have an improved version of the humble audit log: a fully trustworthy, efficient ledger that we know captures every interaction with data, and that can be verified by a reputable third party in the healthcare system. What do we do with it?
The short answer: dramatically improve the way these records can be audited. DeepMind plans to build a dedicated online interface that authorised staff at its partner hospitals can use to examine the audit trail of DeepMind Health's data use in real time. It will allow continuous verification that the systems are working as they should, and let partners easily query the ledger to check for particular types of data use. DeepMind also wants to enable its partners to run automated queries, effectively setting alarms that would be triggered if anything unusual took place. And, in time, it intends to give its partners the option of allowing others – such as individual patients or patient groups – to check how their data is being processed.
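As an illustration of the kind of query and alerting such an interface could support – the function names and ledger fields below are hypothetical, not part of any published interface – a partner might search entries by purpose, or flag any entry whose purpose isn't on an approved list:

```python
from typing import Iterable

def find_uses(ledger: Iterable[dict], purpose_keyword: str) -> list[dict]:
    """Return ledger entries whose recorded purpose mentions a keyword (illustrative)."""
    return [e for e in ledger if purpose_keyword.lower() in e["purpose"].lower()]

def unusual_activity_alert(ledger: Iterable[dict], approved_purposes: set[str]) -> list[dict]:
    """Flag any entry whose purpose isn't on the approved list – a toy 'alarm'."""
    return [e for e in ledger if e["purpose"] not in approved_purposes]

ledger = [
    {"record_id": "blood-test-123", "purpose": "AKI detection"},
    {"record_id": "blood-test-456", "purpose": "research"},
]
print(find_uses(ledger, "aki"))                         # entries used for AKI detection
print(unusual_activity_alert(ledger, {"AKI detection"}))  # the unapproved "research" use
```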
The challenges that lie ahead
Building this will be a major undertaking, but given the importance of the issue, those involved believe it's well worth the effort. Right now, three big technical challenges stand out.
No blind spots. For this to be provably trustworthy, it can't be possible for data use to take place without being logged in the ledger – otherwise the concept falls apart. As well as designing the logs to record the time, nature, and purpose of every interaction with data, DeepMind would also like to be able to prove that there's no other software secretly interacting with data in the background. Beyond logging every single data interaction in the ledger, the intention is to use formal methods, alongside code and data centre audits by experts, to prove that every data access by every piece of software in the data centre is captured by these logs. Work is also under way on ways to ensure the trustworthiness of the hardware these systems run on – itself an active area of research.
Different uses for different groups. The core implementation will be an interface that lets partners provably check, in real time, that they are only using patient data for approved purposes. If those partners wanted to extend that ability to others, such as patients or patient groups, there would be complex design questions to resolve.
A long list of logs might not be very useful to many patients, and some may prefer a consolidated summary or to rely on a trusted intermediary instead. Equally, a patient group may not be authorised to see identifiable data, which would mean enabling partners to provide some kind of system-wide information – for example, whether machine learning algorithms have been run on particular datasets – without inadvertently revealing patient data.
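One way to think about such a system-wide view – again a toy sketch with hypothetical field names, not a proposed design – is an aggregate summary that reports how often each dataset was used without exposing any patient identifiers:

```python
from collections import Counter

def dataset_usage_summary(ledger: list[dict]) -> dict[str, int]:
    """Aggregate ledger entries by dataset, exposing counts but no patient identifiers."""
    return dict(Counter(entry["dataset"] for entry in ledger))

ledger = [
    {"dataset": "renal-blood-tests", "record_id": "patient-123", "purpose": "AKI detection"},
    {"dataset": "renal-blood-tests", "record_id": "patient-456", "purpose": "AKI detection"},
]
# A patient group would see only that the dataset was used twice, not which patients.
print(dataset_usage_summary(ledger))  # {'renal-blood-tests': 2}
```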
Decentralised data and logs, without gaps. There's no single database of patient-identifiable data in the UK, so the process of care involves data moving back and forth between healthcare providers, IT systems, and even patient-controlled services such as wearable devices. A great deal of work is going into making these systems interoperable so they can work safely together, and it would be valuable for those standards to include auditability too, to avoid gaps where data becomes unauditable as it passes from one system to another.
This doesn't mean a data processor like DeepMind should see data or audit logs from other systems. Logs should remain decentralised, just like the data itself. Audit interoperability would simply provide additional assurance that this data can't be tampered with as it travels between systems.
This is a significant technical challenge, but it should be feasible. In particular, there's an emerging open standard for healthcare interoperability called FHIR, which could potentially be extended to include auditability in useful ways.
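Purely as an illustrative assumption about what such an extension might look like – this is not something defined by the FHIR standard or proposed by DeepMind – an audit record exchanged between systems could carry a hash of its own contents, so a receiving system can check it hasn't changed in transit:

```python
import hashlib, json

def with_ledger_hash(audit_event: dict) -> dict:
    """Attach a hash of the audit record as a hypothetical extension field,
    so a receiving system could verify the record hasn't changed in transit."""
    digest = hashlib.sha256(json.dumps(audit_event, sort_keys=True).encode()).hexdigest()
    return {**audit_event, "extension": [{"url": "urn:example:ledger-hash", "valueString": digest}]}

# A minimal AuditEvent-shaped record (fields are illustrative, not a complete FHIR resource).
event = {"resourceType": "AuditEvent", "action": "R", "recorded": "2017-03-09T10:00:00Z"}
print(with_ledger_hash(event))
```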