
Privacy and participation – technologies to open data for innovation

Very few organisations would have launched two new programmes of work by sharing a stage with the artist behind a hot pink puppet-robot hybrid that interrogates subjects about their data use. So the Open Data Institute’s (ODI) London Data Week launch of the Privacy Enhancing Technologies (PETs) and participatory data programmes as part of our live Canalside Chats series of events felt like a unique and possibly unexpected way to begin this dual journey. 

Alistair Gentry’s DoxBox trustbot is what the former ODI Data as Culture artist-in-residence refers to as ‘tech drag’, with the puppet and operator informing the public just what they are signing up to when they give away their data in the endless terms and conditions documents they must click through to use their social media apps of choice. “In the early days, DoxBox asked people what they thought about the information they’ve given away,” said Gentry at the Canalside Chat panel event. “Now it asks how they feel about that information. But it’s a really important difference, because we’re never asked how we feel about the data we surrender for the services we are using.” 

His explanation reflects some of the reasoning (not to mention inspiration) as to why our PETs and participatory data programmes are so relevant just now. PETs are tools and practices that can enable access to data that may otherwise be kept closed for reasons of privacy, commercial sensitivity or national security. Yes, the name is kind of cute, but these are important tools that are likely to become better known as AI’s rapid development butts heads with increasing public knowledge of, and yearning for, privacy around data about them.

Some PETs were in use long before the term was coined, including invisible ink and redaction, and the pixelation and voice obfuscation you might see on a documentary or news programme. More recently, attention has turned to more novel privacy technologies, which is exciting because they can have a lot of utility. One example is federated learning, a PET we have already done some work on. Federated learning flips the traditional way of training a machine learning algorithm: the data is kept at its source and the algorithm is sent to it, rather than all the data being collected and pooled in one place.
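To make that flip concrete, here is a minimal, hypothetical sketch of federated averaging on a toy linear-regression task. Everything in it – the three simulated clients, the model and the training parameters – is invented for illustration; real deployments add secure aggregation, differential privacy and far more sophisticated models.

```python
# Minimal sketch of federated averaging (FedAvg), for illustration only.
import numpy as np

rng = np.random.default_rng(0)

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One client's local training: the model visits the data,
    the raw data never leaves the client."""
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)  # gradient of mean squared error
        w -= lr * grad
    return w

# Three simulated clients (e.g. hospitals), each holding private data
# that is never pooled centrally.
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(3):
    X = rng.normal(size=(100, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=100)
    clients.append((X, y))

# Server loop: broadcast the global model, collect locally trained
# copies, and average them. Only model weights are exchanged.
global_w = np.zeros(2)
for _ in range(20):
    local_ws = [local_update(global_w, X, y) for X, y in clients]
    global_w = np.mean(local_ws, axis=0)

print("learned weights:", global_w)  # should approach [2.0, -1.0]
```

The point of the pattern is visible in the server loop: only model weights cross organisational boundaries, never the underlying records.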

Within this model, for example, constellations of hospitals have been able to train AI to identify conditions such as age-related macular degeneration. PETs mean that medical researchers can draw on much larger datasets, including scans of people’s eyes, to work on methods that prevent sight loss, potentially transforming people’s lives.

Our PETs programme is designed to explore how further safe and secure data use might be possible, and how it might lead to greater economic, social and environmental benefits. We know that PETs can deliver medical research at scale while respecting patient confidentiality, and that they offer ways to combat human trafficking: Enveil and DeliverFund, for example, have provided safe access to a large human-trafficking database without compromising the security and ownership of the underlying data.

Our participatory data programme is headed up by my fellow researcher Joe Massey and has been launched in collaboration with the Aapti Institute. Like the PETs programme, the work is supported by the Patrick J. McGovern Foundation – with the aim of enhancing societal impact, justice and equity. The programme seeks to bring about more participatory approaches to data collection, use and sharing; approaches that enable people to be more meaningfully involved in decision making. It is a logical progression of the ODI’s work over the last 10 years, including on Open Banking and data portability, as well as on more bottom-up data governance processes like data co-ops and data trusts.

Access to data helps us to tackle some of the biggest challenges facing society today, like the climate crisis, the increase in hereditary diseases and societal inequality. However, the majority of people affected by these issues are not meaningfully involved in how data about them is collected and used. Consent processes are often ineffective, leading people to withdraw their consent for uses of data about them.

For us, participatory data means more people being involved in the creation, maintenance, use and sharing of data. Examples include citizen science initiatives like Zooniverse, which enables people to take part in real, cutting-edge research across the sciences and humanities. Technical solutions like Solid Pods enable people to determine who can access data about them, and data cooperatives like POSMO coop enable people to be more involved in the governance of data. At the policy level, people can help determine the ‘rules of the game’ via methods like public consultations or citizens’ juries – such as the Camden Data Charter, which guides how the council collects, processes and shares data ethically in Camden.

Participatory data is about more than people taking part in the data ecosystem, though. That alone is not going to suddenly shift any imbalance in power. There are a lot of other levers that need to be pulled to ensure a more equitable world, including the regulatory and technological approaches present in our work on PETs. A really interesting example that we heard about at our annual summit last year is a data licence put together by the Māori in New Zealand, setting out how people are able to access and use Māori data.

The more that people are engaged and involved, the more innovative schemes like this we will see, with the people and communities who create data at their heart. Both our new programmes look to address power imbalances in their own ways, while still maintaining the ability of businesses, governments, charities, campaigners and individuals to innovate, progress, inform and create. 

We acknowledge that there is a risk that certain PETs could be used to increase surveillance, which is relevant to current discussions about back doors into messaging apps under the UK’s Online Safety Bill. There is also a concern that broader data access could let online marketers hone their personalised ads further, without users knowing how or why this increased accuracy is occurring. But we firmly believe that enabling people to have a greater say – and more involvement – will have a positive impact overall. Greater engagement and trust in privacy will lead to greater knowledge, meaning that people will be able to make increasingly informed decisions about how data about them is used, whether that is for the greater good or simply in exchange for signing up to the latest social media app.

If you have an interest in either of these key topics for future data use, do not hesitate to get in touch with us at the ODI, whether you are an individual with an interest or a business that would like to speak with us about our research.

Calum Inverarity is a Senior Researcher at the Open Data Institute.