CIP Part V blog

This blog continues our series of insights into the Cyber, Identity and Privacy (CIP) sector of our RegTech Taxonomy. Previously, we looked at the importance of identity in a crisis, explored the intersection between Cyber Security and Data Privacy, examined the flurry of M&A activity in this space, and investigated the often-missed opportunity relating to Financial Market Infrastructure providers.

Here we take a deeper dive into data privacy, and in particular into how firms can ensure compliance with the General Data Protection Regulation (GDPR) whilst leveraging large data sets for commercial value through synthetic data technology. Privacy-preserving synthetic data can provide a safer working and testing environment than real data, removing blockers to data use and innovation whilst substantially reducing cyber risk and data management complexity.

A business and regulatory challenge

Data has fast become the most valuable commodity on earth. Some companies, such as Facebook, have become extraordinarily successful on the strength of the data they hold, using it to generate valuable insights about customers and partners and to develop new solutions. Financial services organisations, however, collect and handle vast amounts of highly sensitive personally identifiable information (PII) and face a minefield of data management challenges.

In regulated industries, there is a difficult balance to be struck between using data to generate revenue or save costs and maintaining the strict security and privacy standards needed to comply with stringent regulations such as the GDPR and the California Consumer Privacy Act (CCPA). Synthetic data, often described as a ‘privacy-preserving technology’, opens up new possibilities here. Recital 26 of the GDPR states that the principles of data protection should not apply to anonymous information, namely information that does not relate to an identified or identifiable natural person, and that “this Regulation does not therefore concern the processing of such anonymous information, including for statistical or research purposes”. Synthetic data that genuinely cannot be linked back to real individuals may therefore fall outside the scope of the regulation.

Regulated firms can therefore use insights derived from sensitive data to generate business outcomes and to train cutting-edge technologies, such as machine learning and artificial intelligence algorithms, without facing the same compliance, security and privacy risks as when working with the original data.

Previous options = high risk

Anonymised data might seem to offer a solution. In practice, however, so-called anonymised data sets often have only some PII removed, leaving other identifying attributes in place. If such a data set were breached, the risk might appear lower, but a malicious party could reverse engineer it and re-identify individuals, for example by linking the remaining attributes with other data sources.
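To make the re-identification risk concrete, the following minimal Python sketch performs a simple linkage attack. It joins a hypothetical "anonymised" data set, from which names have been removed but postcode, date of birth and sex remain, against a hypothetical auxiliary list that an attacker might already hold. All records, field names and the `link` helper are invented purely for illustration.

```python
from collections import Counter

# Attributes that are not direct identifiers but can single people out in combination.
QUASI_IDENTIFIERS = ("postcode", "birth_date", "sex")

# Hypothetical "anonymised" records: names removed, quasi-identifiers retained.
anonymised_records = [
    {"postcode": "EC2A 1AA", "birth_date": "1985-03-14", "sex": "F", "balance": 12500},
    {"postcode": "SW1A 2BB", "birth_date": "1990-07-01", "sex": "M", "balance": 830},
    {"postcode": "EC2A 1AA", "birth_date": "1972-11-30", "sex": "M", "balance": 54000},
]

# Hypothetical auxiliary list an attacker might hold, e.g. a purchased marketing list.
auxiliary_records = [
    {"name": "Alice Example", "postcode": "EC2A 1AA", "birth_date": "1985-03-14", "sex": "F"},
    {"name": "Bob Example", "postcode": "SW1A 2BB", "birth_date": "1990-07-01", "sex": "M"},
]


def quasi_key(record):
    """Combine the quasi-identifier fields into a single lookup key."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(anonymised, auxiliary):
    """Re-identify anonymised records whose quasi-identifiers match exactly one named person."""
    names_by_key = {}
    for person in auxiliary:
        names_by_key.setdefault(quasi_key(person), []).append(person["name"])
    key_counts = Counter(quasi_key(record) for record in anonymised)
    reidentified = {}
    for record in anonymised:
        key = quasi_key(record)
        names = names_by_key.get(key, [])
        # Only an unambiguous match (unique in both data sets) counts as re-identification.
        if len(names) == 1 and key_counts[key] == 1:
            reidentified[names[0]] = record["balance"]
    return reidentified


print(link(anonymised_records, auxiliary_records))
# → {'Alice Example': 12500, 'Bob Example': 830}
```

Two of the three "anonymised" customers are re-identified along with their balances, even though no name appeared in the data set. Combinations of ordinary attributes such as postcode, date of birth and sex are frequently unique to one person, which is why removing direct identifiers alone is often not enough.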

The alternative is to use raw data, which includes all relevant PII and comes with clear additional risk. If this were compromised, significant harm could be caused to businesses and their underlying customers. Because of strict regulations, terms within consumer contracts and best practice guidelines, the use of raw data is reserved for only the most critical purposes.

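Synthetic data addresses a number of these challenges at once. Rather than masking or lightly altering real records, a synthetic data generator learns the statistical patterns in the original data set and produces new, artificial records that follow the same patterns. Because no synthetic record corresponds to a real customer or transaction, consumer and transaction details are protected, and a well-designed generator makes it extremely difficult to re-engineer the original data set. A properly synthesised data set contains no PII, yet preserves enough of the original's statistical properties to serve as a close substitute when looking for revenue opportunities or areas for cost reduction.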

Synthetic data use cases

Synthetic data can unlock actionable insights from information that has, until now, been locked away, including:

  • analysing the spending of small business users; 
  • customer preference segmentation for marketing purposes;
  • identifying new fraud detection trends; or 
  • accelerating customer loan decisions. 

Additionally, synthetic data can be used to train artificial intelligence engines and machine learning algorithms safely, helping firms create new products, improve efficiency, reduce operational costs and innovate by leveraging new business insights.

Synthetic data also strengthens internal security and limits the impact of a data breach. According to Verizon’s 2019 Data Breach Investigations Report, one of the top data security risks for financial institutions is privileged misuse. Because staff work with synthetic rather than real customer records, even a privileged insider who misuses the data, or an attacker who steals it, obtains nothing that identifies a real customer. Financial institutions can therefore deploy previously inaccessible data pools to enhance offerings and processes whilst greatly reducing security and privacy risks.

“While financial organisations see large potential in the adoption of AI, regulatory hurdles in using the required data serve as a major bottleneck. Synthetic data serves as the perfect alternative to enable data-driven innovation in a scalable and efficient manner while minimizing the risk of de-identification. The granular nature of synthetic data makes it the perfect resource of AI development and research.” – Sebastian Weyer, CEO Statice

Conclusion – Greenfield opportunity

High-quality synthetic data can be relevant and robust enough for financial institutions (FIs) to monetise, whilst helping organisations adhere to the highest security, privacy and compliance standards. Early adopters can gain insight into where new opportunities lie, and with it a competitive edge from what is fast becoming the world’s most valuable commodity and a key differentiator. Compared with other industries, financial services has barely scratched the surface of the business opportunities that synthetic data enables.

The automotive sector has leveraged synthetic data to improve safety in the training of autonomous vehicles. In medical environments, professionals have begun using synthetic patient data to enable comprehensive research, resulting in the identification of symptoms and treatment of conditions sooner than would otherwise be possible. 

Industry experts and innovators are already recognising how valuable synthetic data can be to business success, and we expect this sector of the market to take off in the coming months.

We are currently working with regulated institutions, advising them on the best ways to evaluate and use synthetic data across a variety of use cases. If you would like to speak to us, please drop us an email at
