Synthetic data’s importance in financial innovation
November 2021
By Fengzi Li and Zakie Twainy
When synthetic information faithfully represents the real world, it creates substantive value. Take, for example, Miquela Sousa, who has, in many ways, come to embody internet success. In just a few short years, she has been named to Time’s list of the Most Influential People on the Internet, signed with CAA, earned an estimated $10M or more last year, and accrued more than three million followers on Instagram.
Miquela is a virtual human, which, to be clear, means she is not a human at all. She is synthetic, an avatar, and does not exist in the real world. However, her likeness is so close to reality that she serves the same purpose as any celebrity influencer would, wearing high-end clothes and generating beautiful images. Even better, she can be quickly customized to fit any clothing brand and can appear in São Paulo at 8:00 a.m. and in Hong Kong two hours later, something no human could do.
The field of technology that makes Miquela possible, synthetics, can also be used to build solutions across the financial space. Synthetic data is information created manually or artificially rather than generated by actual events. It reflects real-world data mathematically or statistically and can be created at scale, whenever and wherever it is needed. While synthetic data has been used since the 1990s, its usage has gained greater importance in financial institutions.
For example, an organization implementing a new fraud prevention system can rely on a synthetic data platform to create thousands of different synthetic fraudulent transactions to test the system and avoid any harm to actual clients. Before synthetic data, organizations gained the same valuable insights from real fraudulent transactions as they occurred. However, one missed instance of fraud can severely impact customer experience and brand reputation. Using synthetic data helps test systems more effectively and mitigates the risks associated with actual fraud.
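As a rough illustration of the fraud-testing idea above, the Python sketch below generates labeled synthetic transactions and measures how many of the synthetic frauds a simple rule catches. Everything in it is an assumption made for illustration: the `Transaction` fields, the amount and hour ranges, the 5% fraud rate, and the `flags_fraud` rule are hypothetical and do not describe any real fraud system or data platform.

```python
import random
from dataclasses import dataclass


@dataclass
class Transaction:
    amount: float   # transaction value (hypothetical units)
    hour: int       # hour of day, 0-23
    is_fraud: bool  # label is known because the record was generated


def make_transactions(n, fraud_rate=0.05, seed=0):
    """Create n labeled synthetic transactions from assumed distributions."""
    rng = random.Random(seed)  # seeded so the test data is reproducible
    txns = []
    for _ in range(n):
        if rng.random() < fraud_rate:
            # Assumed fraud pattern: mid-to-large amounts in the early hours.
            txns.append(Transaction(rng.uniform(500, 9000), rng.randint(0, 5), True))
        else:
            # Assumed normal pattern: small daytime purchases.
            txns.append(Transaction(rng.uniform(5, 500), rng.randint(8, 22), False))
    return txns


def flags_fraud(txn):
    """Hypothetical rule under test: large amount outside business hours."""
    return txn.amount > 1500 and txn.hour < 6


def recall(txns):
    """Share of synthetic frauds that the rule under test catches."""
    frauds = [t for t in txns if t.is_fraud]
    if not frauds:
        return 0.0
    return sum(flags_fraud(t) for t in frauds) / len(frauds)


if __name__ == "__main__":
    transactions = make_transactions(5000)
    print(f"Recall on synthetic fraud: {recall(transactions):.2f}")
```

Because every record is generated, the fraud label is known in advance, so missed cases can be counted without exposing any real client to harm.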
Financial institutions can use synthetic data in a variety of other ways:
Generating synthetic data is a rich and complex topic with many methods. At a very high level, there are deep learning algorithms such as generative adversarial networks (GANs), which learn from real-world data and generate similar data that has no attachment to the real records. There are also computer simulations, such as agent-based modeling, which build a model of how data is generated in reality and reproduce data through this model.
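The simulation approach can be sketched in a few lines of Python. The example below is a minimal agent-based model under stated assumptions: the `Customer` agent, its purchase probability, and its spending distribution are all hypothetical and are not fitted to any real data. A production model would calibrate such parameters against the source data.

```python
import random


class Customer:
    """A hypothetical agent with its own spending behaviour."""

    def __init__(self, customer_id, rng):
        self.customer_id = customer_id
        self.rng = rng
        # Assumed per-agent traits, drawn once when the agent is created.
        self.daily_purchase_prob = rng.uniform(0.2, 0.9)
        self.typical_amount = rng.uniform(10, 200)

    def step(self, day):
        """Return one transaction record for this day, or None."""
        if self.rng.random() > self.daily_purchase_prob:
            return None
        # Log-normal noise around the agent's typical amount (an assumption).
        amount = round(self.rng.lognormvariate(0, 0.5) * self.typical_amount, 2)
        return {"customer_id": self.customer_id, "day": day, "amount": amount}


def simulate(num_customers=100, num_days=30, seed=42):
    """Run the simulation and collect every generated record."""
    rng = random.Random(seed)
    customers = [Customer(i, rng) for i in range(num_customers)]
    records = []
    for day in range(num_days):
        for customer in customers:
            record = customer.step(day)
            if record is not None:
                records.append(record)
    return records


if __name__ == "__main__":
    data = simulate()
    print(f"Generated {len(data)} synthetic transactions")
```

No real record enters the process at any point; the data comes entirely from the modeled behaviour, so its realism depends on how well the model reflects reality.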
Following the creation of synthetic data, effective and expert evaluation processes become critical. The most important considerations when evaluating synthetic data providers are privacy, security, representation, and scalability. Synthetic data should be checked against the source to ensure that no real data has been leaked and that source data cannot be recovered through reverse engineering. For representation, the artificial data should preserve the patterns and statistical properties of the source data. Finally, the synthetic data platform should remain usable as an organization scales.
Generating synthetic data comes with several challenges. To start, the flexible nature of synthetic data may introduce hidden biases. Also, because replicating every necessary feature of real data can be complex, some of those features may be missed during the data generation process.7
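Two of these checks can be illustrated with a short Python sketch. It is a simplified assumption of what an evaluation might include, not a complete privacy or fidelity test: `leaked_rows` only finds exact copies of source rows and does not address reverse engineering, and `summary_gap` compares only the mean and standard deviation of a single numeric column. Both function names and the toy data are hypothetical.

```python
import statistics


def leaked_rows(source_rows, synthetic_rows):
    """Return synthetic rows that exactly duplicate a source row."""
    source_set = set(source_rows)
    return [row for row in synthetic_rows if row in source_set]


def summary_gap(source_values, synthetic_values):
    """Relative gap in mean and standard deviation between two columns."""

    def relative(a, b):
        # Fall back to the absolute gap when the source statistic is zero.
        return abs(a - b) / abs(a) if a else abs(b)

    return {
        "mean_gap": relative(statistics.mean(source_values),
                             statistics.mean(synthetic_values)),
        "stdev_gap": relative(statistics.stdev(source_values),
                              statistics.stdev(synthetic_values)),
    }


if __name__ == "__main__":
    # Toy (account_id, amount) rows, invented for this example.
    source = [(101, 25.0), (102, 40.0), (103, 55.0)]
    synthetic = [(901, 27.0), (902, 40.0), (102, 40.0)]

    print("Leaked rows:", leaked_rows(source, synthetic))
    print("Gaps:", summary_gap([amount for _, amount in source],
                               [amount for _, amount in synthetic]))
```

An empty leak list and small gaps are necessary but not sufficient. A fuller evaluation would also look at near-duplicates, correlations between columns, and resistance to re-identification.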
Synthetic data continues to gain momentum within the technology sector. Of the synthetic data companies funded in 2019, most are still at the seed stage of funding, with Series A the next most common stage.
Those with the most mature and robust funding, including MDClone and Datagen, which raised $41M and $22M respectively, focus primarily on the healthcare and computer vision spaces.8 Similarly, the industry continues to see rising interest from accelerators, including Y Combinator, which invested in Zumo Labs and Synth.9
Data is critical in driving success across the financial sector and beyond. As companies continue using collected information to make more informed business decisions, synthetic data is poised to be a catalyst for innovation. As BNY Mellon looks to adopt AI in a broad range of areas within the bank, synthetic data platforms show a number of potential benefits in keeping clients’ data safe and innovating to meet clients’ needs. We expect the synthetic data space to grow and become more sophisticated.