A Copy Worth More Than the Original? 

Synthetic data’s importance in financial innovation

November 2021

By Fengzi Li and Zakie Twainy


The Value of Synthetic Data


When synthetic information faithfully represents the real world, it creates substantive value. Take, for example, Miquela Sousa, who has in many ways come to embody internet success. In just a few short years, she has been named to Time’s Most Influential People on the Internet list,1 signed with CAA, earned an estimated $10M or more last year,2 and accrued more than three million followers on Instagram.3


Miquela is a virtual human, which, to be clear, means she is not a human at all. She is synthetic, an avatar, and does not exist in the real world. However, her likeness is so close to reality that she serves the same purpose as any celebrity influencer would, wearing high-end clothes and generating beautiful images. Even better, she can be quickly customized to fit any clothing brand and can easily appear in São Paulo at 8:00 a.m. and in Hong Kong two hours later, all outside the realm of human possibility.


The same sector of technology that makes Miquela possible, synthetics, can also be used to build solutions across the financial space. Synthetic data is information created manually or artificially rather than generated by actual events. It reflects real-world data mathematically or statistically and can be created at scale, whenever and wherever it is needed. While synthetic data has been used since the 1990s,4 its usage has gained greater importance in financial institutions.


For example, an organization implementing a new fraud prevention system can rely on a synthetic data platform to create thousands of different synthetic fraudulent transactions to test the system without any harm to actual clients. Prior to synthetic data, organizations gained the same valuable insights from real fraudulent transactions as they occurred. However, one missed instance of fraud can severely impact customer experience and brand reputation. Using synthetic data helps test systems more effectively and mitigates the risks associated with actual fraud.
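As a rough illustration of the fraud-testing idea above, the sketch below generates labeled synthetic transactions and runs them through a simple detection rule. Everything in it is an assumption made for illustration: the field names, the amount and hour ranges, and the `flag_suspicious` rule are hypothetical and do not describe any particular vendor platform or production fraud model.

```python
import random

def make_transactions(n, fraud_rate, seed=0):
    """Generate n labeled synthetic transactions (illustrative fields only)."""
    rng = random.Random(seed)
    transactions = []
    for tx_id in range(n):
        is_fraud = rng.random() < fraud_rate
        if is_fraud:
            # Assumed fraud pattern: wide range of amounts, early-morning hours.
            amount = round(rng.uniform(200, 10_000), 2)
            hour = rng.randint(0, 5)
        else:
            # Assumed normal pattern: small daytime purchases.
            amount = round(rng.uniform(5, 500), 2)
            hour = rng.randint(8, 22)
        transactions.append(
            {"id": tx_id, "amount": amount, "hour": hour, "is_fraud": is_fraud}
        )
    return transactions

def flag_suspicious(tx, amount_limit=1_000):
    """Hypothetical rule under test: flag large payments made before 6 a.m."""
    return tx["amount"] > amount_limit and tx["hour"] < 6

def evaluate(transactions):
    """Return (detection rate on fraud, false-positive rate on legitimate)."""
    frauds = [tx for tx in transactions if tx["is_fraud"]]
    legit = [tx for tx in transactions if not tx["is_fraud"]]
    detection = sum(flag_suspicious(tx) for tx in frauds) / max(len(frauds), 1)
    false_pos = sum(flag_suspicious(tx) for tx in legit) / max(len(legit), 1)
    return detection, false_pos

transactions = make_transactions(n=5_000, fraud_rate=0.02)
detection, false_pos = evaluate(transactions)
print(f"detection rate: {detection:.1%}, false positives: {false_pos:.1%}")
```

Because every record carries a known label, the rule's misses (here, synthetic fraud below the amount limit) show up immediately, and no real client transaction is involved at any point.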


Financial Use Cases


Financial institutions can use synthetic data in a variety of other ways:

  • Testing Third-Party Technology. Financial institutions seeking to work with a third-party provider on new solutions often hit a roadblock: they cannot share the private or confidential data needed to test the new platform adequately. Synthetic data can be used to validate new external technologies before committing to larger implementations.

  • Meeting Privacy & Compliance Requirements. Financial institutions continue to be challenged to comply with new and shifting data security regulations that govern how they use and protect client data, including the General Data Protection Regulation (GDPR) in the European Union and the California Consumer Privacy Act (CCPA). Synthetic data can help institutions build out new internal platforms without using actual client data, so they can confirm that a potential solution will scale without incurring privacy-related risk.

  • Platform Beta Testing. Prior to a product launch, protocols often call for testing the platform while it is still in development. However, the requisite data may be scarce for a new platform, making it difficult to test product performance and capabilities. Synthetic data can fill this gap.

  • AI Training Data. Artificial intelligence and machine learning algorithms must process a large amount of data to produce robust and reliable models. Synthetic data can be used to generate a sufficient dataset that may otherwise be difficult to obtain.

Understanding Synthetic Data Creation and Evaluation 


Generating synthetic data is a rich and complex topic, with many methods. At a very high level, deep learning algorithms such as generative adversarial networks (GANs) learn from real-world data and produce similar data that has no attachment to the real records. Computer simulations, such as agent-based modeling, instead build a model of how data is generated in reality and reproduce data through that model.
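To make the simulation approach more concrete, here is a minimal agent-based sketch. Each agent is a hypothetical customer who follows simple behavioral rules, and the transaction ledger is a by-product of running those rules over time. The `Customer` fields, the salary amount, and the spending distribution are all assumptions chosen for illustration, not a calibrated model of real client behavior. A GAN-based generator would need a deep learning framework and real training data, so it is not sketched here.

```python
import random
from dataclasses import dataclass

@dataclass
class Customer:
    """Hypothetical agent; all fields are illustrative assumptions."""
    customer_id: int
    balance_cents: int
    spend_prob: float         # chance of a purchase on any given day
    typical_spend_cents: int  # mean purchase size

SALARY_CENTS = 300_000  # assumed monthly pay, credited every 30 days

def simulate(customers, days, seed=0):
    """Step every agent through each day and log the transactions produced."""
    rng = random.Random(seed)
    ledger = []  # rows of (day, customer_id, kind, amount_cents)
    for day in range(days):
        for c in customers:
            if day % 30 == 0:
                c.balance_cents += SALARY_CENTS
                ledger.append((day, c.customer_id, "salary", SALARY_CENTS))
            if rng.random() < c.spend_prob:
                # Purchase sizes drawn from an exponential distribution.
                wanted = int(rng.expovariate(1 / c.typical_spend_cents))
                amount = min(wanted, c.balance_cents)  # agents cannot overdraw
                if amount > 0:
                    c.balance_cents -= amount
                    ledger.append((day, c.customer_id, "purchase", -amount))
    return ledger

customers = [
    Customer(customer_id=i, balance_cents=50_000,
             spend_prob=0.6, typical_spend_cents=4_000)
    for i in range(100)
]
ledger = simulate(customers, days=90)
print(f"{len(ledger)} synthetic transactions from {len(customers)} agents")
```

No real record is ever read in this approach, so nothing can leak from a source dataset. The trade-off is that the output is only as realistic as the behavioral rules the modeler writes down.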


Once synthetic data has been created, effective and expert evaluation becomes critical. The most important considerations when evaluating synthetic data providers are privacy, security, representation, and scalability. For privacy and security, synthetic data should be checked against the source to ensure that no real data has leaked and that source data cannot be recovered through reverse engineering. For representation, the artificial data should preserve the patterns and statistical properties of the source data. Finally, for scalability, the synthetic data platform should remain usable as an organization scales.
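Two of these checks, leakage and representation, can be sketched in a few lines. The example below is a simplified illustration under stated assumptions: records are plain tuples, "leakage" means only an exact copy of a source row, and "representation" is reduced to comparing the mean and standard deviation of one numeric column. The helper names `leaked_records` and `fidelity_gap` are hypothetical. A real evaluation would go further, for instance by testing for near-duplicates, comparing correlations between columns, and probing resistance to reverse engineering.

```python
import random
import statistics

def leaked_records(source, synthetic):
    """Return synthetic rows that exactly duplicate a source row."""
    source_set = set(source)
    return [row for row in synthetic if row in source_set]

def fidelity_gap(source, synthetic, column):
    """Relative gap in mean and standard deviation for one numeric column."""
    src = [row[column] for row in source]
    syn = [row[column] for row in synthetic]
    src_mean, src_std = statistics.mean(src), statistics.stdev(src)
    mean_gap = abs(statistics.mean(syn) - src_mean) / abs(src_mean)
    std_gap = abs(statistics.stdev(syn) - src_std) / src_std
    return mean_gap, std_gap

# Stand-in "real" data: (age, account balance) rows, invented for this demo.
rng = random.Random(0)
source = [(rng.randint(18, 90), round(rng.gauss(5_000, 1_500), 2))
          for _ in range(1_000)]

# Stand-in generator: sample balances from a normal fit to the source.
balances = [row[1] for row in source]
mu, sigma = statistics.mean(balances), statistics.stdev(balances)
synthetic = [(rng.randint(18, 90), round(rng.gauss(mu, sigma), 2))
             for _ in range(1_000)]
synthetic.append(source[0])  # plant one copied row so the check has a hit

leaks = leaked_records(source, synthetic)
mean_gap, std_gap = fidelity_gap(source, synthetic, column=1)
print(f"leaked rows: {len(leaks)}")
print(f"mean gap: {mean_gap:.2%}, std gap: {std_gap:.2%}")
```

The planted row is there only to show the leakage check firing. In practice, any hit would be a reason to reject or regenerate the synthetic dataset.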

Generating synthetic data can come with several challenges. To start, the flexible nature of synthetic data may introduce hidden biases. Also, because replicating every necessary feature of real data can be complex, some of those features may be lost during the generation process.7


Outlook: The Synthetic Data Sector


Synthetic data continues to gain momentum within the technology sector. Of the companies focused on synthetic data that were funded in 2019, a majority are still in the seed stage of funding, followed by Series A.

Source: Crunchbase query of companies labeled “synthetic data” and had funding rounds in 2019, 2020, and 2021 (September 27, 2021)


Those with the most mature and robust funding, including MDClone and Datagen, which raised $41M and $22M respectively, focus primarily on the healthcare and computer vision spaces.8 Similarly, the industry continues to see rising interest from accelerators, including Y Combinator, which invested in Zumo Labs and Synth.9

Source: Crunchbase query of companies labeled “synthetic data” and had funding rounds in 2019, 2020, and 2021 (September 27, 2021)


Looking Ahead


Data is critical in driving success across the financial sector and beyond. As companies continue to use collected information to make more informed business decisions, synthetic data is poised to be a catalyst for innovation. As BNY Mellon looks to adopt AI in a broad range of areas within the bank, synthetic data platforms show a number of potential benefits in keeping clients’ data protected and innovating to meet clients’ needs. We expect the synthetic data space to grow and become more sophisticated.

1 The 25 Most Influential People on the Internet. The New York Times. June 30, 2018.
2 The Problematic Fakery Of Lil Miquela Explained—An Exploration Of Virtual Influencers and Realness. Forbes. May 17, 2020.
3 As of 9/17/21 on Instagram profile @lilmiquela
4 Synthetic Data — key benefits, types, generation methods, and challenges. Towards Data Science. May 12, 2021.
5 Top 20 Synthetic Data Use Cases & Applications in 2021. AI Multiple. September 20, 2021.
6 Overcoming Data Scarcity and Privacy Challenges with Synthetic Data. InfoQ. December 25, 2020.
7 Synthetic Data — key benefits, types, generation methods, and challenges. Towards Data Science. May 12, 2021.
8 Crunchbase query of companies labeled “synthetic data” and had funding rounds in 2019, 2020, and 2021 (September 27, 2021)
9 Crunchbase query of companies labeled “synthetic data” and had funding rounds in 2019, 2020, and 2021 (September 27, 2021)


BNY Mellon is the corporate brand of The Bank of New York Mellon Corporation and may be used to reference the corporation as a whole and/or its various subsidiaries generally.  This material does not constitute a recommendation by BNY Mellon of any kind.  The information herein is not intended to provide tax, legal, investment, accounting, financial or other professional advice on any matter, and should not be used or relied upon as such.  The views expressed within this material are those of the contributors and not necessarily those of BNY Mellon.  BNY Mellon has not independently verified the information contained in this material and makes no representation as to the accuracy, completeness, timeliness, merchantability or fitness for a specific purpose of the information provided in this material.  BNY Mellon assumes no direct or consequential liability for any errors in or reliance upon this material.
BNY Mellon will not be responsible for updating any information contained within this material and opinions and information contained herein are subject to change without notice.
This material may not be reproduced or disseminated in any form without the prior written permission of BNY Mellon. Trademarks, logos and other intellectual property marks belong to their respective owners.
© 2021 The Bank of New York Mellon Corporation.  All rights reserved.