Synthetic Data’s Importance in Financial Innovation
When synthetic information can faithfully represent the real world, it creates substantive value. Take, for example, Miquela Sousa, who has in many ways come to embody internet success. In just a few short years, she has been named to TIME’s Most Influential People on the Internet list, signed with CAA, earned an estimated $10M or more last year, and accrued more than three million followers on Instagram.
Miquela is a virtual human, which, to be clear, means she is not a human at all. She is synthetic, an avatar, and does not exist in the real world. However, her likeness is so close to reality that she serves the same purpose as any celebrity influencer would, wearing high-end clothes and generating beautiful images. Even better, she can be quickly customized to fit any clothing brand and can easily appear in São Paulo at 8:00 a.m. and in Hong Kong two hours later, which is well outside a human’s realm of possibility.
The same field of technology that makes Miquela possible, synthetics, can also be used to build solutions across the financial space. Synthetic data is information that is created manually or artificially rather than generated by actual events. It reflects real-world data mathematically or statistically and can be created at scale, whenever and wherever it is needed. While synthetic data has been used since the 1990s, its use has gained greater importance in financial institutions.
For example, an organization implementing a new fraud prevention system can rely on a synthetic data platform to create thousands of different synthetic fraudulent transactions, testing the system without any harm to actual clients. Before synthetic data, organizations gained the same insights from real fraudulent transactions as they occurred. However, one missed instance of fraud can severely damage customer experience and brand reputation. Using synthetic data tests systems more effectively and mitigates the risks associated with actual fraud.
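The fraud-testing idea above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a real synthetic data platform: the `Transaction` fields, the amount and hour ranges, and `toy_detector` are all hypothetical stand-ins chosen for the example.

```python
import random
from dataclasses import dataclass


@dataclass
class Transaction:
    amount: float
    hour: int        # hour of day, 0-23
    is_fraud: bool   # ground-truth label, known because we generated the record


def make_synthetic_transactions(n, fraud_rate=0.1, seed=0):
    """Generate n labeled synthetic transactions (toy distributions, not calibrated)."""
    rng = random.Random(seed)
    transactions = []
    for _ in range(n):
        if rng.random() < fraud_rate:
            # Assumed fraud pattern: large amounts in the early-morning hours.
            transactions.append(
                Transaction(rng.uniform(2_000, 10_000), rng.randint(0, 5), True)
            )
        else:
            transactions.append(
                Transaction(rng.uniform(5, 500), rng.randint(8, 22), False)
            )
    return transactions


def toy_detector(tx):
    """Hypothetical stand-in for the fraud prevention system under test."""
    return tx.amount > 3_000 and tx.hour < 6


def recall(transactions, detector):
    """Share of the synthetic fraud cases that the detector flags."""
    frauds = [tx for tx in transactions if tx.is_fraud]
    if not frauds:
        return 0.0
    return sum(detector(tx) for tx in frauds) / len(frauds)


transactions = make_synthetic_transactions(5_000)
fraud_recall = recall(transactions, toy_detector)
false_alarms = sum(toy_detector(tx) for tx in transactions if not tx.is_fraud)
print(f"recall on synthetic fraud: {fraud_recall:.2%}, false alarms: {false_alarms}")
```

Because every record is generated, the ground-truth label is known for each one, so missed fraud shows up as a number in a test report rather than as a loss to a real client.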
Financial institutions can use synthetic data in a variety of other ways:
Generating synthetic data is a rich and complex topic, with many methods. At a very high level, there are deep learning algorithms such as generative adversarial networks (GANs), which learn from real-world data and produce similar data that has no attachment to the real records. There are also computer simulations, such as agent-based modeling, which build a model of how data is generated in reality and reproduce data through that model.
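To make the simulation approach concrete, here is a small agent-based sketch in Python. It illustrates only the second method (simulation), not a GAN, and every behavioral rule in it is an assumption made for the example: the `CustomerAgent` class, the monthly salary rule, and the spending distribution are hypothetical rather than drawn from any real institution's data.

```python
import random


class CustomerAgent:
    """A simulated customer with a simple, assumed spending behavior."""

    def __init__(self, customer_id, rng):
        self.customer_id = customer_id
        self.rng = rng
        self.balance = rng.uniform(500, 5_000)
        self.daily_spend_prob = rng.uniform(0.2, 0.9)
        self.typical_amount = rng.uniform(10, 200)

    def step(self, day):
        """Advance one simulated day and return the transactions produced."""
        records = []
        if day % 30 == 0:
            # Assumed rule: a salary deposit every 30 days.
            salary = self.typical_amount * 20
            self.balance += salary
            records.append((day, self.customer_id, "salary", salary))
        if self.rng.random() < self.daily_spend_prob:
            # Purchase size is exponentially distributed, capped by the balance.
            amount = min(self.balance, self.rng.expovariate(1 / self.typical_amount))
            if amount > 0:
                self.balance -= amount
                records.append((day, self.customer_id, "purchase", -amount))
        return records


def simulate(n_customers=50, n_days=90, seed=42):
    """Run all agents day by day and collect a synthetic transaction ledger."""
    rng = random.Random(seed)
    agents = [CustomerAgent(i, rng) for i in range(n_customers)]
    ledger = []
    for day in range(n_days):
        for agent in agents:
            ledger.extend(agent.step(day))
    return ledger


ledger = simulate()
print(f"{len(ledger)} synthetic transactions generated")
```

No real record is ever read here: the data comes entirely from the modeled behavior, so its realism depends on how well those rules reflect how customers actually behave.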
Following the creation of synthetic data, effective and expert evaluation processes become critical. The most important considerations when evaluating synthetic data providers are privacy, security, representation, and scalability. For privacy and security, it is important that synthetic data be checked against the source to ensure that no real data has leaked and that the source data cannot be recovered through reverse engineering. For representation, it is important that the artificial data preserves the patterns and statistical properties of the source data. Finally, for scalability, the synthetic data platform must remain usable as an organization scales.
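Two of these checks, leakage and representation, can be sketched in Python. This is a first-pass illustration only, under clear assumptions: the tiny datasets are made up, the helper names `leaked_rows` and `relative_gap` are hypothetical, and an exact-match test cannot by itself show that source data is safe from reverse engineering.

```python
import statistics


def leaked_rows(source_rows, synthetic_rows):
    """Return the synthetic rows that are exact copies of a source row."""
    source_set = set(source_rows)
    return [row for row in synthetic_rows if row in source_set]


def relative_gap(source_values, synthetic_values, stat):
    """Relative difference in one summary statistic between the two datasets.

    Assumes the statistic is nonzero on the source data.
    """
    source_stat = stat(source_values)
    synthetic_stat = stat(synthetic_values)
    return abs(source_stat - synthetic_stat) / abs(source_stat)


# Tiny made-up datasets of (amount, merchant_category) rows, for illustration only.
source = [(12.50, "grocery"), (80.00, "fuel"), (45.25, "grocery"), (300.00, "travel")]
synthetic = [(14.10, "grocery"), (80.00, "fuel"), (41.90, "grocery"), (310.00, "travel")]

# Privacy check: any verbatim copy of a real record is a leak.
leaks = leaked_rows(source, synthetic)

# Representation check: compare simple statistics of the amount column.
source_amounts = [amount for amount, _ in source]
synthetic_amounts = [amount for amount, _ in synthetic]
mean_gap = relative_gap(source_amounts, synthetic_amounts, statistics.mean)
stdev_gap = relative_gap(source_amounts, synthetic_amounts, statistics.stdev)

print(f"leaked rows: {leaks}")
print(f"mean gap: {mean_gap:.1%}, stdev gap: {stdev_gap:.1%}")
```

In this made-up example one synthetic row is a verbatim copy of a source row, which is exactly what the leak check is meant to surface. A fuller evaluation would also compare distributions and correlations across columns, not just the mean and standard deviation of one.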