Big Data and Investment Management

April 2015

Executive Summary

As concepts go, Big Data is one from which it currently seems difficult to escape. The term refers to collections of data sets so large and complex that they resist processing or management using traditional methods. More recently, the term has also come to encompass the specialised analytical techniques and computing tools used to analyse these data sets. Today’s computing tools can quickly interrogate huge data sets, revealing hitherto untapped trends, patterns and correlations from which new insights and predictions can be extrapolated about current and future purchasing behaviour or, in this case, investor needs.

Volume, variety, veracity, velocity – these are all characteristics of Big Data. The quantity, speed and diversity of information flows continue to expand at a geometric rate, further swelling the pools of available data to be analysed and acted upon.1

As the lines between front, middle and back office continue to blur, smarter data management is essential for effective fund management. Big Data facilitates that, but it also poses challenges. By understanding these opportunities and potential obstacles, the investment management industry can use its own data to design, manufacture and market solutions more effectively, with a view to generating outcomes that are better aligned with investor expectations. Optimising portfolio construction may itself bring further benefits: mining data patterns generates new insights, which in turn can feed further improvements.

Supermarkets have developed sophisticated data-driven profiling tools. The investment management industry already possesses similar transactional or investment data. Today, ‘social data’ offers a further window into consumer behaviour and bias and, if it can be harnessed, should be treated as a valuable additional data pool. Identifying and aligning correlations across the transactional and social data sets has the potential to generate better and/or more appropriate outcomes for investors.

The paper that follows aims to put history and context around Big Data; to examine its credentials and potential as a transformative tool in the current era of shrinking margins and ever more sophisticated and powerful analytical tools; and to address the question of how it can be used to solve the complex and commercially critical issue of enhancing sales performance and client satisfaction.

Introduction – A Brief History of Data

In August 1854, Dr John Snow, a Yorkshire-born surgeon living and working in London, witnessed the outbreak of a severe cholera epidemic in the city’s Soho district. At that time, the accepted view was that the disease was transmitted through the air, by ‘miasma’ or bad air. Snow had another theory: he claimed that the disease entered the human body directly by mouth. He explained that the 1854 outbreak was a direct result of sewage contaminating the water supply drawn from the public pump in nearby Broad Street:

“…There are no sewers… and the refuse of all kinds consequently saturates into the ground in which the [water] pipes are laid. I found that the water collected by the people, after throwing away the first portion, still contained more organic matter than that supplied to the adjoining streets.”2

Through his analysis of the pattern of cases in the streets surrounding the Broad Street pump, Snow persuaded the local authorities to remove the pump’s handle, an act credited with saving many lives. His theories on cholera, though not widely accepted in his lifetime, went on to save countless more, and Snow is now regarded as one of the founders of modern epidemiology.

Can we adapt Snow’s approach and embrace Big Data to understand the patterns, trends, causes and effects in the pools of data that have accumulated all around us? As with Dr Snow, the patterns and correlations in investment data may lead us to conclude that specific changes in how portfolios are designed can drive more effective fund sales linked to improved consumer outcomes.

The abridged story of data, and of Big Data in particular, begins with the design of the database itself. A database is an arrangement of information; it is information architecture. Big Data’s visionary forebears are as important to that architecture as Le Corbusier, van der Rohe and Foster are to the design and development of the built environment. But the technology geeks are far less well known.

In 1970, IBM’s E. F. Codd3 was the first to define the relational model of data, which enabled computer scientists to see inside dark pools of data by exploiting tuples. A tuple is an ordered list of values; in a relational database each tuple is a row, each of its values belongs to a named attribute (a column), and a set of tuples sharing the same attributes forms a relation (a table). Suddenly data could be organised in tables that people could readily understand and query. A very unlikely Prometheus, Codd snatched a new power from the gods and gave it to humans: the ability to organise and make sense of complex data. Relational databases grew to become the industry standard, powered further by subsequent gains in the speed and accessibility of computer processing. These new capabilities triggered, almost by accident, the routine collection of huge pools of data by corporations and governments, often as unexpected by-products of mundane daily tasks and of compliance requirements to retain information.
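To make the terminology concrete, here is a minimal Python sketch of a relation and two basic operations of relational algebra. It assumes a toy table of fund holdings; the `Holding` type, the fund names and the weights are invented for illustration. A relation is modelled as a set of named tuples, selection filters rows and projection keeps columns. Real relational databases express the same operations declaratively, for example in SQL.

```python
from typing import NamedTuple


# Each row of the relation is a tuple whose values belong to named attributes (columns).
# The fund names and weights below are invented purely for illustration.
class Holding(NamedTuple):
    fund: str
    asset_class: str
    weight: float


# A relation (table) is a set of tuples that share the same attributes.
holdings = {
    Holding("Fund A", "Equity", 0.60),
    Holding("Fund A", "Bonds", 0.40),
    Holding("Fund B", "Equity", 0.30),
    Holding("Fund B", "Bonds", 0.70),
}


def select(relation, predicate):
    """Selection: keep only the tuples (rows) that satisfy a condition."""
    return {row for row in relation if predicate(row)}


def project(relation, *attributes):
    """Projection: keep only the named attributes (columns); duplicates collapse in the set."""
    return {tuple(getattr(row, a) for a in attributes) for row in relation}


equity_rows = select(holdings, lambda r: r.asset_class == "Equity")
print(sorted(project(equity_rows, "fund", "weight")))
# [('Fund A', 0.6), ('Fund B', 0.3)]
```

Because a relation is a set, projecting on `fund` alone returns just two tuples, one per fund, which mirrors how the relational model treats duplicate rows.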

In the last decade, database technology has rapidly developed fast and reliable new capabilities such as data cubes, business intelligence tools and high-speed data transfer pipes. Now there are Google’s MapReduce and Apache Hadoop, the open-source framework it inspired: programming frameworks that take the question to the data, splitting the work into many small tasks that run in parallel on the machines where the data is stored, so that entire databases are processed as slender strands, like spaghetti twisting on a fork. Looking inside data gets faster and easier. So why use a small sample when you can interrogate all the available data, right now? This relentless process of data evolution has enabled the machine to nose just in front of us. Databases now provide high-quality analysis, using fast, reliable queries across multiple dimensions, as a starting point.
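Taking the question to the data is the core of the MapReduce programming model: a map step turns each block of input into key/value pairs, a shuffle step groups the pairs by key, and a reduce step combines each group into a result. The sketch below is a minimal, single-process Python illustration of the classic word-count example. It assumes a tiny invented list of documents in place of a distributed file system; it does not use the Hadoop API, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical input: each string stands in for one block of a large file
# that a framework such as Hadoop would spread across many machines.
documents = [
    "fund flows rise",
    "fund fees fall",
    "flows rise again",
]


def map_phase(doc):
    """Map: emit a (key, value) pair for every word in one document."""
    return [(word, 1) for word in doc.split()]


def shuffle_phase(pairs):
    """Shuffle: group all values emitted for the same key together."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the grouped values for one key into a single result."""
    return key, sum(values)


# In a real cluster each map call would run where its data block is stored;
# here everything runs sequentially in one process.
mapped = chain.from_iterable(map_phase(doc) for doc in documents)
counts = dict(reduce_phase(k, v) for k, v in shuffle_phase(mapped).items())
print(counts["fund"], counts["rise"], counts["again"])  # 2 2 1
```

In a distributed job the map tasks run in parallel on the nodes holding each data block, and only the much smaller intermediate pairs travel across the network, which is what lets the approach scale to very large data sets.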

This explosion of machine power and speed left us humans catching up in our understanding of the potential of data. Overall, we have scaled up our own understanding and application of data science to cope with the new size and usefulness of the data being collected around us. With the gap closing between the size of data and our ability to master it, we have only now begun to produce real insight using correlations across big pools of complicated, often unmatched, data sets.4 By guarding against overlearning (overfitting, in which a model learns the noise in its training data rather than the underlying pattern and so performs poorly on new data), human guidance is once again in control of the analysis. The stage is set for Big Data itself. And as before, it’s down to the quality of the questions we ask.
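Overlearning, more commonly called overfitting, is easiest to see with a holdout test: models are fitted on one portion of the data and judged on points they have never seen. The minimal Python sketch below assumes invented data: six training points following a straight-line trend plus alternating noise, and five held-out points on the trend itself. It compares a least-squares straight line with a hypothetical "memoriser" (a one-nearest-neighbour lookup) that reproduces every training point, noise included.

```python
# Hypothetical training data: a linear trend (y = 2x) plus alternating noise of +/-1.
train = [(0, 1), (1, 1), (2, 5), (3, 5), (4, 9), (5, 9)]
# Held-out points the models never see during fitting, taken from the true trend.
test = [(x, 2 * x) for x in (0.4, 1.4, 2.4, 3.4, 4.4)]


def fit_line(points):
    """Ordinary least-squares straight line: a simple model that averages out the noise."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def fit_memoriser(points):
    """1-nearest-neighbour lookup: returns the y of the closest training point."""
    return lambda x: min(points, key=lambda p: abs(p[0] - x))[1]


def mse(model, points):
    """Mean squared error of a model's predictions on a list of (x, y) points."""
    return sum((model(x) - y) ** 2 for x, y in points) / len(points)


for name, fit in [("line", fit_line), ("memoriser", fit_memoriser)]:
    model = fit(train)
    print(f"{name:10s} train={mse(model, train):.2f} holdout={mse(model, test):.2f}")
```

With these numbers the memoriser scores a perfect 0.00 on its training data but about 1.32 on the held-out points, while the straight line scores about 0.91 and 0.06: the model that learned the noise looks best in-sample and worst on new data.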

It has taken a series of specific breakthroughs in computer science, and a corresponding improvement in our ability to manipulate data quickly and easily, to give us insight into the growing amount of data being captured. We can now peer into the huge pools of information that have ballooned around us and use that information in a meaningful way.

Big Data is crystallising into a method used by engineers and data scientists to investigate and analyse information by applying inquisitive analyses and identifying correlations in the staggeringly large sources of data now available, including live data. But it has taken human engagement to apply these powerful new tools to effect change.

Additional Topics

Data Sets

Dark Pools of Data, Predictive Analysis and Behavioural Finance

A Thought Experiment

Creating Better Outcomes for Managers and Investors


“…It is a fact universally acknowledged that a single man in possession of a good fortune must be in want of a wife.”5

Jane Austen, Pride and Prejudice

But Austen’s characters fished in a small gene pool. How much misery and heartache would they have been spared had Austen lived in the age of internet dating?

The obvious challenge is: ‘Big Data, so what?’ So what if petabytes of data about the weather are collected daily? Who cares how many train tickets are bought online in France? Why should Chinese search words on Google be of interest in an outbreak of avian flu? Why do air traffic patterns matter for the environment and fuel efficiency?

Human curiosity never sleeps. John Snow’s is not the only compelling story, but it is relevant: when we hold and analyse data we see shapes and patterns, as we have evolved to do since the Stone Age. More data can mean more patterns and better understanding.

Big Data comes from a bewildering plethora of sources, structures and storage silos. Its predictive algorithms are a parallel and complementary art; there are only a handful of existing Big Data hubs or transports for sharing databases, and fewer still people capable of understanding or driving real predictive analysis across any network. But by accepting messy data’s flaws, we can exploit its inherent correlations to our advantage. We have proposed this in the greenfield of AE data, mapped to behavioural finance, as a Big Data challenge that is both academically verifiable and commercially relevant. John Snow would surely have approved.
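Accepting messy data’s flaws can be as simple as computing a correlation from whatever observations are complete rather than discarding a whole series because some readings are missing. The minimal Python sketch below uses pairwise deletion with the Pearson correlation coefficient. It assumes two hypothetical monthly series, `contributions` and `sentiment`, whose names and values are invented for illustration and do not come from any real data set; `None` marks a missing reading.

```python
from math import sqrt

# Hypothetical, deliberately messy monthly series: None marks a missing reading.
# Neither the variable names nor the values come from any real data set.
contributions = [100, 120, None, 150, 160, 155, None, 190]
sentiment = [0.1, 0.3, 0.2, None, 0.6, 0.5, 0.7, 0.9]


def pearson_pairwise(xs, ys):
    """Pearson correlation using only positions where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    sx = sqrt(sum((x - mx) ** 2 for x, _ in pairs))
    sy = sqrt(sum((y - my) ** 2 for _, y in pairs))
    return cov / (sx * sy)


r = pearson_pairwise(contributions, sentiment)
print(f"r = {r:.2f}")  # r = 0.99
```

Here only five of the eight months have both readings, yet the correlation is still computable. Dropping incomplete pairs is only one way to handle gaps, and it can bias results if values are not missing at random; and a strong correlation on a handful of points says nothing about cause, which still takes Snow-style reasoning.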

Finally, and not simply as a coda, there is the relevance of the investment management industry itself. Our industry has long been like Rockefeller’s Standard Oil, generating and storing the raw material of data but refining only a small portion of it for use and exploitation. Now, in an age of powerful data analysis tools and shrinking margins, the investment management industry itself has an opportunity to innovate and widen the scope of data-based solutions. Pursued through experiment and case study, this innovation can be both focussed and commercially advantageous for the investment manager and the investor alike.

Volume, Variety, Veracity, Velocity



1 Malvey, J., Shrowty, A., Akoner, L. (2013). “A First Perspective: The Transformational Influence of ‘Big Data’ on the 21st Century Global Financial System”. BNY Mellon Investment Management.

2 Snow, J. (1849). “On the Mode of Communication of Cholera”. London.

3 Codd, E. F. (1970). “A Relational Model of Data for Large Shared Data Banks”. Communications of the ACM.

4 Mayer-Schönberger, V., Cukier, K. (2013). “Big Data: A Revolution That Will Transform How We Live, Work and Think”.

5 Austen, J. (1813). “Pride and Prejudice”.

BNY Mellon is the corporate brand of The Bank of New York Mellon Corporation and may be used as a generic term to reference the corporation as a whole and/or its various subsidiaries generally. This material and any products and services may be issued or provided under various brand names in various countries by duly authorized and regulated subsidiaries, affiliates, and joint ventures of BNY Mellon, which may include any of the following. The Bank of New York Mellon, in New York, New York a banking corporation organized pursuant to the laws of the State of New York, and operating in England through its branch, in London, England and registered in England and Wales with numbers FC005522 and BR000818. The Bank of New York Mellon is supervised and regulated by the New York State Department of Financial Services and the US Federal Reserve and authorized by the Prudential Regulation Authority. The Bank of New York Mellon, London Branch is subject to regulation by the Financial Conduct Authority and limited regulation by the Prudential Regulation Authority. The Bank of New York Mellon SA/NV, a Belgian public limited liability company, with company number 0806.743.159, whose registered office is at 46 Rue Montoyerstraat, B-1000 Brussels, Belgium, authorized and regulated as a significant credit institution by the European Central Bank (ECB), under the prudential supervision of the National Bank of Belgium (NBB) and under the supervision of the Belgian Financial Services and Markets Authority (FSMA) for conduct of business rules, and a subsidiary of The Bank of New York Mellon. The Bank of New York Mellon SA/NV (London Branch) authorized by the ECB, NBB and the FSMA and subject to limited regulation by the Financial Conduct Authority and the Prudential Regulation Authority. Details about the extent of our regulation by the Financial Conduct Authority and Prudential Regulation Authority are available from us on request. 
The Bank of New York Mellon, Singapore Branch is subject to regulation by the Monetary Authority of Singapore. The Bank of New York Mellon, Hong Kong Branch is subject to regulation by the Hong Kong Monetary Authority and the Securities & Futures Commission of Hong Kong. The Bank of New York Mellon Securities Company Japan Ltd acts as intermediary for The Bank of New York Mellon. Not all products and services are offered in all countries.

The material contained in this document, which may be considered advertising, is for general information and reference purposes only and is not intended to provide legal, tax, accounting, investment, financial or other professional advice on any matter, and is not to be used as such. Information contained in this document obtained from third party sources has not been independently verified by BNY Mellon, which does not guarantee the completeness or accuracy of such information. The contents may not be comprehensive or up-to-date, and BNY Mellon will not be responsible for updating any information contained within this document. If distributed in the UK or EMEA, this document is a financial promotion. This document and the statements contained herein, are not an offer or solicitation to buy or sell any products (including financial products) or services or to participate in any particular strategy mentioned and should not be construed as such. This document is not intended for distribution to, or use by, any person or entity in any jurisdiction or country in which such distribution or use would be contrary to local law or regulation. Similarly, this document may not be distributed or used for the purpose of offers or solicitations in any jurisdiction or in any circumstances in which such offers or solicitations are unlawful or not authorized, or where there would be, by virtue of such distribution, new or additional registration requirements. Persons into whose possession this document comes are required to inform themselves about and to observe any restrictions that apply to the distribution of this document in their jurisdiction. The information contained in this document is for use by wholesale clients only and is not to be relied upon by retail clients.

Reproduction, distribution, republication and retransmission of material contained in this document is prohibited without the prior consent of BNY Mellon. Trademarks, service marks and logos belong to their respective owners.

© 2015 The Bank of New York Mellon Corporation. All rights reserved.