The Entrepreneurial Impact of Open Data

Sheena Iyengar, Patrick Bergemann — February 08, 2018

The impact of open data on entrepreneurship has been touted for years. Almost five years ago, McKinsey estimated that such data could be worth as much as $3 trillion annually in the United States. Similarly, Omidyar Network has estimated that open data could add $13 trillion to the combined output of G20 countries over five years. These estimates are eye-popping, but they remain estimates. So far, little evidence exists on the actual economic value of open data. If the provision of open data is to continue, its effects will likely need to be better understood.

To understand how open data is being used to spur innovation and create value, the Governance Lab (GovLab) at NYU Tandon School of Engineering conducted the first-ever census of companies that use open data. Drawing on outreach campaigns, expert advice and other sources, GovLab created a database of more than 500 companies founded in the United States, called the Open Data 500 (OD500). Among the small and medium-sized enterprises it identified as using government data, the most common industry is data and technology, followed by finance and investment, business and legal services, and healthcare.

In the context of our collaboration with the GovLab-chaired MacArthur Foundation Research Network on Opening Governance, we sought to dig deeper into the broader impact of open data on entrepreneurship. To do so, we combined the OD500 with databases on startup activity from Crunchbase and AngelList. This allowed us to follow the trajectories of open data companies from their founding to the present day. In particular, we compared companies that use open data to similar companies with the same founding year, location and industry, to see how well open data companies fare in securing funding and on other indicators of success.
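The matched comparison described above can be illustrated with a minimal sketch. It assumes exact matching on (founding year, location, industry): each open data company is compared with the average of the non-open-data companies in the same cell, and those gaps are then averaged. The record layout, the function name `matched_difference` and the toy data are all hypothetical; the authors' actual estimation method is not described here and may differ (for example, a regression with fixed effects).

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy records, not the study's data:
# (founding_year, location, industry, uses_open_data, n_investors)
companies = [
    (2012, "New York", "finance", True, 5),
    (2012, "New York", "finance", False, 2),
    (2012, "New York", "finance", False, 4),
    (2014, "Boston", "healthcare", True, 3),
    (2014, "Boston", "healthcare", False, 3),
    (2015, "Austin", "data", True, 6),  # no same-cell peer, so it is dropped
]


def matched_difference(records):
    """Mean gap between each open data company and its same-cell peers.

    A cell is a (founding_year, location, industry) combination; this is an
    assumed exact-matching scheme, not necessarily the authors' method.
    """
    # Collect outcomes of companies that do not use open data, by cell
    peers = defaultdict(list)
    for year, loc, ind, uses_od, outcome in records:
        if not uses_od:
            peers[(year, loc, ind)].append(outcome)

    # Compare each open data company with the mean of its peers, if it has any
    diffs = [
        outcome - mean(peers[(year, loc, ind)])
        for year, loc, ind, uses_od, outcome in records
        if uses_od and peers[(year, loc, ind)]
    ]
    return mean(diffs)


print(matched_difference(companies))  # → 1
```

Companies with no same-cell peer (the Austin record in the toy data) drop out, which is the usual cost of exact matching: narrower cells give closer comparisons but leave fewer usable companies. The same function applies unchanged to other outcomes, such as the number of funding rounds.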

We first looked at the extent to which open data companies have access to investor capital. We wondered whether they might have difficulty raising funds because their use of public data could be perceived as insufficiently innovative or proprietary. If that were the case, the economic impact of open data might be limited. Instead, we found that open data companies attract more investors than similar companies that do not use open data: on average, 1.74 more investors than similar companies founded at the same time. Interestingly, investors in open data companies are not a specific group that specializes in open data startups. Instead, a wide variety of investors put money into these companies. Of the investors who funded open data companies, 59 percent had invested in only one open data company, and 81 percent had invested in one or two. Open data companies appear to appeal to a wide range of investors.
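The investor-concentration figures above amount to counting, for each investor, how many distinct open data companies it funded, then taking shares. A minimal sketch, using hypothetical (investor, company) pairs rather than the study's data; the function name `concentration_shares` is illustrative only:

```python
from collections import Counter

# Hypothetical (investor, open data company) funding pairs; a repeated
# pair stands for a follow-on round with the same company.
deals = [
    ("Fund A", "Co1"), ("Fund A", "Co2"), ("Fund A", "Co2"),
    ("Fund B", "Co1"),
    ("Fund C", "Co3"),
    ("Fund D", "Co2"), ("Fund D", "Co3"), ("Fund D", "Co4"),
]


def concentration_shares(pairs):
    """Share of investors backing exactly one, and at most two, open data companies."""
    # Deduplicate pairs so repeat rounds are not counted as new companies
    per_investor = Counter(investor for investor, _ in set(pairs))
    n = len(per_investor)
    only_one = sum(1 for k in per_investor.values() if k == 1) / n
    one_or_two = sum(1 for k in per_investor.values() if k <= 2) / n
    return only_one, one_or_two


print(concentration_shares(deals))  # → (0.5, 0.75)
```

Deduplicating before counting matters: without it, Fund A's follow-on round would make it look like a backer of three companies rather than two.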

This latter point can be seen in Figure 1, which displays all startups founded in 2009 or later in the area of data analytics, one of the sectors with the highest use of open data. Each node represents a company and each line represents a shared investor, so two connected companies both received funding from the same investor. Green dots represent open data companies, while red dots represent all other companies. The more ties a node has, the larger it is drawn. A clustering algorithm places nodes closer together when they share a tie and farther apart when they do not.

Figure 1. Startups Focused on Data Analytics

Figure 1 shows that investments in open data companies are not made by a particular subset of investors. Instead, open data companies are scattered throughout the network, and many of them are not connected with one another. Investors appear to base their decisions on attributes other than whether a company uses open data. Other sectors show similar results.
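The network behind Figure 1 is, in effect, a projection of investor-to-company funding data onto companies: two companies are tied when at least one investor funded both. A minimal sketch under that assumption, with hypothetical investors, companies and open data labels; in this sketch, a pair sharing several investors still gets a single unweighted tie, which may not match how the figure was built:

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: investor -> set of companies it funded
portfolios = {
    "Fund A": {"Co1", "Co2"},
    "Fund B": {"Co1", "Co3"},
    "Fund C": {"Co4"},
}
open_data = {"Co1", "Co4"}  # hypothetical open data companies (green nodes)


def co_investment_graph(portfolios):
    """Project investor->company data onto a company-company graph.

    Two companies are linked if at least one investor funded both.
    """
    adjacency = defaultdict(set)
    for companies in portfolios.values():
        for c in companies:
            adjacency.setdefault(c, set())  # keep isolated companies as nodes
        # Every pair of companies sharing this investor gets an undirected tie
        for a, b in combinations(sorted(companies), 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return dict(adjacency)


graph = co_investment_graph(portfolios)
for company in sorted(graph):
    # Degree would set node size; open data status would set color
    color = "green" if company in open_data else "red"
    print(company, len(graph[company]), color)
```

Here Co4 ends up isolated even though it is an open data company, mirroring the pattern described for the figure. The layout step (the clustering algorithm that positions nodes) is not shown; graph libraries such as NetworkX offer a bipartite projection (`networkx.algorithms.bipartite.projected_graph`) and force-directed layouts such as `spring_layout` if a drawn version is wanted.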

Not only do open data companies attract more investors, but they also obtain more rounds of funding than similar companies that do not use open data. We found that companies identified as using open data receive, on average, 1.35 more rounds of funding than companies that do not, even when comparing only companies founded in the same year, industry and location.

Overall, our findings suggest that by using data in the public domain, startups may bolster their chances of success. Even though open data has few or no barriers to entry, its use is associated with beneficial outcomes. By using open data, founders can augment their own resources and focus on the parts of their businesses that provide added and unique value.

Although our results cannot explain why open data companies appear to do better, we can speculate. Because open data is relatively new (the federal portal Data.gov launched only in 2009), companies that currently use publicly available data may enjoy a first-mover advantage, building on resources that have not yet been widely used. If this is the case, the benefits of open data should decline over time.

Alternatively, companies that use open data may be particularly strategic in their use of public inputs, avoiding spending time and resources on duplicating work that already exists. This may help them improve efficiency and focus their efforts on their own proprietary contributions. In either case, innovation emerges from free and publicly shared data.

We would like to highlight two implications of this research. For governments, these results provide evidence that open data does generate economic value: it is both used by companies and contributes to their entrepreneurial success. For entrepreneurs, using open data provides clear benefits. Open data gives new companies a valuable tool to build on, and investors do not appear to penalize them for using non-proprietary data.

This is only a first step in the much larger task of evaluating the economic impact of open data. More research is needed: the results above surely do not capture all companies making use of open data, and to the extent that they capture only the most successful ones, that is a problem. However, these findings demonstrate that some open data companies are doing very well, and they suggest that the rosy prognostications about the promise of open data may indeed have substance.