Third-Party Data Is Not the Holy Grail

Third-party data doesn’t have the insights operators need to build a brand on loyal, high-value customers. First-party data is the future, and Ocurate has the database to prove it. With in-depth behaviors and attitudes of 260 million Americans, our database helps brands predict and act on lifetime value. Here’s how we did it.

Tobias Konitzer

Third-Party Tracking Data Is Dead 

Third-party tracking data is dead. Long live first-party data. 

The recent years of big data have been built on tracking, either through online cookies or location data. Brands try to use information sourced by this tracking to make better business predictions, gain more customers, and build a strong, long-lasting customer base. 

There’s one snag: 

This is not the future of data. Here is a bet I am willing to make: In the next five years, firms that made their names on this tracking will disappear, partly due to regulatory shifts in favor of privacy. But the bigger issue is this: Most of the databases built on these traditional methods sacrifice breadth for depth, or vice versa.

Now it’s possible to develop databases that are broad in scope and deep in insight. We know because we’ve done it. It’s not a flashy technological breakthrough but the result of years of research, data science know-how, and some sophisticated tooling.

Here’s how we did it.

Given the background of PredictWise, the company I co-founded before Ocurate, the obvious place for us to start was voter rolls. Voter registration data is public administration data in the U.S. and makes a logical base for any broad consumer database.

Then we pulled in data sources like credit files for the sole purpose of building a sophisticated identity resolution tool that removes duplicates, differentiates individuals with the same name, and recognizes individuals who appear at two addresses because of a recent move. Ultimately, this led to our creation of a database of 260 million Americans with narrow demographics.
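To make the idea of identity resolution concrete, here is a minimal sketch. It is not Ocurate's tool: the `Record` fields, the source labels, and the matching rule (normalized name plus birth date) are all assumptions chosen for illustration, and a production system would use far richer signals and fuzzy matching.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    source: str       # illustrative label, e.g. "voter_roll" or "credit_file"
    name: str
    birth_date: str   # ISO date; assumed to be present in both sources
    address: str


def match_key(rec: Record) -> tuple:
    """Hypothetical matching rule: normalized name plus birth date."""
    normalized = " ".join(rec.name.lower().replace(".", "").split())
    return (normalized, rec.birth_date)


def resolve_identities(records: list) -> list:
    """Group records that share a match key into one resolved person."""
    groups = defaultdict(list)
    for rec in records:
        groups[match_key(rec)].append(rec)
    return list(groups.values())


records = [
    Record("voter_roll", "Marshall Lee", "1988-04-02", "12 Oak St"),
    # Same person after a recent move: different address, same key.
    Record("credit_file", "marshall  lee", "1988-04-02", "40 Elm Ave"),
    # Same name, different birth date: treated as a different person.
    Record("voter_roll", "Marshall Lee", "1961-11-30", "7 Pine Rd"),
]
people = resolve_identities(records)
```

In this toy data the two records for the younger Marshall collapse into one person despite the address change, while the older Marshall stays separate. Address is deliberately left out of the key so that a move does not split one individual into two.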

Many in the data industry hear this and say, “Oh, demographics like address and gender? That’s pretty useless.” In fact, it’s the opposite: Narrow demographics like these provided the perfect foundation for Ocurate to build behavioral and attitudinal data features that make up a highly insightful database of consumer data.

For example, residential demographic data allows us to see that a man named Marshall has moved from a one-bedroom home to a two-bedroom home. From that, we can deduce with very high accuracy that Marshall got married (I wrote an influential academic paper on using voter records to analyze the underpinnings of marriages in the U.S., summary here).

We turned this logic into a feature engineering tool, building many of these ground truth insights on top of the initial data. In addition, looking at the census block data, we can pull a lot of other data on Marshall, like his income. We can also look into his behaviors, like when and how he commutes.
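The feature engineering described above could look something like the following sketch. Everything here is hypothetical: the `Residence` fields, the block identifiers, and the numbers in `BLOCK_STATS` are made up for illustration and only stand in for real address histories and census block aggregates.

```python
from dataclasses import dataclass


@dataclass
class Residence:
    year: int
    bedrooms: int
    census_block: str   # hypothetical block identifier


# Made-up block-level aggregates standing in for census-style tables.
BLOCK_STATS = {
    "block_A": {"median_income": 58000, "pct_drive_alone": 0.71},
    "block_B": {"median_income": 74000, "pct_drive_alone": 0.64},
}


def engineer_features(history: list) -> dict:
    """Derive illustrative features from one person's residence history."""
    ordered = sorted(history, key=lambda r: r.year)
    current = ordered[-1]
    moved = len(ordered) >= 2
    # Flag a move into a home with more bedrooms than the previous one.
    upsized = moved and current.bedrooms > ordered[-2].bedrooms
    block = BLOCK_STATS.get(current.census_block, {})
    return {
        "moved": moved,
        "upsized_home": upsized,
        "block_median_income": block.get("median_income"),
        "block_pct_drive_alone": block.get("pct_drive_alone"),
    }


marshall = [Residence(2018, 1, "block_A"), Residence(2021, 2, "block_B")]
features = engineer_features(marshall)
```

A flag like `upsized_home` is only an input. Turning it into a statement such as "likely got married" would require a model trained and validated against ground truth, which this sketch does not attempt.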

Insights like commuting behaviors come from public census data like the American Community Survey (ACS). The ACS provides insights into local socioeconomic situations and local behaviors. Our entire model is built around public administration data and first-party data. 

We collect first-party data directly from individuals themselves. We received survey data from 300,000 individuals who agreed to let us monitor their media consumption and other behaviors on their cellphones. They also shared with us their foundational attitudes about the world.

This first-party information, which we gain through explicit individual consent, is extraordinarily rare in the data world, and also extraordinarily valuable. To be clear, it's entirely different from the practice of a notorious firm that harvested Facebook information from unwitting users.

Working from deductions like these and from behavioral data, we have built a machine learning model that extrapolates rich, precise insights to 260 million Americans. Those insights update continuously in our cloud database, so they are always grounded in current truth. The power of this database was validated in the 2020 Election (seen here).
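The extrapolation step, learning from a consenting survey panel and projecting onto everyone else, can be illustrated with a deliberately simple stand-in for the machine learning model: average the survey answers within each demographic cell, then assign that average to every person in the same cell. The field names (`age_band`, `region`, `streams_video`) and the data are invented for this sketch, and the actual model is certainly more sophisticated.

```python
from collections import defaultdict
from statistics import mean


def fit_cell_means(survey_rows, cell_fields, target):
    """Average the target within each demographic cell of the survey."""
    cells = defaultdict(list)
    for row in survey_rows:
        cells[tuple(row[f] for f in cell_fields)].append(row[target])
    overall = mean(row[target] for row in survey_rows)
    return {cell: mean(vals) for cell, vals in cells.items()}, overall


def extrapolate(population_rows, cell_fields, cell_means, fallback):
    """Score each person with their cell's mean, or the overall mean."""
    return [
        cell_means.get(tuple(row[f] for f in cell_fields), fallback)
        for row in population_rows
    ]


survey = [
    {"age_band": "18-34", "region": "west", "streams_video": 1},
    {"age_band": "18-34", "region": "west", "streams_video": 1},
    {"age_band": "35-54", "region": "west", "streams_video": 0},
    {"age_band": "35-54", "region": "west", "streams_video": 1},
]
population = [
    {"age_band": "18-34", "region": "west"},
    {"age_band": "35-54", "region": "west"},
    {"age_band": "55+", "region": "south"},   # cell never seen in the survey
]

fields = ["age_band", "region"]
cell_means, overall = fit_cell_means(survey, fields, "streams_video")
scores = extrapolate(population, fields, cell_means, overall)
```

The fallback to the overall mean shows why the narrow demographics matter: the more of the population that lands in well-covered cells, the less the scores depend on a crude default.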

All this data taken together, in conjunction with customer data on product usage and purchases, allows us to predict with high certainty the lifetime value (LTV) of an individual customer at numerous brands, even at the lead level where such customer data is sparse.