Revolutionizing Ad Spend Performance with Real-Time LTV Predictions



VBB as context for real-time LTV

In the fast-paced world of digital advertising, maximizing Return on Advertising Spend (ROAS) is paramount for businesses aiming to thrive in competitive markets. At Ocurate, our cutting-edge VBB+Real-time LTV technology, which pairs value-based bidding (VBB) with real-time lifetime value predictions, offers a proven solution that delivers a minimum 15% increase in ROAS. That makes it an essential addition for any company spending heavily on digital advertising through walled gardens such as Google and Meta.

At the core of Ocurate’s solution lie our Real-time Lifetime Value (LTV) predictions, a vital component that provides a precise and dynamic view into each customer's value from the outset. This capability drives ad platforms to peak performance: it feeds value-based bidding strategies with highly accurate, individual-level, long-term LTV predictions, which train each platform's own value-based bidding models more effectively. In this article, we’d like to dive into the machine learning behind our Real-time LTV and how we developed it.

Traditional LTV models

Traditionally, predictive LTV stems from two main approaches: the well-known Buy Till You Die (BTYD) family of probabilistic models, and supervised machine learning models such as XGBoost. Both approaches offer distinct advantages, but both also have notable limitations. BTYD models rely on transactional data alone, so they may oversimplify customer journeys and overlook external influences. This reduces prediction accuracy, particularly for individual customers and especially early in their journey, when transactional data is minimal (a conundrum known as the “cold start problem”). Supervised machine learning, by contrast, harnesses additional features for a deeper understanding of customer behavior, but it is tied to fixed prediction times and adapts poorly to dynamic markets.
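
To make the cold start problem concrete, here is a minimal sketch of the per-customer summary that BTYD-style models typically consume. The function name `rfm_summary`, the tuple layout of the input, and the sample transactions are assumptions made for illustration and are not part of Ocurate's pipeline. The convention used here (frequency counts repeat purchase days, recency is the time between first and last purchase, T is the customer's age at the end of observation) is a common one in BTYD work, but other conventions exist.

```python
from collections import defaultdict
from datetime import date


def rfm_summary(transactions, observation_end):
    """Summarize raw transactions into per-customer BTYD-style inputs.

    transactions: iterable of (customer_id, purchase_date, amount)
    Returns {customer_id: {"frequency", "recency", "T", "monetary"}}.
    """
    by_customer = defaultdict(dict)
    for customer_id, day, amount in transactions:
        # Collapse multiple purchases on the same day into one transaction.
        daily = by_customer[customer_id]
        daily[day] = daily.get(day, 0.0) + amount

    summary = {}
    for customer_id, daily in by_customer.items():
        days = sorted(daily)
        first, last = days[0], days[-1]
        # Monetary value is averaged over repeat purchases only.
        repeat_values = [daily[d] for d in days[1:]]
        summary[customer_id] = {
            "frequency": len(days) - 1,
            "recency": (last - first).days,
            "T": (observation_end - first).days,
            "monetary": sum(repeat_values) / len(repeat_values) if repeat_values else 0.0,
        }
    return summary


# Hypothetical sample data: one established customer, one brand-new customer.
transactions = [
    ("a", date(2024, 1, 1), 40.0),
    ("a", date(2024, 1, 15), 60.0),
    ("a", date(2024, 2, 1), 20.0),
    ("b", date(2024, 2, 20), 35.0),
]
summary = rfm_summary(transactions, observation_end=date(2024, 3, 1))
```

In this sample, customer "b" has a single purchase, so frequency, recency, and monetary value are all zero. A model that sees only these inputs has almost nothing to distinguish one new customer from another, which is the cold start problem in miniature.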

Ocurate data sources

Initially, we explored the complexity of customer behavior using a unique “telephone book” of third-party and first-party demographic data, developed by some of the Ocurate core team over 5+ years at our previous company, PredictWise. This data made up the covariate matrix for our models. However, that approach proved inefficient and computationally intensive, particularly because it involved identity resolution and model-based data imputation (which in turn open up endogeneity concerns), and it delivered minimal lift in accuracy. Privacy concerns have also made access to such data increasingly difficult, and matching at the individual rather than the household level added further inaccuracy. What is worse: whether matching is done in-house or by a third party, the quality of the match (as opposed to the match rate) cannot be validated independently (see here for a unique investigation of the match accuracy rate of proprietary ID resolution solutions).

Instead, our focus shifted to leveraging first-party data, particularly event stream data ingested in real time through Ocuboost™, Ocurate’s analytics tag, which is combined with other first-party information and purchase data.

Ocuboost™

The Ocuboost™ pixel captures details of customer interactions with retail websites. The specific details collected are driven by the machine learning model as it is being trained. Examples include:

  • Timestamp
  • Anonymous client ID
  • URL
  • Referrer
  • Title
  • Last modified date
  • Browser details (vendor, version, language, OS, time zone, hardware specs, etc.)
  • Product details (as scrolled into view)
      • Title
      • Description
      • Price
      • Product rating
      • Product review text
      • Product details
  • Shopping cart contents
  • Discounts
  • Coupons / promo codes
  • Currency

The Ocuboost™ pixel never intentionally collects any of the following:

  • Name
  • Date of birth
  • Gender
  • Sexual orientation
  • Physical or mailing address
  • Phone number
  • Email address
  • Government ID number
  • Medical information
  • Financial information
  • Geo-location
  • IP address
  • Text entered into forms by the customer

Ocurate ML approach

At Ocurate, we’ve integrated the strengths of both methodologies by employing a deep neural network, which lets us draw on the multitude of data sources available. The model takes as input a covariate matrix, which includes embeddings for event streams, along with RFM values (recency, frequency, monetary value), with no required holdout period or target definition. It outputs the individual-level parameters of probabilistic distributions, allowing us to forecast likelihood of churn, number of purchases, and LTV. Training minimizes the negative log likelihood loss with custom penalizations, developed to balance the main likelihood loss with other, more use-case-relevant metrics such as mean absolute error and bias.
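
The idea of a likelihood loss with added penalties can be sketched in a few lines. The example below is only an illustration under stated assumptions: it uses a Poisson distribution over purchase counts, whereas the distributions in our production model are not specified in this article, and the names `poisson_nll` and `penalized_loss` and the penalty weights are hypothetical. In practice, the per-customer rates would be emitted by the neural network and the loss would be written in an automatic differentiation framework.

```python
import math


def poisson_nll(rate, count):
    """Negative log likelihood of observing `count` purchases under Poisson(rate)."""
    return rate - count * math.log(rate) + math.lgamma(count + 1)


def penalized_loss(rates, counts, mae_weight=0.1, bias_weight=0.1):
    """Mean NLL plus penalties on mean absolute error and absolute bias.

    The mean of a Poisson distribution equals its rate, so `rates`
    doubles as the point forecast when computing the error terms.
    """
    n = len(counts)
    nll = sum(poisson_nll(r, c) for r, c in zip(rates, counts)) / n
    errors = [r - c for r, c in zip(rates, counts)]
    mae = sum(abs(e) for e in errors) / n
    bias = abs(sum(errors) / n)
    return nll + mae_weight * mae + bias_weight * bias


# Hypothetical per-customer rates (model outputs) and observed purchase counts.
rates = [1.0, 2.0, 0.5]
counts = [1, 3, 0]
likelihood_only = penalized_loss(rates, counts, mae_weight=0.0, bias_weight=0.0)
with_penalties = penalized_loss(rates, counts)
```

The penalty weights control how strongly training is pulled toward forecasts that are accurate and unbiased in aggregate, rather than only well calibrated in the likelihood sense.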

This means we are able to both incorporate a wealth of individual-level information to better inform the complexity of the customers’ behaviors – a strength of supervised ML - and deliver a model that learns with the most recent data while at the same time enabling flexible forecasts – a clear advantage of probabilistic approaches.

To address challenges such as the cold start problem, where historical purchase data precedes event stream data at the start of the process, our modeling strategy uses comprehensive embeddings of consumer behaviors that can be generated from day 1 of each customer’s journey and of our data collection. The framework, which blends neural networks and probabilistic modeling, produces dynamic predictions without holdout periods and adjusts in real time to changing customer behavior and market dynamics.

In any application of LTV to net new customer acquisition/AdTech, learning the potential value of each customer early on is very important. Our approach considerably outperforms traditional probabilistic models (benchmarked against MBG/NBD), in particular for recent customers, while maintaining the advantages of such models.

In our particular use case, where we send the value of each customer back to an ad platform to train its value-based bidding algorithms, improvements in accuracy of close to 10% translate into a significant increase in ROAS.

Ocurate ML in deployment

Upon deployment, we integrate into the client’s backend so that we can pull historical purchase data and other information present in their CRM (such as attribution data from Segment or Triple Whale, email engagement data from Klaviyo or Braze, and payment data from Recharge or Stripe), and we ingest new information daily. At the same time, we deploy our Ocuboost™ pixel to the client’s website and start collecting rich event stream data from it (see above).

All data is joined on customer ID and goes through an exploratory data analysis (EDA) and feature engineering phase, which we have streamlined with an internal Python package. Because the model structure is complex, we run a cross-validated grid search over a large hyperparameter space that includes the neural network parameters, optimizers, and loss penalization coefficients. Our automated distributed computing setup runs this search on numerous Tesla M60 GPUs, which speeds up the process by 4x.
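
A cross-validated grid search of this kind can be sketched as follows. This is a simplified, single-machine illustration, not our internal package: the grid values, the helper names `k_fold_indices` and `grid_search_cv`, and the toy scoring function are all assumptions. In a real run, `fit_and_score` would train the network on the training fold and return a validation loss.

```python
from itertools import product
from statistics import mean


def k_fold_indices(n_samples, k):
    """Yield (train_idx, valid_idx) pairs for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        valid = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, valid
        start += size


def grid_search_cv(grid, fit_and_score, n_samples, k=3):
    """Return (best_params, best_score); a lower mean validation score is better."""
    names = list(grid)
    best_params, best_score = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = mean(
            fit_and_score(params, train, valid)
            for train, valid in k_fold_indices(n_samples, k)
        )
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical grid covering network, optimizer, and loss-penalty settings.
grid = {
    "hidden_units": [32, 64],
    "learning_rate": [1e-3, 1e-2],
    "mae_weight": [0.0, 0.1],
}


def toy_fit_and_score(params, train_idx, valid_idx):
    # Stand-in for training on train_idx and scoring on valid_idx.
    return (
        abs(params["hidden_units"] - 64) / 64
        + abs(params["learning_rate"] - 1e-2)
        + params["mae_weight"]
    )


best_params, best_score = grid_search_cv(grid, toy_fit_and_score, n_samples=10, k=3)
```

Each combination in the grid is evaluated independently of the others, which is what makes this kind of search straightforward to spread across many GPU workers.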

From this point, we trigger automated final model training and inference, which kick-starts daily (or even more frequent) batches of individual-level LTV forecasts from the first day of a customer’s lifetime (i.e. when the first order is placed). The forecasts are distributed to all relevant back-end and client-facing locations, including our model evaluation and monitoring platform, where data and models are evaluated for drift and performance daily. This process ensures that models continuously learn from new data.
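
As one illustration of what a daily drift check can look like, the sketch below computes the population stability index (PSI) between a reference sample of a feature and a current sample. PSI is a swapped-in example of a common drift metric; this article does not state which metrics our monitoring platform uses, and the function name `psi`, the bin count, and the sample data are assumptions.

```python
import math


def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population stability index between a reference and a current sample."""
    lo, hi = min(expected), max(expected)
    # Fall back to a unit width if the reference sample is constant.
    width = (hi - lo) / n_bins or 1.0

    def proportions(values):
        counts = [0] * n_bins
        for v in values:
            # Clamp values outside the reference range into the edge bins.
            idx = min(max(int((v - lo) / width), 0), n_bins - 1)
            counts[idx] += 1
        # Floor empty bins at eps so the logarithm stays defined.
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Hypothetical feature values: a reference window and a shifted current window.
reference = [float(i % 100) for i in range(1000)]
current = [v + 30.0 for v in reference]

no_drift = psi(reference, reference)
drift = psi(reference, current)
```

A widely used rule of thumb treats a PSI below 0.1 as stable and above 0.25 as a shift worth investigating, though any threshold should be tuned to the feature and the use case.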

Conclusion

Real-time LTV with the accuracy and characteristics described above integrates seamlessly into ad platforms and efficiently trains their value-based bidding models. Updates reflecting individual-level behaviors, communicated back to the platform within 90 days of the initial ad click, yield remarkable improvements, with observed lifts in ROAS (LTV:CAC) exceeding 15%.
