Why are data co-ops suddenly hot again?

Even the All-In guys are discussing this esoteric business model of late

Outside of my day job, the ‘give-to-get’ or ‘data co-op’ model has popped up in 3 separate places over the past month.

  1. A Media Operator wrote a post suggesting data co-ops are a great, SaaS-like revenue stream that B2B media companies should consider adding.

  2. CB Insights’ Anand Sanwal, a guy who knows a thing or two about data co-ops, suggested a data co-op for executive compensation would be a great startup idea.

  3. David Sacks published a post suggesting that this give-to-get data strategy could be a major solve for any new generative AI company looking to get its hands on valuable data upon which to train its AI models.

Definition: At the highest level, the data co-op strategy consists of three elements.

  1. Customers contribute their internal data to an independent third party for a collective benefit.

  2. Customers receive some benefit / utility for doing so.

  3. This transaction usually replaces an actual monetary one.

So if this model is so ripe for the picking, why is it so fringe compared to other buzzwords in the industry like ‘product-led growth’ or ‘freemium’? As is usually the case, the devil is in the details. Data co-ops come in a variety of flavors, are feasible only under specific circumstances, and have known trade-offs. Let’s dig in.

Starting with data (“intentional”) vs. starting with utility (“unintentional”)

There are both “intentional” and “unintentional” data co-ops. The intentional kind starts with the data: someone wakes up and says “I want to start a company that asks customers to share their data in exchange for a utility that I will be able to provide once I have enough of that data.” In other words, customers share data before they receive any utility.

A number of companies, such as data.ai (fka App Annie), SimilarWeb, and Nielsen, operate this way: as independent firms that take their customers’ data and package it into unique, relevant insights that anyone (who contributes their own data) can access.

Almost all intentional data co-ops involve some kind of benchmarking — whether that’s one business trying to compare itself to its competitor, or a ranking of individuals (e.g. credit scores).
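To make the give-to-get mechanic concrete, here is a minimal Python sketch of a benchmarking co-op. Everything in it is hypothetical and for illustration only: the BenchmarkCoop class, the min_contributors threshold, and the member names are assumptions, not a description of how any of the companies above actually work.

```python
from statistics import median


class BenchmarkCoop:
    """Toy give-to-get co-op: members can read the benchmark only after contributing."""

    def __init__(self, min_contributors=3):
        # Hypothetical threshold: withhold aggregates until enough members
        # have contributed that no single member's value is easy to infer.
        self.min_contributors = min_contributors
        self._values = {}  # member name -> contributed metric

    def contribute(self, member, value):
        # A later contribution from the same member overwrites the earlier one.
        self._values[member] = value

    def benchmark(self, member):
        # Give-to-get rule: no contribution, no access.
        if member not in self._values:
            raise PermissionError(f"{member} must contribute before reading the benchmark")
        # Too few contributors so far: there is no utility to hand out yet.
        if len(self._values) < self.min_contributors:
            return None
        values = list(self._values.values())
        own = self._values[member]
        return {
            "contributors": len(values),
            "median": median(values),
            # Fraction of contributed values strictly below this member's own.
            "share_below": sum(v < own for v in values) / len(values),
        }


coop = BenchmarkCoop(min_contributors=3)
coop.contribute("acme", 120.0)
coop.contribute("globex", 80.0)
early = coop.benchmark("acme")    # None: only two contributors so far
coop.contribute("initech", 100.0)
report = coop.benchmark("acme")   # aggregate view, available to contributors only
print(early, report)
```

Note that the first two contributors get nothing back until a third joins. Even in a toy version, the data has to arrive before the utility does.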

Perhaps even more valuable are unintentional data co-ops: companies that build a utility so strong that customers don’t think twice before contributing their own data to the network. In this case, the utility precedes data collection. I imagine you’ve heard about some of these businesses: Meta née Facebook, Google, and so on. These companies can obviously be insanely valuable but are also insanely difficult to start. Anyone who starts on day 1 by saying “IF we can build a utility so valuable that we can build a huge data co-op, the sky’s the limit” is likely dead in the water: job #1 in this example is building a utility, which is hard enough in itself.

For that reason, let’s focus mainly on the intentional kind of data co-op where the data sharing comes first.

Necessary preconditions for a data co-op to work

The primary reason why this business model isn’t as popular as others is that it only works in a specific set of circumstances. After all, convincing a group of customers or users to give you their internal data before you provide them with much value isn’t exactly the bargain of the century.

You often see data co-ops emerge at both ends of the barbell: in highly concentrated, regulated, slow-growth markets where regulation or quasi-regulation forces one into existence, and in highly fragmented, unregulated, fast-growth markets where the name of the game is ‘survive & advance’ and companies aren’t thinking too deeply into the future when they agree to share their data.

Here are some key themes you see associated with successful data co-ops:

  1. Truly independent, non-threatening vendor: The company asking for the data must be perceived as truly independent. Sometimes, it’s even an industry trade organization whose explicit mandate is to represent all parties (and not to make a profit). Often, regulation (or quasi-regulation) mandates the creation / existence of this independent provider. Absent regulation, there are a few preconditions to look out for…

  2. Extreme utility: The benefit that customers stand to get if they contribute must be of existential importance. For example, without credit scores, banks would likely lose billions of dollars from poorly allocated loans, so it’s worth their while to contribute their user data to a co-op.

  3. Cash-poor customers: Most customers who can pay for a service with cash (as opposed to their internal data) would do so. So, absent any forcing factor that dictates that they must pay via their own data, cash-rich customers would opt for a typical SaaS fee and keep their private data private.

  4. Fragmented market: In a market where no one customer’s data can make or break the utility of the co-op, everyone might as well participate. In concentrated markets, the opposite is true.

  5. Relatively equal playing field: No one player should have meaningfully more to gain (or lose) than the others by sharing their data. When the playing field is tilted, game-theory-optimal decision-making usually prevents any sort of cooperation.

OK, so you’re ready to start a data co-op business. You’ve found the industry ripe for the model. What are the pros & cons you should expect?

Pros

  1. Free data: Unlike many data businesses, your data is free. You don’t have to pay for it, as you would when running surveys or various other types of data collection. This means better gross margins (data costs are a COGS line item, an unjust slight by generally accepted accounting principles against other sorts of data businesses…).

  2. Network effects: Data co-ops have strong network effects. The utility of your product grows as more data is contributed to it, which makes it more attractive to customers, who in turn contribute more data.

  3. Anti-fragility: Because data co-ops tend to thrive either in regulated or in highly fragmented industries, losing a given customer (and data supplier) is either very unlikely or very inconsequential.

  4. Ancillary products: Once you’ve collected the data for a benchmarking-based use case, there are often other products or utilities you can build on top of that same data. As long as you aren’t barred from doing so (see below), the sky’s the limit!

Cons

  • Limited ability to share data: Anytime you receive data from a customer, especially in the case of unintentional data co-ops, it comes with an end-user licensing agreement (EULA) that dictates specifically what you can and cannot use that data for. Often the restrictions specifically include sharing that data with anyone else except in the most high-level ways.

  • Doesn’t get you cash-rich customers: As you attempt to go upmarket and attract bigger & bigger customers, this model often falls apart. Amazon would rather pay in cash than data because (a) they have tons of cash, (b) their data is valuable to them, and (c) they have more to lose by contributing than to gain. This is doubly bad because now your other customers can’t see that juicy Amazon data they want.

  • Usually doesn’t work alone: Because of the above, a data co-op usually needs to be paired with other data strategies in order to work. It only works for a specific segment of the market and really doesn’t work outside that.

  • Chicken / egg problem: Asking your customers for data before you can provide them a utility is often a fool’s errand. Plenty of folks that have wanted to build a data co-op failed on this first step. Show me yours and I’ll show you mine.

  • Regulatory hurdles: While Sacks’ essay suggests that medical and legal fields are ripe for data co-ops, I disagree. In a vacuum, these fields would be excellent candidates, and the customers in these industries would likely gladly agree to contribute… if they were allowed to. The more sensitive the data, the more valuable a data co-op could be, but the less likely it is to happen. That said, forming a contributory network in highly regulated industries… perhaps with the help of regulation… has incredibly lucrative and world-changing potential.

As an aside, one of the most powerful regulatory ideas of our time could be a paradigm shift around how we think about data sharing as a society. The societal benefits that come from more sharing & interoperability likely dwarf any costs. But alas, the regulatory tides are moving in the opposite direction, so that’s a topic for another day.

The bottom line: data co-ops surely can be the breakout business model of the next few years IF entrepreneurs understand what they’re signing up for before blindly wading into muddy waters. Especially in data-rich fields like generative AI, whether ‘data co-op’ becomes a common buzzword a few years from now depends on whether we navigate these waters correctly.

PS: Unlike Sacks, I didn’t use ChatGPT to co-author this one… maybe next time!