Rameez's Newsletter
Complexity is an Analytics company's best friend.
Ask anyone who has worked at an Analytics company what the worst part of working there is, and they'll launch into a rant about complexity. "There are so many nuanced rules, adjustments, transformations, and caveats that it's nearly impossible to keep them all straight. We're never going to be able to overcome the complexity."
But, for all of this hemming & hawing, complexity is the only reason most Analytics businesses exist. Why?
All else equal, buyers & sellers of data want to get to their goal with the least amount of effort possible.
Sellers of data want to monetize their data (i.e. make money from it) with the least amount of effort. Usually, this means selling it in its rawest form: they want to be a “pass through”. In practice, it’s usually not that easy, and complexity makes it even harder. Think about where data comes from:
Maybe the seller provides a utility, and users input data in order to use it (e.g. inputting bank information to use PayPal)
Maybe the users are creating data in their use of the product (e.g. location data via Uber)
Maybe the data is scraped from public sources of information (e.g. website traffic)
In all of these examples, the data was generated organically, so it is extremely messy and not purpose-built for any analytics use case. There’s typically a lot of work to do to get it ready for showtime. The more complexity, the more work there is, and the less likely the originator of the data is to successfully take that burden on themselves.
Buyers of data just want answers with the least amount of effort possible. This means that all of the upstream steps (data acquisition, transformation, visualization) are just a means to an end. Some example reasons buyers want data:
Customers may want data to understand consumer sentiment or competitive positioning. They don’t want reams of transactions; they want insights (e.g. market research businesses such as Kantar have perfected the art of turning messy data into digestible executive insights)
Customers may want to advertise to certain groups. Being able to simply click a button in an interface is ideal for them (e.g. Facebook Ads Manager perfected this abstraction of data into a simple button)
The net of all this: The greater the complexity, the greater the divide between buyer & seller, and the greater the need for pure play data businesses to sit in between as a middleman.
When mining for potential opportunities, there are two questions to ask: (1) is there customer pain that can be solved with data? (2) is there a high level of complexity to transform that data into answers?
If the answer to both of these questions is yes (and, as a bonus, the addressable market is large), this is likely a big opportunity for an Analytics business. Here are some examples of common complexities that signal an opportunity. We’ll illustrate them with consumer brand insights examples, since they’re immediately relatable to everyone.
Lack of granularity: Often, readily available data doesn’t quite answer the client’s question because it lacks granular detail. For example, a retail brand like Cole Haan may want to learn more about its sales by channel relative to peers. However, when customers buy shoes from Nordstrom, the credit card data only shows something like “Nordstrom | $199.99”, with no indication of which brand was purchased. Not very helpful.
When obvious data sources have a lack of granular information, this is usually an opportunity, especially when there is a brand <> retail / wholesale dynamic whereby the brands are disintermediated from their customers.
Limited coverage: Even brand measurement analytics giants such as Circana, with billions in revenue, cannot see the full picture. A fun example: historically, they have not been able to measure sales of goods in small boutiques & convenience stores. That means that, when Snickers is trying to understand how many Snickers bars it has sold, the best it can do is count all the Snickers bars it has sold minus those sold in NYC bodegas. This may not be a problem for Snickers, but what about products whose sales are concentrated in the channels that cannot be covered?
When those gaps in coverage become large enough, or they are present for clients whose primary business is within those coverage gaps, there is an opportunity to fill the void.
Data exhaust which requires massive transformation: As my earlier examples showed, a lot of useful data is born of “data exhaust” (i.e. data created as a byproduct of an altogether different product or service). Going with our consumer brand insight theme… TikTok was not built to function as a brand sentiment tracker. Yet, for the Uniqlo buyer looking for leading indicators of fashion & style trends amongst Gen Z, it is the best centralized data source in human history. However, the work required to capture what TikTok stars are wearing (machine vision), and to create classification & taxonomy around it (is it a shirt? what color? what brand? what style?), is immense.
Whenever extremely valuable data is thought to be “impossible” to collect, this is a bat signal that an analytics company will soon figure it out.
Privacy & regulatory issues: As the TikTok example highlights, there are often privacy concerns & regulatory risks associated with capturing user data. For example, how do you avoid COPPA violations by making sure you never capture or monetize data from users under 13? This is something an Analytics company may choose to handle, but a customer who is just looking for answers would likely have neither the skill nor the appetite to build a process around it.
Analytics companies can often function as a way for customers to “outsource” risks associated with handling sensitive data while gaining the benefits.
Data enrichment: Often, joining multiple datasets together yields a 1+1=3 situation. For consumer brands, understanding a consumer’s path to purchase (what they do before they buy) and lifecycle (what they do after they buy) is invaluable since it informs brand, marketing, and product strategies. Enriching the original data (what they bought) with ad exposure data (which ad they saw), location data (where they purchased), and demographics (who they are) leads to highly sophisticated customer segmentations that consumer brands revolve around.
Generally, questions about user behavior that are answerable with data beget more questions answerable with more data. Customers may have tons of their own data, but the second that data needs to be connected to other sources, they need to call in the (Analytics company) experts. Sourcing these new data sets and connecting them together is reason enough for an Analytics company to exist. There are dozens of massive data companies that focus ONLY on allowing OTHER data companies to connect data sets together (“identity resolution”).
Biased data: This is perhaps the most long-standing complexity that creates opportunity for analytics businesses. Prior to the digital revolution, data was not so easy to capture. Consumer brands had no first-party data because few were selling directly to customers (Harry’s and Dollar Shave Club kicked off the DTC revolution in the 2010s). Most retail data was heavily analog (think: physical receipts) and fragmented (every grocery store that Safeway owned likely had some version of its own data storage). As a result, brands relied heavily on surveys to learn anything. The challenge: surveys do not scale. Famously, Nielsen’s TV panel, which estimates viewership for 230 million Americans, is built on fewer than 50,000 respondents. That means each respondent represents at least 4,600 Americans (230 million ÷ 50,000), or roughly 5,000. With that kind of multiplier effect, you had better make sure that your 50,000 respondents look exactly like your 230 million Americans and, if not, that you understand exactly how the two groups differ. The largest Analytics businesses have spent hundreds of millions of dollars over the years on techniques to address this challenge.
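To make “connecting data sets together” a bit more concrete, here is a minimal Python sketch of the simplest form of identity resolution: deterministic matching on a normalized, hashed email address. The records, field names, and the `match_key` / `enrich` helpers are all hypothetical, and real identity resolution providers typically combine many more signals (device IDs, postal addresses, probabilistic matching), so treat this as an illustration under those assumptions rather than how any particular company does it.

```python
import hashlib


def match_key(email: str) -> str:
    """Normalize an email (trim whitespace, lowercase) and hash it with SHA-256,
    so two data sets can be joined without exchanging the raw address."""
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Hypothetical example records; field names are illustrative only.
purchases = [
    {"email": "Ana@Example.com ", "item": "loafers", "amount": 199.99},
    {"email": "ben@example.com", "item": "sneakers", "amount": 129.00},
]
ad_exposures = [
    {"email": "ana@example.com", "ad": "spring_campaign"},
    {"email": "cy@example.com", "ad": "spring_campaign"},
]


def enrich(purchases, exposures):
    """Attach the ads each buyer saw to their purchase (a left join on the hashed key)."""
    ads_by_key = {}
    for row in exposures:
        ads_by_key.setdefault(match_key(row["email"]), []).append(row["ad"])

    enriched = []
    for row in purchases:
        key = match_key(row["email"])
        enriched.append({
            "key": key,
            "item": row["item"],
            "amount": row["amount"],
            "ads_seen": ads_by_key.get(key, []),  # empty list when the buyer never matched
        })
    return enriched


for record in enrich(purchases, ad_exposures):
    print(record["item"], record["ads_seen"])
# loafers ['spring_campaign']
# sneakers []
```

Even this toy version shows where the work goes: normalizing messy keys (the stray capitals and trailing space in Ana's email would otherwise break the match) and deciding what to do with buyers who never match at all.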
Anytime the answers you get from data cannot be trusted because they don’t represent the whole truth, I smell an opportunity.
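The multiplier math is easy to sketch, and so is one standard correction for a skewed panel: post-stratification weighting, where each respondent's weight is set so that every group counts in proportion to its share of the population. The Python below assumes a hypothetical panel of exactly 50,000 that over-represents viewers aged 55+. The age groups, population shares, and viewer counts are made up for illustration, and this is not a description of Nielsen's actual methodology.

```python
POPULATION = 230_000_000
PANEL_SIZE = 50_000

# Hypothetical figures for illustration only (not Nielsen or Census data).
population_share = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}   # share of all Americans
panel_count = {"18-34": 10_000, "35-54": 17_500, "55+": 22_500}  # respondents per group (sums to 50,000)
panel_viewers = {"18-34": 1_000, "35-54": 3_500, "55+": 9_000}   # respondents who watched a given show


def poststratification_weights(population, shares, counts):
    """How many Americans each respondent stands in for, computed per group."""
    return {group: population * shares[group] / counts[group] for group in shares}


naive_weight = POPULATION / PANEL_SIZE  # every respondent treated identically
weights = poststratification_weights(POPULATION, population_share, panel_count)

naive_estimate = sum(panel_viewers.values()) * naive_weight
weighted_estimate = sum(panel_viewers[g] * weights[g] for g in panel_viewers)

print(f"naive multiplier: {naive_weight:,.0f}")     # naive multiplier: 4,600
print(f"naive viewers: {naive_estimate:,.0f}")      # naive viewers: 62,100,000
print(f"weighted viewers: {weighted_estimate:,.0f}")  # weighted viewers: 55,200,000
```

Because this hypothetical panel over-samples the 55+ group, which also watches the show at the highest rate, the naive multiplier overstates viewership by about 7 million. Real panels have to be balanced across many characteristics at once, not just age, which is part of what makes this problem hard.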
The bottom line: when it comes to Analytics businesses, moats can be a tricky thing. In many cases, someone could compete with what you do. But if the complexity is high enough, more often than not, they won’t.
Many of these opportunities get squashed because they don't pass the "why can't they just do this themselves?" test. Could someone sell the raw data directly? Probably. Could someone transform that raw data for their specific use case? Probably. But *will* they do so? The more complexity there is in the equation, the more likely the answer is *no*.