From labelling data to maintaining performance

July 16, 2025   5 min read

By Haroon Hassan, CEO

Data labelling companies are throwing up some staggering valuations these days, fuelled by the undeniable truth that models are worthless without good data.

But looking past this excitement, there’s an important distinction to appreciate, one that separates fleeting revenue from sustainable value. The key question isn’t whether a company labels data, but when and why. Are they producing a one-off training asset, or are they delivering a continuous quality assurance service? For any given labelling effort, the answer determines whether sustainable value is being created.

Training data is consumed

On the path to a production-grade model, new labels are gold. Each one teaches the model something new, helping to improve performance. But once a model is production-ready, the value of its initial training data declines — at least with respect to that specific model, which has absorbed what it needs from that data.

While that dataset may retain some residual value for regression testing, periodic evaluations or additional research, its primary utility, getting a model to the point where it generates revenue, has been consumed. Companies that focus solely on producing these static training sets operate like single-use consultants: they generate one-off revenue, not the recurring income that drives long-term enterprise value.

Yes, in today’s early stages of AI adoption, the same dataset might help train other models. But here’s the catch: labelling companies typically don’t own the raw data. That means the labelled data can’t be resold or reused freely. Without data ownership or an ongoing role in the model lifecycle, pure labelling services do not create recurring, i.e. sustainable, value.

The recurring value of quality assurance

So, is the data-labelling business doomed? Not at all. The truth is that most labelling companies aren’t selling a label; they are selling a service. The opportunity lies not in treating data as a one-time asset, but in managing the quality of the service, i.e. model inference, over time.

In the real world, “production” is not a static finish line. Data distributions drift and new edge cases emerge constantly. This is where a second, more durable form of labelling comes into play: old-school quality assurance.

When a live model encounters a potential edge case, the QA process kicks in. Its primary job is to confirm or deny that an error occurred, ensuring the application delivers reliable outcomes; SLAs demand this. This act of verification also creates a valuable byproduct: a steady stream of high-value marginal labels. These verified failures are the precise fuel needed to continuously improve the model.
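As a rough illustration of that loop, here is a minimal Python sketch. It assumes the model reports a confidence score and that low confidence is what flags a potential edge case. The names `Prediction` and `QAQueue`, the 0.8 threshold and the toy `expert` reviewer are all hypothetical, chosen for illustration rather than taken from any particular production system.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Prediction:
    text: str          # the input the live model saw
    label: str         # what the model predicted
    confidence: float  # model's own confidence, between 0 and 1


@dataclass
class QAQueue:
    threshold: float = 0.8  # assumed cut-off for "potential edge case"
    retraining_labels: List[Tuple[str, str]] = field(default_factory=list)

    def review(self, pred: Prediction, reviewer: Callable[[str], str]) -> str:
        """Return the label that is actually served to the customer."""
        if pred.confidence >= self.threshold:
            return pred.label  # confident prediction: no human review
        verified = reviewer(pred.text)  # a domain expert confirms or denies
        if verified != pred.label:
            # Verified failure: keep it as a new marginal training label.
            self.retraining_labels.append((pred.text, verified))
        return verified


# Toy reviewer standing in for a human domain expert (hypothetical rule).
def expert(text: str) -> str:
    return "buy" if "bid" in text else "sell"


queue = QAQueue(threshold=0.8)
predictions = [
    Prediction("I bid 99 for 5m", "buy", 0.95),     # confident, served as-is
    Prediction("can you bid 98.5", "sell", 0.55),   # flagged, model was wrong
    Prediction("offer at 101", "sell", 0.60),       # flagged, model was right
]
served = [queue.review(p, expert) for p in predictions]
```

In this sketch only the disagreements end up in `retraining_labels`, which mirrors the point above: the service keeps every outcome reliable, and the verified failures fall out as a byproduct.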

The model is retrained, absorbs this new information, and the value of that specific label is consumed. However, the value of the service that continuously generates these high-value labels does not depreciate. This is a recurring service, not a one-off product. Its value is directly connected to the volatility of the real world, ensuring that as long as the world changes, the service remains essential.

Despite the current excitement around labelling companies, we must be clear about where value is truly added. Is it in producing training data, or is it in quality-assuring a model at inference time? The inference-time QA business model is unequivocally superior, but excelling at it requires far more than a workforce of annotators.

True value will flow from deep domain expertise, which allows for the accurate labelling of complex edge cases that current models cannot handle. To effectively monetise this capability and generate recurring revenue, one must master fundamental business principles: thoroughly understanding the customer, continuously iterating on solutions, and achieving genuine product-market fit.

Data labelling companies, as they are currently understood, will not retain their value. The real winners will be the application providers who integrate labelling with a domain edge to keep their AI software continuously performant. They aren’t selling data; they are selling reliable, intelligent outcomes.
