How Clean ML Is Shattering Data Science’s Glass Ceiling

Matthew Karasick

July 5, 2021

In the days before the Internet, brands had access to drastically less data about... well, just about everything. Intelligence around customer intent and behavior typically required self-reporting through surveys and focus groups, conducted and analyzed by marketing analysts.

The web exponentially increased the amount of data available to organizations and marketers. In 2008, D.J. Patil and Jeff Hammerbacher were leading the intersection of data and analytics at LinkedIn and Facebook when they coined a new term to describe what they were doing on a daily basis: data science.

Within a few years, everyone, across every sector and company function, was talking about Big Data and its volume, velocity and potential. Industry pundits rightly convinced marketers that they were sitting on a vast pool of consumer data that, if tapped, could readily power better outcomes and even new business models through better analytics and data-driven execution.

Corporations set to work building teams of data scientists who worked across the organization to add data-driven intelligence to Sales, Marketing, HR, Operations, and beyond.

Brands, and the technology and data providers who make up their “stacks”, have steadily increased their investment in analytics and data science disciplines, and are markedly more sophisticated than they were just a handful of years ago.

With access to the wide-open pipes of the ad tech ecosystem, the results are impressive: data is captured, parsed and activated within milliseconds of its creation. It is widely understood that a click on a product page anywhere will likely shape what you see next, starting within milliseconds of the click. If you’re like me, you have friends who swear that they see ads for things based on words they have spoken (“my devices are listening…”).

Big Data Hits a Data Ceiling

With changes to browsers, regulations, and beyond, access to second- and third-party data has recently become far less ubiquitous. Data collaboration now requires far more intentionality. Clean room software has emerged as a viable path to collaboration at scale, because it gives data owners fine-grained control over how their data is accessed and used.

Using clean rooms, endemic publishers can let strategic advertising partners use their data for measurement without fear of their audience leaking and being activated without their involvement. Retailers can allow their CPG partners to use transaction data to inform optimization opportunities. Auto OEMs and dealer groups are pushing past friction that has existed in their three-tier system for decades.

With the momentum in data collaboration that clean rooms have paved the way for, savvy teams quickly arrived at a question: if we can find ways to join distributed data, can we also find a way to put my machine learning model together with your dataset to run inference or predictions, without either of us ever having to share or ship our respective assets to the other?

CleanML is the natural evolution of clean rooms: two (or more) parties each bring distributed raw materials for machine learning/AI, such as a model, model-training code, or dataset(s), with each partner’s assets remaining safe and protected in its own clean room. CleanML then creates a temporary, neutral compute environment in which the assets are joined to produce output, which is written to the mutually agreed-upon clean room(s) of one or both partners. The compute environment then quickly evaporates, and the only remaining artifact is the desired output.
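To make that flow concrete, here is a toy sketch in Python. It is not how any CleanML product is implemented. Every name in it (CleanRoom, ephemeral_compute, run_clean_job, the scoring rule, the sample rows) is hypothetical and exists only to illustrate the idea: one partner lends a model, another lends a dataset, the two meet in a short-lived workspace, and only the output lands in the agreed clean room.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterator, List


@dataclass
class CleanRoom:
    """Hypothetical stand-in for one partner's protected environment."""
    owner: str
    assets: Dict[str, Any] = field(default_factory=dict, repr=False)
    outputs: Dict[str, Any] = field(default_factory=dict)

    def put(self, name: str, asset: Any) -> None:
        self.assets[name] = asset

    def lend(self, name: str) -> Any:
        # A real system would enforce the owner's access policy here.
        return self.assets[name]


@contextmanager
def ephemeral_compute() -> Iterator[Dict[str, Any]]:
    """Neutral, temporary workspace that is emptied when the job ends."""
    workspace: Dict[str, Any] = {}
    try:
        yield workspace
    finally:
        workspace.clear()  # the environment "evaporates"


def run_clean_job(model_room: CleanRoom, model_name: str,
                  data_room: CleanRoom, data_name: str,
                  output_rooms: List[CleanRoom], output_name: str) -> None:
    """Join a lent model with a lent dataset; persist only the output."""
    with ephemeral_compute() as ws:
        ws["model"] = model_room.lend(model_name)
        ws["rows"] = data_room.lend(data_name)
        scores = [ws["model"](row) for row in ws["rows"]]
        for room in output_rooms:
            room.outputs[output_name] = list(scores)


# Hypothetical partners: a brand owns the model, a retailer owns the data.
brand = CleanRoom("brand")
retailer = CleanRoom("retailer")

score: Callable[[Dict[str, int]], int] = (
    lambda row: 2 * row["visits"] + 5 * row["purchases"]  # made-up scoring rule
)
brand.put("propensity_model", score)
retailer.put("transactions", [
    {"visits": 3, "purchases": 1},
    {"visits": 0, "purchases": 2},
])

# Both sides agreed that the output is written to the retailer's clean room only.
run_clean_job(brand, "propensity_model", retailer, "transactions",
              output_rooms=[retailer], output_name="propensity")
print(retailer.outputs["propensity"])
```

In this sketch the brand never receives the retailer's rows and the retailer never receives a copy of the model. A real deployment would also need isolation, auditing and policy enforcement, which are out of scope here.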

Using CleanML, data science teams are now the proverbial kids in the candy store. Ask 100 data scientists which they’d rather have, better data or better algorithms, and somewhere between 98 and 100 of them will give the same answer: “better data”. With CleanML, these smart teams are now realizing that they can quickly leave their own four walls and start thinking about who has potentially valuable data (or models) and would be a candidate for collaboration.

Clean ML Use Cases

CleanML is now being used across a number of verticals and use cases. CPG companies and their retail partners are building new propensity models, in which CPG models run on retail partners’ data, to drive both advertising and distribution. Brands are able to use data enrichment vendors without their data ever leaving its home. R&D departments are now using secret product data from distribution partners to inform new product development. Partners in highly regulated industries such as Financial Services and Healthcare are shattering ceilings that once seemed immovable due to regulation and trust.

CleanML opens the door not just to an incremental new tactic, but to a whole path of innovation for data science teams and their partners alike. Roadmaps can and should now imagine working with datasets or models that could come from anywhere. And while it is still early on this arc of innovation, where models (or other types of executables) and data can be joined, something tells me that we can hardly even imagine how smart enterprises will use this technology to do amazing things.

Matthew Karasick

Matt is passionate about finding ways to use data to achieve the wins that data-powered technology can create between companies and consumers. Matt has spent his career helping companies do more with data. He has held product leadership positions at DoubleClick, Trilogy, Acerno, Akamai, and most recently at Indeed. After working closely together with Matt Kilmartin at Akamai, Matt (Karasick) worked with the Krux team as a consultant, where he helped create Krux for Marketers. Matt believes that, when done correctly and with sustainable mutual value as the measuring stick, interests between consumers and companies are always aligned.
