cnvrg.io and NetApp Partner to Deliver the Industry’s First MLOps Dataset Caching
cnvrg.io, the data science platform that simplifies model management and brings advanced MLOps to the industry, announced a partnership with NetApp. NetApp is the first to leverage the cnvrg.io dataset caching tool, a set of capabilities for pulling datasets directly from cache for any machine learning job, and cnvrg.io is the first ML platform to use dataset caching for end-to-end machine learning development. Caching makes datasets ready to use in seconds rather than hours, and cached datasets can be authorized and used by multiple teams on the same compute cluster connected to the cached data. Dataset caching is already used in production by cnvrg.io customers.
It’s not uncommon to have hundreds of datasets feeding models. However, those datasets may live far from the compute that trains the models, such as in the public cloud or in a data lake. With NetApp and cnvrg.io’s dataset caching capability, users can cache the needed datasets (and/or specific versions of them) and ensure they are stored on the ONTAP AI storage attached to the GPU or CPU cluster performing the training. Once the needed datasets are cached, they can be reused many times by different team members.
The cnvrg.io dataset caching feature is available to any cnvrg.io user with an ONTAP AI storage server. Once the server is connected to an organization, data scientists can cache commits of their datasets on that Network File System (NFS) share. When a commit is cached, users can attach it to jobs for immediate, high-throughput access to the data, and the job does not need to clone the dataset at start-up. cnvrg.io’s dataset caching feature creates the following business advantages:
- Increased productivity – Datasets are ready to be used in seconds rather than hours.
- Improved sharing and collaboration – Cached datasets can be authorized and used by multiple teams in the same compute cluster connected to the cached data.
- Reduced cost – Models pull datasets from the cache, reducing per-download data transfer charges.
- Operationalizing hybrid cloud – The dataset cache provides a high-performance on-premises mirror of remote storage.
- Multi-cloud dataset mobility – The on-premises cache acts as a control point for data spread across clouds.
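The attach-or-clone behavior described above can be sketched in a few lines. This is a minimal illustration only, not the cnvrg.io SDK: the `fetch_dataset` helper, the cache layout, and the `clone_fn` callback are all hypothetical names, with the cache root standing in for the ONTAP AI NFS mount.

```python
from pathlib import Path


def fetch_dataset(cache_root: Path, name: str, commit: str, clone_fn) -> Path:
    """Return a local path to the requested dataset commit.

    If the commit is already present under the cache root (standing in
    for the ONTAP AI NFS mount), the job attaches to it directly;
    otherwise it falls back to a full clone from remote storage, which
    is the slow path the caching feature avoids on subsequent runs.
    """
    cached = cache_root / name / commit
    if cached.is_dir():
        # Cache hit: data is ready in seconds and shared by every job
        # in the cluster that mounts the same NFS export.
        return cached
    # Cache miss: clone from remote storage, populating the cache so
    # later jobs skip the download entirely.
    cached.mkdir(parents=True, exist_ok=True)
    clone_fn(name, commit, cached)
    return cached
```

In this sketch, authorization and commit versioning are handled by the platform; the helper only captures the key idea that one clone serves every later job attached to the same cached commit.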