Dremio’s Data Lake Engine Enables Breakthrough Speed for Analytics on Data Lake Storage
Dremio 4.0 Dramatically Improves Performance and Delivers a Self-Service Semantic Layer for Data in ADLS, S3, and Other Data Sources
Dremio, the data lake engine company, announced the release of Dremio 4.0, its Data Lake Engine for AWS, Azure, and hybrid cloud. This version of Dremio’s open source platform includes advanced columnar caching, predictive pipelining, and a new execution engine kernel that delivers up to a 70x increase in performance.
“We process hundreds of thousands of transactions on a daily basis and produce insights based on those transactions; this type of capability requires sophisticated and scalable data platforms,” said Ivan Alvarez, IT vice president, big data and analytics, NCR Corporation. “Dremio is working with NCR to solve the integration between traditional enterprise data warehouse and scalable distributed compute platforms for big data repositories. This integration allows NCR to also cross pollinate data engineering knowledge among platforms and most importantly to deliver faster data insights to our internal and external customers.”
Flexibility and Control for Data Architects, and Self-Service for Data Consumers
With Dremio, companies can operationalize data lake storage such as ADLS and S3, making data easy to consume while providing the interactive performance that users demand.
The engine provides ANSI SQL capabilities, including complex joins, large aggregations, common table expressions, sub-selects, window functions and statistical functions. With built-in Dremio connectors for Tableau, Power BI, Looker and other analysis tools, as well as Dremio’s ODBC, JDBC, REST and Arrow Flight interfaces, it is easy to use any client application to query the data.
Dremio executes queries directly against data lake storage while leveraging patent-pending technology to accelerate query execution. The data does not need to be loaded into other systems, such as data warehouses, data marts, cubes, aggregation tables and BI extracts. Data can reside in a variety of file formats, including Parquet, ORC, JSON and text-delimited (e.g., CSV).
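As a minimal illustration of the query-in-place idea, the following sketch uses only Python’s standard library (not Dremio’s engine) to read a text-delimited file where it sits and aggregate it directly, with no load into a warehouse or cube first:

```python
import csv
import io

# A small CSV "file" queried in place; in practice this would be an
# object in S3 or ADLS rather than an in-memory buffer.
raw = io.StringIO("region,amount\neast,10\nwest,20\neast,30\n")

# Roughly the equivalent of: SELECT region, SUM(amount) FROM sales GROUP BY region
totals = {}
for row in csv.DictReader(raw):
    totals[row["region"]] = totals.get(row["region"], 0) + int(row["amount"])

print(totals)  # -> {'east': 40, 'west': 20}
```

The point of the sketch is that the file format itself is the source of truth; a real engine adds a SQL planner, distributed execution, and support for columnar formats such as Parquet and ORC on top of the same idea.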
“Organizations recognize the value of being able to quickly leverage data and analytics services to further their data-driven initiatives,” said Mike Leone, senior analyst, Enterprise Strategy Group. “But it’s more important than ever to start with a strong data foundation, especially one that can simplify the usage of a data lake to enable organizations to maximize data availability, accessibility, and insights. Dremio is addressing this need by providing a self-sufficient way for organizations and personnel to do what they want with the data that matters, no matter where that data is, how big it is, how quickly it changes, or what structure it’s in.”
Dremio on Cloud Data Lake Storage
The latest version of Dremio includes critical features for speeding up deployment and use of Dremio on cloud data lake storage, including:
Performance
● Columnar Cloud Cache (C3) – Automatically caches data on NVMe or SSD close to compute to significantly improve performance and reduce network traffic. C3 is real-time, distributed and automatic, with zero administration or user involvement required, and uses existing cluster resources already available.
● Column-Aware Predictive Pipelining – Eliminates waits on high-latency storage by predicting access patterns, resulting in 3–5x faster query response times. Predictive Pipelining works with columnar data (Apache Parquet and ORC) on data lake storage (S3, ADLS, HDFS), and improves read-ahead hits and pipelining while increasing read throughput to the maximum the network allows.
● Gandiva GA – Gandiva is the first execution kernel optimized for high-performance columnar processing of Apache Arrow data. Gandiva makes optimal use of modern CPU architectures, is written in C++ for performance and uses runtime code-generation in LLVM for efficient evaluation of arbitrary SQL. Performance improvements are striking, with complex analytical workloads seeing up to 70x performance improvement from Gandiva.
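To make the idea behind C3 (the first bullet above) concrete, here is a toy read-through cache keyed by (file, column) with least-recently-used eviction. This is purely a conceptual sketch: the class name, the capacity model, and the simulated remote store are all invented for illustration and do not reflect Dremio’s actual implementation.

```python
from collections import OrderedDict

class ColumnarCache:
    """Toy read-through cache keyed by (file, column): only the columns a
    query touches are fetched from remote storage, and recently used
    columns are kept on fast local media (here, an in-memory dict)."""

    def __init__(self, fetch, capacity=4):
        self.fetch = fetch          # function (file, column) -> column data
        self.capacity = capacity    # stand-in for local NVMe/SSD space
        self.store = OrderedDict()  # LRU order: oldest entry first
        self.hits = 0
        self.misses = 0

    def get(self, file, column):
        key = (file, column)
        if key in self.store:
            self.store.move_to_end(key)       # mark as recently used
            self.hits += 1
            return self.store[key]
        self.misses += 1
        data = self.fetch(file, column)       # slow remote read
        self.store[key] = data
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)    # evict least recently used
        return data

# Simulated remote object store: reading a column is the "expensive" step.
remote = {("orders.parquet", "amount"): [10, 20, 30],
          ("orders.parquet", "region"): ["east", "west", "east"]}
cache = ColumnarCache(lambda f, c: remote[(f, c)])

cache.get("orders.parquet", "amount")   # miss: fetched from remote
cache.get("orders.parquet", "amount")   # hit: served from local cache
print(cache.hits, cache.misses)         # -> 1 1
```

The (file, column) key is what makes the cache "columnar": a query that reads two columns of a wide table caches only those two, not the whole file.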
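The predictive-pipelining bullet above can likewise be sketched as a simple read-ahead loop: while chunk i is being consumed, the fetch for chunk i+1 is already in flight, so consumers stop waiting on storage latency once the pipeline is warm. The `slow_fetch` function below is an invented stand-in for an object-storage read, not Dremio’s code.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def read_with_prefetch(fetch, chunks):
    """Read `chunks` in order, issuing the fetch for the next chunk in the
    background while the current one is handed to the consumer."""
    results = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fetch, chunks[0])   # prime the pipeline
        for nxt in chunks[1:]:
            data = future.result()               # wait for the current chunk
            future = pool.submit(fetch, nxt)     # prefetch the next one
            results.append(data)
        results.append(future.result())          # drain the last chunk
    return results

def slow_fetch(chunk):
    time.sleep(0.01)            # stand-in for object-storage latency
    return f"data:{chunk}"

print(read_with_prefetch(slow_fetch, ["c0", "c1", "c2"]))
# -> ['data:c0', 'data:c1', 'data:c2']
```

A real column-aware prefetcher goes further: it uses the query plan and the file’s column layout to predict which byte ranges will be read next, rather than assuming a simple sequential order.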
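The columnar processing Gandiva accelerates can be illustrated, very loosely, in plain Python rather than LLVM-generated machine code: an expression is evaluated one column at a time instead of one row at a time, which is what lets a compiling engine emit a single tight loop per expression. The helper names and the example query are invented for this sketch.

```python
# Column-at-a-time primitives: each applies one operation to whole columns.
def col_add(a, b):
    return [x + y for x, y in zip(a, b)]

def col_mul_scalar(a, s):
    return [x * s for x in a]

# Columns for something like: SELECT price * 2 + tax FROM items
price = [100, 250, 40]
tax = [5, 12, 2]

total = col_add(col_mul_scalar(price, 2), tax)
print(total)  # -> [205, 512, 82]
```

In Arrow-based execution the columns are contiguous typed buffers rather than Python lists, so each of these per-column loops can be compiled to vectorized code that exploits modern CPU architectures.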
Security
● Single Sign-On and Azure AD – Dremio now offers a flexible way to integrate with existing identity management systems, giving users seamless access when switching between applications. This release includes support for OAuth and Personal Access Tokens for secure connections over ODBC, JDBC and Arrow Flight endpoints.
● Advanced AWS Security – Dremio now includes native support for AWS security services for enterprise users, such as AWS Secrets Manager, Multiple AWS IAM Roles, Server-Side Encryption with AWS KMS–Managed Keys, and more.
“Dremio’s Data Lake Engine makes queries on data lake storage extremely fast, so that companies no longer have to move data into proprietary data warehouses or create cubes or extracts to get value from that data,” said Tomer Shiran, co-founder and CEO, Dremio. “We’re excited to announce new technologies – like our Columnar Cloud Cache (C3) and Predictive Pipelining – that work alongside Apache Arrow and the Dremio-developed Gandiva kernel to deliver big increases in performance.”
Dremio Hub
With this release, Dremio is also announcing Dremio Hub. In addition to the native connectors that come with Dremio, Dremio Hub provides a marketplace of community-developed connectors, making it easy to join data lake storage with many other data sources. At launch, Dremio Hub includes contributed connectors for Snowflake, Salesforce, and several other data sources – and the number is expected to grow quickly as Dremio has also established a formal program for soliciting, accepting and publishing further contributions.
Availability
The latest release of Dremio’s data lake engine is available immediately, and Dremio can be deployed in minutes to start running analytics at unprecedented speed.