Has anyone tried Cloudera's Impala?

Cloudera is a leading Apache Hadoop software and service provider in the big data market. Like Apache Drill, Cloudera's Impala seeks to improve interactive query response times for Hadoop users. Apache Hive provides a familiar and powerful query mechanism for Hadoop users, but its response times are often unacceptable for interactive work because Hive relies on MapReduce. Cloudera's answer to this problem is Impala.

Cloudera developed an MPP (massively parallel processing) query engine written in C++ to replace the MapReduce layer used by Apache Hive. Unlike the Apache Drill project, which chose Java, Cloudera decided that a native C++ MPP engine would be the answer to fast, interactive Hadoop queries.
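To make the MPP idea a bit more concrete, here is a toy sketch in Python. It is not Impala's code or API: the partitions, the `partial_sum` and `mpp_sum` names, and the thread pool standing in for cluster nodes are all assumptions made for illustration. The point is only the shape of the work: each node aggregates its local rows, and a coordinator merges the partial results.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list stands for the rows one node holds locally.
NODE_PARTITIONS = [
    [("us", 3), ("de", 1), ("us", 2)],
    [("de", 4), ("fr", 5)],
    [("us", 1), ("fr", 2), ("fr", 1)],
]


def partial_sum(rows):
    """Runs on each 'node': aggregate only the locally held rows."""
    totals = Counter()
    for key, value in rows:
        totals[key] += value
    return totals


def mpp_sum(partitions):
    """Coordinator: fan the work out in parallel, then merge the partial sums."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(partial_sum, partitions))
    merged = Counter()
    for part in partials:
        merged.update(part)  # Counter.update adds counts key by key
    return dict(merged)


result = mpp_sum(NODE_PARTITIONS)
print(result)
```

A real engine would stream intermediate results between processes on different machines; the threads here only stand in for that.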

Note that Impala uses HiveQL as its query interface, and that Impala's query execution engines are co-located with the HDFS data nodes. This follows the Hadoop approach of running processing tasks where the data lives. Impala can also use HBase as a data store. In this sense, Impala is an extension of Apache Hadoop that offers a powerful alternative to the Hive-on-MapReduce model.
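The co-location point can be illustrated with a small sketch. This is not Impala's actual scheduler. The block map, node names, and the `assign_scans` function are hypothetical, and the "least-loaded replica holder" rule is just one simple policy chosen for the example. It shows the general idea of sending each scan to a node that already holds the block, so the data is read locally instead of being shipped over the network.

```python
from collections import defaultdict

# Hypothetical block map: block id -> data nodes that hold a replica of it.
BLOCK_REPLICAS = {
    "blk_1": ["node-a", "node-b"],
    "blk_2": ["node-b", "node-c"],
    "blk_3": ["node-a", "node-c"],
    "blk_4": ["node-c", "node-a"],
}


def assign_scans(block_replicas):
    """Assign each block to the least-loaded node that holds a replica of it."""
    load = defaultdict(int)
    plan = {}
    for block, replicas in sorted(block_replicas.items()):
        # Ties on load are broken by node name so the plan is deterministic.
        node = min(replicas, key=lambda n: (load[n], n))
        plan[block] = node
        load[node] += 1
    return plan


plan = assign_scans(BLOCK_REPLICAS)
for block, node in plan.items():
    print(block, "->", node)
```

Every block ends up on a node that stores it, and the work is spread across nodes rather than piled onto the first replica listed.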

Cloudera and Twitter led the development of a new Hadoop file format, Parquet, which can be used with Impala and is available as open source on GitHub. Parquet provides a robust columnar format for storing data in Hadoop. It supports highly efficient compression and encoding and is effective for storing nested data structures.
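As a rough illustration of why a columnar layout compresses well, here is a toy sketch. It does not read or write Parquet files, and Parquet's real encodings are more elaborate than this. The sample rows and the `to_columns`, `rle_encode`, and `rle_decode` helpers are made up for the example. The idea is that storing each column contiguously puts similar values next to each other, so even simple run-length encoding shrinks them.

```python
from itertools import groupby

# Hypothetical rows: (country, status, amount)
ROWS = [
    ("us", "ok", 10),
    ("us", "ok", 12),
    ("us", "ok", 7),
    ("de", "ok", 3),
    ("de", "err", 9),
]


def to_columns(rows, names):
    """Pivot row-oriented records into one list of values per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def rle_encode(values):
    """Run-length encode a column as (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]


columns = to_columns(ROWS, ["country", "status", "amount"])
encoded_country = rle_encode(columns["country"])
encoded_status = rle_encode(columns["status"])
print(encoded_country)
print(encoded_status)
```

A query that only needs `amount` can also skip the other columns entirely, which is the other big win of column-oriented storage for analytic queries.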

Like Apache Drill, Cloudera's Impala was also inspired by Google's Dremel.