Using External Indexes, Metadata Stores, Catalogs and Caches to Accelerate Queries on Apache Parquet
It is a common misconception that Apache Parquet requires (slow) reparsing of metadata and is limited to the indexing structures provided by the format itself. In fact, caching parsed metadata and combining custom external indexes with Parquet's hierarchical data organization can significantly speed up query processing.
In this blog, I describe …