Query execution engines for analytics are continuously adapting to the underlying hardware in order to maximize performance. Wider SIMD registers and more complex SIMD instruction sets are emerging in mainstream CPUs, as well as in new processor designs such as the many-core Intel Xeon Phi CPUs that rely on SIMD vectorization to achieve high performance per core while packing a greater number of smaller cores per chip. In the database literature, using SIMD to optimize stand-alone operators with key–rid pairs is common, yet state-of-the-art query engines rely on compilation of tightly coupled operators, where hand-optimizing individual operators becomes impractical. In this article, we extend a state-of-the-art analytical query engine design by combining code generation and operator pipelining with SIMD vectorization, and show that the SIMD speedup is diminished when execution is dominated by random memory accesses. To better utilize the hardware features, we introduce VIP, an analytical query engine designed and built bottom up from pre-compiled column-oriented data-parallel sub-operators and implemented entirely in SIMD.

In the past decade, advances in the speed of commodity CPUs have far out-paced advances in memory latency. Main-memory access is therefore increasingly a performance bottleneck for many computer applications, including database systems. In this article, we use a simple scan test to show the severe impact of this bottleneck. The insights gained are translated into guidelines for database architecture, in terms of both data structures and algorithms. We discuss how vertically fragmented data structures optimize cache performance on sequential data access. We then focus on equi-join, typically a random-access operation, and introduce radix algorithms for partitioned hash-join. The performance of these algorithms is quantified using a detailed analytical model that incorporates memory access cost. Experiments that validate this model were performed on the Monet database system. We obtained exact statistics on events such as TLB misses and L1 and L2 cache misses by using hardware performance counters found in modern CPUs. Using our cost model, we show how the carefully tuned memory access pattern of our radix algorithms makes them perform well, which is confirmed by experimental results.