Benchmarking Ticketing in Vectorized Cuckoo Hash Table for Database Systems
Hash tables are essential data structures for storing key-value pairs, widely used in database management for their efficient search, insert, and delete functions. Typically, hash tables use a hash function to determine the storage index for a key. However, this gives rise to collisions—where different keys hash to the same index. Cuckoo hashing is a special type of hashing that uses an eviction process to resolve collisions, ensuring efficient and consistent access times. Hash tables are especially critical in data science for large-scale data aggregation, traditionally counting and grouping data row-by-row.
In this project, we implemented a vectorized cuckoo hash table in Rust, designed for integration with Penn’s FerricDB, a high-performance database prototype. The vectorized method scans data column-wise instead of row-wise, optimizing the aggregation and ticketing processes. Ticketing assigns unique identifiers to rows for the identification of existing, unique, and repeated keys – crucial for efficient data aggregation.
We focused on optimizing lookup and insertion functions within our hash table implementation using Linux performance tools like ‘perf’ to measure efficiency. By integrating the hash table with the FerricDB API, we aimed to enhance ticketing performance and database efficiency. Our results showed marginal performance improvements over traditional hash tables, particularly in scenarios requiring repeated key identification. We evaluated two ticket assignment strategies—batch hash calculation and pre-calculation—each with distinct strengths under varying conditions.
While our optimizations yielded only incremental throughput gains, they highlighted potential areas for further improvement. Future work will refine the get and insert functions, explore more vectorization opportunities, and implement new aggregation methods, such as counting and determining minimum values. Although our current implementation offers modest improvements, continued exploration and refinement of vectorized cuckoo hash tables hold promise for meeting the demands of high-performance database systems.
Comments