Database Benchmarking

Readings in Database SystemsTPC Database Benchmarks

As most readers of data warehouse related blogs will no doubt know, the Transaction Processing Performance Council (TPC) define database benchmarks to allow comparisons to be made between different technologies.Amongst said benchmarks is the TPC-H ad hoc decision support benchmark, which has been around since 1999 with the latest version being v2.17.1.This is the most relevant TPC benchmark for those of us focussed on data warehouse platforms.

Benchmark Scale

TPC-H benchmarks are executed at a given scale point ranging from 1GB up to 100TB:

  • TPCH1 = 1GB
  • TPCH10 = 10GB
  • TPCH30 = 30GB
  • TPCH100 = 100GB
  • TPCH300 = 300GB
  • TPCH1000 = 1,000GB/1TB
  • TPCH3000 = 3,000GB/3TB
  • TPCH10000 = 10,000GB/10TB
  • TPCH30000 = 30,000GB/30TB
  • TPCH100000 = 100,000GB/100TB

Benchmark Data

Data used to populate the TPC-H schema is generated using the DBGEN utility which is provided as part of the TPC-H download.

Database Schema

The schema used in the TPC-H benchmark is retail based and contains the following tables:

  • customer
  • orders
  • lineitem
  • part
  • partsupp
  • supplier
  • nation
  • region

Each table contains either a fixed number of rows or a number related to the scale factor in use.Unsurprisingly, the 'lineitem' table contains the most rows, which is 6,000,000 x scale factor e.g. 600,000,000 rows for scale factor 100 (TPCH100).Column names and data types are provided for all tables.For those using MPP databases such as Teradata, Greenplum or Netezza you'll have to decide your own data distribution strategy.

Benchmark Queries

The queries that are executed against the populated schema are generated using the QGEN utility which is provided as part of the TPC-H download. Minor query modifications are allowed to take into account differences between DBMS products.There are a total of 22 'SELECT' queries ('query stream') that must be executed against the populated retail schema at a given scale factor. Each query corresponds to a business question. There are also 2 refresh functions ('refresh stream') that add new sales and remove old sales from the database.

Database Load Time

The elapsed time required to generate & load the data to the database must be recorded. The time to execute other supporting tasks such as table creation, index creation & statistsics collection must also be recorded.

Performance Tests

Once the data is loaded the real fun and games can begin. The performance test consists of both single-user power and multi-user throughput tests.The power test consists of the first refresh function followed by the 22 query set and lastly the second refresh function.The throughput test consists of a minimum number of concurrent runs of the 22 queries ('streams'), as determined by the scale factor:

  • SF1 = 2
  • SF10 = 3
  • SF30 = 4
  • SF100 = 5
  • SF300 = 6
  • SF1000 = 7
  • SF3000 = 8
  • SF10000 = 9
  • SF30000 = 10
  • SF100000 = 11

The throughput test is run in parallel with a single refresh stream.The set of 22 queries is run in a sequence specified in Appendix A of the TPC-H guide and is dependant on the number of streams.The timing of all queries is measured in seconds.

Benchmark Metrics

The metrics captured by the power and throughput tests are as follows:

  • composite query-per-hour (QphH@Size)
  • price/performance ($/QphH/@Size)

Detailed explanations as to how these metrics are computed are available in section 5 of the TPC-H guide. There are some lovely equations in there for maths geeks to enjoy!

Benchmark Results

Vendors that submit results are required to provide an executive summary in addition to a full disclosure report. All of the shell scripts, SQL, log files etc are also provided as supporting files. There are no secrets here!Benchmark performance results have been submitted at the following scales:

  • SF100 (100GB)
  • SF300 (300GB)
  • SF1000 (1TB)
  • SF3000 (3TB)
  • SF1000 (10TB)
  • SF3000 (30TB)
  • SF100000 (100TB)

Perhaps interestingly, the fastest system at all scale factors is currently the Exasol database running on Dell PowerEdge servers. We'll let you peruse to TPC web site to see how they did it.

Benchmark Use

The TPC-H benchmark is primarily of interest to technology vendors to show us what they've got in their locker.Here at VLDB we're data consultants, not tech vendors, so why the TPC-H interest?Well, it is possible to use a cut-down version of the TPC-H benchmark to assess the real-world capability of various database platforms.Why believe the hype when you can test database performance with a consistent set of queries against a real-world(ish) schema and usable data?Well, that's exactly what we've been doing for several years now.Specifically, we use a set of benchmark tests to assess cloud database performance. Also, as cloud platforms are said to suffer the 'noisy neighbour' problem, we also run benchmark tests over extended periods to test the variance of database performance over time for cloud platforms.Some of the results are interesting...very interesting :-)