• eBay use a Teradata EDW, a Teradata high-capacity appliance system and a Hadoop system (‘horses for courses’)
  • the Teradata EDW is 6PB and dual-active
  • the ‘Singularity’ high-capacity system is 40PB and consists of 256 high-capacity appliance nodes
  • the Hadoop system is 20PB
  • ETL is controlled by Ab Initio and metadata-driven
  • most feeds are daily with inputs landed on disk as fixed width files
  • the Teradata loading approach is Fastload/BTEQ
  • Teradata TD13 compression delivered a 50% IO reduction
  • maximum loading throughput is 12TB/hour
  • 50TB/day of new data is received
  • 100 trillion name/value pairs are stored in a single table
  • 100PB/day is analysed, mainly for web site optimisation
  • “Keep atomic data, it supports deep insight”
  • “Data marts are expensive chaos, which cannot be cheap enough to justify, and lead to data drift”
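
As a rough sanity check on the figures above, the arithmetic can be sketched in a few lines of Python. This is only a back-of-envelope illustration, not anything eBay presented: it assumes decimal units (1PB = 1000TB), that the 12TB/hour peak could be sustained, and that the 40PB is spread evenly across the 256 nodes. The function names are made up for the example.

```python
# Figures quoted in the notes above.
MAX_LOAD_TB_PER_HOUR = 12   # maximum loading throughput
NEW_DATA_TB_PER_DAY = 50    # new data received per day
SINGULARITY_PB = 40         # size of the high-capacity system
SINGULARITY_NODES = 256     # number of appliance nodes

# Assumption (not from the talk): decimal units, 1PB = 1000TB.
TB_PER_PB = 1000


def hours_to_load(daily_tb, tb_per_hour):
    """Hours needed to load one day's data at a sustained rate."""
    return daily_tb / tb_per_hour


def tb_per_node(total_pb, nodes):
    """Average capacity per node, assuming an even spread."""
    return total_pb * TB_PER_PB / nodes


load_hours = hours_to_load(NEW_DATA_TB_PER_DAY, MAX_LOAD_TB_PER_HOUR)
node_tb = tb_per_node(SINGULARITY_PB, SINGULARITY_NODES)

print(f"Daily load window at peak rate: {load_hours:.2f} hours")
print(f"Share of a 24-hour day: {load_hours / 24:.1%}")
print(f"Average capacity per node: {node_tb:.2f}TB")
```

On these assumptions the daily 50TB would fit in a little over four hours of loading at the peak rate, which leaves most of the day free for queries and reprocessing.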
