So, a few days ago, we were talking through various consulting engagements we've had over the years...the kind of marketing collateral discussion you just can't avoid, no matter how hard you might try!
Some of the bigger systems we've encountered are at the likes of Vodafone, Verizon, BT and Nokia. All are big in their own right, but client confidentiality precludes saying any more than that, as you might expect.
Then talk turned to eBay. Not that we've had any dealings with that system. I just remembered that I'd scribbled down some notes from the various eBay sessions at the Teradata Partners 2011 conference in San Diego.
So, here are a few snippets relating to eBay from Teradata Partners:
eBay use a Teradata EDW, a Teradata high capacity appliance system and a Hadoop system (‘horses for courses’)
the Teradata EDW is 6PB and dual-active
the 'Singularity' high capacity system is 40PB and consists of 256 high capacity appliance nodes
the Hadoop system is 20PB
ETL is controlled by Ab Initio and metadata-driven
most feeds are daily with inputs landed on disk as fixed width files
the Teradata loading approach is Fastload/BTEQ
Teradata TD13 compression delivered a 50% IO reduction
maximum loading throughput is 12TB/hour
50TB/day of new data is received
100 trillion name/value pairs are stored in a single table
100 PB/day is analysed, mainly for web site optimisation
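To put a couple of those numbers side by side, here's a quick back-of-the-envelope sketch. It uses only the figures noted above and makes some assumptions of my own: decimal units (1 PB = 1000 TB), that the 12TB/hour peak could be sustained for a whole day's feed, and that the three quoted system sizes can simply be added together. The function names are just for illustration.

```python
# Figures as noted from the conference sessions.
# Assumption: decimal units, i.e. 1 PB = 1000 TB.
NEW_DATA_TB_PER_DAY = 50      # new data received per day
MAX_LOAD_TB_PER_HOUR = 12     # maximum loading throughput
EDW_PB = 6                    # Teradata EDW
SINGULARITY_PB = 40           # 'Singularity' high capacity system
HADOOP_PB = 20                # Hadoop system
ANALYSED_PB_PER_DAY = 100     # data analysed per day


def load_hours_at_peak(daily_tb, peak_tb_per_hour):
    """Hours needed to load a day's data if peak throughput were sustained."""
    return daily_tb / peak_tb_per_hour


def scan_ratio(analysed_pb, stored_pb):
    """How many times over the stored volume is read per day."""
    return analysed_pb / stored_pb


if __name__ == "__main__":
    hours = load_hours_at_peak(NEW_DATA_TB_PER_DAY, MAX_LOAD_TB_PER_HOUR)
    total_pb = EDW_PB + SINGULARITY_PB + HADOOP_PB
    ratio = scan_ratio(ANALYSED_PB_PER_DAY, total_pb)
    print(f"Daily load at peak throughput: {hours:.1f} hours")
    print(f"Quoted capacity across all three systems: {total_pb} PB")
    print(f"Analysed per day vs quoted capacity: {ratio:.2f}x")
```

On those assumptions, a day's worth of new data would need a little over four hours of loading at peak rate, and the volume analysed each day is roughly one and a half times the combined quoted capacity of the three systems.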
In addition to the system metrics above, some words of wisdom that I noted (and agree with):
“Keep atomic data, it supports deep insight”
“Data marts are expensive chaos, which cannot be cheap enough to justify, and lead to data drift”
eBay seems to have overtaken Walmart as the 'grand fromage' of the Teradata world. They also like to share their story, which is nice.
We're big fans of Ab Initio and the FastLoad/BTEQ approach to Teradata ETL, so it's nice to know there are like-minded folk at eBay.
100 trillion rows in a single table - I'll bet there's no FALLBACK on that baby :-)