I thought it was about time I waded into the whole ELT & ETL dialogue with some ‘fun facts’, so without further ado…
– both ELT and ETL have been around since the late 1980s/early 1990s (ELT is *not* new)
– historically, ‘black box’ ETL engines (e.g. SSIS, Informatica, Datastage etc.) outnumbered ELT deployments
– ELT attempts to take advantage of the (typically parallel) processing power of the target DW platform to deliver the ‘T’ (Transform) part of the process
– ELT and ETL are effectively the same thing…they both take inputs, optionally make some changes, and apply deltas to the DW target
– both ELT and ELT can be more accurately described as ELTL as they both load data twice
– ‘ETL style’ ELTL receives Extracts from one/more sources, Loads the data to an ETL server, applies the Transforms on the ETL server, then Loads the changes to the DW target
– ‘ELT style’ ELTL receives Extracts from one/more source(s), Loads the data to a staging/landing zone within the DW target, applies the Transforms to a work/transform zone within the DW, then Loads the changes from the work/transform zone to the DW core schema
– the only real difference between ETL and ELT is where/when the ‘T’ takes place
– for ETL the ‘T’ happens via a dedicated external ETL engine prior to the DW load
– for ELT the ‘T’ happens within the DW in between the load/receive and apply phases
– the adoption of scalable cloud-based DW engines that can support the ELT approach is why the ‘ETL v ELT’ debate has become ‘a thing’ recently
For the record, I’m a ‘SQL-only ELT’ kinda guy.
I developed my first Teradata ‘ELT-style’ ETL processes whilst still a graduate trainee in 1989/1990. The same approach has since served me well on Netezza, Greenplum, Redshift and BigQuery.
Author: Paul Johnson

