So, You Want to Build an Analytics Capability?
With the abundance of cloud platforms and pay-as-you-go tools, building an analytics capability may seem straightforward at first—just sign up, load some data, and start doing Data Science or even AI.
However, the reality is far more complex.
TL;DR: analytics is delivered by skilled people, not by technology.
If You Know Your History
Once upon a time – certainly prior to the late 1980s – the only way to gain insight from computer systems was via operational reports. These were baked into the ‘systems of record’ that still underpin most organisations.
During the latter half of the 1980s things started to change: dedicated ‘Information Systems’ (IS) started to be developed. No longer would organisations be limited to what could be gleaned from often inflexible reports run against operational systems.
Over the years, Information Systems have gone by various labels, including Decision Support Systems (DSS), Executive Information Systems (EIS), Data Warehousing and Business Intelligence (BI).
The labels in use at present include ‘Analytics’, ‘Data Analytics’ and ‘Data Science’.
Building a dedicated Analytics capability is not a new idea. It can be traced back to at least the 1980s.
Information Systems
Building an Information System can be likened to assembling a technology jigsaw puzzle consisting of storage, servers, networking, processes, software, design & data.
On the surface this looks to many like a relatively straightforward challenge. However, the complexity of assembling the Information Systems technology jigsaw puzzle should not be under-estimated.
Countless IS development attempts have failed over the last 30-40 years. This is often due to those involved under-estimating the various skills required.
To quote the Wikipedia ‘Information System’ article:
“…the most overlooked element of the system is the people, probably the component that most influence the success or failure of information systems.”
Let’s take a look at the steps required to assemble the IS jigsaw…
Decide Which Data to Use
The data used to populate an Information System is obtained from operational systems, usually referred to as ‘data sources’ or just ‘sources’.
Logically, they are considered to be ‘upstream’ and can be internal or external, private or public, on-prem or cloud based.
When presented with often dozens (maybe even hundreds) of data sources, it’s crucial to determine:
- which source(s) to get data from
- the relative priority of each source
- the data available from each source
- the technique(s) required to obtain the data
- the refresh period
- the development & ongoing cost to acquire the data
…and perhaps most importantly…
- what does each data item actually mean?
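One lightweight way to record these decisions is a small source catalogue. The sketch below is illustrative only; the field names, source names and values are all hypothetical:

```python
from dataclasses import dataclass

@dataclass
class DataSource:
    """One entry in a simple data-source catalogue."""
    name: str          # e.g. "crm"
    priority: int      # 1 = highest
    tables: list[str]  # data available from the source
    acquisition: str   # technique: e.g. "jdbc", "sftp", "rest-api"
    refresh: str       # refresh period: e.g. "daily", "hourly"
    owner: str         # who can explain what each data item actually means

sources = [
    DataSource("crm", 1, ["customer", "account"], "jdbc", "daily", "sales-ops"),
    DataSource("weblogs", 2, ["page_view"], "sftp", "hourly", "web-team"),
]

# Work through sources in priority order
for s in sorted(sources, key=lambda s: s.priority):
    print(f"{s.name}: {len(s.tables)} table(s), refreshed {s.refresh}")
```

Even a catalogue this simple forces the 'what does it mean and who owns it?' conversation early, before any pipelines are built.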

Quality Check the Data
Once the data source(s) have been decided, it’s important to quality check the data contained within each source.
This is the point at which the actual data quality first reveals itself. This can be significantly at odds with the expected data quality. The truth is in the data, not any supporting documentation.
Quality checking the data involves the following:
- obtain sample data from each source
- verify the data type & data demographics (nulls, defaults, distribution of values) for each field in each table for each source
Though time-consuming, this is a critical step. Cutting corners here can lead to costly errors later in the process.
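The demographics check above can be sketched in a few lines of Python. This is a minimal profiler over sample rows, not a substitute for a proper profiling tool; the sample data is invented:

```python
from collections import Counter

def profile_field(rows, field):
    """Basic demographics for one field: counts, nulls, distinct and top values."""
    values = [r.get(field) for r in rows]
    nulls = sum(v is None or v == "" for v in values)
    non_null = [v for v in values if v not in (None, "")]
    counts = Counter(non_null)
    return {
        "count": len(values),
        "nulls": nulls,
        "distinct": len(counts),
        "top": counts.most_common(3),  # most frequent values
    }

sample = [
    {"surname": "Smith", "dob": "1970-01-01"},
    {"surname": "Jones", "dob": None},
    {"surname": "Smith", "dob": "1985-06-30"},
]
print(profile_field(sample, "dob"))
```

Running a profile like this per field, per table, per source is exactly where the actual data quality first reveals itself.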

Model the Data
Having sanity checked the data, the next step is to build a target data model.
The fundamental data modelling steps include the following:
- document the things (‘entities’) that are in scope e.g. customer, account, purchase
- for each entity determine the properties (‘attributes’) e.g. forename, surname, date of birth
- describe the relationship(s) between the entities
- produce a logical data model containing entities, attributes and relationships
- decide what source system fields will be used to populate the logical model (‘source:target mapping’)
The data model and the source:target mapping are crucial deliverables to complete before the build phase begins.
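A source:target mapping can be as simple as a lookup from target entity/attribute to source system field. All entity, table and column names below are hypothetical:

```python
# (target entity, target attribute) -> (source system, source table, source field)
mapping = {
    ("customer", "forename"):      ("crm", "cust_master", "first_nm"),
    ("customer", "surname"):       ("crm", "cust_master", "last_nm"),
    ("customer", "date_of_birth"): ("crm", "cust_master", "birth_dt"),
    ("account",  "account_no"):    ("billing", "acct", "acct_id"),
}

def sources_for_entity(entity):
    """List the distinct source tables needed to populate one target entity."""
    return sorted({f"{system}.{table}"
                   for (ent, _), (system, table, _) in mapping.items()
                   if ent == entity})

print(sources_for_entity("customer"))
```

Keeping the mapping in a structured form (rather than prose) means it can be validated, queried and handed straight to the build team.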

What About A Platform?
The sources have been identified and profiled, the logical data model is built and the source:target mapping is complete.
Somewhere to land, process & query the data would be useful. Data warehouse? Data lake? Lakehouse? On-prem? Cloud? So many choices!
Considerations at this point include the following:
- alignment with corporate technology strategy
- storage capacity
- compute power
- operations: support, monitoring, backup/restore, availability
- interoperability with ETL, query, dashboard & data science tooling
… and most importantly:
- cost, cost, cost

Receive, Clean & Integrate the Data
Data often arrives in complex formats and contains unexpected or missing values; such data is considered ‘dirty’. The level of cleansing required can vary from little to a lot.
Clean or cleansed, data must also be ‘joined up’ (integrated) and aligned with the target data model.
The data preparation processes that populate the target data model do so in support of the main objective: analytics in all its guises.
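The cleanse-then-integrate pattern can be sketched as two small functions: one standardising dirty values, one joining sources into the shape of the target model. The records and join key here are invented for illustration:

```python
def cleanse(record):
    """Trim whitespace and standardise empty strings to None."""
    return {k: (v.strip() or None) if isinstance(v, str) else v
            for k, v in record.items()}

def integrate(customers, accounts):
    """Join cleansed customer and account records on customer_id."""
    by_id = {c["customer_id"]: c for c in map(cleanse, customers)}
    joined = []
    for a in map(cleanse, accounts):
        c = by_id.get(a["customer_id"])
        if c:  # only keep accounts that join to a known customer
            joined.append({**c, **a})
    return joined

customers = [{"customer_id": "C1", "surname": " Smith "}]
accounts = [{"customer_id": "C1", "account_no": "A100"},
            {"customer_id": "C9", "account_no": "A999"}]  # orphan account
print(integrate(customers, accounts))
```

Real pipelines add error handling, orphan reporting and incremental loads on top, but the cleanse/integrate split shown here is the core of it.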
Maintaining the analytic data model is the domain of Data Engineering.

Visualise & Present the Data
The rise of dedicated analytics platforms has brought data visualisation tools to the fore, often referred to as Business Intelligence (BI) solutions. Tools like Tableau, Power BI, Looker and Superset have become essential for translating data into actionable insights.
Those with longer memories might also recall Business Objects (BO).
With a strong emphasis on functional & attractive interfaces, BI dashboard & report development should not be left to the Data Engineering team.
As well as design flair, mastery of BI tools is a challenge in its own right. In general, SQL/Python gurus do not make good BI designers.

Keep The Database Running Smoothly
All of the major cloud vendors offer data platforms. Some even claim that they’re ‘fully managed’.
In reality, this usually means basic housekeeping chores like patching the OS and DBMS, backups and alerting have been automated. A more accurate description would be ‘partially automated’.
What isn’t automated are common tasks such as user creation, password resets, granting/revoking of privileges & performance investigations.

Keep The Data Pipelines Running Smoothly
The need to keep the Data Warehouse up to date with new & changed data from the source systems is taken as a given.
The data feeds – or pipelines – can vary enormously in latency, complexity, data volume, frequency & priority. No matter how well designed or developed, one thing we can’t avoid is that pipelines will fail from time to time.
Data pipeline failures typically need to be alerted, investigated and fixed as soon as possible. This can often include overnight or weekend failures.
The longer pipelines stay broken, the less accurate the data in the Data Warehouse becomes, which is never good.
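A common defensive pattern is to wrap each pipeline step with retries and an alert when it keeps failing. This is a minimal sketch using the standard `logging` module standing in for a real alerting channel (PagerDuty, Slack, email, etc.):

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")

def run_with_retries(step, attempts=3, delay=1.0):
    """Run a pipeline step; retry on failure, alert (here: log) if it keeps failing."""
    for attempt in range(1, attempts + 1):
        try:
            return step()
        except Exception as exc:
            log.warning("step failed (attempt %d/%d): %s", attempt, attempts, exc)
            time.sleep(delay)
    # Exhausted all attempts: raise the alarm for the on-call engineer
    log.error("step still failing after %d attempts; paging on-call", attempts)
    raise RuntimeError("pipeline step failed")
```

Orchestrators such as Airflow provide retries and alerting out of the box, but someone still has to respond when the alert fires at 3am.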

Keep The Boss Happy With Data Science
The application of statistical techniques & algorithms to data is not a new concept. However, with the rise of Machine Learning (ML) and Artificial Intelligence (AI), the field of Data Science (DS) is now front and centre in most organisations’ thinking.
Data science sits at the intersection of data engineering, mathematics and business domain knowledge. Languages such as Python, R and SQL are used to derive insights across a broad range of industries.

Summary
Building a powerful analytics function requires more than just tools; it requires a team with diverse, complementary skills—from Data Architects to Data Scientists.
Without these people, your technology investment will never achieve its full potential.
Now is the time to evaluate your analytics capability, close any expertise gaps, and assemble the team that will turn data into meaningful insights. Your data’s value depends on it—take decisive action today.
This combination of skills—technical proficiency, analytical thinking, domain expertise, and communication—underscores that data analytics is as much about human judgment and insight as it is about technology.


