#10 How to Get Started with Analytics Engineering
DBT Template and a New ML Tool!
📊 How to Get Started with Analytics Engineering
For a brief introduction to Analytics Engineering, check out my previous article “What is Analytics Engineering and Why Should You Care?”.
Analytics Engineering consists of 3 parts:
Data Collection: Utilize tools like Google Analytics, Mixpanel, or custom Python and SQL scripts for comprehensive data gathering. Employ techniques such as event tracking and API integration to ensure data completeness.
Data Transformation: Clean and structure raw data using tools like Apache Spark, dbt, Python with pandas, or SQL queries. Apply data normalization and feature engineering techniques so the transformed data can support actionable insights.
Quality Assurance Testing: Validate data integrity and analytics processes using tests for every metric or column you track. Employ unit testing and monitoring techniques to ensure ongoing quality and reliability.
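To make the "tests for every metric or column" idea concrete, here is a minimal sketch in plain Python. The sample rows, the function names, and the accepted trade types are all assumptions made up for illustration. The three checks loosely mirror dbt's built-in not_null, unique, and accepted_values tests, but this is not how dbt implements them.

```python
from collections import Counter

# Hypothetical sample rows using the columns from the example:
# customer id, trade id, trade type, day.
ROWS = [
    {"customer_id": 1, "trade_id": 100, "trade_type": "X", "day": "2024-01-01"},
    {"customer_id": 2, "trade_id": 101, "trade_type": "Y", "day": "2024-01-01"},
    {"customer_id": None, "trade_id": 102, "trade_type": "X", "day": "2024-01-02"},
    {"customer_id": 3, "trade_id": 102, "trade_type": "Z", "day": "2024-01-02"},
]


def check_not_null(rows, column):
    """Return the rows where `column` is missing."""
    return [r for r in rows if r.get(column) is None]


def check_unique(rows, column):
    """Return the values of `column` that appear more than once."""
    counts = Counter(r.get(column) for r in rows)
    return [value for value, n in counts.items() if n > 1]


def check_accepted_values(rows, column, accepted):
    """Return the rows whose `column` value is not in `accepted`."""
    return [r for r in rows if r.get(column) not in accepted]


def run_checks(rows):
    """Run every check and return a dict of check name -> failure count."""
    return {
        "customer_id_not_null": len(check_not_null(rows, "customer_id")),
        "trade_id_unique": len(check_unique(rows, "trade_id")),
        # Assumption: only trade types X and Y are valid in this toy dataset.
        "trade_type_accepted": len(
            check_accepted_values(rows, "trade_type", {"X", "Y"})
        ),
    }


if __name__ == "__main__":
    for name, failures in run_checks(ROWS).items():
        status = "PASS" if failures == 0 else f"FAIL ({failures})"
        print(f"{name}: {status}")
```

Each check returns the offending rows or values rather than a bare boolean, which makes it easier to see what went wrong when a check fails in monitoring.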
Example: Let’s calculate the number of daily active users over a month, meaning the distinct customers who place a trade of type X on each day, from a table with the columns customer id, trade id, trade type, and day. Our SQL would look something like this for a dbt config.
Let’s try to break this down:
The dbt template defines a macro called daily_active_users_trade_x() to calculate the daily active users for trade type X.
It retrieves the incremental values from the previous day's data using get_incremental_values() and subtracts yesterday's count from today's count to get the new active users.
Finally, it utilizes incremental() to ensure this job runs incrementally per day, updating only the new data each day.
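The incremental idea above can also be tried outside dbt. Below is a rough sketch using Python's built-in sqlite3 module. The table names trades and daily_active_users_trade_x, the column types, and the run_incremental() helper are assumptions for illustration, not the template's actual code. In dbt itself, this pattern is usually expressed with an incremental materialization (materialized='incremental') and the is_incremental() macro.

```python
import sqlite3

# Hypothetical schema built from the columns named in the example.
SETUP_SQL = """
CREATE TABLE trades (
    customer_id INTEGER, trade_id INTEGER, trade_type TEXT, day TEXT
);
CREATE TABLE daily_active_users_trade_x (
    day TEXT PRIMARY KEY, active_users INTEGER
);
"""

# Count distinct customers per day for trade type X, but only for days
# newer than the latest day already stored in the target table.
INCREMENTAL_SQL = """
INSERT INTO daily_active_users_trade_x (day, active_users)
SELECT day, COUNT(DISTINCT customer_id)
FROM trades
WHERE trade_type = 'X'
  AND day > (SELECT COALESCE(MAX(day), '') FROM daily_active_users_trade_x)
GROUP BY day
"""


def run_incremental(conn):
    """Process only unseen days, then return the full target table."""
    conn.execute(INCREMENTAL_SQL)
    conn.commit()
    return conn.execute(
        "SELECT day, active_users FROM daily_active_users_trade_x ORDER BY day"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP_SQL)
    conn.executemany(
        "INSERT INTO trades VALUES (?, ?, ?, ?)",
        [
            (1, 100, "X", "2024-01-01"),
            (1, 101, "X", "2024-01-01"),  # same customer, counted once
            (2, 102, "X", "2024-01-01"),
            (3, 103, "Y", "2024-01-01"),  # different trade type, ignored
        ],
    )
    print(run_incremental(conn))  # first run processes 2024-01-01

    conn.execute("INSERT INTO trades VALUES (2, 104, 'X', '2024-01-02')")
    print(run_incremental(conn))  # second run only adds 2024-01-02
```

Two assumptions are worth noting: days are stored as ISO-formatted strings so that string comparison matches date order, and the job runs after a day has closed. Rows that arrive late for an already processed day would be skipped by this simple filter.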
To get started with Analytics Engineering, I recommend learning dbt through one of these resources:
What is dbt?
The Complete dbt (Data Build Tool) Bootcamp: Zero to Hero