top of page

Hands On Tutorials to Implement Concepts

image.png

The best engineers have a strong appreciation for the pros and cons of a solution. 

With each exercise below we train the reflection muscle to further connect the dots. 

Data Flow (completed)

  • DF001: ETL vs. ELTL: Compared single-step ETL vs. multi-stage ELTL for tradeoffs in simplicity, scalability, and raw data retention.

  • DF002: Scheduling & Pipelines: Built orchestrated Dataflows w/advanced scheduling & dependency control, optimizing refresh strategies.

  • DF003: Data Type Conversion: Standardized inconsistent CSV datatypes & generated optimized Warehouse tables, assess schema tradeoffs.

  • DF004: Data Cleansing: Transformed messy audiobook dataset into an analyst-ready model, balancing complexity with performance.

Data Pipelines (in progress)

  • DP001: Simple End-to-End Pipeline: Orchestrated HTTP → Warehouse → Semantic Model with success-gated refresh.

  • DP002: IoT Project: Built Bronze/Silver tables; applied Dataflow & Notebook transforms; added lineage + Fahrenheit conversion.

  • DP003: Metadata-Driven Pipelines: Scaled to 4 datasets using For Each loops and dynamic content.

  • DP004: Data Structures (Arrays): Learned to parse JSON arrays for dynamic pipeline control, including iteration over multiple sources.

  • DP005: Parameter Play: Implemented end-to-end parameterization across parent/child pipelines, notebooks, and stored procedures.

  • DP006: Data Quality Testing (Task 1): Enforced column-contract validation with auto-quarantine on schema drift.

  • DP007: Data Structures (Objects): Focused on JSON objects-accessing key/value pairs and passing them into parameters/activities.

  • DP008: Data Structures (Advanced): Tackled nested arrays & objects; combined parsing patterns to handle real-world JSON payloads.

  • DP009: Data Quality Testing (Task 2): Centralized DQ rules in Warehouse metadata and processed multiple files dynamically.

 

Data Warehouse & T-SQL (in progress) 

  • DW001: Create a Data Warehouse & Tables -Provision a Fabric DW, created schemas, and define base tables with T-SQL.

  • DW002: Insert & Query Data - Practice inserting rows and building queries with filters, sorting, and aggregation.

  • DW003: Joins & Relationships -Apply INNER, LEFT, RIGHT joins to connect datasets; reinforced relational modeling.

  • DW004: Functions & Expressions - Use string, date, and numeric functions to clean and transform data.

  • DW005: Views & Stored Procedures - Build reusable views and parameterized stored procs for consistent querying.

  • DW006: Indexing & Performance - Implemented clustered/non-clustered indexes; optimized queries for efficiency.

  • DW007: Security & Permissions - Configured roles, grants, and revokes to secure Warehouse access.

  • DW008: Advanced T-SQL -Used subqueries, CTEs, and window functions for analytical queries.

  • DW009: Transactions & Error Handling -Practiced BEGIN TRAN, commit/rollback, and TRY…CATCH for resilient code.

  • DW010: Data Import/Export - Load CSVs into DW and export results back for reporting.

  • DW011: Constraints & Keys - Enforce primary, foreign, and check constraints to ensure data integrity.

  • DW012: Normalization vs. Denormalization -Model normalized schemas for integrity and denormalized for performance.

  • DW013: Star & Snowflake Schemas - Design fact/dimension models for analytics workloads.

  • DW014: Slowly Changing Dimensions (Intro) - Compare SCD1 vs. SCD2; planned for tracking customer changes.

  • DW015: Watermarks Crash Course - Configure incremental load checkpoints to handle late-arriving data.

  • DW016: Partitioning - Split large tables for query efficiency; practice sliding-window strategies.

  • DW017: Surrogate Keys - Replace business keys with surrogate keys; reinforced SCD design.

  • DW018: CONFX Intro & Setup - Stage 12 raw CSVs in Bronze Lakehouse; defined Green (conformed) target.

  • DW019: CONFX Sessions Table - Consolidate Vegas/Stockholm session data; build reusable Sessions dimension.

  • DW020: CONFX Speakers - Create Green Speakers table; handle duplicates and multi-session attribution.

  • DW021: CONFX Venues - Merge venue data into conformed Green model; standardized location attributes.

  • DW022: CONFX Attendees & Attendance - Model attendees + session links; apply surrogate keys and integrity checks.

  • DW023: Fact Tables - Build fact table to tie sessions, attendees, and attendance measures.

  • DW024: Date Dimension - Generate reusable calendar/date dimension with hierarchies.

  • DW025: Time Dimension - Build time of day dimension for hourly/minute-level reporting.

  • DW026: CONFX SCD Type 1 (Detect Changes) - Compare Bronze vs. Green customers to identify inserts, updates, deletes.

  • DW027: CONFX SCD Type 1 (Apply Changes) - Write idempotent procs for merge logic with bulk insert/update/delete.

  • DW028: CONFX SCD Type 2 — Implement SCD2 table with valid_from/valid_to/is_current and validated history tracking with staged change sets.

image.png

💡 Takeaway: x

bottom of page