top of page

Key Takeaway:

 

Day 1 built the foundation (data types, relational vs. non-relational, OLTP vs. OLAP, intro to Cosmos DB & Fabric).
Day 2 expanded into file formats, optimized engines (Spark, Databricks), real-time analytics, and integrated solutions (Fabric + Power BI + Event Stream).

Together, the two days gave a holistic view of Azure’s data services ecosystem, showing how raw data flows from ingestion → governance → analytics → visualization.

image.png

📅 Azure Data Fundamentals (DP-900) Virtual Training

​​​​Day 1 – March 19, 2025:

Core Data Concepts & Non-Relational Foundations

1. Data Types & Formats

  • Structured → rows/columns in CSV or relational DBs.

  • Semi-Structured → JSON (web APIs, system exchange), XML (finance/healthcare), flexible schemas.

  • Unstructured → text, audio, video; massive insights but complex to process.

  • Optimized Formats → Avro (streaming), ORC/Parquet (compressed, efficient for analytics).

  • Blob Storage → binary objects for media & large files.

2. Database Models

  • Relational (OLTP/OLAP): ACID compliance, referential integrity, star schemas for analysis.

  • Non-Relational: key-value, document, columnar; used for flexible/large-scale scenarios like social media.

3. Core Concepts

  • ETL Pipelines → Extract, Transform, Load for preparing clean data.

  • SQL Essentials → Views, stored procedures, indexes; trade-offs of Azure SQL DB vs. SQL Server vs. MariaDB.

  • Partitioning & Keys in non-relational storage for scalability & performance.

4. Azure Services Covered

  • Blob Storage & ADLS Gen2: scalable storage with directory support + analytics integration.

  • Azure Table Storage: simple NoSQL key-value store.

  • Azure Cosmos DB: multi-API, partitioned, globally distributed database.

  • Azure Synapse & Data Warehousing: structured queries, fact/dimension modeling.

  • Apache Spark & Databricks: batch + real-time analytics on massive datasets.

  • Azure Fabric: SaaS platform unifying ingestion, processing, analytics, and reporting.

5. Advanced Analytics & Real-Time

  • Streaming: compare batch (analyze after collection) vs. real-time (analyze as events flow).

  • Delta Lake + Spark: unify real-time streams with historical data.

  • Event Hubs, Event Stream, KQL: immediate insights (e.g., detecting breaches, IoT events).

6. Visualization

  • Power BI: build dashboards with facts/dimensions, hierarchies, scatter plots, maps, cards.

  • Seamless Fabric integration → instant reporting on governed data.

​

Day 2 – March 20, 2025:

Data Formats, Processing Engines & Real-Time Intelligence

1. File Formats & Use Cases

  • Delimited (CSV/TSV) → simple, widely used, no schema.

  • JSON → nested, human-readable, core for APIs/IoT.

  • XML → metadata-rich, schema validation (legacy systems).

  • Blobs → binary data like images/videos/backups.

  • Columnar Formats: Avro (streaming), ORC & Parquet (analytics & compression).

  • Tip: Format choice depends on workload (CSV for exchange, Parquet for big data, JSON for APIs, Avro for streaming).

2. Relational DB Deep Dive

  • Azure SQL Database → managed relational service.

  • SQL Server → maximum control, higher cost.

  • MariaDB → open-source, Oracle-compatible.

  • Normalization → reduce redundancy, enforce schema clarity.

3. Non-Relational DB & Storage

  • Cosmos DB → partition + row key structure, indexes, global low-latency multi-region access.

  • ADLS Gen2 → combines blob storage with directory/file system performance for analytics.

4. Big Data Processing

  • Apache Spark → distributed processing for both batch + real-time.

  • Azure Databricks → collaborative, scalable analytics workspace; integrates engineers, analysts, and scientists.

  • Lakehouse Architecture → combines flexibility of a data lake with the structure of a warehouse.

5. Fabric in Context

  • OneLake → centralized repository, eliminates silos/duplication.

  • Fully Managed SaaS → ingestion → analytics → visualization in one pipeline.

  • Integration with Power BI → direct dashboards without separate infra management.

6. Streaming & Real-Time

  • Event Stream + KQL DB → capture events as they arrive for instant analytics.

  • Delta Lake → query historical + real-time data together.

  • Example Use Cases → fraud detection, security monitoring, live dashboards.

7. Visualization & Action

  • Power BI → interactive, real-time visuals layered onto Fabric + Event Stream data.

  • Outcome → end-to-end pipeline: raw data → stream ingestion → analytics → visualization.

image.png
image.png
image.png
image.png
image.png
bottom of page