×

Property Advance Data Engineering Sprint - Part 2

Data Warehouse dimensional modeling using Kimball methodology and ETL automation to populate fact and dimension tables for property data analysis.

Project Overview

  • Objective: Design a dimensional model and load data from staging tables into fact and dimension tables using SSIS.
  • Tools Used: SSIS (Visual Studio), SQL Server, Star Schema Design, Lookup Transformation, Dimensional Modeling


Dimensional Model Design

  • Followed Kimball methodology to build a clean Star Schema with clear separation between facts and dimensions.
  • Created a bus matrix to plan reuse of conformed dimensions across multiple fact tables.
  • Created a dim_date table and linked it to fact tables via date_key.
  • Implemented a factless fact table for non-measurable events like school and transport presence in suburbs.


ETL Process (SSIS)

  • Separate Data Flow Tasks created for each dimension and fact table.
  • Used Lookup transformations to fetch surrogate keys from dimension tables.
  • Used Derived Column and Conditional Split to clean data during the load process.
  • Ensured surrogate key generation continued correctly even after deletes using IDENTITY columns.
  • Directed invalid records to error outputs for debugging and correction.


Tables Created

  • Dimension Tables: dim_suburb, dim_school, dim_transport, dim_property_type, dim_offence_type, dim_date
  • Fact Tables: fact_house_values, fact_rental_values, fact_crimes, fact_transport_locations, fact_school_locations (factless)


Validation & Testing

  • Verified row counts and foreign key mappings in SSMS after each SSIS package execution.
  • Used multi-cast and data viewers to trace errors during lookup or conversion failures.
  • Queried sample joins to confirm accuracy between dimensions and facts.


Key Skills Demonstrated

  • Dimensional modeling using Kimball methodology and bus matrix design.
  • End-to-end data warehouse loading from staging to dimensional model.
  • ETL performance tuning and error handling in SSIS.
  • Factless fact table design and implementation.


Links



Back to Portfolio