×

Property Data Engineering Sprint - Part 1

End-to-end data engineering solution using SSIS, SQL Server, and dimensional modeling to support property value reporting across NSW.

Project Overview

  • Objective: Build a robust BI-ready data warehouse from raw property and school datasets using SSIS pipelines and dimensional modeling best practices.
  • Tools Used: SSIS, SQL Server, Star Schema, Dimensional Modeling, Lookup & Derived Columns, Conditional Splits


Problem Statement

This project simulates a real-world scenario where a government analytics team aims to track median property values by suburb, city, and postcode across NSW. The team also wants to integrate public school information and classify suburbs by property value categories. The goal was to design a scalable data warehouse and develop ETL pipelines using SSIS that transform raw data into clean, dimensional models to support downstream reporting and dashboard development.



Design Documentation


1. Data Sources

  • AUS_SubCityDistrictState_Data.xlsx
  • NSW_PropertyMedianValue.xlsx
  • NSW-Public-Schools-Master-Dataset.csv

2. ETL Process

  • SSIS used for data ingestion from raw files into staging tables.
  • Derived Columns used to generate category_id.
  • Lookups used to fetch foreign keys from dimensions.
  • Data flow validated using Conditional Splits and redirected outputs.

3. Dimensional Modeling

  • Star schema adopted for reporting efficiency.
  • Fact table: Fact_Property
  • Dimension tables: Dim_Location, Dim_School, Dim_Category

4. Business Rules

  • category_id is derived based on median property value ranges.
  • school_id is optional; unmatched records allowed with NULL for coverage.

5. Error Handling

  • Lookup failures are redirected to Multicast for post-load review.
  • Critical nulls filtered before Fact table insert to ensure data integrity.

6. SQL Scripts

  • SQL scripts provided for creating staging, dimension, and fact tables.
  • Dim_Category populated via static insertion logic for property value bands.


Key Deliverables

  • SSIS package for staging raw data from Excel and CSV files
  • SSIS package for transforming and loading data into Star Schema
  • Fact and Dimension tables including: Fact_Property, Dim_Location, Dim_School, Dim_Category
  • Derived column to classify property values into predefined categories
  • Lookup transformations to map foreign keys and handle data mismatches
  • Bus Matrix and Schema Diagram following Kimball modeling techniques


Links



Back to Portfolio