×

Property Advance Data Engineering Sprint - Part 1

End-to-end data ingestion using SSIS to extract, clean, and load multi-source Excel data into a structured staging layer within the Data Warehouse.

Project Overview

  • Objective: Extract, transform, and load raw Excel data from multiple folders into SQL Server staging tables using SSIS packages.
  • Tools Used: SSIS (Visual Studio), SQL Server, Excel, Data Flow Task, Lookup, Derived Column, Conditional Split


Problem Statement

Multiple Excel files representing property values, rentals, crimes, schools, and transport data from three different Australian states (NSW, VIC, SA) needed to be consolidated into a unified staging environment. Each dataset came with different structures, data quality issues, and missing values. The goal was to automate the data load using SSIS, ensure transformation consistency, and prepare data for downstream modeling.



What I Built

  • Created 6 staging tables in SQL Server to capture normalized raw data.
  • Designed and implemented multiple SSIS packages and sequence containers — one for each domain (House, Rental, School, Crime, Transport, Suburb).
  • Used Data Flow Tasks to extract from Excel, apply transformation logic (e.g. trimming, casing, replacing nulls), and perform lookups to enrich missing fields (e.g. city via suburb).
  • Implemented derived columns to standardize casing and bucket values.
  • Validated successful loads through data viewer and staging table sampling.


Transformation Examples

  • Trimmed whitespace from suburb and city names to ensure joins worked correctly.
  • Standardized casing for suburbs (e.g. from ALL CAPS to Proper Case) to match lookup keys.
  • Handled null values in house value and rental fields by removing invalid rows.
  • Enriched records using lookup from suburb dataset to populate missing city values.


ETL Workflow Summary

  • Each state-specific dataset had its own Data Flow Task inside a Sequence Container.
  • Staged tables were populated separately for NSW, VIC, and SA using consistent transformations.
  • All data was appended to common staging tables like stg_house_values, stg_rental_values, etc., each with a load_date column.


Key Skills Demonstrated

  • Building and organizing SSIS packages for repeatable ETL workflows.
  • Using Lookups to join and enrich data dynamically within SSIS.
  • Data quality assurance via cleansing, derived columns, and conditional splits.
  • Scalable and auditable staging layer for downstream star schema modeling (Part 2).


Links



Back to Portfolio