Data Engineering I
Course Code: Y1D2
ECTS Credits: 3.0
Course Description
This course builds on students’ foundational SQL knowledge from Year 1 Block B and introduces them to the essential concepts and practices of data engineering through a series of hands-on labs, guided study sessions, and team-based design activities. It begins with an introduction to relational databases and their management using PostgreSQL. Students compare relational and non-relational database models, refresh their knowledge of SQL syntax, and run basic queries on the real-world dataset used for the block project. In parallel, they begin preparing for collaborative project work in teams, exploring a real client case and familiarizing themselves with a large, multi-source dataset.
The course then transitions into the principles of data warehousing, focusing on the differences between transactional databases (OLTP) and analytical systems (OLAP). Students learn the structure and purpose of data warehouses, study schema design approaches, and apply dimensional modeling using the Kimball methodology. Key concepts such as normalized versus denormalized data, fact and dimension tables, and star versus snowflake schemas are explored in depth. Students also gain a conceptual understanding of data pipelines, including batch and streaming ETL processes.
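To make the star schema idea concrete, the following is a minimal sketch rather than course material: one fact table surrounded by two dimension tables, queried with a join and an aggregation. It assumes Python's built-in sqlite3 module in place of PostgreSQL so that it runs without a database server, and the table names (fact_sales, dim_product, dim_date) and sample rows are hypothetical.

```python
import sqlite3


def build_star_schema(conn):
    """Create a tiny, hypothetical star schema and load sample rows."""
    conn.executescript(
        """
        -- Dimension tables hold descriptive attributes.
        CREATE TABLE dim_product (
            product_id INTEGER PRIMARY KEY,
            category   TEXT NOT NULL
        );
        CREATE TABLE dim_date (
            date_id INTEGER PRIMARY KEY,
            year    INTEGER NOT NULL,
            month   INTEGER NOT NULL
        );
        -- The fact table holds measures plus a foreign key to each dimension.
        CREATE TABLE fact_sales (
            sale_id    INTEGER PRIMARY KEY,
            product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
            date_id    INTEGER NOT NULL REFERENCES dim_date(date_id),
            amount     REAL NOT NULL
        );
        INSERT INTO dim_product VALUES (1, 'books'), (2, 'games');
        INSERT INTO dim_date VALUES (1, 2024, 1), (2, 2024, 2);
        INSERT INTO fact_sales VALUES
            (1, 1, 1, 10.0),
            (2, 1, 1, 5.0),
            (3, 2, 1, 20.0),
            (4, 1, 2, 7.5);
        """
    )


def revenue_by_category_and_month(conn):
    """Typical analytical query: join facts to dimensions, then aggregate."""
    return conn.execute(
        """
        SELECT p.category, d.month, SUM(f.amount) AS revenue
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        JOIN dim_date AS d ON d.date_id = f.date_id
        GROUP BY p.category, d.month
        ORDER BY p.category, d.month
        """
    ).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_star_schema(connection)
    for row in revenue_by_category_and_month(connection):
        print(row)
```

In a snowflake schema, dim_product would itself be normalized further, for example by moving category into its own table referenced by a key.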
Advanced SQL skills are developed through targeted upskilling sessions, in which students work with topics such as time-based data, window functions, and advanced joins. This technical learning is complemented by an introduction to critical database design considerations, such as security, scalability, privacy, and reliability, framed through real-world scenarios. The final stages of the course involve integrating these skills into a complete project workflow: students draft, test, and document their team’s database design, connect it to a Python-based machine learning workflow, and prepare SQL scripts to support both their analytical models and project deliverables. Throughout, emphasis is placed on applied problem-solving, critical thinking, and collaboration in a professional data engineering context.
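As an illustration of a window function applied to time-based data, the sketch below computes a running total of order amounts per customer, ordered by date, and reads the result from Python. It is an assumption-laden example, not the course's own material: it uses the built-in sqlite3 module (which needs SQLite 3.25 or later for window functions) instead of PostgreSQL, and the orders table and its rows are hypothetical.

```python
import sqlite3


def load_orders(conn):
    """Create a hypothetical orders table with ISO-formatted date strings."""
    conn.execute(
        "CREATE TABLE orders (customer TEXT, order_date TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [
            ("ana", "2024-01-05", 10.0),
            ("ana", "2024-02-01", 5.0),
            ("ben", "2024-01-20", 8.0),
            ("ana", "2024-03-10", 2.5),
        ],
    )


def running_totals(conn):
    """Running total per customer: SUM() used as a window function."""
    return conn.execute(
        """
        SELECT customer,
               order_date,
               amount,
               SUM(amount) OVER (
                   PARTITION BY customer   -- restart the total per customer
                   ORDER BY order_date     -- accumulate in date order
               ) AS running_total
        FROM orders
        ORDER BY customer, order_date
        """
    ).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    load_orders(connection)
    for row in running_totals(connection):
        print(row)
```

Unlike GROUP BY, the window function keeps every input row and adds the aggregate alongside it, which is why it suits per-row features for a downstream Python workflow.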
Course Content
- Relational & Non-Relational Databases
- Structured Query Language (SQL)
- SQL Functions: Filtering, Aggregation, Joins, Subqueries, CASE Expressions & Window Functions
- Date and Time Operations in SQL
- Normalization and Denormalization
- Schema Designs: Star and Snowflake Schemas
- Dimensional Modeling: Facts and Dimensions
- Data Warehousing and ETL Pipelines
- OLAP vs OLTP Systems
- Database Design Considerations
Prerequisites
- Introduction to SQL (Y1B3)