Data architecture

Course objectives

After this course, you will be able to:

  • Think like a data architect: understand how data flows, how it is transformed, and how it supports real business decisions.
  • Work confidently with Python to clean, transform, and prepare data using practical, real‑world patterns.
  • Query and model data with SQL, from simple selects to joins, aggregations, and clean schema design.
  • Build small but realistic data pipelines, combining Python, SQL, and Git into a reproducible workflow.
  • Use Git effectively to version data projects, collaborate, and maintain clean, traceable code.
  • Design and deliver a complete data project, from raw files to validated tables and analytical queries.
  • Apply best practices in data quality, logic, and workflow design to create reliable, maintainable data solutions.

Course syllabus

Foundations of Data & Logic

  • What “data architecture” means: pipelines, storage, transformations
  • Types of data: structured, semi‑structured, unstructured
  • Logical reasoning for data workflows
  • Boolean logic, truth tables, predicates
  • Control flow logic (branching, conditions, invariants)
  • How logic maps to SQL WHERE clauses and Python conditionals
  • Lab exercises
    • Build truth tables for simple and compound conditions
    • Translate natural‑language rules into Boolean expressions
    • Write small logic puzzles in pseudocode
    • Practice conditional reasoning with simple Python snippets
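
As an illustration of how one Boolean rule maps to both a Python conditional and a SQL WHERE clause, here is a minimal sketch. The rule ("review an order if it is over 100 and either unpaid or flagged"), the `orders` table, and the sample rows are all invented for this example and are not course material.

```python
import sqlite3

# Hypothetical rule: an order needs review if it is large (amount > 100)
# AND it is either unpaid OR flagged.
def needs_review(amount, paid, flagged):
    return amount > 100 and (not paid or flagged)

# Invented sample rows: (id, amount, paid, flagged), with 0/1 as booleans.
orders = [
    (1, 50.0, 0, 0),
    (2, 150.0, 1, 0),
    (3, 150.0, 0, 0),
    (4, 200.0, 1, 1),
]

# The predicate as a Python conditional inside a comprehension.
python_ids = [
    oid
    for oid, amount, paid, flagged in orders
    if needs_review(amount, bool(paid), bool(flagged))
]

# The same predicate as a SQL WHERE clause, run on an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, amount REAL, paid INTEGER, flagged INTEGER)"
)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", orders)
sql_ids = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM orders "
        "WHERE amount > 100 AND (paid = 0 OR flagged = 1) "
        "ORDER BY id"
    )
]
conn.close()

print(python_ids, sql_ids)
```

Both lists select the same orders, which is the point of the exercise: the parentheses around the OR matter equally in Python and in SQL.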

Python for Data Work

  • Python syntax essentials: variables, types, operators
  • Lists, dicts, tuples, sets — when to use which
  • Control flow: if/else, loops, comprehensions
  • Functions, modules, imports
  • Context managers
  • File I/O (CSV, JSON)
  • Intro to data‑oriented libraries: csv, json, pathlib, collections
  • Making charts with matplotlib
  • Error handling and debugging patterns
  • Lab exercises
    • Write scripts that parse CSV/JSON files
    • Transform lists/dicts using loops and comprehensions
    • Implement small ETL‑style tasks (extract → transform → output)
    • Build a mini data-cleaning pipeline
    • Practice debugging broken scripts
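
A mini data-cleaning pipeline of the kind listed above might look like the following sketch. The column names, the sample CSV text, the `clean_rows` helper, and the cleaning rules (trim whitespace, default a missing city to "UNKNOWN", reject rows without a name or with a non-numeric amount) are assumptions made for illustration.

```python
import csv
import io

# Invented raw input; in a lab this would typically come from a file on disk.
RAW_CSV = """name,city,amount
 Alice ,Paris,10.5
Bob,,7
,Berlin,3
Carol,Rome,not-a-number
"""

def clean_rows(text):
    """Split parsed CSV rows into cleaned records and rejected raw rows."""
    cleaned, rejected = [], []
    for row in csv.DictReader(io.StringIO(text)):
        name = (row["name"] or "").strip()
        city = (row["city"] or "").strip() or "UNKNOWN"
        try:
            amount = float(row["amount"])
        except (TypeError, ValueError):
            rejected.append(row)  # amount is missing or not numeric
            continue
        if not name:
            rejected.append(row)  # a record without a name is unusable here
            continue
        cleaned.append({"name": name, "city": city, "amount": amount})
    return cleaned, rejected

cleaned, rejected = clean_rows(RAW_CSV)
print(len(cleaned), "cleaned;", len(rejected), "rejected")
```

Keeping the rejected rows instead of silently dropping them makes it easier to debug the input later.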

SQL Essentials for Data Architecture

  • Relational model: tables, keys, constraints
  • SELECT syntax: projection, filtering, sorting
  • JOINs: inner, left, right, full
  • Aggregations: GROUP BY, HAVING
  • Subqueries and CTEs
  • Basic schema design: normalization, relationships
  • Transactions and ACID basics
  • Lab exercises
    • Query a sample database (PostgreSQL or SQLite)
    • Write JOIN queries for real‑world scenarios
    • Build aggregation reports (counts, sums, averages)
    • Create tables with constraints
    • Normalize a messy dataset into 3NF
    • Write CTE‑based transformations
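
The topics above (constraints, JOINs, aggregation, CTEs) can be combined in one small sketch, run here from Python against an in-memory SQLite database. The `customers` and `orders` tables and their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Hypothetical two-table schema with key, NOT NULL, UNIQUE, and CHECK constraints.
conn.executescript("""
CREATE TABLE customers (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    amount      REAL NOT NULL CHECK (amount >= 0)
);
INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 5.0), (3, 2, 12.0);
""")

# The CTE aggregates orders per customer; the LEFT JOIN keeps customers with no orders.
report = conn.execute("""
WITH totals AS (
    SELECT customer_id, COUNT(*) AS n_orders, SUM(amount) AS total
    FROM orders
    GROUP BY customer_id
)
SELECT c.name, COALESCE(t.n_orders, 0), COALESCE(t.total, 0.0)
FROM customers AS c
LEFT JOIN totals AS t ON t.customer_id = c.id
ORDER BY c.name
""").fetchall()

# The CHECK constraint rejects a negative amount.
try:
    conn.execute("INSERT INTO orders VALUES (4, 1, -1.0)")
    negative_rejected = False
except sqlite3.IntegrityError:
    negative_rejected = True

conn.close()
print(report, negative_rejected)
```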

Git & Version Control for Data Projects

  • Why version control matters in data architecture
  • Git basics: init, clone, add, commit, push, pull
  • Branching strategies (feature branches, main/dev)
  • Resolving merge conflicts
  • Using GitHub/GitLab for collaboration
  • Storing SQL/Python code in repos
  • Commit hygiene and reproducibility
  • Lab exercises
    • Create a repo and push Python/SQL exercises
    • Practice branching and merging
    • Resolve intentionally created merge conflicts
    • Review each other’s code via pull requests
    • Tag versions of a data pipeline project

Integrating Python + SQL + Git into Data Workflows

  • How Python scripts interact with databases
  • Using sqlite3 or psycopg2 to run SQL from Python
  • Designing small data pipelines
  • Folder structure and reproducible project layout
  • Logging, configuration, and environment separation
  • Lab exercises
    • Build a Python script that loads data → inserts into SQL → queries results
    • Create a small ETL pipeline stored in Git
    • Review and refactor each other’s code
    • Add documentation and README instructions
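
The first lab above (load data, insert into SQL, query results) could be sketched as below, using `sqlite3` from the standard library. The `sales` table, the sample data, and the `extract`/`transform`/`load`/`report` function names are assumptions for illustration; a real project would read from files and take the database path from configuration.

```python
import csv
import io
import logging
import sqlite3

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")

# Invented raw data standing in for a CSV file.
RAW = "region,amount\nnorth,10\nsouth,5\nnorth,2.5\n"

def extract(text):
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Normalize region names and convert amounts to floats."""
    return [(r["region"].strip().lower(), float(r["amount"])) for r in rows]

def load(conn, records):
    """Create the target table if needed and insert all records in one transaction."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(region TEXT NOT NULL, amount REAL NOT NULL)"
    )
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany(
            "INSERT INTO sales (region, amount) VALUES (?, ?)", records
        )
    log.info("loaded %d rows", len(records))

def report(conn):
    """Return total amount per region, ordered by region name."""
    return conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    ).fetchall()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW)))
totals = report(conn)
conn.close()
print(totals)
```

Splitting the script into small functions keeps each step testable on its own and makes code review in Git easier, since changes to one stage stay in one place.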

Hands-on Project

  • Design a simple schema
  • Load raw data (CSV/JSON) using Python
  • Transform and validate data
  • Insert into SQL tables
  • Produce analytical queries
  • Version the entire project in Git

Prerequisites

This course is ideal for anyone who wants to understand how modern data architecture works and build strong foundations for roles such as data engineer, analyst, or data architect.

Course duration

5 days, 8 class hours each