Data Engineering

DATA 311 - Data Engineering (3-0-3)

Data lineage lifecycle, including question formulation, data collection and cleaning, exploratory data analysis (EDA), and visualization. Introduction to statistical concepts such as measurement error. Techniques for scalable data processing. Concepts in data architecture and data stores (databases, data warehouses, data lakes, data streams). Data ingestion and ETL (Extract, Transform, Load) processes. Batch vs. real-time data processing. Construction of data processing pipelines to support analytics and machine learning workflows. Workflow orchestration and automation, including the scheduling and management of end-to-end data processing pipelines. Data observability and monitoring. Introduction to Infrastructure as Code (IaC) for data engineering. Alignment of data governance and security practices, including privacy and compliance. End-to-end, hands-on data projects that integrate diverse concepts and leverage cloud platforms, tools, and techniques to design, build, and deploy data processing pipelines.