The Data Engineering Bootcamp: Zero to Mastery
.MP4, AVC, 1920x1080, 30 fps | English, AAC, 2 Ch | 13h 20m | 2.85 GB
Instructor: Ivan Mushketyk
Learn Data Engineering end-to-end. Build real-time pipelines with Apache Kafka & Flink, data lakes on AWS, machine learning workflows with Spark, and integrate LLMs into production-ready systems. Designed to launch your career as a future-ready Data Engineer.
What you'll learn
- Learn the skills and real-world tools used by Data Engineers and join the top 10% in your field
- Build stream-processing pipelines with Apache Kafka and Apache Flink
- Create scalable, cloud-based data lakes on AWS using S3, EMR, and Athena
- Develop distributed processing jobs with Apache Spark and orchestrate workflows with Apache Airflow
- Future-proof your skills by learning to integrate AI & machine learning, including Spark ML and LLMs
- Build real-world, production-ready projects and pipelines using popular open source software
Why this Data Engineering Bootcamp course?
Because this Data Engineering Bootcamp is comprehensive yet efficient, teaching you everything you need to become a Data Engineer step-by-step.
You'll start with Apache Spark, where you'll learn how to crunch massive, real-world Airbnb datasets using code. Then, you'll move on to building a modern data lake on AWS - no fluff, just real tools like S3, Elastic MapReduce (EMR), Glue, and Athena. You’ll orchestrate your data pipelines with Apache Airflow and dive into streaming with Kafka and Flink to build real-time systems. And so much more!
Plus you’ll be at the forefront of the data engineering world by getting hands-on experience building stream processing applications using Apache Kafka and Apache Flink, and even incorporating Machine Learning, AI, and LLMs directly into your data workflows.
By the end, you'll know how to build end-to-end, production-grade data systems…the same skills hiring managers are actively looking for.
Here is what the course will cover to take you from Zero to Data Engineering Mastery:
The curriculum is presented in basic building blocks so that you can build your knowledge step-by-step.
We start from the very beginning by teaching you why data engineering is so important and in-demand. Then we dive into building projects using the real-world tools that actual Data Engineers use in their day-to-day jobs.
By the end of this course, we know you're going to fall in love with Data Engineering!
Here's a high-level overview of what's covered in this Data Engineering Bootcamp:
Introduction to Data Engineering
Get a clear roadmap of what modern data engineering looks like and ensure your setup is ready to go. This section also introduces key prerequisites like Docker and virtual environments.
Big Data Processing with Apache Spark: Process & Analyze Real-World Airbnb Data
Learn to harness the power of Apache Spark to process large datasets efficiently. You’ll work with the DataFrame API, UDFs, aggregations, and tune Spark jobs for real-world performance.
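To give you a feel for this section, here is a minimal sketch of the kind of DataFrame work it covers. The column names (`neighbourhood`, `price`) and the CSV path are illustrative placeholders, not the course's actual dataset:

```python
def price_band(price: float) -> str:
    """Pure-Python bucketing logic, registered below as a Spark UDF."""
    if price < 100:
        return "budget"
    if price < 250:
        return "mid"
    return "premium"

def summarize(listings_path: str) -> None:
    # pyspark is imported here so price_band stays usable without a Spark install.
    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.appName("airbnb-summary").getOrCreate()
    band = F.udf(price_band, StringType())

    (spark.read.csv(listings_path, header=True, inferSchema=True)
        .withColumn("band", band(F.col("price")))          # apply the UDF
        .groupBy("neighbourhood", "band")                  # aggregate per group
        .agg(F.count("*").alias("listings"),
             F.avg("price").alias("avg_price"))
        .orderBy(F.desc("listings"))
        .show(20))
```

The course goes well beyond this, covering UDF performance trade-offs and job tuning.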
Creating a Data Lake with AWS
Create a scalable data lake using S3, EMR, and Athena. Understand columnar data formats and build a modern storage solution for batch analytics.
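As a rough sketch of the pattern taught here, the snippet below builds an Athena DDL statement that registers Parquet files on S3 as a queryable table. The database, table, and column names are invented placeholders:

```python
def athena_ddl(database: str, table: str, s3_location: str) -> str:
    """Build a CREATE EXTERNAL TABLE statement for Parquet data on S3."""
    return f"""
    CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (
        listing_id BIGINT,
        neighbourhood STRING,
        price DOUBLE
    )
    STORED AS PARQUET
    LOCATION '{s3_location}'
    """

def run_query(sql: str, output_s3: str) -> str:
    # boto3 is imported here so the DDL builder works without AWS credentials.
    import boto3
    athena = boto3.client("athena")
    resp = athena.start_query_execution(
        QueryString=sql,
        ResultConfiguration={"OutputLocation": output_s3},
    )
    return resp["QueryExecutionId"]
```

Storing data in a columnar format like Parquet is what makes Athena scans fast and cheap, which is why this section spends time on formats.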
Implementing Data Pipelines with Apache Airflow
Learn how to coordinate data tasks using Airflow. You’ll build reliable workflows, handle retries and failures, and run Spark jobs and data ingestion tasks smoothly.
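The retry handling described above looks roughly like this in practice. The DAG id, schedule, and task are hypothetical, and the transform is deliberately kept as plain Python so it can be tested on its own:

```python
from datetime import datetime, timedelta

def drop_missing_prices(rows: list[dict]) -> list[dict]:
    """Plain-Python transform used by the task below; easy to unit test."""
    return [r for r in rows if r.get("price") is not None]

def build_dag():
    # airflow is imported here so the transform stays testable without Airflow.
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    with DAG(
        dag_id="daily_listings_ingest",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
        # Airflow retries failed tasks automatically with these settings.
        default_args={"retries": 3, "retry_delay": timedelta(minutes=5)},
    ) as dag:
        PythonOperator(
            task_id="clean_rows",
            # In a real DAG the rows would arrive from an upstream task.
            python_callable=lambda: drop_missing_prices([]),
        )
    return dag
```

Keeping business logic out of the operators, as above, is a common pattern that makes pipelines easier to test.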
Machine Learning with Spark ML: Create a Data Pipeline, Train a Model + more
Build ML pipelines using Spark’s scalable ML library. From classification to regression and model tuning, you’ll integrate intelligent insights into your data pipeline.
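A Spark ML pipeline of the kind built in this section might be sketched like this; the feature columns and the price-prediction task are illustrative assumptions:

```python
# Illustrative feature columns for a price-prediction model.
FEATURES = ["bedrooms", "accommodates", "review_score"]

def build_pipeline():
    # pyspark.ml is imported here so FEATURES is inspectable without Spark.
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.regression import LinearRegression

    # Assemble raw columns into the single vector column Spark ML expects,
    # then feed it to a regression model.
    assembler = VectorAssembler(inputCols=FEATURES, outputCol="features")
    model = LinearRegression(featuresCol="features", labelCol="price")
    return Pipeline(stages=[assembler, model])
```

Calling `.fit(training_df)` on the returned pipeline would run both stages, which is the composability the course leans on for model tuning.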
Using AI with Data Engineering: LLMs, HuggingFace + more
Explore how LLMs can fit into the data engineering stack. Use Hugging Face and Outlines to classify, transform, and generate structured output within Spark workflows.
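As one hedged example of the idea, the sketch below uses a zero-shot classifier to label free-text records; the label set and model checkpoint are assumptions, and the score-picking helper is plain Python so it works without the model:

```python
LABELS = ["complaint", "question", "praise"]  # illustrative label set

def top_label(result: dict) -> str:
    """Pick the highest-scoring label from a zero-shot classifier result."""
    pairs = sorted(zip(result["labels"], result["scores"]),
                   key=lambda p: p[1], reverse=True)
    return pairs[0][0]

def classify(texts: list[str]) -> list[str]:
    # transformers is imported here so top_label stays testable without it.
    from transformers import pipeline
    clf = pipeline("zero-shot-classification",
                   model="facebook/bart-large-mnli")  # assumed checkpoint
    return [top_label(clf(t, candidate_labels=LABELS)) for t in texts]
```

Wrapped in a Spark UDF, a function like `classify` is how LLM output lands in a regular data pipeline; libraries like Outlines add guarantees that the output matches a schema.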
Real-Time Data Processing ("Stream Processing") with Apache Kafka
Dive into Kafka and build robust streaming applications. Learn about producers, consumers, data ingestion, Kafka transactions, and build data pipelines that process incoming data in real time.
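To make the producer side concrete, here is a minimal sketch using the kafka-python client; the topic name and broker address are placeholders, and the JSON codecs are kept as standalone functions:

```python
import json

def encode_event(event: dict) -> bytes:
    """Serialize an event to the JSON bytes Kafka transports."""
    return json.dumps(event).encode("utf-8")

def decode_event(raw: bytes) -> dict:
    """Inverse of encode_event, for the consumer side."""
    return json.loads(raw.decode("utf-8"))

def produce(events: list[dict]) -> None:
    # kafka-python is imported here so the codecs are testable without a broker.
    from kafka import KafkaProducer
    producer = KafkaProducer(bootstrap_servers="localhost:9092",
                             value_serializer=encode_event)
    for e in events:
        producer.send("listings-events", e)   # async send, batched internally
    producer.flush()                          # block until all sends complete
```

A consumer mirrors this shape with `KafkaConsumer` and `decode_event` as the `value_deserializer`; the course builds both ends plus transactions on top.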
Stream Processing with Apache Flink
Use Flink to perform complex stream processing. Work with keyed streams, event time, joins, and build responsive, intelligent streaming apps using Kafka data.
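A keyed stream of the kind covered here can be sketched with PyFlink as follows; the `(city, price)` tuples are invented sample data, and the reduce function is plain Python so it can be checked on its own:

```python
def running_max(a: tuple, b: tuple) -> tuple:
    """Keep the higher-priced event per key; used as a Flink reduce function."""
    return a if a[1] >= b[1] else b

def build_job():
    # pyflink is imported here so running_max stays testable without Flink.
    from pyflink.datastream import StreamExecutionEnvironment

    env = StreamExecutionEnvironment.get_execution_environment()
    events = env.from_collection([("nyc", 120.0), ("nyc", 300.0), ("sf", 90.0)])
    # key_by partitions the stream per city; reduce folds events within a key.
    events.key_by(lambda e: e[0]).reduce(running_max).print()
    env.execute("max-price-per-city")
```

The same `key_by` step is the foundation for the event-time windows and joins this section builds on Kafka data.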