Data Processing with Spark Kafka (Data Engineering Vol2 AWS)
Published 4/2025
Duration: 3h 35m | .MP4 1280x720, 30 fps(r) | AAC, 44100 Hz, 2ch | 1.28 GB
Genre: eLearning | Language: English
Batch & Stream Processing using Spark and Kafka on AWS
What you'll learn
- Deep dive on Spark and Kafka using AWS EMR, Glue, MSK
- Understand Data Engineering (Volume 2) on AWS using Spark and Kafka
- Batch and Stream processing using Spark and Kafka
- Production-level projects and hands-on exercises that give candidates on-the-job-like training
- Get access to datasets of 100 GB - 200 GB in size and practice with them
- Learn Python for Data Engineering with HANDS-ON (Functions, Arguments, OOP (class, object, self), Modules, Packages, Multithreading, file handling etc.)
- Learn SQL for Data Engineering with HANDS-ON (Database objects, CASE, Window Functions, CTE, CTAS, MERGE, Materialized View etc.)
- AWS Data Analytics services - S3, EMR, Glue, MSK
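As a small taste of the SQL window functions listed above, here is a minimal sketch using Python's built-in sqlite3 module (the table and data are made up for illustration, not taken from the course):

```python
import sqlite3

# Rank each sale within its region by amount, using the
# ROW_NUMBER() window function (supported by SQLite >= 3.25).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INT)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 100), ("east", 300), ("west", 200)])
rows = conn.execute("""
    SELECT region, amount,
           ROW_NUMBER() OVER (PARTITION BY region
                              ORDER BY amount DESC) AS rn
    FROM sales
    ORDER BY region, rn
""").fetchall()
print(rows)
# → [('east', 300, 1), ('east', 100, 2), ('west', 200, 1)]
```

The same PARTITION BY / ORDER BY pattern carries over directly to window functions in Spark SQL.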
Requirements
- Good to have AWS and SQL knowledge
Description
This is Volume 2 of the Data Engineering course. In this course I will talk about the open-source data processing technologies Spark and Kafka, which are the most widely used and most popular frameworks for Batch & Stream Processing. You will learn Spark from Level 100 to Level 400 with real-life hands-on exercises and projects. I will also introduce you to the Data Lake on AWS (that is, S3) and the Data Lakehouse using Apache Iceberg.
I will use AWS as the hosting platform and talk about AWS services like EMR, S3, Glue and MSK. I will also show you Spark integration with other services like AWS RDS, Redshift and DynamoDB.
You will get opportunities to do hands-on work with large datasets (100 GB - 300 GB or more of data). This course provides hands-on exercises that match real-world scenarios: Spark batch processing, stream processing, performance tuning, streaming ingestion, window functions, ACID transactions on Iceberg etc.
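To give a feel for the stream-processing concepts mentioned above, the tumbling-window aggregation that Spark Structured Streaming performs with groupBy(window(...)) can be sketched in plain Python (a conceptual sketch only; the event data and function name are invented for illustration):

```python
from collections import defaultdict

def tumbling_window_counts(events, window_seconds=60):
    """Group (timestamp, key) events into fixed-size tumbling
    windows and count occurrences per key -- the same aggregation
    pattern Spark applies when windowing a Kafka stream."""
    counts = defaultdict(lambda: defaultdict(int))
    for ts, key in events:
        # Align each event to the start of its window.
        window_start = ts - (ts % window_seconds)
        counts[window_start][key] += 1
    return {w: dict(per_key) for w, per_key in counts.items()}

events = [(5, "click"), (30, "view"), (65, "click"), (70, "click")]
print(tumbling_window_counts(events))
# → {0: {'click': 1, 'view': 1}, 60: {'click': 2}}
```

In the course itself this aggregation runs distributed across an EMR cluster with watermarking for late data; the sketch only shows the windowing logic.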
Some other highlights:
- 5 projects with different datasets; total dataset size of 250 GB or more.
- Training in data modelling: Normalization & ER Diagrams for OLTP systems, Dimensional modelling for OLAP/DWH systems.
- Other technologies covered: EC2, EBS, VPC and IAM.
- Optional Python course.
Who this course is for:
- Python developers, Application Developers, Big Data Developers
- Data Engineers, Data Scientists, Data Analysts
- Database Administrators, Big Data Administrators
- Data Engineering Aspirants
- Solutions Architect, Cloud Architect, Big Data Architect
- Technical Managers, Engineering Managers, Project Managers