Data Engineering Master Course: Spark/Hadoop/Kafka/Mongodb

Posted By: ELK1nG

Last updated 5/2025
MP4 | Video: h264, 1920x1080 | Audio: AAC, 44.1 KHz
Language: English | Size: 5.61 GB | Duration: 12h 12m

A full hands-on Data Engineering course to become a Big Data Engineer: Spark/Kafka/Hadoop/Flume/Hive/Sqoop/MongoDB.

What you'll learn

Hadoop Ecosystem, Sqoop, Flume, Hive

Expertise in writing code with Apache Spark

Learn Kafka fundamentals and how to use Kafka Connectors

Learn to write queries and clients in MongoDB

Learn Data Engineering technologies

Requirements

None

Description

In this course, you will start by learning what the Hadoop Distributed File System (HDFS) is and the most common Hadoop commands required to work with it.

Then you will be introduced to Sqoop Import:
Understand the lifecycle of a Sqoop command.
Use the sqoop import command to migrate data from MySQL to HDFS.
Use the sqoop import command to migrate data from MySQL to Hive.
Use various file formats, compressions, file delimiters, where clauses and queries while importing the data.
Understand split-by and boundary queries.
Use incremental mode to migrate data from MySQL to HDFS.

Further, you will learn Sqoop Export to migrate data:
What Sqoop export is.
Using sqoop export, migrate data from HDFS to MySQL.
Using sqoop export, migrate data from Hive to MySQL.

Further, you will learn about Apache Flume:
Understand Flume architecture.
Using Flume, ingest data from Twitter and save it to HDFS.
Using Flume, ingest data from netcat and save it to HDFS.
Using Flume, ingest data from exec and show it on the console.
Describe Flume interceptors and see examples of using interceptors.
Flume multiple agents.
Flume consolidation.

In the next section, we will learn about Apache Hive:
Hive Intro
External & Managed Tables
Working with Different Files - Parquet, Avro
Compressions
Hive Analysis
Hive String Functions
Hive Date Functions
Partitioning
Bucketing

You will learn about Apache Spark:
Spark Intro
Cluster Overview
RDD
DAG/Stages/Tasks
Actions & Transformations
Transformation & Action Examples
Spark DataFrames
Spark DataFrames - working with different File Formats & Compression
DataFrame APIs
Spark SQL
DataFrame Examples
Spark with Cassandra Integration
Running Spark on IntelliJ IDE
Running Spark on EMR

You will learn about Apache Kafka:
Kafka Architecture
Partitions and Offsets
Kafka Producers and Consumers
Kafka SerDes
Kafka Messages
Kafka Connector
Ingesting Data using Kafka Connector

You will learn about MongoDB:
MongoDB Use Cases
CRUD Operations
MongoDB Operators
Working with Arrays
MongoDB with Spark

Data Engineering Interview Preparation:
Sqoop Interview Questions
Hive Interview Questions
Spark Interview Questions
Data Engineering common questions
Data Engineering real project questions

Overview

Section 1: Big Data Introduction

Lecture 1 Meet your Instructor

Lecture 2 Course Intro

Lecture 3 Big Data Intro

Lecture 4 Understanding Big Data Ecosystem

Section 2: Google Cloud Cluster Setup

Lecture 5 Google Cloud Account Setup

Lecture 6 Troubleshooting Guide (April 2025)

Lecture 7 Dataproc Cluster Setup - Part1

Lecture 8 Dataproc Cluster Setup - Part2

Lecture 9 Upload Files on Google Cloud

Lecture 10 Sqoop Setup

Lecture 11 Environment Update

Section 3: Hadoop & Yarn

Lecture 12 HDFS and Hadoop Commands

Lecture 13 Yarn Cluster Overview

Section 4: Sqoop Import

Lecture 14 Sqoop Introduction

Lecture 15 Managing Target Directories

Lecture 16 Working with Different Compressions

Lecture 17 Conditional Imports

Lecture 18 Split-by and Boundary Queries

Lecture 19 Field delimiters

Lecture 20 Incremental Appends

Lecture 21 Sqoop-Hive Cluster Fix

Lecture 22 Access Hive on Google Cloud

Lecture 23 Sqoop Hive Import

Lecture 24 Sqoop List Tables/Database

Lecture 25 Sqoop Import Practice1

Lecture 26 Sqoop Import Practice2

Section 5: Sqoop Export

Lecture 27 Export from HDFS to MySQL

Lecture 28 Export from Hive to MySQL

Lecture 29 Export Avro Compressed to MySQL

Lecture 30 Bonus Lecture: Sqoop with Airflow

Section 6: Apache Flume

Lecture 31 Flume Setup

Lecture 32 Flume Introduction & Architecture

Lecture 33 Exec Source and Logger Sink

Lecture 34 Moving data from Twitter to HDFS

Lecture 35 Moving data from NetCat to HDFS

Lecture 36 Flume Interceptors

Lecture 37 Flume Interceptor Example

Lecture 38 Flume Multi-Agent Flow

Lecture 39 Flume Consolidation

Section 7: Apache Hive

Lecture 40 Access Hive Shell on Google Cloud

Lecture 41 Hive Introduction

Lecture 42 Hive Database

Lecture 43 Hive Managed Tables

Lecture 44 Hive External Tables

Lecture 45 Hive Inserts

Lecture 46 Hive Analytics

Lecture 47 Working with Parquet

Lecture 48 Compressing Parquet

Lecture 49 Working with Fixed File Format

Lecture 50 Alter Command

Lecture 51 Hive String Functions

Lecture 52 Hive Date Functions

Lecture 53 Hive Partitioning

Lecture 54 Hive Bucketing

Section 8: Spark with Yarn & HDFS

Lecture 55 What is Apache Spark

Lecture 56 Understanding Cluster Manager (Yarn)

Lecture 57 Understanding Distributed Storage (HDFS)

Lecture 58 Running Spark on Yarn/HDFS

Lecture 59 Understanding Deploy Modes

Section 9: GCS Cluster

Lecture 60 Spark on GCS Cluster

Lecture 61 Upload Data files for Spark

Section 10: Spark Internals

Lecture 62 Drivers & Executors

Lecture 63 RDDs & Dataframes

Lecture 64 Transformation & Actions

Lecture 65 Wide & Narrow Transformations

Lecture 66 Understanding Execution Plan

Lecture 67 Different Plans by Driver

Section 11: Spark RDD : Transformation & Actions

Lecture 68 Map/FlatMap Transformation

Lecture 69 Filter/Intersection

Lecture 70 Union/Distinct Transformation

Lecture 71 GroupByKey/ Group people based on Birthday months

Lecture 72 ReduceByKey / Total Number of students in each Subject

Lecture 73 SortByKey / Sort students based on their rollno

Lecture 74 MapPartition / MapPartitionWithIndex

Lecture 75 Change number of Partitions

Lecture 76 Join / join email address based on customer name

Lecture 77 Spark Actions

Section 12: Spark RDD Practice

Lecture 78 Upload Files

Lecture 79 Scala Tuples

Lecture 80 Filter Error Logs

Lecture 81 Frequency of word in Text File

Lecture 82 Population of each city

Lecture 83 Orders placed by Customers

Lecture 84 Average rating of movie

Section 13: Spark Dataframes & Spark SQL

Lecture 85 Dataframe Intro

Lecture 86 Dataframe from Json Files

Lecture 87 Dataframe from Parquet Files

Lecture 88 Dataframe from CSV Files

Lecture 89 Dataframe from Avro File

Lecture 90 Working with XML

Lecture 91 Working with Columns

Lecture 92 Working with String

Lecture 93 Working with Dates

Lecture 94 Dataframe Filter API

Lecture 95 DataFrame API Part1

Lecture 96 DataFrame API Part2

Lecture 97 Spark SQL

Lecture 98 Working with Hive Tables in Spark

Lecture 99 Datasets versus Dataframe

Lecture 100 User Defined Functions (UDFs)

Section 14: Using Intellij IDE

Lecture 101 Intellij Setup

Lecture 102 Project Setup

Lecture 103 Writing first Spark program on IDE

Lecture 104 Understanding spark configuration

Lecture 105 Adding Actions/Transformations

Lecture 106 Understanding Execution Plan

Section 15: Running Spark on EMR (AWS Cloud)

Lecture 107 EMR Cluster Overview

Lecture 108 Cluster Setup

Lecture 109 Setting Spark Code for EMR

Lecture 110 Using Spark-submit

Lecture 111 Running Spark on EMR Cluster

Section 16: Spark with Cassandra

Lecture 112 Cassandra Course

Lecture 113 Creating Spark RDD from Cassandra Table

Lecture 114 Processing Cassandra data in Spark

Lecture 115 Cassandra Rows to Case Class

Lecture 116 Saving Spark RDD to Cassandra

Section 17: Apache Kafka

Lecture 117 Kafka Section Intro

Lecture 118 Confluent Cluster Setup

Lecture 119 Kafka Architecture

Lecture 120 Partitions and Offsets

Lecture 121 Kafka Consumer/Producers

Lecture 122 Kafka Message

Lecture 123 Kafka Serialization & Deserialization

Lecture 124 Your First Python Producer

Lecture 125 Your First Python Consumer

Section 18: Kafka Connector

Lecture 126 What is Connector?

Lecture 127 Kafka Connector - AWS S3 to Kafka

Section 19: Spark Structured Streaming & Kafka (Coming Soon)

Lecture 128 Spark streaming Intro

Section 20: MongoDB

Lecture 129 MongoDB Intro

Lecture 130 MongoDB Use Case & Limitations

Lecture 131 MongoDB Installation

Section 21: CRUD Operations

Lecture 132 Find

Lecture 133 Find With Filter

Lecture 134 Insert

Lecture 135 Update

Lecture 136 Update Continues

Lecture 137 Projections

Lecture 138 Delete

Section 22: Working with Operators

Lecture 139 In / not in Operators

Lecture 140 gte / lte Operators

Lecture 141 and / or operators

Lecture 142 regex operator

Section 23: MongoDB Compass

Lecture 143 Working with GUI

Section 24: Advanced Mongo

Lecture 144 Validation/Schema

Lecture 145 Working with Indexes

Section 25: Spark with Mongo

Lecture 146 Spark Mongo Integration

Section 26: Data Engineering Interview Preparation

Lecture 147 Data Engineer Resume template

Lecture 148 Sqoop Interview Questions

Lecture 149 Hive Interview Questions

Lecture 150 Spark Interview Questions

Lecture 151 Data Engineering common Questions

Lecture 152 Data Engineering Real project Questions

Who this course is for: anyone who wants to learn Big Data technologies or become a Data Engineer.