Spark Sql And Spark 3 Using Scala Hands-On With Labs
Last updated 2/2022
MP4 | Video: h264, 1280x720 | Audio: AAC, 44.1 KHz
Language: English | Size: 8.75 GB | Duration: 24h 12m
A comprehensive course on Spark SQL as well as Data Frame APIs using Scala with complimentary lab access
What you'll learn
All the HDFS Commands that are relevant to validate files and folders in HDFS.
Enough Scala to work on Data Engineering Projects using Scala as the Programming Language
Spark Dataframe APIs to solve the problems using Dataframe style APIs.
Basic Transformations such as Projection, Filtering, Total as well as Aggregations by Keys using Spark Dataframe APIs
Inner as well as outer joins using Spark Data Frame APIs
Ability to use Spark SQL to solve the problems using SQL style syntax.
Basic Transformations such as Projection, Filtering, Total as well as Aggregations by Keys using Spark SQL
Inner as well as outer joins using Spark SQL
Basic DDL to create and manage tables using Spark SQL
Basic DML or CRUD Operations using Spark SQL
Create and Manage Partitioned Tables using Spark SQL
Manipulating Data using Spark SQL Functions
Advanced Analytical or Windowing Functions to perform aggregations and ranking using Spark SQL
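As a rough illustration of the last item in the list above, here is a minimal sketch of ranking with a windowing function in Spark SQL. It assumes a Spark 3 runtime is available (for example spark-shell or a Scala notebook). The `employees` view, its columns, and the sample rows are hypothetical and are not taken from the course material.

```scala
import org.apache.spark.sql.SparkSession

// Reuses the active session if one exists (e.g. in spark-shell), otherwise starts a local one.
val spark = SparkSession.builder()
  .appName("WindowingSketch")
  .master("local[*]")
  .getOrCreate()

// Hypothetical sample data: (employee_id, department_id, salary)
val employees = spark.createDataFrame(Seq(
  (1, 10, 5000.0),
  (2, 10, 7000.0),
  (3, 20, 4000.0),
  (4, 20, 4000.0),
  (5, 20, 6000.0)
)).toDF("employee_id", "department_id", "salary")

employees.createOrReplaceTempView("employees")

// Window function results cannot be filtered in the WHERE clause of the same query,
// so the ranking is computed in a subquery and filtered in the outer query.
val topPaid = spark.sql("""
  SELECT employee_id, department_id, salary, department_salary
  FROM (
    SELECT employee_id, department_id, salary,
           sum(salary) OVER (PARTITION BY department_id) AS department_salary,
           rank() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rnk
    FROM employees
  ) q
  WHERE rnk = 1
  ORDER BY department_id
""")

topPaid.show()
```

The aggregation (`sum ... OVER`) and the ranking (`rank() ... OVER`) share the same partitioning key, so each row keeps its own detail while also carrying the department total.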
Requirements
Basic programming skills
Self-supported lab (instructions provided) or ITVersity lab at additional cost for an appropriate environment.
Minimum memory required based on the environment you are using with 64 bit operating system
4 GB RAM with access to proper clusters or 16 GB RAM with virtual machines such as Cloudera QuickStart VM
Description
As part of this course, you will learn all the key skills to build Data Engineering Pipelines using Spark SQL and Spark Data Frame APIs, with Scala as the programming language. This course used to be a CCA 175 Spark and Hadoop Developer course for the preparation of the Certification Exam. As of 10/31/2021, the exam is sunset and we have renamed the course to Spark SQL and Spark 3 using Scala, as it covers industry-relevant topics beyond the scope of the certification.
About Data Engineering
Data Engineering is nothing but processing the data depending on our downstream needs. We need to build different pipelines such as Batch Pipelines, Streaming Pipelines, etc. as part of Data Engineering. All roles related to Data Processing are consolidated under Data Engineering. Conventionally, they are known as ETL Development, Data Warehouse Development, etc. Apache Spark has evolved into a leading technology to take care of Data Engineering at scale.
I have prepared this course for anyone who would like to transition into a Data Engineer role using Spark (Scala). I am a Data Engineering Solution Architect with proven experience in designing solutions using Apache Spark.
Let us go through the details of what you will be learning in this course. Keep in mind that the course is created with a lot of hands-on tasks which will give you enough practice using the right tools. Also, there are tons of tasks and exercises to evaluate yourself.
Setup of Single Node Big Data Cluster
Many of you would like to transition to Big Data from Conventional Technologies such as Mainframes, Oracle PL/SQL, etc. and you might not have access to Big Data Clusters. It is very important for you to set up the environment in the right manner. Don't worry if you do not have the cluster handy; we will guide you with support via Udemy Q&A.
Setup Ubuntu-based AWS Cloud9 Instance with the right configuration
Ensure Docker is set up
Setup Jupyter Lab and other key components
Setup and Validate Hadoop, Hive, YARN, and Spark
Are you feeling a bit overwhelmed about setting up the environment? Don't worry! We will provide complimentary lab access for up to 2 months. Here are the details.
Training using an interactive environment. You will get 2 weeks of lab access to begin with. If you like the environment and acknowledge it by providing a 5* rating and feedback, the lab access will be extended by an additional 6 weeks (2 months in total). Feel free to send an email to support@itversity.com to get complimentary lab access. Also, if your employer provides a multi-node environment, we will help you set up the material for practice as part of the live session.
On top of Q&A Support, we also provide required support via live sessions.
A quick recap of Scala
This course requires a decent knowledge of Scala. To make sure you understand Spark from a Data Engineering perspective, we added a module to quickly warm up with Scala. If you are not familiar with Scala, then we suggest you go through relevant courses on Scala as a Programming Language.
Data Engineering using Spark SQL
Let us deep-dive into Spark SQL to understand how it can be used to build Data Engineering Pipelines. Spark with SQL gives us the ability to leverage the distributed computing capabilities of Spark coupled with easy-to-use, developer-friendly SQL-style syntax.
Getting Started with Spark SQL
Basic Transformations using Spark SQL
Managing Spark Metastore Tables - Basic DDL and DML
Managing Spark Metastore Tables - DML and Partitioning
Overview of Spark SQL Functions
Windowing Functions using Spark SQL
Data Engineering using Spark Data Frame APIs
Spark Data Frame APIs are an alternative way of building Data Engineering applications at scale, leveraging the distributed computing capabilities of Spark. Data Engineers from application development backgrounds might prefer Data Frame APIs over Spark SQL to build Data Engineering applications.
Data Processing Overview using Spark Data Frame APIs leveraging Scala as Programming Language
Processing Column Data using Spark Data Frame APIs leveraging Scala as Programming Language
Basic Transformations using Spark Data Frame APIs leveraging Scala as Programming Language - Filtering, Aggregations, and Sorting
Joining Data Sets using Spark Data Frame APIs leveraging Scala as Programming Language
All the demos are given on our state-of-the-art Big Data cluster. You can avail of one-month complimentary lab access by reaching out to support@itversity.com with a Udemy receipt.
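The description presents Spark SQL and the Data Frame APIs as two routes to the same kind of pipeline. The sketch below shows what that could look like for a basic projection, filter, and aggregation by key, written once in each style. It assumes a Spark 3 runtime (spark-shell or a Scala notebook). The `orders` data set and its column names are hypothetical and only serve the illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, count, lit, sum}

// Reuses the active session if one exists (e.g. in spark-shell), otherwise starts a local one.
val spark = SparkSession.builder()
  .appName("BasicTransformationsSketch")
  .master("local[*]")
  .getOrCreate()

// Hypothetical sample data: (order_id, order_status, order_amount)
val orders = spark.createDataFrame(Seq(
  (1, "COMPLETE", 100.0),
  (2, "PENDING", 50.0),
  (3, "COMPLETE", 25.5),
  (4, "CLOSED", 10.0)
)).toDF("order_id", "order_status", "order_amount")

// Data Frame style: projection, filtering, aggregation by key, sorting.
val byStatusDf = orders
  .select("order_status", "order_amount")
  .filter(col("order_status").isin("COMPLETE", "CLOSED"))
  .groupBy("order_status")
  .agg(
    count(lit(1)).alias("order_count"),
    sum("order_amount").alias("revenue")
  )
  .orderBy("order_status")

// SQL style: the same logic against a temporary view.
orders.createOrReplaceTempView("orders")
val byStatusSql = spark.sql("""
  SELECT order_status,
         count(1) AS order_count,
         sum(order_amount) AS revenue
  FROM orders
  WHERE order_status IN ('COMPLETE', 'CLOSED')
  GROUP BY order_status
  ORDER BY order_status
""")

byStatusDf.show()
byStatusSql.show()
```

Both versions should produce the same rows. Which one to use is largely a matter of team background, as the description notes.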
Overview
Section 1: Introduction
Lecture 1 CCA 175 Spark and Hadoop Developer - Curriculum
Section 2: Setting up Environment using AWS Cloud9
Lecture 2 Getting Started with Cloud9
Lecture 3 Creating Cloud9 Environment
Lecture 4 Warming up with Cloud9 IDE
Lecture 5 Overview of EC2 related to Cloud9
Lecture 6 Opening ports for Cloud9 Instance
Lecture 7 Associating Elastic IPs to Cloud9 Instance
Lecture 8 Increase EBS Volume Size of Cloud9 Instance
Lecture 9 Setup Jupyter Lab on Cloud9
Lecture 10 [Commands] Setup Jupyter Lab on Cloud9
Section 3: Setting up Environment - Overview of GCP and Provision Ubuntu VM
Lecture 11 Signing up for GCP
Lecture 12 Overview of GCP Web Console
Lecture 13 Overview of GCP Pricing
Lecture 14 Provision Ubuntu VM from GCP
Lecture 15 Setup Docker
Lecture 16 Why are we setting up Python and Jupyter Lab for a Scala-related course?
Lecture 17 Validating Python
Lecture 18 Setup Jupyter Lab
Section 4: Setup Hadoop on Single Node Cluster
Lecture 19 Introduction to Single Node Hadoop Cluster
Lecture 20 Setup Prerequisites
Lecture 21 [Commands] - Setup Prerequisites
Lecture 22 Setup Passwordless login
Lecture 23 [Commands] - Setup Passwordless login
Lecture 24 Download and Install Hadoop
Lecture 25 [Commands] - Download and Install Hadoop
Lecture 26 Configure Hadoop HDFS
Lecture 27 [Commands] - Configure Hadoop HDFS
Lecture 28 Start and Validate HDFS
Lecture 29 [Commands] - Start and Validate HDFS
Lecture 30 Configure Hadoop YARN
Lecture 31 [Commands] - Configure Hadoop YARN
Lecture 32 Start and Validate YARN
Lecture 33 [Commands] - Start and Validate YARN
Lecture 34 Managing Single Node Hadoop
Lecture 35 [Commands] - Managing Single Node Hadoop
Section 5: Setup Hive and Spark on Single Node Cluster
Lecture 36 Setup Data Sets for Practice
Lecture 37 [Commands] - Setup Data Sets for Practice
Lecture 38 Download and Install Hive
Lecture 39 [Commands] - Download and Install Hive
Lecture 40 Setup Database for Hive Metastore
Lecture 41 [Commands] - Setup Database for Hive Metastore
Lecture 42 Configure and Setup Hive Metastore
Lecture 43 [Commands] - Configure and Setup Hive Metastore
Lecture 44 Launch and Validate Hive
Lecture 45 [Commands] - Launch and Validate Hive
Lecture 46 Scripts to Manage Single Node Cluster
Lecture 47 [Commands] - Scripts to Manage Single Node Cluster
Lecture 48 Download and Install Spark 2
Lecture 49 [Commands] - Download and Install Spark 2
Lecture 50 Configure Spark 2
Lecture 51 [Commands] - Configure Spark 2
Lecture 52 Validate Spark 2 using CLIs
Lecture 53 [Commands] - Validate Spark 2 using CLIs
Lecture 54 Validate Jupyter Lab Setup
Lecture 55 [Commands] - Validate Jupyter Lab Setup
Lecture 56 Integrate Spark 2 with Jupyter Lab
Lecture 57 [Commands] - Integrate Spark 2 with Jupyter Lab
Lecture 58 Download and Install Spark 3
Lecture 59 [Commands] - Download and Install Spark 3
Lecture 60 Configure Spark 3
Lecture 61 [Commands] - Configure Spark 3
Lecture 62 Validate Spark 3 using CLIs
Lecture 63 [Commands] - Validate Spark 3 using CLIs
Lecture 64 Integrate Spark 3 with Jupyter Lab
Lecture 65 [Commands] - Integrate Spark 3 with Jupyter Lab
Section 6: Scala Fundamentals
Lecture 66 Introduction and Setting up of Scala
Lecture 67 Setup Scala on Windows
Lecture 68 Basic Programming Constructs
Lecture 69 Functions
Lecture 70 Object Oriented Concepts - Classes
Lecture 71 Object Oriented Concepts - Objects
Lecture 72 Object Oriented Concepts - Case Classes
Lecture 73 Collections - Seq, Set and Map
Lecture 74 Basic Map Reduce Operations
Lecture 75 Setting up Data Sets for Basic I/O Operations
Lecture 76 Basic I/O Operations and using Scala Collections APIs
Lecture 77 Tuples
Lecture 78 Development Cycle - Create Program File
Lecture 79 Development Cycle - Compile source code to jar using SBT
Lecture 80 Development Cycle - Setup SBT on Windows
Lecture 81 Development Cycle - Compile changes and run jar with arguments
Lecture 82 Development Cycle - Setup IntelliJ with Scala
Lecture 83 Development Cycle - Develop Scala application using SBT in IntelliJ
Section 7: Overview of Hadoop HDFS Commands
Lecture 84 Getting help or usage of HDFS Commands
Lecture 85 Listing HDFS Files
Lecture 86 Managing HDFS Directories
Lecture 87 Copying files from local to HDFS
Lecture 88 Copying files from HDFS to local
Lecture 89 Getting File Metadata
Lecture 90 Previewing Data in HDFS File
Lecture 91 HDFS Block Size
Lecture 92 HDFS Replication Factor
Lecture 93 Getting HDFS Storage Usage
Lecture 94 Using HDFS Stat Commands
Lecture 95 HDFS File Permissions
Lecture 96 Overriding Properties
Section 8: Apache Spark 2 using Scala - Data Processing - Overview
Lecture 97 Introduction for the module
Lecture 98 Starting Spark Context using spark-shell
Lecture 99 Overview of Spark read APIs
Lecture 100 Previewing Schema and Data using Spark APIs
Lecture 101 Overview of Spark Data Frame APIs
Lecture 102 Overview of Functions to Manipulate Data in Spark Data Frames
Lecture 103 Overview of Spark Write APIs
Section 9: Apache Spark 2 using Scala - Processing Column Data using Pre-defined Functions
Lecture 104 Introduction to Pre-defined Functions
Lecture 105 Creating Spark Session Object in Notebook
Lecture 106 Create Dummy Data Frames for Practice
Lecture 107 Categories of Functions on Spark Data Frame Columns
Lecture 108 Using Spark Special Functions - col
Lecture 109 Using Spark Special Functions - lit
Lecture 110 Manipulating String Columns using Spark Functions - Case Conversion and Length
Lecture 111 Manipulating String Columns using Spark Functions - substring
Lecture 112 Manipulating String Columns using Spark Functions - split
Lecture 113 Manipulating String Columns using Spark Functions - Concatenating Strings
Lecture 114 Manipulating String Columns using Spark Functions - Padding Strings
Lecture 115 Manipulating String Columns using Spark Functions - Trimming unwanted characters
Lecture 116 Date and Time Functions in Spark - Overview
Lecture 117 Date and Time Functions in Spark - Date Arithmetic
Lecture 118 Date and Time Functions in Spark - Using trunc and date_trunc
Lecture 119 Date and Time Functions in Spark - Using date_format and other functions
Lecture 120 Date and Time Functions in Spark - dealing with unix timestamp
Lecture 121 Pre-defined Functions in Spark - Conclusion
Section 10: Apache Spark 2 using Scala - Basic Transformations using Data Frames
Lecture 122 Introduction to Basic Transformations using Data Frame APIs
Lecture 123 Starting Spark Context
Lecture 124 Overview of Filtering using Spark Data Frame APIs
Lecture 125 Filtering Data from Spark Data Frames - Reading Data and Understanding Schema
Lecture 126 Filtering Data from Spark Data Frames - Task 1 - Equal Operator
Lecture 127 Filtering Data from Spark Data Frames - Task 2 - Comparison Operators
Lecture 128 Filtering Data from Spark Data Frames - Task 3 - Boolean AND
Lecture 129 Filtering Data from Spark Data Frames - Task 4 - IN Operator
Lecture 130 Filtering Data from Spark Data Frames - Task 5 - Between and Like
Lecture 131 Filtering Data from Spark Data Frames - Task 6 - Using functions in Filter
Lecture 132 Overview of Aggregations using Spark Data Frame APIs
Lecture 133 Overview of Sorting using Spark Data Frame APIs
Lecture 134 Solution - Get Delayed Counts using Spark Data Frame APIs - Part 1
Lecture 135 Solution - Get Delayed Counts using Spark Data Frame APIs - Part 2
Lecture 136 Solution - Getting Delayed Counts By Date using Spark Data Frame APIs
Section 11: Apache Spark 2 using Scala - Joining Data Sets
Lecture 137 Prepare and Validate Data Sets
Lecture 138 Starting Spark Session or Spark Context
Lecture 139 Analyze Data Sets for Joins using Spark Data Frame APIs
Lecture 140 Eliminate Duplicate records from Data Frame using Spark Data Frame APIs
Lecture 141 Recap of Basic Transformations using Spark Data Frame APIs
Lecture 142 Joining Data Sets using Spark Data Frame APIs - Problem Statements
Lecture 143 Overview of Joins using Spark Data Frame APIs
Lecture 144 Inner Join using Spark Data Fr - Get number of flights departed from US airports
Lecture 145 Inner Join using Spark Data Fram - Get number of flights departed from US States
Lecture 146 Outer Join using Spark Data Frame APIs - Get Airports - Never Used
Section 12: Apache Spark using SQL - Getting Started
Lecture 147 Getting Started with Spark SQL - Overview
Lecture 148 Overview of Spark Documentation
Lecture 149 Launching and using Spark SQL CLI
Lecture 150 Overview of Spark SQL Properties
Lecture 151 Running OS Commands using Spark SQL
Lecture 152 Understanding Spark Metastore Warehouse Directory
Lecture 153 Managing Spark Metastore Databases
Lecture 154 Managing Spark Metastore Tables
Lecture 155 Retrieve Metadata of Spark Metastore Tables
Lecture 156 Role of Spark Metastore or Hive Metastore
Lecture 157 Exercise - Getting Started with Spark SQL
Section 13: Apache Spark using SQL - Basic Transformations
Lecture 158 Basic Transformation using Spark SQL - Introduction
Lecture 159 Spark SQL - Overview
Lecture 160 Define Problem Statement for Basic Transformations using Spark SQL
Lecture 161 Prepare or Create Tables using Spark SQL
Lecture 162 Projecting or Selecting Data using Spark SQL
Lecture 163 Filtering Data using Spark SQL
Lecture 164 Joining Tables using Spark SQL - Inner
Lecture 165 Joining Tables using Spark SQL - Outer
Lecture 166 Aggregating Data using Spark SQL
Lecture 167 Sorting Data using Spark SQL
Lecture 168 Conclusion - Final Solution using Spark SQL
Section 14: Apache Spark using SQL - Basic DDL and DML
Lecture 169 Introduction to Basic DDL and DML using Spark SQL
Lecture 170 Create Spark Metastore Tables using Spark SQL
Lecture 171 Overview of Data Types for Spark Metastore Table Columns
Lecture 172 Adding Comments to Spark Metastore Tables using Spark SQL
Lecture 173 Loading Data Into Spark Metastore Tables using Spark SQL - Local
Lecture 174 Loading Data Into Spark Metastore Tables using Spark SQL - HDFS
Lecture 175 Loading Data into Spark Metastore Tables using Spark SQL - Append and Overwrite
Lecture 176 Creating External Tables in Spark Metastore using Spark SQL
Lecture 177 Managed Spark Metastore Tables vs External Spark Metastore Tables
Lecture 178 Overview of Spark Metastore Table File Formats
Lecture 179 Drop Spark Metastore Tables and Databases
Lecture 180 Truncating Spark Metastore Tables
Lecture 181 Exercise - Managed Spark Metastore Tables
Section 15: Apache Spark using SQL - DML and Partitioning
Lecture 182 Introduction to DML and Partitioning of Spark Metastore Tables using Spark SQL
Lecture 183 Introduction to Partitioning of Spark Metastore Tables using Spark SQL
Lecture 184 Creating Spark Metastore Tables using Parquet File Format
Lecture 185 Load vs. Insert into Spark Metastore Tables using Spark SQL
Lecture 186 Inserting Data using Stage Spark Metastore Table using Spark SQL
Lecture 187 Creating Partitioned Spark Metastore Tables using Spark SQL
Lecture 188 Adding Partitions to Spark Metastore Tables using Spark SQL
Lecture 189 Loading Data into Partitioned Spark Metastore Tables using Spark SQL
Lecture 190 Inserting Data into Partitions of Spark Metastore Tables using Spark SQL
Lecture 191 Using Dynamic Partition Mode to insert data into Spark Metastore Tables
Lecture 192 Exercise - Partitioned Spark Metastore Tables using Spark SQL
Section 16: Apache Spark using SQL - Pre-defined Functions
Lecture 193 Introduction - Overview of Spark SQL Functions
Lecture 194 Overview of Pre-defined Functions using Spark SQL
Lecture 195 Validating Functions using Spark SQL
Lecture 196 String Manipulation Functions using Spark SQL
Lecture 197 Date Manipulation Functions using Spark SQL
Lecture 198 Overview of Numeric Functions using Spark SQL
Lecture 199 Data Type Conversion using Spark SQL
Lecture 200 Dealing with Nulls using Spark SQL
Lecture 201 Using CASE and WHEN using Spark SQL
Lecture 202 Query Example - Word Count using Spark SQL
Section 17: Apache Spark using SQL - Pre-defined Functions - Exercises
Lecture 203 Prepare Users Table using Spark SQL
Lecture 204 Exercise 1 - Get number of users created per year
Lecture 205 Exercise 2 - Get the day name of the birth days of users
Lecture 206 Exercise 3 - Get the names and email ids of users added in the year 2019
Lecture 207 Exercise 4 - Get the number of users by gender
Lecture 208 Exercise 5 - Get last 4 digits of unique ids
Lecture 209 Exercise 6 - Get the count of users based up on country code
Section 18: Apache Spark using SQL - Windowing Functions
Lecture 210 Introduction to Windowing Functions using Spark SQL
Lecture 211 Prepare HR Database in Spark Metastore using Spark SQL
Lecture 212 Overview of Windowing Functions using Spark SQL
Lecture 213 Aggregations using Windowing Functions using Spark SQL
Lecture 214 LEAD or LAG Functions using Spark SQL
Lecture 215 Getting first and last values using Spark SQL
Lecture 216 Ranking using Windowing Functions in Spark SQL
Lecture 217 Order of execution of Spark SQL Queries
Lecture 218 Overview of Subqueries using Spark SQL
Lecture 219 Filtering Window Function Results using Spark SQL
Section 19: Sample scenarios with solutions
Lecture 220 Introduction to Sample Scenarios and Solutions
Lecture 221 Problem Statements - General Guidelines
Lecture 222 Initializing the job - General Guidelines
Lecture 223 Getting crime count per type per month - Understanding Data
Lecture 224 Getting crime count per type per month - Implementing the logic - Core API
Lecture 225 Getting crime count per type per month - Implementing the logic - Data Frames
Lecture 226 Getting crime count per type per month - Validating Output
Lecture 227 Get inactive customers - using Core Spark API (leftOuterJoin)
Lecture 228 Get inactive customers - using Data Frames and SQL
Lecture 229 Get top 3 crimes in RESIDENCE - using Core Spark API
Lecture 230 Get top 3 crimes in RESIDENCE - using Data Frame and SQL
Lecture 231 Convert NYSE data from text file format to parquet file format
Lecture 232 Get word count - with custom control arguments, num keys and file format
Who this course is for
Any IT aspirant/professional willing to learn Data Engineering using Apache Spark
Python Developers who want to learn Spark using Scala to add an additional skill to be a Data Engineer
Java or Scala Developers to learn Spark using Scala to add Data Engineering Skills to their profile