Spark SQL And Spark 3 Using Scala Hands-On With Labs

Posted By: ELK1nG
Last updated 2/2022
MP4 | Video: h264, 1280x720 | Audio: AAC, 44.1 KHz
Language: English | Size: 8.75 GB | Duration: 24h 12m

A comprehensive course on Spark SQL as well as Data Frame APIs using Scala, with complimentary lab access

What you'll learn
All the HDFS commands relevant to validating files and folders in HDFS
Enough Scala to work on Data Engineering projects using Scala as the programming language
Spark Data Frame APIs to solve problems using Data Frame-style APIs
Basic Transformations such as Projection, Filtering, Total Aggregations as well as Aggregations by Key using Spark Data Frame APIs
Inner as well as outer joins using Spark Data Frame APIs
Ability to use Spark SQL to solve problems using SQL-style syntax
Basic Transformations such as Projection, Filtering, Total Aggregations as well as Aggregations by Key using Spark SQL
Inner as well as outer joins using Spark SQL
Basic DDL to create and manage tables using Spark SQL
Basic DML or CRUD Operations using Spark SQL
Create and Manage Partitioned Tables using Spark SQL
Manipulating Data using Spark SQL Functions
Advanced Analytical or Windowing Functions to perform aggregations and ranking using Spark SQL
Requirements
Basic programming skills
A self-support lab (instructions provided) or the ITVersity lab (at additional cost) for an appropriate environment
Minimum memory required, depending on the environment you are using (64-bit operating system):
4 GB RAM with access to proper clusters or 16 GB RAM with virtual machines such as Cloudera QuickStart VM
Description
As part of this course, you will learn all the key skills to build Data Engineering Pipelines using Spark SQL and Spark Data Frame APIs with Scala as the programming language. This course used to be a CCA 175 Spark and Hadoop Developer course for preparing for the certification exam. As of 10/31/2021, the exam has been sunset, and we have renamed the course to Spark SQL and Spark 3 using Scala, as it covers industry-relevant topics beyond the scope of the certification.

About Data Engineering

Data Engineering is nothing but processing data according to our downstream needs. As part of Data Engineering, we need to build different pipelines, such as batch pipelines and streaming pipelines. All roles related to data processing are consolidated under Data Engineering; conventionally, they were known as ETL Development, Data Warehouse Development, etc. Apache Spark has evolved into a leading technology for Data Engineering at scale.

I have prepared this course for anyone who would like to transition into a Data Engineer role using Spark (Scala). I am a Data Engineering Solution Architect with proven experience in designing solutions using Apache Spark.

Let us go through the details of what you will be learning in this course. Keep in mind that the course is built around a lot of hands-on tasks, which will give you enough practice using the right tools. There are also plenty of tasks and exercises to evaluate yourself.

Setup of Single Node Big Data Cluster

Many of you would like to transition to Big Data from conventional technologies such as Mainframes, Oracle PL/SQL, etc., and you might not have access to Big Data clusters. It is very important for you to set up the environment in the right manner. Don't worry if you do not have a cluster handy; we will guide you through support via Udemy Q&A.

Set up an Ubuntu-based AWS Cloud9 instance with the right configuration
Ensure Docker is set up
Set up Jupyter Lab and other key components
Set up and validate Hadoop, Hive, YARN, and Spark

Are you feeling a bit overwhelmed about setting up the environment? Don't worry! We will provide complimentary lab access for up to 2 months. Here are the details.

Training using an interactive environment. You will get 2 weeks of lab access to begin with. If you like the environment and acknowledge it by providing a 5* rating and feedback, the lab access will be extended by an additional 6 weeks (2 months in total). Feel free to send an email to support@itversity.com to get complimentary lab access. Also, if your employer provides a multi-node environment, we will help you set up the material for practice as part of a live session. On top of Q&A support, we also provide the required support via live sessions.

A quick recap of Scala

This course requires a decent knowledge of Scala. To make sure you understand Spark from a Data Engineering perspective, we added a module to quickly warm up with Scala. If you are not familiar with Scala, we suggest you go through relevant courses on Scala as a programming language.

Data Engineering using Spark SQL

Let us deep-dive into Spark SQL to understand how it can be used to build Data Engineering pipelines. Spark with SQL gives us the ability to leverage the distributed computing capabilities of Spark coupled with easy-to-use, developer-friendly SQL-style syntax.

Getting Started with Spark SQL
Basic Transformations using Spark SQL
Managing Spark Metastore Tables - Basic DDL and DML
Managing Spark Metastore Tables - DML and Partitioning
Overview of Spark SQL Functions
Windowing Functions using Spark SQL

Data Engineering using Spark Data Frame APIs

Spark Data Frame APIs are an alternative way of building Data Engineering applications at scale, leveraging the distributed computing capabilities of Spark. Data Engineers from application development backgrounds might prefer Data Frame APIs over Spark SQL to build Data Engineering applications.

Data Processing Overview using Spark Data Frame APIs leveraging Scala as Programming Language
Processing Column Data using Spark Data Frame APIs leveraging Scala as Programming Language
Basic Transformations using Spark Data Frame APIs leveraging Scala as Programming Language - Filtering, Aggregations, and Sorting
Joining Data Sets using Spark Data Frame APIs leveraging Scala as Programming Language

All the demos are given on our state-of-the-art Big Data cluster. You can get one month of complimentary lab access by reaching out to support@itversity.com with a Udemy receipt.

Overview

Section 1: Introduction

Lecture 1 CCA 175 Spark and Hadoop Developer - Curriculum

Section 2: Setting up Environment using AWS Cloud9

Lecture 2 Getting Started with Cloud9

Lecture 3 Creating Cloud9 Environment

Lecture 4 Warming up with Cloud9 IDE

Lecture 5 Overview of EC2 related to Cloud9

Lecture 6 Opening ports for Cloud9 Instance

Lecture 7 Associating Elastic IPs to Cloud9 Instance

Lecture 8 Increase EBS Volume Size of Cloud9 Instance

Lecture 9 Setup Jupyter Lab on Cloud9

Lecture 10 [Commands] Setup Jupyter Lab on Cloud9

Section 3: Setting up Environment - Overview of GCP and Provision Ubuntu VM

Lecture 11 Signing up for GCP

Lecture 12 Overview of GCP Web Console

Lecture 13 Overview of GCP Pricing

Lecture 14 Provision Ubuntu VM from GCP

Lecture 15 Setup Docker

Lecture 16 Why are we setting up Python and Jupyter Lab for a Scala-related course?

Lecture 17 Validating Python

Lecture 18 Setup Jupyter Lab

Section 4: Setup Hadoop on Single Node Cluster

Lecture 19 Introduction to Single Node Hadoop Cluster

Lecture 20 Setup Prerequisites

Lecture 21 [Commands] - Setup Prerequisites

Lecture 22 Setup Passwordless login

Lecture 23 [Commands] - Setup Passwordless login

Lecture 24 Download and Install Hadoop

Lecture 25 [Commands] - Download and Install Hadoop

Lecture 26 Configure Hadoop HDFS

Lecture 27 [Commands] - Configure Hadoop HDFS

Lecture 28 Start and Validate HDFS

Lecture 29 [Commands] - Start and Validate HDFS

Lecture 30 Configure Hadoop YARN

Lecture 31 [Commands] - Configure Hadoop YARN

Lecture 32 Start and Validate YARN

Lecture 33 [Commands] - Start and Validate YARN

Lecture 34 Managing Single Node Hadoop

Lecture 35 [Commands] - Managing Single Node Hadoop

Section 5: Setup Hive and Spark on Single Node Cluster

Lecture 36 Setup Data Sets for Practice

Lecture 37 [Commands] - Setup Data Sets for Practice

Lecture 38 Download and Install Hive

Lecture 39 [Commands] - Download and Install Hive

Lecture 40 Setup Database for Hive Metastore

Lecture 41 [Commands] - Setup Database for Hive Metastore

Lecture 42 Configure and Setup Hive Metastore

Lecture 43 [Commands] - Configure and Setup Hive Metastore

Lecture 44 Launch and Validate Hive

Lecture 45 [Commands] - Launch and Validate Hive

Lecture 46 Scripts to Manage Single Node Cluster

Lecture 47 [Commands] - Scripts to Manage Single Node Cluster

Lecture 48 Download and Install Spark 2

Lecture 49 [Commands] - Download and Install Spark 2

Lecture 50 Configure Spark 2

Lecture 51 [Commands] - Configure Spark 2

Lecture 52 Validate Spark 2 using CLIs

Lecture 53 [Commands] - Validate Spark 2 using CLIs

Lecture 54 Validate Jupyter Lab Setup

Lecture 55 [Commands] - Validate Jupyter Lab Setup

Lecture 56 Integrate Spark 2 with Jupyter Lab

Lecture 57 [Commands] - Integrate Spark 2 with Jupyter Lab

Lecture 58 Download and Install Spark 3

Lecture 59 [Commands] - Download and Install Spark 3

Lecture 60 Configure Spark 3

Lecture 61 [Commands] - Configure Spark 3

Lecture 62 Validate Spark 3 using CLIs

Lecture 63 [Commands] - Validate Spark 3 using CLIs

Lecture 64 Integrate Spark 3 with Jupyter Lab

Lecture 65 [Commands] - Integrate Spark 3 with Jupyter Lab

Section 6: Scala Fundamentals

Lecture 66 Introduction and Setting up of Scala

Lecture 67 Setup Scala on Windows

Lecture 68 Basic Programming Constructs

Lecture 69 Functions

Lecture 70 Object Oriented Concepts - Classes

Lecture 71 Object Oriented Concepts - Objects

Lecture 72 Object Oriented Concepts - Case Classes

Lecture 73 Collections - Seq, Set and Map

Lecture 74 Basic Map Reduce Operations

Lecture 75 Setting up Data Sets for Basic I/O Operations

Lecture 76 Basic I/O Operations and using Scala Collections APIs

Lecture 77 Tuples

Lecture 78 Development Cycle - Create Program File

Lecture 79 Development Cycle - Compile source code to jar using SBT

Lecture 80 Development Cycle - Setup SBT on Windows

Lecture 81 Development Cycle - Compile changes and run jar with arguments

Lecture 82 Development Cycle - Setup IntelliJ with Scala

Lecture 83 Development Cycle - Develop Scala application using SBT in IntelliJ
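
The collection operations covered in Section 6 (case classes, Seq/Map, tuples, and basic map/reduce) can be sketched in plain Scala as below. This is a minimal illustration, not code taken from the course: the comma-separated order-item layout (orderId,productId,quantity,subtotal) and the sample values are hypothetical.

```scala
// Hypothetical record layout for one order-item line: orderId,productId,quantity,subtotal
case class OrderItem(orderId: Int, productId: Int, quantity: Int, subtotal: Double)

object OrderItemStats {
  // Parse one comma-separated line such as "2,1073,1,199.99" into an OrderItem.
  def parse(line: String): OrderItem = {
    val f = line.split(",")
    OrderItem(f(0).toInt, f(1).toInt, f(2).toInt, f(3).toDouble)
  }

  // Filter, map, then reduce: total subtotal for one order
  // (reduceOption avoids an exception when no item matches).
  def revenueForOrder(items: Seq[OrderItem], orderId: Int): Double =
    items.filter(_.orderId == orderId).map(_.subtotal).reduceOption(_ + _).getOrElse(0.0)

  // Group by key and build (orderId, itemCount) tuples, sorted by orderId.
  def itemCounts(items: Seq[OrderItem]): Seq[(Int, Int)] =
    items.groupBy(_.orderId).map { case (id, group) => (id, group.size) }.toSeq.sortBy(_._1)
}
```

For example, `itemCounts` applied to three parsed lines with order ids 1, 2 and 2 yields `Seq((1, 1), (2, 2))`. `reduceOption` is used instead of `reduce` so that an order with no items returns 0.0 rather than throwing.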

Section 7: Overview of Hadoop HDFS Commands

Lecture 84 Getting help or usage of HDFS Commands

Lecture 85 Listing HDFS Files

Lecture 86 Managing HDFS Directories

Lecture 87 Copying files from local to HDFS

Lecture 88 Copying files from HDFS to local

Lecture 89 Getting File Metadata

Lecture 90 Previewing Data in HDFS File

Lecture 91 HDFS Block Size

Lecture 92 HDFS Replication Factor

Lecture 93 Getting HDFS Storage Usage

Lecture 94 Using HDFS Stat Commands

Lecture 95 HDFS File Permissions

Lecture 96 Overriding Properties

Section 8: Apache Spark 2 using Scala - Data Processing - Overview

Lecture 97 Introduction for the module

Lecture 98 Starting Spark Context using spark-shell

Lecture 99 Overview of Spark read APIs

Lecture 100 Previewing Schema and Data using Spark APIs

Lecture 101 Overview of Spark Data Frame APIs

Lecture 102 Overview of Functions to Manipulate Data in Spark Data Frames

Lecture 103 Overview of Spark Write APIs

Section 9: Apache Spark 2 using Scala - Processing Column Data using Pre-defined Functions

Lecture 104 Introduction to Pre-defined Functions

Lecture 105 Creating Spark Session Object in Notebook

Lecture 106 Create Dummy Data Frames for Practice

Lecture 107 Categories of Functions on Spark Data Frame Columns

Lecture 108 Using Spark Special Functions - col

Lecture 109 Using Spark Special Functions - lit

Lecture 110 Manipulating String Columns using Spark Functions - Case Conversion and Length

Lecture 111 Manipulating String Columns using Spark Functions - substring

Lecture 112 Manipulating String Columns using Spark Functions - split

Lecture 113 Manipulating String Columns using Spark Functions - Concatenating Strings

Lecture 114 Manipulating String Columns using Spark Functions - Padding Strings

Lecture 115 Manipulating String Columns using Spark Functions - Trimming unwanted characters

Lecture 116 Date and Time Functions in Spark - Overview

Lecture 117 Date and Time Functions in Spark - Date Arithmetic

Lecture 118 Date and Time Functions in Spark - Using trunc and date_trunc

Lecture 119 Date and Time Functions in Spark - Using date_format and other functions

Lecture 120 Date and Time Functions in Spark - dealing with unix timestamp

Lecture 121 Pre-defined Functions in Spark - Conclusion
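
The semantics of a few of the date functions from Section 9 can be approximated with `java.time` in plain Scala. This is only an assumed analogy for illustration: in Spark you would apply the built-in `date_add`, `datediff`, `trunc` and `date_format` functions to columns, and edge-case behaviour (nulls, time zones, pattern letters) may differ.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import java.time.temporal.ChronoUnit

// Plain java.time analogues of some Spark date functions (illustration only).
object DateSemantics {
  // Roughly like date_add(start, days): shift a date by a number of days.
  def dateAdd(start: LocalDate, days: Int): LocalDate = start.plusDays(days.toLong)

  // Roughly like datediff(end, start): number of days from start to end.
  def dateDiff(end: LocalDate, start: LocalDate): Long = ChronoUnit.DAYS.between(start, end)

  // Roughly like trunc(date, 'MM'): first day of the same month.
  def truncToMonth(d: LocalDate): LocalDate = d.withDayOfMonth(1)

  // Roughly like date_format(date, 'yyyyMM'): year and month as a string.
  def yearMonth(d: LocalDate): String = d.format(DateTimeFormatter.ofPattern("yyyyMM"))
}
```

For example, adding 2 days to 2024-02-28 gives 2024-03-01 because 2024 is a leap year.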

Section 10: Apache Spark 2 using Scala - Basic Transformations using Data Frames

Lecture 122 Introduction to Basic Transformations using Data Frame APIs

Lecture 123 Starting Spark Context

Lecture 124 Overview of Filtering using Spark Data Frame APIs

Lecture 125 Filtering Data from Spark Data Frames - Reading Data and Understanding Schema

Lecture 126 Filtering Data from Spark Data Frames - Task 1 - Equal Operator

Lecture 127 Filtering Data from Spark Data Frames - Task 2 - Comparison Operators

Lecture 128 Filtering Data from Spark Data Frames - Task 3 - Boolean AND

Lecture 129 Filtering Data from Spark Data Frames - Task 4 - IN Operator

Lecture 130 Filtering Data from Spark Data Frames - Task 5 - Between and Like

Lecture 131 Filtering Data from Spark Data Frames - Task 6 - Using functions in Filter

Lecture 132 Overview of Aggregations using Spark Data Frame APIs

Lecture 133 Overview of Sorting using Spark Data Frame APIs

Lecture 134 Solution - Get Delayed Counts using Spark Data Frame APIs - Part 1

Lecture 135 Solution - Get Delayed Counts using Spark Data Frame APIs - Part 2

Lecture 136 Solution - Getting Delayed Counts By Date using Spark Data Frame APIs
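
The filtering, aggregation and sorting steps behind the delayed-count solutions in Section 10 can be sketched with Scala collections. The `Flight` fields, the delay threshold passed in, and the sample rows are hypothetical; the course itself works on Spark Data Frames rather than in-memory sequences.

```scala
// Hypothetical, simplified flight record; flightDate is an ISO string like "2008-01-02".
case class Flight(flightDate: String, origin: String, dest: String, depDelay: Int)

object DelayedCounts {
  // Filtering: keep flights whose departure delay (minutes) exceeds the threshold.
  def delayed(flights: Seq[Flight], threshold: Int): Seq[Flight] =
    flights.filter(_.depDelay > threshold)

  // Aggregation by key plus sorting: delayed-flight count per date, ordered by date
  // (ISO date strings sort correctly as plain strings).
  def delayedCountsByDate(flights: Seq[Flight], threshold: Int): Seq[(String, Int)] =
    delayed(flights, threshold)
      .groupBy(_.flightDate)
      .map { case (date, group) => (date, group.size) }
      .toSeq
      .sortBy(_._1)
}
```

With Spark Data Frame APIs the same pipeline would be expressed roughly as `filter`, then `groupBy(...).agg(count(lit(1)))`, then `orderBy`, with column names depending on the actual data set.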

Section 11: Apache Spark 2 using Scala - Joining Data Sets

Lecture 137 Prepare and Validate Data Sets

Lecture 138 Starting Spark Session or Spark Context

Lecture 139 Analyze Data Sets for Joins using Spark Data Frame APIs

Lecture 140 Eliminate Duplicate records from Data Frame using Spark Data Frame APIs

Lecture 141 Recap of Basic Transformations using Spark Data Frame APIs

Lecture 142 Joining Data Sets using Spark Data Frame APIs - Problem Statements

Lecture 143 Overview of Joins using Spark Data Frame APIs

Lecture 144 Inner Join using Spark Data Frame APIs - Get number of flights departed from US airports

Lecture 145 Inner Join using Spark Data Frame APIs - Get number of flights departed from US States

Lecture 146 Outer Join using Spark Data Frame APIs - Get Airports - Never Used
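
The inner and outer joins in Section 11 can be sketched in plain Scala as follows. The `Airport` and `Departure` shapes are hypothetical simplifications, the inner join assumes each IATA code appears only once in the airport lookup, and "never used" is narrowed here to "never used as a departure airport".

```scala
// Hypothetical shapes: an airport lookup table and a departures fact table.
case class Airport(iata: String, state: String)
case class Departure(origin: String, flightDate: String)

object JoinSketches {
  // Inner join on origin = iata, then count departures per state.
  def departuresByState(deps: Seq[Departure], airports: Seq[Airport]): Seq[(String, Int)] = {
    // Lookup side of the join; assumes each IATA code appears once.
    val stateByIata: Map[String, String] = airports.map(a => a.iata -> a.state).toMap
    deps
      .flatMap(d => stateByIata.get(d.origin)) // inner join: departures with no matching airport drop out
      .groupBy(identity)
      .map { case (state, rows) => (state, rows.size) }
      .toSeq
      .sortBy(_._1)
  }

  // Airports with no matching departure: a left outer join filtered on a
  // missing right-hand side (equivalently, a left anti join).
  def neverUsed(airports: Seq[Airport], deps: Seq[Departure]): Seq[String] = {
    val usedOrigins = deps.map(_.origin).toSet
    airports.filterNot(a => usedOrigins.contains(a.iata)).map(_.iata).sorted
  }
}
```

In Spark Data Frame APIs, the first function corresponds to an inner join followed by `groupBy` and `count`, and the second to a `left_outer` join filtered on a null right-hand key (or, equivalently, a `left_anti` join).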

Section 12: Apache Spark using SQL - Getting Started

Lecture 147 Getting Started with Spark SQL - Overview

Lecture 148 Overview of Spark Documentation

Lecture 149 Launching and using Spark SQL CLI

Lecture 150 Overview of Spark SQL Properties

Lecture 151 Running OS Commands using Spark SQL

Lecture 152 Understanding Spark Metastore Warehouse Directory

Lecture 153 Managing Spark Metastore Databases

Lecture 154 Managing Spark Metastore Tables

Lecture 155 Retrieve Metadata of Spark Metastore Tables

Lecture 156 Role of Spark Metastore or Hive Metastore

Lecture 157 Exercise - Getting Started with Spark SQL

Section 13: Apache Spark using SQL - Basic Transformations

Lecture 158 Basic Transformation using Spark SQL - Introduction

Lecture 159 Spark SQL - Overview

Lecture 160 Define Problem Statement for Basic Transformations using Spark SQL

Lecture 161 Prepare or Create Tables using Spark SQL

Lecture 162 Projecting or Selecting Data using Spark SQL

Lecture 163 Filtering Data using Spark SQL

Lecture 164 Joining Tables using Spark SQL - Inner

Lecture 165 Joining Tables using Spark SQL - Outer

Lecture 166 Aggregating Data using Spark SQL

Lecture 167 Sorting Data using Spark SQL

Lecture 168 Conclusion - Final Solution using Spark SQL

Section 14: Apache Spark using SQL - Basic DDL and DML

Lecture 169 Introduction to Basic DDL and DML using Spark SQL

Lecture 170 Create Spark Metastore Tables using Spark SQL

Lecture 171 Overview of Data Types for Spark Metastore Table Columns

Lecture 172 Adding Comments to Spark Metastore Tables using Spark SQL

Lecture 173 Loading Data Into Spark Metastore Tables using Spark SQL - Local

Lecture 174 Loading Data Into Spark Metastore Tables using Spark SQL - HDFS

Lecture 175 Loading Data into Spark Metastore Tables using Spark SQL - Append and Overwrite

Lecture 176 Creating External Tables in Spark Metastore using Spark SQL

Lecture 177 Managed Spark Metastore Tables vs External Spark Metastore Tables

Lecture 178 Overview of Spark Metastore Table File Formats

Lecture 179 Drop Spark Metastore Tables and Databases

Lecture 180 Truncating Spark Metastore Tables

Lecture 181 Exercise - Managed Spark Metastore Tables

Section 15: Apache Spark using SQL - DML and Partitioning

Lecture 182 Introduction to DML and Partitioning of Spark Metastore Tables using Spark SQL

Lecture 183 Introduction to Partitioning of Spark Metastore Tables using Spark SQL

Lecture 184 Creating Spark Metastore Tables using Parquet File Format

Lecture 185 Load vs. Insert into Spark Metastore Tables using Spark SQL

Lecture 186 Inserting Data using Stage Spark Metastore Table using Spark SQL

Lecture 187 Creating Partitioned Spark Metastore Tables using Spark SQL

Lecture 188 Adding Partitions to Spark Metastore Tables using Spark SQL

Lecture 189 Loading Data into Partitioned Spark Metastore Tables using Spark SQL

Lecture 190 Inserting Data into Partitions of Spark Metastore Tables using Spark SQL

Lecture 191 Using Dynamic Partition Mode to insert data into Spark Metastore Tables

Lecture 192 Exercise - Partitioned Spark Metastore Tables using Spark SQL
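
By default, a partitioned Metastore table stores each partition in its own directory named `column=value` under the table location. The sketch below mimics how a dynamic-partition insert routes rows to those directories; the table location, the `order_month` partition column and the `yyyyMM` derivation are hypothetical choices for illustration, and no files are actually written.

```scala
// Hypothetical order row; orderDate is an ISO date string such as "2013-07-25".
case class Order(orderId: Int, orderDate: String, status: String)

object PartitionSketch {
  // Derive the partition value (yyyyMM) from the order date: "2013-07-25" -> "201307".
  def orderMonth(o: Order): String = o.orderDate.take(7).replace("-", "")

  // Group rows by partition value and key each group by its directory,
  // following the column=value naming used for partitioned tables.
  def layout(tableLocation: String, orders: Seq[Order]): Map[String, Seq[Int]] =
    orders.groupBy(orderMonth).map { case (month, group) =>
      s"$tableLocation/order_month=$month" -> group.map(_.orderId)
    }
}
```

Each distinct partition value produces exactly one directory, which is why a partition column with very many distinct values leads to many small directories.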

Section 16: Apache Spark using SQL - Pre-defined Functions

Lecture 193 Introduction - Overview of Spark SQL Functions

Lecture 194 Overview of Pre-defined Functions using Spark SQL

Lecture 195 Validating Functions using Spark SQL

Lecture 196 String Manipulation Functions using Spark SQL

Lecture 197 Date Manipulation Functions using Spark SQL

Lecture 198 Overview of Numeric Functions using Spark SQL

Lecture 199 Data Type Conversion using Spark SQL

Lecture 200 Dealing with Nulls using Spark SQL

Lecture 201 Using CASE and WHEN using Spark SQL

Lecture 202 Query Example - Word Count using Spark SQL
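
The word-count query from Lecture 202 boils down to split, explode, group and count. A plain-Scala sketch of the same logic follows; in Spark SQL it would typically be written with `explode(split(line, ' '))` in a subquery followed by `GROUP BY word`, where the `line` column name is assumed. Splitting on runs of whitespace and sorting by descending count are choices made here, not necessarily those of the course.

```scala
object WordCount {
  // Split every line on runs of whitespace ("explode"), drop empty tokens,
  // group identical words and count them; sort by descending count, then by word.
  def wordCount(lines: Seq[String]): Seq[(String, Int)] =
    lines
      .flatMap(_.split("\\s+").toSeq)
      .filter(_.nonEmpty)
      .groupBy(identity)
      .map { case (word, occurrences) => (word, occurrences.size) }
      .toSeq
      .sortBy { case (word, count) => (-count, word) }
}
```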

Section 17: Apache Spark using SQL - Pre-defined Functions - Exercises

Lecture 203 Prepare Users Table using Spark SQL

Lecture 204 Exercise 1 - Get number of users created per year

Lecture 205 Exercise 2 - Get the day name of the birthdays of users

Lecture 206 Exercise 3 - Get the names and email ids of users added in the year 2019

Lecture 207 Exercise 4 - Get the number of users by gender

Lecture 208 Exercise 5 - Get last 4 digits of unique ids

Lecture 209 Exercise 6 - Get the count of users based on country code

Section 18: Apache Spark using SQL - Windowing Functions

Lecture 210 Introduction to Windowing Functions using Spark SQL

Lecture 211 Prepare HR Database in Spark Metastore using Spark SQL

Lecture 212 Overview of Windowing Functions using Spark SQL

Lecture 213 Aggregations using Windowing Functions using Spark SQL

Lecture 214 LEAD or LAG Functions using Spark SQL

Lecture 215 Getting first and last values using Spark SQL

Lecture 216 Ranking using Windowing Functions in Spark SQL

Lecture 217 Order of execution of Spark SQL Queries

Lecture 218 Overview of Subqueries using Spark SQL

Lecture 219 Filtering Window Function Results using Spark SQL
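
Windowing functions such as `rank()` and `dense_rank()` from Section 18 compute a value per row within each partition. The sketch below reproduces their semantics for `PARTITION BY dept_id ORDER BY salary DESC` on in-memory data; the `Employee` shape and the sample salaries are hypothetical. Because SQL evaluates `WHERE` before window functions, filtering on a rank requires a subquery (Lecture 219); here that step is just a filter over the computed ranks.

```scala
// Hypothetical employee row, loosely modelled on an HR schema.
case class Employee(id: Int, deptId: Int, salary: Int)

object WindowSketch {
  // rank(): within each department, 1 + the number of rows with a strictly
  // higher salary, so ties share a rank and leave a gap. Returns (id, rank) by id.
  def rank(emps: Seq[Employee]): Seq[(Int, Int)] =
    emps.groupBy(_.deptId).values.toSeq.flatMap { dept =>
      dept.map(e => (e.id, 1 + dept.count(_.salary > e.salary)))
    }.sortBy(_._1)

  // dense_rank(): 1 + the number of DISTINCT higher salaries, so no gaps.
  def denseRank(emps: Seq[Employee]): Seq[(Int, Int)] =
    emps.groupBy(_.deptId).values.toSeq.flatMap { dept =>
      dept.map(e => (e.id, 1 + dept.map(_.salary).distinct.count(_ > e.salary)))
    }.sortBy(_._1)

  // Filtering window results: keep employees ranked within the top n of their department.
  def topNPerDept(emps: Seq[Employee], n: Int): Seq[Int] =
    rank(emps).filter(_._2 <= n).map(_._1)
}
```

With two employees tied at the top salary in one department, `rank()` yields 1, 1, 3 for the first three rows, while `dense_rank()` yields 1, 1, 2.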

Section 19: Sample scenarios with solutions

Lecture 220 Introduction to Sample Scenarios and Solutions

Lecture 221 Problem Statements - General Guidelines

Lecture 222 Initializing the job - General Guidelines

Lecture 223 Getting crime count per type per month - Understanding Data

Lecture 224 Getting crime count per type per month - Implementing the logic - Core API

Lecture 225 Getting crime count per type per month - Implementing the logic - Data Frames

Lecture 226 Getting crime count per type per month - Validating Output

Lecture 227 Get inactive customers - using Core Spark API (leftOuterJoin)

Lecture 228 Get inactive customers - using Data Frames and SQL

Lecture 229 Get top 3 crimes in RESIDENCE - using Core Spark API

Lecture 230 Get top 3 crimes in RESIDENCE - using Data Frame and SQL

Lecture 231 Convert NYSE data from text file format to parquet file format

Lecture 232 Get word count - with custom control arguments, num keys and file format
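
The crime-count scenario in Section 19 groups by month and crime type, then sorts. A minimal sketch on in-memory records follows; it assumes dates are `yyyy-MM-dd` strings (the real data set's date format may differ), and the ordering used (month ascending, then count descending) is an assumption rather than a quote of the problem statement.

```scala
// Hypothetical, simplified crime record; date is assumed to be "yyyy-MM-dd".
case class Crime(date: String, primaryType: String)

object CrimeCounts {
  // Group by (month as yyyyMM, crime type), count each group, then sort by
  // month ascending and, within a month, by count descending (type breaks ties).
  def countsPerTypePerMonth(crimes: Seq[Crime]): Seq[(Int, String, Int)] =
    crimes
      .groupBy(c => (c.date.take(7).replace("-", "").toInt, c.primaryType))
      .map { case ((month, crimeType), group) => (month, crimeType, group.size) }
      .toSeq
      .sortBy { case (month, crimeType, count) => (month, -count, crimeType) }
}
```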

Who this course is for
Any IT aspirant/professional willing to learn Data Engineering using Apache Spark
Python Developers who want to learn Spark using Scala to add an additional skill as a Data Engineer
Java or Scala Developers who want to learn Spark using Scala to add Data Engineering skills to their profile