Data Engineering Essentials Using SQL, Python, and PySpark (updated 2/2023)

Posted By: ELK1nG

Last updated 2/2023
MP4 | Video: h264, 1280x720 | Audio: AAC, 44.1 kHz
Language: English | Size: 31.34 GB | Duration: 65h 57m

Learn key Data Engineering skills such as SQL, Python, and Apache Spark (Spark SQL and PySpark) with exercises and projects

What you'll learn

Set up a development environment to learn building Data Engineering applications on GCP

Database essentials for Data Engineering using Postgres, such as creating tables and indexes, running SQL queries, and using important predefined functions

Data Engineering programming essentials using Python, such as basic programming constructs, collections, Pandas, and database programming

Data Engineering using Spark Data Frame APIs (PySpark). Learn all the important Spark Data Frame APIs such as select, filter, groupBy, orderBy, etc.

Data Engineering using Spark SQL (PySpark and Spark SQL). Learn how to write high-quality Spark SQL queries using SELECT, WHERE, GROUP BY, ORDER BY, etc.

Relevance of the Spark Metastore and the integration of Data Frames and Spark SQL

Ability to build Data Engineering pipelines using Spark with Python as the programming language

Use of different file formats such as Parquet, JSON, and CSV in building Data Engineering pipelines

Set up a self-support single-node Hadoop and Spark cluster to get enough practice on HDFS and YARN

Understand the complete Spark application development life cycle to build Spark applications using PySpark. Review the applications using the Spark UI.

Requirements

Laptop with decent configuration (Minimum 4 GB RAM and Dual Core)

Sign up for GCP with the available credit or AWS Access

Set up a self-support lab on a cloud platform (you might have to pay the applicable cloud fees unless you have credit)

CS or IT degree or prior IT experience is highly desired

Description

As part of this course, you will learn the Data Engineering essentials related to building data pipelines using SQL and Python together with Hadoop, Hive, or Spark SQL, as well as the PySpark Data Frame APIs. You will also understand the development and deployment lifecycle of Python applications using Docker as well as PySpark on multi-node clusters. You will also gain basic knowledge about reviewing Spark jobs using the Spark UI.

About Data Engineering

Data Engineering is the processing of data according to downstream needs. As part of Data Engineering, we need to build different kinds of pipelines, such as batch pipelines and streaming pipelines. All roles related to data processing are consolidated under Data Engineering. Conventionally, they are known as ETL Development, Data Warehouse Development, etc.

Here are some of the challenges learners face when acquiring key Data Engineering skills such as Python, SQL, and PySpark.

Having an appropriate environment with Apache Hadoop, Apache Spark, Apache Hive, etc. working together.

Good quality content with proper support.

Enough tasks and exercises for practice.

This course is designed to address these key challenges so that professionals at all levels can acquire the required Data Engineering skills (Python, SQL, and Apache Spark).

To make sure you spend time learning rather than struggling with technical challenges, here is what we have done.

Training using an interactive environment. You will get 2 weeks of lab access to begin with. If you like the environment and acknowledge it by providing ratings and feedback, the lab access will be extended by an additional 6 weeks (about 2 months in total). Feel free to send an email to support@itversity.com to get complimentary lab access. Also, if your employer provides a multi-node environment, we will help you set up the material for practice as part of the live session.
On top of Q&A support, we also provide required support via live sessions.

Make sure you have a system with the right configuration and quickly set up a lab using Docker with all the required Python, SQL, PySpark, and Spark SQL material. This addresses a lot of pain points related to networking, database integration, etc. Feel free to reach out to us via Udemy Q&A in case you get stuck while setting up the environment.

You will start with foundational skills such as Python and SQL using a Jupyter-based environment. Most of the lectures have quite a few tasks, and at the end of each module there are enough exercises or practice tests to evaluate the skills taught.

Once you are comfortable with programming using Python and SQL, you will learn how to quickly set up and access a single-node Hadoop and Spark cluster.

The content is streamlined in such a way that you use learner-friendly interfaces such as Jupyter Lab to practice.

If you end up signing up for the course, do not forget to rate us 5* if you like the content. If not, feel free to reach out to us and we will address your concerns.

Highlights of this course

Here are some of the highlights of this Data Engineering course using technologies such as Python, SQL, Hadoop, Spark, etc.

The course is designed by a veteran with 20+ years of experience (Durga Gadiraju), most of it around data. He has more than a decade of Data Engineering and Big Data experience, with several certifications. He has a history of training hundreds of thousands of IT professionals in Data Engineering as well as Big Data.

Simplified setup of all the key tools to learn Data Engineering or Big Data, such as Hadoop, Spark, Hive, etc.

Dedicated support, with 100% of questions answered in the past few months.

Tons of material with real-world experiences and data sets.
The material is made available both in the Git repository and in the lab which you are going to set up.

Complimentary lab access for 2 weeks, which can be extended to 8 weeks.

30-day money-back guarantee.

Content Details

As part of this course, you will be learning Data Engineering essentials such as SQL and programming using Python and Apache Spark. Here is the detailed agenda for the course.

Data Engineering Labs - Python and SQL

You will start by setting up self-support Data Engineering labs on Cloud9 or on your Mac or PC so that you can learn the key skills related to Data Engineering with a lot of practice, leveraging the tasks and exercises provided by us. As you pass the sections related to SQL and Python, you will also be guided to set up the Hadoop and Spark lab.

Provision an AWS Cloud9 instance (in case your Mac or PC does not have enough capacity)

Set up Docker Compose to start the containers to learn Python and SQL (using PostgreSQL)

Access the material via the Jupyter Lab environment set up using Docker and learn via hands-on practice.

Once the environment is set up, the material will be directly accessible.

Database Essentials - SQL using Postgres

It is important to be proficient with SQL in order to build data engineering pipelines. SQL is used for understanding the data, performing ad-hoc analysis, and building data engineering pipelines.

Getting Started with Postgres

Basic Database Operations (CRUD or Insert, Update, Delete)

Writing Basic SQL Queries (Filtering, Joins, and Aggregations)

Creating Tables and Indexes using Postgres DDL Commands

Partitioning Tables and Indexes using Postgres DDL Commands

Predefined Functions using SQL (String Manipulation, Date Manipulation, and other functions)

Writing Advanced SQL Queries using Postgresql

Programming Essentials using Python

Python is the most preferred programming language to develop data engineering applications.
As part of several sections related to Python, you will be learning most of the important aspects of Python needed to build data engineering applications effectively.

Perform Database Operations

Getting Started with Python

Basic Programming Constructs in Python (for loops, if conditions)

Predefined Functions in Python (string manipulation, date manipulation, and other standard functions)

Overview of Collections such as list and set in Python

Overview of Collections such as dict and tuple in Python

Manipulating Collections using loops in Python. This is primarily designed to give you enough practice with Python programming around Python collections.

Understanding Map Reduce Libraries in Python. You will learn functions such as map, filter, etc. You will also understand details about itertools.

Overview of Python Pandas Libraries. You will learn how to read from files and process the data in a Pandas Data Frame by applying standard transformations such as filtering, joins, sorting, etc. You will also learn how to write data to files.

Database Programming using Python - CRUD Operations

Database Programming using Python - Batch Operations. There will be enough emphasis on best practices to load data into databases in bulk or in batches.

Setting up Single Node Data Engineering Cluster for Practice

The most common approach to building data engineering applications at scale is to use Apache Spark integrated with HDFS and YARN. Before getting into data engineering using Apache Spark and Hadoop, we need to set up an environment to practice in. As part of this section, we will primarily focus on setting up a single-node cluster to learn key skills related to data engineering using distributed frameworks such as Apache Spark and Apache Hadoop.

We have simplified the complex tasks of setting up Apache Hadoop, Apache Hive, and Apache Spark by leveraging Docker.
Within an hour, without running into too many technical issues, you will be able to set up the cluster. However, if you run into any issues, feel free to reach out to us and we will help you overcome the challenges.

Master required Hadoop Skills to build Data Engineering Applications

As part of this section, you will primarily focus on HDFS commands so that you can copy files into HDFS. The data copied into HDFS will be used to build data engineering pipelines using Spark and Hadoop with Python as the programming language.

Overview of HDFS Commands

Copy files into HDFS using the put or copyFromLocal command

Review whether the files are copied properly to HDFS using HDFS commands.

Get the size of the files using HDFS commands such as du, df, etc.

Some fundamental concepts related to HDFS such as block size, replication factor, etc.

Data Engineering using Spark SQL

Let us deep-dive into Spark SQL to understand how it can be used to build Data Engineering pipelines. Spark with SQL gives us the ability to leverage the distributed computing capabilities of Spark coupled with an easy-to-use, developer-friendly SQL-style syntax.

Getting Started with Spark SQL

Basic Transformations using Spark SQL

Managing Tables - Basic DDL and DML in Spark SQL

Managing Tables - DML and Create Partitioned Tables using Spark SQL

Overview of Spark SQL Functions to manipulate strings, dates, null values, etc.

Windowing Functions using Spark SQL for ranking, advanced aggregations, etc.

Data Engineering using Spark Data Frame APIs

Spark Data Frame APIs are an alternative way of building Data Engineering applications at scale, leveraging the distributed computing capabilities of Apache Spark.
Data Engineers from application development backgrounds might prefer Data Frame APIs over Spark SQL to build Data Engineering applications.

Data Processing Overview using Spark or PySpark Data Frame APIs.

Projecting or selecting data from Spark Data Frames, renaming columns, providing aliases, dropping columns from Data Frames, etc. using PySpark Data Frame APIs.

Processing Column Data using Spark or PySpark Data Frame APIs - You will be learning functions to manipulate strings, dates, null values, etc.

Basic Transformations on Spark Data Frames using PySpark Data Frame APIs, such as filtering, aggregations, and sorting using functions such as filter/where, groupBy with agg, sort or orderBy, etc.

Joining Data Sets on Spark Data Frames using PySpark Data Frame APIs such as join. You will learn inner joins, outer joins, etc. using the right examples.

Windowing Functions on Spark Data Frames using PySpark Data Frame APIs to perform advanced aggregations, ranking, and analytic functions

Spark Metastore Databases and Tables and integration between Spark SQL and Data Frame APIs

Development, Deployment as well as Execution Life Cycle of Spark Applications

Once you go through the content related to Apache Spark using a Jupyter-based environment, we will also walk you through how Spark applications are typically developed using Python, deployed, and reviewed.

Set up a Python virtual environment and project for Spark application development using PyCharm

Understand the complete Spark application development lifecycle using PyCharm and Python

Build a zip file for the Spark application, copy it to the environment where it is supposed to run, and run it.

Understand how to review the Spark application execution life cycle.

Desired Audience for this Data Engineering Essentials course

People from different backgrounds can aim to become Data Engineers.
We cover most of the Data Engineering essentials for aspirants who want to get into the IT field as Data Engineers, as well as professionals who want to propel their career toward Data Engineering from legacy technologies.

College students and entry-level professionals who want hands-on expertise in Data Engineering. This course will provide enough skills to face interviews for entry-level data engineers.

Experienced application developers who want to gain expertise related to Data Engineering.

Conventional Data Warehouse Developers, ETL Developers, Database Developers, and PL/SQL Developers who want to gain enough skills to transition to being successful Data Engineers.

Testers who want to improve their testing capabilities related to Data Engineering applications.

Other hands-on IT professionals who want to learn about Data Engineering with hands-on practice.

Prerequisites to practice Data Engineering Skills

Here are the prerequisites for someone who wants to be a Data Engineer.

Logistics

Computer with decent configuration (at least 4 GB RAM; 8 GB is highly desired). However, this will not suffice if you do not have a multi-node cluster.
We will walk you through the cheaper options to set up the environment and practice.

Dual Core is required and Quad-Core is highly desired

Chrome Browser

High-Speed Internet

Desired Background

Engineering or Science Degree

Ability to use a computer

Knowledge of or working experience with databases and any programming language is highly desired

Training Approach for learning required Data Engineering Skills

Here are the details of the training approach for you to master all the key Data Engineering skills and propel your career toward Data Engineering.

It is self-paced, with reference material, code snippets, and videos provided as part of Udemy.

One can either use the environment provided by us or set up their own environment using Docker on AWS, GCP, or the platform of their choice.

We recommend completing 2 modules every week by spending 4 to 5 hours per week.

It is highly recommended to complete the exercises at the end to ensure that you are able to meet all the key objectives for each module.

Support will be provided through Udemy Q&A.

The course is designed in such a way that one can self-evaluate through the course and confirm whether the skills are acquired.

Here is the approach we recommend for taking this course.

The course is hands-on with thousands of tasks; you should practice as you go through the course.

You should also spend time understanding the concepts. If you do not understand a concept, I would recommend moving on and coming back to the topic later.

Go through the consolidated exercises and see whether you are able to solve the problems.

Make sure to follow the order we have defined as part of the course.

After each section or module, make sure to solve the exercises. We have provided enough information to validate the output.

By the end of the course, you should be able to conclude that you have mastered the essential skills related to SQL, Python, and Apache Spark.

Overview

Section 1: Introduction about the course

Lecture 1 Introduction about course

Lecture 2 Desired Audience

Lecture 3 Pre-requisites

Lecture 4 [Must Watch] 30 Day Money Back Guarantee - Feedback and Rating

Lecture 5 Training Approach

Lecture 6 Overview of Environment for Hands on Practice

Lecture 7 How to access data sets used in this course?

Section 2: Getting Started with ITVersity Labs for Data Engineering Essentials on Udemy

Lecture 8 Introduction to Getting Started with ITVersity Labs and Udemy

Lecture 9 Logging in to the ITVersity Python and Data Engineering Lab

Lecture 10 Setup Data Engineering Material from GitHub

Lecture 11 Overview of ITVersity Labs and Udemy

Lecture 12 Overview of Jupyter Lab Environment

Lecture 13 Using Jupyter Lab Sidebar to Navigate through the content

Lecture 14 Understanding Jupyter Launcher

Lecture 15 Creating Jupyter Notebooks and Overview of Kernels

Lecture 16 Managing Tabs and Kernels using Jupyter Lab Environment

Lecture 17 Overview of Jupyter Notebooks and Cells

Lecture 18 Running Shell Commands using Jupyter Notebook

Lecture 19 Getting Information to Connect to Databases to run queries

Lecture 20 Running SQL Queries using Jupyter Notebooks

Section 3: Setup Environment to learn Python, SQL, Hadoop, Spark using Docker on Windows 11

Lecture 21 Setup Environment using Docker on Windows 11 - Introduction

Lecture 22 Understanding System Configuration of Windows 11 PC

Lecture 23 Steps to setup Docker Desktop on Windows 11

Lecture 24 Enable WSL2 on Windows 11 by installing Ubuntu VM using WSL

Lecture 25 Install Linux Kernel Update Package on Windows 11 for Docker Desktop

Lecture 26 Download and Install Docker Desktop on Windows 11

Lecture 27 Validating git using WSL Ubuntu on Windows 11

Lecture 28 Clone Data Engineering Essentials Material on Windows 11

Lecture 29 Start Python and SQL Containers using docker-compose command on Windows 11

Lecture 30 Download and Install Pycharm on Windows 11

Lecture 31 Setup Pycharm Project for Data Engineering

Lecture 32 Review Docker Compose File for Data Engineering Essentials Material

Lecture 33 Review important Docker Compose Commands to manage services

Lecture 34 Access Jupyter Based Environment to learn Python and SQL

Lecture 35 Getting Jupyter Lab Token to login into Jupyter Lab

Section 4: Setup Environment to learn Python, SQL, Hadoop, Spark using Docker on Windows 10

Lecture 36 Understanding System Configuration

Lecture 37 Setup Docker Desktop on Windows

Lecture 38 Validate Docker on Windows using Command Line leveraging PowerShell

Lecture 39 Review Docker Desktop Resource Configurations

Lecture 40 Clone GitHub Repository on Windows

Lecture 41 Setup Pycharm Project for Data Engineering Essentials

Lecture 42 Update Git Global Settings related to Line Endings

Lecture 43 Review Services Docker Compose

Lecture 44 Start Python and SQL Environment using Docker Compose

Lecture 45 Review resource utilization after setting up Python and SQL Environment

Lecture 46 Access Jupyter Based Environment to learn Python

Lecture 47 Getting Jupyter Lab Token to login into Jupyter Lab

Section 5: Setup Environment to learn Python, SQL, Hadoop and Spark using Docker on Mac

Lecture 48 Setup Environment using Mac

Lecture 49 Setup Docker Desktop on Mac

Lecture 50 Validate Docker Setup on Mac

Lecture 51 Review Memory and CPU Settings of Docker Desktop for Mac

Lecture 52 Configure Docker Desktop for Data Engineering Essentials Environment

Lecture 53 Clone GitHub Repository for Data Engineering Essentials

Lecture 54 Setup as Pycharm Project to review the files using IDE

Lecture 55 Review Docker Compose file for Python and SQL Lab

Lecture 56 Start Python and SQL Environment using Docker Compose

Lecture 57 Review resource utilization after setting up Python and SQL Environment

Lecture 58 Access Jupyter Based Environment to learn Python

Lecture 59 Getting Jupyter Lab Token to login into Jupyter Lab

Section 6: Setting up Environment to learn Python, SQL as well as Spark using AWS Cloud9

Lecture 60 Getting Started with Cloud9

Lecture 61 Creating Cloud9 Environment

Lecture 62 Warming up with Cloud9 IDE

Lecture 63 Details about material to setup postgres database using docker

Lecture 64 Overview of EC2 related to Cloud9

Lecture 65 Opening ports for Cloud9 Instance

Lecture 66 Associating Elastic IPs to Cloud9 Instance

Lecture 67 Increase EBS Volume Size of Cloud9 Instance

Lecture 68 Setup Docker Compose on AWS Cloud9 Instance

Lecture 69 Clone GitHub Repository

Lecture 70 Setup Python and SQL Environment using Docker Compose

Lecture 71 Update Inbound Rules of AWS EC2 Security Group

Lecture 72 Login into the Jupyter based environment

Section 7: Networking Concepts for Beginners - ip addresses and port numbers

Lecture 73 Enable telnet on Windows

Lecture 74 Different IP Address Types

Lecture 75 Port Numbers associated with Applications or Services

Lecture 76 Reverting port for SSH to default port number

Lecture 77 Setup Apache2 on Ubuntu

Lecture 78 Overview of localhost

Lecture 79 Overview of Private IP Address associated with a server

Lecture 80 Overview of Public IP Address associated with a server

Lecture 81 Setup Web Application and access using local ip

Lecture 82 Setup Web Application and access using private ip

Lecture 83 Disable Access to Web Application using Public ip

Lecture 84 Install sshuttle on Mac using brew

Lecture 85 Access Web Application using Private IP using SSH as proxy

Section 8: Database Essentials - Getting Started

Lecture 86 Setup SMS Database using Postgres

Lecture 87 Connecting to Postgresql Database

Lecture 88 Using psql to interact with Postgresql Database using CLI

Lecture 89 Data Loading Utilities in Postgresql

Section 9: Database Essentials - Database Operations

Lecture 90 Database Operations - Overview

Lecture 91 Database CRUD Operations

Lecture 92 Creating Table in Postgres Database

Lecture 93 Inserting Data into Postgres Database Table

Lecture 94 Updating Data in Postgres Database Table

Lecture 95 Deleting Data in Postgres Database Table

Lecture 96 Overview of Database Transactions

Lecture 97 Exercise - DML or CRUD Operations using Postgresql

Section 10: Database Essentials - Writing Basic SQL Queries

Lecture 98 Standard Transformations

Lecture 99 Overview of Data Model

Lecture 100 Define Problem Statement

Lecture 101 Preparing Database Tables using Postgres

Lecture 102 Selecting or Projecting Data from Postgres Database Tables using SQL

Lecture 103 Filtering Data from Postgres Database Tables using SQL

Lecture 104 Joining Postgres Database Tables using SQL - Inner

Lecture 105 Joining Postgres Database Tables using SQL - Outer

Lecture 106 Performing Aggregations using SQL on Postgres Database Tables

Lecture 107 Sorting Data in Postgres Tables using SQL

Lecture 108 Solution - Daily Product Revenue using SQL on Postgres Database Tables

Lecture 109 Exercises - Writing Basic SQL Queries on Postgres Database Tables
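
As a rough illustration of the kind of query this section builds toward (joining, filtering, aggregating, and sorting to get daily product revenue), here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for Postgres so that it runs without a database server. The table names, columns, and sample rows are hypothetical and are not taken from the course material.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Postgres (assumption).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical tables and sample rows, invented for illustration.
cur.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, order_date TEXT, order_status TEXT)"
)
cur.execute(
    "CREATE TABLE order_items ("
    "order_item_id INTEGER PRIMARY KEY, order_id INTEGER, "
    "product_id INTEGER, subtotal REAL)"
)
cur.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "2014-01-01", "COMPLETE"),
        (2, "2014-01-01", "CLOSED"),
        (3, "2014-01-01", "CANCELED"),
        (4, "2014-01-02", "COMPLETE"),
    ],
)
cur.executemany(
    "INSERT INTO order_items VALUES (?, ?, ?, ?)",
    [
        (1, 1, 100, 50.0),
        (2, 1, 200, 25.0),
        (3, 2, 100, 50.0),
        (4, 3, 200, 25.0),
        (5, 4, 200, 75.0),
    ],
)

# Join the two tables, keep only COMPLETE or CLOSED orders,
# aggregate revenue per date and product, then sort the result.
query = """
SELECT o.order_date, oi.product_id, SUM(oi.subtotal) AS revenue
FROM orders AS o
  JOIN order_items AS oi ON o.order_id = oi.order_id
WHERE o.order_status IN ('COMPLETE', 'CLOSED')
GROUP BY o.order_date, oi.product_id
ORDER BY o.order_date, revenue DESC
"""
daily_product_revenue = cur.execute(query).fetchall()
for row in daily_product_revenue:
    print(row)

conn.close()
```

The same SELECT statement should also be valid in Postgres, although data types and loading steps would differ there.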

Section 11: Database Essentials - Creating Tables and Indexes

Lecture 110 DDL - Data Definition Language

Lecture 111 Overview of Data Types used while creating Postgres Database Tables

Lecture 112 Adding or Modifying Columns using Alter in Postgres Database Tables

Lecture 113 Different Type of Constraints used on Database Tables

Lecture 114 Managing Constraints on Postgres Database Tables

Lecture 115 Indexes on Postgres Database Tables

Lecture 116 Indexes for Constraints on Postgres Database Tables

Lecture 117 Overview of Sequences used on Postgres Database Tables

Lecture 118 Truncating Postgres Database Tables

Lecture 119 Dropping Postgres Database Tables

Lecture 120 Exercises and Solutions - Managing Database Objects using Postgresql

Section 12: Database Essentials - Partitioning Tables and Indexes

Lecture 121 Overview of Partitioning of Postgres Database Tables

Lecture 122 List Partitioning of Database Tables

Lecture 123 Managing Partitions of Postgres Database Tables - List

Lecture 124 Manipulating Data in Postgres Database Partitioned Tables

Lecture 125 Range Partitioning of Postgres Database Tables

Lecture 126 Managing Partitions of Postgres Database Tables - Range

Lecture 127 Repartitioning of Postgres Database Tables - Range

Lecture 128 Hash Partitioning of Postgres Database Tables

Lecture 129 Managing Partitions of Postgres Database Tables - Hash

Lecture 130 Usage Scenarios of Database Partitioned Tables

Lecture 131 Sub Partitioning of Postgres Database Tables

Lecture 132 Exercise - Partitioned Tables of Postgres Database Tables

Section 13: Database Essentials - Predefined Functions

Lecture 133 Overview of SQL Functions in Postgres

Lecture 134 String Manipulation Functions in SQL using Postgres

Lecture 135 Case Conversion and Length using Functions in SQL using Postgres

Lecture 136 Extracting Data - Using substr and split_part Functions in SQL using Postgres

Lecture 137 Using position or strpos Functions in SQL using Postgres

Lecture 138 Trimming and Padding Functions in SQL using Postgres

Lecture 139 Reverse and Concatenate Multiple Strings using Functions in SQL using Postgres

Lecture 140 String Replacement using Functions in SQL using Postgres

Lecture 141 Date Manipulation Functions using SQL in Postgres

Lecture 142 Getting Current Date or Timestamp using Functions in SQL using Postgres

Lecture 143 Date Arithmetic using Functions in SQL using Postgres

Lecture 144 Beginning Date or Time using date_trunc Function in SQL using Postgres

Lecture 145 Using to_char and to_date Functions in SQL using Postgres

Lecture 146 Extracting Information using extract Function in SQL using Postgres

Lecture 147 Dealing with Unix Timestamp or epoch using Functions in SQL using Postgres

Lecture 148 Overview of Numeric Functions using SQL in Postgres

Lecture 149 Data Type Conversion using Functions in SQL using Postgres

Lecture 150 Handling NULL Values using SQL in Postgres

Lecture 151 Using CASE and WHEN as part of SQL in Postgres

Section 14: Database Essentials - Writing Advanced SQL Queries

Lecture 152 Overview of Database Views using Postgres Database

Lecture 153 Overview of Named Queries using SQL in Postgres

Lecture 154 Overview of Sub Queries using SQL in Postgres

Lecture 155 CTAS - Create Table As Select using Postgres

Lecture 156 Advanced DML Operations on Postgres Database Tables

Lecture 157 Merging or Upserting Data into Postgres Database Tables

Lecture 158 Pivoting Rows into Columns using SQL in Postgres

Lecture 159 Overview of Analytic Functions using SQL in Postgres

Lecture 160 Analytic Functions - Aggregations using SQL in Postgres

Lecture 161 Cumulative or Moving Aggregations using SQL in Postgres

Lecture 162 Analytic Functions using SQL in Postgres - Windowing

Lecture 163 Analytic Functions using SQL in Postgres - Ranking

Lecture 164 Analytic Functions using SQL in Postgres - Filtering

Lecture 165 Ranking and Filtering using SQL in Postgres - Recap

Lecture 166 Exercises - Writing Advanced Queries

Section 15: Programming Essentials using Python - Perform Database Operations

Lecture 167 Introduction - Perform Database Operations

Lecture 168 Overview of SQL

Lecture 169 Create Database and Users Table

Lecture 170 DDL - Data Definition Language

Lecture 171 DML - Data Manipulation Language

Lecture 172 DQL - Data Query Language

Lecture 173 CRUD Operations - DML and DQL

Lecture 174 TCL - Transaction Control Language

Lecture 175 Example - Data Engineering

Lecture 176 Example - Web Application

Lecture 177 Exercise - Database Operations
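
The DDL, DML, DQL, and TCL categories listed in this section can be sketched from Python as follows. This is only an illustration under stated assumptions: it uses the standard-library sqlite3 module rather than Postgres, and the users table and its rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in for Postgres (assumption).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: create a hypothetical users table.
cur.execute(
    "CREATE TABLE users ("
    "user_id INTEGER PRIMARY KEY, "
    "email TEXT UNIQUE NOT NULL, "
    "is_active INTEGER DEFAULT 0)"
)

# DML (create): insert rows, then commit the transaction (TCL).
cur.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [("a@example.com",), ("b@example.com",), ("c@example.com",)],
)
conn.commit()

# DML (update and delete), committed together as one transaction.
cur.execute("UPDATE users SET is_active = 1 WHERE email = ?", ("a@example.com",))
cur.execute("DELETE FROM users WHERE email = ?", ("c@example.com",))
conn.commit()

# TCL: a change that has not been committed can be rolled back.
cur.execute("DELETE FROM users")
conn.rollback()

# DQL (read): query the remaining rows.
users = cur.execute(
    "SELECT user_id, email, is_active FROM users ORDER BY user_id"
).fetchall()
print(users)

conn.close()
```

Parameter placeholders (`?` in sqlite3) keep values out of the SQL string; Postgres drivers follow the same idea with their own placeholder style.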

Section 16: Programming Essentials using Python - Getting Started with Python

Lecture 178 Installing Python on Windows

Lecture 179 Overview of Anaconda

Lecture 180 Python CLI and Jupyter Notebook

Lecture 181 Overview of Jupyter Lab

Lecture 182 Using IDEs - Pycharm

Lecture 183 Using Visual Studio Code

Lecture 184 Using ITVersity Labs

Lecture 185 Leveraging Google Colab

Section 17: Programming Essentials using Python - Basic Programming Constructs

Lecture 186 Basic Programming Constructs using Python - Introduction

Lecture 187 Getting Help using help function in Python

Lecture 188 Python Variables and Objects

Lecture 189 Python Data Types - Commonly Used

Lecture 190 Operators in Python

Lecture 191 Tasks - Data Types and Operators using Python

Lecture 192 Developing Conditionals using Python

Lecture 193 All about for loops in Python

Lecture 194 Running os commands in Python

Lecture 195 Exercises - Basic Programming Constructs using Python

Lecture 196 Dynamic Arithmetic Operations using eval and exec in Python

Section 18: Programming Essentials using Python - Predefined Functions

Lecture 197 Predefined Functions in Python - Introduction

Lecture 198 Overview of Predefined Functions in Python

Lecture 199 Numeric Functions in Python

Lecture 200 Overview of Strings in Python

Lecture 201 String Manipulation Functions in Python

Lecture 202 Formatting Strings in Python

Lecture 203 Print and Input Functions in Python

Lecture 204 Date Manipulation Functions in Python

Lecture 205 Exercises - Predefined Functions in Python

Section 19: Programming Essentials using Python - User Defined Functions

Lecture 206 Developing User Defined Functions in Python - Introduction

Lecture 207 Defining Functions in Python

Lecture 208 Doc Strings in Python

Lecture 209 Returning Variables from Python Functions

Lecture 210 Passing Function Parameters and Arguments to Python Functions

Lecture 211 Varying Arguments in Python

Lecture 212 Keyword Arguments in Python

Lecture 213 Recap of User Defined Functions in Python

Lecture 214 Passing Functions as Arguments to Python Functions

Lecture 215 Lambda or Anonymous Functions in Python

Lecture 216 Usage of Lambda Functions in Python Functions

Lecture 217 Exercise - User Defined Functions in Python
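
A few of the ideas named in this section (doc strings, varying arguments, keyword arguments, passing functions as arguments, and lambda functions) can be sketched in a handful of lines. The function names and values below are hypothetical and chosen only for illustration.

```python
def get_revenue(*amounts, discount=0.0):
    """Return the total of amounts after applying a fractional discount.

    amounts is a varying argument (any number of positional values);
    discount is a keyword argument with a default value.
    """
    return sum(amounts) * (1 - discount)


def apply_to_each(values, func):
    """Call func on every element of values and return the results as a list."""
    results = []
    for value in values:
        results.append(func(value))
    return results


# Varying positional arguments plus a keyword argument.
total = get_revenue(100.0, 50.0, discount=0.5)

# A lambda (anonymous function) passed as an argument.
squares = apply_to_each([1, 2, 3], lambda x: x * x)

print(total, squares)
```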

Section 20: Programming Essentials using Python - Overview of Collections - list and set

Lecture 218 Overview of Collections in Python - list and set - Introduction

Lecture 219 Overview of list and set in Python

Lecture 220 Common Operations on Python Collections

Lecture 221 Accessing elements from Python list

Lecture 222 Adding elements to Python list

Lecture 223 Updating and Deleting elements from Python list

Lecture 224 Other or Miscellaneous Python list operations

Lecture 225 Adding and Deleting elements using Python set

Lecture 226 Typical Python set operations

Lecture 227 Validating Python sets

Lecture 228 Usage of Python list and set

Lecture 229 Exercises - Basic Operations on Python list and set

Lecture 230 Python List of Delimited Strings

Lecture 231 Sorting data in Python lists and tuples

Lecture 232 Sorting list of Delimited Strings using Python

Lecture 233 Exercises - Sorting lists and sets in Python
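
Sorting a list of delimited strings usually means extracting a key from each record first, because sorting the raw strings compares them character by character. Here is a minimal sketch; the record layout (student_id,name,score) and the values are hypothetical.

```python
# Hypothetical comma-delimited records: student_id,name,score
records = ["3,Carol,78.5", "1,Alice,92.0", "2,Bob,78.5"]

# Sort by the numeric id in the first field.
by_id = sorted(records, key=lambda r: int(r.split(",")[0]))

# Sort by score in descending order, then by name in ascending order,
# using a tuple as a composite key.
by_score_then_name = sorted(
    records,
    key=lambda r: (-float(r.split(",")[2]), r.split(",")[1]),
)

print(by_id)
print(by_score_then_name)
```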

Section 21: Programming Essentials using Python - Overview of Collections - dict and tuple

Lecture 234 Manipulating Collections using loops in Python - Introduction

Lecture 235 Overview of Python dict and tuple

Lecture 236 Common Operations on dict and tuple using Python

Lecture 237 Accessing Elements from Python tuples

Lecture 238 Accessing Elements from Python dict

Lecture 239 Manipulating Python dict

Lecture 240 Common Examples of Python dict

Lecture 241 Representing Tables or Excel Sheets as Python List of Tuples

Lecture 242 Representing Tables or Excel Sheets as Python List of dicts

Lecture 243 Process Python dict values

Lecture 244 Processing Python dict items

Lecture 245 Sorting Python dict items

Lecture 246 Exercises - Overview of Python Collections - dict and set

Section 22: Programming Essentials using Python - Manipulating Collections using loops

Lecture 247 Manipulating Collections using loops in Python - Introduction

Lecture 248 Reading Files into Python Collections

Lecture 249 Overview of Standard Transformations

Lecture 250 Row Level Transformations using Python loops

Lecture 251 Getting Unique Elements using Python loops

Lecture 252 Filtering Data using Python loops and conditionals

Lecture 253 Preparing Data Sets

Lecture 254 Quick recap of Python dict operations

Lecture 255 Performing Total Aggregations using Python loops

Lecture 256 Overview of Grouped Aggregations using Python loops

Lecture 257 Get Order Count by Status using Python loops

Lecture 258 Get Revenue Details per Order using Python loops

Lecture 259 Get Order Count by Month using Python loops

Lecture 260 Joining Data Sets using Python loops

Lecture 261 Manipulate Collections using Comprehensions in Python

Lecture 262 List Comprehensions using Python

Lecture 263 Set Comprehensions using Python

Lecture 264 Dict Comprehensions in Python

Lecture 265 Limitations of using loops to process data sets

Lecture 266 Exercises - Manipulating Collections using Python loops

Section 23: Programming Essentials using Python - Development of Map Reduce APIs

Lecture 267 Develop myFilter Function using Python loops and conditionals

Lecture 268 Validate myFilter using Python loops and conditionals

Lecture 269 Develop myMap Function using Python loops

Lecture 270 Validate myMap Function using Python loops

Lecture 271 Develop myReduce Function using Python loops

Lecture 272 Validate myReduce Function using Python loops

Lecture 273 Develop myReduceByKey Function using Python loops

Lecture 274 Validate myReduceByKey Function using Python loops

Lecture 275 Develop myJoin Function using Python loops

Lecture 276 Validate myJoin Function using Python loops

Lecture 277 Exercises - Development of Map Reduce APIs using Python loops and Conditionals

Section 24: Programming Essentials using Python - Understanding Map Reduce Libraries

Lecture 278 Preparing Data Sets

Lecture 279 Filtering Data using Python filter

Lecture 280 Projecting data using Python map

Lecture 281 Row Level Transformations using Python map

Lecture 282 Aggregations using Python reduce

Lecture 283 Get Revenue for a given product id using Python Map Reduce

Lecture 284 Get total items sold and revenue for a product using Python Map reduce

Lecture 285 Get total commission amount using Python Map Reduce

Lecture 286 Overview of itertools

Lecture 287 Cumulative Operations using Python itertools

Lecture 288 Using Python itertools starmap

Lecture 289 Overview of Python itertools groupby

Lecture 290 Get order count by status using Python itertools groupby

Lecture 291 Get revenue per order using Python itertools groupby

Lecture 292 Limitations of Python Map Reduce Libraries

Lecture 293 Exercises - Understanding Python Map Reduce Libraries

Section 25: Programming Essentials using Python - Basics of File IO using Python

Lecture 294 Basics of File IO using Python - Introduction

Lecture 295 Overview of File IO using Python

Lecture 296 Understand concepts behind Folders and Files

Lecture 297 Getting File Paths and File Names

Lecture 298 Overview of Retail Data

Lecture 299 Read text file into string using Python File I/O

Lecture 300 Write string to text file using Python File I/O

Lecture 301 Overview of modes to write into files using Python File I/O

Lecture 302 Overview of Delimited Strings

Lecture 303 Read csv into list of strings using Python File I/O

Lecture 304 Writing Strings to file in Append Mode using Python File I/O

Lecture 305 Managing Files and Folders using Python File I/O

Section 26: Programming Essentials using Python - Delimited Files and Collections

Lecture 306 Understanding Delimited Files and Collections

Lecture 307 Overview of Delimited Text Files

Lecture 308 Recap of basic file IO using Python

Lecture 309 Read Delimited files into list of tuples using Python File I/O

Lecture 310 Write Delimited Strings into files using Python File I/O

Lecture 311 Overview of Python CSV Module to process files

Lecture 312 Read Delimited data into list using Python CSV APIs

Lecture 313 Writing iterables to files using Python CSV APIs

Lecture 314 Advantages of using APIs in Python CSV module

Lecture 315 Apply Schema on lists from files using Python

Section 27: Programming Essentials using Python - Overview of Pandas Libraries

Lecture 316 Overview of Python Pandas Libraries

Lecture 317 Understanding Python Pandas Data Structures

Lecture 318 Overview of Python Series

Lecture 319 Creating Python Data Frames from lists

Lecture 320 Basic Operations on Python Data Frames

Lecture 321 Reading Data from CSV Files to Python Pandas Data Frames

Lecture 322 Projecting and Filtering using Python Pandas Data Frame APIs

Lecture 323 Performing Total Aggregations using Python Pandas Data Frame APIs

Lecture 324 Performing Grouped Aggregations using Python Pandas Data Frame APIs

Lecture 325 Writing Python Pandas Data Frames to Files

Lecture 326 Joining Data in Python Pandas Data Frames using join

Section 28: Programming Essentials using Python - Database Programming - CRUD Operations

Lecture 327 Database Operations using Python - CRUD Operations - Introduction

Lecture 328 Overview of Database Programming using Python

Lecture 329 Recap of RDBMS Concepts

Lecture 330 Setup Database Client Libraries for Python Applications

Lecture 331 Develop Function to get Database Connection using Python

Lecture 332 Create Database Table in Postgres using Python

Lecture 333 Inserting Data into Table in Postgres using Python

Lecture 334 Updating Existing Table Data in Postgres using Python

Lecture 335 Deleting Data From Table in Postgres using Python

Lecture 336 Querying Data From Table in Postgres using Python

Lecture 337 Recap - CRUD Operations using Python

Section 29: Programming Essentials using Python - Database Programming - Batch Operations

Lecture 338 Database Programming using Python - Batch Operations - Introduction

Lecture 339 Recap of Insert using Python

Lecture 340 Preparing Database to perform batch operations using Python

Lecture 341 Reading Data From File using Python File I/O

Lecture 342 Batch Loading of Data into Database Table using Python

Lecture 343 Best Practices for Batch Loading into Database Table using Python

Section 30: Programming Essentials using Python - Processing JSON Data

Lecture 344 Processing JSON Data - Introduction

Lecture 345 Process JSON using Python Pandas

Lecture 346 JSON Data Types

Lecture 347 Create JSON String

Lecture 348 Process JSON String

Lecture 349 Single JSON Document in Files

Lecture 350 Multiple JSON Documents in files

Lecture 351 Process JSON using Pandas

Lecture 352 Different JSON Formats supported by Python Pandas

Lecture 353 Common Use Cases for JSON

Lecture 354 Write to JSON files using Python json module

Lecture 355 Write to JSON files using Python Pandas

Section 31: Programming Essentials using Python - Processing REST Payloads

Lecture 356 Overview of REST APIs

Lecture 357 Using curl command

Lecture 358 Overview of Postman

Lecture 359 Getting Started with Python requests module

Lecture 360 Convert REST Payload to Python Objects

Lecture 361 Process REST Payload using Python Collection Operations

Lecture 362 Process REST Payload using Python Pandas

Section 32: Understanding Python Virtual Environments

Lecture 363 Introduction to Python Virtual Environments

Lecture 364 Validating Python Versions

Lecture 365 Create Python Virtual Environment for Web Application

Lecture 366 Reviewing dependencies installed in Python Virtual Environment

Lecture 367 Installing Dependencies for Web Application using Python pip

Lecture 368 Getting Details about installed packages using Python pip

Lecture 369 Uninstall Packages using Python pip

Lecture 370 Cleanup Python Virtual Environment

Lecture 371 Recreate and Activate Python Virtual Environment for Web Application

Lecture 372 Define requirements file for Python Web Application

Lecture 373 Install Dependencies using requirements file for Python Web Application

Lecture 374 Create Virtual Environment for Data Engineering Application using Python

Lecture 375 Install Dependencies for Data Engineering Application using Python

Lecture 376 Install Dependencies for Data Engineering Application using Python 3.6

Lecture 377 Validate Python and Package Compatibility and Install Python 3.6

Lecture 378 Conclusion about understanding Python Virtual Environments

Section 33: Overview of Pycharm for Python Application Development

Lecture 379 Introduction to Pycharm for Python Application Development

Lecture 380 Installation of Pycharm on Windows for Python Application Development

Lecture 381 Installation of Pycharm on Mac for Python Application Development

Lecture 382 Setup Python Getting Started Project using Pycharm

Lecture 383 Setup Python Getting Started Project using Pycharm on Mac

Lecture 384 Setup de-demo Python project using Pycharm

Lecture 385 Accessing Settings in Pycharm and Changing Font Size

Lecture 386 Accessing Settings in Pycharm and Changing Font Size on Mac

Lecture 387 Install Python Packages using Pycharm

Lecture 388 Overview of Pycharm Integrated Terminal

Lecture 389 Overview of Pycharm Integrated Terminal on Mac

Lecture 390 Overview of Run Time Arguments for Python Applications

Lecture 391 Passing Run Time Arguments to Python Applications using Pycharm

Section 34: Data Copier - Getting Started

Lecture 392 Introduction to Getting Started for Data copier using Python

Lecture 393 Problem Statement - Data Copier using Python

Lecture 394 Create Working Directory for the Python Project

Lecture 395 Setup Docker on Windows 10 Pro

Lecture 396 Quick Overview of Docker

Lecture 397 Prepare Dataset

Lecture 398 Create Postgres Container

Lecture 399 Setup Postgres Database for development

Lecture 400 Overview of Postgres Database Commands

Lecture 401 Setup Python Project using Pycharm

Lecture 402 Managing Python Dependencies for the project

Lecture 403 Create GitHub Project

Section 35: Data Copier - Reading Data using Pandas

Lecture 404 Reading Data using Python Pandas - Introduction

Lecture 405 Overview of Retail Data

Lecture 406 Adding Python Pandas to the project

Lecture 407 Reading JSON Data using Python Pandas

Lecture 408 Previewing Data using Python Pandas

Lecture 409 Reading Data in Chunks using Python Pandas

Lecture 410 Dynamically read files using Python os module

Section 36: Data Copier - Database Programming using Pandas

Lecture 411 Database Programming using Python Pandas - Introduction

Lecture 412 Validate Postgres Setup using Docker

Lecture 413 Add required dependencies for database programming using Python pandas

Lecture 414 Create users table in retail_db Database

Lecture 415 Populating Sample Data into users table

Lecture 416 Reading data from table using Python Pandas

Lecture 417 Truncate users Postgres Database Table

Lecture 418 Writing Python Pandas Dataframe to table

Lecture 419 Validating users data in Postgres Database Table

Lecture 420 Drop users Postgres Database Table

Section 37: Data Copier - Loading Data from files to tables

Lecture 421 Loading Data from files to tables - Introduction

Lecture 422 Populating Departments data into table

Lecture 423 Validate departments table

Lecture 424 Populating orders table in chunks using Python Pandas

Lecture 425 Validate orders table in Postgres Database

Lecture 426 Validate orders table using pandas

Section 38: Data Copier - Modularizing the application

Lecture 427 Overview of Python main function

Lecture 428 Overview of Python Environment Variables

Lecture 429 Using Python os module for Environment Variables

Lecture 430 Passing Environment Variables to Python Applications using Pycharm

Lecture 431 Read logic using Python Pandas

Lecture 432 Validate read logic developed using Python Pandas

Lecture 433 Write logic using Python Pandas

Lecture 434 Validate write logic developed using Python Pandas

Lecture 435 Integrate read and write logic using Python

Lecture 436 Validate Integration logic developed using Python

Lecture 437 Develop logic to load multiple tables using Python

Lecture 438 Validate Python logic for table list as run time argument

Lecture 439 Push Python Application Changes to remote git repository

Section 39: Data Copier - Dockerizing the application

Lecture 440 Dockerizing the application - Introduction

Lecture 441 Prepare Database for validation

Lecture 442 Pull and validate appropriate python image

Lecture 443 Create and attach network to database docker container

Lecture 444 Quick recap about Docker containers

Lecture 445 Review Python based Data Copier Application

Lecture 446 Deploying Python application and installing dependencies in the docker container

Lecture 447 Copy source data files into container

Lecture 448 Add Python Data Copier container to custom network

Lecture 449 Installing OS libraries as part of Docker container

Lecture 450 Validate Network Connectivity between Docker Containers

Lecture 451 Running Application from the Docker Container

Lecture 452 Delete Docker Container

Section 40: Data Copier - Using custom Docker Image

Lecture 453 Using Custom Docker Image - Introduction

Lecture 454 Getting started with docker custom image

Lecture 455 Install OS Modules in custom docker image

Lecture 456 Copying Python Source Code to Docker Custom Image

Lecture 457 Adding dependencies to the custom image

Lecture 458 Understanding docker custom image build process

Lecture 459 Mounting Data Folders on to Docker Container

Lecture 460 Passing Environment Variables to Docker Container

Lecture 461 Add Python Data Copier Container to custom network

Lecture 462 Run Python application using Docker

Section 41: Data Copier - Deploy and Validate Application on Remote Server

Lecture 463 Deploy and Validate Python Application on Remote Server - Introduction

Lecture 464 Push Application Changes to GitHub Repository

Lecture 465 Requirements to deploy application on Virtual Machine

Lecture 466 Clone Application on remote machine

Lecture 467 Setup Data Set for Validation

Lecture 468 Setup Network and Database Folder for Database using Docker

Lecture 469 Setup Docker Container for the Database

Lecture 470 Setup Database and Tables as part of Docker based Database Server

Lecture 471 Building Custom Docker Image for application

Lecture 472 Run and Validate Dockerized Application

Section 42: Validate ITVersity Hadoop and Spark Cluster (for ITVersity lab customers)

Lecture 473 Setup Development Environment using VS Code Remote Development Extension Pack

Lecture 474 Review Data Sets Provided as part of Gateway Nodes of Hadoop and Spark Cluster

Lecture 475 Validate HDFS on Multi Node Hadoop and Spark Cluster from Gateway Node

Lecture 476 Validate Hive on Hadoop and Spark Multinode Cluster

Lecture 477 Review Hadoop HDFS and YARN Property Files on Hadoop and Spark Cluster

Lecture 478 Review Hadoop HDFS and YARN Property Files using Visual Studio Code Editor

Lecture 479 Review Hive Property Files on Multinode Hadoop and Spark Cluster

Lecture 480 Review Spark 2 Property Files and Important Properties

Lecture 481 Validate Spark Shell CLI using Spark 2

Lecture 482 Validate Pyspark CLI using Spark 2

Lecture 483 Validate Spark SQL CLI using Spark 2

Lecture 484 Review Spark 3 Property Files and Important Properties

Lecture 485 Validate Spark Shell CLI using Spark 3

Lecture 486 Validate Pyspark CLI using Spark 3

Lecture 487 Validate Spark SQL CLI using Spark 3

Section 43: Setup Single Node Hadoop and Spark Cluster or Lab using Docker

Lecture 488 Setup Single Node Hadoop and Spark Cluster or Lab using Docker

Lecture 489 Pre-requisites to setup Hadoop and Spark Lab

Lecture 490 Configure Docker Desktop

Lecture 491 Update Hadoop and Spark Content

Lecture 492 Clone GitHub Repository to setup and learn Hadoop and Spark

Lecture 493 Cleaning up Docker Containers used for Python and SQL Practice

Lecture 494 Review Hadoop and Spark Lab details in Docker Compose File

Lecture 495 Pull Docker Image for Single Node Hadoop and Spark

Lecture 496 Start Docker Containers related to Hadoop and Spark

Lecture 497 Overview of reviewing Hadoop and Spark Lab setup using Docker

Lecture 498 Connecting to Terminal of Spark and Hadoop Containers

Lecture 499 Review HDFS and YARN on Single Node Hadoop and Spark Cluster

Lecture 500 Review and Validate Hive on Single Node Hadoop and Spark Cluster

Lecture 501 Validate Spark 2 using Pyspark and Spark SQL on Single Node Lab

Lecture 502 Validate Spark 3 using Pyspark and Spark SQL on Single Node Lab

Lecture 503 Validate Hive Metastore used as part of Single Node Hadoop and Spark Cluster

Lecture 504 Access Hadoop and Spark Material using Jupyter lab environment

Lecture 505 Managing Single Node Hadoop and Spark Cluster using Docker

Section 44: Introduction to Hadoop eco system - Overview of HDFS

Lecture 506 Getting help or usage

Lecture 507 Listing HDFS Files

Lecture 508 Managing HDFS Directories

Lecture 509 Copying files from local to HDFS

Lecture 510 Copying files from HDFS to local

Lecture 511 Getting Files Metadata

Lecture 512 Previewing Data in HDFS Files

Lecture 513 HDFS Block Size

Lecture 514 HDFS Replication Factor

Lecture 515 Getting HDFS Storage Usage

Lecture 516 Using HDFS Stat Commands

Lecture 517 HDFS File Permissions

Lecture 518 Overriding Properties of Hadoop or HDFS commands

Section 45: Data Engineering using Spark SQL - Getting Started

Lecture 519 Getting Started - Overview

Lecture 520 Overview of Spark Documentation

Lecture 521 Launching and using Spark SQL CLI

Lecture 522 Overview of Spark SQL Properties

Lecture 523 Running OS Commands using Spark SQL

Lecture 524 Understanding Warehouse Directory

Lecture 525 Managing Spark Metastore Databases

Lecture 526 Managing Spark Metastore Tables

Lecture 527 Retrieve Metadata of Tables

Lecture 528 Role of Spark Metastore or Hive Metastore

Lecture 529 Exercise - Getting Started with Spark SQL

Section 46: Data Engineering using Spark SQL - Basic Transformations

Lecture 530 Basic Transformations - Introduction

Lecture 531 Spark SQL - Overview

Lecture 532 Define Problem Statement

Lecture 533 Prepare Tables

Lecture 534 Projecting Data

Lecture 535 Filtering Data

Lecture 536 Joining Tables - Inner

Lecture 537 Joining Tables - Outer

Lecture 538 Aggregating Data

Lecture 539 Sorting Data

Lecture 540 Conclusion - Final Solution

Section 47: Data Engineering using Spark SQL - Managing Tables - Basic DDL and DML

Lecture 541 Introduction

Lecture 542 Create Spark Metastore Tables

Lecture 543 Overview of Data Types

Lecture 544 Adding Comments

Lecture 545 Loading Data Into Tables - Local

Lecture 546 Loading Data Into Tables - HDFS

Lecture 547 Loading Data - Append and Overwrite

Lecture 548 Creating External Tables

Lecture 549 Managed Tables vs External Tables

Lecture 550 Overview of File Formats

Lecture 551 Drop Tables and Databases

Lecture 552 Truncating Tables

Lecture 553 Exercise - Managed Tables

Section 48: Data Engineering using Spark SQL - Managing Tables - DML and Partitioning

Lecture 554 Introduction - Managing Tables - DML and Partitioning

Lecture 555 Introduction to Partitioning

Lecture 556 Creating Tables using Parquet

Lecture 557 Load vs Insert

Lecture 558 Inserting Data using Stage Table

Lecture 559 Creating Partitioned Tables

Lecture 560 Adding Partitions to Tables

Lecture 561 Loading Data into Partitioned Tables

Lecture 562 Inserting Data into Partitions

Lecture 563 Using Dynamic Partition Mode

Lecture 564 Exercise - Partitioned Tables

Section 49: Data Engineering using Spark SQL - Overview of Spark SQL Functions

Lecture 565 Introduction - Overview of Spark SQL Functions

Lecture 566 Overview of Functions

Lecture 567 Validating Functions

Lecture 568 String Manipulation Functions

Lecture 569 Date Manipulation Functions

Lecture 570 Overview of Numeric Functions

Lecture 571 Data Type Conversion

Lecture 572 Dealing with Nulls

Lecture 573 Using CASE and WHEN

Lecture 574 Query Example - Word Count

Section 50: Data Engineering using Spark SQL - Windowing Functions

Lecture 575 Introduction - Windowing Functions

Lecture 576 Prepare HR Database

Lecture 577 Overview of Windowing Functions

Lecture 578 Aggregations using Windowing Functions

Lecture 579 Using LEAD or LAG

Lecture 580 Getting first and last values

Lecture 581 Ranking using Windowing Functions

Lecture 582 Order of execution of SQL

Lecture 583 Overview of Subqueries

Lecture 584 Filtering Windowing Function Results

Section 51: Apache Spark using Python - Data Processing Overview

Lecture 585 Starting Spark Context - pyspark

Lecture 586 Overview of Spark Read APIs

Lecture 587 Understanding airlines data

Lecture 588 Inferring Schema

Lecture 589 Previewing Airlines Data

Lecture 590 Overview of Data Frame APIs

Lecture 591 Overview of Functions

Lecture 592 Overview of Spark Write APIs

Section 52: Apache Spark using Python - Processing Column Data

Lecture 593 Overview of Predefined Functions in Spark

Lecture 594 Create Dummy Data Frame

Lecture 595 Categories of Functions

Lecture 596 Special Functions - col and lit

Lecture 597 Common String Manipulation Functions

Lecture 598 Extracting Strings using substring

Lecture 599 Extracting Strings using split

Lecture 600 Padding Characters around Strings

Lecture 601 Trimming Characters from Strings

Lecture 602 Date and Time Manipulation Functions

Lecture 603 Date and Time Arithmetic

Lecture 604 Using Date and Time Trunc Functions

Lecture 605 Date and Time Extract Functions

Lecture 606 Using to_date and to_timestamp

Lecture 607 Using date_format Function

Lecture 608 Dealing with Unix Timestamp

Lecture 609 Dealing with Nulls

Lecture 610 Using CASE and WHEN

Section 53: Apache Spark using Python - Basic Transformations

Lecture 611 Overview of Basic Transformations

Lecture 612 Data Frames for basic transformations

Lecture 613 Basic Filtering of Data

Lecture 614 Filtering Example using dates

Lecture 615 Boolean Operators

Lecture 616 Using IN Operator or isin Function

Lecture 617 Using LIKE Operator or like Function

Lecture 618 Using BETWEEN Operator

Lecture 619 Dealing with Nulls while Filtering

Lecture 620 Total Aggregations

Lecture 621 Aggregate data using groupBy

Lecture 622 Aggregate data using rollup

Lecture 623 Aggregate data using cube

Lecture 624 Overview of Sorting Data Frames

Lecture 625 Solution - Problem 1 - Get Total Aggregations

Lecture 626 Solution - Problem 2 - Get Total Aggregations By FlightDate

Section 54: Apache Spark using Python - Joining Data Sets

Lecture 627 Prepare Datasets for Joins

Lecture 628 Analyze Datasets for Joins

Lecture 629 Problem Statements for Joins

Lecture 630 Overview of Joins

Lecture 631 Using Inner Joins

Lecture 632 Left or Right Outer Join

Lecture 633 Solution - Get Flight Count Per US Airport

Lecture 634 Solution - Get Flight Count Per US State

Lecture 635 Solution - Get Dormant US Airports

Lecture 636 Solution - Get Origins without master data

Lecture 637 Solution - Get Count of Flights without master data

Lecture 638 Solution - Get Count of Flights per Airport without master data

Lecture 639 Solution - Get Daily Revenue

Lecture 640 Solution - Get Daily Revenue rolled up till Yearly

Section 55: Apache Spark using Python - Spark Metastore

Lecture 641 Overview of Spark Metastore

Lecture 642 Exploring Spark Catalog

Lecture 643 Creating Metastore Tables using catalog

Lecture 644 Inferring Schema for Tables

Lecture 645 Define Schema for Tables using StructType

Lecture 646 Inserting into Existing Tables

Lecture 647 Read and Process data from Metastore Tables

Lecture 648 Create Partitioned Tables

Lecture 649 Saving as Partitioned Table

Lecture 650 Creating Temporary Views

Lecture 651 Using Spark SQL

Section 56: Getting Started with Semi Structured Data using Spark

Lecture 652 Introduction to Getting Started with Semi Structured Data using Spark

Lecture 653 Create Spark Metastore Table with Special Data Types

Lecture 654 Overview of ARRAY Type in Spark Metastore Table

Lecture 655 Overview of MAP and STRUCT Type in Spark Metastore Table

Lecture 656 Insert Data into Spark Metastore Table with Special Type Columns

Lecture 657 Create Spark Data Frame with Special Data Types

Lecture 658 Create Spark Data Frame with Special Types using Python List

Lecture 659 Insert Spark Data Frame with Special Types into Spark Metastore Table

Lecture 660 Review Data in the JSON File with Special Data Types

Lecture 661 Setup JSON Data Set to explore Spark APIs on Special Data Type Columns

Lecture 662 Read JSON Data with Special Types into Spark Data Frame

Lecture 663 Flatten Array Fields in Spark Data Frames using explode and explode_outer

Lecture 664 Get Size or Length of Array Type Columns in Spark Data Frame

Lecture 665 Concatenate Array Values into Delimited String using Spark APIs

Lecture 666 Convert Delimited Strings from Spark Data Frame Columns to Arrays

Lecture 667 Setup Data Sets to Build Arrays using Spark

Lecture 668 Read JSON Data into Spark Data Frame and Review Aggregate Operations

Lecture 669 Build Arrays from Flattened Rows of Spark Data Frame

Lecture 670 Getting Started with Spark Data Frames with Struct Columns

Lecture 671 Concatenate Struct Column Values in Spark Data Frame

Lecture 672 Filter Data on Struct Column Attributes in Spark Data Frame

Lecture 673 Create Spark Data Frame using Map Type Column

Lecture 674 Project Map Values as Columns using Spark Data Frame APIs

Lecture 675 Conclusion of Getting Started with Semi Structured Data using Spark

Section 57: Process Semi Structured Data using Spark Data Frame APIs

Lecture 676 Introduction to Process Semi Structured Data using Spark Data Frame APIs

Lecture 677 Review the Data Sets to generate denormalized JSON Data using Spark

Lecture 678 Setup JSON Data Sets in HDFS using HDFS Command

Lecture 679 Create Spark Data Frames using Data Frame APIs

Lecture 680 Join Orders and Order Items using Spark Data Frame APIs

Lecture 681 Generate Struct Field for Order Details using Spark

Lecture 682 Generate Array of Struct Field for Order Details using Spark

Lecture 683 Join Data Sets to generate denormalized JSON Data using Spark

Lecture 684 Denormalize Join Results using Spark Data Frame APIs

Lecture 685 Write Denormalized Customer Details to JSON Files using Spark

Lecture 686 Publish JSON Files for downstream applications

Lecture 687 Read Denormalized Data into Spark Data Frame

Lecture 688 Filter Denormalized Data Frame using Spark APIs

Lecture 689 Perform Aggregations on Denormalized Data Frame using Spark

Lecture 690 Flatten Semi Structured Data or Denormalized Data using Spark

Lecture 691 Compute Monthly Customer Revenue using Spark on Denormalized Data

Lecture 692 Conclusion of Processing Semi Structured Data using Spark Data Frame APIs

Section 58: Apache Spark - Development Life Cycle using Python

Lecture 693 Setup Virtual Environment and Install Pyspark

Lecture 694 [Commands] - Setup Virtual Environment and Install Pyspark

Lecture 695 Getting Started with Pycharm

Lecture 696 - Getting Started with Pycharm

Lecture 697 Passing Run Time Arguments

Lecture 698 Accessing OS Environment Variables

Lecture 699 Getting Started with Spark

Lecture 700 Create Function for Spark Session

Lecture 701 [code] - Create Function for Spark Session

Lecture 702 Setup Sample Data

Lecture 703 Read Data from Files

Lecture 704 [code] - Read data from files

Lecture 705 Process Data using Spark APIs

Lecture 706 [code] - Process data using Spark APIs

Lecture 707 Write Data to Files

Lecture 708 [code] - Write data to files

Lecture 709 Validating Writing Data to Files

Lecture 710 Productionizing the Code

Lecture 711 [code] - Productionizing the code

Lecture 712 Setting up Data for Production Validation

Lecture 713 Running Application using YARN

Lecture 714 Detailed Validation of the Application

Section 59: Spark Application Execution Life Cycle and Spark UI

Lecture 715 Deploying and Monitoring Spark Applications - Introduction

Lecture 716 Overview of Types of Spark Cluster Managers

Lecture 717 Setup EMR Cluster with Hadoop and Spark

Lecture 718 Overall Capacity of Big Data Cluster with Hadoop and Spark

Lecture 719 Understanding YARN Capacity of an Enterprise Cluster

Lecture 720 Overview of Hadoop HDFS and YARN Setup on Multi-node Cluster

Lecture 721 Overview of Spark Setup on top of Hadoop

Lecture 722 Setup Data Set for Word Count application

Lecture 723 [Instructions and Commands] Setup Data Set for Word Count Application

Lecture 724 Develop Word Count Application

Lecture 725 [code] Develop Word Count Application

Lecture 726 Review Deployment Process of Spark Application

Lecture 727 Overview of Spark Submit Command

Lecture 728 Switching between Python Versions to run Spark Apps or launch Pyspark CLI

Lecture 729 Switching between Pyspark Versions to run Spark Apps or launch Pyspark CLI

Lecture 730 Review Spark Configuration Properties at Run Time

Lecture 731 Develop Shell Script to run Spark Application

Lecture 732 [code] Develop Shell Script to run Spark Application

Lecture 733 Run Spark Application and review default executors

Lecture 734 Overview of Spark History Server UI

Section 60: Setup SSH Proxy to access Spark Application logs

Lecture 735 Setup SSH Proxy to access Spark Application logs - Introduction

Lecture 736 Overview of Private and Public IPs of servers in the cluster

Lecture 737 Overview of SSH Proxy

Lecture 738 Setup sshuttle on Mac or Linux

Lecture 739 Proxy using sshuttle on Mac or Linux

Lecture 740 Accessing Spark Application logs via SSH Proxy using sshuttle on Mac or Linux

Lecture 741 Side effects of using SSH Proxy to access Spark Application Logs

Lecture 742 Steps to setup SSH Proxy on Windows to access Spark Application Logs

Lecture 743 Setup PuTTY and PuTTYgen on Windows

Lecture 744 Quick Tour of PuTTY on Windows

Lecture 745 Configure Passwordless Login using PuTTYGen Keys on Windows

Lecture 746 Run Spark Application on Gateway Node using PuTTY

Lecture 747 Configure Tunnel to Gateway Node using PuTTY on Windows for SSH Proxy

Lecture 748 Setup Proxy on Windows and validate using Microsoft Edge browser

Lecture 749 Understanding Proxying Network Traffic overcoming Windows Caveats

Lecture 750 Update Hosts file for worker nodes using private IPs

Lecture 751 Access Spark Application logs using SSH Proxy

Lecture 752 Overview of performing tasks related to Spark Applications using Mac

Section 61: Deployment Modes of Spark Applications

Lecture 753 Deployment Modes of Spark Applications - Introduction

Lecture 754 Default Execution Master Type for Spark Applications

Lecture 755 Launch Pyspark using local mode

Lecture 756 Running Spark Applications using Local Mode

Lecture 757 Overview of Spark CLI Commands such as Pyspark

Lecture 758 Accessing Local Files using Spark CLI or Spark Applications

Lecture 759 Overview of submitting spark application using client deployment mode

Lecture 760 Overview of submitting spark application using cluster deployment mode

Lecture 761 Review the default logging while submitting Spark Applications

Lecture 762 Changing Spark Application Log Level using custom log4j properties

Lecture 763 Submit Spark Application using client mode with log level info

Lecture 764 Submit Spark Application using cluster mode with log level info

Lecture 765 Submit Spark Applications using SPARK_CONF_DIR with custom properties files

Lecture 766 Submit Spark Applications using Properties File

Computer Science or IT Students or other graduates with passion to get into IT

Data Warehouse Developers who want to transition to Data Engineering roles

ETL Developers who want to transition to Data Engineering roles

Database or PL/SQL Developers who want to transition to Data Engineering roles

BI Developers who want to transition to Data Engineering roles

QA Engineers to learn about Data Engineering

Application Developers to gain Data Engineering Skills