Big Data With Apache Spark 3 And Python: From Zero To Expert

Posted By: ELK1nG

Published 11/2022
MP4 | Video: h264, 1280x720 | Audio: AAC, 44.1 kHz
Language: English | Size: 1.76 GB | Duration: 4h 19m

Complete bootcamp to learn PySpark, Databricks, Spark Machine Learning, Advanced Analytics, Koalas and Spark Streaming

What you'll learn
Introduction to Big Data and Apache Spark Fundamentals
Spark RDDs, DataFrames and Spark Koalas
Machine Learning with Spark
Advanced features with Apache Spark
Advanced analytics and data visualization tools
Spark in the cloud with Azure and Databricks
Spark Streaming and GraphX
Databricks
Machine learning in Databricks
Requirements
Python
Description
If you are looking for a hands-on, complete and advanced course to learn Big Data with Apache Spark and Python, you have come to the right place.

This course is designed to cover the complete skillset of Apache Spark, from RDDs, Spark SQL, DataFrames, and Spark Streaming, to Machine Learning with Spark ML, Advanced Analytics, data visualization, Spark Koalas, and Databricks.

With lessons, downloadable study guides, hands-on exercises, and real-world use cases, this is the only course you'll need to learn Apache Spark.

Apache Spark has become the reference tool for Big Data, surpassing Hadoop MapReduce. For in-memory workloads, Spark can run up to 100 times faster than Hadoop MapReduce, and it has a complete ecosystem of functionalities for machine learning and data analytics. This makes Apache Spark one of the most in-demand skills for data engineers, data scientists, and related roles. Big Data is one of the most valuable skills today, so this course will teach you everything you need to position yourself in the Big Data job market.

In this course we will teach you the complete skillset of Apache Spark and PySpark, starting from the basics and working up to the most advanced features. We will use visual presentations in PowerPoint, sharing clear explanations and useful professional advice.

This course has the following sections:
Introduction to big data and fundamentals of Apache Spark
Installation of Apache Spark and libraries such as Anaconda, Java, etc.
Spark RDDs
Spark DataFrames
Advanced features with Apache Spark
Advanced analytics and data visualization
Spark Koalas
Machine Learning with Spark
Spark Streaming
Spark GraphX
Databricks
Spark in the cloud (Azure)

If you're ready to sharpen your skills, increase your career opportunities, and become a Big Data expert, join today and get immediate and lifetime access to:
• Complete guide to Apache Spark (PDF e-book)
• Downloadable Spark project files and code
• Hands-on exercises and quizzes
• Spark resources such as cheat sheets and summaries
• One-to-one expert support
• Course question and answer forum
• 30-day money-back guarantee

See you there!

Overview

Section 1: Spark Fundamentals

Lecture 1 How to get the most out of this course

Lecture 2 Course material

Lecture 3 Spark Fundamentals

Lecture 4 Apache Spark execution

Lecture 5 Apache Spark ecosystem and documentation

Lecture 6 PySpark: operation, cluster administration and architecture

Section 2: Installing Apache Spark locally

Lecture 7 Download Spark, Java and Anaconda

Lecture 8 Setting environment variables

Lecture 9 Running Spark in Prompt and Jupyter Notebook

Lecture 10 Fixing common problems

Section 3: Basic Features and RDDs

Lecture 11 PySpark Cheat Sheet

Lecture 12 RDD Fundamentals

Lecture 13 Initialize PySpark with SparkSession and the SparkContext

Lecture 14 Transformations in RDDs like map, filter, flatMap and distinct

Lecture 15 Transformations in RDDs like reduceByKey, groupByKey or sortByKey

Lecture 16 RDD actions such as count, first, collect or take

Section 4: Spark DataFrames and Apache Spark SQL

Lecture 17 PySpark Cheatsheet: SQL

Lecture 18 Fundamentals and advantages of DataFrames

Lecture 19 Characteristics of DataFrames and data sources

Lecture 20 Creating DataFrames in PySpark

Lecture 21 Operations with PySpark DataFrames

Lecture 22 Different types of joins in DataFrames

Lecture 23 SQL queries in PySpark

Lecture 24 Advanced features for loading and exporting data in PySpark

Section 5: Advanced features in Apache Spark

Lecture 25 Advanced functions and performance optimization

Lecture 26 Broadcast Join and caching

Lecture 27 User Defined Functions (UDF) and advanced SQL functions

Lecture 28 Handling and imputation of missing values

Lecture 29 Partitioning and catalog of APIs

Lecture 30 Practical Exercise: Advanced Analytics with Apache Spark

Section 6: Advanced Analytics with Apache Spark

Lecture 31 Introduction to advanced analytics with Spark

Lecture 32 Data loading and data schema modification

Lecture 33 Inspect data in PySpark

Lecture 34 Column transformation in PySpark

Lecture 35 Advanced missing data imputation in PySpark

Lecture 36 Data selection with PySpark and PySpark SQL

Lecture 37 Data visualization and graph generation in PySpark

Lecture 38 Persist data with PySpark

Section 7: Koalas: The Apache Spark Pandas API

Lecture 39 Spark Koalas Fundamentals

Lecture 40 Feature Engineering with Koalas

Lecture 41 Creating DataFrames with Koalas

Lecture 42 Data manipulation and DataFrames with Koalas

Lecture 43 Working with missing data in Koalas

Lecture 44 Data visualization and graph generation with Koalas

Lecture 45 Importing and exporting data with Koalas

Lecture 46 Hands-on exercise with Koalas

Section 8: Machine Learning with Apache Spark

Lecture 47 Fundamentals of Machine Learning with Spark

Lecture 48 Spark Machine Learning Components

Lecture 49 Stages of developing a Machine Learning model

Lecture 50 Import data and exploratory data analysis (EDA)

Lecture 51 Data preprocessing with PySpark

Lecture 52 Training the machine learning model in PySpark

Lecture 53 Evaluation of the Machine Learning model

Section 9: Spark Streaming

Lecture 54 Practical example of counting words with Spark Streaming

Lecture 55 Spark Streaming Configurations: Output Modes and Operation Types

Lecture 56 Time Window Operations in Spark Streaming

Lecture 57 Spark Streaming Capabilities

Lecture 58 Use case: Real-time bank fraud detection (Part I)

Lecture 59 Use case: Real-time bank fraud detection (Part II)

Lecture 60 Spark Streaming Exercise

Section 10: Introduction to Databricks

Lecture 61 Introduction to Databricks

Lecture 62 Databricks Terminology and Databricks Community

Lecture 63 Delta Lake

Lecture 64 Create a free Databricks account

Section 11: Apache Spark on Databricks

Lecture 65 Introduction to the Databricks environment

Lecture 66 Getting started with Databricks

Lecture 67 Creating and saving DataFrames in Databricks

Lecture 68 Data transformation and visualization in Databricks

Lecture 69 Use case: Population data analytics

Section 12: Machine Learning in Databricks

Lecture 70 Import and exploratory analysis of the data

Lecture 71 Variable preprocessing with PySpark and Databricks

Lecture 72 Definition of the Machine Learning model and development of the Pipeline

Lecture 73 Model evaluation with PySpark and Databricks

Lecture 74 Hyperparameter tuning and registration in MLflow

Lecture 75 Predictions with new data and visualization of the results

Section 13: Additional material

Lecture 76 Additional Resources: Complete Guide to Spark

Who this course is for
Anyone who wants to learn advanced Big Data skills
Anyone who knows Python and wants to acquire Big Data processing skills
Anyone who wants to make a career as a data engineer, data analyst or data scientist
Anyone interested in learning Apache Spark and PySpark for Big Data analysis
Anyone who wants to learn cutting-edge technology in Big Data