Big Data Processing And Machine Learning With Apache Spark
Last updated 4/2019
MP4 | Video: h264, 1280x720 | Audio: AAC, 44.1 KHz
Language: English | Size: 4.36 GB | Duration: 8h 54m
Leverage the power of Apache Spark to perform data processing, analytics, and machine learning on your data in real time
What you'll learn
Query your structured data using Spark SQL and work with the DataSets API
Uncover what RDDs (Resilient Distributed Datasets) are and how to perform operations on them
Train machine learning models with streaming data, and use them for making real-time predictions
Implement high-velocity streaming and data processing use cases while working with the streaming API
Dive into MLlib, the machine learning functional library in Spark, with its highly scalable algorithms
See analytical use case implementations using MLlib, GraphX, and Spark Streaming
Examine a number of real-world use cases with hands-on projects
Build Hadoop and Apache Spark jobs that process data quickly and effectively
Requirements
Knowledge of Python programming is assumed, but prior experience with Apache Spark is not required.
Description
Apache Spark is highly configurable and is rapidly gaining popularity in the Big Data market because its in-memory processing makes it a high-speed data processing engine. It also has well-built libraries for machine learning and graph analytics algorithms. This makes Apache Spark well suited to solving machine learning problems at scale and to working with high-velocity, real-time streaming data. If you want to get the most out of this trending Big Data framework for all your data processing and machine learning needs, then this course is for you.
This course focuses on performing data streaming, data analytics, and machine learning with Apache Spark. You will learn to load data from a variety of structured sources such as JSON, Hive, and Parquet using Spark SQL and schema RDDs. You will also build streaming applications and learn best practices for managing high-velocity streaming and external data sources. Next, you will explore the Spark machine learning libraries and GraphX, where you will perform graph processing and analysis. Finally, you will build projects that help you put what you have learned into practice and get a firm grasp of the topic.
Contents and Overview
This training program includes 4 complete courses, carefully chosen to give you the most comprehensive training possible.
The first course, Apache Spark in 7 Days, is designed to give you a fundamental understanding of, and hands-on experience in, writing basic code and running applications on a Spark cluster. You will work on interesting examples and assignments that demonstrate and help you understand basic operations, querying, machine learning, and streaming.
In the second course, Big Data Processing using Apache Spark, you will learn how to leverage Apache Spark to process big data quickly. You will learn the basics of the Spark API and its architecture in detail. You will then learn about data mining and data cleaning, where you will understand the input data structure and how input data is loaded. You will also write actual jobs that analyze data.
The third course, Big Data Analytics Projects with Apache Spark, contains various projects built on real-world examples. The first project is to find the top-selling products for an e-commerce business by efficiently joining datasets in the MapReduce paradigm. Next, a Market Basket Analysis will help you identify items likely to be purchased together and find correlations between items in a set of transactions. Moving on, you will learn about probabilistic logistic regression by finding the author of a post. Next, you will build a content-based recommendation system for movies and train a model to predict whether an action will happen. Finally, you will use a MapReduce-style Spark program to calculate mutual friends on a social network.
In the fourth course, Hands-On Machine Learning with Scala and Spark, you will go through the day-to-day challenges that programmers face while implementing ML pipelines and consider different approaches and models for solving complex problems. You will learn about the most effective machine learning techniques and apply them to your advantage. You will also implement algorithms in practical hands-on projects, where you will build data models and understand how they work by using different types of algorithms.
By the end of this course, you will be able to process large datasets, extract features from them, and apply a machine learning model that is well suited to your problem.
Meet Your Expert(s):
We have the best work of the following esteemed author(s) to ensure that your learning journey is smooth:
Karen Yang has been a passionate self-learner in computer science for over 6 years. She has programming, big data processing, and engineering experience. Her recent interests include cloud computing. She previously taught for 5 years in a college evening adult program.
Tomasz Lelek is a Software Engineer and Co-Founder of InitLearn. He mostly programs in Java and Scala. He dedicates his time and effort to getting better at everything. He is currently diving into Big Data technologies. Tomasz is very passionate about everything associated with software development. He has been a speaker at a few conferences in Poland (Confitura and JDD) and at the Krakow Scala User Group. He has also conducted a live coding session at the Geecon Conference and was a speaker at an international event in Dhaka. He is very enthusiastic and loves to share his knowledge.
Overview
Section 1: Apache Spark in 7 Days
Lecture 1 The Course Overview
Lecture 2 Setting Up an AWS Account
Lecture 3 Launching a Spark Cluster on EC2
Lecture 4 Setting Up Your Environment
Lecture 5 Running a Test Application
Lecture 6 Creating RDDs
Lecture 7 Actions
Lecture 8 Transformations
Lecture 9 Joins, Set, and Numeric Operations
Lecture 10 Shared Variables
Lecture 11 Installing Jupyter Notebook
Lecture 12 RDDs and DataFrames
Lecture 13 DataFrame Row Operations
Lecture 14 DataFrame Column Operations
Lecture 15 DataFrame Manipulation
Lecture 16 Views
Lecture 17 Schemas
Lecture 18 SQL Operations
Lecture 19 I/O Options
Lecture 20 HIVE
Lecture 21 Basic Statistics
Lecture 22 Pipelines
Lecture 23 Feature Extractors
Lecture 24 Feature Transformers
Lecture 25 Feature Selectors
Lecture 26 Classification
Lecture 27 Regression
Lecture 28 Clustering
Lecture 29 Collaborative Filtering
Lecture 30 Model Selection and Tuning
Lecture 31 DStreams
Lecture 32 DStream Window Operations
Lecture 33 Structured Streaming
Lecture 34 Window Operations
Lecture 35 Joining Batch and Streaming Data
Section 2: Big Data Processing using Apache Spark
Lecture 36 The Course Overview
Lecture 37 Overview of the Apache Spark and Its Architecture
Lecture 38 Start a Project Using Apache Spark, Look at build.sbt
Lecture 39 Creating the Spark Context
Lecture 40 Looking at API of Spark
Lecture 41 Looking at the Input Data Structure
Lecture 42 Using RDD API in the Data Mining Process
Lecture 43 Loading Input Data
Lecture 44 Cleaning Input Data
Lecture 45 Logic for Counting Words
Lecture 46 Using RDD API Transformations and Actions to Solve a Problem
Lecture 47 Testing Spark Job
Lecture 48 Summary of Data Processing
Section 3: Big Data Analytics Projects with Apache Spark
Lecture 49 The Course Overview
Lecture 50 Explaining Ways of Joining Datasets
Lecture 51 Developing Spark Algorithm for Joining/Windowing Datasets
Lecture 52 Testing Logic in MapReduce Spark — Finding Top Sellers
Lecture 53 Drawing Conclusions from Top Sellers Data
Lecture 54 Market Basket Analysis Goals
Lecture 55 Where MBA Algorithms Are Useful?
Lecture 56 Implementing MBA MapReduce Algorithm in Spark
Lecture 57 Finding Association Rules Between Products
Lecture 58 Analyzing Post for an Author
Lecture 59 Extracting Information from Unstructured Text
Lecture 60 Extracting Information via Spark DataFrame
Lecture 61 Sentiment Analysis of Posts Using Logistic Regression
Lecture 62 Finding an Author of a Post
Lecture 63 Content-Based Recommendation Systems Explanation
Lecture 64 Finding Correlation Between Movies and Users
Lecture 65 Testing Logic in MapReduce Spark
Lecture 66 Finding Recommendation for Given User
Lecture 67 Finding Common Friends Problem — Graph Approach
Lecture 68 Creating a Graph Using GraphX and Property Graph
Lecture 69 Solution — Examining Available Methods
Lecture 70 Finding Closest Friend for Given User Using Page Rank
Section 4: Hands-On Machine Learning with Scala and Spark
Lecture 71 The Course Overview
Lecture 72 Analyzing Text Input Data
Lecture 73 Feature Generation from Text – Count Vectorizer, TFIDF, LDA
Lecture 74 Extracting Features from Data – Transforming Text into Vector of Numbers
Lecture 75 Bag-of-Words and Skip Gram
Lecture 76 Training Classification Models – Implementing Word2Vec Using Apache Spark
Lecture 77 Logistic Regression Explanation
Lecture 78 Writing a Logistic Regression Model Per Author in Apache Spark
Lecture 79 Training Regression Model
Lecture 80 Key Concepts, Machine Learning Pipelines, and Operations
Lecture 81 Learn How to Validate Models Using Cross-Validation
Lecture 82 Analyzing Time of Post Using Clustering – (GMM Explanation)
Lecture 83 Implementing GMM in Apache Spark
Lecture 84 K-Means Clustering Explanation and Use Cases
Lecture 85 Implementing K-Means Clustering in Apache Spark
Lecture 86 Measure Accuracy Using Area Under ROC
Lecture 87 Dimensionality Reduction Using Singular Value Decomposition (SVD)
Lecture 88 Building Recommendation Engine in Spark Using Collaborative Filtering
Lecture 89 Using Recommendation Engine to Get Top Recommendations
Lecture 90 Dense and Sparse Vectors
Lecture 91 LabeledPoints, Rating, and Other Data Types
Lecture 92 The Spark versus Deep Learning Use Case
Lecture 93 Spark for Parallelizing Deep Learning Evaluation
Lecture 94 Deep Learning As a Feature Generator for Existing Spark ML Algorithms
Lecture 95 Spark/Deep Learning Made Simple
This course will be particularly useful if you are a developer, data analyst, data engineer, or data scientist. However, anyone interested in learning how to use Spark will also benefit from this course.