Machine Learning with Apache Spark 3.0 using Scala
MP4 | Video: h264, 1280x720 | Audio: AAC, 44.1 KHz
Language: English (US) | Size: 3.87 GB | Duration: 8h 20m
Machine Learning with Apache Spark 3.0 using Scala with Examples and 4 Projects
What you'll learn
Understand the fundamentals of Machine Learning and its types (supervised, unsupervised, classification, regression, clustering).
Learn the basics of Apache Spark 3.0 and how it supports large-scale data processing.
Work hands-on with Spark RDDs, DataFrames, and Datasets using Scala.
Explore Spark MLlib – the machine learning library in Spark – and how it enables scalable ML solutions.
Build end-to-end Machine Learning pipelines using Spark, from data ingestion to model evaluation.
Gain practical experience with real-world datasets for tasks such as rain prediction in Australia, Iris flower classification, ad click prediction, and mall customer segmentation.
Learn how to work with different data sources like CSV, JSON, Parquet, Avro, LIBSVM, and images.
Master feature engineering techniques such as TF-IDF, Word2Vec, CountVectorizer, PCA, n-grams, StringIndexer, OneHotEncoder, VectorAssembler, and more.
Implement various classification models including Decision Trees, Logistic Regression, Naive Bayes, Random Forests, Gradient-Boosted Trees, Linear SVM, and One-vs-Rest.
Apply different regression models such as Linear Regression, Decision Trees, Random Forests, and Gradient-Boosted Trees.
Work with clustering algorithms like KMeans for customer segmentation.
Understand the concepts behind machine learning pipelines and how to use Spark’s pipeline API effectively.
Get tips, tricks, and best practices for writing efficient and production-ready ML models in Spark using Scala.
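As a rough illustration of the pipeline API mentioned above, here is a minimal sketch, assuming Spark 3.x with the spark-sql and spark-mllib modules on the classpath and written in script/notebook-cell style. The column names, the six hard-coded Iris-style rows, and the choice of LogisticRegression are illustrative assumptions, not the course's own code. Accuracy is measured on the training rows only, so it says nothing about how the model generalizes.

```scala
// Minimal Spark ML Pipeline sketch (script / notebook-cell style).
// Assumes Spark 3.x with spark-sql and spark-mllib on the classpath.
// Column names and rows are illustrative, not taken from the course.
import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.{Pipeline, PipelineModel}
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.ml.feature.{StringIndexer, VectorAssembler}

// Local session for a standalone run; on Databricks, `spark` already exists.
val spark = SparkSession.builder()
  .appName("PipelineSketch")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Six hard-coded Iris-style rows (two per species) so the sketch needs no input file.
val iris = Seq(
  (5.1, 3.5, 1.4, 0.2, "setosa"),
  (4.9, 3.0, 1.4, 0.2, "setosa"),
  (7.0, 3.2, 4.7, 1.4, "versicolor"),
  (6.4, 3.2, 4.5, 1.5, "versicolor"),
  (6.3, 3.3, 6.0, 2.5, "virginica"),
  (5.8, 2.7, 5.1, 1.9, "virginica")
).toDF("sepalLength", "sepalWidth", "petalLength", "petalWidth", "species")

// Stage 1: map the string label to a numeric "label" column.
val indexer = new StringIndexer()
  .setInputCol("species")
  .setOutputCol("label")

// Stage 2: pack the numeric columns into a single "features" vector.
val assembler = new VectorAssembler()
  .setInputCols(Array("sepalLength", "sepalWidth", "petalLength", "petalWidth"))
  .setOutputCol("features")

// Stage 3: the estimator; it reads "label" and "features" by default.
val lr = new LogisticRegression().setMaxIter(50)

// fit() runs the stages in order and returns a PipelineModel of fitted transformers.
val pipeline = new Pipeline().setStages(Array(indexer, assembler, lr))
val model: PipelineModel = pipeline.fit(iris)

// transform() appends a "prediction" column; evaluated on the training rows only.
val predictions = model.transform(iris)
val accuracy = new MulticlassClassificationEvaluator()
  .setLabelCol("label")
  .setPredictionCol("prediction")
  .setMetricName("accuracy")
  .evaluate(predictions)

println(s"Training accuracy: $accuracy")

// Stop the session in a standalone run; skip this inside a Databricks notebook.
spark.stop()
```

The same fitted PipelineModel can then be applied to new rows with `model.transform(...)`, which is what makes a pipeline convenient: the indexing and assembling steps are replayed automatically before prediction.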
Requirements
Basic programming knowledge – familiarity with any programming language (Scala, Java, Python, or C++) will be helpful.
Scala basics – prior exposure to Scala is recommended, but the course also covers essential Scala concepts needed for Spark ML.
Basic math & statistics – understanding of concepts like mean, median, variance, probability, and linear algebra will make learning ML easier.
No prior Spark experience required – the course includes an optional section on Apache Spark basics, making it beginner-friendly.
A computer with internet access to create a free Databricks account or run Spark locally.
Enthusiasm to learn Machine Learning and Big Data technologies hands-on!
Description
Do you want to master Machine Learning at scale using one of the most powerful Big Data frameworks in the world? This course teaches you Machine Learning with Apache Spark 3.0 and Scala, step by step, through real-world projects and hands-on coding examples.
Apache Spark is a widely adopted framework for processing and analyzing large datasets. Its MLlib (Machine Learning Library) provides scalable implementations of machine learning algorithms, making it possible to train, evaluate, and deploy models on large amounts of data efficiently. Using Scala, the language Spark itself is written in, you'll learn how to build and optimize end-to-end machine learning pipelines.
This course is designed for beginner to intermediate learners who want practical experience applying machine learning techniques in Spark. You'll start with Big Data and Spark basics, move on to core machine learning concepts, and finally apply them to real-world datasets through hands-on projects such as rain prediction, ad click prediction, Iris flower classification, and customer segmentation.
By the end of this course, you will have the skills and confidence to build scalable machine learning models using Spark 3.0 and Scala, skills that are in high demand in industries such as finance, e-commerce, telecom, and technology.
What You Will Learn
Introduction to Machine Learning & Spark MLlib
Basics of machine learning and its types (supervised, unsupervised, classification, regression, clustering).
What Spark ML is, and how Spark MLlib simplifies building ML models at scale.
Apache Spark Basics (Optional Section)
Get familiar with Spark fundamentals: RDDs, DataFrames, and Datasets.
Set up a Spark environment using Databricks.
Learn notebook basics, cluster provisioning, and working with Scala.
Data Handling & Preparation
Work with different data sources: CSV, JSON, LIBSVM, images, Avro, and Parquet.
Understand the machine learning data pipeline in Spark.
Practice feature extraction, transformation, and selection techniques.
Feature Engineering in Spark ML
Learn popular feature extractors such as TF-IDF, Word2Vec, CountVectorizer, and FeatureHasher.
Apply transformers such as Tokenizer, StopWordsRemover, n-gram, PCA, StringIndexer, and OneHotEncoder.
Use feature selectors such as RFormula and ChiSqSelector.
Connect these components into end-to-end ML pipelines.
Machine Learning Models with Spark
Classification models: Decision Trees, Logistic Regression, Naive Bayes (Iris prediction), Random Forest, Gradient-Boosted Trees, Linear SVM, and One-vs-Rest.
Regression models: Linear Regression, Decision Tree Regression, Random Forest Regression, Gradient-Boosted Tree Regression, and the Ad Click Prediction project.
Clustering: KMeans (Customer Segmentation project).
Hands-On Projects
Rain Prediction in Australia (complete ML pipeline).
Iris Flower Classification using Naive Bayes.
Customer Segmentation using KMeans.
Ad Click Prediction using Linear Regression.
Multiple other classification and regression use cases with step-by-step Scala implementations.
Spark MLlib in Practice
Understand how to train, evaluate, and optimize ML models at scale.
Explore key concepts such as shuffling, correlation, pipeline components, and evaluation metrics.
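For the customer segmentation project, the core KMeans step in Spark ML can be sketched roughly as follows, again assuming Spark 3.x with spark-mllib on the classpath and written in script/notebook-cell style. The `annualIncome` and `spendingScore` columns and their values are hypothetical toy data, not the course's mall dataset, and k = 2 is chosen only because the toy points form two obvious groups.

```scala
// Minimal KMeans segmentation sketch (script / notebook-cell style).
// Assumes Spark 3.x with spark-sql and spark-mllib on the classpath.
// Columns and values are hypothetical toy data, not the course's dataset.
import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.evaluation.ClusteringEvaluator
import org.apache.spark.ml.feature.VectorAssembler

// Local session for a standalone run; on Databricks, `spark` already exists.
val spark = SparkSession.builder()
  .appName("KMeansSketch")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Hypothetical customers: (annual income in k$, spending score 1-100).
val customers = Seq(
  (20.0, 80.0), (22.0, 85.0), (25.0, 78.0),
  (80.0, 15.0), (85.0, 20.0), (88.0, 12.0)
).toDF("annualIncome", "spendingScore")

// KMeans expects a single vector column of features.
val features = new VectorAssembler()
  .setInputCols(Array("annualIncome", "spendingScore"))
  .setOutputCol("features")
  .transform(customers)

// k = 2 because the toy data has two obvious groups; the seed fixes initialization.
val kmeans = new KMeans()
  .setK(2)
  .setSeed(42L)
  .setFeaturesCol("features")
  .setPredictionCol("segment")
val model = kmeans.fit(features)

// Each row gets a "segment" id; silhouette (the default metric) lies in [-1, 1].
val segmented = model.transform(features)
val silhouette = new ClusteringEvaluator()
  .setFeaturesCol("features")
  .setPredictionCol("segment")
  .evaluate(segmented)

segmented.select("annualIncome", "spendingScore", "segment").show()
println(s"Silhouette: $silhouette")
model.clusterCenters.foreach(center => println(s"Center: $center"))

// Stop the session in a standalone run; skip this inside a Databricks notebook.
spark.stop()
```

Because KMeans uses Euclidean distance by default, features on very different scales are often standardized first (for example with `StandardScaler`) before clustering real customer data.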
Who this course is for:
Beginners in Machine Learning who want to understand ML concepts and implement them using Apache Spark and Scala.
Data Engineers & Big Data Developers looking to expand their skills into machine learning pipelines with Spark MLlib.
Software Developers & Programmers who want to transition into the field of Data Science and AI using distributed computing.
Data Scientists interested in leveraging Spark’s scalability for large datasets and production-grade ML models.
Students & Researchers eager to apply machine learning concepts in real-world Big Data environments.
Professionals preparing for interviews or career transitions in Big Data, Spark, or ML-related roles.
Anyone curious about building end-to-end ML projects using Spark’s ecosystem.