Big Data Analytics with Hadoop and Apache Spark [Released: 10/2/2024]
.MP4, AVC, 1280x720, 30 fps | English, AAC, 2 Ch | 51m | 119 MB
Instructor: Kumaran Ponnambalam
Apache Hadoop was a pioneer in the world of big data technologies, and it continues to lead in enterprise big data storage. Apache Spark is the top big data processing engine and provides an impressive array of features and capabilities. When used together, the Hadoop Distributed File System (HDFS) and Spark can provide a truly scalable setup for big data analytics.
In this course, data analytics expert Kumaran Ponnambalam shows you how to leverage these two technologies to build scalable and optimized data analytics pipelines. Explore ways to optimize data modeling and storage on HDFS; discuss scalable data ingestion and extraction using Spark; and review actionable tips for optimizing data processing in Spark. Plus, complete a use case project that allows you to practice your new techniques.
Learning objectives
- Explain where and why Apache Spark stores its data.
- Differentiate between the types of data to work with.
- Explain how bucketing can be used to partition data.
- Analyze the execution plan when reading HDFS files with schema.
- Determine when and how to apply best practices for data processing.
- Leverage various tools and techniques to build a solution using Apache Spark and Hadoop.