Hands-On With Hadoop 2: 3-In-1

Posted By: ELK1nG

Last updated 11/2020
MP4 | Video: h264, 1280x720 | Audio: AAC, 44.1 KHz
Language: English | Size: 5.37 GB | Duration: 10h 56m

Run your own Hadoop clusters on your own machine or in the cloud

What you'll learn

Understand the Hadoop 2.x Architecture

Create MapReduce jobs

Plan, install and configure core Hadoop services on a Cluster

Validate the Cluster using HDFS, MapReduce and Spark

Understand Cluster Life-Cycle and Performance tuning of a Hadoop Cluster

Hands-on solutions to your perplexing, real-world big data problems

Requirements

Good knowledge of Java

Description

Hadoop is a popular, reliable and scalable distributed computing and storage framework for Big Data solutions. It comprises components designed to run tasks at a distributed scale, across multiple servers and thousands of machines.
This comprehensive 3-in-1 training course gives you a strong foundation by exploring the Hadoop ecosystem with real-world examples. You’ll discover how to set up an HDFS cluster, format it, and transfer data between your local storage and the Hadoop filesystem. You’ll also get hands-on solutions to 10 real-world use cases using Hadoop.





Contents and Overview

This training program includes 3 complete courses, carefully chosen to give you the most comprehensive training possible.

The first course, Getting Started with Hadoop 2.x, opens with an introduction to the world of Hadoop, where you will learn about nodes, data sets, and operations such as map and reduce. The second section deals with HDFS, Hadoop's file system used to store data. Further on, you’ll discover the differences between jobs and tasks, and get to know the Hadoop UI. After this, we turn our attention to storing data in HDFS and data transformations. Lastly, we will learn how to implement an algorithm the Hadoop MapReduce way and analyze its overall performance.
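
As a rough illustration of the map and reduce operations mentioned above, here is a minimal plain-Java sketch of a word count split into map, shuffle, and reduce steps. It does not use the Hadoop API and is not taken from the course material. The class name WordCountSketch and its methods are hypothetical, chosen only to show how the phases hand data to one another.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the map -> shuffle -> reduce flow.
// No Hadoop classes are used; every name here is hypothetical.
final class WordCountSketch {

    // "Map" step: turn one input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // "Shuffle" step: group all emitted values by their key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // "Reduce" step: collapse the values of one key into a single sum.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Runs the three steps in order over all input lines.
    static Map<String, Integer> countWords(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : shuffle(emitted).entrySet()) {
            counts.put(entry.getKey(), reduce(entry.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("big data big cluster", "data node");
        // prints {big=2, cluster=1, data=2, node=1}
        System.out.println(countWords(lines));
    }
}
```

In a real Hadoop job the map and reduce steps would run as separate tasks on different nodes, and the framework would perform the grouping between them. The sketch keeps everything in one process so the data flow is easy to follow.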


The second course, Hadoop Administration and Cluster Management, starts by installing Apache Hadoop on a cluster and configuring the required services. You will learn various cluster operations such as validation and expanding and shrinking Hadoop services. You will then move on to gain a better understanding of administrative tasks like planning your cluster, monitoring, logging, security, troubleshooting and best practices. Techniques to keep your Hadoop clusters highly available and reliable are also covered in this course.


The third course, Solving 10 Hadoop'able Problems, covers the core parts of the Hadoop ecosystem, giving you a broad understanding and getting you up and running fast. Next, it describes a number of common problems that Hadoop is able to solve, presented as case-study projects. The material is broken down into sections by project, each serving as a specific use case for solving big data problems.


By the end of this Learning Path, you’ll be able to plan, deploy, manage, monitor and performance-tune your Hadoop Cluster with Apache Hadoop.


About the Author
A K M Zahiduzzaman is a software engineer with NewsCred Dhaka. He is a software developer and technology enthusiast. He was a Ruby on Rails developer, but now works with NodeJS, AngularJS and Python. He is also working with a much wider vision as a technology company. His next goal is to introduce SOA within the current applications to scale development via microservices. Zahiduzzaman has a lot of experience with Spark and is passionate about it. He is also a guitarist and plays in a band. He has also spoken at an international event in Dhaka. He is very enthusiastic and loves to share his knowledge.


Gurmukh Singh is a technology professional with 14+ years of industry experience in infrastructure design, distributed systems, performance optimization, and networks. He has worked in the big data domain for the last 5 years and provides consultancy and training on various technologies. He has worked with companies such as HP, JP Morgan, and Yahoo and has authored the book Monitoring Hadoop.


Tomasz Lelek is a Software Engineer and Co-Founder of InitLearn. He mostly does programming in Java and Scala. He dedicates his time and effort to getting better at everything. He is currently delving into big data technologies. Tomasz is very passionate about everything associated with software development. He has been a speaker at a few conferences in Poland (Confitura and JDD) and at the Krakow Scala User Group. He has also conducted a live coding session at the Geecon Conference. He was also a speaker at an international event in Dhaka. He is very enthusiastic and loves to share his knowledge.



Overview

Section 1: Getting Started with Hadoop 2.x

Lecture 1 The Course Overview

Lecture 2 Installing Hadoop in Local

Lecture 3 Bring Process to Data

Lecture 4 NameNode Versus DataNode

Lecture 5 Map and Reduce Operations

Lecture 6 Order of Execution and Parallel Thinking

Lecture 7 Formatting a HDFS

Lecture 8 Formatting a HDFS

Lecture 9 Some Helpful Commands to Communicate with the HDFS

Lecture 10 HDFS Protocol and Using It in Applications

Lecture 11 Hadoop Jobs Versus Tasks

Lecture 12 The Hadoop UI for Task Progress

Lecture 13 Running a Couple of Example Jobs

Lecture 14 Analyze the Work Flow/Data Flow/Process Flow

Lecture 15 Introduction to the Movie Dataset

Lecture 16 Data Transformation and Storing to HDFS

Lecture 17 Devise a Simple Algorithm for Recommendation

Lecture 18 Implement the Algorithm in Hadoop Map-Reduce Way and Analyze Performance

Section 2: Hadoop Administration and Cluster Management

Lecture 19 The Course Overview

Lecture 20 Navigation of GitBash

Lecture 21 Navigation of Vagrant

Lecture 22 Navigation of VirtualBox

Lecture 23 Planning a Single Node Setup

Lecture 24 Install Apache Hadoop

Lecture 25 Apache Hadoop Overview

Lecture 26 Hadoop Distributed File System (HDFS)

Lecture 27 YARN Overview

Lecture 28 MapReduce

Lecture 29 Planning Hadoop Services Placement

Lecture 30 Planning ZooKeeper Placement

Lecture 31 Planning HDFS Service Placement

Lecture 32 Planning YARN

Lecture 33 Planning Spark Services

Lecture 34 HDFS Concepts

Lecture 35 HDFS Data Movement

Lecture 36 HDFS Admin Commands

Lecture 37 MapReduce Jobs

Lecture 38 Spark Jobs

Lecture 39 Start/Stop Services

Lecture 40 Manage Cluster Using Ambari

Lecture 41 Hadoop Upgrade

Lecture 42 Scaling Cluster – Part 1

Lecture 43 Scaling Cluster – Part 2

Lecture 44 HDFS Masters

Lecture 45 HA Configuration

Lecture 46 YARN Masters

Lecture 47 Linux ACLs

Lecture 48 HDFS ACLs Security – Part 1

Lecture 49 HDFS ACLs Security – Part 2

Lecture 50 Hadoop Users and Groups

Lecture 51 NameNode UI

Lecture 52 Apache Hadoop Auditing

Lecture 53 Hadoop Metrics

Lecture 54 Hadoop Logs and Monitoring

Lecture 55 Hadoop Troubleshooting – Part 1

Lecture 56 Hadoop Troubleshooting – Part 2

Section 3: Solving 10 Hadoop'able Problems

Lecture 57 The Course Overview

Lecture 58 Hadoop Distributed File System (HDFS)

Lecture 59 Distributed Compute Capability YARN

Lecture 60 Apache Hive for ETL and SQL Like

Lecture 61 Message Queuing and Data Ingestion Kafka

Lecture 62 NoSQL Datastores – Hadoop HBase, Accumulo

Lecture 63 Machine Learning – Spark and Spark MLlib

Lecture 64 Stream Processing – Spark Streaming

Lecture 65 Processing Payment Data from an Event Stream

Lecture 66 Advanced Aggregations Using Streaming API – PaymentAnalyzer

Lecture 67 Storing Time Series Data in HBase

Lecture 68 Detecting BOT Traffic Using Spark Streaming

Lecture 69 Make Web Log Data Queryable – Hive Sink

Lecture 70 Investigating Customers Data in Hive

Lecture 71 Trending Supply Chain – Finding Top Seller Item in a Streaming Way

Lecture 72 Enriching Top Sellers with Additional Information

Lecture 73 Analyzing Customer Churn (Quantitative) Using DataFrame Queries

Lecture 74 Analyzing Customer Churn (Amounts) Using DataFrame Queries

Lecture 75 Storing Low Granularity Structured Sensor Data in HBase

Lecture 76 Consuming Sensor Data Stored in HBase – Scan and Count

Lecture 77 Building Summaries on Data Streaming from Devices

Lecture 78 Introducing Spark GraphX – How to Represent a Graph?

Lecture 79 Perform Graph Operations Using GraphX

Lecture 80 Counting Degree of Vertices

Lecture 81 Neighborhood Aggregations – Collecting Neighbors

Lecture 82 Structural Operators – Connected Components

Lecture 83 Page Rank Using Spark GraphX

Lecture 84 Anomaly Detection

Lecture 85 Analyzing Web Logs for Suspicious Activity and Loading into Spark

Lecture 86 Implementing Clustering – Choosing Number of Clusters

Lecture 87 Detecting Anomalies in Network Traffic

Lecture 88 Analyzing Post for an Author

Lecture 89 Extracting Information from Unstructured Text

Lecture 90 Extracting Information Via Spark DataFrame

Lecture 91 Sentiment Analysis of Posts Using Logistic Regression

Lecture 92 Finding an Author of a Post

Lecture 93 Downloading and Setting Cloudera Sandbox

Lecture 94 Finding What Products Users Wants to Buy Using Cloudera Sandbox Toolkit

Lecture 95 Using Movies History to Suggest Interesting Content

Lecture 96 Testing and Experimenting with Recommendation Engine

This course is perfect for budding data scientists and data analysts who have a firm understanding of Java and want to get started with Hadoop.