Hands-On With Hadoop 2: 3-In-1
Last updated 11/2020
MP4 | Video: h264, 1280x720 | Audio: AAC, 44.1 KHz
Language: English | Size: 5.37 GB | Duration: 10h 56m
Run your own Hadoop clusters on your own machine or in the cloud
What you'll learn
Understand the Hadoop 2.x Architecture
Create MapReduce jobs
Plan, install and configure core Hadoop services on a Cluster
Validate the Cluster using HDFS, MapReduce and Spark
Understand Cluster Life-Cycle and Performance tuning of a Hadoop Cluster
Hands-on solutions to your perplexing, real-world big data problems
Requirements
Good knowledge of Java
Description
Hadoop is a popular, reliable and scalable distributed computing and storage framework for Big Data solutions. It comprises components designed to run tasks at a distributed scale, across multiple servers and thousands of machines.
This comprehensive 3-in-1 training course gives you a strong foundation by exploring the Hadoop ecosystem with real-world examples. You’ll discover how to set up an HDFS cluster, format it, and transfer data between your local storage and the Hadoop filesystem. You’ll also get hands-on solutions to 10 real-world use cases using Hadoop.
Contents and Overview
This training program includes 3 complete courses, carefully chosen to give you the most comprehensive training possible.
The first course, Getting Started with Hadoop 2.x, opens with an introduction to the world of Hadoop, where you will learn about nodes, data sets, and operations such as map and reduce. The second section deals with HDFS, Hadoop's file system for storing data. Further on, you’ll discover the differences between jobs and tasks, and get to know the Hadoop UI. After this, we turn our attention to storing data in HDFS and data transformations. Lastly, we will learn how to implement an algorithm the Hadoop MapReduce way and analyze its overall performance.
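To make the map and reduce operations mentioned above concrete, here is a minimal sketch in plain Python. It is not taken from the course and does not use any Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and only illustrate the idea of emitting key/value pairs, grouping them by key, and combining each group.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, mimicking the step between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big cluster"]))
```

In a real cluster the map calls would run in parallel on the nodes that hold the data, and the grouping step would be handled by the framework rather than by a local dictionary.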
The second course, Hadoop Administration and Cluster Management, starts with installing Apache Hadoop on a cluster and configuring the required services. You will learn various cluster operations such as validation and expanding and shrinking Hadoop services. You will then move on to gain a better understanding of administrative tasks like planning your cluster, monitoring, logging, security, troubleshooting and best practices. Techniques to keep your Hadoop clusters highly available and reliable are also covered in this course.
The third course, Solving 10 Hadoop'able Problems, covers the core parts of the Hadoop ecosystem, giving you a broad understanding and getting you up and running fast. Next, it presents a number of common problems that Hadoop is able to solve as case-study projects. The course is broken down into sections by project, each serving as a specific use case for solving big data problems.
By the end of this Learning Path, you’ll be able to plan, deploy, manage, monitor and performance-tune your Hadoop Cluster with Apache Hadoop.
About the Author
A K M Zahiduzzaman is a software engineer with NewsCred Dhaka. He is a software developer and technology enthusiast. He was a Ruby on Rails developer, but now works with Node.js, AngularJS and Python. He is also working with a much wider vision as a technology company. The next goal is introducing SOA within the current applications to scale development via microservices. Zahiduzzaman has a lot of experience with Spark and is passionate about it. He is also a guitarist and has a band too. He was also a speaker at an international event in Dhaka. He is very enthusiastic and loves to share his knowledge.
Gurmukh Singh is a technology professional with 14+ years of industry experience in infrastructure design, distributed systems, performance optimization, and networks. He has worked in the big data domain for the last 5 years and provides consultancy and training on various technologies. He has worked with companies such as HP, JP Morgan, and Yahoo and has authored the book Monitoring Hadoop.
Tomasz Lelek is a Software Engineer and Co-Founder of InitLearn. He mostly does programming in Java and Scala. He dedicates his time and efforts to getting better at everything. He is currently delving into big data technologies. Tomasz is very passionate about everything associated with software development. He has been a speaker at a few conferences in Poland, including Confitura and JDD, and at the Krakow Scala User Group. He has also conducted a live coding session at Geecon Conference. He was also a speaker at an international event in Dhaka. He is very enthusiastic and loves to share his knowledge.
Overview
Section 1: Getting Started with Hadoop 2.x
Lecture 1 The Course Overview
Lecture 2 Installing Hadoop in Local
Lecture 3 Bring Process to Data
Lecture 4 NameNode Versus DataNode
Lecture 5 Map and Reduce Operations
Lecture 6 Order of Execution and Parallel Thinking
Lecture 7 Formatting a HDFS
Lecture 8 Formatting a HDFS
Lecture 9 Some Helpful Commands to Communicate with the HDFS
Lecture 10 HDFS Protocol and Using It in Applications
Lecture 11 Hadoop Jobs Versus Tasks
Lecture 12 The Hadoop UI for Task Progress
Lecture 13 Running a Couple of Example Jobs
Lecture 14 Analyze the Work Flow/Data Flow/Process Flow
Lecture 15 Introduction to the Movie Dataset
Lecture 16 Data Transformation and Storing to HDFS
Lecture 17 Devise a Simple Algorithm for Recommendation
Lecture 18 Implement the Algorithm in Hadoop Map-Reduce Way and Analyze Performance
Section 2: Hadoop Administration and Cluster Management
Lecture 19 The Course Overview
Lecture 20 Navigation of GitBash
Lecture 21 Navigation of Vagrant
Lecture 22 Navigation of VirtualBox
Lecture 23 Planning a Single Node Setup
Lecture 24 Install Apache Hadoop
Lecture 25 Apache Hadoop Overview
Lecture 26 Hadoop Distributed File System (HDFS)
Lecture 27 YARN Overview
Lecture 28 MapReduce
Lecture 29 Planning Hadoop Services Placement
Lecture 30 Planning ZooKeeper Placement
Lecture 31 Planning HDFS Service Placement
Lecture 32 Planning YARN
Lecture 33 Planning Spark Services
Lecture 34 HDFS Concepts
Lecture 35 HDFS Data Movement
Lecture 36 HDFS Admin Commands
Lecture 37 MapReduce Jobs
Lecture 38 Spark Jobs
Lecture 39 Start/Stop Services
Lecture 40 Manage Cluster Using Ambari
Lecture 41 Hadoop Upgrade
Lecture 42 Scaling Cluster – Part 1
Lecture 43 Scaling Cluster – Part 2
Lecture 44 HDFS Masters
Lecture 45 HA Configuration
Lecture 46 YARN Masters
Lecture 47 Linux ACLs
Lecture 48 HDFS ACLs Security – Part 1
Lecture 49 HDFS ACLs Security – Part 2
Lecture 50 Hadoop Users and Groups
Lecture 51 NameNode UI
Lecture 52 Apache Hadoop Auditing
Lecture 53 Hadoop Metrics
Lecture 54 Hadoop Logs and Monitoring
Lecture 55 Hadoop Troubleshooting – Part 1
Lecture 56 Hadoop Troubleshooting – Part 2
Section 3: Solving 10 Hadoop'able Problems
Lecture 57 The Course Overview
Lecture 58 Hadoop Distributed File System (HDFS)
Lecture 59 Distributed Compute Capability YARN
Lecture 60 Apache Hive for ETL and SQL Like
Lecture 61 Message Queuing and Data Ingestion Kafka
Lecture 62 NoSQL Datastores – Hadoop HBase, Accumulo
Lecture 63 Machine Learning – Spark and Spark MLlib
Lecture 64 Stream Processing – Spark Streaming
Lecture 65 Processing Payment Data from an Event Stream
Lecture 66 Advanced Aggregations Using Streaming API – PaymentAnalyzer
Lecture 67 Storing Time Series Data in HBase
Lecture 68 Detecting BOT Traffic Using Spark Streaming
Lecture 69 Make Web Log Data Queryable – Hive Sink
Lecture 70 Investigating Customers Data in Hive
Lecture 71 Trending Supply Chain – Finding Top Seller Item in a Streaming Way
Lecture 72 Enriching Top Sellers with Additional Information
Lecture 73 Analyzing Customer Churn (Quantitative) Using DataFrame Queries
Lecture 74 Analyzing Customer Churn (Amounts) Using DataFrame Queries
Lecture 75 Storing Low Granularity Structured Sensor Data in HBase
Lecture 76 Consuming Sensor Data Stored in HBase – Scan and Count
Lecture 77 Building Summaries on Data Streaming from Devices
Lecture 78 Introducing Spark GraphX – How to Represent a Graph?
Lecture 79 Perform Graph Operations Using GraphX
Lecture 80 Counting Degree of Vertices
Lecture 81 Neighborhood Aggregations – Collecting Neighbors
Lecture 82 Structural Operators – Connected Components
Lecture 83 Page Rank Using Spark GraphX
Lecture 84 Anomaly Detection
Lecture 85 Analyzing Web Logs for Suspicious Activity and Loading into Spark
Lecture 86 Implementing Clustering – Choosing Number of Clusters
Lecture 87 Detecting Anomalies in Network Traffic
Lecture 88 Analyzing Post for an Author
Lecture 89 Extracting Information from Unstructured Text
Lecture 90 Extracting Information Via Spark DataFrame
Lecture 91 Sentiment Analysis of Posts Using Logistic Regression
Lecture 92 Finding an Author of a Post
Lecture 93 Downloading and Setting Cloudera Sandbox
Lecture 94 Finding What Products Users Want to Buy Using Cloudera Sandbox Toolkit
Lecture 95 Using Movies History to Suggest Interesting Content
Lecture 96 Testing and Experimenting with Recommendation Engine
This course is perfect for budding data scientists and data analysts who have a firm understanding of Java and want to get started with Hadoop.