O'Reilly - Hadoop Fundamentals for Data Scientists [repost]

Posted By: ParRus
O'Reilly - Hadoop Fundamentals for Data Scientists
WEBRip | English | MP4 | 1920 x 1080 | AVC ~3919 kbps | AAC ~125 kbps | 48.0 kHz | 1 ch | 05:50:53 | 7.72 GB
Genre: Video Tutorial / Computer Science, Software Engineering, Statistics and Data Analysis

Get a practical introduction to Hadoop, the framework that made big data and large-scale analytics possible by combining distributed computing techniques with distributed storage. In this video tutorial, hosts Benjamin Bengfort and Jenny Kim discuss the core concepts behind distributed computing and big data, and then show you how to work with a Hadoop cluster and program analytical jobs. You'll also learn how to use higher-level tools such as Hive and Spark.
Hadoop is a cluster computing technology that has many moving parts, including distributed systems administration, data engineering and warehousing methodologies, software engineering for distributed computing, and large-scale analytics. With this video, you'll learn how to operationalize analytics over large datasets and rapidly deploy analytical jobs with a variety of toolsets. Once you've completed this video, you'll understand how different parts of Hadoop combine to form an entire data pipeline managed by teams of data engineers, data programmers, data researchers, and data business people.
- Understand the Hadoop architecture and set up a pseudo-distributed development environment
- Learn how to develop distributed computations with MapReduce and the Hadoop Distributed File System (HDFS)
- Work with Hadoop via the command-line interface
- Use the Hadoop Streaming utility to execute MapReduce jobs in Python
- Explore data warehousing, higher-order data flows, and other projects in the Hadoop ecosystem
- Learn how to use Hive to query and analyze relational data using Hadoop
- Use summarization, filtering, and aggregation to move Big Data towards last mile computation
- Understand how analytical workflows, including iterative machine learning, feature analysis, and data modeling, work in a Big Data context
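
To make the MapReduce and Hadoop Streaming items above more concrete, here is a minimal word-count sketch in Python. It is not taken from the course materials; the function names `mapper`, `reducer`, and `run_local` are hypothetical, and the shuffle-and-sort step that Hadoop performs between the two phases is only simulated locally with `sorted`.

```python
from itertools import groupby
from operator import itemgetter


def mapper(lines):
    """Map phase: emit a (word, 1) pair for every whitespace-separated token."""
    for line in lines:
        for word in line.strip().lower().split():
            yield word, 1


def reducer(pairs):
    """Reduce phase: sum the counts per word. Pairs must arrive sorted by key."""
    for word, group in groupby(pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def run_local(lines):
    """Local stand-in for a job: sorting the mapper output plays the role of
    the shuffle-and-sort step (an assumption made for illustration only)."""
    return list(reducer(sorted(mapper(lines))))


counts = run_local(["the quick fox", "The fox"])
print(counts)
```

In an actual Hadoop Streaming job the mapper and reducer would typically be two separate scripts that read lines from standard input and write tab-separated key/value lines to standard output; the local version above only illustrates the data flow.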

Benjamin Bengfort is a data scientist and programmer in Washington, DC who prefers technology to politics but sees the value of data in every domain. Alongside his work teaching, writing, and developing large-scale analytics with a focus on statistical machine learning, he is finishing his PhD at the University of Maryland, where he studies machine learning and artificial intelligence. Jenny Kim, a software engineer in the San Francisco Bay Area, develops, teaches, and writes about big data analytics applications and specializes in large-scale, distributed computing infrastructures and machine-learning algorithms to support recommendation systems.

01. Hadoop Fundamentals For Data Scientists
0101 Overview Of The Video Course

02. A Distributed Computing Environment
0201 The Motivation For Hadoop
0202 A Brief History Of Hadoop
0203 Understanding The Hadoop Architecture
0204 Setting Up A Pseudo-Distributed Environment
0205 The Distributed File System - HDFS
0206 Distributed Computing With MapReduce
0207 Word Count - The Hello World Of Hadoop

03. Computing With Hadoop
0301 How A MapReduce Job Works
0302 Mappers And Reducers In Detail
0303 Working With Hadoop Via The Command Line - Starting HDFS And Yarn
0304 Working With Hadoop Via The Command Line - Loading Data Into HDFS
0305 Working With Hadoop Via The Command Line - Running A MapReduce Job
0306 How To Use Our Github Goodies
0307 Working In Python With Hadoop Streaming
0308 Common MapReduce Tasks
0309 Spark on Hadoop 2
0310 Creating A Spark Application With Python

04. The Hadoop Ecosystem
0401 The Hadoop Ecosystem
0402 Data Warehousing With Hadoop
0403 Higher Order Data Flows
0404 Other Notable Projects

05. Working With Data On Hive
0501 Introduction To Hive
0502 Interacting With Data Via The Hive Console
0503 Creating Databases, Tables, And Schemas For Hive
0504 Loading Data Into Hive From HDFS
0505 Querying Data And Performing Aggregations With Hive

06. Towards Last Mile Computing
0601 Decomposing Large Data Sets To A Computational Space
0602 Linear Regressions
0603 Summarizing Documents With TF-IDF
0604 Classification Of Text
0605 Parallel Canopy Clustering
0606 Computing Recommendations Via Linear Log-Likelihoods

General
Complete name : part12.mp4
Format : MPEG-4
Format profile : Base Media / Version 2
Codec ID : mp42 (mp42/mp41)
File size : 229 MiB
Duration : 7mn 55s
Overall bit rate mode : Variable
Overall bit rate : 4 049 Kbps
Encoded date : UTC 2015-01-08 20:45:22
Tagged date : UTC 2015-01-08 20:45:43
©TIM : 00;00;00;00
©TSC : 30000
©TSZ : 1001

Video
ID : 1
Format : AVC
Format/Info : Advanced Video Codec
Format profile : High@L4
Format settings, CABAC : Yes
Format settings, ReFrames : 3 frames
Codec ID : avc1
Codec ID/Info : Advanced Video Coding
Duration : 7mn 55s
Bit rate mode : Variable
Bit rate : 3 919 Kbps
Maximum bit rate : 6 000 Kbps
Width : 1 920 pixels
Height : 1 080 pixels
Display aspect ratio : 16:9
Frame rate mode : Constant
Frame rate : 29.970 (30000/1001) fps
Standard : NTSC
Color space : YUV
Chroma subsampling : 4:2:0
Bit depth : 8 bits
Scan type : Progressive
Bits/(Pixel*Frame) : 0.063
Stream size : 222 MiB (97%)
Language : English
Encoded date : UTC 2015-01-08 20:45:22
Tagged date : UTC 2015-01-08 20:45:22
Color range : Limited
Color primaries : BT.709
Transfer characteristics : BT.709
Matrix coefficients : BT.709

Audio
ID : 2
Format : AAC
Format/Info : Advanced Audio Codec
Format profile : LC
Codec ID : 40
Duration : 7mn 55s
Source duration : 7mn 55s
Bit rate mode : Variable
Bit rate : 125 Kbps
Maximum bit rate : 191 Kbps
Channel(s) : 1 channel
Channel positions : Front: C
Sampling rate : 48.0 KHz
Frame rate : 46.875 fps (1024 spf)
Compression mode : Lossy
Stream size : 7.10 MiB (3%)
Source stream size : 7.10 MiB (3%)
Language : English
Encoded date : UTC 2015-01-08 20:45:22
Tagged date : UTC 2015-01-08 20:45:22
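
As a rough sanity check on the Video figures above, the short sketch below recomputes Bits/(Pixel*Frame) and the video stream size from the listed bit rate, resolution, frame rate, and duration. It assumes that "Kbps" means 1000 bits per second and that the duration is exactly 7 min 55 s (the listing rounds to whole seconds), so the results are approximate. The function and variable names are illustrative only.

```python
def bits_per_pixel_frame(bit_rate_bps, width, height, fps):
    """Average number of bits spent on each pixel of each frame."""
    return bit_rate_bps / (width * height * fps)


# Values copied from the Video block; Kbps is assumed to mean 1000 bit/s.
video_bit_rate_bps = 3919 * 1000
fps = 30000 / 1001            # 29.970 fps
duration_s = 7 * 60 + 55      # 7mn 55s, assumed exact

bppf = bits_per_pixel_frame(video_bit_rate_bps, 1920, 1080, fps)
stream_mib = video_bit_rate_bps * duration_s / 8 / 2**20

print(round(bppf, 3))     # compare with Bits/(Pixel*Frame) : 0.063
print(round(stream_mib))  # compare with Stream size : 222 MiB
```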
