Build Real World Big Data Projects
Published 12/2022
MP4 | Video: h264, 1280x720 | Audio: AAC, 44.1 KHz
Language: English | Size: 4.08 GB | Duration: 5h 36m
Work with Big Data Tools, SQL Databases, AWS, ETL, Data Integration Tools & more to master real-world Big Data Projects
What you'll learn
How to Build a Scalable Data Pipeline using various Components
Data Warehouse Design
Data Preparation, Cleaning, Data Transformation and Manipulation
Industry-ready projects
Requirements
Prior knowledge of SQL, programming basics, data pipelines, and ETL concepts is beneficial.
Description
A real data engineering project usually involves multiple components, and setting one up while conforming to best practices can be extremely time-consuming.
This course is for you if you are a data analyst, student, scientist, or engineer who wants data engineering experience but cannot find a good starter project, or if you are:
1. Looking to work on a data engineering project that simulates a real-life project.
2. Looking for an end-to-end data engineering project.
3. Looking for a good project that builds data engineering experience for job interviews.
In this course, you will:
Learn how to set up data infrastructure such as Airflow, Redshift, Snowflake, etc.
Learn data pipeline best practices.
Learn how to spot failure points in data pipelines and build systems resistant to failures.
Learn how to design and build a data pipeline from business requirements.
Learn how to build an end-to-end ETL pipeline.
Set up Apache Airflow, AWS EMR, AWS Redshift, Redshift Spectrum, and AWS S3.
Tech stack:
➔ Language: Python
➔ Package: PySpark
➔ Services: Docker, Kafka, Amazon Redshift, S3, IICS, dbt, and many more
Requirements
This course presumes that students have prior knowledge of AWS or its Big Data services.
Having a fair understanding of Python and SQL would help, but it is not mandatory.
New projects will be added every month.
Overview
Section 1: Build ETL Data Pipeline on AWS EMR Cluster
Lecture 1 Exploration of the dataset
Lecture 2 Creating EMR Cluster
Lecture 3 Log in to EMR part 1
Lecture 4 Log in to EMR part 2
Lecture 5 Upload Data into Amazon S3
Lecture 6 Using Hive as an ETL Tool
Lecture 7 Hive Data Insertion
Lecture 8 Install Tableau Desktop
Lecture 9 Install Driver
Lecture 10 Connect Tableau to Amazon EMR Hive
Lecture 11 Add data schema and Table
Lecture 12 plot charts in Tableau part 1
Lecture 13 plot charts in Tableau part 2
Lecture 14 plot charts in Tableau part 3
Lecture 15 plot charts in Tableau part 4
Lecture 16 plot charts in Tableau part 5
Lecture 17 Building Dashboard and story
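Section 1 follows a classic extract-transform-load (ETL) shape: raw files land in S3, Hive on EMR cleans and loads them into tables, and Tableau reads the result. The three stages can be sketched in plain Python, assuming an in-memory CSV string stands in for the S3 object and a dictionary stands in for the Hive summary table; the columns `region`, `product`, and `sales` are hypothetical and not taken from the course dataset.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw file contents (stand-in for an object stored in S3).
RAW_CSV = """region,product,sales
North,Widget,100
South,Widget,
North,Gadget,250
south,Gadget,50
"""

def extract(text):
    """Extract: parse raw CSV rows into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop rows with missing sales and normalize region casing."""
    clean = []
    for row in rows:
        if not row["sales"]:
            continue  # skip incomplete records instead of loading bad data
        clean.append({
            "region": row["region"].strip().title(),
            "product": row["product"],
            "sales": int(row["sales"]),
        })
    return clean

def load(rows):
    """Load: write per-region totals into a summary table (here, a dict)."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["region"]] += row["sales"]
    return dict(totals)

if __name__ == "__main__":
    print(load(transform(extract(RAW_CSV))))
```

The transform step runs before anything is loaded, which is what distinguishes ETL from the load-first approach used with dbt.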
Section 2: Build Modern ETL Data Pipeline using IICS
Lecture 18 Tour to Architecture diagram
Lecture 19 Exploration of the dataset
Lecture 20 Upload data to AWS S3
Lecture 21 Create PostgreSQL in AWS
Lecture 22 Download pgAdmin
Lecture 23 Set up PostgreSQL and create schemas
Lecture 24 Create schemas and order table in your Postgres instance
Lecture 25 Set up Informatica Cloud account
Lecture 26 Add S3 Connection part 1
Lecture 27 Add S3 Connection part 2
Lecture 28 Add Postgres Connection
Lecture 29 Create customer Destination in Data Warehouse
Lecture 30 EL from AWS S3 to Data Warehouse
Lecture 31 Create order Destination in Data Warehouse
Lecture 32 EL from App Database to Data Warehouse
Lecture 33 Create dbt account
Lecture 34 dbt part 1
Lecture 35 dbt part 2
Lecture 36 dbt part 3
Lecture 37 dbt part 4
Lecture 38 dbt part 5
Lecture 39 dbt part 6
Lecture 40 dbt part 7
Lecture 41 dbt part 8
Lecture 42 Tableau and Postgres setup
Lecture 43 How to Build charts in Tableau
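Judging by the lecture titles, Section 2 uses an extract-load-then-transform (ELT) flow: IICS lands raw customer and order data in the warehouse first, and dbt then builds models on top of those raw tables. A minimal sketch of that idea, assuming an in-memory SQLite database stands in for the Postgres warehouse and a plain SQL view stands in for a dbt model; the table and column names are hypothetical.

```python
import sqlite3

def build_warehouse(customers, orders):
    """Load raw rows unchanged, then derive a modeled view on top (ELT sketch)."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    # "EL" step: land source data as-is in raw tables (hypothetical schema).
    cur.execute("CREATE TABLE raw_customers (customer_id INTEGER, name TEXT)")
    cur.execute("CREATE TABLE raw_orders (order_id INTEGER, customer_id INTEGER, amount REAL)")
    cur.executemany("INSERT INTO raw_customers VALUES (?, ?)", customers)
    cur.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", orders)
    # "T" step: a SQL model over the raw tables, similar in spirit to a dbt model.
    cur.execute("""
        CREATE VIEW customer_revenue AS
        SELECT c.customer_id,
               c.name,
               COUNT(o.order_id) AS order_count,
               COALESCE(SUM(o.amount), 0) AS total_amount
        FROM raw_customers AS c
        LEFT JOIN raw_orders AS o ON o.customer_id = c.customer_id
        GROUP BY c.customer_id, c.name
    """)
    conn.commit()
    return conn

if __name__ == "__main__":
    conn = build_warehouse(
        customers=[(1, "Alice"), (2, "Bob")],
        orders=[(10, 1, 25.0), (11, 1, 15.5)],
    )
    for row in conn.execute("SELECT * FROM customer_revenue ORDER BY customer_id"):
        print(row)
```

Keeping the raw tables untouched means a transformation can be changed and rebuilt without re-extracting from the sources, which is the main appeal of the ELT pattern.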
Section 3: Create A Data Pipeline based on Messaging Using PySpark and Airflow
Lecture 44 Tour to Architecture diagram
Lecture 45 Create EC2 Instance
Lecture 46 SSH into EC2 Instance
Lecture 47 Environment setup with Docker
Lecture 48 Copy the important folder from local machine to EC2 and give required permissions
Lecture 49 Connect to different services locally after port forwarding
Lecture 50 Get into the bash shell of different containers
Lecture 51 Insert NiFi Template
Lecture 52 Data Extraction with NiFi
Lecture 53 Data encryption parsing
Lecture 54 Data sources HDFS, Kafka part 1
Lecture 55 Data sources HDFS, Kafka part 2
Lecture 56 Data sources HDFS, Kafka part 3
Lecture 57 Streaming data from Kafka to PySpark
Lecture 58 PySpark streaming output to Kafka, NiFi, HDFS part 1
Lecture 59 PySpark streaming output to Kafka, NiFi, HDFS part 2
Lecture 60 Move Data from HDFS to Hive Table part 1
Lecture 61 Move Data from HDFS to Hive Table part 2
Lecture 62 Dataflow Orchestration with Airflow part 1
Lecture 63 Dataflow Orchestration with Airflow part 2
Lecture 64 Connecting with Data Visualization Tool
Lecture 65 Plot charts
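Section 3 decouples ingestion (NiFi) from processing (PySpark) through a message broker (Kafka), with results stored in HDFS/Hive and the flow orchestrated by Airflow. The producer/consumer decoupling at the heart of that design can be sketched in plain Python, assuming a standard-library `queue.Queue` stands in for a Kafka topic and a list stands in for the HDFS/Hive sink; the record fields and function names are hypothetical, and this is not the course's PySpark code.

```python
import json
import queue
import threading

SENTINEL = None  # end-of-stream marker (a real Kafka topic has no such marker)

def producer(topic, records):
    """Publish each record as a JSON message, like an ingestion step writing to a topic."""
    for record in records:
        topic.put(json.dumps(record))
    topic.put(SENTINEL)

def consumer(topic, sink):
    """Read messages, parse and transform them, then append results to the sink."""
    while True:
        message = topic.get()
        if message is SENTINEL:
            break
        event = json.loads(message)
        # Example transformation: drop events without an amount, convert to cents.
        if "amount" not in event:
            continue
        sink.append({"user": event["user"], "amount_cents": round(event["amount"] * 100)})

def run_pipeline(records):
    topic = queue.Queue(maxsize=100)  # bounded buffer: producer blocks if consumer lags
    sink = []
    worker = threading.Thread(target=consumer, args=(topic, sink))
    worker.start()
    producer(topic, records)
    worker.join()
    return sink

if __name__ == "__main__":
    events = [{"user": "a", "amount": 1.25}, {"user": "b"}, {"user": "c", "amount": 3.0}]
    print(run_pipeline(events))
```

Because the producer only knows about the topic and the consumer only knows about the topic and sink, either side can be replaced or scaled without changing the other, which is the property the Kafka-based architecture relies on.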
Who this course is for
People with some software background who want to learn new technologies in big data analysis. This course focuses on various big data tools; we introduce some data engineering and data science concepts along the way, but they are not the focus. If you want to learn how to build data engineering projects, this course is for you.
Data analysts and data engineers who are curious about big data tools and how they relate to their work.