
Azure Databricks & Spark Core For Data Engineers(Python/SQL)

Posted By: ParRus

WEBRip | English | MP4 | 1280 x 720 | AVC ~582 Kbps | 30 fps
AAC | 128 Kbps | 44.1 KHz | 2 channels | Subs: English (.srt) | ~15 hours | 4.88 GB
Genre: eLearning Video / IT & Software, Other IT & Software, Databricks

Real World Project on Formula1 Racing for Data Engineers using Azure Databricks, Delta Lake, Azure Data Factory [DP203]
What you'll learn
You will learn how to build a real-world data project using Azure Databricks and Spark Core. The course is taught using real-world data from Formula1 motor racing
You will acquire professional level data engineering skills in Azure Databricks, Delta Lake, Spark Core, Azure Data Lake Gen2 and Azure Data Factory (ADF)
You will learn how to create notebooks, dashboards, clusters, cluster pools and jobs in Azure Databricks
You will learn how to ingest and transform data using PySpark in Azure Databricks
You will learn how to transform and analyse data using Spark SQL in Azure Databricks
You will learn about Data Lake architecture and Lakehouse architecture. Also, you will learn how to implement a solution for Lakehouse architecture using Delta Lake.
You will learn how to create Azure Data Factory pipelines to execute Databricks notebooks
You will learn how to create Azure Data Factory triggers to schedule pipelines as well as monitor them.
You will gain the skills required around Azure Databricks and Data Factory to pass the Azure Data Engineer Associate certification exam DP203, but the primary objective of the course is not to teach you to pass the exams.
You will learn how to connect to Azure Databricks from Power BI to create reports

Requirements
All the code and step-by-step instructions are provided, but the skills below will greatly benefit your journey
Basic Python programming experience will be required
Basic SQL knowledge will be required
Knowledge of cloud fundamentals will be beneficial, but not necessary
An Azure subscription is required; if you don't have one, we will create a free account in the course

Description
Welcome!

I am looking forward to helping you learn one of the in-demand data engineering tools in the cloud, Azure Databricks! The course is taught by implementing a data engineering solution using Azure Databricks and Spark Core for a real-world project that analyses and reports on Formula1 motor racing data.

This is like no other course on Udemy for Azure Databricks. Once you have completed the course, including all the assignments, I strongly believe that you will be in a position to start a real-world data engineering project on your own and will be proficient in Azure Databricks. I have also included lessons on Azure Data Lake Storage Gen2, Azure Data Factory and Power BI. The primary focus of the course is Azure Databricks and Spark Core, but it also covers the relevant concepts and connectivity to the other technologies mentioned. Please note that the course doesn't cover other aspects of Spark such as Spark Streaming and Spark ML. Also, the course is taught using PySpark and Spark SQL; it doesn't cover Scala or Java.

The course follows the logical progression of a real-world project implementation, with technical concepts explained and the Databricks notebooks built at the same time. Even though this course is not specifically designed to teach you the skills required for passing the Azure Data Engineer Associate Certification Exam DP203, it can help you gain most of the skills required for the exam.

I value your time as much as I do mine, so I have designed this course to be fast-paced and to the point. The course is taught in simple English with no jargon. I start from the basics, and by the end of the course you will be proficient in the technologies used.

Currently, the course teaches you the following:

Azure Databricks

Building a solution architecture for a data engineering solution using Azure Databricks, Azure Data Lake Gen2, Azure Data Factory and Power BI

Creating and using Azure Databricks service and the architecture of Databricks within Azure

Working with Databricks notebooks as well as using Databricks utilities, magic commands, etc.

Passing parameters between notebooks as well as creating notebook workflows

Creating, configuring and monitoring Databricks clusters, cluster pools and jobs

Mounting Azure Storage in Databricks using secrets stored in Azure Key Vault

Working with Databricks Tables, Databricks File System (DBFS), etc.

Using Delta Lake to implement a solution using Lakehouse architecture

Creating dashboards to visualise the outputs

Connecting to the Azure Databricks tables from Power BI

Spark (Only PySpark and SQL)

Spark architecture, Data Sources API and Dataframe API

PySpark - Ingestion of CSV, simple and complex JSON files into the data lake as Parquet files/tables.

PySpark - Transformations such as Filter, Join, Simple Aggregations, GroupBy, Window functions etc.

PySpark - Creating local and temporary views

Spark SQL - Creating databases, tables and views

Spark SQL - Transformations such as Filter, Join, Simple Aggregations, GroupBy, Window functions etc.

Spark SQL - Creating local and temporary views

Implementing full refresh and incremental load patterns using partitions
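
The difference between the two load patterns named above can be sketched without a cluster. The following is a minimal plain-Python simulation, not Spark code and not the course's implementation: a "table" is modelled as a dictionary of partitions, and the function names, the `race_id` partition column and the sample rows are all hypothetical. A full refresh rebuilds the whole table from the incoming batch, while an incremental load replaces only the partitions that appear in the batch.

```python
from collections import defaultdict


def partition_rows(rows, key):
    """Group rows (dicts) by the value of the partition column."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[key]].append(row)
    return dict(parts)


def full_refresh(table, batch, key):
    """Discard the existing table and rebuild it entirely from the batch."""
    return partition_rows(batch, key)


def incremental_load(table, batch, key):
    """Overwrite only the partitions present in the batch; keep the rest."""
    updated = dict(table)  # shallow copy so the input table is not mutated
    updated.update(partition_rows(batch, key))
    return updated


# Hypothetical data: race results partitioned by race_id.
table = partition_rows(
    [
        {"race_id": 1, "driver": "A", "points": 25},
        {"race_id": 2, "driver": "B", "points": 18},
    ],
    key="race_id",
)
batch = [
    {"race_id": 2, "driver": "B", "points": 25},  # corrected result
    {"race_id": 3, "driver": "C", "points": 15},  # new race
]

after_incremental = incremental_load(table, batch, "race_id")
after_full = full_refresh(table, batch, "race_id")
```

After the incremental load, race 1 is untouched, race 2 is replaced and race 3 is added; after the full refresh, only the races in the batch remain. Because whole partitions are replaced, reloading the same batch gives the same result, which is one reason the pattern is commonly used for reruns.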

Delta Lake

Emergence of Data Lakehouse architecture and the role of Delta Lake.

Read, Write, Update, Delete and Merge to Delta Lake using both PySpark and SQL

History, Time Travel and Vacuum

Converting Parquet files to Delta files

Implementing the incremental load pattern using Delta Lake
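
To make the merge and time travel ideas in this list concrete, here is a toy in-memory model in plain Python. It is only a sketch of the semantics under simplifying assumptions and does not use the Delta Lake API: the `VersionedTable` class, its method names and the `driver_id` key are hypothetical. A merge updates rows whose key already exists and inserts the rest, and each merge commits a new version so that earlier versions can still be read.

```python
import copy


class VersionedTable:
    """Toy table keyed by a single column that keeps every committed version."""

    def __init__(self):
        self._versions = [{}]  # version 0 is the empty table

    def current(self):
        """Return the latest version as a dict of key -> row."""
        return self._versions[-1]

    def merge(self, source_rows, key):
        """Upsert: update rows whose key matches, insert the others.

        Returns the number of the newly committed version.
        """
        snapshot = copy.deepcopy(self.current())
        for row in source_rows:
            snapshot[row[key]] = dict(row)
        self._versions.append(snapshot)
        return len(self._versions) - 1

    def as_of(self, version):
        """Read the table as it was at the given version ("time travel")."""
        return self._versions[version]


# Hypothetical driver standings loaded in two batches.
standings = VersionedTable()
v1 = standings.merge(
    [{"driver_id": 1, "points": 10}, {"driver_id": 2, "points": 8}],
    key="driver_id",
)
v2 = standings.merge(
    [{"driver_id": 2, "points": 20}, {"driver_id": 3, "points": 5}],
    key="driver_id",
)

latest_points = standings.current()[2]["points"]
earlier_points = standings.as_of(v1)[2]["points"]
```

Here the second merge updates driver 2 and inserts driver 3, while `as_of(v1)` still returns the earlier points for driver 2. The real system records changes in a transaction log rather than copying the whole table for every version; the full copy is used here only to keep the sketch short.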

Azure Data Factory

Creating pipelines to execute Databricks notebooks

Designing robust pipelines to deal with unexpected scenarios such as missing files

Creating dependencies between activities as well as pipelines

Scheduling the pipelines using data factory triggers to execute at regular intervals

Monitoring the triggers/pipelines to check for errors/outputs.

Who this course is for:
University students looking for a career in Data Engineering
IT developers working on other disciplines trying to move to Data Engineering
Data Engineers/Data Warehouse Developers currently working on on-premises technologies, or on other cloud platforms such as AWS or GCP, who want to learn Azure Data Technologies
Data Architects looking to gain an understanding of the Azure Data Engineering stack


General
Complete name : 004 Databricks Utilities.mp4
Format : MPEG-4
Format profile : Base Media
Codec ID : isom (isom/iso2/avc1/mp41)
File size : 61.4 MiB
Duration : 11 min 56 s
Overall bit rate : 719 kb/s
Writing application : Lavf58.12.100

Video
ID : 1
Format : AVC
Format/Info : Advanced Video Codec
Format profile : Main@L3.1
Format settings : CABAC / 4 Ref Frames
Format settings, CABAC : Yes
Format settings, RefFrames : 4 frames
Format settings, GOP : M=4, N=60
Codec ID : avc1
Codec ID/Info : Advanced Video Coding
Duration : 11 min 56 s
Bit rate : 582 kb/s
Nominal bit rate : 3 000 kb/s
Width : 1 280 pixels
Height : 720 pixels
Display aspect ratio : 16:9
Frame rate mode : Constant
Frame rate : 30.000 FPS
Color space : YUV
Chroma subsampling : 4:2:0
Bit depth : 8 bits
Scan type : Progressive
Bits/(Pixel*Frame) : 0.021
Stream size : 49.7 MiB (81%)
Writing library : x264 core 148
Encoding settings : cabac=1 / ref=3 / deblock=1:0:0 / analyse=0x1:0x111 / me=umh / subme=6 / psy=1 / psy_rd=1.00:0.00 / mixed_ref=1 / me_range=16 / chroma_me=1 / trellis=1 / 8x8dct=0 / cqm=0 / deadzone=21,11 / fast_pskip=1 / chroma_qp_offset=-2 / threads=22 / lookahead_threads=3 / sliced_threads=0 / nr=0 / decimate=1 / interlaced=0 / bluray_compat=0 / constrained_intra=0 / bframes=3 / b_pyramid=2 / b_adapt=1 / b_bias=0 / direct=1 / weightb=1 / open_gop=0 / weightp=2 / keyint=60 / keyint_min=6 / scenecut=0 / intra_refresh=0 / rc_lookahead=60 / rc=cbr / mbtree=1 / bitrate=3000 / ratetol=1.0 / qcomp=0.60 / qpmin=0 / qpmax=69 / qpstep=4 / vbv_maxrate=3000 / vbv_bufsize=6000 / nal_hrd=none / filler=0 / ip_ratio=1.40 / aq=1:1.00

Audio
ID : 2
Format : AAC
Format/Info : Advanced Audio Codec
Format profile : LC
Codec ID : mp4a-40-2
Duration : 11 min 56 s
Bit rate mode : Constant
Bit rate : 128 kb/s
Channel(s) : 2 channels
Channel positions : Front: L R
Sampling rate : 44.1 kHz
Frame rate : 43.066 FPS (1024 SPF)
Compression mode : Lossy
Stream size : 10.9 MiB (18%)
Default : Yes
Alternate group : 1
