    Apache Iceberg: Complete Hands-On Masterclass

    Posted By: lucky_aut
    Published 7/2025
    Duration: 3h 22m | .MP4 1920x1080 30 fps(r) | AAC, 44100 Hz, 2ch | 2.07 GB
    Genre: eLearning | Language: English

    Master Apache Iceberg with hands-on labs using PySpark, Databricks, and Google Colab—no setup or data needed

    What you'll learn
    - Learn why Apache Iceberg is redefining modern data lakes, how it addresses the limitations of Hive-style tables, and how it compares with Delta Lake and Hudi.
    - Set up Iceberg in Databricks & Google Colab—no local install or cloud budget needed—and start building hands-on from your browser.
    - Perform DDL and DML operations (insert, update, delete) on Iceberg tables and explore internal metadata structures such as snapshots, manifests, and partitions.
    - Master Iceberg’s time travel and metadata tables to build scalable, version-controlled, cost-efficient data pipelines.
    - Understand Iceberg’s architecture under the hood: how it handles schema and partition evolution and how it decouples compute from storage.
    - Explore real-world debugging, rollback, and auditing use cases using Iceberg’s powerful snapshot and metadata tracking system.
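    To make the snapshot, rollback, and auditing ideas concrete, here is a minimal pure-Python toy model, assuming a deliberately simplified view in which a snapshot is only an ID, a parent pointer, and a set of data-file names. `ToyTable`, `Snapshot`, and their methods are hypothetical illustrations, not Iceberg's API; real Iceberg tracks files through manifest lists and manifests, and in Spark exposes these operations through SQL such as `VERSION AS OF` queries and the `rollback_to_snapshot` procedure.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Iterable, List, Optional

# Toy model only: these classes are hypothetical and are NOT the Iceberg API.


@dataclass(frozen=True)
class Snapshot:
    """An immutable table version: its ID, its parent's ID, and the data files it references."""
    snapshot_id: int
    parent_id: Optional[int]
    data_files: FrozenSet[str]


@dataclass
class ToyTable:
    snapshots: Dict[int, Snapshot] = field(default_factory=dict)
    current_id: Optional[int] = None
    next_id: int = 1

    def _files(self, snapshot_id: Optional[int]) -> FrozenSet[str]:
        if snapshot_id is None:
            return frozenset()
        return self.snapshots[snapshot_id].data_files

    def commit(self, added: Iterable[str] = (), removed: Iterable[str] = ()) -> int:
        """Derive a new snapshot from the current one; existing snapshots are never modified."""
        files = (self._files(self.current_id) - frozenset(removed)) | frozenset(added)
        snap = Snapshot(self.next_id, self.current_id, files)
        self.snapshots[snap.snapshot_id] = snap
        self.current_id = snap.snapshot_id
        self.next_id += 1
        return snap.snapshot_id

    def read(self, snapshot_id: Optional[int] = None) -> FrozenSet[str]:
        """Read the current snapshot, or an older one when an ID is given (time travel)."""
        return self._files(self.current_id if snapshot_id is None else snapshot_id)

    def rollback_to(self, snapshot_id: int) -> None:
        """Make an earlier snapshot current again; later snapshots remain in the table's history."""
        if snapshot_id not in self.snapshots:
            raise KeyError(snapshot_id)
        self.current_id = snapshot_id

    def history(self) -> List[int]:
        """Follow parent links from the current snapshot back to the first one (for auditing)."""
        lineage: List[int] = []
        sid = self.current_id
        while sid is not None:
            lineage.append(sid)
            sid = self.snapshots[sid].parent_id
        return lineage


table = ToyTable()
s1 = table.commit(added=["a.parquet"])
s2 = table.commit(added=["b.parquet"])
# e.g. a copy-on-write delete that rewrote a.parquet as c.parquet
s3 = table.commit(added=["c.parquet"], removed=["a.parquet"])

print(sorted(table.read()))    # ['b.parquet', 'c.parquet']
print(sorted(table.read(s1)))  # ['a.parquet']  (time travel to snapshot 1)
table.rollback_to(s2)
print(sorted(table.read()))    # ['a.parquet', 'b.parquet']
print(table.history())         # [2, 1]  (snapshot 3 still exists but is no longer current)
```

    Because each commit produces a new snapshot and never mutates an old one, reading an older version and rolling back both reduce to choosing which snapshot ID to treat as current.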

    Requirements
    - Basic understanding of Python programming is helpful.
    - Some familiarity with PySpark is recommended.
    - Familiarity with big data file formats like Parquet or ORC is useful but not mandatory.
    - No need for cloud subscriptions or local installations — Databricks Community Edition and Google Colab are used (both are free).

    Description
    This course offers a practical, hands-on introduction to Apache Iceberg, the modern open table format designed for today’s large-scale data lakes and lakehouses. Whether you’re a data engineer, developer, or architect, this course will help you understand and apply Iceberg concepts through real-world exercises, without the need for any infrastructure setup.

    You’ll learn to create, query, and manage Iceberg tables using PySpark in both Databricks Community Edition and Google Colab, two free platforms accessible from your browser. Topics include table formats, DDL and DML operations, partition evolution, schema evolution, metadata tables, and Iceberg’s time travel capability.

    All code and sample data are provided chapter by chapter. You’ll generate data on the fly, inspect table structures, and compare metadata files using VS Code and online JSON viewers. No local installation and no external datasets are required; the focus stays on clear, interactive learning.
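    The metadata files compared in the course are JSON documents. As a minimal sketch, assuming a trimmed, hypothetical `*.metadata.json` (real files carry many more fields, such as `schemas`, `partition-specs`, and per-snapshot `manifest-list`), the snippet below lists snapshots and resolves which snapshot was current at a given time by scanning the `snapshot-log`, which is the idea behind timestamp-based time travel. The JSON field names follow the Iceberg table spec; `list_snapshots` and `snapshot_as_of` are hypothetical helpers.

```python
import json
from typing import List, Optional, Tuple

# Trimmed, hypothetical metadata file: only the fields used below are included.
SAMPLE_METADATA = """
{
  "format-version": 2,
  "current-snapshot-id": 3,
  "snapshots": [
    {"snapshot-id": 1, "timestamp-ms": 1000, "summary": {"operation": "append"}},
    {"snapshot-id": 2, "parent-snapshot-id": 1, "timestamp-ms": 2000, "summary": {"operation": "append"}},
    {"snapshot-id": 3, "parent-snapshot-id": 2, "timestamp-ms": 3000, "summary": {"operation": "delete"}}
  ],
  "snapshot-log": [
    {"snapshot-id": 1, "timestamp-ms": 1000},
    {"snapshot-id": 2, "timestamp-ms": 2000},
    {"snapshot-id": 3, "timestamp-ms": 3000}
  ]
}
"""


def list_snapshots(metadata: dict) -> List[Tuple[int, Optional[str]]]:
    """Return (snapshot-id, operation) pairs ordered by commit time."""
    snaps = sorted(metadata.get("snapshots", []), key=lambda s: s["timestamp-ms"])
    return [(s["snapshot-id"], s.get("summary", {}).get("operation")) for s in snaps]


def snapshot_as_of(metadata: dict, timestamp_ms: int) -> Optional[int]:
    """Return the snapshot that was current at timestamp_ms according to the snapshot log."""
    result = None
    log = sorted(metadata.get("snapshot-log", []), key=lambda e: e["timestamp-ms"])
    for entry in log:
        if entry["timestamp-ms"] > timestamp_ms:
            break  # later entries describe changes after the requested time
        result = entry["snapshot-id"]
    return result  # None means the table had no current snapshot yet


metadata = json.loads(SAMPLE_METADATA)
print(list_snapshots(metadata))        # [(1, 'append'), (2, 'append'), (3, 'delete')]
print(snapshot_as_of(metadata, 2500))  # 2
print(snapshot_as_of(metadata, 500))   # None
```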

    What You’ll Learn

    Key differences between file formats and table formats in big data

    How to create and manage Apache Iceberg tables using PySpark

    Comparing Hive tables and Iceberg with practical demos

    Running Iceberg on Databricks and Google Colab (setup included)

    Performing DDL and DML operations (insert, update, delete)

    Using Iceberg’s built-in metadata tables to inspect file-level and snapshot info

    Implementing time travel to query historical data versions

    Understanding how Iceberg handles schema evolution and partition changes

    Comparing Iceberg with Delta Lake and Hudi in practical scenarios
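    Schema evolution in Iceberg works because the table spec tracks every column by a unique field ID that is never reused, rather than by name or position. The toy projection below is a hypothetical sketch, assuming rows stored as dicts keyed by field ID (real data files are Parquet, ORC, or Avro): renaming a column only changes the name mapped to its ID, a newly added column reads as null for old files, and a dropped-then-re-added column gets a fresh ID, so old values do not resurface.

```python
from typing import Dict, List, Optional

# Toy model (hypothetical): a schema maps field ID -> column name,
# and data files store each value under its field ID, not its name.
Schema = Dict[int, str]
Row = Dict[int, object]


def read_rows(rows: List[Row], schema: Schema) -> List[Dict[str, Optional[object]]]:
    """Project ID-keyed rows onto the current schema; IDs absent from a file read as None (null)."""
    return [{name: row.get(fid) for fid, name in schema.items()} for row in rows]


# Rows written under the original schema {1: "id", 2: "name"}.
old_file: List[Row] = [{1: 101, 2: "alice"}, {1: 102, 2: "bob"}]

# Rename field 2 and add a new column with fresh ID 3.
evolved: Schema = {1: "id", 2: "customer_name", 3: "email"}
print(read_rows(old_file, evolved))
# [{'id': 101, 'customer_name': 'alice', 'email': None}, {'id': 102, 'customer_name': 'bob', 'email': None}]

# Drop "customer_name", then re-add a column with the same name: it receives new ID 4,
# so the values stored under ID 2 do not reappear.
readded: Schema = {1: "id", 3: "email", 4: "customer_name"}
print(read_rows(old_file, readded))
# [{'id': 101, 'email': None, 'customer_name': None}, {'id': 102, 'email': None, 'customer_name': None}]
```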

    By the end of the course, you’ll have a strong working knowledge of Apache Iceberg and be ready to use it in real-world data projects with confidence.

    Who this course is for:
    - This course is designed for data engineers, data architects, and developers who work with large-scale data pipelines and are looking to modernize their data lake or lakehouse architecture. If you’re currently using Hive, Delta Lake, or Hudi and want to explore a more flexible, scalable, and engine-agnostic table format, this course is for you.
    - It’s also ideal for those who want to gain hands-on experience with Apache Iceberg using free tools like Databricks Community Edition and Google Colab — without needing complex infrastructure or cloud setup.
    - Whether you’re building batch or streaming pipelines, working on schema evolution, or just want to understand Iceberg’s time travel and metadata capabilities, this course will help you build practical skills with real-world applications.
    - No prior experience with Iceberg is required — just a willingness to learn and some basic familiarity with Python or PySpark.