Introduction to Sparklyr for Data Science
MP4 | Video: AVC 1280x720 | Audio: AAC 44KHz 2ch | Duration: 1.5 Hours | 601 MB
Genre: eLearning | Language: English
MP4 | Video: AVC 1280x720 | Audio: AAC 44KHz 2ch | Duration: 1.5 Hours | 601 MB
Genre: eLearning | Language: English
Join data scientist Kelly O'Briant for an exploration of sparklyr, the package from RStudio which provides an interface to Apache Spark from R. For many data scientists who rely on R for their work, the paradigm shift from local in-memory computations to scalable distributed data processing can be complicated to navigate. This course provides an easy-to-follow R based method for working with big data. You'll connect to Spark, run some sparklyr code, and explore some practical applications of Spark SQL and sparklyr functionality. You'll wrap up by performing some exploratory analysis and feature generation using a Kaggle competition data set. Learners should have a moderate level of experience with doing data science tasks or workflows in R.
Explore the benefits and limitations of choosing sparklyr for distributed computing in R
Discover how to interact with data in Apache Spark through sparklyr and Spark SQL
Understand how to connect to Spark locally or to a remote Spark cluster
Learn to perform exploratory data analysis in Spark using sparklyr, dplyr, and DBI
Master the differences between working with data frames in R versus Spark
Understand how to build data products in R that don't rely on storing big data locally