Cloudera Developer for Spark & Hadoop I

Learn how to import data into your Apache Hadoop cluster and process it with Spark, Hive, Flume, Sqoop, Impala, and other Hadoop ecosystem tools. This four-day hands-on training course delivers the key concepts and expertise participants need to ingest and process data on a Hadoop cluster using the most up-to-date tools and techniques. Employing Hadoop ecosystem projects such as Spark, Hive, Flume, Sqoop, and Impala, this training course is the best preparation for the real-world challenges faced by Hadoop developers. Participants learn to identify the right tool for a given situation and gain hands-on experience in developing with those tools.

The training was kept interesting and trainers were very knowledgeable.
Data Scientist

Audience Profile: Cloudera Developer for Spark & Hadoop I Training

You will benefit from the Cloudera Developer for Spark & Hadoop I Training if:

  • you are a developer or engineer
  • you have programming experience
  • you can program in Scala and/or Python, since Apache Spark examples and hands-on exercises are presented in those languages
  • you have a basic familiarity with the Linux command line
  • you have some knowledge of SQL

Prior knowledge of Hadoop is not required.

Achievements upon completion

Through instructor-led discussion and interactive, hands-on exercises, you will learn Apache Spark and how it integrates with the entire Hadoop ecosystem.

After completing the four-day training, you will know:

  • How data is distributed, stored, and processed in a Hadoop cluster
  • How to use Sqoop and Flume to ingest data
  • How to process distributed data with Apache Spark (a short sketch follows this list)
  • How to model structured data as tables in Impala and Hive
  • How to choose the best data storage format for different data usage patterns
  • Best practices for data storage
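
A minimal PySpark sketch of the kind of work these topics cover (not taken from the course materials; the file paths, table name, and column names are hypothetical): read raw data from HDFS, query it as a table with SQL, and store the result in a columnar format.

  from pyspark.sql import SparkSession

  # Start a Spark session
  spark = SparkSession.builder.appName("ingest-and-query-sketch").getOrCreate()

  # Read delimited text files from HDFS into a DataFrame
  orders = spark.read.csv("hdfs:///data/raw/orders", header=True, inferSchema=True)

  # Expose the data as a table so it can be queried with SQL, Hive/Impala style
  orders.createOrReplaceTempView("orders")
  top_customers = spark.sql(
      "SELECT customer_id, SUM(total) AS revenue "
      "FROM orders GROUP BY customer_id "
      "ORDER BY revenue DESC LIMIT 10"
  )

  # Parquet is a columnar storage format well suited to analytic queries
  top_customers.write.mode("overwrite").parquet("hdfs:///data/curated/top_customers")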

You will have hands-on experience in:

  • Identifying the right tool for a given situation, and using those tools
  • Applying best practices for data storage
  • Working with RDDs in Spark (see the sketch after this list)
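
A minimal RDD sketch (again with hypothetical paths, not from the course exercises): the classic word count, expressed as RDD transformations and an action.

  from pyspark import SparkContext

  sc = SparkContext(appName="rdd-word-count-sketch")

  lines = sc.textFile("hdfs:///data/raw/logs")            # RDD of lines
  counts = (lines.flatMap(lambda line: line.split())      # RDD of words
                 .map(lambda word: (word, 1))             # (word, 1) pairs
                 .reduceByKey(lambda a, b: a + b))        # sum counts per word

  counts.saveAsTextFile("hdfs:///data/curated/word_counts")
  sc.stop()

Transformations such as flatMap, map, and reduceByKey are lazy; the work is only distributed across the cluster when an action such as saveAsTextFile is called.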

Additional Information

Prerequisites

Please note that you need to bring your own laptop for this training.

Certification

This training is an excellent place to start for people working towards the CCP: Data Engineer certification. Although further study is required before taking the exam (we recommend Developer Training for Spark and Hadoop II: Advanced Techniques), this training covers many of the subjects tested in the CCP: Data Engineer exam.

Xebia Academy (based in Hilversum, Amsterdam area) is an official training partner of Cloudera, the leader in Apache Hadoop-based software and services.

https://training.xebia.com/big-data/cloudera-developer-for-spark-and-hadoop-1