  • Training info
  • Category Data Science
  • Price (excl. VAT)
  • Language English
  • Duration 3 Days
  • Time 09:00 - 17:00
  • Lunch Included

Data Science with Spark

Apache Spark is a powerful open-source processing engine built around speed, ease of use, and sophisticated analytics. This three-day training, taught in English, empowers data scientists to use Spark with Python from the command line, Jupyter notebooks, and scripts. Through instructor-led discussion and interactive, hands-on exercises, you will master the tools that Spark offers to perform large-scale data science.

Q: Is Data Science with Spark training right for me?

You will benefit from Data Science with Spark training if:

  • you are interested in learning about Spark and how to use it
  • you perform data science and want to apply machine learning models

Q: What will I achieve by completing this training?

Master the tools that Spark offers to perform large-scale data science.

You will learn:

  • All about Spark and the capabilities it offers
  • How to use Spark in combination with Python and Jupyter notebooks
  • How stages and tasks influence your Spark jobs
  • The best data format to use with Spark
  • What data frames are and how to use them
  • How to convert between pandas and Spark data frames
  • How to use Spark's built-in machine-learning libraries for regression, classification, clustering, and recommendation with ALS
  • How to use Spark Structured Streaming

You will gain hands-on experience in:

  • Using Spark from the command line, notebooks, and scripts
  • Loading and saving DataFrames using Parquet
  • Machine Learning with Spark
  • Spark Structured Streaming

You will develop skills to:

  • Work with data frames
  • Apply machine-learning algorithms in Spark
  • Use pipelines and cross-validation

Q: What else should I know?

Requirements

You will need to bring your own laptop to this training. It must meet the following requirements:

  • 8 GB RAM minimum
  • 25 GB of free hard disk space
  • SSH client installed
  • Ability to install software

http://training.xebia.com/data-science/data-science-with-spark