Data Science with Spark

Apache Spark is a powerful open-source processing engine built around speed, ease of use, and sophisticated analytics. This three-day training, taught in English, empowers data scientists to use Spark with Python from the command line, in scripts, and in Jupyter notebooks. Through instructor-led discussion and interactive, hands-on exercises, you will master the tools that Spark offers to perform large-scale data science.

Q: Is Data Science with Spark training right for me?

  • Yes - if you are interested in learning about Spark and how to use it
  • Yes - if you perform data science and want to apply machine learning models

Q: What will I achieve by completing this training?

Master the tools that Spark offers to perform large-scale data science.

You will learn:

  • What Spark is and the capabilities it offers
  • How to use Spark with Python and Jupyter notebooks
  • How stages and tasks influence your Spark jobs
  • The best data format to use with Spark
  • What data frames are and how to use them
  • How to convert between pandas and Spark data frames
  • How to use Spark's built-in machine-learning libraries for regression, classification, clustering, and recommendation with alternating least squares (ALS)
  • How to use Spark Structured Streaming

You will gain hands-on experience in:

  • Using Spark from the command line, notebooks, and scripts
  • Loading and saving DataFrames using Parquet
  • Machine learning with Spark
  • Spark Streaming

You will develop skills to:

  • Work with data frames
  • Apply machine-learning algorithms in Spark
  • Use pipelines and cross-validation

Q: What else should I know?


Bring your own laptop to this training. It must meet the following requirements:

  • 8 GB RAM minimum
  • 25 GB of free hard disk space
  • SSH client installed
  • Ability to install software

Custom in-company classes

We also offer customized, in-company training to provide Xebia’s immersive learning curriculum on-site at your business location.

For more information, call Xebia Academy at +31 35 538 1921.