Data Science with Spark

Apache Spark is a powerful open-source processing engine built around speed, ease of use, and advanced analytics. This three-day training, taught in English, empowers data scientists to use Spark with Python. Through instructor-led discussion and interactive, hands-on exercises, you will master the tools that Spark offers to perform large-scale data science.

"Machine learning with Spark was fun, seeing the new Spark language. Getting to learn and get enthusiastic with other people who are data scientists or becoming one." - Data Scientist, KPN

Is Data Science with Spark training right for me?

  • Yes - if you are interested in learning about Spark and how to use it
  • Yes - if you are a data science practitioner and want to apply your skills at scale
  • Yes - if you want to learn how to use Spark’s machine learning and streaming capabilities 

What will I achieve by completing this training?

Master the tools that Spark offers to perform large-scale data science.

You will learn:

  • All about Spark and the capabilities it offers
  • How to use Spark in combination with Python and Jupyter notebooks
  • How to optimize your Spark jobs by understanding how they are split into stages and tasks
  • Which data formats to use with Spark
  • What Spark DataFrames are and how you should use them
  • How to convert between pandas and Spark DataFrames
  • How to use Spark's built-in machine-learning libraries for regression, classification, and recommender systems
  • How to use Spark Structured Streaming

You will gain hands-on experience in:

  • Using Spark from the command line, from notebooks, and from scripts
  • Loading and saving DataFrames as CSV and Parquet files and as Apache Hive tables
  • Machine learning with Spark
  • Spark Streaming

You will develop the skills to:

  • Work with Spark DataFrames
  • Apply machine-learning algorithms in Spark
  • Use streaming algorithms

What else should I know?

Before attending this training, you should:

  • Know the basics of programming in Python
  • Be familiar with the basics of data manipulation, SQL, etc.

You will need to bring your own laptop for this training. It must meet the following requirements:

  • 8GB RAM minimum
  • 25GB of free hard disk space
  • SSH client installed
  • Ability to install software

This training is taught by our training partner GoDataDriven.

Tell us what you need

Interested in this training, but looking for a customized, in-company course that fits your business best? We are here to help you succeed.

You can also call the Xebia Academy sales team at +31 35 538 1921.