Data Science with Spark
Q: Is Data Science with Spark training right for me?
You will benefit from Data Science with Spark training if:
- you are interested in learning about Spark and how to use it
- you do data science work and want to apply machine-learning models
Q: What will I achieve by completing this training?
You will master the tools that Spark offers for large-scale data science.
You will learn:
- All about Spark and the capabilities it offers
- How to use Spark in combination with Python and Jupyter notebooks
- How stages and tasks influence your Spark jobs
- The best data format to use with Spark
- What DataFrames are and how to use them
- How to convert between pandas and Spark DataFrames
- How to use Spark's built-in machine-learning libraries for regression, classification, clustering, and recommendation with ALS (alternating least squares)
- How to use Spark Structured Streaming
You will gain hands-on experience in:
- Using Spark from the command line, from notebooks, and from scripts
- Loading and saving DataFrames using Parquet
- Machine learning with Spark
- Spark Streaming
You will develop skills to:
- Work with DataFrames
- Apply machine-learning algorithms in Spark
- Use pipelines and cross-validation
Q: What else should I know?
You will need to bring your own laptop to this training. It must meet the following requirements:
- 8 GB RAM minimum
- 25 GB of free hard disk space
- SSH client installed
- Ability to install software