Data Science with Apache Spark

Day 1 Schedule

Morning:

Intro to Spark + DataFrames

Lunch: 12 - 1 pm

Afternoon:

Built-in Functions

Caching + Partitioning

ETL

Day 2 Schedule

Morning:

User Defined Functions

Optimizing Spark Jobs

Lunch: 12 - 1 pm

Afternoon:

SparkML & Data Cleansing

Linear Regression

Structured Streaming & Deployment Options

Day 3 Schedule

Morning:

Decision Trees

Model Tuning, Cross-Validation, and Grid Search

Lunch: 12 - 1 pm

Afternoon:

XGBoost & Third-Party Libraries

ML Electives

Course Objectives

RDDs, DataFrames, and Datasets

Understand the strengths, patterns, mechanisms, and limitations of Spark

Build intuition for the algorithms

Deployment options

Common types of ML problems and their gotchas

Survey

Used Spark before?

Experience with machine learning?

Language: Python or Scala?

Introductions

  1. Professional: name + responsibilities
  2. Personal: interests/fun fact
  3. Expectations?

Let's get started!