Hands-On Deep Learning with Keras, TensorFlow, and Apache Spark™

©Databricks 2018

Brooke Wenig

M.S. Computer Science (Distributed Machine Learning)

Fluent in Chinese (中文)


brooke@databricks.com
LinkedIn

When I'm not working...

Schedule

Keras and Neural Network Fundamentals

MLflow

CNNs and ImageNet

Transfer Learning and Deep Learning Pipelines

Horovod: Distributed TensorFlow

Survey

Used Spark before?


Pandas/NumPy?


Machine Learning? Deep Learning?


Expectations?

Course Objectives

Learn the fundamentals of Deep Learning and best practices

Utilize Keras, Deep Learning Pipelines, and Horovod

Understand when/where to use transfer learning

List advantages/disadvantages of distributed neural network training

Course Non-Objectives

Demonstrate every new research technique/API

Cover the detailed math/CS behind the algorithms

Machine Learning Overview

Supervised Learning

Unsupervised Learning

Reinforcement Learning

Supervised Machine Learning

Classification

Regression

Unsupervised Machine Learning

Learn the structure of unlabeled data

Reinforcement Learning

Learning what to do to maximize reward

Explore and exploit

"All models are wrong; some models are useful."

How to build/evaluate models?

Build a Model

Time Series Analysis

Accuracy

What if I told you I had a model that was 99% accurate in predicting brain cancer?

If only 1% of patients actually have brain cancer, a model that always predicts "no cancer" is 99% accurate. Accuracy alone can be very misleading on imbalanced data.

Baseline Model

You ALWAYS want to have a baseline model to compare to

This should be a "dummy" model, e.g. a coin flip or always predicting the most common class (see the sketch below)
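
As an illustration (not from the deck), scikit-learn's DummyClassifier provides exactly this kind of baseline; on imbalanced labels it also shows why 99% accuracy can be meaningless:

import numpy as np
from sklearn.dummy import DummyClassifier

# Hypothetical, highly imbalanced labels: 99% healthy, 1% cancer
y = np.array([0] * 99 + [1])
X = np.zeros((100, 1))  # features are irrelevant to a dummy model

baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X, y)
print(baseline.score(X, y))  # 0.99 accuracy from always predicting "healthy"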

Model Selection?

Underlying data distribution

Some models are more costly to train

Need for interpretability?

Right to Explanation


General Data Protection Regulation

Deep Learning Overview

Why are you here?

What is Deep Learning?

Composing representations of data in a hierarchical manner

Layers

Input layer (fixed)

Zero or more hidden layers

Output layer (fixed)

Backpropagation

Calculate gradients to update weights
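
For reference, the update that backpropagation enables is standard gradient descent on each weight, with learning rate $\eta$:

$$w \leftarrow w - \eta \frac{\partial L}{\partial w}$$

The gradients $\frac{\partial L}{\partial w}$ are computed from the output layer back toward the input via the chain rule.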

Loss functions

Image credit F. Chollet

Regression Evaluation

Measure "closeness" between label and prediction

  • When predicting someone's weight, better to be off by 2 lbs instead of 20 lbs

Evaluation metrics:

  • Loss: $(y - \hat{y})$
  • Absolute loss: $|y - \hat{y}|$
  • Squared loss: $(y - \hat{y})^2$

Evaluation metric: MSE

$Error = (y_{i} - \hat{y_{i}})$

$SE = (y_{i} - \hat{y_{i}})^2$

$SSE = \sum_{i=1}^n (y_{i} - \hat{y_{i}})^2$

$MSE = \frac{1}{n}\sum_{i=1}^n (y_{i} - \hat{y_{i}})^2$

$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^n (y_{i} - \hat{y_{i}})^2}$
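
As a quick sketch of these metrics in NumPy (toy labels and predictions, for illustration only):

import numpy as np

y = np.array([3.0, 5.0, 7.0])      # true labels (toy data)
y_hat = np.array([2.5, 5.0, 8.0])  # model predictions (toy data)

errors = y - y_hat
sse = np.sum(errors ** 2)   # sum of squared errors
mse = np.mean(errors ** 2)  # mean squared error
rmse = np.sqrt(mse)         # root mean squared error
print(sse, mse, rmse)       # 1.25 0.4166... 0.6454...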

Train vs. Test MSE

Which is more important? Why?

Keras

High-level Python API to build neural networks

Official high-level API of TensorFlow

Supports multiple backends: TensorFlow, Theano, and CNTK

Has over 250,000 users

Released by François Chollet in 2015
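
To make the workflow concrete, a minimal sketch with the 2018-era standalone Keras API (the layer sizes and random data are placeholders, not from the deck):

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Toy regression data (hypothetical)
X = np.random.rand(100, 10)
y = np.random.rand(100, 1)

model = Sequential([
    Dense(32, activation="relu", input_shape=(10,)),  # one hidden layer
    Dense(1)                                          # linear output for regression
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, batch_size=16)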

Why Keras?

Intro to Keras I

Activation Functions

Provide non-linearity in our neural networks to learn more complex relationships

Sigmoid

Tanh

ReLU

Leaky ReLU

PReLU

ELU
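
For reference, the common activations are easy to write down in NumPy (a sketch for illustration; alpha is the small negative-side slope used by Leaky ReLU):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))       # squashes to (0, 1)

def tanh(x):
    return np.tanh(x)                     # squashes to (-1, 1), zero-centered

def relu(x):
    return np.maximum(0.0, x)             # zero for negatives, identity otherwise

def leaky_relu(x, alpha=0.01):
    return np.where(x < 0, alpha * x, x)  # small slope keeps gradients alive for x < 0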

Sigmoid

Saturates and kills gradients

Not zero-centered

Image credit A. Karpathy

Tanh

Zero-centered!

BUT, like the sigmoid, its activations saturate

Image credit A. Karpathy

ReLU

Does not saturate in the positive region, BUT gradients can still go to zero ("dying ReLU" for inputs below zero)

Image credit A. Karpathy

Leaky ReLU

$$f(x) = \begin{cases} \alpha x & x < 0 \\ x & x \geq 0 \end{cases}$$

Image credit A. Karpathy

Comparison

Optimizers

Stochastic Gradient Descent (SGD)

Choosing a proper learning rate can be difficult

Image credit F. Chollet

Stochastic Gradient Descent

Easy to get stuck in local minima

Image credit F. Chollet

Momentum

Accelerates SGD: Like pushing a ball down a hill

Takes a running average of the direction we have been heading (current velocity plus the new gradient)

Damps back-and-forth oscillation and helps escape local minima (see the update rule below)
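
For reference, the classical momentum update, with momentum coefficient $\mu$ and learning rate $\eta$ (the standard formulation, not specific to this deck):

$$v \leftarrow \mu v - \eta \nabla_w L \qquad w \leftarrow w + v$$

In Keras this is just a constructor argument, e.g. SGD(lr=0.01, momentum=0.9) (illustrative values).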

ADAM

Adaptive Moment Estimation (Adam)

Combines momentum (a running mean of gradients) with per-parameter adaptive learning rates (a running mean of squared gradients)
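
Swapping it in is one line with the Keras 2.x API (a sketch; the hyperparameters shown are the paper's defaults, and model is assumed to be an existing Keras model):

from keras.optimizers import Adam

optimizer = Adam(lr=0.001, beta_1=0.9, beta_2=0.999)
model.compile(optimizer=optimizer, loss="mse")  # assumes a previously built model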


Intro to Keras II

Keras Lab

Hyperparameter Selection

Hyperparameter Selection

Which dataset should we use to select hyperparameters? Train? Test?

Validation Dataset

Split the dataset into three!

  • Train on the training set
  • Select hyperparameters based on performance of the validation set
  • Report final performance on the test set (see the sketch below)
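
One minimal way to get the three splits (an illustration using scikit-learn; the 60/20/20 ratios and toy data are arbitrary):

import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.random.rand(100, 5), np.random.rand(100)  # toy data

# Carve out the test set first, then split the rest into train/validation
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=42)
# 0.25 of the remaining 80% -> 20% of the full dataset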

LIME: Model Interpretability

Advanced Keras Demo

Intro to Keras II Lab

MLflow

MLflow Lab

Convolutional Neural Networks

ImageNet Challenge

Classify images in one of 1000 categories

2012 Deep Learning breakthrough with AlexNet: ~16% top-5 test error rate (the next-best entry was ~26%)

VGG16 (2014)

One of the most widely used architectures for its simplicity

Convolutions

Focus on Local Connectivity (fewer parameters to learn)

Filter/kernel slides across input image (often 3x3)
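
In Keras, a small convolutional stack looks like this (a sketch; the filter counts, input shape, and class count are illustrative, not from the deck):

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential([
    Conv2D(32, (3, 3), activation="relu", input_shape=(224, 224, 3)),  # 3x3 filters slide over the image
    MaxPooling2D((2, 2)),                                              # downsample the feature maps
    Conv2D(64, (3, 3), activation="relu"),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(10, activation="softmax")                                    # e.g. 10 target classes
])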

CS231n Convolutional Networks
Image Kernels Visualization

Max vs Avg. Pooling
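
Both are one-liners in Keras; max pooling keeps the strongest activation in each window, average pooling keeps the mean (window size shown is illustrative):

from keras.layers import MaxPooling2D, AveragePooling2D

max_pool = MaxPooling2D(pool_size=(2, 2))  # takes the max of each 2x2 window
avg_pool = AveragePooling2D(pool_size=(2, 2))  # takes the mean of each 2x2 window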

Residual Connection

Inception

What do CNNs Learn?

Breaking Convnets

CNN Demo

Transfer Learning

Transfer Learning

IDEA: Intermediate representations learned for one task may be useful for other related tasks
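
A typical Keras recipe (a sketch; the new head and its sizes are placeholders): load a network pre-trained on ImageNet, freeze its convolutional base, and train only a new classifier head:

from keras.applications import VGG16
from keras.models import Model
from keras.layers import Flatten, Dense

base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
for layer in base.layers:
    layer.trainable = False                   # freeze the pre-trained features

x = Flatten()(base.output)
x = Dense(256, activation="relu")(x)
out = Dense(10, activation="softmax")(x)      # e.g. 10 new target classes

model = Model(inputs=base.input, outputs=out)
model.compile(optimizer="adam", loss="categorical_crossentropy")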

When to use Transfer Learning?

Transfer Learning Demo

Recommendation Systems

Want to win $1,000,000 for your data science skills?

Recommendation Systems

Ratings Matrix

Naive Approaches

  1. Hand curated
  2. Aggregates

Problems with these approaches?

Goal: Scalable, Personalizable Recommendations

Collaborative Filtering

k factors characterize the users and items (k << n)
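
In Spark MLlib this is a few lines with ALS (a sketch; the rank, column names, and the ratings_df/test_df DataFrames are placeholders):

from pyspark.ml.recommendation import ALS

# ratings_df: hypothetical DataFrame with (userId, movieId, rating) columns
als = ALS(rank=10, userCol="userId", itemCol="movieId", ratingCol="rating")
model = als.fit(ratings_df)
predictions = model.transform(test_df)  # test_df: hypothetical held-out pairs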

Better

Use the user + product factors as input to neural network

Can augment with additional features

Build distributed neural network for end-to-end scalability!
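
One way to wire this up (an assumption-heavy sketch, not the deck's exact recipe): pull the learned factor vectors out of the fitted ALS model, join them onto each (user, item) pair, and feed the concatenated vector, plus any extra features, to a Keras regressor that predicts the rating:

# model: the fitted ALSModel from the previous sketch (hypothetical)
user_factors = model.userFactors  # DataFrame of (id, features) per user
item_factors = model.itemFactors  # DataFrame of (id, features) per item
# Join both onto the ratings, concatenate the two factor vectors (plus any
# side features) into one input array, then fit a small Keras network on it.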

ALS + NN

Horovod

Horovod

Created by Alexander Sergeev of Uber, open-sourced in 2017

Simplifies distributed neural network training

Supports TensorFlow, Keras, and PyTorch

Classical Parameter Server

All-Reduce

# Only one line of code change!
optimizer = hvd.DistributedOptimizer(optimizer)
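
In context, the documented Horovod + Keras pattern looks roughly like this (a sketch; the model, the training data X and y, and the learning rate are placeholders):

import horovod.keras as hvd
from keras.optimizers import Adam

hvd.init()                                       # one process per GPU/worker
optimizer = Adam(lr=0.001 * hvd.size())          # scale learning rate with worker count
optimizer = hvd.DistributedOptimizer(optimizer)  # all-reduce gradients across workers

callbacks = [hvd.callbacks.BroadcastGlobalVariablesCallback(0)]  # sync initial weights from rank 0
model.compile(optimizer=optimizer, loss="mse")   # assumes an existing Keras model
model.fit(X, y, callbacks=callbacks, epochs=5)   # X, y: placeholder training data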

Horovod Estimator

Part of Databricks' Runtime for ML

Distributed TensorFlow training on Spark DataFrames

MLlib Estimator API

Specify the model via a tf.estimator model_fn:
model_fn(features, labels, mode) → tf.estimator.EstimatorSpec
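
The shape of such a model_fn under the TF 1.x Estimator API (a minimal sketch; the one-layer network is a placeholder):

import tensorflow as tf

def model_fn(features, labels, mode):
    # Placeholder network: a single dense layer over the input features
    logits = tf.layers.dense(features["x"], units=10)
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
    train_op = tf.train.AdamOptimizer().minimize(
        loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)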

Horovod on Spark

Horovod Estimator

Shards data across nodes’ local disks

Trains a tf.estimator across nodes

Feeds TFRecord data to estimator

Automatic checkpointing, logging

Simultaneous model evaluation

Horovod Demo

Horovod Lab

Ok, so how do I find the optimal neural network architecture?

Neural Architecture Search with Reinforcement Learning

Resources

Horovod Meetup Talk

MLflow

Deep Learning with Python

Stanford's CS231n

fast.ai

Blog posts & webinars

Databricks Runtime for ML

Pyception

A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, and PyTorch

Jules Damji and Brooke Wenig @ 15:20 Thursday!

Thank you!
