What is Logistic Regression?
Logistic regression is a classification algorithm: it models the probability that an observation belongs to the positive class, then turns that probability into a class prediction by comparing it to a cutoff threshold. The workflow below has three parts: create the modeler, tune its hyperparameters with cross validation, and evaluate the best model on the test set.
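As a quick illustration of that idea (plain Python, independent of Spark, with made-up numbers), a minimal sketch of how a linear score becomes a probability and then a class:
import math
def predict_proba(features, weights, intercept):
    # Linear score, squashed through the logistic (sigmoid) function into (0, 1)
    score = intercept + sum(w * x for w, x in zip(weights, features))
    return 1 / (1 + math.exp(-score))
# Illustrative weights and features only; a real model learns these from data
p = predict_proba([1.5, -0.3], weights=[0.8, 2.0], intercept=-0.5)
prediction = 1 if p > 0.5 else 0  # 0.5 is the usual default cutoff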
Create the Modeler.
# Import LogisticRegression
from pyspark.ml.classification import LogisticRegression
# Create a LogisticRegression Estimator
lr = LogisticRegression()
The two hyperparameters you'll tune are regParam, the regularization strength (lambda), and elasticNetParam, the elastic net mixing parameter (alpha): 0 gives a pure L2 (ridge) penalty, 1 gives a pure L1 (lasso) penalty, and values in between mix the two.
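Both can also be fixed directly when constructing the Estimator instead of being tuned; a sketch with arbitrary example values (lr_fixed is just an illustrative name):
# Fix the hyperparameters up front instead of tuning them
lr_fixed = LogisticRegression(regParam=0.01, elasticNetParam=0.5)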
# Import the evaluation submodule
import pyspark.ml.evaluation as evals
# Create a BinaryClassificationEvaluator
evaluator = evals.BinaryClassificationEvaluator(metricName="areaUnderROC")
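By default this evaluator reads the rawPrediction and label columns of whatever DataFrame it is given and returns the area under the ROC curve. The sketch below just spells out those defaults explicitly; swap in your own column names if yours differ (evaluator_explicit is only an illustrative name):
# Equivalent to the evaluator above, with the default column names written out
evaluator_explicit = evals.BinaryClassificationEvaluator(metricName="areaUnderROC",
                                                         rawPredictionCol="rawPrediction",
                                                         labelCol="label")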
Create a grid search to look for the optimal hyperparameter values.
The .addGrid() method takes a model parameter (an attribute of the model Estimator lr created above) and a list of values that you want to try.
The .build() method takes no arguments; it just returns the grid that you'll use later.
# Import numpy (needed for np.arange) and the tuning submodule
import numpy as np
import pyspark.ml.tuning as tune
# Create the parameter grid builder
grid = tune.ParamGridBuilder()
# Add the hyperparameters and the values to try for each
grid = grid.addGrid(lr.regParam, np.arange(0, .1, .01))
grid = grid.addGrid(lr.elasticNetParam, [0, 1])
# Build the grid
grid = grid.build()
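The built grid is just a Python list of parameter combinations: 10 regParam values times 2 elasticNetParam values, so 20 candidate models. A quick sanity check:
# Each element is a ParamMap pairing one regParam value with one elasticNetParam value
print(len(grid))  # 20
print(grid[0])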
# Create the CrossValidator
cv = tune.CrossValidator(estimator=lr,
                         estimatorParamMaps=grid,
                         evaluator=evaluator)
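CrossValidator defaults to 3-fold cross validation. If you want more folds or a reproducible split, both can be set explicitly; the values below are arbitrary and cv_explicit is just an illustrative name:
# Same validator, with the number of folds and the random seed spelled out
cv_explicit = tune.CrossValidator(estimator=lr,
                                  estimatorParamMaps=grid,
                                  evaluator=evaluator,
                                  numFolds=5,
                                  seed=42)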
# Fit the cross validation models on the training DataFrame
models = cv.fit(training)
# Extract the best model
best_lr = models.bestModel
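If you want to see which hyperparameter combination won, one option (a sketch, not part of the original workflow) is to pair the grid with models.avgMetrics, which holds the cross-validated metric for each combination in grid order:
# Find the combination with the highest average AUC across the folds
best_idx = max(range(len(grid)), key=lambda i: models.avgMetrics[i])
print(grid[best_idx])               # winning regParam / elasticNetParam values
print(models.avgMetrics[best_idx])  # its cross-validated AUC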
# Use the model to predict the test set
test_results = best_lr.transform(test)
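The transform() call appends the model's output columns to the test DataFrame; a quick look at them (the column names are the pyspark.ml defaults, and "label" assumes that is what the target column is called in this dataset):
# rawPrediction and probability feed the evaluator; prediction is the 0/1 class
test_results.select("label", "rawPrediction", "probability", "prediction").show(5)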
# Evaluate the predictions
print(evaluator.evaluate(test_results))
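The same evaluator can also report the area under the precision-recall curve without rebuilding it, by overriding metricName at evaluate time; shown here only as an optional extra:
# Area under the precision-recall curve for the same predictions
print(evaluator.evaluate(test_results, {evaluator.metricName: "areaUnderPR"}))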