Machine Learning Pipeline

1) Overview:

2) Details:

Problem Formulation

Framing the Problem

Does machine learning make more sense than a traditional, non-ML approach?
Is the problem supervised or unsupervised?
Is labelled data available for training?
Validate the use of machine learning and confirm that you have access to the right people and data.

Data Sources

Data Considerations

Feature Engineering

Working with your raw data to make it usable for training.
It is the process of selecting or creating the features that you will use to train your model.
Feature extraction: building up valuable information from raw data by reformatting, combining, and transforming primary features into new ones.
Feature selection: selecting the features that are most relevant and discarding the rest.
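
The two steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the records, the column names (length_m, width_m, door_number, price), and the choice of absolute Pearson correlation as the relevance score are all hypothetical and are not taken from any particular pipeline. Extraction combines two primary features into a new one (an area), and selection keeps the k features that correlate most strongly with the label.

```python
from math import sqrt

# Hypothetical raw records: every column name and value here is invented.
raw_rows = [
    {"length_m": 2, "width_m": 5, "door_number": 3, "price": 1000},
    {"length_m": 4, "width_m": 3, "door_number": 1, "price": 1250},
    {"length_m": 3, "width_m": 6, "door_number": 4, "price": 1800},
    {"length_m": 5, "width_m": 2, "door_number": 1, "price": 1050},
    {"length_m": 6, "width_m": 5, "door_number": 5, "price": 2950},
    {"length_m": 1, "width_m": 4, "door_number": 9, "price": 400},
]


def extract_features(row):
    """Feature extraction: combine primary features into a new one."""
    features = {k: float(v) for k, v in row.items() if k != "price"}
    features["area_m2"] = features["length_m"] * features["width_m"]
    return features


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_features(feature_rows, labels, k):
    """Feature selection: keep the k features most correlated with the label."""
    scores = {
        name: abs(pearson([r[name] for r in feature_rows], labels))
        for name in feature_rows[0]
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


feature_rows = [extract_features(r) for r in raw_rows]
labels = [float(r["price"]) for r in raw_rows]
selected = select_features(feature_rows, labels, k=2)
print(selected)
```

On this toy data the engineered area_m2 feature ranks first and the irrelevant door_number column is discarded. Correlation only captures linear relationships, so a real project would likely compare it with other selection criteria.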

Preparing Data

Overfitting and Underfitting

Overfitting:
- The model performs well on training data, but it does not perform well on the evaluation data.
- It essentially memorizes the training data instead of actually learning the relationship between features and labels.
Underfitting:
- The model performs poorly on the training data.
- It cannot capture the relationship between the input examples (often called X) and the target values (often called Y).
Balanced:
- Good trade-off between the error on the training data and the evaluation data.
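
The three cases above can be made concrete by comparing training and evaluation error for three deliberately simple models. This is a minimal sketch on invented data (y is roughly 2x with alternating noise), and the model names are hypothetical: a constant predictor stands in for underfitting, a nearest-neighbour lookup that memorizes the training set stands in for overfitting, and a least-squares line stands in for the balanced case.

```python
# Hypothetical toy data: y is roughly 2*x plus alternating +/-1 noise.
train = [(float(i), 2.0 * i + (1.0 if i % 2 == 0 else -1.0)) for i in range(10)]
evaluation = [
    (i + 0.4, 2.0 * (i + 0.4) + (-1.0 if i % 2 == 0 else 1.0)) for i in range(10)
]


def mse(model, data):
    """Mean squared error of a model (a function x -> prediction) on data."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


def fit_constant(data):
    """Too simple: always predicts the mean training target."""
    mean_y = sum(y for _, y in data) / len(data)
    return lambda x: mean_y


def fit_memorizer(data):
    """Too flexible: returns the target of the nearest training example."""
    return lambda x: min(data, key=lambda pair: abs(pair[0] - x))[1]


def fit_line(data):
    """Ordinary least-squares line through the training data."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in data) / sum(
        (x - mean_x) ** 2 for x, _ in data
    )
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


models = {
    "constant": fit_constant(train),    # expected to underfit
    "memorizer": fit_memorizer(train),  # expected to overfit
    "line": fit_line(train),            # expected to be balanced
}
errors = {name: (mse(m, train), mse(m, evaluation)) for name, m in models.items()}
for name, (train_mse, eval_mse) in errors.items():
    print(f"{name:10s} train MSE={train_mse:6.2f}  eval MSE={eval_mse:6.2f}")
```

The memorizer reaches zero training error but a clearly worse evaluation error than the line, the constant model has a high error on both sets, and the line has similar, low errors on both.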

References:


About Me: