Feature Scaling & Normalization Guide

Introduction

Feature scaling and normalization are essential steps in machine learning because many algorithms rely on distance calculations or gradient-based optimisation, both of which are sensitive to feature magnitude. When features are on vastly different scales, such as age (0–100) and income (0–100,000), the model may unintentionally give more weight to the larger-scaled variable. Scaling puts features on comparable ranges, which speeds up optimisation and prevents distorted model behaviour.

Some algorithms, such as SVM, KNN, and neural networks, are highly sensitive to feature magnitude, while others, such as tree-based models, are largely unaffected. Understanding when and how to scale is a key skill in feature engineering.

Why Scaling Matters

Prevents one feature from dominating others
Improves gradient descent convergence
Ensures fair distance calculations in KNN, K-Means, SVM
Helps stabilise neural network training
Reduces numerical instability
Makes model behaviour more interpretable and reliable

Common Feature Scaling & Normalization Methods

1. Standardization (Z-Score Scaling)

Standardization transforms data so that each feature has a mean of 0 and a standard deviation of 1.

Formula:
new_value = (value – mean) / standard_deviation

Use when:

Your data is roughly bell-shaped or has no natural bounds (normality helps but is not required)
You're using linear models, logistic regression, SVM, KNN, PCA, or neural networks

Why it’s useful:
It centers the distribution and helps algorithms converge faster.
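
As a rough illustration of the formula above, here is a minimal sketch using only Python's standard library. The function name `standardize` and the sample ages are made up for this example, and it assumes the population standard deviation; some libraries use the sample version instead.

```python
from statistics import fmean, pstdev

def standardize(values):
    """Return z-scores: (value - mean) / standard_deviation."""
    mean = fmean(values)
    std = pstdev(values)  # population standard deviation (an assumption of this sketch)
    if std == 0:
        # A constant feature has no spread, so map every value to 0.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

ages = [22, 35, 47, 58, 63]  # hypothetical sample data
z_ages = standardize(ages)
print(z_ages)
```

After the transform, the feature should have a mean of approximately 0 and a standard deviation of approximately 1, up to floating-point error.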

2. Min-Max Normalization

Rescales data into a fixed range, often 0 to 1.

Formula:
new_value = (value – min) / (max – min)

Use when:

You need values within a fixed range such as 0 to 1 (endpoints included)
You use distance-based algorithms (KNN, K-Means)
You train neural networks (especially those using sigmoid or tanh activation)

Important note:
Sensitive to outliers: a single extreme value stretches the range and compresses all other values into a narrow band.
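
The sketch below applies the Min-Max formula in plain Python and then repeats it with one extreme value added, to show the compression effect. The helper name `min_max_scale` and the income figures are illustrative assumptions, not part of any library.

```python
def min_max_scale(values):
    """Rescale to [0, 1] using (value - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature has zero range, so map every value to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

incomes = [20_000, 30_000, 40_000, 50_000]  # hypothetical sample data
scaled = min_max_scale(incomes)

# Adding one extreme value squeezes the original four into a tiny interval.
with_outlier = min_max_scale(incomes + [1_000_000])

print(scaled)
print(with_outlier)
```

In the second result, the four original incomes all land close to 0, which is the behaviour the note above warns about.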

3. Robust Scaling

Reduces the effect of outliers by scaling based on the median and IQR (interquartile range).

Formula:
new_value = (value – median) / IQR, where IQR = Q3 – Q1 (75th percentile minus 25th percentile)

Use when:

Your dataset contains extreme outliers
You want stable scaling without letting outliers dominate
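
A minimal sketch of robust scaling with the standard library follows. The name `robust_scale` and the readings are invented for illustration, and the quartiles use the "inclusive" interpolation method; other quartile conventions give slightly different IQR values.

```python
from statistics import median, quantiles

def robust_scale(values):
    """Scale with (value - median) / IQR."""
    med = median(values)
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        # No spread in the middle half of the data, so map every value to 0.
        return [0.0 for _ in values]
    return [(v - med) / iqr for v in values]

readings = [10, 12, 14, 16, 18, 500]  # hypothetical data with one outlier
robust = robust_scale(readings)
print(robust)
```

The outlier still maps to a large value, but it does not change how the typical values are spread, because the median and IQR barely move when it is added.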

4. Log Transform

Applies a logarithmic transformation to reduce skewness.

Use when:

The feature is right-skewed (e.g., income, transaction amounts)
You want to compress large ranges
You need a more normal-like distribution

Note:
The logarithm is only defined for positive values. For features that contain zeros, log(1 + value) is a common variant.
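
The following sketch shows a log transform that rejects non-positive input, plus the log(1 + value) variant for data containing zeros. The function names and the transaction amounts are assumptions made for this example; the natural logarithm is used, though any base compresses the range in the same way.

```python
import math

def log_transform(values):
    """Natural log of each value; refuses zero or negative input."""
    if any(v <= 0 for v in values):
        raise ValueError("log is undefined for zero or negative values")
    return [math.log(v) for v in values]

def log1p_transform(values):
    """log(1 + value): tolerates zeros, still refuses negative input."""
    if any(v < 0 for v in values):
        raise ValueError("this sketch does not handle negative values")
    return [math.log1p(v) for v in values]

amounts = [10, 100, 1_000, 10_000, 100_000]  # hypothetical right-skewed data
logged = log_transform(amounts)
print(logged)
```

The raw amounts span four orders of magnitude, while the transformed values are evenly spaced over a much smaller range.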

Which Algorithms Need Scaling?

**Algorithms that require or benefit from scaling:**

Support Vector Machines (SVM)
K-Nearest Neighbours (KNN)
K-Means clustering
Logistic Regression
Linear Regression (faster gradient-descent convergence; matters most with regularisation)
PCA (Principal Component Analysis)
Neural Networks (deep learning models)

These are sensitive because they rely on distance calculations, gradient descent, or, in the case of PCA, feature variance.
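
To make the distance argument concrete, the sketch below finds the nearest neighbour of one person before and after standardizing each column. The `(age, income)` records and both helper functions are made up for illustration, and the scaling helper assumes no column is constant.

```python
import math
from statistics import fmean, pstdev

# Hypothetical (age, income) records.
people = [(25, 50_000), (60, 51_000), (26, 54_000), (40, 90_000), (35, 20_000)]

def zscore_columns(rows):
    """Standardize each column; assumes every column has non-zero spread."""
    columns = list(zip(*rows))
    stats = [(fmean(c), pstdev(c)) for c in columns]
    return [tuple((v - m) / s for v, (m, s) in zip(row, stats)) for row in rows]

def nearest_to_first(rows):
    """Index of the row closest (Euclidean distance) to rows[0]."""
    distances = [math.dist(rows[0], other) for other in rows[1:]]
    return distances.index(min(distances)) + 1

raw_nearest = nearest_to_first(people)
scaled_nearest = nearest_to_first(zscore_columns(people))
print(raw_nearest, scaled_nearest)
```

On the raw data, income differences swamp age differences, so the 60-year-old with a similar income counts as the closest match to the 25-year-old. After scaling, the 26-year-old becomes the nearest neighbour.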

**Algorithms that do not need scaling:**

Decision Trees
Random Forest
XGBoost, LightGBM, CatBoost
Naive Bayes
Rule-based algorithms

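Tree models split each feature on thresholds, and rescaling a feature preserves the ordering of its values, so the same splits remain available and performance is essentially unchanged.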
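
A toy one-feature "stump" can illustrate why threshold splits ignore scale. The sketch below is a deliberately simplified stand-in for a real tree learner (it counts misclassifications rather than using an impurity measure such as Gini), and the function name, incomes, and labels are invented for the example.

```python
def best_stump_left_side(values, labels):
    """Return the set of row indices sent left by the best single split.

    Tries every cut between distinct sorted values and keeps the one with
    the fewest misclassifications when each side predicts its majority label.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    best_errors, best_left = None, None
    for cut in range(1, len(order)):
        if values[order[cut - 1]] == values[order[cut]]:
            continue  # no threshold can separate equal values
        left, right = order[:cut], order[cut:]
        errors = 0
        for side in (left, right):
            ones = sum(labels[i] for i in side)
            errors += min(ones, len(side) - ones)
        if best_errors is None or errors < best_errors:
            best_errors, best_left = errors, frozenset(left)
    return best_left

incomes = [20_000, 35_000, 48_000, 52_000, 75_000, 90_000]  # hypothetical
labels = [0, 0, 0, 1, 1, 1]

lo, hi = min(incomes), max(incomes)
scaled = [(v - lo) / (hi - lo) for v in incomes]  # Min-Max scaled copy

raw_split = best_stump_left_side(incomes, labels)
scaled_split = best_stump_left_side(scaled, labels)
print(raw_split == scaled_split)
```

The threshold value itself changes after scaling, but the rows that fall on each side of the split stay the same, which is what determines the tree's predictions.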

Common Mistakes to Avoid

Fitting the scaler on the full dataset before splitting into train/test (causes data leakage)
Scaling categorical data accidentally
Using Min-Max with heavy outliers
Applying log transform to zero or negative values
Scaling the target variable when it is not needed (it is only occasionally useful, and only in regression)

Best Practices

Always fit the scaler only on training data
Use the same scaler to transform the test set
Use pipelines to automate scaling with model training
Combine scaling with imputation and encoding in a proper workflow
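
The fit-on-train, transform-on-test pattern can be sketched with a small hand-written scaler. `SimpleStandardScaler` is a hypothetical helper written for this example, not a library class; in practice, libraries such as scikit-learn offer scalers and pipelines that follow the same fit/transform idea.

```python
from statistics import fmean, pstdev

class SimpleStandardScaler:
    """Toy scaler: learns mean and std in fit(), reuses them in transform()."""

    def fit(self, values):
        self.mean_ = fmean(values)
        # Fall back to 1.0 for a constant feature to avoid dividing by zero.
        self.std_ = pstdev(values) or 1.0
        return self

    def transform(self, values):
        return [(v - self.mean_) / self.std_ for v in values]

train = [10.0, 20.0, 30.0, 40.0]  # hypothetical training feature
test = [25.0, 100.0]              # hypothetical test feature

# Statistics come from the training data only.
scaler = SimpleStandardScaler().fit(train)
train_scaled = scaler.transform(train)

# The test set is transformed with the training mean and std, never refit.
test_scaled = scaler.transform(test)
print(test_scaled)
```

Because the test value 100.0 never influences the learned mean or standard deviation, no information from the test set leaks into training.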

Closing Summary

Feature scaling is an essential preprocessing step that directly influences model accuracy, stability, and training efficiency. While not all algorithms require scaling, understanding which methods to apply—and when—is critical for producing robust machine-learning models. This episode equips you with the foundational techniques to scale features correctly and avoid common pitfalls, setting the stage for deeper feature engineering strategies in the coming episodes.
