Feature Scaling & Normalization

Why scaling matters and how different normalization methods transform model performance.

Published
3 min read
Introduction

Feature scaling and normalization are essential preprocessing steps in machine learning because many algorithms rely on distance calculations or gradient-based optimisation, both of which are sensitive to feature magnitude. When features are on vastly different scales, such as age (0–100) and income (0–100,000), the model may unintentionally give more weight to the larger-scaled variable. Scaling puts features on comparable ranges so that none dominates purely because of its units, speeds up optimisation, and reduces numerical problems.

Some algorithms, such as SVM, KNN, and neural networks, are highly sensitive to feature magnitude, while others, such as tree-based models, are largely unaffected. Understanding when and how to scale is a key skill in feature engineering.


Why Scaling Matters

  • Prevents one feature from dominating others

  • Improves gradient descent convergence

  • Ensures fair distance calculations in KNN, K-Means, SVM

  • Helps stabilise neural network training

  • Reduces numerical instability

  • Makes the coefficients of linear models comparable across features, which aids interpretation
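To make the first and third points concrete, here is a minimal sketch using the age and income ranges from the introduction. The three people and the helper name `euclidean` are made up for illustration.

```python
import numpy as np

# Hypothetical people described by (age in years, income in currency units).
a = np.array([25.0, 50_000.0])
b = np.array([65.0, 50_500.0])   # 40 years older than a, income differs by 500
c = np.array([26.0, 52_000.0])   # 1 year older than a, income differs by 2,000

def euclidean(u, v):
    """Plain Euclidean distance between two feature vectors."""
    return float(np.linalg.norm(u - v))

# Raw distances: income dominates, so b looks closer to a than c does.
raw_ab = euclidean(a, b)
raw_ac = euclidean(a, c)

# Divide each feature by its assumed range (age 0-100, income 0-100,000).
ranges = np.array([100.0, 100_000.0])
scaled_ab = euclidean(a / ranges, b / ranges)
scaled_ac = euclidean(a / ranges, c / ranges)

print(raw_ab < raw_ac)        # True: on raw data, b is the nearer neighbour
print(scaled_ac < scaled_ab)  # True: after scaling, c is the nearer neighbour
```

On the raw data the 40-year age gap is almost invisible next to the income difference, so a KNN model would treat `b` as the closer neighbour. After scaling, both features contribute on comparable terms and the ranking flips.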


Common Feature Scaling & Normalization Methods

1. Standardization (Z-Score Scaling)

Standardization transforms data so that each feature has a mean of 0 and a standard deviation of 1.

Formula:
new_value = (value - mean) / standard_deviation

Use when:

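  • Your data is roughly bell-shaped without extreme outliers (normality is not required; standardization shifts and rescales a feature but does not change the shape of its distribution)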

  • You're using linear models, logistic regression, SVM, KNN, PCA, or neural networks

Why it’s useful:
It centers the distribution and helps algorithms converge faster.
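The formula can be sketched directly in NumPy. The toy matrix below is invented for illustration, and the population standard deviation (`ddof=0`) is an assumption; it happens to be the convention scikit-learn's `StandardScaler` also uses.

```python
import numpy as np

# Toy feature matrix: rows are samples, columns are (age, income).
X = np.array([
    [25.0, 30_000.0],
    [35.0, 52_000.0],
    [45.0, 61_000.0],
    [55.0, 97_000.0],
])

mean = X.mean(axis=0)   # per-column mean
std = X.std(axis=0)     # per-column population standard deviation (ddof=0)

# new_value = (value - mean) / standard_deviation, applied column by column.
X_std = (X - mean) / std

print(X_std.mean(axis=0))  # each column mean is (numerically) zero
print(X_std.std(axis=0))   # each column standard deviation is one
```

Note that a constant column has a standard deviation of zero and would cause a division by zero here, so a real implementation needs to handle that case.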


2. Min-Max Normalization

Rescales data into a fixed range, often 0 to 1.

Formula:
new_value = (value - min) / (max - min)

Use when:

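  • You need values bounded in the range 0 to 1 (the training minimum maps to exactly 0 and the maximum to exactly 1; unseen data can fall outside this range)

  • You use distance-based algorithms (KNN, K-Means)

  • You train neural networks (especially those using sigmoid or tanh activation)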

Important note:
Min-max scaling is sensitive to outliers: a single extreme value stretches the range and compresses all other values into a narrow band.
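A short sketch of the formula, together with the outlier effect described in the note. The helper name `min_max` and the income values are illustrative only.

```python
import numpy as np

def min_max(values):
    """Rescale a 1-D array to [0, 1] using its own min and max.

    Assumes the array is not constant (max > min)."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    return (values - lo) / (hi - lo)

incomes = np.array([30_000.0, 40_000.0, 50_000.0, 60_000.0])
scaled = min_max(incomes)            # evenly spread: 0, 1/3, 2/3, 1

# Add one extreme value: the original four points get squeezed near 0.
with_outlier = np.append(incomes, 1_000_000.0)
squeezed = min_max(with_outlier)

print(scaled)
print(squeezed)   # first four values are all below 0.05, the outlier is 1.0
```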


3. Robust Scaling

Reduces the effect of outliers by scaling based on the median and IQR (interquartile range).

Formula:
new_value = (value - median) / IQR

Use when:

  • Your dataset contains extreme outliers

  • You want stable scaling without letting outliers dominate
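The same income values with one extreme entry can illustrate the formula. This is a sketch: `robust_scale` is a hypothetical helper, and the quartiles use NumPy's default linear interpolation, which is one of several quartile conventions.

```python
import numpy as np

def robust_scale(values):
    """Centre on the median and divide by the interquartile range (Q3 - Q1)."""
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    q1, q3 = np.percentile(values, [25, 75])
    return (values - median) / (q3 - q1)

incomes = np.array([30_000.0, 40_000.0, 50_000.0, 60_000.0, 1_000_000.0])
scaled = robust_scale(incomes)

print(scaled)  # [-1.  -0.5  0.   0.5 47.5]
```

The outlier is still far from the rest, but it no longer determines the scale: the four typical values keep a usable spread instead of collapsing towards zero as they would under min-max scaling.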


4. Log Transform

Applies a logarithmic transformation to reduce skewness.

Use when:

  • The feature is right-skewed (e.g., income, transaction amounts)

  • You want to compress large ranges

  • You need a more normal-like distribution

Note:
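The logarithm is defined only for strictly positive values, so zeros and negative values must be handled first (for example, by using log(1 + x) on non-negative features).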
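As a sketch of how this might look in practice, the snippet below uses `np.log1p`, which computes log(1 + x) and therefore tolerates zeros. The transaction amounts are invented, and choosing `log1p` over a plain logarithm is an assumption made here to handle the zero entry.

```python
import numpy as np

# Right-skewed toy amounts spanning four orders of magnitude, including a zero.
amounts = np.array([0.0, 9.0, 99.0, 999.0, 9_999.0])

# A plain np.log would give -inf for the zero entry; log1p maps 0 to 0.
logged = np.log1p(amounts)

# The same idea in base 10, which is easier to read by eye.
logged10 = np.log10(1.0 + amounts)   # [0. 1. 2. 3. 4.]

# expm1 inverts log1p, which is useful when reporting in original units.
restored = np.expm1(logged)

print(logged10)
```

The range 0 to 9,999 is compressed to 0 to 4 in base 10, while the ordering of the values is preserved.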


Which Algorithms Need Scaling?

Algorithms that require scaling:

  • Support Vector Machines (SVM)

  • K-Nearest Neighbours (KNN)

  • K-Means clustering

  • Logistic Regression

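  • Linear Regression (when regularised, e.g. Ridge or Lasso, or fitted by gradient descent; plain least-squares predictions do not change with scaling)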

  • PCA (Principal Component Analysis)

  • Neural Networks (deep learning models)

These are sensitive because they rely on distance calculations, gradient descent, feature variance (PCA), or regularisation penalties, all of which depend on feature magnitude.


Algorithms that do not need scaling:

  • Decision Trees

  • Random Forest

  • XGBoost, LightGBM, CatBoost

  • Naive Bayes

  • Rules-based algorithms

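Tree models split on thresholds within a single feature, and rescaling preserves the ordering of values, so the resulting splits (and therefore the predictions) are essentially unchanged.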
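The threshold argument can be checked with a deliberately simplified one-split "stump". This is an illustrative sketch, not how any tree library is actually implemented; `best_threshold_mask` and the data are hypothetical.

```python
import numpy as np

def best_threshold_mask(x, y):
    """Find the single-threshold split on x with the fewest misclassified
    binary labels and return which samples fall on the left side.

    Assumes x contains at least two distinct values."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best_err, best_thr = None, None
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue  # no threshold fits between equal values
        left, right = ys[:i], ys[i:]
        # Each side predicts its majority class; count the minority labels.
        err = (min(left.sum(), len(left) - left.sum())
               + min(right.sum(), len(right) - right.sum()))
        if best_err is None or err < best_err:
            best_err, best_thr = err, (xs[i - 1] + xs[i]) / 2
    return x <= best_thr

income = np.array([20_000.0, 35_000.0, 50_000.0, 80_000.0, 120_000.0, 150_000.0])
label = np.array([0, 0, 0, 1, 1, 1])

mask_raw = best_threshold_mask(income, label)
mask_std = best_threshold_mask((income - income.mean()) / income.std(), label)
mask_log = best_threshold_mask(np.log(income), label)

print(np.array_equal(mask_raw, mask_std))  # True
print(np.array_equal(mask_raw, mask_log))  # True
```

The threshold value itself changes with the scale, but the samples on each side of it do not, which is all that matters for the tree's predictions.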


Common Mistakes to Avoid

  • Fitting the scaler on the full dataset before splitting into train/test (leaks test-set statistics into training)

  • Scaling categorical data accidentally

  • Using Min-Max with heavy outliers

  • Applying log transform to zero or negative values

  • Scaling the target variable when it is not needed (it is only occasionally useful, in regression, and predictions must then be converted back to the original units)


Best Practices

  • Always fit the scaler only on training data

  • Use the same scaler to transform the test set

  • Use pipelines to automate scaling with model training

  • Combine scaling with imputation and encoding in a proper workflow
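One possible way to put these practices together is a scikit-learn `Pipeline` wrapped around a `ColumnTransformer`. The sketch below uses synthetic data; the column names, the median imputation strategy, and the choice of logistic regression are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic data; the column names are hypothetical.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age": rng.integers(18, 80, size=n).astype(float),
    "income": rng.normal(50_000, 15_000, size=n),
    "city": rng.choice(["A", "B", "C"], size=n),
})
df.loc[df.index % 25 == 0, "income"] = np.nan   # inject some missing values
y = (df["age"] > 45).astype(int)

# Split first, so no statistic is ever computed on the test rows.
X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.25, random_state=0
)

# Numeric columns: impute, then scale. Categorical column: one-hot encode only.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
preprocess = ColumnTransformer([
    ("num", numeric, ["age", "income"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression()),
])

# fit() learns the imputer, scaler and encoder from the training rows only.
model.fit(X_train, y_train)

# score() reuses those fitted transformers on the test rows.
accuracy = model.score(X_test, y_test)
print(accuracy)
```

Because the scaler lives inside the pipeline, cross-validation utilities refit it on each training fold, which avoids the leakage described under Common Mistakes. The categorical column is routed around the scaler, so it cannot be scaled accidentally.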


Closing Summary

Feature scaling is an essential preprocessing step that directly influences model accuracy, stability, and training efficiency. While not all algorithms require scaling, understanding which methods to apply—and when—is critical for producing robust machine-learning models. This episode equips you with the foundational techniques to scale features correctly and avoid common pitfalls, setting the stage for deeper feature engineering strategies in the coming episodes.