Skip to main content

TL;DR

An optimization method that minimizes the sum of squared residuals: \( \min_\beta \sum_i (y_i - x_i^T \beta)^2 \). The solution is \( \hat{\beta} = (X^TX)^{-1}X^Ty \) for ordinary least squares (OLS).

By Valenke Exam Prep Team·Last updated 2026-06-03

Least Squares

An optimization method that minimizes the sum of squared residuals: \( \min_\beta \sum_i (y_i - x_i^T \beta)^2 \). The solution is \( \hat{\beta} = (X^TX)^{-1}X^Ty \) for ordinary least squares (OLS).

Why it matters for interviews

OLS is the foundation of empirical finance: factor model estimation, alpha/beta decomposition, and signal construction all use least squares. Understanding regularized variants (ridge, LASSO) is increasingly important.

Definition and Mathematical Foundation

An optimization method that minimizes the sum of squared residuals: \( \min_\beta \sum_i (y_i - x_i^T \beta)^2 \). The solution is \( \hat{\beta} = (X^TX)^{-1}X^Ty \) for ordinary least squares (OLS).

Application in Quantitative Finance

OLS is the foundation of empirical finance: factor model estimation, alpha/beta decomposition, and signal construction all use least squares. Understanding regularized variants (ridge, LASSO) is increasingly important.

Related Terms

Ready to practice for the Quant Trading Interview?

Adaptive practice powered by Item Response Theory targets your weak areas. Start with 3 free sessions.

Start free practice →

Frequently Asked Questions

What is ridge regression?
Adds an L2 penalty: \( \min_\beta \|y - X\beta\|^2 + \lambda\|\beta\|^2 \). Solution: \( \hat{\beta} = (X^TX + \lambda I)^{-1}X^Ty \). It shrinks coefficients toward zero, stabilizing estimates when predictors are correlated or p is large relative to n.
What is LASSO?
Adds an L1 penalty: \( \min_\beta \|y - X\beta\|^2 + \lambda\|\beta\|_1 \). Unlike ridge, LASSO produces sparse solutions (exactly zero coefficients), performing variable selection. Useful for identifying which factors drive returns.
What is the geometric interpretation of OLS?
The OLS prediction \( \hat{y} = X\hat{\beta} \) is the orthogonal projection of y onto the column space of X. The residual \( y - \hat{y} \) is perpendicular to all columns of X.