A step-by-step interactive walkthrough of how PCA finds the directions of maximum variance
Click on the canvas to add data points, or use the preset buttons. PCA will find the directions along which your data varies the most.
We subtract the mean from each point so the data's centroid moves to the origin (0, 0). This ensures PCA finds directions of variance from the center of the data.
Each point is shifted so the new mean becomes (0, 0).
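Centering is a one-liner in code. A minimal NumPy sketch, using a made-up handful of points:

```python
import numpy as np

# A hypothetical set of clicked points (x, y); not from the demo.
points = np.array([[1.0, 2.0],
                   [3.0, 3.0],
                   [5.0, 7.0]])

# Subtract the per-coordinate mean so the centroid moves to the origin.
centered = points - points.mean(axis=0)

print(centered.mean(axis=0))  # → [0. 0.]
```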
The symbol Σ (Greek letter "sigma") denotes the covariance matrix:

$$\Sigma = \begin{pmatrix} \text{Var}(x) & \text{Cov}(x,y) \\ \text{Cov}(x,y) & \text{Var}(y) \end{pmatrix}$$
The matrix is symmetric: Cov(x,y) = Cov(y,x). It encodes the "shape" of the data cloud.
The covariance matrix is built from outer products of centered vectors. Let $\mathbf{p}_i$ denote the i-th point (a 2D vector):
If a centered point has coordinates $(x, y)$, its outer product with itself is a 2×2 matrix:

$$\begin{pmatrix} x \\ y \end{pmatrix} \begin{pmatrix} x & y \end{pmatrix} = \begin{pmatrix} x^2 & xy \\ yx & y^2 \end{pmatrix}$$
Summing these matrices over all points (and dividing by the number of points $n$) gives the covariance matrix: the diagonal entries are averages of squares (variances), and the off-diagonal entries are averages of products (covariances). Notice that $xy = yx$, so the covariance matrix is always symmetric.
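This construction can be checked directly. A short NumPy sketch (the point values are illustrative; `bias=True` makes `np.cov` use the same divide-by-n convention):

```python
import numpy as np

points = np.array([[1.0, 2.0],
                   [3.0, 3.0],
                   [5.0, 7.0]])
centered = points - points.mean(axis=0)
n = len(centered)

# Average of the outer products p_i p_i^T over all centered points.
sigma = sum(np.outer(p, p) for p in centered) / n

# Diagonal holds the variances, off-diagonal the (symmetric) covariance.
print(np.allclose(sigma, np.cov(centered, rowvar=False, bias=True)))  # → True
print(np.allclose(sigma, sigma.T))                                    # → True
```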
We can think of the covariance matrix as a transformation that shapes data. Starting from a circular cloud of points (Var(x)=Var(y)=1, Cov=0), the matrix stretches, squashes, and rotates it into an ellipse.
Your data has its own Var(x), Var(y), and Cov(x,y): try those values to see the transformation that produces your data's shape!
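The "circle into ellipse" picture can be reproduced numerically. A sketch with an assumed target covariance (the numbers are arbitrary): sample a circular standard-normal cloud, then multiply by any matrix $A$ with $AA^\top = \Sigma$ (the Cholesky factor is one convenient choice), and the result has covariance approximately $\Sigma$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Circular cloud: Var(x) = Var(y) = 1, Cov = 0 (in expectation).
cloud = rng.standard_normal((20_000, 2))

# Arbitrary target covariance, for illustration only.
target = np.array([[2.0, 1.2],
                   [1.2, 1.0]])

# Any A with A @ A.T == target reshapes the circle into the target ellipse.
A = np.linalg.cholesky(target)
ellipse = cloud @ A.T

print(np.cov(ellipse, rowvar=False))  # ≈ target
```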
We want to find the directions along which our data varies the most. These are called principal components. Mathematically, they turn out to be the eigenvectors of the covariance matrix. But before we can find them, we need to understand a few concepts.
Geometrically, a determinant measures how much a transformation "scales area":
• Det = 0: The matrix squashes 2D space onto a line — it loses a dimension
• Det ≠ 0: The matrix is invertible — no dimension is completely lost
For a 2×2 matrix $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$, the determinant is $ad - bc$. For our covariance matrix Σ:

$$\det(\Sigma) = \text{Var}(x)\,\text{Var}(y) - \text{Cov}(x,y)^2$$
When det(Σ) = 0: This means Var(x)·Var(y) = Cov(x,y)². The covariance "maxes out" — the variables co-vary as much as they possibly can. This is perfect correlation: all points lie exactly on a line. The data is truly 1-dimensional, just embedded in 2D.
When det(Σ) > 0: The data genuinely spreads in multiple directions. The larger the determinant, the more "2D" the data is.
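Both cases are easy to check numerically. A sketch with two hypothetical datasets:

```python
import numpy as np

def cov_det(points):
    sigma = np.cov(points, rowvar=False, bias=True)
    return np.linalg.det(sigma)

# Perfect correlation: every point lies on the line y = 2x.
line = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])

# Genuine 2D spread: corners of a square.
square = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

print(cov_det(line))    # ≈ 0 (the data is really 1D)
print(cov_det(square))  # > 0 (the data spreads in two directions)
```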
An eigenvector of a matrix is a special direction: when you apply the matrix transformation, the vector only gets stretched or shrunk, not rotated. The stretching factor is called the eigenvalue (λ, "lambda"):

$$\Sigma \mathbf{v} = \lambda \mathbf{v}$$
This says: "Applying Σ to vector v just scales it by λ."
For PCA: The eigenvectors of the covariance matrix point along the principal axes of the data ellipse. The eigenvalue λ tells us the variance along that direction. The eigenvector with the largest λ is PC1 (the direction of maximum variance).
Why not just drop x or y? If your data spreads diagonally, dropping x loses half the information and dropping y loses the other half. Instead, we rotate to align with the principal axes first, then drop the direction with the least variance. The eigenvectors tell us exactly how to rotate.
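The defining property is easy to verify with NumPy. A sketch using an assumed covariance matrix (`np.linalg.eigh` is the eigensolver for symmetric matrices; it returns eigenvalues in ascending order):

```python
import numpy as np

# Hypothetical covariance matrix of diagonally spread data.
sigma = np.array([[2.0, 1.2],
                  [1.2, 1.0]])

eigvals, eigvecs = np.linalg.eigh(sigma)  # ascending eigenvalues

# PC1 is the eigenvector paired with the largest eigenvalue.
lam1 = eigvals[-1]
pc1 = eigvecs[:, -1]

# Applying sigma to pc1 only scales it by lam1; no rotation happens.
print(np.allclose(sigma @ pc1, lam1 * pc1))  # → True
```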
To find λ, we rearrange Σv = λv into (Σ − λI)v = 0. For a non-zero solution v to exist, the matrix (Σ − λI) must be singular (determinant = 0):

$$\det(\Sigma - \lambda I) = 0$$
Subtracting λ from the diagonal gives:

$$\Sigma - \lambda I = \begin{pmatrix} \text{Var}(x) - \lambda & \text{Cov}(x,y) \\ \text{Cov}(x,y) & \text{Var}(y) - \lambda \end{pmatrix}$$
Applying the determinant formula $ad - bc$:

$$(\text{Var}(x) - \lambda)(\text{Var}(y) - \lambda) - \text{Cov}(x,y)^2 = 0$$
Expanding gives a quadratic equation in λ:

$$\lambda^2 - \big(\text{Var}(x) + \text{Var}(y)\big)\,\lambda + \text{Var}(x)\,\text{Var}(y) - \text{Cov}(x,y)^2 = 0$$
This quadratic has two solutions — those are our two eigenvalues λ₁ and λ₂.
Trace = Var(x) + Var(y) = λ₁ + λ₂ — the total variance is split between the two principal components.
Det = Var(x)·Var(y) − Cov(x,y)² = λ₁ × λ₂ — the product of eigenvalues.
This is why det ≈ 0 means one eigenvalue is tiny: if λ₁ × λ₂ ≈ 0 but λ₁ + λ₂ is substantial, then one λ must be near zero. That direction has almost no variance — perfect for discarding!
Our equation $\lambda^2 - \text{trace} \cdot \lambda + \text{det} = 0$ is solved by the quadratic formula:

$$\lambda_{1,2} = \frac{\text{trace} \pm \sqrt{\text{trace}^2 - 4\,\text{det}}}{2}$$
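The closed-form eigenvalues can be checked against the trace and determinant identities. A sketch with an assumed covariance matrix:

```python
import numpy as np

# Assumed covariance matrix, for illustration only.
sigma = np.array([[2.0, 1.2],
                  [1.2, 1.0]])

trace = sigma[0, 0] + sigma[1, 1]                   # Var(x) + Var(y)
det = sigma[0, 0] * sigma[1, 1] - sigma[0, 1] ** 2  # Var(x)Var(y) - Cov^2

# Quadratic formula for lambda^2 - trace*lambda + det = 0.
disc = np.sqrt(trace**2 - 4 * det)
lam1, lam2 = (trace + disc) / 2, (trace - disc) / 2

print(lam1, lam2)                      # ≈ 2.8 and 0.2
print(np.isclose(lam1 + lam2, trace))  # → True: total variance is split
print(np.isclose(lam1 * lam2, det))    # → True
```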
Now that we have λ₁ and λ₂, we find each eigenvector by solving (Σ − λI)v = 0:
From the first row: $(\text{Var}(x) - \lambda)v_x + \text{Cov}(x,y) \cdot v_y = 0$
Solving and normalizing gives the principal component directions:

$$\mathbf{v} \propto \big(\text{Cov}(x,y),\ \lambda - \text{Var}(x)\big), \qquad \hat{\mathbf{v}} = \frac{\mathbf{v}}{\|\mathbf{v}\|}$$
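One solution of the first-row equation is $v = (\text{Cov}(x,y),\ \lambda - \text{Var}(x))$, which a quick NumPy sketch (with an assumed covariance matrix) confirms is an eigenvector:

```python
import numpy as np

# Assumed covariance matrix, for illustration only.
sigma = np.array([[2.0, 1.2],
                  [1.2, 1.0]])

lam1 = np.linalg.eigvalsh(sigma)[-1]  # largest eigenvalue

# From the first row: (Var(x) - lam)*v_x + Cov*v_y = 0,
# which v = (Cov, lam - Var(x)) satisfies.
v = np.array([sigma[0, 1], lam1 - sigma[0, 0]])
v = v / np.linalg.norm(v)  # normalize to unit length

print(np.allclose(sigma @ v, lam1 * v))  # → True: v is an eigenvector
```

One caveat: if Cov(x,y) = 0 this particular formula degenerates to the zero vector, but in that case the x and y axes themselves are already the principal components.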
By projecting each point onto PC1, we reduce 2D data to 1D while keeping the direction of maximum variance. The distance from each original point to its projection represents the information "lost" (variance along PC2).
To project point $\mathbf{x}$ onto PC1 (eigenvector $\mathbf{v}_1$):

$$\text{score} = \mathbf{x} \cdot \mathbf{v}_1, \qquad \text{projection} = \text{score} \cdot \mathbf{v}_1$$
The projection happens in two steps with different purposes:
Example: If score = 2.5 and v₁ = (0.8, 0.6), then:
• Score alone: just "2.5" (1D representation)
• Projected point: 2.5 × (0.8, 0.6) = (2.0, 1.5) (location in 2D)
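Running the example numbers through both steps, with a hypothetical centered point chosen so that its score comes out to 2.5:

```python
import numpy as np

v1 = np.array([0.8, 0.6])  # unit-length PC1 from the example
x = np.array([1.4, 2.3])   # hypothetical centered point with score 2.5

score = x @ v1             # step 1: 1D coordinate along PC1
projected = score * v1     # step 2: back to a 2D location on the PC1 line

print(score)                          # ≈ 2.5
print(projected)                      # ≈ [2.  1.5]
print(np.linalg.norm(x - projected))  # distance to the line: variance "lost"
```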
This is the core tradeoff of PCA: we sacrifice some variance (information) for a lower-dimensional representation.
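Putting every step together (center, covariance, eigenvectors, project), here is a minimal sketch of the whole pipeline; the function name `pca_1d` and the random test data are made up:

```python
import numpy as np

def pca_1d(points):
    """Reduce 2D points to 1D scores along PC1."""
    centered = points - points.mean(axis=0)            # 1. center
    sigma = np.cov(centered, rowvar=False, bias=True)  # 2. covariance matrix
    eigvals, eigvecs = np.linalg.eigh(sigma)           # 3. eigendecomposition
    pc1 = eigvecs[:, -1]                               # 4. max-variance direction
    scores = centered @ pc1                            # 5. project to 1D
    return scores, pc1, eigvals

rng = np.random.default_rng(0)
points = rng.standard_normal((200, 2)) @ np.array([[2.0, 1.0],
                                                   [0.0, 0.5]])
scores, pc1, eigvals = pca_1d(points)

# The variance of the 1D scores equals the largest eigenvalue.
print(np.isclose(scores.var(), eigvals[-1]))  # → True
```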