The XOR Problem

Why neural networks need hidden layers: some problems aren't linearly separable.

Part 1: Can a Single Line Separate the Classes?

Select a logic gate and try to draw a line that separates the green (output=1) points from the red (output=0) points.

x1  x2  Output
0   0   0
0   1   1
1   0   1
1   1   1

Perceptron (Single Neuron)


Activation Function

Step: σ(x) = 1 if x > 0, else 0

Input Space

Perceptron Weights

Adjust weights to find a separating line

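The manual search in Part 1 can also be sketched in code. The snippet below is a minimal illustration, not the page's own implementation: it tries every weight and bias on a small, arbitrarily chosen grid (the `GRID` values and the helper name `find_separating_line` are assumptions) and reports whether a single step-activated neuron reproduces each gate. Coming up empty on a finite grid is not a proof, but it matches what the sliders show: AND and OR have separating lines, XOR does not.

```python
from itertools import product


def step(z):
    """Step activation from the text: 1 if z > 0, else 0."""
    return 1 if z > 0 else 0


# Truth tables keyed by the input pair (x1, x2).
GATES = {
    "AND": {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1},
    "OR": {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1},
    "XOR": {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0},
}

# Hypothetical search grid: values from -2.0 to 2.0 in steps of 0.5.
GRID = [i / 2 for i in range(-4, 5)]


def find_separating_line(table):
    """Return some (w1, w2, b) on the grid that reproduces the table, or None."""
    for w1, w2, b in product(GRID, repeat=3):
        if all(step(w1 * x1 + w2 * x2 + b) == y for (x1, x2), y in table.items()):
            return (w1, w2, b)
    return None


for name, table in GATES.items():
    print(name, find_separating_line(table))
```

For AND and OR the search returns a weight triple; for XOR it returns `None`.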

Part 2: Solving XOR with a Hidden Layer

A hidden layer transforms the input space so that the problem becomes linearly separable. Adjust the weights to see how the transformation works. One approach: h1 detects "at least one input is on" (OR), h2 detects "not both are on" (NAND), and the output neuron combines them with AND. Another approach: h1 detects "x1 is on but not x2", h2 detects "x2 is on but not x1", and the output neuron combines them with OR.
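Both approaches can be written out with concrete numbers. The weights below are one illustrative choice for each approach (assumptions made for this sketch, not values taken from the page), and the helper names `neuron`, `xor_or_nand`, and `xor_one_sided` are hypothetical.

```python
def step(z):
    """Step activation: 1 if z > 0, else 0."""
    return 1 if z > 0 else 0


def neuron(w1, w2, b, a1, a2):
    """One step-activated neuron with two inputs a1, a2."""
    return step(w1 * a1 + w2 * a2 + b)


def xor_or_nand(x1, x2):
    """Approach 1: h1 = OR, h2 = NAND, output = h1 AND h2."""
    h1 = neuron(1, 1, -0.5, x1, x2)    # fires if at least one input is on
    h2 = neuron(-1, -1, 1.5, x1, x2)   # fires unless both inputs are on
    return neuron(1, 1, -1.5, h1, h2)  # fires only if both hidden units fire


def xor_one_sided(x1, x2):
    """Approach 2: h1 = x1 and not x2, h2 = x2 and not x1, output = h1 OR h2."""
    h1 = neuron(1, -1, -0.5, x1, x2)   # fires only for (1, 0)
    h2 = neuron(-1, 1, -0.5, x1, x2)   # fires only for (0, 1)
    return neuron(1, 1, -0.5, h1, h2)  # fires if either hidden unit fires


for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((x1, x2), xor_or_nand(x1, x2), xor_one_sided(x1, x2))
```

Both functions return 0, 1, 1, 0 for the four inputs in order, even though their hidden units detect different things.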

Network Architecture

Activation Function

Step: σ(x) = 1 if x > 0, else 0

h1:
Green region = h1 fires

h1 Weights

h1 = σ(w11·x1 + w12·x2 + b1)
Boundary: w11·x1 + w12·x2 + b1 = 0

h2:
Green region = h2 fires

h2 Weights

h2 = σ(w21·x1 + w22·x2 + b2)
Boundary: w21·x1 + w22·x2 + b2 = 0

Hidden Space (h1, h2)

Output Layer Weights

Combined: Network Output (XOR)

How XOR Works

XOR = h1 AND h2 (when h1 computes OR and h2 computes NAND)

Adjust the weights to find a valid XOR solution.

Forward Pass Computation

Input (x1, x2)  Hidden (h1, h2)  Output (ŷ)  Target  Correct?
(0, 0)          -                -           0       -
(0, 1)          -                -           1       -
(1, 0)          -                -           1       -
(1, 1)          -                -           0       -

Network Accuracy: 0/4

The Key Insight

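The XOR function outputs 1 when exactly one input is 1. In the input space, the two output-1 points, (0, 1) and (1, 0), sit on one diagonal of the unit square and the two output-0 points, (0, 0) and (1, 1), sit on the other, so no single line can separate the classes.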
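A short argument makes this precise. Suppose a single neuron with the step activation (firing when its argument is greater than 0) and parameters $w_1$, $w_2$, $b$ computed XOR. The four input points would then require:

$$
\begin{aligned}
(0,0) \mapsto 0 &: \quad b \le 0 \\
(0,1) \mapsto 1 &: \quad w_2 + b > 0 \\
(1,0) \mapsto 1 &: \quad w_1 + b > 0 \\
(1,1) \mapsto 0 &: \quad w_1 + w_2 + b \le 0
\end{aligned}
$$

Adding the second and third conditions gives $w_1 + w_2 + 2b > 0$, so $w_1 + w_2 + b > -b \ge 0$ by the first condition. That contradicts the fourth condition, so no such weights exist.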

The hidden layer acts as a feature transformation. Each hidden neuron computes:

$$h_i = \sigma(w_{i1}x_1 + w_{i2}x_2 + b_i)$$

where σ is the step function. This transforms the 4 input points into a new 2D space where they can be linearly separated.
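The transformation can be traced point by point. The sketch below assumes the OR/NAND hidden weights (an illustrative choice, not values read from the page) and maps each input to its hidden-space coordinates. Two of the four inputs land on the same hidden point, and a single line in the (h1, h2) plane then separates the classes.

```python
def step(z):
    """Step activation: 1 if z > 0, else 0."""
    return 1 if z > 0 else 0


# Assumed hidden weights (w_i1, w_i2, b_i): h1 computes OR, h2 computes NAND.
HIDDEN = [(1, 1, -0.5), (-1, -1, 1.5)]

XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}


def hidden(x1, x2):
    """Map an input point to its hidden-space coordinates (h1, h2)."""
    return tuple(step(w1 * x1 + w2 * x2 + b) for w1, w2, b in HIDDEN)


mapped = {x: hidden(*x) for x in XOR}
for x, h in mapped.items():
    print(f"x={x} -> h={h}  target={XOR[x]}")

# One candidate separating line in hidden space: h1 + h2 - 1.5 = 0.
separable = all(step(h1 + h2 - 1.5) == XOR[x] for x, (h1, h2) in mapped.items())
print("separated by one line in hidden space:", separable)
```

With these weights, (0, 1) and (1, 0) both map to (1, 1), while (0, 0) maps to (0, 1) and (1, 1) maps to (1, 0), so the output-1 class occupies a single corner of the hidden space.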

Infinitely many sets of weights solve XOR correctly; the two approaches described in Part 2 (OR and NAND combined with AND, and the two "one input on but not the other" detectors combined with OR) are just two fundamentally different examples.
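One easy way to see why the solutions are infinite: with a step activation, only the sign of each neuron's weighted sum matters, so multiplying all of a neuron's weights and its bias by the same positive constant leaves its output unchanged. The sketch below checks this for an assumed OR/NAND/AND weight set; the `scale` parameter and the function name `xor_net` are hypothetical.

```python
def step(z):
    """Step activation: 1 if z > 0, else 0."""
    return 1 if z > 0 else 0


def xor_net(x1, x2, scale=1.0):
    """OR/NAND/AND network with every weight and bias multiplied by `scale`."""
    h1 = step(scale * (x1 + x2 - 0.5))    # OR, scaled
    h2 = step(scale * (-x1 - x2 + 1.5))   # NAND, scaled
    return step(scale * (h1 + h2 - 1.5))  # AND, scaled


INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]

# Any positive scale gives the same truth table.
for scale in [0.1, 1.0, 7.3, 1000.0]:
    print(scale, [xor_net(x1, x2, scale) for x1, x2 in INPUTS])
```

Scaling is only one family of alternatives; small shifts of the biases that keep every boundary between the same points work as well.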