Neural networks: foundations and mathematics · 01 / 09
The artificial neuron
From biological to mathematical, what really happens inside the elementary brick of a network.
Every neural network, from the simplest to the deepest, is an assembly of one elementary brick repeated by the millions. That brick, the artificial neuron, has nothing magical about it. It is an equation with three ingredients, inspired by a biological cell hundreds of millions of years old.
By the end of this chapter you will be able to answer three questions: what exactly does an artificial neuron compute, where does the idea come from, and why does a lone neuron carry a limitation that modern networks had to overcome.
The biological inspiration
Your brain contains roughly 86 billion neurons (Azevedo et al., 2009). Each neuron receives electrical signals from other neurons through its dendrites, integrates them in its cell body, and decides, based on an internal threshold, whether to fire a signal along its axon.
In 1943, Warren McCulloch and Walter Pitts model that behaviour with an equation (McCulloch & Pitts, 1943), laying the foundations of what will later be called the artificial neuron. It is not a faithful copy of biology, but a mathematical simplification that turns out to be powerful.
The referee analogy
Picture a football referee deciding whether a foul deserves a penalty. He weighs several pieces of information, each more or less important depending on context.
Practice before theory. Drag the sliders below. The output recomputes live. Your mission: find a combination where the output y is very close to 1, then another one where it is very close to 0. You will feel how each parameter pulls in its own direction.
Parametric diagram of a neuron with three inputs x₁, x₂, x₃, three weights w₁=0.8, w₂=0.5, w₃=0.9, a bias b=-0.5, and a sigmoid activation. With the default values (x₁=1, x₂=0, x₃=1), the weighted sum is z = 1·0.8 + 0·0.5 + 1·0.9 + (-0.5) = 1.2 and the output y = σ(1.2) ≈ 0.77. Dragging the sliders shows how each input, weight, and bias moves the output.
Three things to notice while playing:
An input at zero cancels the contribution of its weight, no matter the weight’s value.
Increasing a weight amplifies the influence of its input. Flipping it negative reverses the effect.
The bias shifts the output independently of the inputs. A very negative bias makes the neuron very hard to activate.
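The diagram's default values can be reproduced in a few lines of Python. This is a minimal sketch, not the code behind the interactive component: the function names `sigmoid` and `neuron` are chosen here for illustration, and the standard logistic function is assumed for the sigmoid.

```python
import math

def sigmoid(z):
    """Logistic function: squashes any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus the bias, passed through a sigmoid."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

weights = [0.8, 0.5, 0.9]   # default weights of the diagram
bias = -0.5                 # default bias of the diagram

# Default inputs: z = 1*0.8 + 0*0.5 + 1*0.9 - 0.5 = 1.2
default_output = neuron([1, 0, 1], weights, bias)
print(round(default_output, 2))  # 0.77

# An input at zero hides its weight: changing w2 has no effect while x2 = 0.
same_output = neuron([1, 0, 1], [0.8, 99.0, 0.9], bias)

# A very negative bias makes the neuron hard to activate (here z = -8.3).
hard_to_fire = neuron([1, 0, 1], weights, -10.0)
```

Changing one value at a time in the calls above mirrors what each slider does.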
The mathematical formula
The neuron’s operation collapses into one equation. The expanded, readable form:
y = f( x₁·w₁ + x₂·w₂ + x₃·w₃ + b )
Expanded form
The same equation in compact notation with the sum symbol:
$$y = f\left(\sum_{i=1}^{n} w_i x_i + b\right)$$
And the densest form, in vector notation:
y=f(w⋅x+b)
All three forms say the same thing at different abstraction levels. Learn to recognise all three; you will encounter them in every scientific paper.
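To make the equivalence concrete, the three notations can be written as three small functions. This is an illustrative sketch in plain Python: the function names are hypothetical, and a sigmoid is assumed for f only so that the example returns a number.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neuron_expanded(x, w, b, f):
    # Expanded form, written out for exactly three inputs.
    return f(x[0] * w[0] + x[1] * w[1] + x[2] * w[2] + b)

def neuron_sum(x, w, b, f):
    # Sum notation: an explicit loop over i = 1..n.
    total = 0.0
    for i in range(len(x)):
        total += w[i] * x[i]
    return f(total + b)

def dot(w, x):
    # Dot product of two lists of equal length.
    return sum(wi * xi for wi, xi in zip(w, x))

def neuron_vector(x, w, b, f):
    # Vector notation: y = f(w . x + b)
    return f(dot(w, x) + b)

x = [1.0, 0.0, 1.0]
w = [0.8, 0.5, 0.9]
b = -0.5

y_expanded = neuron_expanded(x, w, b, sigmoid)
y_sum = neuron_sum(x, w, b, sigmoid)
y_vector = neuron_vector(x, w, b, sigmoid)
print(y_expanded, y_sum, y_vector)  # three times the same value
```

Only the first function is tied to three inputs; the other two work for any n, which is why papers prefer the compact notations.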
What each symbol means
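xᵢ : the inputs, the information the neuron receives
wᵢ : the weights, the importance assigned to each input
b : the bias, which shifts the weighted sum independently of the inputs
f : the activation function, applied to the weighted sum plus the bias
y : the output of the neuron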
Each input goes through a multiplication by its weight, all contributions sum with the bias, and the result passes through the activation function to produce the output.
From biological to artificial
| Biological element | Role | Artificial counterpart |
| --- | --- | --- |
| Dendrites | Receive incoming signals | Inputs x₁, x₂, …, xₙ |
| Synapses | Regulate signal strength | Weights w₁, w₂, …, wₙ |
| Cell body | Integrates all signals | Weighted sum Σ wᵢ·xᵢ |
| Activation threshold | Triggers (or not) a signal | Bias b + function f |
| Axon | Carries the output | Output y |
An idea born in 1943
The artificial neuron has an 83-year history made of breakthroughs and long winters.
| Year | Event |
| --- | --- |
| 1943 | McCulloch & Pitts model the neuron mathematically. |
| 1949 | Donald Hebb states “neurons that fire together wire together”. |
| 1973 | Lighthill report in the UK: AI is judged unable to deliver on its promises. Funding collapses in Europe and the US. The first AI winter begins, lasting roughly a decade. |
| 2012 | AlexNet (Krizhevsky, Sutskever, Hinton) crushes ImageNet on GPU. Modern deep learning begins. |
What a single neuron cannot do
Formal definition: hyperplane and linear separability
A clean treatment demands a precise definition. Let x ∈ ℝⁿ be an input vector, w ∈ ℝⁿ a weight vector, and b ∈ ℝ a bias. Here ℝⁿ denotes the set of ordered lists of n real numbers (chapter 2 formalises this notion). The set of points that zero the weighted sum
{ x ∈ ℝⁿ ∣ w⋅x + b = 0 }
is called a hyperplane. In ℝ² it is a line, in ℝ³ a plane, and in higher dimensions you can no longer draw it but the equation stands.
A hyperplane splits space into two half-spaces: one where w⋅x+b>0 (the neuron fires) and one where w⋅x+b<0 (it does not). A threshold neuron is exactly this partition operation.
A set of labelled points {(xᵢ, yᵢ)} with yᵢ ∈ {0, 1} (here xᵢ denotes the i-th point, a vector, and yᵢ its label) is said to be linearly separable if there exists a pair (w, b) such that the associated hyperplane puts every point of label 1 on one side and every point of label 0 on the other. Otherwise the set is not linearly separable.
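The definition translates directly into a check: given a candidate (w, b), test every labelled point. The sketch below assumes the convention that the neuron fires when w⋅x + b ≥ 0 (points lying exactly on the hyperplane count as firing); the names `fires` and `separates` are illustrative.

```python
def fires(w, b, x):
    """Threshold neuron: 1 if w.x + b >= 0, else 0."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if z >= 0 else 0

def separates(w, b, points):
    """True if the hyperplane (w, b) classifies every labelled point correctly."""
    return all(fires(w, b, x) == label for x, label in points)

# Each entry is (point, label).
AND_POINTS = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
OR_POINTS = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

print(separates((1, 1), -1.5, AND_POINTS))  # True
print(separates((1, 1), -0.5, OR_POINTS))   # True
print(separates((1, 1), -0.5, AND_POINTS))  # False
```

A dataset is linearly separable as soon as one pair (w, b) makes `separates` return True. Showing that a dataset is not separable requires an argument that covers every possible pair.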
Why XOR is not linearly separable
This result is not only visual, it can be proved in a few lines by contradiction. We use here the threshold function, which is that of the historical perceptron. The same impossibility holds with a sigmoid; we lose only the simplicity of the strict inequality.
Before tackling the proof, let us fix a notation that will be useful later. For a logical condition P, we write 1[P] for the indicator function of P: it equals 1 if P is true, 0 otherwise. In particular 1[z≥0] is the Heaviside function, sometimes written H(z), which we already met at the start of this chapter.
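One standard way to write the argument is sketched here; the presentation may differ from the author's. Assume a threshold neuron y = 1[w₁x₁ + w₂x₂ + b ≥ 0] reproduces the XOR table. Each of the four rows gives one inequality, and the last line adds the second and third:

```latex
\begin{align*}
(0,0) \mapsto 0 &: \quad b < 0 \\
(1,0) \mapsto 1 &: \quad w_1 + b \ge 0 \\
(0,1) \mapsto 1 &: \quad w_2 + b \ge 0 \\
(1,1) \mapsto 0 &: \quad w_1 + w_2 + b < 0 \\
\text{lines 2 + 3} &: \quad w_1 + w_2 + 2b \ge 0
  \;\Longrightarrow\; w_1 + w_2 + b \ge -b > 0
\end{align*}
```

The last line says w₁ + w₂ + b > 0, while the fourth says w₁ + w₂ + b < 0. Both cannot hold, so no such (w₁, w₂, b) exists.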
This proof fits in five lines but it closes the debate definitively: no choice of weights and bias can solve XOR with a single threshold neuron.
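A numerical sanity check, which illustrates the result but does not replace the proof: the sketch below tries every (w₁, w₂, b) on a coarse grid and records the best score per truth table. The grid range and step are arbitrary choices made for this example.

```python
from itertools import product

def fires(w1, w2, b, x1, x2):
    """Threshold neuron with two inputs: 1 if the weighted sum is >= 0."""
    return 1 if w1 * x1 + w2 * x2 + b >= 0 else 0

TABLES = {
    "AND": {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1},
    "OR":  {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1},
    "XOR": {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0},
}

def best_score(table, grid):
    """Highest number of correctly classified points over all grid triples."""
    best = 0
    for w1, w2, b in product(grid, repeat=3):
        score = sum(
            fires(w1, w2, b, x1, x2) == label
            for (x1, x2), label in table.items()
        )
        best = max(best, score)
    return best

grid = [k / 4 for k in range(-8, 9)]  # -2.0 to 2.0 in steps of 0.25
scores = {name: best_score(table, grid) for name, table in TABLES.items()}
print(scores)  # {'AND': 4, 'OR': 4, 'XOR': 3}
```

A finer or wider grid leaves the XOR score at 3, since the proof covers all real values of the weights and bias.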
Drag the sliders below to tilt and shift the line. On AND and OR, you can reach 4 out of 4 correctly classified. On XOR, you will never succeed: one point is always on the wrong side.
Figure: linear separation of XOR (interactive)
2D plot with four points from the XOR table: (0,0) and (1,1) labelled negative, (1,0) and (0,1) labelled positive. Sliders move and tilt a line. No orientation or position separates the two positive points from the two negative ones: XOR is not linearly separable, as Minsky and Papert proved formally in 1969.
That is exactly what Minsky and Papert proved in 1969. Geometrically, the decision boundary of a two-input neuron is a straight line, and XOR requires a boundary that no single line can provide. The fix will come from multi-layer networks, which compose several lines to draw more complex boundaries.
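As a preview of that fix, here is a hand-built sketch in which three threshold neurons, arranged in two layers, compute XOR. The weights are picked by hand for illustration, not learned, and this decomposition (OR combined with NAND) is only one of several that work.

```python
def step(z):
    """Threshold activation: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0

def neuron(x1, x2, w1, w2, b):
    """Two-input threshold neuron."""
    return step(w1 * x1 + w2 * x2 + b)

def xor_two_layers(x1, x2):
    # Hidden layer: two lines drawn in the input plane.
    h_or = neuron(x1, x2, 1, 1, -0.5)      # fires unless both inputs are 0
    h_nand = neuron(x1, x2, -1, -1, 1.5)   # fires unless both inputs are 1
    # Output layer: AND of the two hidden outputs.
    return neuron(h_or, h_nand, 1, 1, -1.5)

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, xor_two_layers(a, b))
```

Each hidden neuron still draws a single line. The output neuron keeps only the band between the two lines, which is where the two positive XOR points sit.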
The role of the bias, visually
Without bias, the line drawn by the neuron must pass through the origin. That is a strong constraint: most real-world problems have a decision boundary that does not sit at the origin.
The bias fixes this by translating the line anywhere in the plane. Mental image: weights control the orientation of the line (its slope); the bias controls its position.
To see it live, switch the component below to the OR dataset. Keep the slope near −1 and move only the intercept (which here plays the bias role). The line slides parallel to itself. Without that translation, you cannot correctly classify the three orange points.
Figure: role of the bias on the OR dataset (interactive)
Same 2D plot, but on the OR table: only (0,0) negative, the three other points positive. Slope initialised at -1, intercept at 0.5. Moving only the intercept (which plays the role of the bias) slides the line parallel to itself. Without a bias (no translation), the line would have to pass through the origin and could not isolate (0,0) from the three others.
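The sliding experiment can be sketched offline under two assumptions: the line B = −A + c corresponds to the neuron with w₁ = w₂ = 1 and b = −c, and the neuron fires when A + B − c ≥ 0 (the 1[z ≥ 0] convention, so a point lying exactly on the line counts as positive).

```python
OR_TABLE = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}

def step(z):
    return 1 if z >= 0 else 0

def score_for_intercept(c):
    """Correctly classified OR points for the line B = -A + c."""
    w1, w2, b = 1.0, 1.0, -c
    return sum(
        step(w1 * a + w2 * b_coord + b) == label
        for (a, b_coord), label in OR_TABLE.items()
    )

# Slide the line parallel to itself by changing only the intercept.
scores = {c: score_for_intercept(c) for c in (0.0, 0.5, 1.5)}
print(scores)  # {0.0: 3, 0.5: 4, 1.5: 2}
```

With c = 0 the line passes through the origin, so (0,0) lies on the boundary and is classified as positive. Only a non-zero bias moves the line far enough to isolate it.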
Another way to feel the bias: go back to the neuron diagram above and set every input to zero. The output now depends only on the bias. That is the neuron’s “base level”, independent of any input data.
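The same observation as a short sketch, assuming a sigmoid activation as in the diagram; the bias values tried here are arbitrary.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

weights = [0.8, 0.5, 0.9]      # any weights give the same result here
zero_inputs = [0.0, 0.0, 0.0]  # every input set to zero

# With all inputs at zero the weighted sum vanishes, so y = sigmoid(bias).
base_levels = {b: neuron(zero_inputs, weights, b) for b in (-5.0, -0.5, 0.0, 5.0)}
for b, level in base_levels.items():
    print(b, round(level, 3))
```

A bias of 0 gives a base level of exactly 0.5. Negative biases push it towards 0 and positive biases towards 1.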
In one sentence
An artificial neuron computes a linear combination of its inputs, adds a bias, and passes the result through a non-linear function. That’s it. The power comes from what happens once we stack many neurons and train them.
On to chapter 2
You saw earlier in this chapter that the neuron formula can also be written in the compact form y = f(w⋅x + b). That vector notation is everywhere in deep learning, but we never really defined it: what exactly is a vector? What does the central dot between w and x mean? Chapter 2 lays down those linear-algebra foundations, keeping strictly to what the rest of the course needs.
Exercises
Take a sheet of paper and a pencil. Solutions are right below, look at them only after trying.
Exercise 1: direct computation
Consider a two-input neuron with w₁ = 2, w₂ = −1, b = 0.5, and ReLU activation ReLU(z) = max(0, z). Compute the output for x₁ = 0.7 and x₂ = 0.3.
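Once you have tried it on paper, a few lines of Python are enough to check the arithmetic. This is a sketch with illustrative variable names; the result is rounded only to hide floating-point noise.

```python
def relu(z):
    """Rectified linear unit: max(0, z)."""
    return max(0.0, z)

w1, w2, b = 2.0, -1.0, 0.5
x1, x2 = 0.7, 0.3

z = w1 * x1 + w2 * x2 + b   # 1.4 - 0.3 + 0.5
y = relu(z)
print(round(z, 6), round(y, 6))  # 1.6 1.6
```

The weighted sum is positive, so ReLU passes it through unchanged.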
Exercise 2: build an AND neuron
Find a triple (w₁, w₂, b) such that a threshold neuron H(z) = 1[z ≥ 0] with two binary inputs x₁, x₂ ∈ {0, 1} implements the logical AND function. Verify on the four cases.
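To verify a candidate triple mechanically, a small checker can run the four cases. This is a sketch: `implements_and` is a name chosen here, and the triple (1, 1, −1.5) is only one of infinitely many valid answers.

```python
def heaviside(z):
    """H(z) = 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0

def implements_and(w1, w2, b):
    """True if the threshold neuron (w1, w2, b) reproduces the AND truth table."""
    cases = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
    return all(
        heaviside(w1 * x1 + w2 * x2 + b) == expected
        for (x1, x2), expected in cases.items()
    )

print(implements_and(1, 1, -1.5))  # True
print(implements_and(1, 1, -0.5))  # False: this triple computes OR instead
```

Any triple with w₁ + w₂ + b ≥ 0 while b, w₁ + b, and w₂ + b are all negative passes the check.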
Sources
Azevedo, F. A. et al. (2009). “Equal numbers of neuronal and nonneuronal cells make the human brain an isotropically scaled-up primate brain.” Journal of Comparative Neurology 513(5), 532-541. DOI 10.1002/cne.21974
McCulloch, W. & Pitts, W. (1943). “A Logical Calculus of the Ideas Immanent in Nervous Activity.” Bulletin of Mathematical Biophysics 5(4), 115-133. DOI 10.1007/BF02478259
Hebb, D. O. (1949). The Organization of Behavior. Wiley. Archive.org
Rosenblatt, F. (1958). “The Perceptron: a probabilistic model for information storage and organization in the brain.” Psychological Review 65(6), 386-408. DOI 10.1037/h0042519
Minsky, M. & Papert, S. (1969). Perceptrons: An Introduction to Computational Geometry. MIT Press.
Rumelhart, D. E., Hinton, G. E. & Williams, R. J. (1986). “Learning representations by back-propagating errors.” Nature 323(6088), 533-536. DOI 10.1038/323533a0
Krizhevsky, A., Sutskever, I. & Hinton, G. E. (2012). “ImageNet Classification with Deep Convolutional Neural Networks.” NeurIPS 25. NeurIPS link
Lighthill, J. (1973). Artificial Intelligence: A General Survey. Science Research Council, United Kingdom. Archive.org
Further reading
Goodfellow, I., Bengio, Y. & Courville, A. (2016). Deep Learning. MIT Press. Chapter 6 on feed-forward networks. deeplearningbook.org
Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer. Chapter 5 on neural networks.
Hastie, T., Tibshirani, R. & Friedman, J. (2009). The Elements of Statistical Learning. Springer. Chapter 11. Free PDF at Stanford
LeCun, Y. (lecture series at Collège de France, 2016 onwards). “L’apprentissage profond.” college-de-france.fr
Ng, A. (online course on Coursera, “Deep Learning Specialization”). Excellent complement.
Quiz
1. What does a neuron compute before the activation function?
2. Why can a single neuron not learn XOR?
3. What is the geometric role of the bias?
4. What does the activation function f do in y = f(Σ wᵢ xᵢ + b)?
5. True or false: the artificial neuron is a faithful copy of the biological neuron.
Quiz: chapter 1 recap (interactive)
Multiple-choice quiz checking the chapter's takeaways: what a neuron computes, the role of weights and bias, and the XOR limit. Five questions with an explanation for the correct answer upon submission. To be taken online for automatic scoring.