Glossary

Short definitions of the technical terms used in the interactive courses. Every dotted-underlined word in a course links back to its entry here.

Acknowledgement

The signal by which a consumer confirms that a message has been handled, allowing the broker to erase it. Its timing is decisive: acknowledging before processing risks loss (at-most-once), acknowledging after successful processing risks a duplicate on redelivery (at-least-once). As long as no acknowledgement arrives, the broker may redeliver the message.
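
The effect of acknowledgement timing can be sketched with a toy in-memory broker. Everything here (`Broker`, `consume`, the simulated crash) is a hypothetical illustration, not RabbitMQ's API: the only assumption is that a message stays queued until it is acknowledged.

```python
class Broker:
    """Toy in-memory broker: a message stays queued until it is acknowledged."""

    def __init__(self, messages):
        self.pending = list(messages)

    def deliver(self):
        # Hand out the oldest unacknowledged message, or None when empty.
        return self.pending[0] if self.pending else None

    def ack(self, message):
        # Acknowledgement lets the broker erase the message.
        self.pending.remove(message)


def consume(broker, ack_first):
    """Drain the broker, crashing once between the two steps of the first message."""
    processed = []
    crashed = False
    while (message := broker.deliver()) is not None:
        try:
            if ack_first:
                broker.ack(message)        # acknowledge, then process
                if not crashed:
                    crashed = True
                    raise RuntimeError("crash after ack, before processing")
                processed.append(message)
            else:
                processed.append(message)  # process, then acknowledge
                if not crashed:
                    crashed = True
                    raise RuntimeError("crash after processing, before ack")
                broker.ack(message)
        except RuntimeError:
            continue  # the consumer restarts and asks the broker again
    return processed


lost = consume(Broker(["m1", "m2"]), ack_first=True)
duplicated = consume(Broker(["m1", "m2"]), ack_first=False)
print(lost)        # ['m2']
print(duplicated)  # ['m1', 'm1', 'm2']
```

Acknowledging first loses `m1`, acknowledging last processes it twice: the same crash, two different failure modes.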

Source: RabbitMQ, Consumer Acknowledgements

See also: At-most-once delivery , At-least-once delivery , Message broker
Activation function

A non-linear function applied to the output of a neuron's weighted sum. Without it, a neural network would collapse to a single affine map of its inputs, no matter how deep. The classics are sigmoid, tanh and ReLU.

See also: Sigmoid , ReLU
Affine combination

A linear combination to which a constant term (bias) is added. A neuron's weighted sum together with its bias is an affine combination of its inputs. Composing several affine combinations without a non-linearity yields a single equivalent affine combination.
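
The collapse of stacked affine combinations can be checked numerically. The sketch below uses plain Python lists; the helper names `affine` and `compose` and the sample weights are made up for illustration.

```python
def affine(weights, bias, x):
    """Return W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def compose(w2, b2, w1, b1):
    """Collapse x -> W2 (W1 x + b1) + b2 into a single pair (W, b)."""
    inner = len(w1)        # size of the intermediate vector
    n_inputs = len(w1[0])
    w = [[sum(w2[i][k] * w1[k][j] for k in range(inner)) for j in range(n_inputs)]
         for i in range(len(w2))]
    b = [sum(w2[i][k] * b1[k] for k in range(inner)) + b2[i]
         for i in range(len(w2))]
    return w, b


W1, B1 = [[1.0, 2.0], [0.0, -1.0]], [0.5, 1.0]
W2, B2 = [[3.0, 1.0]], [-2.0]
W, B = compose(W2, B2, W1, B1)

x = [2.0, 4.0]
two_layers = affine(W2, B2, affine(W1, B1, x))
one_layer = affine(W, B, x)
print(two_layers, one_layer)  # [26.5] [26.5]
```

The equivalent weights are W = W2 W1 and b = W2 b1 + b2, which is why depth alone adds nothing without a non-linearity between the layers.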

See also: Linear combination , Non-linearity , Bias
AI winter

Period of disinterest and funding cuts for artificial intelligence research. The first AI winter, in the 1970s and early 1980s, followed Minsky and Papert's critique of the perceptron (1969). The second, in the late 1980s and 1990s, followed disappointments with expert systems. Each winter preceded a comeback: backpropagation after the first, modern deep learning after the second.

Source: Russell & Norvig, *AIMA*, ch. 1

See also: Minsky & Papert , Perceptron , Backpropagation
Approximate search

A family of methods (ANN, for Approximate Nearest Neighbor) that accept occasionally missing the true nearest neighbor in order to answer far faster, or with far less memory. You trade a little accuracy, measured by recall, against speed or space. It stands opposed to exhaustive search, exact but costly. HNSW, IVF and product quantization are its main families.

See also: Exhaustive search , Recall@k , HNSW
Associated data (AAD)

Data authenticated by an AEAD algorithm but not encrypted, typically metadata such as a header, identifier, or usage context. It binds the ciphertext to its context: any mismatch between the expected associated data and the value provided at decryption invalidates the tag and causes the open operation to fail.

See also: Authenticated encryption (AEAD) , Authentication tag , Domain separation
Asynchronous messaging

An exchange mode where the sender drops a message and carries on without waiting for the recipient to process it. The message waits in a queue or a log until a consumer picks it up. It contrasts with synchronous communication, where the caller stays blocked until the reply arrives.

See also: Temporal coupling , Command message
At-least-once delivery

A guarantee that a message is processed one or more times, never zero. It is obtained by acknowledging only after successful processing and redelivering as long as no acknowledgement arrives: a crash after processing but before the acknowledgement triggers a redelivery, hence a duplicate. It never loses, but it can duplicate; this is why the processing must be idempotent.

Source: Kleppmann, 2017

See also: Acknowledgement , At-most-once delivery , Effectively-once delivery , Idempotence
At-most-once delivery

A guarantee that a message is processed zero or one time, never more. It is obtained by acknowledging the message as soon as it is received, before processing it: a crash between the acknowledgement and the end of processing loses the message, since it has already been erased. It never duplicates, but it can lose. Suited to streams where occasional loss is harmless.

Source: Kleppmann, 2017

See also: Acknowledgement , At-least-once delivery , Effectively-once delivery
Authenticated encryption (AEAD)

A cryptographic primitive that simultaneously guarantees confidentiality and integrity of a message. It produces an authentication tag alongside the ciphertext, and any tampering causes decryption to fail cleanly. AEAD (Authenticated Encryption with Associated Data) extends this by also authenticating unencrypted associated data bound to the usage context.

Source: NIST SP 800-38D

See also: Authentication tag , Associated data (AAD) , Nonce , Malleability
Authentication tag

A short value (typically 128 bits) produced by an AEAD algorithm or MAC, verified at decryption time. Any change to the ciphertext, associated data, or nonce invalidates the tag and causes the open operation to fail. It provides both integrity and authenticity guarantees.

See also: Authenticated encryption (AEAD) , Associated data (AAD) , Malleability
Backpropagation

An algorithm that computes the gradient of the loss function with respect to every weight in a neural network. It propagates the error from the output back through earlier layers using the chain rule. It is the core of multi-layer network training.

Source: Rumelhart, Hinton and Williams, 1986

See also: Activation function
Backward pass

The phase of backpropagation where the error signal propagates from output to input, the reverse direction of the forward pass, to assemble each weight's gradient. It reuses the activations stored during the forward pass instead of recomputing them.

See also: Forward pass , Backpropagation , Error signal
Bias

A constant term added to the weighted sum of a neuron, independent of the inputs. Geometrically, it translates the decision boundary in input space. Without bias, that boundary would necessarily pass through the origin.

See also: Weighted sum
Cartesian product

The operation that builds a set of pairs from two sets, written ×. A × B is the set of all ordered pairs (a, b) where a ∈ A and b ∈ B. Order matters: (a, b) is not (b, a). If A has m elements and B has n, then A × B has m × n. It is the starting brick of relations and functions.
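
A quick check of these properties with `itertools.product` from the Python standard library; the two sample sets are arbitrary.

```python
from itertools import product

A = {"a", "b", "c"}
B = {1, 2}

# A × B as a set of ordered pairs (tuples).
pairs = set(product(A, B))

print(len(pairs))           # 6
print(("a", 1) in pairs)    # True
print((1, "a") in pairs)    # False
```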

See also: Set , Membership
Cauchy-Schwarz inequality

For any two vectors x and w in Rⁿ, |x · w| ≤ ‖x‖ · ‖w‖. Equality holds if and only if the two vectors are collinear. It is the fundamental inequality of linear algebra, ensuring consistency between the algebraic and geometric formulations of the dot product.

Source: Cauchy 1821, Schwarz 1888

See also: Dot product , Norm
Chain rule

The rule for differentiating a composition of functions: the derivative of f(g(x)) is f'(g(x)) times g'(x). Local derivatives are multiplied along the path. It is the mechanical heart of backpropagation.
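
As an illustration, the rule can be compared against a finite-difference estimate. The functions f = sin and g(x) = x² and the evaluation point are arbitrary choices for this sketch.

```python
import math


def g(x):
    return x * x


def f(u):
    return math.sin(u)


def composed(x):
    return f(g(x))


def chain_rule_derivative(x):
    # f'(g(x)) * g'(x), with f' = cos and g'(x) = 2x
    return math.cos(g(x)) * 2.0 * x


def numerical_derivative(func, x, h=1e-6):
    # Central difference, used only as an independent check.
    return (func(x + h) - func(x - h)) / (2.0 * h)


x = 1.3
analytic = chain_rule_derivative(x)
numeric = numerical_derivative(composed, x)
print(abs(analytic - numeric) < 1e-6)  # True
```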

See also: Derivative , Function composition , Backpropagation
Classification

A supervised learning task that predicts a class from a finite set. It is binary with two classes (cat or not) and multi-class with more (one bird species among 200). The output layer typically uses a sigmoid for the binary case and a softmax for the multi-class case.

See also: Sigmoid , Softmax , Loss function
Coincidence detection

The ability of a stateful neuron to fire only when two inputs arrive within a short interval of each other. Its internal integration window makes this computation possible, where a memoryless neuron cannot. Illustrated by the Jeffress model (1948) for sound localization.

Source: Jeffress (1948)

See also: Stateful neuron
Command message

A message that expresses an intention, an order addressed to a specific recipient asking it to do something (for example "Charge the payment"). It is named in the imperative, in principle has a single handler, and the sender expects an effect to happen. To be distinguished from an event message, which states a fact that already occurred.

See also: Event message , Asynchronous messaging
Command-query separation

A design principle stating that an operation must either change the system state while returning nothing useful (a command), or return data while changing nothing (a query), never both at once. Coined by Bertrand Meyer (Command-Query Separation, CQS), it makes every message readable: you can tell at a glance whether it writes or reads. It is the line that cleanly separates a command from a query.

Source: Meyer, 1988

See also: Command message , Query message
Complement

The operation that returns what is not in a set, relative to a reference universe. The complement of A, written Aᶜ (or A bar), is the set of objects of the universe that do not belong to A. Its membership condition is a negation: x ∈ Aᶜ is equivalent to "not (x ∈ A)". The complement depends on the chosen universe: without a fixed universe, it has no meaning.

See also: Union , Intersection , Domain of discourse
Computation graph

A representation of a computation as a directed graph whose nodes are operations and whose edges are the values flowing between them. Reading a network as a computation graph makes backpropagation systematic: local derivatives are multiplied along the edges, from outputs back to inputs.

See also: Chain rule , Backpropagation , Forward pass
Constant-time

A property of an implementation whose execution time does not depend on secret values, which eliminates timing side channels. It is essential for authentication tag comparison and sensitive cryptographic operations. ChaCha20-Poly1305 is straightforward to implement in constant time in software, whereas table-based software implementations of AES can leak through cache timing, so AES usually relies on hardware instructions (AES-NI) to achieve this property.
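
For tag comparison, a minimal sketch with the Python standard library: `hmac.compare_digest` is designed so that its running time does not reveal where two values first differ, unlike `==`. HMAC-SHA256 stands in here for the tag of a MAC; the key and message are made-up values.

```python
import hashlib
import hmac


def make_tag(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify_tag(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it without an early exit on mismatch."""
    expected = make_tag(key, message)
    return hmac.compare_digest(expected, tag)


key = b"demo-key-not-for-production"   # hypothetical key, for illustration only
message = b"order=42;amount=100"
tag = make_tag(key, message)

print(verify_tag(key, message, tag))                 # True
print(verify_tag(key, b"order=42;amount=900", tag))  # False
```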

See also: Authentication tag , Padding oracle , Authenticated encryption (AEAD)
Consumer group

A set of consumers that share a single offset to split the reading of a log: within the group, each message is processed only once. Several distinct groups read the same log independently, each with its own offset, so a message is re-read as many times as there are groups. It is the log-side counterpart of a broker's competing consumers.

Source: Kleppmann, 2017

See also: Message log , Offset , Message broker
Cosine similarity

A measure of proximity between two vectors, defined as cos(θ) = (x · w) / (‖x‖ ‖w‖), a value in [-1, 1]. Equal to 1 when the vectors are aligned, 0 when orthogonal, -1 when opposite. A standard tool for comparing embeddings (words, sentences, images).
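
A minimal sketch of the formula in plain Python, assuming non-zero vectors of equal length; the sample vectors are arbitrary.

```python
import math


def cosine_similarity(x, w):
    """cos(θ) = (x · w) / (‖x‖ ‖w‖); both vectors must be non-zero."""
    dot = sum(a * b for a, b in zip(x, w))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_w = math.sqrt(sum(b * b for b in w))
    return dot / (norm_x * norm_w)


aligned = cosine_similarity([3.0, 4.0], [6.0, 8.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
opposite = cosine_similarity([3.0, 4.0], [-3.0, -4.0])
print(aligned, orthogonal, opposite)  # 1.0 0.0 -1.0
```

Note that scaling a vector does not change the result: only the direction matters, not the length.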

See also: Dot product , Norm , Cauchy-Schwarz inequality
Cost surface

The graph of the cost seen as a function of the network weights, with the data held fixed. Each set of weights is a point whose altitude is the corresponding cost. Learning amounts to descending toward a valley of this landscape, which is what gradient descent does.

Source: Goodfellow, Bengio and Courville, 2016

See also: Loss function , Gradient descent
Counter-example

An element of the domain that makes a universal statement false. To refute "∀x, P(x)", it suffices to exhibit a single x such that P(x) is false: this is the direct reading of the equivalence ¬(∀x, P(x)) ≡ ∃x, ¬P(x). A counter-example demolishes a conjecture with nothing more to add.
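
As an illustration, the sketch below searches a finite domain for a counter-example to the conjecture "n² + n + 41 is prime for every natural number n". The helper names are hypothetical; the conjecture is a classic example chosen for this sketch.

```python
def is_prime(n):
    """Trial division, sufficient for small integers."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def find_counter_example(predicate, domain):
    """Return the first x in the domain with P(x) false, or None if there is none."""
    for x in domain:
        if not predicate(x):
            return x
    return None


refutation = find_counter_example(lambda n: is_prime(n * n + n + 41), range(100))
print(refutation)                       # 40
print(refutation ** 2 + refutation + 41)  # 1681, which is 41 * 41
```

Forty confirming cases (n = 0 to 39) prove nothing; the single value n = 40 settles the question.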

See also: Universal quantifier , Existential quantifier , Implication
Cross-entropy

A classification loss that measures the gap between the predicted distribution and the target distribution. When the target is a single correct class (a one-hot distribution), it equals minus the logarithm of the probability assigned to that class, so it blows up when the model is confident and wrong. Paired with the softmax function, it is the standard multiclass loss.
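
A small numeric sketch, assuming a single correct class per sample; the logits are made-up values and the helper names are illustrative.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, correct_class):
    """Minus the log of the probability given to the correct class."""
    return -math.log(softmax(logits)[correct_class])


logits = [5.0, 0.0, 0.0]          # the model strongly favours class 0
confident_right = cross_entropy(logits, correct_class=0)
confident_wrong = cross_entropy(logits, correct_class=1)
undecided = cross_entropy([0.0, 0.0, 0.0], correct_class=1)

print(confident_right < undecided < confident_wrong)  # True
```

An undecided model pays log(3) here; a confident and wrong one pays far more, which is the asymmetry the definition describes.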

Source: Bishop, 2006

See also: Loss function , Softmax , Classification
Crypto-agility

The ability of a format or protocol to migrate to new cryptographic primitives without breaking existing data. It is typically implemented via a version byte at the head of the ciphertext, allowing old records to be decoded and new ones encrypted with the current algorithm. It is essential for preparing a post-quantum migration.

See also: Authenticated encryption (AEAD) , Domain separation
Curse of dimensionality

A set of counterintuitive phenomena that appear when the number of dimensions grows large. In vector search, two effects dominate: distances between randomly drawn points concentrate (nearest and farthest become almost indistinguishable), and two random vectors are almost always nearly perpendicular. This is what makes nearest-neighbor search hard in high dimension.

See also: Distance concentration , Orthogonality
Decision boundary

The set of input-space points where the model switches from one class to another, that is, where its output flips. For a single neuron it is a hyperplane; for a multilayer network it can become polygonal, then curved.

Source: Bishop, 2006

See also: Hyperplane , Linearly separable , Multilayer perceptron
Deduplication key

The identifier a consumer uses to recognize a message it has already processed. It is often the stable message id provided by the broker, sometimes a business key (an order number). It is stored in an inbox table: a redelivered message whose key is already present is discarded. It answers the question: is this the same message?

Source: Kleppmann, 2017

See also: Idempotent consumer , Idempotence , Acknowledgement
Derivative

The slope of a function at a point. Formally, the limit of the rate of change (f(x+h) - f(x)) / h as h tends to zero. It tells how much, and in which direction, the output changes when the input moves by a hair.

See also: Chain rule , Gradient
Differential oracle

A testing method that validates fast-but-approximate (or optimized) code by comparing its output against a slow-but-exact reference, on the metric that truly matters. Instead of checking local properties (the result is well-formed), it measures the quality gap against the ground truth produced by the naive implementation. Essential when an algorithm can be structurally correct yet globally wrong: approximate search, caches, heuristics.

See also: Approximate search , Recall@k , Exhaustive search
Distance concentration

The phenomenon by which, in high dimension, distances between randomly drawn points tighten around a common value. The relative contrast (maximum minus minimum distance, divided by the minimum) tends to zero like the inverse of the square root of the dimension. As a result, the notion of a nearest neighbor loses meaning when all distances look alike.
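
The shrinking contrast can be observed with a small simulation. This is a sketch under stated assumptions: points drawn uniformly in the unit cube, a fixed random seed, and 200 points per dimension, all arbitrary choices.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """(max distance - min distance) / min distance from a random query."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        distances.append(math.dist(query, point))
    return (max(distances) - min(distances)) / min(distances)


contrasts = {dim: relative_contrast(dim) for dim in (2, 20, 200)}
for dim, contrast in contrasts.items():
    print(f"dimension {dim:>3}: relative contrast {contrast:.2f}")
```

The printed contrast drops as the dimension grows: in 200 dimensions the farthest point is only slightly farther than the nearest one.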

See also: Curse of dimensionality , Euclidean distance
Distributional hypothesis

The founding idea of vector semantics: a word is characterized by the contexts in which it appears, so words that share contexts have close meanings. Summarized by Firth's phrase, you shall know a word by the company it keeps. This principle is what justifies learning embeddings where geometric proximity encodes proximity of meaning.

Source: Firth, 1957

See also: Embedding , Vector space
Domain of discourse

The set of objects that the variables of a quantified predicate range over. The truth of a quantified statement depends entirely on it: "∃x, x² = 2" is false over the integers but true over the reals. Stating the domain is therefore not a detail, it is part of the statement.

See also: Predicate , Universal quantifier , Existential quantifier
Domain separation

A technique that derives distinct keys or contexts for each role or usage, ensuring that a valid ciphertext in one domain cannot be replayed in another. It is typically implemented via role-specific associated data, key derivation prefixes, or version bytes. It is essential for preventing context-confusion attacks.

See also: Associated data (AAD) , Nonce , Crypto-agility
Dot product

An operation taking two vectors of equal dimension and returning a single number, computed as the sum of element-wise products. It is exactly the computation a neuron performs between its inputs and weights.

See also: Vector , Weighted sum
Dual write

A situation where a service must modify two distinct systems for a single action, typically its own database and a message broker. Since no single transaction spans both, a crash between the two writes leaves an inconsistency: the database is updated but the message was never sent, or the message was sent but the database was rolled back. This is the problem the transactional outbox solves.

Source: Richardson, Microservices Patterns

See also: Transactional outbox , Outbox relay
Dying ReLU

The phenomenon where a ReLU neuron whose input stays negative ends up with both zero output and zero gradient. The neuron freezes, stops updating, and remains dead until the end of training. Mitigated by variants such as Leaky ReLU, ELU and GELU.

See also: ReLU , Leaky ReLU
Effectively-once delivery

The combination of at-least-once delivery and idempotent processing, which makes the observable effect identical to a single processing. Delivery duplicates are not removed but neutralized: an already-processed message is recognized and its effect is not reapplied. It is the realistic approximation of exactly-once, which is impossible at the delivery level because of the two generals problem.

Source: Kleppmann, 2017

See also: At-least-once delivery , Idempotence , Two generals problem
Embedding

A representation of an object (word, sentence, image) as a vector of real numbers, learned by a neural network so that geometric proximity reflects proximity of meaning. Two texts with close meanings get close vectors. Typical dimensions range from a few hundred to a few thousand (for example 768 or 1536).

Source: Mikolov et al., 2013

See also: Vector , Vector space , Cosine similarity
Error signal

The sensitivity of the loss to a neuron's pre-activation, written delta = dL/dz. It measures how much the loss would change if the neuron's net input shifted by a hair. Backpropagation computes this signal for every neuron, from output back to input, then derives each weight's gradient through the rule dL/dw = delta times the upstream activation.
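
For a concrete case, the sketch below takes a single sigmoid neuron with a squared-error loss L = ½ (a − y)², both chosen for illustration, and checks the rule dL/dw = delta × input against a finite difference. All numeric values are arbitrary.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss(w, b, x, y):
    a = sigmoid(w * x + b)
    return 0.5 * (a - y) ** 2


w, b, x, y = 0.8, -0.2, 1.5, 1.0

a = sigmoid(w * x + b)
delta = (a - y) * a * (1.0 - a)   # dL/dz = dL/da * da/dz
grad_w = delta * x                # the upstream activation is the input x
grad_b = delta                    # the bias sees a constant input of 1

eps = 1e-6
numeric_grad_w = (loss(w + eps, b, x, y) - loss(w - eps, b, x, y)) / (2 * eps)
print(abs(grad_w - numeric_grad_w) < 1e-8)  # True
```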

See also: Backpropagation , Gradient , Partial derivative
Euclidean distance

The distance between two vectors u and v in Rⁿ, defined as the norm of their difference: d(u, v) = ‖u - v‖. It generalises the distance between two points of the plane to n dimensions. Used to measure similarity between two vector representations.

See also: Norm , Vector
Event message

A message that announces a fact that already happened (for example "Order paid"). It is named in the past tense, broadcast to whoever wants to listen, and the sender does not know who consumes it, or even whether anyone does. Several subscribers can react to the same event. To be distinguished from a command message, which asks a single recipient for a future action.

See also: Command message , Temporal coupling
Exhaustive search

A strategy that compares the query against every vector in the database, one by one, to extract the closest ones. Also called linear scan or Flat index. It is exact by construction (it cannot miss anything) but its cost grows linearly with the number of vectors and their dimension, in O(n × d). It serves as the reference oracle for judging approximate methods.
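
A minimal linear scan, assuming Euclidean distance and a database small enough to fit in a Python list; `exhaustive_search` and the sample vectors are illustrative.

```python
import heapq
import math


def exhaustive_search(query, vectors, k):
    """Return the indices of the k vectors closest to the query."""
    # Every vector is visited once: n distance computations of cost O(d) each.
    return heapq.nsmallest(
        k, range(len(vectors)), key=lambda i: math.dist(query, vectors[i])
    )


database = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.9, 1.2]]
nearest = exhaustive_search([1.0, 1.0], database, k=2)
print(nearest)  # [1, 3]
```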

See also: Nearest neighbors , Recall@k
Existential quantifier

The symbol ∃, read "there exists". The statement ∃x, P(x) is true as soon as at least one element of the domain of discourse makes the predicate P true. Such an element is called a witness. Existence does not require uniqueness: one or several witnesses are enough.

See also: Predicate , Universal quantifier , Domain of discourse
Expressive power

The range of functions a model can represent as its parameters vary. A single perceptron expresses only linear separations; adding hidden layers widens the expressive power until the network can approximate any continuous function on a bounded domain to any desired accuracy.

Source: Goodfellow, Bengio & Courville, 2016

See also: Universal approximation theorem , Multilayer perceptron , Hidden layer
Few-shot learning

A model's ability to learn a new task from very few examples (typically between 1 and 10). An open challenge for classical networks that need thousands of examples, but advancing fast with large foundation models.

See also: Foundation model
Forward pass

Also called forward propagation. The computation phase in which an input traverses the network layer by layer, from input to output, with each neuron applying its weighted sum and activation function. It produces the final prediction.

See also: Weighted sum , Activation function , Backpropagation
Foundation model

A very large neural network trained on a massive amount of general-purpose data, which can then be adapted to many specific tasks. The term was coined by Bommasani et al. in 2021. Typical 2026 examples include GPT-4, Claude and Gemini.

Source: Bommasani et al., 2021

See also: Transformer , llm-mcp
Function composition

The operation of applying one function to the result of another, written f ∘ g, with (f ∘ g)(x) = f(g(x)). A multilayer network is a composition in which each layer's output becomes the next layer's input, and this nesting, alternated with non-linear activations, is what creates the global non-linearity.

See also: Multilayer perceptron , Non-linearity , Hidden layer
Functional margin

For a sample $(x, y)$ with $y \in \{-1, +1\}$, the functional margin is the quantity $\hat\gamma = y (w \cdot x + b)$. It is strictly positive if and only if the sample is correctly classified. It depends on the scale of the weights and is not a geometric distance.

Source: Bishop, PRML, ch. 7

See also: Geometric margin , Linearly separable , Perceptron
GELU

Gaussian Error Linear Unit, a modern ReLU variant defined as GELU(x) = x · Φ(x) where Φ is the standard Gaussian cumulative distribution function. Smoother than ReLU around zero, it is widely used in transformers such as BERT and GPT.
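
The definition translates directly into code using the identity Φ(x) = ½ (1 + erf(x / √2)). This is a sketch of the exact form; libraries often use faster approximations.

```python
import math


def gelu(x):
    """GELU(x) = x * Φ(x), with Φ the standard normal CDF."""
    phi = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return x * phi


def relu(x):
    return max(0.0, x)


for value in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(f"x = {value:+.1f}  relu = {relu(value):.4f}  gelu = {gelu(value):+.4f}")
```

Unlike ReLU, GELU lets small negative inputs through with a small negative output, and it approaches ReLU for inputs of large magnitude.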

Source: Hendrycks and Gimpel, 2016

See also: ReLU , Transformer
Geometric margin

Minimal perpendicular distance between a separating hyperplane and the points of the dataset. Defined by $\gamma = \min_i y_i (w \cdot x_i + b) / \|w\|$ with $y_i \in \{-1, +1\}$. Plays a central role in Novikoff's theorem and in the formulation of support vector machines.

Source: Novikoff, 1962

See also: Functional margin , Linearly separable , Novikoff's theorem
Gradient

The vector of all partial derivatives of a function. It points in the direction of steepest increase of the function at a given point, and its norm measures the slope. In training, we follow the opposite of the gradient to drive the loss down.

See also: Gradient descent , Backpropagation
Gradient descent

An optimisation algorithm that iteratively adjusts the parameters of a model to minimise a loss function. At each step, it moves the parameters in the direction opposite the gradient, by a distance proportional to the learning rate. The dominant method for training neural networks.
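
A one-dimensional sketch on f(x) = (x − 3)², whose gradient is 2(x − 3). The function, starting point and learning rates are arbitrary choices for illustration.

```python
def gradient_descent(grad, start, learning_rate, steps):
    """Repeat x <- x - learning_rate * grad(x)."""
    x = start
    for _ in range(steps):
        x = x - learning_rate * grad(x)
    return x


def grad_f(x):
    # Gradient of f(x) = (x - 3)^2, minimal at x = 3.
    return 2.0 * (x - 3.0)


converged = gradient_descent(grad_f, start=0.0, learning_rate=0.1, steps=100)
diverged = gradient_descent(grad_f, start=0.0, learning_rate=1.1, steps=100)

print(abs(converged - 3.0) < 1e-6)  # True
print(abs(diverged - 3.0) > 1e6)    # True
```

With a learning rate of 0.1 each step shrinks the distance to the minimum by a factor 0.8; with 1.1 each step multiplies it by 1.2, so the iterates move away instead of settling.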

Source: Cauchy, 1847

See also: Gradient , Learning rate , Loss function
Greedy search

A movement strategy in a proximity graph: at each step you hop to the neighbor closest to the query, and you stop as soon as no neighbor is closer than the current node. Fast and short-sighted, it takes the best local move without planning, which does not guarantee reaching the true nearest neighbor: it can get stuck in a local minimum.
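
The trap can be reproduced on a tiny hand-made graph. Points live on a line for readability; the node names, positions and edges are invented for this sketch.

```python
def greedy_search(graph, positions, query, start):
    """Hop to the neighbor closest to the query until no neighbor improves."""
    current = start
    while True:
        best = min(graph[current], key=lambda n: abs(positions[n] - query))
        if abs(positions[best] - query) >= abs(positions[current] - query):
            return current  # no neighbor is closer: stop here
        current = best


positions = {"a": 0.0, "b": 4.0, "c": 7.0, "d": 13.0, "e": 7.9}
graph = {
    "a": ["b", "d"],
    "b": ["a", "c"],
    "c": ["b"],
    "d": ["a", "e"],
    "e": ["d"],
}
query = 8.0

found = greedy_search(graph, positions, query, start="a")
exact = min(positions, key=lambda n: abs(positions[n] - query))
print(found, exact)  # c e
```

From `a`, the walk prefers `b` over `d` because `b` is closer to the query, then reaches `c` and stops. The true nearest neighbor `e` is only reachable through `d`, a move that looked worse locally.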

See also: Proximity graph , Local minimum , Nearest neighbors
Half-space

One of the two regions into which a hyperplane partitions Rⁿ. Algebraically, the set of points x with w · x + b > 0 (resp. < 0). A threshold neuron splits space into exactly two half-spaces: active and inactive.

See also: Hyperplane , Threshold function
Hallucination

Output by a language model of a false statement, asserted with confidence. A structural flaw of training by likelihood maximisation, which pushes the model to always produce a plausible answer even when it should say it does not know.

See also: Foundation model , Transformer
Hidden layer

An intermediate layer in a neural network, sitting between the input layer and the output layer. Its neurons neither receive raw data nor produce the final prediction, they compute intermediate representations. A "deep" network has several hidden layers.

See also: Activation function , Multilayer perceptron , Expressive power
HNSW

Hierarchical Navigable Small World. An approximate search index that stacks proximity graphs in layers, sparse and coarse at the top, dense and fine at the bottom. A greedy walk descends layer by layer to find the nearest neighbors in about log n hops. Two knobs: M (neighbors per node, paid in memory) and ef (beam width, paid in time).

See also: Proximity graph , Small-world network , Greedy search , Recall@k
Hyperplane

A subset of Rⁿ defined by a linear equation w · x + b = 0, with w ≠ 0. In two dimensions it is a line, in three a plane. It is exactly the decision boundary drawn by a single neuron.

See also: Vector , Dot product , Perceptron
Idempotence

The property of an operation whose repeated execution produces the same result as a single execution. Applied to messaging, it makes duplicates harmless: an already-processed message is recognized and its effect is not reapplied. It is the mechanism that turns at-least-once delivery into effectively-once delivery. How to concretely build an idempotent consumer (deduplication key, atomicity) is the subject of the next chapter.

Source: Hohpe & Woolf, 2003

See also: At-least-once delivery , Effectively-once delivery , Acknowledgement
Idempotent consumer

A consumer whose processing produces the same result whether a message is handled once or several times. It tracks every already-processed message by its deduplication key and, in the same transaction as the effect, marks that key as seen: on a redelivered duplicate it recognizes the key and skips the effect. The atomicity between applying the effect and recording the key is essential, otherwise a crash between the two reopens the two generals problem inside its own database.
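
A minimal sketch with SQLite from the Python standard library. The table names `inbox` and `account`, the key `msg-42` and the handler are hypothetical; the point is that recording the key and applying the effect share one transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inbox (dedup_key TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("INSERT INTO account (id, balance) VALUES (1, 0)")
conn.commit()


def handle(conn, dedup_key, amount):
    """Apply the effect once per key; return False for a duplicate."""
    try:
        with conn:  # one transaction: commit on success, roll back on error
            # The primary key rejects a key that was already recorded.
            conn.execute("INSERT INTO inbox (dedup_key) VALUES (?)", (dedup_key,))
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = 1", (amount,)
            )
        return True
    except sqlite3.IntegrityError:
        return False  # duplicate delivery: the effect is skipped


first = handle(conn, "msg-42", 100)
second = handle(conn, "msg-42", 100)   # redelivery of the same message
balance = conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]
print(first, second, balance)  # True False 100
```

If the process crashes inside the `with` block, neither the key nor the balance change is committed, so the redelivered message is processed as if for the first time.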

Source: Hohpe & Woolf, Enterprise Integration Patterns

See also: Idempotence , Deduplication key , At-least-once delivery
Implication

The "if... then..." connective, written ⇒. The proposition P ⇒ Q is false in exactly one case: when P is true and Q is false. In particular, an implication with a false premise is always true.

See also: Logical connective , Logical equivalence
Inclusion

A relation between two sets, written ⊆. "A ⊆ B" reads "A is included in B" or "A is a subset of B", and means that every element of A is also an element of B. Its definition is a quantified statement: A ⊆ B is equivalent to "for all x, x ∈ A implies x ∈ B". Two sets are equal exactly when each is included in the other (double inclusion).

See also: Membership , Implication , Universal quantifier
Integrate-and-fire

A neuron model that accumulates input current in a membrane potential with a leak (time constant tau) and emits a spike when a threshold is crossed. The first internal state variable of a neuron, introduced by Lapicque in 1907.

Source: Lapicque (1907)

See also: Stateful neuron , Spiking neural network
Intersection

The operation that keeps only what two sets share, written ∩. A ∩ B is the set of objects that belong to both A and B. Its membership condition is a conjunction: x ∈ A ∩ B is equivalent to "x ∈ A and x ∈ B". When A ∩ B is empty, A and B are said to be disjoint.

See also: Union , Complement , Logical connective
Intrinsic plasticity

A lasting change in a neuron's excitability through its own internal dynamics (adaptive threshold, accommodation), without changing synaptic weights. A form of learning that resides not in the connections but in the neuron's state.

See also: Stateful neuron
Ion channel

A pore through a neuron's membrane that lets specific charged ions pass. Always-open channels leak a steady current, modelled as a resistance. Other channels open and close depending on the voltage itself and actively generate the spike (the Hodgkin-Huxley model).

Source: Hodgkin & Huxley, 1952

See also: Resting potential , Membrane potential
IVF (inverted file)

Inverted File. An index that partitions the space into cells, computed by k-means, and files each vector into the cell of its nearest centroid. At search time, only the nprobe cells closest to the query are scanned, not the whole database. IVF reduces latency but not memory, since the vectors are still stored in full. The nprobe parameter tunes the trade-off between speed and recall.

See also: Approximate search , Nearest neighbors , Recall@k
Leaky ReLU

A ReLU variant that lets a small slope alpha (typically 0.01) pass on the negative side instead of being strictly zero. Formula: LeakyReLU(x) = x if x > 0, alpha x otherwise. Avoids the dying ReLU problem.

Source: Maas, Hannun and Ng, 2013

See also: ReLU , Dying ReLU
Learning rate

A positive scalar controlling the step size taken by gradient descent at each iteration. Too small, training is slow; too large, it oscillates or diverges. Often denoted η (eta) or α (alpha). The first hyperparameter to tune in any training run.

See also: Gradient descent
Learning rule

Procedure that updates the parameters (weights, bias) of a model from observed samples. For the perceptron, the rule is $w \leftarrow w + \eta \cdot y \cdot x$ and $b \leftarrow b + \eta \cdot y$ applied only when a sample is misclassified.
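
The rule can be sketched on the logical AND, a linearly separable problem, with labels in {-1, +1}. The function name, the zero initialisation and the epoch limit are assumptions of this sketch, not part of Rosenblatt's formulation.

```python
def train_perceptron(samples, eta=1.0, max_epochs=100):
    """Apply the perceptron rule until an epoch passes with no mistake."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in samples:
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # misclassified, or exactly on the boundary
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
                b += eta * y
                mistakes += 1
        if mistakes == 0:
            break
    return w, b


and_samples = [
    ([0.0, 0.0], -1),
    ([0.0, 1.0], -1),
    ([1.0, 0.0], -1),
    ([1.0, 1.0], +1),
]
w, b = train_perceptron(and_samples)

predictions = [1 if w[0] * x[0] + w[1] * x[1] + b > 0 else -1 for x, _ in and_samples]
print(predictions)  # [-1, -1, -1, 1]
```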

Source: Rosenblatt, 1958

See also: Learning rate , Perceptron , Gradient descent
Linear combination

An expression of the form a₁ v₁ + a₂ v₂ + ... + aₙ vₙ where the aᵢ are scalars and the vᵢ are vectors. A neuron's weighted sum is a linear combination of the inputs with the weights as coefficients.

See also: Weighted sum , Dot product
Linearly separable

A labelled dataset is linearly separable if there exists a hyperplane that correctly separates the label-1 points from the label-0 points. XOR is the historical example of a problem that is not linearly separable.

See also: Hyperplane , XOR (exclusive or) , Perceptron
Local minimum

A graph node whose immediate neighbors are all farther from the query than itself, even though a much better point exists elsewhere in the graph, out of direct reach. A greedy search wrongly stops there, believing it has found the nearest neighbor. Widening the beam (keeping several candidates) lets it escape.

See also: Greedy search , Proximity graph , Recall@k
Logical connective

A symbol that combines one or two propositions into a new one. The five basic connectives are negation (¬), conjunction (∧), disjunction (∨), implication (⇒) and equivalence (⇔).

See also: Proposition , Implication , Truth table
Logical equivalence

A relation between two propositions that share the same truth value in every possible case. The associated connective, written ⇔, reads "if and only if" and amounts to a double implication.

See also: Implication , Truth table
Loss function

A measure of the error between a network's prediction and the expected truth. Also called cost function. The higher it is, the more wrong the network. Training seeks to minimise it. Common examples: MSE for regression, cross-entropy for classification.

See also: Gradient descent , Gradient
Malleability

A property of an encryption scheme where modifying the ciphertext produces a predictable, exploitable change in the corresponding plaintext. Unauthenticated cipher modes (stream, CTR, CBC without MAC) are malleable. Using an AEAD algorithm eliminates this property by causing any modified ciphertext to fail decryption.

See also: Authenticated encryption (AEAD) , Authentication tag , Padding oracle
Mark I Perceptron

Physical machine built by Frank Rosenblatt between 1958 and 1960 at Cornell Aeronautical Laboratory. Able to recognise simple shapes thanks to 400 photoreceptors connected to weights that were tunable via motorised potentiometers. The first hardware implementation of a machine learning algorithm, distinct from the theoretical model published in 1958.

Source: Rosenblatt, 1958, 1960

See also: Perceptron
Matrix

A rectangular array of numbers organised in rows and columns. An m×n matrix has m rows and n columns. In a neural network, a layer of m neurons each having n inputs collapses into an m×n weight matrix.

See also: Vector , Dot product
Mean squared error

A loss function that averages the square of the gap between the prediction and the target. The square penalizes large gaps heavily and makes the cost differentiable everywhere. Written MSE, it is the natural choice for regression.

Source: Goodfellow, Bengio and Courville, 2016

See also: Loss function , Regression
Mediator

An object that centralizes message routing inside a single process, in memory. Instead of the sender referencing the right handler directly, it hands the message to the mediator, which knows which handler to route it to: a single one for a command or a query, zero to many subscribers for an event. It differs from a bus or broker, which provides the same service but across the network, between processes.
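
The routing rule above (a single handler for a command or a query, zero to many subscribers for an event) might be sketched as follows. The `Mediator` class, its method names and the two message types are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class GetBalance:          # hypothetical query message
    customer_id: str


@dataclass(frozen=True)
class OrderPlaced:         # hypothetical event message
    order_id: str


class Mediator:
    """In-memory router: the sender never references a handler directly."""

    def __init__(self):
        self._handlers = {}                    # message type -> single handler
        self._subscribers = defaultdict(list)  # event type -> subscribers

    def register_handler(self, message_type, handler):
        if message_type in self._handlers:
            raise ValueError(f"{message_type.__name__} already has a handler")
        self._handlers[message_type] = handler

    def subscribe(self, event_type, subscriber):
        self._subscribers[event_type].append(subscriber)

    def send(self, message):
        """Route a command or query to its single handler, return its result."""
        handler = self._handlers.get(type(message))
        if handler is None:
            raise LookupError(f"no handler for {type(message).__name__}")
        return handler(message)

    def publish(self, event):
        """Deliver an event to every subscriber; having none is not an error."""
        for subscriber in self._subscribers.get(type(event), []):
            subscriber(event)
```

Everything here happens inside one process, through ordinary function calls; no network or queue is involved, which is what separates this sketch from a bus or broker.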

Source: Gamma et al., 1994

See also: Command message , Query message , Event message
Membership

The fundamental relation between an object and a set, written ∈. "x ∈ A" reads "x belongs to A" and means that x is one of the elements of A. Its negation is written ∉. Membership is the basic predicate of set theory: everything else, inclusion and operations, is defined from it.

See also: Set , Inclusion , Predicate
Membrane potential

The internal variable of a stateful neuron measuring its accumulated electric charge. It rises when inputs arrive, slowly leaks back toward rest when no input comes, and triggers a spike once it reaches a threshold, after which it resets.

See also: Stateful neuron , Integrate-and-fire , Spiking neural network
Message broker

An intermediary that receives messages, holds them in queues and hands each one to a consumer, then erases it once it has been acknowledged. The archetype is RabbitMQ: a delivered and acknowledged message is gone, it is not kept to be re-read. The progress state (what is left to deliver) lives in the broker, not in the reader.

Source: Hohpe & Woolf, 2003

See also: Message log , Consumer group , Mediator
Message log

An ordered, append-only sequence of messages kept instead of being erased after reading. The archetype is Kafka: each message gets a fixed position, and several readers can re-read it independently, each at its own pace. Unlike the broker, the read state does not live in the log but in the consumer, as an offset.
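
A minimal in-memory sketch of this idea, assuming hypothetical `MessageLog` and `Consumer` classes: the log only appends and serves positions, while each consumer keeps and advances its own offset. It is not how Kafka is implemented, only an illustration of where the read state lives.

```python
class MessageLog:
    """Append-only sequence; messages are never erased after reading."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        self._messages.append(message)
        return len(self._messages) - 1      # fixed position of the message

    def read(self, offset):
        """Return the message at `offset`, or None if nothing is there yet."""
        if 0 <= offset < len(self._messages):
            return self._messages[offset]
        return None


class Consumer:
    """Owns its offset: the number of the next message it will read."""

    def __init__(self, log, offset=0):
        self.log = log
        self.offset = offset

    def poll(self):
        message = self.log.read(self.offset)
        if message is not None:
            self.offset += 1
        return message
```

Two `Consumer` objects on the same log progress independently, and setting `consumer.offset = 0` is enough to re-read the whole history.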

Source: Kleppmann, 2017

See also: Message broker , Offset , Consumer group
Metamorphic testing

A testing technique that checks an expected RELATION between several runs rather than one exact output value, useful when the right answer is unknown or too costly to compute (the oracle problem). For example: permuting the order of inputs must not change the result, or doubling an input must double the output. The differential oracle is a special case, where the checked relation is equality to an exact reference.
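
The two example relations can be sketched as small checks. The function under test (`total`) and the helper names are hypothetical; integers are used so that equality is exact.

```python
import random


def total(values):
    """Function under test: a plain sum."""
    return sum(values)


def holds_under_permutation(func, values, trials=20, seed=0):
    """Relation 1: shuffling the inputs must not change the result."""
    rng = random.Random(seed)
    reference = func(list(values))
    for _ in range(trials):
        shuffled = list(values)
        rng.shuffle(shuffled)
        if func(shuffled) != reference:
            return False
    return True


def holds_under_doubling(func, values):
    """Relation 2: doubling every input must double the output."""
    return func([2 * v for v in values]) == 2 * func(list(values))
```

Neither check needs to know what the correct sum is; each only compares several runs of the same function with one another.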

See also: Differential oracle , Approximate search
Minsky & Papert

Marvin Minsky and Seymour Papert, authors of *Perceptrons* (MIT Press, 1969), which formally proved the limits of a single perceptron, notably the impossibility of computing the XOR function. Their analysis contributed to the decline of public funding for neural network research until the mid-1980s.

Source: Minsky & Papert, *Perceptrons*, MIT Press, 1969

See also: Perceptron , XOR (exclusive or) , AI winter
Multilayer perceptron

A neural network organised in successive layers (input, one or more hidden layers, output), where each neuron applies an affine combination followed by an activation function. By stacking neurons it overcomes the single perceptron limit and computes non-linearly-separable functions such as XOR.
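
To make the XOR claim concrete, here is a sketch of a two-layer network with hand-picked (not learned) weights and the threshold function as activation. The weights and the names `neuron` and `xor_mlp` are illustrative assumptions; many other weight choices work.

```python
def step(z):
    """Threshold activation: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0


def neuron(weights, bias, inputs):
    """Affine combination of the inputs followed by the activation."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


def xor_mlp(x1, x2):
    # Hidden layer: two neurons reading the same inputs.
    h_or = neuron([1, 1], -0.5, [x1, x2])    # fires if at least one input is 1
    h_and = neuron([1, 1], -1.5, [x1, x2])   # fires only if both inputs are 1
    # Output layer: "OR but not AND", which is exactly XOR.
    return neuron([1, -1], -0.5, [h_or, h_and])
```

Each individual neuron still draws a single line; it is the composition of the two layers that produces a decision no single line could.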

Source: Rumelhart, Hinton & Williams, 1986

See also: Hidden layer , Perceptron , XOR (exclusive or) , Function composition
Nearest neighbors

The problem of finding, among a set of points, the k points closest to a query under a distance or similarity measure. In vector search, k nearest neighbors (k-NN) means the k documents whose embedding is closest to the query's.

See also: Euclidean distance , Cosine similarity , Exhaustive search
Neuromorphic computing

A branch of computer science that designs hardware imitating biological brain operation (spiking neurons, local memory, asynchronous computation). An active research field at Intel (Loihi), IBM (TrueNorth) and several academic laboratories.

See also: Spiking neural network
Non-linearity

The property of a function that is not affine. A non-linear activation function is mandatory in a deep network, otherwise the composition of several layers reduces to a single equivalent affine layer and depth loses its point.

See also: Activation function , Hidden layer
Nonce

A value used exactly once with a given key. Uniqueness, not secrecy, is the critical property: reusing a nonce with the same key completely breaks the scheme. A 192-bit random nonce (XChaCha20) makes collisions negligible. A 96-bit nonce (AES-GCM, ChaCha20-Poly1305) is too short to be drawn at random indefinitely: it is typically managed as a counter that must never repeat, and if it is drawn at random instead, the number of messages per key must stay below about 2^32.

Source: RFC 8439

See also: Authenticated encryption (AEAD) , Domain separation
Norm

The length of a vector, measured as the square root of the sum of the squares of its components. For a vector x = (x₁, ..., xₙ), the norm is ‖x‖ = √(x₁² + ... + xₙ²). It is the generalisation of the Pythagorean theorem to n dimensions.

See also: Vector , Dot product
Normal vector

Vector $w$ that defines the direction perpendicular to a hyperplane with equation $w \cdot x + b = 0$. Its direction indicates which side of the hyperplane a point lies on; its norm sets the scale of the signed distance.

Source: Bishop, PRML, ch. 4

See also: Hyperplane , Dot product , Norm
Normalization

The operation that brings a vector to length 1 by dividing it by its norm, without changing its direction. On such normalized vectors, cosine similarity reduces to the dot product, and ranking by cosine coincides with ranking by Euclidean distance. This is why many vector databases normalize embeddings on ingestion.
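
A small pure-Python sketch of these claims, with hypothetical helper names. It normalizes vectors and lets you compare cosine similarity on the raw vectors with the dot product on the normalized ones.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(v):
    return math.sqrt(dot(v, v))


def normalize(v):
    """Divide a vector by its norm so that its length becomes 1."""
    length = norm(v)
    if length == 0:
        raise ValueError("the zero vector has no direction to preserve")
    return [x / length for x in v]


def cosine_similarity(u, v):
    return dot(u, v) / (norm(u) * norm(v))


def euclidean_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

For unit vectors, the squared Euclidean distance equals 2 − 2 · (dot product), so sorting by decreasing dot product and sorting by increasing distance give the same order.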

See also: Norm , Cosine similarity , Euclidean distance
Novikoff's theorem

If a dataset is linearly separable with geometric margin $\gamma > 0$ and radius $R = \max_i \|x_i\|$, then the perceptron algorithm initialised at zero converges after at most $(R / \gamma)^2$ corrections, regardless of the learning rate.
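
The bound can be checked empirically on a toy dataset. The sketch below assumes a perceptron without bias term, labels in {-1, +1}, and a separating direction `u` supplied by the caller; all names are hypothetical. The margin achieved by `u` is at most the dataset's geometric margin, so the computed value is a valid, possibly loose, upper bound.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_perceptron(points, labels, max_epochs=1000):
    """Perceptron initialised at zero; returns (weights, number of corrections)."""
    w = [0.0] * len(points[0])
    corrections = 0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            if y * dot(w, x) <= 0:                      # misclassified point
                w = [wi + y * xi for wi, xi in zip(w, x)]
                corrections += 1
                mistakes += 1
        if mistakes == 0:
            return w, corrections
    raise RuntimeError("no convergence: the data may not be separable")


def novikoff_bound(points, labels, u):
    """(R / gamma)^2, with gamma the margin achieved by the direction u."""
    length = math.sqrt(dot(u, u))
    unit = [c / length for c in u]
    gamma = min(y * dot(unit, x) for x, y in zip(points, labels))
    if gamma <= 0:
        raise ValueError("u does not separate the data")
    radius = max(math.sqrt(dot(x, x)) for x in points)
    return (radius / gamma) ** 2
```

Comparing the `corrections` count returned by `train_perceptron` with `novikoff_bound` on the same data shows the guarantee in action: the count never exceeds the bound.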

Source: Novikoff, 1962

See also: Perceptron , Linearly separable , Geometric margin , Cauchy-Schwarz inequality
Offset

A consumer's read position in a message log: the number of the next message it will read. The consumer owns and advances its own offset, not the log. Two readers of the same log therefore have independent offsets, and rewinding an offset to an earlier position is enough to re-read history.

Source: Kleppmann, 2017

See also: Message log , Consumer group , Message broker
Orthogonality

Two vectors are orthogonal when their dot product is zero. Geometrically, this matches a 90-degree angle between them. In machine learning, orthogonal inputs contribute independently to a neuron's computation.

See also: Dot product , Vector
Outbox relay

A process that reads the outbox table, publishes pending messages to the broker, then marks them as sent. It runs separately from the business service. Since it can publish a message then crash before marking it sent, it will republish it: its delivery is at-least-once, and duplicates are absorbed downstream by an idempotent consumer.

Source: Richardson, Microservices Patterns

See also: Transactional outbox , At-least-once delivery , Row claim
Padding oracle

An attack that exploits any observable signal (error message, response time, detectable behavior) revealing whether the padding of an encrypted block is valid. By iterating ciphertext modifications and observing responses, an attacker can decrypt the message without knowing the key. It is one of the classic reasons why encryption without authentication is dangerous.

See also: Malleability , Authenticated encryption (AEAD) , Constant-time
Partial derivative

The derivative of a multivariable function with respect to a single variable, the others held constant. It measures the slope along one axis. Stacked together, the partial derivatives form the gradient.

See also: Gradient , Derivative
Partial order

The guarantee that messages are ordered only within each partition, not across the whole log. A log offers a partial order, not a total order: two messages in the same partition keep their relative order, but two messages in different partitions have no defined order between them. This is why you need a key that groups causally related messages into the same partition.

Source: Kleppmann, 2017

See also: Partition , Partition key , Offset
Partition

A sub-stream of a message log that holds an ordered subset of the messages. A log is split into several partitions so that consumers can read them in parallel. Ordering is guaranteed only within a single partition, never across partitions. Each message is assigned to a partition by its partition key.

Source: Kleppmann, 2017

See also: Partition key , Partial order , Message log
Partition key

The value used to route a message to a partition, usually by hashing it. Two messages with the same partition key always land on the same partition, so they stay ordered relative to one another; messages with different keys spread across partitions and are processed in parallel. Choosing it well (for example the order id) buys per-key ordering without sacrificing throughput. It answers the question: which messages must stay ordered together?
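
A minimal sketch of hash-based routing, assuming a hypothetical `partition_for` helper. CRC32 is used here only because it is stable across processes (Python's built-in `hash()` is salted per process for strings); real systems use their own hash functions.

```python
import zlib


def partition_for(key, num_partitions):
    """Map a partition key to a partition index in [0, num_partitions)."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Hypothetical keys: every message about order-42 lands on the same partition.
same_partition = partition_for("order-42", 4) == partition_for("order-42", 4)
```

Because the mapping depends on `num_partitions`, changing the number of partitions generally moves keys to different partitions, which is one reason the key and the partition count are chosen with care.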

Source: Apache Kafka, Documentation

See also: Partition , Partial order , Consumer group
Perceptron

The first artificial neuron able to learn, invented by Frank Rosenblatt in 1958. It combines a weighted sum of the inputs with a threshold function to produce a binary 0 or 1 decision.

Source: Rosenblatt, 1958

See also: Weighted sum , Bias
Power set

The set of all subsets of a set E, written P(E). Its elements are themselves sets: the empty set and E itself always belong to it. If E has n elements, then P(E) has 2ⁿ elements, because each element of E is either taken or left out of a subset. For example P({a, b}) = {∅, {a}, {b}, {a, b}}.
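
The counting argument can be checked with a short sketch built on the standard `itertools` module; the function name `power_set` is an illustrative choice.

```python
from itertools import chain, combinations


def power_set(elements):
    """Return every subset of `elements` as a list of Python sets."""
    items = list(elements)
    subsets = chain.from_iterable(
        combinations(items, size) for size in range(len(items) + 1)
    )
    return [set(subset) for subset in subsets]
```

Subsets are enumerated by size, from the empty set up to the full set, so both always appear in the result.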

See also: Set , Inclusion , Membership
Predicate

A statement containing one or more free variables, whose truth value depends on what is substituted for those variables. "x > 3" is a predicate: it is neither true nor false until x is fixed or quantified. Once all its variables are fixed or bound by a quantifier, a predicate becomes a proposition.

See also: Proposition , Universal quantifier , Existential quantifier
Product quantization

Product Quantization (PQ). A vector compression technique: each vector is split into several slices, and within each slice the sub-vector is replaced by the index of the nearest centroid in a small learned dictionary (a codebook). A vector thus becomes a handful of codes, often one byte each, instead of hundreds of reals. Product quantization saves a great deal of memory, at the cost of reduced recall, since distances are only estimated.
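
The encoding step might look like the sketch below. It assumes the codebooks are already available (in practice they are learned, typically by clustering) and uses hypothetical function names; only the slicing and nearest-centroid lookup are shown.

```python
def nearest_centroid(sub_vector, codebook):
    """Index of the codebook centroid closest to the sub-vector."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(sub_vector, centroid))
    return min(range(len(codebook)), key=lambda i: squared_distance(codebook[i]))


def pq_encode(vector, codebooks):
    """Split the vector into len(codebooks) slices and encode each one."""
    slices = len(codebooks)
    if len(vector) % slices != 0:
        raise ValueError("vector length must be a multiple of the slice count")
    width = len(vector) // slices
    return [
        nearest_centroid(vector[i * width:(i + 1) * width], codebooks[i])
        for i in range(slices)
    ]


def pq_decode(codes, codebooks):
    """Approximate reconstruction: concatenate the chosen centroids."""
    reconstructed = []
    for code, codebook in zip(codes, codebooks):
        reconstructed.extend(codebook[code])
    return reconstructed
```

Decoding returns centroids rather than the original values, which is why distances computed on compressed vectors are only estimates.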

See also: Approximate search , Euclidean distance , Recall@k
Proposition

A mathematical statement that is unambiguously either true or false, with no third possibility. This principle, called bivalence, is the starting point of all propositional logic.

See also: Logical connective , Truth table
Proximity graph

A structure where each vector (a node) is connected by edges to a handful of its nearest neighbors. Instead of a relationless bag of vectors that forces a full scan, you get a network you can walk through, hop by hop, to approach a query without visiting every point. It is the foundation of graph-based indexes such as HNSW.

See also: Nearest neighbors , HNSW , Small-world network
Query message

A message that asks a specific recipient for information without changing any system state (for example "What is this customer's loyalty balance?"). It is phrased as a question, targets a single handler, and the sender always expects data in return. It is the third family of messages, alongside the command that orders and the event that states a fact.

Source: Hohpe & Woolf, 2003

See also: Command message , Event message , Command-query separation , Request-reply
Recall@k

A measure of the quality of an approximate search: the fraction of the k true nearest neighbors (computed by exhaustive search) that the approximate method recovers in its top k results. A recall@k of 1 means no exact neighbor was missed; a recall@k of 0.8 means one exact neighbor in five escaped the search.
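
As a sketch of the definition, assuming neighbors are given as lists of document identifiers ordered from closest to farthest (the function name and identifiers are hypothetical):

```python
def recall_at_k(exact_neighbors, approximate_neighbors, k):
    """Fraction of the k true nearest neighbors found in the approximate top k."""
    if k <= 0:
        raise ValueError("k must be positive")
    exact = set(exact_neighbors[:k])
    if not exact:
        raise ValueError("the exact result list is empty")
    found = exact.intersection(approximate_neighbors[:k])
    return len(found) / len(exact)


# Hypothetical ids: the approximate search missed neighbor 5 and returned 9.
example_recall = recall_at_k([1, 2, 3, 4, 5], [1, 2, 3, 4, 9], k=5)
```

Only membership in the top k counts; the order in which the approximate method returns the neighbors does not affect the score.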

See also: Nearest neighbors , Exhaustive search
Regression

A supervised learning task that predicts a continuous value (a price, a temperature, a probability). Typically uses the identity as output activation and MSE as the loss function.

See also: Loss function , Activation function
ReLU

An activation function defined as ReLU(x) = max(0, x). Linear for positive values, zero for negative ones. Simple and fast to compute, it largely mitigates the vanishing gradient problem because its derivative is exactly 1 for positive inputs. The de facto standard in hidden layers of deep networks since 2012.

Source: Nair and Hinton, 2010

See also: Activation function , Sigmoid
Request-reply

An exchange pattern where the sender issues a request then waits, on a return channel, for the matching reply. It is the natural shape of a query: "What is the balance?" calls for "240 points". To match each reply to its request when several are in flight, a correlation identifier is often attached. Described by Hohpe & Woolf as Request-Reply.

Source: Hohpe & Woolf, 2003

See also: Query message , Command-query separation
Resting potential

The stable voltage difference a neuron's membrane maintains between the inside and outside of the cell when it receives nothing, on the order of -65 millivolts. It is the value the membrane potential drifts back to when no input arrives.

Source: Gerstner et al., 2014

See also: Membrane potential , Ion channel
Row claim

A lock that lets a relay take a row from the outbox table without a concurrent relay taking the same one. In SQL, the FOR UPDATE SKIP LOCKED clause: each relay claims rows that are still free and skips those already locked by another. Without this lock, several relays would publish the same message, a double dispatch.

Source: PostgreSQL, Documentation (SELECT FOR UPDATE SKIP LOCKED)

See also: Outbox relay , Transactional outbox
Saturation

The phenomenon by which an activation function reaches a nearly constant value (and therefore a nearly zero derivative) over large regions of its domain. Sigmoid saturates at very negative or very positive values, causing the vanishing gradient.

See also: Sigmoid , Vanishing gradient
Set

A collection of objects, called its elements, regarded as a single whole. A set is entirely determined by its elements: two sets with exactly the same elements are equal. It is described by extension, listing its elements in braces such as {1, 2, 3}, or by comprehension, giving the property its elements satisfy, such as {x | x > 3}.

See also: Membership , Inclusion , Power set
Sigmoid

An S-shaped activation function that takes any real number and squashes it into the open interval (0, 1). Its formula is σ(x) = 1 / (1 + e⁻ˣ). Historically the most used, it is today often replaced by ReLU in hidden layers.

See also: Activation function , ReLU
Signed distance

Perpendicular distance from a point to a hyperplane, carrying a sign that depends on which side of the hyperplane the point lies on. For the hyperplane $w \cdot x + b = 0$, it equals $d(x) = (w \cdot x + b) / \|w\|$: positive on one side, negative on the other, zero on the hyperplane itself.
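
The formula translates directly into a few lines of Python. The function name and the sample hyperplane are illustrative assumptions.

```python
import math


def signed_distance(w, b, x):
    """Signed distance from point x to the hyperplane w . x + b = 0."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    if norm_w == 0:
        raise ValueError("w must be a non-zero normal vector")
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm_w


# Hypothetical hyperplane 3*x1 + 4*x2 - 5 = 0, whose normal vector has norm 5.
w, b = [3.0, 4.0], -5.0
```

Multiplying `w` and `b` by the same positive constant describes the same hyperplane and leaves the result unchanged, since the division by the norm cancels the scale.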

Source: Hastie, Tibshirani, Friedman, ESL, ch. 4

See also: Hyperplane , Normal vector , Euclidean distance
Small-world network

A network that combines many local links (to nearby neighbors) with a few rare long-range links (to distant regions). Those shortcuts collapse path lengths: to cross the network, the number of hops grows like the logarithm of the number of nodes rather than like their count. It is the principle behind six degrees of separation, and the core of HNSW's efficiency.

See also: Proximity graph , HNSW
Softmax

A function that turns a vector of reals into a probability distribution. For a vector z, softmax(z)ᵢ = exp(zᵢ) / Σⱼ exp(zⱼ). Used as the output activation in multi-class classification.
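
A minimal sketch of the formula in pure Python. Subtracting the maximum before exponentiating is a common numerical-stability measure added here as an assumption; it does not change the result, because the common factor cancels in the ratio.

```python
import math


def softmax(z):
    """Turn a list of reals into probabilities that sum to 1."""
    if not z:
        raise ValueError("softmax needs at least one value")
    shift = max(z)                               # avoids overflow in exp
    exps = [math.exp(value - shift) for value in z]
    total = sum(exps)
    return [e / total for e in exps]
```

Larger inputs receive larger probabilities, and the outputs are always strictly positive.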

See also: Sigmoid , Activation function
Spiking neural network

A family of neural networks that communicate through discrete spikes in time, closer to biological operation than classical continuous networks. An active research field, rarely used in industrial practice so far.

See also: Neuromorphic computing
Stateful neuron

A neuron whose output depends on an internal variable that evolves over time (membrane potential, adaptive threshold), and thus on its recent history. As opposed to a stateless neuron, whose output depends only on the instantaneous input.

See also: Integrate-and-fire , Spiking neural network
Surrogate gradient

A training trick for spiking neural networks. Since the binary spike is not differentiable, its derivative is replaced by a smooth approximation during backpropagation, while the forward pass keeps the spiking dynamics. Formalized by Neftci, Mostafa and Zenke (2019).

Source: Neftci, Mostafa & Zenke (2019)

See also: Spiking neural network , Neuromorphic computing
Tanh

Hyperbolic tangent, an activation function similar to sigmoid but compressing values into (-1, 1) instead of (0, 1). Often used when a zero-centered output is desired. Its formula is tanh(x) = (eˣ - e⁻ˣ) / (eˣ + e⁻ˣ).

See also: Activation function , Sigmoid
Tautology

A proposition that is true regardless of the truth values of its components. Example: P ∨ ¬P (law of excluded middle). Its opposite is a contradiction, always false.

See also: Proposition , Truth table
Temporal coding

A way of representing information in the precise instant when a spike is emitted, rather than in the number of spikes per second. The firing time then carries the message, letting a spiking network compute with very few spikes. Contrasts with rate coding.

See also: Spiking neural network , Coincidence detection
Temporal coupling

A dependency that forces two services to be available at the same instant for an exchange to succeed. A direct synchronous call temporally couples the caller and the callee: if the callee is down or slow, the caller waits or fails. An asynchronous message removes this coupling by inserting a queue that accepts the request even when the recipient is absent.

See also: Asynchronous messaging , Event message
Threshold function

Binary activation function H(z) equal to 1 if z >= 0 and 0 otherwise. Also called the Heaviside function. It is the original activation of McCulloch-Pitts (1943) and Rosenblatt's perceptron (1958), later dropped for gradient-based training because its derivative is zero everywhere except at 0, where it is undefined, so it gives the gradient nothing to work with.

See also: Activation function , Perceptron
Time constant

Characteristic timescale of a membrane's leak, written τ and equal to the product of resistance and capacitance, τ = R · C. After one time constant the potential has lost about 63% of its charge (37% remains, i.e. 1/e). It sets the discrete retention factor λ = e^(-Δt/τ).
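
The relations above can be checked numerically with a short sketch. It assumes the potential is measured relative to rest (so it decays toward 0) and that no input arrives; the function names and component values are hypothetical.

```python
import math


def time_constant(resistance, capacitance):
    """tau = R * C (ohms times farads gives seconds)."""
    return resistance * capacitance


def retention_factor(dt, tau):
    """Discrete retention factor: lambda = exp(-dt / tau)."""
    return math.exp(-dt / tau)


def leak(potential, dt, tau, steps):
    """Apply the leak for `steps` time steps of length dt, with no input."""
    lam = retention_factor(dt, tau)
    for _ in range(steps):
        potential *= lam
    return potential


# Hypothetical membrane: R = 10 megaohms, C = 1 nanofarad, so tau is about 10 ms.
tau = time_constant(10e6, 1e-9)
```

Leaking for a total duration of one time constant (for instance 10 steps of τ/10) leaves about 37% of the initial value, whatever the step size.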

Source: Gerstner et al., 2014

See also: Membrane potential , Integrate-and-fire
Transactional outbox

A pattern that removes the dual-write problem by writing to only one system, atomically. The service records the business state AND a row describing the message to send in the same database transaction. A relay then reads this outbox table and publishes the messages to the broker. An impossible distributed write becomes a local atomic write followed by a relay.

Source: Richardson, Microservices Patterns

See also: Dual write , Outbox relay , Row claim
Transformer

A neural network architecture introduced in 2017 by Vaswani et al. in "Attention Is All You Need". Built on the attention mechanism, it now dominates natural language processing and extends to vision and audio. It is the foundation of models like GPT, Claude and Gemini.

Source: Vaswani et al., 2017

See also: llm-mcp
Truth table

A table giving the truth value of a logical formula for each possible combination of its variables' values. For n variables it has 2ⁿ rows.
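
A truth table is easy to generate by enumeration. In this sketch a formula is represented as an ordinary Python function of booleans; the helper names are illustrative.

```python
from itertools import product


def truth_table(formula, num_variables):
    """List of (assignment, value) pairs, one per combination of truth values."""
    return [
        (values, formula(*values))
        for values in product([False, True], repeat=num_variables)
    ]


def is_tautology(formula, num_variables):
    """True when the formula holds on every row of its truth table."""
    return all(value for _, value in truth_table(formula, num_variables))


def implies(p, q):
    """Material implication: false only when p is true and q is false."""
    return (not p) or q
```

The same enumeration can confirm that P ∨ ¬P is a tautology, or that an equivalence agrees with the corresponding double implication on every row.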

See also: Proposition , Logical connective
Two generals problem

A classic result in theoretical computer science: over an unreliable communication channel, where any message can be lost, no protocol with a finite number of messages lets two parties become jointly certain of a shared agreement. Applied to messaging, it proves that exactly-once delivery is impossible, because the acknowledgement itself can be lost: the sender must choose between risking loss or risking a duplicate.

Source: Akkoyunlu et al., 1975

See also: At-most-once delivery , At-least-once delivery , Effectively-once delivery
Union

The operation that joins two sets, written ∪. A ∪ B is the set of objects that belong to A or to B (or to both). Its membership condition is a disjunction: x ∈ A ∪ B is equivalent to "x ∈ A or x ∈ B". Union is thus the set-level counterpart of the logical connective "or".

See also: Intersection , Complement , Logical connective
Universal approximation theorem

A result (Cybenko 1989, Hornik 1989) stating that a network with a single hidden layer, given enough neurons and a non-polynomial activation, can approximate any continuous function on a compact (closed and bounded) domain to arbitrary accuracy. It guarantees that such a network exists, not that we can learn it.

Source: Cybenko, 1989 ; Hornik, 1989

See also: Multilayer perceptron , Expressive power , Hidden layer
Universal quantifier

The symbol ∀, read "for all" or "for every". The statement ∀x, P(x) is true when the predicate P holds for every element of the domain of discourse, without exception. A single counter-example is enough to refute it.

See also: Predicate , Existential quantifier , Domain of discourse , Counter-example
Vanishing gradient

The shrinking of the gradient toward zero as it is propagated back through the layers of a deep network. When the derivative of the activation function stays below 1 (at most 0.25 for sigmoid), the gradient is multiplied by such a factor at each layer crossed and shrinks exponentially with depth. Studied in deep feedforward networks by Glorot and Bengio (2010), it is one of the reasons for the shift to ReLU.
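
The exponential collapse can be illustrated with sigmoid, whose derivative never exceeds 0.25. The sketch below deliberately ignores the weights and keeps only the activation-derivative factor of each layer, which is a simplifying assumption; the function names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_derivative(x):
    """sigma'(x) = sigma(x) * (1 - sigma(x)), maximal at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)


def gradient_scale(pre_activations):
    """Product of the activation derivatives met while crossing the layers."""
    scale = 1.0
    for z in pre_activations:
        scale *= sigmoid_derivative(z)
    return scale
```

Even in the most favorable case, where every pre-activation is 0 and each layer contributes the maximum factor of 0.25, ten layers scale the gradient by 0.25¹⁰, which is less than one millionth.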

Source: Glorot and Bengio, 2010

See also: Sigmoid , ReLU , Saturation
Vector

A mathematical object represented as an ordered list of numbers. A vector of dimension n encodes n values. In machine learning, a neuron's inputs and weights are each a vector of the same dimension.

See also: Dot product
Vector space

A set whose elements, the vectors, can be added together and multiplied by a number, following consistency rules. Concretely for this course: the set of lists of n real numbers, where each embedding is a point. The dimension n is the number of coordinates.

See also: Vector , Embedding
Weighted sum

The addition of several values, each multiplied by a coefficient called weight. General formula Σ wᵢ xᵢ. It is the core of the artificial neuron's computation, before adding the bias and applying the activation function.

See also: Bias , Activation function
XOR (exclusive or)

Logical operation that returns 1 when exactly one of its two inputs is 1, and 0 otherwise. Its positive cases, (0, 1) and (1, 0), lie on one diagonal of the unit square and its negative cases, (0, 0) and (1, 1), on the other, so no single line can separate them. This makes XOR impossible for a single perceptron to compute.

Source: Minsky and Papert, 1969

See also: Perceptron
