Glossary
Short definitions of the technical terms used in the interactive courses. Every dotted-underlined word in a course links back to its entry here.
-
Acknowledgement
The signal by which a consumer confirms that a message has been handled, allowing the broker to erase it. Its timing is decisive: acknowledging before processing risks loss (at-most-once), acknowledging after successful processing risks a duplicate on redelivery (at-least-once). As long as no acknowledgement arrives, the broker may redeliver the message.
Source: RabbitMQ, Consumer Acknowledgements
See also: At-most-once delivery , At-least-once delivery , Message broker
-
Activation function
A non-linear function applied to the output of a neuron's weighted sum. Without it, a neural network would collapse to a single linear (affine) map of its inputs, no matter how deep. The classics are sigmoid, tanh and ReLU.
-
Affine combination
A linear combination to which a constant term (bias) is added. A neuron's weighted sum together with its bias is an affine combination of its inputs. Composing several affine combinations without a non-linearity yields a single equivalent affine combination.
See also: Linear combination , Non-linearity , Bias
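The collapse described above can be checked numerically. The sketch below is illustrative only: it uses one-dimensional affine maps, and the helper names `affine` and `compose_affine` are assumptions, not part of any course code.

```python
def affine(w, b):
    """Return the one-dimensional affine map x -> w * x + b."""
    return lambda x: w * x + b


def compose_affine(w2, b2, w1, b1):
    """Coefficients (w, b) of the single map x -> w2 * (w1 * x + b1) + b2."""
    return w2 * w1, w2 * b1 + b2


first = affine(2.0, 1.0)     # inner layer, no activation
second = affine(-3.0, 0.5)   # outer layer, no activation

w, b = compose_affine(-3.0, 0.5, 2.0, 1.0)
single = affine(w, b)

# Two stacked affine maps agree everywhere with one equivalent affine map.
for x in (-1.0, 0.0, 2.5):
    assert second(first(x)) == single(x)
```

The same algebra holds with weight matrices and bias vectors, which is why depth alone adds nothing without a non-linearity.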
-
AI winter
A period of declining interest and funding cuts for artificial intelligence research. The first AI winter, in the 1970s and early 1980s, followed Minsky and Papert's critique of the perceptron (1969). The second, in the late 1980s and 1990s, followed disappointments with expert systems. Each winter preceded a comeback: backpropagation after the first, modern deep learning after the second.
Source: Russell & Norvig, *AIMA*, ch. 1
See also: Minsky & Papert , Perceptron , Backpropagation
-
Approximate search
A family of methods (ANN, for Approximate Nearest Neighbor) that accept occasionally missing the true nearest neighbor in order to answer far faster, or with far less memory. You trade a little accuracy, measured by recall, against speed or space. It stands opposed to exhaustive search, exact but costly. HNSW, IVF and product quantization are its main families.
See also: Exhaustive search , Recall@k , HNSW
-
Associated data (AAD)
Data authenticated by an AEAD algorithm but not encrypted, typically metadata such as a header, identifier, or usage context. It binds the ciphertext to its context: any mismatch between the expected associated data and the value provided at decryption invalidates the tag and causes the open operation to fail.
See also: Authenticated encryption (AEAD) , Authentication tag , Domain separation
-
Asynchronous messaging
An exchange mode where the sender drops a message and carries on without waiting for the recipient to process it. The message waits in a queue or a log until a consumer picks it up. It contrasts with synchronous communication, where the caller stays blocked until the reply arrives.
See also: Temporal coupling , Command message
-
At-least-once delivery
A guarantee that a message is processed one or more times, never zero. It is obtained by acknowledging only after successful processing and redelivering as long as no acknowledgement arrives: a crash after processing but before the acknowledgement triggers a redelivery, hence a duplicate. It never loses, but it can duplicate; this is why the processing must be idempotent.
Source: Kleppmann, 2017
See also: Acknowledgement , At-most-once delivery , Effectively-once delivery , Idempotence
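The crash window described above can be simulated in a few lines. This is a toy sketch, not a broker client: `deliver_at_least_once` and its parameters are hypothetical, and the crash is faked by skipping the acknowledgement on one attempt.

```python
def deliver_at_least_once(message, handler, crash_before_ack_on_attempt=1,
                          max_attempts=5):
    """Redeliver `message` until one attempt ends with an acknowledgement.

    Returns the number of delivery attempts. On the chosen attempt the
    handler runs, but the acknowledgement never reaches the broker.
    """
    for attempt in range(1, max_attempts + 1):
        handler(message)                  # processing happens first
        if attempt == crash_before_ack_on_attempt:
            continue                      # simulated crash: no ack, so redeliver
        return attempt                    # ack received, broker may erase
    raise RuntimeError("message never acknowledged")


processed = []
attempts = deliver_at_least_once("order-42", processed.append)
```

Here the handler sees the same message twice, which is exactly the duplicate that idempotent processing has to absorb.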
-
At-most-once delivery
A guarantee that a message is processed zero or one time, never more. It is obtained by acknowledging the message as soon as it is received, before processing it: a crash between the acknowledgement and the end of processing loses the message, since it has already been erased. It never duplicates, but it can lose. Suited to streams where occasional loss is harmless.
Source: Kleppmann, 2017
See also: Acknowledgement , At-least-once delivery , Effectively-once delivery
-
Authenticated encryption (AEAD)
A cryptographic primitive that simultaneously guarantees confidentiality and integrity of a message. It produces an authentication tag alongside the ciphertext, and any tampering causes decryption to fail cleanly. AEAD (Authenticated Encryption with Associated Data) extends this by also authenticating unencrypted associated data bound to the usage context.
Source: NIST SP 800-38D
See also: Authentication tag , Associated data (AAD) , Nonce , Malleability
-
Authentication tag
A short value (typically 128 bits) produced by an AEAD algorithm or MAC, verified at decryption time. Any change to the ciphertext, associated data, or nonce invalidates the tag and causes the open operation to fail. It provides both integrity and authenticity guarantees.
See also: Authenticated encryption (AEAD) , Associated data (AAD) , Malleability
-
Backpropagation
An algorithm that computes the gradient of the loss function with respect to every weight in a neural network. It propagates the error from the output back through earlier layers using the chain rule. It is the core of multi-layer network training.
Source: Rumelhart, Hinton and Williams, 1986
See also: Activation function
-
Backward pass
The phase of backpropagation where the error signal propagates from output to input, the reverse direction of the forward pass, to assemble each weight's gradient. It reuses the activations stored during the forward pass instead of recomputing them.
See also: Forward pass , Backpropagation , Error signal
-
Bias
A constant term added to the weighted sum of a neuron, independent of the inputs. Geometrically, it translates the decision boundary in input space. Without bias, that boundary would necessarily pass through the origin.
See also: Weighted sum
-
Cartesian product
The operation that builds a set of pairs from two sets, written ×. A × B is the set of all ordered pairs (a, b) where a ∈ A and b ∈ B. Order matters: (a, b) is not (b, a). If A has m elements and B has n, then A × B has m × n. It is the starting brick of relations and functions.
See also: Set , Membership
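As a small illustration, Python's standard `itertools.product` builds exactly these ordered pairs. The sets `A` and `B` below are arbitrary examples.

```python
from itertools import product

A = {1, 2, 3}
B = {"x", "y"}

# Every ordered pair (a, b) with a in A and b in B.
pairs = set(product(A, B))

assert len(pairs) == len(A) * len(B)   # m × n pairs
assert (1, "x") in pairs               # (a, b) belongs to A × B
assert ("x", 1) not in pairs           # (b, a) does not: order matters
```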
-
Cauchy-Schwarz inequality
For any two vectors x and w in Rⁿ, |x · w| ≤ ‖x‖ · ‖w‖. Equality holds if and only if the two vectors are collinear. It is the fundamental inequality of linear algebra, ensuring consistency between the algebraic and geometric formulations of the dot product.
Source: Cauchy 1821, Schwarz 1888
See also: Dot product , Norm
-
Chain rule
The rule for differentiating a composition of functions: the derivative of f(g(x)) is f'(g(x)) times g'(x). Local derivatives are multiplied along the path. It is the mechanical heart of backpropagation.
See also: Derivative , Function composition , Backpropagation
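A quick numerical check of the rule, as a sketch: the functions `f = sin` and `g(x) = x²` are arbitrary choices for illustration, and the result is compared against a central finite difference.

```python
import math


def g(x):
    return x * x


def g_prime(x):
    return 2 * x


def f(u):
    return math.sin(u)


def f_prime(u):
    return math.cos(u)


def chain_rule_derivative(x):
    """Derivative of f(g(x)): the local derivatives are multiplied."""
    return f_prime(g(x)) * g_prime(x)


def numerical_derivative(func, x, h=1e-6):
    """Central finite difference, used only as an independent check."""
    return (func(x + h) - func(x - h)) / (2 * h)


x0 = 0.7
analytic = chain_rule_derivative(x0)
numeric = numerical_derivative(lambda x: f(g(x)), x0)
```

The two values agree up to the finite-difference error, which is the kind of gradient check commonly run against a backpropagation implementation.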
-
Classification
A supervised learning task that predicts a class from a finite set. It is binary with two classes (cat or not) and multi-class with more (one bird species among 200). The output layer typically uses sigmoid or softmax.
See also: Sigmoid , Softmax , Loss function
-
Coincidence detection
The ability of a stateful neuron to fire only when two inputs arrive within a short interval of each other. Its internal integration window makes this computation possible, where a memoryless neuron cannot. Illustrated by the Jeffress model (1948) for sound localization.
Source: Jeffress (1948)
See also: Stateful neuron
-
Command message
A message that expresses an intention, an order addressed to a specific recipient asking it to do something (for example "Charge the payment"). It is named in the imperative, in principle has a single handler, and the sender expects an effect to happen. To be distinguished from an event message, which states a fact that already occurred.
See also: Event message , Asynchronous messaging
-
Command-query separation
A design principle stating that an operation must either change the system state while returning nothing useful (a command), or return data while changing nothing (a query), never both at once. Coined by Bertrand Meyer (Command-Query Separation, CQS), it makes every message readable: you can tell at a glance whether it writes or reads. It is the line that cleanly separates a command from a query.
Source: Meyer, 1988
See also: Command message , Query message
-
Complement
The operation that returns what is not in a set, relative to a reference universe. The complement of A, written Aᶜ (or A bar), is the set of objects of the universe that do not belong to A. Its membership condition is a negation: x ∈ Aᶜ is equivalent to "not (x ∈ A)". The complement depends on the chosen universe: without a fixed universe, it has no meaning.
See also: Union , Intersection , Domain of discourse
-
Computation graph
A representation of a computation as a directed graph whose nodes are operations and whose edges are the values flowing between them. Reading a network as a computation graph makes backpropagation systematic: local derivatives are multiplied along the edges, from outputs back to inputs.
See also: Chain rule , Backpropagation , Forward pass
-
Constant-time
A property of an implementation whose execution duration does not depend on secret values, eliminating timing side channels. It is essential for authentication tag comparison and sensitive cryptographic operations. ChaCha20-Poly1305 is naturally constant-time in software, whereas table-based software implementations of AES leak through cache timing, so AES is usually made constant-time with hardware instructions (AES-NI) or bitsliced code.
See also: Authentication tag , Padding oracle , Authenticated encryption (AEAD)
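For tag comparison specifically, Python's standard library provides `hmac.compare_digest`, which is designed to avoid the early exit of an ordinary `==` on bytes. The wrapper name `tags_match` below is an assumption made for this sketch.

```python
import hmac


def tags_match(expected_tag: bytes, received_tag: bytes) -> bool:
    """Compare two tags without stopping at the first differing byte.

    A plain `expected_tag == received_tag` may return as soon as a byte
    differs, so its duration can leak how many leading bytes were correct.
    """
    return hmac.compare_digest(expected_tag, received_tag)
```

AEAD libraries normally perform this check internally during the open operation; a hand-written comparison is only needed when verifying a MAC yourself.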
-
Consumer group
A set of consumers that share a single offset to split the reading of a log: within the group, each message is processed only once. Several distinct groups read the same log independently, each with its own offset, so a message is re-read as many times as there are groups. It is the log-side counterpart of a broker's competing consumers.
Source: Kleppmann, 2017
See also: Message log , Offset , Message broker
-
Cosine similarity
A measure of proximity between two vectors, defined as cos(θ) = (x · w) / (‖x‖ ‖w‖), a value in [-1, 1]. Equal to 1 when the vectors are aligned, 0 when orthogonal, -1 when opposite. A standard tool for comparing embeddings (words, sentences, images).
See also: Dot product , Norm , Cauchy-Schwarz inequality
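The formula translates directly into code. This is a minimal pure-Python sketch with hypothetical helper names; it assumes neither vector is the zero vector, since the ratio is undefined there.

```python
import math


def dot(x, w):
    """Sum of element-wise products of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, w))


def norm(x):
    """Euclidean norm, the square root of the dot product of x with itself."""
    return math.sqrt(dot(x, x))


def cosine_similarity(x, w):
    """cos(theta) = (x . w) / (||x|| ||w||); both vectors must be non-zero."""
    return dot(x, w) / (norm(x) * norm(w))


aligned = cosine_similarity([1.0, 2.0], [2.0, 4.0])      # same direction
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])   # perpendicular
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])    # reversed
```

Because the norms are divided out, scaling either vector leaves the similarity unchanged, which is why it suits embeddings of varying magnitude.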
-
Cost surface
The graph of the cost seen as a function of the network weights, with the data held fixed. Each set of weights is a point whose altitude is the corresponding cost. Learning amounts to descending toward a valley of this landscape, which the gradient chapters will do.
Source: Goodfellow, Bengio and Courville, 2016
See also: Loss function , Gradient descent
-
Counter-example
An element of the domain that makes a universal statement false. To refute "∀x, P(x)", it suffices to exhibit a single x such that P(x) is false: this is the direct reading of the equivalence ¬(∀x, P(x)) ≡ ∃x, ¬P(x). A counter-example demolishes a conjecture with nothing more to add.
See also: Universal quantifier , Existential quantifier , Implication
-
Cross-entropy
A classification loss that measures the gap between the predicted distribution and the target distribution. For a one-hot target, it equals minus the logarithm of the probability assigned to the correct class, so it blows up when the model is confident and wrong. Paired with the softmax function, it is the standard multiclass loss.
Source: Bishop, 2006
See also: Loss function , Softmax , Classification
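To make the "confident and wrong" case concrete, here is a small sketch for a single sample whose target is one class index. The logits are made-up numbers and the function names are illustrative.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities (max subtracted for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, correct_class):
    """Minus the log of the probability given to the correct class."""
    return -math.log(probs[correct_class])


probs = softmax([5.0, 0.0, 0.0])            # the model strongly favours class 0
confident_right = cross_entropy(probs, 0)   # class 0 really is correct
confident_wrong = cross_entropy(probs, 1)   # class 1 was the correct one
```

The loss is small when the favoured class is correct and much larger when it is not, which is the asymmetry the definition describes.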
-
Crypto-agility
The ability of a format or protocol to migrate to new cryptographic primitives without breaking existing data. It is typically implemented via a version byte at the head of the ciphertext, allowing old records to be decoded and new ones encrypted with the current algorithm. It is essential for preparing a post-quantum migration.
See also: Authenticated encryption (AEAD) , Domain separation
-
Curse of dimensionality
A set of counterintuitive phenomena that appear when the number of dimensions grows large. In vector search, two effects dominate: distances between randomly drawn points concentrate (nearest and farthest become almost indistinguishable), and two random vectors are almost always nearly perpendicular. This is what makes nearest-neighbor search hard in high dimension.
See also: Distance concentration , Orthogonality
-
Decision boundary
The set of input-space points where the model switches from one class to another, that is, where its output flips. For a single neuron it is a hyperplane; for a multilayer network it can become polygonal, then curved.
Source: Bishop, 2006
See also: Hyperplane , Linearly separable , Multilayer perceptron
-
Deduplication key
The identifier a consumer uses to recognize a message it has already processed. It is often the stable message id provided by the broker, sometimes a business key (an order number). It is stored in an inbox table: a redelivered message whose key is already present is discarded. It answers the question: is this the same message?
Source: Kleppmann, 2017
See also: Idempotent consumer , Idempotence , Acknowledgement
-
Derivative
The slope of a function at a point. Formally, the limit of the rate of change (f(x+h) - f(x)) / h as h tends to zero. It tells how much, and in which direction, the output changes when the input moves by a hair.
See also: Chain rule , Gradient
-
Differential oracle
A testing method that validates fast-but-approximate (or optimized) code by comparing its output against a slow-but-exact reference, on the metric that truly matters. Instead of checking local properties (the result is well-formed), it measures the quality gap against the ground truth produced by the naive implementation. Essential when an algorithm can be structurally correct yet globally wrong: approximate search, caches, heuristics.
See also: Approximate search , Recall@k , Exhaustive search
-
Distance concentration
The phenomenon by which, in high dimension, distances between randomly drawn points tighten around a common value. The relative contrast (maximum minus minimum distance, divided by the minimum) tends to zero like the inverse of the square root of the dimension. As a result, the notion of a nearest neighbor loses meaning when all distances look alike.
See also: Curse of dimensionality , Euclidean distance
-
Distributional hypothesis
The founding idea of vector semantics: a word is characterized by the contexts in which it appears, so words that share contexts have close meanings. Summarized by Firth's phrase, "you shall know a word by the company it keeps". This principle is what justifies learning embeddings where geometric proximity encodes proximity of meaning.
Source: Firth, 1957
See also: Embedding , Vector space
-
Domain of discourse
The set of objects that the variables of a quantified predicate range over. The truth of a quantified statement depends entirely on it: "∃x, x² = 2" is false over the integers but true over the reals. Stating the domain is therefore not a detail, it is part of the statement.
See also: Predicate , Universal quantifier , Existential quantifier
-
Domain separation
A technique that derives distinct keys or contexts for each role or usage, ensuring that a valid ciphertext in one domain cannot be replayed in another. It is typically implemented via role-specific associated data, key derivation prefixes, or version bytes. It is essential for preventing context-confusion attacks.
See also: Associated data (AAD) , Nonce , Crypto-agility
-
Dot product
An operation taking two vectors of equal dimension and returning a single number, computed as the sum of element-wise products. It is exactly the computation a neuron performs between its inputs and weights.
See also: Vector , Weighted sum
-
Dual write
A situation where a service must modify two distinct systems for a single action, typically its own database and a message broker. Since no single transaction spans both, a crash between the two writes leaves an inconsistency: the database is updated but the message was never sent, or the message was sent but the database was rolled back. This is the problem the transactional outbox solves.
Source: Richardson, Microservices Patterns
See also: Transactional outbox , Outbox relay
-
Dying ReLU
The phenomenon where a ReLU neuron whose input stays negative ends up with both zero output and zero gradient. The neuron freezes, stops updating, and usually remains dead until the end of training. Mitigated by variants such as Leaky ReLU, ELU and GELU.
See also: ReLU , Leaky ReLU
-
Effectively-once delivery
The combination of at-least-once delivery and idempotent processing, which makes the observable effect identical to a single processing. Delivery duplicates are not removed but neutralized: an already-processed message is recognized and its effect is not reapplied. It is the realistic approximation of exactly-once, which is impossible at the delivery level because of the two generals problem.
Source: Kleppmann, 2017
See also: At-least-once delivery , Idempotence , Two generals problem
-
Embedding
A representation of an object (word, sentence, image) as a vector of real numbers, learned by a neural network so that geometric proximity reflects proximity of meaning. Two texts with close meanings get close vectors. Typical dimensions range from a few hundred to a few thousand (for example 768 or 1536).
Source: Mikolov et al., 2013
See also: Vector , Vector space , Cosine similarity
-
Error signal
The sensitivity of the loss to a neuron's pre-activation, written delta = dL/dz. It measures how much the loss would change if the neuron's net input shifted by a hair. Backpropagation computes this signal for every neuron, from output back to input, then derives each weight's gradient through the rule dL/dw = delta times the upstream activation.
See also: Backpropagation , Gradient , Partial derivative
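The rule dL/dw = delta times the upstream activation can be verified on the smallest possible case. The sketch below assumes a single sigmoid neuron with one input and a squared-error loss; both choices, like the numeric values, are made only for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss(w, b, x, y):
    """Squared-error loss of one sigmoid neuron on one sample."""
    a = sigmoid(w * x + b)
    return 0.5 * (a - y) ** 2


def gradients(w, b, x, y):
    """Return (dL/dw, dL/db), both derived from the error signal delta."""
    a = sigmoid(w * x + b)
    delta = (a - y) * a * (1.0 - a)   # dL/dz = dL/da * da/dz
    return delta * x, delta           # upstream activation is x for w, 1 for b


dL_dw, dL_db = gradients(0.5, -0.2, 1.5, 1.0)
```

In a deeper network the same delta is computed layer by layer, and `x` is replaced by the activation coming from the previous layer.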
-
Euclidean distance
The distance between two vectors u and v in Rⁿ, defined as the norm of their difference: d(u, v) = ‖u - v‖. It generalises the distance between two points of the plane to n dimensions. Used to measure similarity between two vector representations.
-
Event message
A message that announces a fact that already happened (for example "Order paid"). It is named in the past tense, broadcast to whoever wants to listen, and the sender does not know who consumes it, or even whether anyone does. Several subscribers can react to the same event. To be distinguished from a command message, which asks a single recipient for a future action.
See also: Command message , Temporal coupling
-
Exhaustive search
A strategy that compares the query against every vector in the database, one by one, to extract the closest ones. Also called linear scan or Flat index. It is exact by construction (it cannot miss anything) but its cost grows linearly with the number of vectors and their dimension, in O(n × d). It serves as the reference oracle for judging approximate methods.
See also: Nearest neighbors , Recall@k
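A linear scan fits in a few lines of standard-library Python. This sketch assumes Euclidean distance and a tiny in-memory list; the function name and the sample vectors are illustrative.

```python
import heapq
import math


def exhaustive_search(query, vectors, k):
    """Return the indices of the k vectors closest to `query`.

    Every vector is compared with the query, so the cost is O(n × d).
    """
    scored = ((math.dist(query, v), i) for i, v in enumerate(vectors))
    return [i for _, i in heapq.nsmallest(k, scored)]


database = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.9, 1.2)]
nearest = exhaustive_search((1.0, 1.0), database, k=2)
```

Used as a reference oracle, the result of this function is the ground truth against which an approximate index's recall is measured.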
-
Existential quantifier
The symbol ∃, read "there exists". The statement ∃x, P(x) is true as soon as at least one element of the domain of discourse makes the predicate P true. Such an element is called a witness. Existence does not require uniqueness: one or several witnesses are enough.
See also: Predicate , Universal quantifier , Domain of discourse
-
Expressive power
The range of functions a model can represent as its parameters vary. A single perceptron expresses only linear separations; adding hidden layers widens the expressive power until, given enough hidden units, the network can approximate any continuous function on a bounded domain.
Source: Goodfellow, Bengio & Courville, 2016
See also: Universal approximation theorem , Multilayer perceptron , Hidden layer
-
Few-shot learning
A model's ability to learn a new task from very few examples (typically between 1 and 10). An open challenge for classical networks that need thousands of examples, but advancing fast with large foundation models.
See also: Foundation model
-
Forward pass
Also called forward propagation. The computation phase in which an input traverses the network layer by layer, from inputs to output, with each neuron applying its weighted sum and activation function. It produces the final prediction.
See also: Weighted sum , Activation function , Backpropagation
-
Foundation model
A very large neural network trained on a massive amount of general-purpose data, which can then be adapted to many specific tasks. The term was coined by Bommasani et al. in 2021. Typical 2026 examples include GPT-4, Claude and Gemini.
Source: Bommasani et al., 2021
See also: Transformer
-
Function composition
The operation of applying one function to the result of another, written f ∘ g. A multilayer network is a composition where each layer output becomes the next layer input, and this nesting, alternated with non-linear activations, is what creates the global non-linearity.
See also: Multilayer perceptron , Non-linearity , Hidden layer
-
Functional margin
For a sample $(x, y)$ with $y \in \{-1, +1\}$, the functional margin is the quantity $\hat\gamma = y (w \cdot x + b)$. It is strictly positive if and only if the sample is correctly classified. It depends on the scale of the weights and is not a geometric distance.
Source: Bishop, PRML, ch. 7
See also: Geometric margin , Linearly separable , Perceptron
-
GELU
Gaussian Error Linear Unit, a modern ReLU variant defined as GELU(x) = x · Φ(x) where Φ is the standard Gaussian cumulative distribution function. Smoother than ReLU around zero, and widely used in transformers such as BERT and GPT.
Source: Hendrycks and Gimpel, 2016
See also: ReLU , Transformer
-
Geometric margin
Minimal perpendicular distance between a separating hyperplane and the points of the dataset. Defined by $\gamma = \min_i y_i (w \cdot x_i + b) / \|w\|$ with $y_i \in \{-1, +1\}$. Plays a central role in Novikoff's theorem and in the formulation of support vector machines.
Source: Novikoff, 1962
See also: Functional margin , Linearly separable , Novikoff's theorem
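The difference between the two margins shows up when the weights are rescaled. The sketch below uses a made-up two-point dataset and hypothetical function names; it simply evaluates the two formulas.

```python
import math


def functional_margin(w, b, samples):
    """Smallest y * (w . x + b) over the dataset; depends on the weight scale."""
    return min(y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
               for x, y in samples)


def geometric_margin(w, b, samples):
    """Functional margin divided by ||w||: a true perpendicular distance."""
    return functional_margin(w, b, samples) / math.sqrt(sum(wi * wi for wi in w))


samples = [((3.0, 4.0), +1), ((0.0, 0.0), -1)]

w, b = (3.0, 4.0), -5.0
w_scaled, b_scaled = (6.0, 8.0), -10.0   # the same hyperplane, weights doubled
```

Doubling `w` and `b` doubles the functional margin but leaves the geometric margin unchanged, since the hyperplane itself has not moved.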
-
Gradient
The vector of all partial derivatives of a function. It points in the direction of steepest increase of the function at a given point, and its norm measures the slope. In training, we follow the opposite of the gradient to drive the loss down.
See also: Gradient descent , Backpropagation
-
Gradient descent
An optimisation algorithm that iteratively adjusts the parameters of a model to minimise a loss function. At each step, it moves the parameters in the direction opposite the gradient, by a distance proportional to the learning rate. The dominant method for training neural networks.
Source: Cauchy, 1847
See also: Gradient , Learning rate , Loss function
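The update step can be sketched on a one-dimensional toy loss. The function L(x) = (x - 3)², its gradient 2(x - 3), and the step counts below are assumptions chosen to keep the example small.

```python
def gradient_descent(grad, start, learning_rate, steps):
    """Repeatedly move against the gradient, scaled by the learning rate."""
    x = start
    for _ in range(steps):
        x = x - learning_rate * grad(x)
    return x


def toy_gradient(x):
    """Gradient of the toy loss L(x) = (x - 3)**2, minimised at x = 3."""
    return 2.0 * (x - 3.0)


minimum = gradient_descent(toy_gradient, start=0.0, learning_rate=0.1, steps=100)
```

With a learning rate of 0.1 the iterate closes in on 3; on this particular loss, any rate above 1.0 makes each step overshoot by more than it corrects, so the iterates diverge.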
-
Greedy search
A movement strategy in a proximity graph: at each step you hop to the neighbor closest to the query, and you stop as soon as no neighbor is closer than the current node. Fast and short-sighted, it takes the best local move without planning, which does not guarantee reaching the true nearest neighbor: it can get stuck in a local minimum.
See also: Proximity graph , Local minimum , Nearest neighbors
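The stopping rule, and the way it can get stuck, can be shown on a four-node toy graph. Everything here is an illustrative assumption: nodes are placed on a line, distance is the absolute difference, and the graph is a hand-written adjacency dictionary.

```python
def greedy_search(graph, positions, query, start):
    """Hop to the neighbor closest to the query until none improves."""
    current = start
    while True:
        best = min(graph[current],
                   key=lambda n: abs(positions[n] - query),
                   default=current)
        if abs(positions[best] - query) >= abs(positions[current] - query):
            return current          # no neighbor is closer: stop here
        current = best


positions = {"a": 0.0, "b": 4.0, "c": 3.0, "d": 10.0}
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}

stuck = greedy_search(graph, positions, query=10.0, start="a")
found = greedy_search(graph, positions, query=10.0, start="c")
```

Starting from `a`, the walk halts at `b` because its only onward neighbor `c` is slightly farther from the query, even though `d` sits exactly on it; starting from `c`, the walk reaches `d`.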
-
Half-space
One of the two regions into which a hyperplane partitions Rⁿ. Algebraically, the set of points x with w · x + b > 0 (resp. < 0). A threshold neuron splits space into exactly two half-spaces: active and inactive.
See also: Hyperplane , Threshold function
-
Hallucination
Output by a language model of a false statement, asserted with confidence. A structural flaw of training by likelihood maximisation, which pushes the model to always produce a plausible answer even when it should say it does not know.
See also: Foundation model , Transformer
-
Hidden layer
An intermediate layer in a neural network, sitting between the input layer and the output layer. Its neurons neither receive raw data nor produce the final prediction, they compute intermediate representations. A "deep" network has several hidden layers.
See also: Activation function , Multilayer perceptron , Expressive power
-
HNSW
Hierarchical Navigable Small World. An approximate search index that stacks proximity graphs in layers, sparse and coarse at the top, dense and fine at the bottom. A greedy walk descends layer by layer to find the nearest neighbors in about log n hops. Two knobs: M (neighbors per node, paid in memory) and ef (beam width, paid in time).
See also: Proximity graph , Small-world network , Greedy search , Recall@k
-
Hyperplane
A subset of Rⁿ defined by a linear equation w · x + b = 0. In two dimensions it is a line, in three a plane. It is exactly the decision boundary drawn by a single neuron.
See also: Vector , Dot product , Perceptron
-
Idempotence
The property of an operation whose repeated execution produces the same result as a single execution. Applied to messaging, it makes duplicates harmless: an already-processed message is recognized and its effect is not reapplied. It is the mechanism that turns at-least-once delivery into effectively-once delivery. How to concretely build an idempotent consumer (deduplication key, atomicity) is the subject of the next chapter.
Source: Hohpe & Woolf, 2003
See also: At-least-once delivery , Effectively-once delivery , Acknowledgement
-
Idempotent consumer
A consumer whose processing produces the same result whether a message is handled once or several times. It tracks every already-processed message by its deduplication key and, in the same transaction as the effect, marks that key as seen: on a redelivered duplicate it recognizes the key and skips the effect. The atomicity between applying the effect and recording the key is essential, otherwise a crash between the two reopens the two generals problem inside its own database.
Source: Hohpe & Woolf, Enterprise Integration Patterns
See also: Idempotence , Deduplication key , At-least-once delivery
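The inbox check can be sketched in memory as follows. The class and field names are hypothetical, and the in-memory set stands in for an inbox table; in a real system the effect and the key would be written in one database transaction, which this toy version does not model.

```python
class IdempotentConsumer:
    """Applies each message's effect once, keyed by its deduplication key."""

    def __init__(self):
        self.inbox = set()    # deduplication keys already processed
        self.balance = 0      # the business state changed by the effect

    def handle(self, message_id, amount):
        """Return True if the effect was applied, False for a duplicate."""
        if message_id in self.inbox:
            return False      # redelivered duplicate: skip the effect
        # Assumption: in production these two writes share one transaction.
        self.balance += amount
        self.inbox.add(message_id)
        return True


consumer = IdempotentConsumer()
first = consumer.handle("msg-1", 10)
duplicate = consumer.handle("msg-1", 10)   # same key, redelivered
```

The second call changes nothing, so an at-least-once delivery of `msg-1` has the observable effect of a single processing.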
-
Implication
The "if... then..." connective, written ⇒. The proposition P ⇒ Q is false in exactly one case: when P is true and Q is false. In particular, an implication with a false premise is always true.
See also: Logical connective , Logical equivalence
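The truth table can be generated mechanically. The sketch below encodes P ⇒ Q as "(not P) or Q", which is the standard equivalent form; the function name `implies` is chosen for this example.

```python
def implies(p, q):
    """P ⇒ Q, written with the equivalent form (not P) or Q."""
    return (not p) or q


# One row per combination of truth values: (P, Q, P ⇒ Q).
table = [(p, q, implies(p, q)) for p in (True, False) for q in (True, False)]
false_rows = [row for row in table if not row[2]]
```

The only row where the implication fails is the one with a true premise and a false conclusion; both rows with a false premise come out true.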
-
Inclusion
A relation between two sets, written ⊆. "A ⊆ B" reads "A is included in B" or "A is a subset of B", and means that every element of A is also an element of B. Its definition is a quantified statement: A ⊆ B is equivalent to "for all x, x ∈ A implies x ∈ B". Two sets are equal exactly when each is included in the other (double inclusion).
See also: Membership , Implication , Universal quantifier
-
Integrate-and-fire
A neuron model that accumulates input current in a membrane potential with a leak (time constant tau) and emits a spike when a threshold is crossed. The membrane potential is the earliest internal state variable in a neuron model, introduced by Lapicque in 1907.
Source: Lapicque (1907)
See also: Stateful neuron , Spiking neural network
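The accumulate, leak and fire cycle can be simulated with a simple Euler step. This is a rough sketch: the parameter values and units are arbitrary, and the function name `simulate_lif` is an assumption.

```python
def simulate_lif(currents, tau=10.0, threshold=1.0, dt=1.0, v_reset=0.0):
    """Return the time steps at which a leaky integrate-and-fire neuron spikes."""
    v = 0.0
    spikes = []
    for t, current in enumerate(currents):
        v += dt * (-v / tau + current)   # leak toward rest, plus input
        if v >= threshold:
            spikes.append(t)             # threshold crossed: emit a spike
            v = v_reset                  # then reset the potential
    return spikes


strong = simulate_lif([0.3] * 12)    # input large enough to reach threshold
weak = simulate_lif([0.05] * 100)    # the leak wins: potential settles below it
```

With the weak input the potential levels off where leak and input balance (tau times the current, here 0.5), so the neuron never fires.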
-
Intersection
The operation that keeps only what two sets share, written ∩. A ∩ B is the set of objects that belong to both A and B. Its membership condition is a conjunction: x ∈ A ∩ B is equivalent to "x ∈ A and x ∈ B". When A ∩ B is empty, A and B are said to be disjoint.
See also: Union , Complement , Logical connective
-
Intrinsic plasticity
A lasting change in a neuron's excitability through its own internal dynamics (adaptive threshold, accommodation), without changing synaptic weights. A form of learning that resides not in the connections but in the neuron's state.
See also: Stateful neuron
-
Ion channel
A pore through a neuron's membrane that lets specific charged ions pass. Always-open channels leak a steady current, modelled as a resistance. Other channels open and close depending on the voltage itself and actively generate the spike (the Hodgkin-Huxley model).
Source: Hodgkin & Huxley, 1952
See also: Resting potential , Membrane potential
-
IVF (inverted file)
Inverted File. An index that partitions the space into cells, computed by k-means, and files each vector into the cell of its nearest centroid. At search time, only the nprobe cells closest to the query are scanned, not the whole database. IVF wins latency without reducing memory, since the vectors are still stored in full. The nprobe number tunes the trade-off between speed and recall.
See also: Approximate search , Nearest neighbors , Recall@k
-
Leaky ReLU
A ReLU variant that lets a small slope alpha (typically 0.01) pass on the negative side instead of being strictly zero. Formula: LeakyReLU(x) = x if x > 0, alpha x otherwise. Avoids the dying ReLU problem.
Source: Maas, Hannun and Ng, 2013
See also: ReLU , Dying ReLU
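The formula and its derivative are short enough to write out. This sketch uses the typical alpha = 0.01 from the definition as a default; the function names are illustrative.

```python
def leaky_relu(x, alpha=0.01):
    """x on the positive side, alpha * x on the negative side."""
    return x if x > 0 else alpha * x


def leaky_relu_grad(x, alpha=0.01):
    """Slope of the function: 1 for positive inputs, alpha otherwise."""
    return 1.0 if x > 0 else alpha
```

Because the negative-side slope is alpha rather than zero, a neuron whose input stays negative still receives a small gradient and can recover.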
-
Learning rate
A positive scalar controlling the step size taken by gradient descent at each iteration. Too small, training is slow; too large, it oscillates or diverges. Often denoted η (eta) or α (alpha). The first hyperparameter to tune in any training run.
See also: Gradient descent
-
Learning rule
Procedure that updates the parameters (weights, bias) of a model from observed samples. For the perceptron, the rule is $w \leftarrow w + \eta \cdot y \cdot x$ and $b \leftarrow b + \eta \cdot y$ applied only when a sample is misclassified.
Source: Rosenblatt, 1958
See also: Learning rate , Perceptron , Gradient descent
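The update rule above can be sketched as a short training loop. The AND dataset with labels in {-1, +1}, the zero initialisation and the epoch cap are assumptions for this example; samples with a zero activation are treated as misclassified.

```python
def train_perceptron(samples, eta=1.0, max_epochs=100):
    """Apply w <- w + eta*y*x and b <- b + eta*y on each misclassified sample."""
    w = [0.0] * len(samples[0][0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in samples:
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:              # misclassified (or on the boundary)
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
                b += eta * y
                mistakes += 1
        if mistakes == 0:                        # a full clean pass: stop
            break
    return w, b


# Logical AND, linearly separable, with labels in {-1, +1}.
and_samples = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), +1)]
w, b = train_perceptron(and_samples)
```

On separable data such as AND the loop stops after a finite number of updates; on XOR it would run until `max_epochs` without ever completing a clean pass.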
-
Linear combination
An expression of the form a₁ v₁ + a₂ v₂ + ... + aₙ vₙ where the aᵢ are scalars and the vᵢ are vectors. A neuron's weighted sum is a linear combination of the inputs with the weights as coefficients.
See also: Weighted sum , Dot product
-
Linearly separable
A labelled dataset is linearly separable if there exists a hyperplane that correctly separates the label-1 points from the label-0 points. XOR is the historical example of a problem that is not linearly separable.
See also: Hyperplane , XOR (exclusive or) , Perceptron
-
Local minimum
A graph node whose immediate neighbors are all farther from the query than itself, even though a much better point exists elsewhere in the graph, out of direct reach. A greedy search wrongly stops there, believing it has found the nearest neighbor. Widening the beam (keeping several candidates) lets it escape.
See also: Greedy search , Proximity graph , Recall@k
-
Logical connective
A symbol that combines one or two propositions into a new one. The five basic connectives are negation (¬), conjunction (∧), disjunction (∨), implication (⇒) and equivalence (⇔).
See also: Proposition , Implication , Truth table
-
Logical equivalence
A relation between two propositions that share the same truth value in every possible case. The associated connective, written ⇔, reads "if and only if" and amounts to a double implication.
See also: Implication , Truth table
-
Loss function
A measure of the error between a network's prediction and the expected truth. Also called cost function. The higher it is, the more wrong the network. Training seeks to minimise it. Common examples: MSE for regression, cross-entropy for classification.
See also: Gradient descent , Gradient
-
Malleability
A property of an encryption scheme where modifying the ciphertext produces a predictable, exploitable change in the corresponding plaintext. Unauthenticated cipher modes (stream, CTR, CBC without MAC) are malleable. Using an AEAD algorithm eliminates this property by causing any modified ciphertext to fail decryption.
See also: Authenticated encryption (AEAD) , Authentication tag , Padding oracle
-
Mark I Perceptron
Physical machine built by Frank Rosenblatt between 1958 and 1960 at Cornell Aeronautical Laboratory. Able to recognise simple shapes thanks to 400 photoreceptors connected to weights that were tunable via motorised potentiometers. The first hardware implementation of a machine learning algorithm, distinct from the theoretical model published in 1958.
Source: Rosenblatt, 1958, 1960
See also: Perceptron
-
Matrix
A rectangular array of numbers organised in rows and columns. An m×n matrix has m rows and n columns. In a neural network, a layer of m neurons each having n inputs collapses into an m×n weight matrix.
See also: Vector , Dot product
-
Mean squared error
A loss function that averages the square of the gap between the prediction and the target. The square penalizes large gaps heavily and makes the cost differentiable everywhere. Written MSE, it is the natural choice for regression.
Source: Goodfellow, Bengio and Courville, 2016
See also: Loss function , Regression
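A minimal sketch of the definition, assuming predictions and targets are plain lists of numbers (the function name and sample values are illustrative). The two calls compare three small gaps with one large gap of the same total size, to show how the square weighs large errors more.

```python
def mean_squared_error(predictions, targets):
    """Average of the squared gaps between predictions and targets."""
    if len(predictions) != len(targets) or not predictions:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions)

# Three gaps of 0.5 each (total absolute gap 1.5).
small_gaps = mean_squared_error([1.0, 2.0, 3.0], [1.5, 2.5, 3.5])
# A single gap of 1.5 (same total absolute gap).
one_large_gap = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 4.5])
```

With these numbers the single large gap yields an MSE three times higher than the three small ones.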
-
Mediator
An object that centralizes message routing inside a single process, in memory. Instead of the sender referencing the right handler directly, it hands the message to the mediator, which knows which handler to route it to: a single one for a command or a query, zero to many subscribers for an event. It differs from a bus or broker, which provides the same service but across the network, between processes.
Source: Gamma et al., 1994
See also: Command message , Query message , Event message
-
Membership
The fundamental relation between an object and a set, written ∈. "x ∈ A" reads "x belongs to A" and means that x is one of the elements of A. Its negation is written ∉. Membership is the basic predicate of set theory: everything else, inclusion and operations, is defined from it.
-
Membrane potential
The internal variable of a stateful neuron measuring its accumulated electric charge. It rises when inputs arrive, slowly leaks back toward rest when no input comes, and triggers a spike once it reaches a threshold, after which it resets.
See also: Stateful neuron , Integrate-and-fire , Spiking neural network
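The rise, leak, spike and reset cycle can be sketched in discrete time. This is a simplified model under stated assumptions: the rest value is taken as 0, and the retention factor 0.9, the threshold 1.0 and the name `simulate_membrane` are all illustrative choices.

```python
def simulate_membrane(inputs, retention=0.9, threshold=1.0, reset=0.0):
    """Return (potential trace, spike flags) for a sequence of input charges."""
    potential = reset
    trace, spikes = [], []
    for charge in inputs:
        # Leak toward rest (0 here), then add the incoming charge.
        potential = retention * potential + charge
        fired = potential >= threshold
        if fired:
            potential = reset  # the potential resets after a spike
        trace.append(potential)
        spikes.append(fired)
    return trace, spikes

trace, spikes = simulate_membrane([0.4, 0.4, 0.4, 0.0, 0.0])
```

Three inputs of 0.4 are needed before the threshold is crossed, because part of the charge leaks away at every step.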
-
Message broker
An intermediary that receives messages, holds them in queues and hands each one to a consumer, then erases it once it has been acknowledged. The archetype is RabbitMQ: a delivered and acknowledged message is gone, it is not kept to be re-read. The progress state (what is left to deliver) lives in the broker, not in the reader.
Source: Hohpe & Woolf, 2003
See also: Message log , Consumer group , Mediator
-
Message log
An ordered, append-only sequence of messages kept instead of being erased after reading. The archetype is Kafka: each message gets a fixed position, and several readers can re-read it independently, each at its own pace. Unlike the broker, the read state does not live in the log but in the consumer, as an offset.
Source: Kleppmann, 2017
See also: Message broker , Offset , Consumer group
-
Metamorphic testing
A testing technique that checks an expected RELATION between several runs rather than one exact output value, useful when the right answer is unknown or too costly to compute (the oracle problem). For example: permuting the order of inputs must not change the result, or doubling an input must double the output. The differential oracle is a special case, where the checked relation is equality to an exact reference.
See also: Differential oracle , Approximate search
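The two example relations above can be checked without knowing the expected output. The sketch below is one possible shape for such a check, assuming the function under test maps a list of numbers to a number; `check_metamorphic_relations` and the tolerance are hypothetical.

```python
import random

def mean(values):
    return sum(values) / len(values)

def check_metamorphic_relations(func, values, trials=100, seed=0, tol=1e-9):
    """Return True if func respects both relations on the given input."""
    rng = random.Random(seed)
    baseline = func(values)
    # Relation 1: permuting the order of inputs must not change the result.
    for _ in range(trials):
        shuffled = values[:]
        rng.shuffle(shuffled)
        if abs(func(shuffled) - baseline) > tol:
            return False
    # Relation 2: doubling every input must double the output.
    doubled = [2 * v for v in values]
    return abs(func(doubled) - 2 * baseline) <= tol

data = [3.0, 1.0, 4.0, 1.0, 5.0]
mean_passes = check_metamorphic_relations(mean, data)
# A deliberately order-sensitive function should violate relation 1.
first_element_passes = check_metamorphic_relations(lambda v: v[0], data)
```

No exact expected value appears anywhere in the check: only relations between runs are compared.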
-
Minsky & Papert
Marvin Minsky and Seymour Papert, authors of *Perceptrons* (MIT Press, 1969), which formally proved the limits of a single perceptron, notably the impossibility of computing the XOR function. Their analysis contributed to the decline of public funding for neural network research until the mid-1980s.
Source: Minsky & Papert, *Perceptrons*, MIT Press, 1969
See also: Perceptron , XOR (exclusive or) , AI winter
-
Multilayer perceptron
A neural network organised in successive layers (input, one or more hidden layers, output), where each neuron applies an affine combination followed by an activation function. By stacking neurons it overcomes the single perceptron limit and computes non-linearly-separable functions such as XOR.
Source: Rumelhart, Hinton & Williams, 1986
See also: Hidden layer , Perceptron , XOR (exclusive or) , Function composition
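As an illustration, a network with one hidden layer of two neurons can compute XOR with hand-picked weights. The weights below are one possible choice, not learned values, and the threshold activation is an assumption made to keep the arithmetic exact.

```python
def step(z):
    # Threshold activation: 1 if z >= 0, else 0.
    return 1 if z >= 0 else 0

def xor_network(x1, x2):
    # Hidden layer: each neuron is an affine combination followed by step.
    h_or = step(1.0 * x1 + 1.0 * x2 - 0.5)   # fires when at least one input is 1
    h_and = step(1.0 * x1 + 1.0 * x2 - 1.5)  # fires only when both inputs are 1
    # Output neuron: "OR but not AND".
    return step(1.0 * h_or - 1.0 * h_and - 0.5)

outputs = {(a, b): xor_network(a, b) for a in (0, 1) for b in (0, 1)}
```

The hidden layer re-describes the inputs as (OR, AND), and in that new space the positive and negative cases become separable by a single line.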
-
Nearest neighbors
The problem of finding, among a set of points, the k points closest to a query under a distance or similarity measure. In vector search, k nearest neighbors (k-NN) means the k documents whose embedding is closest to the query's.
See also: Euclidean distance , Cosine similarity , Exhaustive search
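A minimal exhaustive version of the problem, assuming Euclidean distance and points stored as tuples (the name `k_nearest` and the data are illustrative). It scans every point, which is exact but costly on large sets.

```python
import math

def k_nearest(points, query, k):
    """Return the indices of the k points closest to query, nearest first."""
    order = sorted(range(len(points)), key=lambda i: math.dist(points[i], query))
    return order[:k]

points = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (1.0, 0.0)]
nearest_two = k_nearest(points, query=(0.9, 0.2), k=2)
```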
-
Neuromorphic computing
A branch of computer science that designs hardware imitating biological brain operation (spiking neurons, local memory, asynchronous computation). An active research field at Intel (Loihi), IBM (TrueNorth) and several academic laboratories.
See also: Spiking neural network
-
Non-linearity
The property of a function that is not affine. A non-linear activation function is mandatory in a deep network, otherwise the composition of several layers reduces to a single equivalent affine layer and depth loses its point.
See also: Activation function , Hidden layer
-
Nonce
A value used exactly once with a given key. Uniqueness, not secrecy, is the critical property: reusing a nonce with the same key breaks the scheme's security guarantees. A 192-bit random nonce (XChaCha20) makes collisions negligible, while a 96-bit nonce (AES-GCM, ChaCha20-Poly1305) is too short to draw at random indefinitely: it is typically managed as a counter or, when drawn at random, limited to about 2^32 messages per key.
Source: RFC 8439
See also: Authenticated encryption (AEAD) , Domain separation
-
Norm
The length of a vector, measured as the square root of the sum of the squares of its components. For a vector x = (x₁, ..., xₙ), the norm ‖x‖ = √(x₁² + ... + xₙ²). It is the generalisation of the Pythagorean theorem to n dimensions.
See also: Vector , Dot product
-
Normal vector
Vector $w$ that defines the direction perpendicular to a hyperplane with equation $w \cdot x + b = 0$. Its direction indicates which side of the hyperplane a point lies on; its norm sets the scale of the signed distance.
Source: Bishop, PRML, ch. 4
See also: Hyperplane , Dot product , Norm
-
Normalization
The operation that brings a vector to length 1 by dividing it by its norm, without changing its direction. On such normalized vectors, cosine similarity reduces to the dot product, and ranking by cosine coincides with ranking by Euclidean distance. This is why many vector databases normalize embeddings on ingestion.
See also: Norm , Cosine similarity , Euclidean distance
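The claims above can be checked numerically. The sketch below uses plain Python lists and illustrative helper names; it also checks the identity ‖u − v‖² = 2 − 2 (u · v) for unit vectors, which is why the two rankings coincide.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(v):
    return math.sqrt(dot(v, v))

def normalize(v):
    length = norm(v)
    if length == 0:
        raise ValueError("the zero vector has no direction to preserve")
    return [x / length for x in v]

def cosine_similarity(a, b):
    return dot(a, b) / (norm(a) * norm(b))

a, b = [3.0, 4.0], [4.0, 3.0]
ua, ub = normalize(a), normalize(b)

cosine = cosine_similarity(a, b)      # computed on the raw vectors
dot_of_units = dot(ua, ub)            # computed on the normalized vectors
squared_distance = sum((x - y) ** 2 for x, y in zip(ua, ub))
```

Since the squared distance is a decreasing function of the dot product, the highest cosine always corresponds to the smallest Euclidean distance between unit vectors.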
-
Novikoff's theorem
If a dataset is linearly separable with geometric margin $\gamma > 0$ and radius $R = \max_i \|x_i\|$, then the perceptron algorithm initialised at zero converges after a number of corrections $T$ satisfying $T \leq (R / \gamma)^2$, regardless of the learning rate.
Source: Novikoff, 1962
See also: Perceptron , Linearly separable , Geometric margin , Cauchy-Schwarz inequality
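The bound can be observed on a toy dataset. This is a sketch under stated assumptions: a perceptron without bias, labels in {-1, +1}, four hand-made points, and a hand-picked unit separator `u` whose margin is used as γ. All names are illustrative.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def train_perceptron(samples, max_epochs=100):
    """Perceptron without bias, initialised at zero. Returns (weights, corrections)."""
    w = [0.0] * len(samples[0][0])
    corrections = 0
    for _ in range(max_epochs):
        clean_pass = True
        for x, y in samples:
            if y * dot(w, x) <= 0:  # misclassified, or exactly on the boundary
                w = [wi + y * xi for wi, xi in zip(w, x)]
                corrections += 1
                clean_pass = False
        if clean_pass:
            break
    return w, corrections

samples = [((-1.0, 3.0), 1), ((3.0, -1.0), 1), ((1.0, -3.0), -1), ((-3.0, 1.0), -1)]
u = (1 / math.sqrt(2), 1 / math.sqrt(2))              # assumed unit-norm separator
radius = max(math.sqrt(dot(x, x)) for x, _ in samples)  # R
margin = min(y * dot(u, x) for x, y in samples)         # gamma achieved by u
bound = (radius / margin) ** 2

weights, corrections = train_perceptron(samples)
```

On this data the bound evaluates to about 5, and the perceptron stops after fewer corrections than that.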
-
Offset
A consumer's read position in a message log: the number of the next message it will read. The consumer owns and advances its own offset, not the log. Two readers of the same log therefore have independent offsets, and rewinding an offset to an earlier position is enough to re-read history.
Source: Kleppmann, 2017
See also: Message log , Consumer group , Message broker
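A toy in-memory model makes the ownership of the offset concrete. The classes `MessageLog` and `Consumer` are hypothetical and do not mirror any real client API; the point is only that the position lives in the reader.

```python
class MessageLog:
    """Append-only list of messages; it keeps no per-reader state."""
    def __init__(self):
        self._messages = []

    def append(self, message):
        self._messages.append(message)
        return len(self._messages) - 1  # position assigned to the message

    def read(self, offset):
        if 0 <= offset < len(self._messages):
            return self._messages[offset]
        return None  # nothing at this position yet

class Consumer:
    """Owns its offset: the number of the next message it will read."""
    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self):
        message = self.log.read(self.offset)
        if message is not None:
            self.offset += 1
        return message

    def seek(self, offset):
        self.offset = offset  # rewinding is just moving the position

log = MessageLog()
for text in ("a", "b", "c"):
    log.append(text)

fast, slow = Consumer(log), Consumer(log)
fast_read = [fast.poll() for _ in range(3)]
slow_read = [slow.poll()]
```

Calling `fast.seek(0)` afterwards would replay the history from the start without affecting `slow`.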
-
Orthogonality
Two vectors are orthogonal when their dot product is zero. Geometrically, this matches a 90-degree angle between them. In machine learning, orthogonal inputs contribute independently to a neuron's computation.
See also: Dot product , Vector
-
Outbox relay
A process that reads the outbox table, publishes pending messages to the broker, then marks them as sent. It runs separately from the business service. Since it can publish a message then crash before marking it sent, it will republish it: its delivery is at-least-once, and duplicates are absorbed downstream by an idempotent consumer.
Source: Richardson, Microservices Patterns
See also: Transactional outbox , At-least-once delivery , Row claim
-
Padding oracle
An attack that exploits any observable signal (error message, response time, detectable behavior) revealing whether the padding of an encrypted block is valid. By iterating ciphertext modifications and observing responses, an attacker can decrypt the message without knowing the key. It is one of the classic reasons why encryption without authentication is dangerous.
See also: Malleability , Authenticated encryption (AEAD) , Constant-time
-
Partial derivative
The derivative of a multivariable function with respect to a single variable, the others held constant. It measures the slope along one axis. Stacked together, the partial derivatives form the gradient.
See also: Gradient , Derivative
-
Partial order
The guarantee that messages are ordered only within each partition, not across the whole log. A log offers a partial order, not a total order: two messages in the same partition keep their relative order, but two messages in different partitions have no defined order between them. This is why you need a key that groups causally related messages into the same partition.
Source: Kleppmann, 2017
See also: Partition , Partition key , Offset
-
Partition
A sub-stream of a message log that holds an ordered subset of the messages. A log is split into several partitions so that consumers can read them in parallel. Ordering is guaranteed only within a single partition, never across partitions. Each message is assigned to a partition by its partition key.
Source: Kleppmann, 2017
See also: Partition key , Partial order , Message log
-
Partition key
The value used to route a message to a partition, usually by hashing it. Two messages with the same partition key always land on the same partition, so they stay ordered relative to one another; messages with different keys spread across partitions and are processed in parallel. Choosing it well (for example the order id) buys per-key ordering without sacrificing throughput. It answers the question: which messages must stay ordered together?
Source: Apache Kafka, Documentation
See also: Partition , Partial order , Consumer group
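Routing by hashed key can be sketched as follows. The hash used here (SHA-256 truncated to 8 bytes) is a stand-in chosen because it is stable across runs; real systems use their own hash function, and `partition_for` is a hypothetical name.

```python
import hashlib

def partition_for(key: str, partition_count: int) -> int:
    """Map a partition key to a partition index in [0, partition_count)."""
    if partition_count <= 0:
        raise ValueError("partition_count must be positive")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer, then fold into the range.
    return int.from_bytes(digest[:8], "big") % partition_count

# All events of the same order share a key, so they share a partition.
events = [("order-42", "created"), ("order-7", "created"), ("order-42", "paid")]
routed = [(key, event, partition_for(key, 8)) for key, event in events]
```

Python's built-in `hash()` is avoided on purpose: for strings it is randomized per process, which would send the same key to different partitions after a restart.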
-
Perceptron
The first artificial neuron able to learn, invented by Frank Rosenblatt in 1958. It combines a weighted sum of the inputs with a threshold function to produce a binary 0 or 1 decision.
Source: Rosenblatt, 1958
See also: Weighted sum , Bias
-
Power set
The set of all subsets of a set E, written P(E). Its elements are themselves sets: the empty set and E itself always belong to it. If E has n elements, then P(E) has 2ⁿ elements, because each element of E is either taken or left out of a subset. For example P({a, b}) = {∅, {a}, {b}, {a, b}}.
See also: Set , Inclusion , Membership
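A short sketch that enumerates a power set and confirms the 2ⁿ count. The function name is illustrative; `frozenset` is used so that subsets can themselves be members of a set.

```python
from itertools import chain, combinations

def power_set(elements):
    """Return all subsets of elements as a list of frozensets."""
    items = list(elements)
    subsets = chain.from_iterable(
        combinations(items, size) for size in range(len(items) + 1)
    )
    return [frozenset(subset) for subset in subsets]

subsets_of_ab = power_set(["a", "b"])
subsets_of_abc = power_set(["a", "b", "c"])
```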
-
Predicate
A statement containing one or more free variables, whose truth value depends on what is substituted for those variables. "x > 3" is a predicate: it is neither true nor false until x is fixed or quantified. Once all its variables are fixed or bound by a quantifier, a predicate becomes a proposition.
See also: Proposition , Universal quantifier , Existential quantifier
-
Product quantization
Product Quantization (PQ). A vector compression technique: each vector is split into several slices, and within each slice the sub-vector is replaced by the index of the nearest centroid in a small learned dictionary (a codebook). A vector thus becomes a handful of codes, often one byte each, instead of hundreds of reals. Product quantization saves a great deal of memory, at the cost of reduced recall, since distances are only estimated.
See also: Approximate search , Euclidean distance , Recall@k
-
Proposition
A mathematical statement that is unambiguously either true or false, with no third possibility. This principle, called bivalence, is the starting point of all propositional logic.
See also: Logical connective , Truth table
-
Proximity graph
A structure where each vector (a node) is connected by edges to a handful of its nearest neighbors. Instead of a relationless bag of vectors that forces a full scan, you get a network you can walk through, hop by hop, to approach a query without visiting every point. It is the foundation of graph-based indexes such as HNSW.
See also: Nearest neighbors , HNSW , Small-world network
-
Query message
A message that asks a specific recipient for information without changing any system state (for example "What is this customer's loyalty balance?"). It is phrased as a question, targets a single handler, and the sender always expects data in return. It is the third family of messages, alongside the command that orders and the event that states a fact.
Source: Hohpe & Woolf, 2003
See also: Command message , Event message , Command-query separation , Request-reply
-
Recall@k
A measure of the quality of an approximate search: the fraction of the k true nearest neighbors (computed by exhaustive search) that the approximate method recovers in its top k results. A recall@k of 1 means no exact neighbor was missed; a recall@k of 0.8 means one exact neighbor in five escaped the search.
See also: Nearest neighbors , Exhaustive search
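The measure is a set overlap, as the sketch below shows. It assumes both result lists contain document identifiers ranked nearest first; the name `recall_at_k` and the identifiers are illustrative.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the k true nearest neighbors found in the approximate top k."""
    if k <= 0:
        raise ValueError("k must be positive")
    true_neighbors = set(exact_ids[:k])
    if not true_neighbors:
        raise ValueError("exact_ids must not be empty")
    found = true_neighbors & set(approx_ids[:k])
    return len(found) / len(true_neighbors)

exact = [1, 2, 3, 4, 5]    # from exhaustive search
approx = [2, 1, 3, 4, 9]   # from an approximate index: neighbor 5 was missed
recall = recall_at_k(approx, exact, k=5)
```

The order inside the top k does not matter here: only which true neighbors were recovered.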
-
Regression
A supervised learning task that predicts a continuous value (a price, a temperature, a probability). Typically uses the identity as output activation and MSE as the loss function.
See also: Loss function , Activation function
-
ReLU
An activation function defined as ReLU(x) = max(0, x): linear for positive values, zero for negative ones. It is simple, fast to compute, and largely avoids the vanishing gradient problem because its derivative is exactly 1 for positive inputs. The de facto standard in hidden layers of deep networks since 2012.
Source: Nair and Hinton, 2010
See also: Activation function , Sigmoid
-
Request-reply
An exchange pattern where the sender issues a request then waits, on a return channel, for the matching reply. It is the natural shape of a query: "What is the balance?" calls for "240 points". To match each reply to its request when several are in flight, a correlation identifier is often attached. Described by Hohpe & Woolf as Request-Reply.
Source: Hohpe & Woolf, 2003
See also: Query message , Command-query separation
-
Resting potential
The stable voltage difference a neuron's membrane maintains between the inside and outside of the cell when it receives nothing, on the order of -65 millivolts. It is the value the membrane potential drifts back to when no input arrives.
Source: Gerstner et al., 2014
See also: Membrane potential , Ion channel
-
Row claim
A lock that lets a relay take a row from the outbox table without a concurrent relay taking the same one. In SQL, the FOR UPDATE SKIP LOCKED clause: each relay claims rows that are still free and skips those already locked by another. Without this lock, several relays would publish the same message, a double dispatch.
Source: PostgreSQL, Documentation (SELECT FOR UPDATE SKIP LOCKED)
See also: Outbox relay , Transactional outbox
-
Saturation
The phenomenon by which an activation function reaches a nearly constant value (and therefore a nearly zero derivative) over large regions of its domain. Sigmoid saturates at very negative or very positive values, causing the vanishing gradient.
See also: Sigmoid , Vanishing gradient
-
Set
A collection of objects, called its elements, regarded as a single whole. A set is entirely determined by its elements: two sets with exactly the same elements are equal. It is described by extension, listing its elements in braces such as {1, 2, 3}, or by comprehension, giving the property its elements satisfy, such as {x | x > 3}.
See also: Membership , Inclusion , Power set
-
Sigmoid
An S-shaped activation function that takes any real number and squashes it into the open interval (0, 1). Its formula is σ(x) = 1 / (1 + e⁻ˣ). Historically the most used, it is today often replaced by ReLU in hidden layers.
See also: Activation function , ReLU
-
Signed distance
Perpendicular distance from a point to a hyperplane, carrying a sign that depends on which side of the hyperplane the point lies on. For the hyperplane $w \cdot x + b = 0$, it equals $d(x) = (w \cdot x + b) / \|w\|$: positive on one side, negative on the other, zero on the hyperplane itself.
Source: Hastie, Tibshirani, Friedman, ESL, ch. 4
See also: Hyperplane , Normal vector , Euclidean distance
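The formula translates directly into code. The hyperplane below (w = (3, 4), b = -5) and the function name are illustrative assumptions.

```python
import math

def signed_distance(w, b, x):
    """Signed distance from point x to the hyperplane w . x + b = 0."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    if norm_w == 0:
        raise ValueError("w must be non-zero to define a hyperplane")
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm_w

w, b = (3.0, 4.0), -5.0
positive_side = signed_distance(w, b, (3.0, 4.0))
negative_side = signed_distance(w, b, (0.0, 0.0))
on_the_plane = signed_distance(w, b, (3.0, -1.0))
```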
-
Small-world network
A network that combines many local links (to nearby neighbors) with a few rare long-range links (to distant regions). Those shortcuts collapse path lengths: to cross the network, the number of hops grows like the logarithm of the number of nodes rather than like their count. It is the principle behind six degrees of separation, and the core of HNSW's efficiency.
See also: Proximity graph , HNSW
-
Softmax
A function that turns a vector of reals into a probability distribution. For a vector z, softmax(z)_i = exp(z_i) / sum(exp(z_j)). Used as the output activation in multi-class classification.
See also: Sigmoid , Activation function
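A minimal sketch of the formula. Subtracting the maximum before exponentiating is a common numerical precaution that is not part of the definition above: it leaves the result unchanged but avoids overflow on large inputs.

```python
import math

def softmax(z):
    """Turn a list of reals into a probability distribution."""
    shift = max(z)  # stability trick: softmax is invariant to a constant shift
    exps = [math.exp(value - shift) for value in z]
    total = sum(exps)
    return [e / total for e in exps]

probabilities = softmax([2.0, 1.0, 0.1])
large_inputs = softmax([1000.0, 1000.0])  # would overflow without the shift
```

The largest input always receives the largest probability, and all outputs sum to 1.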
-
Spiking neural network
A family of neural networks that communicate through discrete spikes in time, closer to biological operation than classical continuous networks. An active research field, rarely used in industrial practice so far.
See also: Neuromorphic computing
-
Stateful neuron
A neuron whose output depends on an internal variable that evolves over time (membrane potential, adaptive threshold), and thus on its recent history. As opposed to a stateless neuron, whose output depends only on the instantaneous input.
See also: Integrate-and-fire , Spiking neural network
-
Surrogate gradient
A training trick for spiking neural networks. Since the binary spike is not differentiable, its derivative is replaced by a smooth approximation during backpropagation, while the forward pass keeps the spiking dynamics. Formalized by Neftci, Mostafa and Zenke (2019).
Source: Neftci, Mostafa & Zenke (2019)
See also: Spiking neural network , Neuromorphic computing
-
Tanh
Hyperbolic tangent, an activation function similar to sigmoid but compressing values into (-1, 1) instead of (0, 1). Often used when a zero-centered output is desired. Its formula is tanh(x) = (eˣ - e⁻ˣ) / (eˣ + e⁻ˣ).
See also: Activation function , Sigmoid
-
Tautology
A proposition that is true regardless of the truth values of its components. Example: P ∨ ¬P (law of excluded middle). Its opposite is a contradiction, always false.
See also: Proposition , Truth table
-
Temporal coding
A way of representing information in the precise instant when a spike is emitted, rather than in the number of spikes per second. The firing time then carries the message, letting a spiking network compute with very few spikes. Contrasts with rate coding.
See also: Spiking neural network , Coincidence detection
-
Temporal coupling
A dependency that forces two services to be available at the same instant for an exchange to succeed. A direct synchronous call temporally couples the caller and the callee: if the callee is down or slow, the caller waits or fails. An asynchronous message removes this coupling by inserting a queue that accepts the request even when the recipient is absent.
See also: Asynchronous messaging , Event message
-
Threshold function
Binary activation function H(z) equal to 1 if z >= 0 and 0 otherwise. Also called the Heaviside function. It is the original activation of McCulloch-Pitts (1943) and Rosenblatt's perceptron (1958), later dropped for gradient-based training because its derivative is zero everywhere except at 0, where it is undefined, so it gives backpropagation no signal.
See also: Activation function , Perceptron
-
Time constant
Characteristic timescale of a membrane's leak, written τ and equal to the product of resistance and capacitance, τ = R · C. After one time constant the potential has lost about 63% of its charge (37% remains, i.e. 1/e). It sets the discrete retention factor λ = e^(-Δt/τ).
Source: Gerstner et al., 2014
See also: Membrane potential , Integrate-and-fire
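The two formulas can be evaluated directly. The resistance and capacitance below are illustrative values chosen to give τ = 10 ms, not figures from the source, and `retention_factor` is a hypothetical name.

```python
import math

def retention_factor(dt, resistance, capacitance):
    """Discrete retention factor lambda = exp(-dt / tau), with tau = R * C."""
    tau = resistance * capacitance
    if tau <= 0:
        raise ValueError("tau must be positive")
    return math.exp(-dt / tau)

R = 10e6   # ohms (assumed value)
C = 1e-9   # farads (assumed value), so tau = R * C = 0.01 s
tau = R * C

after_one_tau = retention_factor(dt=tau, resistance=R, capacitance=C)
after_one_ms = retention_factor(dt=0.001, resistance=R, capacitance=C)
```

A step equal to τ leaves about 37% of the potential, while a 1 ms step with this τ keeps about 90% of it.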
-
Transactional outbox
A pattern that removes the dual-write problem by writing to only one system, atomically. The service records the business state AND a row describing the message to send in the same database transaction. A relay then reads this outbox table and publishes the messages to the broker. An impossible distributed write becomes a local atomic write followed by a relay.
Source: Richardson, Microservices Patterns
See also: Dual write , Outbox relay , Row claim
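The single local transaction can be sketched with SQLite from the standard library, used here as a stand-in for the service's database. The table layout, column names and `place_order` are assumptions made for the example; the relay is not shown.

```python
import sqlite3

def open_database():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE outbox ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "payload TEXT NOT NULL, "
        "sent INTEGER NOT NULL DEFAULT 0)"
    )
    return conn

def place_order(conn, order_id, payload):
    # "with conn" commits if the block succeeds and rolls back if it raises,
    # so the business row and the outbox row are written together or not at all.
    with conn:
        conn.execute("INSERT INTO orders (id, status) VALUES (?, 'placed')", (order_id,))
        conn.execute("INSERT INTO outbox (payload) VALUES (?)", (payload,))

def count_rows(conn, table):
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

conn = open_database()
place_order(conn, 1, "OrderPlaced:1")

try:
    # A NULL payload violates NOT NULL on the outbox row: the whole
    # transaction is rolled back, including the order inserted just before.
    place_order(conn, 2, None)
except sqlite3.IntegrityError:
    pass

orders_count = count_rows(conn, "orders")
outbox_count = count_rows(conn, "outbox")
```

After the failed call, neither table contains a trace of order 2: there is never an order without its message, nor a message without its order.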
-
Transformer
A neural network architecture introduced in 2017 by Vaswani et al. in "Attention Is All You Need". Built on the attention mechanism, it now dominates natural language processing and extends to vision and audio. It is the foundation of models like GPT, Claude, Gemini.
Source: Vaswani et al., 2017
See also: llm-mcp
-
Truth table
A table giving the truth value of a logical formula for each possible combination of its variables' values. For n variables it has 2ⁿ rows.
See also: Proposition , Logical connective
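A truth table can be generated mechanically by enumerating every combination of values. In this sketch a formula is represented as a Python function of booleans, which is an assumption of the example, as is the name `truth_table`.

```python
from itertools import product

def truth_table(formula, variable_count):
    """Return one (values, result) row per combination of truth values."""
    return [
        (values, formula(*values))
        for values in product([False, True], repeat=variable_count)
    ]

# Implication P => Q, written as (not P) or Q.
implication_rows = truth_table(lambda p, q: (not p) or q, 2)

# P or not P is true on every row of its table.
excluded_middle_rows = truth_table(lambda p: p or not p, 1)
always_true = all(result for _, result in excluded_middle_rows)
```

The number of rows doubles with each added variable, which is where 2ⁿ comes from.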
-
Two generals problem
A classic result in theoretical computer science: over an unreliable communication channel, where any message can be lost, no protocol with a finite number of messages lets two parties become jointly certain of a shared agreement. Applied to messaging, it proves that exactly-once delivery is impossible, because the acknowledgement itself can be lost: the sender must choose between risking loss or risking a duplicate.
Source: Akkoyunlu et al., 1975
See also: At-most-once delivery , At-least-once delivery , Effectively-once delivery
-
Union
The operation that joins two sets, written ∪. A ∪ B is the set of objects that belong to A or to B (or to both). Its membership condition is a disjunction: x ∈ A ∪ B is equivalent to "x ∈ A or x ∈ B". Union is thus the set-level counterpart of the logical "or": the first acts on sets, the second on propositions.
See also: Intersection , Complement , Logical connective
-
Universal approximation theorem
A result (Cybenko 1989, Hornik 1989) stating that a network with a single hidden layer, given enough neurons and a suitable non-linear activation (sigmoidal in the original proofs, later extended to any non-polynomial activation), can approximate any continuous function on a closed and bounded domain to arbitrary accuracy. It guarantees that such a network exists, not that we can learn it.
Source: Cybenko, 1989 ; Hornik, 1989
See also: Multilayer perceptron , Expressive power , Hidden layer
-
Universal quantifier
The symbol ∀, read "for all" or "for every". The statement ∀x, P(x) is true when the predicate P holds for every element of the domain of discourse, without exception. A single counter-example is enough to refute it.
See also: Predicate , Existential quantifier , Domain of discourse , Counter-example
-
Vanishing gradient
The progressive shrinking of the gradient as it is propagated back through the layers of a deep network. When the maximum derivative of an activation function is below 1 (0.25 for sigmoid), the gradient is multiplied by such a factor at each layer crossed and can collapse exponentially with depth. Analysed for deep feedforward networks by Glorot and Bengio (2010), it is one of the reasons for the shift to ReLU.
Source: Glorot and Bengio, 2010
See also: Sigmoid , ReLU , Saturation
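The exponential collapse can be made concrete with sigmoid. The sketch below is a deliberate simplification: it multiplies only the activation derivatives and ignores the weights, which also scale the gradient in a real network.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1.0 - s)

# The derivative of sigmoid peaks at x = 0.
max_slope = sigmoid_derivative(0.0)

# Best-case factor contributed by the activations after crossing `depth` layers.
best_case_factor = {depth: max_slope ** depth for depth in (1, 5, 10)}
```

Even in the most favourable case, ten sigmoid layers shrink the signal by a factor of roughly one million.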
-
Vector
A mathematical object represented as an ordered list of numbers. A vector of dimension n encodes n values. In machine learning, a neuron's inputs and weights are each a vector of the same dimension.
See also: Dot product
-
Vector space
A set whose elements, the vectors, can be added together and multiplied by a number, following consistency rules. Concretely for this course: the set of lists of n real numbers, where each embedding is a point. The dimension n is the number of coordinates.
-
Weighted sum
The addition of several values, each multiplied by a coefficient called weight. General formula Σ wᵢ xᵢ. It is the core of the artificial neuron's computation, before adding the bias and applying the activation function.
See also: Bias , Activation function
-
XOR (exclusive or)
Logical operation that returns 1 when exactly one of its two inputs is 1, and 0 otherwise. In 2D space its positive cases, (0, 1) and (1, 0), lie on one diagonal of the unit square and its negative cases, (0, 0) and (1, 1), on the other, so no single line separates them. This makes XOR impossible to learn for a single perceptron.
Source: Minsky and Papert, 1969
See also: Perceptron
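The impossibility can be illustrated, though not proved, by a brute-force search over a small grid of weights and biases for a single threshold unit. The grid range and the name `find_single_unit` are arbitrary choices for the example.

```python
from itertools import product

def step(z):
    return 1 if z >= 0 else 0

def find_single_unit(target):
    """Search a coarse grid for (w1, w2, b) such that one unit computes target."""
    grid = [i * 0.5 for i in range(-4, 5)]  # -2.0, -1.5, ..., 2.0
    for w1, w2, b in product(grid, repeat=3):
        if all(step(w1 * x1 + w2 * x2 + b) == out for (x1, x2), out in target.items()):
            return (w1, w2, b)
    return None

AND = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

and_solution = find_single_unit(AND)
xor_solution = find_single_unit(XOR)
```

The search finds weights for AND but none for XOR. A finer or wider grid would not help, since the four XOR constraints contradict one another.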