Rényi Divergence for PMFs #
This file defines Rényi divergence on probability mass functions, following the
multiplicative convention R_α(p ‖ q) = (∑ x, p(x)^α · q(x)^(1-α))^(1/(α-1)) used
in the lattice cryptography literature (Falcon, GPV, etc.).
We also define the max-divergence (Rényi divergence of order ∞) as ⨆ x, p(x) / q(x).
Main Definitions #
- `PMF.renyiMGF a p q : ℝ≥0∞`: the Rényi moment generating function (the base quantity before exponentiation), `∑ x, p(x)^a * q(x)^(1-a)`
- `PMF.renyiDiv a p q : ℝ≥0∞`: the multiplicative Rényi divergence of order `a > 1`, `(renyiMGF a p q)^(1/(a-1))`
- `PMF.maxDiv p q : ℝ≥0∞`: the max-divergence (order `∞`), `⨆ x, p(x) / q(x)`
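The three definitions can be sketched in Lean as follows. This is an illustrative sketch against Mathlib's `PMF` API (where `p x : ℝ≥0∞` and `^` with a real exponent is `ENNReal.rpow`), not necessarily the file's actual code:

```lean
import Mathlib

open scoped ENNReal

variable {α : Type*}

/-- Rényi moment generating function `∑ x, p(x)^a * q(x)^(1-a)` (sketch). -/
noncomputable def PMF.renyiMGF (a : ℝ) (p q : PMF α) : ℝ≥0∞ :=
  ∑' x, p x ^ a * q x ^ (1 - a)

/-- Multiplicative Rényi divergence of order `a > 1` (sketch). -/
noncomputable def PMF.renyiDiv (a : ℝ) (p q : PMF α) : ℝ≥0∞ :=
  PMF.renyiMGF a p q ^ (1 / (a - 1))

/-- Max-divergence: supremum of the pointwise likelihood ratio (sketch). -/
noncomputable def PMF.maxDiv (p q : PMF α) : ℝ≥0∞ :=
  ⨆ x, p x / q x
```

Working in `ℝ≥0∞` means the `tsum` and the supremum always exist, so no side conditions are needed in the definitions themselves.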
Main Results #
- `renyiMGF_self`: `M_a(p ‖ p) = 1`
- `renyiDiv_self`: `R_a(p ‖ p) = 1`
- `renyiMGF_map_le`: data processing inequality for deterministic maps
- `renyiDiv_map_le`: data processing inequality for the Rényi divergence
- `renyiDiv_prob_bound`: probability preservation bound
Design Notes #
We use ℝ≥0∞ throughout rather than ℝ to avoid partiality issues (Rényi divergence can
be infinite when q has zeros that p doesn't). The multiplicative form (not the log form)
is used because:
- The Falcon literature uses `R_α` (multiplicative), not `D_α` (log form).
- The multiplicative form composes better: `R_α(p₁⊗p₂ ‖ q₁⊗q₂) = R_α(p₁‖q₁) · R_α(p₂‖q₂)`.
- The probability preservation bound is cleaner: `q(E) ≥ p(E)^{α/(α-1)} / R_α(p‖q)`.
References #
- Rényi, A. "On measures of entropy and information." 1961.
- van Erven, T., Harremoës, P. "Rényi divergence and Kullback-Leibler divergence." 2014.
- Bai, S., et al. "An Improved Compression Technique for Signatures Based on Learning with Errors." 2014. (multiplicative convention for lattice crypto)
- Fouque, P.-A., et al. "A closer look at Falcon." ePrint 2024/1769.
Rényi Moment Generating Function #
Multiplicative Rényi Divergence #
Max-Divergence (Rényi order ∞) #
The max-divergence (Rényi divergence of order ∞): ⨆ x, p(x) / q(x).
This bounds the pointwise likelihood ratio.
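The statement that `maxDiv` bounds the pointwise likelihood ratio is immediate from the supremum; as a hedged illustration (assuming `PMF.maxDiv` is defined as the `⨆` above):

```lean
-- Each pointwise ratio is at most the supremum over all points,
-- so `p x ≤ maxDiv p q * q x` whenever `q x ≠ 0` (and trivially when `p x = 0`).
example {α : Type*} (p q : PMF α) (x : α) : p x / q x ≤ PMF.maxDiv p q :=
  le_iSup (fun y => p y / q y) x
```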
Data Processing Inequality #
Data processing inequality for the Rényi MGF under deterministic maps:
M_a(f∗p ‖ f∗q) ≤ M_a(p ‖ q).
Applying a deterministic function can only decrease the Rényi MGF.
Data processing inequality for the multiplicative Rényi divergence:
R_a(f∗p ‖ f∗q) ≤ R_a(p ‖ q).
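Both inequalities can be stated as follows. This is a statement-only sketch (proofs omitted); it assumes `PMF.map` is the pushforward along `f` and uses the lemma names documented above, with illustrative hypotheses:

```lean
theorem PMF.renyiMGF_map_le {α β : Type*} (a : ℝ) (ha : 1 < a)
    (f : α → β) (p q : PMF α) :
    PMF.renyiMGF a (p.map f) (q.map f) ≤ PMF.renyiMGF a p q := by
  sorry -- on each fiber of `f`: `(∑ p)^a * (∑ q)^(1-a) ≤ ∑ p^a * q^(1-a)`

theorem PMF.renyiDiv_map_le {α β : Type*} (a : ℝ) (ha : 1 < a)
    (f : α → β) (p q : PMF α) :
    PMF.renyiDiv a (p.map f) (q.map f) ≤ PMF.renyiDiv a p q := by
  sorry -- monotonicity of `x ↦ x ^ (1/(a-1))` applied to `renyiMGF_map_le`
```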
Multiplicativity (Product Distributions) #
Multiplicativity of the Rényi MGF for independent product distributions:
M_a(p₁⊗p₂ ‖ q₁⊗q₂) = M_a(p₁ ‖ q₁) · M_a(p₂ ‖ q₂).
Multiplicativity of the Rényi divergence for independent product distributions:
R_a(p₁⊗p₂ ‖ q₁⊗q₂) = R_a(p₁ ‖ q₁) · R_a(p₂ ‖ q₂).
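A possible Lean rendering of the MGF multiplicativity, writing the independent product via the `PMF` monad (`bind`/`map`); the name `PMF.prodMk` is hypothetical, chosen only for this sketch:

```lean
/-- Independent product of two PMFs via the `PMF` monad (illustrative name). -/
noncomputable def PMF.prodMk {α β : Type*} (p₁ : PMF α) (p₂ : PMF β) :
    PMF (α × β) :=
  p₁.bind fun a => p₂.map (Prod.mk a)

theorem PMF.renyiMGF_prod {α β : Type*} (a : ℝ)
    (p₁ q₁ : PMF α) (p₂ q₂ : PMF β) :
    PMF.renyiMGF a (p₁.prodMk p₂) (q₁.prodMk q₂) =
      PMF.renyiMGF a p₁ q₁ * PMF.renyiMGF a p₂ q₂ := by
  sorry -- the double sum factors: `∑ (x,y), (p₁ x * p₂ y)^a * (q₁ x * q₂ y)^(1-a)`
```

The divergence version then follows by raising both sides to the power `1/(a-1)`, since `(s * t)^c = s^c * t^c` in `ℝ≥0∞`.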
Probability Preservation #
Pointwise probability preservation: for a single element x,
p(x)^{a/(a-1)} / R_a(p ‖ q) ≤ q(x).
Proof: from p(x)^a · q(x)^{1-a} ≤ M_a(p ‖ q) (a single term of the defining sum)
we get p(x)^a ≤ M_a(p ‖ q) · q(x)^{a-1}; raising both sides to the power 1/(a-1) yields the claim.
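The chain of inequalities, written out in full (all quantities in ℝ≥0∞, with a > 1):

```latex
p(x)^a \, q(x)^{1-a} \le M_a(p \,\|\, q)
\;\Longrightarrow\;
p(x)^a \le M_a(p \,\|\, q) \cdot q(x)^{a-1}
\;\Longrightarrow\;
p(x)^{a/(a-1)} \le M_a(p \,\|\, q)^{1/(a-1)} \cdot q(x)
  = R_a(p \,\|\, q) \cdot q(x).
```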
Probability preservation bound: for any event E,
q(E) ≥ p(E)^{a/(a-1)} / R_a(p ‖ q).
This is the key lemma used in security reductions involving Rényi divergence:
if R_a(p ‖ q) is small, then events that are likely under p remain likely under q.
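A possible Lean statement of the event-level bound, assuming events are measured via Mathlib's `PMF.toOuterMeasure`; statement-only sketch, hypothesis names illustrative:

```lean
theorem PMF.renyiDiv_prob_bound {α : Type*} (a : ℝ) (ha : 1 < a)
    (p q : PMF α) (E : Set α) :
    p.toOuterMeasure E ^ (a / (a - 1)) / PMF.renyiDiv a p q ≤
      q.toOuterMeasure E := by
  sorry -- via Hölder applied to the sums restricted to `E`
```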
From Relative Error to Rényi Divergence #
If the pointwise relative error between two PMFs is bounded by δ,
then the Rényi MGF is bounded. Specifically, if p(x) ≤ (1 + δ) · q(x) for all x,
then M_a(p ‖ q) ≤ (1 + δ)^(a-1): each term satisfies
p(x)^a · q(x)^{1-a} = p(x) · (p(x)/q(x))^{a-1} ≤ p(x) · (1 + δ)^{a-1},
and the p(x) sum to 1.
This is used to convert floating-point error bounds into Rényi divergence bounds.
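As a hedged Lean rendering (the lemma name, the `ENNReal.ofReal` coercion of δ, and the hypotheses are all illustrative):

```lean
theorem PMF.renyiMGF_le_of_rel_error {α : Type*} (a δ : ℝ) (ha : 1 ≤ a)
    (p q : PMF α) (h : ∀ x, p x ≤ (1 + ENNReal.ofReal δ) * q x) :
    PMF.renyiMGF a p q ≤ (1 + ENNReal.ofReal δ) ^ (a - 1) := by
  sorry -- termwise: `p x ^ a * q x ^ (1-a) ≤ p x * (1 + δ) ^ (a-1)`, then `∑' x, p x = 1`
```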
Divergence to TV Distance #
Max-divergence bounds TV distance: ‖p - q‖_TV ≤ 1 - 1 / D_∞(p ‖ q).
When D = maxDiv p q = ⨆ x, p(x)/q(x) is finite, we have q(x) ≥ p(x)/D pointwise,
so ∑ min(p(x), q(x)) ≥ 1/D, giving TV(p,q) = 1 - ∑ min(p(x),q(x)) ≤ 1 - 1/D.
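The argument above can be packaged as the statement `etvDist_le_of_maxDiv` mentioned below; a statement-only sketch, assuming an `etvDist p q : ℝ≥0∞` total variation distance on PMFs:

```lean
theorem etvDist_le_of_maxDiv {α : Type*} (p q : PMF α) :
    etvDist p q ≤ 1 - (PMF.maxDiv p q)⁻¹ := by
  sorry -- from `q x ≥ p x / maxDiv p q` pointwise and `TV = 1 - ∑' x, min (p x) (q x)`
```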
Finite-order Rényi divergence bounds TV distance (squared form):
TV(p, q)² ≤ 1 - R_a(p ‖ q)⁻¹.
Proof sketch (not yet formalized):
- TV ≤ √(1 - BC²), where BC = ∑ √(p(x) q(x)) is the Bhattacharyya coefficient
  (the Hellinger–TV inequality, via Cauchy–Schwarz on |√p - √q| · |√p + √q|).
- BC = M_{1/2}(p ‖ q), the Rényi MGF at order 1/2.
- By log-convexity of α ↦ M_α (interpolating at 1/2, 1, α): BC ≥ R_α^{-1/2}, so BC² ≥ R_α⁻¹, giving TV² ≤ 1 - R_α⁻¹.
Formalizing this requires Hellinger distance and log-convexity of the Rényi MGF, neither of which is currently available.
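The log-convexity step can be made explicit. Write 1 as a convex combination of 1/2 and α: 1 = t · (1/2) + (1 - t) · α with t = (α - 1)/(α - 1/2). Since M_1 = 1, log-convexity of α ↦ M_α gives:

```latex
1 = M_1 \le M_{1/2}^{\,t} \, M_\alpha^{\,1-t}
\;\Longrightarrow\;
M_{1/2} \ge M_\alpha^{-(1-t)/t} = M_\alpha^{-1/(2(\alpha-1))} = R_\alpha^{-1/2},
```

using (1 - t)/t = (1/2)/(α - 1). Squaring gives BC² ≥ R_α⁻¹ as claimed.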
Compare etvDist_le_of_maxDiv (the analogous linear bound for max-divergence).