Documentation

Mathlib.Probability.StrongLaw

The strong law of large numbers #

We prove the strong law of large numbers, in ProbabilityTheory.strong_law_ae: If X n is a sequence of independent identically distributed integrable random variables, then ∑ i ∈ range n, X i / n converges almost surely to 𝔼[X 0]. We give here the strong version, due to Etemadi, that only requires pairwise independence.

This file also contains the Lᵖ version of the strong law of large numbers provided by ProbabilityTheory.strong_law_Lp which shows ∑ i ∈ range n, X i / n converges in Lᵖ to 𝔼[X 0] provided X n is independent identically distributed and is Lᵖ.

Implementation #

The main point is to prove the result for real-valued random variables, as the general case of Banach-space valued random variables follows from this case and approximation by simple functions. The real version is given in ProbabilityTheory.strong_law_ae_real.

We follow the proof by Etemadi [Etemadi, An elementary proof of the strong law of large numbers][etemadi_strong_law], which goes as follows.

It suffices to prove the result for nonnegative X, as one can prove the general result by splitting a general X into its positive part and negative part. Consider Xₙ a sequence of nonnegative integrable identically distributed pairwise independent random variables. Let Yₙ be the truncation of Xₙ up to n. We claim that

Almost surely, Xₙ = Yₙ for all but finitely many indices. Indeed, ∑ ℙ (Xₙ ≠ Yₙ) is bounded by 1 + 𝔼[X] (see sum_prob_mem_Ioc_le and tsum_prob_mem_Ioi_lt_top).
Let c > 1. Along the sequence n = c ^ k, then (∑_{i=0}^{n-1} Yᵢ - 𝔼[Yᵢ])/n converges almost surely to 0. This follows from a variance control, as

  ∑_k ℙ (|∑_{i=0}^{c^k - 1} Yᵢ - 𝔼[Yᵢ]| > c^k ε)
    ≤ ∑_k (c^k ε)^{-2} ∑_{i=0}^{c^k - 1} Var[Yᵢ]    (by Markov inequality)
    ≤ ∑_i (C/i^2) Var[Yᵢ]                           (as ∑_{c^k > i} 1/(c^k)^2 ≤ C/i^2)
    ≤ ∑_i (C/i^2) 𝔼[Yᵢ^2]
    ≤ 2C 𝔼[X^2]                                     (see `sum_variance_truncation_le`)

As 𝔼[Yᵢ] converges to 𝔼[X], it follows from the two previous items and Cesàro that, along the sequence n = c^k, one has (∑_{i=0}^{n-1} Xᵢ) / n → 𝔼[X] almost surely.
To generalize it to all indices, we use the fact that ∑_{i=0}^{n-1} Xᵢ is nondecreasing and that, if c is close enough to 1, the gap between c^k and c^(k+1) is small.

Prerequisites on truncations #

def ProbabilityTheory.truncation {α : Type u_1} (f : α → ℝ) (A : ℝ) :

α → ℝ

Truncating a real-valued function to the interval (-A, A].

Equations

ProbabilityTheory.truncation f A = (Set.Ioc (-A) A).indicator id ∘ f

Instances For

theorem MeasureTheory.AEStronglyMeasurable.truncation {α : Type u_1} {m : MeasurableSpace α} {μ : Measure α} {f : α → ℝ} (hf : AEStronglyMeasurable f μ) {A : ℝ} :

AEStronglyMeasurable (ProbabilityTheory.truncation f A) μ

theorem ProbabilityTheory.abs_truncation_le_bound {α : Type u_1} (f : α → ℝ) (A : ℝ) (x : α) :

|truncation f A x| ≤ |A|

@[simp]

theorem ProbabilityTheory.truncation_zero {α : Type u_1} (f : α → ℝ) :

truncation f 0 = 0

theorem ProbabilityTheory.abs_truncation_le_abs_self {α : Type u_1} (f : α → ℝ) (A : ℝ) (x : α) :

|truncation f A x| ≤ |f x|

theorem ProbabilityTheory.truncation_eq_self {α : Type u_1} {f : α → ℝ} {A : ℝ} {x : α} (h : |f x| < A) :

truncation f A x = f x

theorem ProbabilityTheory.truncation_eq_of_nonneg {α : Type u_1} {f : α → ℝ} {A : ℝ} (h : ∀ (x : α), 0 ≤ f x) :

truncation f A = (Set.Ioc 0 A).indicator id ∘ f

theorem ProbabilityTheory.truncation_nonneg {α : Type u_1} {f : α → ℝ} (A : ℝ) {x : α} (h : 0 ≤ f x) :

0 ≤ truncation f A x

theorem MeasureTheory.AEStronglyMeasurable.memLp_truncation {α : Type u_1} {m : MeasurableSpace α} {μ : Measure α} {f : α → ℝ} [IsFiniteMeasure μ] (hf : AEStronglyMeasurable f μ) {A : ℝ} {p : ENNReal} :

MemLp (ProbabilityTheory.truncation f A) p μ

theorem MeasureTheory.AEStronglyMeasurable.integrable_truncation {α : Type u_1} {m : MeasurableSpace α} {μ : Measure α} {f : α → ℝ} [IsFiniteMeasure μ] (hf : AEStronglyMeasurable f μ) {A : ℝ} :

Integrable (ProbabilityTheory.truncation f A) μ

theorem ProbabilityTheory.moment_truncation_eq_intervalIntegral {α : Type u_1} {m : MeasurableSpace α} {μ : MeasureTheory.Measure α} {f : α → ℝ} (hf : MeasureTheory.AEStronglyMeasurable f μ) {A : ℝ} (hA : 0 ≤ A) {n : ℕ} (hn : n ≠ 0) :

∫ (x : α), truncation f A x ^ n ∂μ = ∫ (y : ℝ) in -A..A, y ^ n ∂MeasureTheory.Measure.map f μ

theorem ProbabilityTheory.moment_truncation_eq_intervalIntegral_of_nonneg {α : Type u_1} {m : MeasurableSpace α} {μ : MeasureTheory.Measure α} {f : α → ℝ} (hf : MeasureTheory.AEStronglyMeasurable f μ) {A : ℝ} {n : ℕ} (hn : n ≠ 0) (h'f : 0 ≤ f) :

∫ (x : α), truncation f A x ^ n ∂μ = ∫ (y : ℝ) in 0..A, y ^ n ∂MeasureTheory.Measure.map f μ

theorem ProbabilityTheory.integral_truncation_eq_intervalIntegral {α : Type u_1} {m : MeasurableSpace α} {μ : MeasureTheory.Measure α} {f : α → ℝ} (hf : MeasureTheory.AEStronglyMeasurable f μ) {A : ℝ} (hA : 0 ≤ A) :

∫ (x : α), truncation f A x ∂μ = ∫ (y : ℝ) in -A..A, y ∂MeasureTheory.Measure.map f μ

theorem ProbabilityTheory.integral_truncation_eq_intervalIntegral_of_nonneg {α : Type u_1} {m : MeasurableSpace α} {μ : MeasureTheory.Measure α} {f : α → ℝ} (hf : MeasureTheory.AEStronglyMeasurable f μ) {A : ℝ} (h'f : 0 ≤ f) :

∫ (x : α), truncation f A x ∂μ = ∫ (y : ℝ) in 0..A, y ∂MeasureTheory.Measure.map f μ

theorem ProbabilityTheory.integral_truncation_le_integral_of_nonneg {α : Type u_1} {m : MeasurableSpace α} {μ : MeasureTheory.Measure α} {f : α → ℝ} (hf : MeasureTheory.Integrable f μ) (h'f : 0 ≤ f) {A : ℝ} :

∫ (x : α), truncation f A x ∂μ ≤ ∫ (x : α), f x ∂μ

theorem ProbabilityTheory.tendsto_integral_truncation {α : Type u_1} {m : MeasurableSpace α} {μ : MeasureTheory.Measure α} {f : α → ℝ} (hf : MeasureTheory.Integrable f μ) :

Filter.Tendsto (fun (A : ℝ) => ∫ (x : α), truncation f A x ∂μ) Filter.atTop (nhds (∫ (x : α), f x ∂μ))

If a function is integrable, then the integral of its truncated versions converges to the integral of the whole function.

theorem ProbabilityTheory.IdentDistrib.truncation {α : Type u_1} {m : MeasurableSpace α} {μ : MeasureTheory.Measure α} {β : Type u_2} [MeasurableSpace β] {ν : MeasureTheory.Measure β} {f : α → ℝ} {g : β → ℝ} (h : IdentDistrib f g μ ν) {A : ℝ} :

IdentDistrib (ProbabilityTheory.truncation f A) (ProbabilityTheory.truncation g A) μ ν

theorem ProbabilityTheory.sum_prob_mem_Ioc_le {Ω : Type u_1} [MeasureTheory.MeasureSpace Ω] [MeasureTheory.IsProbabilityMeasure MeasureTheory.volume] {X : Ω → ℝ} (hint : MeasureTheory.Integrable X MeasureTheory.volume) (hnonneg : 0 ≤ X) {K N : ℕ} (hKN : K ≤ N) :

∑ j ∈ Finset.range K, MeasureTheory.volume {ω : Ω | X ω ∈ Set.Ioc ↑j ↑N} ≤ ENNReal.ofReal ((∫ (a : Ω), X a) + 1)

theorem ProbabilityTheory.tsum_prob_mem_Ioi_lt_top {Ω : Type u_1} [MeasureTheory.MeasureSpace Ω] [MeasureTheory.IsProbabilityMeasure MeasureTheory.volume] {X : Ω → ℝ} (hint : MeasureTheory.Integrable X MeasureTheory.volume) (hnonneg : 0 ≤ X) :

∑' (j : ℕ), MeasureTheory.volume {ω : Ω | X ω ∈ Set.Ioi ↑j} < ⊤

theorem ProbabilityTheory.sum_variance_truncation_le {Ω : Type u_1} [MeasureTheory.MeasureSpace Ω] [MeasureTheory.IsProbabilityMeasure MeasureTheory.volume] {X : Ω → ℝ} (hint : MeasureTheory.Integrable X MeasureTheory.volume) (hnonneg : 0 ≤ X) (K : ℕ) :

∑ j ∈ Finset.range K, (↑j ^ 2)⁻¹ * ∫ (a : Ω), (truncation X ↑j ^ 2) a ≤ 2 * ∫ (a : Ω), X a

Proof of the strong law of large numbers (almost sure version, assuming only pairwise independence) for nonnegative random variables, following Etemadi's proof.

theorem ProbabilityTheory.strong_law_aux1 {Ω : Type u_1} [MeasureTheory.MeasureSpace Ω] [MeasureTheory.IsProbabilityMeasure MeasureTheory.volume] (X : ℕ → Ω → ℝ) (hint : MeasureTheory.Integrable (X 0) MeasureTheory.volume) (hindep : Pairwise (Function.onFun (fun (f g : Ω → ℝ) => IndepFun f g MeasureTheory.volume) X)) (hident : ∀ (i : ℕ), IdentDistrib (X i) (X 0) MeasureTheory.volume MeasureTheory.volume) (hnonneg : ∀ (i : ℕ) (ω : Ω), 0 ≤ X i ω) {c : ℝ} (c_one : 1 < c) {ε : ℝ} (εpos : 0 < ε) :

∀ᵐ (ω : Ω), ∀ᶠ (n : ℕ) in Filter.atTop , |∑ i ∈ Finset.range ⌊c ^ n⌋₊, truncation (X i) (↑i) ω - ∫ (a : Ω), (∑ i ∈ Finset.range ⌊c ^ n⌋₊, truncation (X i) ↑i) a| < ε * ↑⌊c ^ n⌋₊

The truncation of Xᵢ up to i satisfies the strong law of large numbers (with respect to the truncated expectation) along the sequence c^n, for any c > 1, up to a given ε > 0. This follows from a variance control.

theorem ProbabilityTheory.strong_law_aux2 {Ω : Type u_1} [MeasureTheory.MeasureSpace Ω] [MeasureTheory.IsProbabilityMeasure MeasureTheory.volume] (X : ℕ → Ω → ℝ) (hint : MeasureTheory.Integrable (X 0) MeasureTheory.volume) (hindep : Pairwise (Function.onFun (fun (f g : Ω → ℝ) => IndepFun f g MeasureTheory.volume) X)) (hident : ∀ (i : ℕ), IdentDistrib (X i) (X 0) MeasureTheory.volume MeasureTheory.volume) (hnonneg : ∀ (i : ℕ) (ω : Ω), 0 ≤ X i ω) {c : ℝ} (c_one : 1 < c) :

∀ᵐ (ω : Ω), (fun (n : ℕ) => ∑ i ∈ Finset.range ⌊c ^ n⌋₊, truncation (X i) (↑i) ω - ∫ (a : Ω), (∑ i ∈ Finset.range ⌊c ^ n⌋₊, truncation (X i) ↑i) a) =o[Filter.atTop ] fun (n : ℕ) => ↑⌊c ^ n⌋₊

The truncation of Xᵢ up to i satisfies the strong law of large numbers (with respect to the truncated expectation) along the sequence c^n, for any c > 1. This follows from strong_law_aux1 by varying ε.

theorem ProbabilityTheory.strong_law_aux3 {Ω : Type u_1} [MeasureTheory.MeasureSpace Ω] [MeasureTheory.IsProbabilityMeasure MeasureTheory.volume] (X : ℕ → Ω → ℝ) (hint : MeasureTheory.Integrable (X 0) MeasureTheory.volume) (hident : ∀ (i : ℕ), IdentDistrib (X i) (X 0) MeasureTheory.volume MeasureTheory.volume) :

(fun (n : ℕ) => (∫ (a : Ω), (∑ i ∈ Finset.range n, truncation (X i) ↑i) a) - ↑n * ∫ (a : Ω), X 0 a) =o[Filter.atTop ] Nat.cast

The expectation of the truncated version of Xᵢ behaves asymptotically like the whole expectation. This follows from convergence and Cesàro averaging.

theorem ProbabilityTheory.strong_law_aux4 {Ω : Type u_1} [MeasureTheory.MeasureSpace Ω] [MeasureTheory.IsProbabilityMeasure MeasureTheory.volume] (X : ℕ → Ω → ℝ) (hint : MeasureTheory.Integrable (X 0) MeasureTheory.volume) (hindep : Pairwise (Function.onFun (fun (f g : Ω → ℝ) => IndepFun f g MeasureTheory.volume) X)) (hident : ∀ (i : ℕ), IdentDistrib (X i) (X 0) MeasureTheory.volume MeasureTheory.volume) (hnonneg : ∀ (i : ℕ) (ω : Ω), 0 ≤ X i ω) {c : ℝ} (c_one : 1 < c) :

∀ᵐ (ω : Ω), (fun (n : ℕ) => ∑ i ∈ Finset.range ⌊c ^ n⌋₊, truncation (X i) (↑i) ω - ↑⌊c ^ n⌋₊ * ∫ (a : Ω), X 0 a) =o[Filter.atTop ] fun (n : ℕ) => ↑⌊c ^ n⌋₊

The truncation of Xᵢ up to i satisfies the strong law of large numbers (with respect to the original expectation) along the sequence c^n, for any c > 1. This follows from the version from the truncated expectation, and the fact that the truncated and the original expectations have the same asymptotic behavior.

theorem ProbabilityTheory.strong_law_aux5 {Ω : Type u_1} [MeasureTheory.MeasureSpace Ω] [MeasureTheory.IsProbabilityMeasure MeasureTheory.volume] (X : ℕ → Ω → ℝ) (hint : MeasureTheory.Integrable (X 0) MeasureTheory.volume) (hident : ∀ (i : ℕ), IdentDistrib (X i) (X 0) MeasureTheory.volume MeasureTheory.volume) (hnonneg : ∀ (i : ℕ) (ω : Ω), 0 ≤ X i ω) :

∀ᵐ (ω : Ω), (fun (n : ℕ) => ∑ i ∈ Finset.range n, truncation (X i) (↑i) ω - ∑ i ∈ Finset.range n, X i ω) =o[Filter.atTop ] fun (n : ℕ) => ↑n

The truncated and non-truncated versions of Xᵢ have the same asymptotic behavior, as they almost surely coincide at all but finitely many steps. This follows from a probability computation and Borel-Cantelli.

theorem ProbabilityTheory.strong_law_aux6 {Ω : Type u_1} [MeasureTheory.MeasureSpace Ω] [MeasureTheory.IsProbabilityMeasure MeasureTheory.volume] (X : ℕ → Ω → ℝ) (hint : MeasureTheory.Integrable (X 0) MeasureTheory.volume) (hindep : Pairwise (Function.onFun (fun (f g : Ω → ℝ) => IndepFun f g MeasureTheory.volume) X)) (hident : ∀ (i : ℕ), IdentDistrib (X i) (X 0) MeasureTheory.volume MeasureTheory.volume) (hnonneg : ∀ (i : ℕ) (ω : Ω), 0 ≤ X i ω) {c : ℝ} (c_one : 1 < c) :

∀ᵐ (ω : Ω), Filter.Tendsto (fun (n : ℕ) => (∑ i ∈ Finset.range ⌊c ^ n⌋₊, X i ω) / ↑⌊c ^ n⌋₊) Filter.atTop (nhds (∫ (a : Ω), X 0 a))

Xᵢ satisfies the strong law of large numbers along the sequence c^n, for any c > 1. This follows from the version for the truncated Xᵢ, and the fact that Xᵢ and its truncated version have the same asymptotic behavior.

theorem ProbabilityTheory.strong_law_aux7 {Ω : Type u_1} [MeasureTheory.MeasureSpace Ω] [MeasureTheory.IsProbabilityMeasure MeasureTheory.volume] (X : ℕ → Ω → ℝ) (hint : MeasureTheory.Integrable (X 0) MeasureTheory.volume) (hindep : Pairwise (Function.onFun (fun (f g : Ω → ℝ) => IndepFun f g MeasureTheory.volume) X)) (hident : ∀ (i : ℕ), IdentDistrib (X i) (X 0) MeasureTheory.volume MeasureTheory.volume) (hnonneg : ∀ (i : ℕ) (ω : Ω), 0 ≤ X i ω) :

∀ᵐ (ω : Ω), Filter.Tendsto (fun (n : ℕ) => (∑ i ∈ Finset.range n, X i ω) / ↑n) Filter.atTop (nhds (∫ (a : Ω), X 0 a))

Xᵢ satisfies the strong law of large numbers along all integers. This follows from the corresponding fact along the sequences c^n, and the fact that any integer can be sandwiched between c^n and c^(n+1) with comparably small error if c is close enough to 1 (which is formalized in tendsto_div_of_monotone_of_tendsto_div_floor_pow).

theorem ProbabilityTheory.strong_law_ae_real {Ω : Type u_2} {m : MeasurableSpace Ω} {μ : MeasureTheory.Measure Ω} (X : ℕ → Ω → ℝ) (hint : MeasureTheory.Integrable (X 0) μ) (hindep : Pairwise (Function.onFun (fun (x1 x2 : Ω → ℝ) => IndepFun x1 x2 μ) X)) (hident : ∀ (i : ℕ), IdentDistrib (X i) (X 0) μ μ) :

∀ᵐ (ω : Ω) ∂μ, Filter.Tendsto (fun (n : ℕ) => (∑ i ∈ Finset.range n, X i ω) / ↑n) Filter.atTop (nhds (∫ (x : Ω), X 0 x ∂μ))

Strong law of large numbers, almost sure version: if X n is a sequence of independent identically distributed integrable real-valued random variables, then ∑ i ∈ range n, X i / n converges almost surely to 𝔼[X 0]. We give here the strong version, due to Etemadi, that only requires pairwise independence. Superseded by strong_law_ae, which works for random variables taking values in any Banach space.

theorem ProbabilityTheory.strong_law_ae_simpleFunc_comp {Ω : Type u_1} {mΩ : MeasurableSpace Ω} {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {E : Type u_2} [NormedAddCommGroup E] [NormedSpace ℝ E] [CompleteSpace E] [MeasurableSpace E] (X : ℕ → Ω → E) (h' : Measurable (X 0)) (hindep : Pairwise (Function.onFun (fun (x1 x2 : Ω → E) => IndepFun x1 x2 μ) X)) (hident : ∀ (i : ℕ), IdentDistrib (X i) (X 0) μ μ) (φ : MeasureTheory.SimpleFunc E E) :

∀ᵐ (ω : Ω) ∂μ, Filter.Tendsto (fun (n : ℕ) => (↑n)⁻¹ • ∑ i ∈ Finset.range n, φ (X i ω)) Filter.atTop (nhds (∫ (x : Ω), (⇑φ ∘ X 0) x ∂μ))

Preliminary lemma for the strong law of large numbers for vector-valued random variables: the composition of the random variables with a simple function satisfies the strong law of large numbers.

theorem ProbabilityTheory.strong_law_ae_of_measurable {Ω : Type u_1} {mΩ : MeasurableSpace Ω} {μ : MeasureTheory.Measure Ω} [MeasureTheory.IsProbabilityMeasure μ] {E : Type u_2} [NormedAddCommGroup E] [NormedSpace ℝ E] [CompleteSpace E] [MeasurableSpace E] [BorelSpace E] (X : ℕ → Ω → E) (hint : MeasureTheory.Integrable (X 0) μ) (h' : MeasureTheory.StronglyMeasurable (X 0)) (hindep : Pairwise (Function.onFun (fun (x1 x2 : Ω → E) => IndepFun x1 x2 μ) X)) (hident : ∀ (i : ℕ), IdentDistrib (X i) (X 0) μ μ) :

∀ᵐ (ω : Ω) ∂μ, Filter.Tendsto (fun (n : ℕ) => (↑n)⁻¹ • ∑ i ∈ Finset.range n, X i ω) Filter.atTop (nhds (∫ (x : Ω), X 0 x ∂μ))

Preliminary lemma for the strong law of large numbers for vector-valued random variables, assuming measurability in addition to integrability. This is weakened to ae measurability in the full version ProbabilityTheory.strong_law_ae.

theorem ProbabilityTheory.strong_law_ae {Ω : Type u_1} {mΩ : MeasurableSpace Ω} {μ : MeasureTheory.Measure Ω} {E : Type u_2} [NormedAddCommGroup E] [NormedSpace ℝ E] [CompleteSpace E] [MeasurableSpace E] [BorelSpace E] (X : ℕ → Ω → E) (hint : MeasureTheory.Integrable (X 0) μ) (hindep : Pairwise (Function.onFun (fun (x1 x2 : Ω → E) => IndepFun x1 x2 μ) X)) (hident : ∀ (i : ℕ), IdentDistrib (X i) (X 0) μ μ) :

∀ᵐ (ω : Ω) ∂μ, Filter.Tendsto (fun (n : ℕ) => (↑n)⁻¹ • ∑ i ∈ Finset.range n, X i ω) Filter.atTop (nhds (∫ (x : Ω), X 0 x ∂μ))

Strong law of large numbers, almost sure version: if X n is a sequence of independent identically distributed integrable random variables taking values in a Banach space, then n⁻¹ • ∑ i ∈ range n, X i converges almost surely to 𝔼[X 0]. We give here the strong version, due to Etemadi, that only requires pairwise independence.

theorem ProbabilityTheory.strong_law_Lp {Ω : Type u_1} {mΩ : MeasurableSpace Ω} {μ : MeasureTheory.Measure Ω} {E : Type u_2} [NormedAddCommGroup E] [NormedSpace ℝ E] [CompleteSpace E] [MeasurableSpace E] [BorelSpace E] {p : ENNReal} (hp : 1 ≤ p) (hp' : p ≠ ⊤) (X : ℕ → Ω → E) (hℒp : MeasureTheory.MemLp (X 0) p μ) (hindep : Pairwise (Function.onFun (fun (x1 x2 : Ω → E) => IndepFun x1 x2 μ) X)) (hident : ∀ (i : ℕ), IdentDistrib (X i) (X 0) μ μ) :

Filter.Tendsto (fun (n : ℕ) => MeasureTheory.eLpNorm (fun (ω : Ω) => (↑n)⁻¹ • ∑ i ∈ Finset.range n, X i ω - ∫ (x : Ω), X 0 x ∂μ) p μ) Filter.atTop (nhds 0)

Strong law of large numbers, Lᵖ version: if X n is a sequence of independent identically distributed random variables in Lᵖ, then n⁻¹ • ∑ i ∈ range n, X i converges in Lᵖ to 𝔼[X 0].