The optimal transport problem and Wasserstein spaces
The optimal mass transfer problem was first introduced in 1781 by Gaspard Monge in his “Mémoire sur la Théorie des Déblais et des Remblais”. The seminal problem studied by Monge can be formulated as follows in modern mathematical terms: given two probability measures µ, ν ∈ Pc(Rd), find a Borel map T : Rd → Rd such that T#µ = ν and
\[
\int_{\mathbb{R}^d} c(x, T(x)) \, \mathrm{d}\mu(x) = \min_{T'} \left\{ \int_{\mathbb{R}^d} c(x, T'(x)) \, \mathrm{d}\mu(x) ~~\text{s.t.}~~ T'_{\#}\mu = \nu \right\},
\]
where c : R2d → (−∞, +∞] is a suitable cost function. As mentioned in the introduction, this problem is extremely difficult mathematically and remained unsolved for a long time. In particular, the image-measure constraint T#µ = ν is badly behaved, in the sense that it does not define a closed or convex set in any reasonable functional topology. In 1942, the Russian mathematician Leonid Kantorovich introduced a relaxation of this constraint expressed in terms of transport plans.
Definition 1.3 (Transport plan). Let µ, ν ∈ P(Rd). We say that γ ∈ P(R2d) is a transport plan between µ and ν, denoted by γ ∈ Γ(µ, ν), provided that γ(A × Rd) = µ(A) and γ(Rd × B) = ν(B) for any pair of Borel sets A, B ⊂ Rd. This property can be equivalently formulated in terms of pushforwards as π^1_#γ = µ and π^2_#γ = ν, where π^1, π^2 : R2d → Rd denote the projections onto the first and second factors.
Transport plans are much more convenient than transport maps for studying mass transportation problems. Indeed, the set Γ(µ, ν) is never empty, since it always contains the product measure µ × ν, and it is closed in the narrow topology of measures. The optimal mass transportation problem can then be formulated as follows: given two probability measures µ, ν ∈ P(Rd) and a cost function c : R2d → R, one searches for a transport plan γ ∈ Γ(µ, ν) such that
\[
\int_{\mathbb{R}^{2d}} c(x, y) \, \mathrm{d}\gamma(x, y) = \min_{\gamma'} \left\{ \int_{\mathbb{R}^{2d}} c(x, y) \, \mathrm{d}\gamma'(x, y) ~~\text{s.t.}~~ \gamma' \in \Gamma(\mu, \nu) \right\}.
\]
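The gain provided by the relaxation can be seen on an elementary configuration. The following worked example is an illustration added here and is not taken from the source; it assumes d = 1, a Dirac mass as source measure and a generic cost c.

```latex
% Illustrative example: a Dirac mass cannot be split by a map, but it can be split by a plan.
\[
\mu = \delta_0, \qquad \nu = \tfrac{1}{2}\left(\delta_{-1} + \delta_{1}\right).
\]
% Any Borel map T pushes \mu onto a single Dirac mass, so the Monge constraint is infeasible:
\[
T_{\#}\mu = \delta_{T(0)} \neq \nu \quad \text{for every Borel map } T : \mathbb{R} \to \mathbb{R}.
\]
% The product measure is nevertheless an admissible plan, and in this case the only one:
\[
\gamma = \mu \times \nu = \tfrac{1}{2}\left(\delta_{(0,-1)} + \delta_{(0,1)}\right) \in \Gamma(\mu, \nu),
\qquad
\int_{\mathbb{R}^{2}} c(x, y) \, \mathrm{d}\gamma(x, y) = \tfrac{1}{2}\bigl(c(0,-1) + c(0,1)\bigr).
\]
```

In this configuration the set of admissible maps is empty, whereas the relaxed problem has a (trivial) solution.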
The relaxed optimal transport problem has been extensively studied in very broad contexts (see e.g. [17, 125, 132]) with high levels of generality on the underlying spaces and cost functions. In the particular case where c(x, y) = |x − y|^p for some real number p ≥ 1, the optimal transport problem can be used to define a distance over the subset Pp(Rd) ⊂ P(Rd).
Definition 1.4 (Wasserstein distance and Wasserstein spaces). Given two probability measures µ, ν ∈ Pp(Rd), the p-Wasserstein distance between µ and ν is defined by
\[
W_p(\mu, \nu) = \min_{\gamma} \left\{ \left( \int_{\mathbb{R}^{2d}} |x - y|^p \, \mathrm{d}\gamma(x, y) \right)^{1/p} ~~\text{s.t.}~~ \gamma \in \Gamma(\mu, \nu) \right\}.
\]
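On the real line, Definition 1.4 can be evaluated by hand for empirical measures. The sketch below is an illustration that is not part of the source. It assumes d = 1 and two uniform empirical measures with the same number n of atoms, a setting in which an optimal plan is known to be induced by matching the sorted atoms. The helper names `wasserstein_sorted` and `wasserstein_bruteforce` are hypothetical; the second one enumerates all one-to-one matchings as a sanity check, which suffices here because an optimal plan between two uniform measures with n atoms can be chosen among permutations.

```python
from itertools import permutations


def wasserstein_sorted(xs, ys, p=2):
    """W_p between two uniform empirical measures on the real line with the
    same number of atoms, computed with the monotone (sorted) matching."""
    assert len(xs) == len(ys)
    n = len(xs)
    cost = sum(abs(x - y) ** p for x, y in zip(sorted(xs), sorted(ys))) / n
    return cost ** (1.0 / p)


def wasserstein_bruteforce(xs, ys, p=2):
    """Same quantity, obtained by enumerating every one-to-one matching.
    Only usable for a handful of atoms (n! matchings)."""
    assert len(xs) == len(ys)
    n = len(xs)
    best = min(
        sum(abs(x - ys[j]) ** p for x, j in zip(xs, perm))
        for perm in permutations(range(n))
    )
    return (best / n) ** (1.0 / p)


# Hypothetical atoms, chosen only for illustration.
xs = [0.0, 1.0, 3.0, 4.5]
ys = [2.0, -1.0, 5.0, 0.5]

for p in (1, 2, 3):
    print(p, wasserstein_sorted(xs, ys, p), wasserstein_bruteforce(xs, ys, p))
```

Both functions should agree up to rounding for every p ≥ 1; the sorted version costs O(n log n) operations, whereas the enumeration is only meant as a check on very small examples.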
The set of plans γ ∈ Γ(µ, ν) achieving the minimum in Definition 1.4 is denoted by Γ_o(µ, ν) and referred to as the set of optimal transport plans between µ and ν. The space (Pp(Rd), Wp) of probability measures with finite p-th moment endowed with the p-Wasserstein metric is called the Wasserstein space of order p. We recall some of the interesting properties of these spaces in the following proposition (see e.g. [17, Chapter 7] or [132, Chapter 6]).
Proposition 1.3 (Elementary properties of the Wasserstein spaces). The Wasserstein spaces (Pp(Rd), Wp) are separable geodesic spaces. The topology generated by the p-Wasserstein metric metrizes the weak-∗ topology of probability measures induced by the narrow convergence (1.2). More precisely, it holds that
\[
W_p(\mu, \mu_n) \underset{n \to +\infty}{\longrightarrow} 0
\quad \text{if and only if} \quad
\mu_n \underset{n \to +\infty}{\rightharpoonup^*} \mu
~~\text{and}~~
\int_{\mathbb{R}^d} |x|^p \, \mathrm{d}\mu_n(x) \underset{n \to +\infty}{\longrightarrow} \int_{\mathbb{R}^d} |x|^p \, \mathrm{d}\mu(x).
\]
Given µ, ν ∈ P(Rd), the Wasserstein distances are ordered, i.e. W_{p1}(µ, ν) ≤ W_{p2}(µ, ν) whenever p1 ≤ p2. Moreover, when p = 1, the following Kantorovich-Rubinstein duality formula holds
\[
W_1(\mu, \nu) = \sup_{\phi} \left\{ \int_{\mathbb{R}^d} \phi(x) \, \mathrm{d}(\mu - \nu)(x) ~~\text{s.t.}~~ \mathrm{Lip}(\phi; \mathbb{R}^d) \leq 1 \right\}, \tag{1.4}
\]
where Lip(φ; Rd) denotes the Lipschitz constant of the map φ(·) over Rd.
In what follows, we shall mainly restrict our considerations to the Wasserstein spaces of order 1 and 2 built over Pc(Rd). In the particular case of the square Wasserstein distance, optimal transport plans enjoy a fairly explicit characterization that we recall in the following theorem, which is a consequence of [17, Theorem 6.1.5].
Theorem 1.3 (Characterization of optimal plans for the square Wasserstein distance). Let µ, ν ∈ P2(Rd) and γ ∈ Γ(µ, ν). Then, γ is a W2-optimal transport plan between µ and ν if and only if there exists a convex function ψ : Rd → (−∞, +∞] such that y ∈ ∂ψ(x) for γ-almost every (x, y) ∈ R2d.
This result is based on the necessary and sufficient optimality condition for the optimal transport problem, which states that γ is concentrated on a |·|^2-monotone set described in terms of Kantorovich potentials. From this general result, we can deduce the following easy corollary, which will prove useful in studying the geometric structure of (P2(Rd), W2).
Corollary 1.1 (Perturbation of the identity by gradients of test functions). For any µ ∈ P2(Rd) and ξ ∈ C_c^∞(Rd), there exists s̄ > 0 such that the map Id + s∇ξ(·) is a W2-optimal transport map between µ and (Id + s∇ξ)#µ for any s ∈ (−s̄, s̄).
Proof. For any s ∈ R, the map Id + s∇ξ(·) is the gradient of the function x ∈ Rd ↦ (1/2)|x|^2 + sξ(x). This function is smooth, and its Hessian Id + s∇^2ξ(x) is positive semidefinite for every x ∈ Rd as soon as |s| sup |∇^2ξ| ≤ 1. It is therefore convex for |s| smaller than a given s̄ > 0. Hence, (Id, Id + s∇ξ)#µ is concentrated on the subdifferential of a convex function and is therefore W2-optimal as a consequence of Theorem 1.3.
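The smallness condition on s in Corollary 1.1 can be observed numerically in dimension one, where the gradient of a convex function is simply a nondecreasing map. The sketch below is an illustration and not the author's construction: it assumes d = 1 and uses the Gaussian profile ξ(x) = exp(−x^2) as a stand-in for a compactly supported test function. The names `xi_second`, `transport_map` and `is_nondecreasing` are hypothetical.

```python
import math


def xi_second(x):
    # Second derivative of the illustrative profile xi(x) = exp(-x^2).
    return (4.0 * x * x - 2.0) * math.exp(-x * x)


def transport_map(x, s):
    # T_s = Id + s * xi', with xi'(x) = -2 x exp(-x^2).
    return x + s * (-2.0 * x * math.exp(-x * x))


def is_nondecreasing(s, grid):
    # In dimension one, T_s is the derivative of a convex function
    # exactly when it is nondecreasing; we test this on a grid.
    values = [transport_map(x, s) for x in grid]
    return all(b >= a for a, b in zip(values, values[1:]))


grid = [-5.0 + 0.01 * k for k in range(1001)]

# Threshold suggested by the Hessian bound |s| * sup|xi''| <= 1.
s_bar = 1.0 / max(abs(xi_second(x)) for x in grid)

for s in (-0.4, 0.4, 0.8):
    print(s, is_nondecreasing(s, grid))
```

For this profile the bound gives a threshold close to 1/2; values of s inside the interval yield a monotone map, while a value such as s = 0.8 makes the map decrease near the origin, so that it is no longer the derivative of a convex function.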
We end these introductory paragraphs by recalling the concept of barycenter in the context of optimal transport, and by providing an easy estimate for the 1-Wasserstein distance between barycenters.
Absolutely continuous curves in (P2(Rd), W2) and the analytical tangent space TanµP2(Rd)
In this section, we recall some structural results concerning continuity equations and absolutely continuous curves in Wasserstein spaces. We also link these concepts with the geometric structure of (P2(Rd), W2), and in particular with its tangent space TanµP2(Rd). We restrict our attention to the Wasserstein spaces of order 2 because, in the subsequent chapters, we will only deal with compactly supported measures. We can therefore choose to use the squared Wasserstein distance, which induces the metric structure best suited to our control-theoretic purposes. Most of the notions presented in this section are borrowed from [17, Chapter 8].
Continuity equations in P2(Rd) are hyperbolic equations in divergence form, written in general as
\[
\partial_t \mu(t) + \nabla \cdot \big( w(t, \cdot) \mu(t) \big) = 0, \tag{1.7}
\]
where (t, x) ∈ [0, T] × Rd ↦ w(t, x) ∈ Rd is a Borel vector field satisfying the integrability condition
\[
\int_0^T \int_{\mathbb{R}^d} |w(t, x)|^2 \, \mathrm{d}\mu(t)(x) \, \mathrm{d}t < +\infty.
\]
Equation (1.7) has to be understood in the sense of distributions against smooth and compactly supported test functions, i.e.
\[
\int_0^T \int_{\mathbb{R}^d} \Big( \partial_t \xi(t, x) + \langle \nabla_x \xi(t, x), w(t, x) \rangle \Big) \, \mathrm{d}\mu(t)(x) \, \mathrm{d}t = 0, \tag{1.8}
\]
for every test function ξ ∈ C_c^∞((0, T) × Rd).
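For empirical measures, the distributional formulation (1.8) reduces to a statement about particles: if each atom follows the characteristic equation x'(t) = w(t, x(t)), then the time derivative of the integral of ξ against µ(t) equals the integral of ⟨∇ξ, w(t, ·)⟩ against µ(t) for any time-independent test function ξ. The sketch below checks this identity by finite differences. It is an illustration under assumptions that are not in the source: d = 1, the hypothetical velocity field w(t, x) = −x whose flow is explicit, and ξ(x) = x^2, which can be cut off outside a large ball without changing the integrals since the measures are compactly supported.

```python
import math


def velocity(t, x):
    # Hypothetical velocity field w(t, x) = -x.
    return -x


def flow(x0, t):
    # Exact solution of x'(t) = -x(t) starting from x0 at time 0.
    return x0 * math.exp(-t)


def xi(x):
    # Illustrative test function.
    return x * x


def xi_gradient(x):
    return 2.0 * x


def integrate(f, points):
    # Integral of f against the uniform empirical measure on `points`.
    return sum(f(x) for x in points) / len(points)


initial_points = [-1.0, 0.2, 0.5, 1.3]
t, h = 0.7, 1e-5

# Left-hand side: centered finite difference of t -> integral of xi d(mu(t)).
lhs = (
    integrate(xi, [flow(x, t + h) for x in initial_points])
    - integrate(xi, [flow(x, t - h) for x in initial_points])
) / (2.0 * h)

# Right-hand side: integral of <grad xi, w(t, .)> d(mu(t)).
points_t = [flow(x, t) for x in initial_points]
rhs = integrate(lambda x: xi_gradient(x) * velocity(t, x), points_t)

print(lhs, rhs)
```

The two printed values should agree up to the discretization error of the finite difference.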
Classical subdifferentials and Wasserstein gradients
In Proposition 1.6, we have proven a general chain rule along curves of measures generated by 1-parameter families of smooth vector fields. In Chapter 3, we will also require such a chain rule for curves of measures generated by multi-dimensional families of vector fields G : RN × Rd → Rd. In this context, however, the subdifferentiability of the functional in the sense of Definition 1.7 is not sufficient to perform such an expansion. With this goal in mind, we introduce the stronger and simpler notion of classical Wasserstein subdifferential in the following definition.
Definition 1.9 (Classical Wasserstein subdifferentials and superdifferentials). Let µ ∈ D(φ). We say that a map ξ ∈ L2(Rd, Rd; µ) belongs to the classical subdifferential ∂^−φ(µ) of φ(·) at µ provided that
\[
\phi(\nu) - \phi(\mu) \geq \sup_{\gamma \in \Gamma_o(\mu, \nu)} \int_{\mathbb{R}^{2d}} \langle \xi(x), y - x \rangle \, \mathrm{d}\gamma(x, y) + o(W_2(\mu, \nu))
\]
for all ν ∈ P2(Rd). Similarly, we say that a map ξ ∈ L2(Rd, Rd; µ) belongs to the classical superdifferential ∂^+φ(µ) of φ(·) at µ if (−ξ) ∈ ∂^−(−φ)(µ).
It has been proven recently that the definition of classical Wasserstein subdifferential, which involves a supremum taken over the set of optimal transport plans, is equivalent to the usual one described in Definition 1.7, which involves an infimum. This allows for the elaboration of a convenient notion of differentiability in Wasserstein spaces, as detailed below.
Definition 1.10 (Differentiable functionals in (P2(Rd), W2)). A functional φ : P2(Rd) → R is said to be Wasserstein-differentiable at some µ ∈ D(φ) if ∂^−φ(µ) ∩ ∂^+φ(µ) ≠ ∅. In this case, there exists a unique element ∇_µφ(µ) ∈ ∂^−φ(µ) ∩ ∂^+φ(µ) ∩ TanµP2(Rd), called the Wasserstein gradient of φ(·) at µ, which satisfies
\[
\phi(\nu) - \phi(\mu) = \int_{\mathbb{R}^{2d}} \langle \nabla_\mu \phi(\mu)(x), y - x \rangle \, \mathrm{d}\gamma(x, y) + o(W_2(\mu, \nu)) \tag{1.16}
\]
for any ν ∈ P2(Rd) and any γ ∈ Γ_o(µ, ν).
Given a Wasserstein-differentiable functional φ(·), we can define its Lie derivative in a tangent direction ξ ∈ TanµP2(Rd) by
\[
\mathcal{L}_\xi \phi(\mu) = \frac{\mathrm{d}}{\mathrm{d}s} \phi\big( (\mathrm{Id} + s\xi)_{\#}\mu \big) \Big|_{s=0} = \int_{\mathbb{R}^d} \langle \nabla_\mu \phi(\mu)(x), \xi(x) \rangle \, \mathrm{d}\mu(x), \tag{1.17}
\]
which exists and is uniquely determined as a consequence of Proposition 1.5 and Definition 1.10. In the following proposition, we show that the Wasserstein gradient introduced in Definition 1.10 coincides, when it exists, with the barycenter of the minimal selection in the extended subdifferential defined in Theorem 1.5.
Proposition 1.7 (Wasserstein gradients are minimal selections). Let φ : P2(Rd) → (−∞, ∞] be differentiable at µ ∈ D(φ) in the sense of Definition 1.10. Then the minimal selection in the extended subdifferential exists and is induced by the Wasserstein gradient, i.e. ∂^◦φ(µ) = (Id × ∇_µφ(µ))#µ.
Proof. Since φ(·) is differentiable at µ ∈ D(φ), it holds that (Id × ∇_µφ(µ))#µ ∈ ∂φ(µ), so that the extended Fréchet subdifferential of φ(·) in the sense of Definition 1.7 is non-empty. Therefore by Theorem 1.5, there exists a minimal selection ∂^◦φ(µ).
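The Lie derivative formula (1.17) can be tested on a simple functional. The sketch below is an illustration which is not taken from the source: it assumes the potential energy φ(µ) = ∫ V dµ with V smooth, for which the Wasserstein gradient is classically given by ∇V, and it compares a finite-difference derivative of s ↦ φ((Id + sξ)#µ) at s = 0 with the integral of ⟨∇V, ξ⟩ against an empirical measure µ. The choices d = 1, V(x) = x^4/4, ξ(x) = sin(x) and all function names are hypothetical.

```python
import math


def potential(x):
    # Illustrative potential V(x) = x^4 / 4.
    return 0.25 * x ** 4


def potential_gradient(x):
    # Assumed Wasserstein gradient of the potential energy: grad V(x) = x^3.
    return x ** 3


def direction(x):
    # Illustrative perturbation direction xi(x) = sin(x).
    return math.sin(x)


def energy(points):
    # phi(mu) for the uniform empirical measure mu supported on `points`.
    return sum(potential(x) for x in points) / len(points)


def pushed_energy(points, s):
    # phi((Id + s * xi)_# mu): every atom x is moved to x + s * xi(x).
    return energy([x + s * direction(x) for x in points])


points = [-1.0, 0.2, 0.5, 1.3]
h = 1e-6

# Centered finite difference of s -> phi((Id + s xi)_# mu) at s = 0.
finite_difference = (pushed_energy(points, h) - pushed_energy(points, -h)) / (2.0 * h)

# Right-hand side of (1.17) with grad V in place of the Wasserstein gradient.
lie_derivative = sum(potential_gradient(x) * direction(x) for x in points) / len(points)

print(finite_difference, lie_derivative)
```

Both quantities should coincide up to the finite-difference error, which is consistent with reading (1.17) as a chain rule along the curve s ↦ (Id + sξ)#µ.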
Table of contents :
1 Analysis in the Space of Measures and Optimal Transport Theory
1.1 Elements of measure theory and Wasserstein spaces
1.1.1 Elementary notions of measure theory
1.1.2 The optimal transport problem and Wasserstein spaces
1.1.3 Absolutely continuous curves in (P2(Rd),W2) and the analytical tangent space TanμP2(Rd)
1.2 First and second order differential calculus in (P2(Rd),W2)
1.2.1 Subdifferential calculus in Wasserstein spaces
1.2.2 Classical subdifferentials and Wasserstein gradients
1.2.3 Second order calculus in Wasserstein spaces
1.3 The continuity equation with non-local velocities in Rd
1.4 Directional derivatives of non-local flows
2 The Pontryagin Maximum Principle in the Wasserstein Space
2.1 A Simpler Pontryagin Maximum Principle with no interaction field and no running cost
2.2 Proof of Theorem 2.1
Step 1 : Needle-like variations in the non-local case
Step 2 : First-order optimality condition
Step 3 : Backward dynamics and Pontryagin maximization condition
Step 4 : Proof of the Pontryagin Maximum Principle for (P)
2.3 Examples of functionals satisfying hypotheses (H) of Theorem 2.1
3 A Pontryagin Maximum Principle for Constrained Optimal Control Problems in Wasserstein Spaces
3.1 Non-smooth multiplier rule and differentiable extension of functions
3.2 Proof of Theorem 3.1
Step 1 : Packages of needle-like variations
Step 2 : First-order optimality condition
Step 3 : Backward dynamics and partial Pontryagin maximization condition
Step 4 : Limiting procedure
3.3 Wasserstein differential of the running constraint penalization
3.4 Examples of functionals satisfying hypotheses (H)
4 Intrinsic Lipschitz Regularity of Mean-Field Optimal Controls
4.1 Preliminary material
4.1.1 Mean-field adapted structures and discrete measures
4.1.2 Existence of mean-field optimal controls for problem (P)
4.1.3 Existence of locally optimal Lipschitz feedbacks in finite-dimensional optimal control problems
4.2 Proof of Theorem 4.1
Step 1 : Packages of needle-like variations
Step 2 : Construction of a Lipschitz-in-space optimal controls for (PN)
Step 3 : Existence of Lipschitz optimal controls for problem (P)
4.3 Discussions on the coercivity assumption (CON)
4.3.1 A generic sufficient condition for coercivity
4.3.2 Sharpness of the mean-field coercivity condition (CON) on a 1D
5 Convergence Analysis and Sparse Control of Weakly Cooperative Alignment Models
5.1 Convergence to consensus and flocking of randomly failed cooperative systems
5.1.1 Consensus under persistent excitation for first-order dynamics
5.1.2 Flocking for Cucker-Smale type systems with strong interactions
5.2 Sparse control of kinetic cooperative systems to approximate alignment
5.2.1 Invariance properties of kinetic cooperative systems
5.2.2 Proof of Theorem 5.3
6 Generic Singularities of the 3D-Contact Sub-Riemannian Conjugate Locus
6.1 3D-Contact sub-Riemannian manifolds and their conjugate locus
6.2 Generic singularities of the full-conjugate locus