Section 3 Step 2 sub-Finsler Pontryagin Maximum Principle

In this section, the Pontryagin Maximum Principle will be rephrased in a convenient form for the purposes of Theorem 1.1. The precise statement to be proved is the following:

Proposition 3.1. Step 2 sub-Finsler PMP.

Let \(G\) be a step 2 sub-Finsler Carnot group with an arbitrary norm \(\norm{\cdot}\colon V_1\to\RR\) and let \(0\leq T\leq \infty\text{.}\) If \(u\colon [0,T]\to V_1\) is the control of a geodesic, then there exists an absolutely continuous curve \(a\colon [0,T]\to V_1^*\) and a skew-symmetric bilinear form \(B\colon V_1\times V_1\to\RR\) such that

i. At almost every \(t\in[0,T]\text{,}\) the curve \(a\) has the derivative
\begin{equation*} \frac{d}{dt}a(t)Y = B(u(t),Y)\quad\forall Y\in V_1\text{.} \end{equation*}
ii. At almost every \(t\in[0,T]\text{,}\) the linear map \(a(t)\colon V_1\to\RR\) is a subdifferential of the squared norm \(\frac{1}{2}\norm{\cdot}^2\) at the point \(u(t)\in V_1\text{.}\)

Remark 3.2.

Up to changing the optimal control \(u\) on a set of measure zero, the subdifferential condition ii may be taken to hold for all \(t\in[0,T]\text{.}\)

Namely, suppose condition ii holds on a subset \(I\subset[0,T]\) of full measure. For each \(t\in [0,T]\setminus I\text{,}\) pick a sequence \(I\ni t_k\to t\) such that the limit \(\lim_{k\to\infty}u(t_k)\) exists, and redefine \(u(t)=\lim_{k\to\infty}u(t_k)\text{.}\) Since \(a\) is continuous, \(a(t_k)\to a(t)\text{,}\) so by the continuity of subdifferentials given by Lemma 2.18, \(a(t)\) is a subdifferential of the squared norm at the point \(u(t)\text{.}\)

Remark 3.3.

In the sub-Riemannian case, the squared norm \(\frac{1}{2}\norm{\cdot}^2\) is differentiable at every point, and the unique subdifferential is the inner product \(a(t) = \left\lt u(t),\cdot\right\gt\text{.}\) The derivative condition i then gives the usual linear ODE of controls in the implicit form

\begin{equation*} \left\lt \dot{u}(t),Y\right\gt = \frac{d}{dt}\left\lt u(t),Y\right\gt = B(u(t),Y)\quad\forall Y\in V_1\text{.} \end{equation*}
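
To make this ODE concrete, the following sketch writes it in coordinates. The orthonormal basis \(X_1,\ldots,X_r\) of \(V_1\) and the matrix notation \(b_{ij}=B(X_i,X_j)\) are assumptions introduced here for illustration only. Writing \(u(t)=\sum_{i=1}^ru_i(t)X_i\) and choosing \(Y=X_j\) gives

\begin{equation*} \dot{u}_j(t) = B(u(t),X_j) = \sum_{i=1}^r b_{ij}u_i(t) = -\sum_{i=1}^r b_{ji}u_i(t),\quad j=1,\ldots,r\text{,} \end{equation*}

that is, \(\dot{u}=-bu\) for the skew-symmetric matrix \(b=(b_{ij})\text{,}\) with solution \(u(t)=e^{-tb}u(0)\text{.}\) Since \(e^{-tb}\) is orthogonal, the speed \(\norm{u(t)}\) is constant, as expected of a geodesic. For instance, when \(r=2\) and \(b_{12}=-b_{21}=\beta\text{,}\) the system reads \(\dot{u}_1=-\beta u_2\text{,}\) \(\dot{u}_2=\beta u_1\text{,}\) so the control rotates with constant angular speed \(\beta\text{.}\)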

Subsection 3.1 General statement of the PMP

For the rest of Section 3, let \(G\) be a fixed sub-Finsler Carnot group of step 2 with an arbitrary norm \(\norm{\cdot}\colon V_1\to\RR\text{,}\) and let \(u\colon[0,T]\to V_1\) be the control of a geodesic \(\gamma\colon [0,T]\to G\text{.}\)

Consider first the finite time \(T\lt \infty\) case. By Definition 2.10 of the sub-Finsler distance, the control \(u\) minimizes the length functional \(\int_{0}^{T}\norm{u(t)}\,dt\) among all controls defining curves with the same endpoints as \(\gamma\text{.}\) Since a geodesic has by definition constant speed, it follows that \(u\) is also a minimizer of the energy functional \(\frac{1}{2}\int_{0}^{T}\norm{u(t)}^2\,dt\text{.}\)
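
The step from length to energy can be spelled out with the Cauchy–Schwarz inequality. The following is a sketch of this standard argument, in which \(v\colon[0,T]\to V_1\) denotes a hypothetical competing control defining a curve with the same endpoints as \(\gamma\text{,}\) and \(c\) denotes the constant speed \(\norm{u(t)}\text{.}\)

\begin{equation*} \frac{1}{2}\int_0^T\norm{u(t)}^2\,dt = \frac{c^2T}{2} = \frac{1}{2T}\Big(\int_0^T\norm{u(t)}\,dt\Big)^2 \leq \frac{1}{2T}\Big(\int_0^T\norm{v(t)}\,dt\Big)^2 \leq \frac{1}{2}\int_0^T\norm{v(t)}^2\,dt\text{.} \end{equation*}

The first inequality is the length minimality of \(u\text{,}\) and the second is the Cauchy–Schwarz inequality applied to the functions \(\norm{v(\cdot)}\) and \(1\) on \([0,T]\text{.}\)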

Define the left-trivialized Hamiltonian

\begin{equation} h\colon V_1\times\RR\times\mathfrak{g}^*\to\RR,\quad h(u,\xi,\lambda) = \lambda(u) + \frac{1}{2}\xi\norm{u}^2\text{.}\label{eq-hamiltonian}\tag{3.1} \end{equation}

By the Pontryagin Maximum Principle as presented in Theorem 12.10 of [2], the control \(u\colon[0,T]\to V_1\) can minimize the energy \(\frac{1}{2}\int_{0}^{T}\norm{u(t)}^2\,dt\) only if there is an everywhere non-zero absolutely continuous dual curve \(t\mapsto (\xi,\lambda(t))\in \RR\times T^*_{\gamma(t)}G\) such that

\begin{align} \xi &\leq 0\label{PMP-normality}\tag{3.2}\\ \dot{\lambda} &= \vec{h}_{u(t),\xi}(\lambda)\quad\text{a.e. }t\in[0,T],\label{PMP-hamiltonian-flow}\tag{3.3}\\ h_{u(t),\xi}(\lambda(t)) &\geq h_{v,\xi}(\lambda(t))\quad\forall v\in V_1\quad\text{a.e. }t\in[0,T].\label{PMP-subdifferential}\tag{3.4} \end{align}

Here \(h_{v,\xi}\) and \(\vec{h}_{v,\xi}\text{,}\) for \(v\in V_1\text{,}\) are the left-invariant Hamiltonian and the associated Hamiltonian vector field respectively.

More explicitly, \(h_{v,\xi}\colon T^*G\to\RR\) is the function defined from the left-trivialized Hamiltonian (3.1) in the natural way by

\begin{equation*} h_{v,\xi}(\lambda)=h(v,\xi,L_{g}^*\lambda),\quad \forall \lambda\in T^*_gG\text{,} \end{equation*}

and \(\vec{h}_{v,\xi}\) is the Hamiltonian vector field associated with the left-invariant Hamiltonian \(h_{v,\xi}\) by duality through the canonical symplectic form on the cotangent bundle. See Section 4 of [1] for more details in the context of the PMP in the sub-Riemannian setting.

Observe that if \((\xi,\lambda(t))\) is a dual curve satisfying the conditions (3.2)–(3.4) of the PMP, then so is every positive scalar multiple \((C\xi,C\lambda(t))\text{,}\) \(C\gt 0\text{.}\) This observation allows the infinite time case \(T=\infty\) to be handled as a limit of the finite time case. Namely, if \(u\colon [0,\infty)\to V_1\) is the control of a geodesic, then all its restrictions to finite intervals \(\restr{u}{[0,k]}\colon [0,k]\to V_1\) for \(k\in\NN\) are also controls of geodesics, so by the above they have corresponding dual curves \(t\mapsto (\xi_k,\lambda_k(t))\text{.}\) After suitable rescalings of the \((\xi_k,\lambda_k)\text{,}\) there exists a non-zero limit \((\xi_\infty,\lambda_\infty)\text{,}\) which then satisfies the conditions (3.2)–(3.4) of the PMP on the entire interval \([0,\infty)\text{.}\)
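
The scaling observation at the start of the preceding paragraph can be checked directly from (3.1). The following sketch covers conditions (3.2) and (3.4) only. For \(C\gt 0\) and any \(v\in V_1\text{,}\)

\begin{equation*} h(v,C\xi,C\lambda) = C\lambda(v)+\frac{1}{2}C\xi\norm{v}^2 = C\,h(v,\xi,\lambda)\text{,} \end{equation*}

so the inequality (3.4) for \((C\xi,C\lambda(t))\) is the inequality for \((\xi,\lambda(t))\) multiplied by \(C\gt 0\text{,}\) and \(C\xi\leq 0\) whenever \(\xi\leq 0\text{.}\) The restriction to positive \(C\) matters: a negative factor would reverse both inequalities.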

Condition (3.2) leaves two cases: \(\xi=0\) or \(\xi\lt 0\text{.}\) The case \(\xi=0\) is that of an abnormal control \(u\text{,}\) and may be ignored in the step 2 setting, since the Goh condition, a second order necessary condition for optimality (see e.g. Section 20 of [2]), implies that there are no strictly abnormal minimizers in step 2. By rescaling \((\xi,\lambda)\) it therefore suffices to consider the normal case \(\xi=-1\text{.}\)

Subsection 3.2 The PMP in left-trivialized coordinates

Let \(X_1,\ldots,X_r\) be a basis of \(V_1\text{.}\) Fix a basis \(X_{r+1},\ldots,X_n\) for \(V_2=[V_1,V_1]\) by choosing a maximal linearly independent subset of the Lie brackets \(\{[X_i,X_j]: 1\leq i\lt j\leq r\}\text{.}\) By an abuse of notation, denote also by \(X_1,\ldots,X_n\) the corresponding left-invariant frame of \(TG\text{.}\) Let \(\theta_1,\ldots,\theta_n\) be the dual left-invariant frame of \(T^*G\text{.}\) Writing the curve \(\lambda(t)\) in left-trivialized coordinates as

\begin{equation*} \lambda(t) = \sum_{i=1}^n\lambda_i(t)\theta_i(\gamma(t))\text{,} \end{equation*}

the Hamiltonian ODE (3.3) in the normal case \(\xi=-1\) takes the simpler form

\begin{equation} \dot{\lambda}_i(t) = \lambda(t)\Big(\Big[\sum_{j=1}^ru_j(t)X_j, X_i\Big](\gamma(t))\Big),\quad i=1,\dots,n\text{,}\label{eq-left-trivialized-hamiltonian-flow}\tag{3.5} \end{equation}

see Section 18.3 of [2] for the explicit computation.
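
As an illustration of (3.5), consider the simplest step 2 example. This is a sketch under the assumption, not made in the text, that \(G\) is the Heisenberg group, with \(r=2\text{,}\) \(n=3\) and \(X_3=[X_1,X_2]\text{.}\) With \(u(t)=u_1(t)X_1+u_2(t)X_2\) the brackets are \([u(t),X_1]=-u_2(t)X_3\text{,}\) \([u(t),X_2]=u_1(t)X_3\) and \([u(t),X_3]=0\text{,}\) so (3.5) becomes

\begin{equation*} \dot{\lambda}_1(t) = -u_2(t)\lambda_3(t),\qquad \dot{\lambda}_2(t) = u_1(t)\lambda_3(t),\qquad \dot{\lambda}_3(t) = 0\text{.} \end{equation*}

In particular, the vertical coefficient \(\lambda_3\) is constant, and the horizontal coefficients are driven by the control through this constant.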

Proof of Proposition 3.1.

The curve \(a\colon[0,T]\to V_1^*\) will be given by restricting the linear map

\begin{equation} a(t) := (L_{\gamma(t)})^*\lambda(t)\colon \mathfrak{g}\to\RR\label{eq-subdifferential-curve-defn}\tag{3.6} \end{equation}

to \(V_1\text{.}\) The skew-symmetric bilinear form \(B\colon V_1\times V_1\to\RR\) will be given by

\begin{equation} B(X,Y) := a(t)[X,Y]\text{.}\label{eq-bilinear-form-defn}\tag{3.7} \end{equation}

The curve \(a(t)\) of (3.6) has in the left-invariant frame the same coefficients as the curve \(\lambda(t)\text{,}\) i.e., the coefficients of \(a(t) = \sum_{i=1}^na_i(t)\theta_i(e)\) are exactly \(a_i=\lambda_i\text{.}\) Left-translating the Hamiltonian ODE (3.5) to the identity shows that for almost every \(t\in[0,T]\text{,}\) the components of the curve have the derivatives

\begin{equation} \dot{a}_i(t) = \frac{d}{dt}\lambda_i(t) = a(t)[u(t),X_i],\quad i=1,\ldots,n\text{.}\label{eq-coordinate-ode}\tag{3.8} \end{equation}

By the step 2 assumption, \([u(t),X_i]=0\) for all the vertical indices \(i=r+1,\dots,n\text{,}\) so the vertical coefficients \(a_{r+1},\dots,a_n\) are all constant. Since \([X,Y]\in V_2\) for all \(X,Y\in V_1\text{,}\) it follows that \(a(t)[X,Y] = \sum_{i=r+1}^{n}a_i\theta_i([X,Y])\) is constant in \(t\text{.}\) That is, the expression (3.7) defines a unique bilinear form \(B\) independent of \(t\text{.}\)

Writing the system (3.8) in terms of the bilinear form \(B\text{,}\) the remaining non-trivial equations are exactly

\begin{equation*} \dot{a}_i(t) = a(t)[u(t),X_i] = B(u(t),X_i),\quad i=1,\ldots,r\text{.} \end{equation*}

The derivative condition i of Proposition 3.1 follows by linearity: for an arbitrary vector \(Y=y_1X_1+\dots+y_rX_r\in V_1\text{,}\) the above implies that

\begin{equation*} \frac{d}{dt}a(t)Y = \frac{d}{dt}\sum_{i=1}^ra_i(t)y_i = \sum_{i=1}^rB(u(t),X_i)y_i = B(u(t),Y)\text{.} \end{equation*}
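
Condition i can also be integrated explicitly. The following sketch assumes that \(u\) is integrable on \([0,t]\) (true, for instance, for a bounded control) and introduces the notation \(x(t)=\int_0^tu(s)\,ds\in V_1\text{,}\) which does not appear in the text. Since \(a\) is absolutely continuous and \(B\) is linear in its first argument,

\begin{equation*} a(t)Y - a(0)Y = \int_0^tB(u(s),Y)\,ds = B(x(t),Y)\quad\forall Y\in V_1\text{,} \end{equation*}

so the restriction of \(a(t)\) to \(V_1\) is determined by its initial value, the form \(B\text{,}\) and the horizontal displacement \(x(t)\text{.}\)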

The subdifferential condition ii of Proposition 3.1 for the linear functions \(a(t)\) follows from rephrasing the maximality condition (3.4). Namely, expanding the explicit expressions of the normal Hamiltonians \(h_{u(t),-1}\) and \(h_{v,-1}\) from (3.1) and rearranging terms, the maximality condition (3.4) is equivalent to

\begin{equation*} a(t)v - a(t)u(t) \leq \frac{1}{2}\norm{v}^2-\frac{1}{2}\norm{u(t)}^2 \quad\forall v\in V_1\quad\text{a.e. }t\in[0,T]\text{.} \end{equation*}

This is exactly Definition 2.17 stating that the linear function \(a(t)\) is a subdifferential of the squared norm \(\frac{1}{2}\norm{\cdot}^2\) at the point \(u(t)\in V_1\text{.}\)
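
As a concrete instance of this inequality, consider the following sketch, in which the choice \(V_1=\RR^2\) with the norm \(\norm{(v_1,v_2)}=\max(|v_1|,|v_2|)\) is an assumption made only for illustration. At the corner point \(u=(1,1)\text{,}\) every linear function \(a(v)=sv_1+(1-s)v_2\) with \(0\leq s\leq 1\) is a subdifferential of \(\frac{1}{2}\norm{\cdot}^2\text{.}\) Indeed, since \(a(u)=1=\norm{u}\) and \(a(v)\leq\norm{v}\text{,}\)

\begin{equation*} a(v)-a(u) \leq \norm{v}-1 \leq \norm{v}-1+\frac{1}{2}(\norm{v}-1)^2 = \frac{1}{2}\norm{v}^2-\frac{1}{2}\norm{u}^2\quad\forall v\in\RR^2\text{.} \end{equation*}

The subdifferential is therefore not unique at points where the norm fails to be differentiable, in contrast with the sub-Riemannian case of Remark 3.3.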