Nonlinear Piola Transform


 

This is an extension of the Linear Piola Transform post, covering the case where the map from the reference element to the physical element is nonlinear, as with parametric elements. The main tool comes from the exterior calculus framework, following the FEEC book by Arnold.

Let $\Omega, \hat \Omega$ be the physical element and reference element, with an orientation-preserving, differentiable map $\phi: \hat \Omega \to \Omega$. Suppose $p: \Omega \to \mathbb{R}$ and $v: \Omega \to \mathbb{R}^3$ are functions on the physical element. The quantity of interest is $(p, \nabla \cdot v) = \int_\Omega p \nabla \cdot v \, dx$.

The key observation here is that by letting $\nu = v^1 \, dx^2 \wedge dx^3 - v^2 \, dx^1 \wedge dx^3 + v^3 \, dx^1 \wedge dx^2$, we have $d\nu = (\nabla \cdot v) \, dx^1 \wedge dx^2 \wedge dx^3$ where $d$ is the exterior derivative. A key fact is that integration of differential forms is preserved under pullback, hence
\begin{align*}
(p, \nabla \cdot v) = \int_\Omega p \nabla \cdot v \, dx = \int_\Omega p \wedge d\nu = \int_{\hat \Omega} \phi^*(p \wedge d\nu)
\end{align*}
where $\phi^*$ is the pullback by $\phi$.

Now, it’s just a matter of algebra
\begin{align*}
\int_{\hat \Omega} \phi^*(p \wedge d\nu) = \int_{\hat \Omega} \phi^*p \wedge \phi^* d\nu = \int_{\hat \Omega} \phi^*p \wedge d\phi^*\nu.
\end{align*}
Now, we need to work out what the pullback does to the 0-form $p$ and the 2-form $\nu$. One can look this up or just compute it algebraically (which I'll write up sooner or later, since I couldn't find a good source online); the results are $\phi^* p = p \circ \phi$ and, identifying the pulled-back 2-form with a vector field,
\begin{align*}
\phi^* \nu \cong (\det \phi')(\phi')^{-1} (v \circ \phi): \hat \Omega \to \mathbb{R}^3
\end{align*}
which is exactly the Piola transform.
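As a sanity check, the commutation property underlying all this, namely $\hat\nabla \cdot \hat v = (\det \phi')\,(\nabla \cdot v)\circ\phi$ for the Piola transform $\hat v = (\det \phi')(\phi')^{-1}(v \circ \phi)$, can be verified symbolically in 2D (a sketch assuming sympy; the nonlinear map and the field below are arbitrary choices):

```python
import sympy as sp

xh, yh = sp.symbols('xhat yhat')
# an arbitrary nonlinear (parametric) map from reference to physical element
phi = sp.Matrix([xh + sp.Rational(1, 4) * yh**2, yh + sp.Rational(1, 10) * xh * yh])
J = phi.jacobian([xh, yh])        # phi'
detJ = J.det()

x, y = sp.symbols('x y')
v = sp.Matrix([x**2 * y, x + y**2])          # a sample field on the physical element
div_v = sp.diff(v[0], x) + sp.diff(v[1], y)

# Piola transform: vhat = det(phi') * (phi')^{-1} * (v o phi)
v_of_phi = v.subs({x: phi[0], y: phi[1]})
vhat = sp.simplify(detJ * J.inv() * v_of_phi)

div_vhat = sp.diff(vhat[0], xh) + sp.diff(vhat[1], yh)
rhs = detJ * div_v.subs({x: phi[0], y: phi[1]})
assert sp.simplify(div_vhat - rhs) == 0   # divergence commutes with the Piola transform
```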

Simple Neural ODE Code
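A minimal sketch of the idea (assuming numpy): a small random-weight MLP serves as the dynamics $f_\theta(t, y)$, and the "network" is a classic RK4 integration of that ODE. In practice one would use a library like torchdiffeq and train $\theta$ through the solver; this only shows the forward pass.

```python
import numpy as np

def mlp_dynamics(params, t, y):
    """A tiny MLP f_theta(t, y) serving as the ODE right-hand side."""
    W1, b1, W2, b2 = params
    h = np.tanh(W1 @ y + b1)
    return W2 @ h + b2

def odeint_rk4(f, params, y0, t0, t1, steps=100):
    """Integrate dy/dt = f(params, t, y) from t0 to t1 with classic RK4."""
    y, t = y0, t0
    dt = (t1 - t0) / steps
    for _ in range(steps):
        k1 = f(params, t, y)
        k2 = f(params, t + dt / 2, y + dt / 2 * k1)
        k3 = f(params, t + dt / 2, y + dt / 2 * k2)
        k4 = f(params, t + dt, y + dt * k3)
        y = y + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += dt
    return y

rng = np.random.default_rng(0)
dim, hidden = 2, 16
params = (rng.normal(0, 0.5, (hidden, dim)), np.zeros(hidden),
          rng.normal(0, 0.5, (dim, hidden)), np.zeros(dim))
y0 = np.array([1.0, 0.0])
y1 = odeint_rk4(mlp_dynamics, params, y0, 0.0, 1.0)   # the "forward pass"
```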

 

 

Maxwell’s Equations


This is a “short” note deriving Maxwell’s equations from first principles and experiments of physics; I will assume that one is familiar with div-curl-grad and all that. This is essentially an extremely condensed version of Griffiths’s textbook, which discusses the fundamental laws. Some level of rigor is dropped for conciseness.

1. Electrostatics We start with electrostatics, which studies the forces exerted on charged particle(s) by static (non-moving) sources. The governing equation is Coulomb’s law
\begin{align}\label{eqn:coul}
F = \frac{1}{4\pi\varepsilon_0} \frac{q_1q_2}{r^2} \vec{r}
\end{align}
where $q_i$ are the charges, $\varepsilon_0$ is a constant (the permittivity of free space), and $\vec r$ is the vector between the two charges. The force is repulsive if $q_1, q_2$ have the same sign and attractive otherwise.

Suppose we have a single charge $Q$. If there are multiple sources affecting $Q$, we can simply sum their contributions (superposition). Similarly, we can integrate if the sources are distributed over some $n$-dimensional volume; this leads to the concept of the electric field, defined as
\begin{align}\label{eqn:elec}
E(r) = \frac{1}{4\pi\varepsilon_0} \sum_{k=1}^n\frac{q_k}{r_k^2} \vec{r_k} \to E(r) = \frac{1}{4\pi\varepsilon_0} \int \frac{1}{r^2} \vec r \, dq
\end{align}
where $dq$ is the infinitesimal charge element. I am abusing notation a bit with $E(r) \sim \int \frac{1}{r^2} \vec r \, dq$: the electric field has a value at every point in space, and the $r^2$ term is simply the distance from the charge element to that point. With \cref{eqn:elec}, one can calculate the force on $Q$ as $F = QE(r)$.

Focusing on a single charge, suppose it’s a positive charge at the origin; the electric field $E(r)$ of this configuration radiates away from the origin. The total flux of the electric field through a spherical surface (in fact, any closed surface $S$) surrounding the charge depends only on the value of the charge inside. This is equivalent to saying
\begin{align}
\oint_S E \cdot da = \frac{1}{\varepsilon_0} Q_{enc}.
\end{align}
Applying divergence theorem, we have that
\begin{align*}
\oint_S E \cdot da = \int_V \nabla \cdot E \, dV = \frac{1}{\varepsilon_0}\int_V \rho dV
\end{align*}
where $Q_{enc} = \int_V \rho \, dV$ with $\rho$ the charge density inside the volume, leading to Gauss’s law:
\begin{align}
\nabla \cdot E = \frac{1}{\varepsilon_0} \rho.
\end{align}
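The shape-independence of the flux is easy to check numerically: integrate $E \cdot n$ over the surface of a cube surrounding a unit point charge (a sketch assuming numpy, in units where $q = \varepsilon_0 = 1$, so the answer should be $1$):

```python
import numpy as np

def flux_through_cube(q=1.0, eps0=1.0, half=1.0, n=400):
    """Integrate E . n over the surface of a cube of half-width `half`
    centered on a point charge at the origin (midpoint rule per face)."""
    s = (np.arange(n) + 0.5) / n * 2 * half - half   # face-grid midpoints
    u, w = np.meshgrid(s, s)
    dA = (2 * half / n) ** 2
    r2 = u**2 + w**2 + half**2
    # on the face z = +half, E . n = q/(4 pi eps0) * half / r^3;
    # by symmetry all six faces contribute equally
    face = q / (4 * np.pi * eps0) * half / r2**1.5
    return 6 * np.sum(face) * dA

total = flux_through_cube()   # should approach q / eps0 = 1
```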

What about the curl of the electric field $E(r)$? If the charges aren’t moving, then we actually have $\nabla \times E = 0$: for a single point charge the field is radial, so its curl vanishes, and superposition extends this to any static configuration. Note again that this is not true once we introduce motion, since magnetism arises.
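For the single point charge, the vanishing curl can be confirmed symbolically (assuming sympy; constants dropped):

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
r = sp.sqrt(x**2 + y**2 + z**2)
E = sp.Matrix([x, y, z]) / r**3      # point-charge field, up to constants
curl = sp.Matrix([
    sp.diff(E[2], y) - sp.diff(E[1], z),
    sp.diff(E[0], z) - sp.diff(E[2], x),
    sp.diff(E[1], x) - sp.diff(E[0], y),
])
curl_simplified = [sp.simplify(c) for c in curl]   # each component is 0 away from the origin
```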

2. Magnetostatics Moving to the more difficult subject of magnetism, the key concept is that moving charges generate a magnetic field $B$. So besides the electric field, a moving charge (related to the concept of current) produces a magnetic field $B$. In particular, along a wire, the magnetic field satisfies the right hand rule, so we expect to see cross products here. Thus, for two parallel wires with current running in the same direction, the right hand rule shows that the wires attract.

In fact, the Lorentz force law (an axiom of the theory) states that the force on a charge $Q$, moving with velocity $v$ in a field $B$, is
\begin{align*}
F_{mag} = Q(v \times B).
\end{align*}
Note that since the force is the cross product of the velocity with $B$, it is perpendicular to the velocity, meaning no work is done by the magnetic force: it can change the direction of the charge but not its speed. The total force, including the electric force, is
\begin{align*}
F = Q(E + (v \times B)).
\end{align*}
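The no-work property shows up directly in simulation: integrating the Lorentz force law in a uniform magnetic field, the speed stays constant while the direction rotates (a sketch assuming numpy; unit mass and charge):

```python
import numpy as np

def lorentz_step(y, dt, q, E, B):
    """One RK4 step for dx/dt = v, dv/dt = q (E + v x B), unit mass."""
    def f(state):
        xpos, v = state[:3], state[3:]
        return np.concatenate([v, q * (E + np.cross(v, B))])
    k1 = f(y); k2 = f(y + dt / 2 * k1); k3 = f(y + dt / 2 * k2); k4 = f(y + dt * k3)
    return y + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

# purely magnetic field: |v| should stay constant (no work done)
B = np.array([0.0, 0.0, 1.0]); E = np.zeros(3)
state = np.array([0.0, 0.0, 0.0, 1.0, 0.0, 0.0])   # starts with speed 1
for _ in range(1000):
    state = lorentz_step(state, 0.01, 1.0, E, B)
speed = np.linalg.norm(state[3:])                  # still ~1 after t = 10
```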

Before moving on, we have to define a “current”: charge per unit time passing a point, with units of amperes (coulombs per second). For a wire with line charge density $\lambda$ moving at velocity $v$, the current is $I = \lambda v$. If the charge flows over a surface or volume, we describe it with a “surface/volume current density”.
For charge moving at velocity $v$ with volume charge density $\rho$, the volume current density is $J = \rho v$. The magnetic force on a volume is then
\begin{align*}
F_{mag} = \int (v \times B) \rho \, dV = \int (J \times B) \, dV
\end{align*}
with appropriate changes for lower dimensional entities.

The total current crossing a surface $S$ is simply
\begin{align*}
I = \int_S J \cdot da.
\end{align*}
In particular, the charge per unit time leaving a volume is
\begin{align*}
\oint_S J \cdot da = \int_V (\nabla \cdot J) \, dV
\end{align*}
by the divergence theorem. Since charge is conserved (after all, one can visualize charges as little particles), any flow out must come from inside, giving the “continuity equation”
\begin{align}\label{eqn:cont}
\nabla \cdot J = - \frac{\partial \rho}{\partial t}.
\end{align}
In the study of magnetostatics, we assume that $\frac{\partial \rho}{\partial t} = 0$.

This allows us to discuss magnetostatics, where instead of the stationary charges of electrostatics, we have steady currents: $\frac{\partial J}{\partial t} = 0$. Truly steady currents don’t arise exactly in experiments, but the approximation is oddly accurate even in household applications. The analogue of Coulomb’s law here is the Biot-Savart law, given by
\begin{align}\label{eqn:B}
B(r) = \frac{\mu_0}{4\pi} \int \frac{J(r') \times \vec r}{r^2} \, dV'
\end{align}
over a volume, where $\mu_0$ is the permeability of free space; the units of $B$ come out in teslas $T$ (or gauss), i.e. newtons per ampere-meter. We again abuse notation: $r^2$ is the squared distance and $\vec r$ the direction.

In the most basic case of magnetostatics, we consider a single wire carrying current (comparable to a single point charge in electrostatics). The magnetic field lines are simply circles around the wire, so the curl is non-zero. One finds by calculation that
\begin{align*}
\oint B \cdot dl = \mu_0 I
\end{align*}
where we integrate over a circular path of radius $s$ around the wire; this generalizes by superposition to multiple wires carrying current. In fact, the path doesn’t matter as long as it goes around the wire, since the magnetic field loses strength at the same rate the circumference/perimeter grows. Now, the current $I$ enclosed by the path can be expressed as
\begin{align*}
I = \int J \cdot dA
\end{align*}
where $J$ is the volume current density; applying Stokes’ theorem gives us
\begin{align*}
\nabla \times B = \mu_0 J.
\end{align*}
The above is a nice thought experiment, but it doesn’t generalize. One of the (non-obvious) assumptions made is that the wire is an infinite straight wire! It is better to look at the Biot-Savart law itself.

We really want to look at \cref{eqn:B}. Note that $B$ is a function of $(x,y,z)$, the current distribution depends on the primed coordinates $(x', y', z')$, and $r$ is the distance between the two points, with the integral taken over the primed coordinates; a key point is that the div and curl of $B$ are taken in the unprimed coordinates.

With some work using product rules and all that, one can show that $\nabla \cdot B = 0$, and taking the curl results in
\begin{align*}
\nabla \times B = \mu_0 J(r) \rightarrow \oint B \cdot dl = \mu_0 I_{enc}
\end{align*}
which is called Ampere’s law (so our above derivation is actually correct!).
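A quick numerical check of Ampere’s law for the infinite straight wire, using a non-circular loop to illustrate that only the enclosed current matters (a sketch assuming numpy; units with $\mu_0 I = 1$):

```python
import numpy as np

mu0_I = 1.0   # work in units where mu0 * I = 1
def B_wire(x, y):
    """Field of an infinite straight wire along the z-axis (2D cross-section)."""
    s2 = x**2 + y**2
    return mu0_I / (2 * np.pi) * np.array([-y, x]) / s2

# line integral of B . dl around a non-circular loop enclosing the wire
t = np.linspace(0, 2 * np.pi, 2000, endpoint=False)
loop = np.stack([2 * np.cos(t), np.sin(t)], axis=1)          # an ellipse
dl = np.roll(loop, -1, axis=0) - loop                        # chord vectors
mid = (loop + np.roll(loop, -1, axis=0)) / 2                 # chord midpoints
circulation = sum(B_wire(*p) @ d for p, d in zip(mid, dl))   # ~ mu0 * I = 1
```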

Let’s do a quick review of magnetostatics and electrostatics:

  1. [Electrostatics]: Gauss’s law gives the divergence of the electric field, and the curl is always zero. These are Maxwell’s equations for electrostatics, essentially derived from Coulomb’s law plus superposition.

  2. [Magnetostatics]: Ampere’s law gives the curl of the magnetic field, while the divergence is zero. Again, these are Maxwell’s equations for magnetostatics, derived from the Biot-Savart law.

There are more things to discuss, like the potential for magnetism, but we will skip them to move onto more interesting stuff.

3. Electrodynamics When there’s a current, there needs to be a force moving those charges. Empirically, for most substances, one has
\begin{align*}
J = \sigma f
\end{align*}
where $J$ is the current density, $f$ the force per unit charge, and $\sigma$ a proportionality factor related to the conductivity/resistivity of the material. For our purposes (i.e. ignoring chemical, gravitational, or nuclear forces), we have
\begin{align*}
J = \sigma (E + v \times B)
\end{align*}
but a good first-order approximation, since $v$ is usually small, is $J = \sigma E$ (Ohm’s law, usually written as $V = IR$).

Another way of describing this force is the electromotive force, or emf, of the circuit. The emf is not a force, but rather is defined as
\begin{align*}
\mathcal{E} = \oint f \cdot dl
\end{align*}
which is really the force per unit charge integrated around the circuit. Another interpretation is that it is the work done per unit charge by the source (such as a battery). From this, one can see how a generator works, using motional emf as the principle. Moving a wire through a magnetic field generates an emf of $\mathcal{E} = vBh$, where $h$ is the length of the wire, $v$ the velocity, and $B$ the magnetic field; this is very much an interpretation of work. Indeed, if we let $\Phi$ be the flux of $B$ through the loop of wire, then $\mathcal{E} = -\frac{d\Phi}{dt}$.

A key concept of electrodynamics is that a changing magnetic field induces an electric field. Through experimentation, this relation can be quantified as $\oint E \cdot dl = - \int \frac{\partial B}{\partial t} \cdot da$, which by Stokes’ theorem means $\nabla \times E = - \frac{\partial B}{\partial t}$; this is called Faraday’s law. This generalizes electrostatics to the time-dependent regime. With Ampere’s law, we can talk about Maxwell’s contribution, which at the time was
\begin{align*}
\nabla \cdot E &= \frac{1}{\varepsilon_0} \rho, \\
\nabla \cdot B &= 0, \\
\nabla \times E &= – \frac{\partial B}{\partial t}, \\
\nabla \times B &= \mu_0 J.
\end{align*}

The problem with the above system is that it’s not consistent with basic vector calculus identities. In particular, the divergence of a curl must be zero, but the divergence of the right-hand side of Ampere’s law, $\mu_0 J$, need not vanish. Of course, for steady currents $\nabla \cdot J = 0$, but in general no.

The problem is that $\nabla \cdot J$ isn’t zero in general; we can rewrite this term using \cref{eqn:cont}:
\begin{align*}
\nabla \cdot J = - \frac{\partial \rho}{\partial t} = - \frac{\partial}{\partial t}(\varepsilon_0 \nabla \cdot E) = - \nabla \cdot \left(\varepsilon_0 \frac{\partial E}{\partial t}\right).
\end{align*}
Adding the term $\mu_0 \varepsilon_0 \frac{\partial E}{\partial t}$ to the right-hand side of Ampere’s law thus makes the divergence vanish! Lab experiments couldn’t detect this term since $J$ usually dominates, but it is essential for so-called electromagnetic waves.
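The identity forcing Maxwell’s correction, that the divergence of a curl vanishes identically, can be checked symbolically for generic field components (assuming sympy):

```python
import sympy as sp

x, y, z, t = sp.symbols('x y z t')
# generic (unspecified) components of a magnetic field B(x, y, z, t)
Bx, By, Bz = [sp.Function(n)(x, y, z, t) for n in ('Bx', 'By', 'Bz')]
curl_B = sp.Matrix([
    sp.diff(Bz, y) - sp.diff(By, z),
    sp.diff(Bx, z) - sp.diff(Bz, x),
    sp.diff(By, x) - sp.diff(Bx, y),
])
div_curl_B = (sp.diff(curl_B[0], x) + sp.diff(curl_B[1], y)
              + sp.diff(curl_B[2], z))
# mixed partials cancel identically, so div(curl B) == 0 for any smooth B
```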

 

KKT Conditions

I need to relearn optimization, so here’s my incredibly short review.

The most basic problem is to minimize $f(x)$ subject to $g(x) = 0$. Since the constraint is an equality, we use a Lagrange multiplier. We look at the Lagrangian function $\mathcal{L}(x, \lambda) = f(x) - \lambda g(x)$; it’s easy to see that a minimum of the original problem satisfies a saddle point condition of the Lagrangian. This Lagrangian problem can be solved by taking the gradients and setting them equal to 0.

As an example, let’s consider the curl-curl problem I’ve been looking at.
\begin{align*}
A u &= f, \\
B u &= 0
\end{align*}
with some appropriate boundary conditions, which I will skip. The matrix $A$ corresponds to a curl-curl operator in strong form and $B$ is a divergence operator. The zero-divergence condition on the function is critical for the physics; $u$ should be thought of as a magnetic field and thus satisfies Gauss’s law.

Since $A$ is positive semi-definite, this is really a minimization problem $\min J(u) = u^T A u/2 - f^T u$ with corresponding Lagrangian $\mathcal{L}(u, \lambda) = u^T A u/2 - f^T u - \lambda^T Bu$ (note that in this case, $\lambda$ is a vector). The gradient with respect to the multiplier gives $Bu = 0$ as expected, and the gradient with respect to $u$ gives $Au - B^T\lambda = f$. Combining these gives the familiar saddle point problem. We can also obtain this result from the functional formulation; without diving into too much detail, it’s a similar process except with the Euler-Lagrange equation.
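The block structure can be illustrated on a small stand-in system (a sketch assuming numpy; this is not the actual curl-curl discretization, just a generic SPD matrix with a full-rank constraint):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 6, 2
M = rng.normal(size=(n, n))
A = M @ M.T + n * np.eye(n)       # symmetric positive definite stand-in for curl-curl
Bc = rng.normal(size=(m, n))      # stand-in for the "div" constraint operator
f = rng.normal(size=n)

# saddle point system from the Lagrangian:  Au - B^T lam = f,  Bu = 0
K = np.block([[A, -Bc.T], [Bc, np.zeros((m, m))]])
sol = np.linalg.solve(K, np.concatenate([f, np.zeros(m)]))
u, lam = sol[:n], sol[n:]
```

Solvability here needs $A$ SPD on the kernel of the constraint and $B$ of full rank, which holds for this random stand-in.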

So what in the world are the KKT conditions then? They are just the generalization of Lagrange multipliers to inequality constraints. One forms the Lagrangian again, and the conditions are stationarity of the Lagrangian together with primal/dual feasibility and complementary slackness; solving them magically recovers the correct minimum. Now looking back on this, this is a really strong result. But man, the way the econ professors taught this was tragically bad.

On a Theorem by Seeley

In my current work, I use the eigenfunctions of the Laplacian: $\varphi_k \in H^1_0$, $\lambda_k \in \mathbb{R}^+$ satisfying
\begin{align*}
-\Delta \varphi_k = \lambda_k \varphi_k.
\end{align*}
It is well known that $\{\varphi_k\}_{k=1}^\infty$ provides an orthonormal basis for $L^2$, and is also orthogonal in $H^1$. Hence, any function $f \in L^2$ can be expressed as $f = \sum_{k=1}^\infty f_k \varphi_k$ where $f_k = (f, \varphi_k)$.

Unfortunately, there are not many properties which can be derived from this eigenfunction expansion. Besides the fact that the squared $L^2$ norm of $f$ is simply $\sum_{k=1}^\infty f_k^2$, and the squared $H^1$ seminorm is $\sum_{k=1}^\infty f_k^2 \lambda_k$, the connection between regularity and the expansion is tenuous at best.
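Both identities are easy to confirm numerically in one dimension, where the Dirichlet eigenfunctions on $(0, \pi)$ are $\varphi_k = \sqrt{2/\pi}\sin(kx)$ with $\lambda_k = k^2$ (a sketch assuming numpy; $f = x(\pi - x)$ is an arbitrary sample vanishing on the boundary):

```python
import numpy as np

N = 20001
x = np.linspace(0, np.pi, N)
dx = x[1] - x[0]

def trap(g):
    """Trapezoid rule on the uniform grid."""
    return float((np.sum(g) - g[0] / 2 - g[-1] / 2) * dx)

f = x * (np.pi - x)                   # sample function, f = 0 on the boundary
K = 200
fk = np.array([trap(f * np.sqrt(2 / np.pi) * np.sin(k * x)) for k in range(1, K)])
lam = np.arange(1, K) ** 2            # Dirichlet eigenvalues on (0, pi)

l2_sq = trap(f ** 2)                  # ||f||_{L^2}^2 = pi^5 / 30
h1_sq = trap((np.pi - 2 * x) ** 2)    # |f|_{H^1}^2, using f' = pi - 2x
# sum(fk^2) ~ l2_sq  and  sum(lam * fk^2) ~ h1_sq (Parseval)
```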

I was excited about the paper Eigenfunction Expansions of Analytic Functions by Seeley. In it, the author claimed to have derived a theorem giving necessary and sufficient conditions on analyticity and the eigenfunction expansion: a function $f$ is analytic iff $\sum_{k=1}^\infty s^{\sqrt\lambda_k} f_k^2 < \infty$ or $\{s^{\sqrt{\lambda_k}} |f_k| \}$ is bounded for some $s > 1$. Unfortunately, I don’t think it is an iff.

In particular, consider the unit square with $f = 1$. We know the coefficients are
\begin{align*}
f_{ij} = \frac{2 \left((-1)^i-1\right) \left((-1)^j-1\right)}{\pi ^2 i j} \approx \frac{1}{ij}
\end{align*}
and so the theorem is stating that
\begin{align*}
s^{\sqrt{\lambda_{mm}}} f_{mm} &= \frac{2 \left((-1)^m-1\right)^2 s^{\sqrt{2} \pi m}}{\pi ^2 m^2} \\
&\approx \frac{s^{\sqrt{2} \pi m}}{m^2} \to \infty
\end{align*}
as $m\to \infty$ which is clearly unbounded.

Rather, more conditions need to be imposed in the theorem. It is not hard to show that any function satisfying $\sum_{k=1}^\infty s^{\sqrt\lambda_k} f_k^2 < \infty$ is in every $\mathbb{H}^s := \{ u \in L^2 \mid \sum_{k=1}^\infty u_k^2 \lambda_k^s < \infty \}$ space for $s \ge 0$. Incidentally, we know $\mathbb{H}^s = H^s_0$ for $1/2 < s < 1$ by an interpolation argument, meaning that, at a minimum, our function needs to vanish at the boundary.

Eigenfunctions and Eigenvalues of the Laplacian of the “Pacman” Domain

We will derive eigenfunctions and eigenvalues on a Pacman domain, which in polar coordinates is $\Omega = \{(r, \theta) : r \in [0, 1], \theta \in [0, 3\pi/2]\}$.
The problem is
\begin{align*}
-\Delta u &= \lambda u \qquad \text{in } \Omega\\
u &= 0 \qquad \text{on } \partial \Omega
\end{align*}

In polar coordinates, the Laplacian is
\begin{align}
\Delta = \frac{\partial^2 }{\partial r^2} + \frac{1}{r} \frac{\partial}{\partial r} + \frac{1}{r^2} \frac{\partial^2}{\partial \theta^2}.
\end{align}
Thus, using separation of variables $u(r, \theta) = R(r) \Theta(\theta)$ where $R(1) = 0, \Theta(0) = \Theta(3\pi/2) = 0$, we have
\begin{align*}
\Delta u &= \Theta R'' + \frac{1}{r} R' \Theta + \frac{1}{r^2} R \Theta'' = -\lambda R \Theta.
\end{align*}
Simplifying, we have
\begin{align}\label{eqn:sum0}
\frac{r^2 R'' + r R' + \lambda r^2 R}{R} + \frac{\Theta''}{\Theta} = 0.
\end{align}
In order for the above to be satisfied, each term must be constant (the first depends only on $r$ and the second only on $\theta$), so assume that
\begin{align*}
\frac{\Theta''}{\Theta} = -\lambda_\theta
\end{align*}
where $-\lambda_\theta$ is a constant.
Taking into account the boundary condition, we know that
\begin{align*}
\Theta(\theta) = \sin\left(\frac{2}{3}n \theta \right)
\end{align*}
and $\lambda_\theta = \frac{4}{9}n^2$ for $n \in \mathbb{Z}$.

Now, using \cref{eqn:sum0}, we have the corresponding ODE for the $R$ variable
\begin{align*}
r^2 R'' + r R' + \left(\lambda r^2 - \frac{4}{9}n^2\right) R = 0.
\end{align*}
Let $\rho = \sqrt\lambda r$; then $R_r = R_\rho \frac{d\rho}{dr} = \sqrt\lambda R_\rho$ and $R_{rr} = \lambda R_{\rho\rho}$, hence
\begin{align*}
\rho^2 R'' + \rho R' + \left(\rho^2 - \frac{4}{9} n^2\right) R = 0.
\end{align*}
This is Bessel's equation, so $R(\rho) = J_{2n/3}(\rho)$ where $J_\nu$ is the Bessel function of the first kind (the second solution $Y_{2n/3}$ is discarded since it blows up at the origin).

It remains to impose the boundary condition $R = 0$ at $r = 1$. Since $R(r) = J_{2n/3}(\sqrt\lambda r)$, we need
\begin{align*}
J_{2n/3}(\sqrt{\lambda}) = 0,
\end{align*}
meaning $\sqrt\lambda$ is a positive root of $J_{2n/3}$; writing $\alpha_{2n/3, k}$ for the $k$-th such root, the eigenvalues are $\lambda = \alpha_{2n/3, k}^2$ for $k \ge 1$.
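These can be computed numerically: find a root of $J_{2/3}$ for the $n = 1$ mode and confirm the radial factor satisfies Bessel's equation (a sketch assuming scipy and numpy):

```python
import numpy as np
from scipy.special import jv, jvp
from scipy.optimize import brentq

nu = 2 / 3                        # order for the n = 1 mode of the Pacman domain
# first positive zero alpha_{2/3,1}; the bracket [2, 4] straddles the sign change
lam_sqrt = brentq(lambda s: jv(nu, s), 2.0, 4.0)
lam = lam_sqrt ** 2               # smallest eigenvalue for n = 1

# check the radial ODE  rho^2 R'' + rho R' + (rho^2 - nu^2) R = 0  along a grid
rho = np.linspace(0.5, 3.0, 50)
residual = (rho**2 * jvp(nu, rho, 2) + rho * jvp(nu, rho, 1)
            + (rho**2 - nu**2) * jv(nu, rho))
```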

Scaling Arguments

This is a pretty important concept in PDEs and their numerical approximations. Specifically, it shows up in the Bramble-Hilbert lemma and in domain decomposition analysis.
Most of this post was written right after reading Toselli and Widlund’s book, so there is a lot of resemblance.

Let $\Omega$ be a bounded domain in $\mathbb{R}^n$ which is ‘nice’ (say Lipschitz boundary) with radius $h$. Now let $u, v \in H^1(\Omega)$ such that
\begin{align*}
|v|_{H^1(\Omega)} \le C||u||_{H^1(\Omega)}
\end{align*}
and we wish to obtain the $h$ dependence from $C$.

What we do is to first consider a scaled domain $\hat \Omega$, which is just $\Omega$ scaled to be of radius 1, with the change of variables $x = h\hat x$.
If we find the corresponding inequality on $\hat \Omega$, then the constant $C$ will not depend on $h$.
Let $\hat v(\hat x) := v(h\hat x)$; then $\hat \nabla \hat v(\hat x) = h (\nabla v)(h\hat x)$, where $\hat \nabla$ is the gradient with respect to $\hat x$.
Then,
\begin{align*}
|v|^2_{H^1(\Omega)} &= \int_\Omega |\nabla v(x)|^2 \, dx \\
&= \int_{\hat \Omega} |(\nabla v)(h \hat x)|^2 h^n \, d\hat x \\
&= \int_{\hat \Omega} |\hat\nabla \hat v(\hat x)|^2 h^{-2} h^n \, d\hat x = h^{n-2}|\hat v|_{H^1(\hat \Omega)}^2.
\end{align*}

But the $L^2$ norm picks up no $h^{-2}$ factor from derivatives, hence
\begin{align*}
||u||_{L^2(\Omega)}^2 &= \int_\Omega |u(x)|^2 \, dx \\
&= \int_{\hat \Omega} |u(h \hat x)|^2 h^n \, d\hat x = h^n ||\hat u||_{L^2(\hat \Omega)}^2.
\end{align*}
This is why mixing norms with different numbers of derivatives introduces powers of $h$.
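The two scalings can be confirmed by quadrature in one dimension ($n = 1$, so $|v|^2_{H^1}$ scales like $h^{-1}$ and $\|v\|^2_{L^2}$ like $h$; a sketch assuming numpy):

```python
import numpy as np

def seminorm_h1_sq(dg, a, b, n=20000):
    """|g|_{H^1}^2 on (a, b) via midpoint quadrature of |g'|^2."""
    x = a + (np.arange(n) + 0.5) * (b - a) / n
    return np.sum(dg(x) ** 2) * (b - a) / n

def norm_l2_sq(g, a, b, n=20000):
    x = a + (np.arange(n) + 0.5) * (b - a) / n
    return np.sum(g(x) ** 2) * (b - a) / n

ghat = lambda x: np.sin(np.pi * x)            # reference function on (0, 1)
dghat = lambda x: np.pi * np.cos(np.pi * x)

h = 0.01
v = lambda x: ghat(x / h)                     # scaled version living on (0, h)
dv = lambda x: dghat(x / h) / h

h1_ratio = seminorm_h1_sq(dv, 0, h) / seminorm_h1_sq(dghat, 0, 1)   # ~ h^{-1} = 100
l2_ratio = norm_l2_sq(v, 0, h) / norm_l2_sq(ghat, 0, 1)             # ~ h = 0.01
```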

Putnam 2003 A2

Let $a_1, \ldots, a_n$ and $b_1, \ldots, b_n$ be non-negative real numbers. Show that $$(a_1\ldots a_n)^{1/n} + (b_1\ldots b_n)^{1/n} \le [(a_1 + b_1) \cdots (a_n + b_n)]^{1/n}.$$

Solution: we will use the generalized Hölder inequality, which states that
$$||f_1\ldots f_n ||_1 \le ||f_1||_{\lambda_1} \cdots ||f_n||_{\lambda_n}$$
for weights $\lambda_i > 1$ with $\lambda_1^{-1} + \cdots + \lambda_n^{-1} = 1$.

Assuming this is true, let $f_i = (a_i^{1/n}, b_i^{1/n})$ and take the norms to be discrete $\ell^p$ norms on two points. This gives $||f_1 \ldots f_n||_1 = (a_1\ldots a_n)^{1/n} + (b_1\ldots b_n)^{1/n}$, as everything is non-negative. With uniform weights $\lambda_i = n$, each factor on the right-hand side is
$$||f_i||_{n} = (a_i + b_i)^{1/n}$$
and we have our inequality.

The sole remaining thing to prove is the generalized Hölder inequality. We assume the famous two-function case as the base case. For the inductive step, we have
\begin{align*}
||f_1\cdots f_{n+1}||_1 &\le ||f_1 \cdots f_n||_{\lambda_{n+1}/(\lambda_{n+1} – 1)} ||f_{n+1}||_{\lambda_{n+1}} \\
&= ||(f_1 \cdots f_n)^{\lambda_{n+1}/(\lambda_{n+1} – 1)}||_1^{(\lambda_{n+1} – 1)/\lambda_{n+1}} ||f_{n+1}||_{\lambda_{n+1}}.
\end{align*}
From here, the functions $f_i^{\lambda_{n+1}/(\lambda_{n+1}-1)}$ carry rescaled weights $\lambda_i (\lambda_{n+1}-1)/\lambda_{n+1}$, whose reciprocals sum to 1, so the inductive hypothesis applies and we are done.
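The inequality itself is cheap to spot-check on random non-negative inputs (a sketch assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(0)
violations = 0
for _ in range(1000):
    n = int(rng.integers(1, 8))
    a, b = rng.random(n), rng.random(n)
    lhs = np.prod(a) ** (1 / n) + np.prod(b) ** (1 / n)
    rhs = np.prod(a + b) ** (1 / n)
    if lhs > rhs + 1e-12:       # small slack for floating point
        violations += 1
```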

Putnam 2003 A1

Let $n$ be a fixed positive integer. How many ways are there to write $n$ as a sum of positive integers, $n = a_1 + a_2 + \cdots + a_k$, with $k$ an arbitrary positive integer and $a_1 \le a_2 \le \cdots \le a_k \le a_1 + 1$? For example, with $n = 4$, there are 4 ways: $4$, $2 + 2$, $1 + 1 + 2$, $1 + 1 + 1 + 1$.

Solution: Denote by $K_n$ the set of tuples $(a_1, \ldots, a_k)$ with the above properties. We claim that $|K_n| = n$. We will use induction; it is easy to verify that $|K_n| = n$ for $n = 1, 2, 3, 4$.

Assume that $|K_l| = l$ for some positive integer $l$. For a given tuple $(a_1, \ldots, a_k) \in K_l$, we can add 1 to one of the elements in the tuple and still preserve the property $a_1 \le a_2 \le \cdots \le a_k \le a_1 + 1$: if $a_1 \neq a_k$, we simply add 1 at the position where the integers jump; otherwise $a_1 = a_2 = \cdots = a_k$ and we replace $a_k$ with $a_k + 1$. This gives rise to distinct tuples in $K_{l+1}$. Finally, we also have the all-ones tuple of length $l+1$; this results in $|K_{l+1}| = l+1$.

We are not done here, as we need to show that there are no tuples in $K_{l+1}$ other than those constructed above. This is easy to see, as we can perform the inverse operation of subtracting one (with the exception of the all-ones tuple).
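The claim $|K_n| = n$ can also be verified by brute-force enumeration (plain Python):

```python
def special_partitions(n):
    """All non-decreasing tuples of positive integers summing to n
    with largest part at most (smallest part + 1)."""
    results = []
    def rec(remaining, partial, lo):
        if remaining == 0:
            results.append(tuple(partial))
            return
        for a in range(lo, remaining + 1):
            if partial and a > partial[0] + 1:
                break              # would violate a_k <= a_1 + 1
            rec(remaining - a, partial + [a], a)
    rec(n, [], 1)
    return results

counts = [len(special_partitions(n)) for n in range(1, 11)]   # expect 1, 2, ..., 10
```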