A fixed order double integral

I have a good example of why copying code straight from Stack Overflow can be a terrible idea. Consider a simple rectangle $[0, 1] \times [0, 2]$ over which we want to compute a double integral. Of course, there’s the classic way using scipy’s $\texttt{dblquad}$:

In [1]:
from scipy.integrate import dblquad

def f(x, y): 
    return x + 2 * y

dblquad(lambda y, x: f(x, y), 0, 1, lambda x: 0, lambda x: 2)[0]
Out[1]:
5.0

The above is all nice, but $\texttt{dblquad}$ is very accurate, and hence slow. What if we don’t need that level of accuracy? We can try the $\texttt{fixed_quad}$ function in scipy. Here’s a Stack Overflow question about the exact same thing, with some sample code.

Let’s copy its code and test it. It seems to work:

In [2]:
from scipy.integrate import fixed_quad

fixed_quad(lambda x: fixed_quad(f, 0, 1, args=(x, ), n=5)[0], 0, 2, n=5)[0]
Out[2]:
5.000000000000001

Of course, we get a substantial speedup, at least on my laptop:

In [3]:
%timeit fixed_quad(lambda x: fixed_quad(f, 0, 1, args=(x, ), n=5)[0], 0, 2, n=5)
10000 loops, best of 3: 39.3 µs per loop
In [4]:
%timeit dblquad(lambda y, x: f(x, y), 0, 1, lambda x: 0, lambda x: 2)
1000 loops, best of 3: 261 µs per loop

Seems to work well, right? Too bad it fails for very simple cases…

In [5]:
def g(x, y): 
    return x * y

print(dblquad(lambda y, x: g(x, y), 0, 1, lambda x: 0, lambda x: 2)[0])
print(fixed_quad(lambda x: fixed_quad(g, 0, 1, args=(x, ), n=5)[0], 0, 2, n=5)[0])
0.9999999999999999
1.333333333333333

Ah, what’s going wrong? It turns out it has to do with the way $\texttt{fixed_quad}$ handles arrays. The nested call doesn’t evaluate the integrand on the full tensor product of the quadrature points, but only along a subset of it: the inner call receives the outer quadrature nodes as an array, so the integrand is evaluated elementwise, pairing the $i$-th inner node with the $i$-th outer node.
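
A small illustration of the underlying issue (just NumPy broadcasting semantics, not scipy internals): when the integrand receives two equal-length arrays, it evaluates elementwise along the “diagonal” rather than on all pairs of points.

import numpy as np

xs = np.array([1.0, 2.0, 3.0])
ys = np.array([10.0, 20.0, 30.0])

print(xs * ys)                    # [ 10.  40.  90.] -- only the 3 diagonal values
print(xs[:, None] * ys[None, :])  # the full 3x3 tensor-product grid of 9 products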

A better implementation is just to rewrite the code myself (or to use something like quadpy):

In [6]:
from scipy.special import roots_legendre
import numpy as np
roots, weights = roots_legendre(5)

def dbl_integral(func, left: float, right: float, bottom: float, top: float):
    r"""
    Calculates the double integral \int_c^d \int_a^b f(x, y) dx dy.
    """
    # map the Gauss-Legendre nodes from [-1, 1] to [left, right] and [bottom, top]
    x = (right - left) * (roots + 1) / 2.0 + left

    y = (top - bottom) * (roots + 1) / 2.0 + bottom

    total = 0
    for index_i, i in enumerate(x):
        # ones_like(y) ensures func(i, y) is a vector even when it is constant in y
        total += np.dot(np.ones_like(y) * func(i, y) * weights[index_i], weights)

    return (top - bottom) * (right - left) / 4 * total
In [7]:
print(dblquad(lambda y, x: f(x, y), 0, 1, lambda x: 0, lambda x: 2)[0])
print(fixed_quad(lambda x: fixed_quad(f, 0, 1, args=(x, ), n=5)[0], 0, 2, n=5)[0])
print(dbl_integral(f, 0, 1, 0, 2))

print(dblquad(lambda y, x: g(x, y), 0, 1, lambda x: 0, lambda x: 2)[0])
print(fixed_quad(lambda x: fixed_quad(g, 0, 1, args=(x, ), n=5)[0], 0, 2, n=5)[0])
print(dbl_integral(g, 0, 1, 0, 2))
5.0
5.000000000000001
5.0
0.9999999999999999
1.333333333333333
1.0

And we mostly maintain the speed (not quite as fast, though it’s not hard to rewrite the above with einsum or optimize it a bit more; see the sketch below), assuming we precalculate the roots and weights:

In [8]:
%timeit dblquad(lambda y, x: f(x, y), 0, 1, lambda x: 0, lambda x: 2)
%timeit fixed_quad(lambda x: fixed_quad(f, 0, 1, args=(x, ), n=5)[0], 0, 2, n=5)
%timeit dbl_integral(f, 0, 1, 0, 2)
1000 loops, best of 3: 246 µs per loop
10000 loops, best of 3: 38.1 µs per loop
10000 loops, best of 3: 74.1 µs per loop
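
For reference, here is a sketch of the einsum variant mentioned above; it assumes $\texttt{func}$ broadcasts over NumPy arrays, so the full tensor-product grid is evaluated in one vectorized call.

def dbl_integral_einsum(func, left, right, bottom, top):
    # same node mapping as dbl_integral above
    x = (right - left) * (roots + 1) / 2.0 + left
    y = (top - bottom) * (roots + 1) / 2.0 + bottom
    # evaluate on the full tensor-product grid, then contract with both weight vectors
    vals = func(x[:, None], y[None, :])
    total = np.einsum("i,ij,j->", weights, vals, weights)
    return (top - bottom) * (right - left) / 4 * total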

Nonlinear Piola Transform

This is an extension of the Linear Piola Transform post, where we discuss the case in which the transformation from the reference element to the physical element is nonlinear, such as for a parametric element. The main tool here comes from the exterior calculus framework, following the FEEC book by Arnold.

Let $\Omega, \hat \Omega$ be the physical element and reference element, with an orientation-preserving, differentiable map $\phi: \hat \Omega \to \Omega$. Suppose $p: \Omega \to \mathbb{R}, v: \Omega \to \mathbb{R}^3$ are functions on the physical element. The quantity $(p, \nabla \cdot v) = \int_\Omega p \nabla \cdot v \, dx$ is of importance.

The key observation here is that by letting $\nu = v^1 dx^2 \wedge dx^3 - v^2 dx^1 \wedge dx^3 + v^3 dx^1 \wedge dx^2$, we have $d\nu = (\nabla \cdot v) \, dx^1 \wedge dx^2 \wedge dx^3$, where $d$ is the exterior derivative. A key fact here is that integration of differential forms commutes with pullbacks (for an orientation-preserving map), hence
\begin{align*}
(p, \nabla \cdot v) = \int_\Omega p \nabla \cdot v \, dx = \int_\Omega p \wedge d\nu = \int_{\hat \Omega} \phi^*(p \wedge d\nu)
\end{align*}
where $\phi^*$ is the pullback along $\phi$.

Now, it’s just a matter of algebra
\begin{align*}
\int_{\hat \Omega} \phi^*(p \wedge d\nu) = \int_{\hat \Omega} \phi^*p \wedge \phi^* d\nu = \int_{\hat \Omega} \phi^*p \wedge d\phi^*\nu.
\end{align*}
Now, we need to know what the pullback does to the 0-form $p$ and to the 2-form $\nu$. We can look this up or just do the algebra (which I’ll write up sooner or later, because I couldn’t find a good source online): we have $\phi^* p = p \circ \phi$ and
\begin{align*}
\phi^* \nu = (\det \phi')(\phi')^{-1} (v \circ \phi): \hat \Omega \to \mathbb{R}^3
\end{align*}
which is exactly the Piola transform.
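
As a sanity check, in the affine case $\phi(\hat x) = B \hat x + b$ we have $\phi' = B$ constant, so the formula reduces to
\begin{align*}
\phi^* \nu = (\det B)\, B^{-1} (v \circ \phi),
\end{align*}
recovering the linear Piola transform.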

Simple Neural ODE Code
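
A minimal sketch, assuming PyTorch plus the torchdiffeq package: the dynamics $dy/dt = f_\theta(t, y)$ are a small MLP, and we integrate forward with torchdiffeq’s odeint (training would wrap this in the usual loss/optimizer loop, differentiating through the solver).

import torch
import torch.nn as nn
from torchdiffeq import odeint  # assumed dependency

class ODEFunc(nn.Module):
    """Small MLP defining the dynamics dy/dt = f(t, y)."""
    def __init__(self, dim=2, hidden=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, dim))

    def forward(self, t, y):
        return self.net(y)

func = ODEFunc()
y0 = torch.randn(2)               # initial state
t = torch.linspace(0.0, 1.0, 10)  # evaluation times
ys = odeint(func, y0, t)          # shape (10, 2); differentiable w.r.t. the MLP parameters
print(ys.shape)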

Maxwell’s Equations


This is a “short” note deriving Maxwell’s equations from first principles and the experiments of physics; I will assume that one is familiar with div-curl-grad and all that. This is essentially an extremely condensed version of Griffiths’s textbook, which discusses the fundamental laws. Some level of rigor is dropped for conciseness.

1. Electrostatics We start with the idea of electrostatics, which studies the forces exerted on charged particles by static (non-moving) sources. The governing equation is Coulomb’s law
\begin{align}\label{eqn:coul}
F = \frac{1}{4\pi\varepsilon_0} \frac{q_1q_2}{r^2} \vec{r}
\end{align}
where the $q_i$ are the charges, $\varepsilon_0$ is a constant (the permittivity of free space), and $\vec r$ is the unit vector between the two charges. The force is repulsive if $q_1, q_2$ have the same sign and attractive otherwise.

Suppose we have a single charge $Q$. If there are multiple sources affecting $Q$, we can simply take the sum of their contributions (superposition). Similarly, we can take an integral if the sources are distributed over some $n$-dimensional volume; this leads to the concept of the electric field, defined as
\begin{align}\label{eqn:elec}
E(r) = \frac{1}{4\pi\varepsilon_0} \sum_{k=1}^n\frac{q_k}{r_k^2} \vec{r_k} \to E(r) = \frac{1}{4\pi\varepsilon_0} \int \frac{1}{r^2} \vec r \, dq
\end{align}
where $dq$ is the infinitesimal charge element. I am abusing a bit of notation here with $E(r) \sim \int \frac{1}{r^2} \vec r \, dq$: the electric field has a value at any point in space, and the $r$ is simply the distance to the source point. With \cref{eqn:elec}, one can calculate the force on $Q$ as $F = QE(r)$.

Focusing on a single charge, suppose it is a positive charge at the origin; the electric field $E(r)$ of this configuration radiates away from the origin. Thinking about it, the total flux of the electric field through a sphere (in fact, any closed surface $S$) surrounding the single charge depends only on the value of the charge inside. This is equivalent to saying
\begin{align}
\oint_S E \cdot da = \frac{1}{\varepsilon_0} Q_{enc}.
\end{align}
Applying divergence theorem, we have that
\begin{align*}
\oint_S E \cdot da = \int_V \nabla \cdot E \, dV = \frac{1}{\varepsilon_0}\int_V \rho dV
\end{align*}
where $Q_{enc} = \int_V \rho \, dV$ with $\rho$ the charge density inside the volume, leading to Gauss’s law:
\begin{align}
\nabla \cdot E = \frac{1}{\varepsilon_0} \rho.
\end{align}
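
As a quick check, for a single point charge $q$ at the origin and $S$ a sphere of radius $R$, \cref{eqn:elec} gives $|E| = \frac{q}{4\pi\varepsilon_0 R^2}$ on $S$, so
\begin{align*}
\oint_S E \cdot da = \frac{q}{4\pi\varepsilon_0 R^2} \cdot 4\pi R^2 = \frac{q}{\varepsilon_0},
\end{align*}
consistent with $Q_{enc} = q$.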

What about the curl of the electric field $E(r)$? If we assume that the charges aren’t moving, then we actually have $\nabla \times E = 0$; one can see this for a single point charge, whose field is purely radial and hence curl-free, and superposition extends it to general static distributions. Note again that this is not true once we introduce motion, since magnetism arises.

2. Magnetostatics Moving to the more difficult subject of magnetism, the key concept here is that moving charges (which relate to the concept of current) generate a magnetic field $B$. In particular, along a wire, the magnetic field satisfies the right-hand rule, meaning we expect to see cross products here. Thus, for two wires parallel to each other with current running in the same direction, the right-hand rule shows that the wires will attract.

In fact, the Lorentz force law (an axiom of the theory) states that the force on a charge $Q$, moving with velocity $v$ in a field $B$, is
\begin{align*}
F_{mag} = Q(v \times B).
\end{align*}
Note that since the velocity is crossed with $B$, the resulting force is perpendicular to the velocity, meaning no work is done by the magnetic force: it can only change the direction of the charge, not its speed. The total force, including the electric force, is
\begin{align*}
F = Q(E + (v \times B)).
\end{align*}

Before moving on, we have to define “current”: the charge per unit time passing a single point, with units of amperes, or coulombs per second. For a current-carrying wire with line charge density $\lambda$ moving at velocity $v$, the current is $I = \lambda v$. If the charge flow is over a surface or volume, we use a “surface/volume current density” to describe it: for charge of volume density $\rho$ moving at velocity $v$, the volume current density is $J = \rho v$. The magnetic force on a volume is then
\begin{align*}
F_{mag} = \int (v \times B) \rho \, dV = \int (J \times B) \, dV
\end{align*}
with appropriate changes for lower dimensional entities.

The total current crossing a surface $S$ is simply
\begin{align*}
I = \int_S J \cdot da.
\end{align*}
In particular, the charge per unit time leaving a volume is
\begin{align*}
\oint_S J \cdot da = \int_V (\nabla \cdot J) \, dV
\end{align*}
by the divergence theorem. Since charge is conserved (after all, one can visualize charges as little particles), the flow out of the volume must come from inside, meaning we have the “continuity equation”
\begin{align}\label{eqn:cont}
\nabla \cdot J = - \frac{\partial \rho}{\partial t}.
\end{align}
In the study of magnetostatics, we assume that $\frac{\partial \rho}{\partial t} = 0$.

This allows us to discuss magnetostatics, where instead of the stationary charges of electrostatics we have steady currents, $\frac{\partial J}{\partial t} = 0$. Strictly speaking, such situations don’t arise in experiments, but the approximation is oddly accurate even in household applications. The analogue of Coulomb’s law here is the Biot-Savart law, given by
\begin{align}\label{eqn:B}
B(r) = \frac{\mu_0}{4\pi} \int \frac{J(r') \times \vec r}{r^2} \, dV'
\end{align}
over a volume, where $\mu_0$ is the permeability of free space; the resulting units of $B$ are teslas $T$ (or gauss), i.e., newtons per ampere-meter. We again abuse notation here: $r$ is the distance and $\vec r$ is the direction.

In the most basic case of magnetostatics, we consider a single wire carrying a current (comparable to a single point charge in electrostatics). The magnetic field lines are simply circles around the wire, meaning the curl is nonzero. One finds by calculation that
\begin{align*}
\oint B \cdot dl = \mu_0 I
\end{align*}
where we integrate over a circular path of radius $s$ around the wire; this generalizes by superposition to multiple wires carrying current. In fact, the path doesn’t matter, as long as it goes around the wire, since the field strength decays (like $1/s$) at exactly the rate the circumference grows. Now, the current $I$ enclosed by the loop can be expressed as
\begin{align*}
I = \int J \cdot dA
\end{align*}
where $J$ is the volume current density, so applying Stokes’ theorem gives us
\begin{align*}
\nabla \times B = \mu_0 J.
\end{align*}
The above is a nice thought experiment, but it doesn’t generalize lol. One of the assumptions made (which is not obvious) is that the wire is an infinite straight wire! It is better to look at the Biot-Savart law itself.

We really want to look at \cref{eqn:B}. Note that $B$ is a function of $(x,y,z)$ while the current distribution depends on the primed coordinates $(x', y', z')$; $r$ is the distance between the field point and the source point, and the integral is over the primed coordinates. A key note is that the div and curl of $B$ are taken with respect to the unprimed coordinates.

With some amount of work using product rules and all that, one can show that $\nabla \cdot B = 0$, and taking a curl results in
\begin{align*}
\nabla \times B = \mu_0 J(r) \rightarrow \oint B \cdot dl = \mu_0 I_{enc}
\end{align*}
which is called Ampere’s law (so our above derivation is actually correct!).

Let’s do a quick review of magnetostatics and electrostatics:

  1. [Electrostatics]: Gauss’s law gives the divergence of the electric field, and the curl is always zero. These are Maxwell’s equations for electrostatics, essentially derived from Coulomb’s law plus superposition.

  2. [Magnetostatics]: Ampere’s law gives the curl of the magnetic field, while the divergence is zero. Again, these are Maxwell’s equations for magnetostatics, derived from the Biot-Savart law.

There are more things to discuss, like the potential for magnetism, but we will skip them to move on to more interesting stuff.

3. Electrodynamics When there’s a current, there needs to be a force moving those charges. Apparently, for most substances, one has
\begin{align*}
J = \sigma f
\end{align*}
where $J$ is the current density, $f$ the force per unit charge, and $\sigma$ a proportionality factor related to the conductivity/resistivity of the material. For our purposes (i.e., excluding chemical, gravitational, or nuclear forces), we have
\begin{align*}
J = \sigma (E + v \times B)
\end{align*}
but a good first-order approximation, since $v$ is usually small, is $J = \sigma E$ (called Ohm’s law, usually written as $V = IR$).

Another way of describing this force is called the electromotive force, or emf, of the circuit. The emf is not a force, but rather defined as
\begin{align*}
\mathcal{E} = \oint f \cdot dl
\end{align*}
which is really a force per unit charge. Another interpretation is that it is the work done per unit charge by the source (such as a battery). From this, one can easily see how a generator works, with motional emf as the principle: moving a wire through a magnetic field generates an emf of $\mathcal{E} = vBh$, where $h$ is the length of the wire, $v$ the velocity, and $B$ the magnetic field; this is very much an interpretation of work. Indeed, if we let $\Phi$ be the flux of $B$ through the loop of wire, then $\mathcal{E} = -\frac{d\Phi}{dt}$.

A key concept of electrodynamics is the fact that a changing magnetic field induces an electric field. Through experimentation, this relation can be quantified as $\oint E \cdot dl = - \int \frac{\partial B}{\partial t} \cdot da$, which means, by Stokes’ theorem, that $\nabla \times E = - \frac{\partial B}{\partial t}$; this is called Faraday’s law. It generalizes electrostatics to the time-dependent regime. With Ampere’s law, we can talk about Maxwell’s contribution, which at the time was
\begin{align*}
\nabla \cdot E &= \frac{1}{\varepsilon_0} \rho, \\
\nabla \cdot B &= 0, \\
\nabla \times E &= - \frac{\partial B}{\partial t}, \\
\nabla \times B &= \mu_0 J.
\end{align*}

The problem with the above formulas is that they are not consistent with simple exterior calculus rules. In particular, the div of a curl should be zero, but the divergence of the right-hand side of Ampere’s law, $\mu_0 \nabla \cdot J$, is not zero in general. Of course, for steady currents $\nabla \cdot J = 0$, but in general, no.

The problem is that $\nabla \cdot J$ isn’t zero; we can rewrite this term using \cref{eqn:cont}
\begin{align*}
\nabla \cdot J = - \frac{\partial \rho}{\partial t} = - \frac{\partial}{\partial t}(\varepsilon_0 \nabla \cdot E) = - \nabla \cdot \left(\varepsilon_0 \frac{\partial E}{\partial t}\right).
\end{align*}
It goes without saying that adding the term $\varepsilon_0 \frac{\partial E}{\partial t}$ to $J$ kills the offending divergence, giving the corrected Ampere’s law
\begin{align*}
\nabla \times B = \mu_0 J + \mu_0 \varepsilon_0 \frac{\partial E}{\partial t}.
\end{align*}
Lab experiments couldn’t detect this term since $J$ is usually quite large, but it matters in so-called electromagnetic waves.

KKT Conditions

I need to relearn optimization, so here’s my incredibly short review.

The most basic problem is to minimize $f(x)$ such that $g(x) = 0$. Since the constraint is an equality condition, we use a Lagrange multiplier. We look at the Lagrangian function $\mathcal{L}(x, \lambda) = f(x) - \lambda g(x)$; it’s easy to see that a minimizer of the original problem corresponds to a saddle point of the Lagrangian. This problem can be solved by taking the gradients with respect to $x$ and $\lambda$ and setting them equal to 0.

As an example, let’s consider the curl-curl problem I’ve been looking at.
\begin{align*}
A u &= f, \\
B u &= 0
\end{align*}
with some appropriate boundary conditions which I will skip. The matrix $A$ corresponds to a curl-curl operator in strong form and $B$ is a div operator. The zero-divergence condition on the function is critical for the physics; $u$ should be thought of as a magnetic field and thus satisfy Gauss’s law.

Since $A$ is positive semi-definite, this is really a minimization problem $\min J(u) = u^T A u/2 - f^T u$ with corresponding Lagrangian $\mathcal{L}(u, \lambda) = u^T A u/ 2 - f^T u - \lambda^T Bu$ (note that in this case, $\lambda$ is a vector). The gradient with respect to the multiplier gives $Bu = 0$ as expected, and the gradient with respect to $u$ gives $Au - B^T\lambda = f$. Combining these gives the saddle point problem that we are familiar with (written out below). We can also obtain this sort of result using the functional formulation; without diving into too much detail, it’s a similar process except with the Euler-Lagrange equation.
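
Explicitly, the two gradient conditions assemble into the block saddle point system
\begin{align*}
\begin{pmatrix} A & -B^T \\ B & 0 \end{pmatrix}
\begin{pmatrix} u \\ \lambda \end{pmatrix}
=
\begin{pmatrix} f \\ 0 \end{pmatrix}.
\end{align*}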

So what in the world are the KKT conditions then? They’re just the generalization of Lagrange multipliers to inequality constraints: one forms the Lagrangian again, but besides stationarity one also needs primal feasibility, dual feasibility (sign conditions on the multipliers), and complementary slackness (written out below). Now looking back on this, it’s a really strong result. But man, the way the econ professors taught this was tragically bad.
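
For the standard problem of minimizing $f(x)$ subject to $g(x) \le 0$ with multiplier $\mu$, the conditions read
\begin{align*}
\nabla f(x^*) + \mu \nabla g(x^*) &= 0 \qquad \text{(stationarity)}\\
g(x^*) &\le 0 \qquad \text{(primal feasibility)}\\
\mu &\ge 0 \qquad \text{(dual feasibility)}\\
\mu \, g(x^*) &= 0 \qquad \text{(complementary slackness)}.
\end{align*}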

On a Theorem by Seeley

In my current work, I use the eigenfunctions of the Laplacian: the eigenpairs $(\varphi_k, \lambda_k) \in H^1_0 \times \mathbb{R}^+$ satisfying
\begin{align*}
-\Delta \varphi_k = \lambda_k \varphi_k.
\end{align*}
It is well known that $\{\varphi_k\}_{k=1}^\infty$ provides an orthonormal basis for $L^2$, and is also orthogonal in $H^1$. Hence, any function $f \in L^2$ can be expressed as $f = \sum_{k=1}^\infty f_k \varphi_k$ where $f_k = (f, \varphi_k)$.

Unfortunately, there are not many properties which can be derived from this eigenfunction expansion. Besides the fact that the squared $L^2$ norm of $f$ is simply $\sum_{k=1}^\infty f_k^2$ and the squared $H^1$ seminorm is $\sum_{k=1}^\infty f_k^2 \lambda_k$, the connection between regularity and the expansion is tenuous at best.

I was excited about the paper Eigenfunction Expansions of Analytic Functions by Seeley. In it, the author claimed a theorem giving necessary and sufficient conditions for analyticity in terms of the eigenfunction expansion: a function $f$ is analytic iff $\sum_{k=1}^\infty s^{\sqrt{\lambda_k}} f_k^2 < \infty$, or $\{s^{\sqrt{\lambda_k}} |f_k| \}$ is bounded, for some $s > 1$. Unfortunately, I don’t think it is an iff.

In particular, consider the unit square with $f = 1$. We know the coefficients are
\begin{align*}
f_{ij} = \frac{2 \left((-1)^i-1\right) \left((-1)^j-1\right)}{\pi ^2 i j} \approx \frac{1}{ij}
\end{align*}
and so, along the diagonal $i = j = m$ (with $m$ odd), the quantity appearing in the theorem is
\begin{align*}
s^{\sqrt{\lambda_{mm}}} f_{mm} &= \frac{2 \left((-1)^m-1\right)^2 s^{\sqrt{2} \pi m}}{\pi ^2 m^2} \\
&\approx \frac{s^{\sqrt{2} \pi m}}{m^2} \to \infty
\end{align*}
as $m\to \infty$, which is clearly unbounded for any $s > 1$, even though $f$ is analytic.
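
A quick numerical illustration of the blow-up:

import numpy as np

# s**(sqrt(2) * pi * m) / m**2: dips at first, but the exponential eventually wins
s = 1.1
for m in [1, 5, 10, 20, 40]:
    print(m, s ** (np.sqrt(2) * np.pi * m) / m ** 2)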

Rather, more conditions need to be imposed in the theorem. It is not hard to show that any function satisfying $\sum_{k=1}^\infty s^{\sqrt{\lambda_k}} f_k^2 < \infty$ is in every $\mathbb{H}^s := \{ u \in L^2 : \sum_{k=1}^\infty u_k^2 \lambda_k^s < \infty \}$ space for $s \ge 0$. Coincidentally, we know $\mathbb{H}^s = H^s_0$ for $1/2 < s < 1$ by an interpolation argument, meaning that, at the minimum, our functions need to vanish at the boundary.

Eigenfunctions and Eigenvalues of the Laplacian of the “Pacman” Domain

We will derive eigenfunctions and eigenvalues on a Pacman domain, which in polar coordinates is $\Omega = \{(r, \theta) : r \in [0, 1], \theta \in [0, 3\pi/2]\}$.
The problem is
\begin{align*}
-\Delta u &= \lambda u \qquad \text{in } \Omega\\
u &= 0 \qquad \text{on } \partial \Omega.
\end{align*}

In polar coordinates, the Laplacian is
\begin{align}
\Delta = \frac{\partial^2 }{\partial r^2} + \frac{1}{r} \frac{\partial}{\partial r} + \frac{1}{r^2} \frac{\partial^2}{\partial \theta^2}.
\end{align}
Thus, using separation of variables $u(r, \theta) = R(r) \Theta(\theta)$ where $R(1) = 0, \Theta(0) = \Theta(3\pi/2) = 0$, we have
\begin{align*}
\Delta u &= \Theta R'' + \frac{1}{r} R' \Theta + \frac{1}{r^2} R \Theta'' = -\lambda R \Theta.
\end{align*}
Simplifying, we have
\begin{align}\label{eqn:sum0}
\frac{r^2 R'' + r R' + \lambda r^2 R}{R} + \frac{\Theta''}{\Theta} = 0.
\end{align}
In order for the above to be satisfied, we need each term to be constant, so assume that
\begin{align*}
\frac{\Theta''}{\Theta} = -\lambda_\theta
\end{align*}
where $-\lambda_\theta$ is a constant.
Taking into account the boundary condition, we know that
\begin{align*}
\Theta(\theta) = \sin\left(\frac{2}{3}n \theta \right)
\end{align*}
and $\lambda_\theta = \frac{4}{9}n^2$ for positive integers $n$.

Now, using \cref{eqn:sum0}, we have the corresponding ODE for the $R$ variable
\begin{align*}
r^2 R'' + r R' + \left(\lambda r^2 - \frac{4}{9}n^2\right) R = 0.
\end{align*}
Let $\rho = \sqrt\lambda\, r$; then $R_r = R_\rho \frac{d\rho}{dr} = \sqrt\lambda R_\rho$ and hence $R_{rr} = \lambda R_{\rho\rho}$, so
\begin{align*}
\rho^2 R'' + \rho R' + \left(\rho^2 - \frac{4}{9} n^2\right) R = 0.
\end{align*}
This is Bessel’s equation of order $2n/3$, so $R(\rho) = J_{2n/3}(\rho)$ where $J$ is the Bessel function of the first kind (the second solution $Y_{2n/3}$ is singular at the origin, so we discard it).

It remains to impose the boundary condition $R = 0$ at $r = 1$:
\begin{align*}
R(r) = J_{2n/3}(\sqrt\lambda\, r), \qquad J_{2n/3}(\sqrt{\lambda}) = 0,
\end{align*}
meaning that $\lambda = \alpha_{2n/3, k}^2$ for $k \ge 1$, where $\alpha_{2n/3, k}$ is the $k$-th positive zero of $J_{2n/3}$; these are the eigenvalues.
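
A small numerical sketch of these eigenvalues in Python; since scipy’s jn_zeros only handles integer orders, we bracket sign changes of $J_{2n/3}$ by hand (the step size is an assumption that works here because consecutive Bessel zeros are roughly $\pi$ apart).

import numpy as np
from scipy.optimize import brentq
from scipy.special import jv

def bessel_zeros(order, k_max, step=0.5):
    """First k_max positive zeros of J_order, via sign-change bracketing."""
    zeros, x = [], step
    while len(zeros) < k_max:
        if jv(order, x) * jv(order, x + step) < 0:
            zeros.append(brentq(lambda t: jv(order, t), x, x + step))
        x += step
    return np.array(zeros)

# eigenvalues lambda = alpha_{2n/3, k}^2 of the Pacman domain
for n in [1, 2, 3]:
    print(n, bessel_zeros(2 * n / 3, 3) ** 2)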

Scaling Arguments

This is a pretty important concept in PDEs and their numerical approximation. Specifically, it shows up in the Bramble-Hilbert lemma and in domain decomposition analysis.
Most of this post was written right after reading Toselli and Widlund’s book, so there is a lot of resemblance.

Let $\Omega$ be a bounded domain in $\mathbb{R}^n$ which is ‘nice’ (say, with Lipschitz boundary) and of radius $h$. Now let $u, v \in H^1(\Omega)$ be such that
\begin{align*}
|v|_{H^1(\Omega)} \le C||u||_{H^1(\Omega)}
\end{align*}
and we wish to extract the $h$-dependence of $C$.

What we do first is consider a scaled domain $\hat \Omega$, which is just $\Omega$ scaled to radius 1, via the change of variables $x = h\hat x$.
If we find the corresponding inequality on $\hat \Omega$, then the constant $C$ there will not depend on $h$.
Let $\hat v(\hat x) := v(h\hat x)$; then $\hat \nabla \hat v(\hat x) = h\,(\nabla v)(h\hat x)$, where $\hat \nabla$ is the gradient with respect to $\hat x$.
Then,
\begin{align*}
|v|^2_{H^1(\Omega)} &= \int_\Omega |\nabla v(x)|^2 \, dx \\
&= \int_{\hat \Omega} |(\nabla v)(h \hat x)|^2 h^n \, d\hat x \\
&= \int_{\hat \Omega} |\hat\nabla \hat v(\hat x)|^2 h^{-2} h^n \, d\hat x = h^{n-2}|\hat v|_{H^1(\hat \Omega)}^2.
\end{align*}

But for the $L^2$ norm, there is no $h^{-2}$ factor from differentiating, hence
\begin{align*}
||u||_{L^2(\Omega)}^2 &= \int_\Omega |u(x)|^2 \, dx \\
&= \int_{\hat \Omega} |u(h \hat x)|^2 h^n \, d\hat x = h^n ||\hat u||_{L^2(\hat \Omega)}^2.
\end{align*}
Combining the two: if $|\hat v|_{H^1(\hat \Omega)} \le \hat C ||\hat u||_{H^1(\hat \Omega)}$ on the reference domain, then transforming back gives, up to constants, $|v|_{H^1(\Omega)} \le \hat C \left(h^{-1}||u||_{L^2(\Omega)} + |u|_{H^1(\Omega)}\right)$. This is why mixing derivatives of different orders causes $h$-dependent constants.
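
A quick numerical check of the two scalings in 1D ($n = 1$, so the squared $L^2$ norm scales like $h$ and the squared $H^1$ seminorm like $h^{-1}$); the choice $\hat u = \sin$ is arbitrary.

import numpy as np
from scipy.integrate import quad

u_hat, du_hat = np.sin, np.cos  # reference function and its derivative

for h in [1.0, 0.5, 0.25]:
    # u(x) = u_hat(x / h) on (0, h), so u'(x) = du_hat(x / h) / h
    l2 = quad(lambda x: u_hat(x / h) ** 2, 0, h)[0]
    h1 = quad(lambda x: (du_hat(x / h) / h) ** 2, 0, h)[0]
    print(h, l2, h1)  # l2 halves with h, h1 doubles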

Putnam 2003 A2

Let $a_1, \ldots, a_n$ and $b_1, \ldots, b_n$ be non-negative real numbers. Show that $$(a_1\ldots a_n)^{1/n} + (b_1\ldots b_n)^{1/n} \le [(a_1 + b_1) \cdots (a_n + b_n)]^{1/n}.$$

Solution: we will use the generalized Hölder inequality, which states that
$$||f_1\ldots f_n ||_1 \le ||f_1||_{\lambda_1} \cdots ||f_n||_{\lambda_n}$$
for weights $\lambda_i > 1$ satisfying $\lambda_1^{-1} + \cdots + \lambda_n^{-1} = 1$.

Assuming this is true, let $f_i = (a_i^{1/n}, b_i^{1/n})$ and take the norms to be the discrete $\ell^p$ norms. This gives $||f_1 \ldots f_n||_1 = (a_1\ldots a_n)^{1/n} + (b_1\ldots b_n)^{1/n}$, as everything is non-negative. Taking the uniform weights $\lambda_i = n$, the factors on the right-hand side become
$$||f_i||_{n} = (a_i + b_i)^{1/n}$$
and we have our inequality.

The sole remaining thing to prove is the generalized Hölder inequality. We take the classical two-function case as the base case. For the inductive step, we have
\begin{align*}
||f_1\cdots f_{n+1}||_1 &\le ||f_1 \cdots f_n||_{\lambda_{n+1}/(\lambda_{n+1} - 1)} ||f_{n+1}||_{\lambda_{n+1}} \\
&= ||(f_1 \cdots f_n)^{\lambda_{n+1}/(\lambda_{n+1} - 1)}||_1^{(\lambda_{n+1} - 1)/\lambda_{n+1}} ||f_{n+1}||_{\lambda_{n+1}}.
\end{align*}
From here, set $q = \lambda_{n+1}/(\lambda_{n+1} - 1)$ and apply the inductive case to $f_1^q, \ldots, f_n^q$ with the weights $\mu_i = \lambda_i / q$, which satisfy $\sum_{i=1}^n \mu_i^{-1} = q \sum_{i=1}^n \lambda_i^{-1} = q(1 - \lambda_{n+1}^{-1}) = 1$; unwinding the norms ($||f_i^q||_{\mu_i} = ||f_i||^q_{\lambda_i}$) finishes the proof.
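
A quick numerical sanity check of the inequality itself, on random non-negative data:

import numpy as np

rng = np.random.default_rng(0)
n = 5
a, b = rng.random(n), rng.random(n)

lhs = np.prod(a) ** (1 / n) + np.prod(b) ** (1 / n)
rhs = np.prod(a + b) ** (1 / n)
print(lhs <= rhs, lhs, rhs)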