A fixed order double integral

I have a good example of why copying code straight from Stack Overflow can be a terrible idea. Consider a simple rectangle $[0, 1] \times [0, 2]$ over which we want to compute a double integral. Of course, there’s the classic way using scipy’s $\texttt{dblquad}$:

In [1]:
from scipy.integrate import dblquad

def f(x, y): 
    return x + 2 * y

dblquad(lambda y, x: f(x, y), 0, 1, lambda x: 0, lambda x: 2)[0]
Out[1]:
5.0

The above is all nice, but $\texttt{dblquad}$ is very accurate, and hence slow. What if we don’t need that level of accuracy? We can try the $\texttt{fixed_quad}$ function in scipy. Here’s a Stack Overflow question about the exact same thing, with some sample code.

Let’s copy its code and test it. It seems to work:

In [2]:
from scipy.integrate import fixed_quad

fixed_quad(lambda x: fixed_quad(f, 0, 1, args=(x, ), n=5)[0], 0, 2, n=5)[0]
Out[2]:
5.000000000000001

Of course, we get a substantial speedup, at least on my laptop:

In [3]:
%timeit fixed_quad(lambda x: fixed_quad(f, 0, 1, args=(x, ), n=5)[0], 0, 2, n=5)
10000 loops, best of 3: 39.3 µs per loop
In [4]:
%timeit dblquad(lambda y, x: f(x, y), 0, 1, lambda x: 0, lambda x: 2)
1000 loops, best of 3: 261 µs per loop

Seems to work well, right? Too bad it fails for very simple cases…

In [5]:
def g(x, y): 
    return x * y

print(dblquad(lambda y, x: g(x, y), 0, 1, lambda x: 0, lambda x: 2)[0])
print(fixed_quad(lambda x: fixed_quad(g, 0, 1, args=(x, ), n=5)[0], 0, 2, n=5)[0])
0.9999999999999999
1.333333333333333

Ah, what’s going wrong? It turns out it has to do with the way $\texttt{fixed_quad}$ handles arrays. The nested call doesn’t evaluate the integrand on the full tensor product of the quadrature points, but only along a subset of it: the inner call receives the outer quadrature nodes as an array, so the integrand is evaluated elementwise, pairing the $i$-th inner node with the $i$-th outer node.
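
A small illustration of the underlying issue (just NumPy broadcasting semantics, not scipy internals): when the integrand receives two equal-length arrays, it evaluates elementwise along the “diagonal” rather than on all pairs of points.

import numpy as np

xs = np.array([1.0, 2.0, 3.0])
ys = np.array([10.0, 20.0, 30.0])

print(xs * ys)                    # [ 10.  40.  90.] -- only the 3 diagonal values
print(xs[:, None] * ys[None, :])  # the full 3x3 tensor-product grid of 9 products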

A better implementation is just to rewrite the code myself (or to use something like quadpy):

In [6]:
from scipy.special import roots_legendre
import numpy as np
roots, weights = roots_legendre(5)

def dbl_integral(func, left: float, right: float, bottom: float, top: float):
    r"""
    Calculates the double integral \int_c^d \int_a^b f(x, y) dx dy.
    """
    # map the Gauss-Legendre nodes from [-1, 1] to [left, right] and [bottom, top]
    x = (right - left) * (roots + 1) / 2.0 + left

    y = (top - bottom) * (roots + 1) / 2.0 + bottom

    total = 0
    for index_i, i in enumerate(x):
        # ones_like(y) ensures func(i, y) is a vector even when it is constant in y
        total += np.dot(np.ones_like(y) * func(i, y) * weights[index_i], weights)

    return (top - bottom) * (right - left) / 4 * total
In [7]:
print(dblquad(lambda y, x: f(x, y), 0, 1, lambda x: 0, lambda x: 2)[0])
print(fixed_quad(lambda x: fixed_quad(f, 0, 1, args=(x, ), n=5)[0], 0, 2, n=5)[0])
print(dbl_integral(f, 0, 1, 0, 2))

print(dblquad(lambda y, x: g(x, y), 0, 1, lambda x: 0, lambda x: 2)[0])
print(fixed_quad(lambda x: fixed_quad(g, 0, 1, args=(x, ), n=5)[0], 0, 2, n=5)[0])
print(dbl_integral(g, 0, 1, 0, 2))
5.0
5.000000000000001
5.0
0.9999999999999999
1.333333333333333
1.0

And we mostly maintain the speed (not quite as fast, though it’s not hard to rewrite the above with einsum or optimize it a bit more; see the sketch below), assuming we precalculate the roots and weights:

In [8]:
%timeit dblquad(lambda y, x: f(x, y), 0, 1, lambda x: 0, lambda x: 2)
%timeit fixed_quad(lambda x: fixed_quad(f, 0, 1, args=(x, ), n=5)[0], 0, 2, n=5)
%timeit dbl_integral(f, 0, 1, 0, 2)
1000 loops, best of 3: 246 µs per loop
10000 loops, best of 3: 38.1 µs per loop
10000 loops, best of 3: 74.1 µs per loop
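
For reference, here is a sketch of the einsum variant mentioned above; it assumes $\texttt{func}$ broadcasts over NumPy arrays, so the full tensor-product grid is evaluated in one vectorized call.

def dbl_integral_einsum(func, left, right, bottom, top):
    # same node mapping as dbl_integral above
    x = (right - left) * (roots + 1) / 2.0 + left
    y = (top - bottom) * (roots + 1) / 2.0 + bottom
    # evaluate on the full tensor-product grid, then contract with both weight vectors
    vals = func(x[:, None], y[None, :])
    total = np.einsum("i,ij,j->", weights, vals, weights)
    return (top - bottom) * (right - left) / 4 * total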

Nonlinear Piola Transform

This is an extension of the Linear Piola Transform post, where we discuss the case in which the transformation from the reference element to the physical element is nonlinear, such as for a parametric element. The main tool here comes from the exterior calculus framework, following the FEEC book by Arnold.

Let $\Omega, \hat \Omega$ be the physical element and reference element, with an orientation-preserving, differentiable map $\phi: \hat \Omega \to \Omega$. Suppose $p: \Omega \to \mathbb{R}, v: \Omega \to \mathbb{R}^3$ are functions on the physical element. The quantity $(p, \nabla \cdot v) = \int_\Omega p \nabla \cdot v \, dx$ is of importance.

The key observation here is that by letting $\nu = v^1 dx^2 \wedge dx^3 - v^2 dx^1 \wedge dx^3 + v^3 dx^1 \wedge dx^2$, we have $d\nu = (\nabla \cdot v) \, dx^1 \wedge dx^2 \wedge dx^3$, where $d$ is the exterior derivative. A key fact here is that integration of differential forms commutes with pullbacks (for an orientation-preserving map), hence
\begin{align*}
(p, \nabla \cdot v) = \int_\Omega p \nabla \cdot v \, dx = \int_\Omega p \wedge d\nu = \int_{\hat \Omega} \phi^*(p \wedge d\nu)
\end{align*}
where $\phi^*$ is the pullback along $\phi$.

Now, it’s just a matter of algebra
\begin{align*}
\int_{\hat \Omega} \phi^*(p \wedge d\nu) = \int_{\hat \Omega} \phi^*p \wedge \phi^* d\nu = \int_{\hat \Omega} \phi^*p \wedge d\phi^*\nu.
\end{align*}
Now, we need to know what the pullback does to the 0-form $p$ and to the 2-form $\nu$. We can look this up or just do the algebra (which I’ll write up sooner or later, because I couldn’t find a good source online): we have $\phi^* p = p \circ \phi$ and
\begin{align*}
\phi^* \nu = (\det \phi')(\phi')^{-1} (v \circ \phi): \hat \Omega \to \mathbb{R}^3
\end{align*}
which is exactly the Piola transform.
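
As a sanity check, in the affine case $\phi(\hat x) = B \hat x + b$ we have $\phi' = B$ constant, so the formula reduces to
\begin{align*}
\phi^* \nu = (\det B)\, B^{-1} (v \circ \phi),
\end{align*}
recovering the linear Piola transform.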

Simple Neural ODE Code
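
A minimal sketch, assuming PyTorch plus the torchdiffeq package: the dynamics $dy/dt = f_\theta(t, y)$ are a small MLP, and we integrate forward with torchdiffeq’s odeint (training would wrap this in the usual loss/optimizer loop, differentiating through the solver).

import torch
import torch.nn as nn
from torchdiffeq import odeint  # assumed dependency

class ODEFunc(nn.Module):
    """Small MLP defining the dynamics dy/dt = f(t, y)."""
    def __init__(self, dim=2, hidden=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, dim))

    def forward(self, t, y):
        return self.net(y)

func = ODEFunc()
y0 = torch.randn(2)               # initial state
t = torch.linspace(0.0, 1.0, 10)  # evaluation times
ys = odeint(func, y0, t)          # shape (10, 2); differentiable w.r.t. the MLP parameters
print(ys.shape)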

Maxwell’s Equations


This is a “short” note deriving Maxwell’s equations from first principles and the experiments of physics; I will assume that one is familiar with div-curl-grad and all that. This is essentially an extremely condensed version of Griffiths’s textbook, which discusses the fundamental laws. Some level of rigor is dropped for conciseness.

1. Electrostatics We start with the idea of electrostatics, which studies the forces exerted on charged particles by static (non-moving) sources. The governing equation is Coulomb’s law
\begin{align}\label{eqn:coul}
F = \frac{1}{4\pi\varepsilon_0} \frac{q_1q_2}{r^2} \vec{r}
\end{align}
where the $q_i$ are the charges, $\varepsilon_0$ is a constant (the permittivity of free space), and $\vec r$ is the unit vector between the two charges. The force is repulsive if $q_1, q_2$ have the same sign and attractive otherwise.

Suppose we have a single charge $Q$. If there are multiple sources affecting $Q$, we can simply take the sum of their contributions (superposition). Similarly, we can take an integral if the sources are distributed over some $n$-dimensional volume; this leads to the concept of the electric field, defined as
\begin{align}\label{eqn:elec}
E(r) = \frac{1}{4\pi\varepsilon_0} \sum_{k=1}^n\frac{q_k}{r_k^2} \vec{r_k} \to E(r) = \frac{1}{4\pi\varepsilon_0} \int \frac{1}{r^2} \vec r \, dq
\end{align}
where $dq$ is the infinitesimal charge element. I am abusing a bit of notation here with $E(r) \sim \int \frac{1}{r^2} \vec r \, dq$: the electric field has a value at any point in space, and the $r$ is simply the distance to the source point. With \cref{eqn:elec}, one can calculate the force on $Q$ as $F = QE(r)$.

Focusing on a single charge, suppose it is a positive charge at the origin; the electric field $E(r)$ of this configuration radiates away from the origin. Thinking about it, the total flux of the electric field through a sphere (in fact, any closed surface $S$) surrounding the single charge depends only on the value of the charge inside. This is equivalent to saying
\begin{align}
\oint_S E \cdot da = \frac{1}{\varepsilon_0} Q_{enc}.
\end{align}
Applying divergence theorem, we have that
\begin{align*}
\oint_S E \cdot da = \int_V \nabla \cdot E \, dV = \frac{1}{\varepsilon_0}\int_V \rho dV
\end{align*}
where $Q_{enc} = \int_V \rho \, dV$ with $\rho$ the charge density inside the volume, leading to Gauss’s law:
\begin{align}
\nabla \cdot E = \frac{1}{\varepsilon_0} \rho.
\end{align}
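
As a quick check, for a single point charge $q$ at the origin and $S$ a sphere of radius $R$, \cref{eqn:elec} gives $|E| = \frac{q}{4\pi\varepsilon_0 R^2}$ on $S$, so
\begin{align*}
\oint_S E \cdot da = \frac{q}{4\pi\varepsilon_0 R^2} \cdot 4\pi R^2 = \frac{q}{\varepsilon_0},
\end{align*}
consistent with $Q_{enc} = q$.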

What about the curl of the electric field $E(r)$? If we assume that the charges aren’t moving, then we actually have $\nabla \times E = 0$; one can see this for a single point charge, whose field is purely radial and hence curl-free, and superposition extends it to general static distributions. Note again that this is not true once we introduce motion, since magnetism arises.

2. Magnetostatics Moving to the more difficult subject of magnetism, the key concept here is that moving charges (which relate to the concept of current) generate a magnetic field $B$. In particular, along a wire, the magnetic field satisfies the right-hand rule, meaning we expect to see cross products here. Thus, for two wires parallel to each other with current running in the same direction, the right-hand rule shows that the wires will attract.

In fact, the Lorentz force law (an axiom of the theory) states that the force on a charge $Q$, moving with velocity $v$ in a field $B$, is
\begin{align*}
F_{mag} = Q(v \times B).
\end{align*}
Note that since the velocity is crossed with $B$, the resulting force is perpendicular to the velocity, meaning no work is done by the magnetic force: it can only change the direction of the charge, not its speed. The total force, including the electric force, is
\begin{align*}
F = Q(E + (v \times B)).
\end{align*}

Before moving on, we have to define “current”: the charge per unit time passing a single point, with units of amperes, or coulombs per second. For a current-carrying wire with line charge density $\lambda$ moving at velocity $v$, the current is $I = \lambda v$. If the charge flow is over a surface or volume, we use a “surface/volume current density” to describe it: for charge of volume density $\rho$ moving at velocity $v$, the volume current density is $J = \rho v$. The magnetic force on a volume is then
\begin{align*}
F_{mag} = \int (v \times B) \rho \, dV = \int (J \times B) \, dV
\end{align*}
with appropriate changes for lower dimensional entities.

The total current crossing a surface $S$ is simply
\begin{align*}
I = \int_S J \cdot da.
\end{align*}
In particular, the charge per unit time leaving a volume is
\begin{align*}
\oint_S J \cdot da = \int_V (\nabla \cdot J) \, dV
\end{align*}
by the divergence theorem. Since charge is conserved (after all, one can visualize charges as little particles), the flow out of the volume must come from inside, meaning we have the “continuity equation”
\begin{align}\label{eqn:cont}
\nabla \cdot J = - \frac{\partial \rho}{\partial t}.
\end{align}
In the study of magnetostatics, we assume that $\frac{\partial \rho}{\partial t} = 0$.

This allows us to discuss magnetostatics, where instead of the stationary charges of electrostatics we have steady currents, $\frac{\partial J}{\partial t} = 0$. Strictly speaking, such situations don’t arise in experiments, but the approximation is oddly accurate even in household applications. The analogue of Coulomb’s law here is the Biot-Savart law, given by
\begin{align}\label{eqn:B}
B(r) = \frac{\mu_0}{4\pi} \int \frac{J(r') \times \vec r}{r^2} \, dV'
\end{align}
over a volume, where $\mu_0$ is the permeability of free space; the resulting units of $B$ are teslas $T$ (or gauss), i.e., newtons per ampere-meter. We again abuse notation here: $r$ is the distance and $\vec r$ is the direction.

In the most basic case of magnetostatics, we consider a single wire carrying a current (comparable to a single point charge in electrostatics). The magnetic field lines are simply circles around the wire, meaning the curl is nonzero. One finds by calculation that
\begin{align*}
\oint B \cdot dl = \mu_0 I
\end{align*}
where we integrate over a circular path of radius $s$ around the wire; this generalizes by superposition to multiple wires carrying current. In fact, the path doesn’t matter, as long as it goes around the wire, since the field strength decays (like $1/s$) at exactly the rate the circumference grows. Now, the current $I$ enclosed by the loop can be expressed as
\begin{align*}
I = \int J \cdot dA
\end{align*}
where $J$ is the volume current density, so applying Stokes’ theorem gives us
\begin{align*}
\nabla \times B = \mu_0 J.
\end{align*}
The above is a nice thought experiment, but it doesn’t generalize lol. One of the assumptions made (which is not obvious) is that the wire is an infinite straight wire! It is better to look at the Biot-Savart law itself.

We really want to look at \cref{eqn:B}. Note that $B$ is a function of $(x,y,z)$ while the current distribution depends on the primed coordinates $(x', y', z')$; $r$ is the distance between the field point and the source point, and the integral is over the primed coordinates. A key note is that the div and curl of $B$ are taken with respect to the unprimed coordinates.

With some amount of work using product rules and all that, one can show that $\nabla \cdot B = 0$, and taking a curl results in
\begin{align*}
\nabla \times B = \mu_0 J(r) \rightarrow \oint B \cdot dl = \mu_0 I_{enc}
\end{align*}
which is called Ampere’s law (so our above derivation is actually correct!).

Let’s do a quick review of magnetostatics and electrostatics:

  1. [Electrostatics]: Gauss’s law gives the divergence of the electric field, and the curl is always zero. These are Maxwell’s equations for electrostatics, essentially derived from Coulomb’s law plus superposition.

  2. [Magnetostatics]: Ampere’s law gives the curl of the magnetic field, while the divergence is zero. Again, these are Maxwell’s equations for magnetostatics, derived from the Biot-Savart law.

There are more things to discuss, like the potential for magnetism, but we will skip them to move on to more interesting stuff.

3. Electrodynamics When there’s a current, there needs to be a force moving those charges. Apparently, for most substances, one has
\begin{align*}
J = \sigma f
\end{align*}
where $J$ is the current density, $f$ the force per unit charge, and $\sigma$ a proportionality factor related to the conductivity/resistivity of the material. For our purposes (i.e., excluding chemical, gravitational, or nuclear forces), we have
\begin{align*}
J = \sigma (E + v \times B)
\end{align*}
but a good first-order approximation, since $v$ is usually small, is $J = \sigma E$ (called Ohm’s law, usually written as $V = IR$).

Another way of describing this force is called the electromotive force, or emf, of the circuit. The emf is not a force, but rather defined as
\begin{align*}
\mathcal{E} = \oint f \cdot dl
\end{align*}
which is really a force per unit charge. Another interpretation is that it is the work done per unit charge by the source (such as a battery). From this, one can easily see how a generator works, with motional emf as the principle: moving a wire through a magnetic field generates an emf of $\mathcal{E} = vBh$, where $h$ is the length of the wire, $v$ the velocity, and $B$ the magnetic field; this is very much an interpretation of work. Indeed, if we let $\Phi$ be the flux of $B$ through the loop of wire, then $\mathcal{E} = -\frac{d\Phi}{dt}$.

A key concept of electrodynamics is the fact that a changing magnetic field induces an electric field. Through experimentation, this relation can be quantified as $\oint E \cdot dl = - \int \frac{\partial B}{\partial t} \cdot da$, which means, by Stokes’ theorem, that $\nabla \times E = - \frac{\partial B}{\partial t}$; this is called Faraday’s law. It generalizes electrostatics to the time-dependent regime. With Ampere’s law, we can talk about Maxwell’s contribution, which at the time was
\begin{align*}
\nabla \cdot E &= \frac{1}{\varepsilon_0} \rho, \\
\nabla \cdot B &= 0, \\
\nabla \times E &= - \frac{\partial B}{\partial t}, \\
\nabla \times B &= \mu_0 J.
\end{align*}

The problem with the above formulas is that they are not consistent with simple exterior calculus rules. In particular, the div of a curl should be zero, but the divergence of the right-hand side of Ampere’s law, $\mu_0 \nabla \cdot J$, is not zero in general. Of course, for steady currents $\nabla \cdot J = 0$, but in general, no.

The problem is that $\nabla \cdot J$ isn’t zero; we can rewrite this term using \cref{eqn:cont}
\begin{align*}
\nabla \cdot J = - \frac{\partial \rho}{\partial t} = - \frac{\partial}{\partial t}(\varepsilon_0 \nabla \cdot E) = - \nabla \cdot \left(\varepsilon_0 \frac{\partial E}{\partial t}\right).
\end{align*}
It goes without saying that adding the term $\varepsilon_0 \frac{\partial E}{\partial t}$ to $J$ kills the offending divergence, giving the corrected Ampere’s law
\begin{align*}
\nabla \times B = \mu_0 J + \mu_0 \varepsilon_0 \frac{\partial E}{\partial t}.
\end{align*}
Lab experiments couldn’t detect this term since $J$ is usually quite large, but it matters in so-called electromagnetic waves.

KKT Conditions

I need to relearn optimization, so here’s my incredibly short review.

The most basic problem is to minimize $f(x)$ such that $g(x) = 0$. Since the constraint is an equality condition, we use a Lagrange multiplier. We look at the Lagrangian function $\mathcal{L}(x, \lambda) = f(x) - \lambda g(x)$; it’s easy to see that a minimizer of the original problem corresponds to a saddle point of the Lagrangian. This problem can be solved by taking the gradients with respect to $x$ and $\lambda$ and setting them equal to 0.

As an example, let’s consider the curl-curl problem I’ve been looking at.
\begin{align*}
A u &= f, \\
B u &= 0
\end{align*}
with some appropriate boundary conditions which I will skip. The matrix $A$ corresponds to a curl-curl operator in strong form and $B$ is a div operator. The zero-divergence condition on the function is critical for the physics; $u$ should be thought of as a magnetic field and thus satisfy Gauss’s law.

Since $A$ is positive semi-definite, this is really a minimization problem $\min J(u) = u^T A u/2 - f^T u$ with corresponding Lagrangian $\mathcal{L}(u, \lambda) = u^T A u/ 2 - f^T u - \lambda^T Bu$ (note that in this case, $\lambda$ is a vector). The gradient with respect to the multiplier gives $Bu = 0$ as expected, and the gradient with respect to $u$ gives $Au - B^T\lambda = f$. Combining these gives the saddle point problem that we are familiar with (written out below). We can also obtain this sort of result using the functional formulation; without diving into too much detail, it’s a similar process except with the Euler-Lagrange equation.
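
Explicitly, the two gradient conditions assemble into the block saddle point system
\begin{align*}
\begin{pmatrix} A & -B^T \\ B & 0 \end{pmatrix}
\begin{pmatrix} u \\ \lambda \end{pmatrix}
=
\begin{pmatrix} f \\ 0 \end{pmatrix}.
\end{align*}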

So what in the world are the KKT conditions then? They’re just the generalization of Lagrange multipliers to inequality constraints: one forms the Lagrangian again, but besides stationarity one also needs primal feasibility, dual feasibility (sign conditions on the multipliers), and complementary slackness (written out below). Now looking back on this, it’s a really strong result. But man, the way the econ professors taught this was tragically bad.
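
For the standard problem of minimizing $f(x)$ subject to $g(x) \le 0$ with multiplier $\mu$, the conditions read
\begin{align*}
\nabla f(x^*) + \mu \nabla g(x^*) &= 0 \qquad \text{(stationarity)}\\
g(x^*) &\le 0 \qquad \text{(primal feasibility)}\\
\mu &\ge 0 \qquad \text{(dual feasibility)}\\
\mu \, g(x^*) &= 0 \qquad \text{(complementary slackness)}.
\end{align*}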

On a Theorem by Seeley

In my current work, I use the eigenfunctions of the Laplacian: the eigenpairs $(\varphi_k, \lambda_k) \in H^1_0 \times \mathbb{R}^+$ satisfying
\begin{align*}
-\Delta \varphi_k = \lambda_k \varphi_k.
\end{align*}
It is well known that $\{\varphi_k\}_{k=1}^\infty$ provides an orthonormal basis for $L^2$, and is also orthogonal in $H^1$. Hence, any function $f \in L^2$ can be expressed as $f = \sum_{k=1}^\infty f_k \varphi_k$ where $f_k = (f, \varphi_k)$.

Unfortunately, there are not many properties which can be derived from this eigenfunction expansion. Besides the fact that the squared $L^2$ norm of $f$ is simply $\sum_{k=1}^\infty f_k^2$ and the squared $H^1$ seminorm is $\sum_{k=1}^\infty f_k^2 \lambda_k$, the connection between regularity and the expansion is tenuous at best.

I was excited about the paper Eigenfunction Expansions of Analytic Functions by Seeley. In it, the author claimed a theorem giving necessary and sufficient conditions for analyticity in terms of the eigenfunction expansion: a function $f$ is analytic iff $\sum_{k=1}^\infty s^{\sqrt{\lambda_k}} f_k^2 < \infty$, or $\{s^{\sqrt{\lambda_k}} |f_k| \}$ is bounded, for some $s > 1$. Unfortunately, I don’t think it is an iff.

In particular, consider the unit square with $f = 1$. We know the coefficients are
\begin{align*}
f_{ij} = \frac{2 \left((-1)^i-1\right) \left((-1)^j-1\right)}{\pi ^2 i j} \approx \frac{1}{ij}
\end{align*}
and so, along the diagonal $i = j = m$ (with $m$ odd), the quantity appearing in the theorem is
\begin{align*}
s^{\sqrt{\lambda_{mm}}} f_{mm} &= \frac{2 \left((-1)^m-1\right)^2 s^{\sqrt{2} \pi m}}{\pi ^2 m^2} \\
&\approx \frac{s^{\sqrt{2} \pi m}}{m^2} \to \infty
\end{align*}
as $m\to \infty$, which is clearly unbounded for any $s > 1$, even though $f$ is analytic.
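
A quick numerical illustration of the blow-up:

import numpy as np

# s**(sqrt(2) * pi * m) / m**2: dips at first, but the exponential eventually wins
s = 1.1
for m in [1, 5, 10, 20, 40]:
    print(m, s ** (np.sqrt(2) * np.pi * m) / m ** 2)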

Rather, more conditions need to be imposed in the theorem. It is not hard to show that any function satisfying $\sum_{k=1}^\infty s^{\sqrt{\lambda_k}} f_k^2 < \infty$ is in every $\mathbb{H}^s := \{ u \in L^2 : \sum_{k=1}^\infty u_k^2 \lambda_k^s < \infty \}$ space for $s \ge 0$. Coincidentally, we know $\mathbb{H}^s = H^s_0$ for $1/2 < s < 1$ by an interpolation argument, meaning that, at the minimum, our functions need to vanish at the boundary.

Eigenfunctions and Eigenvalues of the Laplacian of the “Pacman” Domain

We will derive eigenfunctions and eigenvalues on a Pacman domain, which in polar coordinates is $\Omega = \{(r, \theta) : r \in [0, 1], \theta \in [0, 3\pi/2]\}$.
The problem is
\begin{align*}
-\Delta u &= \lambda u \qquad \text{in } \Omega\\
u &= 0 \qquad \text{on } \partial \Omega.
\end{align*}

In polar coordinates, the Laplacian is
\begin{align}
\Delta = \frac{\partial^2 }{\partial r^2} + \frac{1}{r} \frac{\partial}{\partial r} + \frac{1}{r^2} \frac{\partial^2}{\partial \theta^2}.
\end{align}
Thus, using separation of variables $u(r, \theta) = R(r) \Theta(\theta)$ where $R(1) = 0, \Theta(0) = \Theta(3\pi/2) = 0$, we have
\begin{align*}
\Delta u &= \Theta R'' + \frac{1}{r} R' \Theta + \frac{1}{r^2} R \Theta'' = -\lambda R \Theta.
\end{align*}
Simplifying, we have
\begin{align}\label{eqn:sum0}
\frac{r^2 R'' + r R' + \lambda r^2 R}{R} + \frac{\Theta''}{\Theta} = 0.
\end{align}
In order for the above to be satisfied, we need each term to be constant, so assume that
\begin{align*}
\frac{\Theta''}{\Theta} = -\lambda_\theta
\end{align*}
where $-\lambda_\theta$ is a constant.
Taking into account the boundary condition, we know that
\begin{align*}
\Theta(\theta) = \sin\left(\frac{2}{3}n \theta \right)
\end{align*}
and $\lambda_\theta = \frac{4}{9}n^2$ for positive integers $n$.

Now, using \cref{eqn:sum0}, we have the corresponding ODE for the $R$ variable
\begin{align*}
r^2 R'' + r R' + \left(\lambda r^2 - \frac{4}{9}n^2\right) R = 0.
\end{align*}
Let $\rho = \sqrt\lambda\, r$; then $R_r = R_\rho \frac{d\rho}{dr} = \sqrt\lambda R_\rho$ and hence $R_{rr} = \lambda R_{\rho\rho}$, so
\begin{align*}
\rho^2 R'' + \rho R' + \left(\rho^2 - \frac{4}{9} n^2\right) R = 0.
\end{align*}
This is Bessel’s equation of order $2n/3$, so $R(\rho) = J_{2n/3}(\rho)$ where $J$ is the Bessel function of the first kind (the second solution $Y_{2n/3}$ is singular at the origin, so we discard it).

It remains to impose the boundary condition $R = 0$ at $r = 1$:
\begin{align*}
R(r) = J_{2n/3}(\sqrt\lambda\, r), \qquad J_{2n/3}(\sqrt{\lambda}) = 0,
\end{align*}
meaning that $\lambda = \alpha_{2n/3, k}^2$ for $k \ge 1$, where $\alpha_{2n/3, k}$ is the $k$-th positive zero of $J_{2n/3}$; these are the eigenvalues.
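
A small numerical sketch of these eigenvalues in Python; since scipy’s jn_zeros only handles integer orders, we bracket sign changes of $J_{2n/3}$ by hand (the step size is an assumption that works here because consecutive Bessel zeros are roughly $\pi$ apart).

import numpy as np
from scipy.optimize import brentq
from scipy.special import jv

def bessel_zeros(order, k_max, step=0.5):
    """First k_max positive zeros of J_order, via sign-change bracketing."""
    zeros, x = [], step
    while len(zeros) < k_max:
        if jv(order, x) * jv(order, x + step) < 0:
            zeros.append(brentq(lambda t: jv(order, t), x, x + step))
        x += step
    return np.array(zeros)

# eigenvalues lambda = alpha_{2n/3, k}^2 of the Pacman domain
for n in [1, 2, 3]:
    print(n, bessel_zeros(2 * n / 3, 3) ** 2)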

Scaling Arguments

This is a pretty important concept in PDEs and their numerical approximation. Specifically, it shows up in the Bramble-Hilbert lemma and in domain decomposition analysis.
Most of this post was written right after reading Toselli and Widlund’s book, so there is a lot of resemblance.

Let $\Omega$ be a bounded domain in $\mathbb{R}^n$ which is ‘nice’ (say, with Lipschitz boundary) and of radius $h$. Now let $u, v \in H^1(\Omega)$ be such that
\begin{align*}
|v|_{H^1(\Omega)} \le C||u||_{H^1(\Omega)}
\end{align*}
and we wish to extract the $h$-dependence of $C$.

What we do first is consider a scaled domain $\hat \Omega$, which is just $\Omega$ scaled to radius 1, via the change of variables $x = h\hat x$.
If we find the corresponding inequality on $\hat \Omega$, then the constant $C$ there will not depend on $h$.
Let $\hat v(\hat x) := v(h\hat x)$; then $\hat \nabla \hat v(\hat x) = h\,(\nabla v)(h\hat x)$, where $\hat \nabla$ is the gradient with respect to $\hat x$.
Then,
\begin{align*}
|v|^2_{H^1(\Omega)} &= \int_\Omega |\nabla v(x)|^2 \, dx \\
&= \int_{\hat \Omega} |(\nabla v)(h \hat x)|^2 h^n \, d\hat x \\
&= \int_{\hat \Omega} |\hat\nabla \hat v(\hat x)|^2 h^{-2} h^n \, d\hat x = h^{n-2}|\hat v|_{H^1(\hat \Omega)}^2.
\end{align*}

But for the $L^2$ norm, there is no $h^{-2}$ factor from differentiating, hence
\begin{align*}
||u||_{L^2(\Omega)}^2 &= \int_\Omega |u(x)|^2 \, dx \\
&= \int_{\hat \Omega} |u(h \hat x)|^2 h^n \, d\hat x = h^n ||\hat u||_{L^2(\hat \Omega)}^2.
\end{align*}
Combining the two: if $|\hat v|_{H^1(\hat \Omega)} \le \hat C ||\hat u||_{H^1(\hat \Omega)}$ on the reference domain, then transforming back gives, up to constants, $|v|_{H^1(\Omega)} \le \hat C \left(h^{-1}||u||_{L^2(\Omega)} + |u|_{H^1(\Omega)}\right)$. This is why mixing derivatives of different orders causes $h$-dependent constants.
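
A quick numerical check of the two scalings in 1D ($n = 1$, so the squared $L^2$ norm scales like $h$ and the squared $H^1$ seminorm like $h^{-1}$); the choice $\hat u = \sin$ is arbitrary.

import numpy as np
from scipy.integrate import quad

u_hat, du_hat = np.sin, np.cos  # reference function and its derivative

for h in [1.0, 0.5, 0.25]:
    # u(x) = u_hat(x / h) on (0, h), so u'(x) = du_hat(x / h) / h
    l2 = quad(lambda x: u_hat(x / h) ** 2, 0, h)[0]
    h1 = quad(lambda x: (du_hat(x / h) / h) ** 2, 0, h)[0]
    print(h, l2, h1)  # l2 halves with h, h1 doubles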

Putnam 2003 A2

Let $a_1, \ldots, a_n$ and $b_1, \ldots, b_n$ be non-negative real numbers. Show that $$(a_1\ldots a_n)^{1/n} + (b_1\ldots b_n)^{1/n} \le [(a_1 + b_1) \cdots (a_n + b_n)]^{1/n}.$$

Solution: we will use the generalized Hölder inequality, which states that
$$||f_1\ldots f_n ||_1 \le ||f_1||_{\lambda_1} \cdots ||f_n||_{\lambda_n}$$
for weights $\lambda_i > 1$ satisfying $\lambda_1^{-1} + \cdots + \lambda_n^{-1} = 1$.

Assuming this is true, let $f_i = (a_i^{1/n}, b_i^{1/n})$ and take the norms to be the discrete $\ell^p$ norms. This gives $||f_1 \ldots f_n||_1 = (a_1\ldots a_n)^{1/n} + (b_1\ldots b_n)^{1/n}$, as everything is non-negative. Taking the uniform weights $\lambda_i = n$, the factors on the right-hand side become
$$||f_i||_{n} = (a_i + b_i)^{1/n}$$
and we have our inequality.

The sole remaining thing to prove is the generalized Hölder inequality. We take the classical two-function case as the base case. For the inductive step, we have
\begin{align*}
||f_1\cdots f_{n+1}||_1 &\le ||f_1 \cdots f_n||_{\lambda_{n+1}/(\lambda_{n+1} - 1)} ||f_{n+1}||_{\lambda_{n+1}} \\
&= ||(f_1 \cdots f_n)^{\lambda_{n+1}/(\lambda_{n+1} - 1)}||_1^{(\lambda_{n+1} - 1)/\lambda_{n+1}} ||f_{n+1}||_{\lambda_{n+1}}.
\end{align*}
From here, set $q = \lambda_{n+1}/(\lambda_{n+1} - 1)$ and apply the inductive case to $f_1^q, \ldots, f_n^q$ with the weights $\mu_i = \lambda_i / q$, which satisfy $\sum_{i=1}^n \mu_i^{-1} = q \sum_{i=1}^n \lambda_i^{-1} = q(1 - \lambda_{n+1}^{-1}) = 1$; unwinding the norms ($||f_i^q||_{\mu_i} = ||f_i||^q_{\lambda_i}$) finishes the proof.
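
A quick numerical sanity check of the inequality itself, on random non-negative data:

import numpy as np

rng = np.random.default_rng(0)
n = 5
a, b = rng.random(n), rng.random(n)

lhs = np.prod(a) ** (1 / n) + np.prod(b) ** (1 / n)
rhs = np.prod(a + b) ** (1 / n)
print(lhs <= rhs, lhs, rhs)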