Putnam 2020 A2

For $k$ a non-negative integer, evaluate
\sum_{j=0}^k 2^{k-j} \binom{k +j}{j}.

Apparently, I only know how to do induction anymore now but this problem is straightforward using it. We suppose the inductive hypothesis, and assume that for a fixed $k$ that
\sum_{j=0}^k 2^{k-j} \binom{k +j}{j} = 2^{2k}.
We will proceed to show
\sum_{j=0}^{k + 1} 2^{k + 1-j} \binom{k + 1 +j}{j} = 2^{2k + 2}.

By a simple binomial identity, we have
\sum_{j=0}^{k + 1} 2^{k + 1-j} \binom{k + 1 +j}{j} &= \sum_{j=0}^{k + 1} 2^{k + 1-j} \binom{k+j}{j} + \sum_{j=0}^{k + 1} 2^{k + 1-j} \binom{k +j}{j – 1}.
Let $A$, $B$ be the first and second term respectively on the right hand side, then
A &= 2\sum_{j=0}^{k} 2^{k-j} \binom{k+j}{j} + \binom{2k + 1}{k+1} \\
&= 2^{2k + 1} + \binom{2k + 1}{k+1}.
For the other term,
B &= \sum_{j=1}^{k + 1} 2^{k + 1-j} \binom{k +j}{j – 1} \\
&= \sum_{j=0}^{k} 2^{k-j} \binom{k +1 +j}{j} \\
&= \frac{1}{2} \sum_{j=0}^{k + 1} 2^{k + 1-j} \binom{k +1 +j}{j} – \frac{1}{2} \binom{2k + 2}{k + 1}.
Thus, we rearranging, we have our result
\frac{1}{2}\sum_{j=0}^{k + 1} 2^{k + 1-j} \binom{k + 1 +j}{j} &= 2^{2k+1} + \binom{2k + 1}{k+1} – \frac{1}{2} \binom{2k + 2}{k + 1} \\
&= 2^{2k+1}.

Scaling Arguments

This is a pretty important concept in PDEs and its numerical approximations. Specifically, tt shows up in Bramble-Hilbert lemma, and domain decomposition analysis.
Most of this post is pretty much written right after reading Toselli and Widlund’s book, so there are a lot of resemblance.

Let $\Omega$ be a bounded domain in $\mathbb{R}^n$ which is ‘nice’ (say Lipschitz boundary) with radius $h$. Now let $u, v \in H^1(\Omega)$ such that
|v|_{H^1(\Omega)} \le C||u||_{H^1(\Omega)}
and we wish to obtain the $h$ dependence from $C$.

What we do is to first consider a scaled domain $\hat \Omega$ which is just $\Omega$ scaled to be of radius 1, with the change of basis $x = h\hat x$.
If we find the corresponding inequality on $\hat \Omega$, then the constant $C$ will not depend on $h$.
Let $\hat v(\hat x) := v(h\hat x)$, then we note that $\hat \nabla \hat v(\hat x) = h\hat \nabla v(h\hat x)$ where $\hat \nabla $ is the gradient with respect to $\hat x$.
|v|^2_{H^1(\Omega)} &= \int_\Omega |\nabla v(x)|^2 \, dx \\
&= \int_{\hat \Omega} |\hat\nabla v(h \hat x)|^2 h^n \, d\hat x \\
&= \int_{\hat \Omega} |\hat\nabla \hat v(\hat x)|^2 h^{-2} h^n \, d\hat x = h^{n-2}|\hat v|_{H^1(\hat \Omega)}^2

But for $L^2$ norm, there is no $h^2$ scaling, hence
||u||_{L^2(\Omega)}^2 &= \int_\Omega |u(x)|^2 \, dx \\
&= \int_{\hat \Omega} |u(h \hat x)|^2 h^n \, d\hat x = h^n ||\hat u||_{L^2(\hat \Omega)}^2.
This is why derivatives mixing causes scaling issues.

Putnam 2003 A2

Let $a_1, \ldots, a_n$ and $b_1, \ldots, b_n$ be non-negative real numbers. Show that $$(a_1\ldots a_n)^{1/n} + (b_1\ldots b_n)^{1/n} \le [(a_1 + b_1) \cdots (a_n + b_n)]^{1/n}.$$

Solution: we will use the generalized Holder’s inequality which states that
$$||f_1\ldots f_n ||_1 \le ||f_1||_{\lambda_1} \cdots ||f_n||_{\lambda_n}$$
for lambda weights $\lambda_1^{-1} + \cdots + \lambda_n^{-1} = 1$ all greater than 1.

Assuming this is true, let $f_i = (a_i^{1/n}, b_i^{1/n})$ and the norms be the discrete $l^p$ norm. This will give us $||f_1 \ldots f_n||_1 = (a_1\ldots a_n)^{1/n} + (b_1\ldots b_n)^{1/n}$ as everything is non-negative. The weight will be uniform $\lambda_i = n$, then the right hand side will be
$$||f_i||_{n} = (a_i + b_i)^{1/n}$$
and we have our inequality.

The sole remaining thing to prove is the generalized Holder’s inequality. We will assume the famous base case of the two element case. In the inductive case, we have
||f_1\cdots f_{n+1}||_1 &\le ||f_1 \cdots f_n||_{\lambda_{n+1}/(\lambda_{n+1} – 1)} ||f_{n+1}||_{\lambda_{n+1}} \\
&= ||(f_1 \cdots f_n)^{\lambda_{n+1}/(\lambda_{n+1} – 1)}||_1^{(\lambda_{n+1} – 1)/\lambda_{n+1}} ||f_{n+1}||_{\lambda_{n+1}}.
From here, just change the weights and use the inductive case and we are done.

Putnam 2003 A1

Let n be a fixed positive integer. How many ways are there to write n as a sum of positive integers, n = a_1 + a_2 + \cdots + a_k, with k an arbitrary positive integer and a_1 \le a_2 \le \cdots \le a_k \le a_1 + 1? For example with n = 4, there are 4 ways: 4, 2 + 2, 1 + 1+ 2, 1+1+1+1.

Solution: Denote K_n to be the set of tuples of (a_1, \ldots, a_k) with the above properties. We claim that |K_n| = n. We will use induction. It is easy to verify the claim for |K_n| = 1,2,3,4 for n = 1,2,3,4 respectively.

Assume that |K_{l}| = l for some positive integer l, then for a given tuple (a_1, \ldots, a_k) \in K_l we can add 1 to one of the elements in the tuple, and still preserve the property that a_1 \le a_2 \le \cdots \le a_k \le a_1 + 1. If a_1 \not = a_k, then we simply can add 1 where the integers jump, otherwise a_1 = a_2 = \cdots = a_k and we can just have a_k + 1. This gives rise to tuples which are in K_{l+1}. Finally, we have a tuple of 1s to add; this results in |K_{l+1}| = l+1.

We are not done here, as we would need to show that there exists no other tuples in K_{l+1} that we cannot construct as above. This is easy to see, as we can do the inverse operation of subtracting one (with the exception of the tuple of all 1s).

Vectorization over C

The title is probably misleading, but this is a lesson I needed to talk about.

I wrote out some simple code for the quadrature over the reference triangle last time, which involves a double loop. To my chagrin, my immediate reaction to speeding up the code was to put it into Cython, and give it some type declaration.

This did speed up my integrals, but not as much as vectorization. By simply condensing one of the loops into a dot product, and using vector-function evaluation, I sped up my code a substantial amount, especially with higher order integration of “hard” functions.

def quadtriangle_vector(f, w00, w10, x00, x10):
total = 0
for i in range(len(w00)):
total += w00[i] * np.dot(w10 / 2, f([(1 + x00[i]) * (1 - x10) / 2 - 1, x10]))
return total

To see what I mean, consider the following function

from scipy.special import eval_jacobi as jac
def f(x):
return jac(2, 1, 1, np.sin(x[0] - x[1]))
p = 20
x00, w00 = rj(p + 1, 0, 0)
x10, w10 = rj(p + 1, 1, 0)

The speedup I get is staggering.

%timeit quadtriangle(f, w00, w10, x00, x10)
3.11 ms ± 24.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit quadtriangle_vector(f, w00, w10, x00, x10)
348 µs ± 1.67 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Also, I tried to fully vectorize by removing the outer-loop. This actually slowed down the code a bit. Maybe I did it wrong? But for now, I’m decently happy with the speed.

SPD Matrices and a False Inequality

This one is from my research, and it’s a doozy. Given two vectors x, y such that each element in x is less in absolute value than the corresponding element in y, show that for any SPD matrix A that x^TAx \le y^T A y.

After spending a good amount of timing trying to prove this, I realized that this is in general not true (in fact, the result I was suppose to be chasing would’ve been disproven if the above statement was true). As a counter example, consider the following counterexample from a Bernstein basis application:

Let x = [1, 0, -1/3], y = [1, -1, 1/3]. Let the matrix be

[2/7, 1/7, 2/35; 1/7, 6/35, 9/70 ; 2/35, 9/70, 6/35].

Then the quadratic forms will be 4/15 and 1/7 respectively.

A Matrix Inequality

An exercise from Braess FEM book: for A, B symmetric, positive definite matrices, let A \le B. We want to show that the inverses satisfies a similar property B^{-1} \le A ^{-1}.

The book actually gave quite a lot of hints for this one.

\begin{aligned}x^TB^{-1}x &= x^T A^{-1/2} A^{1/2} B^{-1} x \\&\le \sqrt{x^T A^{-1} x} \sqrt{x^T B^{-1} A B^{-1} x}.\end{aligned}

Then from the hypothesis, A \le B \implies x^T(B - A)x \ge 0 \implies y^T(B^{-1} - B^{-1}AB^{-1})y \ge 0.
So we can plug this in to our equation above to find that x^TB^{-1}x \le \sqrt{x^T A^{-1} x}\sqrt{x^T B^{-1}x}.

The (lack of a) Matrix

I think I finally understand why software packages like PETSc has an option for an operator when doing something like conjugate gradient. Why isn’t having a matrix good enough for everyone?

Well turns out that while all linear operators can be translated to a matrix, it may not be the best way to represent the operator. As an example, consider a basis transformation from Bernstein polynomials to Jacobi (or vice versa). It’s certainly possible to find and construct a matrix which does the operation, but it’s ugly.

On the other hand, it’s not that bad to write a code which utilizes the properties of the polynomials and convert it within in O(n^2) time. The key is that a Jacobi polynomial is a sum of Bernsteins, and Bernsteins can be degree raised or dropped at will.

This function will outperform the matrix in many sense. For one, there’s no need to construct a matrix, which will take n^2 operations in the first place. Next, matrix multiplication will take a n^3 operation, so if we optimize enough, we will always beat it. Finally, it’s really less painful to code, because each line of the function serves a visible purpose.

Anyways, I’m sold.

(I’ll eventually publish the code in the summer)

Burgers Equation

For conservation laws in general, there’s the implicit solution of u = F(x - ut). It’s surprisingly useful for quick and dirty “exact” solutions to a lot of problems. In particular inviscid Burgers’ equation with initial values becomes trivial to code up.

For example, in Python it would be literally one line:

return optimize.fsolve(lambda u: u - self.u0(x - u*t), 1)

Spectacular Books

Mathematicians suck at writing. This is part of the reasons why they went into math in the first place, because they suck at writing. Sucking at writing doesn’t mean that mathematicians don’t write books though; in fact there are tons of math books written by mathematicians.

The problem is most of them suck.

Half of them are for geniuses written by geniuses.

The other half are for geniuses written by borderline geniuses.

By far my two favorite books are William’s Probability with Martingales and Trefethen’s NLA book. Both British authors. Both written in a highly colloquial style.

It’s really too bad academics have this ego-stroking urge to write to the highest denominator rather than to, say grad students or undergrad.

God damn.