Marshall Jiang – Page 2 – Quirky Quintet

I started the Neapolitan quartet, which the NYTimes deemed the “best book of the 21st century”, sometime around September last year and finally finished it this past month. It took this long not because of the quality of the book, but because of travels and just… life. I’ve always liked bildungsromans, and this trend continues.

The series is huge, and covers the life and death of a friendship alongside themes and reflections ranging from the mundane, like clothing, to the profound, like love. It runs from the very beginning, when they were kids who believed in ogres, to the end, when losses and lovers have peppered their lives. Throughout the fictional autobiography, our narrator Elena offers her sincerest opinions, and it’s like peering out into the world from inside her head: a view of Naples through the lens of an intelligent Italian woman who grew up in a poor, violence-laden neighborhood.

And I understood her. Mostly.

I understood why she and some of her friends hated the Solaras, why she left her husband, and why her relationship with Lina is as turbulent as it is. In my opinion, it’s a story about the brilliant but trapped Lina as much as it is about our Elena. At the same time, I can feel the narrator’s humanity, meaning her flaws are there too. I felt her selfishness and envy and lust and ire. I found myself judging her for the way she handled her paramour and his relationship with her kids. Likewise for what I perceive to be a betrayal of her promise to Lina.

A very human book.

The prose is impeccably detailed, with remarks on why she felt this or that, and why her friends might also feel a certain way. At times, the stream of consciousness far surpasses my own internal monologue, and I wonder if I’m the weird one who never thinks these thoughts or notices these social interactions.

My main criticism is how much politics is discussed, with little background information to help the reader. The entire time, I was confused about what the parties stood for and how they intertwined with the beliefs of the characters. For something as overarching as this, I would’ve loved for the English translations to provide slightly more context.

As a side note, I also finished listening to John Green’s Everything Is Tuberculosis. It’s a very short book that discusses the history of tuberculosis, its treatment, and its societal impacts. Ultimately, it felt like a cry for a more caring world, and it is worth a glance.

March 26, 2025 (updated June 1, 2025)

PyTorch and register_full_backward_hook

The module method register_full_backward_hook is somewhat esoteric. The user is supposed to provide a function hook(module, grad_input, grad_output) -> tuple(Tensor) or None, which will be executed “every time the gradients with respect to a module are computed.” But what exactly are grad_input and grad_output? I think one of the simpler ways to understand them is through an adjoint formulation.
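To make the signature concrete, here is a minimal sketch, assuming a stock nn.Linear as a stand-in module and random inputs (inspect_hook, doubling_hook and input_grad are hypothetical helpers, not part of PyTorch). It shows that both arguments arrive as tuples, and that returning a tuple from the hook replaces grad_input for the rest of the backward pass, as the PyTorch documentation describes.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(3, 2)          # stand-in for a generic module
inputs = torch.randn(4, 3)       # a batch of 4 samples
seen = {}

def inspect_hook(module, grad_input, grad_output):
    # Both arguments are tuples: grad_output holds dL/d(output),
    # grad_input holds dL/d(input) for each tensor positional input.
    seen["grad_input"], seen["grad_output"] = grad_input, grad_output
    return None  # None means "leave grad_input unchanged"

def doubling_hook(module, grad_input, grad_output):
    # Returning a tuple replaces grad_input for the rest of the backward pass.
    return tuple(None if g is None else 2 * g for g in grad_input)

def input_grad(hook):
    """Run forward/backward with `hook` attached and return dL/d(inputs)."""
    x = inputs.clone().requires_grad_(True)
    handle = layer.register_full_backward_hook(hook)
    layer(x).sum().backward()
    handle.remove()
    return x.grad

plain = input_grad(inspect_hook)
doubled = input_grad(doubling_hook)

print([g.shape for g in seen["grad_output"]])  # [torch.Size([4, 2])]
print([g.shape for g in seen["grad_input"]])   # [torch.Size([4, 3])]
print(torch.allclose(doubled, 2 * plain))      # True
```

Because the returned tuple replaces grad_input, a hook like doubling_hook silently rescales every gradient upstream of the module, so returning None is the safer default when the goal is only to inspect values.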

For the sake of simplicity, consider an $N$-layer neural network whose layers have the form
$$
x_{n+1} = \Phi_n(x_n)
$$
where $x_n$ is the input to the $n$th layer and $\Phi_n$ is a parameterized function representing the $n$th layer. Suppose now that we use an $L^2$ loss
$$
\min \frac{1}{2} \|y - x_N\|^2
$$
where $y$ is the sample label.

We can rewrite the above as a constrained minimization problem
$$
\min \frac{1}{2} \|y - x_N\|^2
$$
subject to $x_N = \Phi_{N-1}(x_{N-1}), \ldots, x_1 = \Phi_0(x_0)$ for some sample input $x_0$.

To handle the constraints, we introduce Lagrange multipliers $\lambda_1, \ldots, \lambda_N$ and form the Lagrangian
$$
\mathcal L = \frac{1}{2} \|y - x_N\|^2 + \sum_{i=0}^{N-1} \langle \lambda_{i+1}, \Phi_{i}(x_{i}) - x_{i+1} \rangle,
$$
where $\langle \cdot, \cdot \rangle$ is simply the dot product. If we take the gradient with respect to $\lambda_1, \ldots, \lambda_{N}$ and set it equal to zero, we recover our forward dynamics. This part is straightforward and mirrors what register_forward_hook observes.

The real fun begins when we take the gradient with respect to the variables $x_1, \ldots, x_N$:
\begin{align*}
(x_N - y)^T - \lambda_{N}^T &= 0 \\
\lambda_{N}^T \nabla \Phi_{N-1}(x_{N-1}) - \lambda_{N-1}^T &= 0 \\
\lambda_{N-1}^T \nabla \Phi_{N-2}(x_{N-2}) - \lambda_{N-2}^T &= 0 \\
&\;\;\vdots
\end{align*}
In particular, we can read the above as backward dynamics for the so-called “adjoint” variable, with initial condition $\lambda_N = x_N - y$ and update $\lambda_{i-1} = (\nabla \Phi_{i-1}(x_{i-1}))^T \lambda_{i}$. Note that we’re pretty liberal with our notation for gradients and transposes, so some dimensions may be off. Unrolling the recursion shows that $\lambda_i$ is just the gradient of the loss with respect to $x_i$, and as it turns out, this adjoint variable is exactly what register_full_backward_hook reports: for the layer $\Phi_n$, grad_output holds $\lambda_{n+1}$ and grad_input holds $\lambda_n$.
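The recursion holds for nonlinear layers too. The following sketch assumes hypothetical layers $\Phi_n(x) = \tanh(W_n x + b_n)$ with random weights, builds each Jacobian $\nabla \Phi_n(x_n)$ with torch.autograd.functional.jacobian, runs $\lambda_n = (\nabla \Phi_n(x_n))^T \lambda_{n+1}$ backward from $\lambda_N = x_N - y$ (the gradient of the $L^2$ loss), and compares every $\lambda_n$ with the gradient torch.autograd.grad reports for the corresponding $x_n$. Double precision is used only to keep the comparison tight.

```python
import torch
from torch.autograd.functional import jacobian

torch.manual_seed(0)
N, k = 3, 4
dt = torch.float64
# Hypothetical nonlinear layers Phi_n(x) = tanh(W_n x + b_n).
Ws = [torch.randn(k, k, dtype=dt) for _ in range(N)]
bs = [torch.randn(k, dtype=dt) for _ in range(N)]

def phi(n, x):
    return torch.tanh(Ws[n] @ x + bs[n])

# Forward dynamics, keeping every x_n in the autograd graph.
xs = [torch.randn(k, dtype=dt, requires_grad=True)]
for n in range(N):
    xs.append(phi(n, xs[-1]))
y = torch.randn(k, dtype=dt)
loss = 0.5 * torch.sum((y - xs[-1]) ** 2)

# Reference: autograd's gradient of the loss w.r.t. every x_n.
grads = torch.autograd.grad(loss, xs)

# Adjoint recursion: lambda_N = x_N - y, lambda_n = J_n(x_n)^T lambda_{n+1}.
lam = [None] * (N + 1)
lam[N] = (xs[N] - y).detach()
for n in range(N - 1, -1, -1):
    J = jacobian(lambda x: phi(n, x), xs[n].detach())  # shape (k, k)
    lam[n] = J.T @ lam[n + 1]

print(all(torch.allclose(l, g) for l, g in zip(lam, grads)))  # True
```

Forming full Jacobians is only sensible for tiny layers; autograd never materializes them and instead computes the vector-Jacobian products $\lambda^T \nabla \Phi$ directly.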

Why is this useful, though? Why does PyTorch calculate $\lambda_i$ during the backward pass? We introduce one more bit of notation: let $\theta_n$ be the parameters of $\Phi_n$. The backward pass must calculate $\frac{\partial \mathcal L}{\partial \theta_n}$ for each $n$, and since $\theta_n$ only enters through $x_{n+1} = \Phi_n(x_n)$, the chain rule gives
$$
\frac{\partial \mathcal L}{\partial \theta_n} = \frac{\partial \mathcal L}{\partial x_{n+1}} \frac{\partial x_{n+1}}{\partial \theta_n}.
$$
The second factor depends only on the layer type and is easy to compute locally, while the first factor is simply the adjoint variable! Thus, once the adjoint has been propagated backward, it can be reused to compute every parameter gradient.
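To see the parameter gradient formula in action, here is a sketch assuming a hypothetical stack of bias-free nn.Linear layers, $x_{n+1} = W_n x_n$, applied to a batch. A forward hook stores each layer input $x_n$, and a full backward hook stores grad_output, i.e. the adjoint $\lambda_{n+1}$. For a linear layer the local factor $\partial x_{n+1} / \partial W_n$ reduces to an outer product, so $\partial \mathcal L / \partial W_n$ should equal the batch sum of $\lambda_{n+1} x_n^T$, which the last loop compares with the weight.grad computed by autograd.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Hypothetical stack of bias-free linear layers, x_{n+1} = W_n x_n.
layers = nn.Sequential(*[nn.Linear(4, 4, bias=False) for _ in range(3)])
inputs, adjoints = {}, {}

def save_input(module, args, output):
    inputs[module] = args[0].detach()           # x_n

def save_adjoint(module, grad_input, grad_output):
    adjoints[module] = grad_output[0].detach()  # lambda_{n+1} = dL/dx_{n+1}

handles = []
for m in layers:
    handles.append(m.register_forward_hook(save_input))
    handles.append(m.register_full_backward_hook(save_adjoint))

x0 = torch.randn(5, 4)   # batch of 5 samples
y = torch.randn(5, 4)
loss = 0.5 * ((y - layers(x0)) ** 2).sum()
loss.backward()
for h in handles:
    h.remove()

# For a linear layer, dL/dW_n is the adjoint times the layer input,
# summed over the batch: sum_b lambda_{n+1,b} x_{n,b}^T.
for m in layers:
    manual = adjoints[m].T @ inputs[m]
    print(torch.allclose(manual, m.weight.grad, atol=1e-5))  # True for each layer
```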

Now, with all the theory out of the way, let’s work through a simple example in PyTorch to check that practice matches theory. For simplicity, we consider the following linear layers
$$
x_{n+1} = (A_n + I)x_n
$$
where $A_n$ is a $k \times k$ matrix, $I$ is the $k \times k$ identity, and $x_n$ is the input from the previous layer. While the final output could simply be represented by a single matrix-vector product, we will stick with this layered formulation. The adjoint update is then just multiplication by the transposed layer matrix, $\lambda_n = (A_n + I)^T \lambda_{n+1}$. The following code block checks this derivation numerically.

import torch
import torch.nn as nn

# Each layer computes Phi_n(u) = (A + I) u
class Layer(nn.Module):
    def __init__(self):
        super().__init__()
        # Just some fixed matrix (the same one in every layer)
        self.A = nn.Parameter(
            torch.tensor([[1.0, -2.0, -1.0], [-2.0, 1.0, -2.0], [-1.0, -2.0, 1.0]]) / 4
        )
    def forward(self, u):
        return u + self.A @ u

# The model chains all of the layers together
class Model(nn.Module):
    def __init__(self, n=3):
        super().__init__()
        self.layers = nn.ModuleList([Layer() for _ in range(n)])
    def forward(self, u):
        for layer in self.layers:
            u = layer(u)
        return u

# The explicit adjoint: multiply by (A + I)^T, walking the layers in reverse
class Adjoint(nn.Module):
    def __init__(self, forward_model):
        super().__init__()
        self.layers = forward_model.layers
    def forward(self, u):
        print(f'Input: {u=}')
        for layer in reversed(self.layers):
            print(u, end=' \t')
            u = u + layer.A.T @ u
            print(u)
        return u

# Generate random data and label
u0 = torch.randn(3)
y = torch.randn(3)
model = Model()
model_adjoint = Adjoint(model)

# Define the hook function; it must have this signature
def hook_fn(module, grad_input, grad_output):
    print(f'{grad_output=}, {grad_input=}')

# Register the hook function on each layer
hooks = []
for layer in model.layers:
    hook = layer.register_full_backward_hook(hook_fn)
    hooks.append(hook)
# Perform a forward and a backward pass
out = model(u0)
loss = 0.5 * torch.norm(out - y) ** 2
loss.backward()
print(model_adjoint(out - y))
# Remove the hooks
for hook in hooks:
    hook.remove()

Running it produces output like the following (the exact numbers depend on the random draws):

grad_output=(tensor([-3.4323,  2.1327,  0.9475]),), grad_input=(tensor([-5.5936,  3.9083,  0.9761]),)
grad_output=(tensor([-5.5936,  3.9083,  0.9761]),), grad_input=(tensor([-9.1902,  7.1942,  0.6644]),)
grad_output=(tensor([-9.1902,  7.1942,  0.6644]),), grad_input=(None,)
Input: u=tensor([-3.4323,  2.1327,  0.9475], grad_fn=<SubBackward0>)
tensor([-3.4323,  2.1327,  0.9475], grad_fn=<SubBackward0>) 	tensor([-5.5936,  3.9083,  0.9761], grad_fn=<AddBackward0>)
tensor([-5.5936,  3.9083,  0.9761], grad_fn=<AddBackward0>) 	tensor([-9.1902,  7.1942,  0.6644], grad_fn=<AddBackward0>)
tensor([-9.1902,  7.1942,  0.6644], grad_fn=<AddBackward0>) 	tensor([-15.2510,  13.2556,  -0.4691], grad_fn=<AddBackward0>)
tensor([-15.2510,  13.2556,  -0.4691], grad_fn=<AddBackward0>)

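Hence the values of the adjoint match those computed in the hooks: each hook’s grad_output is the adjoint arriving from the layer above, and its grad_input is the adjoint passed on to the layer below. Note that the explicit adjoint calculation performs one additional step that the hooks do not report: the first layer’s grad_input is None, because u0 does not require gradients.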
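One caveat about the example above: every Layer is initialized with the same matrix, and that matrix is symmetric, so the check cannot tell $(A_n + I)^T$ apart from $A_n + I$, nor distinguish walking the layers forward from walking them backward. A sketch of a stricter variant, assuming a random (generally non-symmetric) $A_n$ per layer and a seed of 0, collects grad_output from each hook and compares it with the explicit recursion $\lambda_n = (A_n + I)^T \lambda_{n+1}$ run over the layers in reverse.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
k, N = 3, 3

class Layer(nn.Module):
    """x_{n+1} = (A_n + I) x_n with a random, generally non-symmetric A_n."""
    def __init__(self):
        super().__init__()
        self.A = nn.Parameter(0.25 * torch.randn(k, k))
    def forward(self, u):
        return u + self.A @ u

layers = nn.ModuleList([Layer() for _ in range(N)])
grad_outputs = []  # filled in backward order: the last layer fires first

def hook_fn(module, grad_input, grad_output):
    grad_outputs.append(grad_output[0].detach())

handles = [layer.register_full_backward_hook(hook_fn) for layer in layers]
u0, y = torch.randn(k), torch.randn(k)
out = u0
for layer in layers:
    out = layer(out)
loss = 0.5 * torch.sum((out - y) ** 2)
loss.backward()
for h in handles:
    h.remove()

# Explicit adjoint: start from x_N - y and walk the layers in reverse,
# multiplying by the *transposed* layer matrix (A_n + I)^T.
with torch.no_grad():
    lam = out - y
    adjoints = [lam]
    for layer in reversed(layers):
        lam = lam + layer.A.T @ lam
        adjoints.append(lam)

# adjoints[:N] are lambda_N, ..., lambda_1; the hooks report the same values.
print(all(torch.allclose(a, g, atol=1e-5) for a, g in zip(adjoints, grad_outputs)))  # True
```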

December 17, 2024

Arcane Season 2

I wrote about Arcane season 1 a while ago.

The second season is also fantastic.

A bit rushed and somewhat confusing at times (one must pay attention at all times!), but the meat of the story is well-written with quite a few seriously poignant scenes.

Still highly recommend.

December 4, 2024 (updated December 8, 2024)

Remarkably Bright Creatures

I went on a short road trip to Denver this past week and finally used the Spotify audiobooks feature. The bright cover caught my eye, and I listened to all of Remarkably Bright Creatures during the 12 hours on the road.

The book received a lot of attention on the internet, but, frankly, it annoyed me more than I enjoyed it. This was because the characters, while technically fleshed out, all had personality traits that gnawed at me.

The shopkeeper couldn’t stop gossiping. The main character Tova embodied some of the worst of what I think of as “boomer” traits. The biggest culprit was Cameron, arguably the worst man-child I’ve ever encountered in a book, and I found it hard to listen to his excuses in his chapters. To be fair, it could be that the audiobook narrator did a fantastic job of voicing him.

The plot was also laid wide open before the halfway point, after the octopus revealed he could discern genetic relations (which is… silly, but this is fiction after all). The book then became an exercise in dramatic irony, the main question being what small knots the author would introduce before a happy conclusion.

I was… also frankly disappointed by the message. Tova is a character beset by tragedy. Her husband passed away a few years ago, and her son died by suicide long before that. Instead of exploring how her grief plays out in a mentor-mentee relationship, the author added the twist of making Cameron and Tova related. This seems to be the only thing that satisfies Tova: finding family again. Why couldn’t she live with her grief, something all of us must toil through? Should I start donating sperm to have unexpected grandkids in the future?

Maybe I need to stick with “high” literature for now. Compared to other books I’ve read recently, this one just seemed so lacking in substance.

I did like the octopus’ sass though. He was awesome.

November 28, 2024 (updated June 1, 2025)

Pied-à-terre

small living unit, e.g., apartment or condominium, often located in a large city and not used as an individual’s primary residence

Of course there’s a French word for this.

November 8, 2024 (updated June 1, 2025)

When Life…

gives you lemons in a rough week, don’t use the lemons.

Put them in the kitchen and just turn on the TV to watch Bob’s Burgers.

October 24, 2024 (updated December 8, 2024)

A depressing set

My little trek through this century’s literature has slowed slightly over the last month. Opening a new book always brings some emotional investment with it, and the last three books I finished demanded even more than usual: Leaving the Atocha Station by Lerner, Never Let Me Go by Ishiguro, and Austerlitz by Sebald. This will be a short and sweet discussion of the last two, and it is really a “feelings” discussion rather than a “literature, here are quotes that support me” one.

As a side note, it’s not that Lerner’s book was bad. It felt like the author’s bildungsroman through the main character’s journey in Spain. Lerner is a poet, so the language was lovely, though it sometimes leaned on an everything-as-metaphor stream of consciousness that can be difficult to read. It’s just not as impactful as the other two.

Ishiguro’s book is set in a parallel universe where the biological sciences have advanced far beyond the capabilities of our current world, to the point where clones are created for the sole purpose of organ harvesting. We follow one of these clones in her later years as she examines her life and the relationships she formed while attending a boarding school. Sebald’s book is a fictional account of an immensely troubled man who reflects on his personal history as he figures out how he, a Czech Jew, arrived in Wales from Prague during the summer of 1939 as a 5-year-old.

There’s a deep sense of tragedy in both of these stories that tantalized me. Neither author explicitly discusses at length the impending or past doom that haunts the characters; both constantly obfuscate. Ishiguro masks the truth with words like “complete” to denote the death of a clone, reminding me of Carlin’s bit on euphemisms, while Sebald’s narrator somehow always gets distracted, perhaps in a constant bid to put off discovering the truth. The two stand in contrast, one of imminent death and one of a crippling past, but neither fully articulates the magnitude of the tragedy.

It’s almost as if we’re watching these characters swimming in the ocean with cuts and scrapes while silhouettes of sharks lurk beneath. I couldn’t look away, even knowing what fate would befall our hopeful protagonists. In a world where so much detail is explicitly stated for the readers, it’s actually refreshing to have things left unspoken. Paired with the way both authors write, the effect is an ethereal experience.

Both also rely on the power of memory as a tool for introspection, and starkly remind me of how terrible mine is (and how I should start journaling again). As our protagonists learn, they refract the past through this new prism. Sebald’s character finally understands why he suffered a mental collapse at Marienbad, while Ishiguro’s character incorrectly guesses the purpose of the “gallery”, which remains a mystery until the end. All this introspection inevitably guides our characters toward a future beyond the last pages of the novels.

Neither story is complete by the end. Kathy, the clone, left her love interest before his final “donation” and got her first donation summons, while Austerlitz was still traveling to find the whereabouts of his father. If my reading of Sebald’s themes in the last few pages, set in Paris, is correct, he will never find his father. History, especially the grotesque, is increasingly guarded. And Kathy will not escape her fate of donations, no matter how much humanity she displays; Ishiguro’s tale is not a hero’s conquest, but more of a melancholic struggle.

Why is good literature always so sad? Onto My Brilliant Friend. That should be a happier book.

September 28, 2024 (updated December 8, 2024)

Grasshopper

The grasshopper lay on the small open-air passageway between the stairs and the front door of my apartment. A streak of ardent green juxtaposed against the gray, brutalist pockmarks of the concrete walkway. And yet, I almost stepped on it. Not on purpose, mind you, but because of the very subtle pull of peripheral vision; a beckoning of sorts when one’s mind is on autopilot.

It looked like it was dead. A grasshopper is not considered an elegant insect, with its many sharp angles and rectilinear tagmata. Nothing like the gentle curves of a butterfly. But with this comes a natural orientation that I could clearly see even while standing. It was lying on its side, throwing off the alignment to the ground attained over millions of years of evolution.

But occasional twitches showed that specks of life remained. Unfortunately, my hands were full carrying trash to the bin, and saving this tiny green mote involved several steps. I would have had to lean my trash bag against the wall, then find a piece of card stock or paper and use it to gently scoop up the little fellow. Finally, I would have had to carry this little specimen down the flights of stairs and deposit it among the shrubs.

Maybe “several steps” is overselling it, but I ultimately did nothing and continued with my chores after returning from the bins. Was it really that hard to do something for a helpless creature stuck in a foreign land? Was the activation energy so large that I chose inactivity? (To be fair, it was three flights of stairs…)

Or was my laissez-faire attitude the correct choice, since it was too weak to survive anyway? The wind was strong that day, and I suspected that it had been blown from the nearby tree onto the balcony. Perhaps the traveler was just catching its breath and would right itself after several minutes.

Twenty minutes later, when I was throwing away the recycling, it was gone.

September 4, 2024 (updated December 8, 2024)

Piles

Living is the management of piles. Piles of laundry, piles of dishes, piles of books to read before we fade into the dirt…

Thank you, NYTimes, for publishing your list of top books. It’s been a while since I read a novel due to the pile (ahem) of New Yorker magazines on my coffee table, but reading a full-length novel is truly different from reading a ten-page short story.

Here are some notes from the past few books I finished:

Exit West by Hamid: frankly, I thought the novel was written for a YA audience. I did not like it at all. The premise, while interesting, meant that the plot could be guessed by page forty or so.
All the Light We Cannot See by Doerr: apparently a bad Netflix adaptation, but a solid novel. I really enjoyed the time jumps, much like in Cloud Cuckoo Land, with an increasingly detailed world and mounting anticipation, but the anti-war message is pretty heavy-handed.
The Emperor of All Maladies by Mukherjee: marvelous writing that balances hope with despair. It’s certainly a difficult topic to read about, but I couldn’t put my Kindle down for the entire book. I do wish it were… actually more technical… but the level at which it is written is understandable.

Currently working through Never Let Me Go by Ishiguro and have Austerlitz arriving in a few weeks.

Author: Marshall Jiang

Protected: Contentment

The Neapolitan quartet