2022 letter

2022-12-28 23:29 GMT


The blog post I look forward to most every year is Dan Wang’s annual letter. I’m spoiled because he always posts it on New Year’s Day. I curl up with my cappuccino, devour it like the serial novella release that it is, and the rest of the year’s internet writing goes downhill from there. Alright, of course there’s good content everywhere, all the time. But Dan is a unique combination of thoughtful and witty, timely yet evergreen. I wish more people wrote letters like he did. This is a homage to Dan’s annual letter, in hopes that he won’t stop writing them like he’s said that he might.

I’ve been working in artificial general intelligence (more on what “general” means later) for just over a year. I lucked out, joining the field in what I’m sure everyone would agree has been an exceptional year. Let’s recap.

Can you believe that all the following got published in 2022? OpenAI’s InstructGPT (that was this past January!) and ChatGPT (December), and DeepMind’s Sparrow (September) aligned large language models with human feedback. DALL-E 2 (April), Imagen (May), Midjourney (July), and Stable Diffusion (August) overhauled generative art to varying degrees of public release. Gato (May), VPT (June), PaLM-SayCan (August), and Cicero (November) took actions in real and simulated worlds. The first applied one brain to many tasks; the second fine-tuned pre-training with imitation and reinforcement learning; the third integrated a state-of-the-art language model; the fourth acted with intent. All this triggered perhaps the largest round of startups and AI investment yet. But it’s not just companies, it’s also communities, like Cohere For AI. AI is also driving more policy in Washington than ever. America is banning advanced AI chips from export to China, and subsidizing domestic chip production.

The techniques that undergird the big systems flew under the radar outside the field. They include Induction Heads (March), the Chinchilla scaling laws (March), Let’s think step by step (May), and conferences chock-full of clever reformulations. Finally, I would be remiss not to mention DeepMind’s battery of results in the sciences: competitive programming (February), plasma control in fusion reactors (February), restoring ancient texts (March), matrix multiplication (October), and folding nearly every known protein (July). Science is the AI application that makes me feel warmest and fuzziest.

Dirac famously claimed that physics in the 1920s was a time when “even second-rate physicists could make first-rate discoveries.”1 Years from now, as we loll on the beach with our robot butlers, 2022’s biggest AI story will be clear. But it’s not clear now. Right now we only have seeds. None of the above AI systems have yet changed the world as much as the standouts from that golden age of physics. Still, we’ve only just started really living the 2020s. The field to watch is artificial intelligence.

It’s worth stressing that this rate of progress is not normal.2 My mother reminds me that when she was at the University of Science and Technology Beijing, she had to change into special slippers to enter the computer room, full of IBM XT 286s (6 MHz processor). Her brother, who studied mining, told her to pick computer science because that clean room, on the top floor, was the only room with air conditioning. She tells this story so I don’t forget how fast my computer is. I’ll be telling my children a similar story so they don’t forget how fast their models train.

Fast progress gives me the confidence to write this letter. I wasn’t here in 2016, when everyone thought deep reinforcement learning would get us to AGI in five years. Nor was I here during most of the Transmogrification of the last five. Between doing my own research and reading everyone else’s, I barely had enough time to understand what a Transformer actually is. Often I’m coasting on descriptions from high-level explainer posts. The thing is, I’m pretty sure that’s also the case for experienced researchers. The sheer scale of creative destruction in artificial intelligence is astounding. Scientists who built their careers on decade-old bets now have to learn the same things that I do.3 I’m trying to keep up, but so are others. I hope that documenting my experience, with precise examples, is interesting or helpful to someone.

The difference between research and (non-PhD) university and corporate work is jarring. I’ll tell a story of how I came to realize this. Then I’ll argue for one trait that drove AI success this year, one that popular excitement papered over. I’ll do this all in the context of reinforcement learning at scale, my discipline, but the argument generalizes.

This year, I tried to replicate a paper from outside DeepMind on our internal setup. Anyone who has tried this knows that it’s a notoriously difficult task. Many papers just can’t be replicated. It’s usually no fault of the authors.4 It could be because someone didn’t report a hyperparameter they didn’t think was important. It could be because the algorithm interacts in a strange way with a new environment. It could even be because of the way a different accelerator multiplies matrices. Flakiness is something we accept in empirical science.

I gave it a pretty good effort. I have a good sense, honed in college, of when I should cut my losses. Well, I thought, there’s probably a trick I don’t know. I’m not an expert on the mathematics in the proof. Maybe the authors forgot to include a line of the pseudocode. Or, you know, a missing hyperparameter, that happens all the time. They didn’t even say whether I should average before or after computing the variance. Besides, I have other deadlines. Knowing when to give up is a skill.

What I didn’t do, which was obvious in retrospect, was any research itself. Along comes Thomas. He rederived the paper’s key equations, in a different way than the authors did in the appendix (which Thomas didn’t look at). Suddenly I knew whether I should’ve averaged before or after computing the variance. In a couple hours, “just for fun,” he said, Thomas wrote a toy environment, generated some fake data, replicated the result, and reproduced why I couldn’t replicate the result. To my confusion about a factor in the proof, Thomas waved it off—oh, that’s a statistical trick people use to massage their result into this and that form. Happens all the time.

I didn’t know you could just, like, do that. Trust science so much. Believe that I had what I needed to discover new knowledge—the idea the paper presented, a clever idea I didn’t come up with myself; evidence that the idea works in certain cases, which allows the reader to gallop over ground in a direction the authors painstakingly identified amidst the gloom; anticipation that I can extend the idea, if I trekked the final mile myself.

I feel silly making this observation. Following through with the process of science in a rigorous way is obvious to anyone doing research full time. It’s how an elementary school student thinks it’s done, having learned the scientific method. It’s what accomplished scientists did, leading up to the contributions they made. Of course, prior knowledge helps, to develop an intuition for when to trust other work and when to investigate. Of course, choosing the right problem and abandoning dead ends is still a skill. People know what they know, and it makes sense to do what they know. But I had to see, not just hear about, a different way of working several times to internalize it.

I suppose the whole experience is more of an indictment of how poorly not-research prepares one for research. In college, people who were thorough, disciplined, and committed to excellence existed. Anyone could tell who they were, and see through who wasn’t. And while most problem sets covered well-trodden ground and gave the right assumptions, one could seek out messy, non-incremental problems. But these individuals and opportunities were outnumbered by those which, for good and bad reasons, responded to other overwhelming incentives. Do only enough to pass. This professor cares, that one doesn’t. Publish or perish, we pretend that most work is novel anyway. But also, your real priority is to hang out with your friends. It is the most time you will ever have together.

Most organizations don’t teach research either. When I was at NASA, I was disappointed to find much of the agency staid and uninspiring. Either contracts were fleeting, or civil servant job security induced indolence. The worst example was one man who hung up a Confederate flag in his cubicle, and spent all day eating baby carrots instead of working. A few starry-eyed individuals carried entire projects on their backs. Those rare few believed NASA could be the paragon of research engineering it once was, and were willing to fight bureaucracy and apathetic colleagues, even though they could go to Google and quadruple their compensation. They’re why NASA can still find within itself an Artemis launch every three years, and a James Webb every twenty.

It’s easy to claim to do research but actually be playing pretend. Which is fine—following the scientific process is hard. And it taxes the soul to be the only one doing it. But if a group satisfies several conditions at the same time—extensive institutional memory, enough distance from distractions (of which profit might be one), and a critical mass of people who care—a very special culture, where the default behavior is actually doing research, emerges. Several AI labs have achieved this culture, both by their own merits and by fortunate global circumstances beyond their control.

Riva Tez declared: everything’s a scam. The research world may not be cranking out rigorous scientific process all the time. But pockets of activity exist today, and certainly throughout history, that are indubitably not a scam. I have more stories to persuade you. But first, let me zoom out and make a general point.

I put to you that however fast or easy progress in artificial intelligence seems, it’s actually the result of doubling down on legitimate scientific process: attention to detail, severe hypothesis testing, showing your work. No magic bungled in, no wellspring of low-hanging fruit accidentally struck, few vulgar shortcuts taken. There is only good science, day after day, maybe at a slightly uncomfortable clip because you don’t want to get scooped.

I also want to set up a mental image that I will exploit later. During AlphaGo’s match against Lee Sedol, Demis Hassabis (DeepMind’s CEO) made two offhand comments. The first was in game two, the game of move 37, the pinnacle of AlphaGo’s creativity. On AlphaGo’s invasion into Lee Sedol’s territory, and Dave Silver’s excitement—“Look at [Lee Sedol’s] face, look at his face.” Demis commented, “That is not a confident face.” Spare a few seconds to watch it. I timestamped it for you. Then, game four, the game where humanity brought it back from losing in a clean sweep, where Lee Sedol’s move 78 caused AlphaGo to fall into delusions. Lee Sedol made a different face at AlphaGo’s odd moves. Demis: “Look, Lee Sedol’s confused. He’s like, what is [AlphaGo] doing—that’s not a ‘I’m scared’ confused. It’s a, ‘what is it doing.’” Here’s the timestamp. It’s how one regards a novice who has yet to grasp the underlying essence of their craft.

Even in one of the highest-stakes evaluations of an artificial intelligence system this century, both team human and team AI evaluated each other as humans. Lee Sedol instinctively glanced at Aja Huang (the human playing for AlphaGo), but found no tell there. Team AI had every quantitative indicator they could want on the screens in front of them: win probability, steps searched, predictions of what expert humans would do. Instead they summed up the two most critical games of the match using their gut.

Likewise, research is at its core a human activity. It’s like any game we play, whether it has clear-cut rules like Go or not. So, I invite you to evaluate how we’re doing as researchers with the intuitions of a player of games. I think back to when I tried to replicate that paper. Copying pseudocode, not motivating the equations. Dice-rolling some new hyperparameters. Adding network layers, not isolating the complexities of my environment. Expecting it to just work like installing Chrome. Now, drop in Lee Sedol, judging me, and Demis, commentating Lee: “Look, Lee Sedol’s confused. He’s like, what is Zhengdong doing—that’s not a ‘I’m scared’ confused. It’s a, ‘what is he doing.’”

Got the image? Now to make my point more concrete. I’ve tried to use intuition to persuade you that you can trust your intuitions about good research. Now I have more stories to take advantage of those intuitions I’ve just whetted. These are tales of inane errors and inexplicable choices. Bugs and details.

Philip Pullman, in His Dark Materials, concludes each third of his trilogy with Lantern Slides. Lantern Slides are vignettes. Imagine holding an engraving up to a lantern or inserting a transparent laminated film into one of those old overhead projectors from American elementary school classrooms. Lantern Slides don’t add to the plot. They ornament the characters and the world. What better way is there to make the intuitive case for research legitimacy?

Lantern Slides: the contingency of artificial intelligence systems.

When walking about the countryside of Italy, the people will not hesitate to tell you that JAX has “una anima di pura programmazione funzionale.” JAX is a rite of passage for aspiring machine learners congregating north of King’s Cross St. Pancras. Pure functions free them, as long as they resign themselves to strict constraints. Pure functions bind them, as members of a minority, recognizable in an instant through the thin veil of blind review anonymity. A pure function will always return the same result when invoked with the same inputs. Until it does not. One team ceased research when they multiplied the same matrices and got two different outputs. The same logic, invoked with the same inputs. Days later, they discovered a pure function with a secret side hobby, happily casting fp16 to bfloat16.
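
For readers who have not been bitten by this, here is a minimal sketch of how a silent change of 16-bit format alters the result of "the same" multiplication. It is not the team's code and says nothing about where their cast actually happened. The helpers `to_fp16`, `to_bf16` and `dot` are hypothetical names, and the bfloat16 conversion here simply truncates the float32 bit pattern, whereas real hardware usually rounds to nearest.

```python
import struct


def to_fp16(x):
    """Round a Python float to IEEE half precision (fp16) and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_bf16(x):
    """Approximate bfloat16 by keeping only the top 16 bits of the float32 encoding.

    Assumption: truncation instead of round-to-nearest, to keep the sketch short.
    """
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]


def dot(row, col, cast):
    """Dot product after casting every operand with the given function."""
    return sum(cast(a) * cast(b) for a, b in zip(row, col))


row = [0.1, 0.2, 0.3]
col = [1.5, 2.5, 3.5]

as_fp16 = dot(row, col, to_fp16)
as_bf16 = dot(row, col, to_bf16)
print(as_fp16, as_bf16, as_fp16 == as_bf16)
```

The two formats spend their 16 bits differently: fp16 keeps more mantissa bits, bfloat16 keeps more exponent bits. So 0.1 lands on a different nearby value in each, and every product inherits the difference.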

When you have an idea that works, people can forgive you for not totally understanding why. They may even let you wave away the rationale at publication time. You may say, simply, “we found it beneficial to linearly project the queries, keys and values h times with different, learned linear projections” instead. Now, if this was one of your key contributions, one that gave certain existing techniques new life, people might start asking questions. If your paper then became the second most cited paper in the field, people might start wondering if we are a field of real science at all. Naturally, the above quotation commences Section 3.2.2, Multi-Head Attention, of the linchpin Attention Is All You Need.

The night before a conference submission deadline, not a single TPU in the quota was idle.

One experiment, after flatlining for days, began to pick up speed. Its owner had abandoned it for more opportune hypotheses. But with every baseline someone else scheduled, this experiment climbed the leaderboard. With every new and necessary ablation (thanks, reviewer number two), this experiment discovered more reward. For its owner had implemented a bug, which did not update the reinforcement learning agent’s target parameters every one hundred steps, but only when the scheduler preempted the experiment.5
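
A minimal sketch of the bug described above, assuming a training loop that asks a predicate whether to refresh the target parameters. The function names and the preemption times are hypothetical. Only the period of one hundred steps comes from the story.

```python
TARGET_UPDATE_PERIOD = 100  # "every one hundred steps", as in the story


def intended_should_update(step, preempted):
    """Intended rule: refresh target parameters on a fixed step schedule."""
    return step % TARGET_UPDATE_PERIOD == 0


def buggy_should_update(step, preempted):
    """Buggy rule: refresh only when the scheduler happened to preempt the job."""
    return preempted


def count_target_updates(should_update, total_steps, preemption_steps):
    """Count target refreshes over a run with preemptions at the given steps."""
    updates = 0
    for step in range(1, total_steps + 1):
        if should_update(step, step in preemption_steps):
            updates += 1
    return updates


preemption_steps = {250, 700}  # hypothetical preemption times
intended = count_target_updates(intended_should_update, 1000, preemption_steps)
buggy = count_target_updates(buggy_should_update, 1000, preemption_steps)
print(intended, buggy)  # prints: 10 2
```

Under the buggy rule the update frequency depends entirely on cluster load, which is consistent with the experiment improving as other people scheduled more jobs.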

Accelerated computing, an institution which giveth and taketh. Three point three five terabytes per second memory bandwidth. 1,979 teraFLOPS of half-precision floating-point, theoretical performance shown with sparsity, one-half lower without sparsity. A bullet train, hyper-optimized for the tracks it rides but impossible to change direction. How did we get here, where artificial intelligence is effectively synonymous with deep learning? Where hardware is custom built to software is custom built to hardware is custom built to software is custom built to hardware is custom built to software. What a monster.

The most frequent cause of getting scooped is not uploading to arXiv a few hours later than another lab. It’s the bug in your old code, which you spot only when you see the other lab publish your idea months later, but it actually works.

I am a reinforcement learning agent trained to balance poles on carts, crawl dungeons, and hunt down apples. My world is three-dimensional, colorful, and charming. My researchers have given me toys, which they hope will enrich my environment and the experiences of those who will later see me perform. But sometimes they are a bit careless. One time, they scrambled my action spec between every episode, which felt like waking up with my arms and legs in different places. Another time, they flipped the color channels only at test time, so I had to eat a purple banana instead of a yellow one. I’m not sure if they meant to do this. But I could do so much better than those other agents if they only gave me a fair chance.

“This was a large effort by a dedicated team. Each author made huge contributions on many fronts over long time periods.” (Learning to Act by Watching Unlabeled Online Videos.)

“Equal contribution. Listing order is random. Jakob proposed replacing RNNs with self-attention and started the effort to evaluate this idea. Ashish, with Illia, designed and implemented the first Transformer models and has been crucially involved in every aspect of this work. Noam proposed scaled dot-product attention, multi-head attention and the parameter-free position representation and became the other person involved in nearly every detail. Niki designed, implemented, tuned and evaluated countless model variants in our original codebase and tensor2tensor. Llion also experimented with novel model variants, was responsible for our initial codebase, and efficient inference and visualizations. Lukasz and Aidan spent countless long days designing various parts of and implementing tensor2tensor, replacing our earlier codebase, greatly improving results and massively accelerating our research.” (Attention Is All You Need.)

The other Slide wasn’t criticism; it was praise.

Deep in the bowels of a data center, the scheduler works its dark magic. Another case of the failed reproduction. The same configuration, the same hardware, the same seed, the same inputs, the same code, and yes, this time, the same data types. Can you spot the difference? You have no hope. You are running at scale; of your thousands of environments, over weeks, some get preempted more often than others. They sample more often; your distribution is one-of-a-kind on every run.

GPT-3 fact of the day: OpenAI filtered the Common Crawl data set (60 percent weight in the training mix) from 45TB of compressed plaintext to 570GB (Language Models are Few-Shot Learners). That is 98.73 percent filtered out. Something is going on here. Appendix A, barely half a page, is far from the full story. Inquiring minds wish to know.6
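
The percentage above can be checked in a few lines. The sketch assumes decimal units (1 TB = 1000 GB), which is what reproduces 98.73. The variable names are mine, not the paper's.

```python
raw_tb = 45.0        # compressed plaintext before filtering, in terabytes
kept_gb = 570.0      # what remained after filtering, in gigabytes
GB_PER_TB = 1000.0   # assumption: decimal units

kept_fraction = kept_gb / (raw_tb * GB_PER_TB)
filtered_percent = 100.0 * (1.0 - kept_fraction)
print(f"{filtered_percent:.2f} percent filtered out")
```

With binary units (1024 GB per TB) the same calculation gives roughly 98.76 percent, so the conclusion does not depend on the convention.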

After AlphaGo, there was AlphaZero. Then there was MuZero, and after MuZero, perhaps Muesli, the most complex of them all. It has a policy gradient loss, an MPO-like loss, a MuZero-like transition model loss, and more regularizations than you can count, all trained end-to-end. Try ensuring the correctness of your method on this beast. One method that I cannot talk much about implicated the lambda returns of the V-trace return estimator. Should lambda be zero or one? How can you even figure it out after your data has run the gauntlet of policy optimization techniques?

This question would stump three veteran reinforcement learning practitioners, including authors of algorithms you’ve used. But they were, all of them, deceived. For another research engineer, wiser, more determined, forged in the fires of AI winter, imagined a test that proved them wrong. The effect of the correction on performance was negligible. In spite of that, it’s our business to care.
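
For the non-specialist, here is a minimal sketch of what the two extremes of lambda mean for an ordinary lambda-return. This is the textbook recursion on a single trajectory, not V-trace (which adds importance-weight clipping), and certainly not the test from the story. The function name and the toy numbers are hypothetical.

```python
def lambda_returns(rewards, next_values, discount, lam):
    """Compute lambda-returns backwards through one trajectory.

    rewards[t] is the reward received after step t, and next_values[t] is
    the value estimate of the state reached after step t.
    """
    returns = [0.0] * len(rewards)
    next_return = next_values[-1]  # bootstrap from the final value estimate
    for t in reversed(range(len(rewards))):
        # Blend the value estimate with the longer-horizon return.
        blended = (1.0 - lam) * next_values[t] + lam * next_return
        returns[t] = rewards[t] + discount * blended
        next_return = returns[t]
    return returns


rewards = [1.0, 0.0, 2.0]
next_values = [0.5, 1.0, 0.0]

td_targets = lambda_returns(rewards, next_values, discount=0.9, lam=0.0)
mc_returns = lambda_returns(rewards, next_values, discount=0.9, lam=1.0)
print(td_targets, mc_returns)
```

With lambda at zero, every target trusts the value estimate one step ahead. With lambda at one, every target ignores the intermediate estimates and sums discounted rewards to the end of the trajectory.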

“…it is dangerous to think of these quick wins as coming for free… we argue that ML systems have a special capacity for incurring technical debt… This debt may be difficult to detect because it exists at the system level rather than the code level…” (Hidden Technical Debt in Machine Learning Systems.)

Do you want to hear some real war stories? Look no further than the Chronicles of OPT development, the most fascinating memoir released this year. Why was node-55 added to the cluster in an unhealthy state? This FAIR release is the most transparent and thorough evidence for my argument out there. I can’t wait to read analogous logbooks when the full history of AI gets written.

NASA’s Kennedy Space Center gift shop sells bumper stickers that read: Failure is not an option.

When you use a lot of computers at the same time, failure is guaranteed. This year I spent a lot of time working with Podracer architectures for scalable deep reinforcement learning. There is not much to write home about:

Mom, should I do inference on the learner TPU or on remote actor cores? I think these inputs go Batch, Time and not Time, Batch like the documentation says. Oh yeah, work is fine. There’s some downtime. It takes minutes to run an experiment in debug mode and longer to schedule the real thing. Hey, guess what. I’m going to measure unrolls in units of timesteps instead of SARSA transitions.

End of Lantern Slides: the contingency of artificial intelligence systems.

Nima Arkani-Hamed claims that theoretical physics is actually very blessed. The dominance of both general relativity and quantum mechanics has not stymied the field. In fact, because both are so solid, and yet so different, they form the structure from which the next grand result will emerge. These two giants adversarially limit contributions that seem promising but lead us astray. New theories must be consistent with the restrictive framework between the cracks. They bear the burden to transcend the commanding presence of both general relativity and quantum mechanics.

Imagine a research lab or a startup. They see a straight line from idea to result—they say, we just need to scale. We just need more parameters, more computers. We just need to be first. The world is drowning in enthusiasm, egging them on, cutting large checks, threading glowing tweets. Research is making fast progress. How anyone balances real scientific process with pace, I don’t know. But if that lab or startup thinks now is an auspicious time to be lax on science—well, I don’t know what you would think. All I imagine is Lee Sedol, judging them, and Demis, commentating Lee: “Look, Lee Sedol’s confused. He’s like, what are they doing—that’s not a ‘I’m scared’ confused. It’s a, ‘what are they doing.’”

Some people think that there are few if any scientific breakthroughs remaining. They think, research progress is hard to measure. Ideas are getting harder to find—maybe all the good ones have already been had. Maybe some extant thing is All You Need. I wouldn’t be so sure. Marvin Minsky, luminary of our field, predicted “Within a generation… the problem of creating artificial intelligence will substantially be solved.” That was in the late sixties. He joins a distinguished class. Lord Kelvin in 1897: “There is nothing new to be discovered in physics now. All that remains is more and more precise measurement.” Cicero reported that Aristotle thought that he had just about completed philosophy, and that it would surely be completed a short time after his death.7 Don’t be them. Here, actually, take the long-term view.

This is why I remain optimistic that it’s not too late to join the field. Not all the good ideas have been had. Even if research outputs are hard to measure, process and engineering inputs are legible. Feedback loops are tight. Do the obvious thing, good science.

You’ve made it this far. Let’s talk about artificial general intelligence.

First, definitions. AGI means many things to many people. I’m going to first present Siddarth et al.’s definition of Actually Existing Artificial Intelligence. They describe AEAI as a vision of artificial intelligence research with three shared commitments: human competition, autonomy, and centralization. Generalization is implicit in all three. Comparing artificial intelligence to humans often requires defining measures or constraints characterizing generality: learn to do a diverse set of tasks concurrently; prohibit the system from memorizing solutions. An autonomous AI would be more general than one humans have to regularly course-correct. Generality scales AI’s dual-use, sword-with-many-edges nature, like nuclear fission, so we desire centralization and tight control.

None of these properties are at first glance good or bad. They are perspectives on the future. On one hand, AGI is optimistic. AGI envisions a human-surpassing, autonomous, and central Ultimate Computer we would initialize with all we know. It sits humming, and every so often hands down a brand new, beautiful theorem. On the other hand, AEAI is a warning, a project which succumbs to Goodharting, exacerbates social inequalities, and empowers autocrats. All the while it peddles an ambiguous, credulous utopia.8

What I want to convey is that now I’m talking about the most speculative, ill-defined subfield of the research. The real thing, the holy grail, the theory of everything, if you will. I don’t have a slam dunk definition, much less a solution. I am in way over my head. Please read this section as separate from what came before in this letter.

Research under the vast umbrella of AGI has undoubtedly enabled humanity in many places. As of this year it is state of the art in few-shot transcription, designing enzymes to recycle plastic, and entertaining millions of people with art and text. And even if we never achieve it, the goal of an artificial “the real thing” general intelligence is a useful research north star. The theory of everything in physics, as totalizing a pursuit as it is, still inspires researchers to make meaningful progress towards that goal.

We could stop here. As I mentioned, what qualifies as AGI research works now, more than ever before. The criticism that AGI researchers are “just playing board games” is no longer convincing. Artificial intelligence is not only commercially viable, but will be economically transformative, and justify the costs of its massive, purportedly general models. AGI as a vision can continue to serve its purpose under the surface, driving useful research forward, the ultimate result always ten years away. If this is your bright line, you had a great year. The future is now. You have All You Need.

But if we’re still serious about Artificial General Intelligence: The Real Thing, it’s time to introspect. AGI researchers have set a daunting task. AGI should do every cognitive task a human can and more we cannot imagine. People will know it when they see it. Its ultimate impact dwarfs any interim effects. It will dwarf the petty considerations in this letter. If what matters most is the long future; if it seriously might merit moral concern; if accelerated beyond biological limitations would bear great windfall or doom; if that is you—you think that the AGI vision should not just be a helpful north star, but should be reality—

Introspect. We are doing more yet understanding less. There’s no reason why what people consider the field of artificial intelligence should constrain what we research. If we’re as ambitious as we claim, then we also have to take ourselves much more seriously.

This year AI researchers got made fun of a lot for the Google engineer and LaMDA consciousness episode. Leave the philosophy to the philosophers and leave the AI to the AI engineers. The problems of consciousness, free will, creativity, intelligence, yes, they are unspecific and hard and overlap, and humans have spent thousands of years thinking about them, making modest progress at best. But if yours is the above AGI vision, aren’t you kind of obligated to take this on? And do it properly, seriously?

We shouldn’t hide behind reducing intelligence to what is purely quantitative (even if that is what the result may look like). We should understand that we don’t know the boundaries of intelligence. We must be interdisciplinary because intelligence is. Otherwise, we’re only doing what is easy for us. If it’s reasonable to bet on embodiment being essential for intelligence, why not also taste in art, a deep bond with an animal companion, a yogi’s sense of her breath, and the wholly ineffable aspects of the human experience? It’s not a coincidence that humanity, in its entirety, is the only example of the hardest form of intelligence that we would like to create.

AGI research is under-regularized.9 For the non-specialist, regularization is a statistical technique that pressures the result to be “simpler.” A concrete example is neural network weight decay. The algorithm penalizes having too large a sum of squared parameters in the same way as it penalizes making an incorrect prediction. This results in simpler functions, avoiding very complex and optimized but very useless functions like memorizing the training data.
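
Here is a minimal sketch of the weight decay described above, on a toy one-dimensional regression in place of a neural network. The data, the function names, and the choice to penalize the bias as well as the weight are all assumptions for illustration. In practice biases are often left out of the penalty.

```python
def squared_error(w, b, xs, ys):
    """Mean squared prediction error of the line y = w * x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def regularized_loss(w, b, xs, ys, decay):
    """Prediction error plus a penalty on the sum of squared parameters."""
    return squared_error(w, b, xs, ys) + decay * (w ** 2 + b ** 2)


def fit(xs, ys, decay, lr=0.01, steps=5000):
    """Minimize regularized_loss with plain gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # The penalty contributes a pull towards zero on every parameter.
        w -= lr * (grad_w + 2 * decay * w)
        b -= lr * (grad_b + 2 * decay * b)
    return w, b


xs = [0.0, 1.0, 2.0, 3.0]   # hypothetical toy data
ys = [0.1, 1.9, 4.2, 5.8]

w_plain, b_plain = fit(xs, ys, decay=0.0)
w_decay, b_decay = fit(xs, ys, decay=1.0)
print(w_plain, b_plain)
print(w_decay, b_decay)
```

The decayed fit ends with smaller parameters and a slightly worse fit to the training points. That is the trade the penalty is designed to make.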

Like it or not, the ultimate evaluation for AGI will be intuition. Satisfying a definition won’t convince most people AGI is here if they don’t feel it. So I invite you again to use intuition to evaluate the performance of research in artificial general intelligence. Time for some more Lantern Slides.10

Lantern Slides: research in artificial general intelligence is under-regularized.

“For example, a squirrel’s brain may be understood as a decision-making system that receives sensations from, and sends motor commands to the squirrel’s body. The behaviour of the squirrel may be understood as maximising a cumulative reward such as satiation (i.e. negative hunger). In order for a squirrel to minimise hunger, the squirrel-brain must presumably have abilities of perception (to identify good nuts), knowledge (to understand nuts), motor control (to collect nuts), planning (to choose where to cache nuts), memory (to recall locations of cached nuts) and social intelligence (to bluff about locations of cached nuts, to ensure they are not stolen). Each of these abilities associated with intelligence may therefore be understood as subserving a singular goal of hunger minimisation.”

Dave, stop trying to make Reward is enough happen. It’s not going to happen.

There exist bestselling trade books, which, if the laity read without reservation, give them the illusion that they know what is going on in that field. The work may not be wrong, and it has its role in the canon. But a scholar should not venture very far only thusly armed.

If you ask a historian, the answer might be Sapiens; Guns, Germs, and Steel; or, if they really want to provoke you, 1421.

An economist might say Freakonomics.

A physicist, The Elegant Universe or Physics of the Impossible.

In artificial general intelligence, what?

Did you ever hear the tragedy of Darth Plagueis the Wise?

Darth Plagueis… was a cryptocurrency exchange so powerful and so wise, he could reallocate capital to influence the charitable giving of an entire movement. He had such a knowledge of the far future, he could even keep the ones he cared about… from becoming misaligned…

It’s ironic. He could save others from death, but not himself.11

Many scientists, including my colleagues, wield terminology from The Structure of Scientific Revolutions. Kuhn’s philosophy inspires organization design. Researchers use it to justify their research direction. Several of them have even opened the book.

The Transformer architecture, getting on in years now that it has reached age five, was never unpopular. But this year it became fashionable to hail it as exquisite, inevitable; even, “a new kind of computer.” It’s also a load of retconning, if you ask me. I wonder if the name “Neural Turing Machine” rings any bells.

Resist the siren call of anthropomorphism.

The story of DeepBlue and Kasparov, and of AlphaGo and Lee Sedol, is as much about the prowess of the human as it is the impressiveness of the AI system. They deserved utmost respect in their respective showmatches. The human did all that, with nothing but human intelligence. For every artificial intelligence milestone, there are those who worked so hard but did not prevail. If we presume to equal the greats in all professions—painters, novelists, educators, hospitality, we must equal that ambition with our respect for those individuals, too.

Isaac Asimov’s Foundation series is the single worst influence on artificial intelligence practitioners (and do not doubt that it is a big one). It prescribes a history that is predictable and inescapable, somehow with less than zero character development.

“The very term ‘AGI’ is an example of one such rationalisation, for the field used to be called ‘AI’ – artificial intelligence. But AI was gradually appropriated to describe all sorts of unrelated computer programs such as game players, search engines and chatbots, until the G for “general” was added to make it possible to refer to the real thing again, but now with the implication that an AGI is just a smarter species of chatbot…

I am not highlighting all these philosophical issues because I fear that AGIs will be invented before we have developed the philosophical sophistication to understand them and to integrate them into civilisation. It is for almost the opposite reason: I am convinced that the whole problem of developing AGIs is a matter of philosophy, not computer science or neurophysiology, and that the philosophical progress that will be essential to their future integration is also a prerequisite for developing them in the first place.”

That was David Deutsch, ten years ago, basically an eternity. Contra Minsky. Time sharpens all takes.

End of Lantern Slides: research in artificial general intelligence is under-regularized.

Imagine a research lab or a startup. They have an ambitious goal. Perhaps the most ambitious ever. Produce the most impactful innovation in human history, one that gives new insight into thousand-year-old questions. One which anyone would recognize as some kind of equal, beyond any reasonable doubt, one day even a moral equal. Curiously, this group of researchers is hyper-optimized to a strict algorithmic definition of their task. They also spend a great deal of time on forums and foom instead of talking to regular, intelligent humans. Dear reader, are you thinking what I’m thinking? Drop in Lee Sedol, Demis. Look—what are they doing?

My favorite paper that I read this year was Shaking the foundations: delusions in sequence models for interaction and control. This is a paper you can sit down with, read, and learn quite a bit in a self-contained way from beginning to end. It’s got a lot of mileage for analogies. We AGI researchers, like our models, might be spiraling, bamboozled by our own outputs. More scale, more modalities, more resources, and more data would be nice. But above all we need more legitimacy, in all its forms.

I thank a number of large language models for assisting in the search for descriptions. May you be large for many more weeks to come. I believe I was still an essential part of the team, though, and will be for a long time.

I thank Arjun Ramani, Tom McGrath, Avi Ruderman, Saffron Huang, Hugh Zhang, Allison Tam, Arya McCarthy, Karina Nguyen, Gustav Brincat, Ankit Rajan, Tyler Cowen, and Julia Garayo Willemyns for reading drafts of this section. Serena Cho took me to see the cover painting in the National Gallery.


Alright let’s talk about Star Wars. Recall, it is the greatest fictional universe of all time. This year we got Tales of the Jedi, which was just fine. We also got The Book of Boba Fett and Kenobi, which were both miserable. Let’s forget about those. This year was a great year to be a Star Wars fan for one reason and one reason only: Andor.

Andor is the best Star Wars since Rogue One, which itself is the best Star Wars since the imagining of the clone wars (recall also that George Lucas is the greatest artist of our time). The show is well-written and well-acted. That much is required. What really distinguishes Andor is its optimism. It can get away with it by relying on the Star Wars universe exactly when it should, and ignoring the universe everywhere else. Spoilers to follow.

I can sum up this one season of Andor in a couple telling indicators. It’s only a minor spoiler that we didn’t see a single lightsaber. And it’s only minor gatekeeping for me to say that there’s only one big cameo, Admiral Yularen, and that most people would fail to recognize him. For all the work I’ve put into consuming Star Wars all my life, I don’t have much to show for it. Andor doesn’t give me any chances to show off. The only advantage I get for being a faithful fan is knowing what a stormtrooper looks like (and knowing that they are the bad guys), knowing what the lasers sound like, and knowing that we call faster-than-light “hyperspace” and not “warp.” In short, I’m no better off than a filthy casual. We already knew from cast interviews that Andor would go light on the fan service, but who could’ve predicted that Tony Gilroy would forget which torch he was carrying?

But this much exchange with the Star Wars that came before is enough. What remains is breathing room for a good drama. Stormtroopers actually can aim, and are distressingly accurate on Rix Road. The Imperial machine exhibits trigger discipline, firing at will only when commanded. Imperial Security Bureau supervisors who have never seen combat look the part during their first action. Narkina 5 guards, who have no stake themselves, hide rather than let escaping prisoners dismember them. Kino doesn’t die at the climax of a theatrical firefight; he can’t swim. We watch Taramyn and Gorn grow for half the season; when they get fatally shot, the camera moves on without a second thought. Cassian doesn’t return home to be the star of the showdown; he gathers his loved ones and gets out.

Some would champion these examples as proof that Andor deserves an Emmy. They are mistaken. These details only let us put Andor among the excellent shows of the year, instead of only excellent Star Wars. We are so accustomed to the routine beats of the Disney drum that its subversion deludes us into thinking it is peak television.

The best argument that Andor is worthy of more recognition lies in the writing I bet critics will find the weakest: the monologues. These embody the optimism I alluded to earlier. There’s Kino’s last direction as shift manager: one way out! Luthen, reflecting on sacrifice with Lonni, burning his life to make a sunrise he’ll never see. Maarva, posthumously, warning of the disease that’s never more alive than when we’re asleep. Nemik’s admonition that moves Cassian to try.

Even in the context of the entire plot, these monologues don’t land. I hate to say it, but by themselves they might make an experienced television viewer cringe. Andor chose to skip the exposition. We can substitute any generic oppressor, and any generic rebellion, in the words the characters say. They trust you to remember that the stormtroopers are the bad guys. Andor is trying to inspire some optimism in you, to recreate the magic of A New Hope amid the despair of the Vietnam War, without constructing the foundational sentiments within the season itself.

So I don’t expect Andor to win very much recognition. And that’s justified, because Andor can’t draw from most of the human experiences other dramas can. But that doesn’t impede Andor’s worth. The show instead draws from a science fiction that anyone can opt into. The galaxy feels empty only if you haven’t grown up in it, or recreated its histories in your LEGO sets.

Has there ever been a fictional universe with so many participant creators as Star Wars? Beyond the dozens of top-down, high-value Hollywood productions that give every actor under the sun a part in something or other. There’s hundreds of books, comics, and games, for which authors used to make their obligatory pilgrimage to Skywalker Ranch to ensure consistency with the canon. There’s the lucrative, ubiquitous merchandise. Anyone can imagine themselves in this universe sprawling through time and space. The Skywalker Saga is a subplot. In contrast, try to write yourself into some of the other universes with a Chosen One. How are you supposed to get out from under the shadow of Harry Potter?

Andor in fact demands great commitment to the franchise. Its fan service isn’t cheap. The writers don’t just cameo a vehicle that first appeared as a 1979 toy for you to fawn over, which they later use as merchandise fodder. Andor rewards you for wallowing in the Star Wars deep end. The more you share in the collaborative construction, the easier it is to overcome the cynicism of other television. It’s the way to appreciate Andor’s otherwise mawkish optimism.

All this to say, Andor believes in you, to remember that you’re watching Star Wars! As far as I can tell, most people have come around to giving Andor a good review. But I want the record to show that I have been raving about it since the day it premiered. Those I have discussed it with can bear witness. The Force be with Tony Gilroy, brought in to reshoot Rogue One, and now for Andor, art that has won my heart.

I will not be taking questions at this time.

Taking a step back, this year of television was great for franchises in general. A bad trailer always precedes a bad show. A good trailer, though, leaves the question open. So the unblemished record of all four of the Andor, House of the Dragon, Lower Decks, and The Rings of Power trailers encouraged me. You know what I think of Andor. I’ll review the rest one by one. It’ll be helpful to again see these works in their task of extending their universes.

House of the Dragon was outstanding. Game of Thrones has always enjoyed good acting. Good writing follows when source material by George R. R. Martin simply exists. I say exists because I did pick up a copy of Fire and Blood, and it’s not that good. The premise is, Fire and Blood is a scholarly tome that wouldn’t be out of place on a shelf in the Citadel. You get what you pay for. It’s boring.

He tried a little to make it interesting. But any attempt ruined the semblance of Medieval historiography. I’m not convinced that conflicting accounts are a problem when it’s obvious that certain accounts only exist to enliven the plot. A self-respecting maester could never. So in a world where the book is almost always better than the show adaptation, we got a unicorn. House of the Dragon might be the largest gap in the show-is-better-than-the-book direction in history.

People are protective of early Game of Thrones. When discussing House of the Dragon, they cautiously float the idea: hey, maybe House of the Dragon is almost as good as the good seasons of Game of Thrones. Dear friends, do not hedge. “Most men would rather deny a hard truth than face it:” House of the Dragon season one blows any season of Game of Thrones out of the water. Game of Thrones is just a watered-down version of A Song of Ice and Fire. A horrendous amount of requisite exposition relative to its length and budget made the show stumble.

The greatest accomplishment of Game of Thrones lies outside its story: that show gobsmacked an entire generation of sweet summer television audiences into demanding new heights of cruelty against their characters. The leisurely hazing that Game of Thrones oversaw prepared the world to receive House of the Dragon at speed. The proper pace of A Song of Ice and Fire.

A fair counterpoint is that Game of Thrones better follows multiple storylines. I get the appeal, but it’s not sustainable. Remember, the source material only has to exist, and it doesn’t, by the end. This was obvious when Game of Thrones ran off the rails. The books marked what needed to happen by when years ahead of time. And even there it’s not perfect. Daenerys is still bumming around in Meereen at the end of Dance, after all. No one can wing it with that many characters. I’ll give credit where it’s due, though. Without the success of Game of Thrones, no one would have greenlit an adaptation of that confusion of a book, Fire and Blood.

Unrelated to the universe, House of the Dragon pulled off a seamless mid-season recasting. Way better than I expected. Not only sufficiently well for a necessary time jump, but well enough to enhance the characters in a way that would’ve been impossible otherwise. I’m sure there’s some lessons in running logistically demanding shows there. Watch it for that, if you don’t care about the universe.

I also enjoyed Lower Decks. I can’t evaluate it very well, though. The best jokes referenced old Star Trek, and what I know about this universe is relatively shallow. But this was the third season of Lower Decks, so there’s more inside jokes contained within the show—Peanut Hamper, Crisis Point II, Shax ejecting the warp core, new members of the Self-Aware Megalomaniacal Computer Storage. I watched the first two seasons right after I moved to London, so hearing the payoffs to setups I heard over a year ago was extremely satisfying. Silly, but sort of completing an arc in my life, too.

Lower Decks is also the rare cartoon, not for children, that’s not cynical. I respect Rick and Morty and Bojack Horseman for their creativity. But I don’t have the energy to withstand that much depressing humor in succession. Mike McMahan first wrote for Rick and Morty, which isn’t surprising because the animation style of Lower Decks is very similar, but is surprising for his range. It’s a great skill to be very funny without using a crutch like cynicism. I’m thinking of the Cerritos glamor shots. I’m thinking about the San Clemente, the Sherman Oaks, the Vacaville, the Burbank, the Fresno, the Santa Monica.

One final note on Lower Decks. Star Trek has ever reflected the times we live in, so it’s interesting to see the franchise engage with AI more often and with more color. I haven’t watched Picard or Strange New Worlds. Someone who has should write about this, though.

Recently it became fashionable to complain about the number of prequels, sequels, and other universe expansion. More Star Wars, more Star Trek, less originality. Yes there’s lots of lazy franchise expansion. But I present these productions as examples that add nuance to this assessment. There’s nothing wrong with lacking new Proper Nouns. Writers can be original in exploring the fantasy that a universe evokes. They can outsource exposition and getting the audience to care. Those are worthy skills, but they aren’t everything. I’d have no problem if we got more of these. Don’t let some old grouch enamored with Hamnet and the Bible, of all books, tell you what to think of other reprises.

Finally, The Rings of Power. It’s already obvious, care to guess what I think? It’s the only disappointing show of the four. Let me just say, they spent 715 million dollars and could not get the rights to the Silmarillion? Are you kidding me? I’m blind! (Rather, it makes us blind.) Why do they think we like Lord of the Rings? For the costumes? If we wanted to see shiny armor in a warm filter, we’d go rewatch the Hobbit trilogy. (Narrator: they did not.) In the Peter Jackson Lord of the Rings trilogy, only Lothlórien was this shiny.

Fine, I read something about Saul Zaentz Company matching rights, the Estate will never sell the Silmarillion rights, there’s enough of the Second Age in the Lord of the Rings appendices. But come on. What’s the point of hiring people who care if they have to watch where they step? Filling in the blanks is part of creative freedom, yes. But you can’t just… make up who gets to send people to Valinor. That’s offensive.

I’ll keep this part short. I didn’t end up finishing the season and am not sure if I’ll ever, so I shouldn’t say too much anyway. The Rings of Power didn’t respect that new audiences have the ability to commit to a deep universe. If you build it, they will come.

For a short span of weeks, we were blessed with House of the Dragon on Mondays (in the UK), Andor on Wednesdays, Lower Decks on Thursdays, and if you must, The Rings of Power on Fridays. Truly we live in the golden age of television.

This year I didn’t read as much as I would’ve liked. My excuse is moving to a new country. I know, the physical move was the end of last year. But I still need time to put up art, grow some houseplants, figure out how to do my job, and spend my time in new ways, no? I spent some time on a hobby that seems related but is actually very different: buying books. London has tons of used bookstores, which is great for this. I did get to reading some of those books, too.

Memoirs of Hadrian. My favorite book that I read this year. Yourcenar guides you through Hadrian’s dense thoughts with such ease. A few seconds could pass, or a lifetime. Handling time badly could’ve easily derailed this book. But we stay in Hadrian’s thoughts, and only occasionally reach out to touch the real world, as deities like Roman emperors ought to do. An entire life passes, and an exactly appropriate amount of book passes.

This book scratched an itch I didn’t even know I had. You’d think that, with all the reading about the Romans in the past, and with all the historical strategy games I’ve played, I must have some sense of what this fiction is like. No, Memoirs of Hadrian is an unquestionably richer conception of what being the most powerful person relative to possibly any other point in history, absolute ruler of an empire that took weeks to cross on a fast horse, could have been like. Screenplay would somehow be too real, too grounded. We need imagination, prompted with words worthy of the Académie française.

I took a long time to start Love in the Time of Cholera after enjoying One Hundred Years of Solitude so much. Gabriel García Márquez is good at pausing the narrative to tell you a tidbit the characters will never know, like Juvenal forgetting when he first heard of Florentino and Fermina. I understand why American high schools will never assign this book. But can I say, if we students could avoid getting the wrong message, even an unrealistic meditation on obsession, pity, passion, and contentment would’ve been at least as useful as some of the other stuff they made us read.

I read His Dark Materials in one go, as God intended. Speaking of American public education, there’s a few major series which are formative to elementary school classes. And it’s not always the same series, or to the same degree. In my class The Chronicles of Narnia and Harry Potter dominated. If I could’ve chosen what to indoctrinate myself with (I can for my children, what a thought), I’d choose Lord of the Rings and His Dark Materials. Better characters. Better for the imagination. Better for the soul.

The only book of non-fiction I want to mention is When We Cease to Understand the World. Non-fiction is a big stretch because only the first chapter is non-fiction, and even then not the last paragraph. So good thing that’s the best chapter. I won’t presume to tell you what you should think after reading this book. But surely, everyone who reads this book will agree, any scientist who reads this book, and also thinks their work is worth a damn, should think something.

If anyone wants to grant me a small wish, nothing too crazy, I have an idea. I’d like to pick a bookstore, stop time until I read everything in it, and I must read everything, and then resume time and life as normal. I promise I won’t even pick a big bookstore like Strand. How about Daunt Marylebone, which is a reasonable size, looks friendly enough to spend an eternity in, and stocks everything important. I’ve thought about this a bit so I have what I consider quite a fair appraisal of my ask.

Arguments for, it’s overpowered. I’m not good at estimating, but that must be a few thousand books. Many times what I can go through in a lifetime. But it’s not only asking to be able to fit more books in a lifetime. All the reading gets compounded at once. Humans haven’t explored what happens when you download a few gigabytes of text onto a brain in an instant. It could be insane. I’ll find some connection for sure. Plus I’ll have read a bunch of books I know I should, but also know that I’ll otherwise never have time for and will just die first.

Arguments for, it’s overrated. I never asked to remember any of those books. It’s unclear what happens to my memory during time stoppage. Do I remember until time resumes, and then start forgetting all thousands of books at once, and remember nothing a year later? Or, do I still forget during time stoppage, so when time resumes, I’ll have forgotten everything except the last twenty books I read? I still have to reread stuff later in life, because books hit differently at different ages. Don’t forget also I have to read every book. So a bunch of junk self-help and every number in the Book of Numbers. And I’ll randomly know current releases really well, for one value of current. Not sure if that’s what I want.

In conclusion I’m not even asking for that much. So if you are a granter of modest, thoughtful wishes, please consider my case.

In other reading, this was an incredible year for The Economist obituaries. Not for anyone dying of course, for the writing. Congratulations are in order to Ann Wroe. In January I wrote to ask her if she would write one for Thich Nhat Hanh, and was so happy when she replied to tell me she was on the case. I’m not pretending that it was on account of me. But I’m happy to have detected something about how the paper picks its rare class of 50 individuals a year deserving of the best obituary in journalism. Apparently The New York Times requires that you “change the culture” to get them to write you one. Get over yourself, new york times. The Economist chooses from everyone you should know about.

This year, for a life you’ll have a hard time believing is true, read Gilberto Rodríguez Orejuela once ran 80% of the world cocaine market. For a single pivotal moment of life, Mario Terán was the man sent to kill Che Guevara. For the individual you’ll never have heard of without The Economist but will be happy that you have now, Lawrence MacEwen made a tiny island prosper. For one life awesome in their dedication to their craft, Franz Mohr was the man who made great concerts possible. For two heartwarming passions, Gloria Allen ran a charm school for young trans women and Jay Pasachoff travelled the world to catch the Moon eclipsing the Sun. (By the way, do any friends want to plan way ahead of time and see the next total solar eclipse with me?) But the best obituaries this year were the two that broke my heart. Pasha Lee went from Ukrainian screen idol to volunteer and Albert Woodfox found his true self in prison. I can’t choose.

In the world, it was a good year to have believed in Jake early. And unthinkably good that we have Brandon instead of that other guy. More seriously though, with some exceptions, we held on to democracy and other ideals worth fighting for. Sláva Ukrayíni.

I promise not to be the annoying friend who always references their foreign travels. I’ll contain myself to the following passage, because you can opt out of it.

This year I spent time in these cities for the first time: Barcelona, Brussels, Amman, Paris, Ravenna, Geneva, and Berlin. The one that I most want to return to is Ravenna. When I moved to London, going to Italy was the highest priority on my list. Rome is too intimidating. So I still haven’t been to Rome. But Ravenna once surpassed Rome, so it was as good a choice as any.

I mentioned that I used to play historical strategy games set in Ancient Rome. That taught me Ravenna’s strategic importance. It’s a prime location in Northeastern Italy. Your battered legions in the process of pacifying Raetia and Noricum can just take a breather in Ravenna, no need to march all the way back to Rome. Also nice is access to the Adriatic, so you can embark to Greece without marching to Tarentum. Not the most erudite initial source of my interest in the city, but it is what it is.

In real life Ravenna was the capital city of the Western Roman Empire when Alaric sacked Rome in 410. Because the city was surrounded by marshland, very few armies conquered it; most preferred to bypass it. I read Judith Herrin’s Ravenna: Capital of Empire, Crucible of Europe in preparation for visiting. It’s not the most eventful book, but it has some interesting facts about the rivalry between Ravenna and Rome. Being there in person, though, it’s hard to believe that Ravenna could ever compare. I don’t know how big Rome used to be, but today walking across central Ravenna takes ten minutes. It’s puzzling to me how, without a pivotal event, hundreds of years of the real power in the West living in Ravenna didn’t prop it up into anything more today.

I’m happy it’s obscure, though. I spent three days there by myself in October. The weather was perfect. There were very few tourists. I saw peak mosaic, which is how people should be hearing about Ravenna. Apparently during the summer some confined mosaics are so popular that you can only book ten-minute slots to see them. But since no one was there, I could spend as long as I wanted to staring at the ceiling. You don’t want to hear this but pictures don’t do it justice, you have to see it yourself. Have the beauty completely surround you under the domes and apses, and let it sink in that these things are over a thousand years old.

If you search for them, the mosaics reveal many stories. In the Basilica of Sant’Apollinare Nuovo there’s a depiction of the port town of Classe. Three nice boats in port, next to the town. You can’t see most of the town behind a very solid wall. Wow they did a good job on that mosaic of a wall, they must have really cared. No, it’s out of place among the other much more lavish mosaics that show more technical skill. It turns out that there was something else there, but it was too Arian, so when Agnellus rededicated the church to Orthodoxy it had to go. If you look up close there’s some trace of what came before.

In the United Kingdom I was glad to have gone to Cambridge, Oxford, Ely, and Bath multiple times, and Peterborough and Haltwhistle (for Hadrian’s Wall) once this year. Okay, I actually went to Hadrian’s Wall in November last year, but please let me talk about it since I didn’t write a letter last year. Peterborough Cathedral makes a delicious Sunday roast. Bath is a historically-themed shopping mall, don’t go. There’s a small church in Oxford, with a plaque buried in a corner honoring a chorister who sang at the church for 40 years. If that was my community, I wouldn’t want it built over, either. There’s a bus that runs along Hadrian’s Wall, and for most of the ride I was the only passenger on it. So it was news when another person got on. Except it wasn’t another passenger; it was the bus driver’s mom, getting on to give him a sausage roll.

In the United States, I revisited San Francisco, New York, Boston, Baltimore, and Washington DC. I used to want to live in New York slightly more than I wanted to live in London. As of this writing that has reversed. It started with my waiting 20 minutes for the next subway from JFK to Manhattan. Then, from the inside of some trains, you can’t even tell which line you’re on or what the next stop is. That should be illegal. New York also got too stressful for me. It felt dystopian to join the mass of suits spilling out from the PATH into the Westfield Oculus. New York has maxed out some measures, while London is more balanced and still has enough of everything you could want. Nevertheless, New York has far and away more personality. A breakdancer got on the subway with his speaker, and asked me to move to give him a bit of room. Then he gave me a fist bump before starting his routine. That never happens in London. And that is worth a lot.

Caveats out of the way: America is awesome. But I’m going to make a brief case for people to move to London (over New York, its only competition). In London the food at every price point, except either extreme, is better, due to better ingredients and people who have spent more time in the area where the dish originated. The coffee is better, and tea competes with coffee for people’s caffeine allowance. British humor is deeper. There’s lettuce, there’s tofu, and then there’s lettuce or tofu. There are more ads for La bohème than there are for Shen Yun. People read more physical books on the tube, which I will defend is a good indicator, and not just pretentious.

Prices are high, but compared to New York you’re getting more value. Young Royal Opera House tickets are 25 pounds, and a friend even saw then-Prince Charles there. 45 minutes of public transit will get you all the way to Reading. Did you know that the minimum required paid time off is 5.6 weeks? I don’t think you’re pricing that in correctly. Most of Europe is a cheap two-hour flight away. London has the most airports of any city in the world (six). And based on the popular destinations you’d want to go to (sorry if you wanted to go to New Zealand), I claim that these airports are the best-positioned airports in the world. So if you don’t like London, you can leave efficiently and often.

Thanks to my friends living in London for being patient as I rehashed some opinions they already know. But one quality I don’t think anyone praises enough is how lovely the skies are. People idealize the rolling English countryside hills, but they should be looking up. It’s another thing a photo won’t do justice to. But I’ll try, with one I took on a train, that I included at the end of this letter. Since high school AP Art History, my favorite painter has been Joseph Mallord William Turner. I always thought he was good at painting skies. Now I see that he was also spoiled by the skies he grew up under.

I was spoiled for Turners this year. The National Gallery hangs the highest number of his best oils. There’s the boat, of course, always near some version of Dutch boats in a gale. My favorite in that room with the blue walls is Ulysses deriding Polyphemus. It’s everything I like about Turner: subjects of boats, water, clouds, ancient history, mythology, modernity. I visited Boston this year in time for the exhibition Turner’s Modern World. I was lucky to see the MFA’s Slave Ship, too fragile to move, and the Cleveland Museum of Art’s Burning of the Houses of Lords and Commons, because I’m not planning to go to Cleveland. Franny Moyle’s The Extraordinary Life and Momentous Times of J. M. W. Turner, which I read in Boston, has good facts about those pieces. London is also blessed with the largest Turner collection anywhere, in the Tate Britain, but their curation is scattered. Turner is so easy to organize by period, location, or medium. I don’t understand why they chose to categorize by subject matter, and chose subjects they don’t have the oils to justify.

Oh sorry. I was talking about London, then I got distracted talking about Turner. Yeah, so, the sky. The biggest reason not to move to London is the darkness of winter. Nautical twilight starts at four in the afternoon in December. A friend remarked, and I agree, February is the worst month of the year. It’s cold, it’s dark, and you don’t have anything to look forward to. The first Bank Holiday isn’t until April. She also said, though, that if you can survive two winters, you’re good. My second winter was worse than the first. I used to think I was immune to seasonal affective disorder. This year I was not.

I’m always really happy to visit my parents in Arizona, which I did three times this year. Thank you British Airways for one nonstop London Heathrow to Phoenix a day. At the start of this year I’m pretty sure the jankiest Boeing 777 in BA’s fleet serviced the route. But at some point we got upgraded to an Airbus A350, thank you again BA. I’m a bit sad this means it’ll never be a Boeing 787, but I can’t be too picky.

By the way, some advice for Heathrow: always take the lift. Heathrow is a big airport, and that includes vertically. Except for the escalator on arrival from passport control to baggage claim, you’re always moving multiple floors at once. Don’t let the short length of the escalator deceive you. Once it drops you off, you’re going to turn around and get on another one. Watch the flight attendants all take the lift. They don’t have luggage and are certainly not lazy. The lift is almost always faster.

London to Phoenix is a ten-hour flight. But it can be one of the best ten-hour flights. A few stars have to align. The first is completely under your control: get a window seat. Thanks to DeepMind paying for my business travel I got British Airways bronze status this year, which is all I need to pick my seat for free. Second, you need a somewhat cloudless sky. A few clouds are good. As the plane ascends it banks, coasting right above a cloud layer, a ride way better than any amusement park simulation. Third, you need flight attendants nice enough to let you keep the window shades up. Here the A350 has a clear advantage over the 787. You can sneak open a manual shade. You can’t do that with a central opacity setting. They make you put the shades down so people can sleep or whatever, but we don’t want to support that.

People should be fighting each other to press their nose to the window. From Phoenix to London you can clearly see off the left side Kansas City, Topeka, Chicago (the coastline and the 394 are especially sharp), Toledo, Montreal, and so many more places I couldn’t be completely sure of from the in-flight map. From London to Phoenix you can see the tip of the Prince Christian Sound, with all its little ice islands. I also saw a cool waterway shape over Minnesota. I knew that that was a lucky flight when we taxied by the Heathrow Concorde before takeoff. Truly a queen, living out an elegant retirement.

At home in Arizona, I’m proud to have really improved at making coffee with a moka pot. The last secret was using a cast iron heat diffuser. I’ve completely eliminated any bitter aftertaste. I’ve been asked if I would choose to live forever if I could, and wouldn’t I get bored if I did. I don’t think I’ll ever get bored of my ritual of pulling a good shot and then failing my latte art day after day.

A note on personal productivity. I kept coming back to an observation I read a while ago that mental energy doesn’t get depleted, like physical energy, but fortified, as you do more. That has been true for me this year. It seems too good to be true. It says you can do this bundle of things and get them done, or you can do this bundle bigger in every single way and get that done for free. The tasks help themselves get done in a positive spiral.

So next year I’m setting the unrealistic target of both testing one meaningful hypothesis (my definition) a day, and reading one book a day. Among other personal goals. I probably won’t get there. This year I tested a good hypothesis once a week at best and once a month most of the time. I read a book roughly every two weeks. But I have faith that improving both those metrics at the same time will be easier than only doing one of them. At some point my brain does get fried. But it’s usually farther off than I think. Like my manager at NASA (she is brilliant) used to say about triathlons, it’s all a mental game.

It’s the Year of the Tiger. The year I was born was one, the year I turned 24, this year, is one, and it will be the Year of the Tiger for another month longer. Instead of other time scales, I thought about a twelve-year plan. One year is too short, and five-year plans have a bad reputation. But twelve is a nice number. Twelve zodiac signs, twelve disciples, twelve months a year. A friend convinced me long ago that our counting system should’ve been base-12. More divisors.

A key feature of the twelve year plan is that twelve years is so far away that I can’t pin anything down. Thinking long-term is a struggle, even when long is only twelve years. My only hope is to think about what I care about, vaguely where I want to be, and what I can do every day that hopefully adds up somewhere in the vicinity. One is being in the company of friends. This letter is dedicated to my friends, old and new. You made me feel so vividly alive this year. You know who you are.


  1. A friend in physics put it even more bluntly: “The Nobel Prize was basically free.” 

  2. Image generation improved rapidly. Proof by, just look. David Bender and Douglas Hofstadter tripped up GPT-3 with some crafted nonsense questions in June; ChatGPT blew past them in December. A podcaster and YouTuber I follow discussed early DALL-E, and speculated about video generation. Meta released Make-A-Video that same month, and they had to devote their whole next episode to AI too. 

  3. I don’t think anyone’s predictions will survive contact with five years of reality (though some are trying their best to retcon their publication record). Such is the business of predicting the future.

  4. The authors aren’t blameless if their result isn’t replicable. But AI is a more forgiving field because the artifacts are for now less interpretable. 

  5. For the non-specialist, reinforcement learning agents are sometimes optimized towards periodically updated “target values” to stabilize learning. You still want to update them more often than leaving it up to the scheduler gods, though. For a nice overview of reinforcement learning, see Lilian Weng’s A (Long) Peek into Reinforcement Learning.

  6. I stole lots of entertaining phrases from Tyler Cowen. 

  7. Cicero, Tusculan Disputations III xxviii 69: “Thus Aristotle, accusing the old philosophers who taught that philosophy had been perfected by their own talents, says that they were either very stupid or very conceited; but that he sees that, since in a few years a great advance has been made, philosophy will in a short time be brought to completion.” Everyone should watch Agnes Callard’s Aims of Education.

  8. There are other definitions for this project: an algorithmic one, from Legg and Hutter; one that forefronts efficiency, from Francois Chollet; and many more.

  9. Thanks Eren for inventing this awesome application of the term. 

  10. In the Lantern Slides I pick on DeepMind a bit. But make no mistake: I am a huge fan of and absolutely adore the researchers I call out. I also think DeepMind is the most interdisciplinary AI research lab in the world. Other than engineers and mathematicians, we’ve got philosophers, artists, game designers, historians, governance specialists, every kind of scientist under the sun, and possibly you, too.

  11. Sorry Sam. Can I still have 1.5 million?