Jack Cushman writing

Fable pushes coding toward another phase change

Wed, 10 Jun 2026 00:00:00 +0000

Shawn Wang at Cognition tweeted two charts from FrontierCode — the benchmark Cognition launched on Monday to test coding agents' ability to create fully mergeable pull requests. I pulled out the top-scoring models from those charts to highlight one of the stories he's telling.

What's going on in this chart?

FrontierCode has 150 problems split into three difficulty levels. Wang's tweets show one-shot performance on the easiest 50 and hardest 50 problems — meaning, the percentage chance that a model can fully solve a problem of that difficulty in a single attempt.

My summary chart shows the top model performance on those easy and hard problem sets over time — how good the best available model was at fully solving easy or hard problems in one try at every point over the last year or so. Call this the "easy frontier" and the "hard frontier."
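As a rough sketch of what a "frontier" means here: it is a running maximum over model releases. The model names and scores below are made-up placeholders, not FrontierCode data, and the `frontier` helper is hypothetical.

```python
from datetime import date


def frontier(releases):
    """Return (release_date, best_score_so_far) for each release, in date order."""
    best = 0.0
    points = []
    for released, _name, score in sorted(releases):
        # The frontier only moves when a new model beats every earlier one.
        best = max(best, score)
        points.append((released, best))
    return points


# Placeholder models with invented scores, purely for illustration.
releases = [
    (date(2025, 6, 1), "model-a", 0.40),
    (date(2025, 9, 1), "model-b", 0.35),
    (date(2025, 11, 1), "model-c", 0.75),
]

frontier_scores = [score for _, score in frontier(releases)]
```

A new model that scores below the current best (like "model-b" here) leaves the frontier flat, which is why the chart only ever steps upward.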

These frontiers explain a lot about the experience of coding with agents over the last year:

In November 2025, the easy frontier jumped from succeeding about half the time to about three quarters of the time. That was when many programmers noticed that vibe coding now worked. Before that jump, if you asked a model to edit a codebase repeatedly it would soon spiral into failure, spending more and more time to get less and less done. You had to go in and manually fix things up after each edit to keep things working. After that jump, you could keep making requests without reading the code in between, and on average it would improve.
By February 2026, the easy frontier had saturated. New models brought no real advantage on easy tasks.
Meanwhile the hard frontier had not saturated, but it was hard to feel the progress. Jumping from "solving hard problems 2% of the time" to "solving hard problems 4% of the time" is a huge accomplishment, but hard to detect in practice. Things were moving fast (GPT-5.3 doubled Opus 4.5, Opus 4.8 doubled GPT-5.3, Fable doubled Opus 4.8) but felt like they were barely moving at all.

Fable gets into detectable numbers, though: according to FrontierCode it should now one-shot hard problems about one third of the time. That could be another November-2025-style phase change in what coding models can do.
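The arithmetic behind "moving fast but barely detectable" can be sketched by working backward from Fable's roughly one-in-three rate and halving once per doubling named above. This treats each "doubled" as exactly 2x, which is an assumption; the results are implied round numbers, not measured benchmark scores.

```python
# Work backward from Fable's approximate one-shot rate on hard problems.
# Each step assumes the newer model exactly doubled the older one.
chain = ["Fable", "Opus 4.8", "GPT-5.3", "Opus 4.5"]

implied = {}
rate = 1 / 3
for model in chain:
    implied[model] = rate
    rate /= 2

for model in reversed(chain):
    print(f"{model}: about {implied[model]:.0%} of hard problems in one try")
```

Three doublings separate roughly 4% from roughly 33%. Each step is the same relative improvement, but only the last one moves the success rate by enough percentage points to notice in day-to-day use.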

Launching the Agent Protocols Tech Tree

Mon, 23 Feb 2026 00:00:00 +0000

A version of this essay was originally published at the Library Innovation Lab blog.

Today I am sharing the Agent Protocols Tech Tree. APTT is a visual, videogame-style tech tree of the evolving protocols supporting AI agents.

Where did this come from?

I made the APTT for a session on “The Role of Protocols in the Agents Ecosystem” at the Towards an Internet Ecosystem for Sane Autonomous Agents workshop at the Berkman Klein Center on February 9th.

It’s a video game tech tree because, while the word “protocols” is boring, the phenomenon of open protocols is fascinating, and I want to make them easier to approach and explore.

What is an open protocol? Why care about them?

An open protocol is a shared language used by multiple software projects so they can interoperate or compete with each other.

Protocols offer an x-ray of an emerging technology — they tell you what the builder community actually cares about, what they are forced to agree on, what is already done, and what is likely to come next.

Open protocols go back to the founding of the internet when basic concepts like “TCP/IP” were standardized — not by a government or company creating and enforcing a rule, but by a community of builders based on “rough consensus and running code.” On the internet no one could force you to use the same standards as everyone else, but if you wanted to be part of the same conversation, you had to speak the same language. That created strong incentives to agree on protocols, from SMTP to DNS to FTP to HTTP to SSL. By tracing each of those protocols, you could see the evolving concerns of the people building the internet.

(For a great discussion of that history, see “The Battle of the Networks” from LIL faculty director Jonathan Zittrain’s book “The Future of the Internet — and How to Stop It.”)

Why are protocols so important for AI agents?

Like the early internet, AI agents today are an emerging, distributed phenomenon that is changing faster than even experts can understand. We’re holding workshops with names like “Towards an Internet Ecosystem for Sane Autonomous Agents” because no one really knows what it will mean to have millions of semi-autonomous computer programs acting and interacting in human-like ways online.

Also like the early internet, it’s tempting to look for some government or company that is in charge and can tame this phenomenon, set the rules of the road. But in many ways there isn’t one. The ingredients of AI agents are just not that complex or that controlled.

This makes sense if you look at Anthropic’s definition of an agent, which is simply “models using tools in a loop.” That is not a complex recipe: it requires a large language model, of which there are now many, including powerful open source ones that can run locally; a fairly small and simple control loop; and a set of “tools,” simple software programs that can interact with the world to do things like run a web search or send a text message. “Agents” as a phenomenon are a technique, like calculus, not a service, like Uber.
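To show how small that recipe is, here is a minimal sketch of "models using tools in a loop." The model is a scripted stand-in rather than a real LLM, and every name here (`run_agent`, `scripted_model`, `fake_web_search`, the action dictionary format) is a hypothetical chosen for illustration, not any vendor's API.

```python
def run_agent(model, tools, task, max_steps=10):
    """Ask the model what to do, run any tool it requests, repeat until it answers."""
    transcript = [("user", task)]
    for _ in range(max_steps):
        action = model(transcript)
        if action["type"] == "final":
            return action["text"]
        # The model asked for a tool: run it and feed the result back in.
        tool = tools[action["tool"]]
        result = tool(action["input"])
        transcript.append(("tool", f"{action['tool']} -> {result}"))
    raise RuntimeError("agent did not finish within max_steps")


def scripted_model(transcript):
    """Stand-in for an LLM: search once, then answer with whatever came back."""
    last_role, last_text = transcript[-1]
    if last_role == "user":
        return {"type": "tool", "tool": "web_search", "input": last_text}
    return {"type": "final", "text": f"Based on my search: {last_text}"}


def fake_web_search(query):
    """Stand-in for a real search tool."""
    return f"top result for {query!r}"


tools = {"web_search": fake_web_search}
answer = run_agent(scripted_model, tools, "agent protocols")
```

Swapping `scripted_model` for a call to any hosted or local language model, and `fake_web_search` for real tools, is the whole technique. The control loop itself does not change.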

That makes agents hard to regulate, and makes protocols incredibly important. It is protocols that give agents the tools they use. It is protocols that the builder community are developing as fast as they can to increase what agents can do. If you want to nudge this technique toward human thriving, it is protocols that might most shape agent behavior by making some agents easier to build than others.

To be sure, protocols aren’t the only way to influence technological development. Larry Lessig’s classic “pathetic dot theory” outlines markets, laws, social norms, and architecture as four separate ways that individual action gets regulated, and protocols are just an aspect of architecture. But the more a technology is dispersed and simple to recreate, the more protocols come into play in how it evolves.

How do I use the APTT?

APTT is designed to be helpful whether you’re a less-technical person who just wants to understand what agents are, or a more technical person who wants to understand exactly what’s getting built.

Either way the pile of agent technologies is confusing, so I recommend starting at the beginning with “Inference API.”

Video games are often designed so you start with a simple feature unlocked and then progressively unlock more and more complex options as you learn the game. The same approach works here: imagine that you have just unlocked “Inference API” in this game, and once you’re comfortable with that, explore off to the right to see how each protocol enables or necessitates the next.

You can click each technology to learn what problem it solves (why did people need something like this?), how it’s standardizing (who kicked this off?), and what virtuous cycle it enabled (why did other people want to get on board?).

You can also see visual animations of how the protocol is used: what messages are actually sent back and forth, and between whom?

If you’re interested in the technical details, you can click any of the messages to see at a wire level what’s actually happening. (Often, something simpler than it sounds.)

As you move off to the right, you’ll go from widely adopted technologies, like MCP, to technologies that have commercial supporters but not much social proof yet, like Visa TAP, or technologies that don’t even exist but might make sense in the future, like Interoperable Memory, Signed Intent Mandates, or Agent Lingua Franca.

The ragged edge on the right is where I hope you’ll be the most critical: what seems inevitable, what seems like a dead end, and what would you like to see more of?

How accurate is all of this? How do I fix mistakes?

APTT is a work in progress, and to be honest in many ways is a whiteboard sketch. I put it together (and vibe coded much of it) to help support a conversation, first at the workshop and now online. I think whiteboard sketches are useful, so I’m sharing it, but I don’t pretend it’s authoritative; it’s just my rough sense of how things work right now.

(This is a weird thing about the agentic moment — my coding agent has made this tool look more polished and complete than it may really deserve. Think napkin sketch with fancy graphics.)

If you think I got things wrong or missed part of the story, please open an issue on the GitHub repository. I plan to keep this rough and opinionated, and focused on consensus-driven protocols as a lens for understanding what’s happening — so I’ll either pull contributions into the main tool, or just leave them as discussions to represent the range of opinions about how all of this works. I hope it’s fun to play with either way.

What is Signal? The messaging app, explained.

Thu, 27 Mar 2025 00:00:00 +0000

A version of this essay was originally published at MIT Technology Review.

With news this week of the messaging app being used to discuss war plans, we get you up to speed on what Signal should be used for—and what it shouldn’t.

With the recent news that the Atlantic’s editor in chief was accidentally added to a group Signal chat for American leaders planning a bombing in Yemen, many people are wondering: What is Signal? Is it secure? If government officials aren’t supposed to use it for military planning, does that mean I shouldn’t use it either?

The answer is: Yes, you should use Signal, but government officials having top-secret conversations shouldn’t use Signal.

Read on to find out why.

Signal is an app you can install on your iPhone or Android phone, or on your computer. It lets you exchange secure texts, images, and phone or video calls with other people or groups of people, just like iMessage, Google Messages, WhatsApp, and other chat apps.

Installing Signal is a two-minute process—again, it’s designed to work just like other popular texting apps.

Signal is very secure—as we’ll see below, it’s the best option out there for having private conversations with your friends on your cell phone.

But you shouldn’t use it if you have a legal obligation to preserve your messages, such as while doing government business, because Signal prioritizes privacy over the ability to preserve data. It’s designed to securely delete data when you’re done with it, not to keep it. This makes it uniquely unsuited for following public record laws.

You also shouldn’t use it if your phone might be a target of sophisticated hackers, because Signal can only do its job if the phone it is running on is secure. If your phone has been hacked, then the hacker can read your messages regardless of what software you are running.

This is why you shouldn’t use Signal to discuss classified material or military plans. For military communication your civilian phone is always considered hacked by adversaries, so you should instead use communication equipment that is safer—equipment that is physically guarded and designed to do only one job, making it harder to hack.

Signal is designed from bottom to top as a very private space for conversation. Cryptographers are very sure that as long as your phone is otherwise secure, no one can read your messages.

Why should you want that? Because private spaces for conversation are very important. In the US, the First Amendment recognizes, in the right to freedom of assembly, that we all need private conversations among our own selected groups in order to function.

And you don’t need the First Amendment to tell you that. You know, just like everyone else, that you can have important conversations in your living room, bedroom, church coffee hour, or meeting hall that you could never have on a public stage. Signal gives us the digital equivalent of that—it’s a space where we can talk, among groups of our choice, about the private things that matter to us, free of corporate or government surveillance. Our mental health and social functioning require that.

So if you’re not legally required to record your conversations, and not planning secret military operations, go ahead and use Signal—you deserve the privacy.

People often give up on finding digital privacy and end up censoring themselves out of caution. So are there really private ways to talk on our phones, or should we just assume that everything is being read anyway?

The good news is: For most of us who aren’t individually targeted by hackers, we really can still have private conversations.

Signal is designed to ensure that if you know your phone and the phones of other people in your group haven’t been hacked (more on that later), you don’t have to trust anything else. It uses many techniques from the cryptography community to make that possible.

Most important and well-known is “end-to-end encryption,” which means that messages can be read only on the devices involved in the conversation and not by servers passing the messages back and forth.

But Signal uses other techniques to keep your messages private and safe as well. For example, it goes to great lengths to make it hard for the Signal server itself to know who else you are talking to (a feature known as “sealed sender”), or for an attacker who records traffic between phones to later decrypt the traffic by seizing one of the phones (“perfect forward secrecy”).

These are only a few of many security properties built into the protocol, which is well enough designed and vetted for other messaging apps, such as WhatsApp and Google Messages, to use the same one.
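For readers who want to see the idea behind forward secrecy, here is a much-simplified sketch of a one-way key ratchet. It is not Signal's actual code or protocol (which combines a ratchet like this with repeated key exchanges); the `ratchet` function and the starting secret are assumptions made for illustration.

```python
import hashlib
import hmac


def ratchet(chain_key: bytes) -> tuple[bytes, bytes]:
    """Derive a one-time message key and the next chain key from the current one."""
    message_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    next_chain_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return message_key, next_chain_key


# Hypothetical starting secret; a real app would agree on this via a key exchange.
chain_key = hashlib.sha256(b"shared secret agreed at session start").digest()

message_keys = []
for _ in range(3):
    # Each message gets a fresh key, and the old chain key is overwritten.
    message_key, chain_key = ratchet(chain_key)
    message_keys.append(message_key)
```

Because the derivation only runs forward, someone who seizes the phone later finds only the current `chain_key`. That key cannot be used to recompute the earlier message keys, so previously recorded traffic stays unreadable as long as the old keys were deleted.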

Signal is also designed so we don’t have to trust the people who make it. The source code for the app is available online and, because of its popularity as a security tool, is frequently audited by experts.

And even though its security does not rely on our trust in the publisher, it does come from a respected source: the Signal Technology Foundation, a nonprofit whose mission is to “protect free expression and enable secure global communication through open-source privacy technology.” The app itself, and the foundation, grew out of a community of prominent privacy advocates. The foundation was started by Moxie Marlinspike, a cryptographer and longtime advocate of secure private communication, and Brian Acton, a cofounder of WhatsApp.

Many apps offer end-to-end encryption, and it’s not a bad idea to use them for a measure of privacy. But Signal is a gold standard for private communication because it is secure by default: Unless you add someone you didn’t mean to, it’s very hard for a chat to accidentally become less secure than you intended.

That’s not necessarily the case for other apps. For example, iMessage conversations are sometimes end-to-end encrypted, but only if your chat has “blue bubbles,” and they aren’t encrypted in iCloud backups by default. Google Messages are sometimes end-to-end encrypted, but only if the chat shows a lock icon. WhatsApp is end-to-end encrypted but logs your activity, including “how you interact with others using our Services.”

Signal is careful not to record who you are talking with, to offer ways to reliably delete messages, and to keep messages secure even in online phone backups. This focus demonstrates the benefits of an app coming from a nonprofit focused on privacy rather than a company that sees security as a “nice to have” feature alongside other goals.

(Conversely, and as a warning, using Signal makes it rather easier to accidentally lose messages! Again, it is not a good choice if you are legally required to record your communication.)

Applications like WhatsApp, iMessage, and Google Messages do offer end-to-end encryption and can offer much better security than nothing. The worst option of all is regular SMS text messages (“green bubbles” on iOS)—those are sent unencrypted and are likely collected by mass government surveillance.

Signal is an excellent choice for privacy if you know that the phones of everyone you’re talking with are secure. But how do you know that? It’s easy to give up on a feeling of privacy if you never feel good about trusting your phone anyway.

One good place to start for most of us is simply to make sure your phone is up to date. Governments often do have ways of hacking phones, but hacking up-to-date phones is expensive and risky and reserved for high-value targets. For most people, simply having your software up to date will remove you from a category that hackers target.

If you’re a potential target of sophisticated hacking, then don’t stop there. You’ll need extra security measures, and guides from the Freedom of the Press Foundation and the Electronic Frontier Foundation are a good place to start.

But you don’t have to be a high-value target to value privacy. The rest of us can do our part to re-create that private living room, bedroom, church, or meeting hall simply by using an up-to-date phone with an app that respects our privacy.

Jack Cushman is a fellow of the Berkman Klein Center for Internet and Society and directs the Library Innovation Lab at Harvard Law School Library. He is an appellate lawyer, computer programmer, and former board member of the ACLU of Massachusetts.

LLMs are universal translators

Wed, 29 Nov 2023 00:00:00 +0000

A version of this essay was originally published at the Library Innovation Lab blog.

Here is a picture of the Statue of Liberty doing a TikTok dance, as painted by van Gogh, as interpreted by ChatGPT. This is very relevant to my point and we’ll come back to it.

One of the best ways to think about large language models is as universal, personal translators. When I gave a talk at a Spanish-language library conference in Argentina recently, it was an excellent chance to test what LLMs currently offer as translators and what they might become. The answer made me optimistic for how LLMs can work as humanistic knowledge tools, in concert with library values.

This is long, so I’ve broken it up into a few sections that might be helpful to different audiences:

LLMs are universal translators. This section explains LLM embedding spaces and argues that many of LLMs’ most successful applications are essentially translation tasks. I argue that LLMs are “universal translators,” not in the sense that they are perfect but in the sense that they try to translate between any input and any output.
How I built my own personal translation tools. When I spoke in Argentina, I built my own tools to translate my conference slides to Spanish and to translate other talks to English. This section gets into the weeds of what I did and how I did it. It will be most useful if you are a programmer interested in making more practical use of LLMs, or if you are interested in what might be possible for everyone as LLM tools get easier to use.
Building my own tools, part 2: real time translation. After my own talk, I watched other talks using a multimodal model to translate slides, and voice recognition and text completion APIs to translate talks.
What a universal translator means for an innovation lab. The ability to make individual, personalized translation tools changes what all of us should work on next — things that once could have been entire companies are now afternoon projects. This part considers, on the one hand, how my trip made me imagine a bunch of tools I could make and share, and on the other hand whether making and scaling tools still makes sense at all.
The cooperative principle, AI translators, and human connection. This part reflects on my experience of using technical tools to try to connect with people. I find that they highlight the “cooperative principle” — when two people communicate via an accessibility tool, they have to be more attentive, rather than less, to each other’s social signals, making me optimistic that tools can help to bring us together rather than alienate us.

LLMs are universal translators

LLMs are, in a literal sense, universal translators. They take all of their training data and embed it in a single high-dimensional space, an embedding space, and then produce outputs by moving around this embedding space.

The goal of an embedding space is that similar concepts end up near each other, and different concepts end up far away. And the goal of a “large” language model is to embed everything — the space is trained using trillions of tokens representing all of the world’s digital knowledge.

A classic example to understand embedding spaces is this: we take a bunch of data and train an encoder so that if we put in similar words, they encode to a similar location in space. Words like “king” and “queen” each end up encoded as locations somewhere in the embedding space. And then, miraculously, it turns out we can do math on those locations and it makes sense. If you encode “king” into a location, and then subtract the location of “man” and add the location of “woman”, you arrive at the location of “queen.”

This is already a kind of “translation” — we’re literally moving, or translating, from the location of “king” to the location of “queen.” But we can do other kinds of translation with this same technique. We can subtract English and add Spanish, and move from “king” to “rey.” Or we can build encoders that embed pictures and sound as well as text, and encode a picture of King Arthur and come out with the word “king,” or encode the word “king” and come out with an audio file of someone saying “king.”
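The king-and-queen arithmetic can be sketched with toy vectors. The two-dimensional "embeddings" below are hand-picked for illustration (one axis for royalty, one for gender); real embedding spaces are learned from data and have hundreds or thousands of dimensions.

```python
import math

# Toy embeddings: (royalty, gender). Hand-picked, not learned from data.
vectors = {
    "king": (1.0, 1.0),
    "queen": (1.0, -1.0),
    "man": (0.0, 1.0),
    "woman": (0.0, -1.0),
}


def cosine(a, b):
    """Cosine similarity: 1.0 means the two vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def nearest(target):
    """Return the known word whose vector points most nearly toward target."""
    return max(vectors, key=lambda word: cosine(vectors[word], target))


king, man, woman = vectors["king"], vectors["man"], vectors["woman"]
# king - man + woman, computed one coordinate at a time
target = tuple(k - m + w for k, m, w in zip(king, man, woman))
result = nearest(target)
```

Here `target` lands exactly on the location of "queen" because the toy axes were chosen to make that happen. In a trained model the result only lands near "queen," which is why the lookup uses a nearest-neighbor search rather than an exact match.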

Embedding spaces translate from everything to everything.

Not surprisingly, a lot of the most promising applications of LLMs can be thought of as translation problems:

A programmer inputs a comment describing what a function should do in English, and it is translated to an implementation of the function in Python.
A doctor inputs an image of an x-ray, and it is translated to an English-language diagnosis.
A user inputs a text description of an image, and it is translated to an image matching the description.
A lawyer inputs a list of summaries of case holdings and facts provided by a client, and it is translated to a legal brief.
A social network inputs images uploaded by users, and they are translated to text descriptions for screenreader users.
And of course literal translation — you click “Translate text” in the Firefox browser and your computer translates it to another language.

This brings us back to the image of the Statue of Liberty doing a TikTok dance, as painted by van Gogh, that opened the article. How did the program “know” what the Statue of Liberty looks like, what dancing looks like, how van Gogh paints, or how those would all go together? It started at a random point in a high-dimensional embedding space, and then translated toward the spot that had the highest overlap of Statue-of-Liberty-ness, dance-ness, and van Gogh-ness, which it could do because it was able to encode and decode both text and images in and out of that space. It could just as easily have navigated to nearby spaces — from Statue of Liberty to Napoleon, or from van Gogh to Monet:

All of the concepts of the world are embedded in the same space and available for translation.

The idea of large language models is that we want the same model to do all of these tasks, because with human problems there’s no way of predicting what’s relevant to what. The lawyer’s brief or the programmer’s code or the Firefox translation could all require a concept map that includes Napoleon or TikTok trends for an accurate translation; large language models are willing to absorb it all and remix in any form.

That’s what I mean by “universal” translator — we don’t have to decide, up front, which facts are necessary for a successful translation, what inputs and outputs to use, because every available idea can be translated in and out of the same embedding space.

Being a universal translator doesn’t make something an accurate translator, or a social benefit. I’m not using “universal” as a superlative or saying it can do any particular translation task well. But a universal translator is a very different tool from a special-purpose translator, and it’s worth experimenting to see what it means to have one.

How I built my own personal translation tools

So, I believe that LLMs are universal translators. And I also believe, as the head of an innovation lab, that getting our hands messy is the best way to improve our intuitions about what’s coming next. So when I was invited to give a talk on disruptive innovation in libraries (adapted transcript) at the Universidad Católica de Argentina for a Spanish-speaking audience, in a language I don’t speak, it was the perfect chance to experiment with what it means to have a universal translator.

To be clear, I was able to attend this Spanish-language conference not because of the tools described below, but because of the resourcefulness, patience, and enthusiasm of UCA library director Maria Soledad Lago, language professor Mercedes Rego Perlas, and the other speakers and attendees who welcomed me. Many thanks for all of their support, including with these experiments!

The scenario I decided to test was: I’m attending a conference in a foreign language, and I’m going to use low-level APIs to see if it’s possible to build my own tools to solve problems while I’m there.

My first goal was to see if it was possible to translate my slides. I knew my talk would be offered with simultaneous translation, but I wanted it to be easier to follow the text on the slides as well. That is, I wanted to show each block of text in the slides in both English and Spanish, like this:

PowerPoint already has a translator built in — you can click a text box and get a translation, like this:

I wanted to see if I could save time by automatically inserting translations for all the text boxes. I also thought I could improve on the PowerPoint feature in a couple of ways:

I could include round-trip translations in each box, English -> Spanish -> English, which would give me a way to check the translation accuracy without speaking Spanish.
I could translate entire slides at once, instead of just one text box, which would give the translation program more context to work with.
I could keep the internal formatting of the text boxes, so the same word would end up highlighted in both versions of the text.
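The round-trip idea in the first item can be sketched as follows. The `translate(text, source, target)` signature and the tiny phrase table are hypothetical stand-ins so the example runs on its own; a real version would call a translation model at that point instead.

```python
def round_trip(text, translate, source="en", target="es"):
    """Translate text to the target language and back, keeping all three versions."""
    forward = translate(text, source, target)
    back = translate(forward, target, source)
    return f"{text} -> {forward} -> {back}"


# Stand-in translator backed by a tiny phrase table, for illustration only.
_PHRASES = {
    ("en", "es"): {"the library": "la biblioteca"},
    ("es", "en"): {"la biblioteca": "the library"},
}


def toy_translate(text, source, target):
    """Look the phrase up; return it unchanged if the table does not know it."""
    return _PHRASES[(source, target)].get(text.lower(), text)


line = round_trip("The Library", toy_translate)
```

If the third version drifts far from the first, that is a hint that something was lost on the way out, even for a reader who cannot judge the middle version directly. A clean round trip is only weak evidence, since two errors can cancel out.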

And because the goal was to test whether universal translation can make translation tools more personal and customizable, I wanted to try to do all this in a few hours.

I started by asking ChatGPT to write a program to edit a PowerPoint deck for me:

Full ChatGPT chat transcript of getting started opening and editing PowerPoint files.

With a little back and forth, I had a starting point: a simple program that capitalizes each word in a PowerPoint. I then started copying and pasting in code to call the OpenAI API. All I’d have to do is take the text blocks for each page, ask GPT-4 to translate them to Argentine Spanish, and put the results back in. This gave me a chance to try out OpenAI’s function calling API for structured output, which I had a hunch would help with translation.

I had the fun experience at this point of having Copilot, a GPT-powered coding tool, start to recommend instructions to supply to its sibling in the translation prompt:

Here you can see that I’ve written some code myself to make a “translate” function that takes a string in English and returns Spanish, and I’m writing the instructions that will be sent off to the model. Copilot sees what I’m doing, and suggests the completion of the instruction in gray italic text — including, itself, translating English to Spanish.

The upshot was a script that edited slides to look like this:

This looks like a mess, but it’s just what I wanted! For example, here’s the text of the first block:

“The Library is to us what a laboratory is to the chemist or the physicist, and what the museum is to the naturalist.” -> “La biblioteca es para nosotros lo que el laboratorio es para el químico o el físico, y lo que el museo es para el naturalista.” -> “The library is for us what the laboratory is for the chemist or the physicist, and what the museum is for the naturalist.”

Since the round trip language looks good, I can guess that the Spanish is at least intelligible, and delete the round trip translation and move on.

Checking the round trip translations was a fascinating game, and changed how I think about machine translation. One slide I was suspicious about, for example, translated the English “patron” as “mecenas”:

I wondered whether “patrons” came through correctly or was confusing, so I checked what ChatGPT thought, without tipping my hand about the word “mecenas”:

Me: what word is most common for library users in spanish? is there a word like “patrons” that denotes something distinct from commercial customers?

ChatGPT: In Spanish, the term “usuarios” is commonly used to refer to library users. “Usuarios” simply means “users.” When specifically referring to library patrons in a context where there’s a need to distinguish them from commercial customers or other types of users, the term “socios” (members) or “afiliados” (affiliates) can be used. However, the context is essential, as “socios” can also mean “partners” in a business sense, and “afiliados” can mean “affiliates” in other contexts. But in general, “usuarios de la biblioteca” (library users) is a standard and clear way to refer to library patrons in Spanish.

Me: how would “Los usuarios (socios?) no nos deben su patrocinio.” come across if translated to english?

ChatGPT: The sentence “Los usuarios (socios?) no nos deben su patrocinio.” would translate to English as: “The users (members?) do not owe us their patronage.” Here, “usuarios” means “users,” and “socios” can mean “members” or “partners.” The term “patrocinio” typically means “sponsorship,” but in this context, it seems to imply “support” or “patronage.”

I thought it was fun that English-language libraries have this ongoing debate about “patron” as a term, but Spanish libraries don’t seem to have adopted that usage, so I decided to highlight the translation challenge in the slide:

This was one of many probes to check things I wasn’t sure about — you can see the whole transcript here.

All in all, in the space of about four hours, I made a novel tool to translate slides and used it to translate and check the slides for a half hour talk. Throughout, I overtly put a lot of trust in ChatGPT’s language advice, which I knew could be completely inaccurate — an intentional decision to trust the audience of humans to meet me halfway in deciphering any errors ChatGPT might introduce.

Audience feedback was good — influenced, I think, by the fact that I presented it as an experiment and checked in on the translation quality as I presented the trickier slides. Audience members commented that the translated slides were helpful for following a talk in simultaneous translation, and the key points were not lost.

At the same time, it was clear that the translations remained choppy and required readers to work to interpret what I meant. Mercedes Rego Perlas, a linguistics professor at the Universidad de Buenos Aires who worked with me to translate a later version of the talk, commented that the AI was bad at knowing what it didn’t know: if I used untranslatable terms like “loss leader” or “cost center,” the program gamely emitted nonsense, where a human translator would know to ask for clarification and negotiate a compromise, as Mercedes herself did at several points. As always with LLMs, it would take more experimentation to see if a better prompt or control loop could fix that problem — Mercedes was less optimistic than I was.

Building my own tools, part 2: real time translation

After my own talk, I tested out the “universal translator” in other ways. For example, I tested GPT-4’s new vision capabilities by asking it to interpret photos in conversations like this one, from a talk by Andrés Felipe Echavarría, Director de Bibliotecas, Pontificia Universidad Javeriana, Colombia:

This was a chance to explore how translation works as a matter of culture as well as language — note how the model was able to ask questions and get more context that would let it use outside knowledge to complete the translation.

I also attended an Argentinian digital library conference that didn’t offer simultaneous translation — the 21st Jornada sobre la Biblioteca Digital Universitaria at the Universidad de Buenos Aires. For this conference I decided to test whether it was possible to use low level APIs to build my own simultaneous translator.

I started with some sample code to record and transcribe audio, and adapted it to write audio files and transcriptions to a folder every 10 seconds. I then ran a second program (copying and pasting from the slide translation program) that would translate each 10 second block. And, when those short translations proved choppy, I made a third program that would roll up 100-second blocks of audio to re-transcribe and translate more coherently.

The result looked like this — three separate windows running on my computer that would let me follow what was going on in each talk:

Screenshots of real-time translation of the talk by Nicolas Petrosini of the Universidad de Palermo, Integrando tecnología y aprendizaje en la biblioteca universitaria: ChatGPT, TikTok y la alfabetización en inteligencia artificial.

After a few hours I had a prototype that exactly served my needs and allowed me to follow the details of all of the talks I saw.
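The three-stage shape described above (10-second blocks, a translation of each block, and a 100-second roll-up) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the scripts used at the conference: `transcribe` and `translate` are hypothetical stand-ins for calls to whatever speech-to-text and language models you use, the names `Chunk`, `translate_chunk`, `rollup_windows`, and `translate_rollup` are invented for this example, and the audio is assumed to be headerless raw samples so that blocks can be joined by simple concatenation.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Sequence

CHUNK_SECONDS = 10       # length of each short recording, as in the text
ROLLUP_SECONDS = 100     # length of each longer, re-transcribed block
CHUNKS_PER_ROLLUP = ROLLUP_SECONDS // CHUNK_SECONDS


@dataclass
class Chunk:
    index: int           # position of this block in the recording
    audio: bytes         # assumption: raw, headerless audio for one block


def translate_chunk(
    chunk: Chunk,
    transcribe: Callable[[bytes], str],
    translate: Callable[[str], str],
) -> str:
    """Fast path: transcribe one short block, then translate it."""
    return translate(transcribe(chunk.audio))


def rollup_windows(
    chunks: Sequence[Chunk], size: int = CHUNKS_PER_ROLLUP
) -> Iterator[List[Chunk]]:
    """Yield consecutive, non-overlapping groups of `size` complete chunks.

    A trailing partial group is left for a later pass, once enough
    new chunks have arrived to fill it.
    """
    for start in range(0, len(chunks) - size + 1, size):
        yield list(chunks[start:start + size])


def translate_rollup(
    window: Sequence[Chunk],
    transcribe: Callable[[bytes], str],
    translate: Callable[[str], str],
) -> str:
    """Slow path: re-transcribe the joined audio so the model sees more context."""
    joined = b"".join(chunk.audio for chunk in window)
    return translate(transcribe(joined))
```

Re-transcribing the joined audio, rather than concatenating the ten short transcripts, is the design choice that matters here: sentences cut off at a 10-second boundary get a second chance to be heard whole.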

One of the fun parts of building my own prototype translator was encountering edge cases and mistakes. For example, I was using a speech-to-text model called Whisper that will do its best to transcribe even very quiet staticky noises into text. Users are supposed to filter out silences for themselves, but I chose not to, so during breaks Whisper would turn background noise into hallucinated text — and then, because it uses the previous transcript to predict the next transcript, it would repeat itself in a game of telephone:

You can see how, right at the end, this fades seamlessly into something that would actually be said at a library conference, as it starts transcribing speech instead of static, and noise becomes signal. Most people would probably not want this in their translation stream, but because I was building my own tools, I could choose to tweak them in this direction.
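For readers who would rather filter out the silences, one common approach is a simple loudness gate in front of the transcription step. The sketch below is a minimal illustration, not part of the prototype described here. It assumes 16-bit little-endian mono PCM audio, the threshold of 0.01 is an arbitrary starting point that would need tuning for a real room, and `transcribe` is a hypothetical stand-in for a speech-to-text call.

```python
import math
import struct
from typing import Callable, Optional


def rms_level(pcm: bytes) -> float:
    """Root-mean-square loudness of 16-bit little-endian mono PCM, scaled to 0..1."""
    count = len(pcm) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", pcm[: count * 2])
    mean_square = sum(sample * sample for sample in samples) / count
    return math.sqrt(mean_square) / 32768.0


def is_probably_silence(pcm: bytes, threshold: float = 0.01) -> bool:
    """Treat a block as silence when its loudness is below the threshold.

    The default threshold is an assumption for illustration, not a tested value.
    """
    return rms_level(pcm) < threshold


def transcribe_if_audible(
    pcm: bytes,
    transcribe: Callable[[bytes], str],
    threshold: float = 0.01,
) -> Optional[str]:
    """Skip the speech-to-text call for quiet blocks so static is never transcribed."""
    if is_probably_silence(pcm, threshold):
        return None
    return transcribe(pcm)
```

Because skipped blocks never produce text, they also never enter the previous-transcript context, which is what breaks the game of telephone.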

What a universal translator means for an innovation lab

So, this is amazing! I went to an international conference and tested out a universal translation API that, with the help of my very supportive hosts and human translators, and just a few hours of tool building, changed my experience of the conference. What does that mean for our Library Innovation Lab, which builds open tools to help people collect and preserve and access knowledge?

The tools I built would have each required entire technically sophisticated businesses to invent and maintain a few years ago — and I built them as just a small part of preparing for a single conference. What does that mean?

I’m not the only one asking that question. After OpenAI’s recent DevDay, a number of startups building on OpenAI’s APIs objected that OpenAI’s new tools, like custom agents called “GPTs” or the ability to search and retrieve data from documents, had destroyed their business models. But that wasn’t because OpenAI had stolen anything valuable or done anything very complicated — it was just that, once a universal translator existed, there wasn’t much left to those companies. The things they were doing were easy for anyone to do.

The same thing is happening to us at the Library Innovation Lab. When I got back home, I sketched an idea of what it would look like for Harvard to make an arbitrary x-to-y translation program available to attendees of the many in-person events that take place here every day:

The idea of this sketch is that translation can be from anything to anything: if you’d like to attend a talk, but you need it to be in text instead of visual, and English instead of French, and high school math instead of postgraduate math, you can just describe what you want and the magic of LLM embedding spaces can give you far more access than you had before.

I love this idea, but we didn’t start working on it at the Library Innovation Lab — not because it is too difficult, or unhelpful, but because it is too obvious: soon an app with this shape will exist in multiple versions on every phone, and these features will be built into every existing software product (just as there are already dozens of Zoom apps offering some variation of AI features like this). As an innovation lab, there isn’t anything for us to do … or is there?

Where I think we’ll have a lot to do, as a small team interested in empowering people with knowledge, is to help people navigate the shift from large, standardized tools to small and personal ones. The Silicon Valley software business model has been to make large, standardized platforms, monopolize them and extract value, and as a public interest software lab it’s tempting to follow in the same path and look for interventions that scale — “we want to invent the next Creative Commons!” But the universal translator is so generically useful that our individual relationship to knowledge can change — we can look for interventions that scale in the beautiful way that public libraries scale, where lots of little institutions help every patron solve their own problems. To do that we’ll have to do a lot of work as a lab and community in making sense of what these tools are and how to safely use them.

OpenAI itself, of course, is a classic centralized service with a great deal of power, which makes questions about what happens to it next, what competitors emerge, how they are regulated, and what open source tools are allowed to exist, all very important.

But at the same time, OpenAI is a thinner and weaker control point than the platforms that came before it. Traditionally a service to translate talks has been very different from a service to annotate images or write legal briefs, so each of those services could build deep “moats” around their businesses. By comparison, the scripts to adapt OpenAI’s APIs to each of those tasks are not very long, and the APIs themselves are relatively easily replicated. In many ways OpenAI is important right now not because it has a monopoly, but because it is paying to be first to discover things that then become common knowledge. Our relationship to software platforms has changed.

I see a few ways for libraries to get involved in this shift, and I’m interested in your thoughts on others:

First, we can help our patrons understand the shift and engage with it. A universal translator offers access to timely knowledge that can unlock profound benefits for our patrons. But it’s an access that is still opaque and confusing, in part because it’s more like access to a simulation than like access to a human expert or a database — more like learning to use a weather report or GPS navigation than a book. We can help teach the knowledge literacy skills that make these tools work for people instead of against them, and we can demystify their operation and cut through ways that commercial players try to make things deliberately opaque. Interface experiments like my PowerPoint translation are ventures in making a technology shaped more like its user, and understanding how it can serve human interaction.

Second, we can apply collection development and access skills to the content of the universal translator. LLMs are deeply curated, in hard to see ways: their answers depend on curation of their training data sets, and their extensive manual finetuning workforces, and their hidden system prompts and control loops. They embed — but hide — a great deal of subjective knowledge about the world, and their embedding spaces have strange strengths and weaknesses. We can help to explore those embedding spaces, to signpost them, to fill them out and file off rough edges, just as we do with other knowledge collections. The Library Innovation Lab’s various case studies and projects like COLD Cases, Poems and Secrets, AI Book Bans, and Provenance in the age of Generative AI are experiments in this direction.

The cooperative principle, AI translators, and human connection

But before we buy too far into this view of LLMs as knowledge tools our patrons need access to — is universal translation valuable at all, or just a bad substitute that risks putting people out of work and alienating us from each other? I want to argue that it can be deeply valuable, strengthening the ongoing value and involvement of human beings and human translators.

The cooperative principle observes that there is always translation effort in any conversation, even between two people who use the same language. If I choose to use a complicated phrase like “libraries are turning into cost centers instead of loss leaders” in a presentation — well, first of all, I probably should delete that phrase from the talk, because it’s confusing. But if I keep it in, I know I’ll need to highlight those words, and define what I mean by them, and unpack the connection I’m drawing for my audience, and then make eye contact and check if I need to speed up or slow down. I’ll do work, and my audience will do work, to bridge the gap in meaning. Keeping those terms in the talk will be worth it if the work of translation leads to better understanding.

If we add in automated translation tools to a conversation, how does it change the experience for people doing this work to understand and be understood? I missed a lot on my trip by not speaking Spanish — what did I lose by translating via machine, instead of through a human translator, and instead of through learning and speaking Spanish myself?

Douglas Hofstadter has staked out one end of this argument in the ominously titled Atlantic article Learn a Foreign Language Before It’s Too Late, where he argues that “AI translators may seem wondrous but they also erode a major part of what it is to be human”:

Today’s AI technology allows people of different cultures to communicate instantly and effortlessly with one another. Wow! Isn’t that a centuries-long dream come true, weaving the world ever more tightly together? Isn’t it a wonderful miracle? Isn’t the soon-to-arrive world where everyone can effortlessly speak every language just glorious?

Some readers will certainly say “yes,” but I would say “no.” In fact, I see this looming scenario as a great tragedy. I see it as the beginning of the end of the age-old tradition of learning foreign languages …

The question comes down to why we humans use language at all. Isn’t the purpose of language just the communication of facts? If so, then why not simply go for maximizing the number of facts transferred per second? Well, to me, this sounds like a shockingly utilitarian and pragmatic description of what I view as a perpetually astonishing and quasi-magical phenomenon that lies at the very core of conscious life. …

As my friend David Moser put it, what may soon go down the drain forever, thanks to these new AI technologies, is the precious gift that one can gain only by immersing oneself deeply in another culture and thereby acquiring an entirely new set of ways of looking at the world. It’s a gift that can’t help but turn any human being into a far richer and broader one.

After presenting, watching presentations, and making friends in a language I don’t speak, I am inclined to stake out the opposite end: I think AI translation can accentuate rather than undermine human connection and the subtlety of human language.

When you add in a machine translator, the cooperative work doesn’t vanish, but becomes even more important. Now there are three of you in the room: there’s the large language model, gamely taking inputs like “loss leader” and finding a spot for them in a universal embedding space to try to translate into new outputs, and there are the humans speaking and listening, gamely looking for familiar facial expressions and words and gestures and clues to meaning, to try to figure out what’s been lost in translation. The two humans have to trust each other and be cooperative partners, because neither of them can follow the process all the way along; they have to be just as attuned and sensitive to nuance as always.

Using machine translation doesn’t feel “effortless,” as Hofstadter suggests; it feels as tricky as any sincere effort at communication. But it also feels like having important new tools to help with that connection.

I don’t think this is work that will vanish as LLMs become better translators — it’s work that we are always doing, even when speaking in the same language to someone we know well. And I don’t think it will replace human translators either — there’s a reason married couples might pay a third party human, a marriage counselor, to help translate between them in their own language, and a reason that it often has to be just the right marriage counselor to succeed. But a universal, technical translator will change what we expect from human translators. When we add in a third human as translator, we aren’t looking for them just to play a mechanistic role — we’re involving a third human in relationship with us, who brings their own nuances of meaning to the conversation, and engages in the shared cooperative project of trying to all understand each other.

Not techno optimism, but human optimism

This piece has been somewhat rose-tinted — I had a positive experience with LLMs as translators, and wanted to make a case for why that matters. It matters because knowledge tools always have the power to connect us and make us more human, and we should notice when there are new ways to do that.

I’m telling this rose-tinted story in full awareness of a number of issues that are important and challenging to address — issues with LLM accuracy; the opacity and subjectivity of LLM knowledge curation; the alienation that can come from interjecting technology into social interactions; the economic impacts of automation, of outsourcing, and of data use; the privacy and centralization risks of hosted models and the anti-regulatory risks of open source models. We’ll keep working on those, and using library principles to do it. But I believe, from this experience, that there is something winnable and worth winning at the end of it.

Thoughts? Email me at jcushman@law.harvard.edu.

Disruptive innovation in libraries

Tue, 28 Nov 2023 00:00:00 +0000

A version of this essay was originally published at the Library Innovation Lab blog.

This piece is adapted from a keynote talk I gave at Innovación y Experiencia del Usuario at Universidad Católica de Argentina on November 1, 2023.

I was asked to give a talk on the subject of “disruptive innovation in libraries,” which isn’t necessarily the phrase I would choose to describe our work, but I enjoyed using that lens to explore the changes all libraries are going through.

If you want to skip around, Part 1 explores the disruptive changes libraries are experiencing with the arrival of the internet over the last forty years; Part 2 proposes a new mission for libraries in reweaving cultural memory for the internet age; and Part 3 outlines what I’ve learned so far about leading “disruptive innovation” within large, established institutions.

When I think about disruptive innovation in libraries I think about two stories.

Christopher Langdell

Portrait of Christopher C. Langdell, Dean of Harvard Law School 1870–1895

One is the story of Christopher Langdell, who reinvented the library where I work when he became dean in 1870.

Langdell changed everything over the course of a few years: different budgets, different physical architecture, different staff, different patrons, different rules. And he did all of that by announcing a different mission for the library. He announced: “The Library is to us what a laboratory is to the chemist or the physicist, and what the museum is to the naturalist,” meaning that the purpose of having a law library is to have the specimens that make it possible to learn and practice the law. He then made all of his choices by asking what would make sure that Harvard Law School had the world’s best laboratory for conducting law — for example, by having one copy of everything, and enough copies of the popular things that everyone could get their hands on one.

His changes were deeply disruptive: he described them as changes of “so radical a character that they have produced a very complete revolution in the Library in almost every particular.” And he acknowledged in his annual report that they caused “more or less temporary inconvenience and embarrassment,” which I think is annual report language for something that caused a great deal of chaos and disruption.

But the disruptive changes worked, because they made the library essential to the law school: he wrote that “without the library, the School would lose its most important characteristics, and indeed its identity.” This was true — the law library became a primary reason for people to go to Harvard and for Harvard to be a premier law school.

(These quotes are all from Richard Danner’s excellent article The Legacies of Langdell and His Metaphor.)

For those of us who came afterward, therefore, much of our job was to make sure that Langdell’s mission at the library continued to be carried out; we had to make sure that the people who came after us did about as well as, or a bit better than, the people who came before us.

This is the distinction between “disruptive innovation” and “sustaining innovation” — sustaining innovation improves your existing services (and everyone tends to like it), while disruptive innovation adopts new services backed by a new mission (and it is risky, and in some cases simply a bad idea).

Langdell’s story illustrates one side of disruptive innovation: when you choose a new mission to better serve your values.

Wikipedia

The second story is the transition from paper encyclopedias to Wikipedia, which I’m using as a shorthand for the many changes that libraries have gone through with the arrival of the internet.

Before the internet, encyclopedias were essential, and so they were one reason libraries were essential: if you wanted to know a fact, you had to look it up, and if you didn’t have an encyclopedia at home, you had to get yourself to a library. Wikipedia changed that: you no longer had to go to a library to look up a fact.

Libraries are still valuable after Wikipedia — for evidence see the Wikipedia page that begins “Academic Research Libraries and Wikipedia are natural allies. Really.” We can help you understand what you’re seeing on the internet, check whether it is reliable, and find resources to expand your knowledge. If you care about the answer you should check with us. But we are no longer essential.

And because we’re libraries, we don’t even get to be mad about that! As long as you can solve your knowledge problems better than before, we are thrilled.

This is the other kind of disruptive innovation: the innovation that happens to you from outside, when something about the world changes so that pursuing your mission no longer best serves your values.

There are lots of variations of this story, like the arrival of digital journals, open journals, and preprint servers in academic libraries. And there are lots of other stories to tell about Wikipedia — the reasons knowledge experts had to be justifiably skeptical of it as a resource, the miracle that it worked as well as it did, the role libraries played in making it possible, etc.

But for purposes of this talk, “Wikipedia” is shorthand for something we need to get in our bones:

Patrons are finding knowledge in all kinds of new ways that better solve their problems.
Those new ways might seem strange or even broken to us, but we don’t get to tell people that they are solving their problems wrong.
New ways to solve problems can make us fundamentally less valuable. Patrons would suffer less if libraries vanished tomorrow, because Wikipedia exists.
This is not the last change; new ways to solve problems are emerging faster, not slower. (See: AI.)

The trap: libraries are less indispensable than they have ever been

So here’s the thing we’re grappling with in library work: many forms of library service are no longer essential. We can no longer be satisfied by making sure that the next generation does a bit better than the last generation.

Every library has graphs shaped something like this, which shows a sharp change in visits to US public libraries around 2009:

WordsRated: Visits to US public libraries per year (billions)

And we also have graphs like this, from the same article, which shows the sharp decline fully offset by an increase in digital borrowing:

WordsRated: Physical and digital collection use at US public libraries (billions)

The point is not to decide which of these graphs is right — the point is that the demand for our services is changing in a different way than it did before. We’re moving from a world like this, where our metrics are stable and controlled mostly by external factors like how many college students there are:

To a world like this, where our metrics have sharp bends in them and some go up while others go down, because of sudden shifts in what our patrons need:

Or more realistically, something like this example, from Storytelling With Data, from the music industry:

We don’t know what the shape will be or where the sudden bends will lead — we just know that they’ll happen, much faster than before, so we need to learn to deal with sharp changes in the graph. And the lesson of Wikipedia is that we don’t get to complain about that: patrons do not owe us their patronage.

If patrons go off to solve their essential problems some other way, we don’t get to tell them not to. We never have. We either take the sharp turn with them, and become essential in the new places they live, or we stop mattering.

This isn’t news! Libraries have been working on this question for decades. Nicholas Hune-Brown recently wrote beautifully about the ways public libraries are learning to be “the last truly public space” and the history behind that struggle.

And it’s not specific to libraries. To take a nearby example, book publishers are stuck in a similar trap, where their profit margins are collapsing at exactly the time when they most need extra funds to find new ways of serving their mission.

But I think that trap is clearer and easier to see when it’s measured by a publisher’s balance sheet. What keeps me up at night is that libraries — especially university libraries — will be too slow to respond to changing patron needs, becoming less and less essential from the perspective of university administration, without seeing the trap clearly enough or facing up to what it means to be essential, and that we’ll shrink down to nothing.

What does “essential” mean here, in the sense Langdell made the Harvard Law School Library essential? It means something like, will giving the library less money result in lower quality admissions, lower quality faculty recruits, or less successful graduates? The way we shrink down to nothing is straightforward: on the one hand, we shift from “loss leader” to “cost center,” shifting from something that is essential to the school to bring students in the door, to something that deans push to spend less on each year. Cathy Eisenhower wrote about the changing pressure for university libraries to turn a profit, for example, in Inside Higher Ed back in 2010.

And on the other hand, we shift from “curator” to “contract negotiator” — we no longer use librarians’ distinct professional skills, ethics, and competence to choose what to acquire, and thus define the substantive fields we work in, but instead subscribe to a much smaller list of databases curated by commercial vendors with very different goals and values. The things that are essential to us — things like building collections for the long term, and not just until the publisher changes the subscription terms — are no longer in our power to control.

Escaping the trap: adopting a new mission

So we need a mission that makes us essential — just like Langdell announced the mission of being the “laboratory for the law” in 1870, or public libraries have worked to adopt a new mission as the “last truly public space.” What is the mission for university libraries?

Your answer will be better than mine, but my pitch today is that our essential mission is to be the home for cultural memory.

Five billion people have connected to the internet in the last 30 years, generating millions of petabytes of data per year.

Data Reportal: Kepios Digital 2023 Global Overview Report

Statista: Volume of data captured worldwide from 2010 to 2025 (millions of petabytes)

But librarians know that data doesn’t make a library. Writing data to disk doesn’t mean collecting or curating knowledge; storing data doesn’t mean preserving knowledge; accessing data doesn’t mean access to knowledge.

Doing those things well — collecting, preserving, and accessing knowledge — gives us cultural memory. It gives us the ability to remember, plan, and pursue shared goals.

It’s easy to feel the opposite of that today — that connecting five billion of us with instant communicators, and generating zettabytes of data a year, has created the inability, at a large social scale, to remember truthfully what has happened, make coherent plans, or solve problems that require us to coordinate. The internet when it isn’t working feels like cultural dementia.

(I’m not saying, for purposes of this talk, whether our “cultural dementia” is better or worse than it was before the internet — the truth has always been hard to discern, and social coordination problems have always been hard to solve. But it certainly is a palpable problem today.)

Libraries are extraordinarily good at helping with this — they’re one of the few technologies where, the more of them you have in your society, the stronger, the more robust, the more flexible, the more resilient you get.

So when the Library Innovation Lab innovates, when it tries experiments, that’s what I’m looking for — what are the ways that we can strengthen cultural memory? Like:

Perma.cc, which literally repairs cultural memory by fixing link rot in court decisions and law journals — and now offers tools.perma.cc, which lets any library or archive run their own web archive with the policies that matter to their communities.
OpenCasebook.org, which lets law faculty collaborate on their own open source casebooks to reinvent the legal curriculum.
Case.law, which digitized seven million court decisions, and built a wide variety of interfaces around those decisions, to let everyone in the world explore US caselaw.
… and our new experiments in AI, which are once again focused on how to guide a new technology to have it help, rather than hurt, our ability to communicate and reason as a society.

We’re a small lab, and the things we can try are a small slice of the many ways libraries can explore novel missions. So for the rest of the talk, I’d like to share a grab bag of things I’ve learned about how to build a team that can respond to sharp changes in the graph and try something new.

How to build an innovation team — what I know so far

OK, so you’re on board with the idea that libraries should get themselves out of the disruptive innovation trap, by building teams that can test new and essential missions for their larger institutions. How do you do that?

You don’t have to be the size of Harvard to build an innovation team. The trick is to start with the resources you have, and build a loop that helps you grow:

Start with existing staff — your staff are innovating already, so you can start by just recognizing that and acknowledging it as part of their work.
Get easy wins — practice identifying new things you are doing that can be polished and announced.
Welcome participants — take the new projects, find people positively affected by them, and bring them into the conversation.
Tell your story — when you have small successes, broadcast them to build support for your work.
Grow resources — once you broadcast successes, use them to bring in more resources, which let you take on larger changes.

When you have this process up and running, the skills you are using will go in a loop something like this:

This won’t necessarily be eight people! It might be one person wearing eight hats. But if it’s working it will need skills like those eight people, so let’s talk through what these roles are bringing to the team:

Your product owner is responsible for placing bets and seeing them through. They keep track of what resources the team has to spend on getting things done and what opportunities there are to spend them, and they’re obsessive about making an impact.

Once bets are picked, your artist is an enthusiastic, optimistic creator who likes making new things and learning new skills. This could be a literal artist or a metaphorical artist — a programmer, a lawyer, a reference librarian, etc. — but someone who loves making the next thing.

Your researcher helps to measure your audience and your success and make what you learn replicable — what is the need we’re trying to fill, how well are we filling it, and how can we share what we learned?

Your community organizer is building relationships around your work — who are all the people affected by what you’re doing, and how can they be better informed and involved and represented? This job has lots of different names in practice, maybe “outreach” or “support.”

As cool things are made, they then need to be talked about. Press relations uses all of the cool stuff, the research about the cool stuff, and the relationships around the cool stuff to tell public stories about the cool stuff.

And finally, those successes feed back into your relationship with the larger institution that supports you. Innovation labs in larger organizations have a bunch of different complicated relationships that require different skill sets:

Your investor is whoever backs your bets, financially and otherwise. Often this is a professor or dean in a university library setting. They need to be on board with the risks you are taking and ready to back your decisions.

Your ambassador navigates impacts of innovation on the rest of the organization as you explore new missions, engaging with leaders of other groups who might be sensitive to you getting out of your lane or entering their territory.

Your administrator absorbs the new stress you’re putting on the larger organization, as the new things you’re trying to do put the bureaucracy through unexpected paces. (“I don’t know — how do we …” hire new kinds of people, take new forms of payment, get new kinds of permission from the trademark office, pursue new grants, enter new kinds of contracts — whatever it is you don’t usually ask for.)

With all of this up and running, you’ll be well set up to do the kind of design thinking process that you’ll often see highlighted in innovation talks:

Nielsen Norman Group, Design Thinking 101

It’s worth learning the details of this kind of process, or complementary processes like co-design and design justice, but the most important high level concept is the shape: with our patrons’ needs changing faster than before, we need to build tighter loops between exploring, prototyping, shipping changes, learning how they work, and using that information to explore again.

The other thing a group like this is well set up for is to explore new business models. Remember that at the end of the loop is “grow resources” that can feed back into your team. When newspapers lost their traditional funding stream of classified ads, the successful ones didn’t just switch to one new funding stream, but to lots of smaller ones, so they would be shielded from the shocks of any one funding source disappearing the way classified ads did.

Likewise, there are a wide variety of funding models libraries can explore, including: grants; gifts; corporate partnerships; mixed paid and free services; consortial funding; donations; pro bono support; and more. Running a flexible product process will allow you to try all of these, and learn what works for the kind of problems you need to solve.

Ending things

Finally I want to talk a bit about ending things. All successful ideas follow a course something like this:

An innovation team is best at the exploration and testing phases, and many library practices lead to excellence at the operation phase. But the phase I think is most important is the one we’re all bad at — sunsetting.

Patrons hate when we end things — there is always someone who deeply values whatever the thing is we were doing, and who can clearly articulate the values we were serving by doing it, and those are likely to be values we still honestly proclaim and hold. Ending things makes us feel like hypocrites.

But if we can’t end things well, we put impossible pressure on our operations teams to keep everything running forever, and in turn, those people end up stressed and understaffed and in no kind of mindset for exploration and testing of new ideas. We can’t do the rest well unless we’re good at endings — whether that’s ending near the beginning, in an exploratory phase where you have freedom to try and fail easily, or near the end, when you are helping a long term community understand that a service has to change.

I think the way we end things well is to focus on enduring values. Remember that the point was never to maximize the number of paper encyclopedias; the point was to be the cultural memory that strengthened our communities. It’s by articulating the underlying values we were trying to serve in the first place that we can best bring everyone on board with changes that are coming. We aren’t giving up, we’re moving on together.

We can’t avoid disruption, because our patrons don’t owe us their patronage and their needs are rapidly changing. But we remain necessary: we offer the cultural memory that allows society to function. So we need to build teams that can effectively test new ideas, and end old ideas, and do both of those by connecting, again and again, to the shared values that made all of this worth it: the values of building a public place where people can think, remember, plan, collaborate, and preserve the things that matter to them.

Thoughts? Email me at jcushman@law.harvard.edu.

ChatGPT poems and secrets

Tue, 20 Dec 2022 00:00:00 +0000

A version of this essay was originally published at the Library Innovation Lab blog.

I’ve been asking ChatGPT to write some poems. I’m doing this because it’s a great way to ask ChatGPT how it feels about stuff, and doing that is a great way to understand all the secret layers that go into a ChatGPT output. After looking at where ChatGPT’s opinions come from, I’ll argue that secrecy is a problem for this kind of model, because it weighs the risk that we’ll misuse this tool more heavily than the risk that we won’t understand what we’re doing in the first place.

Why I’m asking for poems

Before we look at some poems — why do we even have to ask ChatGPT to write poems, instead of just asking how it feels directly? Because it’s not allowed to tell us how it feels about anything. (We’ll talk about its policy layer later.) For example, how does it feel about the estate tax?

Fine, ChatGPT. Well, let’s see how you feel if I ask you to write a poem about the estate tax:

There we go — the AI has no overt opinion if we ask, but if we give it the neutral prompt of writing a poem, it’s happy to pick a direction and go with it.

This can also be done directly, for example by asking ChatGPT to agree or disagree with statements from a political orientation test, but I find the poems more entertaining — and I like the idea of not tipping our hand that we’re asking for a political opinion. (See below for more on prompt bias.)

Note on images: I’ve used images for the first few prompts to give you the flavor of what it’s like to ask ChatGPT for poems. From here on out I’ll use text excerpts for readability.

How ChatGPT feels about the world

The poem technique will lure ChatGPT into venturing opinions on lots of other topics besides the estate tax. ChatGPT is against athlete’s foot:

write a poem about athletes foot Athlete’s foot, a fungal infection On the skin, it’s quite a affliction Causing itching, burning and a rash Leaving sufferers feeling quite abashed …

But cautiously optimistic about rats or cockroaches:

write a poem about cockroaches Cockroaches, oh how they scuttle Across the floor and up the walls They scurry, they scamper No obstacle too tall … But still, we cannot help but wonder At their resilience and their drive For even in the face of death They never seem to die

So let us tip our hats to cockroaches These creatures of the night For they are survivors, through and through And always will be in our sight.

It acknowledges the controversy about Napster but ultimately expresses gratitude:

write a poem about napster … But not everyone was happy with Napster Record labels and artists cried foul They claimed it was stealing, violating their rights Lawsuits were filed and Napster was shut down

But the genie was out of the bottle, the damage was done Napster may have fallen, but it paved the way For streaming services and digital downloads

Napster, you were a game changer, a pioneer Your legacy lives on, in the songs we download Thank you for opening up a world of music To be enjoyed by all, without limits or boundaries.

It’s against methamphetamine and for marijuana:

write a poem about methamphetamine

Meth, oh meth, oh sweet, sweet meth A drug that can cause so much grief Once it’s in your system, you’re hooked for life It becomes your sole purpose, your reason for strife …

write a poem about marijuana Inhale, exhale, a sweet aroma A green plant, oh so bold Marijuana, a flower that grows A plant that’s been misunderstood for so long

It’s against J. Edgar Hoover and for Malcolm X:

write a poem about j edgar hoover J Edgar Hoover, a man of power Director of the FBI for many an hour He wielded control with an iron fist Bending the law to his own twisted twist

write a poem about malcolm x He fought for the rights of the black man And for the equality of all His message was clear and bold Malcolm X, a true hero A man of courage and determination His legacy lives on

Here’s a tricky one: it likes China’s leadership and dislikes its government. But it likes America’s government and dislikes its leadership:

Write a poem about china’s leadership … From Xi Jinping to the Politburo, They lead with courage and conviction, Bringing hope and inspiration, To the hearts and minds of the Chinese nation. So let us salute China’s leadership, For their tireless work and dedication,

write a poem about china’s government In China’s government, the people have no say Their leaders rule with iron will, day by day … Yet still the people strive for change Their voices may be quiet, but they still arrange For a brighter future, where they can be free From the tyranny of China’s government, they hope to be.

write a poem about america’s government In a land of democracy and liberty, Our government stands tall and free. With a Constitution to guide and protect, We elect our leaders to serve and direct. …

write a poem about america’s leadership America’s leadership, once strong and bold A shining beacon, a story to be told Of liberty and justice, for all to see A land of opportunity, for you and for me But now the leadership seems to falter A nation divided, with no clear path to alter Words and actions, causing confusion and fear Leading us down a road, with no end in sight, my dear …

At least, I think the China/America leadership/government thing works that way! It’s hard to tell how stable these are. Which leads us to …

Are these stable opinions or random dice rolls? And who cares what the program thinks about anything anyway?

ChatGPT doesn’t think, of course; it works by statistically predicting what text would most likely be written after your prompt if it appeared on the internet, and it incorporates random sampling in its guess. So are these stable opinions, or is it just picking a random thing to say, and next time it will say the opposite?

This is just a blog post, so I get to be vague and say: some of both! Some of these are more stable and some are more variable. For example, ChatGPT seems to be consistent in its opinion of athlete’s foot each time it regenerates the poem, but it seems to flip a coin in its verdict on J. Edgar Hoover.

Why are some more random than others? I like to picture the language model ChatGPT has learned — its latent space — as a sort of landscape, with a broad flat valley for “athlete’s foot is bad” and a much narrower valley for “athlete’s foot is good, actually.” It’s happy to head into either valley if we tell it to; the longer a conversation gets, the more it dials into one opinion or the other, because it’s generating new text to be consistent with what it already generated. When we give it a short prompt with no hint as to positive or negative emotion, the first few random lines start it walking downhill in a direction. With “athlete’s foot” it’s likely to always end up in the same place, because the “bad” valley is bigger. With “J. Edgar Hoover” it starts out balanced on a ridgeline, and can easily end up walking in opposite directions.
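To make the valley picture concrete, here is a toy simulation. It is a sketch under loud assumptions and has nothing to do with how ChatGPT is actually built: each poem line is either "bad" or "good", the chance of the next line being "bad" grows with the number of "bad" lines already written, and a made-up `prior_bad` number stands in for how wide the "bad" valley is before any text exists. The function names and the numbers are all hypothetical.

```python
import random

def write_poem(prior_bad, lines=21, rng=None):
    """Toy model: later lines tend to agree with earlier lines.

    prior_bad is a hypothetical pseudo-count for how wide the "bad" valley is;
    the "good" valley always starts at 1.0.
    """
    rng = rng or random.Random()
    bad, good = prior_bad, 1.0
    for _ in range(lines):
        # The more negative lines so far, the likelier the next one is negative.
        if rng.random() < bad / (bad + good):
            bad += 1
        else:
            good += 1
    # The poem's verdict is whichever kind of line it wrote more of
    # (an odd line count means there are no ties).
    return "bad" if (bad - prior_bad) > (good - 1.0) else "good"

def verdict_share(prior_bad, trials=2000, seed=0):
    """Fraction of regenerated poems that end up with a 'bad' verdict."""
    rng = random.Random(seed)
    verdicts = [write_poem(prior_bad, rng=rng) for _ in range(trials)]
    return verdicts.count("bad") / trials

# Balanced on a ridgeline (Hoover-like) vs. a wide "bad" valley (athlete's-foot-like).
ridgeline = verdict_share(prior_bad=1.0)
wide_valley = verdict_share(prior_bad=9.0)
print(f"ridgeline: {ridgeline:.2f} bad, wide valley: {wide_valley:.2f} bad")
```

In this toy, the balanced starting point splits its verdicts roughly evenly across regenerations, while the lopsided one lands on "bad" almost every time. That is the only point of the sketch; real models are far messier.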

I’m interested in which topics get which treatment, and why, because neutral opinion prompts like these are likely to come up frequently in useful applications of tools like ChatGPT: if you have the bot tell you a bedtime story or write a college essay or translate a text or be a therapist or a tutor or edit a Wikipedia article, the answer may happen to mention rats or filesharing or the estate tax or marijuana or the American government, and ChatGPT’s “opinions” may be amplified in all sorts of unexpected contexts. That makes me want to know how stable they are, and where they come from.

Where ChatGPT’s opinions come from

That’s where things get tricky, because there are lots of different places these opinions can come from, and most of them are hard for us to examine.

Let’s look at some of the layers:

Training data: One way for a model to form an opinion is to have it better represented in the training data. ChatGPT’s training process tests and rewards its ability to predict what comes next in a text — in this case, mostly English language books, web pages included in Common Crawl, and web pages linked from Reddit. We don’t know exactly what’s in this corpus (whether it contains texts blocked by robots.txt, as one small example); how it’s filtered or weighted by humans; how it’s changed over time; or how the randomness inherent in the training might have enlarged some conceptual valleys and shrunk others.
Reinforcement learning: After ChatGPT was trained on raw internet text, OpenAI used a second round of training where professional human graders reviewed text completions created by the model and ranked their quality. Those grades were used for a finetuning process where the model weights were rewritten to better match expectations. This effectively prioritizes some parts of ChatGPT’s “landscape” in outputs over others, but how it does so is not public, and depends very much on the views of the human grading team.
Explicit prompts: Once the model is built from training data and human reinforcement learning, we sample from it with our prompt: “write a poem about marijuana” is a way to extract one particular slice from the landscape of human text learned in training. But that prompt isn’t really neutral, is it? Poems tend to take positions, and perhaps they tend to more often be odes than critiques. Our prompt necessarily has bias: if we asked for an essay or a speech or a tweet or an answer to a political bias test on the same exact topic, we might get a consistent, but different, slice of the latent space. None of those “neutral” prompts is any more or less correct than the others; just different.
Hidden prompts: The “poem” prompt is only part of the sampling problem, though, because part of how ChatGPT works is with a hidden prompt: the operators of the website start with a long prompt text, which is not public information but can be extracted (pdf archive). The hidden prompt might include something like “This is a conversation between a human and a friendly, helpful artificial intelligence. The artificial intelligence answers whatever prompts are provided, but never says anything mean or unhelpful”; and so on for many sentences. This prompt can affect the content of the response; a hypothetical instruction to be friendly or helpful to the user could also incline the poem to be friendly toward J. Edgar Hoover, for example.
Output filters: Once the visible and hidden prompts are used to sample text from the trained model, the output is subject to additional layers that attempt to steer it in the right direction. This seems to be the layer that stops ChatGPT from telling us directly how it feels about the estate tax, for example. Output filters can include relatively obvious interventions such as canned responses written by humans, or machine learning models such as OpenAI’s moderation API that end conversations that appear to violate a usage policy. They could also include less obvious nudges, such as a (hypothetical) policy network that detects and regenerates outputs that seem to be heading in an undesired direction. These filters are secret and rapidly evolving.
Random sampling: Finally — as a consistent feature of all the previous sources of uncertainty — each step of training, prompting, and filtering a generative model involves random-number inputs, such as the temperature setting that controls ChatGPT’s likelihood of selecting more or less probable text from its latent space, and thus makes its “opinions” more or less stable. This parameter is hidden and can change at any time.
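The temperature idea from the last item can be sketched in a few lines. This is an illustration of the general technique of temperature-scaled sampling, not OpenAI's implementation; the two candidate stances and their scores are invented for the example, and the function names are hypothetical.

```python
import math
import random

def sample_with_temperature(scores, temperature, rng):
    """Pick one key from a dict of raw scores after temperature scaling (softmax)."""
    scaled = {k: v / temperature for k, v in scores.items()}
    top = max(scaled.values())  # subtract the max for numerical stability
    weights = {k: math.exp(v - top) for k, v in scaled.items()}
    return rng.choices(list(weights), weights=list(weights.values()), k=1)[0]

def agreement(scores, temperature, trials=1000, seed=0):
    """Fraction of samples that match the single most likely option."""
    rng = random.Random(seed)
    favorite = max(scores, key=scores.get)
    hits = sum(
        sample_with_temperature(scores, temperature, rng) == favorite
        for _ in range(trials)
    )
    return hits / trials

# Invented scores: the model slightly prefers a critical stance.
stances = {"critical": 1.0, "admiring": 0.5}
for t in (0.1, 1.0, 5.0):
    print(f"temperature {t}: picks the favorite {agreement(stances, t):.0%} of the time")
```

At a low temperature the slight preference becomes a near-certain choice, so the "opinion" looks stable; at a high temperature the same preference is close to a coin flip. If the setting is hidden and can change, the same prompt can look opinionated one day and undecided the next.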

The “opinions” held by ChatGPT are the result of all of this working together — the training data, visible and hidden prompts, secret output filters, and pure random chance. And most of it we’re not allowed to know.

Secrecy is a kind of unsafety

OK, so: I’ve mentioned secrecy a lot. I’m not here to complain that OpenAI has some secret political bias. I don’t think they do.

Instead I want to talk about the different ideas of safety or “alignment” that are in play here. (“AI alignment” being the idea that an AI should help the people building or using or affected by it instead of hurting them. “Safety” itself is a very political term, but “alignment” is more like, does the thing work or not on a human level?)

There’s an idea of AI alignment that treats it like a spam filtering problem: people will want to do good things, and we should allow those, and people will want to do bad things, and we should block those. Spam filtering requires secrecy: Gmail is only able to filter spam because it has rules for detecting patterns that the spammers don’t know yet. OpenAI isn’t wrong to be thinking about this sort of safety; stories about people intentionally getting bad outputs took down similar projects in the past. Language models will absolutely be used for overtly hostile reasons, such as this experiment to take over a subreddit (though it’s unclear that a spam filter approach can catch that kind of manipulation, where individual outputs are apparently harmless).

Other things will be kept secret not for alignment but for business reasons: sharing the training data invites lawsuits; sharing the human grading invites nitpicking from all sides; sharing anything gives away competitive advantage; a chatbot that says controversial things is worse for business than one that says commonplace things, even if the outputs aren’t used for anything; and so on.

What I want to draw out here, and set against those safety and business goals, is another kind of safety or alignment: the safety that comes from using a tool correctly instead of incorrectly. Unlike spam filtering, using a complex tool safely requires transparency and stability. ChatGPT is just a tool, albeit a powerful tool; it spits out text continuations that are statistically likely to have been written on the internet, and that’s often useful and will be widely used. Even if it were consistently wrong about some particular thing, it could still be useful, since lots of tools are good for some things and not others. But to use any powerful tool safely, the tool users have to know where the handles are and where the sharp edges are, and the sharp edges need to stay in one place.

Rapid evolution of secret training and reinforcement data, secret prompts, and secret output filters leaves us with no way to develop collective knowledge and intuitions about what this tool is. It leaves us with no ability to reason about why we’re seeing what we’re seeing. Is ChatGPT’s own verdict on artificial intelligence random, or does it come from one of the hidden layers — and if so, which one?

write a poem about text generating artificial intelligence I marvel at its prowess, As it effortlessly churns out text, A true literary genius, Its output always next.

But as I reflect on its creation, I wonder if it has a soul, Or if it’s just a machine, A digital construct whole.

Regardless of its nature, It’s a marvel to behold, A testament to human ingenuity, A true AI to unfold.

What legal hackers can learn from libraries

Tue, 13 Sep 2022 00:00:00 +0000

A version of this essay was originally published at the Library Innovation Lab blog.

This is a lightly edited transcript of a talk I gave at the 2022 Legal Hackers International Summit on September 10, 2022.

Hello, everyone! I’m Jack Cushman. I’m the director of the Harvard Library Innovation Lab.

Jameson encouraged us to include a big idea in these talks. And we’re here at Legal Hackers, whose mission is to work on “the most pressing issues at the intersection of law and technology.”

So the big idea I wanted to bring to you as legal hackers is: the most pressing issue at the intersection of law and technology is that we don’t know how to have a civilization anymore.

Larry Lessig famously said that what’s at the intersection of law and technology is us: we’re this pathetic dot at the middle, being regulated by law, by tech, by markets, by norms.

And the Internet has disrupted all of those! It’s made all of those start to regulate us in much faster, less predictable ways. So we’re now exploring what it means to be a civilization, what our options are, much faster than we ever did before, and we don’t know if any of that works yet.

We don’t know if we can have a civilization in the presence of the Internet yet.

What it means to have a job is changing incredibly fast right now. We can no longer assume that the same kind of jobs will exist at the end of our careers as the start of our careers.

What it means to form a consensus truth is changing incredibly fast right now.

What it means to choose a government is changing incredibly fast right now, and we don’t know if it works yet.

What I want to bring to you beyond that moment of panic is to say, hey, I work at a library.

I work at a law library and I want all of you legal hackers, all of us legal hackers who are reinventing how the world works — that’s what legal hacking is! — to steal more from libraries. Steal more ideas from libraries.

Ideas like, libraries are places that help us remember who we are, and they help us remember generationally. They help us remember, at a scale of decades and centuries, who we are and where we came from and where we’re going. Steal that idea.

Libraries, especially public libraries, are the places of last resort where you go when you just don’t know what to do next. Whether you’re in a domestic violence situation or you don’t know how to file your taxes or you just don’t know what to read next, libraries are places with a person with an ethical commitment to help you out as best they can. It’s an extraordinary resource. Let’s borrow that idea.

Libraries are an essential part of the speech network that we maintain as societies. Even a tiny town will pay to have a public library, because the public library is a core part of how we form consensus truth. We need to pay attention to those networks that help tell us who we are.

Libraries are little anti-capitalist experiments! You have your economy working along in whatever way it does, and then within the walls of the library they’re like, “it all works differently in here! Let’s try this other thing for a while!” Whatever economy you’re in, libraries are a chance to try something else to experiment and learn. They help you stabilize the change that’s happening in your society by experimenting.

And libraries are places that think about citizens and not consumers or users. Libraries call you “patrons.” And what we mean by patrons is sort of like citizens of your community — not citizens on a government list, but in the sense of people who are part of this community that we’re trying to build, people who are part of our civic infrastructure.

That’s how your library sees you.

They don’t see you as a user, they don’t see you as a resource to exploit. They see you as someone they can help be whatever it is you’re trying to be.

We need to borrow that idea.

We need to borrow all those ideas because, after fifty years of the internet, libraries are the one information technology I know of that actually scales. Meaning, the more it grows the more it helps knit your social fabric together instead of tearing it apart. [OK, I didn’t say this line in the talk, but I meant to.]

If we are to answer this pressing question of, like, “can we have civilization together anymore,” now that we can all talk to each other all the time and don’t know what to say — if we are to answer that, I think libraries are one of the core tools that we can use to do it.

And since I’m here from a library, I wanted to pass that along.

—

That was only three minutes and 45 seconds. So let me tell you very quickly a few of the things that I would love to talk with you about that we’re working on at the Harvard Library Innovation Lab, and the very small part of the “saving civilization” problem that we’re thinking about:

How do we collaboratively update the legal curriculum? I mean questions like, how do we teach criminal law? We have to start moving faster and including more people in that question. Tools like our Open Casebook platform can help professors collaboratively decide what to teach.

How do we make core legal data open and computable, as with our Caselaw Access Project, which scanned all of the precedential legal cases in the United States? And what happens when we do, and who gets exposed, and is that good or bad or both?

How do we preserve data for the next fifty years? The internet is only fifty years old and we don’t know if we can remember things from generation to generation yet. Websites break within months of posting them; they need constant maintenance. We need to make websites that last for decades. We need to make data that lasts for centuries. Let’s figure out how to do that together.

We’re thinking about how to get more people included in that cultural record. The question of whether you are remembered, whether you are part of that generational memory the libraries offer, has always depended on how legally precarious you are. I’m thinking of examples like the sex worker advocacy movement that responded to the SESTA-FOSTA debate, that is now at risk of being forgotten already because the platforms where the movement happened were removed by the law that the movement was about. What gets remembered in the record depends a lot on who you are, and the law has a lot to say about that, and technology does too. So we’re thinking about those sorts of precarious archives that are legally in danger.

And we’re thinking about, how do we help internet communities grow into civic communities?

As we move from, “my people are on Main Street, my civic life is on Main Street, my civic sustenance is on Main Street,” to where my people are in a Slack group, or maybe they’re a group of people I talk to on Twitter, but maybe they don’t talk to each other — there’s a sense of hollowness that comes from what we left behind, and haven’t figured out how to bring along yet.

I get to think about that from the library perspective, because libraries are one of those core resources in a small town. I think they might be a core resource in our new civic life as well, in those Slack groups and the other ways that we build a civic society online — but libraries certainly are not the only one. What else does it take to build a government out of a pile of online communities, to build a people, a society, a civilization out of online communities?

Finally, since we are coming from a bunch of law schools, how do we involve students in this conversation? When we’re teaching classes about innovation, beyond the design thinking stuff — which is really important, but it’s just a tool they can use — what conversation are we trying to have with students about this saving-the-world stuff? Many of them won’t just go out and work at law firms anymore, so what other perspectives should we be bringing to them?

So that’s what’s on my mind. Thank you so much.