naml.us

Biography

Chief Scientist at the UK AI Security Institute (AISI). Alignment will be solved eventually, but we don’t know how to do it yet.

Past lives include leading the Scalable Alignment Team (SAT) at DeepMind, leading the Reflection Team at OpenAI, neural network theorem proving at Google Brain, cofounding Eddy Systems to autocorrect code as you type, and computational physics and geometry at Otherlab, D. E. Shaw Research, Pixar, and Weta Digital. I have screen credits on Ratatouille, WALL•E, Up, and Tintin.

Interests

AGI Safety
Artificial Intelligence
Theorem Proving
Computational Physics
Games!

Education

Ph.D. in Computer Science, 2007
Stanford University

B.Sc. in Mathematics and Computer Science, 2003
California Institute of Technology

Fine-tuning language models from human preferences

We fine-tune pretrained language models using human feedback for various tasks, successfully matching the preferences of the external human labelers, though those preferences did not always match our own. Specifically, for summarization tasks the labelers preferred sentences copied wholesale from the input (we’d only asked them to ensure accuracy), so our models learned to copy. Summarization required 60k human labels; simpler tasks which continue text in various styles required only 5k. Our motivation is to move safety techniques closer to the general task of “machines talking to humans,” which we believe is key to extracting information about human values.

AI safety needs social scientists

Properly aligning advanced AI systems with human values will require resolving many uncertainties related to the psychology of human rationality, emotion, and biases. These can only be resolved empirically through experimentation: if we want to train AI to do what humans want, we need to study humans.

AI safety via debate

We propose an AI safety technique which trains agents to debate topics with one another, using a human to judge who wins. We believe that this or a similar approach could eventually help us train AI systems to perform far more cognitively advanced tasks than humans are capable of, while remaining in line with human preferences. We outline this method together with preliminary experiments and release a web interface so people can experiment with the technique.

Thank you for holding my duck

Mon, Mar 8, 2021 research

There’s a story I like to tell, which I vaguely remembered as originating at Bell Labs or Xerox PARC. A researcher had a rubber duck in his office. When he found himself stumped on a problem, he would pick up the duck, walk over to a colleague, and ask them to hold the duck. He would proceed to explain the problem, often realizing the solution himself in the middle of the explanation. Then he would say, “Thank you for holding my duck”, and leave.

Lessons from Lyndon Johnson

Wed, Sep 25, 2019 politics, games, safety

I’m in the middle of the third book in Robert Caro’s biography of Lyndon Johnson. In brief, Caro’s thesis is that (1) Lyndon Johnson cares only about power, and (2) Lyndon Johnson is spectacularly skilled at politics. Moreover, (2) holds in a strong sense: Johnson is not simply skilled at politics, but far more skilled than nearly everyone around him. As a result, Johnson’s life is an example of asymmetric play in a theoretically symmetric game, and a beautiful illustration of how such asymmetric play is equivalent to the game itself having asymmetric rules.

Morality does not come from within (retracted)

Sun, Dec 30, 2018

Retraction (15 April 2022): Greg Egan has kindly explained on Twitter that I was misinterpreting the narrator’s statements, and specifically that the “from within” part means that morality is in part a result of human internal mental processes, but that those processes of course condition on the external world. I am happy to stand corrected! Post prior to retraction: Greg Egan’s short story “Silver Fire” is about people falling back from secular values.

A constructive critique of Sapiens and Homo Deus

Fri, Jul 28, 2017

Thanks to a recommendation from Dandelion Mané, I recently read “Sapiens” and “Homo Deus” by Yuval Noah Harari. Both books are wonderful breaths of fresh air and perspective. “Sapiens” is organized as a history of the species Homo sapiens, tracing from our evolutionary separation from other primates through the cognitive revolution, the agricultural revolution, and the rest of history to the present. From this historical background, “Homo Deus” attempts to extrapolate into the future, in particular asking how our morality and goals will evolve with technology.

Against long term thinking

Sat, Dec 17, 2016

The Long Now Foundation is a wonderful organization advocating for long term thinking. Specifically, by long term they mean the next ten thousand years: The Long Now Foundation was established in 01996 to develop the Clock and Library projects, as well as to become the seed of a very long-term cultural institution. The Long Now Foundation hopes to provide a counterpoint to today’s accelerating culture and help make long-term thinking more common.

Geoffrey Irving

Selected papers

Fine-tuning language models from human preferences

AI safety needs social scientists

AI safety via debate

Deep network guided proof search

TensorFlow: A system for large-scale machine learning

Pentago is a first player win: strongly solving a game using parallel in-core retrograde analysis

A deterministic pseudorandom perturbation scheme for arbitrary polynomial predicates

Developing fractal curves

Robust high-resolution cloth using parallelism, history-based collisions and accurate friction

Volume conserving finite element simulations of deformable models

Efficient simulation of large bodies of water by coupling two and three dimensional techniques
