Selected papers

We fine-tune pretrained language models using human feedback for various tasks, successfully matching the preferences of the external human labelers, though those preferences did not always match our own. Specifically, for summarization tasks the labelers preferred sentences copied wholesale from the input (we’d only asked them to ensure accuracy), so our models learned to copy. Summarization required 60k human labels; simpler tasks which continue text in various styles required only 5k. Our motivation is to move safety techniques closer to the general task of “machines talking to humans,” which we believe is key to extracting information about human values.
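The heart of this approach is a reward model fit to pairwise human comparisons, which the policy is then optimized against. As a rough illustration only — not the paper's implementation — here is a minimal Bradley–Terry-style reward model in pure Python; the summary features, data, and hyperparameters are invented for the sketch:

```python
import math

def reward(w, features):
    """Linear reward model: score one candidate's feature vector."""
    return sum(wi * fi for wi, fi in zip(w, features))

def fit(comparisons, dim, steps=500, lr=0.5):
    """Fit w by gradient descent on the Bradley-Terry log-likelihood:
    P(a preferred over b) = sigmoid(reward(a) - reward(b)).

    Each comparison is (features_preferred, features_rejected)."""
    w = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for fa, fb in comparisons:
            delta = reward(w, fa) - reward(w, fb)
            s = 1.0 / (1.0 + math.exp(delta))  # sigmoid(-delta)
            for i in range(dim):
                grad[i] -= s * (fa[i] - fb[i])
        w = [wi - lr * gi / len(comparisons) for wi, gi in zip(w, grad)]
    return w

# Hypothetical features per summary: [fraction copied from input, length].
# Labelers here prefer heavily copied summaries, as in the paper's anecdote.
comps = [([1.0, 0.5], [0.2, 0.5]), ([0.9, 0.3], [0.1, 0.8])]
w = fit(comps, 2)
```

After fitting, the model assigns higher reward to the copied summaries, which is exactly the behavior the policy would then learn to exploit.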

Properly aligning advanced AI systems with human values will require resolving many uncertainties related to the psychology of human rationality, emotion, and biases. These can only be resolved empirically through experimentation — if we want to train AI to do what humans want, we need to study humans.

We propose an AI safety technique which trains agents to debate topics with one another, using a human to judge who wins. We believe that this or a similar approach could eventually help us train AI systems to perform far more cognitively advanced tasks than humans are capable of, while remaining in line with human preferences. We outline this method together with preliminary experiments and release a web interface so people can experiment with the technique.

We apply deep learning based guidance to proof search in the theorem prover E. Using strategies that leverage deep neural networks, we have found first-order proofs of 7.36% of the first-order logic translations of the Mizar Mathematical Library theorems that did not previously have ATP generated proofs. This increases the ratio of statements in the corpus with ATP generated proofs from 56% to 59%.
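E's given-clause loop and the paper's neural clause scoring are far richer, but the shape of "learned guidance" can be sketched as best-first search where a scoring function decides which state to process next. The toy rewriting problem and the cheap heuristic below are invented stand-ins for the prover and the network:

```python
import heapq

def guided_search(start, goal, successors, score):
    """Best-first search; `score` stands in for the learned guidance
    that ranks which clause/state to process next."""
    frontier = [(score(start), start)]
    seen = {start}
    while frontier:
        _, state = heapq.heappop(frontier)
        if state == goal:
            return True
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (score(nxt), nxt))
    return False

# Toy "proof" task: reduce n to 1 by subtracting 1 or halving when even.
def successors(n):
    steps = [n - 1] if n > 1 else []
    if n % 2 == 0:
        steps.append(n // 2)
    return steps
```

Swapping `score` for a better heuristic changes how much of the space is explored before a proof is found, which is the lever the learned model pulls in the real system.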

TensorFlow is a machine learning system that operates at large scale and in heterogeneous environments. TensorFlow uses dataflow graphs to represent computation, shared state, and the operations that mutate that state. It maps the nodes of a dataflow graph across many machines in a cluster, and within a machine across multiple computational devices, including CPUs, GPUs, and TPUs. TensorFlow supports a variety of applications, with particularly strong support for training and inference on deep neural networks. Several Google services use TensorFlow in production, we have released it as an open-source project, and it has become widely used for machine learning research.

We present a strong solution of the board game pentago, computed using exhaustive parallel retrograde analysis in 4 hours on 98304 ($3 \times 2^{15}$) threads of NERSC’s Cray Edison. At $3.0 \times 10^{15}$ states, pentago is the largest divergent game solved to date by two orders of magnitude, and the only example of a nontrivial divergent game solved using retrograde analysis.
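Retrograde analysis solves a game backwards from its terminal positions. The pentago computation required massive parallelism and careful data layout, but the idea in miniature, on a toy subtraction game of my own choosing, looks like:

```python
def solve_subtraction_game(n_max, moves=(1, 2)):
    """Strongly solve 'take 1 or 2 stones; a player who cannot move
    loses' by retrograde analysis: positions with fewer stones are
    closer to the end of the game, so iterating upward from the
    terminal position (0 stones) processes states backwards in time."""
    win = [False] * (n_max + 1)  # win[n]: can the player to move win?
    for n in range(1, n_max + 1):
        win[n] = any(not win[n - k] for k in moves if k <= n)
    return win

win = solve_subtraction_game(10)
```

For this game the losing positions are exactly the multiples of 3. In pentago the same backward sweep runs over $3.0 \times 10^{15}$ states instead of eleven.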

We present a symbolic perturbation scheme for black box polynomial predicates which uses an infinite series of infinitesimal perturbations. Our method is as fast as Emiris and Canny’s randomized linear perturbation scheme, scaling reasonably with the degree of the polynomial even for fully degenerate input. Like Yap’s multiple infinitesimal scheme, the computed sign is deterministic, never requiring an algorithmic restart.
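In one dimension the idea can be made concrete: the sign of $p(x + \epsilon)$ for an infinitesimal $\epsilon > 0$ is the sign of the first nonzero Taylor coefficient $p^{(k)}(x)/k!$. The sketch below is a simplification — the paper handles multivariate black-box predicates via a series of independent infinitesimals — and uses exact rational arithmetic so that zero tests are reliable:

```python
from fractions import Fraction

def evaluate(coeffs, x):
    """Evaluate a polynomial given lowest-degree-first coefficients."""
    result = Fraction(0)
    for c in reversed(coeffs):
        result = result * x + c
    return result

def derivative(coeffs):
    """Coefficients of p', lowest degree first."""
    return [k * c for k, c in enumerate(coeffs)][1:]

def perturbed_sign(coeffs, x):
    """Sign of p(x + eps) for infinitesimal eps > 0: the sign of the
    first nonzero derivative p^(k)(x) (the k! factor never changes
    the sign)."""
    x = Fraction(x)
    coeffs = [Fraction(c) for c in coeffs]
    while coeffs:
        v = evaluate(coeffs, x)
        if v != 0:
            return 1 if v > 0 else -1
        coeffs = derivative(coeffs)
    return 0  # p is identically zero
```

For example, $p(t) = (t-1)^2$ vanishes at $t = 1$, but the perturbed predicate deterministically reports a positive sign — the degeneracy is resolved without any restart.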

We introduce sculptural forms which replace the resolution dimension of L-systems with a third space dimension, turning a fractal curve into a surface. The distances between the steps of the sequence are scaled exponentially, so that self-similarity of the curves is reflected in self-similarity of the surface.
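As a rough sketch of the construction (the geometry of the actual sculptures is more involved), one can generate the successive refinements of an L-system curve and space them along the new dimension at exponentially scaled heights; the Koch-style rule and scale factor below are illustrative choices, not the paper's:

```python
def lsystem(axiom, rules, iterations):
    """Apply L-system rewrite rules repeatedly, returning every stage."""
    stages = [axiom]
    for _ in range(iterations):
        stages.append("".join(rules.get(c, c) for c in stages[-1]))
    return stages

# Quadratic-Koch-style rule: F = draw forward, +/- = turn 90 degrees.
stages = lsystem("F", {"F": "F+F-F-F+F"}, 3)

# Place stage k of the refinement at height z = r**k: exponential
# spacing, so the surface inherits the curve's self-similarity.
r = 0.5
heights = [r ** k for k in range(len(stages))]
```

Connecting corresponding points of consecutive stages then sweeps out the surface, with each refinement level occupying a geometrically shrinking slab.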

We simulate high resolution cloth consisting of up to 2 million triangles with highly detailed folds and wrinkles. To achieve this level of detail, we use a more accurate model for cloth-object friction, a robust history-based repulsion/collision framework, and distributed memory parallelism. The algorithm is demonstrated by several high-resolution and high-fidelity simulations.

We model highly deformable nonlinear incompressible solids by conserving volume locally near each node in a finite element mesh. Our method works with arbitrary constitutive models, and works with simple linear tetrahedra without locking. We correct errors in volume without introducing oscillations by treating position and velocity in separate implicit solves, and treat both object contact and self-contact as linear constraints during the incompressible solve to alleviate issues with conflicting constraints.

We simulate large bodies of water with complex surface effects by combining tall cells with linear pressure profiles with small cells near the interface. The philosophy is to use the best available method near the interface (in the three-dimensional region) and to coarsen the mesh away from the interface for efficiency. We coarsen with tall, thin cells (as opposed to octrees or AMR), because they maintain good resolution horizontally allowing for accurate representation of bottom topography.


Recent Posts


There’s a story I like to tell, which I vaguely remembered as originating at Bell Labs or Xerox PARC. A researcher had a rubber duck in his office. When he found himself stumped on a problem, he would pick up the duck, walk over to a colleague, and ask them to hold the duck. He would proceed to explain the problem, often realizing the solution himself in the middle of the explanation. Then he would say, “Thank you for holding my duck”, and leave.


I’m in the middle of the third book in Robert Caro’s biography of Lyndon Johnson. In brief, Caro’s thesis is that (1) Lyndon Johnson cares only about power, and (2) Lyndon Johnson is spectacularly skilled at politics. Moreover, (2) holds in a strong sense: Johnson is not simply skilled at politics, but far more skilled than nearly everyone around him. As a result, Johnson’s life is an example of asymmetric play in a theoretically symmetric game, and a beautiful illustration of how such asymmetric play is equivalent to the game itself having asymmetric rules.


Retraction (15 April 2022): Greg Egan has kindly explained on Twitter that I was misinterpreting the narrator’s statements, and specifically that the “from within” part means that morality is in part a result of human internal mental processes but that those processes of course condition on the external world. I am happy to stand corrected! Post prior to retraction: Greg Egan’s short story “Silver Fire” is about people falling back from secular values.


Thanks to a recommendation from Dandelion Mané, I recently read “Sapiens” and “Homo Deus” by Yuval Noah Harari. Both books are wonderful breaths of fresh air and perspective. “Sapiens” is organized as a history of the species Homo Sapiens, tracing from our evolutionary separation from other primates through the cognitive revolution, the agricultural revolution, through the rest of history to the present. From this historical background, “Homo Deus” attempts to extrapolate into the future, in particular asking how our morality and goals will evolve with technology.


The Long Now Foundation is a wonderful organization advocating for long-term thinking. Specifically, by long term they mean the next ten thousand years: “The Long Now Foundation was established in 01996 to develop the Clock and Library projects, as well as to become the seed of a very long-term cultural institution. The Long Now Foundation hopes to provide a counterpoint to today’s accelerating culture and help make long-term thinking more common.”