DeepMind's AlphaCode Conquers Coding, Performing as Well as Humans

The secret to good programming might be to ignore everything we know about writing code. At least for AI.

It seems preposterous, but DeepMind’s new coding AI just trounced roughly 50 percent of human coders in a highly competitive programming competition. On the surface the tasks sound relatively simple: each coder is presented with a problem in everyday language, and the contestants need to write a program to solve the task as fast as possible—and hopefully, free of errors.

But it’s a behemoth challenge for AI coders. The agents need to first understand the task—something that comes naturally to humans—and then generate code for tricky problems that challenge even the best human programmers.

AI programmers are nothing new. Back in 2021, the non-profit research lab OpenAI released Codex, a program proficient in over a dozen programming languages and tuned in to natural, everyday language. What sets DeepMind’s AI release—dubbed AlphaCode—apart is in part what it doesn’t need.

Unlike previous AI coders, AlphaCode is relatively naïve. It doesn’t have any built-in knowledge about computer code syntax or structure. Rather, it learns somewhat similarly to toddlers grasping their first language. AlphaCode takes a “data-only” approach. It learns by observing buckets of existing code and is eventually able to flexibly deconstruct and combine “words” and “phrases”—in this case, snippets of code—to solve new problems.

When challenged with the CodeContest—the battle rap torment of competitive programming—the AI solved about 30 percent of the problems, while beating half the human competition. The success rate may seem measly, but these are incredibly complex problems. OpenAI’s Codex, for example, managed single-digit success when faced with similar benchmarks.

“It’s very impressive, the performance they’re able to achieve on some pretty challenging problems,” said Dr. Armando Solar-Lezama at MIT, who was not involved in the research.

The problems AlphaCode tackled are far from everyday applications—think of it more as a sophisticated math tournament in school. It’s also unlikely that the AI will take over programming completely, as its code is riddled with errors. But it could take over mundane tasks or offer out-of-the-box solutions that evade human programmers.

Perhaps more importantly, AlphaCode paves the road for a novel way to design AI coders: forget past experience and just listen to the data.

“It may seem surprising that this procedure has any chance of creating correct code,” said Dr. J. Zico Kolter at Carnegie Mellon University and the Bosch Center for AI in Pittsburgh, who was not involved in the research. But what AlphaCode shows is that when “given the proper data and model complexity, a coherent structure can emerge,” even if it’s debatable whether the AI truly “understands” the task at hand.

Language to Code

AlphaCode is just the latest attempt at harnessing AI to generate better programs.

Coding is a bit like writing a cookbook. Each task requires multiple tiers of accuracy: one is the overall structure of the program, akin to an overview of the recipe. Another is detailing each procedure in extremely clear language and syntax, like describing each step of what to do, how much each ingredient needs to go in, at what temperature and with what tools.

Each of these parameters (say, cacao to make hot chocolate) is called a “variable” in a computer program. Put simply, a program needs to define the variables, let’s say “c” for cacao. It then mixes “c” with other variables, such as those for milk and sugar, to solve the final problem: making a nice steaming mug of hot chocolate.
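To make the recipe analogy concrete, here is a minimal sketch in Python. The quantities and the function name make_hot_chocolate are invented for illustration; the only point is that a program names its ingredients as variables and then combines them.

```python
# Hypothetical quantities, chosen only for illustration.
c = 2    # tablespoons of cacao
m = 250  # milliliters of milk
s = 1    # teaspoons of sugar


def make_hot_chocolate(cacao, milk, sugar):
    """Combine the ingredient variables into a description of the drink."""
    return f"hot chocolate: {cacao} tbsp cacao, {milk} ml milk, {sugar} tsp sugar"


mug = make_hot_chocolate(c, m, s)
print(mug)
```

A human reads “make me a hot chocolate” and fills in all of this without thinking. An AI coder has to produce every line of it from the one-sentence request.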

The hard part is translating all of that to an AI, especially when typing in a seemingly simple request: make me a hot chocolate.

Back in 2021, Codex made its first foray into AI code writing. The team’s idea was to rely on GPT-3, a program that’s taken the world by storm with its prowess at interpreting and imitating human language. It’s since grown into ChatGPT, a fun and not-so-evil chatbot that engages in surprisingly intricate and delightful conversations.

So what’s the point? As with languages, coding is all about a system of variables, syntax and structure. If existing algorithms work for natural language, why not use a similar strategy for writing code?

AI coding AI

AlphaCode took that approach.

The AI is built on a machine learning model called a “large language model,” which underlies GPT-3. The critical aspect here is lots of data. GPT-3, for example, was fed billions of words from online resources like digital books and Wikipedia articles to begin “interpreting” human language. Codex was trained on over 100 gigabytes of scraped data from GitHub, a popular online software library, but still failed when faced with tricky problems.

AlphaCode inherits Codex’s “heart” in that it also operates similarly to a large language model. But two aspects set it apart, explained Kolter.

The first is training data. In addition to training AlphaCode on GitHub code, the DeepMind team built a custom dataset, CodeContests, that draws on two previous datasets and contains over 13,500 challenges. Each came with an explanation of the task at hand, and multiple potential solutions across multiple languages. The result is a massive library of training data tailored to the challenge at hand.

“Arguably, the most important lesson for any ML [machine learning] system is that it should be trained on data that is similar to the data it will see at runtime,” said Kolter.

The second trick is strength in numbers. When an AI writes code piece by piece (or token by token), it’s easy to write invalid or incorrect code, causing the program to crash or pump out outlandish results. AlphaCode tackles the problem by generating over a million potential solutions for a single problem, many times more than previous AI attempts.
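A toy sketch shows why sheer numbers help. It is not AlphaCode’s model: the vocabulary, the helper names, and the uniform random choice of tokens are all assumptions made for illustration, where a real system would predict each token with a neural network. The sketch builds short expressions one token at a time and checks which ones Python can even parse.

```python
import random

# Toy vocabulary (an assumption); a real model predicts tokens with a neural network.
VOCAB = ["x", "y", "+", "-", "*", "1", "2"]


def sample_candidate(rng, length=5):
    """Build one candidate expression by picking tokens one at a time."""
    return " ".join(rng.choice(VOCAB) for _ in range(length))


def is_valid(expr):
    """Return True if the token sequence parses as a Python expression."""
    try:
        compile(expr, "<candidate>", "eval")  # parses only, never runs the code
        return True
    except SyntaxError:
        return False


rng = random.Random(0)  # fixed seed so the run is repeatable
candidates = [sample_candidate(rng) for _ in range(1000)]
valid = [c for c in candidates if is_valid(c)]
print(f"{len(valid)} of {len(candidates)} sampled candidates parse")
```

Most of the random candidates are not even well-formed, and parsing says nothing about whether a well-formed one is correct. Generating a very large pool and discarding the failures is one way around that.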

As a sanity check and to narrow the results down, the AI runs the candidate solutions through simple test cases. It then clusters similar ones and picks just one from each cluster to submit to the challenge. It’s the most innovative step, said Dr. Kevin Ellis at Cornell University, who was not involved in the work.
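The filter-then-cluster idea can be sketched in a few lines of Python. Everything here is hypothetical: the task (sum a list), the four candidate programs, and the probe inputs are invented, and ordering clusters by size is an assumption of this sketch rather than something the description above specifies.

```python
from collections import defaultdict

# Hypothetical task: return the sum of a list of integers.
candidates = {
    "loop_sum": lambda xs: sum(x for x in xs),
    "builtin_sum": lambda xs: sum(xs),
    "abs_sum": lambda xs: sum(abs(x) for x in xs),  # wrong on negative numbers
    "max_only": lambda xs: max(xs),                 # wrong on most inputs
}

example_tests = [([1, 2, 3], 6)]            # simple tests with known answers
probe_inputs = [[-1, 4], [0], [5, -5, 2]]   # extra inputs with no known answers


def passes(fn, tests):
    """Return True if fn gives the expected output on every test without crashing."""
    try:
        return all(fn(inp) == out for inp, out in tests)
    except Exception:
        return False


# Step 1: drop candidates that fail the simple test cases.
survivors = {name: fn for name, fn in candidates.items() if passes(fn, example_tests)}

# Step 2: group survivors that behave identically on the probe inputs.
clusters = defaultdict(list)
for name, fn in survivors.items():
    behavior = tuple(fn(inp) for inp in probe_inputs)
    clusters[behavior].append(name)

# Step 3: submit one representative per cluster, largest cluster first.
ranked = sorted(clusters.values(), key=len, reverse=True)
submissions = [group[0] for group in ranked]
print(submissions)
```

In this toy run, max_only fails the example test and is dropped. The two correct programs land in one cluster, while abs_sum slips past the example test but behaves differently on the negative probe inputs and ends up in a cluster of its own. Submitting one program per cluster avoids spending attempts on duplicates.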

The system worked surprisingly well. When challenged with a fresh set of problems, AlphaCode spit out potential solutions in two programming languages, Python and C++, while weeding out outrageous ones. When pitted against over 5,000 human participants, the AI outperformed about 45 percent of expert programmers.

A New Generation of AI Coders

While not yet at the level of humans, AlphaCode’s strength is its utter ingenuity.

Rather than copying and pasting sections of its training code, AlphaCode comes up with clever snippets of its own, without lifting large chunks of code or logic from its “reading material.” This creativity could be due to its data-driven way of learning.

What’s missing from AlphaCode is “any architectural design in the machine learning model that relates to…generating code,” said Kolter. Writing computer code is like constructing a sophisticated building: it’s highly structured, with programs needing a defined syntax and clearly embedded context to generate a solution.

AlphaCode does none of it. Instead, it generates code similar to how large language models generate text, writing the entire program and then checking for potential mistakes (as a writer, this feels oddly familiar). How exactly the AI achieves this remains mysterious—the inner workings of the process are buried inside its as yet inscrutable machine “mind.”

That’s not to say AlphaCode is ready to take over programming. Sometimes it makes head-scratching decisions, such as generating a variable but not using it. There’s also the danger that it might memorize small patterns from a limited number of examples (a bunch of cats that scratched me equals all cats are evil) along with the output of those patterns. This could turn such systems into stochastic parrots, explained Kolter: AIs that don’t understand the problem but can parrot, or “blindly mimic,” possible solutions.

Similar to most machine learning algorithms, AlphaCode also needs computing power that few can tap into, even though the code is publicly released.

Nevertheless, the study hints at an alternative path for autonomous AI coders. Rather than endowing the machines with traditional programming wisdom, we might need to consider that the step isn’t always necessary. Rather, similar to tackling natural language, all an AI coder needs for success is data and scale.

Kolter put it best: “AlphaCode cast the die. The datasets are public. Let us see what the future holds.”
