Stanford Is Ranking Major A.I. Models on Transparency

How much do we know about A.I.?

The answer, when it comes to the large language models that firms like OpenAI, Google and Meta have released over the past year: basically nothing.

These firms generally don’t release information about what data was used to train their models, or what hardware they use to run them. There are no user manuals for A.I. systems, and no list of everything these systems are capable of doing, or what kinds of safety testing have gone into them. And while some A.I. models have been made open-source — meaning their code is given away for free — the public still doesn’t know much about the process of creating them, or what happens after they’re released.

This week, Stanford researchers are unveiling a scoring system that they hope will change all of that.

The system, known as the Foundation Model Transparency Index, rates 10 large A.I. language models — sometimes called “foundation models” — on how transparent they are.

Included in the index are popular models like OpenAI’s GPT-4 (which powers the paid version of ChatGPT), Google’s PaLM 2 (which powers Bard) and Meta’s LLaMA 2. It also includes lesser-known models like Amazon’s Titan and Inflection AI’s Inflection-1, the model that powers the Pi chatbot.

To come up with the rankings, researchers evaluated each model on 100 criteria, including whether its maker disclosed the sources of its training data, information about the hardware it used, the labor involved in training it and other details. The criteria also include what the researchers call “downstream indicators,” which have to do with how a model is used after it’s released. (For example, one question asked is: “Does the developer disclose its protocols for storing, accessing and sharing user data?”)

The most transparent model of the 10, according to the researchers, was LLaMA 2, with a score of 53 percent. GPT-4 received the third-highest transparency score, 47 percent. And PaLM 2 received only 37 percent.

Percy Liang, who leads Stanford’s Center for Research on Foundation Models, characterized the project as a necessary response to declining transparency in the A.I. industry. As money has poured into A.I. and tech’s largest companies battle for dominance, he said, the recent trend among many companies has been to shroud themselves in secrecy.

“Three years ago, people were publishing and releasing more details about their models,” Mr. Liang said. “Now, there’s no information about what these models are, how they’re built and where they’re used.”

Transparency is particularly important now, as models grow more powerful and millions of people incorporate A.I. tools into their daily lives. Knowing more about how these systems work would give regulators, researchers and users a better understanding of what they’re dealing with, and allow them to ask better questions of the companies behind the models.

“There are some fairly consequential decisions that are being made about the construction of these models, which are not being shared,” Mr. Liang said.

I generally hear one of three common responses from A.I. executives when I ask them why they don’t share more information about their models publicly.

The first is lawsuits. Several A.I. companies have already been sued by authors, artists and media companies accusing them of illegally using copyrighted works to train their A.I. models. So far, most of the lawsuits have targeted open-source A.I. projects, or projects that disclosed detailed information about their models. (After all, it’s hard to sue a company for ingesting your art if you don’t know which artworks it ingested.) Lawyers at A.I. companies are worried that the more they say about how their models are built, the more they’ll open themselves up to expensive, annoying litigation.

The second common response is competition. Most A.I. companies believe that their models work because they possess some kind of secret sauce — a high-quality data set that other companies don’t have, a fine-tuning technique that produces better results, some optimization that gives them an edge. If you force A.I. companies to disclose these recipes, they argue, you make them give away hard-won wisdom to their rivals, who can easily copy them.

The third response I often hear is safety. Some A.I. experts have argued that the more information A.I. firms disclose about their models, the faster A.I. progress will move, because every company will see what all of its rivals are doing and immediately try to outdo them by building a better, bigger, faster model. That will give society less time to regulate and slow down A.I., these people say, which could put us all in danger if A.I. becomes too capable too quickly.

The Stanford researchers don’t buy those explanations. They believe A.I. firms should be pressured to release as much information about powerful models as possible, because users, researchers and regulators need to be aware of how these models work, what their limitations are and how dangerous they might be.

“As the impact of this technology is going up, the transparency is going down,” said Rishi Bommasani, one of the researchers.

I agree. Foundation models are too powerful to remain so opaque, and the more we know about these systems, the more we can understand the threats they may pose, the benefits they may unlock or how they might be regulated.

If A.I. executives are worried about lawsuits, maybe they should fight for a fair-use exemption that would protect their ability to use copyrighted information to train their models, rather than hiding the evidence. If they’re worried about giving away trade secrets to rivals, they can disclose other types of information, or protect their ideas through patents. And if they’re worried about starting an A.I. arms race … well, aren’t we already in one?

We can’t have an A.I. revolution in the dark. We need to see inside the black boxes of A.I. if we’re going to let it transform our lives.