Can We Move Forward with the Open Source AI Definition?
December 04, 2024

Ben Cotton
Kusari

You can't observe what you can't see. It's why application developers and DevOps engineers collect performance metrics, and it's why developers and security engineers like open source software. The ability to inspect code increases trust in what the code does. With the increased prevalence of generative AI, there's a desire for the same ability to inspect AI models. Most generative AI models are black boxes, so some vendors are using the term "open source" to set their offerings apart.

But what does "open source AI" mean? There's no generally accepted definition.

The Open Source Initiative (OSI) defines what "open source" means for software. Its Open Source Definition (OSD) is a broadly accepted set of criteria for what makes a software license open source. New licenses are reviewed in public by a panel of experts to evaluate their compliance with the OSD. OSI focuses on software, leaving "open" in other domains to other bodies, but since AI models are, at the simplest level, software plus configuration and data, OSI is a natural home for a definition of open source AI.


What's Wrong with the OSAID?

The OSI attempted to do just that. The Open Source AI Definition (OSAID), released at the end of October, represents a collaborative attempt to craft a set of rules for what makes an AI model "open source." But while the OSD is generally accepted as an appropriate definition of "open source" for software, the OSAID has received mixed reviews.

The crux of the criticism is that the OSAID does not require that the training data be available, only "data information." Version 1.0 and its accompanying FAQ require that training data be made available if possible, but still permit providing only a description when the data is "unshareable." OSI's argument is that the laws covering data are more complex, and vary more by jurisdiction, than the laws governing copyrightable works like software. There's merit to this, of course. The data used to train AI models includes copyrightable works like blog posts, paintings, and books, but it can also include sensitive and protected information like medical histories and other personal data. Model vendors that train on sensitive data couldn't legally share their training data, OSI argues, so a definition that requires it is pointless.

I appreciate the merits of that argument, but like many others, I don't find it compelling enough to craft the definition of "open source" around it, especially since model vendors can find plausible reasons to claim they can't share training data. The OSD is not defined by what is convenient; it is defined by what protects certain rights for the consumers of software. The same should be true of a definition of open source AI. The fact that some models cannot meet the definition should mean that those models are not open source; it should not mean that the definition is loosened for convenience. If no models could possibly meet a definition of open, that would be one thing. But many existing models do, and more could if their developers chose to.

Any Definition Is a Starting Point

Despite the criticisms of the OSI's definition, a flawed definition is better than no definition. Companies use AI in many ways: screening job applicants, writing code, creating social media images, and powering customer service chatbots. Each of these uses carries the risk of reputational, financial, and legal harm. Companies that use AI need to know exactly what they are and aren't getting. A definition of "open source AI" doesn't eliminate the need to carefully examine an AI model, but it does give a starting point.

The OSD has evolved over the last two decades; it is currently on version 1.9. It stands to reason that the OSAID will likewise evolve as people use it to evaluate real-world AI models. The criticisms of the initial version may inform future changes that result in a more broadly accepted definition. In the meantime, other organizations have announced their own efforts to address deficiencies in the OSAID. The Digital Public Goods Alliance, a UN-endorsed initiative, will continue to require published training data in order to grant Digital Public Good status to AI systems.

It is also possible that we'll change how we speak about openness. Just as OSD-noncompliant movements like Ethical Source have introduced new vocabulary, open source AI may force us to recognize that openness is a spectrum along several axes, not simply a binary attribute.

Ben Cotton is Head of Community at Kusari.