Projects

The LF AI & Data Foundation supports open source projects within artificial intelligence and the data space.


1chipML

1chipML is an open source library for basic numerical crunching and machine learning for microcontrollers.


Acumos AI

Acumos AI is a platform and open source framework that makes it easy to build, share, and deploy AI apps.


Adlik

Adlik is a toolkit for accelerating deep learning inference. Its goal is to speed up the deep learning inference process in both cloud and embedded environments.


Adversarial Robustness Toolbox

Adversarial Robustness Toolbox (ART) provides tools that enable developers and researchers to evaluate, defend, certify, and verify machine learning models and applications against adversarial threats.


AI Explainability 360

AI Explainability 360 is an open source toolkit that can help users better understand the ways that machine learning models predict labels using a wide variety of techniques throughout the AI application lifecycle.


AI Fairness 360

AI Fairness 360 is an extensible open source toolkit that can help users understand and mitigate bias in machine learning models throughout the AI application lifecycle.


Amundsen

Amundsen is a data discovery and metadata engine for improving the productivity of data analysts, data scientists and engineers when interacting with data.


Angel ML

The Angel Project is a high-performance distributed machine learning platform based on Parameter Server, running on YARN and Apache Spark.


Artigraph

Artigraph is a tool to improve the authorship, management, and quality of data.


BeyondML

BeyondML is a framework for developing sparse neural networks that can perform multiple tasks across multiple data domains.


BI & AI

The goal of this committee is to integrate the power of AI and BI to make it CI (Cognitive Intelligence) by combining the speed that machines provide (AI) with the direction intuited by human insight (BI).


BITOL

The primary objective of the BITOL project is to tackle multiple challenges, such as data normalization, ensuring the relevance of documentation, establishing service-level expectations, simplifying data and tool integration, and promoting a data product-oriented approach.


CLAIMED

CLAIMED (Component Library for AI, Machine Learning, ETL and Data Science) is a runtime- and programming-language-agnostic Data & AI component framework.


DataOps Committee

The DataOps Committee in LF AI & Data is a global group of participants from various geographies focused on DataOps.


DataPractices

DataPractices is a “Manifesto for Data Practices,” comprised of values and principles to illustrate the most effective, modern, and ethical approach to data teamwork.


Datashim

Datashim enables and accelerates data access for Kubernetes/OpenShift workloads in a transparent and declarative way.


DeepCausality

DeepCausality is a hyper-geometric computational causality library that enables fast and deterministic context-aware causal reasoning over complex multi-stage causality models.


DeepRec

DeepRec is a high-performance recommendation deep learning framework based on TensorFlow 1.15, Intel-TensorFlow and NVIDIA-TensorFlow.


Delta

DELTA is a deep learning based end-to-end natural language and speech processing platform.


DocArray

DocArray is a library for nested, unstructured, multimodal data in transit.


Egeria

Egeria is the world’s first open source metadata standard. It provides open APIs, event formats, types, and integration logic so organizations can share data management and governance across the entire enterprise without reformatting or restricting the data to a single format, platform, or vendor product.


Egeria Conformance

To ensure both consistency and alignment with the standards driven by Egeria, the Egeria Conformance program is available for vendors to showcase how they are shipping Egeria as part of their offering.


Elastic Deep Learning

EDL is an Elastic Deep Learning framework designed to help deep learning cloud service providers build cluster cloud services using deep learning frameworks such as PaddlePaddle and TensorFlow.


Elyra

Elyra is an open-source low-code/no-code framework for creating reproducible, scalable, and component-based data science pipelines.


FATE

FATE (Federated AI Technology Enabler) is the world's first industrial grade federated learning open source framework to enable enterprises and institutions to collaborate on data while protecting data security and privacy.


Feast

Feast is an open source feature store for machine learning. It was developed as a collaboration between Gojek and Google in 2018.


Feathr

Feathr is an enterprise-grade, high-performance feature store.


FlagAI

FlagAI (Fast LArge-scale General AI models) is a fast, easy-to-use and extensible toolkit for large-scale models.


Flyte

Flyte is a production-grade, declarative, structured and highly scalable cloud-native workflow orchestration platform.


ForestFlow

ForestFlow is a scalable policy-based cloud-native machine learning model server.


Generative AI Commons

The LF AI & Data Generative AI Commons is committed to promoting the democratization, advancement, and adoption of efficient, secure, reliable, and ethical Generative AI open source innovations.


Horovod

Horovod makes it easy to take a single-GPU TensorFlow program and train it faster across many GPUs. Horovod also achieves significantly better GPU resource utilization.


Intersectional Fairness (ISF)

Intersectional Fairness (ISF) is a bias detection and mitigation technology for addressing intersectional bias, which is caused by the combinations of multiple protected attributes.


JanusGraph

JanusGraph is a scalable graph database optimized for storing and querying graphs containing hundreds of billions of vertices and edges distributed across a multi-machine cluster.


Kedro

Kedro is an open-source Python framework for creating reproducible, maintainable and modular data science code.


Kompute

Kompute is a general-purpose GPU compute framework for cross-vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). It is blazing fast, mobile-enabled, asynchronous, and optimized for advanced GPU data processing use cases.


KServe

KServe provides a Kubernetes Custom Resource Definition for serving machine learning (ML) models on arbitrary frameworks.


LakeSoul

LakeSoul is a cloud-native Lakehouse framework developed by the DMetaSoul team. It supports scalable metadata management, ACID transactions, efficient and flexible upsert operations, schema evolution, and unified streaming and batch processing.


Ludwig

Ludwig is an open-source, declarative machine learning framework that makes it easy to define deep learning pipelines with a simple and flexible data-driven configuration system.


Machine Learning eXchange

Machine Learning eXchange (MLX) is a Data and AI Assets Catalog and Execution Engine.


Marquez

Marquez is an open source metadata service for the collection, aggregation, and visualization of a data ecosystem’s metadata.


Milvus

Milvus is an open-source vector database that is highly flexible, reliable, and blazing fast.


ML Security Committee

The ML Security Committee is a global group that advances, showcases, and explores challenges and solutions concerning the security of machine learning tooling, systems, and use cases.


MLOps Committee

The LF AI & Data Foundation MLOps Committee helps related projects gain more recognition and adoption through cooperation within a passionate community of members.


NNStreamer

NNStreamer is a set of GStreamer plugins that make it easier and more efficient for GStreamer developers to adopt neural network models and for neural network developers to manage neural network pipelines and their filters.


ONNX

ONNX is an open format to represent deep learning models.


Open Platform for Enterprise AI

The mission of the Open Platform for Enterprise AI (OPEA) Project is to develop an ecosystem orchestration framework to efficiently integrate performant GenAI technologies and workflows leading to quicker GenAI adoption and business value.


OpenBytes

OpenBytes aims to facilitate wider sharing of, and collaboration with, data in the AI community through the promotion of data standards and formats and enabling contributions of data.


OpenDataology

OpenDataology is an open source dataset license compliance analysis project.


OpenDS4All

ODPi’s OpenDS4All enables the creation of educational Data Science programs.


OpenFL

OpenFL is a Python 3 library for federated learning that enables organizations to collaboratively train a model without sharing sensitive information.


OpenLineage

OpenLineage proposes an open standard and API for lineage collection.


Pyro

Pyro is a universal probabilistic programming language (PPL) written in Python and supported by PyTorch on the backend.


Recommenders

Best practices on recommendation systems.


RosaeNLG

RosaeNLG is an open source, template-based Natural Language Generation (NLG) project that automates the production of relatively repetitive texts from structured input data and textual templates, run by an NLG engine.


RWKV

RWKV is a parallelizable RNN with Transformer-level LLM performance (pronounced "RwaKuv", from its 4 major parameters: R, W, K, V).


SapientML

The SapientML Project is a meta-learning-based AutoML initiative designed to enhance the success of the AI model creation process.


ShaderNN

ShaderNN is a lightweight deep learning inference framework optimized for Convolutional Neural Networks on mobile platforms.


SOAJS

SOAJS is an open source microservices and API management platform.


Sparklyr

Sparklyr is an open-source and modern interface to scale data science and machine learning workflows using Apache Spark™, R, and a rich extension ecosystem.


Substra

Substra is a framework offering distributed orchestration of machine learning tasks among partners while guaranteeing secure and trustless traceability of all operations.


The Open Voice Interoperability Initiative

The Open Voice Network Interoperability Initiative is developing the “Message Envelope,” a universal, open API for voice/chatbot and language model interoperability, analogous to HTTP and HTML.


The Open Voice Network Trust Mark Initiative

The Open Voice Network Trust Mark Initiative translates ethical principles into action, focusing on conversational AI.


TonY

TonY is a framework to natively run deep learning jobs on Apache Hadoop.


Trusted AI

The LF AI & Data Trusted AI Committee is now the Responsible AI Workstream of the Generative AI Commons. It is a global group working on policies, guidelines, tools, and use cases by industry to ensure that trustworthy AI systems, and the processes used to develop them, continue to improve over time.


Xtreme1

Xtreme1 is the next generation open source platform for multi-sensory training data.

