LF AI & Data, the Linux Foundation’s umbrella foundation for big data and AI projects, has announced that Feathr, LinkedIn’s open source feature store, is joining its project portfolio.
LinkedIn originally created Feathr to manage and serve the features used in its machine learning applications. Instead of requiring engineers to work with features manually as part of a specific data pipeline, Feathr automates and standardizes how features are handled in both the training and inference phases of machine learning.
Feathr was designed to improve the performance, accuracy, and consistency of machine learning programs. Users declare features once in a shared feature namespace and can then access them “by name” from within ML workflows.
Reusing the same features across different ML programs improves both productivity and accuracy. In addition, some feature stores (though not all) offer a more repeatable technique for converting source data into features, and by centralizing the storage and serving of features, they improve the performance of ML serving at the inference stage.
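The "declare once, access by name" idea can be illustrated with a toy sketch. This is not Feathr's API: the `FeatureRegistry` class, its `declare` and `get` methods, and the feature names are all hypothetical, chosen only to show how a shared namespace lets training and serving code compute identical feature values.

```python
from typing import Callable, Dict, List

# Hypothetical toy registry for illustration only; this is NOT Feathr's API.
class FeatureRegistry:
    def __init__(self) -> None:
        # Maps a feature name to the function that derives it from a source row.
        self._features: Dict[str, Callable[[dict], float]] = {}

    def declare(self, name: str, transform: Callable[[dict], float]) -> None:
        """Register a feature once under a unique name."""
        if name in self._features:
            raise ValueError(f"feature already declared: {name}")
        self._features[name] = transform

    def get(self, names: List[str], row: dict) -> Dict[str, float]:
        """Compute the requested features, by name, for one source row."""
        return {name: self._features[name](row) for name in names}


registry = FeatureRegistry()
registry.declare("trip_cost_per_km", lambda r: r["fare"] / r["distance_km"])
registry.declare("is_long_trip", lambda r: float(r["distance_km"] > 10.0))

row = {"fare": 24.0, "distance_km": 12.0}
wanted = ["trip_cost_per_km", "is_long_trip"]

# Training and serving both ask for features by name, so they share one definition.
training_features = registry.get(wanted, row)
serving_features = registry.get(wanted, row)
print(training_features)
```

Because both code paths go through the same named definitions, a change to a transformation reaches training and inference together, which is the consistency benefit described above.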
Use of Feathr has grown since it was introduced in 2017. According to LinkedIn, the social media giant now uses the feature store to track thousands of features.
“It has reduced the engineering time required for adding and experimenting with new features from weeks to days,” LinkedIn data infrastructure engineer Hangfei Lin writes in a blog post today. “It’s also performed up to 50% faster than the custom feature processing pipelines that it replaced.”
This April, LinkedIn released the Feathr code to the public under an Apache 2.0 license, giving the general public its first access to the feature store. Since then, the project “has achieved substantial popularity among the machine learning operations (MLOps) community” and is being adopted by companies across multiple industries, Lin writes.
By donating Feathr to the Linux Foundation’s LF AI & Data group, LinkedIn is placing the open source project under the foundation’s stewardship. This should help the project draw in more users and contributors.
“We’re excited to welcome Feathr to LF AI & Data and for it to be part of our technical project portfolio (41 projects and growing) with over 17K developers,” Dr. Ibrahim Haddad, the general manager of LF AI & Data, said in a press statement. “We aim to support Feathr to expand its user base, grow its community of developers, become a leader within its category, and enable collaboration and integration opportunities with other projects. We look forward to the project’s continued growth and success as part of LF AI & Data.”
Microsoft, which owns LinkedIn, is another player in the Feathr story. Lin writes that LinkedIn developers collaborated with their Microsoft Azure colleagues to make sure Feathr works properly on Azure and integrates with other Azure products and projects.
Feathr now supports Apache Spark, Jupyter, Azure Blob Storage, HDFS, Snowflake, Databricks Delta Tables, and SQL Server, according to a blog post on the Azure website. Microsoft also took part in the open-sourcing of Feathr in April.