Databricks introduces Delta Sharing, an open-source tool for sharing data
Databricks launched its fifth open-source project today, a new tool called Delta Sharing designed to be a vendor-neutral way to share data with any cloud infrastructure or SaaS product, so long as you have the appropriate connector. It’s part of the broader Databricks open-source Delta Lake project.
As CEO Ali Ghodsi points out, data is exploding, and moving data from Point A to Point B is an increasingly difficult problem to solve with proprietary tooling. “The number one barrier for organizations to succeed with data is sharing data, sharing it between different views, sharing it across organizations — that’s the number one issue we’ve seen in organizations,” Ghodsi explained.
Delta Sharing is an open-source protocol designed to solve that problem. “This is the industry’s first-ever open protocol, an open standard for sharing a data set securely,” Ghodsi said. “[…] They can standardize on Databricks or something else. For instance, they might have standardized on using AWS Data Exchange, Power BI or Tableau, and they can then access that data securely.”
The tool is designed to work with multiple cloud infrastructure and SaaS services, and it launches with a number of partners. These include the Big Three cloud infrastructure vendors (Amazon, Microsoft and Google), data visualization and management vendors like Qlik, Starburst, Collibra and Alation, and data providers like Nasdaq, S&P and Foursquare.
Ghodsi said the key to making this work is the open nature of the project. By open-sourcing the tool and donating it to The Linux Foundation, he is trying to ensure that it can work across different environments. The partnerships are another big part of this. When big-name companies get involved in a project like this, it is more likely to succeed because it works across a broad set of popular services. A number of connectors are available today, and Databricks expects that number to increase over time as contributors build connectors to other services.
Databricks operates on a consumption pricing model much like Snowflake, meaning the more data you move through its software, the more money it makes. Delta Sharing, however, lets you share with anyone, not just another Databricks customer. Ghodsi says that the open-source nature of Delta Sharing means his company can still win, while giving customers more flexibility to move data between services.
The infrastructure vendors also love this model because the cloud data lake tools move massive amounts of data through their services and they make money too, which probably explains why they are all on board with this.
One of the big fears of modern cloud customers is being tied to a single vendor, as they often were in the 1990s and early 2000s, when most companies bought a stack of services from a single vendor like Microsoft, IBM or Oracle. On one hand, you had the proverbial single throat to choke. On the other, you were beholden to the vendor because the cost of moving to another one was prohibitively high. Companies don’t want to be locked in like that again, and open-source tooling is one way to prevent that.
Databricks was founded in 2013 and has raised almost $2 billion. The latest round was in February for $1 billion at a $28 billion valuation, an astonishing number for a private company. Snowflake, a primary competitor, went public last September. As of today, it has a market cap of over $66 billion.