Data sharing is the process of making data resources available within and between organizations while preserving the data's integrity. It enables data to be passed between parties in a way that keeps the data intact, so that every recipient receives the same metadata and level of accuracy as the original dataset. Data sharing is therefore closely related to data democratization: both are concerned with making important data more accessible.
In practice, data sharing most often occurs between individuals and departments at large, global organizations. For example, a chief data officer at the Singapore office of a consultancy firm might share financial data with their counterpart in the firm's US branch. Another common use case is data analysts or providers buying licenses for external datasets to enrich their own internal data; in both cases, the data must be shared securely.
There are several technology options that facilitate data sharing. A traditional method uses SFTP (SSH File Transfer Protocol). However, SFTP does not scale well, because it can only serve files that have first been offloaded to an SFTP server. More efficient data sharing solutions include products offered by Databricks, Oracle, Amazon Redshift, and Snowflake. Databricks' Delta Sharing is a particularly powerful option because it is built on an open protocol, so data can be shared across organizations regardless of the recipient's computing platform. Shared tables remain in cloud object storage such as S3, ADLS, or GCS, and recipients read them directly from there. This sets Delta Sharing apart from platforms whose sharing features only work between accounts on the same vendor's platform.
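To make the open-protocol idea concrete, here is a minimal sketch of what a Delta Sharing recipient holds and requests. In the protocol, the provider gives the recipient a small JSON "profile" containing a server endpoint and a bearer token, and the client lists available shares with a plain HTTPS `GET {endpoint}/shares` call. The endpoint and token below are placeholders, not a real server, and the helper function is illustrative rather than part of any official client library.

```python
# A Delta Sharing "profile" identifies the provider's server and credentials.
# The endpoint and bearer token here are placeholders, not a live server.
profile = {
    "shareCredentialsVersion": 1,
    "endpoint": "https://sharing.example.com/delta-sharing",
    "bearerToken": "dapi-0000-placeholder",
}

def list_shares_request(profile: dict) -> tuple[str, dict]:
    """Build the REST call a Delta Sharing client makes to list shares.

    The open protocol exposes GET {endpoint}/shares, authorized with a
    bearer token; the response is a JSON list of share names.
    """
    url = profile["endpoint"].rstrip("/") + "/shares"
    headers = {"Authorization": f"Bearer {profile['bearerToken']}"}
    return url, headers

url, headers = list_shares_request(profile)
print(url)  # https://sharing.example.com/delta-sharing/shares
```

Because the protocol is just HTTPS plus JSON, a recipient needs no Databricks account: any client that can make the request above, such as the open-source delta-sharing connector, can read the shared tables.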