Connecting Unity Catalog to External Database - Lakehouse Federation
Databricks has recently make available a new functionality in the Unity Catalog that allows you to set up an external connection to an existing database.
This means that you should be able to interact with the database directly from Databricks and leveraging Spark functionalities. They have included this functionality in what they call the Lakehouse Federation.
Let’s take a look at some code examples to set it up, as it’s quite simple and only implies to run a few commands.
Before starting, check that you have the required permissions, if you don’t have them, it won’t let you create the connections, and also check that you can reach the database from Databricks from a networking perspective.
Example - Connecting to a SQL Server
In this case, we’re going to create an external catalog pointing to an Azure SQL database:
Create an external connection:
- Specify the right hostname of your database.
- Get the relevant secrets from a Key Vault with the credentials to connect to the database, or just use plan values (not recommended).
CREATE CONNECTION sqlserver_external_connection TYPE sqlserver
OPTIONS (
host '<database name>.database.windows.net',
port '1433',
user secret ('SecretScope Name','db-username'),
password secret ('SecretScope Name','db-password')
);
Create the external catalog:
- Give the catalog a name.
- Specify the name of the connection created previously.
- Specify the name of the database that you want to connect to.
CREATE FOREIGN CATALOG IF NOT EXISTS external_database_connection
USING CONNECTION sqlserver_external_connection
OPTIONS (database 'database_name');
At this point, you should be able to see the external catalog in the Databricks UI, and run queries:
Other systems
Following the same approach, and according to the documentation, you should be able to connect to a variety of services like:
- MySQL
- Synapse
- BigQuery
- Snowflake
- Redshift
Resources
Run queries using Lakehouse Federation - Azure Databricks | Microsoft Learn
Lakehouse Federation for Microsoft SQL Server - Azure Databricks | Microsoft Learn