Passing Key Vault secrets as environment variables to Databricks clusters and jobs
A very useful feature of Azure Databricks is the ability to connect a workspace to an Azure Key Vault and create secret scopes, so that secrets can be used very easily, for example:
// To get a secret
dbutils.secrets.get(scope = "secret-scope", key = "secret-name")
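For instance, here is a minimal sketch of how a secret is typically consumed from a Python notebook cell (the scope and key names are just placeholders, and dbutils is the utility object Databricks provides in notebooks):
# List the secret keys available in the scope
for secret in dbutils.secrets.list("secret-scope"):
    print(secret.key)
# Read a secret value; Databricks redacts it if you try to display it in the notebook output
password = dbutils.secrets.get(scope="secret-scope", key="secret-name")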
But you can also pass secrets as environment variables and into the Spark configuration of a cluster when you define it.
For example:
- From the UI: go to the cluster definition and, under Advanced Options, define entries that reference secrets in the following way (reading the resolved values at runtime is sketched at the end of this post):
spark.configuration {{secrets/scope-name/secret-name}}
environment_variable={{secrets/scope-name/secret-name}}
- From a job's JSON definition (which can also be submitted through the Jobs API, as sketched right after the JSON):
{
"name": "job-name",
.....
"tasks": [
{ ... }
],
"job_clusters": [
{
"job_cluster_key": "job_name",
"new_cluster": {
"spark_conf": {
"spark.configuracion.value": "{{secrets/defaultkeyvault/secret-name}}"
},
"spark_env_vars": {
"environment_variable": "{{secrets/defaultkeyvault/secret-name}}"
},
...
}
}
],
...
}
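If you manage jobs programmatically, a definition like the one above can be submitted to the Jobs API (jobs/create). Here is a minimal Python sketch, assuming the JSON is saved in a local file called job-definition.json and using placeholder values for the workspace URL and personal access token:
import json
import requests

workspace_url = "https://<workspace-url>"   # placeholder
token = "<personal-access-token>"           # placeholder

# Load the job definition shown above (hypothetical file name)
with open("job-definition.json") as f:
    job_definition = json.load(f)

# Create the job through the Jobs API 2.1
response = requests.post(
    f"{workspace_url}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {token}"},
    json=job_definition,
)
print(response.json())  # returns the job_id of the new job on success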
Both the Spark configuration property and the environment variable will be resolved to their corresponding secret values when the cluster starts, so you don't need to hardcode anything!
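To check that the resolution worked, you can read both values at runtime from code running on that cluster. A small Python sketch, assuming the placeholder names used above (spark is the SparkSession that Databricks provides in notebooks and jobs):
import os

# The Spark configuration property defined with the {{secrets/...}} reference
conf_value = spark.conf.get("spark.configuration")

# The environment variable defined the same way
env_value = os.environ.get("environment_variable")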