Writing to Databases
Introduction
Spark allows you to connect to a Database and write data to it. For that you’ll need to setup a few things first.
Configuration
In order to connect to the database you need:
jdbc url: this is the url that will point to the database. Following this pattern you'll need:
- driver: database type.
- jdbcHostname: ip address or url that points to the database.
- jdbcPort: database port.
- jdbcDatabase: database name.
Example:
val jdbcUrl = s"jdbc:${driver}://${jdbcHostname}:${jdbcPort};database=${databaseName}"
You can include the user and password in this url or pass them as a Properties / Dictionary to spark as you’ll see in the following code snippets.
Note: This jdbc urls are database dependent. All of them follow the same pattern, but they may differ in some attributes, for example:
// Oracle
val jdbcUrl = s"jdbc:oracle:thin:${userName}/${password}@${jdbcHostname}:${jdbcPort}/${serviceName}"
// Postgres
val jdbcUrl = s"jdbc:postgresql://${jdbcHostname}:${jdbcPort}/${databaseName}?user=${userName}&password=${password}&sslmode=require"
// SQL Server
val jdbcUrl = s"jdbc:sqlserver://${jdbcHostname}:${jdbcPort};database=${databaseName};user=${userName};password=${password}"
In order to connect to the database you need:
jdbc url: this is the url that will point to the database. Following this pattern you'll need:
- driver: database type.
- jdbcHostname: ip address or url that points to the database.
- jdbcPort: database port.
- jdbcDatabase: database name.
Example:
val jdbcUrl = s"jdbc:${driver}://${jdbcHostname}:${jdbcPort};database=${databaseName}"
You can include the user and password in this url or pass them as a Properties / Dictionary to spark as you’ll see in the following code snippets.
Note: This jdbc urls are database dependent. All of them follow the same pattern, but they may differ in some attributes, for example:
// Oracle
val jdbcUrl = s"jdbc:oracle:thin:${userName}/${password}@${jdbcHostname}:${jdbcPort}/${serviceName}"
// Postgres
val jdbcUrl = s"jdbc:postgresql://${jdbcHostname}:${jdbcPort}/${databaseName}?user=${userName}&password=${password}&sslmode=require"
// SQL Server
val jdbcUrl = s"jdbc:sqlserver://${jdbcHostname}:${jdbcPort};database=${databaseName};user=${userName};password=${password}"
Writing data
There are several ways of setting up the write method:
// Without providing User and Password in the URL
import java.util.Properties
val connectionProperties =new Properties()
connectionProperties.put("user", s"${jdbcUsername}")
connectionProperties.put("password", s"${jdbcPassword}")
val data = spark.write.jdbc(jdbcUrl, "tableName", connectionProperties)
You can also chain all the options:
val data = spark
.write
.format("jdbc")
.mode("append")
.option("url", jdbcUrl)
.option("dbtable", tableName)
.option("user", username)
.option("password", password)
.save()