Skip to main content

Writing to Databases

Introduction

Spark allows you to connect to a Database and write data to it. For that you’ll need to setup a few things first.

Configuration

In order to connect to the database you need:

jdbc url: this is the url that will point to the database. Following this pattern you'll need:

  • driver: database type.
  • jdbcHostname: ip address or url that points to the database.
  • jdbcPort: database port.
  • jdbcDatabase: database name.

Example:

val jdbcUrl = s"jdbc:${driver}://${jdbcHostname}:${jdbcPort};database=${databaseName}"

You can include the user and password in this url or pass them as a Properties / Dictionary to spark as you’ll see in the following code snippets.

Note: This jdbc urls are database dependent. All of them follow the same pattern, but they may differ in some attributes, for example:

// Oracle
val jdbcUrl = s"jdbc:oracle:thin:${userName}/${password}@${jdbcHostname}:${jdbcPort}/${serviceName}"

// Postgres
val jdbcUrl = s"jdbc:postgresql://${jdbcHostname}:${jdbcPort}/${databaseName}?user=${userName}&password=${password}&sslmode=require"

// SQL Server
val jdbcUrl = s"jdbc:sqlserver://${jdbcHostname}:${jdbcPort};database=${databaseName};user=${userName};password=${password}"

In order to connect to the database you need:

jdbc url: this is the url that will point to the database. Following this pattern you'll need:

  • driver: database type.
  • jdbcHostname: ip address or url that points to the database.
  • jdbcPort: database port.
  • jdbcDatabase: database name.

Example:

val jdbcUrl = s"jdbc:${driver}://${jdbcHostname}:${jdbcPort};database=${databaseName}"

You can include the user and password in this url or pass them as a Properties / Dictionary to spark as you’ll see in the following code snippets.

Note: This jdbc urls are database dependent. All of them follow the same pattern, but they may differ in some attributes, for example:

// Oracle
val jdbcUrl = s"jdbc:oracle:thin:${userName}/${password}@${jdbcHostname}:${jdbcPort}/${serviceName}"

// Postgres
val jdbcUrl = s"jdbc:postgresql://${jdbcHostname}:${jdbcPort}/${databaseName}?user=${userName}&password=${password}&sslmode=require"

// SQL Server
val jdbcUrl = s"jdbc:sqlserver://${jdbcHostname}:${jdbcPort};database=${databaseName};user=${userName};password=${password}"

Writing data

There are several ways of setting up the write method:

// Without providing User and Password in the URL
import java.util.Properties
val connectionProperties =new Properties()

connectionProperties.put("user", s"${jdbcUsername}")
connectionProperties.put("password", s"${jdbcPassword}")

val data = spark.write.jdbc(jdbcUrl, "tableName", connectionProperties)

You can also chain all the options:

val data = spark
.write
.format("jdbc")
.mode("append")
.option("url", jdbcUrl)
.option("dbtable", tableName)
.option("user", username)
.option("password", password)
.save()