
Generating a Spark StructType from a Scala Case Class

Problem

You are using Scala and Spark, and you want to generate a Spark StructType schema from a Scala case class.

Solution

You can use Spark's ScalaReflection helper, which uses Scala reflection to map each field of a case class to the corresponding Spark SQL type and builds the StructType for you. For example:

import org.apache.spark.sql.catalyst.ScalaReflection
import org.apache.spark.sql.types.StructType

// Define the Case Class
case class MyScalaClass(stringValue: String, intValue: Int, booleanValue: Boolean)

// Use the Reflection System
val sparkSchema = ScalaReflection.schemaFor[MyScalaClass].dataType.asInstanceOf[StructType]

// Check the output
sparkSchema.printTreeString()

Output:
root
 |-- stringValue: string (nullable = true)
 |-- intValue: integer (nullable = false)
 |-- booleanValue: boolean (nullable = false)
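ScalaReflection lives in Spark's internal catalyst package, so it is not covered by Spark's public API stability guarantees. A minimal sketch, assuming Spark SQL is on the classpath and you are in a spark-shell or notebook like the example above, of deriving the same schema through the public Encoders.product encoder instead (the name encoderSchema is illustrative):

```scala
import org.apache.spark.sql.Encoders
import org.apache.spark.sql.types.StructType

// Same case class as in the example above
case class MyScalaClass(stringValue: String, intValue: Int, booleanValue: Boolean)

// Encoders.product derives an encoder for a case class (any Product with a TypeTag);
// its schema is the StructType Spark would use for a Dataset[MyScalaClass]
val encoderSchema: StructType = Encoders.product[MyScalaClass].schema

// Expected to print the same tree as the ScalaReflection version
encoderSchema.printTreeString()
```

Both approaches treat the String field as nullable and the primitive Int and Boolean fields as non-nullable.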

This approach also works with complex types such as nested case classes: a field whose type is another case class becomes a nested struct column.

import org.apache.spark.sql.catalyst.ScalaReflection
import org.apache.spark.sql.types.StructType

// Define the Case Classes
case class MyScalaClass(stringValue: String, intValue: Int, booleanValue: Boolean)
case class MyComplexClass(stringValue: String, myComplexType: MyScalaClass)

// Use the Reflection System
val complexSparkSchema = ScalaReflection.schemaFor[MyComplexClass].dataType.asInstanceOf[StructType]

// Check the output
complexSparkSchema.printTreeString()

Output:
root
 |-- stringValue: string (nullable = true)
 |-- myComplexType: struct (nullable = true)
 |    |-- stringValue: string (nullable = true)
 |    |-- intValue: integer (nullable = false)
 |    |-- booleanValue: boolean (nullable = false)
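Besides nested case classes, schemaFor also maps common Scala optional and collection types. A minimal sketch, in the same spark-shell or notebook style as above; the class MyCollectionsClass and its field names are hypothetical, chosen only for illustration:

```scala
import org.apache.spark.sql.catalyst.ScalaReflection
import org.apache.spark.sql.types._

// Hypothetical case class mixing optional and collection fields
case class MyCollectionsClass(
  maybeName: Option[String],    // Option[T] maps to a nullable column of T's type
  maybeCount: Option[Int],      // wrapping a primitive in Option makes the column nullable
  tags: Seq[String],            // Seq[T] maps to an array column
  scores: Map[String, Double]   // Map[K, V] maps to a map column
)

val collectionsSchema =
  ScalaReflection.schemaFor[MyCollectionsClass].dataType.asInstanceOf[StructType]

collectionsSchema.printTreeString()
```

Wrapping a primitive such as Int in Option is a convenient way to get a nullable column when the data may contain missing values.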