Generating a Spark StructType from a Scala Case Class
Problem
You are using Scala and Spark, and you want to generate a Spark StructType
schema from a Scala case class.
Solution
Spark's ScalaReflection utility (org.apache.spark.sql.catalyst.ScalaReflection) uses Scala reflection to derive a Spark StructType from a case class. For example:
import org.apache.spark.sql.catalyst.ScalaReflection
import org.apache.spark.sql.types.StructType
// Define the Case Class
case class MyScalaClass(stringValue: String, intValue: Int, booleanValue: Boolean)
// Use the Reflection System
val sparkSchema = ScalaReflection.schemaFor[MyScalaClass].dataType.asInstanceOf[StructType]
// Check the output
sparkSchema.printTreeString()
>--- Output
root
|-- stringValue: string (nullable = true)
|-- intValue: integer (nullable = false)
|-- booleanValue: boolean (nullable = false)
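As an alternative sketch, not part of the original solution: the same schema can usually be obtained through the public Encoders API, which avoids a direct dependency on the catalyst package. The value name encoderSchema is chosen here for illustration only.

```scala
import org.apache.spark.sql.Encoders
import org.apache.spark.sql.types.StructType

// Same case class as in the example above
case class MyScalaClass(stringValue: String, intValue: Int, booleanValue: Boolean)

// Build an encoder for the case class and read its schema.
// encoderSchema is an illustrative name, not from the original article.
val encoderSchema: StructType = Encoders.product[MyScalaClass].schema

// Check the output
encoderSchema.printTreeString()
```

Either schema can then be passed to a reader, for example with spark.read.schema(...), assuming a SparkSession named spark is available.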
The ScalaReflection approach also works with nested case classes, which are mapped to nested struct fields:
import org.apache.spark.sql.catalyst.ScalaReflection
import org.apache.spark.sql.types.StructType
// Define the Case Classes
case class MyScalaClass(stringValue: String, intValue: Int, booleanValue: Boolean)
case class MyComplexClass(stringValue: String, myComplexType: MyScalaClass)
// Use the Reflection System
val complexSparkSchema = ScalaReflection.schemaFor[MyComplexClass].dataType.asInstanceOf[StructType]
// Check the output
complexSparkSchema.printTreeString()
>--- Output
root
|-- stringValue: string (nullable = true)
|-- myComplexType: struct (nullable = true)
|    |-- stringValue: string (nullable = true)
|    |-- intValue: integer (nullable = false)
|    |-- booleanValue: boolean (nullable = false)
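Optional and collection fields can be handled the same way. The following is a minimal sketch using a hypothetical case class, MyCollectionClass, that does not appear in the original example. An Option[Int] field should come out as a nullable integer, a Seq as an array, and a Map as a map.

```scala
import org.apache.spark.sql.catalyst.ScalaReflection
import org.apache.spark.sql.types.StructType

// Hypothetical case class used only for illustration
case class MyCollectionClass(
  optionalInt: Option[Int],   // Option makes a primitive field nullable
  tags: Seq[String],          // Seq maps to an array type
  counts: Map[String, Int]    // Map maps to a map type
)

// Use the Reflection System
val collectionSchema = ScalaReflection.schemaFor[MyCollectionClass].dataType.asInstanceOf[StructType]

// Check the output
collectionSchema.printTreeString()
```

Wrapping a primitive in Option is the usual way to get nullable = true for a field that would otherwise be reported as nullable = false.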