One of the first things people try when they need something that Spark doesn’t provide out of the box is to write a UDF (User Defined Function) that implements the functionality they’re looking for. But is that the best way to do it? What are the performance implications of writing a UDF?
This post implements a function that returns a UUID using two different approaches, a UDF and custom Spark-native code, and compares their performance.
Let’s get started!