
3 posts tagged with "Spark"


8 min read

One of the first things people try when they need functionality that Spark doesn't provide out of the box is to write a UDF (User Defined Function). But is that the best way to do it? What are the performance implications of writing a UDF?

This post implements a function that returns a UUID in two different ways, first as a UDF and then as custom Spark-native code, and compares the performance of the two approaches.

Let’s get started!

5 min read

In this new era of information, geospatial data is becoming increasingly relevant to Data Engineering and Data Analytics work… but what do we mean when we talk about geospatial data?

In this post, I would like to introduce some basic concepts of geospatial processing with Spark, one of the most popular data processing frameworks of 2020.