TestContainers is a fascinating technology that lets you spin up Docker containers programmatically, so your tests can run against real infrastructure components such as databases instead of mocks.
3 posts tagged with "Spark"
Why You Should Start Writing Spark Custom Native Functions
When people need something that Spark doesn’t offer out of the box, one of the first things they try is writing a UDF (User Defined Function) to get the functionality they’re looking for. But is that the best way of doing it? What are the performance implications of writing a UDF?
This post implements a function that returns a UUID using two different approaches, a UDF and custom Spark-native code, and compares their performance.
Let’s get started!
An Introduction to Geospatial Processing with Spark
In this new era of information, geospatial data is becoming more and more relevant in Data Engineering and Data Analytics work… but what do we mean when we talk about Geospatial Data?
In this post, I would like to introduce some basic concepts about Geospatial Processing using Spark, one of the most popular data processing frameworks of 2020.