Reading SAS7BDAT files in Scala or Java
Problem
You need to read a SAS file in Scala, either to process its contents or to convert it to another format.
Solution
There is a Java library called “parso” that is completely open source and lets you open SAS7BDAT files and extract their contents. The library hasn’t been updated in quite a few years, but it works in most cases.
Let’s take a look at some examples of how to use it:
Import the library
Using these coordinates (check if there is a new version before copying!):
- If you’re using Maven:
<dependency>
  <groupId>com.epam</groupId>
  <artifactId>parso</artifactId>
  <version>2.0.14</version>
</dependency>
- If you’re using sbt:
libraryDependencies += "com.epam" % "parso" % "2.0.14"
Code examples of how to use it
Let's read a simple file:
import com.epam.parso.date.OutputDateType
import com.epam.parso.impl.{CSVDataWriterImpl, SasFileReaderImpl}
import java.io.{BufferedWriter, FileWriter}
val fileName = "... path to file... "
// Open the SAS7BDAT file as a plain java.io input stream
val file = new java.io.FileInputStream(fileName)
// Arguments -> InputStream, Encoding, OutputDateType: EPOCH_SECONDS, JAVA_TEMPORAL, SAS_VALUE
val sasFile: SasFileReaderImpl = new SasFileReaderImpl(file, null, OutputDateType.EPOCH_SECONDS)
// or simply with default values (use one declaration or the other, not both):
// val sasFile: SasFileReaderImpl = new SasFileReaderImpl(file)
You can now access the content of the file through the sasFile variable. For example, you can examine the columns of your file like this (X is the zero-based index of the column):
sasFile.getColumns().get(X)
To write the data out, the library provides some writers that you can use. You’ll need to define a BufferedWriter that writes to the output file, and then iterate over all the rows of the SAS file.
In this case we’re using “CSVDataWriterImpl”, which needs a Writer and, optionally, the delimiter you want to use, the line-ending symbol and the encoding.
val outputFilePath = "... path to output file ..."
val delimiter = ";"
// Now define your output
val out: BufferedWriter = new BufferedWriter(new FileWriter(outputFilePath, false))
val csvDataWriter = new CSVDataWriterImpl(out, delimiter)
csvDataWriter.writeColumnNames(sasFile.getColumns())
var hasData = true
while (hasData) {
  // readNext returns the next row, or null once the end of the file is reached
  val data = sasFile.readNext
  if (data != null) {
    csvDataWriter.writeRow(sasFile.getColumns, data)
  } else {
    hasData = false
  }
}
out.close()
This code generates a CSV file with the content of the SAS7BDAT file.
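If you prefer to avoid the mutable flag, the same "read until null" loop can be wrapped in an Iterator. This is only a sketch: rowIterator is a hypothetical helper (not part of parso), and the fake reader below is a stand-in so the snippet runs without a SAS file. With a real reader you would pass () => sasFile.readNext instead.

```scala
import scala.collection.mutable

// Hypothetical helper (not part of parso): wraps any "returns null when
// exhausted" read function into a standard Scala Iterator.
def rowIterator[A <: AnyRef](readNext: () => A): Iterator[A] =
  Iterator.continually(readNext()).takeWhile(_ != null)

// Stand-in for a SAS reader: hands out two rows, then null.
val fakeRows = mutable.Queue[Array[AnyRef]](
  Array[AnyRef]("a", Int.box(1)),
  Array[AnyRef]("b", Int.box(2))
)
def fakeReadNext(): Array[AnyRef] =
  if (fakeRows.isEmpty) null else fakeRows.dequeue()

// Collect every row; iteration stops at the first null.
val rows = rowIterator(() => fakeReadNext()).toList
```

With the iterator in place, the write loop becomes a single foreach over the rows, and you can also use map, filter or take on them before writing.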
Writing to formats other than CSV
The Parso library only provides an implementation to write out as CSV, so if you need to write the data out in a different format, you have a few options:
- Implement your own Writer, using the CSV one as a starting point.
- Extract the data through the sasFile variable and transform it to fit your needs. The “readNext” method gives you the raw values of the file row by row, as an array, so you can easily work with that array and save the values however you need:
sasFile.readNext
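As an illustration of the second option, here is a minimal sketch of two row transformations. Both rowToMap and rowToTsv are hypothetical helpers written for this example, not parso functions. They only assume that a row is an array of values in column order and that a missing value shows up as null.

```scala
// Hypothetical helper: pairs each column name with the value in the same position.
// With parso, the names could be collected from sasFile.getColumns (Column.getName).
def rowToMap(columnNames: Seq[String], row: Array[AnyRef]): Map[String, Any] =
  columnNames.zip(row.toSeq).toMap

// Hypothetical helper: renders a row as one tab-separated line,
// writing an empty string where the value is null.
def rowToTsv(row: Array[AnyRef]): String =
  row.map(v => if (v == null) "" else v.toString).mkString("\t")

// Example row standing in for the array returned by readNext.
val exampleRow = Array[AnyRef](Int.box(1), null, "x")
val asMap = rowToMap(Seq("id", "score", "name"), exampleRow)
val asTsv = rowToTsv(exampleRow)
```

From here each row can be handed to whatever library writes your target format, since it is now a plain Scala Map or String.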
Extra
We also have a guide on how to Read SAS files using Spark! Check it out!