Flattening a Nested Parquet File using Sparkling Water

This article applies to Sparkling Water for H2O versions 3.28.0.1 and later.

After setting up Sparkling Water for your environment, follow these steps:

1. Start sparkling-shell from the Sparkling Water folder:

bin/sparkling-shell

2. Import the parquet file:

val sqlContext = spark.sqlContext
val parquetFile = sqlContext.read.parquet("file:///path/to/file/")

To preview the imported file:

parquetFile.show(false)
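
Before flattening, it can help to confirm which columns are actually nested. The following is a minimal sketch, not part of the original steps: it builds a small hypothetical DataFrame (`sample`, with a made-up `point` struct column) so that it runs on its own, and the names `session` and `nestedColumns` are illustrative. In practice the same calls can be applied to `parquetFile`.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{ArrayType, MapType, StructType}

// In sparkling-shell a session already exists, so getOrCreate() returns it.
val session = SparkSession.builder().master("local[*]").getOrCreate()
import session.implicits._

// Hypothetical sample: one flat column ("id") and one struct column ("point").
val sample = Seq((1, ("x", 2.0))).toDF("id", "point")

// Print the schema tree; nested fields are indented under their parent column.
sample.printSchema()

// Names of top-level columns whose type is a struct, array, or map.
val nestedColumns = sample.schema.fields.collect {
  case f if f.dataType.isInstanceOf[StructType] ||
            f.dataType.isInstanceOf[ArrayType] ||
            f.dataType.isInstanceOf[MapType] => f.name
}.toSeq

println(nestedColumns.mkString(", "))
```

If `nestedColumns` comes back empty for your own file, the data is already flat and step 3 is probably unnecessary.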

3. Flatten the parquet file:

import ai.h2o.sparkling.ml.utils.SchemaUtils

val flattenDF = SchemaUtils.flattenDataFrame(parquetFile)

To preview the flattened data frame:

flattenDF.show(false)
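
To make the idea of flattening concrete, here is a plain-Scala sketch that does not use Spark or Sparkling Water at all. It treats a nested record as a `Map[String, Any]` and joins parent and child keys with a separator. The function name `flattenRecord`, the sample data, and the `.` separator are all assumptions chosen for illustration; the column names Sparkling Water actually produces may differ, and arrays are not handled here.

```scala
// Recursively flatten nested maps into a single-level map.
// Keys of nested entries are prefixed with the parent key and a separator.
def flattenRecord(
    record: Map[String, Any],
    prefix: String = "",
    sep: String = "."): Map[String, Any] =
  record.flatMap {
    case (key, nested: Map[_, _]) =>
      // Descend into the nested map, extending the prefix with this key.
      flattenRecord(nested.asInstanceOf[Map[String, Any]], prefix + key + sep, sep)
    case (key, value) =>
      // Leaf value: emit it under its fully qualified name.
      Map((prefix + key) -> value)
  }

// Hypothetical nested record.
val nestedRecord: Map[String, Any] = Map(
  "id" -> 1,
  "address" -> Map(
    "city" -> "Prague",
    "geo" -> Map("lat" -> 50.08, "lon" -> 14.43)
  )
)

val flatRecord = flattenRecord(nestedRecord)
flatRecord.toSeq.sortBy(_._1).foreach { case (k, v) => println(s"$k = $v") }
```

Each leaf value ends up under a single top-level key, which is the same shape a flattened data frame has: one column per leaf field.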

4. Save the flattened file to disk:

flattenDF.write.parquet("file:///tmp/flattened.parquet")
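
By default, Spark's writer fails if the target path already exists, so rerunning step 4 against the same location raises an error. The sketch below shows one way to overwrite the output and read it back to check the result. It uses a hypothetical stand-in DataFrame (`flatSample`) and a temporary directory so that it runs on its own; `session`, `outputPath`, and `readBack` are illustrative names, and in practice `flattenDF` and your own path would take their place.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// In sparkling-shell a session already exists, so getOrCreate() returns it.
val session = SparkSession.builder().master("local[*]").getOrCreate()
import session.implicits._

// Hypothetical stand-in for flattenDF: every column is already a simple type.
val flatSample = Seq((1, "x", 2.0)).toDF("id", "point_name", "point_value")

// Write under a fresh temporary directory; the result is a file:// URI.
val outputPath =
  Files.createTempDirectory("flattened").resolve("out.parquet").toUri.toString

// mode("overwrite") replaces existing output instead of failing.
flatSample.write.mode("overwrite").parquet(outputPath)

// Read the file back and confirm the schema contains no nested columns.
val readBack = session.read.parquet(outputPath)
readBack.printSchema()
```

Overwrite mode deletes whatever is already at the target path, so it is worth pointing it only at locations that hold disposable output.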
