This article applies to Sparkling Water for H2O versions 3.28.0.1 and later.
After setting up Sparkling Water for your environment, follow these steps:
1. Start sparkling-shell from the Sparkling Water folder:
bin/sparkling-shell
2. Import the parquet file:
val sqlContext = spark.sqlContext
val parquetFile = sqlContext.read.parquet("file:///path/to/file/")
To preview the imported file:
parquetFile.show(false)
3. Flatten the parquet file:
import ai.h2o.sparkling.ml.utils.SchemaUtils
val flattenDF = SchemaUtils.flattenDataFrame(parquetFile)
To preview the flattened data frame:
flattenDF.show(false)
4. Save the flattened file to disk:
flattenDF.write.parquet("file:///tmp/flattened.parquet")
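
For readers unsure what "flattening" in step 3 means, the following is a minimal sketch of the general idea in plain Scala: nested fields are lifted to top-level entries whose names encode the original path. It models a record as a Map rather than a Spark DataFrame, and it is not Sparkling Water's implementation. The object name FlattenSketch, the sample record, and the "." separator are assumptions made for this illustration only.

```scala
// Illustrative only: a nested record modeled as a plain Scala Map,
// not a Spark DataFrame and not Sparkling Water's implementation.
object FlattenSketch {
  // Recursively turn nested maps into a single-level map.
  // The "." separator is an assumption chosen for this sketch.
  def flatten(record: Map[String, Any], prefix: String = ""): Map[String, Any] =
    record.flatMap {
      case (key, nested: Map[_, _]) =>
        // A nested map: descend and extend the key prefix.
        flatten(nested.asInstanceOf[Map[String, Any]], s"$prefix$key.")
      case (key, value) =>
        // A leaf value: emit it under its full path.
        Map(s"$prefix$key" -> value)
    }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample record with two levels of nesting.
    val row: Map[String, Any] = Map(
      "id" -> 1,
      "address" -> Map("city" -> "Prague", "geo" -> Map("lat" -> 50.08))
    )
    // Print the flattened entries in key order.
    flatten(row).toSeq.sortBy(_._1).foreach { case (k, v) => println(s"$k = $v") }
  }
}
```

Calling FlattenSketch.flatten on the sample record yields the three top-level keys address.city, address.geo.lat, and id. To see the column names that Sparkling Water actually produces, run flattenDF.printSchema() after step 3.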