While running a Spark program that writes a DataFrame out as CSV, the job failed because some of the columns are of Array type, which the CSV data source cannot serialize:
```
CSV data source does not support array<string> data type
```
Sample data:
```
+----------+----------+-------------------+
|antecedent|consequent|         confidence|
+----------+----------+-------------------+
|   [C47_D]|    [C3_B]|0.35714285714285715|
|   [C47_D]|   [C13_A]|0.35714285714285715|
|   [C47_D]|   [C23_A]|0.35714285714285715|
|   [C47_D]|   [C24_D]|0.35714285714285715|
...
```
The first two columns (antecedent and consequent) are arrays.
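For context, column names like antecedent, consequent and confidence match the output of Spark MLlib's FPGrowth association rules; here is a minimal sketch of how such an array-typed DataFrame can arise (the transactions data is hypothetical, and an existing SparkSession named spark is assumed):

```scala
import org.apache.spark.ml.fpm.FPGrowth
import spark.implicits._ // assumes an existing SparkSession named spark

// Hypothetical transactions: each row holds an array of items
val transactions = Seq(
  Seq("C47_D", "C3_B", "C13_A"),
  Seq("C47_D", "C23_A", "C24_D")
).toDF("items")

val model = new FPGrowth()
  .setItemsCol("items")
  .setMinSupport(0.2)
  .setMinConfidence(0.3)
  .fit(transactions)

// antecedent and consequent come back as array<string>,
// so writing this DataFrame straight to CSV fails
model.associationRules.printSchema()
```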
Solutions

There are several; here are two.
1. Join each array into a single string with a UDF, then write:

```scala
import org.apache.spark.sql.functions.udf

// Join the array elements into a single comma-separated string
val stringify = udf((vs: Seq[String]) => vs.mkString(","))

df.withColumn("antecedent", stringify($"antecedent"))
  .withColumn("consequent", stringify($"consequent"))
  .write.csv("/path/data/csv")
```
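If you would rather not define a UDF, the built-in concat_ws function also accepts an array<string> column and joins its elements directly (a sketch using the same hypothetical output path):

```scala
import org.apache.spark.sql.functions.concat_ws

df.withColumn("antecedent", concat_ws(",", $"antecedent"))
  .withColumn("consequent", concat_ws(",", $"consequent"))
  .write.csv("/path/data/csv")
```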
2. Map each row onto a case class with plain String fields, then write the resulting DataFrame:

```scala
// A case class with plain String fields that CSV can serialize
case class Asso(antecedent: String, consequent: String, confidence: String)

df.rdd.map { line =>
  Asso(line(0).toString, line(1).toString, line(2).toString)
}.toDF().write.csv("/path/data/csv")
```
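One caveat: calling line(0).toString on an array column yields the collection's own string form (something like WrappedArray(C47_D)) rather than just the elements. If only the elements are wanted, a variant using Row.getSeq (a sketch under the same assumptions) is:

```scala
df.rdd.map { line =>
  Asso(
    line.getSeq[String](0).mkString(","), // elements only, no WrappedArray(...) wrapper
    line.getSeq[String](1).mkString(","),
    line(2).toString
  )
}.toDF().write.csv("/path/data/csv")
```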