Scala - implement sum aggregator on custom case class on Spark SQL DataFrame (UDAF)
I have the case class:

    case class Vec(var a: Int, var b: Int) {
      def +(v: Vec): Vec = {
        a += v.a
        b += v.b
        this
      }
    }
Now if I write

    val rDistDs: RDD[(Int, Vec)] = ...
    val sums: RDD[(Int, Vec)] = rDistDs.reduceByKey(_ + _)
I get the sum of the vectors associated with each Int key. Excellent.
However, I'd like to use a DataFrame instead, to potentially benefit from the query planner and to make the code more readable.
I would like to be able to do something like the following:

    val df: DataFrame = ... // each row has Row(theInt: Int, vec: Vec)
    df.groupBy(df("theInt")).agg(sum(df("vec")))
Is there a way to implement a sum aggregator on a custom case class for use with a Spark SQL DataFrame, as simulated above?
As it stands, I get:

    java.lang.ClassCastException: org.apache.spark.sql.types.IntegerType$ cannot be cast to org.apache.spark.sql.types.StructType
        at org.apache.spark.sql.catalyst.expressions.Cast.org$apache$spark$sql$catalyst$expressions$Cast$$cast(Cast.scala:429)
As of Spark 1.4, I don't think UDAFs are supported. Please have a look at the related Spark JIRA tickets for more information.
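In the meantime, one workaround is to drop back to the RDD API for the aggregation step and use reduceByKey exactly as in your first snippet. A minimal sketch, assuming the DataFrame schema is (theInt: Int, vec: struct&lt;a: Int, b: Int&gt;) as in the question; the helper name sumByKey and the struct field order are mine:

    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.{DataFrame, Row}

    // Illustrative helper: pull the rows back out of the DataFrame,
    // rebuild the case class, and aggregate per key as before.
    def sumByKey(df: DataFrame): RDD[(Int, Vec)] =
      df.rdd
        .map { row =>
          val v = row.getAs[Row]("vec") // vec arrives as a nested Row
          (row.getAs[Int]("theInt"), Vec(v.getInt(0), v.getInt(1)))
        }
        .reduceByKey(_ + _)

You lose the Catalyst optimizations this way, but it preserves the per-key sum semantics of the original RDD version.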
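For what it's worth, Spark 1.5 added org.apache.spark.sql.expressions.UserDefinedAggregateFunction, which covers exactly this use case. A sketch of a sum aggregator over the struct column (the class name VecSum is mine, and the struct field order is assumed to match the case class):

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
    import org.apache.spark.sql.types._

    class VecSum extends UserDefinedAggregateFunction {
      // Vec is represented as struct<a: Int, b: Int> inside the DataFrame.
      private val vecType = StructType(
        StructField("a", IntegerType) :: StructField("b", IntegerType) :: Nil)

      def inputSchema: StructType = StructType(StructField("vec", vecType) :: Nil)
      def bufferSchema: StructType = vecType
      def dataType: DataType = vecType
      def deterministic: Boolean = true

      def initialize(buffer: MutableAggregationBuffer): Unit = {
        buffer(0) = 0 // running sum of a
        buffer(1) = 0 // running sum of b
      }

      def update(buffer: MutableAggregationBuffer, input: Row): Unit = {
        val v = input.getAs[Row](0) // the incoming vec struct
        buffer(0) = buffer.getInt(0) + v.getInt(0)
        buffer(1) = buffer.getInt(1) + v.getInt(1)
      }

      def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit = {
        buffer1(0) = buffer1.getInt(0) + buffer2.getInt(0)
        buffer1(1) = buffer1.getInt(1) + buffer2.getInt(1)
      }

      def evaluate(buffer: Row): Any = Row(buffer.getInt(0), buffer.getInt(1))
    }

With that in place, the grouped aggregation reads much like the hoped-for version:

    val vecSum = new VecSum
    df.groupBy(df("theInt")).agg(vecSum(df("vec")))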