scala - Implement a sum aggregator on a custom case class on a Spark SQL DataFrame (UDAF)


I have a case class:

    case class Vec(var a: Int, var b: Int) {
      def +(v: Vec): Vec = {
        a += v.a
        b += v.b
        this // return the mutated Vec so calls can be chained
      }
    }

Now if I write:

    val rdistds: RDD[(Int, Vec)] = ...
    val sums: RDD[(Int, Vec)] = rdistds.reduceByKey(_ + _)

I get the sum of the vectors associated with each Int key. Excellent.
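For concreteness, here is a minimal self-contained version of that; the `sc` SparkContext and the sample data are assumptions for illustration:

    import org.apache.spark.rdd.RDD

    // Illustrative data; `sc` is an assumed SparkContext.
    val rdistds: RDD[(Int, Vec)] =
      sc.parallelize(Seq(1 -> Vec(1, 2), 1 -> Vec(3, 4), 2 -> Vec(5, 6)))

    val sums: RDD[(Int, Vec)] = rdistds.reduceByKey(_ + _)
    // sums.collect() returns Array((1, Vec(4, 6)), (2, Vec(5, 6)))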

However, I'd like to use a DataFrame, to potentially benefit from the query planner and to make the code more readable.

I'd like to be able to do the following:

    val df: DataFrame = ... // each row is Row(theInt: Int, vec: Vec)
    df.groupBy(df("theInt")).agg(sum(df("vec")))

Is there a way to implement a sum aggregator on a custom case class for use with a Spark SQL DataFrame, as simulated above?

As it stands, I get:

    java.lang.ClassCastException:
        org.apache.spark.sql.types.IntegerType$ cannot be cast to
        org.apache.spark.sql.types.StructType
      at org.apache.spark.sql.catalyst.expressions.Cast.org$apache$spark$sql$catalyst$expressions$Cast$$cast(Cast.scala:429)

As of Spark 1.4, I don't think UDAFs are supported.
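One workaround, while that remains the case, is to sum the struct's fields as ordinary integer columns and rebuild the case class from the result. A minimal sketch, assuming the theInt/vec.a/vec.b column layout from the question:

    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.functions.sum

    // Sum each field of the struct separately, since sum() works on
    // IntegerType columns. Column names are assumed from the question.
    val perField: DataFrame = df
      .select(df("theInt"), df("vec.a").as("a"), df("vec.b").as("b"))
      .groupBy("theInt")
      .agg(sum("a").as("a"), sum("b").as("b"))

    // sum() on IntegerType yields LongType, hence the toInt conversions.
    val sums: RDD[(Int, Vec)] = perField.rdd.map { row =>
      (row.getInt(0), Vec(row.getLong(1).toInt, row.getLong(2).toInt))
    }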

Please have a look at the following tickets for more information:


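For reference, Spark 1.5 added a public UDAF API, org.apache.spark.sql.expressions.UserDefinedAggregateFunction. A sketch of what a Vec sum could look like against that API; this is illustrative, with the struct layout (a: Int, b: Int) taken from the question:

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
    import org.apache.spark.sql.types._

    // Sketch of a Vec-sum UDAF for the Spark 1.5 API.
    object VecSum extends UserDefinedAggregateFunction {
      private val vecType = StructType(
        StructField("a", IntegerType) :: StructField("b", IntegerType) :: Nil)

      def inputSchema: StructType = StructType(StructField("vec", vecType) :: Nil)
      def bufferSchema: StructType = vecType
      def dataType: DataType = vecType
      def deterministic: Boolean = true

      def initialize(buffer: MutableAggregationBuffer): Unit = {
        buffer(0) = 0
        buffer(1) = 0
      }

      def update(buffer: MutableAggregationBuffer, input: Row): Unit = {
        val v = input.getStruct(0)
        buffer(0) = buffer.getInt(0) + v.getInt(0)
        buffer(1) = buffer.getInt(1) + v.getInt(1)
      }

      def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit = {
        buffer1(0) = buffer1.getInt(0) + buffer2.getInt(0)
        buffer1(1) = buffer1.getInt(1) + buffer2.getInt(1)
      }

      def evaluate(buffer: Row): Any = Row(buffer.getInt(0), buffer.getInt(1))
    }

    // Usage, mirroring the DataFrame call the question asks for:
    // df.groupBy(df("theInt")).agg(VecSum(df("vec")))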