Require Kryo serialization in Spark (Scala)


I have Kryo serialization turned on with this:

conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")

I want to ensure that a custom class is serialized using Kryo when it is shuffled between nodes. I can register the class with Kryo this way:

conf.registerKryoClasses(Array(classOf[Foo]))

As I understand it, this does not guarantee that Kryo serialization is actually used; if a serializer is not available, Kryo will fall back to Java serialization.

To guarantee that Kryo serialization happens, I followed this recommendation from the Spark documentation:

conf.set("spark.kryo.registrationRequired", "true")

But this causes an IllegalArgumentException to be thrown ("Class is not registered") for a bunch of different classes which I assume Spark uses internally, for example the following:

org.apache.spark.util.collection.CompactBuffer
scala.Tuple3

Surely I do not have to manually register each of these individual classes with Kryo? These serializers are all defined in Kryo, so is there a way to automatically register all of them?

"As I understand it, this does not guarantee that Kryo serialization is actually used; if a serializer is not available, Kryo will fall back to Java serialization."

No. If you set spark.serializer to org.apache.spark.serializer.KryoSerializer, then Spark will use Kryo. If Kryo is not available, you get an error. There is no fallback.

So what is Kryo registration for, then?

When Kryo serializes an instance of an unregistered class, it has to output the fully qualified class name. That's a lot of characters. Instead, if a class has been pre-registered, Kryo can output a numeric reference to the class, which is just 1-2 bytes.
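The size difference can be seen with the Kryo library directly, outside Spark. The following is a minimal sketch (the `Point` class is made up for illustration), assuming the standard Kryo API (`register`, `writeClassAndObject`): with registration, the stream carries a small class ID instead of the fully qualified name.

```scala
import java.io.ByteArrayOutputStream
import com.esotericsoftware.kryo.Kryo
import com.esotericsoftware.kryo.io.Output

// Hypothetical record type; the payload itself is only two ints.
case class Point(x: Int, y: Int)

object RegistrationDemo extends App {
  // Serialize one Point with the given Kryo instance and return the byte count.
  def serializedSize(kryo: Kryo): Int = {
    val bytes = new ByteArrayOutputStream()
    val out = new Output(bytes)
    kryo.writeClassAndObject(out, Point(1, 2)) // writes class info + field values
    out.close()
    bytes.size()
  }

  val unregistered = new Kryo()
  unregistered.setRegistrationRequired(false) // class name is written out in full

  val registered = new Kryo()
  registered.register(classOf[Point]) // class is referenced by a small numeric ID

  println(s"unregistered: ${serializedSize(unregistered)} bytes")
  println(s"registered:   ${serializedSize(registered)} bytes")
}
```

The unregistered stream is larger by roughly the length of the qualified class name, which is exactly the per-record overhead the answer below describes.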

This is crucial when each row of an RDD is serialized with Kryo. You don't want to include the same class name for each of a billion rows. So you pre-register these classes. But it's easy to forget to register a new class, and then you're wasting bytes again. The solution is to require every class to be registered:

conf.set("spark.kryo.registrationRequired", "true")

Now Kryo will never output full class names. If it encounters an unregistered class, that's a runtime error.

Unfortunately, it's hard to enumerate all the classes that you are going to be serializing in advance. The idea is that Spark registers the Spark-specific classes, and you register everything else. You have an RDD[(X, Y, Z)]? You have to register classOf[scala.Tuple3[_, _, _]].
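Putting the pieces together, here's a minimal sketch of a conf for an RDD of tuples, where `Foo` stands in for your own custom class:

```scala
import org.apache.spark.SparkConf

// Hypothetical custom class carried in the RDD.
case class Foo(id: Long, name: String)

val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrationRequired", "true")

// Register your own classes plus the generic containers they travel in.
conf.registerKryoClasses(Array(
  classOf[Foo],                   // your custom class
  classOf[scala.Tuple3[_, _, _]], // for RDD[(X, Y, Z)]
  classOf[Array[Foo]]             // array types often need registering too
))
```

With registrationRequired set, any class you miss shows up as a "Class is not registered" error at runtime rather than silently bloating the shuffle.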

The list of classes that Spark registers automatically actually includes CompactBuffer, so if you see an error for that, you're doing something wrong: you are bypassing the Spark registration procedure. You have to use either spark.kryo.classesToRegister or spark.kryo.registrator to register your classes. (See the configuration options. If you use GraphX, your registrator should call GraphXUtils.registerKryoClasses.)
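For the spark.kryo.registrator route, a sketch might look like this (the class name `MyRegistrator` and the `Foo` class are made up; the `KryoRegistrator` trait and the conf keys are from Spark):

```scala
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.SparkConf
import org.apache.spark.serializer.KryoRegistrator

// Hypothetical custom class to be registered.
case class Foo(id: Long, name: String)

// Runs when each Kryo instance is created, so registrations happen
// on executors too, not just on the driver.
class MyRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    kryo.register(classOf[Foo])
    kryo.register(classOf[scala.Tuple3[_, _, _]])
  }
}

val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", classOf[MyRegistrator].getName)
  .set("spark.kryo.registrationRequired", "true")
```

This goes through Spark's own registration procedure, so Spark's internal classes (CompactBuffer and friends) are registered alongside yours.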

