hadoop - Does Spark not support ArrayList when writing to Elasticsearch?


I have the following structure:

    mylist = [{"key1": "val1"}, {"key2": "val2"}]
    myrdd = value_counts.map(lambda item: ('key', {'field': mylist}))

I get this error:

    15/02/10 15:54:08 INFO scheduler.TaskSetManager: Lost task 1.0 in stage 2.0 (TID 6) on executor ip-10-80-15-145.ec2.internal: org.apache.spark.SparkException (Data of type java.util.ArrayList cannot be used) [duplicate 1]

    rdd.saveAsNewAPIHadoopFile(
        path='-',
        outputFormatClass="org.elasticsearch.hadoop.mr.EsOutputFormat",
        keyClass="org.apache.hadoop.io.NullWritable",
        valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
        conf={
            "es.nodes": "localhost",
            "es.port": "9200",
            "es.resource": "mboyd/mboydtype"
        })

What I want the document to end up as when written to ES is:

    {"field": [{"key1": "val1"}, {"key2": "val2"}]}

A bit late to the game, but this is the solution I came up with after running into this yesterday: add 'es.input.json': 'true' to your conf, and run json.dumps() on your data.

Modifying your example, this would look like:

    import json

    rdd = sc.parallelize([{"key1": ["val1", "val2"]}])
    json_rdd = rdd.map(json.dumps)
    json_rdd.saveAsNewAPIHadoopFile(
        path='-',
        outputFormatClass="org.elasticsearch.hadoop.mr.EsOutputFormat",
        keyClass="org.apache.hadoop.io.NullWritable",
        valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
        conf={
            "es.nodes": "localhost",
            "es.port": "9200",
            "es.resource": "mboyd/mboydtype",
            "es.input.json": "true"
        })
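The reason this works: with `es.input.json` set to true, the connector passes each record to Elasticsearch as a raw JSON string instead of trying to map Python/Java collection types (which is where the ArrayList conversion fails). You can check what `json.dumps` produces for the nested structure from the question with plain Python, no Spark required (a minimal sketch; the `doc` variable here is just an illustration):

```python
import json

# The problematic structure from the question: a list of dicts inside a field.
doc = {"field": [{"key1": "val1"}, {"key2": "val2"}]}

# json.dumps turns it into a single JSON string, which the connector
# hands to Elasticsearch verbatim when es.input.json is true.
serialized = json.dumps(doc)
print(serialized)  # {"field": [{"key1": "val1"}, {"key2": "val2"}]}

# Round-tripping confirms no information is lost.
assert json.loads(serialized) == doc
```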
