Apache Spark: running a task on each RDD


I have an RDD that is distributed across multiple machines in a Spark environment. I would like to execute a function on each worker machine against this RDD. I do not want to collect the RDD and then execute the function on the driver. The function should be executed separately on each executor, against its own part of the RDD. How can I do that?

Update (adding code): I am running this in the Spark shell.

    import org.apache.spark.sql.cassandra.CassandraSQLContext
    import java.util.Properties

    val cc = new CassandraSQLContext(sc)
    val rdd = cc.sql("select * from sams.events where appname = 'test'")
    val df = rdd.select("appname", "assetname")

Here I have a df with 400 rows. I need to save this df to a SQL Server table. When I try to use the df.write method, it gives me errors, which I have posted in a separate thread: "Spark dataframe not appending to table".

I can open a DriverManager connection and insert the rows, but that would be done in the driver module of Spark:

    import java.sql._
    import com.microsoft.sqlserver.jdbc.SQLServerDriver

    // Open the connection first (credentials are placeholders)
    val connectionUrl = "jdbc:sqlserver://localhost:1433;" +
      "databaseName=AdventureWorks;user=MyUserName;password=*****;"
    val con: Connection = DriverManager.getConnection(connectionUrl)

    // Create a statement from the connection
    val statement: Statement = con.createStatement()

    // Insert the data
    statement.executeUpdate("INSERT INTO Customers " +
      "VALUES (1001, 'Simpson', 'Mr.', 'Springfield', 2001)")

I need to do this writing on the executor machines. How can I achieve this?

In order to set up connections from the workers to other systems, you should use rdd.foreachPartition(iter => ...).

foreachPartition lets you execute an operation on each partition, giving you access to the data of that partition as a local iterator. With enough data per partition, the time spent setting up resources (like DB connections) is amortized by using those resources over the whole partition.

An abstract example:

    rdd.foreachPartition { iter =>
      // Set up the DB connection once per partition
      // (driver.connect, ip, port and makeQuery are placeholders)
      val dbConn = driver.connect(ip, port)
      iter.foreach { element =>
        val query = makeQuery(element)
        dbConn.execute(query)
      }
      dbConn.close()
    }
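Applied to the DataFrame from the question, the same pattern could look roughly like the sketch below. It assumes the SQL Server JDBC driver is on the executors' classpath and that a target table named Events with columns appname and assetname exists; the table name is hypothetical and the connection URL is the placeholder from the question. Note that "localhost" in the URL would resolve on each executor, so a real setup needs a host name reachable from the workers.

```scala
import java.sql.DriverManager
import org.apache.spark.sql.Row

// Placeholder URL taken from the question; replace host and credentials.
val connectionUrl = "jdbc:sqlserver://localhost:1433;" +
  "databaseName=AdventureWorks;user=MyUserName;password=*****;"

df.rdd.foreachPartition { (rows: Iterator[Row]) =>
  // This closure runs on the executor: one connection per partition.
  val conn = DriverManager.getConnection(connectionUrl)
  try {
    // "Events" and its column names are assumptions for illustration.
    val stmt = conn.prepareStatement(
      "INSERT INTO Events (appname, assetname) VALUES (?, ?)")
    try {
      rows.foreach { row =>
        stmt.setString(1, row.getString(0)) // appname
        stmt.setString(2, row.getString(1)) // assetname
        stmt.addBatch()
      }
      // Send all rows of this partition in one batch.
      stmt.executeBatch()
    } finally stmt.close()
  } finally conn.close()
}
```

Only the URL string is captured by the closure, so nothing non-serializable (such as the connection itself) has to be shipped from the driver.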

It's also possible to create singleton resource managers that manage those resources for each JVM of the cluster. See this answer for a complete example of such a local resource manager: "spark-streaming and connection pool implementation".
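A minimal sketch of that idea, not the implementation from the linked answer: a Scala object is initialized once per JVM, so a lazy val inside it gives each executor at most one connection, created on first use. The object name SqlServerConnection is hypothetical, the URL is the placeholder from the question, and the object is assumed to be compiled into a jar on the executors' classpath rather than defined in the shell.

```scala
import java.sql.{Connection, DriverManager}

// Hypothetical per-JVM holder: each executor JVM initializes it lazily, once.
object SqlServerConnection {
  // Placeholder URL taken from the question; replace host and credentials.
  private val url = "jdbc:sqlserver://localhost:1433;" +
    "databaseName=AdventureWorks;user=MyUserName;password=*****;"

  lazy val connection: Connection = {
    val conn = DriverManager.getConnection(url)
    // Close the connection when the executor JVM shuts down.
    sys.addShutdownHook(conn.close())
    conn
  }
}
```

Inside a task, code would then refer to SqlServerConnection.connection instead of opening a new connection per partition. Because several tasks can run concurrently in one executor, a single shared connection needs synchronization around its use; a connection pool, as in the linked answer, is likely the more robust choice.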

