How to connect to Amazon Redshift or other DBs in Apache Spark?


I'm trying to connect to Amazon Redshift via Spark, so I can join data we have on S3 with data on our Redshift cluster. I found some spartan documentation here for the capability of connecting via JDBC:

https://spark.apache.org/docs/1.3.1/sql-programming-guide.html#jdbc-to-other-databases

The load command seems straightforward (although I don't know how I would enter my AWS credentials here, maybe in the options?):

df = sqlContext.load(source="jdbc", url="jdbc:postgresql:dbserver", dbtable="schema.tablename")

I'm also not entirely sure how to deal with the SPARK_CLASSPATH variable. I'm running Spark locally through an IPython notebook (as part of the Spark distribution). How do I define it so that Spark loads the JDBC driver?
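For reference, one common way to make the driver visible is to put the JDBC driver jar on the driver classpath when launching pyspark. This is a sketch, not from the original post: the jar path is hypothetical, and SPARK_CLASSPATH still works in Spark 1.3 but is deprecated in favor of the launch flags.

# jar path is hypothetical; adjust to wherever the PostgreSQL JDBC jar lives
SPARK_CLASSPATH=/path/to/postgresql-9.4-1201.jdbc4.jar IPYTHON_OPTS="notebook" pyspark

# equivalent, using launch flags instead of the environment variable
pyspark --driver-class-path /path/to/postgresql-9.4-1201.jdbc4.jar \
        --jars /path/to/postgresql-9.4-1201.jdbc4.jar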

Anyway, for now, when I try running these commands I get a bunch of undecipherable errors, so I'm kind of stuck. Any help or pointers to detailed tutorials would be appreciated.

It turns out you only need a username/password to access Redshift in Spark, and it is done as follows (using the Python API):

from pyspark.sql import SQLContext

sqlContext = SQLContext(sc)
df = sqlContext.load(source="jdbc",
                     url="jdbc:postgresql://host:port/dbserver?user=yourusername&password=secret",
                     dbtable="schema.table")
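Since the original goal was joining S3 data with Redshift data, here is a rough sketch of how the loaded table could be combined with a file on S3, building on the snippet above. The bucket path and column names are hypothetical, and it assumes S3 credentials are already configured in the Hadoop config (fs.s3n.awsAccessKeyId / fs.s3n.awsSecretAccessKey):

# register the Redshift table loaded above so it can be queried with SQL
df.registerTempTable("rs_table")

# hypothetical JSON data on S3; jsonFile is the Spark 1.3-era API
s3_df = sqlContext.jsonFile("s3n://your-bucket/path/to/events/")
s3_df.registerTempTable("s3_events")

# join on a hypothetical shared id column
joined = sqlContext.sql(
    "SELECT e.*, r.some_column "
    "FROM s3_events e JOIN rs_table r ON e.id = r.id")
joined.show()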

Hope this helps someone!

