Java - Running Google Dataflow with PubsubIO source for testing
I'm creating a data-processing application using Google Cloud Dataflow - it is going to stream data from Pub/Sub to BigQuery.
I'm somewhat bewildered by the infrastructure. I created an application prototype and can run it locally, using files (with TextIO) as the source and destination.
However, if I change the source to PubsubIO.Read.subscription(...), it fails with "java.lang.IllegalStateException: no evaluator registered for PubsubIO.Read" (I'm not too surprised, since I see no methods to pass authentication anyway).
But how am I supposed to run this? Should I create a virtual machine in Google Compute Engine and deploy my stuff there, or am I supposed to describe a job somehow and submit it to the Dataflow API (without caring about explicit VMs)?
Could you please point me to some kind of step-by-step instruction on this topic - or rather explain the workflow briefly? I'm sorry if the question is silly.
You need to run your pipeline on Google Cloud infrastructure in order to access Pub/Sub; see: https://cloud.google.com/dataflow/pipelines/specifying-exec-params#cloudexecution
From that page:
// Create and set your PipelineOptions.
DataflowPipelineOptions options = PipelineOptionsFactory.as(DataflowPipelineOptions.class);

// For Cloud execution, set the Cloud Platform project, staging location,
// and specify DataflowPipelineRunner or BlockingDataflowPipelineRunner.
options.setProject("my-project-id");
options.setStagingLocation("gs://my-bucket/binaries");
options.setRunner(DataflowPipelineRunner.class);

// Create the Pipeline with the specified options.
Pipeline p = Pipeline.create(options);

// Specify all the pipeline reads, transforms, and writes.
...

// Run the pipeline.
p.run();
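Putting that together for your Pub/Sub-to-BigQuery case, a minimal streaming sketch could look like the following (Dataflow SDK 1.x style; the project, bucket, subscription, and table names are placeholders):

import com.google.api.services.bigquery.model.TableFieldSchema;
import com.google.api.services.bigquery.model.TableRow;
import com.google.api.services.bigquery.model.TableSchema;
import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.io.BigQueryIO;
import com.google.cloud.dataflow.sdk.io.PubsubIO;
import com.google.cloud.dataflow.sdk.options.DataflowPipelineOptions;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
import com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner;
import com.google.cloud.dataflow.sdk.transforms.DoFn;
import com.google.cloud.dataflow.sdk.transforms.ParDo;
import java.util.Collections;

public class PubsubToBigQuery {
  public static void main(String[] args) {
    DataflowPipelineOptions options =
        PipelineOptionsFactory.as(DataflowPipelineOptions.class);
    options.setProject("my-project-id");                   // placeholder
    options.setStagingLocation("gs://my-bucket/binaries"); // placeholder
    options.setRunner(DataflowPipelineRunner.class);
    options.setStreaming(true); // Pub/Sub is an unbounded source, so run in streaming mode

    // A one-column table schema, just for the sketch.
    TableSchema schema = new TableSchema().setFields(Collections.singletonList(
        new TableFieldSchema().setName("message").setType("STRING")));

    Pipeline p = Pipeline.create(options);
    p.apply(PubsubIO.Read.subscription(
            "projects/my-project-id/subscriptions/my-subscription"))
     // Wrap each Pub/Sub message in a BigQuery row.
     .apply(ParDo.of(new DoFn<String, TableRow>() {
        @Override
        public void processElement(ProcessContext c) {
          c.output(new TableRow().set("message", c.element()));
        }
      }))
     .apply(BigQueryIO.Write
         .to("my-project-id:my_dataset.my_table")
         .withSchema(schema));
    p.run();
  }
}

When you submit a pipeline configured this way, the Dataflow service stages your binaries, provisions the workers, and runs the job for you, so you never create Compute Engine VMs yourself; authentication comes from the Google Cloud credentials of the environment that launches the job.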