hadoop - Clean AWS EMR to allow reuse -


I have several tasks that I'm performing on AWS EMR clusters. The tasks don't share data, so I'd like to use the same EMR cluster to run them one after another. Is there a way to clean a running EMR cluster back to its initial state (remove Hive tables, clean HDFS files, etc.) to avoid data collisions?

I want to reuse the EMR cluster for several reasons:

  1. Creating a new EMR cluster can take 5-10 minutes.
  2. My tasks are relatively short, 20-25 minutes each.
  3. Once an EMR cluster is created, I pay for a full hour.

We didn't find a "quick and clean" API to achieve this behaviour. Instead, we settled on a simple working methodology that ensures we can clean up the data.

  • We work in a specific database instead of the default one.
  • We put our internal data files under a specific location in HDFS.

So every time a task starts, we first drop the specific database if it exists and recreate it, then recursively delete the data under the specific location in HDFS.
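
The cleanup step described above could be sketched roughly as follows. This is only an illustration, not the exact script we run: it assumes the `hive` and `hdfs` command-line tools are on the PATH of the node where the task starts, and the function names (`build_cleanup_commands`, `reset_cluster_state`) as well as the database name and HDFS path passed to them are hypothetical.

```python
import subprocess


def build_cleanup_commands(db_name, hdfs_dir):
    """Return the CLI commands that reset the task's Hive database and HDFS directory.

    db_name and hdfs_dir are placeholders for the dedicated database and
    HDFS location that the tasks agree to use.
    """
    # CASCADE drops the tables inside the database together with it.
    hive_sql = (
        f"DROP DATABASE IF EXISTS {db_name} CASCADE; "
        f"CREATE DATABASE {db_name};"
    )
    return [
        # Drop and recreate the dedicated database.
        ["hive", "-e", hive_sql],
        # Recursively delete the dedicated HDFS location; -f ignores a missing path.
        ["hdfs", "dfs", "-rm", "-r", "-f", "-skipTrash", hdfs_dir],
        # Recreate the empty directory for the next task.
        ["hdfs", "dfs", "-mkdir", "-p", hdfs_dir],
    ]


def reset_cluster_state(db_name, hdfs_dir):
    """Run the cleanup commands in order, stopping at the first failure."""
    for cmd in build_cleanup_commands(db_name, hdfs_dir):
        subprocess.run(cmd, check=True)
```

Each task would call something like `reset_cluster_state("task_db", "/data/task")` before doing any work. Note that dropping a database removes the data of managed tables only; files behind external tables stay in place, which is one more reason to keep everything under the single HDFS location that gets deleted.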

