hadoop - Clean AWS EMR to allow reuse
I have several tasks that I'm performing on AWS EMR clusters. The tasks don't share data, so I would like to use the same EMR cluster to perform them one after another. Is there a way to clean a running EMR cluster back to its initial state (remove Hive tables, clean HDFS files, etc.) to avoid data collisions?
I want to reuse the EMR cluster for several reasons:
- Creating a new EMR cluster can take 5-10 minutes.
- My tasks are relatively short, 20-25 minutes each.
- Once an EMR cluster is created, we pay for a full hour.
We didn't find a "quick and clean" API to achieve this behaviour. Instead, we settled on a simple working methodology that ensures we can clean up the data:
- We work on a specific database instead of the default one.
- We put our internal data files under a specific location in HDFS.
So every time a task starts, we first drop the specific database if it exists and recreate it, and then recursively delete the data under the specific location in HDFS.
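The cleanup step described above could be sketched roughly as follows. This is only an illustration, not the exact script we run: the database name `task_db`, the path `/data/task`, and the helper names `build_cleanup_commands` and `clean_cluster` are hypothetical, and it assumes the `hive` and `hdfs` command-line tools are available on the node where it runs (normally the master node).

```python
import re
import shlex
import subprocess


def build_cleanup_commands(db_name, hdfs_dir):
    """Return the commands (as argument lists) that reset one task's state."""
    # Rough sanity checks so that a bad argument cannot wipe something else.
    if not re.fullmatch(r"[A-Za-z0-9_]+", db_name):
        raise ValueError(f"unexpected database name: {db_name!r}")
    if not hdfs_dir.startswith("/") or hdfs_dir.strip("/") == "":
        raise ValueError(f"refusing to delete HDFS path: {hdfs_dir!r}")

    # CASCADE drops the tables inside the database along with it.
    hive_sql = (
        f"DROP DATABASE IF EXISTS {db_name} CASCADE; "
        f"CREATE DATABASE {db_name};"
    )
    return [
        ["hive", "-e", hive_sql],
        # -r: recursive, -f: no error if the path is missing,
        # -skipTrash: free the space immediately instead of moving to trash.
        ["hdfs", "dfs", "-rm", "-r", "-f", "-skipTrash", hdfs_dir],
    ]


def clean_cluster(db_name, hdfs_dir, dry_run=True):
    """Print the cleanup commands, or run them when dry_run is False."""
    commands = build_cleanup_commands(db_name, hdfs_dir)
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop on the first failure
    return commands


if __name__ == "__main__":
    # Hypothetical names; replace with the task's own database and directory.
    clean_cluster("task_db", "/data/task", dry_run=True)
```

Note that `hdfs dfs -rm -r` removes the directory itself as well as its contents, so a task that expects the directory to exist would need to create it again before writing.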