hadoop - Removing double quotes (") from a CSV file using Pig
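I am trying to remove double quotes (") from a CSV file. Some of the fields contain data such as "newyork,ny". Please advise me on what to do. I have tried to delete the (") characters from the CSV, but it is not working. The step-by-step code is given below:
I am starting Pig in local mode with pig -x local.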
1st step:
test4 = load '/home/hduser/desktop/flight_data.csv' using PigStorage(',') as ( year: chararray, quarter: chararray, month: chararray, day_of_month: chararray, day_of_week: chararray, fl_date: chararray, unique_carrier: chararray, airline_id: chararray, carrier: chararray, tail_num: chararray, fl_num: chararray, origin: chararray, origin_city_name: chararray, origin_state_abr: chararray, origin_state_fips: chararray, origin_state_nm: chararray, origin_wac: chararray, dest: chararray, dest_city_name: chararray, dest_state_abr: chararray, dest_state_fips: chararray, dest_state_nm: chararray, dest_wac: chararray, crs_dep_time: chararray, dep_time: chararray, dep_delay: chararray, dep_delay_new: chararray, dep_del15: chararray, dep_delay_group: chararray, dep_time_blk: chararray, taxi_out: chararray, wheels_off: chararray, wheels_on: chararray, taxi_in: chararray, crs_arr_time: chararray, arr_time: chararray, arr_delay: chararray, arr_delay_new: chararray, arr_del15: chararray, arr_delay_group: chararray, arr_time_blk: chararray, cancelled: chararray, cancellation_code: chararray, diverted: chararray, crs_elapsed_time: chararray, actual_elapsed_time: chararray, air_time: chararray, flights: chararray, distance: chararray, distance_group: chararray, carrier_delay: chararray, weather_delay: chararray, nas_delay: chararray, security_delay: chararray, late_aircraft_delay: chararray);
2nd step:
new_data = foreach test4 generate flatten(REGEX_EXTRACT(origin_city_name,'.*"([^"]*)"',1)) as statename;
After running this command, the fields in new_data are saved as (). Please suggest an option to overcome this problem. Thanks in advance for your help.
I have also tried it the following way:
aviation_data = foreach test4 generate REGEX_EXTRACT($0,'([0-9]+)', 1), REGEX_EXTRACT($1,'([0-9]+)', 1), REGEX_EXTRACT($2,'([0-9]+)', 1), REGEX_EXTRACT($3,'([0-9]+)', 1), REGEX_EXTRACT($4,'([0-9]+)', 1), REGEX_EXTRACT($5,'([0-9]+)', 1), REGEX_EXTRACT($6,'([0-9]+)', 1), REGEX_EXTRACT($7,'([0-9]+)', 1), REGEX_EXTRACT($8,'([0-9]+)', 1), REGEX_EXTRACT($9,'([0-9]+)', 1), REGEX_EXTRACT($10,'([0-9]+)', 1), REGEX_EXTRACT($11,'([0-9]+)', 1), REGEX_EXTRACT($12,'([0-9]+)', 1), REGEX_EXTRACT($13,'([0-9]+)', 1), REGEX_EXTRACT($14,'([0-9]+)', 1), REGEX_EXTRACT($15,'([0-9]+)', 1), REGEX_EXTRACT($16,'([0-9]+)', 1), REGEX_EXTRACT($17,'([0-9]+)', 1), REGEX_EXTRACT($18,'([0-9]+)', 1), REGEX_EXTRACT($19,'([0-9]+)', 1), REGEX_EXTRACT($20,'([0-9]+)', 1), REGEX_EXTRACT($21,'([0-9]+)', 1), REGEX_EXTRACT($22,'([0-9]+)', 1), REGEX_EXTRACT($23,'([0-9]+)', 1), REGEX_EXTRACT($24,'([0-9]+)', 1), REGEX_EXTRACT($25,'([0-9]+)', 1), REGEX_EXTRACT($26,'([0-9]+)', 1), REGEX_EXTRACT($27,'([0-9]+)', 1), REGEX_EXTRACT($28,'([0-9]+)', 1), REGEX_EXTRACT($29,'([0-9]+)', 1), REGEX_EXTRACT($30,'([0-9]+)', 1), REGEX_EXTRACT($31,'([0-9]+)', 1), REGEX_EXTRACT($32,'([0-9]+)', 1), REGEX_EXTRACT($33,'([0-9]+)', 1), REGEX_EXTRACT($34,'([0-9]+)', 1), REGEX_EXTRACT($35,'([0-9]+)', 1), REGEX_EXTRACT($36,'([0-9]+)', 1), REGEX_EXTRACT($37,'([0-9]+)', 1), REGEX_EXTRACT($38,'([0-9]+)', 1), REGEX_EXTRACT($39,'([0-9]+)', 1), REGEX_EXTRACT($40,'([0-9]+)', 1), REGEX_EXTRACT($41,'([0-9]+)', 1), REGEX_EXTRACT($42,'([0-9]+)', 1), REGEX_EXTRACT($43,'([0-9]+)', 1), REGEX_EXTRACT($44,'([0-9]+)', 1), REGEX_EXTRACT($45,'([0-9]+)', 1), REGEX_EXTRACT($46,'([0-9]+)', 1), REGEX_EXTRACT($47,'([0-9]+)', 1), REGEX_EXTRACT($48,'([0-9]+)', 1), REGEX_EXTRACT($49,'([0-9]+)', 1), REGEX_EXTRACT($50,'([0-9]+)', 1), REGEX_EXTRACT($51,'([0-9]+)', 1), REGEX_EXTRACT($52,'([0-9]+)', 1), REGEX_EXTRACT($53,'([0-9]+)', 1), REGEX_EXTRACT($54,'([0-9]+)', 1);
The results are given below:
(2015,1,1,29,4,2015,,20304,,549,4837,,,,,04,,81,,,,,53,,93,1757,1851,54,54,1,3,1700,19,1910,2034,6,2005,2040,35,35,1,2,2000,0,,0,188,169,144,1,1107,5,0,0,0)
None of the text fields are coming through.
We can use either org.apache.pig.piggybank.storage.CSVExcelStorage() or org.apache.pig.piggybank.storage.CSVLoader(). Both parse quoted fields, so a comma inside double quotes is not treated as a delimiter.
Refer to the API links below for details:
http://pig.apache.org/docs/r0.12.0/api/org/apache/pig/piggybank/storage/csvexcelstorage.html
http://pig.apache.org/docs/r0.9.1/api/org/apache/pig/piggybank/storage/csvloader.html
test4 = load '/home/hduser/desktop/flight_data.csv' using org.apache.pig.piggybank.storage.CSVExcelStorage() as (....);
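For a fuller picture, here is a minimal sketch of the load with a CSV-aware loader. The piggybank.jar path is an assumption (it depends on your installation), and the relation names flights, cities and sample_rows are made up for illustration. The columns are addressed by position so the full schema does not have to be repeated; $12 is assumed to be origin_city_name, going by the column order in the question.

```pig
-- Assumed location of piggybank.jar; adjust to your installation.
REGISTER /usr/lib/pig/piggybank.jar;

-- CSVExcelStorage parses quoted fields, so "newyork,ny" stays in one column
-- and the enclosing double quotes are removed while loading.
flights = LOAD '/home/hduser/desktop/flight_data.csv'
          USING org.apache.pig.piggybank.storage.CSVExcelStorage(',');

-- No schema was declared, so fields are bytearray and referenced by position.
-- $12 is assumed to be origin_city_name (13th column in the question's schema).
cities = FOREACH flights GENERATE (chararray)$12 AS origin_city_name;

-- Print a few rows to check that the city and state arrive together.
sample_rows = LIMIT cities 5;
DUMP sample_rows;
```

With PigStorage(',') the same field is split at the embedded comma, so origin_city_name holds only the part before the comma with a single opening quote. That is why a regex expecting a pair of quotes finds nothing to match.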