python 2.7 - How to remove duplicate lines while retaining first occurrences
Suppose the input text file "input_msg.txt" contains the following records:
jan 1 02:32:40 hello welcome python world
jan 1 02:32:40 hello welcome python world
mar 31 23:31:55 learn python
mar 31 23:31:55 learn python smart
mar 31 23:31:56 python scripting language
jan 1 00:00:01 hello welcome python world
jan 1 00:00:02 hello welcome python world
mar 31 23:31:55 learn python
mar 31 23:31:56 python scripting language
The expected output file (say, "outputfile.txt") should contain the following records:
jan 1 02:32:40 hello welcome python world
jan 1 02:32:40 hello welcome python world
mar 31 23:31:55 learn python
mar 31 23:31:55 learn python smart
mar 31 23:31:56 python scripting language
jan 1 00:00:01 hello welcome python world
jan 1 00:00:02 hello welcome python world
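Note: I need to keep every record starting with "jan 1" (including duplicates), but duplicates of records that do not start with "jan 1" should be removed, keeping only their first occurrence.
I have tried the following program, but it deletes all duplicate records, including the "jan 1" ones: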
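The rule can be expressed as a small filter: a line starting with "jan 1 " is always kept, and any other line is kept only the first time it appears. The sketch below works on an in-memory list; the function name `filter_records` and its `prefix` default are illustrative assumptions, and the prefix includes a trailing space on the assumption that the day is always followed by a single space, as in the sample data.

```python
def filter_records(lines, prefix="jan 1 "):
    """Yield every line starting with `prefix`; yield other lines only on first occurrence.

    `filter_records` and the `prefix` default are hypothetical, for illustration only.
    """
    seen = set()
    for line in lines:
        if line.startswith(prefix):
            # "jan 1" records are always kept, duplicates included
            yield line
        elif line not in seen:
            # first occurrence of any other record: remember it and keep it
            seen.add(line)
            yield line


sample = [
    "jan 1 02:32:40 hello welcome python world",
    "jan 1 02:32:40 hello welcome python world",
    "mar 31 23:31:55 learn python",
    "mar 31 23:31:55 learn python smart",
    "mar 31 23:31:56 python scripting language",
    "jan 1 00:00:01 hello welcome python world",
    "jan 1 00:00:02 hello welcome python world",
    "mar 31 23:31:55 learn python",
    "mar 31 23:31:56 python scripting language",
]

for record in filter_records(sample):
    print(record)
```

Run on the sample records, this yields the first seven lines, which matches the expected output; the last two "mar 31" lines are dropped as repeats.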
from collections import OrderedDict

def remove_duplicate_lines(inputfile, outputfile):
    with open(inputfile) as fin, open(outputfile, 'w') as out:
        lines = (line.rstrip() for line in fin)
        unique_lines = OrderedDict.fromkeys((line for line in lines if line))
        out.writelines("\n".join(unique_lines.iterkeys()))
    return 0

Output of the program:
jan 1 02:32:40 hello welcome python world
mar 31 23:31:55 learn python
mar 31 23:31:55 learn python smart
mar 31 23:31:56 python scripting language
jan 1 00:00:01 hello welcome python world
Any help is appreciated!
Try this:
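The attempt above cannot work as written because `OrderedDict.fromkeys` turns every distinct line into a single key, so it collapses repeats of all lines, "jan 1" records included. A quick in-memory check (the three lines here are taken from the sample input, chosen only for illustration):

```python
from collections import OrderedDict

lines = [
    "jan 1 02:32:40 hello welcome python world",
    "jan 1 02:32:40 hello welcome python world",
    "mar 31 23:31:55 learn python",
]

# fromkeys keeps one key per distinct string, in first-seen order
unique = list(OrderedDict.fromkeys(lines))
print(unique)
# ['jan 1 02:32:40 hello welcome python world', 'mar 31 23:31:55 learn python']
```

The two identical "jan 1" lines end up as one key, so those records have to bypass the de-duplication step rather than pass through it.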
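# always keep "jan 1" records; keep any other record only the first time it appears
log = []
with open("in.txt", "r") as inputfile:
    for line in inputfile:
        if line in log and not line.startswith("jan 1 "):
            pass  # repeat of a non-"jan 1" record: skip it
        else:
            log.append(line)
with open("out.txt", "w") as outfile:
    for item in log:
        outfile.write(item)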
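One caveat with any prefix test: comparing only the first five characters ("jan 1") would also match "jan 10" through "jan 19", so those records would wrongly be exempt from de-duplication. Including the trailing space avoids this, assuming the day is always followed by a single space as in the sample data. The record below is a hypothetical example, not from the original input:

```python
# hypothetical record for a day other than jan 1
record = "jan 12 08:00:00 learn python"

# five-character slice: "jan 12..." begins with "jan 1", so this wrongly matches
print(record[0:5] == "jan 1")        # True
# with the trailing space, only day 1 matches
print(record.startswith("jan 1 "))   # False
```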