Python: Fastest way of parsing first column of large table in array


I have two big tables to compare (9 columns, approx. 30 million rows).

    #!/usr/bin/python
    import sys
    import csv


    def compare(sam1, sam2, output):
        with open(sam1, "r") as s1, open(sam2, "r") as s2, open(output, "w") as out:
            reader1 = csv.reader(s1, delimiter = "\t")
            reader2 = csv.reader(s2, delimiter = "\t")
            writer  = csv.writer(out, delimiter = "\t")
            list = []
            for line in reader1:
                list.append(line[0])
            list = set(list)

            for line in reader2:
                for field in line:
                    if field not in list:
                        writer.writerow(line)

    if __name__ == '__main__':
        compare(sys.argv[1], sys.argv[2], sys.argv[3])

The first column contains the identifier of each row, and I would like to know which ones are only in sam1.

This is the code I am working with, but it takes ages. Is there any way to speed it up?

I already tried to speed it up by converting the list to a set, but there was no big difference.

Edit: It is running quicker now, but I have to get the whole lines out of the input table and write the lines with an exclusive ID to the output file. How can I manage this in a quick way?

A few suggestions:

  • Rather than creating a list which you then turn into a set, just work with a set directly:

    sam1_identifiers = set()
    for line in reader1:
        sam1_identifiers.add(line[0])

    This is probably more memory efficient, because you have a single set rather than a list and a set. That might make it a bit faster.

    Note that I've changed the variable name: list is the name of a Python built-in, so you shouldn't use it for your own variables.

  • Since you want to find the identifiers which are only in sam1, rather than using the nested if/for statements, just compare the two sets and throw away any identifiers found in sam2 which are also in the set of IDs from sam1.

    sam2_identifiers = set()
    for line in reader2:
        sam2_identifiers.add(line[0])

    print(sam1_identifiers - sam2_identifiers)

    Or even:

    # No second set is needed here: IDs seen in sam2 are removed in place.
    for line in reader2:
        sam1_identifiers.discard(line[0])

    print(sam1_identifiers)

    I suspect that's faster than the nested loops.

  • Perhaps I've missed something, but don't you look through every column of each line of sam2? Isn't it sufficient to look at line[0] for the identifier, as with sam1?
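
For the edit in the question (writing out the whole lines whose ID is exclusive to one table), the suggestions above could be combined roughly as follows. This is only a sketch under a few assumptions: Python 3, tab-separated input with the ID in the first column, and enough memory to hold one table's IDs in a set. The function name write_exclusive_rows and its parameter names are made up for illustration; swap the first two arguments to choose which table's rows are kept.

```python
import csv


def write_exclusive_rows(keep_path, other_path, out_path):
    """Copy rows of keep_path whose first-column ID never appears in other_path.

    Hypothetical helper: returns the number of rows written.
    """
    # Pass 1: collect only the IDs (first column) of the other table.
    with open(other_path, newline="") as other:
        other_ids = {row[0] for row in csv.reader(other, delimiter="\t") if row}

    # Pass 2: stream the table to keep and write whole rows with unseen IDs.
    written = 0
    with open(keep_path, newline="") as keep, \
            open(out_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        for row in csv.reader(keep, delimiter="\t"):
            # Only row[0] is checked, one set lookup per line.
            if row and row[0] not in other_ids:
                writer.writerow(row)
                written += 1
    return written
```

Called as write_exclusive_rows(sys.argv[1], sys.argv[2], sys.argv[3]), this would write the lines of the first file whose ID does not occur in the second. Only one column of one table is held in memory; the other table is read and written line by line.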

