In Python, how could I convert a whitespace-delimited log output to a Markdown-formatted table?



I have an output log that contains printouts of the following form:

     - e_jets cutflow:
                                       cut         events
           1                       initial          13598
           2                           grl           7250
           3               el_n 25000 >= 1            326
           4               el_n 25000 == 1            313
           5               mu_n 25000 == 0            313
           6             jetclean loosebad            313
           7              jet_n 25000 >= 1            113
           8              jet_n 25000 >= 2             26
           9              jet_n 25000 >= 3              8
          10              jet_n 25000 >= 4              2
          11      mva_btag mv2c20 -0.4434               2
          12                     variables              2
          13                          save              2

     - mu_jets cutflow:
                                       cut         events
           1                       initial          13598
           2                           grl           7250
           3               mu_n 25000 >= 1            326
           4               mu_n 25000 == 1            313
           5               el_n 25000 == 0            313
           6             jetclean loosebad            313
           7              jet_n 25000 >= 1            127
           8              jet_n 25000 >= 2             47
           9              jet_n 25000 >= 3             15
          10              jet_n 25000 >= 4              4
          11      mva_btag mv2c20 -0.4434               4
          12                     variables              4
          13                          save              4

I want to convert it to the following form (Markdown tables):

|**e_jets cutflow**|**cut**|**events**|
|---|---|---|
|1|initial|13598|
|2|grl|7250|
|3|el_n 25000 >= 1|326|
|4|el_n 25000 == 1|313|
|5|mu_n 25000 == 0|313|
|6|jetclean loosebad|313|
|7|jet_n 25000 >= 1|113|
|8|jet_n 25000 >= 2|26|
|9|jet_n 25000 >= 3|8|
|10|jet_n 25000 >= 4|2|
|11|mva_btag mv2c20 -0.4434|2|
|12|variables|2|
|13|save|2|

|**mu_jets cutflow**|**cut**|**events**|
|---|---|---|
|1|initial|13598|
|2|grl|7250|
|3|mu_n 25000 >= 1|326|
|4|mu_n 25000 == 1|313|
|5|el_n 25000 == 0|313|
|6|jetclean loosebad|313|
|7|jet_n 25000 >= 1|127|
|8|jet_n 25000 >= 2|47|
|9|jet_n 25000 >= 3|15|
|10|jet_n 25000 >= 4|4|
|11|mva_btag mv2c20 -0.4434|4|
|12|variables|4|
|13|save|4|

One way would be to replace two or more consecutive spaces with a vertical bar character. How could I do this in an efficient way? Given that the tables have 3 columns, is there a better, more robust way to do this?

Based on the question, the comments, and the sample data posted (all spaces, no tabs), my assumption is that the real question is how to handle the fact that each line in the data contains a different number of spaces between fields.

Assuming that spaces can appear within a field, but only one at a time, and that two or more spaces delimit a field, the following regex will find the breaks between fields (two or more spaces):

[ ]{2,} 
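As a quick check of that assumption, the pattern can be tried on one row of the sample data with `re.split`. This is only a sketch: the names `field_break`, `row`, and `fields` are made up here, and the row is stripped first so that its leading indentation does not produce an empty first field.

```python
import re

# Two or more consecutive spaces mark a field boundary.
field_break = re.compile(r'[ ]{2,}')

# One row copied from the sample log in the question.
row = "       3               el_n 25000 >= 1            326"

# Strip the indentation, then split on the field breaks.
fields = field_break.split(row.strip())
print(fields)  # ['3', 'el_n 25000 >= 1', '326']
```

The single spaces inside `el_n 25000 >= 1` survive, which is what the "one space at a time" assumption is meant to guarantee.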

Then use a substitution to replace the delimiters:

    import re
    output = re.sub(r'[ ]{2,}', '|', text)  # argument order: pattern, replacement, string

Of course, that doesn't handle the line endings or the heading. For the heading, you will want to split on lines anyway, so you can handle the end of each line there as well:

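    import re

    spaces_re = re.compile(r'[ ]{2,}')

    new_lines = []
    for line in text.split('\n'):
        # Replace each run of two or more spaces with a single bar.
        new_lines.append(spaces_re.sub('|', line))
    # Joining with '|\n' closes every row except the last with a bar.
    output = '|\n'.join(new_lines)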

Now, add an if statement or two within the loop to detect the heading and format it appropriately, and you have a working solution. As that wasn't what was asked about, I'll leave it as an exercise for the reader.
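For completeness, here is one possible way to finish that exercise. It is a sketch under assumptions of my own rather than anything stated about the log format: a table title is taken to be any line of the form ` - name:`, and the line right after it is taken to hold the column names. `log_to_markdown` and `FIELD_BREAK` are hypothetical names, and the sample is a shortened copy of the log from the question.

```python
import re

# Two or more consecutive spaces separate fields (assumption from the answer).
FIELD_BREAK = re.compile(r'[ ]{2,}')


def log_to_markdown(log_text):
    """Convert cutflow printouts to Markdown tables (illustrative sketch)."""
    out = []
    title = None  # set when a " - name:" line was seen and its header is pending
    for line in log_text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # ignore blank lines
        if stripped.startswith('- ') and stripped.endswith(':'):
            # Assumed title form, e.g. " - e_jets cutflow:"
            title = stripped[2:-1]
            continue
        fields = FIELD_BREAK.split(stripped)
        if title is not None:
            # The first line after a title holds the column names.
            if out:
                out.append('')  # blank line between consecutive tables
            header = [title] + fields
            out.append('|' + '|'.join(f'**{h}**' for h in header) + '|')
            out.append('|' + '|'.join('---' for _ in header) + '|')
            title = None
        else:
            out.append('|' + '|'.join(fields) + '|')
    return '\n'.join(out)


sample = """
 - e_jets cutflow:
                                   cut         events
       1                       initial          13598
       2                           grl           7250
      11      mva_btag mv2c20 -0.4434               2
 - mu_jets cutflow:
                                   cut         events
       1                       initial          13598
"""

markdown = log_to_markdown(sample)
print(markdown)
```

Splitting each stripped line, rather than substituting bars in place, avoids the stray empty column that the leading indentation would otherwise create, and it makes it easy to put a bar at both ends of every row.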

