python - How to get rows from a numpy 2D array where a column value is the maximum within a group defined by another column?

This is a pretty common SQL query: select the lines with the maximum value in column x, grouped by group_id.

The result is, for every group_id, one (the first) line where the column x value is the maximum within the group.

I have a 2D numpy array with many columns, but let's simplify it to (id, x, y):

    import numpy as np

    rows = np.array([[1, 22, 1236],
                     [1, 11, 1563],
                     [2, 13, 1234],
                     [2, 10, 1224],
                     [2, 23, 1111],
                     [2, 23, 1250]])

and I want to get:

    [[1 22 1236]
     [2 23 1111]]

I was able to do it through a cumbersome loop, like:

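    # Assumes rows are already ordered by id (column 0), as in the example.
    row_grouped_with_max = []

    max_row = rows[0]
    last_max = max_row[1]
    last_row_group = max_row[0]
    for row in rows:
        if row[0] != last_row_group:
            # A new group starts: store the best row of the previous group.
            row_grouped_with_max.append(max_row)
            last_row_group = row[0]
            max_row = row
            last_max = row[1]
        elif last_max < row[1]:
            # Strict comparison keeps the first row when x values tie.
            max_row = row
            last_max = row[1]
    row_grouped_with_max.append(max_row)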

How can I do this in a clean numpy way?

It might not be so clean, but here's a vectorized way to solve it -

    # Sort "rows" by id; a stable sort keeps the original order within each id,
    # so the first row wins when x values tie
    sorted_rows = rows[np.argsort(rows[:,0], kind="stable")]

    # Count of elements for each id
    _,count = np.unique(sorted_rows[:,0],return_counts=True)

    # Form a mask to fill in the elements from the x-column
    n1 = count.max()
    n2 = len(count)
    mask = np.arange(n1) < count[:,None]

    # Form a 2D matrix of x values, one row per unique id, padded with -inf
    id_2darray = np.empty((n2,n1))
    id_2darray.fill(-np.inf)
    id_2darray[mask] = sorted_rows[:,1]

    # Per-id argmax, shifted to indices into sorted_rows
    grp_max_idx = np.argmax(id_2darray,axis=1) + np.append([0],count.cumsum()[:-1])

    # Finally, get the "maxed"-x rows
    out = sorted_rows[grp_max_idx]
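To make the masking step concrete, here is a small sketch that runs the same steps on the six-row example from the question and notes the intermediate values in comments. The name `padded` is only illustrative, and `np.full` is used in place of `np.empty` plus `fill`, which should be equivalent here.

```python
import numpy as np

rows = np.array([[1, 22, 1236],
                 [1, 11, 1563],
                 [2, 13, 1234],
                 [2, 10, 1224],
                 [2, 23, 1111],
                 [2, 23, 1250]])

# Stable sort by id so rows keep their original order within each group.
sorted_rows = rows[np.argsort(rows[:, 0], kind="stable")]

# Group sizes: id 1 has 2 rows, id 2 has 4 rows -> count == [2, 4]
_, count = np.unique(sorted_rows[:, 0], return_counts=True)

# mask[i, j] is True when group i has a j-th element:
# [[True, True, False, False],
#  [True, True, True,  True ]]
mask = np.arange(count.max()) < count[:, None]

# One row per group, padded with -inf so the padding never wins the argmax:
# [[22, 11, -inf, -inf],
#  [13, 10,   23,   23]]
padded = np.full((len(count), count.max()), -np.inf)
padded[mask] = sorted_rows[:, 1]

# Start offset of each group inside sorted_rows -> [0, 2]
offsets = np.append([0], count.cumsum()[:-1])

# argmax returns the first maximum per row, so ties resolve to the first row.
grp_max_idx = np.argmax(padded, axis=1) + offsets  # [0, 4]
out = sorted_rows[grp_max_idx]
```

Here `out` holds the rows `[1, 22, 1236]` and `[2, 23, 1111]`, which is the result the question asks for.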

Sample input, output -

    In [101]: rows
    Out[101]:
    array([[   2,   13, 1234],
           [   1,   22, 1236],
           [   2,   23, 1250],
           [   6,   12, 1345],
           [   4,   10,  290],
           [   2,   10, 1224],
           [   2,   23, 1111],
           [   4,   45,   99],
           [   1,   11, 1563],
           [   4,   23,   89]])

    In [102]: out
    Out[102]:
    array([[   1,   22, 1236],
           [   2,   23, 1250],
           [   4,   45,   99],
           [   6,   12, 1345]])
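As a point of comparison, the same result can probably be reached with a shorter sort-based sketch. This is a different technique from the padded-matrix approach above, not a restatement of it. The function name `first_max_per_group` and its parameters are hypothetical, and negating the value column assumes signed numeric data.

```python
import numpy as np

def first_max_per_group(rows, group_col=0, value_col=1):
    """Return, for each group id, the first row holding the group's maximum value.

    Hypothetical helper; assumes a 2D array with signed numeric columns.
    """
    # np.lexsort treats the LAST key as the primary one: sort by group id
    # ascending, then by value descending (via negation). The sort is stable,
    # so rows that tie on both keys keep their original order.
    order = np.lexsort((-rows[:, value_col], rows[:, group_col]))
    ordered = rows[order]

    # return_index gives the first occurrence of each id in the ordered array,
    # which is that group's first maximum row.
    _, first_idx = np.unique(ordered[:, group_col], return_index=True)
    return ordered[first_idx]

rows = np.array([[1, 22, 1236],
                 [1, 11, 1563],
                 [2, 13, 1234],
                 [2, 10, 1224],
                 [2, 23, 1111],
                 [2, 23, 1250]])

result = first_max_per_group(rows)
# result holds the rows [1, 22, 1236] and [2, 23, 1111]
```

The trade-off is one full sort on two keys instead of building a padded matrix, which may use less memory when group sizes are very uneven.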
