python - How to get the rows of a 2D NumPy array where a column value is the maximum within each group defined by another column?

This is a pretty common SQL query: select the lines with the maximum value in column x, grouped by group_id.

The result is, for every group_id, one line (the first) whose column x value is the maximum within the group.

I have a 2D NumPy array with many columns, but let's simplify it to (id, x, y):

    import numpy as np

    rows = np.array([[1, 22, 1236],
                     [1, 11, 1563],
                     [2, 13, 1234],
                     [2, 10, 1224],
                     [2, 23, 1111],
                     [2, 23, 1250]])

and I want to get:

    [[1 22 1236]
     [2 23 1111]]

I was able to do it through a cumbersome loop, like:

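    # Assumes rows with the same id are adjacent, as in the example above.
    row_grouped_with_max = []

    max_row = rows[0]
    last_row_group = max_row[0]
    for row in rows[1:]:
        if row[0] != last_row_group:
            # A new group starts: store the winner of the previous group.
            row_grouped_with_max.append(max_row)
            last_row_group = row[0]
            max_row = row
        elif row[1] > max_row[1]:
            # Strictly greater, so the first row wins ties.
            max_row = row
    row_grouped_with_max.append(max_row)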

How can I do this in a clean NumPy way?

It might not be very clean, but here's a vectorized way to solve it -

    # Sort "rows" by the id column; a stable sort keeps the original order
    # within each id, so ties in x resolve to the first row
    sorted_rows = rows[np.argsort(rows[:,0], kind='stable')]

    # Count of elements for each id
    _,count = np.unique(sorted_rows[:,0],return_counts=True)

    # Form a mask to fill in elements from the x-column
    n1 = count.max()
    n2 = len(count)
    mask = np.arange(n1) < count[:,None]

    # Form a 2D matrix of x values with one row per unique id,
    # padded with -inf where a group has fewer than n1 elements
    id_2darray = np.empty((n2,n1))
    id_2darray.fill(-np.inf)
    id_2darray[mask] = sorted_rows[:,1]

    # Per-id argmax, shifted by each group's start offset in sorted_rows
    grp_max_idx = np.argmax(id_2darray,axis=1) + np.append([0],count.cumsum()[:-1])

    # Finally, get the "maxed"-x rows
    out = sorted_rows[grp_max_idx]
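To make the masking step more concrete, here is a small sketch that traces the intermediate arrays on the six-row example from the question. The names `padded`, `offsets` and `result` are illustrative and not part of the answer above, and `np.full` is used as shorthand for the `np.empty` plus `fill` pair. The question's array is already ordered by id, so the sorting step is skipped here.

```python
import numpy as np

# The six-row example from the question: columns are (id, x, y).
rows = np.array([[1, 22, 1236],
                 [1, 11, 1563],
                 [2, 13, 1234],
                 [2, 10, 1224],
                 [2, 23, 1111],
                 [2, 23, 1250]])

# Group sizes per id: [2, 4]
count = np.unique(rows[:, 0], return_counts=True)[1]

# One mask row per id; True marks the slots that hold real data.
mask = np.arange(count.max()) < count[:, None]
# [[ True  True False False]
#  [ True  True  True  True]]

# Scatter the x values into the padded matrix; unused slots stay at -inf.
padded = np.full(mask.shape, -np.inf)
padded[mask] = rows[:, 1]
# [[ 22.  11. -inf -inf]
#  [ 13.  10.  23.  23.]]

# Start position of each group within rows: [0, 2]
offsets = np.append([0], count.cumsum()[:-1])

# argmax returns the first maximum in each row, so ties pick the first row.
grp_max_idx = padded.argmax(axis=1) + offsets   # [0, 4]
result = rows[grp_max_idx]
# [[   1   22 1236]
#  [   2   23 1111]]
```

The padded matrix has one column per member of the largest group, so memory grows with the number of ids times the largest group size.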

Sample input and output -

    In [101]: rows
    Out[101]:
    array([[   2,   13, 1234],
           [   1,   22, 1236],
           [   2,   23, 1250],
           [   6,   12, 1345],
           [   4,   10,  290],
           [   2,   10, 1224],
           [   2,   23, 1111],
           [   4,   45,   99],
           [   1,   11, 1563],
           [   4,   23,   89]])

    In [102]: out
    Out[102]:
    array([[   1,   22, 1236],
           [   2,   23, 1250],
           [   4,   45,   99],
           [   6,   12, 1345]])
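As a possible alternative, not taken from the answer above, the same result can be sketched with `np.lexsort` and `np.unique(..., return_index=True)`, which avoids building the padded matrix. The helper name `first_max_per_group` and its parameters are hypothetical. The sketch negates the x column to sort it in descending order, so it assumes a signed integer or float dtype.

```python
import numpy as np

def first_max_per_group(rows, id_col=0, x_col=1):
    """Hypothetical helper: for each id, return the first row with maximal x."""
    # lexsort uses the LAST key as the primary key: id ascending, then
    # x descending (via negation). The sort is stable, so rows that tie
    # on both keys keep their original relative order.
    order = np.lexsort((-rows[:, x_col], rows[:, id_col]))
    sorted_rows = rows[order]
    # return_index gives the position of the first occurrence of each id,
    # which after the sort above is the row with the largest x.
    _, first = np.unique(sorted_rows[:, id_col], return_index=True)
    return sorted_rows[first]

rows = np.array([[2, 13, 1234],
                 [1, 22, 1236],
                 [2, 23, 1250],
                 [6, 12, 1345],
                 [4, 10,  290],
                 [2, 10, 1224],
                 [2, 23, 1111],
                 [4, 45,   99],
                 [1, 11, 1563],
                 [4, 23,   89]])

out = first_max_per_group(rows)
# [[   1   22 1236]
#  [   2   23 1250]
#  [   4   45   99]
#  [   6   12 1345]]
```

The cost here is dominated by the sort, and memory does not depend on the size of the largest group.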
