python - `uniq` for 2D Theano tensor


I have this NumPy code:

    import numpy as np

    def uniq(seq):
        """
        Like the Unix tool uniq. Removes repeated entries.
        :param seq: numpy.array. (time,) -> label
        :return: seq
        """
        diffs = np.ones_like(seq)
        diffs[1:] = seq[1:] - seq[:-1]
        idx = diffs.nonzero()
        return seq[idx]
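To make the 1D behaviour concrete, here is a small worked example. The function body is repeated only so the snippet runs on its own, and the input values are made up for illustration.

```python
import numpy as np

def uniq(seq):
    # Same logic as the question's function: keep entries that differ
    # from their predecessor; the first entry is always kept.
    diffs = np.ones_like(seq)
    diffs[1:] = seq[1:] - seq[:-1]
    idx = diffs.nonzero()
    return seq[idx]

# Illustrative input: runs of equal labels collapse to one entry each.
example = np.array([1, 1, 2, 3, 3, 4, 0])
result = uniq(example)
print(result)
```

Only consecutive repeats are removed, as with the Unix tool; a label that reappears later in the sequence is kept again.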

Now I want to extend it to support 2D arrays and to make use of Theano. It should be fast on the GPU.

I get an array with multiple sequences as multiple batches in the format (time,batch), and a time_mask which indirectly specifies the length of each sequence.

My current try:

    import theano
    import theano.tensor as T

    def uniq_with_lengths(seq, time_mask):
        # seq is (time,batch) -> label
        # time_mask is (time,batch) -> 0 or 1
        num_batches = seq.shape[1]
        diffs = T.ones_like(seq)
        diffs = T.set_subtensor(diffs[1:], seq[1:] - seq[:-1])
        time_range = T.arange(seq.shape[0]).dimshuffle([0] + ['x'] * (seq.ndim - 1))
        # idx holds the time index of every kept entry and -1 everywhere else.
        idx = T.switch(T.neq(diffs, 0) * time_mask, time_range, -1)
        seq_lens = T.sum(T.ge(idx, 0), axis=0)  # (batch,) -> len
        max_seq_len = T.max(seq_lens)

        # I don't know any better way without scan.
        def step(batch_idx, out_seq_b1):
            out_seq = seq[T.ge(idx[:, batch_idx], 0).nonzero(), batch_idx][0]
            # Zero-pad each sequence up to max_seq_len.
            return T.concatenate((out_seq, T.zeros((max_seq_len - out_seq.shape[0],), dtype=seq.dtype)))

        out_seqs, _ = theano.scan(
            step,
            sequences=[T.arange(num_batches)],
            outputs_info=[T.zeros((max_seq_len,), dtype=seq.dtype)]
        )
        # out_seqs is (batch,max_seq_len)
        return out_seqs.T, seq_lens
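For checking a Theano version against something simple, a plain NumPy reference with the same intended semantics might look like the sketch below. The name `uniq_with_lengths_ref` is hypothetical, and the zero padding and the (max_seq_len, batch) output layout are assumptions read off the code above rather than anything the question states separately.

```python
import numpy as np

def uniq_with_lengths_ref(seq, time_mask):
    """NumPy reference (hypothetical helper).
    seq, time_mask: (time, batch). Returns (out_seqs, seq_lens) with
    out_seqs of shape (max_seq_len, batch), zero-padded."""
    seq = np.asarray(seq)
    time_mask = np.asarray(time_mask)
    diffs = np.ones_like(seq)
    diffs[1:] = seq[1:] - seq[:-1]
    # Keep an entry if it differs from its predecessor and is not masked out.
    keep = (diffs != 0) & (time_mask != 0)
    seq_lens = keep.sum(axis=0)
    max_seq_len = int(seq_lens.max())
    out_seqs = np.zeros((max_seq_len, seq.shape[1]), dtype=seq.dtype)
    # This Python loop plays the role of theano.scan over the batch axis.
    for b in range(seq.shape[1]):
        out_seqs[:seq_lens[b], b] = seq[keep[:, b], b]
    return out_seqs, seq_lens

# Illustrative data: three sequences of lengths 6, 7 and 5, stored as (time, batch).
seq = np.array([[1, 1, 2, 3, 3, 4, 0],
                [1, 2, 2, 2, 3, 3, 4],
                [1, 2, 3, 4, 5, 0, 0]]).T
lengths = np.array([6, 7, 5])
time_mask = (np.arange(seq.shape[0])[:, None] < lengths[None, :]).astype("int8")
out_seqs, seq_lens = uniq_with_lengths_ref(seq, time_mask)
```

The per-batch loop is exactly the part the question hopes to avoid; the reference is only meant to pin down the expected result.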

How can I construct out_seqs directly?

I would like to do something like out_seqs = seq[idx], but I'm not sure how to express that.

Here's a quick answer that addresses only part of your task:

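    import theano
    import theano.tensor as tt

    def compile_theano_uniq(x):
        diffs = x[1:] - x[:-1]
        # Prepend a 1 so that the first element is always kept.
        diffs = tt.concatenate([tt.ones_like([x[0]], dtype=diffs.dtype), diffs])
        # Index x at the non-zero positions of diffs. Calling
        # diffs.nonzero_values() alone would return the differences
        # themselves, not the entries of x.
        y = x[diffs.nonzero()]
        return theano.function(inputs=[x], outputs=y)

    theano_uniq = compile_theano_uniq(tt.vector(dtype='int32'))

The key is nonzero() (its counterpart nonzero_values() returns the non-zero entries of the tensor it is called on).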

Update: I can't imagine a way to do this without using theano.scan. To be clear, and using 0 as padding, I'm assuming that given the input

    1 1 2 3 3 4 0
    1 2 2 2 3 3 4
    1 2 3 4 5 0 0

you would want the output to be

    1 2 3 4 0 0 0
    1 2 3 4 0 0 0
    1 2 3 4 5 0 0

or even

    1 2 3 4 0
    1 2 3 4 0
    1 2 3 4 5

You can identify the indexes of the items you want to keep without using scan. After that, either a new tensor needs to be constructed from scratch, or the values you want to keep need to be moved somehow so that the sequences become contiguous. Neither approach seems feasible without theano.scan.
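To illustrate the two steps in plain NumPy only (this says nothing about whether the same can be expressed in Theano without scan): the kept positions come from a shifted comparison, and a cumulative sum over the keep mask gives each kept value its destination column. The name `uniq_padded` is hypothetical, and the (batch, time) layout and the use of 0 as padding follow the example above.

```python
import numpy as np

def uniq_padded(x, mask):
    """x, mask: (batch, time). Returns (out, lens), where out is zero-padded
    to the longest de-duplicated row. Hypothetical NumPy-only sketch."""
    x = np.asarray(x)
    mask = np.asarray(mask, dtype=bool)
    # Step 1: keep an entry if it differs from its left neighbour
    # (the first column is always a candidate) and is not padding.
    keep = np.ones_like(mask)
    keep[:, 1:] = x[:, 1:] != x[:, :-1]
    keep &= mask
    lens = keep.sum(axis=1)
    # Step 2: destination column = number of kept entries before it in its row.
    dest = np.cumsum(keep, axis=1) - 1
    rows, cols = np.nonzero(keep)
    width = int(lens.max()) if lens.size else 0
    out = np.zeros((x.shape[0], width), dtype=x.dtype)
    out[rows, dest[rows, cols]] = x[rows, cols]
    return out, lens

x = np.array([[1, 1, 2, 3, 3, 4, 0],
              [1, 2, 2, 2, 3, 3, 4],
              [1, 2, 3, 4, 5, 0, 0]])
# Assumption for this example only: a 0 entry means padding.
out, lens = uniq_padded(x, x != 0)
```

For the input above this yields the compact form with five columns. Treating 0 as padding only works when 0 is not a real label; otherwise pass an explicit mask.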

