mapreduce - A node with massive degree in a graph causes trouble when taking distinct() edges


I have a graph in which around 75% of the connectivity comes from a single node.

E.g., if the sum of the degrees of all nodes is 100, that node's degree is 75.

After some manipulations, a massive number of duplicate edges exist for that node.

Assume node 1 is that kind of node:

1,2
1,2
1,2
1,2
1,2
1,2
1,3
1,3
1,3
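
To make the skew in this sample concrete, here is a small plain-Python sketch that counts how often each edge occurs. It only uses the nine sample edges listed above; the variable names are made up for illustration and are not part of my actual job.

```python
from collections import Counter

# The sample edge list from above, as (source, target) pairs.
edges = [(1, 2)] * 6 + [(1, 3)] * 3

# Count how often each edge occurs; a large count for one key signals skew.
edge_counts = Counter(edges)

# The distinct edges are simply the keys of the counter.
distinct_edges = sorted(edge_counts)

print(edge_counts.most_common())
print(distinct_edges)
```

Nine input records collapse to two distinct edges, and every one of them is keyed on node 1.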

However, there are too many duplicate keys when taking distinct() on the edges. I have tried re-partitioning before taking distinct(), but it still doesn't work because of the many duplicate keys. Writing the data to disk first and then taking distinct() does solve the problem.
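
One direction I have been considering is a two-stage de-duplication: drop duplicates inside each partition first, so that every partition emits each edge at most once, and only then de-duplicate globally. Below is a minimal sketch in plain Python rather than in the cluster framework. The partition layout and the names `dedup_partition` and `two_stage_distinct` are hypothetical, and whether this helps in practice depends on what the framework's distinct() already does internally.

```python
from typing import Iterable, Iterator, List, Set, Tuple

Edge = Tuple[int, int]


def dedup_partition(partition: Iterable[Edge]) -> Iterator[Edge]:
    """Drop duplicates inside one partition before anything is shuffled."""
    seen: Set[Edge] = set()
    for edge in partition:
        if edge not in seen:
            seen.add(edge)
            yield edge


def two_stage_distinct(partitions: List[List[Edge]]) -> List[Edge]:
    # Stage 1: local dedup, so each partition emits every edge at most once.
    locally_distinct = [list(dedup_partition(p)) for p in partitions]
    # Stage 2: global dedup over the much smaller intermediate result.
    return sorted({edge for part in locally_distinct for edge in part})


# Hypothetical layout: the sample edges spread over three partitions.
partitions = [
    [(1, 2), (1, 2), (1, 2), (1, 3)],
    [(1, 2), (1, 2), (1, 3)],
    [(1, 2), (1, 3)],
]

result = two_stage_distinct(partitions)
print(result)
```

After stage 1, the hot edge (1,2) appears at most once per partition, so the number of records sharing a key in stage 2 is bounded by the number of partitions instead of the number of duplicates.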

Is there a better way to handle this kind of extreme skew problem?

