mapreduce - A node with a massive degree in a graph causes trouble when taking distinct() edges
I have a graph in which around 75% of the connectivity comes from one node.
E.g. if the sum of the degrees of all nodes is 100, this node's degree is 75.
After some manipulations, a massive number of duplicate edges exist for this node.
Assume node 1 is this kind of node:
1,2
1,2
1,2
1,2
1,2
1,2
1,3
1,3
1,3
However, this produces too many duplicate keys when taking distinct() on the edges. I have tried to re-partition before taking distinct(), but it still doesn't work because of the many duplicate keys. Writing the edges to disk first and then taking distinct() does solve the problem.
Is there a better way to handle this kind of extremely skewed problem?
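To make the question concrete, here is a minimal plain-Python sketch of one direction that might help: remove duplicates locally inside each partition before anything is shuffled, and route edges by the whole (src, dst) pair rather than by the source node alone. No framework API is used here. The names `local_distinct`, `partition_by_edge` and `skew_safe_distinct` are hypothetical, and the partition count of 4 is arbitrary.

```python
from collections import defaultdict

def local_distinct(partition):
    """Drop duplicates inside one partition (a map-side combine)."""
    return set(partition)

def partition_by_edge(edges, num_partitions):
    """Route each edge by the hash of the whole (src, dst) pair,
    so a high-degree source node is spread over many partitions."""
    buckets = defaultdict(set)
    for edge in edges:
        buckets[hash(edge) % num_partitions].add(edge)
    return buckets

def skew_safe_distinct(partitions, num_partitions=4):
    # Step 1: dedupe locally, so each partition ships each edge at most once.
    combined = [local_distinct(p) for p in partitions]
    # Step 2: simulate the shuffle on the already-reduced data, keyed by full edge.
    buckets = defaultdict(set)
    for part in combined:
        for idx, edges in partition_by_edge(part, num_partitions).items():
            buckets[idx] |= edges
    # Step 3: buckets are disjoint by construction, so their union is the distinct set.
    return sorted(edge for bucket in buckets.values() for edge in bucket)

# Sample from the question: node 1 dominates, with many repeated edges.
partitions = [
    [(1, 2), (1, 2), (1, 2), (1, 3)],
    [(1, 2), (1, 2), (1, 3)],
    [(1, 2), (1, 3)],
]
result = skew_safe_distinct(partitions)
print(result)
```

With this layout, the amount of data that crosses partitions is bounded by the number of distinct edges per partition, not by the number of duplicates. Whether this actually helps in my case probably depends on where the duplicates are generated, so treat it as an assumption rather than a confirmed fix.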