machine learning - Dividing data sets into testing and training data
I have a dataset of k examples and want to partition it into m sets. How can I do this programmatically? For example, if k = 5 and m = 2, then 5 / 2 = 2.5. How do I partition it into 2 and 3, and not 2, 2 and 1? Similarly, if k = 10 and m = 3, I want it partitioned into 3, 3 and 4, not 3, 3, 3 and 1.
Usually, this sort of functionality is built into tools. But, assuming your observations are independent, you can just set up a random number generator and do something like:
for i = 1 to k do
    set r = rand()
    if r < 0.5 then data[i].which = 'set1'
    else data[i].which = 'set2'
You can extend this to any number of sets and any probabilities.
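As a rough illustration, the loop above might look like this in Python. This is only a sketch: the function name `assign_random_sets` and the `seed` parameter are made up for this example, and the probabilities are arbitrary.

```python
import random

def assign_random_sets(k, probabilities, seed=None):
    """Draw a set index for each of k rows, independently of the others.

    probabilities[j] is the chance that a row lands in set j.
    (Hypothetical helper, not from any particular library.)
    """
    rng = random.Random(seed)  # local generator so results are reproducible
    set_ids = list(range(len(probabilities)))
    return rng.choices(set_ids, weights=probabilities, k=k)

# Example: roughly 70% / 20% / 10% across three sets.
labels = assign_random_sets(1000, [0.7, 0.2, 0.1], seed=0)
```

Because every row is drawn independently, the set sizes only match the probabilities on average, not exactly.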
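Note that this does not guarantee balanced sets. For example, with k = 5, all the rows could end up in a single set (I'm thinking about 3% of the time for a given set, since 0.5^5 ≈ 0.03). However, the point of splitting data is to deal with larger amounts of data. If you only have 5 or 10 rows, splitting your observations into different partitions is not the way to go.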
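If you do want the exact sizes from the question (2 and 3, or 3, 3 and 4), one option, which is not part of the random approach above, is to shuffle once and then cut the data into chunks whose sizes differ by at most one. A minimal sketch, with hypothetical function names:

```python
import random

def partition_sizes(k, m):
    """Sizes of m parts that sum to k and differ by at most one."""
    base, extra = divmod(k, m)
    # The last `extra` parts each get one additional element.
    return [base + 1 if i >= m - extra else base for i in range(m)]

def balanced_partition(items, m, seed=None):
    """Shuffle a copy of items, then slice it into m near-equal parts."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    parts, start = [], 0
    for size in partition_sizes(len(shuffled), m):
        parts.append(shuffled[start:start + size])
        start += size
    return parts

print(partition_sizes(5, 2))   # [2, 3]
print(partition_sizes(10, 3))  # [3, 3, 4]
```

The shuffle keeps the assignment random, while the integer division with remainder fixes the sizes.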