awk for loop to break up file into chunks
I have a large file that I want to break into chunks based on field 2. Field 2 ranges in value from 0 to 250 million.
1 10492 rs55998931 c t 6 7 3 3 - 0.272727272727273 0.4375
1 13418 . g 6 1 2 3 ddx11l1 0.25 0.0625
1 13752 . t c 4 4 1 3 ddx11l1 0.153846153846154 0.25
1 13813 . t g 1 4 0 1 ddx11l1 0.0357142857142857 0.2
1 13838 rs200683566 c t 1 4 0 1 ddx11l1 0.0357142857142857 0.2
I want field 2 broken into intervals of 50,000, overlapping by 2,000. For example, the first 3 awk commands would look like:
awk '$1=="1" && $2>=0 && $2<=50000{print$0}' highalt.lowalt.allelecounts.filteredformissing.freq > chr1.0kb.50kb
awk '$1=="1" && $2>=48000 && $2<=98000{print$0}' highalt.lowalt.allelecounts.filteredformissing.freq > chr1.48kb.98kb
awk '$1=="1" && $2>=96000 && $2<=146000{print$0}' highalt.lowalt.allelecounts.filteredformissing.freq > chr1.96kb.146kb
I know there's a way I can do this using a for loop with variables i, j. Can anyone help me out?
awk '$1=="1"{n=int($2/48000); print>("chr1." (48*n) "kb." (48*n+50) "kb");n--; if (n>=0 && $2/1000<=48*n+50) print>("chr1." (48*n) "kb." (48*n+50) "kb");}' highalt.lowalt.allelecounts.filteredformissing.freq
Or, spread out over multiple lines:
awk '$1=="1"{
    n=int($2/48000)
    print>("chr1." (48*n) "kb." (48*n+50) "kb")
    n--
    if (n>=0 && $2/1000<=48*n+50) print>("chr1." (48*n) "kb." (48*n+50) "kb")
}' highalt.lowalt.allelecounts.filteredformissing.freq
How it works
$1=="1"{
This selects the lines whose first field is 1. (You didn't mention this in the text, but your code applied this restriction.)
n=int($2/48000)
This computes which bucket the line belongs in. A new bucket starts every 48,000 positions (50,000 minus the 2,000 overlap).
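To make the bucket arithmetic concrete, here is a minimal sketch that prints the bucket index and the window it names for a few positions. The helper name bucket_windows and the sample positions are made up for illustration; they are not part of the original answer.

```shell
# Hypothetical helper: reads one position per line and reports its bucket.
bucket_windows() {
    awk '{
        n = int($1 / 48000)   # a new window starts every 48,000 positions
        printf "%d -> n=%d, window %dkb..%dkb\n", $1, n, 48 * n, 48 * n + 50
    }'
}

# Sample positions chosen for illustration only.
printf '%s\n' 10492 47999 48000 96500 | bucket_windows
```

Position 47999 still lands in bucket 0, while 48000 is the first position assigned to bucket 1.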
print>("chr1." (48*n) "kb." (48*n+50) "kb")
This writes the line to the appropriate file.
n--
This decrements the bucket number.
if (n>=0 && $2/1000<=48*n+50) print>("chr1." (48*n) "kb." (48*n+50) "kb")
If the line also fits within the overlapping range of the previous bucket, write it to that bucket as well.
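One way to check the overlap logic before touching the real file is a dry run that prints the target file names instead of writing to them. This is only a sketch: route_dry_run is a hypothetical name, and the input lines are invented two-field records (chromosome, position).

```shell
# Dry run: print the file name(s) each position would be written to,
# instead of actually creating the files.
route_dry_run() {
    awk '$1 == "1" {
        n = int($2 / 48000)
        out = "chr1." (48 * n) "kb." (48 * n + 50) "kb"
        n--
        # Same overlap test as above: is the position also inside the previous window?
        if (n >= 0 && $2 / 1000 <= 48 * n + 50)
            out = out " chr1." (48 * n) "kb." (48 * n + 50) "kb"
        print $2 ": " out
    }'
}

# Invented sample positions around the first window boundary.
printf '1 %s\n' 10492 49000 50000 50001 | route_dry_run
```

Positions 48000 through 50000 should be listed with two files, and everything from 50001 up to 95999 with only one.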
}
This closes the group that was started by the selection $1=="1".
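Since the question asked about a for loop, here is a hedged variant of the same idea with the chromosome, window size and step passed in as variables. The function name split_windows and its argument order are assumptions made for this sketch, and it assumes the window size and step are multiples of 1000 so the file names stay whole numbers of kb.

```shell
# Hypothetical helper: split_windows FILE [CHR] [WIN] [STEP]
# Defaults mirror the question: chromosome 1, 50,000-wide windows every 48,000.
split_windows() {
    awk -v chr="${2:-1}" -v win="${3:-50000}" -v step="${4:-48000}" '
    $1 == chr {
        # Start at the last window beginning at or before $2 and walk
        # backwards while the window still covers the position.
        for (n = int($2 / step); n >= 0 && $2 <= n * step + win; n--) {
            out = "chr" chr "." (n * step / 1000) "kb." ((n * step + win) / 1000) "kb"
            print > out
        }
    }' "$1"
}

# Example call (not run here):
#   split_windows highalt.lowalt.allelecounts.filteredformissing.freq 1
```

The loop replaces the single if test, so the same code keeps working if the overlap is made large enough for a position to fall into more than two windows.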