Fastest way to save thousands of files in VB.NET?
I'm downloading thousands of files every second. Each file is about 5 KB, for a total download speed of 200 MB/s. I need to save all of these files.
The download process is split between thousands of different async tasks. When a task finishes downloading a file and wants to save it, it adds the file to a queue of files to be saved.
Here is what the class looks like. I create an instance of this class at the beginning, and have the tasks add files that need to be saved to its queue.
Imports System.IO
Imports System.Threading.Tasks
Imports System.Collections

Public Class FileSaver

    Structure FileToSave
        Dim Path As String
        Dim Data() As Byte
    End Structure

    ' Single shared queue; Take() blocks until an item is available.
    Private fileQueue As New Concurrent.BlockingCollection(Of FileToSave)

    Sub New()
        ' Background consumer: takes files off the queue and appends them to disk.
        Task.Run(
            Async Function()
                While True
                    Dim fl As FileToSave = fileQueue.Take()
                    Using sourceStream As New FileStream(fl.Path, FileMode.Append, FileAccess.Write,
                                                         FileShare.None, bufferSize:=4096, useAsync:=True)
                        Await sourceStream.WriteAsync(fl.Data, 0, fl.Data.Length)
                    End Using
                End While
            End Function
        )
    End Sub

    Public Sub Add(path As String, data() As Byte)
        Dim fl As FileToSave
        fl.Path = path
        fl.Data = data
        fileQueue.Add(fl)
    End Sub

    Public Function Count() As Integer
        Return fileQueue.Count
    End Function

End Class
There is only one instance of this class, so there is only one queue. The tasks do not each create a separate queue; there is one global instance of the class with an internal queue, and all tasks add files to that single queue.
I've since replaced the ConcurrentQueue I was using with the default BlockingCollection, which is backed by a ConcurrentQueue but gives me a blocking Take() on the collection, without having to poll in a loop.
The hard disk I'm using supports ~180 MB/s maximum read/write speed. I'm downloading at 200 MB/s, yet I don't seem to be able to save the data anywhere near fast enough: the queue keeps growing. Something is wrong, and I can't figure out what.
Is this the best (fastest) way to do it? Are there improvements I could make here?
Edit: The question was put on hold, so I can't post my own answer even though I figured it out. I'll post it here instead.
The problem here is that while writing to a file is a relatively cheap operation, opening a file for writing is not. Since I'm downloading thousands of files and saving each one separately, the open cost was killing performance.
What I did instead was group multiple downloaded files (while still in RAM) into one file (with delimiters), and write that combined file to disk. The files I'm downloading have properties that allow them to be logically grouped this way and still be usable later. The grouping ratio is about 100:1.
I no longer seem to be write-bound, and I'm now saving at ~40 MB/s. If I hit another premature limit, I'll update this. Hope it helps someone.
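The answer doesn't show the delimiter format used to combine files. As a language-agnostic illustration of the idea, here is a minimal Python sketch that packs many small (path, data) records into one blob using length prefixes instead of delimiters (a common, unambiguous variant of the same technique); the function names and record layout are my own assumptions, not the original code.

```python
import io
import struct

def pack_batch(files):
    """Combine many small (path, data) pairs into one blob.

    Each record is: 4-byte big-endian path length, path bytes,
    4-byte big-endian data length, data bytes. Writing one blob
    replaces one open/write/close cycle per file.
    """
    buf = io.BytesIO()
    for path, data in files:
        p = path.encode("utf-8")
        buf.write(struct.pack(">I", len(p)))
        buf.write(p)
        buf.write(struct.pack(">I", len(data)))
        buf.write(data)
    return buf.getvalue()

def unpack_batch(blob):
    """Recover the original (path, data) pairs from a packed blob."""
    files, off = [], 0
    while off < len(blob):
        (plen,) = struct.unpack_from(">I", blob, off); off += 4
        path = blob[off:off + plen].decode("utf-8"); off += plen
        (dlen,) = struct.unpack_from(">I", blob, off); off += 4
        files.append((path, blob[off:off + dlen])); off += dlen
    return files
```

A length-prefixed layout avoids having to escape the delimiter if it ever appears inside the file contents.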
Edit 2: More progress toward faster I/O.
Since I'm combining multiple files into one, I'm now performing a total of one open (CreateFile) operation and multiple writes to the opened file. That's good, but still not optimal. It's better to do one 10 MB write than ten 1 MB writes: multiple writes are slower, and they cause disk fragmentation, which later slows down reads as well. Not good.
So the solution is to buffer all (or as many as possible of) the downloaded files in RAM, and once I've hit a certain point, write them to the single file in one write operation. I have ~50 GB of RAM, so this works great for me.
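As an illustration of that buffering step, here is a minimal Python sketch: payloads accumulate in RAM and are flushed to the target file in one large write once a size threshold is crossed. The class name, threshold, and append-mode file handling are assumptions for the sketch, not the original code.

```python
class BatchWriter:
    """Buffer many small payloads in RAM; flush them to disk in one
    large append once a size threshold is reached."""

    def __init__(self, path, threshold=64 * 1024 * 1024):
        self.path = path
        self.threshold = threshold
        self.buffer = bytearray()
        self.flushes = 0  # number of disk writes actually issued

    def add(self, data):
        self.buffer += data           # stays in RAM
        if len(self.buffer) >= self.threshold:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        with open(self.path, "ab") as f:
            f.write(bytes(self.buffer))  # one write for the whole batch
        self.buffer.clear()
        self.flushes += 1
```

The threshold trades RAM for fewer, larger writes; with ~50 GB of RAM it can be set very high.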
However, there is a problem. Since I'm manually buffering the write data to issue as few write operations as possible, the Windows file cache becomes redundant: it just slows things down and eats RAM. So let's get rid of it.
The solution is unbuffered (and async) I/O, which is supported by Windows' CreateFile() but not by .NET directly. I had to use a library (the only one that seems to exist) to accomplish this, which you can find here: http://programmingaddicted.blogspot.com/2011/05/unbuffered-overlapped-io-in-net.html
That allows simple unbuffered asynchronous I/O from .NET. The one requirement is that you manually sector-align your Byte() buffers; otherwise WriteFile() fails with an "invalid parameter" error. In my case that meant aligning buffer sizes to a multiple of 512.
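The alignment itself is simple arithmetic: round the buffer length up to the next multiple of the sector size and pad the tail. A small Python sketch of that calculation (the helper names are mine; the 512-byte sector size is the one the answer reports):

```python
SECTOR = 512  # alignment required for unbuffered writes in the author's case

def align_up(n, alignment=SECTOR):
    """Round n up to the next multiple of alignment."""
    return (n + alignment - 1) // alignment * alignment

def pad_to_sector(data, alignment=SECTOR, fill=b"\x00"):
    """Pad data with fill bytes so its length is sector-aligned.

    The real payload length must be recorded elsewhere (e.g. in a
    header) so the padding can be stripped when reading back.
    """
    return data + fill * (align_up(len(data), alignment) - len(data))
```

Note that with unbuffered I/O on Windows the buffer's memory address generally has to be aligned too, not just its length; the library linked above takes care of that.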
After all of this, I was able to hit ~110 MB/s write speed to the drive. Better than I expected.
I suggest TPL Dataflow. It looks like you want to create a producer/consumer.
The beauty of using TPL Dataflow over your current implementation is that you can specify the degree of parallelism. That lets you play with the numbers to tune the solution to your needs.
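In TPL Dataflow this is an ActionBlock configured with MaxDegreeOfParallelism. As a language-neutral sketch of the same producer/consumer shape, here is a Python version with a shared queue drained by a configurable number of consumer threads (the function names and the None-as-stop-signal convention are assumptions of the sketch):

```python
import queue
import threading

def run_consumers(work_queue, handler, degree_of_parallelism):
    """Start a fixed number of consumer threads draining one shared
    queue, analogous to an ActionBlock with MaxDegreeOfParallelism.
    A None item tells a worker to stop."""
    def worker():
        while True:
            item = work_queue.get()
            if item is None:
                break
            handler(item)
    threads = [threading.Thread(target=worker)
               for _ in range(degree_of_parallelism)]
    for t in threads:
        t.start()
    return threads

def shutdown(work_queue, threads):
    """Post one stop signal per worker, then wait for all to finish."""
    for _ in threads:
        work_queue.put(None)
    for t in threads:
        t.join()
```

Raising or lowering degree_of_parallelism is the knob the answer describes: tune it until the disk, not the consumers, is the bottleneck.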
As @graffito mentions, if you're using spinning platters, writing may be limited by the number of files being written to concurrently, which makes trial and error the best way to tune performance.
You could, of course, write your own mechanism to limit concurrency.
I hope this helps.
[Additional] I worked at a company that archived email and had similar requirements for writing to disk. That company ran into I/O speed problems when there were too many files in a single directory, and as a result chose to limit it to 1000 files/folders per directory. The decision predates my time there, but it may be relevant to your project.