.net - Fastest way to save thousands of files in VB.NET?


I'm downloading thousands of files every second. Each file is about 5 KB, for a total download speed of 200 MB/s. I need to save all of these files.

The download process is split between thousands of different async tasks running concurrently. When a task finishes downloading a file and wants to save it, it adds the file to a queue of files to save.

Here is what the class looks like. I create a single instance of the class at the beginning, and have the tasks add the files that need to be saved to its queue.

Imports System.Collections
Imports System.IO

Public Class FileSaver

    Structure FileToSave
        Dim Path As String
        Dim Data() As Byte
    End Structure

    ' Thread-safe queue; Take() blocks until an item is available.
    Private FileQueue As New Concurrent.BlockingCollection(Of FileToSave)

    Sub New()
        ' A single long-running consumer task drains the queue.
        Task.Run(
            Async Function()
                While True
                    Dim fl As FileToSave = FileQueue.Take()
                    Using sourceStream As New FileStream(fl.Path, FileMode.Append, FileAccess.Write,
                                                         FileShare.None, bufferSize:=4096, useAsync:=True)
                        Await sourceStream.WriteAsync(fl.Data, 0, fl.Data.Length)
                    End Using
                End While
            End Function
        )
    End Sub

    ' Called by the download tasks to enqueue a finished file.
    Public Sub Add(Path As String, Data() As Byte)
        Dim fl As FileToSave
        fl.Path = Path
        fl.Data = Data
        FileQueue.Add(fl)
    End Sub

    Public Function Count() As Integer
        Return FileQueue.Count
    End Function

End Class

There is only one instance of the class, so there is only one queue. Each task does not create a separate queue; there is one global instance of the class with an internal queue, and all tasks add their files to that single queue.

I've since replaced the ConcurrentQueue with a BlockingCollection (which uses a ConcurrentQueue by default), since it allows me to have a blocking Take() on the collection without having to loop.

The hard disk I'm using supports ~180 MB/s maximum read/write speeds. I'm downloading at 200 MB/s, and I don't seem to be able to save the data fast enough; the queue keeps growing. Something is wrong, and I can't seem to figure out what.

Is this the best (fastest) way to do it? Could I make any improvements here?


Edit: The question was put on hold, so I can't post my own answer even though I figured it out. I'll post it here.

The problem here is that while writing to a file is a relatively cheap operation, opening a file for writing is not. Since I was downloading thousands of files and saving each one separately, that was hurting performance.

What I did instead was group multiple downloaded files (while still in RAM) into one file (with delimiters), and then write that one file to disk. The files I'm downloading have properties that allow them to be logically grouped in this way and still be used later. The grouping ratio is about 100:1.
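For illustration, here is a minimal sketch of that packing step, assuming a simple length-prefixed record format (my actual delimiter scheme depends on the files' properties, so treat the format below as a placeholder):

Imports System.Collections.Generic
Imports System.IO

Module FilePacker

    Structure FileRecord
        Dim Path As String
        Dim Data() As Byte
    End Structure

    ' Packs many small in-RAM files into one container file. Each record is
    ' written as: path (length-prefixed by BinaryWriter), data length, data.
    Sub PackFiles(records As List(Of FileRecord), containerPath As String)
        Using writer As New BinaryWriter(File.Create(containerPath))
            For Each record In records
                writer.Write(record.Path)
                writer.Write(record.Data.Length)
                writer.Write(record.Data)
            Next
        End Using
    End Sub

End Module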

I no longer seem to be write-bound, and I'm now saving at ~40 MB/s. If I hit another premature limit, I'll update this. I hope this helps someone.


Edit 2: Some more progress on my goal of faster IO.

Since I'm now combining multiple files into one, this means I'm performing a total of 1 open (CreateFile) operation, and then multiple writes to the opened file. That's good, but still not optimal. It's better to do one 10 MB write than ten 1 MB writes. Multiple writes are slower, and cause disk fragmentation which later slows down reads as well. Not good.

So the solution was to buffer all (or as many as I can of) the downloaded files in RAM, and once I've hit a certain threshold, write them all to the single file in one write operation. I have ~50 GB of RAM, so this works great for me.
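A minimal sketch of that buffering idea, assuming a hypothetical BufferedWriter class and an arbitrary 64 MB flush threshold (the real threshold depends on how much RAM you can spare):

Imports System.IO

Public Class BufferedWriter
    Private Const FlushThreshold As Integer = 64 * 1024 * 1024 ' 64 MB, arbitrary
    Private ReadOnly PendingData As New MemoryStream()
    Private ReadOnly OutputPath As String

    Public Sub New(path As String)
        OutputPath = path
    End Sub

    ' Accumulates packed file data in RAM; flushes once the threshold is hit.
    Public Sub Add(data() As Byte)
        PendingData.Write(data, 0, data.Length)
        If PendingData.Length >= FlushThreshold Then Flush()
    End Sub

    ' Writes everything buffered so far in one large write operation.
    Public Sub Flush()
        If PendingData.Length = 0 Then Return
        Using stream As New FileStream(OutputPath, FileMode.Append, FileAccess.Write)
            PendingData.WriteTo(stream)
        End Using
        PendingData.SetLength(0)
    End Sub
End Class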

However, there is now another problem. Since I'm manually buffering the write data to perform as few write operations as possible, the Windows cache becomes redundant and actually starts slowing things down and eating up RAM. Let's get rid of it.

The solution is unbuffered (and async) I/O, which is supported by Windows' CreateFile() but not natively supported in .NET. I had to use a library (the only one that seems to exist) to accomplish this, which you can find here: http://programmingaddicted.blogspot.com/2011/05/unbuffered-overlapped-io-in-net.html

That allows simple unbuffered asynchronous IO from .NET. The only requirement is that you have to manually sector-align your Byte() buffers, otherwise WriteFile() will fail with an "Invalid parameter" error. In my case that required aligning the buffers to a multiple of 512.
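A sketch of the alignment step, assuming a hypothetical AlignBuffer helper and the 512-byte sector size mentioned above (other drives may use 4096):

Module SectorAlignment
    ' Pads a buffer with zeros up to the next multiple of the sector size,
    ' so that WriteFile() accepts it for unbuffered IO.
    Function AlignBuffer(data() As Byte, Optional sectorSize As Integer = 512) As Byte()
        Dim alignedLength As Integer = ((data.Length + sectorSize - 1) \ sectorSize) * sectorSize
        If alignedLength = data.Length Then Return data
        Dim aligned(alignedLength - 1) As Byte ' zero-padded tail
        System.Buffer.BlockCopy(data, 0, aligned, 0, data.Length)
        Return aligned
    End Function
End Module

Note that the zero padding becomes part of what is written, so the true data length has to be tracked separately (for example, in the container file's record headers).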

After all of this, I was able to hit ~110 MB/s write speed to the drive. Better than I expected.

I suggest TPL Dataflow. It looks like you want to create a producer/consumer.

The beauty of using TPL Dataflow over your current implementation is that you can specify the degree of parallelism. This lets you play with the numbers to best tune your solution to meet your needs.
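As a rough sketch of what that could look like (assuming the System.Threading.Tasks.Dataflow package; the degree of parallelism of 4 is just a starting point to tune from):

Imports System.IO
Imports System.Threading.Tasks.Dataflow

Public Class DataflowFileSaver

    Structure FileToSave
        Dim Path As String
        Dim Data() As Byte
    End Structure

    Private ReadOnly SaveBlock As ActionBlock(Of FileToSave)

    Sub New()
        ' The ActionBlock is the consumer; MaxDegreeOfParallelism controls
        ' how many files are written concurrently.
        SaveBlock = New ActionBlock(Of FileToSave)(
            Async Function(fl)
                Using stream As New FileStream(fl.Path, FileMode.Append, FileAccess.Write,
                                               FileShare.None, bufferSize:=4096, useAsync:=True)
                    Await stream.WriteAsync(fl.Data, 0, fl.Data.Length)
                End Using
            End Function,
            New ExecutionDataflowBlockOptions With {.MaxDegreeOfParallelism = 4})
    End Sub

    ' The producers (download tasks) just post to the block.
    Public Sub Add(path As String, data() As Byte)
        SaveBlock.Post(New FileToSave With {.Path = path, .Data = data})
    End Sub

End Class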

As @graffito mentions, if you're using spinning platters, writing may be limited by the number of files concurrently being written to, which makes trial and error the best way to tune performance.

You could, of course, write your own mechanism to limit concurrency.
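For example, a minimal sketch using SemaphoreSlim (the limit of 4 concurrent writes is an arbitrary assumption):

Imports System.IO
Imports System.Threading
Imports System.Threading.Tasks

Module ThrottledSaver
    ' At most 4 writes run at once; other callers wait at the gate.
    Private ReadOnly WriteGate As New SemaphoreSlim(4)

    Async Function SaveAsync(path As String, data() As Byte) As Task
        Await WriteGate.WaitAsync()
        Try
            Using stream As New FileStream(path, FileMode.Append, FileAccess.Write,
                                           FileShare.None, bufferSize:=4096, useAsync:=True)
                Await stream.WriteAsync(data, 0, data.Length)
            End Using
        Finally
            WriteGate.Release()
        End Try
    End Function
End Module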

I hope this helps.

[Additional] I worked at a company that archived email, which had similar requirements for writing to disk. The company had issues with IO speeds when there were too many files in a directory. As a result, they chose to limit it to 1000 files/folders per directory. That decision was made before my time, but it may be relevant to your project.
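If you go that route, a minimal sketch of the bucketing (assuming a simple sequential counter; the company's actual scheme isn't described here):

Imports System.IO

Module DirectoryBucketing
    ' Maps a sequential file index to a subdirectory so that no directory
    ' holds more than 1000 files.
    Function BucketedPath(baseDir As String, fileIndex As Integer, fileName As String) As String
        Dim bucket As Integer = fileIndex \ 1000
        Return Path.Combine(baseDir, bucket.ToString(), fileName)
    End Function
End Module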

