python - Import.io bulk extract slows down when more URLs are in the list


I have set up an Import.io bulk extract, and it works great with, say, 50 URLs. It literally zips through all of them in seconds. However, when I try to extract a list of 40,000 URLs, the extractor starts fast for the first thousand or so, and then progressively keeps getting slower with every incremental URL. By 5,000 it is literally taking 4-5 seconds per URL.

One solution that seems to work is breaking them into chunks of 1,000 URLs at a time and doing a separate bulk extract for each. However, this is very time consuming, and it requires splicing all of the data together at the end.
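The chunk-and-splice workaround can at least be scripted so the splicing is not done by hand. This is a minimal sketch, not an Import.io feature: `extract_batch` is a hypothetical callable standing in for whatever runs one bulk extract on a list of URLs and returns its rows as dictionaries.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Sequence


def chunked(urls: Iterable[str], size: int = 1000) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` URLs, preserving order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(urls)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def extract_in_batches(
    urls: Sequence[str],
    extract_batch: Callable[[List[str]], List[dict]],
    size: int = 1000,
) -> List[dict]:
    """Run one extraction per chunk and splice the rows back together in order.

    `extract_batch` is hypothetical: plug in whatever starts a single bulk
    extract for a list of URLs and returns its rows.
    """
    rows: List[dict] = []
    for batch in chunked(urls, size):
        rows.extend(extract_batch(batch))
    return rows
```

Because each chunk is an independent run, a failed chunk can be retried on its own instead of restarting all 40,000 URLs.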

Has anyone experienced this, and if so, do you have a more elegant solution?

Thanks, Mike

One less elegant solution is to create a crawler. Before you run it, insert the 10k URLs in the "Where to start crawling" box.

Under the advanced options, set the crawl depth to zero; that way only the pages you put in the "Where to start crawling" input box are visited.
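To see why a depth of zero turns a crawler into a plain "visit exactly this list" job, here is a generic breadth-first crawl with a depth limit. It is only an illustration of the concept, assuming depth is counted in link hops from the start URLs; it is not how Import.io implements its crawler, and `get_links` is a hypothetical callable that returns the links found on a page.

```python
from collections import deque
from typing import Callable, Iterable, List, Set


def crawl(
    start_urls: Iterable[str],
    get_links: Callable[[str], Iterable[str]],
    max_depth: int = 0,
) -> List[str]:
    """Breadth-first crawl that never follows links more than `max_depth` hops
    from a start URL. Returns URLs in the order they were visited."""
    seen: Set[str] = set()
    order: List[str] = []
    queue = deque((url, 0) for url in start_urls)
    while queue:
        url, depth = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        order.append(url)  # a real crawler would fetch and extract the page here
        if depth < max_depth:  # with max_depth=0 no links are ever followed
            for link in get_links(url):
                if link not in seen:
                    queue.append((link, depth + 1))
    return order
```

With `max_depth=0`, `get_links` is never called, so the crawl visits the start URLs and nothing else.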

That should do the trick. Plus, the crawler has a bunch of other options, such as the wait between pages and the number of concurrent pages.
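As a rough picture of what "concurrent pages" and "wait between pages" control, here is a generic asyncio sketch: a semaphore caps how many pages are in flight, and each worker pauses after its page before releasing its slot. The parameter names mirror the crawler options only for readability; this is an assumption-based illustration, not Import.io's behaviour, and `fetch` is a hypothetical coroutine that downloads one URL.

```python
import asyncio
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")


async def fetch_all(
    urls: List[str],
    fetch: Callable[[str], Awaitable[T]],
    concurrent_pages: int = 4,
    wait_between_pages: float = 0.5,
) -> List[T]:
    """Fetch every URL with at most `concurrent_pages` in flight at once.

    Each worker sleeps `wait_between_pages` seconds after its page finishes,
    while still holding its slot, so the request rate stays bounded.
    Results come back in the same order as `urls`.
    """
    semaphore = asyncio.Semaphore(concurrent_pages)

    async def worker(url: str) -> T:
        async with semaphore:
            result = await fetch(url)
            await asyncio.sleep(wait_between_pages)  # politeness delay
            return result

    return list(await asyncio.gather(*(worker(u) for u in urls)))
```

Raising `concurrent_pages` speeds up a large list, while a non-zero wait keeps the target site from being hammered.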

