HTML Python Requests Library - Too Slow


I'm using the Python requests library to get the source code of each URL, and then I apply a regex to extract data, using the following code:

    import re
    import requests

    for url in urls:
        print(url)
        page = requests.get(url)
        matches = re.findall(r'btn btn-primary font-bold">\s*<span>([^<]*)', page.text)
        for match in matches:
            print(match)

This code works, but it is far too slow; it takes more than 5 seconds per request. Are there any suggestions to make it faster?

Also, should I be adding try/except error-handling code for robustness?

I agree with the comments above that speed profiling is a great way to see what is slowing things down. If it is an option, one obvious way to speed up the code is to parallelise it. Here is a simple suggestion:

    from multiprocessing.dummy import Pool as ThreadPool
    import requests
    import re

    def parallelurl(url):
        print(url)
        page = requests.get(url)
        matches = re.findall(r'btn btn-primary font-bold">\s*<span>([^<]*)', page.text)
        for match in matches:
            print(match)

    pool = ThreadPool(6)  # play around with this number; it depends on your processor

    # urllist is the list of URLs to fetch
    pool.map(parallelurl, urllist)

On my computer, this speeds up accessing Google 10 times from 1.9 s to 0.3 s.
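On the try/except part of the question: a rough sketch of how the same thread-pool approach could be made more robust is shown below. It is not a definitive implementation. The helper names `fetch_matches` and `fetch_all`, the `get` and `workers` parameters, and the 10-second timeout are assumptions made for illustration and are not from the original post. The sketch returns results instead of printing from worker threads, and it passes an explicit timeout because requests does not apply one by default.

```python
import re
from multiprocessing.dummy import Pool as ThreadPool  # thread-based Pool

import requests

# Same pattern as in the question, compiled once and written as a raw string.
PATTERN = re.compile(r'btn btn-primary font-bold">\s*<span>([^<]*)')


def fetch_matches(url, get=requests.get, timeout=10):
    """Return (url, matches, error); error is None on success.

    `get` and `timeout` are hypothetical parameters added for illustration.
    """
    try:
        page = get(url, timeout=timeout)
        page.raise_for_status()  # treat 4xx/5xx responses as errors
    except requests.RequestException as exc:
        # Covers connection errors, timeouts and HTTP error statuses.
        return url, [], exc
    return url, PATTERN.findall(page.text), None


def fetch_all(urls, workers=6):
    """Fetch all URLs on `workers` threads; results keep the input order."""
    pool = ThreadPool(workers)
    try:
        return pool.map(fetch_matches, urls)
    finally:
        pool.close()
        pool.join()
```

Calling `fetch_all(urllist)` would return one `(url, matches, error)` tuple per URL, so a single failed request no longer aborts the whole run and the caller can decide whether to log or retry the failures.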

