Python Requests Library Too Slow When Scraping HTML
I'm using the Python requests library to get the source code of a URL and apply a regex to extract data, using the following code:
import requests
import re

for url in urls:
    print(url)
    page = requests.get(url)
    matches = re.findall(r'btn btn-primary font-bold">\s*<span>([^<]*)', page.text)
    for match in matches:
        print(match)
This code works, but it is far too slow: it takes more than 5 seconds per request. Are there any suggestions for making it faster?
Also, should I be adding try/except code for robustness?
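For context, the kind of error handling I have in mind is roughly this (just a sketch; urls is the same list as above):

import requests

for url in urls:
    try:
        page = requests.get(url, timeout=10)  # timeout so a slow server can't hang forever
        page.raise_for_status()               # raise on HTTP 4xx/5xx responses
    except requests.exceptions.RequestException as e:
        print("failed to fetch %s: %s" % (url, e))
        continue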
I agree with the comments above that profiling is a great way to see what is slowing the code down. If it's an option, one obvious way to speed up the code is to parallelise it. Here is a simple suggestion:
from multiprocessing.dummy import Pool as ThreadPool
import requests
import re

def parallelurl(url):
    print(url)
    page = requests.get(url)
    matches = re.findall(r'btn btn-primary font-bold">\s*<span>([^<]*)', page.text)
    for match in matches:
        print(match)

pool = ThreadPool(6)  # play around with this number; the best value depends on your processor
pool.map(parallelurl, urllist)
On my computer this speeds up accessing Google 10 times from 1.9 s to 0.3 s.
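If you're on Python 3, the standard-library concurrent.futures module gives you an equivalent thread pool; here is a rough sketch of the same idea (my own variant, not tested against your URLs):

from concurrent.futures import ThreadPoolExecutor
import requests
import re

def fetch(url):
    page = requests.get(url)
    return re.findall(r'btn btn-primary font-bold">\s*<span>([^<]*)', page.text)

# executor.map returns results in the same order as urllist
with ThreadPoolExecutor(max_workers=6) as executor:  # tune max_workers as above
    for url, matches in zip(urllist, executor.map(fetch, urllist)):
        print(url)
        for match in matches:
            print(match)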