Python Requests Library Too Slow When Scraping HTML
I'm using the Python requests library to get the source code of a URL and apply a regex to extract data, using the following code:
import requests
import re

for url in urls:
    print(url)
    page = requests.get(url)
    matches = re.findall(r'btn btn-primary font-bold">\s*<span>([^<]*)', page.text)
    for match in matches:
        print(match)
This code works, but it is far too slow: it takes more than 5 seconds per request. Are there any suggestions for making it faster?
Also, should I be adding try/except code for robustness?
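For context, the kind of error handling I have in mind is roughly this (just a sketch; urls is the same list as above):

import requests

for url in urls:
    try:
        page = requests.get(url, timeout=10)  # timeout so a slow server can't hang forever
        page.raise_for_status()               # raise on HTTP 4xx/5xx responses
    except requests.exceptions.RequestException as e:
        print("failed to fetch %s: %s" % (url, e))
        continue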
I agree with the comments above that profiling is a great way to see what is slowing the code down. If it's an option, one obvious way to speed up the code is to parallelise it. Here is a simple suggestion:
from multiprocessing.dummy import Pool as ThreadPool
import requests
import re

def parallelurl(url):
    print(url)
    page = requests.get(url)
    matches = re.findall(r'btn btn-primary font-bold">\s*<span>([^<]*)', page.text)
    for match in matches:
        print(match)

pool = ThreadPool(6)  # play around with this number; the best value depends on your processor
pool.map(parallelurl, urllist)
On my computer this speeds up accessing Google 10 times from 1.9 s to 0.3 s.
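If you're on Python 3, the standard-library concurrent.futures module gives you an equivalent thread pool; here is a rough sketch of the same idea (my own variant, not tested against your URLs):

from concurrent.futures import ThreadPoolExecutor
import requests
import re

def fetch(url):
    page = requests.get(url)
    return re.findall(r'btn btn-primary font-bold">\s*<span>([^<]*)', page.text)

# executor.map returns results in the same order as urllist
with ThreadPoolExecutor(max_workers=6) as executor:  # tune max_workers as above
    for url, matches in zip(urllist, executor.map(fetch, urllist)):
        print(url)
        for match in matches:
            print(match)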