import.io - Running the crawler doesn't get the same data as it does when training


When I train the crawler to scrape a Yelp page, it picks up the information without me doing anything. When I run the crawler, however, the address is not recognized and is not recorded.

Getting company data from Yelp

In this case, we want the addresses of companies in San Francisco from the website www.yelp.com.

Analysis of the site

We can get a list of companies beginning with the letter “A” from this page:

http://www.yelp.com/sm/san-francisco-ca-us/a/1 

This directory page tells us there are 42 pages of results for “A”, with 80 results per page.

This is good news.

Create an API

I am going to create an API for the data on the first page, and then use Bulk Extract to pass through the list of URLs for all 42 pages.

Using Magic, you can generate an API in a few clicks:

  1. Go to magic.import.io
  2. Paste in the URL of the Yelp page (link above)
  3. Click “Extract Data”
  4. Click “Get API”
  5. Click “Copy Data”

Now you have an API!

(Note: if you need more control over what to include in or exclude from the API, you can use an Extractor.)

Generate a list of URLs

To generate the list of URLs that covers the data on pages 1 through 42, I am going to use an external service hosted at:

http://texttool.blogspot.co.uk/

Locate the "Generate list of numbers" tool and use it to generate the list of URLs:

http://www.yelp.com/sm/san-francisco-ca-us/a/1
http://www.yelp.com/sm/san-francisco-ca-us/a/2
http://www.yelp.com/sm/san-francisco-ca-us/a/3
http://www.yelp.com/sm/san-francisco-ca-us/a/4
http://www.yelp.com/sm/san-francisco-ca-us/a/5
http://www.yelp.com/sm/san-francisco-ca-us/a/6
http://www.yelp.com/sm/san-francisco-ca-us/a/7
http://www.yelp.com/sm/san-francisco-ca-us/a/8
http://www.yelp.com/sm/san-francisco-ca-us/a/9
http://www.yelp.com/sm/san-francisco-ca-us/a/10
http://www.yelp.com/sm/san-francisco-ca-us/a/11
http://www.yelp.com/sm/san-francisco-ca-us/a/12
http://www.yelp.com/sm/san-francisco-ca-us/a/13
http://www.yelp.com/sm/san-francisco-ca-us/a/14
http://www.yelp.com/sm/san-francisco-ca-us/a/15
http://www.yelp.com/sm/san-francisco-ca-us/a/16
http://www.yelp.com/sm/san-francisco-ca-us/a/17
http://www.yelp.com/sm/san-francisco-ca-us/a/18
http://www.yelp.com/sm/san-francisco-ca-us/a/19
http://www.yelp.com/sm/san-francisco-ca-us/a/20
http://www.yelp.com/sm/san-francisco-ca-us/a/21
http://www.yelp.com/sm/san-francisco-ca-us/a/22
http://www.yelp.com/sm/san-francisco-ca-us/a/23
http://www.yelp.com/sm/san-francisco-ca-us/a/24
http://www.yelp.com/sm/san-francisco-ca-us/a/25
http://www.yelp.com/sm/san-francisco-ca-us/a/26
http://www.yelp.com/sm/san-francisco-ca-us/a/27
http://www.yelp.com/sm/san-francisco-ca-us/a/28
http://www.yelp.com/sm/san-francisco-ca-us/a/29
http://www.yelp.com/sm/san-francisco-ca-us/a/30
http://www.yelp.com/sm/san-francisco-ca-us/a/31
http://www.yelp.com/sm/san-francisco-ca-us/a/32
http://www.yelp.com/sm/san-francisco-ca-us/a/33
http://www.yelp.com/sm/san-francisco-ca-us/a/34
http://www.yelp.com/sm/san-francisco-ca-us/a/35
http://www.yelp.com/sm/san-francisco-ca-us/a/36
http://www.yelp.com/sm/san-francisco-ca-us/a/37
http://www.yelp.com/sm/san-francisco-ca-us/a/38
http://www.yelp.com/sm/san-francisco-ca-us/a/39
http://www.yelp.com/sm/san-francisco-ca-us/a/40
http://www.yelp.com/sm/san-francisco-ca-us/a/41
http://www.yelp.com/sm/san-francisco-ca-us/a/42
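
If you would rather not depend on an external web tool, the same list can be produced with a few lines of Python. This is only a minimal sketch: the function name `page_urls` is made up for illustration, and it assumes the page number is always the last path segment, as in the URLs above.

```python
# Base of the Yelp directory URLs shown above; the page number is appended.
BASE_URL = "http://www.yelp.com/sm/san-francisco-ca-us/a/"


def page_urls(first=1, last=42):
    """Return one directory URL per page number, from first to last inclusive."""
    return [f"{BASE_URL}{page}" for page in range(first, last + 1)]


urls = page_urls()

# Print one URL per line, ready to paste into a bulk input box.
print("\n".join(urls))
```

For a different letter or a different number of pages, change `BASE_URL` or the `last` argument accordingly.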

Bulk Extract

Now you can use Bulk Extract to get the data from each of the URLs in one go.

To do this:

  1. Go to the Configure tab on the Yelp API
  2. Select the Bulk Extract drop-down
  3. Paste in the list of 42 URLs
  4. Click “Run Queries”

Note: there may be failed queries. Clicking the “x URLs failed” text lets you retry the failed queries.
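
If you ever script this step yourself instead of using the Bulk Extract screen, the same "retry only what failed" idea can be sketched as below. This is not import.io's API: `run_with_retries` and the `fetch` callable are hypothetical names, and `fetch` stands for whatever function you use to query one URL and raise an exception on failure.

```python
def run_with_retries(urls, fetch, max_rounds=3):
    """Call fetch(url) for every URL, re-trying only the failures.

    fetch is a caller-supplied (hypothetical) function that returns the
    data for one URL or raises an exception. Returns a pair:
    (results keyed by URL, list of URLs that still failed).
    """
    results = {}
    pending = list(urls)
    for _ in range(max_rounds):
        failed = []
        for url in pending:
            try:
                results[url] = fetch(url)
            except Exception:
                # Keep the URL so it is tried again in the next round.
                failed.append(url)
        pending = failed
        if not pending:
            break
    return results, pending
```

Any URLs left in the second return value after the last round are the ones worth inspecting by hand.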

Export

You can export the data as a spreadsheet, HTML or JSON.
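
If you export JSON and later want a spreadsheet-friendly file, a conversion along these lines may help. It is a rough sketch under assumptions: the exported JSON is taken to be a flat list of objects, and the column names `name` and `address` are placeholders rather than the real column names of your API. Rows with a missing field get an empty cell, which makes it easy to spot records where the address was not captured.

```python
import csv
import io
import json


def json_rows_to_csv(json_text, fieldnames):
    """Convert a JSON list of flat objects into CSV text.

    fieldnames are the columns to keep; they are assumptions here and
    should be replaced with the actual column names of your data.
    """
    rows = json.loads(json_text)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    for row in rows:
        # Missing fields become empty cells instead of raising an error.
        writer.writerow({name: row.get(name, "") for name in fieldnames})
    return buffer.getvalue()


# Made-up sample data; the second record has no address.
sample = '[{"name": "Example Co", "address": "1 Example St"}, {"name": "No Address Inc"}]'
csv_text = json_rows_to_csv(sample, ["name", "address"])
print(csv_text)
```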

Further reading

http://support.import.io/knowledgebase/articles/669784-getting-company-data-from-yelp

