python - How can I run site js function with custom arguments? -
i need scrape google suggestions search input. use selenium+phantomjs webdriver.
search_input = selenium.find_element_by_xpath(".//input[@id='lst-ib']") search_input.send_keys('phantomjs har python') time.sleep(1.5) lxml.html import fromstring etree = fromstring(selenium.page_source) output = [] suggestion in etree.xpath(".//ul[@role='listbox']/li//div[@class='sbqs_c']"): output.append(" ".join([s.strip() s in suggestion.xpath(".//text()") if s.strip()]))
but see in firebug xhr request this. , response - simple text file data need. @ log:
selenium.get_log("har")
i can't see request. how can catch it? need url using template requests lib use other search words. or may possible run js initiate request other(not input field) arguments, possible?
you can solve python+selenium+phantomjs only.
here list of things i've done make work:
- pretend browser head changing phantomjs's user-agent through desired capabilities
- use explicit waits
- ask direct https://www.google.com/?gws_rd=ssl#q=phantomjs+har+python url
working solution:
from selenium.webdriver.common.by import selenium.webdriver.common.keys import keys selenium.webdriver.support.ui import webdriverwait selenium.webdriver.support import expected_conditions ec selenium import webdriver desired_capabilities = webdriver.desiredcapabilities.phantomjs desired_capabilities["phantomjs.page.customheaders.user-agent"] = "mozilla/5.0 (linux; u; android 2.3.3; en-us; lg-lu3000 build/gri40) applewebkit/533.1 (khtml, gecko) version/4.0 mobile safari/533.1" driver = webdriver.phantomjs(desired_capabilities=desired_capabilities) driver.get("https://www.google.com/?gws_rd=ssl#q=phantomjs+har+python") wait = webdriverwait(driver, 10) # focus input , trigger suggestion list shown search_input = wait.until(ec.visibility_of_element_located((by.name, "q"))) search_input.send_keys(keys.arrow_down) search_input.click() # wait suggestion box appear wait.until(ec.presence_of_element_located((by.css_selector, "ul[role=listbox]"))) # parse suggestions print "list of suggestions: " suggestion in driver.find_elements_by_css_selector("ul[role=listbox] li[dir]"): print suggestion.text
prints:
list of suggestions: python phantomjs screenshot python phantomjs ghostdriver python phantomjs proxy unable start phantomjs ghostdriver
Comments
Post a Comment