
    Setting the User-Agent with Selenium + PhantomJS

    Posted by admin on 2016-10-08 06:03:50


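    Some websites' web servers restrict access by User-Agent and may reject requests from User-Agents they do not recognize, so web automation code may need to disguise its User-Agent slightly. This post briefly records how to set the User-Agent when driving PhantomJS from Selenium.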

    Installing the Selenium dependency for Python:

    sudo pip install selenium

    By default, no custom User-Agent is set, and the default User-Agent identifies the browser as PhantomJS. To change PhantomJS's User-Agent, set the "phantomjs.page.settings.userAgent" desired capability:

    '''
    Created on Dec 6, 2013
     
    @author: Jay
    @summary: Set user-agent before using PhantomJS to get a web page.
    '''
     
    from selenium import webdriver
    from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
     
    dcap = dict(DesiredCapabilities.PHANTOMJS)
    dcap["phantomjs.page.settings.userAgent"] = (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:25.0) Gecko/20100101 Firefox/25.0 "
    )
     
    driver = webdriver.PhantomJS(executable_path='./phantomjs', desired_capabilities=dcap)
    driver.get("http://dianping.com/")
    cap_dict = driver.desired_capabilities
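    # Print every capability the driver reports, including the User-Agent.
    for key in cap_dict:
        print('%s: %s' % (key, cap_dict[key]))
    print(driver.current_url)
    # quit must be called; a bare "driver.quit" does nothing.
    driver.quit()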

    Running the script produces the following output:

    jay@Jay-Air:~/workspace/python_study/dp/qa/2013/12 $python user_agent_phantomjs.py 
    rotatable: False
    takesScreenshot: True
    acceptSslCerts: False
    browserConnectionEnabled: False
    javascriptEnabled: True
    driverVersion: 1.0.3
    databaseEnabled: False
    locationContextEnabled: False
    platform: mac-unknown-32bit
    browserName: phantomjs
    version: 1.9.1
    driverName: ghostdriver
    nativeEvents: True
    phantomjs.page.settings.userAgent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:25.0) Gecko/20100101 Firefox/25.0 
    applicationCacheEnabled: False
    webStorageEnabled: False
    proxy: {u'proxyType': u'direct'}
    handlesAlerts: False
    cssSelectorsEnabled: True
    http://www.dianping.com/citylist

    The key part:

    dcap = dict(DesiredCapabilities.PHANTOMJS)
    dcap["phantomjs.page.settings.userAgent"] = (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:25.0) Gecko/20100101 Firefox/25.0 "
    )
    driver = webdriver.PhantomJS(executable_path='./phantomjs', desired_capabilities=dcap)
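
The key lines above can be wrapped in a small helper. The following is only a sketch, not the author's code: the names `USER_AGENT_CAP`, `with_user_agent`, and `make_driver` are hypothetical, and `make_driver` assumes a Selenium release that still ships `webdriver.PhantomJS`, as the article's code does (it was removed in Selenium 4). The helper copies the capabilities before changing them, mirroring the `dict(...)` call in the article, so the shared `DesiredCapabilities.PHANTOMJS` defaults are not modified.

```python
import copy

# Capability key taken from the article.
USER_AGENT_CAP = "phantomjs.page.settings.userAgent"


def with_user_agent(base_caps, user_agent):
    """Return a copy of base_caps with the User-Agent capability set.

    The input mapping is left untouched, so a shared default such as
    DesiredCapabilities.PHANTOMJS is never modified in place.
    """
    caps = copy.deepcopy(dict(base_caps))
    # Strip stray whitespace (the article's string ends with a space).
    caps[USER_AGENT_CAP] = user_agent.strip()
    return caps


def make_driver(user_agent, executable_path="./phantomjs"):
    """Start PhantomJS with the given User-Agent (hypothetical helper)."""
    # Imported here so with_user_agent() works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

    caps = with_user_agent(DesiredCapabilities.PHANTOMJS, user_agent)
    return webdriver.PhantomJS(executable_path=executable_path,
                               desired_capabilities=caps)
```

Under these assumptions, `make_driver("Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:25.0) Gecko/20100101 Firefox/25.0")` would return a driver configured the same way as in the listing above.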

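    You can also add the following code to check whether the setting took effect:

    # Ask the page itself which User-Agent the browser reports.
    agent = driver.execute_script("return navigator.userAgent")
    print(agent)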
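
The same check can be turned into a reusable comparison. This is a minimal sketch under stated assumptions: `user_agent_matches` and `FakeDriver` are hypothetical names, and `FakeDriver` only stands in for a real WebDriver so the example runs without PhantomJS. With a real session, pass the `driver` object instead.

```python
def user_agent_matches(driver, expected):
    """Return True if the browser reports the expected User-Agent."""
    actual = driver.execute_script("return navigator.userAgent")
    # Ignore leading/trailing whitespace on both sides.
    return actual.strip() == expected.strip()


class FakeDriver(object):
    """Stand-in for a WebDriver, used only to demonstrate the check offline."""

    def __init__(self, user_agent):
        self._user_agent = user_agent

    def execute_script(self, script):
        # A real driver would run the script in the page; this returns a fixed value.
        return self._user_agent


UA = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:25.0) Gecko/20100101 Firefox/25.0"
fake = FakeDriver(UA + " ")  # trailing space, as in the article's string
print(user_agent_matches(fake, UA))  # prints True
```

Comparing stripped strings is a design choice for this sketch: the article's User-Agent string ends with a space, and an exact comparison would fail on that alone.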

    Once the User-Agent is set, the site can be accessed. Selenium makes it very convenient to render the pages you need.


