IT博客汇

    Selenium+PhantomJS (Series 4: Simulating Weibo Login)

    Posted by admin on 2016-10-17 08:36:55

    Import the selenium package and create a webdriver object:

    from selenium import webdriver  
    import sys   # used later to read the verify code from stdin  
    import time  # fixed sleeps between steps  
      
    sel = webdriver.Chrome()  # webdriver.PhantomJS() works the same way

    Open the target URL and wait for the response:

    loginurl = 'http://weibo.com/'  
    #open the login page  
    sel.get(loginurl)  
    time.sleep(10)

    Locate the login form inputs via XPath, fill in the username and password, and simulate clicking the login button:

    #sign in the username  
    try:  
        sel.find_element_by_xpath("//div[@id='pl_login_form']/div/div[2]/div[1]/div[1]/input").send_keys('yourusername')  
        print 'user success!'  
    except:  
        print 'user error!'  
    time.sleep(1)  
    #sign in the password  
    try:  
        sel.find_element_by_xpath("//div[@id='pl_login_form']/div/div[2]/div[2]/div[1]/input").send_keys('yourPW')  
        print 'pw success!'  
    except:  
        print 'pw error!'  
    time.sleep(1)  
    #click to login  
    try:  
        sel.find_element_by_xpath("//div[@id='pl_login_form']/div/div[2]/div[6]/a").click()  
        print 'click success!'  
    except:  
        print 'click error!'  
    time.sleep(3)
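
    The three try/except/sleep blocks above all follow the same pattern, so they can be factored into a small helper. This is a hypothetical refactor, not part of the original post; `safe_action` takes any zero-argument callable wrapping one Selenium step:

```python
import time

def safe_action(action, ok_msg, err_msg, wait=1):
    # Run one Selenium step (any zero-argument callable), report the
    # outcome, then pause -- mirrors the try/except/sleep pattern above.
    try:
        action()
        print(ok_msg)
        ok = True
    except Exception:
        print(err_msg)
        ok = False
    time.sleep(wait)
    return ok

# Usage with the names from the post ('sel' is the webdriver object):
# safe_action(lambda: sel.find_element_by_xpath("...").send_keys('yourusername'),
#             'user success!', 'user error!')
```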

    Verify whether the login succeeded: if the current URL has changed, assume the login worked; otherwise keep prompting for the verify code:

    curpage_url = sel.current_url  
    print curpage_url  
    while(curpage_url == loginurl):  
        print 'please input the verify code:'  
        verifycode = sys.stdin.readline().strip()  # strip the trailing newline  
        sel.find_element_by_xpath("//div[@id='pl_login_form']/div/div[2]/div[3]/div[1]/input").send_keys(verifycode)  
        try:  
            sel.find_element_by_xpath("//div[@id='pl_login_form']/div/div[2]/div[6]/a").click()  
            print 'click success!'  
        except:  
            print 'click error!'  
        time.sleep(3)  
        curpage_url = sel.current_url
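
    The URL check above can also be written as a generic polling helper with a timeout, so a failed login cannot loop forever. This is a hypothetical generalization, not from the original post; `get_url` would be `lambda: sel.current_url`:

```python
import time

def wait_for_url_change(get_url, old_url, timeout=60, poll=3):
    # Poll get_url() until it returns something other than old_url,
    # or give up after `timeout` seconds. Returns the new URL or None.
    deadline = time.time() + timeout
    while time.time() < deadline:
        cur = get_url()
        if cur != old_url:
            return cur
        time.sleep(poll)
    return None
```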

    Use the webdriver object's get_cookies() method to fetch the session cookies of the current site:

    #get the session cookie  
    cookie = [item["name"] + "=" + item["value"] for item in sel.get_cookies()]  
    #print cookie  
      
    cookiestr = ';'.join(cookie)  
    print cookiestr
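
    To see what this produces, here is the same list comprehension run on a made-up sample of what get_cookies() returns (the cookie names and values below are fabricated for illustration only):

```python
# Fabricated sample of sel.get_cookies() output: a list of dicts,
# each with at least 'name' and 'value' keys.
sample_cookies = [
    {"name": "SUB", "value": "abc123", "domain": ".weibo.com"},
    {"name": "SUBP", "value": "xyz789", "domain": ".weibo.com"},
]
cookie = [item["name"] + "=" + item["value"] for item in sample_cookies]
cookiestr = ';'.join(cookie)
print(cookiestr)  # SUB=abc123;SUBP=xyz789
```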

    Once you have the cookie, you can access the site with urllib2, scrapy, requests, and so on:

    import urllib2  
    
    print '%%%using the urllib2 !!'  
    homeurl = sel.current_url  
    print 'homeurl: %s' % homeurl  
    headers = {'cookie':cookiestr}  
    req = urllib2.Request(homeurl, headers = headers)  
    try:  
        response = urllib2.urlopen(req)  
        text = response.read()  
        fd = open('homepage', 'w')  
        fd.write(text)  
        fd.close()  
        print '###get home page html success!!'  
    except:  
        print '### get home page html error!!'
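
    The post also mentions requests as an option; here is a sketch of the same fetch with it, assuming `cookiestr` and `homeurl` from above and that the requests library is installed:

```python
import requests

def session_from_cookiestr(cookiestr):
    # Preload a requests.Session with cookies parsed from the
    # 'name=value;name=value' string built above.
    s = requests.Session()
    for pair in cookiestr.split(';'):
        name, _, value = pair.strip().partition('=')
        if name:
            s.cookies.set(name, value)
    return s

# s = session_from_cookiestr(cookiestr)
# resp = s.get(homeurl, timeout=10)
# open('homepage', 'w').write(resp.text)
```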

    Reposted from: http://blog.csdn.net/warrior_zhang/article/details/50198699


