
    Python crawlers: Scrapy in practice, part 1

    Posted by 闲人 on 2015-06-30 17:12:19

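      When I built a crawler for diunale.com a while ago, I happened to be learning Python, so I searched online for open-source crawler material. After reading a number of comparisons, Scrapy looked like the most practical open-source option. I had always worked in PHP and did not know Python well, so I spent a day skimming the basics of the language and then got started. What follows is a brief record of installing and configuring Scrapy.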

    Environment: CentOS 6.6

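    The first step is to install a Python runtime. The Python 2.6 that ships with the system lacks many features, so it needs to be upgraded. Start by downloading, in one go, all the packages that will be used later.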
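Before upgrading, it can help to confirm that the system interpreter really is older than 2.7. The sketch below is one way to check; it is not part of the original setup. The helper name `version_lt` is hypothetical, and the comparison assumes GNU `sort` with the `-V` (version sort) option.

```shell
#!/bin/sh
# version_lt A B: succeed (exit 0) when version A is strictly older than B.
# Hypothetical helper; relies on GNU sort's -V (version sort) option.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

if command -v python >/dev/null 2>&1; then
    # Python 2 prints its version to stderr, hence the 2>&1.
    current=$(python -V 2>&1 | awk '{print $2}')
    if version_lt "$current" "2.7"; then
        echo "python $current is older than 2.7, upgrade needed"
    else
        echo "python $current is new enough"
    fi
else
    echo "no python found on PATH"
fi
```

On a stock CentOS 6 system this would be expected to report that an upgrade is needed, since the bundled interpreter is a 2.6 release.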

    
    # Preparation: update the system and install build dependencies
    yum -y update
    yum -y install libffi-devel ntp make openssl openssl-devel pcre pcre-devel zlib zlib-devel gcc gcc-c++ libXpm libXpm-devel ncurses ncurses-devel libmcrypt libmcrypt-devel libxml2 libxml2-devel
    
    wget https://www.python.org/ftp/python/2.7.10/Python-2.7.10.tgz
    wget https://pypi.python.org/packages/source/s/setuptools/setuptools-18.0.1.tar.gz
    wget https://pypi.python.org/packages/source/p/pip/pip-7.1.0.tar.gz
    wget https://pypi.python.org/packages/source/T/Twisted/Twisted-15.2.1.tar.bz2
    wget ftp://xmlsoft.org/libxml2/libxml2-git-snapshot.tar.gz
    wget http://xmlsoft.org/sources/libxslt-1.1.28.tar.gz
    wget https://pypi.python.org/packages/source/c/cryptography/cryptography-0.4.tar.gz
    wget ftp://sourceware.org/pub/libffi/libffi-3.2.1.tar.gz
    
    echo "Installing Python 2.7"
    
    tar -zxvf Python-2.7.10.tgz
    cd Python-2.7.10
    mkdir /usr/local/python27
    ./configure --prefix=/usr/local/python27
    make
    make install
    mv /usr/bin/python /usr/bin/python_old
    ln -s /usr/local/python27/bin/python2.7 /usr/bin/python
    cd ..
    
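    echo "Fixing yum, which stops working once /usr/bin/python points to Python 2.7"
    # yum depends on the system Python 2.6, so point it at the renamed interpreter.
    # Open the script:
    vi /usr/bin/yum
    # and change its first line from
    #   #!/usr/bin/python
    # to
    #   #!/usr/bin/python_old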
    
    
    echo "Installing setuptools and pip"
    
    tar -zvxf setuptools-18.0.1.tar.gz
    cd setuptools-18.0.1
    python setup.py install 
    cd ..
    tar -zvxf pip-7.1.0.tar.gz
    cd pip-7.1.0
    python setup.py build 
    python setup.py install 
    cd ..
    
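    # If pip is not on the PATH, locate it and link it into /usr/bin:
    # whereis pip
    # ln -s /usr/local/python27/bin/pip /usr/bin/pip

    # Then install database support
    pip install mysql-python

    # If installing mysql-python fails with "EnvironmentError: mysql_config not found",
    # locate mysql_config and link it into /usr/bin (use the path that find reports):
    # find / -name mysql_config
    # ln -s /www/wdlinux/mysql-5.1.63/bin/mysql_config /usr/bin/mysql_config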
    
    
    tar -xjvf Twisted-15.2.1.tar.bz2
    cd Twisted-15.2.1
    python setup.py install
    cd ..
    tar -zxvf libxml2-git-snapshot.tar.gz
    # The snapshot unpacked to libxml2-2.9.2 here; adjust if your directory name differs
    cd libxml2-2.9.2/
    ./configure
    make
    make install
    cd ..
    tar -zxvf libxslt-1.1.28.tar.gz
    cd libxslt-1.1.28/
    ./configure
    make
    make install
    cd ..
    
    export LD_LIBRARY_PATH=/usr/local/lib
    
    tar -zxvf cryptography-0.4.tar.gz
    cd cryptography-0.4
    python setup.py build
    python setup.py install
    cd ..
    tar -zxvf libffi-3.2.1.tar.gz
    cd libffi-3.2.1
    ./configure
    make
    make install
    cd ..
    
    export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig:$PKG_CONFIG_PATH
    
    
    
    # On 64-bit systems, link the libffi libraries from lib64 into lib
    ln -s /usr/local/lib64/libffi.a /usr/local/lib/libffi.a
    ln -s /usr/local/lib64/libffi.so /usr/local/lib/libffi.so
    ln -s /usr/local/lib64/libffi.la /usr/local/lib/libffi.la
    ln -s /usr/local/lib64/libffi.so.6 /usr/local/lib/libffi.so.6
    
    pip install scrapy
    pip install scrapy-crawlera
    
    # Link the scrapy executable into /usr/bin so it is on the PATH
    ln -s /usr/local/python27/bin/scrapy /usr/bin/scrapy
    
    pip install service_identity
    
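    # Create the diunale project
    cd /home
    scrapy startproject diunale

    # Run the spider from inside the project directory
    # (a spider named DiunaleSpider must already be defined in the project)
    cd diunale
    scrapy crawl DiunaleSpider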
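
The manual `vi` edit of `/usr/bin/yum` earlier in the script can also be done non-interactively. The following is a minimal sketch, not part of the original post: the function name `pin_shebang` is hypothetical, and it assumes GNU `sed` (for `-i`) and that the first line of the target file is exactly `#!/usr/bin/python`.

```shell
#!/bin/sh
# pin_shebang FILE: rewrite a first line of exactly "#!/usr/bin/python"
# to "#!/usr/bin/python_old"; any other first line is left untouched,
# so running it twice is harmless.
# Hypothetical helper; uses GNU sed's in-place (-i) option.
pin_shebang() {
    sed -i '1s|^#!/usr/bin/python$|#!/usr/bin/python_old|' "$1"
}

# Usage (as root, after renaming the system interpreter to python_old):
#   pin_shebang /usr/bin/yum
```

Anchoring the pattern to line 1 and to the whole line keeps the substitution from touching any other occurrence of the path inside the script.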
    

     

    References

    http://scrapy.org/

    http://blog.pluskid.org/?p=381

    http://doc.scrapy.org/en/master/

    http://blog.csdn.net/slvher/article/details/42346887

    http://blog.csdn.net/niying/article/details/27103081

    http://www.cnblogs.com/xiaoruoen/archive/2013/02/27/2933854.html

    http://www.cnblogs.com/rwxwsblog/p/4557123.html

    http://www.cnblogs.com/rwxwsblog/p/4572367.html


