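When I was building a crawler for diunale.com a while back, I happened to be learning Python, so I searched online for open-source crawler material. After reading a number of comparisons of open-source crawlers, Scrapy looked like the easiest one to work with. I had only ever used PHP and did not know Python well, so I spent a day skimming the Python basics and then got started. Below is a brief record of how I installed and configured Scrapy.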
Environment: CentOS 6.6
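The first step is to install a Python runtime. The Python 2.6 that ships with the system is missing a lot, so it needs to be upgraded. Download all the packages in one go up front; they are used in the later steps.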
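Before upgrading, it can help to confirm which Python the system actually ships. The following is a minimal sketch, not part of the original steps: the helper name `python_status` is made up for illustration, and it only compares the major and minor version numbers against 2.7.

```shell
# python_status VERSION: print "upgrade" if VERSION is older than 2.7, else "ok".
# (Hypothetical helper, for illustration only.)
python_status() {
    major=${1%%.*}      # text before the first dot, e.g. "2"
    rest=${1#*.}        # text after the first dot, e.g. "6.6"
    minor=${rest%%.*}   # text before the next dot, e.g. "6"
    if [ "$major" -lt 2 ] || { [ "$major" -eq 2 ] && [ "$minor" -lt 7 ]; }; then
        echo upgrade
    else
        echo ok
    fi
}

# Python 2 prints "Python X.Y.Z" to stderr, so merge stderr into stdout.
if command -v python >/dev/null 2>&1; then
    current=$(python -V 2>&1 | awk '{print $2}')
    if [ -n "$current" ]; then
        echo "system python $current: $(python_status "$current")"
    fi
fi
```

On a stock CentOS 6.6 box this would be expected to report that an upgrade is needed, since the bundled interpreter is 2.6.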
# Preparation: update the system and install the build dependencies
yum -y update
yum -y install libffi-devel ntp make openssl openssl-devel pcre pcre-devel zlib zlib-devel gcc gcc-c++ libXpm libXpm-devel ncurses ncurses-devel libmcrypt libmcrypt-devel libxml2 libxml2-devel

# Download all source packages up front
wget https://www.python.org/ftp/python/2.7.10/Python-2.7.10.tgz
wget https://pypi.python.org/packages/source/s/setuptools/setuptools-18.0.1.tar.gz
wget https://pypi.python.org/packages/source/p/pip/pip-7.1.0.tar.gz
wget https://pypi.python.org/packages/source/T/Twisted/Twisted-15.2.1.tar.bz2
wget ftp://xmlsoft.org/libxml2/libxml2-git-snapshot.tar.gz
wget http://xmlsoft.org/sources/libxslt-1.1.28.tar.gz
wget https://pypi.python.org/packages/source/c/cryptography/cryptography-0.4.tar.gz
wget ftp://sourceware.org/pub/libffi/libffi-3.2.1.tar.gz

echo "Installing Python 2.7"
tar -zxvf Python-2.7.10.tgz
cd Python-2.7.10
mkdir /usr/local/python27
./configure --prefix=/usr/local/python27
make
make install
# Keep the system Python 2.6 as python_old and point python at 2.7
mv /usr/bin/python /usr/bin/python_old
ln -s /usr/local/python27/bin/python2.7 /usr/bin/python
cd ..

echo "Fixing yum, which stops working after Python 2.7 is installed"
vi /usr/bin/yum
# In the editor, change the first line from
#   #!/usr/bin/python
# to
#   #!/usr/bin/python_old

echo "Installing setuptools and pip"
tar -zvxf setuptools-18.0.1.tar.gz
cd setuptools-18.0.1
python setup.py install
cd ..
tar -zvxf pip-7.1.0.tar.gz
cd pip-7.1.0
python setup.py build
python setup.py install
cd ..
# If the pip command is not found, locate it and link it into /usr/bin:
# whereis pip
# ln -s /usr/local/python27/bin/pip /usr/bin/pip

# Then install database support
pip install mysql-python
# If installing mysql-python fails with "EnvironmentError: mysql_config not found":
# find / -name mysql_config
# ln -s /www/wdlinux/mysql-5.1.63/bin/mysql_config /usr/bin/mysql_config

tar -xjvf Twisted-15.2.1.tar.bz2
cd Twisted-15.2.1
python setup.py install
cd ..

tar -zxvf libxml2-git-snapshot.tar.gz
cd libxml2-2.9.2/
./configure
make
make install
cd ..

tar -zxvf libxslt-1.1.28.tar.gz
cd libxslt-1.1.28/
./configure
make
make install
cd ..

export LD_LIBRARY_PATH=/usr/local/lib

tar -zxvf cryptography-0.4.tar.gz
cd cryptography-0.4
python setup.py build
python setup.py install
cd ..

tar -zxvf libffi-3.2.1.tar.gz
cd libffi-3.2.1
./configure
make
make install
cd ..

export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig:$PKG_CONFIG_PATH
# On 64-bit systems the libffi libraries need to be linked into /usr/local/lib
ln -s /usr/local/lib64/libffi.a /usr/local/lib/libffi.a
ln -s /usr/local/lib64/libffi.so /usr/local/lib/libffi.so
ln -s /usr/local/lib64/libffi.la /usr/local/lib/libffi.la
ln -s /usr/local/lib64/libffi.so.6 /usr/local/lib/libffi.so.6

pip install scrapy
pip install scrapy-crawlera
# Link the scrapy command into /usr/bin so it is on PATH
ln -s /usr/local/python27/bin/scrapy /usr/bin/scrapy
pip install service_identity

# Create the project diunale
cd /home
scrapy startproject diunale
# scrapy crawl must be run from inside the project directory,
# after a spider named DiunaleSpider has been added to the project
cd diunale
scrapy crawl DiunaleSpider
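The manual `vi` edit of `/usr/bin/yum` in the steps above could also be scripted. The sketch below is one possible approach, assuming the first line of the file is exactly `#!/usr/bin/python`; the function name `repoint_shebang` is hypothetical, and `sed -i.bak` leaves a backup copy next to the edited file.

```shell
# repoint_shebang FILE: if the first line of FILE is exactly
# "#!/usr/bin/python", rewrite it to "#!/usr/bin/python_old".
# A backup is kept as FILE.bak. (Hypothetical helper, for illustration only.)
repoint_shebang() {
    # "1s" restricts the substitution to line 1; the trailing "$" keeps the
    # edit idempotent, so an already rewritten shebang is left untouched.
    sed -i.bak '1s|^#!/usr/bin/python$|#!/usr/bin/python_old|' "$1"
}

# Usage (as root), in place of the interactive vi step:
# repoint_shebang /usr/bin/yum
```

Running it twice should be harmless, because the pattern no longer matches once the shebang already points at `python_old`.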
References
http://scrapy.org/
http://blog.pluskid.org/?p=381
http://doc.scrapy.org/en/master/
http://blog.csdn.net/slvher/article/details/42346887
http://blog.csdn.net/niying/article/details/27103081
http://www.cnblogs.com/xiaoruoen/archive/2013/02/27/2933854.html
http://www.cnblogs.com/rwxwsblog/p/4557123.html
http://www.cnblogs.com/rwxwsblog/p/4572367.html