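The ultimate purpose of data analysis is to guide the relevant business teams toward improvement, and most companies invest considerable people and money in SEO (search engine optimization, i.e. organic search ranking). To do SEO scientifically, an SEOer needs at the very least to understand how each search engine's spider crawls the site (in practice, just the few mainstream engines in the target market). Being crawled does not guarantee that pages will be indexed well, in quantity, or quickly, nor that they will rank highly. But crawl data does let us spot problems promptly and adjust our optimization tactics.
Some people online have collected what the various Baidu spider IP ranges supposedly mean: crawling from certain ranges is said to indicate that the snapshot is about to be updated, while other ranges may signal a risk of being "K'd" (dropped from the index). These articles have been reposted so many times that I can't tell which is the original, so I won't link to one; you can Google for them. There seem to be similar summary articles for Google as well.
Now for how to actually do this. Here is a preview of the final real-time report. If all goes well, once the code is installed and configured you will immediately see: crazy spiders (see the figure below). Indeed, for most large sites, spiders may consume more server bandwidth than normal traffic, which is why some companies block irrelevant spiders or set a crawl delay to limit how often they crawl.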
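The core code in this post comes from here. Its author previously spent four years at Cardinal Path and also wrote the earlier GA version of the code (my 2011 translation of that article is here).
First, download the file package and unzip it into a directory on your site.
Second, create a new property (a new account also works; do not reuse an existing profile/view, and the tracking code must be different so that bot data does not mix with and distort your normal site analytics). Note the property ID; you will need it below. Since spider crawls can amount to several times your normal pageviews, large sites should use a separate account to avoid exceeding data limits (GA limits each account to at most one million hits per month).
Third, copy the code from sample.php into header.php of your common template (the location may differ depending on your site's software) and edit the configuration parameters as instructed (the two values under //Configuration). For example: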
<?php
/***UA For Search Bots****/
//Configuration
$UA_SB_ACCOUNT_ID = "UA-12345678-2"; //Replace with the UA Web Property ID.
$UA_SB_PATH = "api/ua-searchbots/ua-searchbots.php"; //location of the UA for Search Bots script
//Do not edit below this line
//Execute the UA for Search Bots script
include($UA_SB_PATH);
/*****UA For Search Bots********/
?>
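Finally, if you now open the real-time report of the property you just created and see data, the setup is working. Depending on the site you may need to wait a little; the effect is usually more visible at night, when most spiders are especially active.
In addition, to make the data easier to view, we can create a custom report. A basic template is provided here; you can use it as is or adapt it slightly.
The following is for technical readers only; for most people, completing the steps above is enough:
If you see many bots reported as unknown, you probably need to extend botconfig.php, for example by adding 360's search spider, spiders of search engines specific to other regions, or perhaps some badly behaved spiders. Check the server access logs for details, and simply block unwanted junk spiders so they don't waste resources. This involves regular expressions, which come up in many other places too; if you want to get more out of Google Analytics, regular expressions are required learning.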
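To give a sense of what a server-side script like this has to do on each bot request, here is a minimal sketch that builds a Universal Analytics Measurement Protocol pageview and, optionally, posts it. The function names `buildBotHit` and `sendHit`, the host and path values, and the choice to store the bot name in the campaign source (`cs`) field are all assumptions for illustration; the downloaded ua-searchbots.php may record things differently.

```php
<?php
// Sketch only: build and (optionally) send one Measurement Protocol pageview for a bot visit.

/**
 * Build the form-encoded body of a pageview hit.
 * Putting the bot name in "cs"/"cm" is an assumption made for this example,
 * not necessarily what ua-searchbots.php does.
 */
function buildBotHit(string $propertyId, string $botName, string $host, string $path): string
{
    return http_build_query([
        'v'   => '1',            // protocol version
        'tid' => $propertyId,    // property ID, e.g. UA-12345678-2
        'cid' => md5($botName),  // anonymous client ID, one per bot (assumption)
        't'   => 'pageview',     // hit type
        'dh'  => $host,          // document host
        'dp'  => $path,          // document path
        'cs'  => $botName,       // campaign source (assumption)
        'cm'  => 'bot',          // campaign medium (assumption)
    ]);
}

/** POST the payload to the collection endpoint; returns false on failure. */
function sendHit(string $payload): bool
{
    $context = stream_context_create([
        'http' => [
            'method'  => 'POST',
            'header'  => "Content-Type: application/x-www-form-urlencoded\r\n",
            'content' => $payload,
            'timeout' => 2, // keep a slow request from stalling page rendering
        ],
    ]);
    $result = @file_get_contents('https://www.google-analytics.com/collect', false, $context);
    return $result !== false;
}

$payload = buildBotHit('UA-12345678-2', 'Baidu', 'example.com', '/products/');
echo $payload, "\n";
// sendHit($payload); // uncomment to actually send the hit
```

Because the hit is sent from the server, it is recorded even though spiders generally do not execute the JavaScript tracking snippet, which is why this data never shows up in a normal GA property.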
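As a rough illustration of how such a pattern list can be applied, the sketch below checks a User-Agent string against each pattern in turn and falls back to 'unknown'. The function name `classifyBot`, the `~` delimiter, the case-insensitive flag, and the first-match-wins loop are assumptions; the real matching logic lives in ua-searchbots.php and may differ.

```php
<?php
// Sketch only: map a User-Agent string to a bot name using pattern => name pairs.

/**
 * Return the name of the first pattern that matches, or 'unknown'.
 * '~' is used as the regex delimiter so the many '/' characters inside
 * the patterns do not need escaping (an assumption of this example).
 */
function classifyBot(string $userAgent, array $bots): string
{
    foreach ($bots as $pattern => $name) {
        $regex = '~' . str_replace('~', '\~', (string) $pattern) . '~i';
        // @ suppresses warnings from malformed patterns; they simply never match.
        if (@preg_match($regex, $userAgent) === 1) {
            return $name;
        }
    }
    return 'unknown';
}

// A tiny excerpt in the same shape as the list in botconfig.php.
$bots = [
    '360Spider-Image' => '360-Image',
    '360Spider'       => '360',
    'Baiduspider'     => 'Baidu',
    'compatible; Googlebot/([0-9.]{1,10})?' => 'Google',
];

$ua = 'Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)';
echo classifyBot($ua, $bots), "\n"; // prints "Baidu"
```

With a first-match loop like this, order matters: a specific pattern such as `360Spider-Image` has to come before the general `360Spider`, otherwise the general one always wins.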
// Search bot patterns and names
$bots = array(
    'Mediapartners-Google[ /]([0-9.]{1,10})' => 'Google Mediapartners',
    'Mediapartners-Google' => 'Google Mediapartners',
    'Googl(e|ebot)(-Image)/([0-9.]{1,10})' => 'Google Image',
    'Googl(e|ebot)(-Image)/' => 'Google Image',
    '^gsa-crawler' => 'Google',
    'Googl(e|ebot)(-Sitemaps)/([0-9.]{1,10})?' => 'Google-Sitemaps',
    'GSiteCrawler[ /v]([0-9.a-z]{1,10})?' => 'Google-Sitemaps',
    'Googl(e|ebot)(-Sitemaps)' => 'Google-Sitemaps',
    'Googl(e|ebot)(-Mobile)/([0-9.]{1,10})?' => 'Google-Mobile',
    '^AdsBot-Google' => 'Google-AdsBot',
    '^Feedfetcher-Google' => 'Google-Feedfetcher',
    'compatible; Google Desktop' => 'Google Desktop',
    'compatible; Googlebot/([0-9.]{1,10})?' => 'Google',
    '1Noonbot[/ ]([0-9.]{1,10})' => '1noon',
    '^Yeti$' => '1noon',
    '123spider-Bot \(Version: ([0-9.]{1,10})' => '123Spider',
    '192.comAgent' => '192.com',
    '2dehands.nl' => '2deHands',
    '^A1 Sitemap Generator[ /]([0-9.]{1,10})' => 'A1 Sitemap',
    'miggibot[ /]([0-9.]{1,10})' => 'A1 Sitemap',
    'www.a2b.cc' => 'A2B',
    '^ABACHOBot' => 'Abacho',
    '^ABCdatos BotLink[ /]([0-9.]{1,10})' => 'ABCdatos',
    '^abot[ /]([0-9.]{1,10})' => 'aBot',
    'Libby[/ ]([0-9.]{1,10})' => 'About',
    'About[/ ]([0-9.]{1,10})libwww-perl' => 'About',
    'www.ackerm.com' => 'Ackerm',
    '^AcoiRobot' => 'AcoiRobot',
    'Acoon[ -]?Robot' => 'Acoon',
    'Accoona-AI-Agent[/ ]([0-9.]{1,10})' => 'Accoona',
    '^accoona' => 'Accoona',
    '^Acme.Spider' => 'Acme',
    'ActiveBookmark[/ ]([0-9.]{1,10})' => 'ActiveBookmark',
    'Ad Muncher[/ v]([0-9.]{1,10})' => 'Ad Muncher',
    '^AESOP_com_SpiderMan' => 'Aesop',
    '^agadine[/ ]([0-9.]{1,10})' => 'Agada',
    'AIBOT[/ ]([0-9.]{1,10})' => 'Aibot',
    'aipbot[/ ]([0-9.]{1,10})' => 'Aipbot',
    'Aleksika Spider[/ ]([0-9.]{1,10})' => 'Aleksika',
    'ipd[ /]([0-9.]{1,10}).Alertsite.com' => 'AlertSite',
    '^ia_archive' => 'Alexa',
    'www.almaden.ibm.com/cs/crawler' => 'IBM Crawler',
    'Scooter[ /-]*[a-z]*([0-9.]{1,10})' => 'Altavista',
    'AltaVista V([0-9.]{1,10})' => 'Altavista',
    'AltaVista Intranet V([0-9.]{1,10})' => 'Altavista',
    '^(aranhabot|amzn_assoc)' => 'Amazon',
    '^NutchEC2Test' => 'Amazon',
    '^amibot' => 'Amidalla',
    'Amfibibot[/ ]([0-9.]{1,10})' => 'Amfibi',
    'Amfibibot' => 'Amfibi',
    'AmphetaDesk[/ ]([0-9.]{1,10})' => 'AmphetaDesk',
    'amphetameme[ -]?crawler' => 'Amphetameme',
    '^AnnoMille( spider)?[/ ]([0-9.]{1,10})' => 'AnnoMille',
    'AnsearchBot[/ ]([0-9.]{1,10})' => 'Ansearch',
    'AnswerChase( PROve)?[/ ]([0-9.]{1,10})' => 'AnswerChase',
    'antibot-V([0-9.]{1,10})' => 'antibot',
    '^AONDE-Spider' => 'Aonde',
    '^A-Online Search' => 'A-Online.at',
    '^AOLserver-Tcl[/ ]([0-9.]{1,10})' => 'AOLserver',
    '^AOLserver' => 'AOLserver',
    'ApacheBench[ /]([0-9.]{1,10})' => 'ApacheBench',
    '^BebopBot[ /]([0-9.]{1,10})' => 'Passion 4 Jazz',
    '^Apexoo Spider ([0-9.]{1,10})' => 'Apexoo',
    '^Aport' => 'Aport',
    'appie[ /]([0-9.]{1,10})' => 'Walhello',
    'compatible; Arachmo' => 'Arachmo',
    '^X-Crawler' => 'Arexera',
    '^TECOMAC-Crawler[ /]([0-9.]{1,10})' => 'Arexera',
    '^www.arianna.it' => 'Arianna',
    '^ArtfaceBot' => 'Artface',
    'Sleek Spider[ /]([0-9.]{1,10})' => 'Any Search Info',
    'Ask[ -]?Jeeves' => 'Ask Jeeves',
    'teomaagent' => 'Ask Jeeves',
    '^AskAboutOil[ /]([0-9.]{1,10})' => 'ASPseek',
    '^asked[ /]Nutch[ -]([0-9.]{1,10})' => 'askEd!',
    '^ASPseek[/ ]([0-9.]{1,10})' => 'ASPseek',
    'AtlocalBot[/ ]([0-9.]{1,10})' => 'At Local',
    'Atomz[/ ]([0-9.]{1,10})' => 'Atomz',
    '^axel' => 'Axel',
    'AxmoRobot' => 'Axmo',
    'answerbus' => 'AnswerBus',
    'AutoMapIt[ /](Bot)?' => 'AutoMapIt',
    'augurnfind[/ ][v-]([0-9.]{1,10})' => 'Augurnfind',
    'Awasu[/ ]([0-9a-z.]{1,10})' => 'Awasu',
    'BACS http://www.ba.be' => 'ba.be',
    'Baiduspider' => 'Baidu',
    // 360 entries: the more specific patterns come first, as elsewhere in this list
    '360Spider-Image' => '360-Image',
    '360Spider-Video' => '360-Video',
    '360Spider' => '360',
    'www.thebananatree.org' => 'BananaTree',
    'bdcindexer([0-9a-z.]{1,10})' => 'bdcindexer',
    '^BDFetch' => 'BDFetch',
    'BDNcentral Crawler v([0-9.]{1,10})' => 'Bdncentral',
    '^BeamMachine[ /]([0-9.]{1,10})' => 'BeamMachine',
    'Become(JP)?Bot[/ ]([0-9.]{1,10})' => 'Become',
    '(BecomeBot|Exabot)@exava.com\)$' => 'Become',
    'BeebwareDirectory[/ ]v?([0-9.]{1,10})' => 'Beebware',
    '^Big Brother' => 'Big Brother',
    '^BigCliqueBOT[/ ]([0-9.]{1,10})' => 'BigClique',
    '^BIGLOTRON' => 'Biglotron',
    'Bigsearch.ca[/ ]Nutch[- ]([0-9.]{1,10})' => 'Bigsearch',
    'Bilbo[ /]([0-9.]{1,10})' => 'Bilbo',
    'Bilgi(Beta)?Bot[ /]([0-9.]{1,10})' => 'Bilgi',
    'compatible; bingbot/([0-9.]{1,10})?' => 'Bing',
    'Bitacle (ro)?bot[ (/V:]+([0-9.]{1,10})' => 'Bitacle',
    'BitBeamer/([0-9.]{1,10})' => 'BitBeamer',
    '^Biz360 spider' => 'Biz360',
    'Blaiz-Bee[ /]([0-9.]{1,10})' => 'Blaiz-Bee',
    'Naamah[ /]([0-9.a-z]{1,10})[ /]Blogbot' => 'blogbot.de',
    '^Blogcensus' => 'Blogcensus',
    'Blogdex[ /]([0-9.]{1,10})' => 'Blogdex',
    '^blogg.de' => 'Blogg',
    'BlogLand[/ ]([0-9.]{1,10})' => 'BlogLand',
    'Bloglines[ /]([0-9.]{1,10})' => 'Bloglines',
    'Bloglines' => 'Bloglines',
    'blogmap' => 'Blogmap',
    'Blogosphere' => 'Blogosphere',
    'BlogPeople' => 'BlogPeople',
    'Blogpulse' => 'Blogpulse',
    '^BlogRanking(/RSS checker)?' => 'BlogRanking',
    'blo.gs[ /]([0-9.]{1,10})' => 'Blo.gs',
    'blo.gs' => 'Blo.gs',
    'BlogShares[ /]V?([0-9.]{1,10})' => 'BlogShares',
    '(^| |\()Blogshares(.com| |\))' => 'BlogShares',
    'Blogslive' => 'BlogsLife',
    'blogsnowbot' => 'BlogsNow',
    'BlogsNow' => 'BlogsNow',
    '^BlogStreetBot' => 'BlogStreet',
    'nomadscafe_ra[/ ]([0-9.]{1,10})' => 'BlogSurf',
    'BlogTickServer' => 'BlogTick',
    'blogWatcher_Spider[/ ]([0-9.]{1,10})' => 'Blogwatcher',
    'Blogwise.com(-MetaChecker)?[/ ]([0-9.]{1,10})' => 'Blogwise',
    'BoardReader[ -](Image|Favicon)[ -]Fetcher[ /]+([0-9.]{1,10})' => 'BoardReader',
    'bobby[ /]([0-9.]{1,10})' => 'Bobby',
    'Boitho.com[ -](dc|robot)?[/ ]([0-9.]{1,10})' => 'Boitho',
    '^booch[ /]([0-9.]{1,10})' => 'Booch',
    'http://www.bookmark.ne.jp' => 'Bookmark',
    '^Bookdog[ /]([0-9.]{1,10})' => 'Bookdog',
    'BorderManager[ /]([0-9.]{1,10})' => 'Border Manager',
    'BottomFeeder[ /]([0-9.]{1,10})' => 'BottomFeeder',
    'BrowserEmulator[ /]([0-9.]{1,10})' => 'BrowserEmulator',
    'Browsershots URL Check' => 'Browsershots',
    'BrowserSpy' => 'BrowserSpy',
    'BruinBot' => 'BruinBot',
    '^Bruno' => 'Bruno',
    'BTbot/([0-9.]{1,10})' => 'BitTorrent',
    'Bulkfeeds[/ ]([a-z0-9.]{1,10})' => 'Bulkfeeds',
    '^Norbert the Spider' => 'Burf.com',
    'Butch(| )?([a-z0-9.]{1,10})' => 'Butch',
    '^Camcrawler' => 'Camdiscover',
    '^CazoodleBot/(Nutch|CazoodleBot)[/ -]([0-9.]{1,10})' => 'Cazoodle',
    'CCGCrawl[/ ]([0-9.]{1,10})' => 'CCGCrawl',
    '^Cerberian Drtrs' => 'Cerberian Drtrs',
    '^CFNetwork[/ ]([0-9.]{1,10})' => 'Cerberian Drtrs',
    'Charlotte[/ ]([0-9a-z.]{1,10})' => 'Charlotte',
    'Cirilizator[/ ]([0-9.]{1,10})' => 'Cirilizator',
    '(Claria|Diamond)(Bot)?[ /]([0-9.]{1,10})' => 'Claria',
    '(Claria|Diamond)(Bot)' => 'Claria',
    'claymont.com' => 'Claymont',
    'OliverPerry' => 'Claymont',
    'Clus(tered-Search-|h)Bot[ /]([0-9.]{1,10})' => 'Clush',
    ' (QXW03018|obot)\)' => 'Cobion',
    '^coldfusion' => 'ColdFusion',
    'Combine[ /]([0-9.]{1,10})' => 'Combine',
    '^comBot[ /]([0-9.]{1,10})' => 'comBot',
    'cometsearch@cometsystems' => 'Comet',
    'Commerobo[/ ]([0-9.]{1,10})' => 'Commerobo',
    'Comrite[/ ]([0-9.]{1,10})' => 'ComRite',
    'Convera(MultiMedia)?Crawler[/ ]([0-9.]{1,10})' => 'Convera',
    'Convera Internet Spider V([0-9.]{1,10})' => 'Convera',
    '^CoolBot' => 'CoolBot',
    '^(voyager|cfetch|CosmixCrawler|carleson)[/ ]([0-9.]{1,10})' => 'Cosmix',
    '^cosmos' => 'Cosmos',
    '^beautybot[/ ]([0-9.]{1,10})' => 'Cosmoty',
    'CreativeCommons[/ ]([0-9.]{1,6}(-dev)?)' => 'Creative Commons',
    'CsCrawler' => 'CsCrawler',
    'CSS(Check|Validator)' => 'CSSCheck',
    'Custo[ /]([0-9.]{1,10})' => 'Custo',
    'CyberNavi_WebGet[ /]([0-9.]{1,10})' => 'CyberNavi',
    'Cyberz Communication Agent' => 'Cyberz',
    'CydralSpider[ /]([0-9.]{1,10})' => 'Cydral',
    'Cynthia[ /]([0-9.]{1,10})' => 'Cynthia Says',
    'Downloader for X[ /]([0-9.]{1,10})' => 'Downloader for X',
    '^DA[ /]([0-9.]{1,10})' => 'DA',
    'DAUMOA[ /]([0-9.]{1,10})' => 'DAUM',
    'DAUM Web Robot' => 'DAUM',
    'Daum Communications Corp' => 'DAUM',
    'EDI[ /]([0-9.]{1,10})' => 'DAUM',
    'Edacious.*Intelligent Web Robot' => 'DAUM',
    'RaBot[/ ]([0-9.]{1,10}) Agent' => 'DAUM',
    'daypopbot[/ ]([0-9.]{1,10})' => 'Daypop',
    'crawl at delfi dot lt' => 'Delfi',
    'DepSpid[/ ]([0-9.]{1,10})' => 'DepSpid',
    'DEVONtech' => 'DEVONagent',
    ' Diffbot' => 'Diffbot',
    'EZResult -- Internet Search Engine' => 'Direct Hit',
    'disco/Nutch[/ -]([0-9.]{1,10})' => 'disco',
    'disco-crawl@discoveryengine.com' => 'disco',
    'compatible; discobot/([0-9.]{1,10})' => 'disco',
    'DISCo Pump[/ ]([0-9.]{1,10})' => 'DISCo Pump',
    'DNS-Digger-Explorer[ /]([0-9.]{1,10})' => 'DNS-Digger',
    'Doctor[ -]?HTML' => 'DoctorHTML',
    'DomaindateiSpider[ /]([0-9.]{1,10})' => 'Domaindatei',
    '^www.doweb.co.uk' => 'DoWeb',
    'Download Ninja[ /]([0-9.]{1,10})' => 'Download Ninja',
    '^Drupal' => 'Drupal',
    '^DSNS' => 'DSNS Scanner',
    'DTS Agent' => 'DTS Agent',
    'EARTHCOM.info[/ ]([0-9a-z.]{1,10})' => 'Earthcom',
    'eBay Relevance Ad Crawler' => 'eBay',
    'TrueRobot[/ ]([0-9.]{1,10})' => 'Echo.com',
    'eert spdr[/ ]([0-9.]{1,10})' => 'eert',
    'eknip[ /]([0-9a-z.]{1,10})' => 'E-Knip',
    'NextGenSearchBot[/ ]([0-9.]{1,10})' => 'Eliyon',
    '^EmeraldShield' => 'EmeraldShield',
    'DigExt; empas\)$' => 'Empas',
    '^EMPAS[-]ROBOT' => 'Empas',
    'Speedy[ ]?Spider' => 'Entireweb',
    'envolk\[ITS\]spider[/ ]([0-9.]{1,10})' => 'Envolk',
    'envolk[/ ]([0-9.]{1,10})' => 'Envolk',
    'ES.NET Crawler[ /]([0-9.]{1,10})' => 'ES.NET',
    'eStyleSearch[ /]([0-9.]{1,10})' => 'eStyle Search',
    'EuripBot[ /]([0-9.]{1,10})' => 'Eurip',
    'www.euro-directory.com' => 'Euro Directory',
    'Arachnoidea' => 'EuroSeek',
    '^EvaalSE' => 'Evaal',
    '^eventax[ /]([0-9.]{1,10})' => 'Eventax',
    'EverbeeCrawler' => 'Everbee',
    'Everest-Vulcan Inc.[ /]([0-9.]{1,10})' => 'Everest',
    '^NG[ /]([0-9.]{1,10})' => 'ExaBot',
    'Exabot/([0-9.]{1,10})' => 'ExaBot',
    'ExaBotTest/([0-9.]{1,10})' => 'ExaBot',
    'ExaBot-(Test|Images)/([0-9.]{1,10})' => 'ExaBot',
    '^exactseek[ -]?(pagereaper|crawler)[ -]?([0-9.]{1,10})' => 'ExactSeek',
    'ExactSeek[ .-]?(Crawler|com)' => 'ExactSeek',
    'Architext[ -]?Spider' => 'Excite',
    'Execrawl[ /]([0-9.]{1,10})' => 'Execrawl',
    'Execrawl' => 'Execrawl',
    '^NetMonitor[ /]([0-9.]{1,10})' => 'ExpertMonitor',
    '^Windows-RSS-Platform[ /]([0-9.]{1,10})' => 'Explorer RSS',
    'FacebookFeedParser[/ ]([0-9a-z.-]{1,10})' => 'Facebook',
    '^FAST( Enterprise |-Web| MetaWeb )?Crawler[ /]([0-9.]{1,10})' => 'Fast',
    '^FAST( Enterprise |-Web| MetaWeb | PartnerSite )?Crawler' => 'Fast',
    '^Fast Crawler' => 'Fast',
    '^libwww-perl[ /]([0-9.]{1,10}) FP[ /]([0-9.]{1,10})' => 'Fast',
    '^fastbuzz.com' => 'Fastbuzz',
    '^FavOrg' => 'FavOrg',
    'favorstarbot[ /]([0-9.]{1,10})' => 'favorstar',
    '^Faxobot[ /]([0-9.]{1,10})' => 'Faxo',
    'FDSE[ -]?robot' => 'FDSE Robot',
    'FeedBack[/ ]([0-9.]{1,10})' => 'FeedBack',
    '^FeedBurner[/ ]([0-9.]{1,10})' => 'FeedBurner',
    'FeedDemon[/ ]([0-9.]{1,10})' => 'FeedDemon',
    'Feed::Find[ /]([0-9.]{1,10})' => 'FeedFind',
    'FeedOnFeeds[/ ]([0-9.]{1,10})' => 'Feed On Feeds',
    'UniversalFeedParser[/ ]([0-9a-z.-]{1,10})' => 'Feedparser',
    'FeedParser' => 'Feedparser',
    '^Feedreader' => 'Feedreader',
    'FeedServer[/ ]([0-9.]{1,10})' => 'FeedServer',
    'Feedster Crawler[/ ]([0-9.]{1,10})' => 'Feedster',
    '^FeedValidator[/ ]([0-9.]{1,10})' => 'Feed Validator',
    '^FDM[/ ]([0-9a-z.]{1,10})' => 'Free Download Manager',
    'Filangy[/ ]([0-9.]{1,10})' => 'Filangy',
    'FindAnISP' => 'FindAnISP',
    'FindEngines! Bot' => 'FindEngines',
    'Findexa Crawler' => 'Findexa',
    'findlinks[ /]([0-9.]{1,10})' => 'FindLinks',
    '^FindLinks' => 'FindLinks',
    '^findoor(-Bot)?' => 'findoor',
    'Firefly' => 'Firefly',
    '^FlashGet' => 'FlashGet',
    'FlickBot[ /]([0-9.]{1,10})' => 'FlickBot',
    '^Forex Trading Network Organization' => 'Forex',
    'fmII URL validator[ /]([0-9.]{1,10})' => 'freshmeat',
    'freshmeat.net URL validator[ /]([0-9.]{1,10})' => 'freshmeat',
    'www.friend.fr' => 'Friend',
    'Frontier[ /]([0-9.]{1,10})' => 'Frontier',
    'Gaisbot[ /]([0-9.]{1,10})' => 'Gaisbot',
    'GalaxyBot[ /]([0-9.]{1,10})' => 'Galaxy',
    'www.galaxy.com' => 'Galaxy',
    'GameSpyHTTP[ /]([0-9.]{1,10})' => 'GameSpy',
    'Genome[ -]?Machine' => 'Genome Machine',
    'GeonaBot[ /]([0-9.]{1,10})' => 'Geona',
    'The World as a Blog' => 'The World as a Blog',
    'geourl[ /]([0-9.]{1,10})' => 'GeoUrl',
    '^GeoURLBot[ /]([0-9.]{1,10})' => 'GeoUrl',
    ' Crayon Crawler' => 'GetNetWise',
    'GetRight[ /]([0-9.]{1,10})' => 'GetRight',
    'GetSmart[ /]([0-9.]{1,10})' => 'GetSmart',
    '(Gigabot|Sitesearch)[/ ]([0-9.]{1,10})' => 'Gigablast',
    'GigabotSiteSearch[/ ]([0-9.]{1,10})' => 'Gigablast',
    'Girafabot' => 'Girafa',
    'Ocelli[ /]([0-9.]{1,10})' => 'GlobalSpec',
    'glucose[ /]([0-9a-z.-]{1,10})' => 'Glucose',
    '^GoForIt.com' => 'GoForIt',
    '^GOFORITBOT' => 'GoForIt',
    '^GoGuidesBot[ /]([0-9.]{1,10})' => 'GoGuides',
    '(gazz|ichiro|mog(et|imogi))[ /]([0-9.]{1,10})' => 'Goo',
    'DoCoMo[ /]([0-9.]{1,10})' => 'Goo',
    '^Big Fish[ /]v?([0-9.]{1,10})' => 'GoonGee',
    '^GPostbot' => 'GPost',
    '^Gregarius[/ ]([0-9.]{1,10})' => 'Gregarius',
    'grub[ -]?client[ /-]{1,5}([0-9.]{1,10})' => 'Grub',
    'grub crawler' => 'Grub',
    'Gulliver' => 'Gulliver',
    '^GurujiBot[/ ]([0-9.]{1,10})' => 'Guruji',
    '^Gush[/ ]([0-9.]{1,10})' => 'Gush',
    'g(id)?zip[ -]?test(er)?' => 'Gzip Tester',
    '^Hanzoweb' => 'Hanzoweb',
    '^Harbot GateStation' => 'Harbot',
    'Hatena (Antenna|Bookmark|Pagetitle Agent)[ /]([0-9.]{1,10})' => 'Hatena',
    '^helix[ /]([0-9.]{1,10})' => 'Heritrix',
    'heritrix[ /]([0-9.]{1,10})' => 'Heritrix',
    'archive.org_bot' => 'Heritrix',
    'InternetArchive[ /]([0-9.a-z]{1,10})' => 'Heritrix',
    'HiddenMarket[ /-]([0-9.]{1,10})' => 'HiddenMarket',
    'Honda-Search[ /]([0-9.]{1,10})' => 'Honda',
    'HooWWWer[ /]([0-9.]{1,10})' => 'HooWWWer',
    'Hotzonu[ /]([0-9.]{1,10})' => 'Hotzonu',
    'HouxouCrawler[ /]Nutch.([0-9.]{1,10})' => 'Houxou',
    'HouxouCrawler' => 'Houxou',
    'htdig[ /]([0-9.]{1,10})' => 'ht://Dig',
    '^HTML2JPG' => 'HTML2JPG',
    'httperf[ /]([0-9.]{1,10})' => 'HTTPerf',
    'httpunit[ /]([0-9.]{1,10})' => 'HttpUnit',
    'HTTrack[ /]([0-9.]{1,10})' => 'HTTrack',
    'HuRob[ /]([0-9.]{1,10})' => 'Hungary',
    'iaskspider[ /]([0-9.]{1,10})' => 'IAsk',
    '^iaskspider' => 'IAsk',
    '^ICC-Crawler' => 'ICC-Crawler',
    'BlogzIce[ /]([0-9.]{1,10})' => 'Icerocket',
    'BlogSearch[ /]([0-9.]{1,10})' => 'Icerocket',
    '^ICRASEMantic_spider[ /]([0-9.]{1,10})' => 'ICRA',
    '^Mozilla[/ ]([0-9.]{1,10})[/ ]\(compatible[ ;]ICS' => 'Novell iChain Cool Solutions caching',
    'Comaneci_bot[ /]([0-9.]{1,10})' => 'I know',
    'ilial[ /]Nutch[ -]([0-9.]{1,10})' => 'Ilial',
    'I(NGRID|lseRobot|lseBot)[ /]([0-9.]{1,10})' => 'Ilse',
    'iltrovatore-setaccio[ /]([0-9.]{1,10})' => 'IlTrovatore',
    'Iltrovatore-Setaccio' => 'IlTrovatore',
    'iltrovatore[ /]([0-9.]{1,10})' => 'IlTrovatore',
    'Indy[ -]?Library' => 'Indy Library',
    'InelaBot[ /]([0-9.]{1,10})' => 'Inela',
    'InetURL.?[ /]([0-9.]{1,10})' => 'InetURL',
    'InfoArt crawler' => 'InfoArt',
    '^DataFountains/DMOZ' => 'INFOMINE',
    '^INFOMINE[ /]([0-9.]{1,10})' => 'INFOMINE',
    'SideWinder[ /]?([0-9a-z.]{1,10})' => 'Infoseek',
    'Infoseek' => 'Infoseek',
    'slurp@inktomi.com' => 'Inktomi',
    '^InnerpriseBot[ /]([0-9.]{1,10})' => 'Innerprise',
    'URL[ _]Spider[ _]Pro[ /]([0-9.+]{1,10})' => 'Innerprise',
    '^ES[ .]NET[ ]Crawler[ /]([0-9.]{1,10})' => 'Innerprise',
    '^xyro' => 'Inria',
    '^Insitor(,|.|naut)' => 'Insitor',
    '^Internet Ninja[ /]([0-9.]{1,10})' => 'Internet Ninja',
    '^InternetSeer.com' => 'InternetSeer',
    'Interseek.com' => 'Interseek',
    'IntraVnews[ /]([0-9.]{1,10})' => 'IntraVnews',
    '^IP2(Map|Location)Bot[ /]([0-9.]{1,10})' => 'IP2LocationBot',
    '^IP\*Works! V([0-9.]{1,10})' => 'IPWorks',
    '^ICRA_(label_generator|Semantic_spider)[ /]([0-9.]{1,10})' => 'Novell iChain Cool Solutions caching',
    'Irvine[ /]([0-9.]{1,10})' => 'Irvine',
    'ips-agent' => 'ips-agent',
    'ISSpider[ /-]([0-9.]{1,10})' => 'ISSpider',
    'iVia Site Checker.?[ /]([0-9.]{1,10})' => 'iVia',
    'Jetbot[ /]([0-9.]{1,10})' => 'Jeteye',
    'Jigsaw[ /]([0-9.]{1,10})' => 'Jigsaw',
    'www.jobs.de' => 'jobs.de',
    'jobs.de-Robot' => 'jobs.de',
    'JPluck[ /]([0-9a-z.]{1,10})' => 'Jpluck',
    'falcon[ /]([0-9.]{1,10})' => 'Jxta',
    'jyte_fetcher[ /]([0-9.]{1,10})' => 'Jyte',
    'Jyxobot[ /]([0-9.]{1,10})' => 'Jyxo',
    'EasyDL[ /]([0-9.]{1,10})' => 'Keywen',
    'kinjabot[ /]([0-9.]{1,10})' => 'Kinja',
    '^kinjabot' => 'Kinja',
    'lachesis' => 'Lachesis',
    'lanshanbot[/ ]([0-9.]{1,10})' => 'Lachesis',
    'LapozzBot[/ ]?([0-9.]{1,10})' => 'Lapozz',
    'larbin[/ ]?([0-9.]{1,10})' => 'Larbin',
    '^IPiumBot' => 'Laurion',
    '^LeechGet[ /]([0-9.]{1,10})' => 'LeechGet',
    'Linkguard Online[ /]([0-9.]{1,10})' => 'Linkguard',
    '(compatible; Linkman)' => 'Linkman',
    'checklink[ /]([0-9.]{1,10})' => 'Linkcheck',
    'Link[ -]?(Chec(k|ker)|Val(et|idator))' => 'Linkcheck',
    'Adaxas Spider' => 'Linkcheck',
    'Agent-SharewarePlazaFileCheckBot[ /]([0-9.]{1,10})' => 'Linkcheck',
    'NetMechanic V([0-9.]{1,10})' => 'Linkcheck',
    '^InfoLink' => 'Linkcheck',
    'InternetLinkAgent' => 'Linkcheck',
    '; SPENG\)' => 'Linkcheck',
    'SharewarePlazaFileCheckBot' => 'Linkcheck',
    'fileboost.net' => 'Linkcheck',
    '^billbot' => 'Linkcheck',
    '^Link.RU bot' => 'Link.RU',
    'links sql' => 'Links SQL',
    'LinkSweeper[ /]([0-9.]{1,10})' => 'Link Sweeper',
    '^LinkWalker' => 'Link Walker',
    '^Livedoor( SF( - California Crawl)?|Checkers)[ /]' => 'Livedoor',
    '^LiveJournal.com' => 'Live Journal',
    'LjSEEK Picture-Bot[ /]+([0-9.]{1,10})' => 'ljpic',
    '^lmspider' => 'Lmspider',
    '^FiNDoBot[/ ]([0-9a-z.]{1,10})' => 'Locaters',
    'www.look.com' => 'Look',
    'Lookbot' => 'Look',
    '^Martini' => 'LookSmart',
    '^MantraAgent' => 'LookSmart',
    'FurlBot' => 'LookSmart',
    'looksmart-sv-fw' => 'LookSmart',
    'NetResearchServer[ /]([0-9.]{1,10})' => 'LOOP',
    'Lotkyll' => 'Lotkyll',
    'lwp(-trivial|::simple)[ /]([0-9.]{1,10})' => 'lwp',
    'Lycos_Spider_' => 'Lycos',
    'MagpieRSS' => 'MagpieRSS',
    'Mail[ -]?Sweeper' => 'Mail Sweeper',
    '^Marvin' => 'Marvin',
    'Mosad[ /]([0-9.]{1,10})' => 'Mat\'Kurja',
    'Mavicanet robot' => 'Mavicanet',
    '^libwww[ /]([0-9.]{1,10})' => 'Mediater',
    'Mercator' => 'Mercator',
    '^RRC \(crawler_admin@bigfoot.com\)' => 'Metacarta',
    '^flunky' => 'Metacarta',
    '^Mozilla.*\(samualt9@bigfoot.com\)$' => 'Metacarta',
    'MetaGer' => 'MetaGer',
    '^XRL[ /]([0-9.a-z]{1,10})' => 'Metamark',
    'MediBot[ /]([0-9.]{1,10})' => 'MetaMedic',
    'Mirago' => 'Mirago',
    'AlgoFeedback@miva.com' => 'Miva',
    'Mj12bot[ /]v?([0-9.]{1,10})' => 'Majestic-12',
    'MJ12bot \(mini\)[ /]([0-9.]{1,10})' => 'Majestic-12',
    'Mnogosearch[ /-]([0-9.]{1,10})' => 'Mnogo',
    'MojeekBot[ /]([0-9.]{1,10})' => 'MojeekBot',
    'MOMspider[ /]([0-9.]{1,10})' => 'MOM Spider',
    '^Moreoverbot[ /]([0-9.]{1,10})' => 'Moreover',
    'MovableType[ /]([0-9.]{1,10})' => 'Movable Type',
    'mozDex[ /]([0-9.]{1,6}(-dev)?)' => 'MozDex',
    'MQbot' => 'MQbot',
    'MSN(BOT|PTC)[ /]([0-9.]{1,10})' => 'MSN',
    'MS Search ([0-9.]{1,10}) Robot' => 'MSN',
    'MSNBOT_Mobile' => 'MSN Mobile',
    'MSMOBOT' => 'MSN Mobile',
    'MSNBOT-(MEDIA|PRODUCTS)[ /]([0-9.]{1,10})' => 'MS Live Search',
    'MSProxy[ /]([0-9.]{1,10})' => 'MSProxy',
    '^MSRBOT' => 'MSRBOT',
    'Microsoft[ -]?WebDAV[ -]?MiniRedir' => 'MS-WebDAV',
    'MTIcon[/ ]([0-9.]{1,10})' => 'MTIcon',
    'MyRSS.jp[/ ]([0-9.]{1,10})' => 'MyRSS',
    'Multimap Geotag Blog Parser[/ ]([0-9.]{1,10})' => 'Multimap',
    'Najdi.si' => 'Najdi.si',
    'NPBot' => 'Name Protect',
    'NationalDirectory-WebSpider[ /]([0-9.]{1,10})' => 'National Directory',
    'NATSU[ -]MICAN[/ ]([0-9a-z.]{1,10})' => 'Natsu Mican',
    'NaverBot([-]dloader)?[/ -]([0-9.]{1,10})' => 'Naver',
    'Naver(Bot)?' => 'Naver',
    '^nabot' => 'Naver',
    'Navisso(Bot)?' => 'Navisso',
    'www.neofonie.de' => 'neofonie',
    'Francis[ /]([0-9.]{1,10})' => 'Neomo',
    'Nessus\)$' => 'Nessus',
    'NetAnts[ /]([0-9.]{1,10})' => 'NetAnts',
    'netcraft' => 'Netcraft',
    'Netluchs[ /]([0-9.a-z]{1,10})' => 'Netluchs',
    'NetMechanic[ /V]{1,5}([0-9.]{1,10})' => 'NetMechanic',
    'NetNose[ -]Crawler[/ ]([0-9.]{1,10})' => 'NetNose',
    'netoskop' => 'Netoskop',
    'NetPromoter Spider' => 'NetPromoter',
    '^netprospector' => 'Netprospector',
    '^NetPumper[/ ]([0-9.]{1,10})' => 'Netpumper',
    'Netscape-Proxy[/ ]([0-9.]{1,10})' => 'Netscape Proxy',
    '^WebFilter Robot ([0-9.]{1,10})' => 'NetSpective',
    '^Netvibes' => 'Netvibes',
    'NewsFire[/ ]([0-9.]{1,10})' => 'NewsFire',
    'NewsGato(r|rOnline)[/ ]([0-9.]{1,10})' => 'NewsGator',
    'NewzCrawler[/ ]([0-9.]{1,10})' => 'NewzCrawler',
    '^NextopiaBOT.[v ]([0-9.]{1,10})' => 'NewzCrawler',
    'NG-Search[/ ]([0-9.]{1,10})' => 'NG Search',
    'NimbleCrawler[/ ]([0-9.]{1,10})' => 'Nimble',
    '^nuSearch' => 'NuSearch',
    'Noago Spider' => 'Noago',
    'TridentSpider[/ ]?([0-9.]{1,10})' => 'Noviforum',
    'noxtrumbot[/ ]?([0-9.]{1,10})' => 'noXtrum',
    'noyona.([0-9._]{1,10})' => 'Noyona',
    'Nsauditor[ /]([0-9.]{1,10})' => 'Nsauditor',
    'obidos[ -]?bot' => 'Bookwatch',
    'ObjectsSearch[ /]([0-9.]{1,10})' => 'Objects Search',
    '^oBot ' => 'oBot',
    '^Octora (Beta)?' => 'Octora',
    '^Offline Explorer[ /]([0-9.]{1,10})' => 'OfflineExplorer',
    'Omea Reader[ /]([0-9.]{1,10})' => 'Omea Reader',
    'OnetSzukaj[ /]([0-9.]{1,10})' => 'Onet',
    '^Onet.pl' => 'Onet',
    'inktomi.search.onet' => 'Onet',
    '^Online24-Bot . ([0-9.]{1,10})' => 'online24',
    '^onCHECK-Robot' => 'onsearch',
    '^OntoSpider[ /]([0-9.]{1,10})' => 'OntoSpider',
    'openbot[ /]([0-9.]{1,10})' => 'Openfind',
    'Openfind Robot[ /]([0-9.A-Z]{1,10})' => 'Openfind',
    '^OpenTaggerBot' => 'OpenTagger',
    '^OpenTextSiteCrawler[ /]([0-9.]{1,10})' => 'OpenText',
    '^OpenWebSpider[ /]([0-9.]{1,10})' => 'OpenWebSpider',
    'crawler@organica.us' => 'Organica',
    'OutfoxMelonBot[ /]([0-9.]{1,10})' => 'Outfox Melon',
    'OutfoxBot[ /]([0-9.]{1,10})' => 'Outfox Melon',
    'Overture[ -]?WebCrawler' => 'Overture',
    '^PageBitesHyperBot[ /]([0-9.]{1,10})' => 'PageBites',
    'PanopeaBot[/ ]([0-9.]{1,10})' => 'PanopeaBot',
    '^PEERbot' => 'Peerbot',
    '^PHP[ /]([0-9.]{1,10})' => 'PHP',
    '^PhpDig[ /]([0-9.]{1,10})' => 'PhpDig',
    '^PHP version tracker' => 'PHP version tracker',
    '^PictureOfInternet[ /]([0-9.]{1,10})' => 'PictureOfInternet',
    '^Pingdom GIGRIB v([0-9.]{1,10})' => 'Pingdom',
    '^Pingdom GIGRIB' => 'Pingdom',
    '^Pingdom.com_bot_version([0-9.]{1,10})' => 'Pingdom',
    'www.pinseri.com/bloglist' => 'Pinseri',
    'Plagger[ /]([0-9.]{1,10})' => 'Plagger',
    'Planet[ /]([0-9.]{1,10})' => 'Planet',
    'PlantyNet_WebRobot[_ /]V?([0-9.]{1,10})' => 'PlantyNet',
    'PluckFeedCrawler[ /]([0-9.]{1,10})' => 'Pluck',
    'fido[ /]([0-9.]{1,10}) Harvest' => 'PlanetSearch',
    '^POE-Component-Client-HTTP[/ ]([0-9.]{1,10})' => 'POE-Component',
    'Pogodak.hr[/ ]?([0-9.]{1,10})' => 'Pogodak',
    'P(oo|ooo)dle[ -]?predictor[ -]?([0-9.]{1,10})' => 'Poodle predictor',
    'P(oo|ooo)dle[ -]?predictor' => 'Poodle predictor',
    'Pompos[ /]([0-9.]{1,10})' => 'Pompos',
    'Popdexter' => 'Popdexter',
    'Powermarks[ /]([0-9.]{1,10})' => 'Powermarks',
    '^PROBE!' => 'PROBE!',
    '^Mozilla/[0-9.]{1,10} \(compatible\;\)$' => 'Proxy Cache',
    'ProxyHunter' => 'ProxyHunter',
    '^psbot' => 'PicSearch',
    '^PubSub-RSS-Reader[ /]([0-9.]{1,10})' => 'PubSub',
    '^PubSub.com' => 'PubSub',
    'PukiWiki[ /]([0-9.]{1,10})' => 'PukiWiki',
    '^PWeBot[ /]([0-9.]{1,10})' => 'PWeBot/X.Y',
    '^pxys' => 'PXYS',
    '^Qango.com' => 'Qango',
    'QihooBot[ /]([0-9.]{1,10})' => 'Qihoo',
    'Quantcastbot[ /]([0-9.]{1,10})' => 'Quantcast',
    'Quepasa[ -]?Creep' => 'Quepasa',
    'www.questfinder.com' => 'QuestFinder',
    '^QweeryBot[ /]([0-9.]{1,10})' => 'Qweery',
    'www.radian6.com' => 'Radian6',
    'StackRambler[ /]([0-9.]{1,10})' => 'Rambler',
    '^ramiba(-bot)?' => 'ramiba',
    '^RedBot/redbot[ /-]([0-9.]{1,10})' => 'rediff',
    'webmaster@repia.com' => 'Repia',
    'Robozilla' => 'Robozilla',
    'Rojo[ /]([0-9.]{1,10})' => 'Rojo',
    'rss-bot[ /]([0-9.]{1,10})' => 'rss-bot',
    'RssBandit[ /]([0-9.]{1,10})' => 'RssBandit',
    'rssImagesBot[ /]([0-9.]{1,10})' => 'rssImages',
    'RSSMicro.com' => 'RSSMicro',
    'RSSOwl[ /]([0-9a-z.]{1,10})' => 'RSSOwl',
    'RssReader[ /]([0-9.]{1,10})' => 'RssReader',
    'RufusBot' => 'RufusBot',
    'Runnk RSS finder' => 'Runnk',
    'MaSagool' => 'Sagool',
    'SanszBot' => 'Sansz',
    'Sauce[ ]?Reader[ /]([0-9.]{1,10})' => 'Sauce Reader',
    'SBIder[/ ]([0-9.]{1,10})' => 'SBIder',
    'SBIder[/ ]SBIder.([0-9.]{1,10})' => 'SBIder',
    'FAST-WebCrawler/[0-9a-z.]{1,10}/Scirus' => 'Scirus',
    'Scrubby[ /]([0-9.]{1,10})' => 'Scrubby',
    'Sun Download Manager[/ ]([0-9.]{1,10})' => 'SUN Download Manager',
    'SEA-Links( HTML-Scanner Pingoo!)?[ /]([0-9.]{1,10})' => 'Sea Links',
    'search.ch[ /]?V?([0-9.]{1,10})' => 'Search.ch',
    'searchengineworld' => 'SearchEngineWorld',
    'searchhippo' => 'Searchhippo',
    'www.unitek-systems.co.uk[ /]([0-9.]{1,10})' => 'SearchThruUs',
    'securecomputing' => 'Secure Computing',
    'Seekbot[ /]([0-9.]{1,10})' => 'Seekport',
    'semanticdiscovery[ /]([0-9.]{1,10})' => 'Semantic Discovery',
    '^Sensis(.com.au)? Web Crawler' => 'Sensis',
    'SeznamBot[ /]([0-9.]{1,10})' => 'Seznam',
    'SharpReader[ /]([0-9.]{1,10})' => 'SharpReader',
    'sherlock_spider' => 'Sherlock Spider',
    'shim[ -]crawler' => 'Shim Crawler',
    '^ShopWiki[ /]([0-9.]{1,10})' => 'ShopWiki',
    '^Shoula.com Crawler ([0-9.]{1,10})' => 'Shoula',
    'Siege[ /]([0-9.]{1,10})' => 'Siege',
    'SietsCrawler[ /]([0-9.]{1,10})' => 'Siets',
    '^(argus|simpy)[ /]([0-9.]{1,10})' => 'Simpy',
    'asterias[ /]([0-9.]{1,10})' => 'SingingFish',
    'Asterias Crawler v([0-9.]{1,10})' => 'SingingFish',
    'asterias' => 'SingingFish',
    'Sirketcebot[ /v]+([0-9.]{1,10})' => 'Sirketce',
    'sirobot' => 'SiroBot',
    'SiteBar[ /]([0-9.]{1,10})' => 'SiteBar',
    'SBIder[/ ]([0-9a-z.-]{1,10})' => 'SiteSell',
    '^SiteSpider' => 'SiteSpider',
    'SitiDiBot[ /]([0-9.]{1,10})' => 'SitiDi',
    'Skampy[ /]([0-9.-]{1,10})' => 'Skaffe',
    'SKIZZLE! Distributed Internet Spider[ /v]+([0-9a-z.-]{1,10})' => 'Skizzle',
    '^slug.ch crawl ([0-9a-z.-]{1,10})' => 'slugch',
    '^Snoopy.+([0-9.]{1,10})' => 'Snoopy',
    'sna-([0-9.]{1,10})' => 'Snoopy',
    '^SnykeBot[ /]([0-9.]{1,10})' => 'Snyke',
    '^Slider[ /]([0-9.]{1,10})' => 'Slider',
    'soegning.dk[/ ]spider[ /]([0-9.]{1,10})' => 'Søgning',
    'Mozilla\@somewhere.com' => 'somewhere.com',
    '^SWSBot-Images[ /]([0-9.]{1,10})' => 'SmartWareSoft',
    'SOFT411 Directory' => 'Soft411',
    'Sogou web spider[ /]([0-9.]{1,10})' => 'Sogou',
    'sohu[ -](agent|search)' => 'Sohu',
    'Sopheus Project[ /]([0-9.]{1,10})' => 'Sopheus',
    'SoupPotBot' => 'SoupPot',
    '^SMBot[ /]([0-9.]{1,10})' => 'Specific Media',
    '^Sphere Scout.+([0-9.]{1,10})' => 'Sphere Scout',
    '^sproose[ /]([0-9a-z.]{1,10})' => 'sproose',
    'SpurlBot[/ ]([0-9.]{1,10})' => 'SpurlBot',
    '^Star Downloader( Pro)?' => 'Star Downloader',
    'Steeler[ /]([0-9.]{1,10})' => 'Steeler',
    'Strategic Board Bot' => 'Strategic Board',
    '^suchbaer.de' => 'suchbaer.de',
    '^suchbot' => 'suchbot',
    '^gonzo([0-9]{1,2}).www.suchen.de' => 'suchen.de',
    '^Suchknecht.at-Robot' => 'Suchknecht',
    '^suchpadbot[ /]([0-9.]{1,10})' => 'suchpad',
    '^Sunrise[ /]([0-9a-z.]{1,10})' => 'Sunrise',
    'SuperBot[ /]([0-9.]{1,10})' => 'SuperBot',
    'SurfControl' => 'SurfControl',
    'ScSpider[ /]([0-9.]{1,10})' => 'SurfControl',
    'AVSearch[ -]([0-9.]{1,10})' => 'SURFnet',
    'Submission Spider at surfsafely.com' => 'Surfsafely',
    'SurveyBot[ /]([0-9.]{1,10})' => 'Whois Survey',
    '^Swooglebot[ /]([0-9.]{1,10})' => 'Swoogle',
    'sw.deri.org' => 'SWSE',
    'www.sygol.(com|net)' => 'Sygol',
    ' Synapse\)' => 'Synapse',
    '^!Susie' => 'sync2it',
    '^SyncIT[ /]([0-9.]{1,10})' => 'syncit',
    'Syndic8[ /]([0-9.]{1,10})' => 'Syndic8',
    'Syndicatie.nl robot v ([0-9.]{1,10})' => 'Syndicatie.nl',
    'Syndicatie.nl robot;' => 'Syndicatie.nl',
    '^SynoBot' => 'Synomia',
    'SynooBot[ /]([0-9.]{1,10})' => 'SynooBot',
    'Szukacz[ /]([0-9.]{1,10})' => 'Szukacz',
    '^Tagword' => 'Tagword',
    '^Trailfire-bot[ /]([0-9.]{1,10})' => 'Trailfire',
    'IRLbot[ /]([0-9.]{1,10})' => 'Tamu Crawler',
    'TAMU_CS_IRL_CRAWLER[ /]([0-9.]{1,10})' => 'Tamu Crawler',
    'TargetSeek[ /]([0-9.]{1,10})' => 'TargetSeek',
    '^TCDBOT/Nutch-([0-9.]{1,10})' => 'Trinity College Dublin',
    'Technoratibot[ /]([0-9.]{1,10})' => 'Technorati',
    'Teleport[ -]?Pro' => 'Teleport',
    '^Fresh Search :: Terrar' => 'Terrar',
    'Theophrastus[ /]([0-9.]{1,10})' => 'Theophrastus',
    '^thumbnail.cz robot[ /]([0-9.]{1,10})' => 'thumbnails.cz',
    '^thumbshots.(Version: |v)([0-9.]{2,10})e' => 'thumbshots',
    '^thumbshots-de' => 'thumbshots',
    'Thunderbird[ /]([0-9a-z.]{1,10})' => 'Thunderbird',
    'T-H-U-N-D-E-R-S-T-O-N-E' => 'Thunderstone',
    'timboBot' => 'timboBot',
    'traycebot[ /]([0-9a-z.-]{1,10})' => 'trayce',
    'B_l_i_t_z_B_O_T_@_t_r_i_c_u_s_._c_o_m' => 'Tricus',
    'topicblogs[ /]([0-9.]{1,10})' => 'Topicblogs',
    'tuezilla.de' => 'TÜzilla',
    'TurnitinBot[ /]([0-9.]{1,10})' => 'Turnitin',
    'TutorGig(Bot)?[ /]([0-9.]{1,10})' => 'TutorGig',
    'Twiceler[ /-]([0-9.]{1,10})' => 'cuill',
    'Twiceler' => 'cuill',
    'TypePad/([0-9a-z.]{1,10})' => 'TypePad',
    'UdmSearch[/ ]([0-9.]{1,10})' => 'UdmSearch',
    '^Mackster.ukwizz' => 'UKWizz',
    'Ultraseek' => 'Ultraseek',
    'UltraSpider3000[/ ]([0-9.]{1,10})' => 'UltraSpider',
    'umai[/ ]([0-9.]{1,10})' => 'umai',
    'unchaos_crawler[_ /]([0-9.]{1,10})' => 'Unchaos',
    'unchaos' => 'Unchaos',
    '^unido-bot' => 'unido',
    'updated[ /]([0-9a-z.]{1,10})' => 'Updated',
    '^UptimeBot' => 'UptimeBot',
    '^URI::Fetch[ /]([0-9.]{1,10})' => 'URI::Fetch',
‘URLBase[ /]([0-9.]{1,10})’ => ‘URLBase’,
‘^URLBlaze’ => ‘URLBlaze’,
‘Microsoft URL[ -]?Control’ => ‘MS URL Control’,
‘^URLGetFile’ => ‘URLGetFile’,
‘UrlScope’ => ‘UrlScope’,
‘Snappy/([0-9.]{1,10})’ => ‘urltrends’,
‘usww.com’ => ‘usww’,
‘Mozilla/5.0 URL-Spider’ => ‘usww’,
‘^USyd-NLP-Spider’ => ‘USyd-NLP-Spider’,
‘Vagabondo[ /]([0-9.]{1,10})’ => ‘WiseGuys’,
‘Vagabondo-WAP[ /]([0-9.]{1,10})’ => ‘WiseGuys’,
‘W3C_Validator[ /]([0-9.]{1,10})’ => ‘W3C Validator’,
‘^vspider[ /]([0-9.]{1,10})’ => ‘Verity’,
‘^vspider’ => ‘Verity’,
‘InfoFly[ /]([0-9.]{1,10})’ => ‘Versions-project’,
‘^VMBot[ /]([0-9.]{1,10})’ => ‘VerticalMatch’,
‘Verzamelgids[ /]([0-9.]{1,10})’ => ‘Verzamelgids’,
‘AlkalineBOT[ /]([0-9.]{1,10})’ => ‘Vestris’,
‘Vindex[ /]([0-9.]{1,10})’ => ‘Vindex’,
‘VisBot[ /]([0-9.]{1,10})’ => ‘Visvo’,
‘VoilaBot[ /]?[a-z ]([0-9.]{1,10})’ => ‘Voila’,
‘VoilaBot;[ /]([0-9.]{1,10})’ => ‘Voila’,
‘Vonna.com b o t’ => ‘Vonna’,
‘Vortex[ /]([0-9.]{1,10})’ => ‘Vortex’,
‘^W3SiteSearch Crawler[_v]*([0-9.]{1,10})’ => ‘W3SiteSearch’,
‘^Waggr’ => ‘Wagger’,
‘^SurferF3[ /]([0-9./]{1,10})’ => ‘Wanadoo’,
‘wapalizer[ /]([0-9.]{1,10})’ => ‘Wapalizer’,
‘Watson[ /]([0-9.]{1,10})’ => ‘Dr.Watson’,
‘watson.addy.com’ => ‘Dr.Watson’,
‘^Wavefire[ /]([0-9.]{1,10})’ => ‘Wavefire’,
‘Waypath[ -]?Scout’ => ‘Waypath’,
‘Waypath (development )?crawler’ => ‘Waypath’,
‘^WDG(Site)?Validator[/ ]([0-9.]{1,10})’ => ‘WDG Validator’,
‘^Webagogo’ => ‘Webagogo’,
‘^WebAlta( Crawler)?[/ ]([0-9.]{1,10})’ => ‘WebAlta’,
‘ Webbot[/ ]([0-9.]{1,10})’ => ‘Webbot.ru’,
‘WebCapture[/ ]([0-9.]{1,10})’ => ‘WebCapture’,
‘webcollage’ => ‘Webcollage’,
‘WebCopier[/ ]v?([0-9.]{1,10})’ => ‘WebCopier’,
‘webcrawl.net’ => ‘WebCrawl’,
‘Web Downloader[/ ]([0-9.]{1,10})’ => ‘Web Downloader’,
‘^webfetch[/ ]([0-9.]{1,10})’ => ‘webfetch’,
‘^WebFindBot’ => ‘Webfind’,
‘^Webglimpse[/ ]([0-9.]{1,10})’ => ‘Webglimpse’,
‘^webGobbler[/ ]([0-9.]{1,10})’ => ‘webGobbler’,
‘^WebImages[/ ]([0-9.]{1,10})’ => ‘WebImages’,
‘^WebLight[/ ]([0-9.]{1,10})’ => ‘WebLight’,
‘^Weblink.s checker’ => ‘WebLink’s’,
‘^webmeasurement-bot’ => ‘Webmeasurement’,
‘^WebMiner[/ ]([0-9.]{1,10})’ => ‘WebMiner’,
‘^webmin’ => ‘Webmin’,
‘WebMon[ /]([0-9.]{1,10})’ => ‘Webmon’,
‘^WebPatrol[ /]([0-9.]{1,10})’ => ‘WebPatrol’,
‘WebPix[/ ]([0-9.]{1,10})’ => ‘WebPix’,
‘^WebRACE[/ ]([0-9.]{1,10})’ => ‘WebRACE’,
‘^WebReaper ‘ => ‘WebReaper’,
‘Der webresult.de Robot’ => ‘Webresult’,
‘WebRingChecker[/ ]([0-9.]{1,10})’ => ‘Webring Checker’,
‘WeBoX[/ ]([0-9.]{1,10})’ => ‘WeBoX’,
‘WebSearch.COM.AU[/ ]+([0-9.]{1,10})’ => ‘WebSearch.COM.AU’,
‘WebSearchBench WebCrawler[v/ ]+([0-9.]{1,10})’ => ‘WebSearchBench’,
‘(Sqworm|websense|Konqueror/3.(0|1)(-rc[1-6])?; i686 Linux; 2002[0-9]{4})’ => ‘Websense’,
‘WebsiteWorth[v/ ]+([0-9.]{1,10})’ => ‘WebsiteWorth’,
‘webs(quash.com|ite[ -]?Monitor)’ => ‘Websquash’,
‘WebStripper[ /]([0-9.]{1,10})’ => ‘WebStripper’,
‘Web[ -]?ZIP[ /]([0-9.]{1,10})’ => ‘WebZIP’,
‘WEP Search[ /]([0-9.]{1,10})’ => ‘WEP Search’,
‘^West Wind Internet Protocols[ /]([0-9.]{1,10})’ => ‘West Wind Internet Protocols’,
‘Wget[ /]([0-9.]{1,10})’ => ‘Wget’,
‘WhizBang’ => ‘WhizBang’,
‘^WebFetch’ => ‘WingFlyer’,
‘TeamSoft WinInet Component’ => ‘WinInet’,
‘WinHttp.WinHttpRequest.([0-9.]{1,10})’ => ‘WinHTTP’,
‘^WIRE[ /]([0-9.]{1,10})’ => ‘WIRE’,
‘^WMP’ => ‘WMP’,
‘^WordChampBot’ => ‘WordChamp’,
‘WordPress[ /]([0-9.]{1,10})’ => ‘WordPress’,
‘^WorldLight’ => ‘WorldLight’,
‘WorQmada[ /]([0-9.]{1,10})’ => ‘WorQmada’,
‘Wotbox[ /]?[a-z]([0-9.]{1,10})’ => ‘Wotbox’,
‘NetSprint[ /-]{1,4}([0-9.]{1,10})’ => ‘Wirtualna Polska’,
‘WSB WebCrawler V([0-9.]{1,10})’ => ‘WebSearchBench’,
‘WSB ‘ => ‘WebSearchBench’,
‘^wume_crawler[ /]([0-9.]{1,10})’ => ‘WUME Lab’s’,
‘Wusage[ /]([0-9.]{1,10})’ => ‘Wusage’,
‘wwgrapevine[ /]([0-9.]{1,10})’ => ‘WWgrapevine’,
‘WWSBOT[ /]([0-9.]{1,10})’ => ‘WWSBOT’,
‘^www4mail[ /]([0-9.]{1,10})’ => ‘www4mail’,
‘^WWWC[ /]([0-9.]{1,10})’ => ‘WWWC’,
‘^WWWD[ /]([0-9.]{1,10})’ => ‘WWWD’,
‘WWWeasel( Robot)?[/ ]v?([0-9.]{1,10})’ => ‘WWWeasel’,
‘www.fi crawler’ => ‘www.fi’,
‘^WWW-Mechanize[/ ]([0-9.]{1,10})’ => ‘WWW-Mechanize’,
‘^wwwoffle[/ ]([0-9.]{1,10})’ => ‘WWWoffle’,
‘^wwwster[/ ]([0-9.]{1,10})’ => ‘WWWster’,
‘Wysigot[/ ]([0-9.]{1,10})’ => ‘Wysigot’,
‘Xaldon WebSpider’ => ‘Xaldon’,
‘Xenu(’s)? Link Sleuth[/ ]([0-9a-z.]{1,10})’ => ‘Xenu Link Sleuth’,
‘Xenu_Link_Sleuth_([0-9a-z.]{1,10})’ => ‘Xenu Link Sleuth’,
‘^Xerka WebBot v([0-9a-z.]{1,10})’ => ‘Xerka’,
‘^xirq[ /]([0-9a-z.]{1,10})’ => ‘XIRQ’,
‘^XMLSlurp[ /]([0-9a-z.]{1,10})’ => ‘XMLSlurp’,
‘XMLRPC’ => ‘Trackback’,
‘yacy.net’ => ‘Yacy’,
‘Yahoo(! ([a-z]{1,3} )?Slurp|-)’ => ‘Yahoo’,
‘Yahoo-MMCrawler[/ ]([0-9a-z.]{1,10})’ => ‘Yahoo’,
‘Yahoo-VerticalCrawler-FormerWebCrawler[/ ]([0-9a-z.]{1,10})’ => ‘Yahoo’,
‘^AnzwersCrawl[/ ]([0-9a-z.]{1,10})’ => ‘Yahoo’,
‘Y!J(-BSC|-SRD)[/ ]([0-9a-z.]{1,10})’ => ‘Yahoo’,
‘Y!OASIS/TEST’ => ‘Yahoo’,
‘Harvest-NG[/ ]([0-9a-z.]{1,10})’ => ‘Yahoo’,
‘Y!J; for robot study’ => ‘Yahoo’,
‘Yahoo Japan; for robot study’ => ‘Yahoo’,
‘^YahooFeedSeeker[/ ]([0-9a-z.]{1,10})’ => ‘Yahoo Feedseeker’,
‘Yandex[/ ]([0-9.]{1,10})’ => ‘Yandex’,
‘YandexBot/([0-9.]{1,10})’ => ‘Yandex’,
‘^yarienavoir.net[/ ]([0-9.]{1,10})’ => ‘Yarienavoir’,
‘YellCrawl[ /]V?([0-9.]{1,10})’ => ‘Yell’,
‘Yellbot[ /]Nutch-([0-9.]{1,10})’ => ‘Yell’,
‘YodaoBot(-Image|)?[ /]([0-9.]{1,10})’ => ‘Yodao’,
‘yoogliFetchAgent[ /]([0-9.]{1,10})’ => ‘Yoogli’,
‘Yotta(Shopping|Cars)Bot[ /]([0-9.]{1,10})’ => ‘Yotta’,
‘OmniExplorer_Bot[ /]([0-9.]{1,10})’ => ‘Yotta’,
‘Yoono’ => ‘Yoono’,
‘Gulper Web Bot[ /]([0-9.]{1,10})’ => ‘yuntis’,
‘Zao[ /]([0-9.]{1,10})’ => ‘Zao’,
‘Zao-crawler’ => ‘Zao’,
‘Zealbot[ /]([0-9.]{1,10})’ => ‘ZealBot’,
‘Zearchit’ => ‘Zearchit’,
‘^ZeBot(lseek.net|www.ze.bz)’ => ‘ze.bz’,
‘zedzo.digest[ /]([0-9.]{1,10})’ => ‘Zedzo’,
‘^zerxbot[ /](Version|v)*[ /]*([0-9.]{1,10})’ => ‘Zerx’,
‘Zeus’ => ‘Zeus’,
‘ZipppBot[ /]([0-9.]{1,10})’ => ‘Zippp’,
‘^Zippy[ v/]([0-9.]{1,10})’ => ‘Zippy’,
‘Zoekybot[ /]([0-9.]{1,10})’ => ‘Zoeky’,
‘(WISE|Zy)bo(rg|t)[ /]([0-9.]{1,10})’ => ‘WiseNutBot’,
‘^ZoomSpider’ => ‘zspider’,
‘zspider[ /]([0-9.a-z]{1,10})’ => ‘zspider’,
‘Blog[ -]?Bot’ => ‘BlogBot’,
‘holmes[/ ]([0-9.]{1,10})’ => ‘Centrum’,
‘^Centrum-checker’ => ‘Centrum’,
‘HTTP[ -]?Client[ /]([0-9.]{1,10})’ => ‘HTTPClient’,
‘HTTP[ -]?Client’ => ‘HTTPClient’,
‘^IncyWincy[ /]([0-9.]{1,10})’ => ‘IncyWincy’,
‘^IncyWincy’ => ‘IncyWincy’,
‘^java[ /]([0-9.a-z]{1,10})’ => ‘Java’,
‘^(fetch )?libfetch[ /]([0-9.]{1,10})’ => ‘Libfetch’,
‘^libww(w|w-perl|w-FM)[ /]([0-9.]{1,10})’ => ‘libWWW’,
‘^libww(w|w-perl|w-FM)’ => ‘libWWW’,
‘MyApp.*libww(w|w-perl|w-FM)’ => ‘libWWW’,
‘LiteFinder[ /]([0-9.]{1,10})’ => ‘LiteFinder’,
‘Nutc(hOrg|hCVS|h)?[ /]([0-9.]{1,10})’ => ‘Nutch’,
‘Nutch’ => ‘Nutch’,
‘Python[ -]?urllib’ => ‘Python-url’,
‘NASA Search[/ ]([0-9.]{1,10})’ => ‘SPAM’,
‘^PHOTO CHECK’ => ‘SPAM’,
‘^FOTOCHECKER’ => ‘SPAM’,
‘^IPTC CHECK’ => ‘SPAM’,
‘^DataCha0s’ => ‘SPAM’,
‘^Mac Finder’ => ‘SPAM’,
‘^Missigua Locator[ /]([0-9.]{1,10})’ => ‘SPAM’,
‘^Missouri College Browse’ => ‘SPAM’,
‘Email[ -]?Siphon’ => ‘SPAM’,
‘atSpider’ => ‘SPAM’,
‘autoemailspider’ => ‘SPAM’,
‘^Demo Bot’ => ‘SPAM’,
‘^Program Shareware’ => ‘SPAM’,
‘^Snapbot’ => ‘SPAM’,
‘^snap.com’ => ‘SPAM’,
‘^Guestbook Auto Submitter’ => ‘SPAM’,
‘panscient.com’ => ‘SPAM’,
‘(robot|spider|harvest|bot|(?<!msie)crawler)’ => ‘Unknown Robot’<br />
);<br />
?></p>
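<p>To illustrate how extra spiders such as 360 or Sogou could be added, here is a minimal sketch. A two-entry stand-in replaces the full <code>$bots</code> array from botconfig.php so the snippet runs on its own. The <code>$customBots</code> array and the <code>matchBot()</code> helper are hypothetical names, not part of the package, and the two patterns are assumptions that should be checked against your own access logs. Because the first matching pattern wins and the catch-all "Unknown Robot" pattern sits last, the custom patterns are merged in front.</p>

```php
<?php
// Stand-in for the $bots array from botconfig.php (only two entries shown).
$bots = array(
    'YandexBot/([0-9.]{1,10})' => 'Yandex',
    '(robot|spider|harvest|bot|(?<!msie)crawler)' => 'Unknown Robot',
);

// Hypothetical extra patterns (assumptions); verify the UA strings in your logs.
$customBots = array(
    '360Spider' => '360 Search',
    'Sogou[ -]?(web|inst|news)?[ -]?spider' => 'Sogou',
);

// Array union keeps the left-hand entries first, so custom patterns are tried
// before the originals and before the catch-all at the end.
$bots = $customBots + $bots;

// Hypothetical helper mirroring the matching loop in ua-searchbots.php.
function matchBot(array $bots, $userAgent) {
    foreach ($bots as $pattern => $name) {
        if (preg_match('#' . $pattern . '#i', $userAgent) === 1) {
            // Spaces become dashes, as in the original script.
            return preg_replace('/\s+/', '-', $name);
        }
    }
    return '';
}

echo matchBot($bots, 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0); 360Spider'), "\n";
```

<p>Without the custom entry, the same user agent would fall through to the catch-all pattern and be reported as an unknown robot.</p>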
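<p>Note that this package does not record the crawler's IP address or the exact time of each hit. If you want to capture these key details, you will need to modify ua-searchbots.php. That takes some technical skill and requires an understanding of parameters like the ones shown below. If you are comfortable with that, give it a try; if not, no problem, since the default code is sufficient for most needs.</p>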
<p><img alt="ec ua cookie" class="alignnone size-full wp-image-1445" src="http://blog.xiaoq.in/cdn/2013/11/ec-ua-cookie.gif" /></p>
<p><?php<br />
/**<br />
UA for Search Bots Copyright 2013 Adrian Vender – @adrianvender<br />
**/<br />
/** Is this a bot? **/</p>
<p>// load bot config<br />
require_once('botconfig.php');</p>
<p>// Get the user agent and attempt to match it with an item in the $bots array<br />
$userAgent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';<br />
$botname = "";<br />
foreach( $bots as $pattern => $bot ) {<br />
if ( preg_match( '#'.$pattern.'#i' , $userAgent ) == 1 )<br />
{<br />
$botname = preg_replace ( "/\s{1,}/i" , '-' , $bot );<br />
break;<br />
}<br />
}</p>
<p>//Exit GA for Search Bots script if no identified botname exists<br />
if($botname == "") {<br />
return;<br />
}</p>
<p>/** Yes, it's a bot. Let's move forward **/<br />
/** Basic Variable Setup **/</p>
<p>//Setup the UA parameters array<br />
$uaParams = array();</p>
<p>//Set the required Protocol Version<br />
$uaParams['v'] = 1;</p>
<p>//Set the UA account id<br />
$uaParams['tid'] = $UA_SB_ACCOUNT_ID;</p>
<p>/** End Basic Variable Setup **/<br />
// Generate UUID v4 function – needed to generate a CID when one isn’t available<br />
// Credits for this function goes to Stu Miller – http://www.stumiller.me/implementing-google-analytics-measurement-protocol-in-php-and-wordpress/<br />
function gaGenUUID() {<br />
return sprintf( '%04x%04x-%04x-%04x-%04x-%04x%04x%04x',<br />
// 32 bits for "time_low"<br />
mt_rand( 0, 0xffff ), mt_rand( 0, 0xffff ),</p>
<p>// 16 bits for "time_mid"<br />
mt_rand( 0, 0xffff ),</p>
<p>// 16 bits for "time_hi_and_version",<br />
// four most significant bits holds version number 4<br />
mt_rand( 0, 0x0fff ) | 0x4000,</p>
<p>// 16 bits, 8 bits for "clk_seq_hi_res",<br />
// 8 bits for "clk_seq_low",<br />
// two most significant bits holds zero and one for variant DCE1.1<br />
mt_rand( 0, 0x3fff ) | 0x8000,</p>
<p>// 48 bits for "node"<br />
mt_rand( 0, 0xffff ), mt_rand( 0, 0xffff ), mt_rand( 0, 0xffff )<br />
);<br />
}<br />
function getUACookie() {<br />
if(isset($_COOKIE['__uasearchcid'])) {<br />
return $_COOKIE['__uasearchcid'];<br />
} else {<br />
$newcid = gaGenUUID();<br />
setcookie('__uasearchcid',$newcid,time() + (86400 * 365 * 2)); // 86400 = 1 day<br />
return $newcid;<br />
}<br />
}<br />
// Sends a collection hit to the GA servers<br />
function uaCollectHit($utmUrl) {<br />
$cu = curl_init();<br />
curl_setopt($cu, CURLOPT_RETURNTRANSFER, 1);<br />
curl_setopt($cu, CURLOPT_URL, $utmUrl);<br />
$uaResult = curl_exec($cu);<br />
curl_close($cu);<br />
}</p>
<p>/** Set the parameters **/<br />
// Set the 'cid' (reused from the cookie, or newly generated)<br />
$uaParams['cid'] = getUACookie();</p>
<p>// Set the hit type to pageview<br />
$uaParams['t'] = 'pageview';</p>
<p>// Get the hostname<br />
$domainName = $_SERVER["SERVER_NAME"];<br />
if ($domainName != '') {<br />
$uaParams['dh'] = $domainName;<br />
}</p>
<p>// Get the URI of the page<br />
$documentPath = $_SERVER["REQUEST_URI"];<br />
if (empty($documentPath)) {<br />
$documentPath = "";<br />
}<br />
$uaParams['dp'] = $documentPath;</p>
<p>// Get the referrer from the request headers (it may be missing).<br />
$documentReferer = isset($_SERVER["HTTP_REFERER"]) ? $_SERVER["HTTP_REFERER"] : "";<br />
if (empty($documentReferer) && $documentReferer !== "0") {<br />
$documentReferer = "-";<br />
}<br />
$uaParams['dr'] = $documentReferer;</p>
<p>// Set bot name as campaign source<br />
$uaParams['cs'] = $botname;</p>
<p>// Set the campaign medium to 'bot'<br />
$uaParams['cm'] = 'bot';</p>
<p>// Setup Cache Buster param to prevent caching of requests (using a unix timestamp)<br />
$uaParams['z'] = time();<br />
/** Now let’s prepare and send the data to GA **/</p>
<p>// The UA collection URL<br />
$uaPostLocation = "http://www.google-analytics.com/collect";</p>
<p>// Set the parameters to a key=value payload<br />
$theParamList = "";<br />
foreach($uaParams as $key => $value) {<br />
// URL-encode each value so paths and referrers containing '&' or '?' survive<br />
$theParamList .= $key."=".rawurlencode($value)."&";<br />
}</p>
<p>// Construct the gif hit url.<br />
$utmUrl = $uaPostLocation . "?" . $theParamList;</p>
<p>// Finally send the data to GA<br />
uaCollectHit($utmUrl);</p>
<p>/** el fin **/<br />
?></p>
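<p>As noted earlier, the package does not record the crawler's IP address or the hit time. One possible approach, sketched below, is to pass both values as custom dimensions in the hit payload. This assumes you have created two hit-scoped custom dimensions at index 1 and 2 in the property; the indexes and the <code>addIpAndTime()</code> helper name are assumptions for illustration, while <code>cd1</code> and <code>cd2</code> follow the Measurement Protocol naming for custom dimensions.</p>

```php
<?php
// Hypothetical helper: adds the crawler IP and the request time to the hit parameters.
// Assumes custom dimensions 1 and 2 exist in the UA property (an assumption).
function addIpAndTime(array $uaParams, array $server) {
    $ip = isset($server['REMOTE_ADDR']) ? $server['REMOTE_ADDR'] : '';
    $time = isset($server['REQUEST_TIME']) ? (int) $server['REQUEST_TIME'] : time();

    if ($ip !== '') {
        $uaParams['cd1'] = $ip; // custom dimension 1: crawler IP
    }
    // custom dimension 2: hit time, formatted in UTC
    $uaParams['cd2'] = gmdate('Y-m-d H:i:s', $time);

    return $uaParams;
}

// Example with made-up request data.
$params = addIpAndTime(
    array('v' => 1, 't' => 'pageview'),
    array('REMOTE_ADDR' => '203.0.113.7', 'REQUEST_TIME' => 0)
);
print_r($params);
```

<p>In ua-searchbots.php this would be called as <code>$uaParams = addIpAndTime($uaParams, $_SERVER);</code> before the parameter list is assembled.</p>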
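<p>The idea behind this method rests on a solid understanding of how GA collects data. Most analytics tools work by installing JavaScript on the site; that script gathers the relevant data and finally sends it to the analytics server as a request for a GIF file. Once we understand this mechanism, we can collect the same data with a server-side program such as PHP and append it as parameters on the request, which makes tracking possible even when the client does not support cookies.</p>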
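<p>The mechanism described above can be sketched in a few lines: gather the values on the server, encode them as query parameters, and request the collection endpoint. This is a minimal sketch rather than the package's own code. <code>buildHitUrl()</code> is a hypothetical helper, <code>UA-12345678-2</code> is the placeholder ID from the configuration example, and the client ID and path are made-up values.</p>

```php
<?php
// Hypothetical helper: turns hit parameters into a Measurement Protocol request URL.
function buildHitUrl(array $params) {
    // http_build_query() URL-encodes every value; RFC 3986 mode encodes spaces as %20.
    return 'http://www.google-analytics.com/collect?'
        . http_build_query($params, '', '&', PHP_QUERY_RFC3986);
}

$url = buildHitUrl(array(
    'v'   => 1,                                       // protocol version
    'tid' => 'UA-12345678-2',                         // placeholder property ID
    'cid' => '35009a79-1a05-49d7-b876-2b884d0f825b',  // made-up client ID
    't'   => 'pageview',                              // hit type
    'dp'  => '/news/?id=1&page=2',                    // made-up document path
    'cs'  => 'Google-Image',                          // bot name as campaign source
    'cm'  => 'bot',                                   // campaign medium
));
echo $url, "\n";
```

<p>Encoding matters here because a path or referrer that contains <code>&amp;</code> or <code>?</code> would otherwise be split into extra parameters on the receiving side.</p>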
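<p>In fact, this is exactly the method currently used to collect data for mobile sites whose clients do not support cookies. As a result, if you adopt <a href="https://developers.google.com/analytics/devguides/collection/other/mobileWebsites" target="_blank" title="Google Analytics for Mobile Websites">this approach</a>, the mobile site's tracking data may include some crawler hits, which hurts data accuracy. That said, most smartphones today support cookies, so we can simply share the same tracking code used on the desktop site.</p>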
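<p>If server-side tracking is used for a mobile site, one way to keep crawler hits out of the visitor reports is to skip the hit when the user agent looks like a bot. The sketch below assumes a hypothetical <code>looksLikeBot()</code> helper built on the catch-all pattern from botconfig.php plus <code>slurp</code>; a real deployment would more likely reuse the full <code>$bots</code> list.</p>

```php
<?php
// Hypothetical helper: rough bot check based on the catch-all pattern in botconfig.php.
function looksLikeBot($userAgent) {
    return preg_match('#(robot|spider|harvest|bot|(?<!msie)crawler|slurp)#i', $userAgent) === 1;
}

$ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

if (!looksLikeBot($ua)) {
    // Send the regular visitor hit here; crawler requests are skipped.
}
```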