Solr介绍:
Solr是一个独立的企业级搜索应用服务器,Solr基于Lucene的全文搜索服务器,同时对其进行了扩展,提供了比Lucene更为丰富的查询语言,同时实现了可配置、可扩展并对查询性能进行了优化,并且提供了一个完善的功能管理界面,是一款非常优秀的全文搜索引擎。Solr对外提供类似于Web-service的API接口。用户可以通过http请求,向搜索引擎服务器提交一定格式的XML文件/Json/文本等,生成索引;也可以通过Http Get操作提出查找请求,并得到Json格式的返回结果。
项目引入Solr时应该考虑的一些问题:
1、数据更新频率:每天数据增量有多大,随时更新还是定时更新
2、数据总量:数据要保存多长时间
3、一致性要求:期望多长时间内看到更新的数据,最长允许多长时间延迟
4、数据特点:数据源包括哪些,平均单条记录大小
5、业务特点:有哪些排序要求,检索条件
6、资源复用:已有的硬件配置是怎样的,是否有升级计划
SolrCloud:Solr分布式扩展方案
SolrCloud是基于ZooKeeper和Solr的分布式解决方案,为Solr添加分布式功能,用于建立高可用,高伸缩,自动容错,分布式索引,分布式查询的Solr服务器集群
它有几个特色功能:
1)集中式的配置信息
2)自动容错
3)近实时搜索
4)查询时自动负载均衡
Solr 5.5.0 + tomcat 7.0.69 + zookeeper-3.4.6 Cloud部署
(本文因为机器不够,只能在单机环境伪分布式部署,模拟4台真实机器,真实部署可将tomcat/zookeeper的端口不做调整即可)
4台tomcat组成下述部署方案:Collection分成两个Shard分别存储索引信息,每个Shard又分成两个core_node(一主一备)来调配索引最终配置完成结果如下:
(1)
apache-tomcat-7.0.69集群配置:
版本:apache-tomcat-7.0.69
下载:http://tomcat.apache.org/download-70.cgi
位置:/var/local/
数量:4台:/var/local/apache-tomcat-7.0.69-1,/var/local/apache-tomcat-7.0.69-2,/var/local/apache-tomcat-7.0.69-3,/var/local/apache-tomcat-7.0.69-4
说明:主要调整单机环境下tomcat的端口冲突
apache-tomcat-7.0.69-1
sudo vi /var/local/apache-tomcat-7.0.69-1/conf/server.xml
{
<Server port="18005" shutdown="SHUTDOWN">
<Connector port="18080" protocol="HTTP/1.1" connectionTimeout="20000" redirectPort="8443" />
<Connector port="18009" protocol="AJP/1.3" redirectPort="8443" />
}
apache-tomcat-7.0.69-2
{
sudo vi /var/local/apache-tomcat-7.0.69-2/conf/server.xml
<Server port="28005" shutdown="SHUTDOWN">
<Connector port="28080" protocol="HTTP/1.1" connectionTimeout="20000" redirectPort="8443" />
<Connector port="28009" protocol="AJP/1.3" redirectPort="8443" />
}
apache-tomcat-7.0.69-3
sudo vi /var/local/apache-tomcat-7.0.69-3/conf/server.xml
{
<Server port="38005" shutdown="SHUTDOWN">
<Connector port="38080" protocol="HTTP/1.1" connectionTimeout="20000" redirectPort="8443" />
<Connector port="38009" protocol="AJP/1.3" redirectPort="8443" />
}
apache-tomcat-7.0.69-4
sudo vi /var/local/apache-tomcat-7.0.69-4/conf/server.xml
{
<Server port="48005" shutdown="SHUTDOWN">
<Connector port="48080" protocol="HTTP/1.1" connectionTimeout="20000" redirectPort="8443" />
<Connector port="48009" protocol="AJP/1.3" redirectPort="8443" />
}
(2)
zookeeper-3.4.6集群配置:
版本:zookeeper-3.4.6
下载:https://zookeeper.apache.org/releases.html#download
位置:/var/local/
数量:4台:/var/local/zookeeper-3.4.6-1,/var/local/zookeeper-3.4.6-2,/var/local/zookeeper-3.4.6-3,/var/local/zookeeper-3.4.6-4
说明:主要调整单机环境下zookeeper的端口及目录冲突
zookeeper-3.4.6-1
sudo mkdir -p /var/local/zookeeper-3.4.6-1/data
sudo mkdir -p /var/local/zookeeper-3.4.6-1/data/log
echo 1 > /var/local/zookeeper-3.4.6-1/data/myid
(sudo vi /var/local/zookeeper-3.4.6-1/data/myid
{1})
cd /var/local/zookeeper-3.4.6-1/conf/ &&sudo mv zoo_sample.cfg zoo.cfg &&sudo vi zoo.cfg
{
clientPort=2181
dataDir=/var/local/zookeeper-3.4.6-1/data
dataLogDir=/var/local/zookeeper-3.4.6-1/data/log
autopurge.snapRetainCount=3
autopurge.purgeInterval=1
server.1=127.0.0.1:2888:3888
server.2=127.0.0.1:2889:3889
server.3=127.0.0.1:2890:3890
server.4=127.0.0.1:2891:3891
}
zookeeper-3.4.6-2
sudo cp -r /var/local/zookeeper-3.4.6-1 /var/local/zookeeper-3.4.6-2
sudo vi /var/local/zookeeper-3.4.6-2/data/myid
{2}
sudo vi /var/local/zookeeper-3.4.6-2/conf/zoo.cfg
{
clientPort=2182
dataDir=/var/local/zookeeper-3.4.6-2/data
dataLogDir=/var/local/zookeeper-3.4.6-2/data/log
autopurge.snapRetainCount=3
autopurge.purgeInterval=1
server.1=127.0.0.1:2888:3888
server.2=127.0.0.1:2889:3889
server.3=127.0.0.1:2890:3890
server.4=127.0.0.1:2891:3891
}
zookeeper-3.4.6-3
sudo cp -r /var/local/zookeeper-3.4.6-1 /var/local/zookeeper-3.4.6-3
sudo vi /var/local/zookeeper-3.4.6-3/data/myid
{3}
sudo vi /var/local/zookeeper-3.4.6-3/conf/zoo.cfg
{
clientPort=2183
dataDir=/var/local/zookeeper-3.4.6-3/data
dataLogDir=/var/local/zookeeper-3.4.6-3/data/log
autopurge.snapRetainCount=3
autopurge.purgeInterval=1
server.1=127.0.0.1:2888:3888
server.2=127.0.0.1:2889:3889
server.3=127.0.0.1:2890:3890
server.4=127.0.0.1:2891:3891
}
zookeeper-3.4.6-4
sudo cp -r /var/local/zookeeper-3.4.6-1 /var/local/zookeeper-3.4.6-4
sudo vi /var/local/zookeeper-3.4.6-4/data/myid
{4}
sudo vi /var/local/zookeeper-3.4.6-4/conf/zoo.cfg
{
clientPort=2184
dataDir=/var/local/zookeeper-3.4.6-4/data
dataLogDir=/var/local/zookeeper-3.4.6-4/data/log
autopurge.snapRetainCount=3
autopurge.purgeInterval=1
server.1=127.0.0.1:2888:3888
server.2=127.0.0.1:2889:3889
server.3=127.0.0.1:2890:3890
server.4=127.0.0.1:2891:3891
}
(3)
solr-5.5.0集群配置:
版本:solr-5.5.0
下载:https://mirrors.tuna.tsinghua.edu.cn/apache/lucene/solr/5.5.0/
位置:/var/local/apache-tomcat-7.0.69-1~4
公有配置文件:/var/local/cloud_conf
数量:4台:/var/local/apache-tomcat-7.0.69-1,/var/local/apache-tomcat-7.0.69-2,/var/local/apache-tomcat-7.0.69-3,/var/local/apache-tomcat-7.0.69-4
Solr WEB系统部署:
sudo cp -r ~/solr_cloud/solr-5.5.0/server/solr-webapp/webapp /var/local/apache-tomcat-7.0.69-1/webapps/solr
sudo cp -r ~/solr_cloud/solr-5.5.0/server/lib/ext/* /var/local/apache-tomcat-7.0.69-1/webapps/solr/WEB-INF/lib
(其他需要用的jar包自行复制即可:~/solr_cloud/solr-5.5.0/dist/)
sudo cp -r ~/solr_cloud/solr-5.5.0/server/resources/log4j.properties /var/local/apache-tomcat-7.0.69-1/webapps/solr/WEB-INF/classes
(classes若不存在则手动建立)
solr-5.5.0-1
sudo mkdir -p /var/local/apache-tomcat-7.0.69-1/solr_home/
sudo cp ~/solr_cloud/solr-5.5.0/example/example-DIH/solr/solr.xml /var/local/apache-tomcat-7.0.69-1/solr_home/
sudo vi /var/local/apache-tomcat-7.0.69-1/webapps/solr/WEB-INF/web.xml
{
<env-entry>
<env-entry-name>solr/home</env-entry-name>
<env-entry-value>/var/local/apache-tomcat-7.0.69-1/solr_home</env-entry-value>
<env-entry-type>java.lang.String</env-entry-type>
</env-entry>
}
sudo vi /var/local/apache-tomcat-7.0.69-1/bin/catalina.sh
{
JAVA_OPTS="$JAVA_OPTS -Dbootstrap_confdir=/var/local/cloud_conf -Dcollection.configName=myconf -DzkHost=127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184 -DnumShards=4"
}
sudo vi /var/local/apache-tomcat-7.0.69-1/solr_home/solr.xml
{
<solrcloud>
<str name="host">${host:}</str>
<int name="hostPort">18080</int>
<str name="hostContext">${hostContext:solr}</str>
<int name="zkClientTimeout">${zkClientTimeout:15000}</int>
<bool name="genericCoreNodeNames">${genericCoreNodeNames:true}</bool>
<str name="zkHost">127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184</str>
</solrcloud>
<shardHandlerFactory name="shardHandlerFactory"
class="HttpShardHandlerFactory">
<int name="socketTimeout">${socketTimeout:0}</int>
<int name="connTimeout">${connTimeout:0}</int>
</shardHandlerFactory>
}
solr-5.5.0-2
sudo mkdir -p /var/local/apache-tomcat-7.0.69-2/solr_home/
sudo cp -r /var/local/apache-tomcat-7.0.69-1/solr_home/* /var/local/apache-tomcat-7.0.69-2/solr_home/
sudo cp -r /var/local/apache-tomcat-7.0.69-1/webapps/solr /var/local/apache-tomcat-7.0.69-2/webapps/
sudo vi /var/local/apache-tomcat-7.0.69-2/webapps/solr/WEB-INF/web.xml
{
<env-entry>
<env-entry-name>solr/home</env-entry-name>
<env-entry-value>/var/local/apache-tomcat-7.0.69-2/solr_home</env-entry-value>
<env-entry-type>java.lang.String</env-entry-type>
</env-entry>
}
sudo vi /var/local/apache-tomcat-7.0.69-2/bin/catalina.sh
{
JAVA_OPTS="$JAVA_OPTS -DzkHost=127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184"
}
sudo vi /var/local/apache-tomcat-7.0.69-2/solr_home/solr.xml
{
<solrcloud>
<str name="host">${host:}</str>
<int name="hostPort">28080</int>
<str name="hostContext">${hostContext:solr}</str>
<int name="zkClientTimeout">${zkClientTimeout:15000}</int>
<bool name="genericCoreNodeNames">${genericCoreNodeNames:true}</bool>
<str name="zkHost">127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184</str>
</solrcloud>
<shardHandlerFactory name="shardHandlerFactory"
class="HttpShardHandlerFactory">
<int name="socketTimeout">${socketTimeout:0}</int>
<int name="connTimeout">${connTimeout:0}</int>
</shardHandlerFactory>
}
solr-5.5.0-3
sudo mkdir -p /var/local/apache-tomcat-7.0.69-3/solr_home/
sudo cp -r /var/local/apache-tomcat-7.0.69-1/solr_home/* /var/local/apache-tomcat-7.0.69-3/solr_home/
sudo cp -r /var/local/apache-tomcat-7.0.69-1/webapps/solr /var/local/apache-tomcat-7.0.69-3/webapps/
sudo vi /var/local/apache-tomcat-7.0.69-3/webapps/solr/WEB-INF/web.xml
{
<env-entry>
<env-entry-name>solr/home</env-entry-name>
<env-entry-value>/var/local/apache-tomcat-7.0.69-3/solr_home</env-entry-value>
<env-entry-type>java.lang.String</env-entry-type>
</env-entry>
}
sudo vi /var/local/apache-tomcat-7.0.69-3/bin/catalina.sh
{
JAVA_OPTS="$JAVA_OPTS -DzkHost=127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184"
}
sudo vi /var/local/apache-tomcat-7.0.69-3/solr_home/solr.xml
{
<solrcloud>
<str name="host">${host:}</str>
<int name="hostPort">38080</int>
<str name="hostContext">${hostContext:solr}</str>
<int name="zkClientTimeout">${zkClientTimeout:15000}</int>
<bool name="genericCoreNodeNames">${genericCoreNodeNames:true}</bool>
<str name="zkHost">127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184</str>
</solrcloud>
<shardHandlerFactory name="shardHandlerFactory"
class="HttpShardHandlerFactory">
<int name="socketTimeout">${socketTimeout:0}</int>
<int name="connTimeout">${connTimeout:0}</int>
</shardHandlerFactory>
}
solr-5.5.0-4
sudo mkdir -p /var/local/apache-tomcat-7.0.69-4/solr_home/
sudo cp -r /var/local/apache-tomcat-7.0.69-1/solr_home/* /var/local/apache-tomcat-7.0.69-4/solr_home/
sudo cp -r /var/local/apache-tomcat-7.0.69-1/webapps/solr /var/local/apache-tomcat-7.0.69-4/webapps/
sudo vi /var/local/apache-tomcat-7.0.69-4/webapps/solr/WEB-INF/web.xml
{
<env-entry>
<env-entry-name>solr/home</env-entry-name>
<env-entry-value>/var/local/apache-tomcat-7.0.69-4/solr_home</env-entry-value>
<env-entry-type>java.lang.String</env-entry-type>
</env-entry>
}
sudo vi /var/local/apache-tomcat-7.0.69-4/bin/catalina.sh
{
JAVA_OPTS="$JAVA_OPTS -DzkHost=127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184"
}
sudo vi /var/local/apache-tomcat-7.0.69-4/solr_home/solr.xml
{
<solrcloud>
<str name="host">${host:}</str>
<int name="hostPort">48080</int>
<str name="hostContext">${hostContext:solr}</str>
<int name="zkClientTimeout">${zkClientTimeout:15000}</int>
<bool name="genericCoreNodeNames">${genericCoreNodeNames:true}</bool>
<str name="zkHost">127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:2183,127.0.0.1:2184</str>
</solrcloud>
<shardHandlerFactory name="shardHandlerFactory"
class="HttpShardHandlerFactory">
<int name="socketTimeout">${socketTimeout:0}</int>
<int name="connTimeout">${connTimeout:0}</int>
</shardHandlerFactory>
}
(5)
启动及说明:
启动tomcat集群
sudo /var/local/apache-tomcat-7.0.69-1/bin/startup.sh &&sudo /var/local/apache-tomcat-7.0.69-2/bin/startup.sh &&sudo /var/local/apache-tomcat-7.0.69-3/bin/startup.sh &&sudo /var/local/apache-tomcat-7.0.69-4/bin/startup.sh
关闭tomcat集群
sudo /var/local/apache-tomcat-7.0.69-1/bin/shutdown.sh &&sudo /var/local/apache-tomcat-7.0.69-2/bin/shutdown.sh &&sudo /var/local/apache-tomcat-7.0.69-3/bin/shutdown.sh &&sudo /var/local/apache-tomcat-7.0.69-4/bin/shutdown.sh
启动zookeeper集群
sudo /var/local/zookeeper-3.4.6-1/bin/zkServer.sh restart && sudo /var/local/zookeeper-3.4.6-2/bin/zkServer.sh restart &&sudo /var/local/zookeeper-3.4.6-3/bin/zkServer.sh restart &&sudo /var/local/zookeeper-3.4.6-4/bin/zkServer.sh restart
sudo /var/local/zookeeper-3.4.6-1/bin/zkServer.sh stop && sudo /var/local/zookeeper-3.4.6-2/bin/zkServer.sh stop &&sudo /var/local/zookeeper-3.4.6-3/bin/zkServer.sh stop
查看zookeeper集群状态
sudo /var/local/zookeeper-3.4.6-1/bin/zkServer.sh status &&sudo /var/local/zookeeper-3.4.6-2/bin/zkServer.sh status &&sudo /var/local/zookeeper-3.4.6-3/bin/zkServer.sh status &&sudo /var/local/zookeeper-3.4.6-4/bin/zkServer.sh status
访问Solr Web系统:(192.168.5.48即本机IP)
http://192.168.5.48:18080/solr/index.html
http://192.168.5.48:28080/solr/index.html
http://192.168.5.48:38080/solr/index.html
http://192.168.5.48:48080/solr/index.html
上述地址均可访问及管理Solr Web系统
说明:
1)solr-5.5.0中会出现在Collection中点击query命令时,误将地址栏 / 转义为 %2F 的bug,如下:
http://192.168.5.48:18080/solr/test1%2Fselect?indent=on&q=*:*&wt=json
2)solr-5.5.0中自带的zookeeper-3.4.6.jar,因此建议zookeeper选用-3.4.6版本的
3)solr-5以上的schema由managed-schema通过API管理,在solrconfig.xml中可以查看到:
<!-- To disable dynamic schema REST APIs, use the following for <schemaFactory>:
<schemaFactory class="ClassicIndexSchemaFactory"/>
When ManagedIndexSchemaFactory is specified instead, Solr will load the schema from
the resource named in 'managedSchemaResourceName', rather than from schema.xml.
Note that the managed schema resource CANNOT be named schema.xml. If the managed
schema does not exist, Solr will create it after reading schema.xml, then rename
'schema.xml' to 'schema.xml.bak'.
Do NOT hand edit the managed schema - external modifications will be ignored and
overwritten as a result of schema modification REST API calls.
When ManagedIndexSchemaFactory is specified with mutable = true, schema
modification REST API calls will be allowed; otherwise, error responses will be
sent back for these requests.
-->
<schemaFactory class="ManagedIndexSchemaFactory">
<bool name="mutable">true</bool>
<str name="managedSchemaResourceName">managed-schema</str>
</schemaFactory>
4)schema部分说明:
Stored:字段值会以保存一份原始内容在在索引中,可以被搜索组件组件返回,考虑到性能问题,对于长文本就不适合存储在索引中。
Indexed:表示字段会加被Sorl处理加入到索引中,只有被索引的字段才能被搜索到。
docValues: 表示此域是否需要添加一个docValues域,这对facet查询,group分组,排序,function查询有好处,能加快索引数据加载,对NRT近实时搜索比较友好,且更节省内存,但它也有一些限制,比如当前docValues域只支持strField,UUIDField,Trie*Field等域,且要求域的域值是单值不能是多值域
multiValued: 表示这个域是否可以存储多个值
omitNorms: 此属性若设置为true,即表示将忽略域值的长度标准化,忽略在索引过程中对当前域的权重设置,且会节省内存。
只有全文本域或者你需要在索引创建过程中设置域的权重时才需要把这个值设为false,对于基本数据类型且不分词的域如intFeild,longField,StrField等默认此属性值就是true,否则默认就是false.
omitPositions=true|false如果设置,省略掉term vector中的地址信息
omitTermFreqAndPositions=true|false 如果设置,省略掉freq和term vector中的地址信息
termVectors: 设置为true即表示需要为该field存储项向量信息,当你需要MoreLikeThis功能时,则需要将此属性值设为true,这样会带来一些性能提升。
termPositions: 是否存储Term的起始位置信息,这会增大索引的体积,但高亮功能需要依赖此项设置,否则无法高亮
termOffsets: 表示是否存储索引的位置偏移量,高亮功能需要此项配置,当你使用SpanQuery时,此项配置会影响匹配的结果集
sortMissingLast表示如果域值为null,在根据当前域进行排序时,把包含null值的document排在最后一位
sortMissingFirst:表示如果域值为null,在根据当前域进行排序时,把包含null值的document排在前面一位