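Recently, in one of our environments, a node in the Redis cluster went down and the application's health check never recovered afterwards. The application is deployed on Kubernetes and uses /actuator/health/readiness as its readiness probe, so it stayed unreachable until that Redis node was restarted.
The Redis cluster had run into similar failures before, but back then the applications that depended on it were only affected briefly and recovered quickly.
Running the application locally confirmed the behaviour: if any Redis node goes down, no matter which one, the health check keeps returning DOWN.
What went wrong? I started by comparing how the two applications were implemented. The business code that integrates Redis looked fine, and for a while I could not find the cause. Then I remembered that this newer application is built on Spring Boot 2.3.x, so I decided to compare the health check implementations as well.
The comparison showed no obvious difference in RedisReactiveHealthIndicator between 2.3.x and 2.2.x, but 2.4.x changes it slightly. The two variants of doHealthCheck are shown below.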
    private Mono<Health> doHealthCheck(Health.Builder builder, ReactiveRedisConnection connection) {
        boolean isClusterConnection = connection instanceof ReactiveRedisClusterConnection;
        return connection.serverCommands().info("server").map((info) -> up(builder, info, isClusterConnection))
                .onErrorResume((ex) -> Mono.just(down(builder, ex)))
                .flatMap((health) -> connection.closeLater().thenReturn(health));
    }

    private Mono<Health> doHealthCheck(Health.Builder builder, ReactiveRedisConnection connection) {
        return getHealth(builder, connection).onErrorResume((ex) -> Mono.just(builder.down(ex).build()))
                .flatMap((health) -> connection.closeLater().thenReturn(health));
    }

    private Mono<Health> getHealth(Health.Builder builder, ReactiveRedisConnection connection) {
        if (connection instanceof ReactiveRedisClusterConnection) {
            return ((ReactiveRedisClusterConnection) connection).clusterGetClusterInfo()
                    .map((info) -> up(builder, info));
        }
        return connection.serverCommands().info("server").map((info) -> up(builder, info));
    }
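For cluster connections, 2.4.x implements the check somewhat differently. Tracing the commit history finally revealed the cause: 2.4.0 fixed a bug.
The commit message reads as follows: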
Prior to Spring Data Redis version 2.2.8, the contents of the
Properties object returned from the
ReactiveRedisConnection.ServerCommands.info API were the same
for clustered and non-clustered Redis configurations, containing a set
of key/value pairs. This allowed ReactiveRedisHealthIndicator to get
a version property using a well-known key. Starting with Spring Data
Redis 2.2.8, the info property keys contain a host:port prefix in a
clustered Redis configuration. This prevented
ReactiveRedisHealthIndicator from getting the version property as
before and resulted in the health always being reported as DOWN.
This commit adjusts ReactiveRedisHealthIndicator to detect the
clustered configuration from Spring Data Redis and find the version
property for one of the reported cluster nodes.
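
The effect described in the commit message can be reproduced with plain java.util.Properties. The sketch below is not Spring Boot's code: the class name RedisInfoVersion, its two helper methods, and the sample key shape "host:port.redis_version" are assumptions made for illustration. It only shows why an exact-key lookup stops finding the version once keys carry a node prefix, and how a suffix match tolerates that prefix.

```java
import java.util.Optional;
import java.util.Properties;

// Hypothetical helper for illustration only, not part of Spring Boot.
final class RedisInfoVersion {

    // Well-known key in the "server" section of Redis INFO output.
    static final String VERSION_KEY = "redis_version";

    // Exact-key lookup: works only when keys carry no node prefix.
    static Optional<String> byExactKey(Properties info) {
        return Optional.ofNullable(info.getProperty(VERSION_KEY));
    }

    // Prefix-tolerant lookup: takes the first key that ends with the version key.
    static Optional<String> bySuffix(Properties info) {
        return info.stringPropertyNames().stream()
                .filter(key -> key.endsWith(VERSION_KEY))
                .findFirst()
                .map(info::getProperty);
    }

    public static void main(String[] args) {
        Properties standalone = new Properties();
        standalone.setProperty("redis_version", "5.0.7");

        Properties clustered = new Properties();
        // Assumed key shape for a clustered setup: "<host>:<port>.<key>".
        clustered.setProperty("127.0.0.1:7001.redis_version", "5.0.7");

        System.out.println(byExactKey(standalone).isPresent()); // true
        System.out.println(byExactKey(clustered).isPresent());  // false
        System.out.println(bySuffix(clustered).orElse(""));     // 5.0.7
    }
}
```

With the exact-key lookup the clustered properties yield no version at all, which matches the failure mode the commit message describes; the suffix match returns the version reported by one of the nodes.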
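That explained it: Spring Data Redis 2.2.8 and later broke the original implementation, whereas the earlier application used a version before 2.2.8 and therefore never hit the problem. Given that, the only option for now is to write a custom health check modelled on the latest code, and to revisit it after upgrading to 2.4.x.
To wrap up, here is a summary of how to make sure Redis high availability actually takes effect in Spring Boot.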