When a Redis (non-cluster mode) server requiring authentication is restarted the Spring Boot Actuator RedisHealthIndicator
seems unable to recover. It repeatedly logs:
i.l.c.RedisCommandExecutionException: NOAUTH Authentication required.
at i.l.c.ExceptionFactory.createExecutionException(ExceptionFactory.java:135)
at i.l.c.ExceptionFactory.createExecutionException(ExceptionFactory.java:108)
at i.l.c.p.AsyncCommand.completeResult(AsyncCommand.java:120)
at i.l.c.p.AsyncCommand.complete(AsyncCommand.java:111)
at i.l.c.p.CommandWrapper.complete(CommandWrapper.java:59)
at i.l.c.p.CommandWrapper.complete(CommandWrapper.java:59)
at i.l.c.p.CommandHandler.complete(CommandHandler.java:654)
at i.l.c.p.CommandHandler.decode(CommandHandler.java:614)
at i.l.c.p.CommandHandler.channelRead(CommandHandler.java:565)
at i.n.c.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
at i.n.c.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
at i.n.c.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)
at i.n.h.ssl.SslHandler.unwrap(SslHandler.java:1478)
at i.n.h.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1227)
at i.n.h.ssl.SslHandler.decode(SslHandler.java:1274)
at i.n.h.c.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:503)
at i.n.h.c.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:442)
at i.n.h.c.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:281)
at i.n.c.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
at i.n.c.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
at i.n.c.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)
at i.n.c.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1422)
at i.n.c.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
at i.n.c.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
at i.n.c.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:931)
at i.n.c.e.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:792)
at i.n.c.e.EpollEventLoop.processReady(EpollEventLoop.java:502)
at i.n.c.e.EpollEventLoop.run(EpollEventLoop.java:407)
at i.n.u.c.SingleThreadEventExecutor$6.run(SingleThreadEventExecutor.java:1050)
at i.n.u.i.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at i.n.u.c.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
... 1 common frames omitted
Wrapped by: o.s.d.r.RedisSystemException: Error in execution; nested exception is io.lettuce.core.RedisCommandExecutionException: NOAUTH Authentication required.
at o.s.d.r.c.l.LettuceExceptionConverter.convert(LettuceExceptionConverter.java:54)
at o.s.d.r.c.l.LettuceExceptionConverter.convert(LettuceExceptionConverter.java:52)
at o.s.d.r.c.l.LettuceExceptionConverter.convert(LettuceExceptionConverter.java:41)
at o.s.d.r.PassThroughExceptionTranslationStrategy.translate(PassThroughExceptionTranslationStrategy.java:44)
at o.s.d.r.FallbackExceptionTranslationStrategy.translate(FallbackExceptionTranslationStrategy.java:42)
at o.s.d.r.c.l.LettuceConnection.convertLettuceAccessException(LettuceConnection.java:270)
at o.s.d.r.c.l.LettuceServerCommands.convertLettuceAccessException(LettuceServerCommands.java:571)
at o.s.d.r.c.l.LettuceServerCommands.info(LettuceServerCommands.java:215)
at o.s.d.r.c.DefaultedRedisConnection.info(DefaultedRedisConnection.java:1291)
at o.s.b.a.r.RedisHealthIndicator.doHealthCheck(RedisHealthIndicator.java:64)
at o.s.b.a.h.AbstractHealthIndicator.health(AbstractHealthIndicator.java:82)
at o.s.b.a.h.HealthIndicator.getHealth(HealthIndicator.java:37)
at o.s.b.a.h.HealthEndpointWebExtension.getHealth(HealthEndpointWebExtension.java:95)
at o.s.b.a.h.HealthEndpointWebExtension.getHealth(HealthEndpointWebExtension.java:43)
at o.s.b.a.h.HealthEndpointSupport.getContribution(HealthEndpointSupport.java:108)
at o.s.b.a.h.HealthEndpointSupport.getAggregateHealth(HealthEndpointSupport.java:119)
at o.s.b.a.h.HealthEndpointSupport.getContri...
After initially losing the connection and having it restored:
December 21st 2019, 22:33:31.857 lettuce-eventExecutorLoop-1-3 INFO io.lettuce.core.protocol.ConnectionWatchdog Reconnecting, last destination was REDACTED.apse1.cache.amazonaws.com/aaa.bbb.ccc.ddd:6379
December 21st 2019, 22:33:31.891 lettuce-epollEventLoop-4-2 WARN io.lettuce.core.protocol.ConnectionWatchdog Cannot reconnect to [REDACTED.apse1.cache.amazonaws.com:6379]: java.nio.channels.ClosedChannelException
December 21st 2019, 22:33:32.016 lettuce-epollEventLoop-4-3 INFO io.lettuce.core.protocol.ReconnectionHandler Reconnected to REDACTED.apse1.cache.amazonaws.com:6379
December 21st 2019, 22:33:37.615 XNIO-2 task-12 WARN org.springframework.boot.actuate.redis.RedisHealthIndicator Redis health check failed
...
repeated warnings
Configuration is standard
spring.redis.host=blah
spring.redis.port=blah
spring.redis.password=blah
spring.redis.ssl=true
spring.redis.timeout=5s
Environment Spring Boot 2.2.2.RELEASE Spring Data Redis 2.2.3.RELEASE Spring Framework 5.2.2.RELEASE Lettuce 5.2.1.RELEASE AWS ElastiCache for Redis (underlying Redis is 5.0.3) with authentication enabled
Possibly this is an issue with the underlying Spring Data Redis or Lettuce configuration, but we were a bit surprised that this wasn't working/auto-recovering and re-authenticating out of the box.
Since we use the Actuator endpoint as a Kubernetes readinessProbe
this meant traffic was not sent to the application and it was not able to self-heal.
Comment From: snicoll
Thanks for the report. What version of Spring Boot are you using?
Comment From: chadlwilson
Hi @snicoll - 2.2.2.RELEASE
- and here's the relevant part of our dependency tree
+--- org.springframework.boot:spring-boot-starter-data-redis -> 2.2.2.RELEASE
| +--- org.springframework.boot:spring-boot-starter:2.2.2.RELEASE (*)
| +--- org.springframework.data:spring-data-redis:2.2.3.RELEASE
| | +--- org.springframework.data:spring-data-keyvalue:2.2.3.RELEASE
| | | +--- org.springframework.data:spring-data-commons:2.2.3.RELEASE (*)
| | | +--- org.springframework:spring-context:5.2.2.RELEASE (*)
| | | +--- org.springframework:spring-tx:5.2.2.RELEASE (*)
| | | \--- org.slf4j:slf4j-api:1.7.26 -> 1.7.29
| | +--- org.springframework:spring-tx:5.2.2.RELEASE (*)
| | +--- org.springframework:spring-oxm:5.2.2.RELEASE
| | | +--- org.springframework:spring-beans:5.2.2.RELEASE (*)
| | | \--- org.springframework:spring-core:5.2.2.RELEASE (*)
| | +--- org.springframework:spring-aop:5.2.2.RELEASE (*)
| | +--- org.springframework:spring-context-support:5.2.2.RELEASE
| | | +--- org.springframework:spring-beans:5.2.2.RELEASE (*)
| | | +--- org.springframework:spring-context:5.2.2.RELEASE (*)
| | | \--- org.springframework:spring-core:5.2.2.RELEASE (*)
| | \--- org.slf4j:slf4j-api:1.7.26 -> 1.7.29
| \--- io.lettuce:lettuce-core:5.2.1.RELEASE
| +--- io.netty:netty-common:4.1.43.Final
| +--- io.netty:netty-handler:4.1.43.Final (*)
| +--- io.netty:netty-transport:4.1.43.Final (*)
| \--- io.projectreactor:reactor-core:3.3.0.RELEASE -> 3.3.1.RELEASE (*)
Also, if it helps for wider context; this is against the AWS ElastiCache Redis implementation. We noticed this problem after scheduling Redis maintenance/service updates in non-prod environments.
Comment From: snicoll
@chadlwilson unfortunately I can't reproduce the problem. To help us identify where the problem might come from, can you please share a small sample (zip or github repo) with instructions on how the problem can be reproduced?
Comment From: spring-projects-issues
If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 7 days this issue will be closed.
Comment From: chadlwilson
I've been struggling to reproduce this on a small sample with a local Redis. There are a lot of variables to eliminate which is a bit difficult without knowing all the underlying Spring Boot Actuator/Spring Data Redis/Lettuce magic. I'm thinking it's also possible that the issue lies with the way AWS ElasticCache behaves; and/or a particular state of pooled connections. :-/
Comment From: snicoll
Thanks for trying Chad. Perhaps @mp911de has an idea?
Comment From: chadlwilson
I might raise an issue on the Lettuce GitHub; since after looking through a lot of the code through Spring Boot and Spring Data Redis, it seems this behaviour is intended to be fully managed by the Lettuce ConnectionWatchdog
that handles reconnects of the single multiplexed connection.
It seems after reconnect it has left the StatefulRedisConnection
in an un-authenticated state so subsequent uses by the RedisHealthIndicator
fail.
Comment From: snicoll
That’s my feeling as well. Please hold on creating a duplicate to avoid efforts being spent in two places. I’ve pinged the lead of the lettuce driver so we should get additional feedback soon.
Comment From: mp911de
I tried to reproduce the issue without luck so far.
Lettuce attempts to reconnect using the initially provided password and the reconnect process does not report failures when the password changed during reconnects. If you can reproduce that issue in any environment, please raise the log level of io.lettuce.core.protocol
to DEBUG
or TRACE
(caution: creates lots of noise and performance drag on busy systems).
That issue isn't Spring Boot related, more likely it is Lettuce-related.
Comment From: snicoll
Thanks for the feedback. I am closing this issue now.
Please create an issue in the lettuce issue tracker if you manage to reproduce with the request logging information.