Hi,

recently the following tests started to fail sporadically:

  • LogFileWebEndpointAutoConfigurationTests.logFileWebEndpointIsAutoConfiguredWhenExternalFileIsSet
  • DiskSpaceHealthContributorAutoConfigurationTests.runWhenPathDoesNotExistShouldCreateIndicator

Both seem to be throwing similar errors like this:

Caused by: org.springframework.core.convert.ConversionFailedException: Failed to convert from type [java.lang.String] to type [java.io.File] for value 'external.log'; nested exception is java.lang.IllegalStateException: Illegal access: this web application instance has been stopped already. Could not load [external.log]. The following stack trace is thrown for debugging purposes as well as to attempt to terminate the thread which caused the illegal access.
    at org.springframework.core.convert.support.ConversionUtils.invokeConverter(ConversionUtils.java:47)
    at org.springframework.core.convert.support.GenericConversionService.convert(GenericConversionService.java:191)
    at org.springframework.boot.context.properties.bind.BindConverter$CompositeConversionService.convert(BindConverter.java:170)
    at org.springframework.boot.context.properties.bind.BindConverter.convert(BindConverter.java:96)
    at org.springframework.boot.context.properties.bind.BindConverter.convert(BindConverter.java:88)
    at org.springframework.boot.context.properties.bind.Binder.bindProperty(Binder.java:434)
    at org.springframework.boot.context.properties.bind.Binder.bindObject(Binder.java:379)
    at org.springframework.boot.context.properties.bind.Binder.bind(Binder.java:319)
    ... 142 more
 Caused by: java.lang.IllegalStateException: Illegal access: this web application instance has been stopped already. Could not load [external.log]. The following stack trace is thrown for debugging purposes as well as to attempt to terminate the thread which caused the illegal access.
    at org.apache.catalina.loader.WebappClassLoaderBase.checkStateForResourceLoading(WebappClassLoaderBase.java:1385)
    at org.apache.catalina.loader.WebappClassLoaderBase.getResource(WebappClassLoaderBase.java:1038)
    at org.springframework.core.io.ClassPathResource.resolveURL(ClassPathResource.java:155)
    at org.springframework.core.io.ClassPathResource.exists(ClassPathResource.java:142)
    at org.springframework.boot.convert.StringToFileConverter.convert(StringToFileConverter.java:48)
    at org.springframework.boot.convert.StringToFileConverter.convert(StringToFileConverter.java:34)
    at org.springframework.core.convert.support.GenericConversionService$ConverterAdapter.convert(GenericConversionService.java:385)
    at org.springframework.core.convert.support.ConversionUtils.invokeConverter(ConversionUtils.java:41)

I tried fixing it directly, but I can't see what's going wrong in these cases.

Cheers, Christoph

Comment From: wilkinsona

Thanks, @dreis2211. I have a theory for this one but I've yet to put it to the test. My guess is that there's an earlier test that's using Tomcat and is polluting the thread context class loader. The flakiness is due to test ordering and Gradle not using the same order across runs.

Comment From: dreis2211

Yeah, it must be something like this - at least that's the direction I was looking for as well - but I couldn't find it.

Comment From: wilkinsona

I think I've managed to get to the bottom of this. The problem is in StringToFileConverter:

https://github.com/spring-projects/spring-boot/blob/b8fbe395ce26eeb3a7c5d582346e78b8b93c1fb4/spring-boot-project/spring-boot/src/main/java/org/springframework/boot/convert/StringToFileConverter.java#L34-L37

When the class is first loaded and static initialization is performed, new DefaultResourceLoader() is called. The behaviour of the default constructor is such that the resource loader will use "the thread context class loader at the time of this ResourceLoader's initialization". Depending on when StringToFileConverter is first loaded affects which ClassLoader that it captures. When the tests pass it captures the app ClassLoader. When the failure occurs it captures a TomcatEmbeddedWebappClassLoader.

It looks to me like calling new DefaultResourceLoader() and storing the result in a static field is rather dangerous. If it is called with a thread context class loader that is not the same as StringToFileConverter's ClassLoader then the DefaultResourceLoader instance will attempt to use a class loader that may have an entirely different lifecycle to its own. This could cause a memory leak due to a ClassLoader being referenced for longer than it should be or a failure like the one reported above where the ClassLoader has been closed but the DefaultResourceLoader is still attempting to use it.

Comment From: dreis2211

Wow....wouldn't have thought of something like that as the reason.

Comment From: wilkinsona

It looks to me like calling new DefaultResourceLoader() and storing the result in a static field is rather dangerous.

The danger is broader than this. Updating StringToFileConverter to make the resourceLoader field instance-scoped does not fix the problem. It doesn't fix the problem as there's still a static reference involved, it's just indirect now. In this case the static reference is the sharedInstance field of ApplicationConversionService. This field statically holds an ApplicationConversionService instance with which a StringToFileConverter is registered.

As I now understand it new DefaultResourceLoader() is dangerous whenever the result may be statically referenced either directly or indirectly. Given the difficulty of being certain that such a reference won't occur, I think any instance of DefaultResourceLoader that's stored in a field needs to be created with a specific ClassLoader.

Comment From: wilkinsona

The shared instance of ApplicationConversionService is created very early in all Spring Boot applications. It is created when SpringApplication creates the environment:

"main" #1 prio=5 os_prio=31 tid=0x00007fcfa300a000 nid=0x2603 at breakpoint[0x00007000063eb000]
   java.lang.Thread.State: RUNNABLE
    at org.springframework.core.io.DefaultResourceLoader.<init>(DefaultResourceLoader.java:64)
    at org.springframework.boot.convert.StringToFileConverter.<clinit>(StringToFileConverter.java:36)
    at org.springframework.boot.convert.ApplicationConversionService.addApplicationConverters(ApplicationConversionService.java:118)
    at org.springframework.boot.convert.ApplicationConversionService.configure(ApplicationConversionService.java:100)
    at org.springframework.boot.convert.ApplicationConversionService.<init>(ApplicationConversionService.java:61)
    at org.springframework.boot.convert.ApplicationConversionService.<init>(ApplicationConversionService.java:54)
    at org.springframework.boot.convert.ApplicationConversionService.getSharedInstance(ApplicationConversionService.java:80)
    - locked <0x0000000718a87000> (a java.lang.Class for org.springframework.boot.convert.ApplicationConversionService)
    at org.springframework.boot.SpringApplication.configureEnvironment(SpringApplication.java:478)
    at org.springframework.boot.SpringApplication.prepareEnvironment(SpringApplication.java:343)
    at org.springframework.boot.SpringApplication.run(SpringApplication.java:308)
    at org.springframework.boot.SpringApplication.run(SpringApplication.java:1237)
    at org.springframework.boot.SpringApplication.run(SpringApplication.java:1226)
    at smoketest.tomcat.SampleTomcatApplication.main(SampleTomcatApplication.java:52)

At this point the thread context class loader that is captured will be the app class loader (when running exploded) or the launched URL class loader (when running from a fat jar). In both cases this will also be the class loader that loaded StringtoFileConverter and DefaultResourceLoader. As such, replacing new DefaultResourceLoader() with new DefaultResourceLoader(StringToFileConverter.class.getClassLoader()), will leave the behaviour unchanged when running main code and address the problem during testing when a Tomcat-based test runs first.

The alternative option is to use new DefaultResourceLoader(null). This causes the loader to use the TCCL that's in place at the time of resource access. This would result in a potentially much bigger change in behaviour and should, therefore, be avoided.

Comment From: dreis2211

Who would have thought that this turns out be fairly complicated? Nice findings.