Affects: 5.2.7.RELEASE


We just updated from Spring Boot 2.1.3.RELEASE (which used Spring Framework 5.1.7.RELEASE) to 2.3.1.RELEASE (which uses Spring Framework 5.2.7.RELEASE).

After finishing up and running system tests, we noticed that there are some encoding issues with non-ASCII characters. For example, with the following endpoint (details omitted):

@RequestMapping("/v1")
public interface Api {

    @GetMapping("/inquiry")
    @ApiOperation(value = "...", httpMethod = "GET",
            notes = ".", response = InquiryData.class, tags = {"Inquiry"})
    @ApiResponses(value = {@ApiResponse(code = 200, message = "Data", response = InquiryData.class),
            @ApiResponse(code = 401, message = "Unauthorized", response = Errors.class),
            @ApiResponse(code = 500, message = "Internal Server Error", response = Errors.class)})
    InquiryData getInquiryData();

}

Which ordinarily would return something like the following:

{"name":"...","street":"...","zipCode":"...","city":"Münster","country":"DE"}
                                                      ^ non-ASCII

After updating to 5.2.7.RELEASE, we get the following response instead:

{"name":"...","street":"...","zipCode":"...","city":"MÃ¼nster","country":"DE"}

We do not have any other specific spring-web settings for encoding and/or decoding. We do have a custom ObjectMapper PostProcessor which configures some de-/serialization properties, but nothing that should impact encoding.

The issue only seems to occur when encoding a response. Decoding a request body seems to work fine even without consumes = MediaType.APPLICATION_JSON_UTF8_VALUE.

Putting @RequestMapping(value = "/v1", produces = MediaType.APPLICATION_JSON_UTF8_VALUE) on the interface does also seem to fix the issue, but MediaType.APPLICATION_JSON_UTF8_VALUE has been deprecated since 5.2. IMO, that deprecation implies that the default should support UTF-8 out of the box.

I haven't been able to debug the issue yet, but I'll try to do so and add some more technical details to try and narrow the issue down.

Comment From: filpano

For what it's worth, it seems that this issue pertains to the same error.

Comment From: poutsma

UTF-8 should be the default out of the box, so I do not understand what is going on.

Please let us know if you are able to create a complete minimal sample (something that we can unzip or git clone, build, and deploy) that reproduces the problem.

Comment From: filpano

Thanks for the quick reply!

> UTF-8 should be the default out of the box, so I do not understand what is going on.

That's what I thought. I'm not sure what's causing the above behaviour.

I'll try to update this issue with a reproducible sample project in the next few days.

Comment From: poutsma

@filpano Note that I did already make some improvements related to JSON character encoding in 5.2.8, so you could try a recent 5.2.8 snapshot and see if that improves things.

Specifically, I fixed #25328, but that only applies when using application/*+json and reading the JSON response as a string.

Comment From: filpano

I've been able to replicate this issue, but it seems I was initially wrong. I only noticed the error during integration tests using MockMvc, which is why I was under the impression that it was a more general error.

I've debugged the issue and it seems to be due to the default character encoding of MockHttpServletResponse, which is set to ISO-8859-1 (see: https://github.com/spring-projects/spring-framework/blame/d51ab24a1b2eb1a32afa193c4a1a6ccc4485bd22/spring-test/src/main/java/org/springframework/mock/web/MockHttpServletResponse.java#L84 and https://github.com/spring-projects/spring-framework/blob/d51ab24a1b2eb1a32afa193c4a1a6ccc4485bd22/spring-web/src/main/java/org/springframework/web/util/WebUtils.java#L187).

Since we do not set a specific character encoding during @AutoConfigureMockMvc tests, it would seem that this default encoding is used, producing the error.
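The symptom can be reproduced without Spring at all. The following is a minimal sketch in plain Java, assuming the JSON body is written as UTF-8 bytes and then read back with ISO-8859-1; the class and method names are made up for illustration and are not part of any Spring API.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {

    // Encodes the text as UTF-8 (the writer's charset), then decodes the
    // resulting bytes with the charset the reader assumes.
    public static String roundTrip(String text, Charset readerCharset) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, readerCharset);
    }

    public static void main(String[] args) {
        // "M\u00fcnster" is "Münster"; escapes keep the source encoding-independent.
        String city = "M\u00fcnster";

        // Reader and writer agree on UTF-8: the text survives unchanged.
        System.out.println(roundTrip(city, StandardCharsets.UTF_8).equals(city)); // true

        // Reader assumes ISO-8859-1: the two UTF-8 bytes of U+00FC (0xC3 0xBC)
        // are decoded as two separate characters, U+00C3 and U+00BC.
        String garbled = roundTrip(city, StandardCharsets.ISO_8859_1);
        System.out.println(garbled.equals("M\u00c3\u00bcnster")); // true
        System.out.println(garbled.length()); // 8 (the original has 7 characters)
    }
}
```

The bytes on the wire are identical in both cases; only the charset used to turn them back into a string differs, which matches the observation that the problem shows up in tests but not at runtime.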

I've verified that during normal runtime, the response is encoded in UTF-8 as expected. For completeness's sake, I've included a runnable Demo Application which showcases the issue when running tests.

utf8_demo.zip

I would be glad to make a PR for this change since it seems fairly straightforward. Seeing how the commit that added that change was 12+ years ago, I imagine this was simply forgotten during the change to UTF-8 as the default encoding.

Comment From: poutsma

> I would be glad to make a PR for this change since it seems fairly straightforward. Seeing how the commit that added that change was 12+ years ago, I imagine this was simply forgotten during the change to UTF-8 as the default encoding.

Unfortunately, it is not as simple as changing that default value. For one, there is a lot of code out there that would break if we change the default from ISO-8859-1 to UTF-8. Moreover, MockHttpServletResponse implements a type from the servlet spec, and the latest version 4 of that spec still lists ISO-8859-1 as the default; not UTF-8.
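To illustrate the ISO-8859-1 default: a reader of the response body only knows the charset if the Content-Type carries a charset parameter, and otherwise has to fall back to a default. The helper below is a hypothetical, deliberately simplified sketch in plain Java. It is not Spring's or any servlet container's actual parsing code and only shows the fallback idea.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetFallbackSketch {

    // Hypothetical helper: picks the charset from a Content-Type value and
    // falls back to ISO-8859-1 when no charset parameter is present.
    public static Charset resolveCharset(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String param = part.trim();
                // Case-insensitive check for a "charset=" prefix.
                if (param.regionMatches(true, 0, "charset=", 0, 8)) {
                    String name = param.substring(8).trim().replace("\"", "");
                    return Charset.forName(name);
                }
            }
        }
        return StandardCharsets.ISO_8859_1;
    }

    public static void main(String[] args) {
        // No charset parameter: the fallback applies.
        System.out.println(resolveCharset("application/json")); // ISO-8859-1
        // Explicit charset parameter: it wins over the fallback.
        System.out.println(resolveCharset("application/json;charset=UTF-8")); // UTF-8
    }
}
```

Under this assumption, a plain application/json response is decoded with the fallback, while a media type that spells out charset=UTF-8 (as the deprecated APPLICATION_JSON_UTF8_VALUE does) is decoded correctly.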

The underlying problem here is that you are testing against JSON contents using a string verifier, which is fragile at best. Instead, use JSONassert by calling the json method instead of string. So in the utf8_demo code you submitted:

mockMvc.perform(MockMvcRequestBuilders.get("/special"))
            .andExpect(content().json("{\"someString\":\"spücial chäräcters\"}"));

The above runs fine for me.

As an extra bonus, changing to json made me find a typo in the original expectation string, which missed the closing curly bracket and therefore would have failed even if the character encoding was correct.

Comment From: filpano

Thanks for the investigation. I thought that it might not be quite as simple - Cunningham's Law at work. :)

I guess my expectation was that UTF-8 should be the default in tests as well, but I was mistaken in assuming that this is always the case. It seems that it is the default only when application/json is implied to be the accepted media type.

> Moreover, MockHttpServletResponse implements a type from the servlet spec, and the latest version 4 of that spec still lists ISO-8859-1 as the default; not UTF-8.

Good to know, thanks for the background information.