Affects: 5.2.7.RELEASE
We just updated from Spring Boot 2.1.3.RELEASE (which used Spring Framework 5.1.7.RELEASE) to 2.3.1.RELEASE (which uses Spring Framework 5.2.7.RELEASE).
After finishing up and running system tests, we noticed that there are some encoding issues with non-ASCII characters. For example, with the following endpoint (details omitted):
@RequestMapping("/v1")
public interface Api {

    @GetMapping("/inquiry")
    @ApiOperation(value = "...", httpMethod = "GET",
            notes = ".", response = InquiryData.class, tags = {"Inquiry"})
    @ApiResponses(value = {
            @ApiResponse(code = 200, message = "Data", response = InquiryData.class),
            @ApiResponse(code = 401, message = "Unauthorized", response = Errors.class),
            @ApiResponse(code = 500, message = "Internal Server Error", response = Errors.class)})
    InquiryData getInquiryData();
}
This would ordinarily return something like the following:
{"name":"...","street":"...","zipCode":"...","city":"Münster","country":"DE"}
^ non-ASCII
After updating to 5.2.7.RELEASE, the following response is returned instead:
{"name":"...","street":"...","zipCode":"...","city":"Münster","country":"DE"}
We do not have any other specific spring-web settings for encoding and/or decoding. We do have a custom ObjectMapper PostProcessor which configures some de-/serialization properties, but nothing that should impact encoding.
The issue only seems to occur when encoding a response. Decoding a request body seems to work fine even without consumes = MediaType.APPLICATION_JSON_UTF8_VALUE.
Adding @RequestMapping(value = "/v1", produces = MediaType.APPLICATION_JSON_UTF8_VALUE) also seems to fix the issue, but MediaType.APPLICATION_JSON_UTF8_VALUE has been deprecated since 5.2. IMO, this means that the default should support UTF-8 out of the box.
I haven't been able to debug the issue yet, but I'll try to do so and add some more technical details to try and narrow the issue down.
Comment From: filpano
For what it's worth, it seems that this issue pertains to the same error.
Comment From: poutsma
UTF-8 should be the default out of the box, so I do not understand what is going on.
Please let us know if you are able to create a complete minimal sample (something that we can unzip or git clone, build, and deploy) that reproduces the problem.
Comment From: filpano
Thanks for the quick reply!
> UTF-8 should be the default out of the box, so I do not understand what is going on.
That's what I thought. I'm not sure what's causing the above behaviour.
I'll try to update this issue with a reproducible sample project in the next few days.
Comment From: poutsma
@filpano Note that I did already make some improvements related to JSON character encoding in 5.2.8, so you could try a recent 5.2.8 snapshot and see if that improves things.
Specifically, I fixed #25328, but that only applies when using application/*+json and reading the JSON response as a string.
Comment From: filpano
I've been able to replicate this issue, but it seems I was initially wrong. I only noticed the error during integration tests using MockMvc, which is why I was under the impression that it was a more general error.
I've debugged the issue and it seems to be due to the default character encoding of MockHttpServletResponse, which is set to ISO-8859-1 (see: https://github.com/spring-projects/spring-framework/blame/d51ab24a1b2eb1a32afa193c4a1a6ccc4485bd22/spring-test/src/main/java/org/springframework/mock/web/MockHttpServletResponse.java#L84 and https://github.com/spring-projects/spring-framework/blob/d51ab24a1b2eb1a32afa193c4a1a6ccc4485bd22/spring-web/src/main/java/org/springframework/web/util/WebUtils.java#L187).
Since we do not set a specific character encoding during @AutoConfigureMockMvc tests, it would seem that this default encoding is used, producing the error.
I've verified that during normal runtime, the response is encoded in UTF-8 as expected. For completeness's sake, I've included a runnable Demo Application which showcases the issue when running tests.
I would be glad to make a PR for this change since it seems fairly straightforward. Seeing how the commit that added that change was 12+ years ago, I imagine this was simply forgotten during the change to UTF-8 as the default encoding.
Comment From: poutsma
> I would be glad to make a PR for this change since it seems fairly straightforward. Seeing how the commit that added that change was 12+ years ago, I imagine this was simply forgotten during the change to UTF-8 as the default encoding.
Unfortunately, it is not as simple as changing that default value. For one, there is a lot of code out there that would break if we changed the default from ISO-8859-1 to UTF-8. Moreover, MockHttpServletResponse implements a type from the servlet spec, and the latest version 4 of that spec still lists ISO-8859-1 as the default, not UTF-8.
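A simplified JDK-only sketch of that fallback rule: if a Content-Type value names a charset, that charset is used; otherwise ISO-8859-1 applies. The helper name effectiveCharset is hypothetical, and a real servlet response also takes an explicitly set character encoding and the locale into account, which this sketch ignores.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class ResponseCharsetSketch {

    // Hypothetical helper: returns the charset named in a Content-Type value,
    // or ISO-8859-1 when no charset parameter is present.
    public static Charset effectiveCharset(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String param = part.trim();
                if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    // Drop the "charset=" prefix and any surrounding quotes
                    String name = param.substring("charset=".length()).replace("\"", "").trim();
                    return Charset.forName(name);
                }
            }
        }
        return StandardCharsets.ISO_8859_1;
    }

    public static void main(String[] args) {
        System.out.println(effectiveCharset("application/json"));               // ISO-8859-1
        System.out.println(effectiveCharset("application/json;charset=UTF-8")); // UTF-8
    }
}
```

Under this rule, a response whose Content-Type is plain application/json is read back as ISO-8859-1 unless the test states a charset explicitly.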
The underlying problem here is that you are testing against JSON contents using a string verifier, which is fragile at best. Instead, use JSONassert by calling the json method instead of string. So in the utf8_demo code you submitted:
mockMvc.perform(MockMvcRequestBuilders.get("/special"))
        .andExpect(content().json("{\"someString\":\"spücial chäräcters\"}"));
The above runs fine for me.
As an extra bonus, changing to json made me find a typo in the original expectation string, which was missing the closing curly bracket and therefore would have failed even if the character encoding had been correct.
Comment From: filpano
Thanks for the investigation. I thought that it might not be quite as simple - Cunningham's Law at work. :)
I guess my expectation was that UTF-8 should be the default in tests as well, but I was mistaken that this should always be the case. It seems that it is the default only when application/json is implied to be the accepted media type.
> Moreover, MockHttpServletResponse implements a type from the servlet spec, and the latest version 4 of that spec still lists ISO-8859-1 as the default; not UTF-8.
Good to know, thanks for the background information.