Thorsten Schöning
2018-11-26 13:45:56 UTC
Hi all,
I'm currently testing migration of a legacy web app from Tomcat 7 to 8
to 8.5 and ran into problems regarding character encoding in 8.5 only.
That app uses JSP pages, declares all of them to be stored in
UTF-8, really does store them that way :-), and declares an HTTP Content-Type of
"text/html; charset=UTF-8" as well. Textual content at HTML level is
properly encoded using UTF-8 and looks correct in the browser etc.
The exception is the following include:
    <jsp:include page="/WEB-INF/jsp/includes/search.jsp">
        <jsp:param name="chooseSearchInputTitle"
                   value="Benutzer wählen"
        />
    </jsp:include>
"search.jsp" simply outputs the value of the param as the "title"
attribute of some HTML link, and the character "ä" is replaced
somewhere with the Unicode character U+FFFD REPLACEMENT CHARACTER. But
really only in Tomcat 8.5, not in 8 and not in 7.
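A minimal sketch of how a replacement character like this can arise in general, assuming the value is percent-encoded with one charset and decoded with another. It uses the JDK's java.net.URLEncoder and URLDecoder as stand-ins, not Tomcat's or Jasper's own code. The class and method names are hypothetical, and whether this is what actually happens in Tomcat 8.5 is not confirmed here.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class EncodingMismatchDemo {

    // Percent-encode a parameter value using the named charset.
    static String encode(String value, String charset) {
        try {
            return URLEncoder.encode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Decode a percent-encoded value using the named charset.
    static String decode(String value, String charset) {
        try {
            return URLDecoder.decode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // \u00e4 is "ä"; the escape avoids depending on the source file encoding.
        String title = "Benutzer w\u00e4hlen";

        String latin1 = encode(title, "ISO-8859-1");
        String utf8 = encode(title, "UTF-8");
        System.out.println(latin1); // Benutzer+w%E4hlen
        System.out.println(utf8);   // Benutzer+w%C3%A4hlen

        // Mismatch: encoded as ISO-8859-1, decoded as UTF-8.
        String broken = decode(latin1, "UTF-8");
        System.out.println(broken.indexOf('\uFFFD') >= 0); // true

        // Matching charsets round-trip cleanly.
        System.out.println(decode(utf8, "UTF-8").equals(title)); // true
    }
}
```

Encoded as ISO-8859-1, the "ä" becomes the single byte %E4, which is not a valid UTF-8 sequence on its own, so a UTF-8 decoder substitutes U+FFFD.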
I can fix that problem using either "SetCharacterEncodingFilter" or
the following scriptlet:
    <% request.setCharacterEncoding("UTF-8"); %>
The Java generated for the include is:
    org.apache.jasper.runtime.JspRuntimeLibrary.include(request, response,
        "/WEB-INF/jsp/includes/search.jsp" + "?"
        + org.apache.jasper.runtime.JspRuntimeLibrary.URLEncode("chooseSearchInputTitle", request.getCharacterEncoding())
        + "="
        + org.apache.jasper.runtime.JspRuntimeLibrary.URLEncode("Benutzer wählen", request.getCharacterEncoding()),
        out, false);
The "ä" is properly encoded using UTF-8 in all versions of Tomcat and
the generated code seems to be the same in all versions as well,
especially regarding "request.getCharacterEncoding()".
The implementation of "getCharacterEncoding" has changed in Tomcat 8.5, though, and now reads:
    @Override
    public String getCharacterEncoding() {
        String characterEncoding = coyoteRequest.getCharacterEncoding();
        if (characterEncoding != null) {
            return characterEncoding;
        }
        Context context = getContext();
        if (context != null) {
            return context.getRequestCharacterEncoding();
        }
        return null;
    }
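The lookup order above can be sketched in isolation as follows. This is an illustration, not Tomcat code: the class and method names are hypothetical, and the ISO-8859-1 fallback for a null result is an assumption about how a caller such as the URL encoding helper might behave.

```java
public class RequestEncodingFallbackDemo {

    // Same order as the quoted method: the encoding set on the request wins,
    // otherwise the context-wide default is used (which may itself be null).
    static String resolve(String requestEncoding, String contextEncoding) {
        if (requestEncoding != null) {
            return requestEncoding;
        }
        return contextEncoding;
    }

    // ASSUMPTION for illustration: a caller that receives null falls back
    // to ISO-8859-1 when it needs a concrete charset.
    static String effective(String resolved) {
        return resolved != null ? resolved : "ISO-8859-1";
    }

    public static void main(String[] args) {
        // Nothing set anywhere: the lookup yields null.
        System.out.println(effective(resolve(null, null)));

        // After request.setCharacterEncoding("UTF-8"): the request value is used.
        System.out.println(effective(resolve("UTF-8", null)));

        // Only a context-wide default is configured.
        System.out.println(effective(resolve(null, "UTF-8")));
    }
}
```

Under these assumptions, only the first case ends up with a charset other than UTF-8, which would be consistent with "setCharacterEncoding" making the problem disappear.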
My connector in server.xml is configured to use "URIEncoding" as UTF-8
in all versions of Tomcat, but that doesn't make a difference in 8.5.
So I understand that by calling "setCharacterEncoding", I set the value
that is actually used in the generated Java now, even though the following is
documented for the filter:
    Note that the encoding for GET requests is not set here, but on a Connector
    https://tomcat.apache.org/tomcat-8.5-doc/config/filter.html#Set_Character_Encoding_Filter/Introduction
Now I'm wondering about multiple things...
1. Doesn't "getCharacterEncoding" provide the encoding of the
HTTP-body? My JSP is called using GET and the Java quoted above
seems to build a query string as well. So why does it depend on
some body encoding instead of e.g. URIEncoding of the connector?
2. Is my former approach wrong or did changes in Tomcat 8.5 introduce
some regression? There is some conversion somewhere which was not
present in the past.
3. What is the correct fix I need now? The character encoding filter,
even though it only applies to bodies per documentation?
Thanks!
Kind regards,
Thorsten Schöning
--
Thorsten Schöning E-Mail: ***@AM-SoFT.de
AM-SoFT IT-Systeme http://www.AM-SoFT.de/
Telefon...........05151- 9468- 55
Fax...............05151- 9468- 88
Mobil..............0178-8 9468- 04
AM-SoFT GmbH IT-Systeme, Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB 207 694 - Geschäftsführer: Andreas Muchow