Discussion:
Character encoding problems using jsp:include with jsp:param in Tomcat 8.5 only.
Thorsten Schöning
2018-11-26 13:45:56 UTC
Hi all,

I'm currently testing migration of a legacy web app from Tomcat 7 to 8
to 8.5 and ran into problems regarding character encoding in 8.5 only.
That app uses JSP pages, declares all of them to be stored in
UTF-8 (and they really are :-)), and declares an HTTP Content-Type of
"text/html; charset=UTF-8" as well. Textual content at the HTML level is
properly encoded using UTF-8 and is displayed correctly in the browser etc.

In Tomcat 8.5 the following is introducing encoding problems:
<jsp:include page="/WEB-INF/jsp/includes/search.jsp">
<jsp:param name="chooseSearchInputTitle"
value="Benutzer wählen"
/>
</jsp:include>
"search.jsp" simply outputs the value of the param as the "title"
attribute of some HTML link, and the character "ä" is replaced
somewhere with the Unicode REPLACEMENT CHARACTER (U+FFFD). But
this really happens only in Tomcat 8.5, not in 8 and not in 7.

I can fix that problem using either "SetCharacterEncodingFilter" or
<% request.setCharacterEncoding("UTF-8"); %>

Looking at the generated Java code for the JSP I get the following:

org.apache.jasper.runtime.JspRuntimeLibrary.include(request, response,
    "/WEB-INF/jsp/includes/search.jsp" + "?"
    + org.apache.jasper.runtime.JspRuntimeLibrary.URLEncode("chooseSearchInputTitle", request.getCharacterEncoding())
    + "="
    + org.apache.jasper.runtime.JspRuntimeLibrary.URLEncode("Benutzer wählen", request.getCharacterEncoding()),
    out, false);

The "ä" is properly encoded using UTF-8 in all versions of Tomcat and
the generated code seems to be the same in all versions as well,
especially regarding "request.getCharacterEncoding()".
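
To illustrate why the value returned by "request.getCharacterEncoding()" matters here, the following is a minimal stand-alone sketch, not Tomcat code. It assumes that the parameter is URL-encoded with one charset and later decoded with another, which would be one plausible way to end up with U+FFFD. The class and method names are made up for this example; only java.net.URLEncoder and java.net.URLDecoder are real APIs.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Stand-alone illustration with hypothetical names; this is not Tomcat code.
class EncodingMismatchDemo {

    // URL-encodes the value with one charset and decodes it with another,
    // as would happen if the code building the query string and the code
    // parsing it disagreed about the encoding.
    static String roundTrip(String value, String encodeWith, String decodeWith) {
        try {
            String encoded = URLEncoder.encode(value, encodeWith);
            return URLDecoder.decode(encoded, decodeWith);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset", e);
        }
    }

    public static void main(String[] args) {
        String title = "Benutzer w\u00e4hlen";

        // Same charset on both sides: the value survives unchanged.
        String same = roundTrip(title, "UTF-8", "UTF-8");

        // Encoded as ISO-8859-1 (a single byte 0xE4 for the umlaut) but decoded
        // as UTF-8: the lone byte is malformed UTF-8 and becomes U+FFFD.
        String broken = roundTrip(title, "ISO-8859-1", "UTF-8");

        System.out.println(same.equals(title));
        System.out.println(broken.indexOf('\uFFFD') >= 0);
    }
}
```

If this assumption matches what happens inside the include, it would explain why forcing the request encoding to UTF-8 makes the symptom go away.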

"getCharacterEncoding" in Tomcat 8.5 has changed, though; it now reads:

@Override
public String getCharacterEncoding() {
    String characterEncoding = coyoteRequest.getCharacterEncoding();
    if (characterEncoding != null) {
        return characterEncoding;
    }
    Context context = getContext();
    if (context != null) {
        return context.getRequestCharacterEncoding();
    }
    return null;
}
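
The lookup order in that method can be modelled in plain Java as follows. This is only a simplified sketch: the class, the method names and the two string parameters are hypothetical stand-ins for the request and context objects, and the ISO-8859-1 default in "orDefault" is an assumption about how a caller might treat a null result, not something taken from the code above.

```java
// Simplified model of the lookup order; all names here are hypothetical and
// only mirror the structure of the quoted method.
class EncodingFallbackSketch {

    // Returns the encoding declared by the request if there is one,
    // otherwise the context-wide default, which may itself be null.
    static String resolve(String requestEncoding, String contextEncoding) {
        if (requestEncoding != null) {
            return requestEncoding;
        }
        return contextEncoding;
    }

    // Assumption: a caller that needs a concrete charset falls back to
    // ISO-8859-1 when nothing was resolved.
    static String orDefault(String resolved) {
        return resolved != null ? resolved : "ISO-8859-1";
    }

    public static void main(String[] args) {
        // Nothing declared anywhere: the assumed default is used.
        System.out.println(orDefault(resolve(null, null)));
        // Request encoding set explicitly, e.g. via setCharacterEncoding.
        System.out.println(orDefault(resolve("UTF-8", null)));
        // Only a context-wide default is configured.
        System.out.println(orDefault(resolve(null, "UTF-8")));
    }
}
```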
My connector in server.xml is configured to use "URIEncoding" as UTF-8
in all versions of Tomcat, but that doesn't make a difference in 8.5.
So I understand that by using "setCharacterEncoding" I now set the value
actually used in the generated Java, even though the following is
documented for the filter:

Note that the encoding for GET requests is not set here, but on a Connector

https://tomcat.apache.org/tomcat-8.5-doc/config/filter.html#Set_Character_Encoding_Filter/Introduction

Now I'm wondering about multiple things...

1. Doesn't "getCharacterEncoding" provide the encoding of the
HTTP-body? My JSP is called using GET and the Java quoted above
seems to build a query string as well. So why does it depend on
some body encoding instead of e.g. URIEncoding of the connector?

2. Is my former approach wrong or did changes in Tomcat 8.5 introduce
some regression? There is some conversion somewhere which was not
present in the past.

3. What is the correct fix I need now? The character encoding filter,
even though it only applies to bodies per documentation?

Thanks!

Kind regards,

Thorsten Schöning
--
Thorsten Schöning E-Mail: ***@AM-SoFT.de
AM-SoFT IT-Systeme http://www.AM-SoFT.de/

Phone.............05151- 9468- 55
Fax...............05151- 9468- 88
Mobile.............0178-8 9468- 04

AM-SoFT GmbH IT-Systeme, Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB 207 694 - Managing Director: Andreas Muchow


---------------------------------------------------------------------
To unsubscribe, e-mail: users-***@tomcat.apache.org
For additional commands, e-mail: users-***@tomcat.apache.org
Christopher Schultz
2018-11-26 15:07:50 UTC
Thorsten,
Post by Thorsten Schöning
Hi all,
I'm currently testing migration of a legacy web app from Tomcat 7
to 8 to 8.5 and ran into problems regarding character encoding in
8.5 only. That app uses JSP pages and declares all of those to be
stored in UTF-8, does really do so :-), and declares a HTTP-Content
type of "text/html; charset=UTF-8" as well. Textual content at
HTML-level is properly encoded using UTF-8 and looks properly in
the browser etc.
In Tomcat 8.5 the following is introducing encoding problems,
<jsp:include page="/WEB-INF/jsp/includes/search.jsp">
<jsp:param name="chooseSearchInputTitle" value="Benutzer wählen" />
</jsp:include>
"search.jsp" simply outputs the value of the param as the "title"
attribute of some HTML-link and the character "ä" is replaced
somewhere with the Unicode character REPLACEMENT CHARACTER 0xFFFD.
But really only in Tomcat 8.5, not in 8 and not in 7.

Have you been able to determine if the problem is on input or output?

Post by Thorsten Schöning
I can fix that problem using either "SetCharacterEncodingFilter" or
<% request.setCharacterEncoding("UTF-8"); %>

FYI the SetCharacterEncodingFilter only modifies the request encoding,
not the response encoding. Also, it only changes the encoding of the
request *body* (e.g. PUT/POST), not the encoding used to decode
the URI. That is configured with the <Connector>'s URIEncoding attribute.
There is also useBodyEncodingForURI, which makes URI decoding use the
request body's encoding if one is present. I recommend using
useBodyEncodingForURI="true".

I recommend *always* using SetCharacterEncodingFilter, since web
browsers both habitually fail to send a correct Content-Type and
often use UTF-8 in URLs in violation of the HTTP spec. The result is
essentially that everything works the way you *want* it to work,
except that you just have to "hope" it works instead of being able to
prove that it will.

Post by Thorsten Schöning
Looking at the generated Java code for the JSP I get the following:
org.apache.jasper.runtime.JspRuntimeLibrary.include(request, response,
    "/WEB-INF/jsp/includes/search.jsp" + "?"
    + org.apache.jasper.runtime.JspRuntimeLibrary.URLEncode("chooseSearchInputTitle", request.getCharacterEncoding())
    + "="
    + org.apache.jasper.runtime.JspRuntimeLibrary.URLEncode("Benutzer wählen", request.getCharacterEncoding()),
    out, false);
The "ä" is properly encoded using UTF-8 in all versions of Tomcat
and the generated code seems to be the same in all versions as
well, especially regarding "request.getCharacterEncoding()".
"getCharacterEncoding" in Tomcat 8.5 has changed, though; it now reads:
@Override
public String getCharacterEncoding() {
    String characterEncoding = coyoteRequest.getCharacterEncoding();
    if (characterEncoding != null) {
        return characterEncoding;
    }
    Context context = getContext();
    if (context != null) {
        return context.getRequestCharacterEncoding();
    }
    return null;
}

This is just a fall-back for when there is no character encoding
defined in the request (because the browser didn't send one).

Post by Thorsten Schöning
My connector in server.xml is configured to use "URIEncoding" as
UTF-8 in all versions of Tomcat, but that doesn't make a difference
to 8.5. So I understand that using "setCharacterEncoding", I set
the value actually used in the generated Java now, even though the
following is documented:
Note that the encoding for GET requests is not set here, but on a Connector
https://tomcat.apache.org/tomcat-8.5-doc/config/filter.html#Set_Character_Encoding_Filter/Introduction
Post by Thorsten Schöning
Now I'm wondering about multiple things...
1. Doesn't "getCharacterEncoding" provide the encoding of the
HTTP-body?

Yes, but it comes directly from the browser, which often doesn't provide
it. There is no encoding detection going on, so it's often "null" or
ISO-8859-1, which is the spec-defined default.
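
As a rough illustration of "comes directly from the browser": the request encoding is whatever the charset parameter of the Content-Type header says, and it is absent if the client sends none. The following is a deliberately simplified parser written for this example only; the class and method names are hypothetical and this is not how Tomcat itself parses the header.

```java
import java.util.Locale;

// Simplified, hypothetical helper for illustration; not Tomcat's parser.
class ContentTypeCharset {

    // Returns the value of the charset parameter of a Content-Type header,
    // or null if the header is missing or carries no charset parameter.
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type itself; parameters start at index 1.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = param.substring("charset=".length()).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // A typical form post without a charset parameter.
        System.out.println(charsetOf("application/x-www-form-urlencoded"));
        // A header that does declare its charset.
        System.out.println(charsetOf("text/html; charset=UTF-8"));
    }
}
```

With no charset parameter there is nothing to detect, so the result stays null until a filter or the application sets an encoding explicitly.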
Post by Thorsten Schöning
My JSP is called using GET and the Java quoted above seems to build
a query string as well. So why does it depend on some body encoding
instead of e.g. URIEncoding of the connector?

Good question. Might be a bug, here.

Post by Thorsten Schöning
2. Is my former approach wrong or did changes in Tomcat 8.5
introduce some regression? There is some conversion somewhere which
was not present in the past.

Tomcat 8.5 follows the servlet spec, which in v4.0 added the
<request-character-encoding> element under <web-app> to make things even
more fun. Actually, this can replace the use of the
SetCharacterEncodingFilter. Thanks for pointing this out; I wasn't aware
of this feature of the 4.0 spec.

Post by Thorsten Schöning
3. What is the correct fix I need now? The character encoding
filter, even though it only applies to bodies per documentation?

Try setting <request-character-encoding> in your <web-app> like this:

web.xml
-------
<web-app>
  <request-character-encoding>UTF-8</request-character-encoding>
</web-app>

-chris
Thorsten Schöning
2018-11-27 09:48:22 UTC
Hello Christopher Schultz,
Post by Christopher Schultz
web.xml
-------
<web-app>
  <request-character-encoding>UTF-8</request-character-encoding>
</web-app>

I tested that with Tomcat 9, and this setting fixed my problem the same
way SetCharacterEncodingFilter does. It doesn't work in Tomcat 8.5, I
guess because that simply doesn't implement Servlet 4.0?

Because I still need to support Tomcat 7 and 8.0 for some time, I'll
keep SetCharacterEncodingFilter for now and just document the better
solution. Thanks!

P.S.:

I sent you a private mail some days ago, unrelated to Tomcat. Did
you get that? Just want to make sure that I'm not spam filtered.

Kind regards,

Thorsten Schöning
Christopher Schultz
2018-11-29 18:33:04 UTC
Thorsten,
Post by Thorsten Schöning
Hello Christopher Schultz, on Monday, 26 November 2018 at
web.xml
-------
<web-app>
  <request-character-encoding>UTF-8</request-character-encoding>
</web-app>
Tested that with Tomcat 9 and this setting fixed my problem the
same as using SetCharacterEncodingFilter. It doesn't work in Tomcat
8.5, I guess because that simply doesn't implement Servlet 4.0?

Correct. Tomcat 8.0 and 8.5 implement Servlet 3.1. In Tomcat 8.x,
you'll need to use the SetCharacterEncodingFilter.

Post by Thorsten Schöning
Because I still need to support Tomcat 7 and 8.0 for some time,
I'll keep SetCharacterEncodingFilter for now and just document the
better solution. Thanks!

Sounds good. The SetCharacterEncodingFilter should be entirely
forward-compatible.

-chris