Discussion: Tomcat 8 on Solaris 10/11
Andrew Seales
2015-03-26 09:54:34 UTC
Hi,

We are having a problem on our production servers where downloads of
certain files are getting randomly truncated. This affects static
JavaScript files, file downloads via servlets, etc., where the file is
larger than about 100 KB. Most of the time the file downloads
successfully, but some downloads are randomly truncated, and the
truncation doesn't happen in exactly the same place every time.

I've been able to recreate the issue on our development servers using
Tomcat 8.0.20 with Java 1.8.0_20 and 1.8.0_40. I've tried Solaris 10
and 11, on both SPARC and x64 CPUs, and the same issue occurs. I've
tested a fresh install of Tomcat 8, dropping one of our larger
JavaScript files into the webapps/ROOT directory and making no other
changes. I'm using a Perl script to continuously download the file and
compare its MD5 hash against a known good value to detect when a
download breaks. It also seems to occur only when the network speed
isn't very good. I use the following command to limit the speed of my
network interface:

sudo tc qdisc add dev eth0 root handle 1:0 netem rate 128kbit
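
The core of the test loop is roughly the following (a simplified sketch
rather than the actual script, which is linked below; the URL, expected
checksum and temp file path here are placeholders):

use strict;
use warnings;
use Digest::MD5;

my $url      = "http://solaris-test-host:8080/ext-datadownload-20150323_1157.js";
my $expected = "0123456789abcdef0123456789abcdef";   # known good MD5

for my $run (1 .. 1000) {
    my $tmp = "/tmp/download.$$";
    system("curl", "-s", "-o", $tmp, $url) == 0 or die "curl failed: $?";

    # hash the downloaded file and compare against the reference value
    open my $fh, '<', $tmp or die "open $tmp: $!";
    binmode $fh;
    my $md5 = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;

    if ($md5 ne $expected) {
        printf "Run %d: TRUNCATED, got %d bytes (md5 %s)\n", $run, -s $tmp, $md5;
    }
}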

I've also tested the same Tomcat on a Red Hat 6 server, where it
appears to work fine.

If I revert to Tomcat 7.0.59, then Solaris works fine. The problem
appears to only occur with Tomcat 8 on Solaris. I've tried v8.0.14 and
8.0.20 and they both have the problem.

The Perl script is available from
http://dlib-bauer.ucs.ed.ac.uk/testdata.pl
The Javascript file is available from
http://dlib-bauer.ucs.ed.ac.uk/ext-datadownload-20150323_1157.js

Is anyone else running Tomcat 8 on Solaris 10 or 11 with Java 8, or
does anyone know of any problems on this platform?

Regards,
--
Andrew Seales

EDINA, Edinburgh University
Causewayside House, 160 Causewayside, Edinburgh EH9 1PR
tel: +44 (0) 131 650 3022   fax: +44 (0) 131 650 3308
url: http://edina.ac.uk     email: ***@ed.ac.uk


The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


Rainer Jung
2015-03-26 11:12:01 UTC
Post by Andrew Seales
Is anyone else running Tomcat 8 on Solaris 10 or 11 with Java 8, or know
of any problems on the platform?
Yes, we do, on Solaris 10. I don't know of any such problems, but I
can't introduce the slow network condition here to test.

Is the file really truncated, i.e. too short, or is it corrupt? Can the
truncation also be seen in the Tomcat access log? If so, could you
replace the curl/md5sum based test with another HTTP client, such as
LWP::Simple in Perl or the "ab" tool that ships with Apache httpd, just
to rule out the client side of the picture?
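
Something along these lines should be enough for a cross-check (an
untested sketch; the URL and reference checksum are placeholders):

use strict;
use warnings;
use LWP::Simple qw(get);
use Digest::MD5 qw(md5_hex);

my $url      = "http://solaris-test-host:8080/ext-datadownload-20150323_1157.js";
my $expected = "0123456789abcdef0123456789abcdef";   # known good MD5

# fetch the whole body in memory and report size + checksum
my $body = get($url);
die "download failed\n" unless defined $body;

my $md5 = md5_hex($body);
printf "%d bytes, md5 %s (%s)\n",
    length($body), $md5, $md5 eq $expected ? "OK" : "MISMATCH";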

Is truncation always happening at the same byte? Any pattern?

Which connector are you using? NIO? APR?

I personally would try the following to gather additional analysis
data: find a setup where you can log the client port, and snoop the
network traffic during the test on both the client and server side.
Once the problem happens, use the local port number and timestamp to
extract the communication pattern on the server and client side. That
way you can see which side closed/aborted the connection - or whether
it is something in between client and server.
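
For example - assuming e1000g0 on the Solaris server, eth0 on the Linux
client and the connector on port 8080, all of which you would adjust to
your setup (and replace <server> with the server's address):

server# snoop -o /tmp/server.snoop -d e1000g0 port 8080
client$ sudo tcpdump -i eth0 -w /tmp/client.pcap host <server> and port 8080

Both capture files can be opened in Wireshark, and once you know the
client's ephemeral port for a truncated download, a display filter such
as "tcp.port == 54321" isolates that one connection on either side.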

Unfortunately, logging the client port is often not trivial to achieve.
On the Tomcat side (access log), currently only the server port is
available, not the remote port, although this would be very simple to
add. In the short term it might work to switch to Perl plus LWP and try
to get the local port from LWP.
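
One option - not LWP::UserAgent itself, but Net::HTTP, the low-level
client underneath LWP, which is a plain IO::Socket::INET subclass and
therefore exposes sockport() - would be roughly this (host, port and
path are placeholders):

use strict;
use warnings;
use Net::HTTP;

my $s = Net::HTTP->new(Host => "solaris-test-host", PeerPort => 8080)
    or die "connect: $@";
my $client_port = $s->sockport;   # the ephemeral port to filter on later

$s->write_request(GET => "/ext-datadownload-20150323_1157.js");
my ($code, $mess, %headers) = $s->read_response_headers;

# drain the body and count what actually arrived
my ($buf, $received) = ("", 0);
while (my $n = $s->read_entity_body($buf, 65536)) {
    $received += $n;
}
printf "client port %d: status %s, %d bytes received\n",
    $client_port, $code, $received;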

Regards,

Rainer





Aurélien Terrestris
2015-03-26 21:10:36 UTC
As suggested by Rainer, I would try with the blocking connector and compare.
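
On Tomcat 8.0 the connector implementation can be pinned explicitly via
the protocol attribute of the Connector in conf/server.xml, for example
(port and other attributes as in the existing config; use one or the
other):

<!-- blocking Java connector (BIO) -->
<Connector port="8080" protocol="org.apache.coyote.http11.Http11Protocol"
           connectionTimeout="20000" redirectPort="8443" />

<!-- non-blocking Java connector (NIO), the Tomcat 8 default when
     protocol="HTTP/1.1" is used and the APR native library is absent -->
<Connector port="8080" protocol="org.apache.coyote.http11.Http11NioProtocol"
           connectionTimeout="20000" redirectPort="8443" />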

Otherwise, it could be that your file is using very long lines (only 5
lines for more than 800 KB of data). Maybe someone from tomcat-dev
could have a look at this.

$ wc ext-datadownload-20150323_1157.js
5 7634 838044 ext-datadownload-20150323_1157.js
Andrew Seales
2015-03-27 14:49:27 UTC
Post by Aurélien Terrestris
As suggested by Rainer, I would try with the blocking connector and compare.
Otherwise, it could be that your file is using very long lines (only 5
lines for more than 800k of data). Maybe a tomcat-dev could have a
look on this.
$ wc ext-datadownload-20150323_1157.js
5 7634 838044 ext-datadownload-20150323_1157.js
I can try the non-minified version to see if it makes a difference, but
we're getting the same problem with binary files too.
Andrew Seales
2015-03-27 14:48:28 UTC
Post by Rainer Jung
Yes, we do, on Solaris 10. I don't know of any such problems, but I
can't introduce the slow network condition here to test.
Good to know. In case it wasn't clear, the network limiting is done on
the client side, not on the server.
Post by Rainer Jung
Is the file really truncated, i.e. too short, or is it corrupt? Can
the truncation also be seen in the Tomcat access log? If so, could you
replace the curl/md5sum based test with another HTTP client like
LWP::Simple in perl or "ab" coming with Apache httpd. Just to rule out
the client side of the picture.
Yes, the file is definitely truncated rather than corrupted. Users of
our services with normal browsers are noticing the problem; that's what
prompted me to use the Perl+curl test script.
Post by Rainer Jung
Is truncation always happening at the same byte? Any pattern?
Which connector are you using? NIO? APR?
I've tried both the AJP13 and the standard HTTP/1.1 connectors; both
have the same problem. When using AJP, the Apache log shows the file
size as truncated. I'll check the Tomcat log when using HTTP/1.1 to see
if it agrees.
Post by Rainer Jung
I personally would try the following to provide additional analysis
data: Find a setup, where you can log the client port. Use this setup
and snoop network traffic during the test on the client and server
side. Once the problem happens, use the local port number and
timestamp to extract the communication pattern on the server and
client side. That way you can see, which side closed/aborted the
connection - or whether it is something in between client and server.
Thanks, I'll give Wireshark or something like that a go to see if I can
see any TCP problems.
Rainer Jung
2015-03-27 15:15:49 UTC
Post by Andrew Seales
Thanks, I'll give Wireshark or something like that a go to see if I can
see any TCP problems.
Not necessarily a problem from the TCP layer point of view. But once
you can find a request that's truncated in the TCP capture, you can check

- whether it was a normal connection shutdown or a reset
- whether there were unusual pauses between packets triggering timeouts

etc. That's why you would benefit from your test client being able to
log the client port when a failure arises so you can filter easily in
Wireshark. I'm thinking about adding client port as a loggable item to
the Tomcat AccessLog but that won't help you right now.

I vaguely remember problems with TCP checksum offloading, but they
should have been fixed long ago. See e.g.

http://compgroups.net/comp.unix.solaris/disable-e1000-tcp-checksum-offloading-t5220/472801

I also once saw a problem at a customer site - not truncation, but the
last packet of a response being delayed quite noticeably. That was
fixed by applying the Solaris patch cluster that was current at the
time.

Regards,

Rainer

Rainer Jung
2015-03-30 09:04:34 UTC
Post by Rainer Jung
I'm thinking about adding client port as a loggable item to
the Tomcat AccessLog but that won't help you right now.
Done; it will be available starting with Tomcat 8.0.22 and 7.0.62. The
log pattern format is %{remote}p, as for Apache httpd.
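
A minimal AccessLogValve configuration using it would look something
like this (the rest of the pattern is just the usual common-log items):

<Valve className="org.apache.catalina.valves.AccessLogValve"
       directory="logs" prefix="localhost_access_log" suffix=".txt"
       pattern="%h %{remote}p %l %u %t &quot;%r&quot; %s %b" />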

Regards,

Rainer

Andrew Seales
2015-04-01 08:15:59 UTC
Post by Rainer Jung
Not necessarily a problem from the TCP layer point of view. But once
you can find a request that's truncated in the TCP capture, you can check
- whether it was a normal connection shutdown or a reset
- whether there were unusual pauses between packets triggering timeouts
etc. That's why you would benefit from your test client being able to
log the client port when a failure arises so you can filter easily in
Wireshark. I'm thinking about adding client port as a loggable item to
the Tomcat AccessLog but that won't help you right now.
I vaguely remember problems with TCP checksum offloading, but they
should be fixed long ago. See e.g.
http://compgroups.net/comp.unix.solaris/disable-e1000-tcp-checksum-offloading-t5220/472801
I can't see any RST packets, so I can only conclude it's a normal(ish)
shutdown.

I have found that if I run my test client on a Linux server that's
close to the Solaris servers, I don't see the truncation. Perhaps
there's a switch issue between our Solaris servers and the outside
world.
Post by Rainer Jung
I also once saw a problem at a customer site - not truncation, but the
last packet of a response being delayed quite noticeably. That was
fixed by applying the Solaris patch cluster that was current at the time.
I'm not sure there's a delay but I'm sure the patch level is way out of
date.

Thanks for your help, though. My workaround at the moment is to just
run Tomcat 7. We're planning to migrate off Solaris onto Linux anyway,
so we'll just have to wait until then before upgrading to Tomcat 8.