Discussion:
very slow class loading on initial JSP/servlet request after restart
Sam Hokin
2009-02-19 23:23:49 UTC
Permalink
I've been struggling with a very severe problem on one of my three Tomcat servers since late December. I'm completely
stumped. I'm running 6.0.18 on all three servers, which are all running Fedora Core 9, all currently running Sun's JDK
1.6.0_12. The only difference is hardware. The server that's being a problem is a Dell PowerEdge 1950, quad dual core.
All three servers are configured exactly the same, with Apache HTTPD and using mod_proxy_ajp to pass requests on to
Tomcat's AJP connector. All Tomcat instances are running the tomcat-native APR v1.1.14 that comes bundled with Tomcat.
However, this problem persists on the problematic server if I don't run tomcat-native APR.

The problem, which spontaneously appeared a few days before Christmas on this one server, is that the initial request of
a JSP or servlet takes AGES to respond, usually exceeding several minutes. I've found that compilation of a JSP to a
class file is nearly instant, but response from the resulting compiled class takes forever. Further, I found that this
only happens when the offending JSP (or servlet) imports classes that I wrote, located in a JAR in Tomcat's /lib
directory or inside the web app's WEB-INF/lib directory, or expanded out under WEB-INF/classes. For example, the
following simple JSP,

<%@ page import="java.util.*, java.io.*, net.ims.jcms.*" %>
<%
java.util.Date endDate = new java.util.Date();
out.println(endDate.toString());
%>

takes AGES to respond, the first time it is requested after a Tomcat restart (even if the compiled class is present
under the /work tree), while another JSP with the same executable lines:

<%@ page import="java.util.*, java.io.* %>
<%
java.util.Date endDate = new java.util.Date();
out.println(endDate.toString());
%>

responds immediately. The net.ims.jcms package is something I wrote. After the first time a JSP has been requested,
subsequent requests respond immediately. So, the server still functions, but there is a horribly long delay before
pages respond after a Tomcat restart.

Again, this is not a compilation issue; I've monitored the /work directory and can see that the class files appear very
quickly. In addition, this problem recurs after a Tomcat restart when the compiled class files already exist under
/work. I've run the JVM with the -verbose:class option, and, sure enough, the long delay corresponds to a long delay
before the page's class is loaded by the JVM. As soon as the JVM spits out the line saying it loaded the class, the
response is sent.

So, my conclusion is that the class loader simply takes a very long time, on the first request, if the class imports the
package that I wrote. After the class is loaded into memory by that first request, subsequent requests respond normally.

I cannot divine any problem with my JAR file or classes within it that would result in this problem; furthermore, it
only occurs on one of three servers running the exact same everything. I have run MEMTEST86 on the offending server and
memory seems to be fine. It is fine for all other services, processes, etc. The problem is robust on this specific
server - it has transcended versions of Tomcat, the JVM, the kernel version, etc.

I would greatly appreciate any suggestions for further diagnostics, things to try, offending things I may have done in
my Java package, etc. Thanks!
André Warnier
2009-02-19 23:45:16 UTC
Permalink
Sam Hokin wrote:
[...]
Not being a Tomcat or Java specialist at all, but this being fairly late
in the day, I am piqued by your nicely told story and just trying to
imagine what could be the cause of the issue you describe. We Belgians,
like Hercule Poirot, have a knack for such things.
Considering that it happens only on the quad-core machine, and with
various versions of Java and Tomcat, my guess would be that something in
your classes is either so attractive (or so repulsive) that the 4 cores
are fighting between them about who will get it first, or at the
contrary who will be stuck with it. Since by construction these cores
are very similar in performance, it takes a long time for the matter to
be decided, which explains the initial delay. Now once the matter is
settled, there is no longer any discussion and thus also no further delays.
That seems to me consistent with the circumstances and conditions you
describe.
The fact that it started happening just before Christmas also leads me
to speculate that it must have something to do with reindeers, or
perhaps with chocolate.

;-)




















In order not to make this a totally useless post, I'll also speculate
more seriously, but still from a position of knowing little about Java
or Tomcat, or multi-core machines.
What happens after compilation, and at the first usage of a webapp, and
not later on ? Perhaps some "init" function being called. And maybe, in
such an init function, there is something that causes multiple cores to
be in contention for some resource.
Not much better than the above, I know. Sigh.
Caldarale, Charles R
2009-02-20 02:40:16 UTC
Permalink
> From: Sam Hokin [mailto:***@ims.net]
> Subject: very slow class loading on initial JSP/servlet
> request after restart
>
> The server that's being a problem is a Dell PowerEdge
> 1950, quad dual core.

Since I could never believe anything bad could be caused by chocolate (especially dark chocolate), let's go for something else.

1) What are the servers that don't have a problem?

2) How much RAM is on each server?

3) Is the JDK 32- or 64-bit on each server?

4) What's the Java heap size for Tomcat on each server?

5) Are you using the same file system on all three servers? What is it (ext3, reiser, ???)?

- Chuck


THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY MATERIAL and is thus for use only by the intended recipient. If you received this in error, please contact the sender and delete the e-mail and its attachments from all computers.
André Warnier
2009-02-20 12:37:15 UTC
Permalink
Caldarale, Charles R wrote:
>> From: Sam Hokin [mailto:***@ims.net]
>> Subject: very slow class loading on initial JSP/servlet
>> request after restart
>>
>> The server that's being a problem is a Dell PowerEdge
>> 1950, quad dual core.
>
> Since I could never believe anything bad could be caused by chocolate (especially dark chocolate), let's go for something else.
>
> 1) What are the servers that don't have a problem?
>
> 2) How much RAM is on each server?
>
> 3) Is the JDK 32- or 64-bit on each server?
>
> 4) What's the Java heap size for Tomcat on each server?
>
> 5) Are you using the same file system on all three servers? What is it (ext3, reiser, ???)?
>
Dropping the dark chocolate then from my suspicions, and notwithstanding
the above fact-finding questions, I still think the reindeers are
suspect, such as 4 of them pulling in opposite directions.

It kind of reminds me of my xml-parsing webapp of a while ago, which
took (and still takes) an inordinate amount of time to load initially
(10 minutes or so), but then has a perfectly acceptable reponse time
after that. That is on a single slow cpu and with less RAM than Chuck
has in his portable phone, but it may still be a good analogy since
these elements merely exacerbate the problem.
What is that webapp (or these classes) really doing when initially
loading, that they are no longer doing afterward ?
On my system also, the log message indicating that the application is
"loaded" comes only after these initial 10 minutes during which the
whole Tomcat is close to inoperative.

Since we have 4 processors here (do we, really? I am not so familiar
with the multi-core architecture), can each of these processors be
running a thread of these apps/classes, where on a single processor
there would be only one thread running ?
And would it be possible that each of these cpus/threads is doing
something that severely competes with the others trying to do the same
thing at the same time ? (such as using a lot of memory, and resulting
in massively swapping in/out). There may be 4 cpus, but there is still
only one each of the other resources, like RAM.
Would running Jconsole while the app is loading help ?
Peter Crowther
2009-02-20 13:03:22 UTC
Permalink
> From: André Warnier [mailto:***@ice-sa.com]
> >> From: Sam Hokin [mailto:***@ims.net]
> >> The server that's being a problem is a Dell PowerEdge
> >> 1950, quad dual core.

> Since we have 4 processors here (do we, really? I am not so familiar
> with the multi-core architecture), can each of these processors be
> running a thread of these apps/classes, where on a single processor
> there would be only one thread running ?

Quad dual-core*. That's 4 silicon dies, each with two copies of "the core of" a CPU. So (4 * 2) = 8 separate instruction execution units, 8 program counters, executing up to 8 instruction streams in parallel.

The cores on one die share some infrastructure like cache and memory access, but the interesting feature here is that this machine can run 8 threads concurrently.

- Peter

* Says the original poster. Looking at the Dell site, I reckon it's actually dual quad-core - so two dies, 4 cores per die. Still 8 concurrent threads, though.
Sam Hokin
2009-02-20 16:02:42 UTC
Permalink
Caldarale, Charles R wrote:

> 1) What are the servers that don't have a problem?

One is a single AMD Athlon 64 2800+, running the 64-bit Sun JDK (same version).

The other is a dual AMD Athlon MP 2000+, running the 32-bit Sun JDK (same version).

And the "bad" server has "8" Intel Xeon E5310 CPUs running the 32-bit Sun JDK (it is indeed a dual quad-core, not a quad
dual-core, as I incorrectly typed in my original post).

> 2) How much RAM is on each server?

The Athlon 64 has 1.5 GB RAM (2 GB swap).
The dual 32-bit Athlon has 1 GB RAM (2 GB swap).
The "bad" server has 2 GB RAM (2 GB swap).

> 3) Is the JDK 32- or 64-bit on each server?

32-bit on the two 32-bit machines, 64-bit on the 64-bit machine.

> 4) What's the Java heap size for Tomcat on each server?

I've played with heap size quite a bit to no avail. Currently, all are running 768 MB with the following JDK options:

JAVA_OPTS="-Xms768m -Xmx768m -XX:NewSize=256m -XX:MaxNewSize=256m -XX:PermSize=128m -XX:MaxPermSize=128m"

> 5) Are you using the same file system on all three servers? What is it (ext3, reiser, ???)?

Yes. ext3 on all three machines, all are running current Fedora Core 9 distros (well, except for the new kernel which
came out yesterday).
Ron McNulty
2009-02-20 05:41:38 UTC
Permalink
Hi Sam

You jar - does it include any JSP tag files written as JSP fragments? There
are known issues in this area. The dependency management seems to get
confused, resulting in lots of unnecessary compilation of the .tag files

Regards

Ron

----- Original Message -----
From: "Sam Hokin" <***@ims.net>
To: <***@tomcat.apache.org>
Sent: Friday, February 20, 2009 12:23 PM
Subject: very slow class loading on initial JSP/servlet request after
restart


> I've been struggling with a very severe problem on one of my three Tomcat
> servers since late December. I'm completely stumped. I'm running 6.0.18
> on all three servers, which are all running Fedora Core 9, all currently
> running Sun's JDK 1.6.0_12. The only difference is hardware. The server
> that's being a problem is a Dell PowerEdge 1950, quad dual core. All three
> servers are configured exactly the same, with Apache HTTPD and using
> mod_proxy_ajp to pass requests on to Tomcat's AJP connector. All Tomcat
> instances are running the tomcat-native APR v1.1.14 that comes bundled
> with Tomcat. However, this problem persists on the problematic server if I
> don't run tomcat-native APR.
>
> The problem, which spontaneously appeared a few days before Christmas on
> this one server, is that the initial request of a JSP or servlet takes
> AGES to respond, usually exceeding several minutes. I've found that
> compilation of a JSP to a class file is nearly instant, but response from
> the resulting compiled class takes forever. Further, I found that this
> only happens when the offending JSP (or servlet) imports classes that I
> wrote, located in a JAR in Tomcat's /lib directory or inside the web app's
> WEB-INF/lib directory, or expanded out under WEB-INF/classes. For
> example, the following simple JSP,
>
> <%@ page import="java.util.*, java.io.*, net.ims.jcms.*" %>
> <%
> java.util.Date endDate = new java.util.Date();
> out.println(endDate.toString());
> %>
>
> takes AGES to respond, the first time it is requested after a Tomcat
> restart (even if the compiled class is present under the /work tree),
> while another JSP with the same executable lines:
>
> <%@ page import="java.util.*, java.io.* %>
> <%
> java.util.Date endDate = new java.util.Date();
> out.println(endDate.toString());
> %>
>
> responds immediately. The net.ims.jcms package is something I wrote.
> After the first time a JSP has been requested, subsequent requests respond
> immediately. So, the server still functions, but there is a horribly long
> delay before pages respond after a Tomcat restart.
>
> Again, this is not a compilation issue; I've monitored the /work directory
> and can see that the class files appear very quickly. In addition, this
> problem recurs after a Tomcat restart when the compiled class files
> already exist under /work. I've run the JVM with the -verbose:class
> option, and, sure enough, the long delay corresponds to a long delay
> before the page's class is loaded by the JVM. As soon as the JVM spits
> out the line saying it loaded the class, the response is sent.
>
> So, my conclusion is that the class loader simply takes a very long time,
> on the first request, if the class imports the package that I wrote.
> After the class is loaded into memory by that first request, subsequent
> requests respond normally.
>
> I cannot divine any problem with my JAR file or classes within it that
> would result in this problem; furthermore, it only occurs on one of three
> servers running the exact same everything. I have run MEMTEST86 on the
> offending server and memory seems to be fine. It is fine for all other
> services, processes, etc. The problem is robust on this specific server -
> it has transcended versions of Tomcat, the JVM, the kernel version, etc.
>
> I would greatly appreciate any suggestions for further diagnostics, things
> to try, offending things I may have done in my Java package, etc. Thanks!
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-***@tomcat.apache.org
> For additional commands, e-mail: users-***@tomcat.apache.org
>
Sam Hokin
2009-02-20 16:08:25 UTC
Permalink
Ron McNulty wrote:
> Hi Sam
>
> You jar - does it include any JSP tag files written as JSP fragments?
> There are known issues in this area. The dependency management seems to
> get confused, resulting in lots of unnecessary compilation of the .tag
> files

No, it doesn't, Ron. But thanks for the heads-up on that. The JAR just contains class files for a couple of packages.
And compilation occurs quickly; it's the loading of the compiled class that's taking forever, even if they're already
compiled from a previous Tomcat run. And, once the class is loaded (as indicated by the output line from the
-verbose:class JVM option) execution is quick. I've not been able to get any diagnostic to indicate what's going on
during the interminable wait for pagename_jsp.class to load after the initial page request.
Christopher Schultz
2009-02-20 13:36:22 UTC
Permalink
Sam,

On 2/19/2009 6:23 PM, Sam Hokin wrote:
> The problem, which spontaneously appeared a few days before Christmas on
> this one server, is that the initial request of a JSP or servlet takes
> AGES to respond, usually exceeding several minutes.

Good... then you'll have plenty of time to /take a thread dump/ while
it's spinning its wheels.

I'll cast my hat into the ring: I suspect you're loading a resource that
wants to looking an external resource (like an XML file with a remote
DTD/Schema and validation is on in the parser) and the DNS is tripping
you up.

But, I have been wrong before ;)

- -chris
André Warnier
2009-02-20 15:25:56 UTC
Permalink
Christopher Schultz wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Sam,
>
> On 2/19/2009 6:23 PM, Sam Hokin wrote:
>> The problem, which spontaneously appeared a few days before Christmas on
>> this one server, is that the initial request of a JSP or servlet takes
>> AGES to respond, usually exceeding several minutes.
>
> Good... then you'll have plenty of time to /take a thread dump/ while
> it's spinning its wheels.
>
> I'll cast my hat into the ring: I suspect you're loading a resource that
> wants to looking an external resource (like an XML file with a remote
> DTD/Schema and validation is on in the parser) and the DNS is tripping
> you up.
>
I would like to point out at this moment that my initial analysis (the
one with the reindeers) does not look so bad after all.

More seriously, and for my own and possibly the OP's edification : we
thus have 8 cores/cpus available, and one webapp being loaded at Tomcat
startup, which takes ages, but independently of class compilation, Java
and Tomcat versions etc.. Say we have told Tomcat to start 100 threads.
Does that mean that possibly 8 threads (or more) are all (rather busily)
waiting for something in relation to some initialisation in the (rather
suspect) classes belonging to that webapp ?
The OP also mentions "the initial request takes ages". So this
initialisation would not happen really at Tomcat startup, but at the
first request. Any particular setup or circumstance that could be tried
to change when this happens, just to see ?
János Löbb
2009-02-20 15:47:46 UTC
Permalink
On Feb 20, 2009, at 10:25 AM, André Warnier wrote:

> Christopher Schultz wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>> Sam,
>> On 2/19/2009 6:23 PM, Sam Hokin wrote:
>>> The problem, which spontaneously appeared a few days before
>>> Christmas on
>>> this one server, is that the initial request of a JSP or servlet
>>> takes
>>> AGES to respond, usually exceeding several minutes.
>> Good... then you'll have plenty of time to /take a thread dump/ while
>> it's spinning its wheels.
>> I'll cast my hat into the ring: I suspect you're loading a resource
>> that
>> wants to looking an external resource (like an XML file with a remote
>> DTD/Schema and validation is on in the parser) and the DNS is
>> tripping
>> you up.
> I would like to point out at this moment that my initial analysis
> (the one with the reindeers) does not look so bad after all.
>
> More seriously, and for my own and possibly the OP's edification :
> we thus have 8 cores/cpus available, and one webapp being loaded at
> Tomcat startup, which takes ages, but independently of class
> compilation, Java and Tomcat versions etc.. Say we have told Tomcat
> to start 100 threads.
> Does that mean that possibly 8 threads (or more) are all (rather
> busily) waiting for something in relation to some initialisation in
> the (rather suspect) classes belonging to that webapp ?
> The OP also mentions "the initial request takes ages". So this
> initialisation would not happen really at Tomcat startup, but at the
> first request. Any particular setup or circumstance that could be
> tried to change when this happens, just to see ?
>

No, it is definitely not the reindeers, but an octo-pussy :)

János
Caldarale, Charles R
2009-02-20 15:48:04 UTC
Permalink
> From: André Warnier [mailto:***@ice-sa.com]
> Subject: Re: very slow class loading on initial JSP/servlet
> request afterrestart
>
> Does that mean that possibly 8 threads (or more) are all
> (rather busily) waiting for something in relation to some
> initialisation in the (rather suspect) classes belonging
> to that webapp ?

Not busily, no. They may be waiting, but Java makes it difficult (not impossible) to spin on a condition; we would also see that behavior on the other systems. Chris' speculation of DNS timeouts due to improper TCP/IP configuration on this one server is still the most likely cause, pending more information from the OP.

- Chuck


THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY MATERIAL and is thus for use only by the intended recipient. If you received this in error, please contact the sender and delete the e-mail and its attachments from all computers.
David Smith
2009-02-20 15:55:20 UTC
Permalink
André Warnier wrote:
> Christopher Schultz wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> Sam,
>>
>> On 2/19/2009 6:23 PM, Sam Hokin wrote:
>>> The problem, which spontaneously appeared a few days before
>>> Christmas on
>>> this one server, is that the initial request of a JSP or servlet takes
>>> AGES to respond, usually exceeding several minutes.
>>
>> Good... then you'll have plenty of time to /take a thread dump/ while
>> it's spinning its wheels.
>>
>> I'll cast my hat into the ring: I suspect you're loading a resource that
>> wants to looking an external resource (like an XML file with a remote
>> DTD/Schema and validation is on in the parser) and the DNS is tripping
>> you up.
>>
> I would like to point out at this moment that my initial analysis (the
> one with the reindeers) does not look so bad after all.
>
> More seriously, and for my own and possibly the OP's edification : we
> thus have 8 cores/cpus available, and one webapp being loaded at
> Tomcat startup, which takes ages, but independently of class
> compilation, Java and Tomcat versions etc.. Say we have told Tomcat
> to start 100 threads.
> Does that mean that possibly 8 threads (or more) are all (rather
> busily) waiting for something in relation to some initialisation in
> the (rather suspect) classes belonging to that webapp ?
> The OP also mentions "the initial request takes ages". So this
> initialisation would not happen really at Tomcat startup, but at the
> first request. Any particular setup or circumstance that could be
> tried to change when this happens, just to see ?
>
>

Sounds to me like the webapp is waiting for something external to
respond -- possibly from the network and timing out. In my experience
there is usually some extra latency on the first request due to
compiling jsps, but I can't imagine it being significant -- especially
on a really good box w/ multiple cores. It'd be interesting to start up
wireshark on the server and see if there is any network traffic sourced
out of tomcat during the several minute wait. Maybe a firewall was
installed or reconfigured or some dependent resource was shutdown around
Christmas.


--David
Sam Hokin
2009-02-20 16:15:36 UTC
Permalink
Christopher Schultz wrote:
> Sam,
>
> On 2/19/2009 6:23 PM, Sam Hokin wrote:
>> The problem, which spontaneously appeared a few days before Christmas on
>> this one server, is that the initial request of a JSP or servlet takes
>> AGES to respond, usually exceeding several minutes.
>
> Good... then you'll have plenty of time to /take a thread dump/ while
> it's spinning its wheels.

Well, I have used a number of diagnostics, like jconsole, jstat and jhat, but haven't been able to glean anything
informative from them about my issue. I've run the -Xloggc JVM option, and watched it garbage collect, again not
informing me of anything. Yes, if I could figure out what the JVM is actually DOING doing the long wait, I'd hopefully
be able to deal with it, but I honestly haven't been able to figure out how to do that. What do you suggest?

> I'll cast my hat into the ring: I suspect you're loading a resource that
> wants to looking an external resource (like an XML file with a remote
> DTD/Schema and validation is on in the parser) and the DNS is tripping
> you up.

No, there are no external references in my Java package. It's all just local class files.
Caldarale, Charles R
2009-02-20 16:21:23 UTC
Permalink
> From: Sam Hokin [mailto:***@ims.net]
> Subject: Re: very slow class loading on initial JSP/servlet
> request afterrestart
>
> Well, I have used a number of diagnostics, like jconsole,
> jstat and jhat, but haven't been able to glean anything
> informative from them about my issue.

Run jstack several times during the delay, and then let's compare the set of thread dumps.

What is the CPU utilization during the delay?

- Chuck


THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY MATERIAL and is thus for use only by the intended recipient. If you received this in error, please contact the sender and delete the e-mail and its attachments from all computers.
Sam Hokin
2009-02-20 17:11:21 UTC
Permalink
Caldarale, Charles R wrote:
>
> Run jstack several times during the delay, and then let's compare the set of thread dumps.
>
> What is the CPU utilization during the delay?

OK, done. The CPU utilization during the delay is tiny, 1% or less. I've run jstack before, during, and after I've
done a first request on my tiny offending test JSP. Since the jstack output is lengthy (I've got several sites running
on this public server, unavoidably), I've thrown the resulting output on my server for easier viewing.

jstack.0.txt ran BEFORE my test request.
jstack.1-11.txt ran DURING the long wait.
jstack.12,13.txt ran AFTER the response was received.

http://ims.net/jstack/jstack.0.txt

http://ims.net/jstack/jstack.1.txt
http://ims.net/jstack/jstack.2.txt
http://ims.net/jstack/jstack.3.txt
http://ims.net/jstack/jstack.4.txt
http://ims.net/jstack/jstack.5.txt
http://ims.net/jstack/jstack.6.txt
http://ims.net/jstack/jstack.7.txt
http://ims.net/jstack/jstack.8.txt
http://ims.net/jstack/jstack.9.txt
http://ims.net/jstack/jstack.10.txt
http://ims.net/jstack/jstack.11.txt

http://ims.net/jstack/jstack.12.txt
http://ims.net/jstack/jstack.13.txt

I wish I could divine something informative from these. Perhaps one of you can.

I'm also trying to use Wireshark to see if there is some specific net activity that's going on during the long wait, as
suggested by David Smith.
Christopher Schultz
2009-02-20 18:03:39 UTC
Permalink
Sam,

On 2/20/2009 12:11 PM, Sam Hokin wrote:
> jstack.0.txt ran BEFORE my test request.

Your server is completely idle, here.

> jstack.1-11.txt ran DURING the long wait.

Observing jstack.1.txt, it looks like this is the only running thread
(complete stack trace shown just for this first one):

Thread 9994: (state = IN_NATIVE)
- java.io.UnixFileSystem.getBooleanAttributes0(java.io.File) @bci=0
(Compiled frame; information may be imprecise)
- java.io.UnixFileSystem.getBooleanAttributes(java.io.File) @bci=2,
line=228 (Compiled frame)
- java.io.File.exists() @bci=20, line=733 (Compiled frame)
- sun.misc.URLClassPath$FileLoader.getResource(java.lang.String,
boolean) @bci=136, line=995 (Compiled frame)
- sun.misc.URLClassPath$FileLoader.findResource(java.lang.String,
boolean) @bci=3, line=962 (Compiled frame)
- sun.misc.URLClassPath.findResource(java.lang.String, boolean)
@bci=17, line=145 (Compiled frame)
- java.net.URLClassLoader$2.run() @bci=12, line=362 (Compiled frame)
-
java.security.AccessController.doPrivileged(java.security.PrivilegedAction,
java.security.AccessControlContext) @bci=0 (Interpreted frame)
- java.net.URLClassLoader.findResource(java.lang.String) @bci=13,
line=359 (Compiled frame)
- java.lang.ClassLoader.getResource(java.lang.String) @bci=30, line=978
(Compiled frame)
- java.lang.ClassLoader.getResource(java.lang.String) @bci=12, line=973
(Compiled frame)
- java.lang.ClassLoader.getResourceAsStream(java.lang.String) @bci=2,
line=1168 (Interpreted frame)
-
org.apache.catalina.loader.WebappClassLoader.getResourceAsStream(java.lang.String)
@bci=326, line=1205 (Interpreted frame)
-
org.apache.jasper.servlet.JasperLoader.getResourceAsStream(java.lang.String)
@bci=5, line=144 (Interpreted frame)
- org.apache.jasper.compiler.JDTCompiler$1.findType(java.lang.String)
@bci=89, line=193 (Compiled frame)
- org.apache.jasper.compiler.JDTCompiler$1.findType(char[], char[][])
@bci=124, line=178 (Compiled frame)
-
org.eclipse.jdt.internal.compiler.lookup.LookupEnvironment.askForType(org.eclipse.jdt.internal.compiler.lookup.PackageBinding,
char[]) @bci=27, line=122 (Interpreted frame)
-
org.eclipse.jdt.internal.compiler.lookup.PackageBinding.getType(char[])
@bci=16, line=126 (Interpreted frame)
- org.eclipse.jdt.internal.compiler.lookup.Scope.findType(char[],
org.eclipse.jdt.internal.compiler.lookup.PackageBinding,
org.eclipse.jdt.internal.compiler.lookup.PackageBinding) @bci=14,
line=1351 (Compiled frame)
-
org.eclipse.jdt.internal.compiler.lookup.Scope.getTypeOrPackage(char[],
int) @bci=1017, line=2408 (Compiled frame)
- org.eclipse.jdt.internal.compiler.lookup.Scope.getPackage(char[][])
@bci=14, line=2090 (Interpreted frame)
-
org.eclipse.jdt.internal.compiler.ast.QualifiedTypeReference.getTypeBinding(org.eclipse.jdt.internal.compiler.lookup.Scope)
@bci=17, line=69 (Interpreted frame)
-
org.eclipse.jdt.internal.compiler.ast.TypeReference.resolveType(org.eclipse.jdt.internal.compiler.lookup.ClassScope)
@bci=36, line=163 (Interpreted frame)
-
org.eclipse.jdt.internal.compiler.ast.TypeReference.resolveSuperType(org.eclipse.jdt.internal.compiler.lookup.ClassScope)
@bci=2, line=114 (Interpreted frame)
-
org.eclipse.jdt.internal.compiler.lookup.ClassScope.findSupertype(org.eclipse.jdt.internal.compiler.ast.TypeReference)
@bci=35, line=1121 (Interpreted frame)
-
org.eclipse.jdt.internal.compiler.lookup.ClassScope.connectSuperclass()
@bci=157, line=835 (Interpreted frame)
-
org.eclipse.jdt.internal.compiler.lookup.ClassScope.connectTypeHierarchy()
@bci=34, line=960 (Interpreted frame)
-
org.eclipse.jdt.internal.compiler.lookup.CompilationUnitScope.connectTypeHierarchy()
@bci=20, line=290 (Interpreted frame)
-
org.eclipse.jdt.internal.compiler.lookup.LookupEnvironment.completeTypeBindings()
@bci=72, line=218 (Interpreted frame)
-
org.eclipse.jdt.internal.compiler.Compiler.internalBeginToCompile(org.eclipse.jdt.internal.compiler.env.ICompilationUnit[],
int) @bci=198, line=603 (Interpreted frame)
-
org.eclipse.jdt.internal.compiler.Compiler.beginToCompile(org.eclipse.jdt.internal.compiler.env.ICompilationUnit[])
@bci=19, line=357 (Interpreted frame)
-
org.eclipse.jdt.internal.compiler.Compiler.compile(org.eclipse.jdt.internal.compiler.env.ICompilationUnit[])
@bci=6, line=371 (Interpreted frame)
-
org.apache.jasper.compiler.JDTCompiler.generateClass(java.lang.String[])
@bci=837, line=413 (Interpreted frame)
- org.apache.jasper.compiler.Compiler.compile(boolean, boolean)
@bci=30, line=317 (Interpreted frame)
- org.apache.jasper.compiler.Compiler.compile(boolean) @bci=3, line=295
(Interpreted frame)
- org.apache.jasper.compiler.Compiler.compile() @bci=2, line=282
(Interpreted frame)
- org.apache.jasper.JspCompilationContext.compile() @bci=31, line=586
(Interpreted frame)
-
org.apache.jasper.servlet.JspServletWrapper.service(javax.servlet.http.HttpServletRequest,
javax.servlet.http.HttpServletResponse, boolean) @bci=118, line=317
(Interpreted frame)
-
org.apache.jasper.servlet.JspServlet.serviceJspFile(javax.servlet.http.HttpServletRequest,
javax.servlet.http.HttpServletResponse, java.lang.String,
java.lang.Throwable, boolean) @bci=193, line=342 (Interpreted frame)
-
org.apache.jasper.servlet.JspServlet.service(javax.servlet.http.HttpServletRequest,
javax.servlet.http.HttpServletResponse) @bci=436, line=267 (Interpreted
frame)
- javax.servlet.http.HttpServlet.service(javax.servlet.ServletRequest,
javax.servlet.ServletResponse) @bci=30, line=717 (Interpreted frame)
-
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(javax.servlet.ServletRequest,
javax.servlet.ServletResponse) @bci=374, line=290 (Interpreted frame)
-
org.apache.catalina.core.ApplicationFilterChain.doFilter(javax.servlet.ServletRequest,
javax.servlet.ServletResponse) @bci=101, line=206 (Interpreted frame)
-
org.apache.catalina.core.StandardWrapperValve.invoke(org.apache.catalina.connector.Request,
org.apache.catalina.connector.Response) @bci=804, line=233 (Interpreted
frame)
-
org.apache.catalina.core.StandardContextValve.invoke(org.apache.catalina.connector.Request,
org.apache.catalina.connector.Response) @bci=365, line=191 (Interpreted
frame)
-
org.apache.catalina.core.StandardHostValve.invoke(org.apache.catalina.connector.Request,
org.apache.catalina.connector.Response) @bci=64, line=128 (Interpreted
frame)
-
org.apache.catalina.valves.ErrorReportValve.invoke(org.apache.catalina.connector.Request,
org.apache.catalina.connector.Response) @bci=6, line=102 (Interpreted frame)
-
org.apache.catalina.core.StandardEngineValve.invoke(org.apache.catalina.connector.Request,
org.apache.catalina.connector.Response) @bci=42, line=109 (Interpreted
frame)
-
org.apache.catalina.connector.CoyoteAdapter.service(org.apache.coyote.Request,
org.apache.coyote.Response) @bci=157, line=286 (Interpreted frame)
- org.apache.coyote.ajp.AjpAprProcessor.process(long) @bci=309,
line=419 (Compiled frame)
-
org.apache.coyote.ajp.AjpAprProtocol$AjpConnectionHandler.process(long)
@bci=39, line=378 (Interpreted frame)
- org.apache.tomcat.util.net.AprEndpoint$Worker.run() @bci=147,
line=1509 (Interpreted frame)
- java.lang.Thread.run() @bci=11, line=619 (Interpreted frame)


jstack.2.txt has the same stack trace:

- java.io.UnixFileSystem.getBooleanAttributes0(java.io.File) @bci=0
(Compiled frame; information may be imprecise)
- java.io.UnixFileSystem.getBooleanAttributes(java.io.File) @bci=2,
line=228 (Compiled frame)
- java.io.File.exists() @bci=20, line=733 (Compiled frame)
- sun.misc.URLClassPath$FileLoader.getResource(java.lang.String,
boolean) @bci=136, line=995 (Compiled frame)
- sun.misc.URLClassPath$FileLoader.findResource(java.lang.String,
boolean) @bci=3, line=962 (Compiled frame)
- sun.misc.URLClassPath.findResource(java.lang.String, boolean)
@bci=17, line=145 (Compiled frame)

jstack.8.txt is the last thread dump where File.exists was stalled.

> http://ims.net/jstack/jstack.9.txt
> http://ims.net/jstack/jstack.10.txt
> http://ims.net/jstack/jstack.11.txt

The server appears to be idle, here.

It's a little weird that thread 9770 has NO STACK INFO AT ALL.

> http://ims.net/jstack/jstack.12.txt
> http://ims.net/jstack/jstack.13.txt

Also idle.

> I wish I could divine something informative from these. Perhaps one of
> you can.
>
> I'm also trying to use Wireshark to see if there is some specific net
> activity that's going on during the long wait, as suggested by David Smith.

Obviously, the File.exists method shouldn't be taking that long... it's
a pretty simple operation. Are you using an NFS or other network share?
Does your disk have any physical problems? Is this machine running next
to any equipment that generates a lot of stray alpha particles? :)

- -chris
Sam Hokin
2009-02-20 18:28:17 UTC
Permalink
Christopher Schultz wrote:
>
> jstack.8.txt is the last thread dump where File.exists was stalled.
>
>> http://ims.net/jstack/jstack.9.txt
>> http://ims.net/jstack/jstack.10.txt
>> http://ims.net/jstack/jstack.11.txt
>
> The server appears to be idle, here.
>
> It's a little weird that thread 9770 has NO STACK INFO AT ALL.
>
>> http://ims.net/jstack/jstack.12.txt
>> http://ims.net/jstack/jstack.13.txt
>
> Also idle.

Yeah, I _thought_ it was still stalled on 9-11, but it looks like I misjudged the timing by a bit. I was flipping back
and forth between virtual windows, hitting jstack while checking if my browser had the response from the server.

> Obviously, the File.exists method shouldn't be taking that long... it's
> a pretty simple operation. Are you using an NFS or other network share?
> Does your disk have any physical problems? Is this machine running next
> to any equipment that generates a lot of stray alpha particles? :)

Not that I know of. :) We now know it's hanging on java.io.File.exists(), though, so I suppose that's leading us
somewhere. It's a logical RAID array on four physical disks. I've run fsck on it (when this all first happened) and
it's fine. And this problem persists no matter where the classes are on the disk; whether they're in a JAR file
anywhere, or split out under WEB-INF/classes. I don't think it can be a disk issue. I guess I could write a standalone
Java routine to play with the File.exists() method and see if I can reproduce the delay "manually"....
David Smith
2009-02-20 18:36:37 UTC
Permalink
Sam Hokin wrote:
> Christopher Schultz wrote:
>>
>> jstack.8.txt is the last thread dump where File.exists was stalled.
>>
>>> http://ims.net/jstack/jstack.9.txt
>>> http://ims.net/jstack/jstack.10.txt
>>> http://ims.net/jstack/jstack.11.txt
>>
>> The server appears to be idle, here.
>>
>> It's a little weird that thread 9770 has NO STACK INFO AT ALL.
>>
>>> http://ims.net/jstack/jstack.12.txt
>>> http://ims.net/jstack/jstack.13.txt
>>
>> Also idle.
>
> Yeah, I _thought_ it was still stalled on 9-11, but it looks like I
> misjudged the timing by a bit. I was flipping back and forth between
> virtual windows, hitting jstack while checking if my browser had the
> response from the server.
>
>> Obviously, the File.exists method shouldn't be taking that long... it's
>> a pretty simple operation. Are you using an NFS or other network share?
>> Does your disk have any physical problems? Is this machine running next
>> to any equipment that generates a lot of stray alpha particles? :)
>
> Not that I know of. :) We now know it's hanging on
> java.io.File.exists(), though, so I suppose that's leading us
> somewhere. It's a logical RAID array on four physical disks. I've
> run fsck on it (when this all first happened) and it's fine. And this
> problem persists no matter where the classes are on the disk; whether
> they're in a JAR file anywhere, or split out under WEB-INF/classes. I
> don't think it can be a disk issue. I guess I could write a
> standalone Java routine to play with the File.exists() method and see
> if I can reproduce the delay "manually"....
>
>
It _could_ be a disk issue. I've seen disks that scan fine contain a
growing number of blocks causing read errors and continuous retries.

--David
Christopher Schultz
2009-02-20 23:11:36 UTC
Permalink
Sam and David,

On 2/20/2009 1:36 PM, David Smith wrote:
> Sam Hokin wrote:
>> It's a logical RAID array on four physical disks. I've
>> run fsck on it (when this all first happened) and it's fine.
>
> It _could_ be a disk issue. I've seen disks that scan fine contain a
> growing number of blocks causing read errors and continuous retries.

Yeah, you gotta run badblocks on it. fsck just checks the filesystem
integrity, not the physical disk.

- -chris
André Warnier
2009-02-20 23:23:48 UTC
Permalink
Christopher Schultz wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Sam and David,
>
> On 2/20/2009 1:36 PM, David Smith wrote:
>> Sam Hokin wrote:
>>> It's a logical RAID array on four physical disks. I've
>>> run fsck on it (when this all first happened) and it's fine.
>> It _could_ be a disk issue. I've seen disks that scan fine contain a
>> growing number of blocks causing read errors and continuous retries.
>
> Yeah, you gotta run badblocks on it. fsck just checks the filesystem
> integrity, not the physical disk.
>
If there are problems with the disks themselves, the Raid controller
should have a utility program allowing to inspect such, no ?
If they are hot-swap, you could also try to remove one of them and
insert a new one, let him rebuild it and try again. It will take a
while to go through all 4 of them this way, but should not interrupt
anything.

It would be quite a pity if this turned out to be a mere disk problem
though. Much less interesting than the current intellectual pursuit.
Sam Hokin
2009-02-20 23:44:14 UTC
Permalink
Thanks, Chris. I ran e2fsck with the -c option, which runs badblocks, when I tested it earlier. And I just ran
badblocks again - 0 bad blocks found. I wish I could fix this by simply as swapping out a bad disk (notwithstanding
Andre's desire for intellectual pursuits), but I really think it's software, either in some service mucking up the JVM
or the JVM itself. But it only manifests itself under Tomcat, and then only when this particular package is imported.

Christopher Schultz wrote:
> Sam and David,
>
> On 2/20/2009 1:36 PM, David Smith wrote:
>> Sam Hokin wrote:
>>> It's a logical RAID array on four physical disks. I've
>>> run fsck on it (when this all first happened) and it's fine.
>> It _could_ be a disk issue. I've seen disks that scan fine contain a
>> growing number of blocks causing read errors and continuous retries.
>
> Yeah, you gotta run badblocks on it. fsck just checks the filesystem
> integrity, not the physical disk.
Mark Thomas
2009-02-20 23:50:21 UTC
Permalink
Sam Hokin wrote:
> Thanks, Chris. I ran e2fsck with the -c option, which runs badblocks,
> when I tested it earlier. And I just ran badblocks again - 0 bad blocks
> found. I wish I could fix this by simply as swapping out a bad disk
> (notwithstanding Andre's desire for intellectual pursuits), but I really
> think it's software, either in some service mucking up the JVM or the
> JVM itself. But it only manifests itself under Tomcat, and then only
> when this particular package is imported.

Do you see the same issue if you pre-compile that JSP?

Mark
Martin Gainty
2009-02-21 01:44:56 UTC
Permalink
agreed
the jspc task from ANT will <pre>compile your jsps for you

(although you'll need jasper.jar and jasper-runtime.jar)
Documentation avaialable at
http://ant.apache.org/manual/OptionalTasks/jspc.html

ping me offline if you needs those jars..

Martin
______________________________________________
Disclaimer and confidentiality note
Everything in this e-mail and any attachments relates to the official business of Sender. This transmission is of a confidential nature and Sender does not endorse distribution to any party other than intended recipient. Sender does not necessarily endorse content contained within this transmission.




> Date: Fri, 20 Feb 2009 23:50:21 +0000
> From: ***@apache.org
> To: ***@tomcat.apache.org
> Subject: Re: very slow class loading on initial JSP/servlet request after restart
>
> Sam Hokin wrote:
> > Thanks, Chris. I ran e2fsck with the -c option, which runs badblocks,
> > when I tested it earlier. And I just ran badblocks again - 0 bad blocks
> > found. I wish I could fix this by simply as swapping out a bad disk
> > (notwithstanding Andre's desire for intellectual pursuits), but I really
> > think it's software, either in some service mucking up the JVM or the
> > JVM itself. But it only manifests itself under Tomcat, and then only
> > when this particular package is imported.
>
> Do you see the same issue if you pre-compile that JSP?
>
> Mark
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-***@tomcat.apache.org
> For additional commands, e-mail: users-***@tomcat.apache.org
>

_________________________________________________________________
Windows Live™: E-mail. Chat. Share. Get more ways to connect.
http://windowslive.com/online/hotmail?ocid=TXT_TAGLM_WL_HM_AE_Faster_022009
Sam Hokin
2009-02-24 18:54:09 UTC
Permalink
Mark Thomas wrote:
> Sam Hokin wrote:
>> Thanks, Chris. I ran e2fsck with the -c option, which runs badblocks,
>> when I tested it earlier. And I just ran badblocks again - 0 bad blocks
>> found. I wish I could fix this by simply as swapping out a bad disk
>> (notwithstanding Andre's desire for intellectual pursuits), but I really
>> think it's software, either in some service mucking up the JVM or the
>> JVM itself. But it only manifests itself under Tomcat, and then only
>> when this particular package is imported.
>
> Do you see the same issue if you pre-compile that JSP?

Surprisingly, yes. So it's not only a compilation issue. After some individual tests, I compiled the JSPs from the
four sites I have running on this server and restarted Tomcat, and the LONG delay before Tomcat begins serving page
requests continues. I looked at my individual test JSP, now precompiled, and, sure enough, it takes a long time to
respond to the first request after a Tomcat restart. In fact, it wasn't any better than when Tomcat compiles them on
the fly. So I'm back to thinking the classloader is involved in this delay as well.

I did check, just to be sure, and Tomcat did not write any class files under Catalina/work. And if I deleted a compiled
class file on a site, a request throws an error logged in catalina.out. So, Tomcat is definitely using the compiled
classes.
Christopher Schultz
2009-02-24 21:54:18 UTC
Permalink
Sam,

On 2/24/2009 1:54 PM, Sam Hokin wrote:
> Mark Thomas wrote:
>> Sam Hokin wrote:
>>> Thanks, Chris. I ran e2fsck with the -c option, which runs badblocks,
>>> when I tested it earlier. And I just ran badblocks again - 0 bad blocks
>>> found. I wish I could fix this by simply as swapping out a bad disk
>>> (notwithstanding Andre's desire for intellectual pursuits), but I really
>>> think it's software, either in some service mucking up the JVM or the
>>> JVM itself. But it only manifests itself under Tomcat, and then only
>>> when this particular package is imported.
>>
>> Do you see the same issue if you pre-compile that JSP?
>
> Surprisingly, yes. So it's not only a compilation issue. After some
> individual tests, I compiled the JSPs from the four sites I have running
> on this server and restarted Tomcat, and the LONG delay before Tomcat
> begins serving page requests continues.

... and what is happening when you take a thread dump. Is it
File.exists() taking forever again? If that's the case, the class
loading is what's taking forever. I have no idea what could be causing
that other than a disk read error (or network latency, but you mentioned
you weren't using NFS or anything like that).

Someone mentioned running strace which sounds like a good idea to me.
You'll just have to sift through a ton of information to find what
you're looking for since the JVM performs a /lot/ of system calls during
its operation.

- -chris
Mark Thomas
2009-02-25 09:42:50 UTC
Permalink
Sam Hokin wrote:
> Mark Thomas wrote:
>> Sam Hokin wrote:
>>> Thanks, Chris. I ran e2fsck with the -c option, which runs badblocks,
>>> when I tested it earlier. And I just ran badblocks again - 0 bad blocks
>>> found. I wish I could fix this by simply as swapping out a bad disk
>>> (notwithstanding Andre's desire for intellectual pursuits), but I really
>>> think it's software, either in some service mucking up the JVM or the
>>> JVM itself. But it only manifests itself under Tomcat, and then only
>>> when this particular package is imported.
>>
>> Do you see the same issue if you pre-compile that JSP?
>
> Surprisingly, yes. So it's not only a compilation issue.

Ok. To summarise when you include net.ims.jcms.* in your imports the
page compiles quickly but takes ages to respond to the first request.

I wonder if this is related to loading a specific class in your library.
Can you use a test JSP try and isolate which class(es) are causing the
slow down?

My thinking is if we can reduce the scope of the problem to importing a
single class, we can then separate out that class and reduce the code in
it bit by bit until we have the bare minimum that causes the problem.
Hopefully, there will be little enough code left that it will be obvious.

Mark
Tim Funk
2009-02-25 12:31:25 UTC
Permalink
While I am late to this ... Is this an accurate summary?

- Slow re-load on a server
- Server is a production server
- Other servers are OK so it can only be reproduced on production server
- Initial looks at network seem to be no network activity
- There seems to be a hint at File.exists() causing an issue
- The wisdom of the crowds is pointing to disk issue

Would it be possible to copy the tomcat installation and webapp to
another disk (or new directory) on the same machine and then run the
copy on an alternate port? If it "fails" like it does on the lower port
- you have a better sandbox for debugging. If not ... you have a new
interesting data point in the debug process.

Last oddball question - even though this is a Linux server, is there any
virus scanning software running?

-Tim

Mark Thomas wrote:
> Sam Hokin wrote:
>> Mark Thomas wrote:
>>> Sam Hokin wrote:
>>>> Thanks, Chris. I ran e2fsck with the -c option, which runs badblocks,
>>>> when I tested it earlier. And I just ran badblocks again - 0 bad blocks
>>>> found. I wish I could fix this by simply as swapping out a bad disk
>>>> (notwithstanding Andre's desire for intellectual pursuits), but I really
>>>> think it's software, either in some service mucking up the JVM or the
>>>> JVM itself. But it only manifests itself under Tomcat, and then only
>>>> when this particular package is imported.
>>> Do you see the same issue if you pre-compile that JSP?
>> Surprisingly, yes. So it's not only a compilation issue.
>
> Ok. To summarise when you include net.ims.jcms.* in your imports the
> page compiles quickly but takes ages to respond to the first request.
>
> I wonder if this is related to loading a specific class in your library.
> Can you use a test JSP try and isolate which class(es) are causing the
> slow down?
>
> My thinking is if we can reduce the scope of the problem to importing a
> single class, we can then separate out that class and reduce the code in
> it bit by bit until we have the bare minimum that causes the problem.
> Hopefully, there will be little enough code left that it will be obvious.
>
Sam Hokin
2009-02-25 17:43:43 UTC
Permalink
Tim Funk wrote:
> While I am late to this ... Is this an accurate summary?
>
> - Slow re-load on a server
> - Server is a production server
> - Other servers are OK so it can only be reproduced on production server
> - Initial looks at network seem to be no network activity
> - There seems to be a hint at File.exists() causing an issue
> - The wisdom of the crowds is pointing to disk issue

Yes, Tim, that's pretty accurate.

> Would it be possible to copy the tomcat installation and webapp to
> another disk (or new directory) on the same machine and then run the
> copy on an alternate port? If it "fails" like it does on the lower port
> - you have a better sandbox for debugging. If not ... you have a new
> interesting data point in the debug process.

That's a good suggestion, I'll do so. I've been a bit concerned about running a second Tomcat instance on this
production server since I've been having so much trouble with it, but if a second instance is a problem, I'll just turn
it off. It's probably better to leave the production instance alone, actually. I have changed JVM versions, Tomcat
versions, etc. over the course of this issue, by the way.

> Last oddball question - even though this is a Linux server, is there any
> virus scanning software running?

Nope.
André Warnier
2009-02-25 17:38:26 UTC
Permalink
Mark Thomas wrote:
> Sam Hokin wrote:
>> Mark Thomas wrote:
>>> Sam Hokin wrote:
[...]

>
> My thinking is if we can reduce the scope of the problem to importing a
> single class, we can then separate out that class and reduce the code in
> it bit by bit until we have the bare minimum that causes the problem.
> Hopefully, there will be little enough code left that it will be obvious.
>
Which is basically what I suggested a few posts back, after trying to
summarise the problem again. My posts are not always about reindeers or
chocolate.
Sam, because there is quite a bit of traffic on this issue, and because
the exact description seems to change a bit over time (e.g.
compilation-only or not), could you remake a summary of what you are
sure of so far ?
Without totally excluding it for instance, I would surmise that it is
unlikely to be a disk-related problem, because copying the complete
webapp somewhere else does not change the issue.
About the File.exists() issue and what it is checking, isn't a tool like
"lsof" a possible way to find out ? It can show all I/O like resources
used by a given process, be they sockets or files.
lsof has more options than a human can easily digest, but I think it is
a simple matter to just ask for the full list, for a given process-id.
You'll be surprised at how many I/O thinks a process uses, but maybe
something in the list will catch your eye, since you among all of us
knows best what it /should/ be using.
Sam Hokin
2009-02-25 18:48:09 UTC
Permalink
André Warnier wrote:
> Sam, because there is quite a bit of traffic on this issue, and because
> the exact description seems to change a bit over time (e.g.
> compilation-only or not), could you remake a summary of what you are
> sure of so far ?

1. Initial requests to ANY JSP page on a site, precompiled or not, or a compiled servlet, take an inordinate amount of
time to generate a response, often several minutes, as long as 20 minutes or so after the server is restarted.

2. After a response is finally generated from a given JSP, that JSP always responds quickly to subsequent requests.

3. The four virtual hosts start serving pages at the same time, after a Tomcat restart, as long as 20 minutes after
Tomcat started up.

4. jstack showed that the delay during a small JSP request was during a call to java.io.File.exists(). It is still not
known what the particular file was.

5. strace reveals that thousands and thousands of stat64, access and lstat64 calls are made to JAR files and classes
during the initial "stall phase." I've posted a couple short examples from my test6.jar requests.

> Without totally excluding it for instance, I would surmise that it is
> unlikely to be a disk-related problem, because copying the complete
> webapp somewhere else does not change the issue.

Yes, and I have done so several times.

> About the File.exists() issue and what it is checking, isn't a tool like
> "lsof" a possible way to find out ? It can show all I/O like resources
> used by a given process, be they sockets or files.

I'm looking at lsof now. Haven't seen anything remarkable so far, but I'll keep at it a bit.
Sam Hokin
2009-02-25 21:54:18 UTC
Permalink
NEW DISCOVERY - using a wildcard package import results in a much longer response delay!

I've taken Mark's suggestion that I import individual classes rather than importing all with a wildcard to see if a
particular class is slowing down classloading. It turns out that it's the WILDCARD itself that is causing most of the
delay!

If I request the following "wildcard import" JSP:

<%@ page import="java.util.Date" %>
<%@ page import="net.ims.jcms.*" %>

<%
java.util.Date endDate = new java.util.Date();
out.println(endDate.toString());
%>

it responds in 49 seconds, consistently, over several different tries (renaming the file each time to incur the delay).
That includes on-the-fly compilation.

If I request a JSP that imports EVERY class in net.ims.jcms explicitly (it's a lot of lines, so truncated below):

<%@ page import="java.util.Date" %>
<%@ page import="net.ims.jcms.AccessRole" %>

<%@ page import="net.ims.jcms.AccessRoleException" %>

<%@ page import="net.ims.jcms.AccessUser" %>

<%@ page import="net.ims.jcms.AccessUserException" %>

... a bunch more here ...
<%@ page import="net.ims.jcms.UtilityLink" %>
<%@ page import="net.ims.jcms.ValidationException" %>
<%
java.util.Date endDate = new java.util.Date();
out.println(endDate.toString());
%>

it responds in 3 seconds, consistently (with renaming), which isn't as fast as it should be, but it's way faster than 49
seconds.

I have no idea why a wildcard package import differs from importing explicit classes (from the same JAR), so I don't
know what this is telling us, but it sure seems to be significant! I have not been able to get lsof to display any
files related to these requests, I expect they open and close too quickly for me to catch it.

I have run strace during a request of a wildcard import JSP (test7.jsp) on a standalone Tomcat instance so that I don't
have lines from other requests. The request is made at 15:24:09, and the response came at 15:24:58. I've included the
strace lines from that time range here:

http://ims.net/media/strace-test-wildcard.txt

You can see the read call on the HTTP GET request on the sixth line. The HTTP 200 response is in a write call near the
bottom. Presumably everything that happened during the 49 seconds is recorded here. You can see that the .java file
has been opened within a second,

2498 15:24:09 open("/usr/local/tomcat-2/work/Catalina/test.ims.net/_/org/apache/jsp/test7_jsp.java",
O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0666) = 62 <0.000065>

and the compiled class is opened at 15:24:58,

2498 15:24:58 open("/usr/local/tomcat-2/work/Catalina/test.ims.net/_/org/apache/jsp/test7_jsp.class",
O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0666) = 62 <0.000064>

I have no idea how to interpret all the wheel-spinning that goes on in between.
Caldarale, Charles R
2009-02-25 22:22:00 UTC
Permalink
> From: Sam Hokin [mailto:***@ims.net]
> Subject: Re: very slow class loading on initial JSP/servlet
> request afterrestart
>
> I have no idea how to interpret all the wheel-spinning that
> goes on in between.

The key lines have a 3+ second response time:

2498 15:24:35 stat64("/net/ims/jcms/Object.class", 0x7b6b9c20) = -1 ENOENT (No such file or directory) <3.020996>

Most of the calls return in much less than a millisecond, but there are a few ENOENT (not all) responses that hit this 3-second delay. All of the slow ones are looking at the /net/ims/jcms directory; what is that? Why is it in this webapp's classpath? (It might be in the jar's META-INF entries.)

- Chuck


THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY MATERIAL and is thus for use only by the intended recipient. If you received this in error, please contact the sender and delete the e-mail and its attachments from all computers.
Sam Hokin
2009-02-25 23:14:24 UTC
Permalink
Caldarale, Charles R wrote:
>
> The key lines have a 3+ second response time:
>
> 2498 15:24:35 stat64("/net/ims/jcms/Object.class", 0x7b6b9c20) = -1 ENOENT (No such file or directory) <3.020996>
>
> Most of the calls return in much less than a millisecond, but there are a few ENOENT (not all) responses that hit this 3-second delay. All of the slow ones are looking at the /net/ims/jcms directory; what is that? Why is it in this webapp's classpath? (It might be in the jar's META-INF entries.)

Wow, good eye, Chuck, and those add up to at least 35 seconds of the delay - the bulk of it. I have absolutely no idea
why it's looking there. I certainly didn't put /net/ims/jcms in the classpath, and the jar file that contains
net.ims.jcms just has a stub MANIFEST.MF entry, nothing else, in META-INF. I don't have a META-INF directory under the
web app ROOT. The only place classes are stored under /net/ims/jcms is within WEB-INF/lib/jcms.jar. But this is
clearly where the bulk of the time is spent, and one can see how this could scale up really poorly, since Tomcat is
looking for random core classes like Object.class and Throwable.class under /net/ims/jcms.

For comparison, here is the strace output from a test JSP that imports all of the net.ims.jcms classes explicitly (no
wildcird import), and responds in three seconds:

http://ims.net/media/strace-test-explicit.txt

You can see the read/GET near the top and the write HTTP 200 response near the bottom. There is only ONE attempt to
find a file under /net/ims/jcms, and that only takes .025 sec.

It looks like we've boiled the problem, at least in this test JSP case, down to the classloader wasting 3 seconds per
try on stat-ing core Java classes in an imaginary location, /net/ims/jcms, on the server's filesystem, which actually
corresponds to a path WITHIN a jar file that I wrote, and which doesn't contain any core Java classes. This sure seems
like a bug to me, but why it is manifested only on this one server, and primarily when I use a <%@ page
import="net.ims.jcms.*" %> call, remains a mystery. I can see how these 3-second errant lookups can really bog down the
server when it starts up, though, when a ton of classes are being loaded for page requests on several virtual hosts.
Caldarale, Charles R
2009-02-26 02:13:27 UTC
Permalink
> From: Sam Hokin [mailto:***@ims.net]
> Subject: Re: very slow class loading on initial JSP/servlet
> request afterrestart
>
> The only place classes are stored under /net/ims/jcms
> is within WEB-INF/lib/jcms.jar.

I'll poke around in the webapp classloader to see if I can find anything interesting, but in the meantime, is there a /net directory on the problematic server? If there is and it targets a remote file system, that might explain the long delay on these stat() calls.

- Chuck


THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY MATERIAL and is thus for use only by the intended recipient. If you received this in error, please contact the sender and delete the e-mail and its attachments from all computers.
Sam Hokin
2009-02-26 15:50:24 UTC
Permalink
Caldarale, Charles R wrote:
>
> I'll poke around in the webapp classloader to see if I can find anything interesting, but in the meantime, is there a /net directory on the problematic server? If there is and it targets a remote file system, that might explain the long delay on these stat() calls.

You DA MAN, Charles! It exists, owned by root, created when the machine was last booted, is empty and is currently
locked. One of the other, non-problematic, servers has a similar empty /net directory; the other doesn't.

[***@pluto tmp]# stat /net
File: `/net'
Size: 0 Blocks: 0 IO Block: 1024 directory
Device: 14h/20d Inode: 4937 Links: 2
Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2009-02-26 09:18:38.479994369 -0600
Modify: 2009-02-24 12:32:45.839946431 -0600
Change: 2009-02-24 12:32:45.839946431 -0600

[***@pluto tmp]# lsof /net
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
automount 2682 root 16r DIR 0,20 0 4937 /net

[***@pluto]# rmdir /net
rmdir: failed to remove `/net': Device or resource busy

It's created by automount, by the autofs service, along with /misc. /misc and /net are listed in /etc/auto.master. I
guess since Tomcat keeps hitting /net, it's never automatically unmounted. Not sure why /misc stays mounted.

SO, I've shut off autofs (on by default in a Fedora install, apparently), removed /net (and /misc), and SURE ENOUGH, my
test JSP that imports net.ims.jcms.* COMPILES AND RUNS FAST, in less than a second!!!

In summary, for unknown reasons, JDT is searching /net for core Java classes; /net exists, because it's listed in
/etc/auto.master and the autofs daemon created it with automount; JDT's stat on /net/* is slow for unknown reasons, but
that slowness adds up to a huge delay in classloading.

So, we've "fixed" the problem (!!!!) by shutting off autofs and removing /net, but the core issue, that JDT is searching
under /net for core classes, remains. That shouldn't be happening. I'll pursue that a bit on my test instance with
org.apache.jasper.level=ALL.

We're almost there! Thanks, everyone!
Sam Hokin
2009-02-26 15:59:10 UTC
Permalink
Sam Hokin wrote:
> Caldarale, Charles R wrote:
>>
>> I'll poke around in the webapp classloader to see if I can find
>> anything interesting, but in the meantime, is there a /net directory
>> on the problematic server? If there is and it targets a remote file
>> system, that might explain the long delay on these stat() calls.
>
> You DA MAN, Charles! It exists, owned by root, created when the machine
> was last booted, is empty and is currently locked. One of the other,
> non-problematic, servers has a similar empty /net directory; the other
> doesn't.

AND, I should add, the other two servers were NOT running autofs. In fact, autofs wasn't even installed on one of the
other servers. So on the non-problematic server with a /net directory, it's just an empty directory. I'm totally
convinced that autofs was causing the 3-second delays on stat64 /net/* calls.

So the answer to our old question "what's different about the problematic server?" is "it's running autofs, which is
creating and locking /net". (I've now uninstalled autofs on all servers.)
Jorge Medina
2009-02-26 16:13:13 UTC
Permalink
Just for curiosity, Is "/" included in your classpath ?


-----Original Message-----
From: Sam Hokin [mailto:***@ims.net]
Sent: Thursday, February 26, 2009 10:59 AM
To: Tomcat Users List
Subject: Re: very slow class loading on initial JSP/servlet request
after restart

Sam Hokin wrote:
> Caldarale, Charles R wrote:
>>
>> I'll poke around in the webapp classloader to see if I can find
>> anything interesting, but in the meantime, is there a /net directory
>> on the problematic server? If there is and it targets a remote file
>> system, that might explain the long delay on these stat() calls.
>
> You DA MAN, Charles! It exists, owned by root, created when the
> machine was last booted, is empty and is currently locked. One of the

> other, non-problematic, servers has a similar empty /net directory;
> the other doesn't.

AND, I should add, the other two servers were NOT running autofs. In
fact, autofs wasn't even installed on one of the other servers. So on
the non-problematic server with a /net directory, it's just an empty
directory. I'm totally convinced that autofs was causing the 3-second
delays on stat64 /net/* calls.

So the answer to our old question "what's different about the
problematic server?" is "it's running autofs, which is creating and
locking /net". (I've now uninstalled autofs on all servers.)

---------------------------------------------------------------------
To unsubscribe, e-mail: users-***@tomcat.apache.org
For additional commands, e-mail: users-***@tomcat.apache.org
Caldarale, Charles R
2009-02-26 16:18:22 UTC
Permalink
> From: Jorge Medina [mailto:***@e-dialog.com]
> Subject: RE: very slow class loading on initial JSP/servlet
> request after restart
>
> Just for curiosity, Is "/" included in your classpath ?

Note that the classpath of interest is the one JDT uses, not any global one or any associated with Tomcat's classloaders. As stated earlier, setting the log level to ALL (or even FINE) for org.apache.jasper will display the classpath of interest.

- Chuck


THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY MATERIAL and is thus for use only by the intended recipient. If you received this in error, please contact the sender and delete the e-mail and its attachments from all computers.
Christopher Schultz
2009-02-26 18:00:06 UTC
Permalink
Sam,

On 2/26/2009 10:50 AM, Sam Hokin wrote:
> Caldarale, Charles R wrote:
>>
>> I'll poke around in the webapp classloader to see if I can find
>> anything interesting, but in the meantime, is there a /net directory
>> on the problematic server? If there is and it targets a remote file
>> system, that might explain the long delay on these stat() calls.
>
> You DA MAN, Charles! It exists, owned by root, created when the machine
> was last booted, is empty and is currently locked. One of the other,
> non-problematic, servers has a similar empty /net directory; the other
> doesn't.

So, my first thought (remote DTD) was wrong, but my second (NFS share)
turned out to be right.

Such a crazy way to get to this point, though.

I wonder if there's a system CLASSPATH that's been configured to look
under /net.

- -chris
Caldarale, Charles R
2009-02-26 04:35:21 UTC
Permalink
> From: Sam Hokin [mailto:***@ims.net]
> Subject: Re: very slow class loading on initial JSP/servlet
> request afterrestart

> It looks like we've boiled the problem, at least in this test
> JSP case, down to the classloader wasting 3 seconds per
> try on stat-ing core Java classes in an imaginary location,
> /net/ims/jcms, on the server's filesystem, which actually
> corresponds to a path WITHIN a jar file that I wrote, and
> which doesn't contain any core Java classes.

Some comments on the above:

1) It's not a Tomcat or JVM classloader that's doing the lookup - it's the JDT compiler hunting for unresolved class references.

2) The Object.class, Throwable.class, etc., core class files are not known to be core until they *haven't* been found on any imported packages that include a wildcard. That's why the lookups are done with a wildcard, and not done with explicit classes.

Once you have your test Tomcat set up on the problematic server, try adding this line to the conf/logging.properties file:

org.apache.jasper.level = ALL

That will display a *lot* of information about what's going on during JSP compilation. One thing in particular printed out during Tomcat initialization is the compiler classpath for JSPs in each webapp, which might show why the lookups are being done at the root of the file system.

- Chuck


THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY MATERIAL and is thus for use only by the intended recipient. If you received this in error, please contact the sender and delete the e-mail and its attachments from all computers.
Sam Hokin
2009-02-26 16:31:15 UTC
Permalink
Caldarale, Charles R wrote:
> Once you have your test Tomcat set up on the problematic server, try adding this line to the conf/logging.properties file:
>
> org.apache.jasper.level = ALL
>
> That will display a *lot* of information about what's going on during JSP compilation. One thing in particular printed out during Tomcat initialization is the compiler classpath for JSPs in each webapp, which might show why the lookups are being done at the root of the file system.

OK, done. I started up my test instance, ran a single request on my test JSP, and shut it down. The resulting
catalina.out is here for your review:

http://ims.net/media/jasper-level-all.txt

Sure enough, / appears in the classpath for the ROOT webapp, right here:

[stuff]:/usr/local/lib/jtds-1.2.2.jar:/:/usr/local/tomcat-2/bin/bootstrap.jar:[other stuff]

/ is appended after a jar that I have symlinked under $TOMCAT_HOME/lib and before bootstrap.jar.

So, SINCE my company is ims.net and my Java packages start with net.ims; and SINCE I had autofs running which
automounted /net; and SINCE / is in the classpath, JDT searched under /net to find core classes, which incurred a very
long delay, presumably due to automount. If autofs is off and /net is simply an empty directory, the delay is not
incurred. My problem is cured by simply turning off autofs.

It seems to me that it's a bug that / is being appended to the classpath, but one that didn't result in pain until the
particular circumstance above came into play, clearly a VERY specific scenario.

THANKS, EVERYONE! I think we've solved this riddle! I owe you all beers, or your beverage of choice!
André Warnier
2009-02-26 17:36:31 UTC
Permalink
Sam Hokin wrote:
> Caldarale, Charles R wrote:
>> Once you have your test Tomcat set up on the problematic server, try
>> adding this line to the conf/logging.properties file:
>>
>> org.apache.jasper.level = ALL
>>
>> That will display a *lot* of information about what's going on during
>> JSP compilation. One thing in particular printed out during Tomcat
>> initialization is the compiler classpath for JSPs in each webapp,
>> which might show why the lookups are being done at the root of the
>> file system.
>
> OK, done. I started up my test instance, ran a single request on my
> test JSP, and shut it down. The resulting
> catalina.out is here for your review:
>
> http://ims.net/media/jasper-level-all.txt
>
> Sure enough, / appears in the classpath for the ROOT webapp, right here:
>
> [stuff]:/usr/local/lib/jtds-1.2.2.jar:/:/usr/local/tomcat-2/bin/bootstrap.jar:[other
> stuff]
>
> / is appended after a jar that I have symlinked under $TOMCAT_HOME/lib
> and before bootstrap.jar.

and, this is also maybe where these lstat calls were coming from (lstat
varies from stat in that it queries the symlink itself, rather than the
underlying real file).

>
> So, SINCE my company is ims.net and my Java packages start with net.ims;
> and SINCE I had autofs running which
> automounted /net; and SINCE / is in the classpath, JDT searched under
> /net to find core classes, which incurred a very
> long delay, presumably due to automount. If autofs is off and /net is
> simply an empty directory, the delay is not
> incurred. My problem is cured by simply turning off autofs.
>
> It seems to me that it's a bug that / is being appended to the
> classpath, but one that didn't result in pain until the
> particular circumstance above came into play, clearly a VERY specific
> scenario.
>
> THANKS, EVERYONE! I think we've solved this riddle! I owe you all beers,
> or your beverage of choice!
>
My preferred one is Westvleteren, arguably the best beer in the world.
See http://en.wikipedia.org/wiki/Westvleteren_Brewery
Cheap, but hard to get.
You can have it delivered at the ApacheCON Europe 2009, and I will take
care of the distribution to deserving contributors. Maybe we can even
get some to Minnesota, along with some Belgian dark chocolate.

I am very glad that this turned out after all as a (fruitful)
intellectual pursuit, rather than a boring hardware issue.
Caldarale, Charles R
2009-02-26 18:03:16 UTC
Permalink
> From: Sam Hokin [mailto:***@ims.net]
> Subject: Re: very slow class loading on initial JSP/servlet
> request afterrestart
>
> Sure enough, / appears in the classpath for the ROOT webapp,

So that appears to be the sole remaining question: why is there a / in JDT's classpath?

Do you have a CLASSPATH environment variable set for the Tomcat process? (You shouldn't.)

- Chuck


THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY MATERIAL and is thus for use only by the intended recipient. If you received this in error, please contact the sender and delete the e-mail and its attachments from all computers.
Caldarale, Charles R
2009-02-26 18:31:26 UTC
Permalink
> From: Caldarale, Charles R
> Subject: RE: very slow class loading on initial JSP/servlet
> request afterrestart
>
> Do you have a CLASSPATH environment variable set for the
> Tomcat process? (You shouldn't.)

Or is there a plain / in the -cp argument in the Tomcat startup script? Or has conf/catalina.properties been modified to include the file system root?

Any of those are a really Bad Idea.

- Chuck


THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY MATERIAL and is thus for use only by the intended recipient. If you received this in error, please contact the sender and delete the e-mail and its attachments from all computers.
Sam Hokin
2009-02-26 21:43:52 UTC
Permalink
Caldarale, Charles R wrote:
>> Do you have a CLASSPATH environment variable set for the
>> Tomcat process? (You shouldn't.)
>
> Or is there a plain / in the -cp argument in the Tomcat startup script? Or has conf/catalina.properties been modified to include the file system root?
>
> Any of those are a really Bad Idea.

And none of these is the case. I've never put / in a classpath, and I cannot find it anywhere.
Pieter Temmerman
2009-02-27 07:27:22 UTC
Permalink
I'm glad you got it solved Sam.
It was an exciting topic to follow.
I haven't been reading the mailinglist for a long time, but there might
be a good chance you got the longest thread in the mailinglist
history :)

Cheers.

Pieter

On Thu, 2009-02-26 at 10:31 -0600, Sam Hokin wrote:
> Caldarale, Charles R wrote:
> > Once you have your test Tomcat set up on the problematic server, try adding this line to the conf/logging.properties file:
> >
> > org.apache.jasper.level = ALL
> >
> > That will display a *lot* of information about what's going on during JSP compilation. One thing in particular printed out during Tomcat initialization is the compiler classpath for JSPs in each webapp, which might show why the lookups are being done at the root of the file system.
>
> OK, done. I started up my test instance, ran a single request on my test JSP, and shut it down. The resulting
> catalina.out is here for your review:
>
> http://ims.net/media/jasper-level-all.txt
>
> Sure enough, / appears in the classpath for the ROOT webapp, right here:
>
> [stuff]:/usr/local/lib/jtds-1.2.2.jar:/:/usr/local/tomcat-2/bin/bootstrap.jar:[other stuff]
>
> / is appended after a jar that I have symlinked under $TOMCAT_HOME/lib and before bootstrap.jar.
>
> So, SINCE my company is ims.net and my Java packages start with net.ims; and SINCE I had autofs running which
> automounted /net; and SINCE / is in the classpath, JDT searched under /net to find core classes, which incurred a very
> long delay, presumably due to automount. If autofs is off and /net is simply an empty directory, the delay is not
> incurred. My problem is cured by simply turning off autofs.
>
> It seems to me that it's a bug that / is being appended to the classpath, but one that didn't result in pain until the
> particular circumstance above came into play, clearly a VERY specific scenario.
>
> THANKS, EVERYONE! I think we've solved this riddle! I owe you all beers, or your beverage of choice!
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-***@tomcat.apache.org
> For additional commands, e-mail: users-***@tomcat.apache.org
>
--
Pieter Temmerman
email: ***@sadiel.es
skype: ptemmerman.sadiel

SADIEL TECNOLOGÍAS DE LA INFORMACIÓN, S.A. http://www.sadiel.es.
Christopher Schultz
2009-02-26 17:50:07 UTC
Permalink
Mark,

On 2/25/2009 4:42 AM, Mark Thomas wrote:
> Ok. To summarise when you include net.ims.jcms.* in your imports the
> page compiles quickly but takes ages to respond to the first request.
>
> I wonder if this is related to loading a specific class in your library.
> Can you use a test JSP try and isolate which class(es) are causing the
> slow down?

At first I was tempted by the "class is loading forever" possibility,
but if the report is correct, merely adding "import net.ims.jcms.*" to
the JSP causes the problem... not actually using the classes. The
compilation (apparently) goes fast. It's the class loading that is very
slow.

You can import anything you want in a JSP (or even a .java file) but the
.class file only includes references to the classes actually used, so no
class loading would be triggered.

Sam: are you sure you're not /using/ those classes? Just importing them?

- -chris
Christopher Schultz
2009-02-22 15:03:50 UTC
Permalink
Sam,

On 2/20/2009 6:44 PM, Sam Hokin wrote:
> Thanks, Chris. I ran e2fsck with the -c option, which runs badblocks,
> when I tested it earlier. And I just ran badblocks again - 0 bad blocks
> found.

Did you run badblocks on the array, or on an individual disk?

I would expect the disk-level check to be more useful.

- -chris
Sam Hokin
2009-02-23 22:47:35 UTC
Permalink
Christopher Schultz wrote:
> Sam,
>
> On 2/20/2009 6:44 PM, Sam Hokin wrote:
>> Thanks, Chris. I ran e2fsck with the -c option, which runs badblocks,
>> when I tested it earlier. And I just ran badblocks again - 0 bad blocks
>> found.
>
> Did you run badblocks on the array, or on an individual disk?
>
> I would expect the disk-level check to be more useful.

On the array. I'm not even sure I can get at a single disk from the OS with this hardware RAID configuration. The OS
had no idea that it's a RAID array as far as I know.
Caldarale, Charles R
2009-02-20 18:39:47 UTC
Permalink
> From: Sam Hokin [mailto:***@ims.net]
> Subject: Re: very slow class loading on initial JSP/servlet
> request afterrestart
>
> We now know it's hanging on java.io.File.exists()

The file of interest couldn't be /dev/random, could it? (I can't imagine jasper using that for anything, but...)

- Chuck


THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY MATERIAL and is thus for use only by the intended recipient. If you received this in error, please contact the sender and delete the e-mail and its attachments from all computers.
Sam Hokin
2009-02-20 20:05:04 UTC
Permalink
I'd like to correct an error I made in my earlier report of this problem. It turns out that the response delay in my
simple test JSP IS during compilation! I ran stat on the files in question, as well as looked at the time shown on the
response page (which is all the test JSP does). (I think I got confused by the output of the -verbose:class JVM option
earlier.)

Here's the file sequence from stat:

File: `test4.jsp'
Size: 146 Blocks: 8 IO Block: 4096 regular file
Device: fd00h/64768d Inode: 7701336 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 500/ sam) Gid: ( 100/ users)
Access: 2009-02-20 13:39:29.000000000 -0600 <--- Tomcat accesses the file
Modify: 2009-02-20 10:39:09.000000000 -0600
Change: 2009-02-20 13:39:06.000000000 -0600

File: `/usr/local/tomcat/work/Catalina/www.ims.net/_/org/apache/jsp/test4_jsp.class'
Size: 3520 Blocks: 8 IO Block: 4096 regular file
Device: fd00h/64768d Inode: 10650578 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 501/ tomcat) Gid: ( 501/ tomcat)
Access: 2009-02-20 13:40:17.000000000 -0600 <--- the compiled class appears almost a minute later
Modify: 2009-02-20 13:40:17.000000000 -0600
Change: 2009-02-20 13:40:17.000000000 -0600

The HTTP response came immediately after that, the page displaying: Fri Feb 20 13:40:17 CST 2009.

As before, jstack reveals that the thread is stalling on java.io.File.exists(), or specifically
java.io.UnixFileSystem.getBooleanAttributes(java.io.File).

On my other two servers, the exact same JSP as above is accessed, compiled and responds within one second.
André Warnier
2009-02-20 20:14:00 UTC
Permalink
Sam Hokin wrote:
> I'd like to correct an error I made in my earlier report of this
> problem. It turns out that the response delay in my simple test JSP IS
> during compilation!

Ha !
If it may help, I remember seeing an earlier thread this week, I believe
answered by Rainer Jung, providing some fancy compilations options.
Don't remember what the thread was about though.
Caldarale, Charles R
2009-02-20 20:28:43 UTC
Permalink
> From: André Warnier [mailto:***@ice-sa.com]
> Subject: Re: very slow class loading on initial JSP/servlet
> request afterrestart
>
> If it may help, I remember seeing an earlier thread this
> week, I believe answered by Rainer Jung, providing some
> fancy compilations options.

Not related; Rainer's answer concerned how to control the internal byte code to native compilations done by the JIT in the JVM, not the JSP to Java translations done by jasper.

- Chuck


THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY MATERIAL and is thus for use only by the intended recipient. If you received this in error, please contact the sender and delete the e-mail and its attachments from all computers.
Sam Hokin
2009-02-20 20:44:32 UTC
Permalink
Chuck's comment about JSP to Java translation done by Jasper reminded me that I should have included the time on the
Java source file created during JSP compilation. It shows that the Java source file was created immediately, but
accessed again six seconds later (!), presumably by the compiler.

File: `/usr/local/tomcat/work/Catalina/www.ims.net/_/org/apache/jsp/test4_jsp.java'
Size: 2333 Blocks: 8 IO Block: 4096 regular file
Device: fd00h/64768d Inode: 10650454 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 501/ tomcat) Gid: ( 501/ tomcat)
Access: 2009-02-20 13:39:35.000000000 -0600
Modify: 2009-02-20 13:39:29.000000000 -0600
Change: 2009-02-20 13:39:29.000000000 -0600

In summary: test4.jsp accessed at 13:39:29; test4_jsp.java created at 13:39:29; test4_jsp.java accessed at 13:39:35;
test4_jsp.class created at 13:40:17; HTTP response at 13:40:17.

I'm going to play with File.exists() in this and the other servers to see if there is some generic problem with file
access on the slow server.
Caldarale, Charles R
2009-02-20 20:53:18 UTC
Permalink
> From: Sam Hokin [mailto:***@ims.net]
> Subject: Re: very slow class loading on initial JSP/servlet
> request afterrestart
>
> I'm going to play with File.exists() in this and the other
> servers to see if there is some generic problem with file
> access on the slow server.

Make sure to try both in and outside of Tomcat's work directory and the webapp's directory underneath it. Might be something odd going on with that specific path. (No symlinks involved, I hope.)

- Chuck


THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY MATERIAL and is thus for use only by the intended recipient. If you received this in error, please contact the sender and delete the e-mail and its attachments from all computers.
Sam Hokin
2009-02-20 21:05:05 UTC
Permalink
Actually, /usr/local/tomcat is a symlink to /usr/local/apache-tomcat-6.0.18. But that's a pretty important symlink for
me, to make it easy to change Tomcat versions. But I'll investigate that. That's the only symlink involved that I can
think of.

Caldarale, Charles R wrote:
>> From: Sam Hokin [mailto:***@ims.net]
>> Subject: Re: very slow class loading on initial JSP/servlet
>> request afterrestart
>>
>> I'm going to play with File.exists() in this and the other
>> servers to see if there is some generic problem with file
>> access on the slow server.
>
> Make sure to try both in and outside of Tomcat's work directory and the webapp's directory underneath it. Might be something odd going on with that specific path. (No symlinks involved, I hope.)
>
> - Chuck
Sam Hokin
2009-02-20 22:44:43 UTC
Permalink
An update. I've written a simple standalone Java routine that calls File.exists(). It never hangs. In fact, I ran it
against the class file that Tomcat creates from a test JSP; it doesn't hang, returns false for the (long) time it takes
Tomcat to create the class file on the slow system, then returns true after the class file is created (and the HTTP
response is sent). I ran my routine as the tomcat user.

Since compilation seems to be the issue, I tried replacing Jasper JDT (by removing jasper-jdt.jar from /lib) with Ant
(by placing ant.jar in /lib) and restarting Tomcat. This did not improve my problem. JSPs still compile slowly with
Ant instead of JDT on the slow machine (and fast on the other machines). I've reverted back to JDT.
André Warnier
2009-02-20 23:35:27 UTC
Permalink
Sam Hokin wrote:
> I'd like to correct an error I made in my earlier report of this
> problem. It turns out that the response delay in my simple test JSP IS
> during compilation! I ran stat on the files in question, as well as
> looked at the time shown on the response page (which is all the test JSP
> does). (I think I got confused by the output of the -verbose:class JVM
> option earlier.)
>
Restarting from the point above..
Leaving the current test jsp page in place, but taking a copy of it
under another name, does the symptom then also show up on the copy ?
The same with the whole webapp, copied under another name.
I am saying "copy", not "move/rename", so as to really use another disk
area.

Separate question : does this Tomcat, contrary to the others, have the
SecurityManager on ? or vice-versa ?
Sam Hokin
2009-02-20 23:50:16 UTC
Permalink
André Warnier wrote:
> Sam Hokin wrote:
>> I'd like to correct an error I made in my earlier report of this
>> problem. It turns out that the response delay in my simple test JSP
>> IS during compilation! I ran stat on the files in question, as well
>> as looked at the time shown on the response page (which is all the
>> test JSP does). (I think I got confused by the output of the
>> -verbose:class JVM option earlier.)
>>
> Restarting from the point above..
> Leaving the current test jsp page in place, but taking a copy of it
> under another name, does the symptom then also show up on the copy ?

Yes, always. That's how I reproduce it without having to restart the server. I just copy from test1.jsp to test2.jsp,
and then test2.jsp takes a long time to compile. It's a very robust problem, with no specific source or compiled files
involved. It just requires a JSP that hasn't yet been compiled.

> The same with the whole webapp, copied under another name.

Any web app. There are four running now, I used to have about 8 running on this server, and I've put new web apps on
(like yesterday), always reproducing the problem. But all of these web apps import my Java package at the top of the JSPs.

> Separate question : does this Tomcat, contrary to the others, have the
> SecurityManager on ? or vice-versa ?

No it doesn't. SecurityManager is not on any of the servers. I have not changed catalina.policy, and I do not invoke
Tomcat with the -security option.
André Warnier
2009-02-21 21:05:44 UTC
Permalink
A summary so far :
- the problem is : on one certain machine, when deploying any JSP which
imports a certain self-made library of classes, that JSP takes an
inordinate amount of time (one minute or more) to compile for a 5-line JSP
- after compilation, the JSP in question responds totally normally to
requests
- there are no errors in the Tomcat logs. The logs show only the
inordinate amount of time needed to load the application, when the
offending library is included.
- the issue appears only on that particular machine, which is a modern
and fast machine, dual quad-core Intel Xeon E5310 CPUs, with a 4-disk
Raid array. The same JSP and library on any other of 3 machines using
the same JVM and Tomcat, does not exhibit the same problem
- the disk array was checked for problems using various means, and does
not exhibit any. The RAM was similarly checked.
- the issue persists if the JSP is copied under another name on the same
machine
- the issue persists if the whole webapp containing that JSP is copied
to a new webapp name on the same machine (and thus occupying a different
disk and memory area)
- the issue disappears if said self-made library is no longer imported
in the JSP; it reappears if the library is imported again
- the issue persists if the imported library is moved from
webapp/WEB-INF/lib to the Tomcat common lib location and vice-versa
- the issue persists despite changing the JVM version and/or Tomcat version
- the issue does not appear when importing the same self-made library in
a separate stand-alone Java program and compiling/running that program
on the same machine (?) (was that the whole library, or just a
File.exists() test ?)
- during the compilation of the bewitched JSP, the machine on which the
inordinate delay occurs does not exhibit any remarkably high CPU, memory
or disk usage
- the bewitching library of which it is question does not make evident
references to external network resources whose unavailability or slow
access could explain the compilation delay
- the affected system is not near any known source of Alpha particles,
nor near a reindeer farm, nor in the proximity of any significant
quantity of chocolate or other dark matter
- other applications on the striken machine work fine and do not exhibit
the above symptoms
- the machine in question does not activate Java's security manager (nor
do the other non-affected machines)


Suggestions of the day :

1.
under the assumption that the library in question does not contain
hundreds of classes, and that the problem occurs during compilation, so
that an error during execution of the JSP would not matter..

Start removing (commenting out) the content of each class one by one and
test, until the specific class causing the issue is found.

2.
turn off 1,2,3,4,5,6,7 of the machine cores and see if it changes the
symptoms

3.
I have a currently unused machine standing here, which in a spirit of
camaraderie and assistance I would offer to exchange for the affected
machine. It's not a quad-core or anything like that, but to my
knowledge when it was last powered on a couple of years ago, it did not
exhibit that kind of problem.
Caldarale, Charles R
2009-02-20 18:29:22 UTC
Permalink
> From: Christopher Schultz [mailto:***@christopherschultz.net]
> Subject: Re: very slow class loading on initial JSP/servlet
> request afterrestart
>
> Is this machine running next to any equipment that
> generates a lot of stray alpha particles? :)

Nearby won't do it for alpha particles - they have to come from inside. They can't travel far (a few centimeters in air, zero for anything with signifcant density), but the high mass of the particle does result in measurable damage if it hits something.

Back in the days when they were a problem, it was discovered that some chip carriers were slightly radioactive, creating some of the first true - albeit unintentional - hardware random number generators.

- Chuck


THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY MATERIAL and is thus for use only by the intended recipient. If you received this in error, please contact the sender and delete the e-mail and its attachments from all computers.
André Warnier
2009-02-20 20:10:18 UTC
Permalink
Caldarale, Charles R wrote:
>> From: Christopher Schultz [mailto:***@christopherschultz.net]
>> Subject: Re: very slow class loading on initial JSP/servlet
>> request afterrestart
>>
>> Is this machine running next to any equipment that
>> generates a lot of stray alpha particles? :)
>
> Nearby won't do it for alpha particles - they have to come from inside.
Dark matter then. Like dark chocolate, as suggested by an earlier
poster. I keep being intrigued by the numerology here :
2 times 4 cores ..
4 Raid disks ..
8 reindeer ..
...
64 bit ..
it started a few days before Christmas (8 ?)..
That must be telling us something, we're just not smart enough yet to
figure it out.
Pieter Temmerman
2009-02-24 12:07:06 UTC
Permalink
Do we know which file file.exists() is trying to check?
May be a good idea to run Tomcat with strace, to see exactly which file
is it trying to access.

On Fri, 2009-02-20 at 21:10 +0100, André Warnier wrote:
> Caldarale, Charles R wrote:
> >> From: Christopher Schultz [mailto:***@christopherschultz.net]
> >> Subject: Re: very slow class loading on initial JSP/servlet
> >> request afterrestart
> >>
> >> Is this machine running next to any equipment that
> >> generates a lot of stray alpha particles? :)
> >
> > Nearby won't do it for alpha particles - they have to come from inside.
> Dark matter then. Like dark chocolate, as suggested by an earlier
> poster. I keep being intrigued by the numerology here :
> 2 times 4 cores ..
> 4 Raid disks ..
> 8 reindeer ..
> ...
> 64 bit ..
> it started a few days before Christmas (8 ?)..
> That must be telling us something, we're just not smart enough yet to
> figure it out.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-***@tomcat.apache.org
> For additional commands, e-mail: users-***@tomcat.apache.org
>
--
Pieter Temmerman
email: ***@sadiel.es
skype: ptemmerman.sadiel

SADIEL TECNOLOGÍAS DE LA INFORMACIÓN, S.A. http://www.sadiel.es.
Juha Laiho
2009-02-24 20:57:55 UTC
Permalink
So, looks like I'm late to the party, but will crash in nonetheless.


Sam Hokin wrote:
> Christopher Schultz wrote:
>> On 2/19/2009 6:23 PM, Sam Hokin wrote:
>>> The problem, which spontaneously appeared a few days before Christmas on
>>> this one server, is that the initial request of a JSP or servlet takes
>>> AGES to respond, usually exceeding several minutes.
>
> Well, I have used a number of diagnostics, like jconsole, jstat and
> jhat, but haven't been able to glean anything informative from them
> about my issue. I've run the -Xloggc JVM option, and watched it garbage
> collect, again not informing me of anything. Yes, if I could figure out
> what the JVM is actually DOING doing the long wait, I'd hopefully be
> able to deal with it, but I honestly haven't been able to figure out how
> to do that. What do you suggest?
>
>> I'll cast my hat into the ring: I suspect you're loading a resource that
>> wants to looking an external resource (like an XML file with a remote
>> DTD/Schema and validation is on in the parser) and the DNS is tripping
>> you up.
>
> No, there are no external references in my Java package. It's all just
> local class files.

Still it sounds like a network (DNS) issue, but it does not pay to just
argue, more information is needed.

One tool that I haven't yet seen suggested is 'strace', the Linux system
call tracer. This will show all the calls your application makes to the
operating system. As you say the application is mostly idle during the
delay, it is, in one way or another, waiting for some OS service to
complete. 'strace' should provide you with timestamped information on
what OS services were called, with which arguments, and how long did
it take for them to return with results. 'strace' will leave you with
a huge file (or a set of huge files, depending on the options you use),
and going through them will take some time - but you'll most likely
also find what causes the delay.

If you decide to go this way, trim down your application to the minimum
where you can still see the delay; with this you hopefully can get
at least some reduction to the amount of log data written. Even though it
is possible to temporarily attach strace to a running process (like
the Tomcat at the moment when it is hung), I'd recommend starting the
tomcat under truss (with -f or -ff to make it also trace new processes
created), because this way it is easier to trace back the file descriptor
numbers accessed.
--
..Juha
Sam Hokin
2009-02-24 21:16:11 UTC
Permalink
Juha Laiho wrote:
>
> One tool that I haven't yet seen suggested is 'strace', the Linux system
> call tracer. This will show all the calls your application makes to the
> operating system. As you say the application is mostly idle during the
> delay, it is, in one way or another, waiting for some OS service to
> complete. 'strace' should provide you with timestamped information on
> what OS services were called, with which arguments, and how long did
> it take for them to return with results. 'strace' will leave you with
> a huge file (or a set of huge files, depending on the options you use),
> and going through them will take some time - but you'll most likely
> also find what causes the delay.

Thanks, Juha. Actually Pieter suggested it a little while ago, and I've been trying to get some information out of
strace. The best I can do is to put strace in front of the java command that's inside catalina.sh. That's the command
that shows with ps -ef when Tomcat is running. BUT, I get nothing out of strace when I make page requests on a site, it
just shows output during Tomcat startup. So, I've not figured out how to get strace to say what the JVM is during the
delay. jstack has led us to a stalled File.exists() in one case, but we don't know what file it's looking for. And I'm
not convinced that File.exists() is the only method that's stalling.

Since this problem exists only on a production server, a server on which I must still serve at least two customer sites
(due to DNS issues) in addition to our own and any others I put on there, I'm a bit restricted in terms of how much I
can muck with it (not that I haven't brought those live sites down for awkward periods of time with the diagnosis I've
attempted so far). I wish I had a test environment on another server that replicates this issue, but my other two
servers run Tomcat perfectly fast, and since I don't understand what's causing the problem, I cannot make one of my
other servers reproduce it.

Another diagnostic problem is that undeploying a context with the Tomcat /manager app, and then starting it again, does
NOT reset this problem - the response to a JSP request is immediate (provided it had been requested since the last
Tomcat startup). This problem is only reset on a given JSP if I restart Tomcat entirely; I can reproduce it by creating
fresh JSPs with new names and requesting them.

But, clearly, the key diagnostic issue is finding out WHAT is going on during the delay that a JSP incurs when it is
first requested of a given Tomcat instance. I've not been able to find out from strace. I'll give truss -f and truss
-ff a try.
Caldarale, Charles R
2009-02-24 21:21:57 UTC
Permalink
> From: Sam Hokin [mailto:***@ims.net]
> Subject: Re: very slow class loading on initial JSP/servlet
> request afterrestart

> But, clearly, the key diagnostic issue is finding out WHAT is
> going on during the delay that a JSP incurs when it is
> first requested of a given Tomcat instance.

I'm still wondering if /dev/random is involved; this kind of stall has been observed when /dev/random is used instead of /dev/urandom for random byte generation. Don't see why it would be invoked by a File.exists() call, though.

- Chuck


THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY MATERIAL and is thus for use only by the intended recipient. If you received this in error, please contact the sender and delete the e-mail and its attachments from all computers.
János Löbb
2009-02-24 21:30:42 UTC
Permalink
On Feb 24, 2009, at 4:16 PM, Sam Hokin wrote:

> Juha Laiho wrote:
>> One tool that I haven't yet seen suggested is 'strace', the Linux
>> system
>> call tracer. This will show all the calls your application makes to
>> the
>> operating system. As you say the application is mostly idle during
>> the
>> delay, it is, in one way or another, waiting for some OS service to
>> complete. 'strace' should provide you with timestamped information on
>> what OS services were called, with which arguments, and how long did
>> it take for them to return with results. 'strace' will leave you with
>> a huge file (or a set of huge files, depending on the options you
>> use),
>> and going through them will take some time - but you'll most likely
>> also find what causes the delay.
>
> Thanks, Juha. Actually Pieter suggested it a little while ago, and
> I've been trying to get some information out of strace. The best I
> can do is to put strace in front of the java command that's inside
> catalina.sh. That's the command that shows with ps -ef when Tomcat
> is running. BUT, I get nothing out of strace when I make page
> requests on a site, it just shows output during Tomcat startup. So,
> I've not figured out how to get strace to say what the JVM is during
> the delay. jstack has led us to a stalled File.exists() in one
> case, but we don't know what file it's looking for. And I'm not
> convinced that File.exists() is the only method that's stalling.
>
> Since this problem exists only on a production server, a server on
> which I must still serve at least two customer sites (due to DNS
> issues) in addition to our own and any others I put on there, I'm a
> bit restricted in terms of how much I can muck with it (not that I
> haven't brought those live sites down for awkward periods of time
> with the diagnosis I've attempted so far). I wish I had a test
> environment on another server that replicates this issue, but my
> other two servers run Tomcat perfectly fast, and since I don't
> understand what's causing the problem, I cannot make one of my other
> servers reproduce it.
>
> Another diagnostic problem is that undeploying a context with the
> Tomcat /manager app, and then starting it again, does NOT reset this
> problem - the response to a JSP request is immediate (provided it
> had been requested since the last Tomcat startup). This problem is
> only reset on a given JSP if I restart Tomcat entirely; I can
> reproduce it by creating fresh JSPs with new names and requesting
> them.
>
> But, clearly, the key diagnostic issue is finding out WHAT is going
> on during the delay that a JSP incurs when it is first requested of
> a given Tomcat instance. I've not been able to find out from
> strace. I'll give truss -f and truss -ff a try.
>

How about just using tcpdump during the long delay and see what the
machine is doing network wise ?

man tcpdump

János
Sam Hokin
2009-02-24 21:44:40 UTC
Permalink
János Löbb wrote:
>
> How about just using tcpdump during the long delay and see what the
> machine is doing network wise ?

I have, and network timeout was suggested earlier on this thread. There is no TCP activity of particular interest
during the delay. I've hunted for, and not seen, anything to do with the network during the stall. And IPv6 is turned
off, by the way. I can't rule out a network cause, but I've seen nothing with tcpdump or netstat associated with the
delay to date. And I can't muck with the network much since it's at a remote data center. I can drive out there late
at night to take it offline, for example, but I'd need a quick test plan for that to minimize the downtime. (I have
gone out there to test memory and disk, both tested fine.)
Pieter Temmerman
2009-02-25 09:49:19 UTC
Permalink
> Thanks, Juha. Actually Pieter suggested it a little while ago, and I've been trying to get some information out of
> strace. The best I can do is to put strace in front of the java command that's inside catalina.sh. That's the command
> that shows with ps -ef when Tomcat is running. BUT, I get nothing out of strace when I make page requests on a site, it
> just shows output during Tomcat startup. So, I've not figured out how to get strace to say what the JVM is during the
> delay.

Try this strace command:

strace -f -etrace=\!futex,gettimeofday,clock_gettime -p
PID_OF_TOMCAT_JAVA

where PID_OF_TOMCAT_JAVA is the PID of your Tomcat, duh! :)
Sam Hokin
2009-02-25 18:31:54 UTC
Permalink
Pieter Temmerman wrote:
> Try this strace command:
>
> strace -f -etrace=\!futex,gettimeofday,clock_gettime -p
> PID_OF_TOMCAT_JAVA
>
> where PID_OF_TOMCAT_JAVA is the PID of your Tomcat, duh! :)

Thanks! That certainly did the trick. The first time I ran strace against the Tomcat process, it brought Tomcat to a
halt. I had to restart it.

After the restart, it took an absolute eternity for the sites to respond to requests, as usual. One thing that I
haven't mentioned, by the way, is that all four sites begin to respond at the SAME TIME. There appears to be something
going on server-wide that finally resolves. (There are currently four Host entries in server.xml, each of which having
several Context entries, by the way. Three hosts are on the same IP address, one has its own.)

During the startup stall phase, I requested my simple test JSP page. strace revealed, as it did for all the other pages
being requested on the server, repeated calls to stat64 and lstat64 of the SAME class files. Here are the lines from
strace while test6.jsp (precompiled into test6_jsp.class) was stalled:

http://www.ims.net/media/strace-test6-stall.txt

After I shut off strace (being convinced that it was hindering the server from getting done with its post-startup
shenanigans), and the rest of the sites were finally responding to requests, I fired up strace again and requested
test6.jsp, and got this:

http://www.ims.net/media/strace-test6-ok.txt

along with a successful HTTP response. So: stat64; access; lstat64; stat64; read; read; read. The class probably was
already loaded from the previous stalled request.

During the Tomcat stall phase there are thousands and thousands of stat64, access, lstat64 calls to the same class files
(and directories above those files) across pages on all four sites.

My next step is to set up a new instance on another port so I (hopefully) stop beating on the production sites with
these diagnostics, and then do Mark's suggestion to import, one by one, the classes in my Java package to see if there
is a particular one that is causing the delay. I've tried that to some extent in the past and felt that the delay just
increased linearly as I added more classes, but I can do it more methodically.
Caldarale, Charles R
2009-02-25 20:09:06 UTC
Permalink
> From: Sam Hokin [mailto:***@ims.net]
> Subject: Re: very slow class loading on initial JSP/servlet
> request afterrestart
>
> http://www.ims.net/media/strace-test6-stall.txt

So what's going on during the frequent 11-second gaps? (Was Rosemary Woods here?)

Might want to redo the strace with the -T option to show the time spent inside each system call.

> During the Tomcat stall phase there are thousands
> and thousands of stat64, access, lstat64 calls to
> the same class files (and directories above those
> files) across pages on all four sites.

How many is "thousands and thousands"? 3,000 or 100,000? It's not unusual to see lots of calls to the stat derivates; they should return extremely quickly.

- Chuck


THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY MATERIAL and is thus for use only by the intended recipient. If you received this in error, please contact the sender and delete the e-mail and its attachments from all computers.
André Warnier
2009-02-25 20:57:05 UTC
Permalink
Caldarale, Charles R wrote:
>> From: Sam Hokin [mailto:***@ims.net]
>> Subject: Re: very slow class loading on initial JSP/servlet
>> request afterrestart
>>
>> http://www.ims.net/media/strace-test6-stall.txt
>
> So what's going on during the frequent 11-second gaps? (Was Rosemary Woods here?)

They are not only frequent, they are also totally regular. Looks like a
regular gap of 10 s + some fraction, that looks like 10 or 11 just
because of the logfile stamp resolution.


>
> Might want to redo the strace with the -T option to show the time spent inside each system call.
>
>> During the Tomcat stall phase there are thousands
>> and thousands of stat64, access, lstat64 calls to
>> the same class files (and directories above those
>> files) across pages on all four sites.

Actually it is totally regular : stat, access, lstat, stat, then 10.xxx
delay, then again stat, access, lstat, stat, 10 seconds +.
What in this directory path is (are) a symlink(s) ?
/home/sites/ims/jcms/webapps/ROOT/WEB-INF/classes/org/apache/jsp/test6_jsp.class
and to what does it (they) point ?
And how is this different from the other 2 systems that work fine ?
Sam Hokin
2009-02-25 21:55:51 UTC
Permalink
André Warnier wrote:
> What in this directory path is (are) a symlink(s) ?
> /home/sites/ims/jcms/webapps/ROOT/WEB-INF/classes/org/apache/jsp/test6_jsp.class

There are no symlinks in that path.

> And how is this different from the other 2 systems that work fine ?

It differs in no known way, other than being different hardware.
André Warnier
2009-02-26 11:04:17 UTC
Permalink
Sam Hokin wrote:
> André Warnier wrote:
>> What in this directory path is (are) a symlink(s) ?
>> /home/sites/ims/jcms/webapps/ROOT/WEB-INF/classes/org/apache/jsp/test6_jsp.class
>
>
> There are no symlinks in that path.

Then the question is : why is that thing doing lstat's ?
(I mean in the trace list that you posted)

>
>> And how is this different from the other 2 systems that work fine ?
>
> It differs in no known way, other than being different hardware.
One more mystery then.
If on the other systems, you do the same trace, does it show the same
sequence of lstat,access,stat ... ?
János Löbb
2009-02-24 21:21:18 UTC
Permalink
On Feb 24, 2009, at 3:57 PM, Juha Laiho wrote:

> So, looks like I'm late to the party, but will crash in nonetheless.
>
>
> Sam Hokin wrote:
>> Christopher Schultz wrote:
>>> On 2/19/2009 6:23 PM, Sam Hokin wrote:
>>>> The problem, which spontaneously appeared a few days before
>>>> Christmas on
>>>> this one server, is that the initial request of a JSP or servlet
>>>> takes
>>>> AGES to respond, usually exceeding several minutes.
>>
>> Well, I have used a number of diagnostics, like jconsole, jstat and
>> jhat, but haven't been able to glean anything informative from them
>> about my issue. I've run the -Xloggc JVM option, and watched it
>> garbage
>> collect, again not informing me of anything. Yes, if I could
>> figure out
>> what the JVM is actually DOING doing the long wait, I'd hopefully be
>> able to deal with it, but I honestly haven't been able to figure
>> out how
>> to do that. What do you suggest?
>>
>>> I'll cast my hat into the ring: I suspect you're loading a
>>> resource that
>>> wants to looking an external resource (like an XML file with a
>>> remote
>>> DTD/Schema and validation is on in the parser) and the DNS is
>>> tripping
>>> you up.
>>
>> No, there are no external references in my Java package. It's all
>> just
>> local class files.
>
> Still it sounds like a network (DNS) issue, but it does not pay to
> just
> argue, more information is needed.
>
> One tool that I haven't yet seen suggested is 'strace', the Linux
> system
> call tracer. This will show all the calls your application makes to
> the
> operating system. As you say the application is mostly idle during the
> delay, it is, in one way or another, waiting for some OS service to
> complete. 'strace' should provide you with timestamped information on
> what OS services were called, with which arguments, and how long did
> it take for them to return with results. 'strace' will leave you with
> a huge file (or a set of huge files, depending on the options you
> use),
> and going through them will take some time - but you'll most likely
> also find what causes the delay.
>
> If you decide to go this way, trim down your application to the
> minimum
> where you can still see the delay; with this you hopefully can get
> at least some reduction to the amount of log data written. Even
> though it
> is possible to temporarily attach strace to a running process (like
> the Tomcat at the moment when it is hung), I'd recommend starting the
> tomcat under truss (with -f or -ff to make it also trace new processes
> created), because this way it is easier to trace back the file
> descriptor
> numbers accessed.
> --
> ..Juha
>

It could be other network issue, like having IP6 enabled on the
machine connected to an IP4 network, or an inefficient routing of
network traffic for that particular machine. Sounds like it is timing
out on some network task.

János
Loading...