Bug 43113 - Perm-generation memory leak when running Ant
Summary: Perm-generation memory leak when running Ant
Status: CLOSED FIXED
Alias: None
Product: projects
Classification: Unclassified
Component: Ant
Version: 4.x
Hardware: PC Linux
Importance: P2 blocker
Assignee: Jesse Glick
URL:
Keywords: PERFORMANCE
Duplicates: 43839
Depends on:
Blocks: 41535 43839
Reported: 2004-05-11 18:08 UTC by Petr Nejedly
Modified: 2006-03-24 09:51 UTC

See Also:
Issue Type: DEFECT
Exception Reporter:


Description Petr Nejedly 2004-05-11 18:08:23 UTC
After longer intensive usage (2-3 full days), the
IDE starts showing OOMEs for no apparent reason
(there is enough heap space).
I've tried to instrument the IDE under heavy load
using jvmstat's visualgc and several -XX flags,
which revealed that there is high class loading
fluctuation and that the number of loaded classes
slowly grows until it fills the whole permanent
generation. Then an OOME is thrown during
classloading, as the system doesn't have
perm-generation space for the class definition.

Some numbers:
2nd day morning:
Perm gen=46MB, 12179 classes loaded, 852 unloaded
noon:
Perm gen=48MB, 14540 classes loaded, 2628 unloaded
evening:
Perm gen=54MB, 16418 classes loaded, 2789 unloaded

I'll try to keep using the IDE for a third day to
confirm the OOMEs in this run as well.

I suspect it has something to do with the Ant-based
build system loading/unloading classes on
taskdefs/build finishes, as a lot of Ant-related
classes get loaded during a build if I haven't
exercised the build system long enough beforehand.
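
The loaded/unloaded counters quoted above came from visualgc. On Java 5 or later, similar counters can also be read in-process through java.lang.management. The following is only a minimal sketch of that approach; the class name ClassLoadStats is made up for illustration and is not part of NetBeans or of this report.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper that reports the JVM's class loading counters. */
final class ClassLoadStats {

    /** Formats the counters in the same shape as the numbers quoted in the report. */
    static String describe() {
        ClassLoadingMXBean bean = ManagementFactory.getClassLoadingMXBean();
        long total = bean.getTotalLoadedClassCount(); // loaded since JVM start
        long unloaded = bean.getUnloadedClassCount(); // unloaded since JVM start
        int live = bean.getLoadedClassCount();        // currently loaded
        return total + " classes loaded, " + unloaded + " unloaded, " + live + " live";
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Logging this string periodically would show whether the live count keeps growing between builds, which is the pattern described in this report.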
Comment 1 Jesse Glick 2004-05-11 20:17:10 UTC
OK, well let me know what you find of course.
Comment 2 Petr Nejedly 2004-05-11 20:34:49 UTC
Later, I'll probably try to run on 1.5 with
java.lang.instrument.Instrumentation on and check whether the
accumulating classes are visible to Instrumentation and what the
duplicates are, if any.
But it explains the strange OOMEs as not being the usual heap problem.

BTW: You can get visualgc yourself from
http://developers.sun.com/dev/coolstuff/jvmstat/

It allows direct attachment (no previous setup needed) to a running
JVM, so you can analyze the problems post-mortem as well.

Comment 3 Petr Nejedly 2004-05-12 19:22:23 UTC
OK, third day evening:
OOME, heap was only 41MB.

Perm gen=64MB, 21539 classes loaded, 5281 unloaded
A full gc by itself didn't help, but waiting a minute or so and doing
a full gc again unloaded 200-some classes and eased the perm heap by
about 800KB to
63250K, 21539 classes loaded, 5486 unloaded

If I look at the numbers, they are quite consistent on average,
11327 classes taking 46MB of PermGen,  16000 classes reaching the 64MB
limit of the PermGen.
I just regret I haven't tee-ed stderr to a file, to have a list of
loaded classes and check the consistency of loading/unloading...


Comment 4 _ pkuzel 2004-05-13 13:48:09 UTC
high class loading fluctuation - I observed it too in the profiler. Almost
60% of byte[] allocations come from classloaders (reading jars),
even for a warmed-up IDE.
Comment 5 Jesse Glick 2004-05-16 17:06:19 UTC
Note that the last fix to issue #42431, to notify buildFinished
properly, may have been necessary to clean up AntClassLoader's created
during the build. So it is possible this was just one consequence of
that bug. TBD.
Comment 6 Petr Nejedly 2004-05-26 11:34:56 UTC
*** Issue 43839 has been marked as a duplicate of this issue. ***
Comment 7 Jesse Glick 2004-06-08 17:13:41 UTC
Please investigate.
Comment 8 Petr Nejedly 2004-06-10 10:03:05 UTC
I've been tracking it for a while, and it seems fixing issue 43839 really
helped. During a whole day of testing, the perm area was nearly constant
while the class loading/unloading code was heavily exercised.
The only growth came from loading a few classes generated for reflection
during the initial cycles of my testing pattern, but that is OK:
*) these classes are generated once you hit a threshold on reflection
 usage of a given thing, so it took time before these things got
 created by the JDK
*) the slight growing trend stopped after ~10 cycles, from that time
 on, the (loaded-unloaded) number was constant.

Now some observations:
Each of my testing cycles forced loading and unloading of 611 classes.
That is IMHO way too much, and it would happen to the user each time he
hits build after coding for a while w/o building.
Among these classes, some very interesting things can be found,
like several .NET related classes, perforce support, ...
These 611 classes consume temporarily about 2.5M of perm area.

I think this is caused by the way new ant registers preinstalled
extensions (all the jars in ide4/ant/lib). I also think that this way
is broken (memory and performance wise) and should be changed.
Comment 9 Jesse Glick 2004-06-10 16:53:56 UTC
I know that Ant preregisters all standard tasks, which is rather
idiotic. Not sure how to fix that, though. If I come up with any ideas
I will file a patch for Ant.

Re. loading and unloading the Ant installation: the Ant class loader
and its associated objects are held via one static soft reference. We
could make it a timed soft reference, so it would stay a strong ref
for a while, but then what if you did some build and then started
refactoring? You would just run out of memory sooner.

BTW #43839 is probably not the issue you were talking about. Guessing
you meant #42431.
Comment 10 Petr Nejedly 2004-06-10 17:07:00 UTC
I'm not saying the classes should be held in memory stronger. It is OK
they can go, as they can be user-installed classes (with memory leaks
;-) or whatever.
Also, they are probably not freed that often in usual working
conditions; it is only that I was aiming to exercise them heavily.

I only see a problem in the number of them, or more precisely in the
unused portion. I guess at most half of them are really needed during
script execution.

> BTW #43839 is probably not the issue you were talking about.
> Guessing you meant #42431.

Right. I haven't seen the OOME for a long time, the only pity is that
I don't have the class logs from pre-42431 to check them for the
growing pattern.
Comment 11 Petr Nejedly 2004-07-07 14:17:34 UTC
OK, the leak is easy to reproduce using Vincent's demo application.
(Note you have to change all paths from '\\' separated to '/'
separated on unix)

The problem is caused by hanging user classes, it seems, as I have
logged many cases of loading e.g. net.sourceforge.pmd.cpd.SourceCode
and never unloading any of them (that is, several copies of the class
in memory). Maybe there is a problem with ClassLoader references
hanging somewhere.
Comment 12 Jesse Glick 2004-07-07 16:36:41 UTC
Presumably has nothing to do with the project system as such; just
something related to buggy tasks (perhaps?) or some missing cleanup
steps in the Ant module, etc.
Comment 13 vbrabant 2004-07-07 18:37:51 UTC
If you look at
http://cvs.apache.org/dist/ant/v1.6.2beta1/RELEASE-NOTES-apache-ant-1.6.2beta1.html
you can see that the first bugfix mentioned concerns a memory leak
in AntClassLoader.

Do you think that is our problem here?
Comment 14 Jesse Glick 2004-07-07 19:22:52 UTC
The fix Vincent mentions is

http://issues.apache.org/bugzilla/show_bug.cgi?id=8689

PatchSet 9012 
Date: 2004/06/28 08:47:05
Author: bodewig
Branch: ANT_16_BRANCH
Tag: (none) 
Log:
Merge fix for 8689 from HEAD

Members: 
	TODO:1.3.2.23->1.3.2.24 
	docs/manual/develop.html:1.13.2.5->1.13.2.6 
	src/main/org/apache/tools/ant/AntClassLoader.java:1.76.2.6->1.76.2.7 
	src/main/org/apache/tools/ant/Project.java:1.154.2.9->1.154.2.10 
	src/main/org/apache/tools/ant/SubBuildListener.java:1.1->1.1.2.1 
	src/main/org/apache/tools/ant/taskdefs/Ant.java:1.92.2.7->1.92.2.8 
	src/main/org/apache/tools/ant/taskdefs/Recorder.java:1.16.2.4->1.16.2.5 
	src/main/org/apache/tools/ant/taskdefs/RecorderEntry.java:1.11.2.5->1.11.2.6



Of course, the quickest check would be whether Vincent's test case is
still reproducible when using Ant 1.6.2 beta 1.

Possibly related is issue #42431, but I don't think so. That bug,
before it was fixed, caused the VM to hold onto JAR file locks
unnecessarily; it did not (AFAIK) prevent classes from being GC'd.
Comment 15 vbrabant 2004-07-07 20:27:32 UTC
Hi everyone.
At first glance, it seems that the problem disappeared after switching
to Ant 1.6.2 beta 1.
I will keep you informed if the problem continues to occur.
Comment 16 Petr Nejedly 2004-07-08 12:48:41 UTC
> How can I see the size of the PermGen in a JRE ?
Use the jvmstat tools, e.g. visualgc:
http://developers.sun.com/dev/coolstuff/jvmstat/

> How can I give more size for the PermGen ?
-J-XX:MaxPermSize=128m, but I won't do this unless really necessary.
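
As an in-process alternative to visualgc, the non-heap memory pools can be listed through java.lang.management (Java 5 or later). This is a rough sketch only: the class name NonHeapPools is hypothetical, and the pool names depend on the JVM (older HotSpot versions expose a "Perm Gen" pool, while Java 8 and later report "Metaspace" instead), so the code does not assume any particular name.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that lists the non-heap memory pools of the running JVM. */
final class NonHeapPools {

    /** Returns one line per non-heap pool with its used and maximum size in KB. */
    static List<String> report() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.NON_HEAP) {
                continue; // heap pools are not relevant for class metadata
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            // getMax() returns -1 when the maximum is undefined
            String max = usage.getMax() < 0 ? "undefined" : (usage.getMax() >> 10) + "K";
            lines.add(pool.getName() + ": used=" + (usage.getUsed() >> 10) + "K, max=" + max);
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : report()) {
            System.out.println(line);
        }
    }
}
```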

Now, I've verified (using -verbose:class and visualgc) that the new
ant (1.6.2beta1) fixes this problem, but I'd say only partially.
It's up to Jesse to incorporate the newer version of ant (if possible)
or provide a workaround in his ant-gutting code.

Now the caveats:
During each run, new instances of task classes (*.pmd.*) get loaded.
They are neither unloaded immediately nor directly after several GCs.

So subsequent script invocations load more and more class instances
and it can even lead to our OOME.
Only after some IDE inactivity, first gc led to freeing AntClassLoader
instances* (all of them accumulated over the time), while the second
finally unload all the classes.

*) I've patched ClassLoader on bootclasspath to report CL creation
and finalization.

I'll try to trace remaining AntClassLoader references, if there are any...
Comment 17 Jesse Glick 2004-07-08 16:35:38 UTC
You mean everything below "now the caveats" is true under 1.6.2 b1?
What I'm really interested in is what, concretely, is holding onto
class references in 1.6.1. AntClassLoader's, perhaps, but who is
holding onto the AntClassLoader's?

"first gc led to freeing AntClassLoader instances* (all of them
accumulated over the time), while the second finally unload all the
classes." - isn't it a JVM bug of sorts if GC'ing a ClassLoader does
not release its classes in the same pass?

Re. Ant 1.6.2: I'll put it in when it is released, not before.

Still need to study the Ant bug and presumably some profiler info to
see what is actually holding onto class references and why.
Comment 18 Petr Nejedly 2004-07-08 23:03:16 UTC
> You mean everything below "now the caveats" is true under 1.6.2 b1?

Right.

> What I'm really interested in is what, concretely, is holding onto
> class references in 1.6.1

There are probably no remaining class references but the
AntClassLoader2 itself is registered as a listener somewhere.
AFAICT the fix in 1.6.2b1 is to properly unregister itself.

Probably not. GCing a classloader and unloading classes are very
different things. Think of it as if the first gc() removes the
ClassLoader and all Class objects, while the second gc() is a sort of
finalize() which causes unloading of the real class metadata (Class
instances are not that metadata!)
Anyway, I'm not a real expert on this and I have to test its
behaviour the black-box way.

I thought it may be related to the reflection usage, as it generates
more classes and even more classloaders, which are referencing the
AntClassLoader2 and there is even SoftReference involved (which may
cause the delay in freeing), but I can't prove that theory for now.

Comment 19 Petr Nejedly 2004-07-09 13:28:40 UTC
I'm afraid I was wrong:
I've returned to ant1.6.1 and observed the same behaviour:
No classes are freed directly but after a minute or so of inactivity,
everything can be unloaded.

This is not the usual behaviour of the platform:
I wrote a simple test that creates a CL, loads a simple class from it,
instantiates it and even reflectively invokes a method on it 50 times.
But in this test, the CL is finalized and both the class and the
generated reflection class are unloaded during the first gc (with no delay).

So we probably keep some instance of something from those CLs
referenced for some time even after finishing the task.

I recall we had similar behaviour even for the core ant classes
(not unloaded directly, unloaded only after some time), but they were
loaded only once (probably because they were core, no taskdefs), so
caused no real problem.
You wrote about SoftReference holding the loader for them - that may
be the problem: If the core classes and taskdefs somehow
cross-reference, the taskdef classes couldn't be unloaded w/o
unloading core classes. I'll try to verify this.
Comment 20 Petr Nejedly 2004-07-09 14:08:47 UTC
"I'll try to verify this"

And that's it. If I replace the SoftReference with a WeakReference,
everything can be unloaded immediately after finishing the task.

Not that I like the "fix". This way we'd basically disable the caching,
and loading all ant core classes (~600) during each ant invocation
would be too expensive.
Optimally, we should be able to keep core classes in JVM (they are
loaded only once - no big deal) and still be able to unload taskdefs
(which are multiply loaded), but that would mean to find all the
references crossing the boundary between bridge loader and ant loader.

The other option would be to eliminate preloading of some of those 600
classes, as most of them won't be needed anyway.

The fallback option is to disable caching, but it is the last resort.
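
The SoftReference versus WeakReference trade-off discussed in this comment can be sketched with a one-slot cache. This is an illustration only, not the bridge code: LoaderCache is a made-up name, and a plain Object stands in for the cached class loader. A soft reference is normally kept until memory gets tight, so it works as a cache; a weak reference may be cleared at the next gc once nothing else holds the object, which effectively disables caching.

```java
import java.lang.ref.Reference;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

/** Hypothetical one-slot cache whose retention policy depends on the reference type. */
final class LoaderCache<T> {
    private final boolean soft;
    private Reference<T> ref;

    LoaderCache(boolean soft) {
        this.soft = soft;
    }

    /** Remembers the value without holding it strongly. */
    void put(T value) {
        ref = soft ? new SoftReference<T>(value) : new WeakReference<T>(value);
    }

    /** Returns the cached value, or null if nothing was stored or the collector cleared it. */
    T get() {
        return ref == null ? null : ref.get();
    }

    public static void main(String[] args) {
        Object loaderStandIn = new Object(); // stands in for a class loader
        LoaderCache<Object> softCache = new LoaderCache<Object>(true);
        LoaderCache<Object> weakCache = new LoaderCache<Object>(false);
        softCache.put(loaderStandIn);
        weakCache.put(loaderStandIn);
        // While a strong reference exists, both caches still return the object.
        System.out.println("soft hit: " + (softCache.get() == loaderStandIn));
        System.out.println("weak hit: " + (weakCache.get() == loaderStandIn));
    }
}
```

What happens after the strong reference is dropped depends on the collector, which is why the sketch makes no claim about it.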
Comment 21 Petr Nejedly 2004-07-09 18:05:20 UTC
One more note:
I have managed to hack ant to not load all those 600 core classes.
After modifying ~2 classes and not changing the real behaviour,
it now loads only about 230 classes for a NB-projects build script
(no VSS and .NET classes loaded anymore).

It should be possible to convince the ant team to do more lazy loading
in ant, especially because they're apparently trying to do so and have
failed in a few places.

Note: I also had to disable the bridge's cleanup code, as it also
called those two worst methods:
Project.getTaskDefinitions()
Project.getDataTypeDefinitions()
Comment 22 Jesse Glick 2004-07-09 19:31:45 UTC
Re. Ant not lazy loading core taskdefs: yes, I know. If you have a
safe & effective patch it should be filed on ant.apache.org. May be
complications though; there is no real spec for how it should behave.
Anyway I doubt it would help us as much as it would help command-line
Ant, since we would normally be reusing the loaded classes.

Re. keeping a SoftReference to the bridge loader: this can't be the
real problem. The Ant core itself only has a few hundred classes and
no more should be loaded after running Ant repeatedly (unless you turn
on lazy loading, in which case at most as many as are currently loaded
would ever be loaded).

From your description it sounds like the problem is that taskdef'd
classes or other build-specific objects are held from the bridge
loader as well. That is certainly wrong; perhaps some abuse of static
fields etc. "...that would mean to find all the references crossing
the boundary between bridge loader and ant loader" - yes, exactly.
Don't we have tools to do this??

BTW it may be a reportable HotSpot bug if SoftReference's are in fact
not cleared before throwing a PermGen OOME. The Javadoc does after all
state that "All soft references to softly-reachable objects are
guaranteed to have been cleared before the virtual machine throws an
OutOfMemoryError." It does not mention any exception for
java.lang.Class objects.
Comment 23 Petr Nejedly 2004-07-14 16:03:05 UTC
Re. ant patch: I don't know much about ant internals and it probably
involves changing some APIs there to be clean. It would help us if we
had to remove the classloader cache.

"Don't we have tools to do this??": Can't find the right hammer. I've
found no instances and it is nontrivial to check incoming refs for
all java.lang.Class instances in question... Note: I found no
reference to the AntClassLoader2 instances either.

Re. HotSpot bug: Even if it worked "correctly", it won't help us much.
Imagine you've just finished one ant run with 63.5MB of PermGen full
and you start another one - During the run, the classloader is
strong-referenced, so you'll get OOME from PermGen during the taskdefs
anyway.


A possible workaround is to clear the soft reference after every 10
runs or so, like e.g. apache kills worker processes after serving
given number of requests...
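
The "clear the soft reference after every N runs" idea could look roughly like the sketch below. Everything here is hypothetical: RecyclingCache and its parameters are invented for illustration and do not reflect how the NetBeans Ant bridge is written.

```java
import java.lang.ref.SoftReference;
import java.util.function.Supplier;

/** Hypothetical soft cache that is deliberately dropped after a fixed number of uses. */
final class RecyclingCache<T> {
    private final int maxRuns;
    private final Supplier<T> factory;
    private SoftReference<T> ref;
    private int runs;

    RecyclingCache(int maxRuns, Supplier<T> factory) {
        this.maxRuns = maxRuns;
        this.factory = factory;
    }

    /** Returns the value for one run, recreating it if it was cleared or has served maxRuns runs. */
    synchronized T acquire() {
        T value = ref == null ? null : ref.get();
        if (value == null || runs >= maxRuns) {
            value = factory.get();            // e.g. build a fresh class loader
            ref = new SoftReference<T>(value);
            runs = 0;
        }
        runs++;
        return value;
    }

    public static void main(String[] args) {
        RecyclingCache<Object> cache = new RecyclingCache<Object>(10, Object::new);
        Object first = cache.acquire();
        for (int i = 0; i < 9; i++) {
            cache.acquire();                  // runs 2..10 reuse the same value
        }
        Object eleventh = cache.acquire();    // run 11 triggers a rebuild
        System.out.println("recycled: " + (first != eleventh));
    }
}
```

Dropping the old value only helps if nothing else still references it, so this would bound the growth rather than remove the underlying leak.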
Comment 24 Petr Nejedly 2004-07-16 15:14:01 UTC
Re. HotSpot bug: Confirmed, it was a bug at least in J2SDK1.4.2_01,
fixed both in latest 1.4.2 and in 1.5 (4896986)

> "Don't we have tools to do this??"
JProfiler claimed I have only 9 instances of java.lang.Class in memory,
which somehow disqualifies it from tracing class leaks...

It was very hard to reproduce the problem under OptimizeIt (so slow
and probably consumes client memory, so when the task finished, first
gc() immediatelly unloaded all the classes), but I have found at least
one class reference across domains, from the table at:
static org.apache.tools.ant.IntrospectionHelper.helpers

I'm trying to clear it and to verify that it helps.
Comment 25 Petr Nejedly 2004-07-16 15:59:52 UTC
So we finally got it.

All you need to do is to clear the content of the Hashtable referenced
from org.apache.tools.ant.IntrospectionHelper.helpers after the build
finishes.

I'm leaving this on you as I'm not that experienced with your gutting
code and the ant internals (what would happen if I clear it while
another build is still running type-of-issues)
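
The kind of cleanup described here, clearing a static table in a class you cannot modify, can be done through reflection. The sketch below is not the committed workaround (the thread does not show it) and does not touch Ant: FakeIntrospectionHelper is a stand-in with a private static Hashtable named helpers, and StaticCacheCleaner is a hypothetical utility.

```java
import java.lang.reflect.Field;
import java.util.Hashtable;
import java.util.Map;

/** Hypothetical utility that empties a static Map field of another class via reflection. */
final class StaticCacheCleaner {

    /** Stand-in for a library class with a private static cache; NOT Ant's real class. */
    static final class FakeIntrospectionHelper {
        private static final Hashtable<Class<?>, Object> helpers = new Hashtable<Class<?>, Object>();

        static Object getHelper(Class<?> c) {
            Object h = helpers.get(c);
            if (h == null) {
                h = new Object();
                helpers.put(c, h);
            }
            return h;
        }

        static int cachedCount() {
            return helpers.size();
        }
    }

    /** Clears the Map stored in the named static field of the given class. */
    static void clearStaticMap(Class<?> owner, String fieldName) {
        try {
            Field field = owner.getDeclaredField(fieldName);
            field.setAccessible(true);        // the field is private
            Object value = field.get(null);   // null target: static field
            if (value instanceof Map) {
                ((Map<?, ?>) value).clear();
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot clear " + fieldName, e);
        }
    }

    public static void main(String[] args) {
        FakeIntrospectionHelper.getHelper(String.class);
        clearStaticMap(FakeIntrospectionHelper.class, "helpers");
        System.out.println("entries after cleanup: " + FakeIntrospectionHelper.cachedCount());
    }
}
```

The concern raised above still applies to any such approach: clearing a shared static table while another build is running could affect that build, so the call would need to happen at a point where no build is in progress.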
Comment 26 Jesse Glick 2004-07-16 20:12:28 UTC
Hmm... this is very helpful but still raises some questions.

Ant's IntrospectionHelper already clears its own cache if you use the
factory method IH.getHelper(Project,Class) and that project ever fires
buildFinished (which it should, if NB's BridgeImpl is working
correctly). NB's IntrospectionHelperImpl uses the other variant -
IH.gH(Class) - which does not do such cleanup; but since the cleanup
is static, it is only necessary for *someone* to call IH.gH(P,C)
sometime during the build (on the main project). I find it very odd
that *no one* would be calling this important method. I guess it is
possible; usage of the 2-arg variant is spotty.
Comment 27 Jesse Glick 2004-07-16 21:14:17 UTC
Have a patch prepared, Petr maybe you can check that it works...
Comment 28 Jesse Glick 2004-07-16 21:15:04 UTC
Also I will file a patch for Ant to make IntrospectionHelper.helpers
be a WeakHashMap, which it certainly should be as far as I can tell.
Comment 29 Jesse Glick 2004-07-16 23:00:52 UTC
Workaround:

committed     Up-To-Date  1.19       
ant/src-bridge/org/apache/tools/ant/module/bridge/impl/BridgeImpl.java
Comment 30 Jesse Glick 2004-07-17 00:54:11 UTC
Ant patch which would make this hack unnecessary:

http://issues.apache.org/bugzilla/show_bug.cgi?id=30162
Comment 31 Petr Nejedly 2004-07-19 09:25:32 UTC
I'm afraid your ant patch won't help.
You have weakened only one path from the static reference (the key),
but the path through value (IH.bean) remained strong.

Wrapping also the values with a WeakReference should be enough.
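
The point about the value path can be shown with a small sketch. In a WeakHashMap only the keys are weak; the values are held strongly, so a value that refers back to its key keeps the entry alive forever. Wrapping the value in a WeakReference removes that strong path. The names below (WeakHelperCache, Helper) are hypothetical and only mirror the shape of the problem; this is not Ant's code.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

/** Hypothetical cache keyed by class, with both key and value held weakly. */
final class WeakHelperCache {

    /** Helper that points back at its key class, like the IH.bean field mentioned above. */
    static final class Helper {
        final Class<?> bean;

        Helper(Class<?> bean) {
            this.bean = bean;
        }
    }

    // Weak keys alone are not enough: Helper.bean would strongly reference the key.
    private final Map<Class<?>, WeakReference<Helper>> helpers =
            new WeakHashMap<Class<?>, WeakReference<Helper>>();

    /** Returns the cached helper for the class, creating one if it is missing or was collected. */
    synchronized Helper getHelper(Class<?> bean) {
        WeakReference<Helper> ref = helpers.get(bean);
        Helper helper = ref == null ? null : ref.get();
        if (helper == null) {
            helper = new Helper(bean);
            helpers.put(bean, new WeakReference<Helper>(helper));
        }
        return helper;
    }

    public static void main(String[] args) {
        WeakHelperCache cache = new WeakHelperCache();
        Helper first = cache.getHelper(String.class);
        Helper second = cache.getHelper(String.class);
        System.out.println("same helper while referenced: " + (first == second));
    }
}
```

The cost of weak values is that a helper can be collected as soon as no caller holds it, so the cache only helps while helpers are in active use.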
Comment 32 Petr Nejedly 2004-07-19 11:38:07 UTC
Workaround works OK.