This Bugzilla instance is a read-only archive of historic NetBeans bug reports. To report a bug in NetBeans please follow the project's instructions for reporting issues.
After longer intensive usage (2-3 full days), the IDE starts throwing OOMEs for no apparent reason (there is enough heap space). I've instrumented the IDE under heavy load using jvmstat's visualgc and several -XX flags, which revealed high class loading fluctuation: the number of loaded classes slowly grows until it fills the whole permanent generation. Then an OOME is thrown during classloading, as the system doesn't have perm-generation space for the class definition. Some numbers:
2nd day morning: Perm gen=46MB, 12179 classes loaded, 852 unloaded
noon: Perm gen=48MB, 14540 classes loaded, 2628 unloaded
evening: Perm gen=54MB, 16418 classes loaded, 2789 unloaded
I'll keep using the IDE for a third day to confirm the OOMEs in this run as well. I suspect it has something to do with the Ant-based build system loading/unloading classes on taskdef/build finish, as a lot of Ant-related classes get loaded during a build if I haven't used the build system for a while before that build.
OK, well let me know what you find of course.
Later, I'll probably try to run on 1.5 with java.lang.instrument.Instrumentation on and check whether the accumulating classes are visible to Instrumentation and which of them (if any) are duplicates. But it explains the strange OOMEs as not being the usual heap problem. BTW: You can get visualgc yourself from http://developers.sun.com/dev/coolstuff/jvmstat/ It allows direct attachment (no previous setup needed) to a running JVM, so you can analyze problems post-mortem as well.
OK, third day evening: OOME, heap was only 41MB.
Perm gen=64MB, 21539 classes loaded, 5281 unloaded
A full gc by itself didn't help, but waiting a minute or so and doing a full gc again unloaded 200-some classes and eased the perm heap by about 800KB:
Perm gen=63250K, 21539 classes loaded, 5486 unloaded
The numbers are quite consistent on average: 11327 classes take 46MB of PermGen, and about 16000 classes reach the 64MB PermGen limit. I just regret I didn't tee stderr to a file, to have a list of loaded classes and check the consistency of loading/unloading...
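The consistency claim above can be checked with a bit of arithmetic. This is only a sketch: the class name PermGenPerClass and its method are made up for illustration, the inputs are the figures quoted in this report, and "live classes" is assumed to mean loaded minus unloaded. All four samples come out at roughly 4.0 to 4.2 KB of perm gen per live class.

```java
class PermGenPerClass {
    /** Average perm gen KB per live class, where live = loaded - unloaded. */
    static double kbPerLiveClass(int permGenMb, int loaded, int unloaded) {
        return permGenMb * 1024.0 / (loaded - unloaded);
    }

    public static void main(String[] args) {
        // {perm gen MB, classes loaded, classes unloaded}, as quoted in the comments above
        int[][] samples = {
            {46, 12179, 852},   // 2nd day morning
            {48, 14540, 2628},  // noon
            {54, 16418, 2789},  // evening
            {64, 21539, 5281},  // third day evening, at the OOME
        };
        for (int[] s : samples) {
            System.out.printf("%d live classes, %.2f KB each%n",
                    s[1] - s[2], kbPerLiveClass(s[0], s[1], s[2]));
        }
    }
}
```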
high class loading fluctuation - I observed it too in the profiler. Almost 60% of byte[] allocations come from classloaders (reading jars), even for a warmed-up IDE.
Note that the last fix to issue #42431, to notify buildFinished properly, may have been necessary to clean up AntClassLoader's created during the build. So it is possible this was just one consequence of that bug. TBD.
*** Issue 43839 has been marked as a duplicate of this issue. ***
Please investigate.
I've been tracking it for a while, and it seems fixing issue 43839 really helped. During a whole day of testing, the perm area was nearly constant while the class loading/unloading code was heavily exercised. The only growth came from loading a few classes generated for reflection during the initial cycles of my testing pattern, but that is OK:
*) these classes are generated once you hit a threshold on reflection usage of a given thing, so it took time before they got created by the JDK
*) the slight growing trend stopped after ~10 cycles; from then on, the (loaded-unloaded) number was constant.
Now some observations: Each of my testing cycles forced loading and unloading of 611 classes. That is IMHO way too much, and it would happen to the user each time he hit build after coding for a while w/o building. Among these classes, some very interesting things can be found, like several .NET related classes, Perforce support, ... These 611 classes temporarily consume about 2.5M of perm area. I think this is caused by the way the new Ant registers preinstalled extensions (all the jars in ide4/ant/lib). I also think that this way is broken (memory- and performance-wise) and should be changed.
I know that Ant preregisters all standard tasks, which is rather idiotic. Not sure how to fix that, though. If I come up with any ideas I will file a patch for Ant. Re. loading and unloading the Ant installation: the Ant class loader and its associated objects are held via one static soft reference. We could make it a timed soft reference, so it would stay a strong ref for a while, but then what if you did some build and then started refactoring? You would just run out of memory sooner. BTW #43839 is probably not the issue you were talking about. Guessing you meant #42431.
I'm not saying the classes should be held in memory more strongly. It is OK that they can go, as they can be user-installed classes (with memory leaks ;-) or whatever. Also, they are probably not freed that often in usual working conditions; it's only that I was deliberately exercising them heavily. I only see a problem in the amount of them, or more precisely in the unused portion. I guess at most half of them are really needed during script execution.
> BTW #43839 is probably not the issue you were talking about.
> Guessing you meant #42431.
Right. I haven't seen the OOME for a long time; the only pity is that I don't have the class logs from pre-42431 to check them for the growing pattern.
OK, the leak is easy to reproduce using Vincent's demo application. (Note you have to change all paths from '\\' separated to '/' separated on unix) The problem is caused by hanging user classes, it seems, as I have logged many cases of loading e.g. net.sourceforge.pmd.cpd.SourceCode and never unloading any of them (that is, several copies of the class in memory). Maybe there is a problem with ClassLoader references hanging somewhere.
Presumably has nothing to do with the project system as such; just something related to buggy tasks (perhaps?) or some missing cleanup steps in the Ant module, etc.
If you look at http://cvs.apache.org/dist/ant/v1.6.2beta1/RELEASE-NOTES-apache-ant-1.6.2beta1.html you can see that the first bugfix mentioned concerns a memory leak in AntClassLoader. Do you think that is our problem here?
The fix Vincent mentions is http://issues.apache.org/bugzilla/show_bug.cgi?id=8689
PatchSet 9012
Date: 2004/06/28 08:47:05
Author: bodewig
Branch: ANT_16_BRANCH
Tag: (none)
Log: Merge fix for 8689 from HEAD
Members:
TODO:1.3.2.23->1.3.2.24
docs/manual/develop.html:1.13.2.5->1.13.2.6
src/main/org/apache/tools/ant/AntClassLoader.java:1.76.2.6->1.76.2.7
src/main/org/apache/tools/ant/Project.java:1.154.2.9->1.154.2.10
src/main/org/apache/tools/ant/SubBuildListener.java:1.1->1.1.2.1
src/main/org/apache/tools/ant/taskdefs/Ant.java:1.92.2.7->1.92.2.8
src/main/org/apache/tools/ant/taskdefs/Recorder.java:1.16.2.4->1.16.2.5
src/main/org/apache/tools/ant/taskdefs/RecorderEntry.java:1.11.2.5->1.11.2.6
Of course, the quickest check would be whether Vincent's test case is still reproducible when using Ant 1.6.2 beta 1. Possibly related is issue #42431, but I don't think so. That bug, before it was fixed, caused the VM to hold onto JAR file locks unnecessarily; it did not (AFAIK) prevent classes from being GC'd.
Hi everyone. At first glance, it seems that the problem disappeared after switching to Ant 1.6.2beta1. I will keep you informed if the problem continues to occur.
> How can I see the size of the PermGen in a JRE ?
Use the jvmstat tools, e.g. visualgc: http://developers.sun.com/dev/coolstuff/jvmstat/
> How can I give more size for the PermGen ?
-J-XX:MaxPermSize=128m, but I wouldn't do this unless really necessary.
Now, I've verified (using -verbose:class and visualgc) that the new Ant (1.6.2beta1) fixes this problem, but I'd say only partially. It's up to Jesse to incorporate the newer version of Ant (if possible) or provide a workaround in his ant-gutting code. Now the caveats: During each run, new copies of the task classes (*.pmd.*) get loaded. They are unloaded neither immediately nor directly after several gcs. So subsequent script invocations load more and more class copies, and this can even lead to our OOME. Only after some IDE inactivity did the first gc free the AntClassLoader instances* (all of them, accumulated over time), while the second finally unloaded all the classes.
*) I've patched ClassLoader on the bootclasspath to report CL creation and finalization. I'll try to trace the remaining AntClassLoader references, if there are any...
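For anyone who prefers an in-process check to visualgc, here is a rough sketch using the java.lang.management API. Note the assumptions: this API only exists on JDK 5 and later (so not on the 1.4 VMs discussed here), the class name ClassLoadStats is made up for illustration, and the non-heap pool names it prints vary by JVM version and vendor.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

class ClassLoadStats {
    /** Returns {currently loaded, total loaded since JVM start, unloaded}. */
    static long[] snapshot() {
        ClassLoadingMXBean bean = ManagementFactory.getClassLoadingMXBean();
        return new long[] {
            bean.getLoadedClassCount(),
            bean.getTotalLoadedClassCount(),
            bean.getUnloadedClassCount()
        };
    }

    public static void main(String[] args) {
        long[] s = snapshot();
        System.out.println("loaded now=" + s[0] + ", total loaded=" + s[1]
                + ", unloaded=" + s[2]);
        // Class metadata lives in a non-heap pool; its name depends on the JVM.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            if (pool.getType() == MemoryType.NON_HEAP && usage != null) {
                System.out.println(pool.getName() + ": "
                        + usage.getUsed() / 1024 + " KB used");
            }
        }
    }
}
```

Calling snapshot() before and after a build gives the same loaded/unloaded deltas as the numbers reported in this issue.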
You mean everything below "now the caveats" is true under 1.6.2 b1? What I'm really interested in is what, concretely, is holding onto class references in 1.6.1. AntClassLoader's, perhaps, but who is holding onto the AntClassLoader's? "first gc led to freeing AntClassLoader instances* (all of them accumulated over the time), while the second finally unload all the classes." - isn't it a JVM bug of sorts if GC'ing a ClassLoader does not release its classes in the same pass? Re. Ant 1.6.2: I'll put it in when it is released, not before. Still need to study the Ant bug and presumably some profiler info to see what is actually holding onto class references and why.
> You mean everything below "now the caveats" is true under 1.6.2 b1?
Right.
> What I'm really interested in is what, concretely, is holding onto
> class references in 1.6.1
There are probably no remaining class references, but the AntClassLoader2 itself is registered as a listener somewhere. AFAICT the fix in 1.6.2b1 is to properly unregister it.
> isn't it a JVM bug of sorts if GC'ing a ClassLoader does not release
> its classes in the same pass?
Probably not. GCing a classloader and unloading classes are very different things. Think of it as if the first gc() removes the ClassLoader and all Class objects, while the second gc() is a sort of finalize() which causes unloading of the real class metadata (Class instances are not that metadata!). Anyway, I'm not a real expert on this and I have to test its behaviour the black-box way. I thought it might be related to reflection usage, as it generates more classes and even more classloaders, which reference the AntClassLoader2, and there is even a SoftReference involved (which may cause the delay in freeing), but I can't prove that theory for now.
I'm afraid I was wrong: I've returned to Ant 1.6.1 and observed the same behaviour: no classes are freed directly, but after a minute or so of inactivity, everything can be unloaded. This is not the usual behaviour of the platform: I wrote a simple test that creates a CL, loads a simple class from it, instantiates it and even reflectively invokes a method on it 50 times. For this test, the CL is finalized and both the class and the generated reflection class are unloaded during the first gc (with no delay). So we probably keep some instance of something from those CLs referenced for some time even after finishing the task. I recall we had similar behaviour even for the core Ant classes (not unloaded directly, unloaded only after some time), but they were loaded only once (probably because they were core, no taskdefs), so they caused no real problem. You wrote about a SoftReference holding the loader for them - that may be the problem: if the core classes and taskdefs somehow cross-reference, the taskdef classes couldn't be unloaded w/o unloading the core classes. I'll try to verify this.
"I'll try to verify this" And that's it. If I replace the SoftReference with a WeakReference, everything can be unloaded immediately after finishing the task. Not that I like the "fix". This way we'd basically disable the caching, and loading all the Ant core classes (~600) during each Ant invocation would be too expensive. Optimally, we should be able to keep the core classes in the JVM (they are loaded only once - no big deal) and still be able to unload taskdefs (which are loaded multiple times), but that would mean finding all the references crossing the boundary between the bridge loader and the Ant loader. The other option would be to eliminate preloading of some of those 600 classes, as most of them won't be needed anyway. The fallback option is to disable caching, but that is the last resort.
One more note: I have managed to hack Ant to not load all those 600 core classes. After modifying ~2 classes and not changing the real behaviour, it now loads only about 230 classes for a NB-projects build script (no VSS and .NET classes are loaded anymore). It should be possible to convince the Ant team to do more lazy loading in Ant, especially because they're apparently trying to and have failed in a few places. Note: I also had to disable the bridge's cleanup code, as it also called those two worst methods:
Project.getTaskDefinitions()
Project.getDataTypeDefinitions()
Re. Ant not lazy loading core taskdefs: yes, I know. If you have a safe & effective patch it should be filed on ant.apache.org. May be complications though; there is no real spec for how it should behave. Anyway I doubt it would help us as much as it would help command-line Ant, since we would normally be reusing the loaded classes. Re. keeping a SoftReference to the bridge loader: this can't be the real problem. The Ant core itself only has a few hundred classes and no more should be loaded after running Ant repeatedly (unless you turn on lazy loading, in which case at most as many as are currently loaded would ever be loaded). From your description it sounds like the problem is that taskdef'd classes or other build-specific objects are held from the bridge loader as well. That is certainly wrong; perhaps some abuse of static fields etc. "...that would mean to find all the references crossing the boundary between bridge loader and ant loader" - yes, exactly. Don't we have tools to do this?? BTW it may be a reportable HotSpot bug if SoftReference's are in fact not cleared before throwing a PermGen OOME. The Javadoc does after all state that "All soft references to softly-reachable objects are guaranteed to have been cleared before the virtual machine throws an OutOfMemoryError." It does not mention any exception for java.lang.Class objects.
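The difference between the two reference kinds discussed here can be tried out with a small sketch. It is illustrative only: ReferenceStrengthDemo is a made-up name, System.gc() is just a hint, and the outcome is not guaranteed. On typical HotSpot VMs the weak referent goes away at the next collection, while the soft one is kept until memory runs low.

```java
import java.lang.ref.Reference;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

class ReferenceStrengthDemo {
    /** Requests GC a few times and reports whether the referent was cleared. */
    static boolean clearedAfterGc(Reference<?> ref) {
        for (int i = 0; i < 10 && ref.get() != null; i++) {
            System.gc(); // only a hint; the JVM may ignore it
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return ref.get() == null;
    }

    public static void main(String[] args) {
        // Neither referent is strongly reachable once the reference is built.
        Reference<Object> weak = new WeakReference<>(new Object());
        Reference<Object> soft = new SoftReference<>(new Object());
        System.out.println("weak cleared: " + clearedAfterGc(weak));
        System.out.println("soft cleared: " + clearedAfterGc(soft));
    }
}
```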
Re. Ant patch: I don't know much about Ant internals, and doing it cleanly probably involves changing some APIs there. It would help us if we had to remove the classloader cache. "Don't we have tools to do this??": Can't find the right hammer. I've found no instances, and it is nontrivial to check incoming refs for all the java.lang.Class instances in question... Note: I found no reference to the AntClassLoader2 instances either. Re. HotSpot bug: Even if it worked "correctly", it wouldn't help us much. Imagine you've just finished one Ant run with 63.5MB of PermGen full and you start another one - during the run, the classloader is strongly referenced, so you'll get an OOME from PermGen during the taskdefs anyway. A possible workaround is to clear the soft reference after every 10 runs or so, like e.g. Apache kills worker processes after serving a given number of requests...
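The "clear it after every N runs" idea could look roughly like the sketch below. This is not the Ant module's code: RecyclingCache, its factory parameter and the maxUses limit are all hypothetical names, and it uses modern Java (generics, Supplier) purely for brevity.

```java
import java.lang.ref.SoftReference;
import java.util.function.Supplier;

/** Softly caches one value and rebuilds it after a fixed number of uses. */
class RecyclingCache<T> {
    private final Supplier<T> factory; // hypothetical: builds the expensive object
    private final int maxUses;         // hypothetical: e.g. 10 builds per loader
    private SoftReference<T> ref = new SoftReference<>(null);
    private int uses;

    RecyclingCache(Supplier<T> factory, int maxUses) {
        this.factory = factory;
        this.maxUses = maxUses;
    }

    synchronized T get() {
        T value = ref.get();
        // Rebuild if the GC cleared the value or the use limit was reached.
        if (value == null || uses >= maxUses) {
            value = factory.get();
            ref = new SoftReference<>(value);
            uses = 0;
        }
        uses++;
        return value;
    }
}
```

With maxUses set to 10, the 11th call to get() drops the old value (and, in the scenario above, everything reachable only from the old class loader) and creates a fresh one.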
Re. HotSpot bug: Confirmed, it was a bug at least in J2SDK 1.4.2_01, fixed both in the latest 1.4.2 and in 1.5 (4896986).
> "Don't we have tools to do this??"
JProfiler claimed I have only 9 instances of java.lang.Class in memory, which somewhat disqualifies it from tracing class leaks... It was very hard to reproduce the problem under OptimizeIt (it is so slow, and probably consumes client memory, that when the task finished, the first gc() immediately unloaded all the classes), but I have found at least one class reference across domains, from the table at:
static org.apache.tools.ant.IntrospectionHelper.helpers
I'm trying to clear it to verify that it helps.
So we finally got it. All you need to do is to clear the content of the Hashtable referenced from org.apache.tools.ant.IntrospectionHelper.helpers after the build finishes. I'm leaving this to you, as I'm not that experienced with your gutting code and the Ant internals (issues like what would happen if I cleared it while another build is still running).
Hmm... this is very helpful but still raises some questions. Ant's IntrospectionHelper already clears its own cache if you use the factory method IH.getHelper(Project,Class) and that project ever fires buildFinished (which it should, if NB's BridgeImpl is working correctly). NB's IntrospectionHelperImpl uses the other variant - IH.gH(Class) - which does not do such cleanup; but since the cleanup is static, it is only necessary for *someone* to call IH.gH(P,C) sometime during the build (on the main project). I find it very odd that *no one* would be calling this important method. I guess it is possible; usage of the 2-arg variant is spotty.
Have a patch prepared, Petr maybe you can check that it works...
Also I will file a patch for Ant to make IntrospectionHelper.helpers be a WeakHashMap, which it certainly should be as far as I can tell.
Workaround: committed Up-To-Date 1.19 ant/src-bridge/org/apache/tools/ant/module/bridge/impl/BridgeImpl.java
Ant patch which would make this hack unnecessary: http://issues.apache.org/bugzilla/show_bug.cgi?id=30162
I'm afraid your Ant patch won't help. You have weakened only one path from the static reference (the key), but the path through the value (IH.bean) remains strong. Wrapping the values in a WeakReference as well should be enough.
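The key/value point can be shown in isolation with a sketch. It does not use Ant at all: WeakValueDemo and its Helper class are made-up stand-ins, with Helper.bean playing the role of the strong back-reference to the key. A WeakHashMap holds its values strongly, so a value that references its own key keeps the entry alive forever.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

class WeakValueDemo {
    /** Stand-in for a helper that strongly references its own map key. */
    static final class Helper {
        final Object bean;
        Helper(Object bean) { this.bean = bean; }
    }

    /** Weak key, strong value: the value keeps the key reachable. */
    static Map<Object, Helper> strongValueMap() {
        Map<Object, Helper> map = new WeakHashMap<>();
        Object key = new Object();
        map.put(key, new Helper(key));
        return map;
    }

    /** Weak key, weak value: nothing keeps the key reachable. */
    static Map<Object, WeakReference<Helper>> weakValueMap() {
        Map<Object, WeakReference<Helper>> map = new WeakHashMap<>();
        Object key = new Object();
        map.put(key, new WeakReference<>(new Helper(key)));
        return map;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<Object, Helper> strong = strongValueMap();
        Map<Object, WeakReference<Helper>> weak = weakValueMap();
        System.gc(); // only a hint, so the second size is not guaranteed
        Thread.sleep(100);
        System.out.println("strong-value map size: " + strong.size());
        System.out.println("weak-value map size: " + weak.size());
    }
}
```

One trade-off of the weak-value variant: the helper itself can be collected while its key is still alive, so callers must be prepared to recreate it.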
Workaround works OK.