162401 – [code model] Parsing large projects is too slow & consumes too much memory

Status:	RESOLVED FIXED

Alias:	None

Product:	cnd
Classification:	Unclassified
Component:	Code Model
Version:	6.x
Hardware:	All Linux

Importance:	P3 blocker with 2 votes
Assignee:	Alexander Simon

URL:
Keywords:	PERFORMANCE

Duplicates (1):	165731
Depends on:
Blocks:

Reported:	2009-04-10 14:46 UTC by rmartins
Modified:	2009-09-04 18:23 UTC
CC List:	0 users

See Also:
Issue Type:	ENHANCEMENT
Exception Reporter:

Attachments
ace+tao project for ACE+TAO-5.6.8 (135.53 KB, application/x-gzip) 2009-05-14 08:07 UTC, Vladimir Voskresensky

Description rmartins 2009-04-10 14:46:35 UTC

I am trying to index a rather large project, the ACE+TAO middleware
(http://download.dre.vanderbilt.edu/previous_versions/ACE+TAO-5.6.8.tar.bz2).
Here are the steps that I took:
1. mkdir build (in ACE_TAO main directory)
2. cd build
3. ../configure --enable-ace-reactor-notification-queue

After creating the project in NetBeans (from existing source files), it takes a very long time to
finish analyzing/parsing the project, and memory consumption is too high.

If the maximum heap size is set too low (512Mb), after a while only 1 core parses the project.
For ACE_TAO to parse properly (with all cores always parsing), we must set at least 2Gb.

Thanks,
Rolando

Comment 1 rmartins 2009-04-18 11:31:13 UTC

Hi,
I found that setting a max. heap size of 3Gb helps the parsing speed.
ACE_TAO: 

dev200904171401: mem. 3Gb, quad-core => 4 minutes to parse project

Hope it helps,
Rolando

Comment 2 rmartins 2009-04-30 11:37:05 UTC

Hi,
is there any progress on the memory consumption?
Could this issue be upgraded to P2?

Thanks,
Rolando

Comment 3 Vladimir Voskresensky 2009-05-04 13:46:45 UTC

Hi, Rolando, 
yes, there is. Should be 300Mb less for your case.

Comment 4 rmartins 2009-05-05 15:58:42 UTC

Hi,
great job, I checked and it uses about 300Mb;)

I noticed that from build 529 to 546 the parsing speed has dropped (significantly). But I'm not sure of the
cause. Could the "open project" optimizations have anything to do with it?

Now that the memory is under control, do you have any game plan for the parsing speed?

Thanks,
Rolando

Comment 5 Vladimir Voskresensky 2009-05-05 17:14:01 UTC

Hi, great news about memory :-)

Re dropped parsing speed: I've added some "consistency checks" to the model so that it works correctly with projects containing symlinks
(e.g. MySQL) => in debug mode there are a lot of checks. Could you please try the -J-da startup option to emulate a
"release" build and report the parsing speed?

Thanks,
Vladimir.
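
A minimal sketch of why -J-da can matter, assuming the consistency checks are guarded by Java `assert` statements (the thread does not show the actual code). The NetBeans launcher forwards options prefixed with -J to the JVM, and -da disables assertions, so an asserted expression is never evaluated. The class and method names below are hypothetical.

```java
import java.util.List;

final class AssertionGatedChecks {
    // Counts how many consistency checks were actually executed.
    static int checksRun = 0;

    // Stand-in for an expensive consistency check (hypothetical).
    static boolean isConsistent(String path) {
        checksRun++;
        return !path.isEmpty();
    }

    static void register(List<String> paths) {
        for (String p : paths) {
            // Evaluated only when assertions are enabled (-ea); skipped under -da.
            assert isConsistent(p) : "inconsistent entry: " + p;
        }
    }

    // Detects at run time whether assertions are enabled.
    static boolean assertionsEnabled() {
        boolean enabled = false;
        assert enabled = true; // the assignment happens only under -ea
        return enabled;
    }

    public static void main(String[] args) {
        register(List.of("a.cpp", "b.h"));
        System.out.println("assertions enabled: " + assertionsEnabled()
                + ", checks run: " + checksRun);
    }
}
```

Run with `java -ea AssertionGatedChecks` and then with `java -da AssertionGatedChecks` to compare the number of checks executed.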

Comment 6 rmartins 2009-05-05 18:34:55 UTC

Hi Vladimir,
I tried the "-J-da" but I think the speed remained the same:(

Doubts:
a) Is it normal for "Scanning projects" to take so long to go away?
b) When you reopen a project, will CND always parse it again?

Thanks,
Rolando

Comment 7 Vladimir Voskresensky 2009-05-05 21:54:55 UTC

re a): this is a new part of the NB infrastructure that indexes files for a faster Go To File dialog, but my observation
shows that in most cases we parse the project faster than it indexes the files :-(
re b): no, we don't. Once a project is parsed, the next opening should take one minute at most (depending on project size),
usually several seconds

Comment 8 rmartins 2009-05-06 14:38:05 UTC

Hi Vladimir,
Re re a): Does this mean that CND parsing has to wait for the indexing to finish before it can start to parse?

I saw that you committed some cache entries. Is this part of an infrastructure for caching relevant AST elements?
What is the major obstacle to improving the parsing speed? Does CND create the total AST tree, or does it just go file by file
and parse all the related includes?

Thanks,
Rolando

Comment 9 Vladimir Voskresensky 2009-05-06 14:59:17 UTC

Hi, Rolando,

> Does this means that CND parsing has to wait for the indexing to finish, before it can start to parse?
No, it doesn't. But both processes touch files => IO collisions

> What is the major problem for improving the parsing speed?
Now it is memory consumption... with frequent GCs the application slows down significantly, and we also need to put/get the needed data
to/from the disk cache

> Does CND create the total AST tree
During the initial parse, no: only top-level declarations are created. Function bodies are skipped, and the AST is created only
when you open a file in the editor (or use Find Usages)
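
The deferred-body idea described above can be sketched as follows. This is an illustration only, not the CND implementation: the `FunctionDecl` type is hypothetical, the body "AST" is just a list of statement strings, and the raw body text is kept in memory for simplicity.

```java
import java.util.ArrayList;
import java.util.List;

final class LazyBodySketch {
    /** A top-level declaration whose body is parsed only on first access. */
    static final class FunctionDecl {
        final String name;
        private final String bodySource;     // raw text, left unparsed at first
        private List<String> bodyStatements; // stands in for a real AST

        FunctionDecl(String name, String bodySource) {
            this.name = name;
            this.bodySource = bodySource;
        }

        boolean isBodyParsed() {
            return bodyStatements != null;
        }

        // Called when an editor or Find Usages needs the body.
        List<String> body() {
            if (bodyStatements == null) {
                bodyStatements = new ArrayList<>();
                for (String s : bodySource.split(";")) {
                    String t = s.trim();
                    if (!t.isEmpty()) {
                        bodyStatements.add(t);
                    }
                }
            }
            return bodyStatements;
        }
    }

    public static void main(String[] args) {
        FunctionDecl f = new FunctionDecl("f", "int x = 1; return x;");
        System.out.println("parsed before access: " + f.isBodyParsed());
        System.out.println("statements: " + f.body());
        System.out.println("parsed after access: " + f.isBodyParsed());
    }
}
```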

Comment 10 rmartins 2009-05-06 15:13:45 UTC

Hi Vladimir,

>> Does this means that CND parsing has to wait for the indexing to finish, before it can start to parse?
>No we don't. But both process touch files => IO collisions
Does CND need this indexing? If not, could it be turned off (at least while we are parsing)?

>> What is the major problem for improving the parsing speed?
>Now it is memory consumption... with often GCs application slows down significantly + we need to put/get needed data
>from disk cache
So the memory consumption improvement was done by caching information to disk?
The problem is not easy to solve:( Do you have any road-map for this? Ideas?

>> Does CND create the total AST tree
>During initial parse no - only top level declarations are created. Function bodies are skipped and AST is created only
>when you open file in editor (or use Find Usages)
Nice;)

Keep up the excellent work,
Rolando

Comment 11 Vladimir Voskresensky 2009-05-06 15:42:07 UTC

> Does the CND need this indexing? If not, could this be turn off (at least while we are parsing)
no :(

> So the the memory consumption improvement has done by caching information to disk? 
No. It was fixed on the project side by skipping useless work for non-source files

> The problem is not easy to solve:( Do you have any road-map for this? Ideas?
The disk cache is not new. We have had it for 3 releases... and run into problems all the time :-)

Comment 12 Vladimir Voskresensky 2009-05-07 17:22:44 UTC

reducing peak memory during full parse:
http://hg.netbeans.org/cnd-main/rev/b9a4e9d60e7e

Comment 13 Vladimir Voskresensky 2009-05-14 08:05:45 UTC

from Rolando (in issue 160829)
unfortunately the ACE+TAO didn't work as well as ACE. I did the same procedure that I did on ACE.
The ACE+TAO contains the ACE framework and the TAO ORB (basically 1 project with 2 subprojects, with TAO depending on ACE).
I had to give 1Gb of heap for the analyzing to work, otherwise I would get an "Out of memory" exception.
I will post the screenshot.
(It took me >1h to get the project ready. After the parsing procedure finished, another parse started immediately)
(Even with the file filters I got 33K files!)

Comment 14 Vladimir Voskresensky 2009-05-14 08:07:54 UTC

Created attachment 82093 [details]
ace+tao project for ACE+TAO-5.6.8

Comment 15 Vladimir Voskresensky 2009-05-14 08:11:45 UTC

Hi, Rolando,
I have attached my configured ace+tao project.
Place it on the same level as ACE_wrappers and it should work.
You are right, creating the project takes a lot of time and involves two full parses.
I had to set the memory to 3G to complete this operation.
But after that, if I close the IDE and run it again => the project is opened from the repository and occupies about 500Mb

Comment 16 Vladimir Voskresensky 2009-05-14 08:14:03 UTC

one more comment from Rolando:
---------------
Hi Vladimir,
I reread all the posts and I'm not sure if you compiled ACE or ACE+TAO:
Includes info:
../ACE_wrappers/ace/os_include
../ACE_wrappers/ace/os_include/sys
+ inherited:
../ACE_wrappers/build/ace 
../ACE_wrappers/build 
../ACE_wrappers 
../ACE_wrappers/ace 
This seems to be only ACE...

Thanks,
Rolando

Comment 17 Vladimir Voskresensky 2009-05-14 08:16:21 UTC

re: I'm not sure if you compiled ACE or ACE+TAO
I compiled ACE+TAO, and the includes info was given only for the problematic file we discussed, Message_Queue.cpp (which is
from ACE)

Comment 18 rmartins 2009-05-14 15:09:09 UTC

hi Vladimir,
I think I might have figured out the problem with ACE+TAO.
In the normal project creation, the "build" folder is excluded. This is a problem because several files that
are created during the project compilation (IDL files that are compiled and generate .h and .cpp) aren't accessible.
I am downloading the 664 artifact and I will remove the "build" directory from the ignore list, and see what happens.

thanks,
Rolando

Comment 19 rmartins 2009-05-14 16:19:14 UTC

Hi Vladimir,
It didn't work. Despite having the build directory I still have a lot of missing includes...
Can you verify this?
Is there any workaround?
The missing includes I think are all related to the TAO subproject...
Can the dependencies be a problem? (The main makefile "build/MakeFile" calls all the other Makefiles (Tao,...))?

Thanks,
Rolando

Comment 20 Vladimir Voskresensky 2009-05-14 17:13:33 UTC

Hi Rolando,
Have you tried to build ace+tao and place my attached NB project as I suggested?
I have created it on Ubuntu 8.10, so it should work for you

Thanks,
Vladimir.

Comment 21 rmartins 2009-05-14 22:12:32 UTC

Hi Vladimir,
I tried to use your attachment, but I still had a lot of includes that weren't recognized.
Is there any cache around that I must delete?
When I created the project my configuration.xml only had 700Kb, yours has 5Mb!
I really don't understand what is happening:(
When you open the project you get 0 errors (failed includes)?

Thanks,
Rolando

P.S.: I am abroad, so I am using a laptop with only 2Gb of memory. So I only gave Netbeans 1Gb. Can this influence the
outcome of the "project analysis"?

Comment 22 Quality Engineering 2009-05-16 08:59:39 UTC

Integrated into 'main-golden', will be available in build *200905160201* on http://bits.netbeans.org/dev/nightly/ (upload may still be in progress)
Changeset: http://hg.netbeans.org/main-golden/rev/9a6472cb9b28
User: Alexander Simon <alexvsimon@netbeans.org>
Log: fixing: IZ#162401:[code model] Parsing large projects is too slow & consumes too much memory
- introduce primary (L1) cache for mutable and large code model objects

Comment 23 Quality Engineering 2009-05-20 07:33:54 UTC

Integrated into 'main-golden', will be available in build *200905200201* on http://bits.netbeans.org/dev/nightly/ (upload may still be in progress)
Changeset: http://hg.netbeans.org/main-golden/rev/2fff1f3e15e1
User: Alexander Simon <alexvsimon@netbeans.org>
Log: fixing: IZ#162401:[code model] Parsing large projects is too slow & consumes too much memory
- optimize read lock on projectBase in method onInclude
- switch on APT cache

Comment 24 Alexander Simon 2009-05-20 12:56:42 UTC

Results of the last optimizations on ACE+TAO on a 4-core computer:
1. initial parsing time with 1G memory
- astronomical - 17 minutes
- CPU - 43 minutes
2. initial parsing time with 1.7G memory
- astronomical - 15 minutes
- CPU - 42 minutes
Note: Numbers do not include creating and scanning time (I assume the project was created beforehand)

Comment 25 Leonid Lenyashin 2009-05-20 13:43:23 UTC

How long did it take before the optimisation?

Comment 26 Quality Engineering 2009-05-21 08:35:29 UTC

Integrated into 'main-golden', will be available in build *200905210201* on http://bits.netbeans.org/dev/nightly/ (upload may still be in progress)
Changeset: http://hg.netbeans.org/main-golden/rev/e8f0807f0ec5
User: Alexander Simon <alexvsimon@netbeans.org>
Log: fixing: IZ#162401:[code model] Parsing large projects is too slow & consumes too much memory
- improve performance of service getter

Comment 27 Alexander Simon 2009-05-22 08:56:36 UTC

Ok, the last optimization was caching:
- weakly cache large and mutable objects in their UIDs
- cache precompiled headers
- cache state after include
Overall gain:
optimization ON:  astronomical: 15 min, CPU: 43 min
optimization OFF: astronomical: 20 min, CPU: 60 min
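
A minimal sketch of the first item, assuming a UID holds a weak reference to the object it identifies and reloads it on demand (for example from the disk cache) once the garbage collector has cleared it. The `CachingUid` class and its loader are hypothetical, not the actual code model API, and the loader is assumed never to return null.

```java
import java.lang.ref.WeakReference;
import java.util.function.Function;

final class CachingUid<T> {
    private final String key;
    private final Function<String, T> loader; // e.g. a read from a disk cache
    private WeakReference<T> cached;          // may be cleared by the GC at any time
    private int loads;

    CachingUid(String key, Function<String, T> loader) {
        this.key = key;
        this.loader = loader;
    }

    synchronized T getObject() {
        T obj = (cached == null) ? null : cached.get();
        if (obj == null) {
            // Cache miss: either the first access or the referent was collected.
            obj = loader.apply(key);
            loads++;
            cached = new WeakReference<>(obj);
        }
        return obj;
    }

    synchronized int loadCount() {
        return loads;
    }

    public static void main(String[] args) {
        CachingUid<StringBuilder> uid =
                new CachingUid<StringBuilder>("file.cpp", k -> new StringBuilder(k));
        StringBuilder first = uid.getObject();
        StringBuilder second = uid.getObject(); // served from the weak cache
        System.out.println("same instance: " + (first == second)
                + ", loads: " + uid.loadCount());
    }
}
```

A weak reference never keeps the object alive by itself, so the cache adds no pressure on the heap; the trade-off is that a collected object has to be loaded again on the next access.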

Comment 28 Vladimir Voskresensky 2009-05-23 13:08:26 UTC

*** Issue 165731 has been marked as a duplicate of this issue. ***

Comment 29 Vladimir Voskresensky 2009-05-23 13:11:44 UTC

note:
Running NB on a 64-bit JVM consumes 2x more memory than running in 32-bit mode.
This is due to the size of pointers: a lot of our data structures have references (not primitive types) as fields

Comment 30 Quality Engineering 2009-06-16 06:50:54 UTC

Integrated into 'main-golden', will be available in build *200906160201* on http://bits.netbeans.org/dev/nightly/ (upload may still be in progress)
Changeset: http://hg.netbeans.org/main-golden/rev/f5c4d7946740
User: Alexander Simon <alexvsimon@netbeans.org>
Log: fixing IZ#162401:[code model] Parsing large projects is too slow & consumes too much memory

Comment 31 Quality Engineering 2009-06-17 08:32:33 UTC

Integrated into 'main-golden', will be available in build *200906170201* on http://bits.netbeans.org/dev/nightly/ (upload may still be in progress)
Changeset: http://hg.netbeans.org/main-golden/rev/ad2bc67de12b
User: Alexander Simon <alexvsimon@netbeans.org>
Log: fixing IZ#162401:[code model] Parsing large projects is too slow & consumes too much memory

Comment 32 Alexander Simon 2009-06-17 19:58:42 UTC

optimization results:
- parsing in 1G on 32 bit JVM on 4 core computer
- astronomical time: 10 min
- CPU time: 25 min

Comment 33 Quality Engineering 2009-06-19 07:48:17 UTC

Integrated into 'main-golden', will be available in build *200906190201* on http://bits.netbeans.org/dev/nightly/ (upload may still be in progress)
Changeset: http://hg.netbeans.org/main-golden/rev/7c6b37802b83
User: Alexander Simon <alexvsimon@netbeans.org>
Log: fixing IZ#162401:[code model] Parsing large projects is too slow & consumes too much memory
- simplify macro state cache key

Comment 34 Leonid Lenyashin 2009-08-18 14:53:23 UTC

Alexander,
Can you please update the status of the feature: it is definitely not NEW, but at least STARTED. Another question is how
close to completion is it?
LL

Comment 35 Alexander Simon 2009-08-18 20:32:49 UTC

The best results were 3 weeks ago:
- parsing in 1G on 32 bit JVM on 4 core computer
- astronomical time: 7 min
- CPU time: 19 min
The times may have changed since then due to code model fixes ;-)
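
For reference, the timings reported in this thread can be compared with a small calculation. It reads "astronomical" time as wall-clock time (an assumption) and divides CPU time by it to estimate how many of the 4 cores were busy on average. The runs used different heap sizes and builds, so the figures are only roughly comparable; the class name is hypothetical.

```java
final class ParseTimings {
    /** Average number of busy cores: CPU minutes divided by wall-clock minutes. */
    static double busyCores(double cpuMinutes, double wallMinutes) {
        return cpuMinutes / wallMinutes;
    }

    public static void main(String[] args) {
        String[] labels = {
            "Comment 27, optimization OFF",
            "Comment 24, 1G heap",
            "Comment 24, 1.7G heap",
            "Comment 32, 1G heap, 32-bit JVM",
            "Comment 35, 1G heap, 32-bit JVM"
        };
        // {wall-clock minutes, CPU minutes} as reported in the comments above.
        double[][] runs = { {20, 60}, {17, 43}, {15, 42}, {10, 25}, {7, 19} };

        for (int i = 0; i < runs.length; i++) {
            double wall = runs[i][0];
            double cpu = runs[i][1];
            System.out.printf("%-32s wall %2.0f min, CPU %2.0f min, about %.1f busy cores%n",
                    labels[i], wall, cpu, busyCores(cpu, wall));
        }
        // Wall-clock ratio between the unoptimized run and the best reported run.
        System.out.printf("wall-clock ratio, 20 min vs 7 min: %.1fx%n", 20.0 / 7.0);
    }
}
```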

Comment 36 Vladimir Kvashin 2009-09-04 18:23:43 UTC

Fixed.