Bug 159563 - Analyzing project doesn't scale for large projects
Status: RESOLVED FIXED
Alias: None
Product: cnd
Classification: Unclassified
Component: Code Model
Version: 6.x
Hardware: All Linux
Importance: P2 blocker
Assignee: Alexander Simon
URL:
Keywords: PERFORMANCE
Depends on:
Blocks:
 
Reported: 2009-03-03 20:12 UTC by rmartins
Modified: 2009-04-10 14:49 UTC

See Also:
Issue Type: DEFECT
Exception Reporter:


Attachments
Thread dump while opening project (18.87 KB, text/plain)
2009-03-04 10:32 UTC, rmartins
Thread dump while analyzing project (19.88 KB, text/plain)
2009-03-04 10:33 UTC, rmartins
Thread dump while parsing project (20.36 KB, text/plain)
2009-03-04 10:34 UTC, rmartins

Description rmartins 2009-03-03 20:12:42 UTC
I am trying to index a rather large project, the ACE+TAO middleware
(http://download.dre.vanderbilt.edu/previous_versions/ACE+TAO-5.6.8.tar.bz2).
Here are the steps that I took:
1. mkdir build (in ACE_TAO main directory)
2. cd build
3. ../configure --enable-ace-reactor-notification-queue

After creating a project in NetBeans (from existing source files), the IDE takes a very long time to
finish analyzing the project.
(Despite having a quad-core machine, only 1 core is utilized.)
Comment 1 Alexander Simon 2009-03-04 08:29:58 UTC
Could you provide us with a thread dump taken at parsing time?
Please also check your JVM parameters.
CND has a property that overrides the default number of parser threads:
    -J-Dcnd.modelimpl.parser.threads=4
By default it is the number of cores.
Could you specify this parameter, try NB again, and take a thread dump?
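
A minimal sketch of how such an override could be resolved, assuming the value is read as an ordinary JVM system property (the -J prefix makes the NetBeans launcher pass the option through to the JVM). The class and method names are hypothetical and this is not the actual CND code; only the property name and the core-count default come from the comment above.

```java
// Sketch only: mirrors the behavior described above, not the real CND source.
class ParserThreads {
    // Property name taken from the comment above.
    static final String PROPERTY = "cnd.modelimpl.parser.threads";

    /** Returns the configured thread count, or the core count if unset or invalid. */
    static int resolve(String configured, int cores) {
        if (configured != null) {
            try {
                int n = Integer.parseInt(configured.trim());
                if (n > 0) {
                    return n;
                }
            } catch (NumberFormatException ignored) {
                // Not a number: fall through to the default.
            }
        }
        return cores;
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        int threads = resolve(System.getProperty(PROPERTY), cores);
        System.out.println("parser threads: " + threads);
    }
}
```

Running it with -Dcnd.modelimpl.parser.threads=4 on the java command line prints 4; without the option it prints the number of cores the JVM reports.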
Comment 2 rmartins 2009-03-04 10:32:41 UTC
Created attachment 77694
Thread dump while opening project
Comment 3 rmartins 2009-03-04 10:33:05 UTC
Created attachment 77695
Thread dump while analyzing project
Comment 4 rmartins 2009-03-04 10:34:10 UTC
Created attachment 77696
Thread dump while parsing project
Comment 5 rmartins 2009-03-04 10:36:09 UTC
I used the -J-Dcnd.modelimpl.parser.threads=4 parameter.
Parsing starts but then fails, reporting that code completion isn't available.
Comment 6 Alexander Simon 2009-03-13 17:28:42 UTC
One of the problems was fixed:
- The IDE now analyzes the project in several threads (if the computer has several cores).
Comment 7 Alexander Simon 2009-03-13 17:33:25 UTC
The problem with memory consumption of project items was partly fixed:
- removed duplicated items
- removed redundant fields
But memory use is still too large (about 100 MB on ACE).
Comment 8 Alexander Simon 2009-03-16 11:25:22 UTC
P2 because ACE cannot be opened in the NB IDE.
Comment 9 rmartins 2009-03-19 10:50:24 UTC
Hi Alex,
I have been able to open and parse the separate ACE framework project with about 2 GB to 2.5 GB of memory (-J-Xmx2048m); I
left it overnight to index.
With ACE_TAO, I was able to index with 3.5 GB (also left overnight).

Did you commit any of your enhancements to the golden repo? I still have only 1 core being utilized...

 
Comment 10 Vladimir Voskresensky 2009-03-19 11:09:05 UTC
Hi, 
The fixes are available in today's official nightly build.
Memory consumption should be reduced, and the project analysis job now uses all available CPUs.
By the way, you can always get the last successful C++ build from Build Artifacts at
http://bertram.netbeans.org/hudson/job/cnd-main/lastSuccessfulBuild/

Thanks,
Vladimir.
P.S. Have you already signed up for the NetCAT 6.7 program? :-)
http://wiki.netbeans.org/NetCAT67Participants
Comment 11 rmartins 2009-03-19 11:23:58 UTC
Hi,
I will join NetCAT 6.7.
I hope I can contribute to your great effort.

Keep up the good work!

P.S.: thanks for the info about the c++ Build Artifacts
Comment 12 Alexander Simon 2009-03-19 13:48:34 UTC
The current nightly build should show the following performance:
- the project contains 14315 source and header files
- the initial parse consumes about 90 minutes of CPU time (about 30 minutes of wall-clock time)
- parsing runs in 1700 MB of memory (Xms)

Computer:
- memory 4G
- CPU speed 3000
- number of cores 4

It is better, but not enough ;-)
ACE is a challenge for us.
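
As a rough reading of the figures above (back-of-the-envelope arithmetic, not a new measurement): about 90 minutes of CPU time spent in about 30 minutes of wall-clock time means that roughly 3 of the 4 cores were busy on average. The class and method names below are illustrative only.

```java
// Back-of-the-envelope arithmetic on the numbers quoted in this comment.
class ParseStats {
    /** Average number of cores kept busy: CPU time divided by wall-clock time. */
    static double averageBusyCores(double cpuMinutes, double wallMinutes) {
        return cpuMinutes / wallMinutes;
    }

    /** Fraction of the machine's cores that were busy on average. */
    static double utilization(double cpuMinutes, double wallMinutes, int cores) {
        return averageBusyCores(cpuMinutes, wallMinutes) / cores;
    }

    /** Parsing throughput in files per wall-clock second. */
    static double filesPerSecond(int files, double wallMinutes) {
        return files / (wallMinutes * 60.0);
    }

    public static void main(String[] args) {
        System.out.printf("busy cores:  %.2f%n", averageBusyCores(90, 30));
        System.out.printf("utilization: %.2f%n", utilization(90, 30, 4));
        System.out.printf("files/sec:   %.2f%n", filesPerSecond(14315, 30));
    }
}
```

Since both time figures are quoted as approximate, these ratios are estimates too.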

>I still only have 1 core being utilized...
- The IDE uses several threads during analysis and parsing.
- The IDE reads a project (on reopen) in one thread.
Comment 13 rmartins 2009-03-19 13:57:14 UTC
OK, I will have to wait for a good build on Build Artifacts or for the nightly build ;)

The project shows clear and steady progress; for me that is the most important thing. ;)

As soon as I manage to test it, I will post about my experience.

Thanks,
Rolando



Comment 14 rmartins 2009-03-19 17:32:09 UTC
Hi,
I am testing NetBeans-dev-cnd-main-47-on-090319-full and noticed a strange behavior.
Still using ACE_TAO, I found that after a while the parsing stops using all 4 cores and starts using only one.
How soon this happens is directly linked to the amount of memory given to the VM.
With 2 GB, all 4 cores kept working until 20% of the project was parsed; when I switched to 1.5 GB, the same
behavior appeared when parsing reached about 10%.
Hope it helps.
Rolando
Comment 15 Alexander Simon 2009-03-22 09:07:05 UTC
Thanks for reporting the problem.
This is the "post parsing project task".
It was parallelized in this changeset:
http://hg.netbeans.org/cnd-main?cmd=changeset;node=cd40f67846f4
Comment 16 rmartins 2009-03-22 20:16:26 UTC
Congrats Alex, it's way better;)

Using 2 GB on an Intel Q9450 @ 3.2 GHz:

Now I can parse the ACE framework without any issue, in 2-3 minutes of wall-clock time (and it uses all 4 cores).
http://download.dre.vanderbilt.edu/previous_versions/ACE-5.6.8.tar.bz2

For the ACE & TAO project I get a "completion failed" error.
http://download.dre.vanderbilt.edu/previous_versions/ACE+TAO-5.6.8.tar.bz2
Comment 17 rmartins 2009-03-23 17:22:10 UTC
Hi Alex,
is opening a project parallelized? Compared to the other tasks it seems slow...
What does the "open project" task comprise? Does it scan and check all files of a project?

Thanks,
Rolando
Comment 18 Alexander Simon 2009-03-23 18:11:48 UTC
>What comprehends  the "open project" task?
- It converts the paths stored in "configuration.xml" into project items.
This includes the following time-consuming operations:
- convert the path to a canonical file
- detect the file MIME type
- create a file object for the file (class FileObject)
- create a data object for the file object (class DataObject)
The ACE+TAO project contains a lot of unnecessary files.
For example:
- number of C/C++/header files: ~14K
- number of other files (*.sln, *.bmak, *.vcproj, ...): ~44K
So the problem is in the 44K other files.
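
To illustrate why this cost grows with every project item, here is a minimal sketch of the first two operations using only the JDK; it is an assumption-laden stand-in, not the CND implementation, and the class and method names are hypothetical. The FileObject and DataObject steps need the NetBeans platform and are left out. With the counts above, about 44K of the roughly 58K items (around three quarters) are files the code model never needs.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

// Sketch only: the per-item work that is repeated for every file in the project.
class OpenProjectItem {
    /** Step 1: resolve ".", ".." and symbolic links to a single canonical file. */
    static File canonical(File f) {
        try {
            return f.getCanonicalFile();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Step 2: best-effort MIME detection; returns null when the type is unknown. */
    static String mimeType(File f) {
        try {
            return Files.probeContentType(f.toPath());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File item = canonical(new File(args.length > 0 ? args[0] : "."));
        System.out.println(item + " -> " + mimeType(item));
        // Steps 3 and 4 (FileObject, DataObject) are NetBeans APIs and are omitted here.
    }
}
```

Both calls may touch the file system, so the total time scales with the number of items rather than with the number of C/C++ files.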
Comment 19 rmartins 2009-03-23 18:17:10 UTC
Hi Alex,
can you do it in parallel?

Is there any way to have an ignore-extensions list (*.sln, *.bmak, *.vcproj, ...) and simply bypass those files? (Perhaps you
have already done this... ;) )

Rolando
Comment 20 Alexander Simon 2009-03-24 10:09:11 UTC
>Is there any way to have a ignore extensions list (*.sln, *.bmak, *.vcproj, ...)
Yes: Tools->Options->Miscellaneous->Files.
Modify the "Ignore Files pattern".
By default CND adds the following ignore pattern:
".*\\.(o|lo|la|Po|Plo)$"
Adding other extensions is the user's responsibility.
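
The pattern above is an ordinary regular expression, written here as it would appear in a Java string literal. The sketch below shows one possible extension covering the file types mentioned earlier in this thread; the extended pattern and the helper names are hypothetical, and how the IDE itself applies the pattern (full match or search) is an assumption.

```java
import java.util.regex.Pattern;

// Sketch only: checks file names against the default and an extended ignore pattern.
class IgnorePatternDemo {
    // Default pattern quoted in this comment.
    static final Pattern DEFAULT = Pattern.compile(".*\\.(o|lo|la|Po|Plo)$");
    // Hypothetical extension adding the extensions discussed in this thread.
    static final Pattern EXTENDED = Pattern.compile(".*\\.(o|lo|la|Po|Plo|sln|bmak|vcproj)$");

    /** True if the file name ends in one of the extensions listed in the pattern. */
    static boolean isIgnored(Pattern pattern, String fileName) {
        return pattern.matcher(fileName).find();
    }

    public static void main(String[] args) {
        String[] names = {"Assert.h", "Assert.o", "ACE.sln", "ACE.vcproj"};
        for (String name : names) {
            System.out.println(name
                    + " default=" + isIgnored(DEFAULT, name)
                    + " extended=" + isIgnored(EXTENDED, name));
        }
    }
}
```

When typing the pattern into the options dialog rather than into Java source, the backslash is probably written once (\.) instead of doubled.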

>can you do it in parallel?
I can, but it is error-prone and I prefer not to do it.
Comment 21 rmartins 2009-03-24 10:46:47 UTC
Alex, 
thanks for the info. I am going to add the extensions to the ignore list.

>can you do it in parallel?
>I can but it is error prone and I prefer not to do it.
OK, sorry for insisting on this, but what if you split the directory tree?

      Root
  -----|-----
  |         |
 ACE       TAO

One thread for ACE and another for TAO (what I mean is: split the top-level directories among the available cores).
Would this avoid the errors you mention?
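
The proposal above could be sketched as follows, assuming each top-level entry can be processed independently. The example only counts regular files, which has no shared state; the real open-project work creates file and data objects, so this says nothing about whether the error-prone parts would be safe. All names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Stream;

// Sketch only: one task per top-level entry (for example ACE and TAO).
class SplitScan {
    /** Counts the regular files under one subtree. */
    static long countFiles(Path subtree) {
        try (Stream<Path> walk = Files.walk(subtree)) {
            return walk.filter(Files::isRegularFile).count();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Scans every top-level entry of root as an independent task and sums the results. */
    static long scan(Path root, int threads) {
        List<Path> topLevel = new ArrayList<>();
        try (Stream<Path> children = Files.list(root)) {
            children.forEach(topLevel::add);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            for (Path entry : topLevel) {
                Callable<Long> task = () -> countFiles(entry);
                parts.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get(); // waits for that subtree to finish
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        int threads = Runtime.getRuntime().availableProcessors();
        System.out.println("regular files: " + scan(root, threads));
    }
}
```

With only two top-level directories at most two threads are busy, so a real implementation would likely need to split deeper or use a work queue.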

Rolando
Comment 22 rmartins 2009-03-24 11:45:53 UTC
Alex,
I have noticed that some .h and .cpp files (and .inl; I added the inl extension as a header type) are greyed out, even though they are used;
if I open such a file, it is parsed correctly and doesn't seem to have any issue.
Example: ace/Assert.h
(I didn't change the ignore list.)

What do you think?

Rolando
Comment 23 rmartins 2009-03-25 01:10:04 UTC
Alex,
I get the following errors with ACE_TAO (I have updated the ignore file list).
When project creation finishes, I get "completion failed":

0 out of 6,597 source files have limited code assistance 
0 out of 6,858 header files have limited code assistance

Do you have any clue why this happens?
The build was configured with:
1. mkdir build (in ACE_TAO main directory)
2. cd build
3. ../configure --enable-ace-reactor-notification-queue


I manually added the include and macro files.
include:
ACE_TAO/
ACE_TAO/TAO
ACE_TAO/TAO/orbsvcs

macro:
ACE_TAO/build/ace/config.h
ACE_TAO/build/TAO/tao/config.h

I don't know if it's because of the code completion error, but when I try to find usages of:
ACE_TAO/TAO/tao/RTCORBA/Thread_Pool.h
TAO_Thread_Pool_Threads::run

it takes a very long time to find the usages...
Comment 24 Alexander Simon 2009-03-25 08:40:23 UTC
>When finish creating the project, I have "completion failed":
- It means that code assistance still has failed include directives. You can see them in the project popup menu "Code
Assistance->Failed include directives".
Possible causes:
- not enough information in object files
- bugs in the analysis algorithm
- bugs in the code model
If the "failed include" dialog shows only a few unresolved directives on the huge ACE project, IMHO that is good enough.
You can investigate why the code model has the failed include directives.

>It's takes a very long time to find the usages...
Find usages consists of two steps:
- grep all source/header files to find the ID
- visit all references in the selected files
So if you are searching for a short ID, many files may be involved in visiting references.
The first step can also take a lot of time.
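
The two steps can be illustrated with a small in-memory sketch. It is a simplification under stated assumptions: the second step here merely counts whole-word matches as a stand-in for resolving references through the code model, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: a textual pre-filter followed by a more precise per-file pass.
class FindUsagesSketch {
    /** Step 1: cheap textual filter; keeps files whose text contains the identifier. */
    static List<String> candidateFiles(Map<String, String> sources, String id) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> entry : sources.entrySet()) {
            if (entry.getValue().contains(id)) {
                hits.add(entry.getKey());
            }
        }
        return hits;
    }

    /** Step 2 stand-in: counts whole-word occurrences instead of resolving references. */
    static int countReferences(String text, String id) {
        Matcher m = Pattern.compile("\\b" + Pattern.quote(id) + "\\b").matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        Map<String, String> sources = new LinkedHashMap<>();
        sources.put("a.cpp", "int run(); runner(); run();");
        sources.put("b.h", "void stop();");
        sources.put("c.cpp", "rerun();");
        for (String file : candidateFiles(sources, "run")) {
            System.out.println(file + ": " + countReferences(sources.get(file), "run"));
        }
    }
}
```

In the demo data, c.cpp passes the textual filter for "run" although it contains no real use of it, which is the effect described above for short IDs: many files survive step 1 and must still be visited in step 2.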
Comment 25 rmartins 2009-03-25 10:45:57 UTC
>>It's takes a very long time to find the usages...
>Find usages consists from two steps:
>- grep all source/header files for finding ID
What does the initial parsing of project files do? I thought that a "complete" AST was created in that step.

>- visit all references in selected files
Does the concept of an "indexer" exist, where you can query a declaration and find the possible bindings?

>So if you are finding a short ID, it is possible that there are many files will be involved into visiting references.
>Also a first step can take a lot of time.
Comment 26 rmartins 2009-03-30 10:35:35 UTC
Hi Alex(, Vladimir),
opening ACE_TAO is very slow (I have excluded the .sln files, etc.). Although parsing is still a bit slow
and uses a lot of memory, opening seems to me the immediate bottleneck.

Rolando
Comment 27 Alexander Simon 2009-03-30 11:18:55 UTC
>the process of opening the ACE_TAO is very slow
Rolando, please file a separate issue for this.
Comment 28 rmartins 2009-04-03 11:40:09 UTC
Hi Alex & Vladimir,
I read this issue - http://www.netbeans.org/issues/show_bug.cgi?id=134990, 
and found this idea very interesting:
"Creating some module that can collect performance data is great idea! Thanks. I will ask here if we have something like
this."
Can this be done for CND?
It would also be helpful for the find usages issue...

Rolando
Comment 29 Leonid Lenyashin 2009-04-10 09:46:22 UTC
The issue covers so many different things (parsing, find usages, opening a project) that it needs to be broken down into
smaller parts. So, Alexander, Rolando, please go ahead and file separate issues. I would assume they are going to be P3s,
since slowness can reasonably be expected on huge projects. However, we are committed to working on performance issues and
making CND capable of handling very large projects.
For now I'm closing this IZ as FIXED, since a lot of improvements have been implemented by Alexander.
Comment 30 rmartins 2009-04-10 14:40:32 UTC
Hi,
just to avoid duplicate issues: find usages and opening a project already have their own issue reports, so I will create a
new one for parsing/memory consumption.

I really appreciate your effort. I am doing research work, and the ability to handle large projects is a must
for me.

Thanks,
Rolando
Comment 31 rmartins 2009-04-10 14:49:47 UTC
Hi,
Here are the separate issues:
Non scalable project opening - http://www.netbeans.org/issues/show_bug.cgi?id=161455
[code model] Parsing large projects is too slow & consumes too much memory -
http://www.netbeans.org/issues/show_bug.cgi?id=162401
[code model] Non scalable "find usages" for small keywords - http://www.netbeans.org/issues/show_bug.cgi?id=161456

Rolando