Bug 167660 - ProfileOnRun enabled in 6.7rc3 gives multithread problems
Status: VERIFIED FIXED
Alias: None
Product: cnd
Classification: Unclassified
Component: D-Light
Version: 6.x
Hardware: PC Linux
Importance: P3 blocker
Assignee: Alexey Vladykin
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-06-25 16:27 UTC by vcecchetto
Modified: 2009-07-17 14:02 UTC

See Also:
Issue Type: DEFECT
Exception Reporter:


Attachments
The producer-comsumer code (993 bytes, text/plain)
2009-06-25 16:28 UTC, vcecchetto
Locks (208.65 KB, image/png)
2009-07-03 17:36 UTC, dnikitin
Memory usage (215.18 KB, image/png)
2009-07-03 17:37 UTC, dnikitin

Description vcecchetto 2009-06-25 16:27:30 UTC
Hi all. 
I forward here what I asked in http://forums.netbeans.org/topic13674.html:

Hi, I'm experiencing a really frustrating multithreading problem in an apparently easy synchronization task in C code
with POSIX threads.

This is my system:

OS: Ubuntu (linux) 64 bit
JAVA: java version "1.6.0_07"
NETBEANS: 6.7.1 rc3

I created a really simple multithreading producer-consumer task.

main:
1. Set value = 0;
2. Start another thread that waits until value = 1 and then finishes;
3. Set value = 1;
4. Join the other thread and finish
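
For readers without access to the attachment, the four steps above could look roughly like the following. This is only a sketch, not the attached file: the names `lock`, `changed`, `value`, `consumer` and `run_producer_consumer` are assumptions made for illustration.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical names; the real attachment may be organized differently. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t changed = PTHREAD_COND_INITIALIZER;
static int value;

/* Consumer: block until the main thread has set value to 1. */
static void *consumer(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    while (value != 1)                      /* loop guards against spurious wakeups */
        pthread_cond_wait(&changed, &lock);
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Runs steps 1-4; returns the final value, or -1 on a pthread error. */
int run_producer_consumer(void)
{
    pthread_t tid;

    pthread_mutex_lock(&lock);
    value = 0;                              /* step 1 */
    pthread_mutex_unlock(&lock);

    if (pthread_create(&tid, NULL, consumer, NULL) != 0)   /* step 2 */
        return -1;

    pthread_mutex_lock(&lock);
    value = 1;                              /* step 3 */
    pthread_cond_signal(&changed);
    pthread_mutex_unlock(&lock);

    if (pthread_join(tid, NULL) != 0)       /* step 4 */
        return -1;

    return value;                           /* safe to read: consumer has exited */
}
```

Called from `main` and linked with `-lpthread`, a program like this is expected to terminate whether or not the consumer reaches `pthread_cond_wait` before the signal is sent, because the predicate is checked under the mutex.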

Compile:

gcc -o main.bin main.o -g -W -Wall -ansi -pedantic -lpthread
OK!

I run it manually from the command line and it ends normally.
OK!

I run it in NetBeans 6.5.1 and it ends normally.
OK!

I upgraded my NetBeans to 6.7rc3.
I run it in NetBeans 6.7rc3.
It never ends... The program hangs. No error.

The program, after starting the second thread, starts waiting.

If I try "Debug Main Project" (without breakpoints) in NetBeans 6.7rc3, it ends normally! Strange.

Is it a synchronization problem in my code?
Thanks for any suggestions.

The code is attached.

To restore the correct behaviour of the program I followed the suggestion:

Could you turn profiling off (project properties -> profiling -> profile
on run; by default it is on) and try once more? 

After I disabled ProfileOnRun, everything works.
Comment 1 vcecchetto 2009-06-25 16:28:57 UTC
Created attachment 84021
The producer-comsumer code
Comment 2 Maria Tishkova 2009-06-29 10:26:10 UTC
Do you have Sun Studio installed on your Linux system and added under Build Tools?

Comment 3 Alexander Pepin 2009-06-30 17:03:35 UTC
It would be nice to have it fixed in patch1
Comment 4 soldatov 2009-07-02 12:20:29 UTC
I can't reproduce this exact problem, but I can describe a similar one.

Linux 64-bit
Profiler: Simple Indicator
1) Create project with the attached file
2) Press F6 (Run project)
==> terminal appeared and program finished
3) Press ENTER (Close terminal)
4) Repeat steps 2)-3) 2-10 times

The application stops at step 2).
Comment 5 Leonid Lenyashin 2009-07-02 19:36:41 UTC
A few things I've found on this issue:
1) The problem is reproducible on my 32-bit Ubuntu 9.04.
2) The root cause is that, when our agent is used, pthread_cond_signal gets stuck in...
   __kernel_vsyscall
   __lll_lock_wait
   pthread_cond_signal@@GLIBC_2.3.2
   main at main.c:44
This is weird, since we do not interpose this call and it is non-blocking by definition. So there is a bug in the Ubuntu
kernel. I'd appreciate it if someone reported it.

3) Something in the interposition we do exposes this kernel issue. My hypothesis is based on the observation that the
consumer is sitting in:
   __kernel_vsyscall
   pthread_cond_wait@@GLIBC_2.3.2
   pthread_cond_wait@@GLIBC_2.0 <== this thing is suspicious 
   pthread_cond_wait at prof_agent.c:143
   consumer at main.c:14
I hacked the way I interposed this call and the problem is gone. I'm still looking for a solid fix.

4) With the fixed library I see that lock-wait information is reported as zero, which is not correct. I've checked, and
prof_monitor reports correct values, but Gizmo doesn't. Andrew and I looked into this and found that for some reason
prof_monitor is not running when Gizmo is working. Not surprisingly, when checked with ProfilerDemo, neither memory nor
lock waits looked correct (always zeros).
Does it deserve a separate IZ and an investigation?

5) Once (on the first execution after IDE launch) I saw 0 threads reported! I found that the reporter thread in the
agent had died for some reason... I was unable to reproduce it...

I'd prefer Alexey to continue, since I'm busy with other things. However I'm still looking for a solid solution for 2)
Comment 6 Leonid Lenyashin 2009-07-02 19:39:36 UTC
I meant I'm looking for a solution for 3).
2) just needs to be reported to Ubuntu guys.
Comment 7 Leonid Lenyashin 2009-07-02 20:28:12 UTC
Can anyone check if Gizmo with just simple indicators works on Ubuntu?
Comment 8 dnikitin 2009-07-03 11:20:55 UTC
Gizmo w/ simple indicators works on my Ubuntu 9.04 on Profiling Demo 
Comment 9 dnikitin 2009-07-03 11:58:52 UTC
vcecchetto's program hangs after FIRST STEP, but indicators work.
I've verified on 6.7 and on trunk Build 200907030200

But in about 10-15% of runs the program ends normally.
Comment 10 Leonid Lenyashin 2009-07-03 16:32:30 UTC
So, Dima, you are confirming that Locks and Memory were reported correctly? Right?
Comment 11 dnikitin 2009-07-03 17:36:56 UTC
Created attachment 84342
Locks
Comment 12 dnikitin 2009-07-03 17:37:22 UTC
Created attachment 84343
Memory usage
Comment 13 Maria Tishkova 2009-07-05 17:28:10 UTC
Alexey, please take care of this issue
Comment 14 Alexey Vladykin 2009-07-06 13:18:19 UTC
I'm observing the same situation as Leonid on my 64-bit Gentoo (kernel 2.6.29):

(gdb) thread apply all bt

Thread 2 (Thread 0x7f44c8df4950 (LWP 13175)):
#0  0x00007f44c9561dc9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#1  0x00007f44c97742ac in pthread_cond_wait (p=0x6010c0, p1=0x601100) at prof_agent.c:143
#2  0x0000000000400af3 in consumer (c=0x0) at main.c:14
#3  0x00007f44c9774e30 in start_routine (pkg=0x2131040) at prof_agent.c:197
#4  0x00007f44c955e017 in start_thread () from /lib/libpthread.so.0
#5  0x00007f44c92d234d in clone () from /lib/libc.so.6
#6  0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7f44c9b7b6f0 (LWP 13173)):
#0  0x00007f44c9564444 in __lll_lock_wait () from /lib/libpthread.so.0
#1  0x00007f44c956224b in pthread_cond_signal@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2  0x0000000000400b94 in main (argc=1, argv=0x7fffd1b937e8) at main.c:40

But I don't have that suspicious line "pthread_cond_wait@@GLIBC_2.0". Leonid, what is the hack you did that prevented
the hang?
Comment 15 Alexey Vladykin 2009-07-06 14:22:28 UTC
Leonid's item #4 is worth a separate IZ if it's reproducible with original prof_agent/prof_monitor. If it happens only
with modified prof_agent/prof_monitor, then there is no need for a separate issue.
Comment 16 Alexey Vladykin 2009-07-06 14:28:34 UTC
apepin: I don't have a fix yet, as the reason for the hang is not clear, and I can't guarantee that a fix will be
available soon. Investigating.
Comment 17 Alexey Vladykin 2009-07-06 16:09:50 UTC
pthread_cond_signal hangs only if the consumer thread has entered pthread_cond_wait. Otherwise pthread_cond_signal
succeeds and the program finishes.

Typically the consumer thread starts fast enough, enters pthread_cond_wait, and the program hangs. But enabling tracing
in prof_agent (recompiling with TRACE=1) changes the thread schedule: pthread_cond_signal is often executed before the
consumer enters pthread_cond_wait, and the program finishes successfully in more than 50% of runs.
Comment 18 Alexey Vladykin 2009-07-07 14:33:56 UTC
This has much to do with symbol versioning. See a very relevant blog entry from our colleague:
http://blogs.sun.com/rvs/entry/why_do_i_love_multiple
Comment 19 Alexey Vladykin 2009-07-07 15:01:56 UTC
Another working hack:

#define INSTRUMENT2(func, param, actual) \
int func (void * p param) { \
    static int (* ORIG(func))(void* p param) = NULL; \
    INIT2(func); \
    LOG(QUOTE(func) " called2\n"); \
    thlock++; \
    int ret = ORIG(func) (p actual); \
    thlock--; \
    LOG(QUOTE(func) " returned2\n"); \
    return ret; \
}

#define INIT2(func) \
    if(!ORIG(func)) { \
        ORIG(func) = dlvsym((void*)-1  /*RTLD_NEXT*/, QUOTE(func), "GLIBC_2.3.2"); \
        LOG("%s=%p\n", QUOTE(func), ORIG(func)); \
    }

INSTRUMENT2(pthread_cond_wait, VOID1P, ACTUAL1)
INSTRUMENT2(pthread_cond_timedwait, VOID2P, ACTUAL2)
Comment 20 Leonid Lenyashin 2009-07-07 18:20:00 UTC
Suggested fix (pseudo code):

INIT2 {
  orig_sym_def = NEXT('pthread_cond_wait')
  orig_sym_20  = NEXT('pthread_cond_wait@2.0')
  orig_sym_232 = NEXT('pthread_cond_wait@2.3.2')
  if(orig_sym_232 && orig_sym_20 == orig_sym_def) 
     // we are now in well known trouble
     orig_sym = orig_sym_232
  else 
     // just regular stuff
     if(orig_sym == our_sym) ...
}

Any objections? Alexey can you implement it?
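
As an illustration of the pseudo code above, the selection logic could be written with glibc's `dlsym` and `dlvsym` roughly as follows. This is a sketch under assumptions, not the code from prof_agent: the helper name `pick_next_symbol` is made up, and the "regular stuff" branch (the check against our own symbol) is left out.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* needed for RTLD_NEXT and dlvsym on glibc */
#endif
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: decide which "next" definition of a versioned glibc
 * symbol an interposer should forward to. Returns NULL if nothing is found. */
void *pick_next_symbol(const char *name)
{
    void *def  = dlsym(RTLD_NEXT, name);                  /* unversioned lookup */
    void *v20  = dlvsym(RTLD_NEXT, name, "GLIBC_2.0");    /* old ABI, NULL if absent */
    void *v232 = dlvsym(RTLD_NEXT, name, "GLIBC_2.3.2");  /* new ABI, NULL if absent */

    /* Trouble case from the pseudo code: the plain lookup resolved to the
     * old version although a GLIBC_2.3.2 definition exists. */
    if (v232 != NULL && v20 == def)
        return v232;

    /* Regular case: keep whatever the plain lookup returned. */
    return def;
}
```

On platforms where neither `GLIBC_2.0` nor `GLIBC_2.3.2` exists for the symbol, both `dlvsym` calls return NULL and the helper falls back to the plain `dlsym` result. Older glibc versions need `-ldl` when linking.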
Comment 21 Alexey Vladykin 2009-07-08 10:58:15 UTC
This still looks like a very specific hack. I have an idea for a more general solution. prof_agent should wrap each
version (GLIBC_2.0, GLIBC_2.3.2) of pthread_cond_wait and make sure each wrapper calls the corresponding version of the
original function. Pseudocode:

pthread_cond_wait@GLIBC_2.0(...) {
    static orig = dlvsym(NEXT, "pthread_cond_wait", "GLIBC_2.0");
    orig(...);
}

pthread_cond_wait@GLIBC_2.3.2(...) {
    static orig = dlvsym(NEXT, "pthread_cond_wait", "GLIBC_2.3.2");
    orig(...);
}

This is doable. I will implement and test it when I have time.
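
The lookup half of this idea might be sketched in C as below. It is an illustration under assumptions rather than the committed fix: the names `next_cond_wait`, `wrapped_cond_wait_2_0` and `wrapped_cond_wait_2_3_2` are hypothetical, and the step that actually exports the two wrappers under the versioned names (GNU as `.symver` directives together with a linker version script) is not shown.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* needed for RTLD_NEXT and dlvsym on glibc */
#endif
#include <dlfcn.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

typedef int (*cond_wait_fn)(pthread_cond_t *, pthread_mutex_t *);

/* Resolve one specific version of the next pthread_cond_wait; NULL if absent. */
cond_wait_fn next_cond_wait(const char *version)
{
    return (cond_wait_fn)dlvsym(RTLD_NEXT, "pthread_cond_wait", version);
}

/* Meant to be exported as pthread_cond_wait@GLIBC_2.0 (export step not shown).
 * The lazy lookup is not synchronized, but repeating it is harmless. */
int wrapped_cond_wait_2_0(pthread_cond_t *c, pthread_mutex_t *m)
{
    static cond_wait_fn orig;
    if (orig == NULL)
        orig = next_cond_wait("GLIBC_2.0");
    return orig != NULL ? orig(c, m) : ENOSYS;   /* no such version here */
}

/* Meant to be exported as pthread_cond_wait@@GLIBC_2.3.2 (export step not shown). */
int wrapped_cond_wait_2_3_2(pthread_cond_t *c, pthread_mutex_t *m)
{
    static cond_wait_fn orig;
    if (orig == NULL)
        orig = next_cond_wait("GLIBC_2.3.2");
    return orig != NULL ? orig(c, m) : ENOSYS;   /* no such version here */
}
```

The point of the design is that a caller bound to the old condition-variable ABI is always forwarded to the old implementation, and a caller bound to the new ABI to the new one, so the two never operate on the same `pthread_cond_t` with different layouts.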
Comment 22 Alexey Vladykin 2009-07-08 13:07:12 UTC
Fixed in http://hg.netbeans.org/cnd-main/rev/6d3c96d4d748
Works for attached program and for ProfilingDemo. Needs testing.
apepin: could you please test the fix and check if it's still possible to include it in patch 1?
Comment 23 Alexey Vladykin 2009-07-08 14:22:37 UTC
Marking as fixed.
Comment 24 dnikitin 2009-07-08 14:25:44 UTC
verified in dev build
Comment 25 pgebauer 2009-07-08 16:44:53 UTC
The fix has been ported into the release67_fixes repository.
http://hg.netbeans.org/release67_fixes/rev/3dd74f767286
Comment 26 Quality Engineering 2009-07-09 05:44:34 UTC
Integrated into 'main-golden', will be available in build *200907090200* on http://bits.netbeans.org/dev/nightly/ (upload may still be in progress)
Changeset: http://hg.netbeans.org/main-golden/rev/6d3c96d4d748
User: Alexey Vladykin <alexey_vladykin@netbeans.org>
Log: #167660 ProfileOnRun enabled in 6.7rc3 gives multithread problems
Comment 27 dnikitin 2009-07-15 15:22:24 UTC
The bug is reproduced in IDE 6.7.1 RC (Build 200907150227)
Comment 28 Alexey Vladykin 2009-07-15 15:47:44 UTC
I've checked release67_fixes. For some reason the changeset http://hg.netbeans.org/release67_fixes/rev/3dd74f767286 did
not update the binaries: dlight.tools/release/tools/Linux-x86/bin/prof_agent.so and
dlight.tools/release/tools/Linux-x86_64/bin/prof_agent.so. This is the most important part of the fix.

Another interesting question is whether those binary files can be updated during patch 1 installation from the update
center. The files are in netbeans/dlight1/tools/, outside of any versioned JAR. Could somebody please comment on it?
Comment 29 Alexey Vladykin 2009-07-16 09:26:49 UTC
BTW, who must put correct prof_agent.so's into release67_fixes: pgebauer or me?
Comment 30 pgebauer 2009-07-16 13:03:39 UTC
Another try to integrate correct binaries.
http://hg.netbeans.org/release67_fixes/rev/48646917323a
Comment 31 pgebauer 2009-07-16 13:24:47 UTC
Since the binaries are part of org-netbeans-modules-dlight-tools.nbm, they should be updated during the patch installation
from the update center without any problem. However, QE should check it as part of 67patch1 download testing.
Comment 32 dnikitin 2009-07-17 13:44:00 UTC
verified in IDE 6.7.1 RC (Build 200907162301)