Hi all. I am forwarding here what I asked in http://forums.netbeans.org/topic13674.html:

Hi, I'm experiencing a really frustrating problem with multithreading in an apparently easy synchronization task in C code with POSIX threads.

This is my system:
OS: Ubuntu (Linux) 64 bit
JAVA: java version "1.6.0_07"
NETBEANS: 6.7.1 rc3

I created a really simple multithreaded producer-consumer task.

main:
1. Set value = 0;
2. Start another thread that waits for value = 1 before it finishes;
3. Set value = 1;
4. Join the other thread.

Compile: gcc -o main.bin main.o -g -W -Wall -ansi -pedantic -lpthread

OK! I run it manually from the command line and it ends normally.
OK! I run it in NetBeans 6.5.1 and it ends normally.
OK! I upgraded my NetBeans to 6.7rc3.
I run it in NetBeans 6.7rc3. It never ends... The program hangs. No error. After starting the second thread, the program just waits.
If I try "Debug Main Project" (without a breakpoint) in NetBeans 6.7rc3, it ends normally! Strange.

Is it a synchronization problem in my code? Thanks for any suggestion. The code is attached.

To restore the correct behaviour of the program I followed this suggestion:

  Could you turn profiling off (project properties -> profiling -> profile on run; by default it is on) and try once more?

After I disabled ProfileOnRun, everything is OK.
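For readers without the attachment, the four steps listed in the report can be sketched roughly as follows. This is not the attached main.c; it is a minimal reconstruction under the assumption that the program uses one mutex and one condition variable, and the names lock, ready, consumer and run_handshake are made up for illustration.

```c
#include <pthread.h>

/* Shared state guarded by `lock`. All names here are illustrative
   assumptions, not taken from the attached program. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t ready = PTHREAD_COND_INITIALIZER;
static int value = 0;

/* Consumer thread: block until the producer has set value to 1. */
static void *consumer(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    while (value != 1)                  /* loop guards against spurious wakeups */
        pthread_cond_wait(&ready, &lock);
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Runs steps 1-4 from the report.
   Returns 0 on success or a pthread error code. */
int run_handshake(void)
{
    pthread_t tid;
    int rc;

    pthread_mutex_lock(&lock);
    value = 0;                                        /* step 1 */
    pthread_mutex_unlock(&lock);

    rc = pthread_create(&tid, NULL, consumer, NULL);  /* step 2 */
    if (rc != 0)
        return rc;

    pthread_mutex_lock(&lock);
    value = 1;                                        /* step 3 */
    pthread_cond_signal(&ready);
    pthread_mutex_unlock(&lock);

    return pthread_join(tid, NULL);                   /* step 4 */
}
```

Calling run_handshake() from main and building with the gcc flags quoted above should give a program of the same shape as the one described. Because the wait is a predicate loop under the mutex, the sketch is correct whether the signal is sent before or after the consumer starts waiting.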
Created attachment 84021 [details] The producer-consumer code
Do you have SunStudio installed on your linux and added as Build Tools?
It would be nice to have it fixed in patch1
I can't reproduce this exact problem, but I can describe a similar one.

Linux 64-bit
Profiler: Simple Indicator

1) Create a project with the attached file
2) Press F6 (Run project) ==> a terminal appears and the program finishes
3) Press ENTER (close the terminal)
4) Repeat steps 2)-3) 2-10 times

The application hangs at step 2).
A few things I've found on this issue:

1) The problem is reproducible on my 32-bit Ubuntu 9.04.

2) The root cause is that, when our agent is used, pthread_cond_signal gets stuck in...

    __kernel_vsyscall
    __lll_lock_wait
    pthread_cond_signal@@GLIBC_2.3.2
    main at main.c:44

This is weird, since we do not interpose this call and it is non-blocking by definition. So there is a bug in the Ubuntu kernel. I'd appreciate it if someone reported it.

3) There is something in the interposition we do that exposes this kernel issue. My hypothesis is based on the observation that the consumer is sitting in:

    __kernel_vsyscall
    pthread_cond_wait@@GLIBC_2.3.2
    pthread_cond_wait@@GLIBC_2.0      <== this thing is suspicious
    pthread_cond_wait at prof_agent.c:143
    consumer at main.c:14

I hacked the way I interpose this call and the problem is gone. I'm still looking for a solid fix.

4) With the fixed library I see that lock wait information is reported as zero, which is not correct. I've checked, and prof_monitor reports correct values, but Gizmo doesn't. Andrew and I looked into this and found that for some reason prof_monitor is not running when Gizmo is working. Not surprisingly, when checked with ProfilerDemo, neither memory nor lock waits looked correct (always zeros). Does it deserve a separate IZ and an investigation?

5) Once (on the first execution after IDE launch) I saw that 0 threads were reported! I found that the reporter thread in the agent had died for some reason... I was unable to reproduce it...

I'd prefer Alexey to continue, since I'm busy with other things. However, I'm still looking for a solid solution for 2)
I meant I'm looking for a solution for 3). 2) just needs to be reported to Ubuntu guys.
Can anyone check if Gizmo with just simple indicators works on Ubuntu?
Gizmo w/ simple indicators works on my Ubuntu 9.04 on Profiling Demo
vcecchetto's program hangs after the FIRST STEP, but the indicators work. I've verified this on 6.7 and on trunk Build 200907030200, but in about 10-15% of runs the program ends normally.
So, Dima, you are confirming that Locks and Memory were reported correctly? Right?
Created attachment 84342 [details] Locks
Created attachment 84343 [details] Memory usage
Alexey, please take care of this issue
I'm observing the same situation as Leonid on my 64-bit Gentoo (kernel 2.6.29):

(gdb) thread apply all bt

Thread 2 (Thread 0x7f44c8df4950 (LWP 13175)):
#0 0x00007f44c9561dc9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#1 0x00007f44c97742ac in pthread_cond_wait (p=0x6010c0, p1=0x601100) at prof_agent.c:143
#2 0x0000000000400af3 in consumer (c=0x0) at main.c:14
#3 0x00007f44c9774e30 in start_routine (pkg=0x2131040) at prof_agent.c:197
#4 0x00007f44c955e017 in start_thread () from /lib/libpthread.so.0
#5 0x00007f44c92d234d in clone () from /lib/libc.so.6
#6 0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7f44c9b7b6f0 (LWP 13173)):
#0 0x00007f44c9564444 in __lll_lock_wait () from /lib/libpthread.so.0
#1 0x00007f44c956224b in pthread_cond_signal@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2 0x0000000000400b94 in main (argc=1, argv=0x7fffd1b937e8) at main.c:40

But I don't have that suspicious line "pthread_cond_signal@@GLIBC_2.0". Leonid, what is the hack you did that prevented the hang?
Leonid's item #4 is worth a separate IZ if it's reproducible with original prof_agent/prof_monitor. If it happens only with modified prof_agent/prof_monitor, then there is no need for a separate issue.
apepin: I don't have a fix yet, as the reason for the hang is not clear, and I can't guarantee that a fix will be available soon. Investigating.
pthread_cond_signal hangs only if the consumer thread has already entered pthread_cond_wait. Otherwise pthread_cond_signal succeeds and the program finishes. Typically the consumer thread starts fast enough, enters pthread_cond_wait, and the program hangs. But enabling tracing in prof_agent (recompiling with TRACE=1) changes the thread schedule: pthread_cond_signal is often executed before the consumer enters pthread_cond_wait, and the program finishes successfully in more than 50% of runs.
This has much to do with symbol versioning. See a very relevant blog entry from our colleague: http://blogs.sun.com/rvs/entry/why_do_i_love_multiple
Another working hack:

#define INSTRUMENT2(func, param, actual) \
int func (void * p param) { \
    static int (* ORIG(func))(void* p param) = NULL; \
    INIT2(func); \
    LOG(QUOTE(func) " called2\n"); \
    thlock++; \
    int ret = ORIG(func) (p actual); \
    thlock--; \
    LOG(QUOTE(func) " returned2\n"); \
    return ret; \
}

#define INIT2(func) \
    if(!ORIG(func)) { \
        ORIG(func) = dlvsym((void*)-1 /*RTLD_NEXT*/, QUOTE(func), "GLIBC_2.3.2"); \
        LOG("%s=%p\n", QUOTE(func), ORIG(func)); \
    }

INSTRUMENT2(pthread_cond_wait, VOID1P, ACTUAL1)
INSTRUMENT2(pthread_cond_timedwait, VOID2P, ACTUAL2)
Suggested fix (pseudo code):

INIT2 {
    orig_sym_def = NEXT('pthread_cond_wait')
    orig_sym_20 = NEXT('pthread_cond_wait@2.0')
    orig_sym_232 = NEXT('pthread_cond_wait@2.3.2')
    if(orig_sym_232 && orig_sym_20 == orig_sym_def)
        // we are now in well known trouble
        orig_sym = orig_sym_232
    else
        // just regular stuff
        if(orig_sym == our_sym) ...
}

Any objections? Alexey, can you implement it?
This still looks like a very specific hack. I have an idea for a more general solution: prof_agent should wrap each version (GLIBC_2.0, GLIBC_2.3.2) of pthread_cond_wait and make sure each wrapper calls the corresponding version of the original function.

Pseudocode:

pthread_cond_wait@GLIBC_2.0(...) {
    static orig = dlvsym(NEXT, "pthread_cond_wait", "GLIBC_2.0");
    orig(...);
}

pthread_cond_wait@GLIBC_2.3.2(...) {
    static orig = dlvsym(NEXT, "pthread_cond_wait", "GLIBC_2.3.2");
    orig(...);
}

This is doable. I will implement and test it when I have time.
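The lookup half of this idea, resolving one specific version of the original function, can be sketched with glibc's dlvsym(). This is only an illustration under stated assumptions, not the code from the actual fix: the helper names next_versioned and count_versions are hypothetical, the set of versions that exist differs per architecture, and exporting the wrappers themselves under versioned names needs additional linker-level work that is not shown here.

```c
#define _GNU_SOURCE             /* glibc: exposes dlvsym() and RTLD_NEXT */
#include <dlfcn.h>
#include <stddef.h>

/*
 * Look up one specific version of `name` in the objects that follow the
 * caller in the symbol search order. Returns NULL when the C library on
 * this platform does not export that version.
 * (Hypothetical helper, named here for illustration only.)
 */
void *next_versioned(const char *name, const char *version)
{
    return dlvsym(RTLD_NEXT, name, version);
}

/*
 * Count how many of the candidate versions of `name` can be resolved.
 * A wrapper library could use a probe like this to decide which
 * per-version wrappers have a real original to forward to.
 */
int count_versions(const char *name, const char *const *versions, size_t n)
{
    size_t i;
    int found = 0;

    for (i = 0; i < n; i++) {
        if (next_versioned(name, versions[i]) != NULL)
            found++;
    }
    return found;
}
```

For example, next_versioned("pthread_cond_wait", "GLIBC_2.3.2") would return the address that the GLIBC_2.3.2 wrapper in the pseudocode should forward to, or NULL where that version is not present. On older glibc releases the program may also need to be linked with -ldl.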
Fixed in http://hg.netbeans.org/cnd-main/rev/6d3c96d4d748 Works for attached program and for ProfilingDemo. Needs testing. apepin: could you please test the fix and check if it's still possible to include it in patch 1?
Marking as fixed.
verified in dev build
The fix has been ported into the release67_fixes repository. http://hg.netbeans.org/release67_fixes/rev/3dd74f767286
Integrated into 'main-golden', will be available in build *200907090200* on http://bits.netbeans.org/dev/nightly/ (upload may still be in progress) Changeset: http://hg.netbeans.org/main-golden/rev/6d3c96d4d748 User: Alexey Vladykin <alexey_vladykin@netbeans.org> Log: #167660 ProfileOnRun enabled in 6.7rc3 gives multithread problems
The bug is reproduced in IDE 6.7.1 RC (Build 200907150227)
I've checked release67_fixes. For some reason the changeset http://hg.netbeans.org/release67_fixes/rev/3dd74f767286 did not update the binaries dlight.tools/release/tools/Linux-x86/bin/prof_agent.so and dlight.tools/release/tools/Linux-x86_64/bin/prof_agent.so, which are the most important part of the fix. Another interesting question is whether those binary files can be updated during patch 1 installation from the update center. The files are in netbeans/dlight1/tools/, outside of any versioned JAR. Could somebody please comment on it?
BTW, who must put correct prof_agent.so's into release67_fixes: pgebauer or me?
Another try to integrate correct binaries. http://hg.netbeans.org/release67_fixes/rev/48646917323a
Since binaries are part of org-netbeans-modules-dlight-tools.nbm, they should be updated during the patch installation from the update center without any problem. However QE should check it as a part of 67patch1 download testing.
verified in IDE 6.7.1 RC (Build 200907162301)