From: "Américo Wang" <xiyou.wangcong@gmail.com>
To: Andrew Athan <linux_kernel_aathan@memeplex.com>
Cc: linux-kernel@vger.kernel.org, Darren Hart <dvhltc@us.ibm.com>,
Peter Zijlstra <peterz@infradead.org>,
Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@elte.hu>
Subject: Re: Futex hang/lockup problem in 2.6.30+ on AMD64
Date: Tue, 12 Jan 2010 22:52:13 +0800
Message-ID: <20100112145213.GB3925@hack>
In-Reply-To: <4B4C3E4F.9060001@memeplex.com>

On Tue, Jan 12, 2010 at 04:18:07AM -0500, Andrew Athan wrote:
>
> After some investigation I believe I am experiencing a problem similar
> to the one described in this posting:
> http://sourceware.org/ml/libc-help/2009-10/msg00026.html, in that the
> poster suspects a problem in the futex implementation in 2.6.30 and
> above kernels. In my case, the problem is not a soft lockup in the
> kernel, but it does result in an application lockup due to all threads
> waiting for futexes.
>
> For me this problem began to appear once I upgraded my Debian
> squeeze/testing x86_64 installation (AMD) to a new kernel. I'm not
> sure what the prior kernel version was. The same software running on
> different machines with earlier kernels (lenny) does not seem to
> experience the problem.
>
> I'm really not sure if this is a libc or kernel problem, but due to
> the stack trace, which shows what appears to be a hang on the internal
> __lock of the condition variable, it appears likely this is not an
> application bug. Memory does not appear to be corrupt (I store
> sentinels around the mutexes, and they have retained their values).
>
> It appears that the cond var's __lock indicates there are waiters
> even though there are/should-be none (assuming I'm interpreting the
> __lock value of 2 correctly). Since the __lock in question is a futex
> primitive, and it must be held regardless of other libc/nptl state
> variables, I don't believe this is a libc problem.
>
> The problem occurs rarely, but inevitably, and sometimes only after
> several hours of normal program operation. I have not yet
> successfully created a reduced test program that can faithfully
> reproduce the hang in a short timeframe.
>
> The application contains a thread pool where threads perform many
> operations between pthread calls but can be summarized as one of three
> cases below. Due to the design of the thread pool, threads
> round-robin or at least are randomly assigned a workload (in contrast
> to having one constant broadcast thread).
>
> case 1: while(1){ *A* pthread_lock();pthread_unlock();}
> case 2: pthread_lock();pthread_cond_wait();pthread_unlock();
> case 3: pthread_lock(); *B* pthread_cond_broadcast();pthread_unlock();
>
> The application becomes hung with all threads but one stuck at *A*,
> and one thread at *B*.
>
> The stack trace and other details appear below. I've saved the core
> file in case I can provide additional information.
>
>
> $ uname -a
> Linux UK22 2.6.30-2-amd64 #1 SMP Fri Sep 25 22:16:56 UTC 2009 x86_64
> GNU/Linux
Hmm, thanks for reporting this here.
Adding futex experts to Cc...
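
For anyone trying to reproduce this, the three cases quoted above could be
sketched roughly as follows. This is only an illustration, not the
reporter's code: it assumes pthread_lock()/pthread_unlock() are shorthand
for pthread_mutex_lock()/pthread_mutex_unlock() on a single shared mutex,
and the names run_stress, locker, waiter, generation and stop are all
hypothetical. It is not known to trigger the hang.

```c
#include <assert.h>
#include <pthread.h>

/* All names here are hypothetical; this is not the reporter's code. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static unsigned long generation; /* bumped before each broadcast */
static int stop;                 /* set once the run is over */

/* case 1: lock and unlock in a loop (point *A* in the report) */
static void *locker(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&lock);
        int done = stop;
        pthread_mutex_unlock(&lock);
        if (done)
            break;
    }
    return NULL;
}

/* case 2: lock, wait on the condition variable, unlock */
static void *waiter(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    while (!stop) {
        unsigned long seen = generation;
        /* Loop on a predicate to tolerate spurious wakeups. */
        while (generation == seen && !stop)
            pthread_cond_wait(&cond, &lock);
    }
    pthread_mutex_unlock(&lock);
    return NULL;
}

/*
 * case 3 runs in the calling thread: lock, broadcast, unlock.
 * Returns the number of threads started and joined, or -1 on bad
 * arguments or if a thread could not be created.
 */
int run_stress(int n_lockers, int n_waiters, unsigned long broadcasts)
{
    enum { MAX_THREADS = 64 };
    pthread_t tid[MAX_THREADS];
    int n = 0;

    if (n_lockers < 0 || n_waiters < 0 ||
        n_lockers > MAX_THREADS || n_waiters > MAX_THREADS ||
        n_lockers + n_waiters > MAX_THREADS)
        return -1;
    int total = n_lockers + n_waiters;

    generation = 0;
    stop = 0;

    for (int i = 0; i < total; i++) {
        void *(*fn)(void *) = (i < n_lockers) ? locker : waiter;
        if (pthread_create(&tid[n], NULL, fn, NULL) != 0)
            break;
        n++;
    }

    for (unsigned long b = 0; b < broadcasts; b++) {
        pthread_mutex_lock(&lock);
        generation++;
        pthread_cond_broadcast(&cond); /* point *B* in the report */
        pthread_mutex_unlock(&lock);
    }

    /* Tell every thread to finish, waking any that are still waiting. */
    pthread_mutex_lock(&lock);
    stop = 1;
    pthread_cond_broadcast(&cond);
    pthread_mutex_unlock(&lock);

    for (int i = 0; i < n; i++) {
        int rc = pthread_join(tid[i], NULL);
        assert(rc == 0);
        (void)rc;
    }
    return (n == total) ? n : -1;
}
```

Calling run_stress(2, 2, 1000) from a main() and building with -pthread
should return promptly on a healthy system; a run that never returns would
be the interesting case. The real application does much more work between
the pthread calls, so timing will differ considerably.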
--
Live like a child, think like the god.