From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1756437Ab0A1Rq4 (ORCPT ); Thu, 28 Jan 2010 12:46:56 -0500
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S1756370Ab0A1Rqy (ORCPT ); Thu, 28 Jan 2010 12:46:54 -0500
Received: from thinktradellc.com ([66.17.177.171]:23071
	"EHLO old.thinktradellc.com" rhost-flags-OK-OK-OK-FAIL)
	by vger.kernel.org with ESMTP id S1756369Ab0A1Rqx (ORCPT );
	Thu, 28 Jan 2010 12:46:53 -0500
Message-ID: <4B61CD8A.50601@memeplex.com>
Date: Thu, 28 Jan 2010 12:46:50 -0500
From: Andrew Athan
User-Agent: Thunderbird 2.0.0.23 (Macintosh/20090812)
MIME-Version: 1.0
To: Darren Hart
CC: Andrew Athan, Américo Wang, Peter Zijlstra,
	linux-kernel@vger.kernel.org, Thomas Gleixner, Ingo Molnar,
	Gong Cheng
Subject: Re: Futex hang/lockup problem in 2.6.30+ on AMD64
References: <4B4C3E4F.9060001@memeplex.com> <20100112145213.GB3925@hack>
	<1263308127.4244.142.camel@laptop>
	<2375c9f91001120700r4c2e1e05l5e5be3ddc6a13da2@mail.gmail.com>
	<4B4CA27F.1060102@memeplex.com> <4B568C3A.4080301@memeplex.com>
	<4B574359.9080308@us.ibm.com>
In-Reply-To: <4B574359.9080308@us.ibm.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Darren Hart wrote:
> Andrew Athan wrote:
>> Andrew Athan wrote:
>>> Américo Wang wrote:
>>>> On Tue, Jan 12, 2010 at 10:55 PM, Peter Zijlstra wrote:
>>>>
>>>>> On Tue, 2010-01-12 at 22:52 +0800, Américo Wang wrote:
>>>>>
>>>>>>> $ uname -a
>>>>>>> Linux UK22 2.6.30-2-amd64 #1 SMP Fri Sep 25 22:16:56 UTC 2009 x86_64 GNU/Linux
>>>>>>>
>>>>> Does a recent kernel work?
>>>>>
>>>>
>>>> Ah, I just wanted to ask the same question, adding the original reporter
>>>> Gong Cheng into Cc...
>>>>
>>>> Gong, could you reproduce it on the latest kernel?
>>>> And what is your .config?
>>>>
>>>> Thanks!
>>>
>>> Due to remote location of the hardware and I haven't been able to
>>> test a more recent (or older) kernel. Remote hands have put a KVM
>>> on the box as of an hour ago, so I hope to have some information for
>>> you in a day or two.
>>>
>>> A.
>>
>> I wanted to report that although I have had no luck (so far) running
>> anything more recent than 2.6.30, I was able to revert to 2.6.26.
>> Unfortunately, the application hang still occurs. I also saw a
>> similar hang of the application running on a 32 bit Intel box, also
>> under 2.6.26. So far, the hang *always* involves threads stuck on
>> pthread_cond_broadcast()'s condition variable's internal lock while
>> other threads are waiting on the outer "public" lock.
>
> Are you using real-time scheduling policy or priority inheritance
> (PTHREAD_PRIO_INHERIT)? It is possible to suffer an unbounded priority
> inversion on the internal condvar data lock in the current distro
> implementations of glibc.
>
>> These other threads are *not* yet (nor about to)
>> pthread_cond_wait(). I saw a message from Darren Hart (subject "Re:
>> Problems with futex") in response to someone who apparently was
>> having futex problems in 2.6.27, so I'm still operating under the
>> assumption that this is not an application bug.
>
> Those all turned out to be application issues with one exception which
> had already been fixed upstream.
>
>> Over the next couple of days, I will be running a version of the
>> application in which I replaced the pthread_cond calls with simpler
>> locks, in the hopes that it won't hang (because I'm hoping the
>> underlying implementation in pthreads uses a different set of futex
>> opcodes).
>>
>> Andrew Athan
>

I wanted to report that this application hang is certainly related to
the pthread_cond_* calls. With them in place, it consistently hangs.
Without them, it consistently does not. Whether pthread_cond_* is
misbehaving due to memory corruption or another application bug is, I
suppose, an open question.

We have now experienced several lockups where even a kill -9 of the
application won't get rid of it. Does this say anything about the
nature of the hang?

By the way, majordomo stopped sending me emails as of 1/17, so I have
not seen any updates to this thread sent after that date. I'm not sure
why this happened, as I never asked to be unsubscribed. I've
resubscribed, but I'm not sure I will get anything. Please make sure I
am directly cc:ed on any responses.

carlinux138:~# uname -a
Linux carlinux138.thinktradellc.com 2.6.26-2-686 #1 SMP Sun Jun 21 04:57:38 UTC 2009 i686 GNU/Linux

(I still have to look up the best way to give a system config
snapshot, e.g., all major library versions, etc.)

Thanks,
Andrew Athan
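
For the system config snapshot mentioned above, one possible approach is a small script that prints the kernel, libc, and threading-library versions in one go. This is only a sketch and not something from the thread: the function name `system_snapshot` is hypothetical, and it assumes a glibc-based Linux system where the `CS_GNU_LIBC_VERSION` and `CS_GNU_LIBPTHREAD_VERSION` confstr names are available. On other platforms it falls back to "unknown".

```python
import os
import platform


def system_snapshot():
    """Collect kernel, libc, and threading-library versions as a dict.

    Hypothetical helper for bug reports; not part of any existing tool.
    """
    uname = platform.uname()
    info = {
        # Roughly the same information that `uname -a` reports.
        "kernel": f"{uname.system} {uname.release} {uname.version}",
        "machine": uname.machine or "unknown",
    }
    # These confstr names are glibc-specific (an assumption about the
    # target system); skip any that the platform does not define.
    for key, name in (("libc", "CS_GNU_LIBC_VERSION"),
                      ("libpthread", "CS_GNU_LIBPTHREAD_VERSION")):
        value = None
        if hasattr(os, "confstr") and name in os.confstr_names:
            try:
                value = os.confstr(name)
            except (OSError, ValueError):
                value = None
        info[key] = value or "unknown"
    return info


if __name__ == "__main__":
    for key, value in system_snapshot().items():
        print(f"{key}: {value}")
```

The libpthread entry is probably the most useful one for this report, since it identifies the threading implementation (and its version) that provides the pthread_cond_* calls.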