From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751164AbVHQQWq (ORCPT ); Wed, 17 Aug 2005 12:22:46 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1751165AbVHQQWq (ORCPT ); Wed, 17 Aug 2005 12:22:46 -0400 Received: from mx2.elte.hu ([157.181.151.9]:62947 "EHLO mx2.mail.elte.hu") by vger.kernel.org with ESMTP id S1751164AbVHQQWp (ORCPT ); Wed, 17 Aug 2005 12:22:45 -0400 Date: Wed, 17 Aug 2005 18:23:24 +0200 From: Ingo Molnar To: Steven Rostedt Cc: Thomas Gleixner , linux-kernel@vger.kernel.org Subject: Re: 2.6.13-rc6-rt6 Message-ID: <20050817162324.GA24495@elte.hu> References: <20050816170805.GA12959@elte.hu> <1124214647.5764.40.camel@localhost.localdomain> <1124215631.5764.43.camel@localhost.localdomain> <1124218245.5764.52.camel@localhost.localdomain> <1124252419.5764.83.camel@localhost.localdomain> <1124257580.5764.105.camel@localhost.localdomain> <20050817064750.GA8395@elte.hu> <1124287505.5764.141.camel@localhost.localdomain> <1124288677.5764.154.camel@localhost.localdomain> <1124295214.5764.163.camel@localhost.localdomain> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1124295214.5764.163.camel@localhost.localdomain> User-Agent: Mutt/1.4.2.1i X-ELTE-SpamScore: 0.0 X-ELTE-SpamLevel: X-ELTE-SpamCheck: no X-ELTE-SpamVersion: ELTE 2.0 X-ELTE-SpamCheck-Details: score=0.0 required=5.9 tests=AWL autolearn=disabled SpamAssassin version=3.0.4 0.0 AWL AWL: From: address is in the auto white-list X-ELTE-VirusStatus: clean Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org * Steven Rostedt wrote: > On Wed, 2005-08-17 at 10:24 -0400, Steven Rostedt wrote: > > > OK the output from netconsole still seems like netconsole itself is > > causing some problems. But I think it is also showing this lockup. I'll > > recompile my kernel as UP and see if netconsole works fine. > > Well, the UP kernel boots on my laptop, but netconsole gives strange > warnings. > > OK, what's the scoop with the illegal_API_call? What is it about, and > what is the expected work around? this is a recent change: i've started flagging "naked" use of local_irq_disable(), because it's a problem on PREEMPT_RT and it's a potential SMP bug on upstream kernels. A local_irq_disable() is converted either to raw_local_irq_disable() when justified (it's mostly only justified for lowlevel arch code), or is eliminated totally. (either by merging it into a nearby spin_lock API call, or by removing it altogether, making sure that code doesnt break). Right now we print a warning on the first such API use, and then shut up about it. All local_irq_() APIs map to NOPs. (we keep the PF_IRQSOFF flag for compatibility, but only to get irqs_off() right which in turn shuts off a number of BUG_ON(!irqs_disabled()) warnings, and it doesnt have any other functional purpose.) the desired end-result would be the total elimination of local_irq_*() API calls. > I'm also getting the following output on shutdown: > > NET: Registered protocol family 31 > Bluetooth: HCI device and connection manager initialized > Bluetooth: HCI socket layer initialized > Bluetooth: L2CAP ver 2.7 > Bluetooth: L2CAP socket layer initialized > Bluetooth: RFCOMM ver 1.5 > Bluetooth: RFCOMM socket layer initialized > Bluetooth: RFCOMM TTY layer initialized > BUG: nonzero lock count 1 at exit time? > nfsd: 4696 [f7183830, 115] > [] check_no_held_locks+0x62/0x330 (8) > [] do_exit+0x257/0x480 (32) > [] __module_put_and_exit+0x52/0x70 (40) > [] nfsd+0x2b3/0x340 [nfsd] (12) > [] nfsd+0x0/0x340 [nfsd] (48) > [] kernel_thread_helper+0x5/0x18 (16) > --------------------------- > | preempt count: 00000000 ] > | 0-level deep critical section nesting: > ---------------------------------------- > > ------------------------------ > | showing all locks held by: | (nfsd/4696 [f7183830, 116]): > ------------------------------ > > #001: [c038e184] {kernel_sem.lock} > ... acquired at: lock_kernel+0x21/0x40 > > BUG: nfsd/4696, BKL held at task exit time! hm, it seems nfsd forgets to do an unlock_kernel() in some exit path it seems? We are enforcing strict balanced lock use in PREEMPT_RT - the upstream kernel is more relaxed about it. > And it goes on and on. This happens everytime. Without netconsole, I > only get the nonzero lock count error. Also, one of my lockups on SMP > had to do with the kernel_thread_helper: > > Using IPI Shortcut mode > khelper/794[CPU#0]: BUG in set_new_owner at kernel/rt.c:916 this is a 'must not happen'. Somehow lock->held list got non-empty. Maybe some use-after-free thing? Havent seen it myself. Ingo