From mboxrd@z Thu Jan 1 00:00:00 1970 From: Karsten Wiese Subject: Re: [ANNOUNCE] 3.0.14-rt31 Date: Wed, 11 Jan 2012 00:56:20 +0100 Message-ID: <201201110056.21589.fzuuzf@googlemail.com> References: <1324525237.5916.114.camel@gandalf.stny.rr.com> <1326221586.7642.47.camel@gandalf.stny.rr.com> Mime-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-15" Content-Transfer-Encoding: 7bit Cc: Georgiewskiy Yuriy , LKML , RT , Thomas Gleixner , Clark Williams , John Kacur To: Steven Rostedt Return-path: Received: from mail-ee0-f46.google.com ([74.125.83.46]:42903 "EHLO mail-ee0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752714Ab2AJX42 (ORCPT ); Tue, 10 Jan 2012 18:56:28 -0500 In-Reply-To: <1326221586.7642.47.camel@gandalf.stny.rr.com> Sender: linux-rt-users-owner@vger.kernel.org List-ID: Am Dienstag 10 Januar 2012 schrieb Steven Rostedt: > On Sat, 2011-12-24 at 01:02 +0100, Karsten Wiese wrote: > > Hi Steven, > > below trace shows regularly here: > > > > [ 3560.172428] BUG: sleeping function called from invalid context at > > kernel/rtmutex.c:645 > > [ 3560.172431] in_atomic(): 1, irqs_disabled(): 1, pid: 28, name: irq/9-acpi > > [ 3560.172434] 1 lock held by irq/9-acpi/28: > > [ 3560.172436] #0: (acpi_gbl_gpe_lock){+.+...}, at: [] > > acpi_ev_gpe_detect+0x29/0x12f > > [ 3560.172447] irq event stamp: 9680 > > [ 3560.172449] hardirqs last enabled at (9679): [] > > _raw_spin_unlock_irq+0x27/0x48 > > [ 3560.172455] hardirqs last disabled at (9680): [] > > _raw_spin_lock_irqsave+0x1c/0x82 > > [ 3560.172460] softirqs last enabled at (0): [] > > copy_process+0x530/0x1086 > > [ 3560.172464] softirqs last disabled at (0): [< (null)>] (null) > > [ 3560.172469] Pid: 28, comm: irq/9-acpi Not tainted > > 3.0.14-1.rt31.1.fc16.ccrma.i686.rt #1 > > [ 3560.172471] Call Trace: > > [ 3560.172476] [] __might_sleep+0xf4/0xfb > > [ 3560.172479] [] rt_spin_lock+0x1f/0x56 > > [ 3560.172483] [] __local_lock_irq+0x1e/0x5b > > [ 3560.172486] [] __local_lock_irqsave+0x19/0x27 > > [ 3560.172490] [] kmem_cache_alloc_trace+0x67/0xf5 > > [ 3560.172493] [] ? acpi_os_allocate_zeroed+0x2f/0x2f > > [ 3560.172497] [] __acpi_os_execute+0x66/0x15b > > [ 3560.172501] [] ? acpi_os_allocate_zeroed+0x2f/0x2f > > [ 3560.172504] [] acpi_os_execute+0x17/0x19 > > [ 3560.172508] [] acpi_ev_gpe_dispatch+0xe4/0x13d > > [ 3560.172511] [] acpi_ev_gpe_detect+0xef/0x12f > > [ 3560.172516] [] acpi_ev_sci_xrupt_handler+0x1a/0x20 > > [ 3560.172519] [] acpi_irq+0x13/0x2e > > [ 3560.172522] [] irq_forced_thread_fn+0x1d/0x36 > > [ 3560.172525] [] irq_thread+0xc0/0x1a0 > > [ 3560.172529] [] ? migrate_enable+0x124/0x133 > > [ 3560.172532] [] ? irq_thread_fn+0x2c/0x2c > > [ 3560.172535] [] ? irq_finalize_oneshot+0x94/0x94 > > [ 3560.172539] [] kthread+0x76/0x7b > > [ 3560.172544] [] ? trace_hardirqs_on_thunk+0xc/0x10 > > [ 3560.172548] [] ? restore_all+0xf/0xf > > [ 3560.172551] [] ? __init_kthread_worker+0x67/0x67 > > [ 3560.172555] [] kernel_thread_helper+0x6/0x10 > > Seems the back traces that have been currently reported have been for > i386. Although Clark Williams has been saying he's been seeing it on > x86_64, but only when he does a suspend and resume on his laptop. > > Does this just happen randomly? Or do you do something in particular > when this happens, (like a suspend and resume)? It happens rythmically on a hp compaq 6710s laptop bought in ~2007. BIOS updated in 2011. With kernels 3.0.14-1.rt31.1.fc16.ccrma.i686 and .x86_64 and self built i386 3.0.14.rt31. rt25 is ok on the hp compaq 6710s except for occasional ext3 corruption past hibernate/resume, which also happens with latest fedora16 stock kernel. My understanding is the bug above is caused by kmalloc being called from __acpi_os_execute under the raw_spin_lock_irqsave(&acpi_gbl_gpe_lock, flags) aquired in acpi_ev_gpe_detect: The bug triggers depending on acpi implementation. If __acpi_os_execute isn't needed, it doesn't. In rt25 acpi_gbl_gpe_lock was a mutex. I didn't notice any bad behaviour on the laptop except the hibernate/resume issue. Is there an easy way to trigger the bug resulting in the change of acpi_gbl_gpe_lock to a raw_spin_lock? Could rt31 be fixed for the laptop by preallocating the struct acpi_os_dpc instances it allocates in __acpi_os_execute? How many needed to be preallocated then? Maybe the queue_work_on(0,...) called in __acpi_os_execute wouldn't be needed in PREEMPT_RT if the acpi-irq thread would be bound to cpu 0? Dunno... Thanks, Karsten