From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Paul E. McKenney" Subject: Re: bugsplat 3.6.7-rt18 Date: Fri, 30 Nov 2012 17:59:32 -0800 Message-ID: <20121201015932.GP2474@linux.vnet.ibm.com> References: <201212010050.59563.tim01@vlsi.informatik.tu-darmstadt.de> <50B94C0D.1010809@am.sony.com> Reply-To: paulmck@linux.vnet.ibm.com Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: Tim Sander , "linux-rt-users@vger.kernel.org" To: Frank Rowand Return-path: Received: from e39.co.us.ibm.com ([32.97.110.160]:39311 "EHLO e39.co.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750839Ab2LAB7g (ORCPT ); Fri, 30 Nov 2012 20:59:36 -0500 Received: from /spool/local by e39.co.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Fri, 30 Nov 2012 18:59:35 -0700 Received: from d03relay04.boulder.ibm.com (d03relay04.boulder.ibm.com [9.17.195.106]) by d03dlp03.boulder.ibm.com (Postfix) with ESMTP id AECFC19D8046 for ; Fri, 30 Nov 2012 18:59:34 -0700 (MST) Received: from d03av02.boulder.ibm.com (d03av02.boulder.ibm.com [9.17.195.168]) by d03relay04.boulder.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id qB11xYZ0359546 for ; Fri, 30 Nov 2012 18:59:34 -0700 Received: from d03av02.boulder.ibm.com (loopback [127.0.0.1]) by d03av02.boulder.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id qB11xXtd007026 for ; Fri, 30 Nov 2012 18:59:34 -0700 Content-Disposition: inline In-Reply-To: <50B94C0D.1010809@am.sony.com> Sender: linux-rt-users-owner@vger.kernel.org List-ID: On Fri, Nov 30, 2012 at 04:15:09PM -0800, Frank Rowand wrote: > Adding Paul to the cc: list, for rcu... > > -Frank > > On 11/30/12 15:50, Tim Sander wrote: > > Hi > > > > After Frank posted a patch i managed the 3.6.7-rt18 kernel to boot: > > There are some local platform modifications for arm mach-pcm043 but > > nothing which should cause this. The kernel got renamed so that the > > hw debugger works. HR_TIMERS are disabled in my config. > > > > The kernel continues to work after that but it fails later on more below on that. > > > > Best regards > > Tim > > > > PS: This mail went accidentially to lkml and i was wondering that it is not > > reaching the linux-rt-users mailing list. But as the s/n ratio is much better > > on this list, i just resend it over here... > > > > NET: Registered protocol family 16 > > DMA: preallocated 256 KiB pool for atomic coherent allocations > > > > ================================= > > [ INFO: inconsistent lock state ] > > 2.6.42pmx-rt18-00015-gdc97b0a-dirty #48 Not tainted <-- this is really 3.6.7-rt18 changed for hw debugger > > --------------------------------- > > inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage. > > kworker/u:0/13 [HC1[1]:SC0[0]:HE0:SE1] takes: > > (rcu_kthread_wq.lock.lock.wait_lock){?.+...}, at: [] rt_spin_lock_slowlock+0x44/0x208 > > {HARDIRQ-ON-W} state was registered at: > > [] mark_lock+0x274/0x3c8 > > [] mark_irqflags+0x11c/0x1a4 > > [] __lock_acquire+0x5dc/0x814 > > [] lock_acquire+0x68/0x7c > > [] _raw_spin_lock+0x40/0x50 > > [] rt_spin_lock_slowlock+0x44/0x208 > > [] rt_spin_lock+0x1c/0x44 > > [] prepare_to_wait+0x28/0x70 > > [] rcu_kthread+0x68/0x1b0 > > [] kthread+0x90/0x9c > > [] do_exit+0x0/0x2e0 > > irq event stamp: 46 > > hardirqs last enabled at (45): [] _raw_spin_unlock_irqrestore+0x44/0x7c > > hardirqs last disabled at (46): [] __irq_svc+0x34/0x98 > > softirqs last enabled at (0): [] copy_process+0x1f4/0xb78 > > softirqs last disabled at (0): [< (null)>] (null) > > > > other info that might help us debug this: > > Possible unsafe locking scenario: > > > > CPU0 > > ---- > > lock(rcu_kthread_wq.lock.lock.wait_lock); > > > > lock(rcu_kthread_wq.lock.lock.wait_lock); ??? These are all spin_lock_irqsave() calls, so how would the interrupt happen? Or do these need to be changed to raw_spinlock_t? Hmmm... In include/linux/wait.h, doesn't struct __wait_queue_head need to have raw_spinlock_t rather than spinlock_t? Please try that change and let me know what happens. Thanx, Paul > > *** DEADLOCK *** > > > > 2 locks held by kworker/u:0/13: > > #0: (&p->alloc_lock){+.+...}, at: [] mm_release+0xd0/0xf8 > > #1: (&(&(&q->lock)->lock)->wait_lock){+.+...}, at: [] rt_spin_lock_slowunlock+0x14/0x70 > > > > stack backtrace: > > Backtrace: > > [] (dump_backtrace+0x0/0x110) from [] (dump_stack+0x18/0x1c) > > r6:c041e414 r5:c78735d0 r4:c7873200 r3:c7873200 > > [] (dump_stack+0x0/0x1c) from [] (print_usage_bug+0x180/0x1c8) > > [] (print_usage_bug+0x0/0x1c8) from [] (mark_lock_irq+0xc0/0x26c) > > [] (mark_lock_irq+0x0/0x26c) from [] (mark_lock+0x2a0/0x3c8) > > [] (mark_lock+0x0/0x3c8) from [] (mark_irqflags+0x84/0x1a4) > > [] (mark_irqflags+0x0/0x1a4) from [] (__lock_acquire+0x5dc/0x814) > > r5:c7873200 r4:c036499c > > [] (__lock_acquire+0x0/0x814) from [] (lock_acquire+0x68/0x7c) > > [] (lock_acquire+0x0/0x7c) from [] (_raw_spin_lock+0x40/0x50) > > r6:00000003 r5:c0259720 r4:c036498c > > [] (_raw_spin_lock+0x0/0x50) from [] (rt_spin_lock_slowlock+0x44/0x208) > > r5:c7873200 r4:c036498c > > [] (rt_spin_lock_slowlock+0x0/0x208) from [] (rt_spin_lock+0x1c/0x44) > > [] (rt_spin_lock+0x0/0x44) from [] (__wake_up+0x24/0x4c) > > r5:00000001 r4:c036498c > > [] (__wake_up+0x0/0x4c) from [] (invoke_rcu_callbacks+0x34/0x40) > > r7:00000000 r6:00000000 r5:00000000 r4:c7873200 > > [] (invoke_rcu_callbacks+0x0/0x40) from [] (rcu_check_callbacks+0x8c/0xc8) > > [] (rcu_check_callbacks+0x0/0xc8) from [] (update_process_times+0x48/0x54) > > r4:c7873200 r3:c7873200 > > [] (update_process_times+0x0/0x54) from [] (tick_periodic+0xbc/0xd4) > > r6:c0360740 r5:c78090c0 r4:c037cd24 r3:20000013 > > [] (tick_periodic+0x0/0xd4) from [] (tick_handle_periodic+0x20/0x90) > > r4:c03607c0 r3:c7874000 > > [] (tick_handle_periodic+0x0/0x90) from [] (mxc_timer_interrupt+0x28/0x38) > > [] (mxc_timer_interrupt+0x0/0x38) from [] (handle_irq_event_percpu+0x60/0x1b0) > > [] (handle_irq_event_percpu+0x0/0x1b0) from [] (handle_irq_event+0x44/0x64) > > [] (handle_irq_event+0x0/0x64) from [] (handle_level_irq+0xd0/0xe8) > > r6:c7875e78 r5:c7809118 r4:c78090c0 r3:00020000 > > [] (handle_level_irq+0x0/0xe8) from [] (generic_handle_irq+0x28/0x38) > > r5:00000000 r4:0000002d > > [] (generic_handle_irq+0x0/0x38) from [] (handle_IRQ+0x68/0x8c) > > r4:0000002d r3:00000050 > > [] (handle_IRQ+0x0/0x8c) from [] (avic_handle_irq+0x3c/0x48) > > r5:0000ffff r4:c0375c10 > > [] (avic_handle_irq+0x0/0x48) from [] (__irq_svc+0x44/0x98) > > Exception stack(0xc7875e78 to 0xc7875ec0) > > 5e60: 00000001 c78735b8 > > 5e80: 00000000 00000000 20000013 c7840388 20000013 0000000f 00000008 00000000 > > 5ea0: 00000000 c7875ed4 c7875e58 c7875ec0 c005129c c025a5f8 20000013 ffffffff > > r6:ffffffff r5:20000013 r4:c025a5f8 r3:c7873200 > > [] (_raw_spin_unlock_irqrestore+0x0/0x7c) from [] (try_to_wake_up+0x140/0x164) > > r5:c7840388 r4:00000001 > > [] (try_to_wake_up+0x0/0x164) from [] (wake_up_lock_sleeper+0x18/0x1c) > > r8:80000013 r7:00000001 r6:c7847cc8 r5:c7874000 r4:c7847dcc > > r3:00000001 > > [] (wake_up_lock_sleeper+0x0/0x1c) from [] (rt_mutex_wake_waiter+0x20/0x30) > > [] (rt_mutex_wake_waiter+0x0/0x30) from [] (wakeup_next_waiter+0xa4/0xac) > > [] (wakeup_next_waiter+0x0/0xac) from [] (rt_spin_lock_slowunlock+0x5c/0x70) > > r8:00000000 r7:00000013 r6:00000000 r5:c7874000 r4:c7847dcc > > r3:c7847dec > > [] (rt_spin_lock_slowunlock+0x0/0x70) from [] (rt_spin_unlock+0x28/0x2c) > > r5:c7847dc8 r4:c7847dcc > > [] (rt_spin_unlock+0x0/0x2c) from [] (complete+0x4c/0x54) > > r4:c7847dcc r3:00000000 > > [] (complete+0x0/0x54) from [] (mm_release+0xe8/0xf8) > > r5:c78734d4 r4:c7873200 > > [] (mm_release+0x0/0xf8) from [] (exit_mm+0x20/0x110) > > r6:00000001 r5:c7873200 r4:00000000 > > [] (exit_mm+0x0/0x110) from [] (do_exit+0x1c0/0x2e0) > > r7:00000013 r6:00000001 r5:00000000 r4:c7873200 > > [] (do_exit+0x0/0x2e0) from [] (unlock_slab_and_free_delayed.clone.47+0xc0/0xd0) > > L210 cache controller enabled > > l2x0: 8 ways, CACHE_ID 0x4100004b, AUX_CTRL 0x00030024, Cache size: 131072 B > > bio: create slab at 0 > > SCSI subsystem initialized > > Switching to clocksource mxc_timer1 > > NET: Registered protocol family 2 > > TCP established hash table entries: 4096 (order: 3, 32768 bytes) > > TCP bind hash table entries: 4096 (order: 6, 360448 bytes) > > TCP: Hash tables configured (established 4096 bind 4096) > > TCP: reno registered > > UDP hash table entries: 256 (order: 3, 49152 bytes) > > UDP-Lite hash table entries: 256 (order: 3, 49152 bytes) > > NET: Registered protocol family 1 > > msgmni has been set to 236 > > io scheduler noop registered (default) > > start plist test > > > > --- snip > > > > > > Later on the kernel locks up without any output on serial console. The hardware debugger gives me the following call > > stack: > > Transcribed by hand after the panic stuff: > > > > BUG_ON(rt_mutex_owner(lock) == self)) > > rt_spint_lock_fastlock(... > > __wake_up( > > invoke_rcu_callbacks > > rcu_check_callbacks > > update_process_times > > tick_periodic > > tick_handle_periodic > > handle_irq_event_percpu > > handle_irq_event > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > > > . > > > >