From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751699AbbCKXBl (ORCPT ); Wed, 11 Mar 2015 19:01:41 -0400 Received: from e33.co.us.ibm.com ([32.97.110.151]:33791 "EHLO e33.co.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751056AbbCKXBj (ORCPT ); Wed, 11 Mar 2015 19:01:39 -0400 Date: Wed, 11 Mar 2015 16:01:34 -0700 From: "Paul E. McKenney" To: Sasha Levin Cc: Josh Triplett , Steven Rostedt , Mathieu Desnoyers , Lai Jiangshan , LKML , Dave Jones Subject: Re: rcu: frequent rcu lockups Message-ID: <20150311230133.GN5412@linux.vnet.ibm.com> Reply-To: paulmck@linux.vnet.ibm.com References: <55009E2C.7070100@oracle.com> <20150311201704.GL5412@linux.vnet.ibm.com> <5500A329.9060509@oracle.com> <20150311204108.GM5412@linux.vnet.ibm.com> <5500BF51.5000806@oracle.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <5500BF51.5000806@oracle.com> User-Agent: Mutt/1.5.21 (2010-09-15) X-TM-AS-MML: disable X-Content-Scanned: Fidelis XPS MAILER x-cbid: 15031123-0009-0000-0000-00000968F45A Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Mar 11, 2015 at 06:18:57PM -0400, Sasha Levin wrote: > On 03/11/2015 04:41 PM, Paul E. McKenney wrote: > > Aha, I forgot to update rcu/next. I have now updated it, so it should > > make it there today or tomorrow. In the meantime, does the following > > commit help? > > > > Also, how quickly does your test setup reproduce this? > > I usually hit it within an hour. I clearly need to work more on rcutorture -- took tens of hours to hit it there. > With the commit I didn't hit it yet, but I do see 4 different WARNings: I wish that I could say that I am surprised, but the sad fact is that I am still shaking the bugs out. > [ 1002.577249] WARNING: CPU: 19 PID: 22311 at kernel/rcu/tree.c:2190 rcu_report_qs_rnp+0x41e/0x590() I haven't seen this particular one, but it is another likely symptom. > [ 1002.582438] Modules linked in: > [ 1002.582438] CPU: 19 PID: 22311 Comm: trinity-c58 Not tainted 4.0.0-rc3-next-20150311-sasha-00035-g0a45b48 #2051 > [ 1002.582438] ffff88047f200000 00000000727b89c5 ffff880672807cc8 ffffffff9ea97ec9 > [ 1002.582438] 0000000000000000 0000000000000000 ffff880672807d18 ffffffff941e242a > [ 1002.582438] 0000000000000282 ffffffff943339be dffffc0000000000 ffffffffa1721000 > [ 1002.582438] Call Trace: > [ 1002.582438] dump_stack (lib/dump_stack.c:52) > [ 1002.582438] warn_slowpath_common (kernel/panic.c:447) > [ 1002.582438] ? rcu_report_qs_rnp (kernel/rcu/tree.c:2190 (discriminator 3)) > [ 1002.582438] warn_slowpath_null (kernel/panic.c:481) > [ 1002.582438] rcu_report_qs_rnp (kernel/rcu/tree.c:2190 (discriminator 3)) > [ 1002.582438] rcu_process_callbacks (kernel/rcu/tree.c:2288 kernel/rcu/tree.c:2324 kernel/rcu/tree.c:2816 kernel/rcu/tree.c:2849) > [ 1002.582438] ? __lock_is_held (kernel/locking/lockdep.c:3518) > [ 1002.582438] __do_softirq (kernel/softirq.c:273 include/linux/jump_label.h:114 include/trace/events/irq.h:126 kernel/softirq.c:274) > [ 1002.582438] irq_exit (kernel/softirq.c:350 kernel/softirq.c:391) > [ 1002.582438] smp_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:918) > [ 1002.582438] apic_timer_interrupt (arch/x86/kernel/entry_64.S:920) > [ 1002.582438] ? __lock_is_held (kernel/locking/lockdep.c:3518) > [ 1002.582438] ? lock_acquire (./arch/x86/include/asm/paravirt.h:809 kernel/locking/lockdep.c:3605) > [ 1002.582438] ? rmap_walk (mm/rmap.c:1569 mm/rmap.c:1594) > [ 1002.582438] down_read (./arch/x86/include/asm/rwsem.h:65 kernel/locking/rwsem.c:24) > [ 1002.582438] ? rmap_walk (mm/rmap.c:1569 mm/rmap.c:1594) > [ 1002.582438] ? up_read (./arch/x86/include/asm/rwsem.h:156 kernel/locking/rwsem.c:81) > [ 1002.582438] ? page_remove_rmap (mm/rmap.c:1209) > [ 1002.582438] rmap_walk (mm/rmap.c:1569 mm/rmap.c:1594) > [ 1002.582438] try_to_munlock (mm/rmap.c:1446) > [ 1002.582438] ? try_to_unmap (mm/rmap.c:1446) > [ 1002.582438] ? page_remove_rmap (mm/rmap.c:1209) > [ 1002.582438] ? invalid_migration_vma (mm/rmap.c:1382) > [ 1002.582438] ? anon_vma_prepare (mm/rmap.c:497) > [ 1002.582438] __munlock_isolated_page (mm/mlock.c:132) > [ 1002.582438] __munlock_pagevec (mm/mlock.c:332) > [ 1002.582438] ? __munlock_isolate_lru_page (mm/mlock.c:277) > [ 1002.582438] ? _raw_spin_unlock (./arch/x86/include/asm/preempt.h:95 include/linux/spinlock_api_smp.h:154 kernel/locking/spinlock.c:183) > [ 1002.582438] ? __get_locked_pte (mm/memory.c:1432) > [ 1002.582438] munlock_vma_pages_range (mm/mlock.c:477) > [ 1002.582438] ? munlock_vma_page (mm/mlock.c:424) > [ 1002.582438] ? __lock_acquire (kernel/locking/lockdep.c:2019 kernel/locking/lockdep.c:3184) > [ 1002.582438] ? kfree (mm/slub.c:1300 mm/slub.c:2707 mm/slub.c:3416) > [ 1002.582438] ? __lock_acquire (kernel/locking/lockdep.c:2019 kernel/locking/lockdep.c:3184) > [ 1002.582438] ? vma_merge (mm/mmap.c:1085) > [ 1002.582438] mlock_fixup (mm/mlock.c:549) > [ 1002.582438] do_mlockall (mm/mlock.c:677) > [ 1002.582438] ? do_mlock (mm/mlock.c:658) > [ 1002.582438] sys_munlockall (mm/mlock.c:720) > [ 1002.582438] ia32_do_call (arch/x86/ia32/ia32entry.S:486) > > [ 1034.418016] WARNING: CPU: 0 PID: 8 at kernel/rcu/tree.c:1936 rcu_gp_kthread+0x2a7c/0x4dd0() I have seen this one, tracking it down. > [ 1034.419933] Modules linked in: > [ 1034.420802] CPU: 0 PID: 8 Comm: rcu_preempt Tainted: G W 4.0.0-rc3-next-20150311-sasha-00035-g0a45b48 #2051 > [ 1034.423329] ffff8800496f8000 000000005fe03e34 ffff880049ac7b28 ffffffff9ea97ec9 > [ 1034.426460] 0000000000000000 0000000000000000 ffff880049ac7b78 ffffffff941e242a > [ 1034.426460] 0000000000022000 ffffffff9433cafc ffffffffa1725040 00000001000110de > [ 1034.426460] Call Trace: > [ 1034.426460] dump_stack (lib/dump_stack.c:52) > [ 1034.426460] warn_slowpath_common (kernel/panic.c:447) > [ 1034.426460] ? rcu_gp_kthread (kernel/rcu/tree.c:1936 kernel/rcu/tree.c:2061) > [ 1034.426460] warn_slowpath_null (kernel/panic.c:481) > [ 1034.426460] rcu_gp_kthread (kernel/rcu/tree.c:1936 kernel/rcu/tree.c:2061) > [ 1034.426460] ? plist_check_head (lib/plist.c:62) > [ 1034.426460] ? cond_synchronize_rcu (kernel/rcu/tree.c:1972) > [ 1034.426460] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2554 kernel/locking/lockdep.c:2601) > [ 1034.426460] ? _raw_spin_unlock_irq (./arch/x86/include/asm/preempt.h:95 include/linux/spinlock_api_smp.h:171 kernel/locking/spinlock.c:199) > [ 1034.426460] ? __schedule (kernel/sched/core.c:2803) > [ 1034.426460] ? ___preempt_schedule (arch/x86/lib/thunk_64.S:51) > [ 1034.426460] ? abort_exclusive_wait (kernel/sched/wait.c:292) > [ 1034.426460] ? cond_synchronize_rcu (kernel/rcu/tree.c:1972) > [ 1034.426460] kthread (kernel/kthread.c:207) > [ 1034.426460] ? flush_kthread_work (kernel/kthread.c:176) > [ 1034.426460] ? _raw_spin_unlock_irq (./arch/x86/include/asm/paravirt.h:819 include/linux/spinlock_api_smp.h:170 kernel/locking/spinlock.c:199) > [ 1034.426460] ? flush_kthread_work (kernel/kthread.c:176) > [ 1034.426460] ret_from_fork (arch/x86/kernel/entry_64.S:565) > [ 1034.426460] ? flush_kthread_work (kernel/kthread.c:176) > > [ 1034.501448] WARNING: CPU: 0 PID: 8 at kernel/rcu/tree_plugin.h:479 rcu_gp_kthread+0x1e58/0x4dd0() And this one as well. > [ 1034.505431] Modules linked in: > [ 1034.507242] CPU: 0 PID: 8 Comm: rcu_preempt Tainted: G W 4.0.0-rc3-next-20150311-sasha-00035-g0a45b48 #2051 > [ 1034.509348] ffff8800496f8000 000000005fe03e34 ffff880049ac7b28 ffffffff9ea97ec9 > [ 1034.509348] 0000000000000000 0000000000000000 ffff880049ac7b78 ffffffff941e242a > [ 1034.509348] ffffffffa1723038 ffffffff9433bed8 ffffffffa1723040 ffff880050622000 > [ 1034.509348] Call Trace: > [ 1034.509348] dump_stack (lib/dump_stack.c:52) > [ 1034.509348] warn_slowpath_common (kernel/panic.c:447) > [ 1034.509348] ? rcu_gp_kthread (kernel/rcu/tree_plugin.h:479 kernel/rcu/tree.c:1834 kernel/rcu/tree.c:1993) > [ 1034.509348] warn_slowpath_null (kernel/panic.c:481) > [ 1034.509348] rcu_gp_kthread (kernel/rcu/tree_plugin.h:479 kernel/rcu/tree.c:1834 kernel/rcu/tree.c:1993) > [ 1034.509348] ? plist_check_head (lib/plist.c:62) > [ 1034.509348] ? cond_synchronize_rcu (kernel/rcu/tree.c:1972) > [ 1034.509348] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2554 kernel/locking/lockdep.c:2601) > [ 1034.509348] ? _raw_spin_unlock_irq (./arch/x86/include/asm/preempt.h:95 include/linux/spinlock_api_smp.h:171 kernel/locking/spinlock.c:199) > [ 1034.509348] ? __schedule (kernel/sched/core.c:2803) > [ 1034.509348] ? ___preempt_schedule (arch/x86/lib/thunk_64.S:51) > [ 1034.509348] ? abort_exclusive_wait (kernel/sched/wait.c:292) > [ 1034.509348] ? cond_synchronize_rcu (kernel/rcu/tree.c:1972) > [ 1034.509348] kthread (kernel/kthread.c:207) > [ 1034.509348] ? flush_kthread_work (kernel/kthread.c:176) > [ 1034.509348] ? _raw_spin_unlock_irq (./arch/x86/include/asm/paravirt.h:819 include/linux/spinlock_api_smp.h:170 kernel/locking/spinlock.c:199) > [ 1034.509348] ? flush_kthread_work (kernel/kthread.c:176) > [ 1034.509348] ret_from_fork (arch/x86/kernel/entry_64.S:565) > [ 1034.509348] ? flush_kthread_work (kernel/kthread.c:176) > > [ 2733.830899] WARNING: CPU: 0 PID: 8 at kernel/rcu/tree.c:1935 rcu_gp_kthread+0x2aa6/0x4dd0() And this one, too. > [ 2733.839509] Modules linked in: > [ 2733.839509] CPU: 0 PID: 8 Comm: rcu_preempt Tainted: G W 4.0.0-rc3-next-20150311-sasha-00035-g0a45b48 #2051 > [ 2733.839509] ffff8800496f8000 000000005fe03e34 ffff880049ac7b28 ffffffff9ea97ec9 > [ 2733.839509] 0000000000000000 0000000000000000 ffff880049ac7b78 ffffffff941e242a > [ 2733.839509] 0000000000022000 ffffffff9433cb26 ffffffffa1723040 0000000100038724 > [ 2733.839509] Call Trace: > [ 2733.839509] dump_stack (lib/dump_stack.c:52) > [ 2733.839509] warn_slowpath_common (kernel/panic.c:447) > [ 2733.839509] ? rcu_gp_kthread (kernel/rcu/tree.c:1935 kernel/rcu/tree.c:2061) > [ 2733.839509] warn_slowpath_null (kernel/panic.c:481) > [ 2733.839509] rcu_gp_kthread (kernel/rcu/tree.c:1935 kernel/rcu/tree.c:2061) > [ 2733.839509] ? plist_check_head (lib/plist.c:62) > [ 2733.839509] ? cond_synchronize_rcu (kernel/rcu/tree.c:1972) > [ 2733.839509] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2554 kernel/locking/lockdep.c:2601) > [ 2733.839509] ? _raw_spin_unlock_irq (./arch/x86/include/asm/preempt.h:95 include/linux/spinlock_api_smp.h:171 kernel/locking/spinlock.c:199) > [ 2733.839509] ? __schedule (kernel/sched/core.c:2803) > [ 2733.839509] ? ___preempt_schedule (arch/x86/lib/thunk_64.S:51) > [ 2733.839509] ? abort_exclusive_wait (kernel/sched/wait.c:292) > [ 2733.839509] ? cond_synchronize_rcu (kernel/rcu/tree.c:1972) > [ 2733.839509] kthread (kernel/kthread.c:207) > [ 2733.839509] ? flush_kthread_work (kernel/kthread.c:176) > [ 2733.839509] ? _raw_spin_unlock_irq (./arch/x86/include/asm/paravirt.h:819 include/linux/spinlock_api_smp.h:170 kernel/locking/spinlock.c:199) > [ 2733.839509] ? flush_kthread_work (kernel/kthread.c:176) > [ 2733.839509] ret_from_fork (arch/x86/kernel/entry_64.S:565) > [ 2733.839509] ? flush_kthread_work (kernel/kthread.c:176) I did locate a similar bug in the expedited RCU code path, and perhaps that will provide a clue to the other problems as well. Thanx, Paul