Date: Tue, 5 May 2015 11:42:16 -0700
From: "Paul E. McKenney"
To: peterz@infradead.org, mingo@redhat.com
Cc: linux-kernel@vger.kernel.org
Subject: Re: WARN_ON() in lock_accesses()
Message-ID: <20150505184216.GA11785@linux.vnet.ibm.com>
Reply-To: paulmck@linux.vnet.ibm.com
References: <20150502104959.GA19926@linux.vnet.ibm.com> <20150502181343.GA26884@linux.vnet.ibm.com> <20150503075316.GA2663@linux.vnet.ibm.com>
In-Reply-To: <20150503075316.GA2663@linux.vnet.ibm.com>

On Sun, May 03, 2015 at 12:53:16AM -0700, Paul E. McKenney wrote:
> On Sat, May 02, 2015 at 11:13:43AM -0700, Paul E. McKenney wrote:
> > On Sat, May 02, 2015 at 03:49:59AM -0700, Paul E. McKenney wrote:
> > > Hello!
> > >
> > > I got the following while testing Tiny RCU, so this is a UP system with
> > > PREEMPT=n, but with my RCU stack:
> > >
> > > [ 1774.636012] WARNING: CPU: 0 PID: 1 at /home/paulmck/public_git/linux-rcu/kernel/locking/lockdep.c:973 __bfs+0x207/0x280()
> > > [ 1774.636012] Modules linked in:
> > > [ 1774.636012] CPU: 0 PID: 1 Comm: init Not tainted 4.1.0-rc1+ #1
> > > [ 1774.636012] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > > [ 1774.636012]  ffffffff81c72ee0 ffff88001e8c7818 ffffffff818eab91 ffff88001e8c7858
> > > [ 1774.636012]  ffffffff8104916f ffff88001e8c7898 ffffffff838c60c0 0000000000000000
> > > [ 1774.636012]  ffffffff81079f20 ffff88001e8c78e8 0000000000000000 ffff88001e8c7868
> > > [ 1774.636012] Call Trace:
> > > [ 1774.636012]  [] dump_stack+0x19/0x1b
> > > [ 1774.636012]  [] warn_slowpath_common+0x7f/0xc0
> > > [ 1774.636012]  [] ? noop_count+0x10/0x10
> > > [ 1774.636012]  [] warn_slowpath_null+0x15/0x20
> > > [ 1774.636012]  [] __bfs+0x207/0x280
> > > [ 1774.636012]  [] check_usage_backwards+0x72/0x130
> > > [ 1774.636012]  [] ? __lock_acquire+0x93c/0x1d50
> > > [ 1774.636012]  [] ? print_shortest_lock_dependencies+0x1d0/0x1d0
> > > [ 1774.636012]  [] mark_lock+0x1c2/0x2c0
> > > [ 1774.636012]  [] __lock_acquire+0x6f7/0x1d50
> > > [ 1774.636012]  [] ? __lock_acquire+0x90f/0x1d50
> > > [ 1774.636012]  [] ? __lock_acquire+0x469/0x1d50
> > > [ 1774.636012]  [] ? __lock_acquire+0x93c/0x1d50
> > > [ 1774.636012]  [] lock_acquire+0xa4/0x130
> > > [ 1774.636012]  [] ? d_walk+0xd1/0x4e0
> > > [ 1774.636012]  [] ? select_collect+0xc0/0xc0
> > > [ 1774.636012]  [] _raw_spin_lock_nested+0x2a/0x40
> > > [ 1774.636012]  [] ? d_walk+0xd1/0x4e0
> > > [ 1774.636012]  [] d_walk+0xd1/0x4e0
> > > [ 1774.636012]  [] ? d_invalidate+0xa7/0x100
> > > [ 1774.636012]  [] ? __d_drop+0xb0/0xb0
> > > [ 1774.636012]  [] d_invalidate+0xa7/0x100
> > > [ 1774.636012]  [] proc_flush_task+0x9c/0x180
> > > [ 1774.636012]  [] release_task+0xa7/0x640
> > > [ 1774.636012]  [] ? release_task+0x14/0x640
> > > [ 1774.636012]  [] wait_consider_task+0x804/0xec0
> > > [ 1774.636012]  [] ? do_wait+0x100/0x240
> > > [ 1774.636012]  [] do_wait+0x100/0x240
> > > [ 1774.636012]  [] SyS_wait4+0x63/0xe0
> > > [ 1774.636012]  [] ? task_stopped_code+0x60/0x60
> > > [ 1774.636012]  [] C_SYSC_wait4+0xc7/0xd0
> > > [ 1774.636012]  [] ? up_read+0x1e/0x40
> > > [ 1774.636012]  [] ? retint_swapgs+0x11/0x16
> > > [ 1774.636012]  [] ? trace_hardirqs_on_caller+0xfd/0x1c0
> > > [ 1774.636012]  [] compat_SyS_wait4+0x9/0x10
> > > [ 1774.636012]  [] sys32_waitpid+0xb/0x10
> > > [ 1774.636012]  [] sysenter_dispatch+0x7/0x1f
> > >
> > > Unsurprisingly, this is followed up by a NULL pointer dereference.
> > >
> > > A quick look at the code suggests that someone might have gotten a
> > > pointer to a held lock, then somehow that lock was released.  Which
> > > does not seem at all likely.
> > >
> > > Hints?
> > >
> > > In the meantime, I will check for reproducibility.
> >
> > With two failures thus far, MTBF at about 18 hours.  Hmmm...
>
> Currently looks like something my for-4.2 commits introduced.  Sorry for the
> bother, tracking it down.

And this bisected to a commit that forced rcutorture to use Tiny RCU's RCU-bh.
So the next step is to backport this patch and see what happens.

							Thanx, Paul