From mboxrd@z Thu Jan 1 00:00:00 1970
Content-Type: multipart/mixed; boundary="===============2431829035888020532=="
MIME-Version: 1.0
From: Al Viro
To: lkp@lists.01.org
Subject: Re: [d_alloc_parallel] WARNING: bad unlock balance detected!
Date: Tue, 07 Nov 2017 02:33:28 +0000
Message-ID: <20171107023328.GU21978@ZenIV.linux.org.uk>
In-Reply-To: <20171107020113.52ws4cqhonhk2zvw@wfg-t540p.sh.intel.com>
List-Id:

--===============2431829035888020532==
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit

On Tue, Nov 07, 2017 at 10:01:13AM +0800, Fengguang Wu wrote:
> Hi,
>
> Here is a warning in v4.14-rc8 -- it's not necessarily a new bug.

Why is it a bug at all?

> [ 428.512005] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
> LKP: HOSTNAME vm-lkp-wsx03-openwrt-i386-8, MAC , kernel 4.14.0-rc8 158, serial console /dev/ttyS0
> [ 429.798345] Kernel tests: Boot OK!
> [ 430.761760] [ 430.766166] =====================================
> [ 430.775297] WARNING: bad unlock balance detected!
> [ 430.784342] 4.14.0-rc8 #158 Not tainted
> [ 430.792153] -------------------------------------
> [ 430.801319] pidof/1024 is trying to release lock (rcu_preempt_state) at:
> [ 430.813514] [] rcu_read_unlock_special+0x5f8/0x620
> [ 430.824041] but there are no more locks to release!

Er... yes? What of that? Since when is rcu_read_lock() not allowed to
be used under an rwsem?

> [ 430.833342] [ 430.833342] other info that might help us debug this:
> [ 430.845985] 2 locks held by pidof/1024:
> [ 430.853826] #0: (&sb->s_type->i_mutex_key){....}, at: [] lookup_slow+0x8a/0x310
> [ 430.869344] #1: (rcu_read_lock){....}, at: [] d_alloc_parallel+0x7e/0xd10

No shit - we are doing RCU cache chain walk while holding ->i_rwsem.
As in
	down_read(&rwsem);
	...
	rcu_read_lock();
	...
	rcu_read_unlock();

Why is that a problem?
If we are suddenly not allowed to have an RCU reader section while holding
any kind of a blocking lock, a *lot* of places in the kernel are screwed.

Please, explain.

--===============2431829035888020532==--
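
Editorial sketch, not part of the original message: the nesting described in the reply (an RCU read-side section opened and closed while ->i_rwsem is held for reading) can be replayed against a toy held-lock list to show that every release matches an earlier acquire. This is a userspace illustration only. `struct held_locks`, `toy_acquire()`, `toy_release()` and `replay_lookup_sequence()` are hypothetical names invented here, not kernel or lockdep APIs, and the final `up_read()` step is assumed because the quoted sequence stops at `rcu_read_unlock()`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_HELD 8

/* Toy stand-in for a per-task list of held locks (NOT lockdep itself). */
struct held_locks {
	const char *name[MAX_HELD];
	size_t depth;
};

/* Record an acquisition. Hypothetical helper, not a kernel API. */
static void toy_acquire(struct held_locks *h, const char *name)
{
	assert(h->depth < MAX_HELD);
	h->name[h->depth++] = name;
}

/*
 * Release the most recent acquisition with the given name.
 * Returns 0 on success, -1 if no such lock is held, which is the
 * situation a "bad unlock balance" report describes.
 */
static int toy_release(struct held_locks *h, const char *name)
{
	for (size_t i = h->depth; i-- > 0; ) {
		if (strcmp(h->name[i], name) == 0) {
			/* Close the gap left by the released entry. */
			memmove(&h->name[i], &h->name[i + 1],
				(h->depth - i - 1) * sizeof(h->name[0]));
			h->depth--;
			return 0;
		}
	}
	return -1;
}

/*
 * Replay the sequence from the message. Returns the number of failed
 * releases plus the number of locks still held at the end, so 0 means
 * the sequence is balanced.
 */
static int replay_lookup_sequence(void)
{
	struct held_locks h = { .depth = 0 };
	int bad = 0;

	toy_acquire(&h, "i_rwsem");                   /* down_read(&rwsem)  */
	toy_acquire(&h, "rcu_read_lock");             /* rcu_read_lock()    */
	bad += toy_release(&h, "rcu_read_lock") != 0; /* rcu_read_unlock()  */
	bad += toy_release(&h, "i_rwsem") != 0;       /* up_read(), assumed */

	return bad + (int)h.depth;
}
```

In this model the replayed sequence leaves nothing held and no release fails. An imbalance would only show up if something tried to release a name that was never acquired.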