From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: from zeniv.linux.org.uk ([195.92.253.2]:41218 "EHLO
	ZenIV.linux.org.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1729681AbeGMQWD (ORCPT );
	Fri, 13 Jul 2018 12:22:03 -0400
Date: Fri, 13 Jul 2018 17:06:46 +0100
From: Al Viro
To: Peter Geis
Cc: linux-fsdevel@vger.kernel.org
Subject: Re: [BUG] kernel BUG at fs/dcache.c:899
Message-ID: <20180713160646.GG30522@ZenIV.linux.org.uk>
References: <53129dac-48fd-df2e-bb6f-0a79c9776a74@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <53129dac-48fd-df2e-bb6f-0a79c9776a74@gmail.com>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

On Fri, Jul 13, 2018 at 11:33:37AM -0400, Peter Geis wrote:
> Good Morning,
>
> I have been trying to track down a bug that has been causing my Tegra3
> device to reboot while compiling.
> I finally managed to catch the offender, the details are below:
> The offending code is a triggered bug in dget_parent, the code is:
> rcu_read_unlock();
> BUG_ON(!ret->d_lockref.count);
> ret->d_lockref.count++;

Interesting...  We call that while holding a reference to dentry (we'd better).
That code is
	rcu_read_lock();
	ret = dentry->d_parent;
ret won't get freed until after rcu_read_unlock, so spin_lock is safe here
	spin_lock(&ret->d_lock);
	if (unlikely(ret != dentry->d_parent)) {
		spin_unlock(&ret->d_lock);
		rcu_read_unlock();
		goto repeat;
	}
Since we got through that, we have observed dentry->d_parent == ret with
ret->d_lock held.
	rcu_read_unlock();
	BUG_ON(!ret->d_lockref.count);
Now, this means that dentry->d_parent is *not* equal to ret anymore - otherwise
ret would remain pinned.

The only place that changes ->d_parent of a live dentry is __d_move() - no other
assignments exist.  __d_move() is done under rename_lock - it's globally serialized.
And it grabs ->d_lock on all parents involved before modifying ->d_parent of
anything, so the observed condition (ret == dentry->d_parent, ret->d_lock held
by us) can't change until we drop ret->d_lock...

Which kernel had that been?  It looks either like a memory corruption (anywhere)
or as if you called that with dentry itself getting killed right under you.
Reference to ->d_parent is not dropped until after the last reference to dentry
goes away, so...

Could you slap
	if (WARN_ON(!ret->d_lockref.count))
		printk(KERN_ERR "child: %px[%ld], parent: %px:%px\n",
			dentry, (long)dentry->d_lockref.count,
			dentry->d_parent, ret);
right before that rcu_read_unlock() and see if you can trigger that?
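
For anyone following along without a kernel tree handy, here is a rough
userspace model of the lock-then-recheck pattern and of the diagnostic asked
for above.  It is a sketch only, not kernel code: everything prefixed model_
is hypothetical, the "lock" is a plain flag (single-threaded, so the retry
branch is never taken here), RCU is not modelled at all, and %p stands in for
%px, which is a kernel printk extension with no userspace equivalent.  Where
the kernel would hit the BUG_ON(), the model prints the same fields and
returns NULL.

```c
/* Userspace model only; all model_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct model_dentry {
	struct model_dentry *d_parent;	/* stands in for dentry->d_parent */
	bool d_locked;			/* stands in for ->d_lock */
	long count;			/* stands in for ->d_lockref.count */
};

static void model_lock(struct model_dentry *d)
{
	assert(!d->d_locked);	/* single-threaded model: lock must be free */
	d->d_locked = true;
}

static void model_unlock(struct model_dentry *d)
{
	assert(d->d_locked);
	d->d_locked = false;
}

/*
 * Lock the parent, then recheck that it still is the parent.  Returns the
 * parent with its count bumped, or NULL (after printing the diagnostic)
 * where the kernel code would hit BUG_ON(!ret->d_lockref.count).
 */
static struct model_dentry *model_dget_parent(struct model_dentry *dentry)
{
	struct model_dentry *ret;

	for (;;) {
		ret = dentry->d_parent;
		model_lock(ret);
		if (ret == dentry->d_parent)
			break;		/* stable for as long as we hold the lock */
		model_unlock(ret);	/* parent changed under us: retry */
	}

	if (ret->count == 0) {
		/* Same fields as the suggested printk, %p instead of %px. */
		fprintf(stderr, "child: %p[%ld], parent: %p:%p\n",
			(void *)dentry, dentry->count,
			(void *)dentry->d_parent, (void *)ret);
		model_unlock(ret);
		return NULL;
	}

	ret->count++;
	model_unlock(ret);
	return ret;
}
```

In the model, a child whose parent has a nonzero count gets the parent back
with the count incremented; forcing the parent's count to 0 first takes the
diagnostic path instead.  The two parent pointers in the output are the
interesting part: if they differ, ->d_parent changed despite ret->d_lock being
held, which the argument above says cannot happen.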