From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dominique Martinet
Subject: Re: Race condition introduced in 4bf46a27 VFS: Impose ordering on accesses of d_inode and d_flags
Date: Fri, 31 Jul 2015 14:28:10 +0200
Message-ID: <20150731122810.GA17247@u-michard>
References: <20150722154519.GA20808@u-michard> <20150730115045.GA24790@u-michard>
In-Reply-To: <20150730115045.GA24790@u-michard>
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Cc: David Howells, Al Viro, Aurelien CEDEYN
Sender: linux-fsdevel-owner@vger.kernel.org

Dominique Martinet wrote on Thu, Jul 30, 2015 at 01:50:45PM +0200:
> > My guess is that I'm just seeing a race condition that already existed
> > but the barriers make it easier to reproduce it.
> > This commit came with 2b0143b5c9 "VFS: normal filesystems (and lustre):
> > d_inode() annotations" which made 9P access d_inode through the helper,
> > which could have helped with ordering, but I'm still hitting the bug
> > "easily" with that commit.
> > Still looking for ideas to help diagnose further...

Slowly progressing: through carefully placed systemtap probes I got down
to this check in link_path_walk in fs/namei.c:

                if (!d_can_lookup(nd->path.dentry)) {
                        err = -ENOTDIR;
                        break;
                }

with the error then going all the way back up through
path_init -> path_openat -> do_filp_open -> user.

I could then add a breakpoint in gdb on the err = -ENOTDIR instruction:

#0  link_path_walk (name=0xffff88042b47b027 "f", nd=0xffff88042c19eff8) at fs/namei.c:1845
#1  0xffffffff81209209 in path_init (dfd=, name=, flags=64, nd=0x1 ) at fs/namei.c:1952
#2  0xffffffff8120bd42 in path_openat (dfd=, pathname=0xffff88042b47b000, nd=0xffff88042a9cfe28, op=0xffff88042a9cff1c, flags=65) at fs/namei.c:3230
#3  0xffffffff8120d8e9 in do_filp_open (dfd=-100, pathname=0xffff88042b47b000, op=0xffff88042a9cff1c) at fs/namei.c:3280
#4  0xffffffff811fb2d7 in do_sys_open (dfd=683933728, filename=, flags=, mode=) at fs/open.c:1010
#5  0xffffffff811fb3fe in SYSC_open (mode=, flags=, filename=) at fs/open.c:1028
#6  SyS_open (filename=, flags=, mode=) at fs/open.c:1023

Unfortunately I can't seem to get much more out of it; nd->path is borked
by the time gdb gets here:

(gdb) p nd->path
$1 = {mnt = 0x6f666e69000064, dentry = 0x7379656b64616564}

Looking at other values around:
 - next->dentry seems to be the last dir (I need to rename the directories
   in my test to check; it was a bad idea to name them all the same)
 - name is the name of the file that does exist

It's easy enough for me to reproduce with the script I have, so I'm happy
to give more info if someone has an idea...

-- 
Dominique
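
A note on the nd->path values printed above: neither looks like a kernel
pointer (the valid addresses in the backtrace all start with 0xffff...), and
both happen to consist mostly of printable ASCII bytes. The following is a
minimal sketch for checking that, assuming a little-endian x86_64 machine;
word_to_ascii is a hypothetical helper written for illustration, not kernel
code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of the kernel): render the 8 bytes of a
 * 64-bit value in the order they sit in memory on a little-endian machine.
 * Printable ASCII bytes are copied as-is, anything else becomes '.'. */
static void word_to_ascii(uint64_t value, char out[9])
{
    for (size_t i = 0; i < 8; i++) {
        /* Byte i in memory is bits [8*i, 8*i+7] of the value. */
        unsigned char byte = (unsigned char)(value >> (8 * i));
        out[i] = (byte >= 0x20 && byte <= 0x7e) ? (char)byte : '.';
    }
    out[8] = '\0';
}
```

Called on the dentry value 0x7379656b64616564 this yields "deadkeys", and on
the mnt value 0x6f666e69000064 it yields "d..info.". That would be consistent
with the structure having been overwritten by string data, but it is only a
reading of these two values, not a diagnosis.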