Linux NFS development
 help / color / mirror / Atom feed
From: Trond Myklebust <trondmy@hammerspace.com>
To: "neilb@suse.de" <neilb@suse.de>
Cc: "xiubli@redhat.com" <xiubli@redhat.com>,
	"brauner@kernel.org" <brauner@kernel.org>,
	"idryomov@gmail.com" <idryomov@gmail.com>,
	"okorniev@redhat.com" <okorniev@redhat.com>,
	"linux-cifs@vger.kernel.org" <linux-cifs@vger.kernel.org>,
	"Dai.Ngo@oracle.com" <Dai.Ngo@oracle.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"johannes@sipsolutions.net" <johannes@sipsolutions.net>,
	"chuck.lever@oracle.com" <chuck.lever@oracle.com>,
	"jlayton@kernel.org" <jlayton@kernel.org>,
	"anna@kernel.org" <anna@kernel.org>,
	"miklos@szeredi.hu" <miklos@szeredi.hu>,
	"anton.ivanov@cambridgegreys.com"
	<anton.ivanov@cambridgegreys.com>,
	"viro@zeniv.linux.org.uk" <viro@zeniv.linux.org.uk>,
	"jack@suse.cz" <jack@suse.cz>, "tom@talpey.com" <tom@talpey.com>,
	"richard@nod.at" <richard@nod.at>,
	"linux-um@lists.infradead.org" <linux-um@lists.infradead.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"netfs@lists.linux.dev" <netfs@lists.linux.dev>,
	"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
	"ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>,
	"senozhatsky@chromium.org" <senozhatsky@chromium.org>
Subject: Re: [PATCH 1/6] Change inode_operations.mkdir to return struct dentry *
Date: Wed, 26 Feb 2025 02:34:49 +0000	[thread overview]
Message-ID: <50e6b21c644b050a29e159c9484a5e01061434f6.camel@hammerspace.com> (raw)
In-Reply-To: <174053575968.102979.1033896985966452059@noble.neil.brown.name>

On Wed, 2025-02-26 at 13:09 +1100, NeilBrown wrote:
> On Tue, 25 Feb 2025, Trond Myklebust wrote:
> > On Mon, 2025-02-24 at 14:09 +1100, NeilBrown wrote:
> > > On Mon, 24 Feb 2025, Al Viro wrote:
> > > > On Mon, Feb 24, 2025 at 12:34:06PM +1100, NeilBrown wrote:
> > > > > On Sat, 22 Feb 2025, Al Viro wrote:
> > > > > > On Fri, Feb 21, 2025 at 10:36:30AM +1100, NeilBrown wrote:
> > > > > > 
> > > > > > > +In general, filesystems which use d_instantiate_new() to
> > > > > > > install the new
> > > > > > > +inode can safely return NULL.  Filesystems which may not
> > > > > > > have an I_NEW inode
> > > > > > > +should use d_drop();d_splice_alias() and return the
> > > > > > > result
> > > > > > > of the latter.
> > > > > > 
> > > > > > IMO that's a bad pattern, _especially_ if you want to go
> > > > > > for
> > > > > > "in-update"
> > > > > > kind of stuff later.
> > > > > 
> > > > > Agreed.  I have a draft patch to change d_splice_alias() and
> > > > > d_exact_alias() to work on hashed dentrys.  I thought it
> > > > > should
> > > > > go after
> > > > > these mkdir patches rather than before.
> > > > 
> > > > Could you give a braindump on the things d_exact_alias() is
> > > > needed
> > > > for?
> > > > It's a recurring headache when doing ->d_name/->d_parent
> > > > audits;
> > > > see e.g.
> > > > https://lore.kernel.org/all/20241213080023.GI3387508@ZenIV/ for
> > > > related
> > > > mini-rant from the latest iteration.
> > > > 
> > > > Proof of correctness is bloody awful; it feels like the
> > > > primitive
> > > > itself
> > > > is wrong, but I'd never been able to write anything concise
> > > > regarding
> > > > the things we really want there ;-/
> > > > 
> > > 
> > > As I understand it, it is needed (or wanted) to handle the
> > > possibility
> > > of an inode becoming "stale" and then recovering.  This could
> > > happen,
> > > for example, with a temporarily misconfigured NFS server.
> > > 
> > > If ->d_revalidate gets a NFSERR_STALE from the server it will
> > > return
> > > '0'
> > > so lookup_fast() and others will call d_invalidate() which will
> > > d_drop()
> > > the dentry.  There are other paths on which -ESTALE can result in
> > > d_drop().
> > > 
> > > If a subsequent attempt to "open" the name successfully finds the
> > > same
> > > inode we want to reuse the old dentry rather than create a new
> > > one.
> > > 
> > > I don't really understand why.  This code was added 20 years ago
> > > before
> > > git.
> > > It was introduced by
> > > 
> > > commit 89a45174b6b32596ea98fa3f89a243e2c1188a01
> > > Author: Trond Myklebust <trond.myklebust@fys.uio.no>
> > > Date:   Tue Jan 4 21:41:37 2005 +0100
> > > 
> > >      VFS: Avoid dentry aliasing problems in filesystems like NFS,
> > > where
> > >           inodes may be marked as stale in one instance (causing
> > > the
> > > dentry
> > >           to be dropped) then re-enabled in the next instance.
> > >     
> > >      Signed-off-by: Trond Myklebust <trond.myklebust@fys.uio.no>
> > > 
> > > in history.git
> > > 
> > > Trond: do you have any memory of this?  Can you explain what the
> > > symptom
> > > was that you wanted to fix?
> > > 
> > > The original patch used d_add_unique() for lookup and atomic_open
> > > and
> > > readdir prime-dcache.  We now only use it for v4 atomic_open. 
> > > Maybe
> > > we
> > > don't need it at all?  Or maybe we need to restore it to those
> > > other
> > > callers? 
> > > 
> > 
> > 2005? That looks like it was trying to deal with the userspace NFS
> > server. I can't remember when it was given the ability to use the
> > inode
> > generation counter, but I'm pretty sure that in 2005 there were
> > plenty
> > of setups out there that had the older version that reused
> > filehandles
> > (due to inode number reuse). So you would get spurious ESTALE
> > errors
> > sometimes due to inode number reuse, sometimes because the
> > filehandle
> > fell out of the userspace NFS server's cache.
> 
> So this was likely done to work-around known weaknesses in NFS
> servers
> at the time.
> 
> The original d_add_unique() was used in nfs_lookup()
> nfs_atomic_lookup()
> and nfs_readdir_lookup() but the current descendent d_exact_alias()
> is
> only used in _nfs4_open_and_get_state() called only by nfs4_do_open()
> which is only used in nfs4_atomic_open() and nfs4_proc_create().
> 
> So the usage in 'lookup' and 'readdir' have fallen by the wayside
> with
> no apparent negative consequences.  
> The old NFS servers have probably been fixed.
> 
> So do you have any concerns with us discarding d_exact_alias() and
> only
> using d_splice_alias() in _nfs4_open_get_state() ??
> 

AFAIK, there were never any NFSv4 servers in public use that mimicked
the quirks of the userspace NFSv2/NFSv3 server. So I'm thinking it
should be safe to retire d_exact_alias.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



  reply	other threads:[~2025-02-26  2:34 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-20 23:36 [PATCH 0/6] Change ->mkdir() and vfs_mkdir() to return a dentry NeilBrown
2025-02-20 23:36 ` [PATCH 1/6] Change inode_operations.mkdir to return struct dentry * NeilBrown
2025-02-22  4:19   ` Al Viro
2025-02-24  1:34     ` NeilBrown
2025-02-24  2:09       ` Al Viro
2025-02-24  3:09         ` NeilBrown
2025-02-24 15:56           ` Trond Myklebust
2025-02-26  2:09             ` NeilBrown
2025-02-26  2:34               ` Trond Myklebust [this message]
2025-02-26  3:18                 ` NeilBrown
2025-02-26  3:35                   ` Al Viro
2025-02-22  4:56   ` Al Viro
2025-02-20 23:36 ` [PATCH 2/6] hostfs: store inode in dentry after mkdir if possible NeilBrown
2025-02-21 13:17   ` Jeff Layton
2025-02-20 23:36 ` [PATCH 3/6] ceph: return the correct dentry on mkdir NeilBrown
2025-02-21  1:48   ` Viacheslav Dubeyko
2025-02-24  2:15     ` NeilBrown
2025-02-24 22:09       ` Viacheslav Dubeyko
2025-02-24 22:53         ` Jeff Layton
2025-02-24 23:29         ` NeilBrown
2025-02-21 13:31   ` Jeff Layton
2025-02-20 23:36 ` [PATCH 4/6] fuse: return correct dentry for ->mkdir NeilBrown
2025-02-21 13:39   ` Jeff Layton
2025-02-22  4:24   ` Al Viro
2025-02-24  2:26     ` NeilBrown
2025-02-24  2:53       ` Al Viro
2025-02-20 23:36 ` [PATCH 5/6] nfs: change mkdir inode_operation to return alternate dentry if needed NeilBrown
2025-02-22  4:41   ` Al Viro
2025-02-24  2:41     ` NeilBrown
2025-02-20 23:36 ` [PATCH 6/6] VFS: Change vfs_mkdir() to return the dentry NeilBrown
2025-02-21 14:25   ` Jeff Layton
2025-02-22  0:32   ` Chuck Lever
2025-02-24  2:51     ` NeilBrown
2025-02-24 14:22       ` Chuck Lever
  -- strict thread matches above, loose matches on Subject: below --
2025-02-27  1:32 [PATCH 0/6 v2] Change ->mkdir() and vfs_mkdir() to return a dentry NeilBrown
2025-02-27  1:32 ` [PATCH 1/6] Change inode_operations.mkdir to return struct dentry * NeilBrown
2025-02-27 11:34   ` Christian Brauner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=50e6b21c644b050a29e159c9484a5e01061434f6.camel@hammerspace.com \
    --to=trondmy@hammerspace.com \
    --cc=Dai.Ngo@oracle.com \
    --cc=anna@kernel.org \
    --cc=anton.ivanov@cambridgegreys.com \
    --cc=brauner@kernel.org \
    --cc=ceph-devel@vger.kernel.org \
    --cc=chuck.lever@oracle.com \
    --cc=idryomov@gmail.com \
    --cc=jack@suse.cz \
    --cc=jlayton@kernel.org \
    --cc=johannes@sipsolutions.net \
    --cc=linux-cifs@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=linux-um@lists.infradead.org \
    --cc=miklos@szeredi.hu \
    --cc=neilb@suse.de \
    --cc=netfs@lists.linux.dev \
    --cc=okorniev@redhat.com \
    --cc=richard@nod.at \
    --cc=senozhatsky@chromium.org \
    --cc=tom@talpey.com \
    --cc=viro@zeniv.linux.org.uk \
    --cc=xiubli@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox