From: Trond Myklebust <trondmy@hammerspace.com>
To: "neilb@suse.de" <neilb@suse.de>
Cc: "xiubli@redhat.com" <xiubli@redhat.com>,
"brauner@kernel.org" <brauner@kernel.org>,
"idryomov@gmail.com" <idryomov@gmail.com>,
"okorniev@redhat.com" <okorniev@redhat.com>,
"linux-cifs@vger.kernel.org" <linux-cifs@vger.kernel.org>,
"Dai.Ngo@oracle.com" <Dai.Ngo@oracle.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"johannes@sipsolutions.net" <johannes@sipsolutions.net>,
"chuck.lever@oracle.com" <chuck.lever@oracle.com>,
"jlayton@kernel.org" <jlayton@kernel.org>,
"anna@kernel.org" <anna@kernel.org>,
"miklos@szeredi.hu" <miklos@szeredi.hu>,
"anton.ivanov@cambridgegreys.com"
<anton.ivanov@cambridgegreys.com>,
"viro@zeniv.linux.org.uk" <viro@zeniv.linux.org.uk>,
"jack@suse.cz" <jack@suse.cz>, "tom@talpey.com" <tom@talpey.com>,
"richard@nod.at" <richard@nod.at>,
"linux-um@lists.infradead.org" <linux-um@lists.infradead.org>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
"netfs@lists.linux.dev" <netfs@lists.linux.dev>,
"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
"ceph-devel@vger.kernel.org" <ceph-devel@vger.kernel.org>,
"senozhatsky@chromium.org" <senozhatsky@chromium.org>
Subject: Re: [PATCH 1/6] Change inode_operations.mkdir to return struct dentry *
Date: Wed, 26 Feb 2025 02:34:49 +0000 [thread overview]
Message-ID: <50e6b21c644b050a29e159c9484a5e01061434f6.camel@hammerspace.com> (raw)
In-Reply-To: <174053575968.102979.1033896985966452059@noble.neil.brown.name>
On Wed, 2025-02-26 at 13:09 +1100, NeilBrown wrote:
> On Tue, 25 Feb 2025, Trond Myklebust wrote:
> > On Mon, 2025-02-24 at 14:09 +1100, NeilBrown wrote:
> > > On Mon, 24 Feb 2025, Al Viro wrote:
> > > > On Mon, Feb 24, 2025 at 12:34:06PM +1100, NeilBrown wrote:
> > > > > On Sat, 22 Feb 2025, Al Viro wrote:
> > > > > > On Fri, Feb 21, 2025 at 10:36:30AM +1100, NeilBrown wrote:
> > > > > >
> > > > > > > +In general, filesystems which use d_instantiate_new() to
> > > > > > > install the new
> > > > > > > +inode can safely return NULL. Filesystems which may not
> > > > > > > have an I_NEW inode
> > > > > > > +should use d_drop();d_splice_alias() and return the
> > > > > > > result
> > > > > > > of the latter.
> > > > > >
> > > > > > IMO that's a bad pattern, _especially_ if you want to go
> > > > > > for
> > > > > > "in-update"
> > > > > > kind of stuff later.
> > > > >
> > > > > Agreed. I have a draft patch to change d_splice_alias() and
> > > > > d_exact_alias() to work on hashed dentrys. I thought it
> > > > > should
> > > > > go after
> > > > > these mkdir patches rather than before.
> > > >
> > > > Could you give a braindump on the things d_exact_alias() is
> > > > needed
> > > > for?
> > > > It's a recurring headache when doing ->d_name/->d_parent
> > > > audits;
> > > > see e.g.
> > > > https://lore.kernel.org/all/20241213080023.GI3387508@ZenIV/ for
> > > > related
> > > > mini-rant from the latest iteration.
> > > >
> > > > Proof of correctness is bloody awful; it feels like the
> > > > primitive
> > > > itself
> > > > is wrong, but I'd never been able to write anything concise
> > > > regarding
> > > > the things we really want there ;-/
> > > >
> > >
> > > As I understand it, it is needed (or wanted) to handle the
> > > possibility
> > > of an inode becoming "stale" and then recovering. This could
> > > happen,
> > > for example, with a temporarily misconfigured NFS server.
> > >
> > > If ->d_revalidate gets a NFSERR_STALE from the server it will
> > > return
> > > '0'
> > > so lookup_fast() and others will call d_invalidate() which will
> > > d_drop()
> > > the dentry. There are other paths on which -ESTALE can result in
> > > d_drop().
> > >
> > > If a subsequent attempt to "open" the name successfully finds the
> > > same
> > > inode we want to reuse the old dentry rather than create a new
> > > one.
> > >
> > > I don't really understand why. This code was added 20 years ago
> > > before
> > > git.
> > > It was introduced by
> > >
> > > commit 89a45174b6b32596ea98fa3f89a243e2c1188a01
> > > Author: Trond Myklebust <trond.myklebust@fys.uio.no>
> > > Date: Tue Jan 4 21:41:37 2005 +0100
> > >
> > > VFS: Avoid dentry aliasing problems in filesystems like NFS,
> > > where
> > > inodes may be marked as stale in one instance (causing
> > > the
> > > dentry
> > > to be dropped) then re-enabled in the next instance.
> > >
> > > Signed-off-by: Trond Myklebust <trond.myklebust@fys.uio.no>
> > >
> > > in history.git
> > >
> > > Trond: do you have any memory of this? Can you explain what the
> > > symptom
> > > was that you wanted to fix?
> > >
> > > The original patch used d_add_unique() for lookup and atomic_open
> > > and
> > > readdir prime-dcache. We now only use it for v4 atomic_open.
> > > Maybe
> > > we
> > > don't need it at all? Or maybe we need to restore it to those
> > > other
> > > callers?
> > >
> >
> > 2005? That looks like it was trying to deal with the userspace NFS
> > server. I can't remember when it was given the ability to use the
> > inode
> > generation counter, but I'm pretty sure that in 2005 there were
> > plenty
> > of setups out there that had the older version that reused
> > filehandles
> > (due to inode number reuse). So you would get spurious ESTALE
> > errors
> > sometimes due to inode number reuse, sometimes because the
> > filehandle
> > fell out of the userspace NFS server's cache.
>
> So this was likely done to work-around known weaknesses in NFS
> servers
> at the time.
>
> The original d_add_unique() was used in nfs_lookup()
> nfs_atomic_lookup()
> and nfs_readdir_lookup() but the current descendent d_exact_alias()
> is
> only used in _nfs4_open_and_get_state() called only by nfs4_do_open()
> which is only used in nfs4_atomic_open() and nfs4_proc_create().
>
> So the usage in 'lookup' and 'readdir' have fallen by the wayside
> with
> no apparent negative consequences.
> The old NFS servers have probably been fixed.
>
> So do you have any concerns with us discarding d_exact_alias() and
> only
> using d_splice_alias() in _nfs4_open_get_state() ??
>
AFAIK, there were never any NFSv4 servers in public use that mimicked
the quirks of the userspace NFSv2/NFSv3 server. So I'm thinking it
should be safe to retire d_exact_alias.
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com
next prev parent reply other threads:[~2025-02-26 2:34 UTC|newest]
Thread overview: 36+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-02-20 23:36 [PATCH 0/6] Change ->mkdir() and vfs_mkdir() to return a dentry NeilBrown
2025-02-20 23:36 ` [PATCH 1/6] Change inode_operations.mkdir to return struct dentry * NeilBrown
2025-02-22 4:19 ` Al Viro
2025-02-24 1:34 ` NeilBrown
2025-02-24 2:09 ` Al Viro
2025-02-24 3:09 ` NeilBrown
2025-02-24 15:56 ` Trond Myklebust
2025-02-26 2:09 ` NeilBrown
2025-02-26 2:34 ` Trond Myklebust [this message]
2025-02-26 3:18 ` NeilBrown
2025-02-26 3:35 ` Al Viro
2025-02-22 4:56 ` Al Viro
2025-02-20 23:36 ` [PATCH 2/6] hostfs: store inode in dentry after mkdir if possible NeilBrown
2025-02-21 13:17 ` Jeff Layton
2025-02-20 23:36 ` [PATCH 3/6] ceph: return the correct dentry on mkdir NeilBrown
2025-02-21 1:48 ` Viacheslav Dubeyko
2025-02-24 2:15 ` NeilBrown
2025-02-24 22:09 ` Viacheslav Dubeyko
2025-02-24 22:53 ` Jeff Layton
2025-02-24 23:29 ` NeilBrown
2025-02-21 13:31 ` Jeff Layton
2025-02-20 23:36 ` [PATCH 4/6] fuse: return correct dentry for ->mkdir NeilBrown
2025-02-21 13:39 ` Jeff Layton
2025-02-22 4:24 ` Al Viro
2025-02-24 2:26 ` NeilBrown
2025-02-24 2:53 ` Al Viro
2025-02-20 23:36 ` [PATCH 5/6] nfs: change mkdir inode_operation to return alternate dentry if needed NeilBrown
2025-02-22 4:41 ` Al Viro
2025-02-24 2:41 ` NeilBrown
2025-02-20 23:36 ` [PATCH 6/6] VFS: Change vfs_mkdir() to return the dentry NeilBrown
2025-02-21 14:25 ` Jeff Layton
2025-02-22 0:32 ` Chuck Lever
2025-02-24 2:51 ` NeilBrown
2025-02-24 14:22 ` Chuck Lever
-- strict thread matches above, loose matches on Subject: below --
2025-02-27 1:32 [PATCH 0/6 v2] Change ->mkdir() and vfs_mkdir() to return a dentry NeilBrown
2025-02-27 1:32 ` [PATCH 1/6] Change inode_operations.mkdir to return struct dentry * NeilBrown
2025-02-27 11:34 ` Christian Brauner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=50e6b21c644b050a29e159c9484a5e01061434f6.camel@hammerspace.com \
--to=trondmy@hammerspace.com \
--cc=Dai.Ngo@oracle.com \
--cc=anna@kernel.org \
--cc=anton.ivanov@cambridgegreys.com \
--cc=brauner@kernel.org \
--cc=ceph-devel@vger.kernel.org \
--cc=chuck.lever@oracle.com \
--cc=idryomov@gmail.com \
--cc=jack@suse.cz \
--cc=jlayton@kernel.org \
--cc=johannes@sipsolutions.net \
--cc=linux-cifs@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-nfs@vger.kernel.org \
--cc=linux-um@lists.infradead.org \
--cc=miklos@szeredi.hu \
--cc=neilb@suse.de \
--cc=netfs@lists.linux.dev \
--cc=okorniev@redhat.com \
--cc=richard@nod.at \
--cc=senozhatsky@chromium.org \
--cc=tom@talpey.com \
--cc=viro@zeniv.linux.org.uk \
--cc=xiubli@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox