From: Jeff Layton <jlayton@kernel.org>
To: Chuck Lever <chuck.lever@oracle.com>,
trondmy@kernel.org, Anna Schumaker <anna@kernel.org>
Cc: linux-nfs@vger.kernel.org
Subject: Re: [PATCH] NFS: Avoid flushing data while holding directory locks in nfs_rename()
Date: Tue, 29 Apr 2025 09:54:29 -0700 [thread overview]
Message-ID: <9a09c8681702df0d8a1e3cefb8fb75113e1677b9.camel@kernel.org> (raw)
In-Reply-To: <2526363b-1f4a-4999-9f9a-8c4c5c6c132d@oracle.com>
On Tue, 2025-04-29 at 12:14 -0400, Chuck Lever wrote:
> On 4/27/25 7:01 PM, trondmy@kernel.org wrote:
> > From: Trond Myklebust <trond.myklebust@hammerspace.com>
> >
> > The Linux client assumes that all filehandles are non-volatile for
> > renames within the same directory (otherwise sillyrename cannot work).
> > However, the existence of the Linux 'subtree_check' export option has
> > meant that nfs_rename() has always assumed it needs to flush writes
> > before attempting to rename.
> >
> > Since NFSv4 does allow the client to query whether or not the server
> > exhibits this behaviour, and since knfsd does actually set the
> > appropriate flag when 'subtree_check' is enabled on an export, it
> > should be OK to optimise away the write flushing behaviour in the cases
> > where it is clearly not needed.
>
> No objection to the high level goal, seems like a sensible change.
>
> So NFSv3 still has the flushing requirement, but NFSv4 can disable that
> requirement as long as the server in question asserts FH4_VOLATILE_ANY
> and FH4_VOL_RENAME, correct?
>
(Note that there is a v2 patch that was posted soon after this)
It's actually the reverse: if the server doesn't assert either
FH4_VOL_RENAME or FH4_VOLATILE_ANY, then you're able to disable that
requirement.
> I'm wondering how confident we are that other server implementations
> behave reasonably. Yes, they are broken if they don't behave, but there
> is still risk of introducing a regression.
> > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> > ---
> > fs/nfs/client.c | 2 ++
> > fs/nfs/dir.c | 15 ++++++++++++++-
> > include/linux/nfs_fs_sb.h | 6 ++++++
> > 3 files changed, 22 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/nfs/client.c b/fs/nfs/client.c
> > index 2115c1189c2d..6d63b958c4bb 100644
> > --- a/fs/nfs/client.c
> > +++ b/fs/nfs/client.c
> > @@ -1105,6 +1105,8 @@ struct nfs_server *nfs_create_server(struct fs_context *fc)
> > if (server->namelen == 0 || server->namelen > NFS2_MAXNAMLEN)
> > server->namelen = NFS2_MAXNAMLEN;
> > }
> > + /* Linux 'subtree_check' borkenness mandates this setting */
>
> Assuming you are going to respin this patch to deal with the build bot
> warnings, can you make this comment more useful? "borkenness" sounds
> like you are dealing with a server bug here, but that's not really
> the case. subtree_check is working as designed: it requires the extra
> flushing. The no_subtree_check case does not, IIUC.
>
> It would be better to explain this need strictly in terms of file handle
> volatility, no?
>
>
> > + server->fh_expire_type = NFS_FH_VOL_RENAME;
> >
> > if (!(fattr->valid & NFS_ATTR_FATTR)) {
> > error = ctx->nfs_mod->rpc_ops->getattr(server, ctx->mntfh,
> > diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> > index bd23fc736b39..d0e0b435a843 100644
> > --- a/fs/nfs/dir.c
> > +++ b/fs/nfs/dir.c
> > @@ -2676,6 +2676,18 @@ nfs_unblock_rename(struct rpc_task *task, struct nfs_renamedata *data)
> > unblock_revalidate(new_dentry);
> > }
> >
> > +static bool nfs_rename_is_unsafe_cross_dir(struct dentry *old_dentry,
> > + struct dentry *new_dentry)
> > +{
> > + struct nfs_server *server = NFS_SB(old_dentry->d_sb);
> > +
> > + if (old_dentry->d_parent != new_dentry->d_parent)
> > + return false;
> > + if (server->fh_expire_type & NFS_FH_RENAME_UNSAFE)
> > + return !(server->fh_expire_type & NFS_FH_NOEXPIRE_WITH_OPEN);
> > + return true;
> > +}
> > +
> > /*
> > * RENAME
> > * FIXME: Some nfsds, like the Linux user space nfsd, may generate a
> > @@ -2763,7 +2775,8 @@ int nfs_rename(struct mnt_idmap *idmap, struct inode *old_dir,
> >
> > }
> >
> > - if (S_ISREG(old_inode->i_mode))
> > + if (S_ISREG(old_inode->i_mode) &&
> > + nfs_rename_is_unsafe_cross_dir(old_dentry, new_dentry))
> > nfs_sync_inode(old_inode);
> > task = nfs_async_rename(old_dir, new_dir, old_dentry, new_dentry,
> > must_unblock ? nfs_unblock_rename : NULL);
> > diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
> > index 71319637a84e..6732c7e04d19 100644
> > --- a/include/linux/nfs_fs_sb.h
> > +++ b/include/linux/nfs_fs_sb.h
> > @@ -236,6 +236,12 @@ struct nfs_server {
> > u32 acl_bitmask; /* V4 bitmask representing the ACEs
> > that are supported on this
> > filesystem */
> > + /* The following #defines numerically match the NFSv4 equivalents */
> > +#define NFS_FH_NOEXPIRE_WITH_OPEN (0x1)
> > +#define NFS_FH_VOLATILE_ANY (0x2)
> > +#define NFS_FH_VOL_MIGRATION (0x4)
> > +#define NFS_FH_VOL_RENAME (0x8)
> > +#define NFS_FH_RENAME_UNSAFE (NFS_FH_VOLATILE_ANY | NFS_FH_VOL_RENAME)
> > u32 fh_expire_type; /* V4 bitmask representing file
> > handle volatility type for
> > this filesystem */
>
>
--
Jeff Layton <jlayton@kernel.org>
next prev parent reply other threads:[~2025-04-29 16:54 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-04-27 23:01 [PATCH] NFS: Avoid flushing data while holding directory locks in nfs_rename() trondmy
2025-04-28 1:53 ` kernel test robot
2025-04-28 2:35 ` kernel test robot
2025-04-29 16:14 ` Chuck Lever
2025-04-29 16:54 ` Jeff Layton [this message]
2025-04-29 23:22 ` Trond Myklebust
2025-04-30 13:28 ` Chuck Lever
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=9a09c8681702df0d8a1e3cefb8fb75113e1677b9.camel@kernel.org \
--to=jlayton@kernel.org \
--cc=anna@kernel.org \
--cc=chuck.lever@oracle.com \
--cc=linux-nfs@vger.kernel.org \
--cc=trondmy@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox