* [PATCH 0/1] Client optimize away CB_RECALL for write delegation
@ 2026-05-28 19:22 Benjamin Coddington
2026-05-28 19:22 ` [PATCH 1/1] nfs: return a write delegation when a SETATTR drops our write access Benjamin Coddington
0 siblings, 1 reply; 8+ messages in thread
From: Benjamin Coddington @ 2026-05-28 19:22 UTC (permalink / raw)
To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs, linux-kernel
A client holding a write delegation that performs a SETATTR that would
remove the client's write access can expect the server to recall that
delegation. The following patch addresses the simplest case to dectect this
and then preemptively return the delegation rather than have server recall
it.
Benjamin Coddington (1):
nfs: return a write delegation when a SETATTR drops our write access
fs/nfs/nfs4proc.c | 66 ++++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 62 insertions(+), 4 deletions(-)
base-commit: e7ae89a0c97ce2b68b0983cd01eda67cf373517d
--
2.53.0
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 1/1] nfs: return a write delegation when a SETATTR drops our write access
2026-05-28 19:22 [PATCH 0/1] Client optimize away CB_RECALL for write delegation Benjamin Coddington
@ 2026-05-28 19:22 ` Benjamin Coddington
2026-05-29 14:00 ` Trond Myklebust
0 siblings, 1 reply; 8+ messages in thread
From: Benjamin Coddington @ 2026-05-28 19:22 UTC (permalink / raw)
To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs, linux-kernel
A client holding an OPEN_DELEGATE_WRITE delegation can satisfy a later
open(O_WRONLY) from the cached delegation (can_open_delegated()) without
sending an OPEN to the server. That cached "open for write" assertion is
only valid while the delegation holder still has write access. A SETATTR
that changes mode, owner, or group can revoke that access -- after which an
open served from the delegation would bypass an access check the server
would now fail, and, against a server that recalls the delegation on such a
change, the SETATTR draws a CB_RECALL/NFS4ERR_DELAY/DELEGRETURN/retry round
trip.
Before issuing such a SETATTR, check whether the proposed mode/owner/group
would remove write access for the delegation's owning credential, judged by
the resulting POSIX mode bits. If so, return the delegation first: the
return is synchronous and flushes modified data, so the SETATTR proceeds on
an open or special stateid and the next open revalidates access with the
server. Permission changes that keep the holder's write access leave the
delegation in place.
Only the mode bits and the holder's fsuid/fsgid are consulted. An NFSv4 ACL
cannot be evaluated by the client, a privileged caller may retain access the
bits deny, and supplementary group membership is not checked, so the test is
necessarily approximate -- but an inexact answer costs at most an
unnecessary delegation return or a fall back to the server's recall, never
incorrect access.
RFC 8881 Section 10.4.4 permits a client to return a delegation voluntarily,
performing the same pre-return state updates (data flush, pending
truncation, CLOSE/OPEN/LOCK) it would on a recall. Commit c01d36457dcc
("NFSv4: Don't return the delegation when not needed by NFSv4.x (x>0)")
stopped returning write delegations on SETATTR for NFSv4.1+, since the
server can identify the delegation holder from the SEQUENCE clientid and
need not recall. That holds for changes that do not affect the holder's
access; restore a return only for the narrow case where the holder's own
write access is removed.
Signed-off-by: Benjamin Coddington <bcodding@hammerspace.com>
---
fs/nfs/nfs4proc.c | 66 ++++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 62 insertions(+), 4 deletions(-)
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index a9b8d482d289..e4b7322bf75c 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -4506,7 +4506,55 @@ int nfs4_proc_getattr(struct nfs_server *server, struct nfs_fh *fhandle,
return err;
}
-/*
+/*
+ * Would applying @sattr (which changes mode, owner, and/or group) remove the
+ * write access of a held write delegation's owning credential, as judged by
+ * the resulting file mode bits?
+ *
+ * Such a change makes the delegation's cached "open for write" assertion
+ * stale: a later open(O_WRONLY) could be served from the delegation without
+ * the server getting a chance to deny it. Only the mode bits and the
+ * holder's fsuid/fsgid are consulted; an NFSv4 ACL (which the client cannot
+ * evaluate locally), a privileged caller, or supplementary group membership
+ * may make the answer imprecise, but the cost is at most an unnecessary
+ * delegation return or a fall back to the server's recall -- never incorrect
+ * access.
+ */
+static bool nfs4_setattr_removes_write(struct inode *inode, struct iattr *sattr)
+{
+ struct nfs_delegation *delegation;
+ const struct cred *cred;
+ umode_t mode = inode->i_mode;
+ kuid_t uid = inode->i_uid;
+ kgid_t gid = inode->i_gid;
+ bool ret = false;
+
+ delegation = nfs4_get_valid_delegation(inode);
+ if (!delegation)
+ return false;
+ if (!(delegation->type & FMODE_WRITE))
+ goto out;
+ cred = delegation->cred;
+
+ if (sattr->ia_valid & ATTR_MODE)
+ mode = sattr->ia_mode;
+ if (sattr->ia_valid & ATTR_UID)
+ uid = sattr->ia_uid;
+ if (sattr->ia_valid & ATTR_GID)
+ gid = sattr->ia_gid;
+
+ if (uid_eq(uid, cred->fsuid))
+ ret = !(mode & S_IWUSR);
+ else if (gid_eq(gid, cred->fsgid))
+ ret = !(mode & S_IWGRP);
+ else
+ ret = !(mode & S_IWOTH);
+out:
+ nfs_put_delegation(delegation);
+ return ret;
+}
+
+/*
* The file is not closed if it is opened due to the a request to change
* the size of the file. The open call will not be needed once the
* VFS layer lookup-intents are implemented.
@@ -4555,9 +4603,19 @@ nfs4_proc_setattr(struct dentry *dentry, struct nfs_fattr *fattr,
cred = ctx->cred;
}
- /* Return any delegations if we're going to change ACLs */
- if ((sattr->ia_valid & (ATTR_MODE|ATTR_UID|ATTR_GID)) != 0)
- nfs4_inode_make_writeable(inode);
+ /*
+ * A change to mode, owner, or group that removes the write
+ * delegation holder's own write access makes the delegation's cached
+ * "open for write" stale; return it so a later open() revalidates
+ * access with the server. A change that keeps write access leaves
+ * the delegation in place.
+ */
+ if (sattr->ia_valid & (ATTR_MODE | ATTR_UID | ATTR_GID)) {
+ if (nfs4_setattr_removes_write(inode, sattr))
+ nfs4_inode_return_delegation(inode);
+ else
+ nfs4_inode_make_writeable(inode);
+ }
status = nfs4_do_setattr(inode, cred, fattr, sattr, ctx, NULL);
if (status == 0) {
--
2.53.0
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 1/1] nfs: return a write delegation when a SETATTR drops our write access
2026-05-28 19:22 ` [PATCH 1/1] nfs: return a write delegation when a SETATTR drops our write access Benjamin Coddington
@ 2026-05-29 14:00 ` Trond Myklebust
2026-05-29 15:27 ` Rick Macklem
2026-05-29 16:22 ` Benjamin Coddington
0 siblings, 2 replies; 8+ messages in thread
From: Trond Myklebust @ 2026-05-29 14:00 UTC (permalink / raw)
To: Benjamin Coddington, Anna Schumaker; +Cc: linux-nfs, linux-kernel
On Thu, 2026-05-28 at 15:22 -0400, Benjamin Coddington wrote:
> A client holding an OPEN_DELEGATE_WRITE delegation can satisfy a
> later
> open(O_WRONLY) from the cached delegation (can_open_delegated())
> without
> sending an OPEN to the server. That cached "open for write" assertion
> is
> only valid while the delegation holder still has write access. A
> SETATTR
> that changes mode, owner, or group can revoke that access -- after
> which an
> open served from the delegation would bypass an access check the
> server
> would now fail, and, against a server that recalls the delegation on
> such a
> change, the SETATTR draws a CB_RECALL/NFS4ERR_DELAY/DELEGRETURN/retry
> round
> trip.
>
> Before issuing such a SETATTR, check whether the proposed
> mode/owner/group
> would remove write access for the delegation's owning credential,
> judged by
> the resulting POSIX mode bits. If so, return the delegation first:
> the
> return is synchronous and flushes modified data, so the SETATTR
> proceeds on
> an open or special stateid and the next open revalidates access with
> the
> server. Permission changes that keep the holder's write access leave
> the
> delegation in place.
>
> Only the mode bits and the holder's fsuid/fsgid are consulted. An
> NFSv4 ACL
> cannot be evaluated by the client, a privileged caller may retain
> access the
> bits deny, and supplementary group membership is not checked, so the
> test is
> necessarily approximate -- but an inexact answer costs at most an
> unnecessary delegation return or a fall back to the server's recall,
> never
> incorrect access.
>
> RFC 8881 Section 10.4.4 permits a client to return a delegation
> voluntarily,
> performing the same pre-return state updates (data flush, pending
> truncation, CLOSE/OPEN/LOCK) it would on a recall. Commit
> c01d36457dcc
> ("NFSv4: Don't return the delegation when not needed by NFSv4.x
> (x>0)")
> stopped returning write delegations on SETATTR for NFSv4.1+, since
> the
> server can identify the delegation holder from the SEQUENCE clientid
> and
> need not recall. That holds for changes that do not affect the
> holder's
> access; restore a return only for the narrow case where the holder's
> own
> write access is removed.
Hmmm... I'd argue that while recalling the delegation in this case is
mandatory for NFSv4.0, that is certainly not true for NFSv4.1.
Furthermore, I'd argue that if the holder of a write delegation is just
changing the mode, then that should never result in a delegation recall
for a well written NFSv4.1 server. The reason is this does not impact
the client's ability to cache data, metadata or lock state. It only
impacts its ability to rely on previously cached access data when
handling new opens.
One can argue whether or not it's needed for a uid or gid change by
said holder of the delegation, but there too I'd say the right
behaviour is to err on the side of not recalling.
The exception might be if this is an attribute delegation, and the
result will be that the cred attached to the delegation will no longer
be able to issue a SETATTR to update the atime/mtime on delegation
return.
So yes to pre-emptive invalidation of the access cache, but I'm very
sceptical to actually pre-emptively returning the delegation or even
the layouts.
>
> Signed-off-by: Benjamin Coddington <bcodding@hammerspace.com>
> ---
> fs/nfs/nfs4proc.c | 66 ++++++++++++++++++++++++++++++++++++++++++++-
> --
> 1 file changed, 62 insertions(+), 4 deletions(-)
>
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index a9b8d482d289..e4b7322bf75c 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -4506,7 +4506,55 @@ int nfs4_proc_getattr(struct nfs_server
> *server, struct nfs_fh *fhandle,
> return err;
> }
>
> -/*
> +/*
> + * Would applying @sattr (which changes mode, owner, and/or group)
> remove the
> + * write access of a held write delegation's owning credential, as
> judged by
> + * the resulting file mode bits?
> + *
> + * Such a change makes the delegation's cached "open for write"
> assertion
> + * stale: a later open(O_WRONLY) could be served from the delegation
> without
> + * the server getting a chance to deny it. Only the mode bits and
> the
> + * holder's fsuid/fsgid are consulted; an NFSv4 ACL (which the
> client cannot
> + * evaluate locally), a privileged caller, or supplementary group
> membership
> + * may make the answer imprecise, but the cost is at most an
> unnecessary
> + * delegation return or a fall back to the server's recall -- never
> incorrect
> + * access.
> + */
> +static bool nfs4_setattr_removes_write(struct inode *inode, struct
> iattr *sattr)
> +{
> + struct nfs_delegation *delegation;
> + const struct cred *cred;
> + umode_t mode = inode->i_mode;
> + kuid_t uid = inode->i_uid;
> + kgid_t gid = inode->i_gid;
> + bool ret = false;
> +
> + delegation = nfs4_get_valid_delegation(inode);
> + if (!delegation)
> + return false;
> + if (!(delegation->type & FMODE_WRITE))
> + goto out;
> + cred = delegation->cred;
> +
> + if (sattr->ia_valid & ATTR_MODE)
> + mode = sattr->ia_mode;
> + if (sattr->ia_valid & ATTR_UID)
> + uid = sattr->ia_uid;
> + if (sattr->ia_valid & ATTR_GID)
> + gid = sattr->ia_gid;
> +
> + if (uid_eq(uid, cred->fsuid))
> + ret = !(mode & S_IWUSR);
> + else if (gid_eq(gid, cred->fsgid))
> + ret = !(mode & S_IWGRP);
> + else
> + ret = !(mode & S_IWOTH);
> +out:
> + nfs_put_delegation(delegation);
> + return ret;
> +}
> +
> +/*
> * The file is not closed if it is opened due to the a request to
> change
> * the size of the file. The open call will not be needed once the
> * VFS layer lookup-intents are implemented.
> @@ -4555,9 +4603,19 @@ nfs4_proc_setattr(struct dentry *dentry,
> struct nfs_fattr *fattr,
> cred = ctx->cred;
> }
>
> - /* Return any delegations if we're going to change ACLs */
> - if ((sattr->ia_valid & (ATTR_MODE|ATTR_UID|ATTR_GID)) != 0)
> - nfs4_inode_make_writeable(inode);
> + /*
> + * A change to mode, owner, or group that removes the write
> + * delegation holder's own write access makes the
> delegation's cached
> + * "open for write" stale; return it so a later open()
> revalidates
> + * access with the server. A change that keeps write access
> leaves
> + * the delegation in place.
> + */
> + if (sattr->ia_valid & (ATTR_MODE | ATTR_UID | ATTR_GID)) {
> + if (nfs4_setattr_removes_write(inode, sattr))
> + nfs4_inode_return_delegation(inode);
> + else
> + nfs4_inode_make_writeable(inode);
> + }
>
> status = nfs4_do_setattr(inode, cred, fattr, sattr, ctx,
> NULL);
> if (status == 0) {
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trondmy@kernel.org, trond.myklebust@hammerspace.com
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/1] nfs: return a write delegation when a SETATTR drops our write access
2026-05-29 14:00 ` Trond Myklebust
@ 2026-05-29 15:27 ` Rick Macklem
2026-05-29 15:46 ` Rick Macklem
2026-05-29 19:46 ` Benjamin Coddington
2026-05-29 16:22 ` Benjamin Coddington
1 sibling, 2 replies; 8+ messages in thread
From: Rick Macklem @ 2026-05-29 15:27 UTC (permalink / raw)
To: Trond Myklebust
Cc: Benjamin Coddington, Anna Schumaker, linux-nfs, linux-kernel
On Fri, May 29, 2026 at 7:06 AM Trond Myklebust <trondmy@kernel.org> wrote:
>
> CAUTION: This email originated from outside of the University of Guelph. Do not click links or open attachments unless you recognize the sender and know the content is safe. If you are unsure, forward the message to ITHelp@uoguelph.ca for review.
>
>
> On Thu, 2026-05-28 at 15:22 -0400, Benjamin Coddington wrote:
> > A client holding an OPEN_DELEGATE_WRITE delegation can satisfy a
> > later
> > open(O_WRONLY) from the cached delegation (can_open_delegated())
> > without
> > sending an OPEN to the server. That cached "open for write" assertion
> > is
> > only valid while the delegation holder still has write access. A
> > SETATTR
> > that changes mode, owner, or group can revoke that access -- after
> > which an
> > open served from the delegation would bypass an access check the
> > server
> > would now fail, and, against a server that recalls the delegation on
> > such a
> > change, the SETATTR draws a CB_RECALL/NFS4ERR_DELAY/DELEGRETURN/retry
> > round
> > trip.
> >
> > Before issuing such a SETATTR, check whether the proposed
> > mode/owner/group
> > would remove write access for the delegation's owning credential,
> > judged by
> > the resulting POSIX mode bits. If so, return the delegation first:
> > the
> > return is synchronous and flushes modified data, so the SETATTR
> > proceeds on
> > an open or special stateid and the next open revalidates access with
> > the
> > server. Permission changes that keep the holder's write access leave
> > the
> > delegation in place.
> >
> > Only the mode bits and the holder's fsuid/fsgid are consulted. An
> > NFSv4 ACL
> > cannot be evaluated by the client, a privileged caller may retain
> > access the
> > bits deny, and supplementary group membership is not checked, so the
> > test is
> > necessarily approximate -- but an inexact answer costs at most an
> > unnecessary delegation return or a fall back to the server's recall,
> > never
> > incorrect access.
> >
> > RFC 8881 Section 10.4.4 permits a client to return a delegation
> > voluntarily,
> > performing the same pre-return state updates (data flush, pending
> > truncation, CLOSE/OPEN/LOCK) it would on a recall. Commit
> > c01d36457dcc
> > ("NFSv4: Don't return the delegation when not needed by NFSv4.x
> > (x>0)")
> > stopped returning write delegations on SETATTR for NFSv4.1+, since
> > the
> > server can identify the delegation holder from the SEQUENCE clientid
> > and
> > need not recall. That holds for changes that do not affect the
> > holder's
> > access; restore a return only for the narrow case where the holder's
> > own
> > write access is removed.
>
> Hmmm... I'd argue that while recalling the delegation in this case is
> mandatory for NFSv4.0, that is certainly not true for NFSv4.1.
>
> Furthermore, I'd argue that if the holder of a write delegation is just
> changing the mode, then that should never result in a delegation recall
> for a well written NFSv4.1 server. The reason is this does not impact
> the client's ability to cache data, metadata or lock state. It only
> impacts its ability to rely on previously cached access data when
> handling new opens.
I'm not sure I completely agree with this statement. The case I would
be concerned about is delayed writes sitting in the client.
Maybe an NFSv4.1/4.2 server should always allow writes from a
client that holds a write delegation for the file, but I don't think that
is spelled out in RFC8881 (I'm never sure, given that monstrous
document) and I'll admit that the FreeBSD server
does not do that. The FreeBSD server currently does always allow the
owner of the file to do writes, but does not do the same w.r.t. write
delegation held by the client. (I'll think about adding that override,
because it does seem reasonable.)
What does the Linux knfsd currently do w.r.t. allowing writes
from a client that holds a write delegation?
Certainly setting mode bits won't be a problem and clearing
owner mode bits isn't a problem for the FreeBSD server.
>
> One can argue whether or not it's needed for a uid or gid change by
> said holder of the delegation, but there too I'd say the right
> behaviour is to err on the side of not recalling.
I might argue that whether or not clearing mode bits requires a
delegation recall should be left up to the server.
> The exception might be if this is an attribute delegation, and the
> result will be that the cred attached to the delegation will no longer
> be able to issue a SETATTR to update the atime/mtime on delegation
> return.
Lost me. What's an attribute delegation?
rick
>
> So yes to pre-emptive invalidation of the access cache, but I'm very
> sceptical to actually pre-emptively returning the delegation or even
> the layouts.
>
> >
> > Signed-off-by: Benjamin Coddington <bcodding@hammerspace.com>
> > ---
> > fs/nfs/nfs4proc.c | 66 ++++++++++++++++++++++++++++++++++++++++++++-
> > --
> > 1 file changed, 62 insertions(+), 4 deletions(-)
> >
> > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> > index a9b8d482d289..e4b7322bf75c 100644
> > --- a/fs/nfs/nfs4proc.c
> > +++ b/fs/nfs/nfs4proc.c
> > @@ -4506,7 +4506,55 @@ int nfs4_proc_getattr(struct nfs_server
> > *server, struct nfs_fh *fhandle,
> > return err;
> > }
> >
> > -/*
> > +/*
> > + * Would applying @sattr (which changes mode, owner, and/or group)
> > remove the
> > + * write access of a held write delegation's owning credential, as
> > judged by
> > + * the resulting file mode bits?
> > + *
> > + * Such a change makes the delegation's cached "open for write"
> > assertion
> > + * stale: a later open(O_WRONLY) could be served from the delegation
> > without
> > + * the server getting a chance to deny it. Only the mode bits and
> > the
> > + * holder's fsuid/fsgid are consulted; an NFSv4 ACL (which the
> > client cannot
> > + * evaluate locally), a privileged caller, or supplementary group
> > membership
> > + * may make the answer imprecise, but the cost is at most an
> > unnecessary
> > + * delegation return or a fall back to the server's recall -- never
> > incorrect
> > + * access.
> > + */
> > +static bool nfs4_setattr_removes_write(struct inode *inode, struct
> > iattr *sattr)
> > +{
> > + struct nfs_delegation *delegation;
> > + const struct cred *cred;
> > + umode_t mode = inode->i_mode;
> > + kuid_t uid = inode->i_uid;
> > + kgid_t gid = inode->i_gid;
> > + bool ret = false;
> > +
> > + delegation = nfs4_get_valid_delegation(inode);
> > + if (!delegation)
> > + return false;
> > + if (!(delegation->type & FMODE_WRITE))
> > + goto out;
> > + cred = delegation->cred;
> > +
> > + if (sattr->ia_valid & ATTR_MODE)
> > + mode = sattr->ia_mode;
> > + if (sattr->ia_valid & ATTR_UID)
> > + uid = sattr->ia_uid;
> > + if (sattr->ia_valid & ATTR_GID)
> > + gid = sattr->ia_gid;
> > +
> > + if (uid_eq(uid, cred->fsuid))
> > + ret = !(mode & S_IWUSR);
> > + else if (gid_eq(gid, cred->fsgid))
> > + ret = !(mode & S_IWGRP);
> > + else
> > + ret = !(mode & S_IWOTH);
> > +out:
> > + nfs_put_delegation(delegation);
> > + return ret;
> > +}
> > +
> > +/*
> > * The file is not closed if it is opened due to the a request to
> > change
> > * the size of the file. The open call will not be needed once the
> > * VFS layer lookup-intents are implemented.
> > @@ -4555,9 +4603,19 @@ nfs4_proc_setattr(struct dentry *dentry,
> > struct nfs_fattr *fattr,
> > cred = ctx->cred;
> > }
> >
> > - /* Return any delegations if we're going to change ACLs */
> > - if ((sattr->ia_valid & (ATTR_MODE|ATTR_UID|ATTR_GID)) != 0)
> > - nfs4_inode_make_writeable(inode);
> > + /*
> > + * A change to mode, owner, or group that removes the write
> > + * delegation holder's own write access makes the
> > delegation's cached
> > + * "open for write" stale; return it so a later open()
> > revalidates
> > + * access with the server. A change that keeps write access
> > leaves
> > + * the delegation in place.
> > + */
> > + if (sattr->ia_valid & (ATTR_MODE | ATTR_UID | ATTR_GID)) {
> > + if (nfs4_setattr_removes_write(inode, sattr))
> > + nfs4_inode_return_delegation(inode);
> > + else
> > + nfs4_inode_make_writeable(inode);
> > + }
> >
> > status = nfs4_do_setattr(inode, cred, fattr, sattr, ctx,
> > NULL);
> > if (status == 0) {
>
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trondmy@kernel.org, trond.myklebust@hammerspace.com
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/1] nfs: return a write delegation when a SETATTR drops our write access
2026-05-29 15:27 ` Rick Macklem
@ 2026-05-29 15:46 ` Rick Macklem
2026-05-29 19:46 ` Benjamin Coddington
1 sibling, 0 replies; 8+ messages in thread
From: Rick Macklem @ 2026-05-29 15:46 UTC (permalink / raw)
To: Trond Myklebust
Cc: Benjamin Coddington, Anna Schumaker, linux-nfs, linux-kernel
On Fri, May 29, 2026 at 8:27 AM Rick Macklem <rick.macklem@gmail.com> wrote:
>
> On Fri, May 29, 2026 at 7:06 AM Trond Myklebust <trondmy@kernel.org> wrote:
> >
> > CAUTION: This email originated from outside of the University of Guelph. Do not click links or open attachments unless you recognize the sender and know the content is safe. If you are unsure, forward the message to ITHelp@uoguelph.ca for review.
> >
> >
> > On Thu, 2026-05-28 at 15:22 -0400, Benjamin Coddington wrote:
> > > A client holding an OPEN_DELEGATE_WRITE delegation can satisfy a
> > > later
> > > open(O_WRONLY) from the cached delegation (can_open_delegated())
> > > without
> > > sending an OPEN to the server. That cached "open for write" assertion
> > > is
> > > only valid while the delegation holder still has write access. A
> > > SETATTR
> > > that changes mode, owner, or group can revoke that access -- after
> > > which an
> > > open served from the delegation would bypass an access check the
> > > server
> > > would now fail, and, against a server that recalls the delegation on
> > > such a
> > > change, the SETATTR draws a CB_RECALL/NFS4ERR_DELAY/DELEGRETURN/retry
> > > round
> > > trip.
> > >
> > > Before issuing such a SETATTR, check whether the proposed
> > > mode/owner/group
> > > would remove write access for the delegation's owning credential,
> > > judged by
> > > the resulting POSIX mode bits. If so, return the delegation first:
> > > the
> > > return is synchronous and flushes modified data, so the SETATTR
> > > proceeds on
> > > an open or special stateid and the next open revalidates access with
> > > the
> > > server. Permission changes that keep the holder's write access leave
> > > the
> > > delegation in place.
> > >
> > > Only the mode bits and the holder's fsuid/fsgid are consulted. An
> > > NFSv4 ACL
> > > cannot be evaluated by the client, a privileged caller may retain
> > > access the
> > > bits deny, and supplementary group membership is not checked, so the
> > > test is
> > > necessarily approximate -- but an inexact answer costs at most an
> > > unnecessary delegation return or a fall back to the server's recall,
> > > never
> > > incorrect access.
> > >
> > > RFC 8881 Section 10.4.4 permits a client to return a delegation
> > > voluntarily,
> > > performing the same pre-return state updates (data flush, pending
> > > truncation, CLOSE/OPEN/LOCK) it would on a recall. Commit
> > > c01d36457dcc
> > > ("NFSv4: Don't return the delegation when not needed by NFSv4.x
> > > (x>0)")
> > > stopped returning write delegations on SETATTR for NFSv4.1+, since
> > > the
> > > server can identify the delegation holder from the SEQUENCE clientid
> > > and
> > > need not recall. That holds for changes that do not affect the
> > > holder's
> > > access; restore a return only for the narrow case where the holder's
> > > own
> > > write access is removed.
> >
> > Hmmm... I'd argue that while recalling the delegation in this case is
> > mandatory for NFSv4.0, that is certainly not true for NFSv4.1.
> >
> > Furthermore, I'd argue that if the holder of a write delegation is just
> > changing the mode, then that should never result in a delegation recall
> > for a well written NFSv4.1 server. The reason is this does not impact
> > the client's ability to cache data, metadata or lock state. It only
> > impacts its ability to rely on previously cached access data when
> > handling new opens.
> I'm not sure I completely agree with this statement. The case I would
> be concerned about is delayed writes sitting in the client.
>
> Maybe an NFSv4.1/4.2 server should always allow writes from a
> client that holds a write delegation for the file, but I don't think that
> is spelled out in RFC8881 (I'm never sure, given that monstrous
> document) and I'll admit that the FreeBSD server
> does not do that. The FreeBSD server currently does always allow the
> owner of the file to do writes, but does not do the same w.r.t. write
> delegation held by the client. (I'll think about adding that override,
> because it does seem reasonable.)
>
> What does the Linux knfsd currently do w.r.t. allowing writes
> from a client that holds a write delegation?
>
> Certainly setting mode bits won't be a problem and clearing
> owner mode bits isn't a problem for the FreeBSD server.
>
Oh, and one more quirky corner..
If the server provided a non-empty ACE for the permissions
field for the write delegation, these SETATTR changes either
require the server to recall the delegation or the client to
invalidate use of this ACE.
I'd suggest that the client invalidate use of the ACE (if it
ever uses it?) and leave delegation recall up to the server.
rick
> >
> > One can argue whether or not it's needed for a uid or gid change by
> > said holder of the delegation, but there too I'd say the right
> > behaviour is to err on the side of not recalling.
> I might argue that whether or not clearing mode bits requires a
> delegation recall should be left up to the server.
>
> > The exception might be if this is an attribute delegation, and the
> > result will be that the cred attached to the delegation will no longer
> > be able to issue a SETATTR to update the atime/mtime on delegation
> > return.
> Lost me. What's an attribute delegation?
>
> rick
> >
> > So yes to pre-emptive invalidation of the access cache, but I'm very
> > sceptical to actually pre-emptively returning the delegation or even
> > the layouts.
> >
> > >
> > > Signed-off-by: Benjamin Coddington <bcodding@hammerspace.com>
> > > ---
> > > fs/nfs/nfs4proc.c | 66 ++++++++++++++++++++++++++++++++++++++++++++-
> > > --
> > > 1 file changed, 62 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> > > index a9b8d482d289..e4b7322bf75c 100644
> > > --- a/fs/nfs/nfs4proc.c
> > > +++ b/fs/nfs/nfs4proc.c
> > > @@ -4506,7 +4506,55 @@ int nfs4_proc_getattr(struct nfs_server
> > > *server, struct nfs_fh *fhandle,
> > > return err;
> > > }
> > >
> > > -/*
> > > +/*
> > > + * Would applying @sattr (which changes mode, owner, and/or group)
> > > remove the
> > > + * write access of a held write delegation's owning credential, as
> > > judged by
> > > + * the resulting file mode bits?
> > > + *
> > > + * Such a change makes the delegation's cached "open for write"
> > > assertion
> > > + * stale: a later open(O_WRONLY) could be served from the delegation
> > > without
> > > + * the server getting a chance to deny it. Only the mode bits and
> > > the
> > > + * holder's fsuid/fsgid are consulted; an NFSv4 ACL (which the
> > > client cannot
> > > + * evaluate locally), a privileged caller, or supplementary group
> > > membership
> > > + * may make the answer imprecise, but the cost is at most an
> > > unnecessary
> > > + * delegation return or a fall back to the server's recall -- never
> > > incorrect
> > > + * access.
> > > + */
> > > +static bool nfs4_setattr_removes_write(struct inode *inode, struct
> > > iattr *sattr)
> > > +{
> > > + struct nfs_delegation *delegation;
> > > + const struct cred *cred;
> > > + umode_t mode = inode->i_mode;
> > > + kuid_t uid = inode->i_uid;
> > > + kgid_t gid = inode->i_gid;
> > > + bool ret = false;
> > > +
> > > + delegation = nfs4_get_valid_delegation(inode);
> > > + if (!delegation)
> > > + return false;
> > > + if (!(delegation->type & FMODE_WRITE))
> > > + goto out;
> > > + cred = delegation->cred;
> > > +
> > > + if (sattr->ia_valid & ATTR_MODE)
> > > + mode = sattr->ia_mode;
> > > + if (sattr->ia_valid & ATTR_UID)
> > > + uid = sattr->ia_uid;
> > > + if (sattr->ia_valid & ATTR_GID)
> > > + gid = sattr->ia_gid;
> > > +
> > > + if (uid_eq(uid, cred->fsuid))
> > > + ret = !(mode & S_IWUSR);
> > > + else if (gid_eq(gid, cred->fsgid))
> > > + ret = !(mode & S_IWGRP);
> > > + else
> > > + ret = !(mode & S_IWOTH);
> > > +out:
> > > + nfs_put_delegation(delegation);
> > > + return ret;
> > > +}
> > > +
> > > +/*
> > > * The file is not closed if it is opened due to the a request to
> > > change
> > > * the size of the file. The open call will not be needed once the
> > > * VFS layer lookup-intents are implemented.
> > > @@ -4555,9 +4603,19 @@ nfs4_proc_setattr(struct dentry *dentry,
> > > struct nfs_fattr *fattr,
> > > cred = ctx->cred;
> > > }
> > >
> > > - /* Return any delegations if we're going to change ACLs */
> > > - if ((sattr->ia_valid & (ATTR_MODE|ATTR_UID|ATTR_GID)) != 0)
> > > - nfs4_inode_make_writeable(inode);
> > > + /*
> > > + * A change to mode, owner, or group that removes the write
> > > + * delegation holder's own write access makes the
> > > delegation's cached
> > > + * "open for write" stale; return it so a later open()
> > > revalidates
> > > + * access with the server. A change that keeps write access
> > > leaves
> > > + * the delegation in place.
> > > + */
> > > + if (sattr->ia_valid & (ATTR_MODE | ATTR_UID | ATTR_GID)) {
> > > + if (nfs4_setattr_removes_write(inode, sattr))
> > > + nfs4_inode_return_delegation(inode);
> > > + else
> > > + nfs4_inode_make_writeable(inode);
> > > + }
> > >
> > > status = nfs4_do_setattr(inode, cred, fattr, sattr, ctx,
> > > NULL);
> > > if (status == 0) {
> >
> > --
> > Trond Myklebust
> > Linux NFS client maintainer, Hammerspace
> > trondmy@kernel.org, trond.myklebust@hammerspace.com
> >
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/1] nfs: return a write delegation when a SETATTR drops our write access
2026-05-29 14:00 ` Trond Myklebust
2026-05-29 15:27 ` Rick Macklem
@ 2026-05-29 16:22 ` Benjamin Coddington
2026-05-29 23:20 ` Rick Macklem
1 sibling, 1 reply; 8+ messages in thread
From: Benjamin Coddington @ 2026-05-29 16:22 UTC (permalink / raw)
To: Trond Myklebust
Cc: Benjamin Coddington, Anna Schumaker, linux-nfs, linux-kernel
On 29 May 2026, at 10:00, Trond Myklebust wrote:
>
> Hmmm... I'd argue that while recalling the delegation in this case is
> mandatory for NFSv4.0, that is certainly not true for NFSv4.1.
>
> Furthermore, I'd argue that if the holder of a write delegation is just
> changing the mode, then that should never result in a delegation recall
> for a well written NFSv4.1 server. The reason is this does not impact
> the client's ability to cache data, metadata or lock state. It only
> impacts its ability to rely on previously cached access data when
> handling new opens.
>
> One can argue whether or not it's needed for a uid or gid change by
> said holder of the delegation, but there too I'd say the right
> behaviour is to err on the side of not recalling.
> The exception might be if this is an attribute delegation, and the
> result will be that the cred attached to the delegation will no longer
> be able to issue a SETATTR to update the atime/mtime on delegation
> return.
>
> So yes to pre-emptive invalidation of the access cache, but I'm very
> sceptical to actually pre-emptively returning the delegation or even
> the layouts.
The latency I was chasing comes from the server electing to recall on these
SETATTRs.
You're right that for v4.1 the mode change doesn't need to trigger a recall,
and that the only client-side exposure is stale cached access on subsequent
opens. It's already covered in the SETATTR reply path changing mode/uid/gid
where it sets NFS_INO_INVALID_ACCESS in nfs_update_inode(), and a fresh open
over the delegation still goes through nfs_may_open() -> nfs_do_access(), so
it revalidates with the server rather than trusting the cached result.
The server isn't recalling these for cache coherence. When the change
removes the holder's write, a later SETATTR(size) under the delegation
carries the delegation stateid, not an open stateid
(nfs4_copy_delegation_stateid()), and the client can reopen O_WRONLY from
the delegation with no OPEN on the wire — so the server can't tell whether
the holder still has the file open for write, nor apply the usual
"open-for-write overrides current mode" check for the truncate. The recall
forces a DELEGRETURN + CLAIM_DELEGATE_CUR reclaim that re-establishes a real
(or absent) open stateid, letting the server decide the truncate correctly.
For write-retaining SETATTRs none of that applies and the server shouldn't
recall — which is where the latency I was chasing actually came from, and
the better place to remove it.
Thanks for the review.
Ben
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/1] nfs: return a write delegation when a SETATTR drops our write access
2026-05-29 15:27 ` Rick Macklem
2026-05-29 15:46 ` Rick Macklem
@ 2026-05-29 19:46 ` Benjamin Coddington
1 sibling, 0 replies; 8+ messages in thread
From: Benjamin Coddington @ 2026-05-29 19:46 UTC (permalink / raw)
To: Rick Macklem
Cc: Trond Myklebust, Benjamin Coddington, Anna Schumaker, linux-nfs,
linux-kernel
On 29 May 2026, at 11:27, Rick Macklem wrote:
> On Fri, May 29, 2026 at 7:06 AM Trond Myklebust <trondmy@kernel.org> wrote:
>>
>> CAUTION: This email originated from outside of the University of Guelph. Do not click links or open attachments unless you recognize the sender and know the content is safe. If you are unsure, forward the message to ITHelp@uoguelph.ca for review.
>>
>>
>> On Thu, 2026-05-28 at 15:22 -0400, Benjamin Coddington wrote:
>>> A client holding an OPEN_DELEGATE_WRITE delegation can satisfy a
>>> later
>>> open(O_WRONLY) from the cached delegation (can_open_delegated())
>>> without
>>> sending an OPEN to the server. That cached "open for write" assertion
>>> is
>>> only valid while the delegation holder still has write access. A
>>> SETATTR
>>> that changes mode, owner, or group can revoke that access -- after
>>> which an
>>> open served from the delegation would bypass an access check the
>>> server
>>> would now fail, and, against a server that recalls the delegation on
>>> such a
>>> change, the SETATTR draws a CB_RECALL/NFS4ERR_DELAY/DELEGRETURN/retry
>>> round
>>> trip.
>>>
>>> Before issuing such a SETATTR, check whether the proposed
>>> mode/owner/group
>>> would remove write access for the delegation's owning credential,
>>> judged by
>>> the resulting POSIX mode bits. If so, return the delegation first:
>>> the
>>> return is synchronous and flushes modified data, so the SETATTR
>>> proceeds on
>>> an open or special stateid and the next open revalidates access with
>>> the
>>> server. Permission changes that keep the holder's write access leave
>>> the
>>> delegation in place.
>>>
>>> Only the mode bits and the holder's fsuid/fsgid are consulted. An
>>> NFSv4 ACL
>>> cannot be evaluated by the client, a privileged caller may retain
>>> access the
>>> bits deny, and supplementary group membership is not checked, so the
>>> test is
>>> necessarily approximate -- but an inexact answer costs at most an
>>> unnecessary delegation return or a fall back to the server's recall,
>>> never
>>> incorrect access.
>>>
>>> RFC 8881 Section 10.4.4 permits a client to return a delegation
>>> voluntarily,
>>> performing the same pre-return state updates (data flush, pending
>>> truncation, CLOSE/OPEN/LOCK) it would on a recall. Commit
>>> c01d36457dcc
>>> ("NFSv4: Don't return the delegation when not needed by NFSv4.x
>>> (x>0)")
>>> stopped returning write delegations on SETATTR for NFSv4.1+, since
>>> the
>>> server can identify the delegation holder from the SEQUENCE clientid
>>> and
>>> need not recall. That holds for changes that do not affect the
>>> holder's
>>> access; restore a return only for the narrow case where the holder's
>>> own
>>> write access is removed.
>>
>> Hmmm... I'd argue that while recalling the delegation in this case is
>> mandatory for NFSv4.0, that is certainly not true for NFSv4.1.
>>
>> Furthermore, I'd argue that if the holder of a write delegation is just
>> changing the mode, then that should never result in a delegation recall
>> for a well written NFSv4.1 server. The reason is this does not impact
>> the client's ability to cache data, metadata or lock state. It only
>> impacts its ability to rely on previously cached access data when
>> handling new opens.
> I'm not sure I completely agree with this statement. The case I would
> be concerned about is delayed writes sitting in the client.
Those stay safe on Linux. The client flushes cached writes on the delegation
(or open) stateid, and knfsd authorizes a WRITE from the stateid's granted
access, not from a re-check of the current mode — nfsd4_write() just rides
the already-open, write-capable nfsd_file resolved from the stateid, so a
later mode change doesn't block the holder's writes. And if the server does
recall, DELEGRETURN flushes the dirty data before the delegation goes back.
The only way to lose data is a server that neither honors the holder's
stateid writes nor recalls.
> Maybe an NFSv4.1/4.2 server should always allow writes from a
> client that holds a write delegation for the file, but I don't think that
> is spelled out in RFC8881 (I'm never sure, given that monstrous
> document) and I'll admit that the FreeBSD server
> does not do that. The FreeBSD server currently does always allow the
> owner of the file to do writes, but does not do the same w.r.t. write
> delegation held by the client. (I'll think about adding that override,
> because it does seem reasonable.)
>
> What does the Linux knfsd currently do w.r.t. allowing writes
> from a client that holds a write delegation?
It allows them. WRITE resolves the delegation (or open) stateid to an open
write-capable nfsd_file and writes through it; there's no per-WRITE mode
check in nfsd4_write() -> nfsd_vfs_write(). So knfsd effectively already
does the override you're considering for FreeBSD, just implicitly — write
authorization comes from the open/deleg stateid, the same principle as the
truncate-via-stateid case.
> Certainly setting mode bits won't be a problem and clearing
> owner mode bits isn't a problem for the FreeBSD server.
>
> Oh, and one more quirky corner..
> If the server provided a non-empty ACE for the permissions
> field for the write delegation, these SETATTR changes either
> require the server to recall the delegation or the client to
> invalidate use of this ACE.
>
> I'd suggest that the client invalidate use of the ACE (if it
> ever uses it?) and leave delegation recall up to the server.
The Linux client never uses it — decode_rw_delegation() parses the
delegation's permissions ACE and discards it (decode_ace() is passed a NULL
sink), and the client always does its own ACCESS. So there's nothing to
invalidate on our side, and agreed: the recall decision belongs to the
server.
...
>> The exception might be if this is an attribute delegation, and the
>> result will be that the cred attached to the delegation will no longer
>> be able to issue a SETATTR to update the atime/mtime on delegation
>> return.
> Lost me. What's an attribute delegation?
I'll let Trond give the canonical definition, but as I understand it: the
*_ATTRS_DELEG delegation variants additionally delegate attribute authority
— notably the timestamps — so the client can hold and modify them and write
them back at DELEGRETURN. That return-time SETATTR is what Trond meant about
the holder's cred needing to retain access.
Ben
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/1] nfs: return a write delegation when a SETATTR drops our write access
2026-05-29 16:22 ` Benjamin Coddington
@ 2026-05-29 23:20 ` Rick Macklem
0 siblings, 0 replies; 8+ messages in thread
From: Rick Macklem @ 2026-05-29 23:20 UTC (permalink / raw)
To: Benjamin Coddington
Cc: Trond Myklebust, Anna Schumaker, linux-nfs, linux-kernel
On Fri, May 29, 2026 at 9:22 AM Benjamin Coddington
<ben.coddington@hammerspace.com> wrote:
>
> On 29 May 2026, at 10:00, Trond Myklebust wrote:
> >
> > Hmmm... I'd argue that while recalling the delegation in this case is
> > mandatory for NFSv4.0, that is certainly not true for NFSv4.1.
> >
> > Furthermore, I'd argue that if the holder of a write delegation is just
> > changing the mode, then that should never result in a delegation recall
> > for a well written NFSv4.1 server. The reason is this does not impact
> > the client's ability to cache data, metadata or lock state. It only
> > impacts its ability to rely on previously cached access data when
> > handling new opens.
> >
> > One can argue whether or not it's needed for a uid or gid change by
> > said holder of the delegation, but there too I'd say the right
> > behaviour is to err on the side of not recalling.
> > The exception might be if this is an attribute delegation, and the
> > result will be that the cred attached to the delegation will no longer
> > be able to issue a SETATTR to update the atime/mtime on delegation
> > return.
> >
> > So yes to pre-emptive invalidation of the access cache, but I'm very
> > sceptical to actually pre-emptively returning the delegation or even
> > the layouts.
>
> The latency I was chasing comes from the server electing to recall on these
> SETATTRs.
>
> You're right that for v4.1 the mode change doesn't need to trigger a recall,
> and that the only client-side exposure is stale cached access on subsequent
> opens. It's already covered in the SETATTR reply path changing mode/uid/gid
> where it sets NFS_INO_INVALID_ACCESS in nfs_update_inode(), and a fresh open
> over the delegation still goes through nfs_may_open() -> nfs_do_access(), so
> it revalidates with the server rather than trusting the cached result.
>
> The server isn't recalling these for cache coherence. When the change
> removes the holder's write, a later SETATTR(size) under the delegation
> carries the delegation stateid, not an open stateid
> (nfs4_copy_delegation_stateid()), and the client can reopen O_WRONLY from
> the delegation with no OPEN on the wire — so the server can't tell whether
> the holder still has the file open for write, nor apply the usual
> "open-for-write overrides current mode" check for the truncate. The recall
> forces a DELEGRETURN + CLAIM_DELEGATE_CUR reclaim that re-establishes a real
> (or absent) open stateid, letting the server decide the truncate correctly.
I will note that, since your server allows WRITEs based on the
delegation stateid,
it could allow SETATTR(size) based on a delegation stateid as well.
(SETATTR(size)
is just a weird variant of WRITE. is it not.) Now, how to implement
this in the server
would be up to you guys.
I suppose the disadvantage of doing the DELEGRETURN pre-emptively will never
allow server implementations to figure out how to allow it without a
CB_RECALL, etc.
I do plan on patching the FreeBSD server to allow READ, WRITE and SETATTR(size)
based on a write delegation stateid (and READ based on a read
delegation stateid).
Of course it will take a while for that to find its way into the
world. The FreeBSD
server will be doing the CB_RECALL in the meantime.
Useful discussion. Thanks, rick
>
> For write-retaining SETATTRs none of that applies and the server shouldn't
> recall — which is where the latency I was chasing actually came from, and
> the better place to remove it.
>
> Thanks for the review.
>
> Ben
>
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2026-05-29 23:20 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-28 19:22 [PATCH 0/1] Client optimize away CB_RECALL for write delegation Benjamin Coddington
2026-05-28 19:22 ` [PATCH 1/1] nfs: return a write delegation when a SETATTR drops our write access Benjamin Coddington
2026-05-29 14:00 ` Trond Myklebust
2026-05-29 15:27 ` Rick Macklem
2026-05-29 15:46 ` Rick Macklem
2026-05-29 19:46 ` Benjamin Coddington
2026-05-29 16:22 ` Benjamin Coddington
2026-05-29 23:20 ` Rick Macklem
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox