* [PATCH 0/1] Client optimize away CB_RECALL for write delegation @ 2026-05-28 19:22 Benjamin Coddington 2026-05-28 19:22 ` [PATCH 1/1] nfs: return a write delegation when a SETATTR drops our write access Benjamin Coddington 0 siblings, 1 reply; 8+ messages in thread From: Benjamin Coddington @ 2026-05-28 19:22 UTC (permalink / raw) To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs, linux-kernel A client holding a write delegation that performs a SETATTR that would remove the client's write access can expect the server to recall that delegation. The following patch addresses the simplest case to dectect this and then preemptively return the delegation rather than have server recall it. Benjamin Coddington (1): nfs: return a write delegation when a SETATTR drops our write access fs/nfs/nfs4proc.c | 66 ++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 62 insertions(+), 4 deletions(-) base-commit: e7ae89a0c97ce2b68b0983cd01eda67cf373517d -- 2.53.0 ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 1/1] nfs: return a write delegation when a SETATTR drops our write access 2026-05-28 19:22 [PATCH 0/1] Client optimize away CB_RECALL for write delegation Benjamin Coddington @ 2026-05-28 19:22 ` Benjamin Coddington 2026-05-29 14:00 ` Trond Myklebust 0 siblings, 1 reply; 8+ messages in thread From: Benjamin Coddington @ 2026-05-28 19:22 UTC (permalink / raw) To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs, linux-kernel A client holding an OPEN_DELEGATE_WRITE delegation can satisfy a later open(O_WRONLY) from the cached delegation (can_open_delegated()) without sending an OPEN to the server. That cached "open for write" assertion is only valid while the delegation holder still has write access. A SETATTR that changes mode, owner, or group can revoke that access -- after which an open served from the delegation would bypass an access check the server would now fail, and, against a server that recalls the delegation on such a change, the SETATTR draws a CB_RECALL/NFS4ERR_DELAY/DELEGRETURN/retry round trip. Before issuing such a SETATTR, check whether the proposed mode/owner/group would remove write access for the delegation's owning credential, judged by the resulting POSIX mode bits. If so, return the delegation first: the return is synchronous and flushes modified data, so the SETATTR proceeds on an open or special stateid and the next open revalidates access with the server. Permission changes that keep the holder's write access leave the delegation in place. Only the mode bits and the holder's fsuid/fsgid are consulted. An NFSv4 ACL cannot be evaluated by the client, a privileged caller may retain access the bits deny, and supplementary group membership is not checked, so the test is necessarily approximate -- but an inexact answer costs at most an unnecessary delegation return or a fall back to the server's recall, never incorrect access. RFC 8881 Section 10.4.4 permits a client to return a delegation voluntarily, performing the same pre-return state updates (data flush, pending truncation, CLOSE/OPEN/LOCK) it would on a recall. Commit c01d36457dcc ("NFSv4: Don't return the delegation when not needed by NFSv4.x (x>0)") stopped returning write delegations on SETATTR for NFSv4.1+, since the server can identify the delegation holder from the SEQUENCE clientid and need not recall. That holds for changes that do not affect the holder's access; restore a return only for the narrow case where the holder's own write access is removed. Signed-off-by: Benjamin Coddington <bcodding@hammerspace.com> --- fs/nfs/nfs4proc.c | 66 ++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 62 insertions(+), 4 deletions(-) diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index a9b8d482d289..e4b7322bf75c 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -4506,7 +4506,55 @@ int nfs4_proc_getattr(struct nfs_server *server, struct nfs_fh *fhandle, return err; } -/* +/* + * Would applying @sattr (which changes mode, owner, and/or group) remove the + * write access of a held write delegation's owning credential, as judged by + * the resulting file mode bits? + * + * Such a change makes the delegation's cached "open for write" assertion + * stale: a later open(O_WRONLY) could be served from the delegation without + * the server getting a chance to deny it. Only the mode bits and the + * holder's fsuid/fsgid are consulted; an NFSv4 ACL (which the client cannot + * evaluate locally), a privileged caller, or supplementary group membership + * may make the answer imprecise, but the cost is at most an unnecessary + * delegation return or a fall back to the server's recall -- never incorrect + * access. + */ +static bool nfs4_setattr_removes_write(struct inode *inode, struct iattr *sattr) +{ + struct nfs_delegation *delegation; + const struct cred *cred; + umode_t mode = inode->i_mode; + kuid_t uid = inode->i_uid; + kgid_t gid = inode->i_gid; + bool ret = false; + + delegation = nfs4_get_valid_delegation(inode); + if (!delegation) + return false; + if (!(delegation->type & FMODE_WRITE)) + goto out; + cred = delegation->cred; + + if (sattr->ia_valid & ATTR_MODE) + mode = sattr->ia_mode; + if (sattr->ia_valid & ATTR_UID) + uid = sattr->ia_uid; + if (sattr->ia_valid & ATTR_GID) + gid = sattr->ia_gid; + + if (uid_eq(uid, cred->fsuid)) + ret = !(mode & S_IWUSR); + else if (gid_eq(gid, cred->fsgid)) + ret = !(mode & S_IWGRP); + else + ret = !(mode & S_IWOTH); +out: + nfs_put_delegation(delegation); + return ret; +} + +/* * The file is not closed if it is opened due to the a request to change * the size of the file. The open call will not be needed once the * VFS layer lookup-intents are implemented. @@ -4555,9 +4603,19 @@ nfs4_proc_setattr(struct dentry *dentry, struct nfs_fattr *fattr, cred = ctx->cred; } - /* Return any delegations if we're going to change ACLs */ - if ((sattr->ia_valid & (ATTR_MODE|ATTR_UID|ATTR_GID)) != 0) - nfs4_inode_make_writeable(inode); + /* + * A change to mode, owner, or group that removes the write + * delegation holder's own write access makes the delegation's cached + * "open for write" stale; return it so a later open() revalidates + * access with the server. A change that keeps write access leaves + * the delegation in place. + */ + if (sattr->ia_valid & (ATTR_MODE | ATTR_UID | ATTR_GID)) { + if (nfs4_setattr_removes_write(inode, sattr)) + nfs4_inode_return_delegation(inode); + else + nfs4_inode_make_writeable(inode); + } status = nfs4_do_setattr(inode, cred, fattr, sattr, ctx, NULL); if (status == 0) { -- 2.53.0 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 1/1] nfs: return a write delegation when a SETATTR drops our write access 2026-05-28 19:22 ` [PATCH 1/1] nfs: return a write delegation when a SETATTR drops our write access Benjamin Coddington @ 2026-05-29 14:00 ` Trond Myklebust 2026-05-29 15:27 ` Rick Macklem 2026-05-29 16:22 ` Benjamin Coddington 0 siblings, 2 replies; 8+ messages in thread From: Trond Myklebust @ 2026-05-29 14:00 UTC (permalink / raw) To: Benjamin Coddington, Anna Schumaker; +Cc: linux-nfs, linux-kernel On Thu, 2026-05-28 at 15:22 -0400, Benjamin Coddington wrote: > A client holding an OPEN_DELEGATE_WRITE delegation can satisfy a > later > open(O_WRONLY) from the cached delegation (can_open_delegated()) > without > sending an OPEN to the server. That cached "open for write" assertion > is > only valid while the delegation holder still has write access. A > SETATTR > that changes mode, owner, or group can revoke that access -- after > which an > open served from the delegation would bypass an access check the > server > would now fail, and, against a server that recalls the delegation on > such a > change, the SETATTR draws a CB_RECALL/NFS4ERR_DELAY/DELEGRETURN/retry > round > trip. > > Before issuing such a SETATTR, check whether the proposed > mode/owner/group > would remove write access for the delegation's owning credential, > judged by > the resulting POSIX mode bits. If so, return the delegation first: > the > return is synchronous and flushes modified data, so the SETATTR > proceeds on > an open or special stateid and the next open revalidates access with > the > server. Permission changes that keep the holder's write access leave > the > delegation in place. > > Only the mode bits and the holder's fsuid/fsgid are consulted. An > NFSv4 ACL > cannot be evaluated by the client, a privileged caller may retain > access the > bits deny, and supplementary group membership is not checked, so the > test is > necessarily approximate -- but an inexact answer costs at most an > unnecessary delegation return or a fall back to the server's recall, > never > incorrect access. > > RFC 8881 Section 10.4.4 permits a client to return a delegation > voluntarily, > performing the same pre-return state updates (data flush, pending > truncation, CLOSE/OPEN/LOCK) it would on a recall. Commit > c01d36457dcc > ("NFSv4: Don't return the delegation when not needed by NFSv4.x > (x>0)") > stopped returning write delegations on SETATTR for NFSv4.1+, since > the > server can identify the delegation holder from the SEQUENCE clientid > and > need not recall. That holds for changes that do not affect the > holder's > access; restore a return only for the narrow case where the holder's > own > write access is removed. Hmmm... I'd argue that while recalling the delegation in this case is mandatory for NFSv4.0, that is certainly not true for NFSv4.1. Furthermore, I'd argue that if the holder of a write delegation is just changing the mode, then that should never result in a delegation recall for a well written NFSv4.1 server. The reason is this does not impact the client's ability to cache data, metadata or lock state. It only impacts its ability to rely on previously cached access data when handling new opens. One can argue whether or not it's needed for a uid or gid change by said holder of the delegation, but there too I'd say the right behaviour is to err on the side of not recalling. The exception might be if this is an attribute delegation, and the result will be that the cred attached to the delegation will no longer be able to issue a SETATTR to update the atime/mtime on delegation return. So yes to pre-emptive invalidation of the access cache, but I'm very sceptical to actually pre-emptively returning the delegation or even the layouts. > > Signed-off-by: Benjamin Coddington <bcodding@hammerspace.com> > --- > fs/nfs/nfs4proc.c | 66 ++++++++++++++++++++++++++++++++++++++++++++- > -- > 1 file changed, 62 insertions(+), 4 deletions(-) > > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c > index a9b8d482d289..e4b7322bf75c 100644 > --- a/fs/nfs/nfs4proc.c > +++ b/fs/nfs/nfs4proc.c > @@ -4506,7 +4506,55 @@ int nfs4_proc_getattr(struct nfs_server > *server, struct nfs_fh *fhandle, > return err; > } > > -/* > +/* > + * Would applying @sattr (which changes mode, owner, and/or group) > remove the > + * write access of a held write delegation's owning credential, as > judged by > + * the resulting file mode bits? > + * > + * Such a change makes the delegation's cached "open for write" > assertion > + * stale: a later open(O_WRONLY) could be served from the delegation > without > + * the server getting a chance to deny it. Only the mode bits and > the > + * holder's fsuid/fsgid are consulted; an NFSv4 ACL (which the > client cannot > + * evaluate locally), a privileged caller, or supplementary group > membership > + * may make the answer imprecise, but the cost is at most an > unnecessary > + * delegation return or a fall back to the server's recall -- never > incorrect > + * access. > + */ > +static bool nfs4_setattr_removes_write(struct inode *inode, struct > iattr *sattr) > +{ > + struct nfs_delegation *delegation; > + const struct cred *cred; > + umode_t mode = inode->i_mode; > + kuid_t uid = inode->i_uid; > + kgid_t gid = inode->i_gid; > + bool ret = false; > + > + delegation = nfs4_get_valid_delegation(inode); > + if (!delegation) > + return false; > + if (!(delegation->type & FMODE_WRITE)) > + goto out; > + cred = delegation->cred; > + > + if (sattr->ia_valid & ATTR_MODE) > + mode = sattr->ia_mode; > + if (sattr->ia_valid & ATTR_UID) > + uid = sattr->ia_uid; > + if (sattr->ia_valid & ATTR_GID) > + gid = sattr->ia_gid; > + > + if (uid_eq(uid, cred->fsuid)) > + ret = !(mode & S_IWUSR); > + else if (gid_eq(gid, cred->fsgid)) > + ret = !(mode & S_IWGRP); > + else > + ret = !(mode & S_IWOTH); > +out: > + nfs_put_delegation(delegation); > + return ret; > +} > + > +/* > * The file is not closed if it is opened due to the a request to > change > * the size of the file. The open call will not be needed once the > * VFS layer lookup-intents are implemented. > @@ -4555,9 +4603,19 @@ nfs4_proc_setattr(struct dentry *dentry, > struct nfs_fattr *fattr, > cred = ctx->cred; > } > > - /* Return any delegations if we're going to change ACLs */ > - if ((sattr->ia_valid & (ATTR_MODE|ATTR_UID|ATTR_GID)) != 0) > - nfs4_inode_make_writeable(inode); > + /* > + * A change to mode, owner, or group that removes the write > + * delegation holder's own write access makes the > delegation's cached > + * "open for write" stale; return it so a later open() > revalidates > + * access with the server. A change that keeps write access > leaves > + * the delegation in place. > + */ > + if (sattr->ia_valid & (ATTR_MODE | ATTR_UID | ATTR_GID)) { > + if (nfs4_setattr_removes_write(inode, sattr)) > + nfs4_inode_return_delegation(inode); > + else > + nfs4_inode_make_writeable(inode); > + } > > status = nfs4_do_setattr(inode, cred, fattr, sattr, ctx, > NULL); > if (status == 0) { -- Trond Myklebust Linux NFS client maintainer, Hammerspace trondmy@kernel.org, trond.myklebust@hammerspace.com ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/1] nfs: return a write delegation when a SETATTR drops our write access 2026-05-29 14:00 ` Trond Myklebust @ 2026-05-29 15:27 ` Rick Macklem 2026-05-29 15:46 ` Rick Macklem 2026-05-29 19:46 ` Benjamin Coddington 2026-05-29 16:22 ` Benjamin Coddington 1 sibling, 2 replies; 8+ messages in thread From: Rick Macklem @ 2026-05-29 15:27 UTC (permalink / raw) To: Trond Myklebust Cc: Benjamin Coddington, Anna Schumaker, linux-nfs, linux-kernel On Fri, May 29, 2026 at 7:06 AM Trond Myklebust <trondmy@kernel.org> wrote: > > CAUTION: This email originated from outside of the University of Guelph. Do not click links or open attachments unless you recognize the sender and know the content is safe. If you are unsure, forward the message to ITHelp@uoguelph.ca for review. > > > On Thu, 2026-05-28 at 15:22 -0400, Benjamin Coddington wrote: > > A client holding an OPEN_DELEGATE_WRITE delegation can satisfy a > > later > > open(O_WRONLY) from the cached delegation (can_open_delegated()) > > without > > sending an OPEN to the server. That cached "open for write" assertion > > is > > only valid while the delegation holder still has write access. A > > SETATTR > > that changes mode, owner, or group can revoke that access -- after > > which an > > open served from the delegation would bypass an access check the > > server > > would now fail, and, against a server that recalls the delegation on > > such a > > change, the SETATTR draws a CB_RECALL/NFS4ERR_DELAY/DELEGRETURN/retry > > round > > trip. > > > > Before issuing such a SETATTR, check whether the proposed > > mode/owner/group > > would remove write access for the delegation's owning credential, > > judged by > > the resulting POSIX mode bits. If so, return the delegation first: > > the > > return is synchronous and flushes modified data, so the SETATTR > > proceeds on > > an open or special stateid and the next open revalidates access with > > the > > server. Permission changes that keep the holder's write access leave > > the > > delegation in place. > > > > Only the mode bits and the holder's fsuid/fsgid are consulted. An > > NFSv4 ACL > > cannot be evaluated by the client, a privileged caller may retain > > access the > > bits deny, and supplementary group membership is not checked, so the > > test is > > necessarily approximate -- but an inexact answer costs at most an > > unnecessary delegation return or a fall back to the server's recall, > > never > > incorrect access. > > > > RFC 8881 Section 10.4.4 permits a client to return a delegation > > voluntarily, > > performing the same pre-return state updates (data flush, pending > > truncation, CLOSE/OPEN/LOCK) it would on a recall. Commit > > c01d36457dcc > > ("NFSv4: Don't return the delegation when not needed by NFSv4.x > > (x>0)") > > stopped returning write delegations on SETATTR for NFSv4.1+, since > > the > > server can identify the delegation holder from the SEQUENCE clientid > > and > > need not recall. That holds for changes that do not affect the > > holder's > > access; restore a return only for the narrow case where the holder's > > own > > write access is removed. > > Hmmm... I'd argue that while recalling the delegation in this case is > mandatory for NFSv4.0, that is certainly not true for NFSv4.1. > > Furthermore, I'd argue that if the holder of a write delegation is just > changing the mode, then that should never result in a delegation recall > for a well written NFSv4.1 server. The reason is this does not impact > the client's ability to cache data, metadata or lock state. It only > impacts its ability to rely on previously cached access data when > handling new opens. I'm not sure I completely agree with this statement. The case I would be concerned about is delayed writes sitting in the client. Maybe an NFSv4.1/4.2 server should always allow writes from a client that holds a write delegation for the file, but I don't think that is spelled out in RFC8881 (I'm never sure, given that monstrous document) and I'll admit that the FreeBSD server does not do that. The FreeBSD server currently does always allow the owner of the file to do writes, but does not do the same w.r.t. write delegation held by the client. (I'll think about adding that override, because it does seem reasonable.) What does the Linux knfsd currently do w.r.t. allowing writes from a client that holds a write delegation? Certainly setting mode bits won't be a problem and clearing owner mode bits isn't a problem for the FreeBSD server. > > One can argue whether or not it's needed for a uid or gid change by > said holder of the delegation, but there too I'd say the right > behaviour is to err on the side of not recalling. I might argue that whether or not clearing mode bits requires a delegation recall should be left up to the server. > The exception might be if this is an attribute delegation, and the > result will be that the cred attached to the delegation will no longer > be able to issue a SETATTR to update the atime/mtime on delegation > return. Lost me. What's an attribute delegation? rick > > So yes to pre-emptive invalidation of the access cache, but I'm very > sceptical to actually pre-emptively returning the delegation or even > the layouts. > > > > > Signed-off-by: Benjamin Coddington <bcodding@hammerspace.com> > > --- > > fs/nfs/nfs4proc.c | 66 ++++++++++++++++++++++++++++++++++++++++++++- > > -- > > 1 file changed, 62 insertions(+), 4 deletions(-) > > > > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c > > index a9b8d482d289..e4b7322bf75c 100644 > > --- a/fs/nfs/nfs4proc.c > > +++ b/fs/nfs/nfs4proc.c > > @@ -4506,7 +4506,55 @@ int nfs4_proc_getattr(struct nfs_server > > *server, struct nfs_fh *fhandle, > > return err; > > } > > > > -/* > > +/* > > + * Would applying @sattr (which changes mode, owner, and/or group) > > remove the > > + * write access of a held write delegation's owning credential, as > > judged by > > + * the resulting file mode bits? > > + * > > + * Such a change makes the delegation's cached "open for write" > > assertion > > + * stale: a later open(O_WRONLY) could be served from the delegation > > without > > + * the server getting a chance to deny it. Only the mode bits and > > the > > + * holder's fsuid/fsgid are consulted; an NFSv4 ACL (which the > > client cannot > > + * evaluate locally), a privileged caller, or supplementary group > > membership > > + * may make the answer imprecise, but the cost is at most an > > unnecessary > > + * delegation return or a fall back to the server's recall -- never > > incorrect > > + * access. > > + */ > > +static bool nfs4_setattr_removes_write(struct inode *inode, struct > > iattr *sattr) > > +{ > > + struct nfs_delegation *delegation; > > + const struct cred *cred; > > + umode_t mode = inode->i_mode; > > + kuid_t uid = inode->i_uid; > > + kgid_t gid = inode->i_gid; > > + bool ret = false; > > + > > + delegation = nfs4_get_valid_delegation(inode); > > + if (!delegation) > > + return false; > > + if (!(delegation->type & FMODE_WRITE)) > > + goto out; > > + cred = delegation->cred; > > + > > + if (sattr->ia_valid & ATTR_MODE) > > + mode = sattr->ia_mode; > > + if (sattr->ia_valid & ATTR_UID) > > + uid = sattr->ia_uid; > > + if (sattr->ia_valid & ATTR_GID) > > + gid = sattr->ia_gid; > > + > > + if (uid_eq(uid, cred->fsuid)) > > + ret = !(mode & S_IWUSR); > > + else if (gid_eq(gid, cred->fsgid)) > > + ret = !(mode & S_IWGRP); > > + else > > + ret = !(mode & S_IWOTH); > > +out: > > + nfs_put_delegation(delegation); > > + return ret; > > +} > > + > > +/* > > * The file is not closed if it is opened due to the a request to > > change > > * the size of the file. The open call will not be needed once the > > * VFS layer lookup-intents are implemented. > > @@ -4555,9 +4603,19 @@ nfs4_proc_setattr(struct dentry *dentry, > > struct nfs_fattr *fattr, > > cred = ctx->cred; > > } > > > > - /* Return any delegations if we're going to change ACLs */ > > - if ((sattr->ia_valid & (ATTR_MODE|ATTR_UID|ATTR_GID)) != 0) > > - nfs4_inode_make_writeable(inode); > > + /* > > + * A change to mode, owner, or group that removes the write > > + * delegation holder's own write access makes the > > delegation's cached > > + * "open for write" stale; return it so a later open() > > revalidates > > + * access with the server. A change that keeps write access > > leaves > > + * the delegation in place. > > + */ > > + if (sattr->ia_valid & (ATTR_MODE | ATTR_UID | ATTR_GID)) { > > + if (nfs4_setattr_removes_write(inode, sattr)) > > + nfs4_inode_return_delegation(inode); > > + else > > + nfs4_inode_make_writeable(inode); > > + } > > > > status = nfs4_do_setattr(inode, cred, fattr, sattr, ctx, > > NULL); > > if (status == 0) { > > -- > Trond Myklebust > Linux NFS client maintainer, Hammerspace > trondmy@kernel.org, trond.myklebust@hammerspace.com > ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/1] nfs: return a write delegation when a SETATTR drops our write access 2026-05-29 15:27 ` Rick Macklem @ 2026-05-29 15:46 ` Rick Macklem 2026-05-29 19:46 ` Benjamin Coddington 1 sibling, 0 replies; 8+ messages in thread From: Rick Macklem @ 2026-05-29 15:46 UTC (permalink / raw) To: Trond Myklebust Cc: Benjamin Coddington, Anna Schumaker, linux-nfs, linux-kernel On Fri, May 29, 2026 at 8:27 AM Rick Macklem <rick.macklem@gmail.com> wrote: > > On Fri, May 29, 2026 at 7:06 AM Trond Myklebust <trondmy@kernel.org> wrote: > > > > CAUTION: This email originated from outside of the University of Guelph. Do not click links or open attachments unless you recognize the sender and know the content is safe. If you are unsure, forward the message to ITHelp@uoguelph.ca for review. > > > > > > On Thu, 2026-05-28 at 15:22 -0400, Benjamin Coddington wrote: > > > A client holding an OPEN_DELEGATE_WRITE delegation can satisfy a > > > later > > > open(O_WRONLY) from the cached delegation (can_open_delegated()) > > > without > > > sending an OPEN to the server. That cached "open for write" assertion > > > is > > > only valid while the delegation holder still has write access. A > > > SETATTR > > > that changes mode, owner, or group can revoke that access -- after > > > which an > > > open served from the delegation would bypass an access check the > > > server > > > would now fail, and, against a server that recalls the delegation on > > > such a > > > change, the SETATTR draws a CB_RECALL/NFS4ERR_DELAY/DELEGRETURN/retry > > > round > > > trip. > > > > > > Before issuing such a SETATTR, check whether the proposed > > > mode/owner/group > > > would remove write access for the delegation's owning credential, > > > judged by > > > the resulting POSIX mode bits. If so, return the delegation first: > > > the > > > return is synchronous and flushes modified data, so the SETATTR > > > proceeds on > > > an open or special stateid and the next open revalidates access with > > > the > > > server. Permission changes that keep the holder's write access leave > > > the > > > delegation in place. > > > > > > Only the mode bits and the holder's fsuid/fsgid are consulted. An > > > NFSv4 ACL > > > cannot be evaluated by the client, a privileged caller may retain > > > access the > > > bits deny, and supplementary group membership is not checked, so the > > > test is > > > necessarily approximate -- but an inexact answer costs at most an > > > unnecessary delegation return or a fall back to the server's recall, > > > never > > > incorrect access. > > > > > > RFC 8881 Section 10.4.4 permits a client to return a delegation > > > voluntarily, > > > performing the same pre-return state updates (data flush, pending > > > truncation, CLOSE/OPEN/LOCK) it would on a recall. Commit > > > c01d36457dcc > > > ("NFSv4: Don't return the delegation when not needed by NFSv4.x > > > (x>0)") > > > stopped returning write delegations on SETATTR for NFSv4.1+, since > > > the > > > server can identify the delegation holder from the SEQUENCE clientid > > > and > > > need not recall. That holds for changes that do not affect the > > > holder's > > > access; restore a return only for the narrow case where the holder's > > > own > > > write access is removed. > > > > Hmmm... I'd argue that while recalling the delegation in this case is > > mandatory for NFSv4.0, that is certainly not true for NFSv4.1. > > > > Furthermore, I'd argue that if the holder of a write delegation is just > > changing the mode, then that should never result in a delegation recall > > for a well written NFSv4.1 server. The reason is this does not impact > > the client's ability to cache data, metadata or lock state. It only > > impacts its ability to rely on previously cached access data when > > handling new opens. > I'm not sure I completely agree with this statement. The case I would > be concerned about is delayed writes sitting in the client. > > Maybe an NFSv4.1/4.2 server should always allow writes from a > client that holds a write delegation for the file, but I don't think that > is spelled out in RFC8881 (I'm never sure, given that monstrous > document) and I'll admit that the FreeBSD server > does not do that. The FreeBSD server currently does always allow the > owner of the file to do writes, but does not do the same w.r.t. write > delegation held by the client. (I'll think about adding that override, > because it does seem reasonable.) > > What does the Linux knfsd currently do w.r.t. allowing writes > from a client that holds a write delegation? > > Certainly setting mode bits won't be a problem and clearing > owner mode bits isn't a problem for the FreeBSD server. > Oh, and one more quirky corner.. If the server provided a non-empty ACE for the permissions field for the write delegation, these SETATTR changes either require the server to recall the delegation or the client to invalidate use of this ACE. I'd suggest that the client invalidate use of the ACE (if it ever uses it?) and leave delegation recall up to the server. rick > > > > One can argue whether or not it's needed for a uid or gid change by > > said holder of the delegation, but there too I'd say the right > > behaviour is to err on the side of not recalling. > I might argue that whether or not clearing mode bits requires a > delegation recall should be left up to the server. > > > The exception might be if this is an attribute delegation, and the > > result will be that the cred attached to the delegation will no longer > > be able to issue a SETATTR to update the atime/mtime on delegation > > return. > Lost me. What's an attribute delegation? > > rick > > > > So yes to pre-emptive invalidation of the access cache, but I'm very > > sceptical to actually pre-emptively returning the delegation or even > > the layouts. > > > > > > > > Signed-off-by: Benjamin Coddington <bcodding@hammerspace.com> > > > --- > > > fs/nfs/nfs4proc.c | 66 ++++++++++++++++++++++++++++++++++++++++++++- > > > -- > > > 1 file changed, 62 insertions(+), 4 deletions(-) > > > > > > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c > > > index a9b8d482d289..e4b7322bf75c 100644 > > > --- a/fs/nfs/nfs4proc.c > > > +++ b/fs/nfs/nfs4proc.c > > > @@ -4506,7 +4506,55 @@ int nfs4_proc_getattr(struct nfs_server > > > *server, struct nfs_fh *fhandle, > > > return err; > > > } > > > > > > -/* > > > +/* > > > + * Would applying @sattr (which changes mode, owner, and/or group) > > > remove the > > > + * write access of a held write delegation's owning credential, as > > > judged by > > > + * the resulting file mode bits? > > > + * > > > + * Such a change makes the delegation's cached "open for write" > > > assertion > > > + * stale: a later open(O_WRONLY) could be served from the delegation > > > without > > > + * the server getting a chance to deny it. Only the mode bits and > > > the > > > + * holder's fsuid/fsgid are consulted; an NFSv4 ACL (which the > > > client cannot > > > + * evaluate locally), a privileged caller, or supplementary group > > > membership > > > + * may make the answer imprecise, but the cost is at most an > > > unnecessary > > > + * delegation return or a fall back to the server's recall -- never > > > incorrect > > > + * access. > > > + */ > > > +static bool nfs4_setattr_removes_write(struct inode *inode, struct > > > iattr *sattr) > > > +{ > > > + struct nfs_delegation *delegation; > > > + const struct cred *cred; > > > + umode_t mode = inode->i_mode; > > > + kuid_t uid = inode->i_uid; > > > + kgid_t gid = inode->i_gid; > > > + bool ret = false; > > > + > > > + delegation = nfs4_get_valid_delegation(inode); > > > + if (!delegation) > > > + return false; > > > + if (!(delegation->type & FMODE_WRITE)) > > > + goto out; > > > + cred = delegation->cred; > > > + > > > + if (sattr->ia_valid & ATTR_MODE) > > > + mode = sattr->ia_mode; > > > + if (sattr->ia_valid & ATTR_UID) > > > + uid = sattr->ia_uid; > > > + if (sattr->ia_valid & ATTR_GID) > > > + gid = sattr->ia_gid; > > > + > > > + if (uid_eq(uid, cred->fsuid)) > > > + ret = !(mode & S_IWUSR); > > > + else if (gid_eq(gid, cred->fsgid)) > > > + ret = !(mode & S_IWGRP); > > > + else > > > + ret = !(mode & S_IWOTH); > > > +out: > > > + nfs_put_delegation(delegation); > > > + return ret; > > > +} > > > + > > > +/* > > > * The file is not closed if it is opened due to the a request to > > > change > > > * the size of the file. The open call will not be needed once the > > > * VFS layer lookup-intents are implemented. > > > @@ -4555,9 +4603,19 @@ nfs4_proc_setattr(struct dentry *dentry, > > > struct nfs_fattr *fattr, > > > cred = ctx->cred; > > > } > > > > > > - /* Return any delegations if we're going to change ACLs */ > > > - if ((sattr->ia_valid & (ATTR_MODE|ATTR_UID|ATTR_GID)) != 0) > > > - nfs4_inode_make_writeable(inode); > > > + /* > > > + * A change to mode, owner, or group that removes the write > > > + * delegation holder's own write access makes the > > > delegation's cached > > > + * "open for write" stale; return it so a later open() > > > revalidates > > > + * access with the server. A change that keeps write access > > > leaves > > > + * the delegation in place. > > > + */ > > > + if (sattr->ia_valid & (ATTR_MODE | ATTR_UID | ATTR_GID)) { > > > + if (nfs4_setattr_removes_write(inode, sattr)) > > > + nfs4_inode_return_delegation(inode); > > > + else > > > + nfs4_inode_make_writeable(inode); > > > + } > > > > > > status = nfs4_do_setattr(inode, cred, fattr, sattr, ctx, > > > NULL); > > > if (status == 0) { > > > > -- > > Trond Myklebust > > Linux NFS client maintainer, Hammerspace > > trondmy@kernel.org, trond.myklebust@hammerspace.com > > ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/1] nfs: return a write delegation when a SETATTR drops our write access 2026-05-29 15:27 ` Rick Macklem 2026-05-29 15:46 ` Rick Macklem @ 2026-05-29 19:46 ` Benjamin Coddington 1 sibling, 0 replies; 8+ messages in thread From: Benjamin Coddington @ 2026-05-29 19:46 UTC (permalink / raw) To: Rick Macklem Cc: Trond Myklebust, Benjamin Coddington, Anna Schumaker, linux-nfs, linux-kernel On 29 May 2026, at 11:27, Rick Macklem wrote: > On Fri, May 29, 2026 at 7:06 AM Trond Myklebust <trondmy@kernel.org> wrote: >> >> CAUTION: This email originated from outside of the University of Guelph. Do not click links or open attachments unless you recognize the sender and know the content is safe. If you are unsure, forward the message to ITHelp@uoguelph.ca for review. >> >> >> On Thu, 2026-05-28 at 15:22 -0400, Benjamin Coddington wrote: >>> A client holding an OPEN_DELEGATE_WRITE delegation can satisfy a >>> later >>> open(O_WRONLY) from the cached delegation (can_open_delegated()) >>> without >>> sending an OPEN to the server. That cached "open for write" assertion >>> is >>> only valid while the delegation holder still has write access. A >>> SETATTR >>> that changes mode, owner, or group can revoke that access -- after >>> which an >>> open served from the delegation would bypass an access check the >>> server >>> would now fail, and, against a server that recalls the delegation on >>> such a >>> change, the SETATTR draws a CB_RECALL/NFS4ERR_DELAY/DELEGRETURN/retry >>> round >>> trip. >>> >>> Before issuing such a SETATTR, check whether the proposed >>> mode/owner/group >>> would remove write access for the delegation's owning credential, >>> judged by >>> the resulting POSIX mode bits. If so, return the delegation first: >>> the >>> return is synchronous and flushes modified data, so the SETATTR >>> proceeds on >>> an open or special stateid and the next open revalidates access with >>> the >>> server. Permission changes that keep the holder's write access leave >>> the >>> delegation in place. >>> >>> Only the mode bits and the holder's fsuid/fsgid are consulted. An >>> NFSv4 ACL >>> cannot be evaluated by the client, a privileged caller may retain >>> access the >>> bits deny, and supplementary group membership is not checked, so the >>> test is >>> necessarily approximate -- but an inexact answer costs at most an >>> unnecessary delegation return or a fall back to the server's recall, >>> never >>> incorrect access. >>> >>> RFC 8881 Section 10.4.4 permits a client to return a delegation >>> voluntarily, >>> performing the same pre-return state updates (data flush, pending >>> truncation, CLOSE/OPEN/LOCK) it would on a recall. Commit >>> c01d36457dcc >>> ("NFSv4: Don't return the delegation when not needed by NFSv4.x >>> (x>0)") >>> stopped returning write delegations on SETATTR for NFSv4.1+, since >>> the >>> server can identify the delegation holder from the SEQUENCE clientid >>> and >>> need not recall. That holds for changes that do not affect the >>> holder's >>> access; restore a return only for the narrow case where the holder's >>> own >>> write access is removed. >> >> Hmmm... I'd argue that while recalling the delegation in this case is >> mandatory for NFSv4.0, that is certainly not true for NFSv4.1. >> >> Furthermore, I'd argue that if the holder of a write delegation is just >> changing the mode, then that should never result in a delegation recall >> for a well written NFSv4.1 server. The reason is this does not impact >> the client's ability to cache data, metadata or lock state. It only >> impacts its ability to rely on previously cached access data when >> handling new opens. > I'm not sure I completely agree with this statement. The case I would > be concerned about is delayed writes sitting in the client. Those stay safe on Linux. The client flushes cached writes on the delegation (or open) stateid, and knfsd authorizes a WRITE from the stateid's granted access, not from a re-check of the current mode — nfsd4_write() just rides the already-open, write-capable nfsd_file resolved from the stateid, so a later mode change doesn't block the holder's writes. And if the server does recall, DELEGRETURN flushes the dirty data before the delegation goes back. The only way to lose data is a server that neither honors the holder's stateid writes nor recalls. > Maybe an NFSv4.1/4.2 server should always allow writes from a > client that holds a write delegation for the file, but I don't think that > is spelled out in RFC8881 (I'm never sure, given that monstrous > document) and I'll admit that the FreeBSD server > does not do that. The FreeBSD server currently does always allow the > owner of the file to do writes, but does not do the same w.r.t. write > delegation held by the client. (I'll think about adding that override, > because it does seem reasonable.) > > What does the Linux knfsd currently do w.r.t. allowing writes > from a client that holds a write delegation? It allows them. WRITE resolves the delegation (or open) stateid to an open write-capable nfsd_file and writes through it; there's no per-WRITE mode check in nfsd4_write() -> nfsd_vfs_write(). So knfsd effectively already does the override you're considering for FreeBSD, just implicitly — write authorization comes from the open/deleg stateid, the same principle as the truncate-via-stateid case. > Certainly setting mode bits won't be a problem and clearing > owner mode bits isn't a problem for the FreeBSD server. > > Oh, and one more quirky corner.. > If the server provided a non-empty ACE for the permissions > field for the write delegation, these SETATTR changes either > require the server to recall the delegation or the client to > invalidate use of this ACE. > > I'd suggest that the client invalidate use of the ACE (if it > ever uses it?) and leave delegation recall up to the server. The Linux client never uses it — decode_rw_delegation() parses the delegation's permissions ACE and discards it (decode_ace() is passed a NULL sink), and the client always does its own ACCESS. So there's nothing to invalidate on our side, and agreed: the recall decision belongs to the server. ... >> The exception might be if this is an attribute delegation, and the >> result will be that the cred attached to the delegation will no longer >> be able to issue a SETATTR to update the atime/mtime on delegation >> return. > Lost me. What's an attribute delegation? I'll let Trond give the canonical definition, but as I understand it: the *_ATTRS_DELEG delegation variants additionally delegate attribute authority — notably the timestamps — so the client can hold and modify them and write them back at DELEGRETURN. That return-time SETATTR is what Trond meant about the holder's cred needing to retain access. Ben ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/1] nfs: return a write delegation when a SETATTR drops our write access 2026-05-29 14:00 ` Trond Myklebust 2026-05-29 15:27 ` Rick Macklem @ 2026-05-29 16:22 ` Benjamin Coddington 2026-05-29 23:20 ` Rick Macklem 1 sibling, 1 reply; 8+ messages in thread From: Benjamin Coddington @ 2026-05-29 16:22 UTC (permalink / raw) To: Trond Myklebust Cc: Benjamin Coddington, Anna Schumaker, linux-nfs, linux-kernel On 29 May 2026, at 10:00, Trond Myklebust wrote: > > Hmmm... I'd argue that while recalling the delegation in this case is > mandatory for NFSv4.0, that is certainly not true for NFSv4.1. > > Furthermore, I'd argue that if the holder of a write delegation is just > changing the mode, then that should never result in a delegation recall > for a well written NFSv4.1 server. The reason is this does not impact > the client's ability to cache data, metadata or lock state. It only > impacts its ability to rely on previously cached access data when > handling new opens. > > One can argue whether or not it's needed for a uid or gid change by > said holder of the delegation, but there too I'd say the right > behaviour is to err on the side of not recalling. > The exception might be if this is an attribute delegation, and the > result will be that the cred attached to the delegation will no longer > be able to issue a SETATTR to update the atime/mtime on delegation > return. > > So yes to pre-emptive invalidation of the access cache, but I'm very > sceptical to actually pre-emptively returning the delegation or even > the layouts. The latency I was chasing comes from the server electing to recall on these SETATTRs. You're right that for v4.1 the mode change doesn't need to trigger a recall, and that the only client-side exposure is stale cached access on subsequent opens. It's already covered in the SETATTR reply path changing mode/uid/gid where it sets NFS_INO_INVALID_ACCESS in nfs_update_inode(), and a fresh open over the delegation still goes through nfs_may_open() -> nfs_do_access(), so it revalidates with the server rather than trusting the cached result. The server isn't recalling these for cache coherence. When the change removes the holder's write, a later SETATTR(size) under the delegation carries the delegation stateid, not an open stateid (nfs4_copy_delegation_stateid()), and the client can reopen O_WRONLY from the delegation with no OPEN on the wire — so the server can't tell whether the holder still has the file open for write, nor apply the usual "open-for-write overrides current mode" check for the truncate. The recall forces a DELEGRETURN + CLAIM_DELEGATE_CUR reclaim that re-establishes a real (or absent) open stateid, letting the server decide the truncate correctly. For write-retaining SETATTRs none of that applies and the server shouldn't recall — which is where the latency I was chasing actually came from, and the better place to remove it. Thanks for the review. Ben ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/1] nfs: return a write delegation when a SETATTR drops our write access 2026-05-29 16:22 ` Benjamin Coddington @ 2026-05-29 23:20 ` Rick Macklem 0 siblings, 0 replies; 8+ messages in thread From: Rick Macklem @ 2026-05-29 23:20 UTC (permalink / raw) To: Benjamin Coddington Cc: Trond Myklebust, Anna Schumaker, linux-nfs, linux-kernel On Fri, May 29, 2026 at 9:22 AM Benjamin Coddington <ben.coddington@hammerspace.com> wrote: > > On 29 May 2026, at 10:00, Trond Myklebust wrote: > > > > Hmmm... I'd argue that while recalling the delegation in this case is > > mandatory for NFSv4.0, that is certainly not true for NFSv4.1. > > > > Furthermore, I'd argue that if the holder of a write delegation is just > > changing the mode, then that should never result in a delegation recall > > for a well written NFSv4.1 server. The reason is this does not impact > > the client's ability to cache data, metadata or lock state. It only > > impacts its ability to rely on previously cached access data when > > handling new opens. > > > > One can argue whether or not it's needed for a uid or gid change by > > said holder of the delegation, but there too I'd say the right > > behaviour is to err on the side of not recalling. > > The exception might be if this is an attribute delegation, and the > > result will be that the cred attached to the delegation will no longer > > be able to issue a SETATTR to update the atime/mtime on delegation > > return. > > > > So yes to pre-emptive invalidation of the access cache, but I'm very > > sceptical to actually pre-emptively returning the delegation or even > > the layouts. > > The latency I was chasing comes from the server electing to recall on these > SETATTRs. > > You're right that for v4.1 the mode change doesn't need to trigger a recall, > and that the only client-side exposure is stale cached access on subsequent > opens. It's already covered in the SETATTR reply path changing mode/uid/gid > where it sets NFS_INO_INVALID_ACCESS in nfs_update_inode(), and a fresh open > over the delegation still goes through nfs_may_open() -> nfs_do_access(), so > it revalidates with the server rather than trusting the cached result. > > The server isn't recalling these for cache coherence. When the change > removes the holder's write, a later SETATTR(size) under the delegation > carries the delegation stateid, not an open stateid > (nfs4_copy_delegation_stateid()), and the client can reopen O_WRONLY from > the delegation with no OPEN on the wire — so the server can't tell whether > the holder still has the file open for write, nor apply the usual > "open-for-write overrides current mode" check for the truncate. The recall > forces a DELEGRETURN + CLAIM_DELEGATE_CUR reclaim that re-establishes a real > (or absent) open stateid, letting the server decide the truncate correctly. I will note that, since your server allows WRITEs based on the delegation stateid, it could allow SETATTR(size) based on a delegation stateid as well. (SETATTR(size) is just a weird variant of WRITE. is it not.) Now, how to implement this in the server would be up to you guys. I suppose the disadvantage of doing the DELEGRETURN pre-emptively will never allow server implementations to figure out how to allow it without a CB_RECALL, etc. I do plan on patching the FreeBSD server to allow READ, WRITE and SETATTR(size) based on a write delegation stateid (and READ based on a read delegation stateid). Of course it will take a while for that to find its way into the world. The FreeBSD server will be doing the CB_RECALL in the meantime. Useful discussion. Thanks, rick > > For write-retaining SETATTRs none of that applies and the server shouldn't > recall — which is where the latency I was chasing actually came from, and > the better place to remove it. > > Thanks for the review. > > Ben > ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2026-05-29 23:20 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-05-28 19:22 [PATCH 0/1] Client optimize away CB_RECALL for write delegation Benjamin Coddington 2026-05-28 19:22 ` [PATCH 1/1] nfs: return a write delegation when a SETATTR drops our write access Benjamin Coddington 2026-05-29 14:00 ` Trond Myklebust 2026-05-29 15:27 ` Rick Macklem 2026-05-29 15:46 ` Rick Macklem 2026-05-29 19:46 ` Benjamin Coddington 2026-05-29 16:22 ` Benjamin Coddington 2026-05-29 23:20 ` Rick Macklem
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox