* [PATCH v2] nfsd: fix file change detection in CB_GETATTR
@ 2026-04-04 0:54 Scott Mayhew
2026-04-04 15:07 ` Chuck Lever
0 siblings, 1 reply; 10+ messages in thread
From: Scott Mayhew @ 2026-04-04 0:54 UTC (permalink / raw)
To: Chuck Lever, Jeff Layton
Cc: NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs
RFC 8881, section 10.4.3 doesn't say anything about caching the file
size in the delegation record, nor does it say anything about comparing
a cached file size with the size reported by the client in the
CB_GETATTR reply for the purpose of determining if the client holds
modified data for the file.
What section 10.4.3 of RFC 8881 does say is that the server should
compare the *current* file size with size reported by the client holding
the delegation in the CB_GETATTR reply, and if they differ to treat it
as a modification regardless of the change attribute retrieved via the
CB_GETATTR.
Doing otherwise would cause the server to believe the client holding the
delegation has a modified version of the file, even if the client
flushed the modifications to the server prior to the CB_GETATTR. This
would have the added side effect of subsequent CB_GETATTRs causing
updates to the mtime, ctime, and change attribute even if the client
holding the delegation makes no further updates to the file.
Modify nfsd4_deleg_getattr_conflict() to obtain the current file size
via vfs_getattr(). Retain the ncf_cur_fsize field, since it's a
convenient way to return the file size back to nfsd4_encode_fattr4(),
but don't use it for the purpose of detecting file changes.
Also, if we recall the delegation (because the client didn't respond to
the CB_GETATTR), then skip the logic that checks the nfs4_cb_fattr
fields.
Fixes: c5967721e106 ("NFSD: handle GETATTR conflict with write delegation")
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
---
This patch is against Chuck's nfsd-testing branch.
A pynfs test that illustrates the issue is available here (delegated
timestamps must be disabled to make the test fail):
https://lore.kernel.org/linux-nfs/20260404003050.1560149-6-smayhew@redhat.com/T/#u
v2 changes:
- update kerneldoc comment for nfsd4_deleg_getattr_conflict()
- relocated declarations in nfsd4_deleg_getattr_conflict() to maintain
reverse-xmas tree ordering
- pass the struct kstat from the nfsd4_fattr_args to
nfsd4_deleg_getattr_conflict() instead of creating another one on the
stack
- only call vfs_getattr() in nfsd4_deleg_getattr_conflict() for the
!ncf_file_modified case (once the file has been flagged as modified,
it remains so until the delegation is returned or revoked, so further
calls to vfs_getattr() are unnecessary)
- if we recall the delegation (because the client didn't respond to the
CB_GETATTR), then skip the CB_GETATTR comparison logic
Chuck - to test the last part I hacked a pynfs test to force a CB_RECALL
by not responding to the CB_GETATTR in a timely manner. But nfsd has a
pretty aggressive timeout for the DELEGRETURN (30ms) before it responds
to the client that issued the GETATTR with NFS4ERR_DELAY. On the system
I'm testing on, it looks like it's taking just over 30ms on average:
nfsd-2971 [059] ...1. 451.007305: nfsd_cb_recall: addr=127.0.0.1:0 client 69d047c9:03b2de67 stateid 00000002:00000001
nfsd-2971 [059] ..... 451.037306: nfsd_delegret_wakeup: xid=0x16c768b1 inode=000000000a5c53d9 (timed out)
nfsd-2971 [059] ..... 451.037461: nfsd_deleg_return: client 69d047c9:03b2de67 stateid 00000002:00000001
So I bumped the timeout to 50ms (just in my test kernel) and it appears
to do the right thing... I'm just not sure how often it'll actually come
in play in normal usage.
fs/nfsd/nfs4state.c | 35 ++++++++++++++++++++++++-----------
fs/nfsd/nfs4xdr.c | 3 ++-
fs/nfsd/state.h | 3 ++-
3 files changed, 28 insertions(+), 13 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index fa657badf5f8..53d8e7e7d60b 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -9444,7 +9444,9 @@ static int cb_getattr_update_times(struct dentry *dentry, struct nfs4_delegation
/**
* nfsd4_deleg_getattr_conflict - Recall if GETATTR causes conflict
* @rqstp: RPC transaction context
- * @dentry: dentry of inode to be checked for a conflict
+ * @path: used to get the inode and size of the file to be checked for a
+ * conflict
+ * @stat: used to get the size of the file to be checked for a conflict
* @pdp: returned WRITE delegation, if one was found
*
* This function is called when there is a conflict between a write
@@ -9459,17 +9461,18 @@ static int cb_getattr_update_times(struct dentry *dentry, struct nfs4_delegation
* caller must put the reference.
*/
__be32
-nfsd4_deleg_getattr_conflict(struct svc_rqst *rqstp, struct dentry *dentry,
- struct nfs4_delegation **pdp)
+nfsd4_deleg_getattr_conflict(struct svc_rqst *rqstp, struct path *path,
+ struct kstat *stat, struct nfs4_delegation **pdp)
{
struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
struct nfsd_thread_local_info *ntli = rqstp->rq_private;
+ struct inode *inode = d_inode(path->dentry);
struct file_lock_context *ctx;
struct nfs4_delegation *dp = NULL;
struct file_lease *fl;
struct nfs4_cb_fattr *ncf;
- struct inode *inode = d_inode(dentry);
__be32 status;
+ int err;
ctx = locks_inode_context(inode);
if (!ctx)
@@ -9516,20 +9519,30 @@ nfsd4_deleg_getattr_conflict(struct svc_rqst *rqstp, struct dentry *dentry,
if (status != nfserr_jukebox ||
!nfsd_wait_for_delegreturn(rqstp, inode))
goto out_status;
+ status = nfs_ok;
+ goto out_status;
+ }
+ if (!ncf->ncf_file_modified) {
+ if (ncf->ncf_initial_cinfo != ncf->ncf_cb_change) {
+ ncf->ncf_file_modified = true;
+ } else {
+ err = vfs_getattr(path, stat, STATX_SIZE,
+ AT_STATX_SYNC_AS_STAT);
+ if (err) {
+ status = nfserrno(err);
+ goto out_status;
+ }
+ if (stat->size != ncf->ncf_cb_fsize)
+ ncf->ncf_file_modified = true;
+ }
}
- if (!ncf->ncf_file_modified &&
- (ncf->ncf_initial_cinfo != ncf->ncf_cb_change ||
- ncf->ncf_cur_fsize != ncf->ncf_cb_fsize))
- ncf->ncf_file_modified = true;
if (ncf->ncf_file_modified) {
- int err;
-
/*
* Per section 10.4.3 of RFC 8881, the server would
* not update the file's metadata with the client's
* modified size
*/
- err = cb_getattr_update_times(dentry, dp);
+ err = cb_getattr_update_times(path->dentry, dp);
if (err) {
status = nfserrno(err);
goto out_status;
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 2a0946c630e1..5e09682c8135 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -3914,7 +3914,8 @@ nfsd4_encode_fattr4(struct svc_rqst *rqstp, struct xdr_stream *xdr,
(attrmask[1] & (FATTR4_WORD1_TIME_ACCESS |
FATTR4_WORD1_TIME_MODIFY |
FATTR4_WORD1_TIME_METADATA))) {
- status = nfsd4_deleg_getattr_conflict(rqstp, dentry, &dp);
+ status = nfsd4_deleg_getattr_conflict(rqstp, &path, &args.stat,
+ &dp);
if (status)
goto out;
}
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 811c148f36fc..4fa6329c75b4 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -904,7 +904,8 @@ static inline bool try_to_expire_client(struct nfs4_client *clp)
}
extern __be32 nfsd4_deleg_getattr_conflict(struct svc_rqst *rqstp,
- struct dentry *dentry, struct nfs4_delegation **pdp);
+ struct path *path, struct kstat *stat,
+ struct nfs4_delegation **pdp);
struct nfsd4_get_dir_delegation;
struct nfs4_delegation *nfsd_get_dir_deleg(struct nfsd4_compound_state *cstate,
--
2.53.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v2] nfsd: fix file change detection in CB_GETATTR
2026-04-04 0:54 [PATCH v2] nfsd: fix file change detection in CB_GETATTR Scott Mayhew
@ 2026-04-04 15:07 ` Chuck Lever
2026-04-05 22:13 ` Scott Mayhew
0 siblings, 1 reply; 10+ messages in thread
From: Chuck Lever @ 2026-04-04 15:07 UTC (permalink / raw)
To: Scott Mayhew, Chuck Lever, Jeff Layton
Cc: NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs
On Fri, Apr 3, 2026, at 8:54 PM, Scott Mayhew wrote:
> RFC 8881, section 10.4.3 doesn't say anything about caching the file
> size in the delegation record, nor does it say anything about comparing
> a cached file size with the size reported by the client in the
> CB_GETATTR reply for the purpose of determining if the client holds
> modified data for the file.
>
> What section 10.4.3 of RFC 8881 does say is that the server should
> compare the *current* file size with size reported by the client holding
> the delegation in the CB_GETATTR reply, and if they differ to treat it
> as a modification regardless of the change attribute retrieved via the
> CB_GETATTR.
>
> Doing otherwise would cause the server to believe the client holding the
> delegation has a modified version of the file, even if the client
> flushed the modifications to the server prior to the CB_GETATTR. This
> would have the added side effect of subsequent CB_GETATTRs causing
> updates to the mtime, ctime, and change attribute even if the client
> holding the delegation makes no further updates to the file.
>
> Modify nfsd4_deleg_getattr_conflict() to obtain the current file size
> via vfs_getattr(). Retain the ncf_cur_fsize field, since it's a
> convenient way to return the file size back to nfsd4_encode_fattr4(),
> but don't use it for the purpose of detecting file changes.
>
> Also, if we recall the delegation (because the client didn't respond to
> the CB_GETATTR), then skip the logic that checks the nfs4_cb_fattr
> fields.
>
> Fixes: c5967721e106 ("NFSD: handle GETATTR conflict with write delegation")
> Signed-off-by: Scott Mayhew <smayhew@redhat.com>
> ---
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index fa657badf5f8..53d8e7e7d60b 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -9459,17 +9461,18 @@ static int cb_getattr_update_times(struct
> dentry *dentry, struct nfs4_delegation
> * caller must put the reference.
> */
> __be32
> -nfsd4_deleg_getattr_conflict(struct svc_rqst *rqstp, struct dentry
> *dentry,
> - struct nfs4_delegation **pdp)
> +nfsd4_deleg_getattr_conflict(struct svc_rqst *rqstp, struct path *path,
> + struct kstat *stat, struct nfs4_delegation **pdp)
Passing the kstat struct in saves some stack just as I suggested,
but it is an ugly API. The nfsd4_encode_fattr4() call stack is tall,
though -- did you happen to measure how deep it gets after this patch
is applied?
> {
> struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
> struct nfsd_thread_local_info *ntli = rqstp->rq_private;
> + struct inode *inode = d_inode(path->dentry);
> struct file_lock_context *ctx;
> struct nfs4_delegation *dp = NULL;
> struct file_lease *fl;
> struct nfs4_cb_fattr *ncf;
> - struct inode *inode = d_inode(dentry);
> __be32 status;
> + int err;
>
> ctx = locks_inode_context(inode);
> if (!ctx)
--
Chuck Lever
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2] nfsd: fix file change detection in CB_GETATTR
2026-04-04 15:07 ` Chuck Lever
@ 2026-04-05 22:13 ` Scott Mayhew
2026-04-06 14:12 ` Chuck Lever
0 siblings, 1 reply; 10+ messages in thread
From: Scott Mayhew @ 2026-04-05 22:13 UTC (permalink / raw)
To: Chuck Lever
Cc: Chuck Lever, Jeff Layton, NeilBrown, Olga Kornievskaia, Dai Ngo,
Tom Talpey, linux-nfs
On Sat, 04 Apr 2026, Chuck Lever wrote:
>
>
> On Fri, Apr 3, 2026, at 8:54 PM, Scott Mayhew wrote:
> > RFC 8881, section 10.4.3 doesn't say anything about caching the file
> > size in the delegation record, nor does it say anything about comparing
> > a cached file size with the size reported by the client in the
> > CB_GETATTR reply for the purpose of determining if the client holds
> > modified data for the file.
> >
> > What section 10.4.3 of RFC 8881 does say is that the server should
> > compare the *current* file size with size reported by the client holding
> > the delegation in the CB_GETATTR reply, and if they differ to treat it
> > as a modification regardless of the change attribute retrieved via the
> > CB_GETATTR.
> >
> > Doing otherwise would cause the server to believe the client holding the
> > delegation has a modified version of the file, even if the client
> > flushed the modifications to the server prior to the CB_GETATTR. This
> > would have the added side effect of subsequent CB_GETATTRs causing
> > updates to the mtime, ctime, and change attribute even if the client
> > holding the delegation makes no further updates to the file.
> >
> > Modify nfsd4_deleg_getattr_conflict() to obtain the current file size
> > via vfs_getattr(). Retain the ncf_cur_fsize field, since it's a
> > convenient way to return the file size back to nfsd4_encode_fattr4(),
> > but don't use it for the purpose of detecting file changes.
> >
> > Also, if we recall the delegation (because the client didn't respond to
> > the CB_GETATTR), then skip the logic that checks the nfs4_cb_fattr
> > fields.
> >
> > Fixes: c5967721e106 ("NFSD: handle GETATTR conflict with write delegation")
> > Signed-off-by: Scott Mayhew <smayhew@redhat.com>
> > ---
>
> > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > index fa657badf5f8..53d8e7e7d60b 100644
> > --- a/fs/nfsd/nfs4state.c
> > +++ b/fs/nfsd/nfs4state.c
>
> > @@ -9459,17 +9461,18 @@ static int cb_getattr_update_times(struct
> > dentry *dentry, struct nfs4_delegation
> > * caller must put the reference.
> > */
> > __be32
> > -nfsd4_deleg_getattr_conflict(struct svc_rqst *rqstp, struct dentry
> > *dentry,
> > - struct nfs4_delegation **pdp)
> > +nfsd4_deleg_getattr_conflict(struct svc_rqst *rqstp, struct path *path,
> > + struct kstat *stat, struct nfs4_delegation **pdp)
>
> Passing the kstat struct in saves some stack just as I suggested,
> but it is an ugly API. The nfsd4_encode_fattr4() call stack is tall,
> though -- did you happen to measure how deep it gets after this patch
> is applied?
>
I tried using the stack tracer:
# echo 1 >/proc/sys/kernel/stack_tracer_enabled
# echo vfs_getattr >/sys/kernel/debug/tracing/stack_trace_filter
# cat /sys/kernel/debug/tracing/stack_trace
Depth Size Location (18 entries)
----- ---- --------
0) 1936 48 vfs_getattr+0x9/0x50
1) 1888 552 nfsd4_encode_fattr4+0x1b2/0x7a0 [nfsd]
2) 1336 80 nfsd4_encode_entry4_fattr+0xf8/0x210 [nfsd]
3) 1256 96 nfsd4_encode_entry4+0x10b/0x2a0 [nfsd]
4) 1160 144 nfsd_buffered_readdir+0x139/0x310 [nfsd]
5) 1016 80 nfsd_readdir+0x9f/0x180 [nfsd]
6) 936 80 nfsd4_encode_readdir+0xdf/0x1e0 [nfsd]
7) 856 80 nfsd4_encode_operation+0xcf/0x3d0 [nfsd]
8) 776 80 nfsd4_proc_compound+0x1d6/0x7a0 [nfsd]
9) 696 80 nfsd_dispatch+0xd9/0x240 [nfsd]
10) 616 80 svc_process_common+0x4cb/0x6b0 [sunrpc]
11) 536 40 svc_process+0x150/0x240 [sunrpc]
12) 496 72 svc_handle_xprt+0x4b0/0x5f0 [sunrpc]
13) 424 56 svc_recv+0x1b2/0x3a0 [sunrpc]
14) 368 80 nfsd+0x11c/0x3d0 [nfsd]
15) 288 56 kthread+0xe3/0x120
16) 232 40 ret_from_fork+0x1a1/0x270
17) 192 192 ret_from_fork_asm+0x1a/0x30
But that's capturing a vfs_getattr() call from encoding a readdir reply,
rather than the vfs_getattr() call I added to nfsd4_deleg_getattr_conflict().
Here's the stack depth for nfsd4_deleg_getattr_conflict():
# echo nfsd4_deleg_getattr_conflict >/sys/kernel/debug/tracing/stack_trace_filter
# echo 0 >/sys/kernel/debug/tracing/stack_max_size
# cat /sys/kernel/debug/tracing/stack_trace
Depth Size Location (14 entries)
----- ---- --------
0) 1472 48 nfsd4_deleg_getattr_conflict+0x9/0x410 [nfsd]
1) 1424 552 nfsd4_encode_fattr4+0x191/0x7a0 [nfsd]
2) 872 16 nfsd4_encode_getattr+0x2c/0x40 [nfsd]
3) 856 80 nfsd4_encode_operation+0xcf/0x3d0 [nfsd]
4) 776 80 nfsd4_proc_compound+0x1d6/0x7a0 [nfsd]
5) 696 80 nfsd_dispatch+0xd9/0x240 [nfsd]
6) 616 80 svc_process_common+0x4cb/0x6b0 [sunrpc]
7) 536 40 svc_process+0x150/0x240 [sunrpc]
8) 496 72 svc_handle_xprt+0x4b0/0x5f0 [sunrpc]
9) 424 56 svc_recv+0x1b2/0x3a0 [sunrpc]
10) 368 80 nfsd+0x11c/0x3d0 [nfsd]
11) 288 56 kthread+0xe3/0x120
12) 232 40 ret_from_fork+0x1a1/0x270
13) 192 192 ret_from_fork_asm+0x1a/0x30
Manually inspecting function graphs of vfs_getattr(), it looks like the deepest
function (that we can trace) is avc_lookup(), so here's a bpftrace script that
prints the stack depth from avc_lookup() via nfsd4_deleg_getattr_conflict()
(I mostly punted to Gemini for this):
# cat peak-usage.bt
kprobe:nfsd4_deleg_getattr_conflict {
@in_deleg_conflict[tid] = 1;
}
kprobe:avc_lookup /@in_deleg_conflict[tid]/ {
$stack_size = (uint64)16384;
$sp = reg("sp");
$stack_base = $sp & ~($stack_size - 1);
$total_used = $stack_base + $stack_size - $sp;
if ($total_used > @max_depth_bytes) {
@max_depth_bytes = $total_used;
@deepest_stack = kstack;
}
}
kretprobe:nfsd4_deleg_getattr_conflict { delete(@in_deleg_conflict[tid]); }
And finally the result:
# bpftrace peak-usage.bt
Attached 3 probes
^C
@deepest_stack:
avc_lookup+1
avc_has_perm_noaudit+60
avc_has_perm+89
selinux_inode_getattr+203
security_inode_getattr+70
vfs_getattr+35
nfsd4_deleg_getattr_conflict+958
nfsd4_encode_fattr4+401
nfsd4_encode_getattr+44
nfsd4_encode_operation+207
nfsd4_proc_compound+470
nfsd_dispatch+217
svc_process_common+1227
svc_process+336
svc_handle_xprt+1200
svc_recv+434
nfsd+284
kthread+227
ret_from_fork+417
ret_from_fork_asm+26
@max_depth_bytes: 1792
>
> > {
> > struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
> > struct nfsd_thread_local_info *ntli = rqstp->rq_private;
> > + struct inode *inode = d_inode(path->dentry);
> > struct file_lock_context *ctx;
> > struct nfs4_delegation *dp = NULL;
> > struct file_lease *fl;
> > struct nfs4_cb_fattr *ncf;
> > - struct inode *inode = d_inode(dentry);
> > __be32 status;
> > + int err;
> >
> > ctx = locks_inode_context(inode);
> > if (!ctx)
>
> --
> Chuck Lever
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2] nfsd: fix file change detection in CB_GETATTR
2026-04-05 22:13 ` Scott Mayhew
@ 2026-04-06 14:12 ` Chuck Lever
2026-04-06 14:38 ` Jeff Layton
2026-04-06 17:20 ` Scott Mayhew
0 siblings, 2 replies; 10+ messages in thread
From: Chuck Lever @ 2026-04-06 14:12 UTC (permalink / raw)
To: Scott Mayhew
Cc: Chuck Lever, Jeff Layton, NeilBrown, Olga Kornievskaia, Dai Ngo,
Tom Talpey, linux-nfs
On Sun, Apr 5, 2026, at 6:13 PM, Scott Mayhew wrote:
> On Sat, 04 Apr 2026, Chuck Lever wrote:
>> On Fri, Apr 3, 2026, at 8:54 PM, Scott Mayhew wrote:
>> > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
>> > index fa657badf5f8..53d8e7e7d60b 100644
>> > --- a/fs/nfsd/nfs4state.c
>> > +++ b/fs/nfsd/nfs4state.c
>>
>> > @@ -9459,17 +9461,18 @@ static int cb_getattr_update_times(struct
>> > dentry *dentry, struct nfs4_delegation
>> > * caller must put the reference.
>> > */
>> > __be32
>> > -nfsd4_deleg_getattr_conflict(struct svc_rqst *rqstp, struct dentry
>> > *dentry,
>> > - struct nfs4_delegation **pdp)
>> > +nfsd4_deleg_getattr_conflict(struct svc_rqst *rqstp, struct path *path,
>> > + struct kstat *stat, struct nfs4_delegation **pdp)
>>
>> Passing the kstat struct in saves some stack just as I suggested,
>> but it is an ugly API. The nfsd4_encode_fattr4() call stack is tall,
>> though -- did you happen to measure how deep it gets after this patch
>> is applied?
>>
>
> I tried using the stack tracer:
>
> # echo 1 >/proc/sys/kernel/stack_tracer_enabled
> # echo vfs_getattr >/sys/kernel/debug/tracing/stack_trace_filter
> # cat /sys/kernel/debug/tracing/stack_trace
> Depth Size Location (18 entries)
> ----- ---- --------
> 0) 1936 48 vfs_getattr+0x9/0x50
> 1) 1888 552 nfsd4_encode_fattr4+0x1b2/0x7a0 [nfsd]
> 2) 1336 80 nfsd4_encode_entry4_fattr+0xf8/0x210 [nfsd]
> 3) 1256 96 nfsd4_encode_entry4+0x10b/0x2a0 [nfsd]
> 4) 1160 144 nfsd_buffered_readdir+0x139/0x310 [nfsd]
> 5) 1016 80 nfsd_readdir+0x9f/0x180 [nfsd]
> 6) 936 80 nfsd4_encode_readdir+0xdf/0x1e0 [nfsd]
> 7) 856 80 nfsd4_encode_operation+0xcf/0x3d0 [nfsd]
> 8) 776 80 nfsd4_proc_compound+0x1d6/0x7a0 [nfsd]
> 9) 696 80 nfsd_dispatch+0xd9/0x240 [nfsd]
> 10) 616 80 svc_process_common+0x4cb/0x6b0 [sunrpc]
> 11) 536 40 svc_process+0x150/0x240 [sunrpc]
> 12) 496 72 svc_handle_xprt+0x4b0/0x5f0 [sunrpc]
> 13) 424 56 svc_recv+0x1b2/0x3a0 [sunrpc]
> 14) 368 80 nfsd+0x11c/0x3d0 [nfsd]
> 15) 288 56 kthread+0xe3/0x120
> 16) 232 40 ret_from_fork+0x1a1/0x270
> 17) 192 192 ret_from_fork_asm+0x1a/0x30
>
> But that's capturing a vfs_getattr() call from encoding a readdir reply,
> rather than the vfs_getattr() call I added to nfsd4_deleg_getattr_conflict().
>
> Here's the stack depth for nfsd4_deleg_getattr_conflict():
>
> # echo nfsd4_deleg_getattr_conflict
> >/sys/kernel/debug/tracing/stack_trace_filter
> # echo 0 >/sys/kernel/debug/tracing/stack_max_size
> # cat /sys/kernel/debug/tracing/stack_trace
> Depth Size Location (14 entries)
> ----- ---- --------
> 0) 1472 48 nfsd4_deleg_getattr_conflict+0x9/0x410 [nfsd]
> 1) 1424 552 nfsd4_encode_fattr4+0x191/0x7a0 [nfsd]
> 2) 872 16 nfsd4_encode_getattr+0x2c/0x40 [nfsd]
> 3) 856 80 nfsd4_encode_operation+0xcf/0x3d0 [nfsd]
> 4) 776 80 nfsd4_proc_compound+0x1d6/0x7a0 [nfsd]
> 5) 696 80 nfsd_dispatch+0xd9/0x240 [nfsd]
> 6) 616 80 svc_process_common+0x4cb/0x6b0 [sunrpc]
> 7) 536 40 svc_process+0x150/0x240 [sunrpc]
> 8) 496 72 svc_handle_xprt+0x4b0/0x5f0 [sunrpc]
> 9) 424 56 svc_recv+0x1b2/0x3a0 [sunrpc]
> 10) 368 80 nfsd+0x11c/0x3d0 [nfsd]
> 11) 288 56 kthread+0xe3/0x120
> 12) 232 40 ret_from_fork+0x1a1/0x270
> 13) 192 192 ret_from_fork_asm+0x1a/0x30
>
> Manually inspecting function graphs of vfs_getattr(), it looks like the deepest
> function (that we can trace) is avc_lookup(), so here's a bpftrace script that
> prints the stack depth from avc_lookup() via nfsd4_deleg_getattr_conflict()
> (I mostly punted to Gemini for this):
>
> # cat peak-usage.bt
> kprobe:nfsd4_deleg_getattr_conflict {
> @in_deleg_conflict[tid] = 1;
> }
>
> kprobe:avc_lookup /@in_deleg_conflict[tid]/ {
> $stack_size = (uint64)16384;
> $sp = reg("sp");
> $stack_base = $sp & ~($stack_size - 1);
> $total_used = $stack_base + $stack_size - $sp;
>
> if ($total_used > @max_depth_bytes) {
> @max_depth_bytes = $total_used;
> @deepest_stack = kstack;
> }
> }
>
> kretprobe:nfsd4_deleg_getattr_conflict { delete(@in_deleg_conflict[tid]); }
>
> And finally the result:
>
> # bpftrace peak-usage.bt
> Attached 3 probes
> ^C
>
> @deepest_stack:
> avc_lookup+1
> avc_has_perm_noaudit+60
> avc_has_perm+89
> selinux_inode_getattr+203
> security_inode_getattr+70
> vfs_getattr+35
> nfsd4_deleg_getattr_conflict+958
> nfsd4_encode_fattr4+401
> nfsd4_encode_getattr+44
> nfsd4_encode_operation+207
> nfsd4_proc_compound+470
> nfsd_dispatch+217
> svc_process_common+1227
> svc_process+336
> svc_handle_xprt+1200
> svc_recv+434
> nfsd+284
> kthread+227
> ret_from_fork+417
> ret_from_fork_asm+26
>
> @max_depth_bytes: 1792
Since the new code only needs the file's size, perhaps you can get
away with
if (i_size_read(inode) != ncf->ncf_cb_fsize)
rather than
err = vfs_getattr(path, stat, STATX_SIZE, AT_STATX_SYNC_AS_STAT);
if (err) {
status = nfserrno(err);
goto out_status;
}
if (stat->size != ncf->ncf_cb_fsize)
Then there's no longer a need for a struct kstat at all. The client is
holding a delegation, so I would expect the file size to be stable.
--
Chuck Lever
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2] nfsd: fix file change detection in CB_GETATTR
2026-04-06 14:12 ` Chuck Lever
@ 2026-04-06 14:38 ` Jeff Layton
2026-04-06 17:47 ` Scott Mayhew
2026-04-06 17:20 ` Scott Mayhew
1 sibling, 1 reply; 10+ messages in thread
From: Jeff Layton @ 2026-04-06 14:38 UTC (permalink / raw)
To: Chuck Lever, Scott Mayhew
Cc: Chuck Lever, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
linux-nfs
On Mon, 2026-04-06 at 10:12 -0400, Chuck Lever wrote:
>
> On Sun, Apr 5, 2026, at 6:13 PM, Scott Mayhew wrote:
> > On Sat, 04 Apr 2026, Chuck Lever wrote:
> > > On Fri, Apr 3, 2026, at 8:54 PM, Scott Mayhew wrote:
>
> > > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > > > index fa657badf5f8..53d8e7e7d60b 100644
> > > > --- a/fs/nfsd/nfs4state.c
> > > > +++ b/fs/nfsd/nfs4state.c
> > >
> > > > @@ -9459,17 +9461,18 @@ static int cb_getattr_update_times(struct
> > > > dentry *dentry, struct nfs4_delegation
> > > > * caller must put the reference.
> > > > */
> > > > __be32
> > > > -nfsd4_deleg_getattr_conflict(struct svc_rqst *rqstp, struct dentry
> > > > *dentry,
> > > > - struct nfs4_delegation **pdp)
> > > > +nfsd4_deleg_getattr_conflict(struct svc_rqst *rqstp, struct path *path,
> > > > + struct kstat *stat, struct nfs4_delegation **pdp)
> > >
> > > Passing the kstat struct in saves some stack just as I suggested,
> > > but it is an ugly API. The nfsd4_encode_fattr4() call stack is tall,
> > > though -- did you happen to measure how deep it gets after this patch
> > > is applied?
> > >
> >
> > I tried using the stack tracer:
> >
> > # echo 1 >/proc/sys/kernel/stack_tracer_enabled
> > # echo vfs_getattr >/sys/kernel/debug/tracing/stack_trace_filter
> > # cat /sys/kernel/debug/tracing/stack_trace
> > Depth Size Location (18 entries)
> > ----- ---- --------
> > 0) 1936 48 vfs_getattr+0x9/0x50
> > 1) 1888 552 nfsd4_encode_fattr4+0x1b2/0x7a0 [nfsd]
> > 2) 1336 80 nfsd4_encode_entry4_fattr+0xf8/0x210 [nfsd]
> > 3) 1256 96 nfsd4_encode_entry4+0x10b/0x2a0 [nfsd]
> > 4) 1160 144 nfsd_buffered_readdir+0x139/0x310 [nfsd]
> > 5) 1016 80 nfsd_readdir+0x9f/0x180 [nfsd]
> > 6) 936 80 nfsd4_encode_readdir+0xdf/0x1e0 [nfsd]
> > 7) 856 80 nfsd4_encode_operation+0xcf/0x3d0 [nfsd]
> > 8) 776 80 nfsd4_proc_compound+0x1d6/0x7a0 [nfsd]
> > 9) 696 80 nfsd_dispatch+0xd9/0x240 [nfsd]
> > 10) 616 80 svc_process_common+0x4cb/0x6b0 [sunrpc]
> > 11) 536 40 svc_process+0x150/0x240 [sunrpc]
> > 12) 496 72 svc_handle_xprt+0x4b0/0x5f0 [sunrpc]
> > 13) 424 56 svc_recv+0x1b2/0x3a0 [sunrpc]
> > 14) 368 80 nfsd+0x11c/0x3d0 [nfsd]
> > 15) 288 56 kthread+0xe3/0x120
> > 16) 232 40 ret_from_fork+0x1a1/0x270
> > 17) 192 192 ret_from_fork_asm+0x1a/0x30
> >
> > But that's capturing a vfs_getattr() call from encoding a readdir reply,
> > rather than the vfs_getattr() call I added to nfsd4_deleg_getattr_conflict().
> >
> > Here's the stack depth for nfsd4_deleg_getattr_conflict():
> >
> > # echo nfsd4_deleg_getattr_conflict
> > > /sys/kernel/debug/tracing/stack_trace_filter
> > # echo 0 >/sys/kernel/debug/tracing/stack_max_size
> > # cat /sys/kernel/debug/tracing/stack_trace
> > Depth Size Location (14 entries)
> > ----- ---- --------
> > 0) 1472 48 nfsd4_deleg_getattr_conflict+0x9/0x410 [nfsd]
> > 1) 1424 552 nfsd4_encode_fattr4+0x191/0x7a0 [nfsd]
> > 2) 872 16 nfsd4_encode_getattr+0x2c/0x40 [nfsd]
> > 3) 856 80 nfsd4_encode_operation+0xcf/0x3d0 [nfsd]
> > 4) 776 80 nfsd4_proc_compound+0x1d6/0x7a0 [nfsd]
> > 5) 696 80 nfsd_dispatch+0xd9/0x240 [nfsd]
> > 6) 616 80 svc_process_common+0x4cb/0x6b0 [sunrpc]
> > 7) 536 40 svc_process+0x150/0x240 [sunrpc]
> > 8) 496 72 svc_handle_xprt+0x4b0/0x5f0 [sunrpc]
> > 9) 424 56 svc_recv+0x1b2/0x3a0 [sunrpc]
> > 10) 368 80 nfsd+0x11c/0x3d0 [nfsd]
> > 11) 288 56 kthread+0xe3/0x120
> > 12) 232 40 ret_from_fork+0x1a1/0x270
> > 13) 192 192 ret_from_fork_asm+0x1a/0x30
> >
> > Manually inspecting function graphs of vfs_getattr(), it looks like the deepest
> > function (that we can trace) is avc_lookup(), so here's a bpftrace script that
> > prints the stack depth from avc_lookup() via nfsd4_deleg_getattr_conflict()
> > (I mostly punted to Gemini for this):
> >
> > # cat peak-usage.bt
> > kprobe:nfsd4_deleg_getattr_conflict {
> > @in_deleg_conflict[tid] = 1;
> > }
> >
> > kprobe:avc_lookup /@in_deleg_conflict[tid]/ {
> > $stack_size = (uint64)16384;
> > $sp = reg("sp");
> > $stack_base = $sp & ~($stack_size - 1);
> > $total_used = $stack_base + $stack_size - $sp;
> >
> > if ($total_used > @max_depth_bytes) {
> > @max_depth_bytes = $total_used;
> > @deepest_stack = kstack;
> > }
> > }
> >
> > kretprobe:nfsd4_deleg_getattr_conflict { delete(@in_deleg_conflict[tid]); }
> >
> > And finally the result:
> >
> > # bpftrace peak-usage.bt
> > Attached 3 probes
> > ^C
> >
> > @deepest_stack:
> > avc_lookup+1
> > avc_has_perm_noaudit+60
> > avc_has_perm+89
> > selinux_inode_getattr+203
> > security_inode_getattr+70
> > vfs_getattr+35
> > nfsd4_deleg_getattr_conflict+958
> > nfsd4_encode_fattr4+401
> > nfsd4_encode_getattr+44
> > nfsd4_encode_operation+207
> > nfsd4_proc_compound+470
> > nfsd_dispatch+217
> > svc_process_common+1227
> > svc_process+336
> > svc_handle_xprt+1200
> > svc_recv+434
> > nfsd+284
> > kthread+227
> > ret_from_fork+417
> > ret_from_fork_asm+26
> >
> > @max_depth_bytes: 1792
>
> Since the new code only needs the file's size, perhaps you can get
> away with
>
> if (i_size_read(inode) != ncf->ncf_cb_fsize)
>
> rather than
>
> err = vfs_getattr(path, stat, STATX_SIZE, AT_STATX_SYNC_AS_STAT);
> if (err) {
> status = nfserrno(err);
> goto out_status;
> }
> if (stat->size != ncf->ncf_cb_fsize)
>
> Then there's no longer a need for a struct kstat at all. The client is
> holding a delegation, so I would expect the file size to be stable.
>
I hate relying on the size for this, since it's not a reliable
indicator of change, especially on something that has a fixed size (VM
image, for instance).
In fact, I wonder if this ought to be looking at the mtime too. If the
mtime that the client reported is later than the mtime the server has,
then we can probably infer that there are buffered writes out there.
--
Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2] nfsd: fix file change detection in CB_GETATTR
2026-04-06 14:12 ` Chuck Lever
2026-04-06 14:38 ` Jeff Layton
@ 2026-04-06 17:20 ` Scott Mayhew
1 sibling, 0 replies; 10+ messages in thread
From: Scott Mayhew @ 2026-04-06 17:20 UTC (permalink / raw)
To: Chuck Lever
Cc: Chuck Lever, Jeff Layton, NeilBrown, Olga Kornievskaia, Dai Ngo,
Tom Talpey, linux-nfs
On Mon, 06 Apr 2026, Chuck Lever wrote:
>
>
> On Sun, Apr 5, 2026, at 6:13 PM, Scott Mayhew wrote:
> > On Sat, 04 Apr 2026, Chuck Lever wrote:
> >> On Fri, Apr 3, 2026, at 8:54 PM, Scott Mayhew wrote:
>
> >> > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> >> > index fa657badf5f8..53d8e7e7d60b 100644
> >> > --- a/fs/nfsd/nfs4state.c
> >> > +++ b/fs/nfsd/nfs4state.c
> >>
> >> > @@ -9459,17 +9461,18 @@ static int cb_getattr_update_times(struct
> >> > dentry *dentry, struct nfs4_delegation
> >> > * caller must put the reference.
> >> > */
> >> > __be32
> >> > -nfsd4_deleg_getattr_conflict(struct svc_rqst *rqstp, struct dentry
> >> > *dentry,
> >> > - struct nfs4_delegation **pdp)
> >> > +nfsd4_deleg_getattr_conflict(struct svc_rqst *rqstp, struct path *path,
> >> > + struct kstat *stat, struct nfs4_delegation **pdp)
> >>
> >> Passing the kstat struct in saves some stack just as I suggested,
> >> but it is an ugly API. The nfsd4_encode_fattr4() call stack is tall,
> >> though -- did you happen to measure how deep it gets after this patch
> >> is applied?
> >>
> >
> > I tried using the stack tracer:
> >
> > # echo 1 >/proc/sys/kernel/stack_tracer_enabled
> > # echo vfs_getattr >/sys/kernel/debug/tracing/stack_trace_filter
> > # cat /sys/kernel/debug/tracing/stack_trace
> > Depth Size Location (18 entries)
> > ----- ---- --------
> > 0) 1936 48 vfs_getattr+0x9/0x50
> > 1) 1888 552 nfsd4_encode_fattr4+0x1b2/0x7a0 [nfsd]
> > 2) 1336 80 nfsd4_encode_entry4_fattr+0xf8/0x210 [nfsd]
> > 3) 1256 96 nfsd4_encode_entry4+0x10b/0x2a0 [nfsd]
> > 4) 1160 144 nfsd_buffered_readdir+0x139/0x310 [nfsd]
> > 5) 1016 80 nfsd_readdir+0x9f/0x180 [nfsd]
> > 6) 936 80 nfsd4_encode_readdir+0xdf/0x1e0 [nfsd]
> > 7) 856 80 nfsd4_encode_operation+0xcf/0x3d0 [nfsd]
> > 8) 776 80 nfsd4_proc_compound+0x1d6/0x7a0 [nfsd]
> > 9) 696 80 nfsd_dispatch+0xd9/0x240 [nfsd]
> > 10) 616 80 svc_process_common+0x4cb/0x6b0 [sunrpc]
> > 11) 536 40 svc_process+0x150/0x240 [sunrpc]
> > 12) 496 72 svc_handle_xprt+0x4b0/0x5f0 [sunrpc]
> > 13) 424 56 svc_recv+0x1b2/0x3a0 [sunrpc]
> > 14) 368 80 nfsd+0x11c/0x3d0 [nfsd]
> > 15) 288 56 kthread+0xe3/0x120
> > 16) 232 40 ret_from_fork+0x1a1/0x270
> > 17) 192 192 ret_from_fork_asm+0x1a/0x30
> >
> > But that's capturing a vfs_getattr() call from encoding a readdir reply,
> > rather than the vfs_getattr() call I added to nfsd4_deleg_getattr_conflict().
> >
> > Here's the stack depth for nfsd4_deleg_getattr_conflict():
> >
> > # echo nfsd4_deleg_getattr_conflict
> > >/sys/kernel/debug/tracing/stack_trace_filter
> > # echo 0 >/sys/kernel/debug/tracing/stack_max_size
> > # cat /sys/kernel/debug/tracing/stack_trace
> > Depth Size Location (14 entries)
> > ----- ---- --------
> > 0) 1472 48 nfsd4_deleg_getattr_conflict+0x9/0x410 [nfsd]
> > 1) 1424 552 nfsd4_encode_fattr4+0x191/0x7a0 [nfsd]
> > 2) 872 16 nfsd4_encode_getattr+0x2c/0x40 [nfsd]
> > 3) 856 80 nfsd4_encode_operation+0xcf/0x3d0 [nfsd]
> > 4) 776 80 nfsd4_proc_compound+0x1d6/0x7a0 [nfsd]
> > 5) 696 80 nfsd_dispatch+0xd9/0x240 [nfsd]
> > 6) 616 80 svc_process_common+0x4cb/0x6b0 [sunrpc]
> > 7) 536 40 svc_process+0x150/0x240 [sunrpc]
> > 8) 496 72 svc_handle_xprt+0x4b0/0x5f0 [sunrpc]
> > 9) 424 56 svc_recv+0x1b2/0x3a0 [sunrpc]
> > 10) 368 80 nfsd+0x11c/0x3d0 [nfsd]
> > 11) 288 56 kthread+0xe3/0x120
> > 12) 232 40 ret_from_fork+0x1a1/0x270
> > 13) 192 192 ret_from_fork_asm+0x1a/0x30
> >
> > Manually inspecting function graphs of vfs_getattr(), it looks like the deepest
> > function (that we can trace) is avc_lookup(), so here's a bpftrace script that
> > prints the stack depth from avc_lookup() via nfsd4_deleg_getattr_conflict()
> > (I mostly punted to Gemini for this):
> >
> > # cat peak-usage.bt
> > kprobe:nfsd4_deleg_getattr_conflict {
> > @in_deleg_conflict[tid] = 1;
> > }
> >
> > kprobe:avc_lookup /@in_deleg_conflict[tid]/ {
> > $stack_size = (uint64)16384;
> > $sp = reg("sp");
> > $stack_base = $sp & ~($stack_size - 1);
> > $total_used = $stack_base + $stack_size - $sp;
> >
> > if ($total_used > @max_depth_bytes) {
> > @max_depth_bytes = $total_used;
> > @deepest_stack = kstack;
> > }
> > }
> >
> > kretprobe:nfsd4_deleg_getattr_conflict { delete(@in_deleg_conflict[tid]); }
> >
> > And finally the result:
> >
> > # bpftrace peak-usage.bt
> > Attached 3 probes
> > ^C
> >
> > @deepest_stack:
> > avc_lookup+1
> > avc_has_perm_noaudit+60
> > avc_has_perm+89
> > selinux_inode_getattr+203
> > security_inode_getattr+70
> > vfs_getattr+35
> > nfsd4_deleg_getattr_conflict+958
> > nfsd4_encode_fattr4+401
> > nfsd4_encode_getattr+44
> > nfsd4_encode_operation+207
> > nfsd4_proc_compound+470
> > nfsd_dispatch+217
> > svc_process_common+1227
> > svc_process+336
> > svc_handle_xprt+1200
> > svc_recv+434
> > nfsd+284
> > kthread+227
> > ret_from_fork+417
> > ret_from_fork_asm+26
> >
> > @max_depth_bytes: 1792
>
> Since the new code only needs the file's size, perhaps you can get
> away with
>
> if (i_size_read(inode) != ncf->ncf_cb_fsize)
>
> rather than
>
> err = vfs_getattr(path, stat, STATX_SIZE, AT_STATX_SYNC_AS_STAT);
> if (err) {
> status = nfserrno(err);
> goto out_status;
> }
> if (stat->size != ncf->ncf_cb_fsize)
Definitely. I'll send a v3 in a bit.
>
> Then there's no longer a need for a struct kstat at all. The client is
> holding a delegation, so I would expect the file size to be stable.
>
>
> --
> Chuck Lever
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2] nfsd: fix file change detection in CB_GETATTR
2026-04-06 14:38 ` Jeff Layton
@ 2026-04-06 17:47 ` Scott Mayhew
2026-04-06 18:13 ` Jeff Layton
2026-04-06 18:48 ` Jeff Layton
0 siblings, 2 replies; 10+ messages in thread
From: Scott Mayhew @ 2026-04-06 17:47 UTC (permalink / raw)
To: Jeff Layton
Cc: Chuck Lever, Chuck Lever, NeilBrown, Olga Kornievskaia, Dai Ngo,
Tom Talpey, linux-nfs
On Mon, 06 Apr 2026, Jeff Layton wrote:
> On Mon, 2026-04-06 at 10:12 -0400, Chuck Lever wrote:
> >
> > On Sun, Apr 5, 2026, at 6:13 PM, Scott Mayhew wrote:
> > > On Sat, 04 Apr 2026, Chuck Lever wrote:
> > > > On Fri, Apr 3, 2026, at 8:54 PM, Scott Mayhew wrote:
> >
> > > > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > > > > index fa657badf5f8..53d8e7e7d60b 100644
> > > > > --- a/fs/nfsd/nfs4state.c
> > > > > +++ b/fs/nfsd/nfs4state.c
> > > >
> > > > > @@ -9459,17 +9461,18 @@ static int cb_getattr_update_times(struct
> > > > > dentry *dentry, struct nfs4_delegation
> > > > > * caller must put the reference.
> > > > > */
> > > > > __be32
> > > > > -nfsd4_deleg_getattr_conflict(struct svc_rqst *rqstp, struct dentry
> > > > > *dentry,
> > > > > - struct nfs4_delegation **pdp)
> > > > > +nfsd4_deleg_getattr_conflict(struct svc_rqst *rqstp, struct path *path,
> > > > > + struct kstat *stat, struct nfs4_delegation **pdp)
> > > >
> > > > Passing the kstat struct in saves some stack just as I suggested,
> > > > but it is an ugly API. The nfsd4_encode_fattr4() call stack is tall,
> > > > though -- did you happen to measure how deep it gets after this patch
> > > > is applied?
> > > >
> > >
> > > I tried using the stack tracer:
> > >
> > > # echo 1 >/proc/sys/kernel/stack_tracer_enabled
> > > # echo vfs_getattr >/sys/kernel/debug/tracing/stack_trace_filter
> > > # cat /sys/kernel/debug/tracing/stack_trace
> > > Depth Size Location (18 entries)
> > > ----- ---- --------
> > > 0) 1936 48 vfs_getattr+0x9/0x50
> > > 1) 1888 552 nfsd4_encode_fattr4+0x1b2/0x7a0 [nfsd]
> > > 2) 1336 80 nfsd4_encode_entry4_fattr+0xf8/0x210 [nfsd]
> > > 3) 1256 96 nfsd4_encode_entry4+0x10b/0x2a0 [nfsd]
> > > 4) 1160 144 nfsd_buffered_readdir+0x139/0x310 [nfsd]
> > > 5) 1016 80 nfsd_readdir+0x9f/0x180 [nfsd]
> > > 6) 936 80 nfsd4_encode_readdir+0xdf/0x1e0 [nfsd]
> > > 7) 856 80 nfsd4_encode_operation+0xcf/0x3d0 [nfsd]
> > > 8) 776 80 nfsd4_proc_compound+0x1d6/0x7a0 [nfsd]
> > > 9) 696 80 nfsd_dispatch+0xd9/0x240 [nfsd]
> > > 10) 616 80 svc_process_common+0x4cb/0x6b0 [sunrpc]
> > > 11) 536 40 svc_process+0x150/0x240 [sunrpc]
> > > 12) 496 72 svc_handle_xprt+0x4b0/0x5f0 [sunrpc]
> > > 13) 424 56 svc_recv+0x1b2/0x3a0 [sunrpc]
> > > 14) 368 80 nfsd+0x11c/0x3d0 [nfsd]
> > > 15) 288 56 kthread+0xe3/0x120
> > > 16) 232 40 ret_from_fork+0x1a1/0x270
> > > 17) 192 192 ret_from_fork_asm+0x1a/0x30
> > >
> > > But that's capturing a vfs_getattr() call from encoding a readdir reply,
> > > rather than the vfs_getattr() call I added to nfsd4_deleg_getattr_conflict().
> > >
> > > Here's the stack depth for nfsd4_deleg_getattr_conflict():
> > >
> > > # echo nfsd4_deleg_getattr_conflict
> > > > /sys/kernel/debug/tracing/stack_trace_filter
> > > # echo 0 >/sys/kernel/debug/tracing/stack_max_size
> > > # cat /sys/kernel/debug/tracing/stack_trace
> > > Depth Size Location (14 entries)
> > > ----- ---- --------
> > > 0) 1472 48 nfsd4_deleg_getattr_conflict+0x9/0x410 [nfsd]
> > > 1) 1424 552 nfsd4_encode_fattr4+0x191/0x7a0 [nfsd]
> > > 2) 872 16 nfsd4_encode_getattr+0x2c/0x40 [nfsd]
> > > 3) 856 80 nfsd4_encode_operation+0xcf/0x3d0 [nfsd]
> > > 4) 776 80 nfsd4_proc_compound+0x1d6/0x7a0 [nfsd]
> > > 5) 696 80 nfsd_dispatch+0xd9/0x240 [nfsd]
> > > 6) 616 80 svc_process_common+0x4cb/0x6b0 [sunrpc]
> > > 7) 536 40 svc_process+0x150/0x240 [sunrpc]
> > > 8) 496 72 svc_handle_xprt+0x4b0/0x5f0 [sunrpc]
> > > 9) 424 56 svc_recv+0x1b2/0x3a0 [sunrpc]
> > > 10) 368 80 nfsd+0x11c/0x3d0 [nfsd]
> > > 11) 288 56 kthread+0xe3/0x120
> > > 12) 232 40 ret_from_fork+0x1a1/0x270
> > > 13) 192 192 ret_from_fork_asm+0x1a/0x30
> > >
> > > Manually inspecting function graphs of vfs_getattr(), it looks like the deepest
> > > function (that we can trace) is avc_lookup(), so here's a bpftrace script that
> > > prints the stack depth from avc_lookup() via nfsd4_deleg_getattr_conflict()
> > > (I mostly punted to Gemini for this):
> > >
> > > # cat peak-usage.bt
> > > kprobe:nfsd4_deleg_getattr_conflict {
> > > @in_deleg_conflict[tid] = 1;
> > > }
> > >
> > > kprobe:avc_lookup /@in_deleg_conflict[tid]/ {
> > > $stack_size = (uint64)16384;
> > > $sp = reg("sp");
> > > $stack_base = $sp & ~($stack_size - 1);
> > > $total_used = $stack_base + $stack_size - $sp;
> > >
> > > if ($total_used > @max_depth_bytes) {
> > > @max_depth_bytes = $total_used;
> > > @deepest_stack = kstack;
> > > }
> > > }
> > >
> > > kretprobe:nfsd4_deleg_getattr_conflict { delete(@in_deleg_conflict[tid]); }
> > >
> > > And finally the result:
> > >
> > > # bpftrace peak-usage.bt
> > > Attached 3 probes
> > > ^C
> > >
> > > @deepest_stack:
> > > avc_lookup+1
> > > avc_has_perm_noaudit+60
> > > avc_has_perm+89
> > > selinux_inode_getattr+203
> > > security_inode_getattr+70
> > > vfs_getattr+35
> > > nfsd4_deleg_getattr_conflict+958
> > > nfsd4_encode_fattr4+401
> > > nfsd4_encode_getattr+44
> > > nfsd4_encode_operation+207
> > > nfsd4_proc_compound+470
> > > nfsd_dispatch+217
> > > svc_process_common+1227
> > > svc_process+336
> > > svc_handle_xprt+1200
> > > svc_recv+434
> > > nfsd+284
> > > kthread+227
> > > ret_from_fork+417
> > > ret_from_fork_asm+26
> > >
> > > @max_depth_bytes: 1792
> >
> > Since the new code only needs the file's size, perhaps you can get
> > away with
> >
> > if (i_size_read(inode) != ncf->ncf_cb_fsize)
> >
> > rather than
> >
> > err = vfs_getattr(path, stat, STATX_SIZE, AT_STATX_SYNC_AS_STAT);
> > if (err) {
> > status = nfserrno(err);
> > goto out_status;
> > }
> > if (stat->size != ncf->ncf_cb_fsize)
> >
> > Then there's no longer a need for a struct kstat at all. The client is
> > holding a delegation, so I would expect the file size to be stable.
> >
>
> I hate relying on the size for this, since it's not a reliable
> indicator of change, especially on something that has a fixed size (VM
> image, for instance).
You're right, it's not. I dislike how the pseudocode in 10.4.3 is only
checking the change attribute to determine if the file has been modified
and then 2 paragraphs down it talks about checking the file size.
At any rate, the main point of this fix is to make sure we're comparing
the size from the cb_getattr reply to the current size, not a cached
value. For example:
client 1 does open(O_RDWR|O_CREATE_O_TRUNC)
client 1 writes some data and flushes it to the server
client 2 does some operation that triggers a CB_GETATTR
At this point client 1 doesn't have any modified data, so it sends the
original change attribute and its current file size (which should match
the current file size on the server). With the current code, the server
compares the size client 1 sent with the value it cached on the
delegation (in this case 0). The server treats it as if the client is
has modified data, which it does not. That's all this patch is aiming
to fix.
>
> In fact, I wonder if this ought to be looking at the mtime too. If the
> mtime that the client reported is later than the mtime the server has,
> then we can probably infer that there are buffered writes out there.
But the spec doesn't say anything about comparing the mtime. Plus, if
the client is sending the original change attribute but a different
mtime than what the server has, wouldn't that be a client bug?
Plus, wouldn't it sort of be crossing into delegated timestamps
territory if we started checking the mtime?
> --
> Jeff Layton <jlayton@kernel.org>
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2] nfsd: fix file change detection in CB_GETATTR
2026-04-06 17:47 ` Scott Mayhew
@ 2026-04-06 18:13 ` Jeff Layton
2026-04-06 18:48 ` Jeff Layton
1 sibling, 0 replies; 10+ messages in thread
From: Jeff Layton @ 2026-04-06 18:13 UTC (permalink / raw)
To: Scott Mayhew
Cc: Chuck Lever, Chuck Lever, NeilBrown, Olga Kornievskaia, Dai Ngo,
Tom Talpey, linux-nfs
On Mon, 2026-04-06 at 13:47 -0400, Scott Mayhew wrote:
> On Mon, 06 Apr 2026, Jeff Layton wrote:
>
> > On Mon, 2026-04-06 at 10:12 -0400, Chuck Lever wrote:
> > >
> > > On Sun, Apr 5, 2026, at 6:13 PM, Scott Mayhew wrote:
> > > > On Sat, 04 Apr 2026, Chuck Lever wrote:
> > > > > On Fri, Apr 3, 2026, at 8:54 PM, Scott Mayhew wrote:
> > >
> > > > > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > > > > > index fa657badf5f8..53d8e7e7d60b 100644
> > > > > > --- a/fs/nfsd/nfs4state.c
> > > > > > +++ b/fs/nfsd/nfs4state.c
> > > > >
> > > > > > @@ -9459,17 +9461,18 @@ static int cb_getattr_update_times(struct
> > > > > > dentry *dentry, struct nfs4_delegation
> > > > > > * caller must put the reference.
> > > > > > */
> > > > > > __be32
> > > > > > -nfsd4_deleg_getattr_conflict(struct svc_rqst *rqstp, struct dentry
> > > > > > *dentry,
> > > > > > - struct nfs4_delegation **pdp)
> > > > > > +nfsd4_deleg_getattr_conflict(struct svc_rqst *rqstp, struct path *path,
> > > > > > + struct kstat *stat, struct nfs4_delegation **pdp)
> > > > >
> > > > > Passing the kstat struct in saves some stack just as I suggested,
> > > > > but it is an ugly API. The nfsd4_encode_fattr4() call stack is tall,
> > > > > though -- did you happen to measure how deep it gets after this patch
> > > > > is applied?
> > > > >
> > > >
> > > > I tried using the stack tracer:
> > > >
> > > > # echo 1 >/proc/sys/kernel/stack_tracer_enabled
> > > > # echo vfs_getattr >/sys/kernel/debug/tracing/stack_trace_filter
> > > > # cat /sys/kernel/debug/tracing/stack_trace
> > > > Depth Size Location (18 entries)
> > > > ----- ---- --------
> > > > 0) 1936 48 vfs_getattr+0x9/0x50
> > > > 1) 1888 552 nfsd4_encode_fattr4+0x1b2/0x7a0 [nfsd]
> > > > 2) 1336 80 nfsd4_encode_entry4_fattr+0xf8/0x210 [nfsd]
> > > > 3) 1256 96 nfsd4_encode_entry4+0x10b/0x2a0 [nfsd]
> > > > 4) 1160 144 nfsd_buffered_readdir+0x139/0x310 [nfsd]
> > > > 5) 1016 80 nfsd_readdir+0x9f/0x180 [nfsd]
> > > > 6) 936 80 nfsd4_encode_readdir+0xdf/0x1e0 [nfsd]
> > > > 7) 856 80 nfsd4_encode_operation+0xcf/0x3d0 [nfsd]
> > > > 8) 776 80 nfsd4_proc_compound+0x1d6/0x7a0 [nfsd]
> > > > 9) 696 80 nfsd_dispatch+0xd9/0x240 [nfsd]
> > > > 10) 616 80 svc_process_common+0x4cb/0x6b0 [sunrpc]
> > > > 11) 536 40 svc_process+0x150/0x240 [sunrpc]
> > > > 12) 496 72 svc_handle_xprt+0x4b0/0x5f0 [sunrpc]
> > > > 13) 424 56 svc_recv+0x1b2/0x3a0 [sunrpc]
> > > > 14) 368 80 nfsd+0x11c/0x3d0 [nfsd]
> > > > 15) 288 56 kthread+0xe3/0x120
> > > > 16) 232 40 ret_from_fork+0x1a1/0x270
> > > > 17) 192 192 ret_from_fork_asm+0x1a/0x30
> > > >
> > > > But that's capturing a vfs_getattr() call from encoding a readdir reply,
> > > > rather than the vfs_getattr() call I added to nfsd4_deleg_getattr_conflict().
> > > >
> > > > Here's the stack depth for nfsd4_deleg_getattr_conflict():
> > > >
> > > > # echo nfsd4_deleg_getattr_conflict
> > > > > /sys/kernel/debug/tracing/stack_trace_filter
> > > > # echo 0 >/sys/kernel/debug/tracing/stack_max_size
> > > > # cat /sys/kernel/debug/tracing/stack_trace
> > > > Depth Size Location (14 entries)
> > > > ----- ---- --------
> > > > 0) 1472 48 nfsd4_deleg_getattr_conflict+0x9/0x410 [nfsd]
> > > > 1) 1424 552 nfsd4_encode_fattr4+0x191/0x7a0 [nfsd]
> > > > 2) 872 16 nfsd4_encode_getattr+0x2c/0x40 [nfsd]
> > > > 3) 856 80 nfsd4_encode_operation+0xcf/0x3d0 [nfsd]
> > > > 4) 776 80 nfsd4_proc_compound+0x1d6/0x7a0 [nfsd]
> > > > 5) 696 80 nfsd_dispatch+0xd9/0x240 [nfsd]
> > > > 6) 616 80 svc_process_common+0x4cb/0x6b0 [sunrpc]
> > > > 7) 536 40 svc_process+0x150/0x240 [sunrpc]
> > > > 8) 496 72 svc_handle_xprt+0x4b0/0x5f0 [sunrpc]
> > > > 9) 424 56 svc_recv+0x1b2/0x3a0 [sunrpc]
> > > > 10) 368 80 nfsd+0x11c/0x3d0 [nfsd]
> > > > 11) 288 56 kthread+0xe3/0x120
> > > > 12) 232 40 ret_from_fork+0x1a1/0x270
> > > > 13) 192 192 ret_from_fork_asm+0x1a/0x30
> > > >
> > > > Manually inspecting function graphs of vfs_getattr(), it looks like the deepest
> > > > function (that we can trace) is avc_lookup(), so here's a bpftrace script that
> > > > prints the stack depth from avc_lookup() via nfsd4_deleg_getattr_conflict()
> > > > (I mostly punted to Gemini for this):
> > > >
> > > > # cat peak-usage.bt
> > > > kprobe:nfsd4_deleg_getattr_conflict {
> > > > @in_deleg_conflict[tid] = 1;
> > > > }
> > > >
> > > > kprobe:avc_lookup /@in_deleg_conflict[tid]/ {
> > > > $stack_size = (uint64)16384;
> > > > $sp = reg("sp");
> > > > $stack_base = $sp & ~($stack_size - 1);
> > > > $total_used = $stack_base + $stack_size - $sp;
> > > >
> > > > if ($total_used > @max_depth_bytes) {
> > > > @max_depth_bytes = $total_used;
> > > > @deepest_stack = kstack;
> > > > }
> > > > }
> > > >
> > > > kretprobe:nfsd4_deleg_getattr_conflict { delete(@in_deleg_conflict[tid]); }
> > > >
> > > > And finally the result:
> > > >
> > > > # bpftrace peak-usage.bt
> > > > Attached 3 probes
> > > > ^C
> > > >
> > > > @deepest_stack:
> > > > avc_lookup+1
> > > > avc_has_perm_noaudit+60
> > > > avc_has_perm+89
> > > > selinux_inode_getattr+203
> > > > security_inode_getattr+70
> > > > vfs_getattr+35
> > > > nfsd4_deleg_getattr_conflict+958
> > > > nfsd4_encode_fattr4+401
> > > > nfsd4_encode_getattr+44
> > > > nfsd4_encode_operation+207
> > > > nfsd4_proc_compound+470
> > > > nfsd_dispatch+217
> > > > svc_process_common+1227
> > > > svc_process+336
> > > > svc_handle_xprt+1200
> > > > svc_recv+434
> > > > nfsd+284
> > > > kthread+227
> > > > ret_from_fork+417
> > > > ret_from_fork_asm+26
> > > >
> > > > @max_depth_bytes: 1792
> > >
> > > Since the new code only needs the file's size, perhaps you can get
> > > away with
> > >
> > > if (i_size_read(inode) != ncf->ncf_cb_fsize)
> > >
> > > rather than
> > >
> > > err = vfs_getattr(path, stat, STATX_SIZE, AT_STATX_SYNC_AS_STAT);
> > > if (err) {
> > > status = nfserrno(err);
> > > goto out_status;
> > > }
> > > if (stat->size != ncf->ncf_cb_fsize)
> > >
> > > Then there's no longer a need for a struct kstat at all. The client is
> > > holding a delegation, so I would expect the file size to be stable.
> > >
> >
> > I hate relying on the size for this, since it's not a reliable
> > indicator of change, especially on something that has a fixed size (VM
> > image, for instance).
>
> You're right, it's not. I dislike how the pseudocode in 10.4.3 is only
> checking the change attribute to determine if the file has been modified
> and then 2 paragraphs down it talks about checking the file size.
>
> At any rate, the main point of this fix is to make sure we're comparing
> the size from the cb_getattr reply to the current size, not a cached
> value. For example:
>
> client 1 does open(O_RDWR|O_CREATE_O_TRUNC)
> client 1 writes some data and flushes it to the server
> client 2 does some operation that triggers a CB_GETATTR
>
> At this point client 1 doesn't have any modified data, so it sends the
> original change attribute and its current file size (which should match
> the current file size on the server). With the current code, the server
> compares the size client 1 sent with the value it cached on the
> delegation (in this case 0). The server treats it as if the client is
> has modified data, which it does not. That's all this patch is aiming
> to fix.
>
> >
> > In fact, I wonder if this ought to be looking at the mtime too. If the
> > mtime that the client reported is later than the mtime the server has,
> > then we can probably infer that there are buffered writes out there.
>
> But the spec doesn't say anything about comparing the mtime. Plus, if
> the client is sending the original change attribute but a different
> mtime than what the server has, wouldn't that be a client bug?
>
Not really. The change attr sent by the client is "special". It's
usually just original change attr at time of delegation +1 iff
something has been modified. I don't see that as a reliable indicator
that there is still buffered write data on the client.
> Plus, wouldn't it sort of be crossing into delegated timestamps
> territory if we started checking the mtime?
>
Good point.
--
Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2] nfsd: fix file change detection in CB_GETATTR
2026-04-06 17:47 ` Scott Mayhew
2026-04-06 18:13 ` Jeff Layton
@ 2026-04-06 18:48 ` Jeff Layton
2026-04-06 22:14 ` Scott Mayhew
1 sibling, 1 reply; 10+ messages in thread
From: Jeff Layton @ 2026-04-06 18:48 UTC (permalink / raw)
To: Scott Mayhew
Cc: Chuck Lever, Chuck Lever, NeilBrown, Olga Kornievskaia, Dai Ngo,
Tom Talpey, linux-nfs
On Mon, 2026-04-06 at 13:47 -0400, Scott Mayhew wrote:
> On Mon, 06 Apr 2026, Jeff Layton wrote:
>
> > On Mon, 2026-04-06 at 10:12 -0400, Chuck Lever wrote:
> > >
> > > On Sun, Apr 5, 2026, at 6:13 PM, Scott Mayhew wrote:
> > > > On Sat, 04 Apr 2026, Chuck Lever wrote:
> > > > > On Fri, Apr 3, 2026, at 8:54 PM, Scott Mayhew wrote:
> > >
> > > > > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > > > > > index fa657badf5f8..53d8e7e7d60b 100644
> > > > > > --- a/fs/nfsd/nfs4state.c
> > > > > > +++ b/fs/nfsd/nfs4state.c
> > > > >
> > > > > > @@ -9459,17 +9461,18 @@ static int cb_getattr_update_times(struct
> > > > > > dentry *dentry, struct nfs4_delegation
> > > > > > * caller must put the reference.
> > > > > > */
> > > > > > __be32
> > > > > > -nfsd4_deleg_getattr_conflict(struct svc_rqst *rqstp, struct dentry
> > > > > > *dentry,
> > > > > > - struct nfs4_delegation **pdp)
> > > > > > +nfsd4_deleg_getattr_conflict(struct svc_rqst *rqstp, struct path *path,
> > > > > > + struct kstat *stat, struct nfs4_delegation **pdp)
> > > > >
> > > > > Passing the kstat struct in saves some stack just as I suggested,
> > > > > but it is an ugly API. The nfsd4_encode_fattr4() call stack is tall,
> > > > > though -- did you happen to measure how deep it gets after this patch
> > > > > is applied?
> > > > >
> > > >
> > > > I tried using the stack tracer:
> > > >
> > > > # echo 1 >/proc/sys/kernel/stack_tracer_enabled
> > > > # echo vfs_getattr >/sys/kernel/debug/tracing/stack_trace_filter
> > > > # cat /sys/kernel/debug/tracing/stack_trace
> > > > Depth Size Location (18 entries)
> > > > ----- ---- --------
> > > > 0) 1936 48 vfs_getattr+0x9/0x50
> > > > 1) 1888 552 nfsd4_encode_fattr4+0x1b2/0x7a0 [nfsd]
> > > > 2) 1336 80 nfsd4_encode_entry4_fattr+0xf8/0x210 [nfsd]
> > > > 3) 1256 96 nfsd4_encode_entry4+0x10b/0x2a0 [nfsd]
> > > > 4) 1160 144 nfsd_buffered_readdir+0x139/0x310 [nfsd]
> > > > 5) 1016 80 nfsd_readdir+0x9f/0x180 [nfsd]
> > > > 6) 936 80 nfsd4_encode_readdir+0xdf/0x1e0 [nfsd]
> > > > 7) 856 80 nfsd4_encode_operation+0xcf/0x3d0 [nfsd]
> > > > 8) 776 80 nfsd4_proc_compound+0x1d6/0x7a0 [nfsd]
> > > > 9) 696 80 nfsd_dispatch+0xd9/0x240 [nfsd]
> > > > 10) 616 80 svc_process_common+0x4cb/0x6b0 [sunrpc]
> > > > 11) 536 40 svc_process+0x150/0x240 [sunrpc]
> > > > 12) 496 72 svc_handle_xprt+0x4b0/0x5f0 [sunrpc]
> > > > 13) 424 56 svc_recv+0x1b2/0x3a0 [sunrpc]
> > > > 14) 368 80 nfsd+0x11c/0x3d0 [nfsd]
> > > > 15) 288 56 kthread+0xe3/0x120
> > > > 16) 232 40 ret_from_fork+0x1a1/0x270
> > > > 17) 192 192 ret_from_fork_asm+0x1a/0x30
> > > >
> > > > But that's capturing a vfs_getattr() call from encoding a readdir reply,
> > > > rather than the vfs_getattr() call I added to nfsd4_deleg_getattr_conflict().
> > > >
> > > > Here's the stack depth for nfsd4_deleg_getattr_conflict():
> > > >
> > > > # echo nfsd4_deleg_getattr_conflict
> > > > > /sys/kernel/debug/tracing/stack_trace_filter
> > > > # echo 0 >/sys/kernel/debug/tracing/stack_max_size
> > > > # cat /sys/kernel/debug/tracing/stack_trace
> > > > Depth Size Location (14 entries)
> > > > ----- ---- --------
> > > > 0) 1472 48 nfsd4_deleg_getattr_conflict+0x9/0x410 [nfsd]
> > > > 1) 1424 552 nfsd4_encode_fattr4+0x191/0x7a0 [nfsd]
> > > > 2) 872 16 nfsd4_encode_getattr+0x2c/0x40 [nfsd]
> > > > 3) 856 80 nfsd4_encode_operation+0xcf/0x3d0 [nfsd]
> > > > 4) 776 80 nfsd4_proc_compound+0x1d6/0x7a0 [nfsd]
> > > > 5) 696 80 nfsd_dispatch+0xd9/0x240 [nfsd]
> > > > 6) 616 80 svc_process_common+0x4cb/0x6b0 [sunrpc]
> > > > 7) 536 40 svc_process+0x150/0x240 [sunrpc]
> > > > 8) 496 72 svc_handle_xprt+0x4b0/0x5f0 [sunrpc]
> > > > 9) 424 56 svc_recv+0x1b2/0x3a0 [sunrpc]
> > > > 10) 368 80 nfsd+0x11c/0x3d0 [nfsd]
> > > > 11) 288 56 kthread+0xe3/0x120
> > > > 12) 232 40 ret_from_fork+0x1a1/0x270
> > > > 13) 192 192 ret_from_fork_asm+0x1a/0x30
> > > >
> > > > Manually inspecting function graphs of vfs_getattr(), it looks like the deepest
> > > > function (that we can trace) is avc_lookup(), so here's a bpftrace script that
> > > > prints the stack depth from avc_lookup() via nfsd4_deleg_getattr_conflict()
> > > > (I mostly punted to Gemini for this):
> > > >
> > > > # cat peak-usage.bt
> > > > kprobe:nfsd4_deleg_getattr_conflict {
> > > > @in_deleg_conflict[tid] = 1;
> > > > }
> > > >
> > > > kprobe:avc_lookup /@in_deleg_conflict[tid]/ {
> > > > $stack_size = (uint64)16384;
> > > > $sp = reg("sp");
> > > > $stack_base = $sp & ~($stack_size - 1);
> > > > $total_used = $stack_base + $stack_size - $sp;
> > > >
> > > > if ($total_used > @max_depth_bytes) {
> > > > @max_depth_bytes = $total_used;
> > > > @deepest_stack = kstack;
> > > > }
> > > > }
> > > >
> > > > kretprobe:nfsd4_deleg_getattr_conflict { delete(@in_deleg_conflict[tid]); }
> > > >
> > > > And finally the result:
> > > >
> > > > # bpftrace peak-usage.bt
> > > > Attached 3 probes
> > > > ^C
> > > >
> > > > @deepest_stack:
> > > > avc_lookup+1
> > > > avc_has_perm_noaudit+60
> > > > avc_has_perm+89
> > > > selinux_inode_getattr+203
> > > > security_inode_getattr+70
> > > > vfs_getattr+35
> > > > nfsd4_deleg_getattr_conflict+958
> > > > nfsd4_encode_fattr4+401
> > > > nfsd4_encode_getattr+44
> > > > nfsd4_encode_operation+207
> > > > nfsd4_proc_compound+470
> > > > nfsd_dispatch+217
> > > > svc_process_common+1227
> > > > svc_process+336
> > > > svc_handle_xprt+1200
> > > > svc_recv+434
> > > > nfsd+284
> > > > kthread+227
> > > > ret_from_fork+417
> > > > ret_from_fork_asm+26
> > > >
> > > > @max_depth_bytes: 1792
> > >
> > > Since the new code only needs the file's size, perhaps you can get
> > > away with
> > >
> > > if (i_size_read(inode) != ncf->ncf_cb_fsize)
> > >
> > > rather than
> > >
> > > err = vfs_getattr(path, stat, STATX_SIZE, AT_STATX_SYNC_AS_STAT);
> > > if (err) {
> > > status = nfserrno(err);
> > > goto out_status;
> > > }
> > > if (stat->size != ncf->ncf_cb_fsize)
> > >
> > > Then there's no longer a need for a struct kstat at all. The client is
> > > holding a delegation, so I would expect the file size to be stable.
> > >
> >
> > I hate relying on the size for this, since it's not a reliable
> > indicator of change, especially on something that has a fixed size (VM
> > image, for instance).
>
> You're right, it's not. I dislike how the pseudocode in 10.4.3 is only
> checking the change attribute to determine if the file has been modified
> and then 2 paragraphs down it talks about checking the file size.
>
> At any rate, the main point of this fix is to make sure we're comparing
> the size from the cb_getattr reply to the current size, not a cached
> value. For example:
>
> client 1 does open(O_RDWR|O_CREATE_O_TRUNC)
> client 1 writes some data and flushes it to the server
> client 2 does some operation that triggers a CB_GETATTR
>
> At this point client 1 doesn't have any modified data, so it sends the
> original change attribute and its current file size (which should match
> the current file size on the server). With the current code, the server
> compares the size client 1 sent with the value it cached on the
> delegation (in this case 0). The server treats it as if the client is
> has modified data, which it does not. That's all this patch is aiming
> to fix.
>
> >
> > In fact, I wonder if this ought to be looking at the mtime too. If the
> > mtime that the client reported is later than the mtime the server has,
> > then we can probably infer that there are buffered writes out there.
>
> But the spec doesn't say anything about comparing the mtime. Plus, if
> the client is sending the original change attribute but a different
> mtime than what the server has, wouldn't that be a client bug?
>
> Plus, wouldn't it sort of be crossing into delegated timestamps
> territory if we started checking the mtime?
>
>
Should we be looking at extending the spec? Maybe we should add a "I
have outstanding buffered data" attribute that the client could send to
the server in a CB_GETATTR?
--
Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2] nfsd: fix file change detection in CB_GETATTR
2026-04-06 18:48 ` Jeff Layton
@ 2026-04-06 22:14 ` Scott Mayhew
0 siblings, 0 replies; 10+ messages in thread
From: Scott Mayhew @ 2026-04-06 22:14 UTC (permalink / raw)
To: Jeff Layton
Cc: Chuck Lever, Chuck Lever, NeilBrown, Olga Kornievskaia, Dai Ngo,
Tom Talpey, linux-nfs
On Mon, 06 Apr 2026, Jeff Layton wrote:
> On Mon, 2026-04-06 at 13:47 -0400, Scott Mayhew wrote:
> > On Mon, 06 Apr 2026, Jeff Layton wrote:
> >
> > > On Mon, 2026-04-06 at 10:12 -0400, Chuck Lever wrote:
> > > >
> > > > On Sun, Apr 5, 2026, at 6:13 PM, Scott Mayhew wrote:
> > > > > On Sat, 04 Apr 2026, Chuck Lever wrote:
> > > > > > On Fri, Apr 3, 2026, at 8:54 PM, Scott Mayhew wrote:
> > > >
> > > > > > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > > > > > > index fa657badf5f8..53d8e7e7d60b 100644
> > > > > > > --- a/fs/nfsd/nfs4state.c
> > > > > > > +++ b/fs/nfsd/nfs4state.c
> > > > > >
> > > > > > > @@ -9459,17 +9461,18 @@ static int cb_getattr_update_times(struct
> > > > > > > dentry *dentry, struct nfs4_delegation
> > > > > > > * caller must put the reference.
> > > > > > > */
> > > > > > > __be32
> > > > > > > -nfsd4_deleg_getattr_conflict(struct svc_rqst *rqstp, struct dentry
> > > > > > > *dentry,
> > > > > > > - struct nfs4_delegation **pdp)
> > > > > > > +nfsd4_deleg_getattr_conflict(struct svc_rqst *rqstp, struct path *path,
> > > > > > > + struct kstat *stat, struct nfs4_delegation **pdp)
> > > > > >
> > > > > > Passing the kstat struct in saves some stack just as I suggested,
> > > > > > but it is an ugly API. The nfsd4_encode_fattr4() call stack is tall,
> > > > > > though -- did you happen to measure how deep it gets after this patch
> > > > > > is applied?
> > > > > >
> > > > >
> > > > > I tried using the stack tracer:
> > > > >
> > > > > # echo 1 >/proc/sys/kernel/stack_tracer_enabled
> > > > > # echo vfs_getattr >/sys/kernel/debug/tracing/stack_trace_filter
> > > > > # cat /sys/kernel/debug/tracing/stack_trace
> > > > > Depth Size Location (18 entries)
> > > > > ----- ---- --------
> > > > > 0) 1936 48 vfs_getattr+0x9/0x50
> > > > > 1) 1888 552 nfsd4_encode_fattr4+0x1b2/0x7a0 [nfsd]
> > > > > 2) 1336 80 nfsd4_encode_entry4_fattr+0xf8/0x210 [nfsd]
> > > > > 3) 1256 96 nfsd4_encode_entry4+0x10b/0x2a0 [nfsd]
> > > > > 4) 1160 144 nfsd_buffered_readdir+0x139/0x310 [nfsd]
> > > > > 5) 1016 80 nfsd_readdir+0x9f/0x180 [nfsd]
> > > > > 6) 936 80 nfsd4_encode_readdir+0xdf/0x1e0 [nfsd]
> > > > > 7) 856 80 nfsd4_encode_operation+0xcf/0x3d0 [nfsd]
> > > > > 8) 776 80 nfsd4_proc_compound+0x1d6/0x7a0 [nfsd]
> > > > > 9) 696 80 nfsd_dispatch+0xd9/0x240 [nfsd]
> > > > > 10) 616 80 svc_process_common+0x4cb/0x6b0 [sunrpc]
> > > > > 11) 536 40 svc_process+0x150/0x240 [sunrpc]
> > > > > 12) 496 72 svc_handle_xprt+0x4b0/0x5f0 [sunrpc]
> > > > > 13) 424 56 svc_recv+0x1b2/0x3a0 [sunrpc]
> > > > > 14) 368 80 nfsd+0x11c/0x3d0 [nfsd]
> > > > > 15) 288 56 kthread+0xe3/0x120
> > > > > 16) 232 40 ret_from_fork+0x1a1/0x270
> > > > > 17) 192 192 ret_from_fork_asm+0x1a/0x30
> > > > >
> > > > > But that's capturing a vfs_getattr() call from encoding a readdir reply,
> > > > > rather than the vfs_getattr() call I added to nfsd4_deleg_getattr_conflict().
> > > > >
> > > > > Here's the stack depth for nfsd4_deleg_getattr_conflict():
> > > > >
> > > > > # echo nfsd4_deleg_getattr_conflict
> > > > > > /sys/kernel/debug/tracing/stack_trace_filter
> > > > > # echo 0 >/sys/kernel/debug/tracing/stack_max_size
> > > > > # cat /sys/kernel/debug/tracing/stack_trace
> > > > > Depth Size Location (14 entries)
> > > > > ----- ---- --------
> > > > > 0) 1472 48 nfsd4_deleg_getattr_conflict+0x9/0x410 [nfsd]
> > > > > 1) 1424 552 nfsd4_encode_fattr4+0x191/0x7a0 [nfsd]
> > > > > 2) 872 16 nfsd4_encode_getattr+0x2c/0x40 [nfsd]
> > > > > 3) 856 80 nfsd4_encode_operation+0xcf/0x3d0 [nfsd]
> > > > > 4) 776 80 nfsd4_proc_compound+0x1d6/0x7a0 [nfsd]
> > > > > 5) 696 80 nfsd_dispatch+0xd9/0x240 [nfsd]
> > > > > 6) 616 80 svc_process_common+0x4cb/0x6b0 [sunrpc]
> > > > > 7) 536 40 svc_process+0x150/0x240 [sunrpc]
> > > > > 8) 496 72 svc_handle_xprt+0x4b0/0x5f0 [sunrpc]
> > > > > 9) 424 56 svc_recv+0x1b2/0x3a0 [sunrpc]
> > > > > 10) 368 80 nfsd+0x11c/0x3d0 [nfsd]
> > > > > 11) 288 56 kthread+0xe3/0x120
> > > > > 12) 232 40 ret_from_fork+0x1a1/0x270
> > > > > 13) 192 192 ret_from_fork_asm+0x1a/0x30
> > > > >
> > > > > Manually inspecting function graphs of vfs_getattr(), it looks like the deepest
> > > > > function (that we can trace) is avc_lookup(), so here's a bpftrace script that
> > > > > prints the stack depth from avc_lookup() via nfsd4_deleg_getattr_conflict()
> > > > > (I mostly punted to Gemini for this):
> > > > >
> > > > > # cat peak-usage.bt
> > > > > kprobe:nfsd4_deleg_getattr_conflict {
> > > > > @in_deleg_conflict[tid] = 1;
> > > > > }
> > > > >
> > > > > kprobe:avc_lookup /@in_deleg_conflict[tid]/ {
> > > > > $stack_size = (uint64)16384;
> > > > > $sp = reg("sp");
> > > > > $stack_base = $sp & ~($stack_size - 1);
> > > > > $total_used = $stack_base + $stack_size - $sp;
> > > > >
> > > > > if ($total_used > @max_depth_bytes) {
> > > > > @max_depth_bytes = $total_used;
> > > > > @deepest_stack = kstack;
> > > > > }
> > > > > }
> > > > >
> > > > > kretprobe:nfsd4_deleg_getattr_conflict { delete(@in_deleg_conflict[tid]); }
> > > > >
> > > > > And finally the result:
> > > > >
> > > > > # bpftrace peak-usage.bt
> > > > > Attached 3 probes
> > > > > ^C
> > > > >
> > > > > @deepest_stack:
> > > > > avc_lookup+1
> > > > > avc_has_perm_noaudit+60
> > > > > avc_has_perm+89
> > > > > selinux_inode_getattr+203
> > > > > security_inode_getattr+70
> > > > > vfs_getattr+35
> > > > > nfsd4_deleg_getattr_conflict+958
> > > > > nfsd4_encode_fattr4+401
> > > > > nfsd4_encode_getattr+44
> > > > > nfsd4_encode_operation+207
> > > > > nfsd4_proc_compound+470
> > > > > nfsd_dispatch+217
> > > > > svc_process_common+1227
> > > > > svc_process+336
> > > > > svc_handle_xprt+1200
> > > > > svc_recv+434
> > > > > nfsd+284
> > > > > kthread+227
> > > > > ret_from_fork+417
> > > > > ret_from_fork_asm+26
> > > > >
> > > > > @max_depth_bytes: 1792
> > > >
> > > > Since the new code only needs the file's size, perhaps you can get
> > > > away with
> > > >
> > > > if (i_size_read(inode) != ncf->ncf_cb_fsize)
> > > >
> > > > rather than
> > > >
> > > > err = vfs_getattr(path, stat, STATX_SIZE, AT_STATX_SYNC_AS_STAT);
> > > > if (err) {
> > > > status = nfserrno(err);
> > > > goto out_status;
> > > > }
> > > > if (stat->size != ncf->ncf_cb_fsize)
> > > >
> > > > Then there's no longer a need for a struct kstat at all. The client is
> > > > holding a delegation, so I would expect the file size to be stable.
> > > >
> > >
> > > I hate relying on the size for this, since it's not a reliable
> > > indicator of change, especially on something that has a fixed size (VM
> > > image, for instance).
> >
> > You're right, it's not. I dislike how the pseudocode in 10.4.3 is only
> > checking the change attribute to determine if the file has been modified
> > and then 2 paragraphs down it talks about checking the file size.
> >
> > At any rate, the main point of this fix is to make sure we're comparing
> > the size from the cb_getattr reply to the current size, not a cached
> > value. For example:
> >
> > client 1 does open(O_RDWR|O_CREATE_O_TRUNC)
> > client 1 writes some data and flushes it to the server
> > client 2 does some operation that triggers a CB_GETATTR
> >
> > At this point client 1 doesn't have any modified data, so it sends the
> > original change attribute and its current file size (which should match
> > the current file size on the server). With the current code, the server
> > compares the size client 1 sent with the value it cached on the
> > delegation (in this case 0). The server treats it as if the client is
> > has modified data, which it does not. That's all this patch is aiming
> > to fix.
> >
> > >
> > > In fact, I wonder if this ought to be looking at the mtime too. If the
> > > mtime that the client reported is later than the mtime the server has,
> > > then we can probably infer that there are buffered writes out there.
> >
> > But the spec doesn't say anything about comparing the mtime. Plus, if
> > the client is sending the original change attribute but a different
> > mtime than what the server has, wouldn't that be a client bug?
> >
> > Plus, wouldn't it sort of be crossing into delegated timestamps
> > territory if we started checking the mtime?
> >
> >
>
> Should we be looking at extending the spec? Maybe we should add a "I
> have outstanding buffered data" attribute that the client could send to
> the server in a CB_GETATTR?
I guess I don't understand why the client's usage of the change
attribute doesn't already cover that.
If an application calls write() but not fsync(), close(), etc, and the
background flushers haven't kicked in yet, then the dirty pages/folios
will be tracked on the nfs_inode and the client will send the elevated
change attribute in the CB_GETATTR reply.
What am I missing?
> --
> Jeff Layton <jlayton@kernel.org>
>
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2026-04-06 22:14 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-04 0:54 [PATCH v2] nfsd: fix file change detection in CB_GETATTR Scott Mayhew
2026-04-04 15:07 ` Chuck Lever
2026-04-05 22:13 ` Scott Mayhew
2026-04-06 14:12 ` Chuck Lever
2026-04-06 14:38 ` Jeff Layton
2026-04-06 17:47 ` Scott Mayhew
2026-04-06 18:13 ` Jeff Layton
2026-04-06 18:48 ` Jeff Layton
2026-04-06 22:14 ` Scott Mayhew
2026-04-06 17:20 ` Scott Mayhew
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox