From: Jeff Layton <jlayton@kernel.org>
To: Scott Mayhew <smayhew@redhat.com>
Cc: Chuck Lever <cel@kernel.org>,
Chuck Lever <chuck.lever@oracle.com>,
NeilBrown <neil@brown.name>,
Olga Kornievskaia <okorniev@redhat.com>,
Dai Ngo <Dai.Ngo@oracle.com>, Tom Talpey <tom@talpey.com>,
linux-nfs@vger.kernel.org
Subject: Re: [PATCH v2] nfsd: fix file change detection in CB_GETATTR
Date: Mon, 06 Apr 2026 14:48:05 -0400 [thread overview]
Message-ID: <ca6076eba9248c39f8b95bed66cec5d907aa0ab7.camel@kernel.org> (raw)
In-Reply-To: <adPxsOl98imcKXeC@aion>
On Mon, 2026-04-06 at 13:47 -0400, Scott Mayhew wrote:
> On Mon, 06 Apr 2026, Jeff Layton wrote:
>
> > On Mon, 2026-04-06 at 10:12 -0400, Chuck Lever wrote:
> > >
> > > On Sun, Apr 5, 2026, at 6:13 PM, Scott Mayhew wrote:
> > > > On Sat, 04 Apr 2026, Chuck Lever wrote:
> > > > > On Fri, Apr 3, 2026, at 8:54 PM, Scott Mayhew wrote:
> > >
> > > > > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > > > > > index fa657badf5f8..53d8e7e7d60b 100644
> > > > > > --- a/fs/nfsd/nfs4state.c
> > > > > > +++ b/fs/nfsd/nfs4state.c
> > > > >
> > > > > > @@ -9459,17 +9461,18 @@ static int cb_getattr_update_times(struct
> > > > > > dentry *dentry, struct nfs4_delegation
> > > > > > * caller must put the reference.
> > > > > > */
> > > > > > __be32
> > > > > > -nfsd4_deleg_getattr_conflict(struct svc_rqst *rqstp, struct dentry
> > > > > > *dentry,
> > > > > > - struct nfs4_delegation **pdp)
> > > > > > +nfsd4_deleg_getattr_conflict(struct svc_rqst *rqstp, struct path *path,
> > > > > > + struct kstat *stat, struct nfs4_delegation **pdp)
> > > > >
> > > > > Passing the kstat struct in saves some stack just as I suggested,
> > > > > but it is an ugly API. The nfsd4_encode_fattr4() call stack is tall,
> > > > > though -- did you happen to measure how deep it gets after this patch
> > > > > is applied?
> > > > >
> > > >
> > > > I tried using the stack tracer:
> > > >
> > > > # echo 1 >/proc/sys/kernel/stack_tracer_enabled
> > > > # echo vfs_getattr >/sys/kernel/debug/tracing/stack_trace_filter
> > > > # cat /sys/kernel/debug/tracing/stack_trace
> > > > Depth Size Location (18 entries)
> > > > ----- ---- --------
> > > > 0) 1936 48 vfs_getattr+0x9/0x50
> > > > 1) 1888 552 nfsd4_encode_fattr4+0x1b2/0x7a0 [nfsd]
> > > > 2) 1336 80 nfsd4_encode_entry4_fattr+0xf8/0x210 [nfsd]
> > > > 3) 1256 96 nfsd4_encode_entry4+0x10b/0x2a0 [nfsd]
> > > > 4) 1160 144 nfsd_buffered_readdir+0x139/0x310 [nfsd]
> > > > 5) 1016 80 nfsd_readdir+0x9f/0x180 [nfsd]
> > > > 6) 936 80 nfsd4_encode_readdir+0xdf/0x1e0 [nfsd]
> > > > 7) 856 80 nfsd4_encode_operation+0xcf/0x3d0 [nfsd]
> > > > 8) 776 80 nfsd4_proc_compound+0x1d6/0x7a0 [nfsd]
> > > > 9) 696 80 nfsd_dispatch+0xd9/0x240 [nfsd]
> > > > 10) 616 80 svc_process_common+0x4cb/0x6b0 [sunrpc]
> > > > 11) 536 40 svc_process+0x150/0x240 [sunrpc]
> > > > 12) 496 72 svc_handle_xprt+0x4b0/0x5f0 [sunrpc]
> > > > 13) 424 56 svc_recv+0x1b2/0x3a0 [sunrpc]
> > > > 14) 368 80 nfsd+0x11c/0x3d0 [nfsd]
> > > > 15) 288 56 kthread+0xe3/0x120
> > > > 16) 232 40 ret_from_fork+0x1a1/0x270
> > > > 17) 192 192 ret_from_fork_asm+0x1a/0x30
> > > >
> > > > But that's capturing a vfs_getattr() call from encoding a readdir reply,
> > > > rather than the vfs_getattr() call I added to nfsd4_deleg_getattr_conflict().
> > > >
> > > > Here's the stack depth for nfsd4_deleg_getattr_conflict():
> > > >
> > > > # echo nfsd4_deleg_getattr_conflict
> > > > > /sys/kernel/debug/tracing/stack_trace_filter
> > > > # echo 0 >/sys/kernel/debug/tracing/stack_max_size
> > > > # cat /sys/kernel/debug/tracing/stack_trace
> > > > Depth Size Location (14 entries)
> > > > ----- ---- --------
> > > > 0) 1472 48 nfsd4_deleg_getattr_conflict+0x9/0x410 [nfsd]
> > > > 1) 1424 552 nfsd4_encode_fattr4+0x191/0x7a0 [nfsd]
> > > > 2) 872 16 nfsd4_encode_getattr+0x2c/0x40 [nfsd]
> > > > 3) 856 80 nfsd4_encode_operation+0xcf/0x3d0 [nfsd]
> > > > 4) 776 80 nfsd4_proc_compound+0x1d6/0x7a0 [nfsd]
> > > > 5) 696 80 nfsd_dispatch+0xd9/0x240 [nfsd]
> > > > 6) 616 80 svc_process_common+0x4cb/0x6b0 [sunrpc]
> > > > 7) 536 40 svc_process+0x150/0x240 [sunrpc]
> > > > 8) 496 72 svc_handle_xprt+0x4b0/0x5f0 [sunrpc]
> > > > 9) 424 56 svc_recv+0x1b2/0x3a0 [sunrpc]
> > > > 10) 368 80 nfsd+0x11c/0x3d0 [nfsd]
> > > > 11) 288 56 kthread+0xe3/0x120
> > > > 12) 232 40 ret_from_fork+0x1a1/0x270
> > > > 13) 192 192 ret_from_fork_asm+0x1a/0x30
> > > >
> > > > Manually inspecting function graphs of vfs_getattr(), it looks like the deepest
> > > > function (that we can trace) is avc_lookup(), so here's a bpftrace script that
> > > > prints the stack depth from avc_lookup() via nfsd4_deleg_getattr_conflict()
> > > > (I mostly punted to Gemini for this):
> > > >
> > > > # cat peak-usage.bt
> > > > kprobe:nfsd4_deleg_getattr_conflict {
> > > > @in_deleg_conflict[tid] = 1;
> > > > }
> > > >
> > > > kprobe:avc_lookup /@in_deleg_conflict[tid]/ {
> > > > $stack_size = (uint64)16384;
> > > > $sp = reg("sp");
> > > > $stack_base = $sp & ~($stack_size - 1);
> > > > $total_used = $stack_base + $stack_size - $sp;
> > > >
> > > > if ($total_used > @max_depth_bytes) {
> > > > @max_depth_bytes = $total_used;
> > > > @deepest_stack = kstack;
> > > > }
> > > > }
> > > >
> > > > kretprobe:nfsd4_deleg_getattr_conflict { delete(@in_deleg_conflict[tid]); }
> > > >
> > > > And finally the result:
> > > >
> > > > # bpftrace peak-usage.bt
> > > > Attached 3 probes
> > > > ^C
> > > >
> > > > @deepest_stack:
> > > > avc_lookup+1
> > > > avc_has_perm_noaudit+60
> > > > avc_has_perm+89
> > > > selinux_inode_getattr+203
> > > > security_inode_getattr+70
> > > > vfs_getattr+35
> > > > nfsd4_deleg_getattr_conflict+958
> > > > nfsd4_encode_fattr4+401
> > > > nfsd4_encode_getattr+44
> > > > nfsd4_encode_operation+207
> > > > nfsd4_proc_compound+470
> > > > nfsd_dispatch+217
> > > > svc_process_common+1227
> > > > svc_process+336
> > > > svc_handle_xprt+1200
> > > > svc_recv+434
> > > > nfsd+284
> > > > kthread+227
> > > > ret_from_fork+417
> > > > ret_from_fork_asm+26
> > > >
> > > > @max_depth_bytes: 1792
> > >
> > > Since the new code only needs the file's size, perhaps you can get
> > > away with
> > >
> > > if (i_size_read(inode) != ncf->ncf_cb_fsize)
> > >
> > > rather than
> > >
> > > err = vfs_getattr(path, stat, STATX_SIZE, AT_STATX_SYNC_AS_STAT);
> > > if (err) {
> > > status = nfserrno(err);
> > > goto out_status;
> > > }
> > > if (stat->size != ncf->ncf_cb_fsize)
> > >
> > > Then there's no longer a need for a struct kstat at all. The client is
> > > holding a delegation, so I would expect the file size to be stable.
> > >
> >
> > I hate relying on the size for this, since it's not a reliable
> > indicator of change, especially on something that has a fixed size (VM
> > image, for instance).
>
> You're right, it's not. I dislike how the pseudocode in 10.4.3 is only
> checking the change attribute to determine if the file has been modified
> and then 2 paragraphs down it talks about checking the file size.
>
> At any rate, the main point of this fix is to make sure we're comparing
> the size from the cb_getattr reply to the current size, not a cached
> value. For example:
>
> client 1 does open(O_RDWR|O_CREATE_O_TRUNC)
> client 1 writes some data and flushes it to the server
> client 2 does some operation that triggers a CB_GETATTR
>
> At this point client 1 doesn't have any modified data, so it sends the
> original change attribute and its current file size (which should match
> the current file size on the server). With the current code, the server
> compares the size client 1 sent with the value it cached on the
> delegation (in this case 0). The server treats it as if the client is
> has modified data, which it does not. That's all this patch is aiming
> to fix.
>
> >
> > In fact, I wonder if this ought to be looking at the mtime too. If the
> > mtime that the client reported is later than the mtime the server has,
> > then we can probably infer that there are buffered writes out there.
>
> But the spec doesn't say anything about comparing the mtime. Plus, if
> the client is sending the original change attribute but a different
> mtime than what the server has, wouldn't that be a client bug?
>
> Plus, wouldn't it sort of be crossing into delegated timestamps
> territory if we started checking the mtime?
>
>
Should we be looking at extending the spec? Maybe we should add a "I
have outstanding buffered data" attribute that the client could send to
the server in a CB_GETATTR?
--
Jeff Layton <jlayton@kernel.org>
next prev parent reply other threads:[~2026-04-06 18:48 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-04 0:54 [PATCH v2] nfsd: fix file change detection in CB_GETATTR Scott Mayhew
2026-04-04 15:07 ` Chuck Lever
2026-04-05 22:13 ` Scott Mayhew
2026-04-06 14:12 ` Chuck Lever
2026-04-06 14:38 ` Jeff Layton
2026-04-06 17:47 ` Scott Mayhew
2026-04-06 18:13 ` Jeff Layton
2026-04-06 18:48 ` Jeff Layton [this message]
2026-04-06 22:14 ` Scott Mayhew
2026-04-06 17:20 ` Scott Mayhew
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ca6076eba9248c39f8b95bed66cec5d907aa0ab7.camel@kernel.org \
--to=jlayton@kernel.org \
--cc=Dai.Ngo@oracle.com \
--cc=cel@kernel.org \
--cc=chuck.lever@oracle.com \
--cc=linux-nfs@vger.kernel.org \
--cc=neil@brown.name \
--cc=okorniev@redhat.com \
--cc=smayhew@redhat.com \
--cc=tom@talpey.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox