public inbox for linux-nfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Jeff Layton <jlayton@kernel.org>
To: Scott Mayhew <smayhew@redhat.com>
Cc: Chuck Lever <cel@kernel.org>,
	Chuck Lever <chuck.lever@oracle.com>,
	 NeilBrown <neil@brown.name>,
	Olga Kornievskaia <okorniev@redhat.com>,
	Dai Ngo <Dai.Ngo@oracle.com>,  Tom Talpey <tom@talpey.com>,
	linux-nfs@vger.kernel.org
Subject: Re: [PATCH v2] nfsd: fix file change detection in CB_GETATTR
Date: Mon, 06 Apr 2026 14:48:05 -0400	[thread overview]
Message-ID: <ca6076eba9248c39f8b95bed66cec5d907aa0ab7.camel@kernel.org> (raw)
In-Reply-To: <adPxsOl98imcKXeC@aion>

On Mon, 2026-04-06 at 13:47 -0400, Scott Mayhew wrote:
> On Mon, 06 Apr 2026, Jeff Layton wrote:
> 
> > On Mon, 2026-04-06 at 10:12 -0400, Chuck Lever wrote:
> > > 
> > > On Sun, Apr 5, 2026, at 6:13 PM, Scott Mayhew wrote:
> > > > On Sat, 04 Apr 2026, Chuck Lever wrote:
> > > > > On Fri, Apr 3, 2026, at 8:54 PM, Scott Mayhew wrote:
> > > 
> > > > > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > > > > > index fa657badf5f8..53d8e7e7d60b 100644
> > > > > > --- a/fs/nfsd/nfs4state.c
> > > > > > +++ b/fs/nfsd/nfs4state.c
> > > > > 
> > > > > > @@ -9459,17 +9461,18 @@ static int cb_getattr_update_times(struct 
> > > > > > dentry *dentry, struct nfs4_delegation
> > > > > >   * caller must put the reference.
> > > > > >   */
> > > > > >  __be32
> > > > > > -nfsd4_deleg_getattr_conflict(struct svc_rqst *rqstp, struct dentry 
> > > > > > *dentry,
> > > > > > -			     struct nfs4_delegation **pdp)
> > > > > > +nfsd4_deleg_getattr_conflict(struct svc_rqst *rqstp, struct path *path,
> > > > > > +			     struct kstat *stat, struct nfs4_delegation **pdp)
> > > > > 
> > > > > Passing the kstat struct in saves some stack just as I suggested,
> > > > > but it is an ugly API. The nfsd4_encode_fattr4() call stack is tall,
> > > > > though -- did you happen to measure how deep it gets after this patch
> > > > > is applied?
> > > > > 
> > > > 
> > > > I tried using the stack tracer:
> > > > 
> > > > # echo 1 >/proc/sys/kernel/stack_tracer_enabled
> > > > # echo vfs_getattr >/sys/kernel/debug/tracing/stack_trace_filter
> > > > # cat /sys/kernel/debug/tracing/stack_trace
> > > >         Depth    Size   Location    (18 entries)
> > > >         -----    ----   --------
> > > >   0)     1936      48   vfs_getattr+0x9/0x50
> > > >   1)     1888     552   nfsd4_encode_fattr4+0x1b2/0x7a0 [nfsd]
> > > >   2)     1336      80   nfsd4_encode_entry4_fattr+0xf8/0x210 [nfsd]
> > > >   3)     1256      96   nfsd4_encode_entry4+0x10b/0x2a0 [nfsd]
> > > >   4)     1160     144   nfsd_buffered_readdir+0x139/0x310 [nfsd]
> > > >   5)     1016      80   nfsd_readdir+0x9f/0x180 [nfsd]
> > > >   6)      936      80   nfsd4_encode_readdir+0xdf/0x1e0 [nfsd]
> > > >   7)      856      80   nfsd4_encode_operation+0xcf/0x3d0 [nfsd]
> > > >   8)      776      80   nfsd4_proc_compound+0x1d6/0x7a0 [nfsd]
> > > >   9)      696      80   nfsd_dispatch+0xd9/0x240 [nfsd]
> > > >  10)      616      80   svc_process_common+0x4cb/0x6b0 [sunrpc]
> > > >  11)      536      40   svc_process+0x150/0x240 [sunrpc]
> > > >  12)      496      72   svc_handle_xprt+0x4b0/0x5f0 [sunrpc]
> > > >  13)      424      56   svc_recv+0x1b2/0x3a0 [sunrpc]
> > > >  14)      368      80   nfsd+0x11c/0x3d0 [nfsd]
> > > >  15)      288      56   kthread+0xe3/0x120
> > > >  16)      232      40   ret_from_fork+0x1a1/0x270
> > > >  17)      192     192   ret_from_fork_asm+0x1a/0x30
> > > > 
> > > > But that's capturing a vfs_getattr() call from encoding a readdir reply,
> > > > rather than the vfs_getattr() call I added to nfsd4_deleg_getattr_conflict().
> > > > 
> > > > Here's the stack depth for nfsd4_deleg_getattr_conflict():
> > > > 
> > > > # echo nfsd4_deleg_getattr_conflict 
> > > > > /sys/kernel/debug/tracing/stack_trace_filter
> > > > # echo 0 >/sys/kernel/debug/tracing/stack_max_size 
> > > > # cat /sys/kernel/debug/tracing/stack_trace
> > > >         Depth    Size   Location    (14 entries)
> > > >         -----    ----   --------
> > > >   0)     1472      48   nfsd4_deleg_getattr_conflict+0x9/0x410 [nfsd]
> > > >   1)     1424     552   nfsd4_encode_fattr4+0x191/0x7a0 [nfsd]
> > > >   2)      872      16   nfsd4_encode_getattr+0x2c/0x40 [nfsd]
> > > >   3)      856      80   nfsd4_encode_operation+0xcf/0x3d0 [nfsd]
> > > >   4)      776      80   nfsd4_proc_compound+0x1d6/0x7a0 [nfsd]
> > > >   5)      696      80   nfsd_dispatch+0xd9/0x240 [nfsd]
> > > >   6)      616      80   svc_process_common+0x4cb/0x6b0 [sunrpc]
> > > >   7)      536      40   svc_process+0x150/0x240 [sunrpc]
> > > >   8)      496      72   svc_handle_xprt+0x4b0/0x5f0 [sunrpc]
> > > >   9)      424      56   svc_recv+0x1b2/0x3a0 [sunrpc]
> > > >  10)      368      80   nfsd+0x11c/0x3d0 [nfsd]
> > > >  11)      288      56   kthread+0xe3/0x120
> > > >  12)      232      40   ret_from_fork+0x1a1/0x270
> > > >  13)      192     192   ret_from_fork_asm+0x1a/0x30
> > > > 
> > > > Manually inspecting function graphs of vfs_getattr(), it looks like the deepest
> > > > function (that we can trace) is avc_lookup(), so here's a bpftrace script that
> > > > prints the stack depth from avc_lookup() via nfsd4_deleg_getattr_conflict()
> > > > (I mostly punted to Gemini for this):
> > > > 
> > > > # cat peak-usage.bt 
> > > > kprobe:nfsd4_deleg_getattr_conflict {
> > > >     @in_deleg_conflict[tid] = 1;
> > > > }
> > > > 
> > > > kprobe:avc_lookup /@in_deleg_conflict[tid]/ {
> > > >     $stack_size = (uint64)16384; 
> > > >     $sp = reg("sp");
> > > >     $stack_base = $sp & ~($stack_size - 1);
> > > >     $total_used = $stack_base + $stack_size - $sp;
> > > > 
> > > >     if ($total_used > @max_depth_bytes) {
> > > >         @max_depth_bytes = $total_used;
> > > >         @deepest_stack = kstack;
> > > >     }
> > > > }
> > > > 
> > > > kretprobe:nfsd4_deleg_getattr_conflict { delete(@in_deleg_conflict[tid]); }
> > > > 
> > > > And finally the result:
> > > > 
> > > > # bpftrace peak-usage.bt 
> > > > Attached 3 probes
> > > > ^C
> > > > 
> > > > @deepest_stack: 
> > > >         avc_lookup+1
> > > >         avc_has_perm_noaudit+60
> > > >         avc_has_perm+89
> > > >         selinux_inode_getattr+203
> > > >         security_inode_getattr+70
> > > >         vfs_getattr+35
> > > >         nfsd4_deleg_getattr_conflict+958
> > > >         nfsd4_encode_fattr4+401
> > > >         nfsd4_encode_getattr+44
> > > >         nfsd4_encode_operation+207
> > > >         nfsd4_proc_compound+470
> > > >         nfsd_dispatch+217
> > > >         svc_process_common+1227
> > > >         svc_process+336
> > > >         svc_handle_xprt+1200
> > > >         svc_recv+434
> > > >         nfsd+284
> > > >         kthread+227
> > > >         ret_from_fork+417
> > > >         ret_from_fork_asm+26
> > > > 
> > > > @max_depth_bytes: 1792
> > > 
> > > Since the new code only needs the file's size, perhaps you can get
> > > away with
> > > 
> > >         if (i_size_read(inode) != ncf->ncf_cb_fsize)
> > > 
> > > rather than
> > > 
> > >         err = vfs_getattr(path, stat, STATX_SIZE, AT_STATX_SYNC_AS_STAT);
> > >         if (err) {
> > >                 status = nfserrno(err);
> > >                 goto out_status;
> > >         }
> > >         if (stat->size != ncf->ncf_cb_fsize)
> > > 
> > > Then there's no longer a need for a struct kstat at all. The client is
> > > holding a delegation, so I would expect the file size to be stable.
> > > 
> > 
> > I hate relying on the size for this, since it's not a reliable
> > indicator of change, especially on something that has a fixed size (VM
> > image, for instance).
> 
> You're right, it's not.  I dislike how the pseudocode in 10.4.3 is only
> checking the change attribute to determine if the file has been modified
> and then 2 paragraphs down it talks about checking the file size.
> 
> At any rate, the main point of this fix is to make sure we're comparing
> the size from the cb_getattr reply to the current size, not a cached
> value.  For example:
> 
> client 1 does open(O_RDWR|O_CREATE_O_TRUNC)
> client 1 writes some data and flushes it to the server
> client 2 does some operation that triggers a CB_GETATTR
> 
> At this point client 1 doesn't have any modified data, so it sends the
> original change attribute and its current file size (which should match
> the current file size on the server).  With the current code, the server
> compares the size client 1 sent with the value it cached on the
> delegation (in this case 0).  The server treats it as if the client is
> has modified data, which it does not.  That's all this patch is aiming
> to fix.
> 
> > 
> > In fact, I wonder if this ought to be looking at the mtime too. If the
> > mtime that the client reported is later than the mtime the server has,
> > then we can probably infer that there are buffered writes out there.
> 
> But the spec doesn't say anything about comparing the mtime.  Plus, if
> the client is sending the original change attribute but a different
> mtime than what the server has, wouldn't that be a client bug?
> 
> Plus, wouldn't it sort of be crossing into delegated timestamps
> territory if we started checking the mtime?
> 
> 

Should we be looking at extending the spec? Maybe we should add a "I
have outstanding buffered data" attribute that the client could send to
the server in a CB_GETATTR?
-- 
Jeff Layton <jlayton@kernel.org>

  parent reply	other threads:[~2026-04-06 18:48 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-04  0:54 [PATCH v2] nfsd: fix file change detection in CB_GETATTR Scott Mayhew
2026-04-04 15:07 ` Chuck Lever
2026-04-05 22:13   ` Scott Mayhew
2026-04-06 14:12     ` Chuck Lever
2026-04-06 14:38       ` Jeff Layton
2026-04-06 17:47         ` Scott Mayhew
2026-04-06 18:13           ` Jeff Layton
2026-04-06 18:48           ` Jeff Layton [this message]
2026-04-06 22:14             ` Scott Mayhew
2026-04-06 17:20       ` Scott Mayhew

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ca6076eba9248c39f8b95bed66cec5d907aa0ab7.camel@kernel.org \
    --to=jlayton@kernel.org \
    --cc=Dai.Ngo@oracle.com \
    --cc=cel@kernel.org \
    --cc=chuck.lever@oracle.com \
    --cc=linux-nfs@vger.kernel.org \
    --cc=neil@brown.name \
    --cc=okorniev@redhat.com \
    --cc=smayhew@redhat.com \
    --cc=tom@talpey.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox