public inbox for linux-nfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Scott Mayhew <smayhew@redhat.com>
To: Jeff Layton <jlayton@kernel.org>
Cc: Chuck Lever <cel@kernel.org>,
	Chuck Lever <chuck.lever@oracle.com>, NeilBrown <neil@brown.name>,
	Olga Kornievskaia <okorniev@redhat.com>,
	Dai Ngo <Dai.Ngo@oracle.com>, Tom Talpey <tom@talpey.com>,
	linux-nfs@vger.kernel.org
Subject: Re: [PATCH v2] nfsd: fix file change detection in CB_GETATTR
Date: Mon, 6 Apr 2026 18:14:33 -0400	[thread overview]
Message-ID: <adQwSTW7ABXvrovQ@aion> (raw)
In-Reply-To: <ca6076eba9248c39f8b95bed66cec5d907aa0ab7.camel@kernel.org>

On Mon, 06 Apr 2026, Jeff Layton wrote:

> On Mon, 2026-04-06 at 13:47 -0400, Scott Mayhew wrote:
> > On Mon, 06 Apr 2026, Jeff Layton wrote:
> > 
> > > On Mon, 2026-04-06 at 10:12 -0400, Chuck Lever wrote:
> > > > 
> > > > On Sun, Apr 5, 2026, at 6:13 PM, Scott Mayhew wrote:
> > > > > On Sat, 04 Apr 2026, Chuck Lever wrote:
> > > > > > On Fri, Apr 3, 2026, at 8:54 PM, Scott Mayhew wrote:
> > > > 
> > > > > > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > > > > > > index fa657badf5f8..53d8e7e7d60b 100644
> > > > > > > --- a/fs/nfsd/nfs4state.c
> > > > > > > +++ b/fs/nfsd/nfs4state.c
> > > > > > 
> > > > > > > @@ -9459,17 +9461,18 @@ static int cb_getattr_update_times(struct 
> > > > > > > dentry *dentry, struct nfs4_delegation
> > > > > > >   * caller must put the reference.
> > > > > > >   */
> > > > > > >  __be32
> > > > > > > -nfsd4_deleg_getattr_conflict(struct svc_rqst *rqstp, struct dentry 
> > > > > > > *dentry,
> > > > > > > -			     struct nfs4_delegation **pdp)
> > > > > > > +nfsd4_deleg_getattr_conflict(struct svc_rqst *rqstp, struct path *path,
> > > > > > > +			     struct kstat *stat, struct nfs4_delegation **pdp)
> > > > > > 
> > > > > > Passing the kstat struct in saves some stack just as I suggested,
> > > > > > but it is an ugly API. The nfsd4_encode_fattr4() call stack is tall,
> > > > > > though -- did you happen to measure how deep it gets after this patch
> > > > > > is applied?
> > > > > > 
> > > > > 
> > > > > I tried using the stack tracer:
> > > > > 
> > > > > # echo 1 >/proc/sys/kernel/stack_tracer_enabled
> > > > > # echo vfs_getattr >/sys/kernel/debug/tracing/stack_trace_filter
> > > > > # cat /sys/kernel/debug/tracing/stack_trace
> > > > >         Depth    Size   Location    (18 entries)
> > > > >         -----    ----   --------
> > > > >   0)     1936      48   vfs_getattr+0x9/0x50
> > > > >   1)     1888     552   nfsd4_encode_fattr4+0x1b2/0x7a0 [nfsd]
> > > > >   2)     1336      80   nfsd4_encode_entry4_fattr+0xf8/0x210 [nfsd]
> > > > >   3)     1256      96   nfsd4_encode_entry4+0x10b/0x2a0 [nfsd]
> > > > >   4)     1160     144   nfsd_buffered_readdir+0x139/0x310 [nfsd]
> > > > >   5)     1016      80   nfsd_readdir+0x9f/0x180 [nfsd]
> > > > >   6)      936      80   nfsd4_encode_readdir+0xdf/0x1e0 [nfsd]
> > > > >   7)      856      80   nfsd4_encode_operation+0xcf/0x3d0 [nfsd]
> > > > >   8)      776      80   nfsd4_proc_compound+0x1d6/0x7a0 [nfsd]
> > > > >   9)      696      80   nfsd_dispatch+0xd9/0x240 [nfsd]
> > > > >  10)      616      80   svc_process_common+0x4cb/0x6b0 [sunrpc]
> > > > >  11)      536      40   svc_process+0x150/0x240 [sunrpc]
> > > > >  12)      496      72   svc_handle_xprt+0x4b0/0x5f0 [sunrpc]
> > > > >  13)      424      56   svc_recv+0x1b2/0x3a0 [sunrpc]
> > > > >  14)      368      80   nfsd+0x11c/0x3d0 [nfsd]
> > > > >  15)      288      56   kthread+0xe3/0x120
> > > > >  16)      232      40   ret_from_fork+0x1a1/0x270
> > > > >  17)      192     192   ret_from_fork_asm+0x1a/0x30
> > > > > 
> > > > > But that's capturing a vfs_getattr() call from encoding a readdir reply,
> > > > > rather than the vfs_getattr() call I added to nfsd4_deleg_getattr_conflict().
> > > > > 
> > > > > Here's the stack depth for nfsd4_deleg_getattr_conflict():
> > > > > 
> > > > > # echo nfsd4_deleg_getattr_conflict 
> > > > > > /sys/kernel/debug/tracing/stack_trace_filter
> > > > > # echo 0 >/sys/kernel/debug/tracing/stack_max_size 
> > > > > # cat /sys/kernel/debug/tracing/stack_trace
> > > > >         Depth    Size   Location    (14 entries)
> > > > >         -----    ----   --------
> > > > >   0)     1472      48   nfsd4_deleg_getattr_conflict+0x9/0x410 [nfsd]
> > > > >   1)     1424     552   nfsd4_encode_fattr4+0x191/0x7a0 [nfsd]
> > > > >   2)      872      16   nfsd4_encode_getattr+0x2c/0x40 [nfsd]
> > > > >   3)      856      80   nfsd4_encode_operation+0xcf/0x3d0 [nfsd]
> > > > >   4)      776      80   nfsd4_proc_compound+0x1d6/0x7a0 [nfsd]
> > > > >   5)      696      80   nfsd_dispatch+0xd9/0x240 [nfsd]
> > > > >   6)      616      80   svc_process_common+0x4cb/0x6b0 [sunrpc]
> > > > >   7)      536      40   svc_process+0x150/0x240 [sunrpc]
> > > > >   8)      496      72   svc_handle_xprt+0x4b0/0x5f0 [sunrpc]
> > > > >   9)      424      56   svc_recv+0x1b2/0x3a0 [sunrpc]
> > > > >  10)      368      80   nfsd+0x11c/0x3d0 [nfsd]
> > > > >  11)      288      56   kthread+0xe3/0x120
> > > > >  12)      232      40   ret_from_fork+0x1a1/0x270
> > > > >  13)      192     192   ret_from_fork_asm+0x1a/0x30
> > > > > 
> > > > > Manually inspecting function graphs of vfs_getattr(), it looks like the deepest
> > > > > function (that we can trace) is avc_lookup(), so here's a bpftrace script that
> > > > > prints the stack depth from avc_lookup() via nfsd4_deleg_getattr_conflict()
> > > > > (I mostly punted to Gemini for this):
> > > > > 
> > > > > # cat peak-usage.bt 
> > > > > kprobe:nfsd4_deleg_getattr_conflict {
> > > > >     @in_deleg_conflict[tid] = 1;
> > > > > }
> > > > > 
> > > > > kprobe:avc_lookup /@in_deleg_conflict[tid]/ {
> > > > >     $stack_size = (uint64)16384; 
> > > > >     $sp = reg("sp");
> > > > >     $stack_base = $sp & ~($stack_size - 1);
> > > > >     $total_used = $stack_base + $stack_size - $sp;
> > > > > 
> > > > >     if ($total_used > @max_depth_bytes) {
> > > > >         @max_depth_bytes = $total_used;
> > > > >         @deepest_stack = kstack;
> > > > >     }
> > > > > }
> > > > > 
> > > > > kretprobe:nfsd4_deleg_getattr_conflict { delete(@in_deleg_conflict[tid]); }
> > > > > 
> > > > > And finally the result:
> > > > > 
> > > > > # bpftrace peak-usage.bt 
> > > > > Attached 3 probes
> > > > > ^C
> > > > > 
> > > > > @deepest_stack: 
> > > > >         avc_lookup+1
> > > > >         avc_has_perm_noaudit+60
> > > > >         avc_has_perm+89
> > > > >         selinux_inode_getattr+203
> > > > >         security_inode_getattr+70
> > > > >         vfs_getattr+35
> > > > >         nfsd4_deleg_getattr_conflict+958
> > > > >         nfsd4_encode_fattr4+401
> > > > >         nfsd4_encode_getattr+44
> > > > >         nfsd4_encode_operation+207
> > > > >         nfsd4_proc_compound+470
> > > > >         nfsd_dispatch+217
> > > > >         svc_process_common+1227
> > > > >         svc_process+336
> > > > >         svc_handle_xprt+1200
> > > > >         svc_recv+434
> > > > >         nfsd+284
> > > > >         kthread+227
> > > > >         ret_from_fork+417
> > > > >         ret_from_fork_asm+26
> > > > > 
> > > > > @max_depth_bytes: 1792
> > > > 
> > > > Since the new code only needs the file's size, perhaps you can get
> > > > away with
> > > > 
> > > >         if (i_size_read(inode) != ncf->ncf_cb_fsize)
> > > > 
> > > > rather than
> > > > 
> > > >         err = vfs_getattr(path, stat, STATX_SIZE, AT_STATX_SYNC_AS_STAT);
> > > >         if (err) {
> > > >                 status = nfserrno(err);
> > > >                 goto out_status;
> > > >         }
> > > >         if (stat->size != ncf->ncf_cb_fsize)
> > > > 
> > > > Then there's no longer a need for a struct kstat at all. The client is
> > > > holding a delegation, so I would expect the file size to be stable.
> > > > 
> > > 
> > > I hate relying on the size for this, since it's not a reliable
> > > indicator of change, especially on something that has a fixed size (VM
> > > image, for instance).
> > 
> > You're right, it's not.  I dislike how the pseudocode in 10.4.3 is only
> > checking the change attribute to determine if the file has been modified
> > and then 2 paragraphs down it talks about checking the file size.
> > 
> > At any rate, the main point of this fix is to make sure we're comparing
> > the size from the cb_getattr reply to the current size, not a cached
> > value.  For example:
> > 
> > client 1 does open(O_RDWR|O_CREATE_O_TRUNC)
> > client 1 writes some data and flushes it to the server
> > client 2 does some operation that triggers a CB_GETATTR
> > 
> > At this point client 1 doesn't have any modified data, so it sends the
> > original change attribute and its current file size (which should match
> > the current file size on the server).  With the current code, the server
> > compares the size client 1 sent with the value it cached on the
> > delegation (in this case 0).  The server treats it as if the client is
> > has modified data, which it does not.  That's all this patch is aiming
> > to fix.
> > 
> > > 
> > > In fact, I wonder if this ought to be looking at the mtime too. If the
> > > mtime that the client reported is later than the mtime the server has,
> > > then we can probably infer that there are buffered writes out there.
> > 
> > But the spec doesn't say anything about comparing the mtime.  Plus, if
> > the client is sending the original change attribute but a different
> > mtime than what the server has, wouldn't that be a client bug?
> > 
> > Plus, wouldn't it sort of be crossing into delegated timestamps
> > territory if we started checking the mtime?
> > 
> > 
> 
> Should we be looking at extending the spec? Maybe we should add a "I
> have outstanding buffered data" attribute that the client could send to
> the server in a CB_GETATTR?

I guess I don't understand why the client's usage of the change
attribute doesn't already cover that.

If an application calls write() but not fsync(), close(), etc, and the
background flushers haven't kicked in yet, then the dirty pages/folios
will be tracked on the nfs_inode and the client will send the elevated
change attribute in the CB_GETATTR reply.

What am I missing?

> -- 
> Jeff Layton <jlayton@kernel.org>
> 


  reply	other threads:[~2026-04-06 22:14 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-04  0:54 [PATCH v2] nfsd: fix file change detection in CB_GETATTR Scott Mayhew
2026-04-04 15:07 ` Chuck Lever
2026-04-05 22:13   ` Scott Mayhew
2026-04-06 14:12     ` Chuck Lever
2026-04-06 14:38       ` Jeff Layton
2026-04-06 17:47         ` Scott Mayhew
2026-04-06 18:13           ` Jeff Layton
2026-04-06 18:48           ` Jeff Layton
2026-04-06 22:14             ` Scott Mayhew [this message]
2026-04-06 17:20       ` Scott Mayhew

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=adQwSTW7ABXvrovQ@aion \
    --to=smayhew@redhat.com \
    --cc=Dai.Ngo@oracle.com \
    --cc=cel@kernel.org \
    --cc=chuck.lever@oracle.com \
    --cc=jlayton@kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=neil@brown.name \
    --cc=okorniev@redhat.com \
    --cc=tom@talpey.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox