From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E8B6335A385 for ; Mon, 6 Apr 2026 17:47:38 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775497660; cv=none; b=kv7X0mJUiibD9RlZga6HhLQLHMr1ZqhTrqdHfiuWrMadRLuGRFQBCPx0x/jxofvQ70n3+m6SBhEs7z/gtkmPYLcwYqOcXo2W9SusmYsT28Z04mR9iZc83d6e7tZy694eiujUE7yyTufvD8wlBwpfMs1URTDyJIloy0h+CeUE3fs= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1775497660; c=relaxed/simple; bh=6SvVgbPuDUdF2GVoVZiU/Xgmdels4uVGphDLbGXhrZo=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=icdysC7PSA6A/qtvPm2mKA5NmcNGfZZFkiU4O6hXJrdxRUoq+a4NVhJAcVbbEb+/fAGvdxD4yH/AxH/gteVsJOHpG/RwIxh16shsgjLsTqkHuxLE89Gi/r7dmg+L2luRJuQhnrveRLwgMQ/5x4bnEFGDHl0dn2cIABGwaRABFgg= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=GPHLtjkl; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="GPHLtjkl" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1775497657; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=zkOSLQOok/Gg8Pu+HHYjlUGHXr7rIG8x/i4tL0eaDzU=; b=GPHLtjklViYnPM8/bfJQNRzrMFPmFKE777Zk27SvCM8rTnrYUMERLtEqrEaetwoO1JlHC2 +yzM2UyVGx/piw+p6gyljqz2vGxaxK1LceeWLm3sy5a0c0IUr4z4xBjk3llfK6ZUPdx06i 3TteaQ5A51EDyHDWsQI79pn6CzCqW6s= Received: from mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (ec2-35-165-154-97.us-west-2.compute.amazonaws.com [35.165.154.97]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-64-FI41Yi5GPTKWRK2nZfc8xg-1; Mon, 06 Apr 2026 13:47:31 -0400 X-MC-Unique: FI41Yi5GPTKWRK2nZfc8xg-1 X-Mimecast-MFC-AGG-ID: FI41Yi5GPTKWRK2nZfc8xg_1775497650 Received: from mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.17]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 4CC541800281; Mon, 6 Apr 2026 17:47:30 +0000 (UTC) Received: from aion.redhat.com (unknown [10.22.80.134]) by mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id BCF031955D84; Mon, 6 Apr 2026 17:47:29 +0000 (UTC) Received: by aion.redhat.com (Postfix, from userid 1000) id 7907D74BD79; Mon, 06 Apr 2026 13:47:28 -0400 (EDT) Date: Mon, 6 Apr 2026 13:47:28 -0400 From: Scott Mayhew To: Jeff Layton Cc: Chuck Lever , Chuck Lever , NeilBrown , Olga Kornievskaia , Dai Ngo , Tom Talpey , linux-nfs@vger.kernel.org Subject: Re: [PATCH v2] nfsd: fix file change detection in CB_GETATTR Message-ID: References: <20260404005405.1565136-1-smayhew@redhat.com> Precedence: bulk X-Mailing-List: linux-nfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Scanned-By: MIMEDefang 3.0 on 10.30.177.17 On Mon, 06 Apr 2026, Jeff Layton wrote: > On Mon, 2026-04-06 at 10:12 -0400, Chuck Lever wrote: > > > > On Sun, Apr 5, 2026, at 6:13 PM, Scott Mayhew wrote: > > > On Sat, 04 Apr 2026, Chuck Lever wrote: > > > > On Fri, Apr 3, 2026, at 8:54 PM, Scott Mayhew wrote: > > > > > > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c > > > > > index fa657badf5f8..53d8e7e7d60b 100644 > > > > > --- a/fs/nfsd/nfs4state.c > > > > > +++ b/fs/nfsd/nfs4state.c > > > > > > > > > @@ -9459,17 +9461,18 @@ static int cb_getattr_update_times(struct > > > > > dentry *dentry, struct nfs4_delegation > > > > > * caller must put the reference. > > > > > */ > > > > > __be32 > > > > > -nfsd4_deleg_getattr_conflict(struct svc_rqst *rqstp, struct dentry > > > > > *dentry, > > > > > - struct nfs4_delegation **pdp) > > > > > +nfsd4_deleg_getattr_conflict(struct svc_rqst *rqstp, struct path *path, > > > > > + struct kstat *stat, struct nfs4_delegation **pdp) > > > > > > > > Passing the kstat struct in saves some stack just as I suggested, > > > > but it is an ugly API. The nfsd4_encode_fattr4() call stack is tall, > > > > though -- did you happen to measure how deep it gets after this patch > > > > is applied? > > > > > > > > > > I tried using the stack tracer: > > > > > > # echo 1 >/proc/sys/kernel/stack_tracer_enabled > > > # echo vfs_getattr >/sys/kernel/debug/tracing/stack_trace_filter > > > # cat /sys/kernel/debug/tracing/stack_trace > > > Depth Size Location (18 entries) > > > ----- ---- -------- > > > 0) 1936 48 vfs_getattr+0x9/0x50 > > > 1) 1888 552 nfsd4_encode_fattr4+0x1b2/0x7a0 [nfsd] > > > 2) 1336 80 nfsd4_encode_entry4_fattr+0xf8/0x210 [nfsd] > > > 3) 1256 96 nfsd4_encode_entry4+0x10b/0x2a0 [nfsd] > > > 4) 1160 144 nfsd_buffered_readdir+0x139/0x310 [nfsd] > > > 5) 1016 80 nfsd_readdir+0x9f/0x180 [nfsd] > > > 6) 936 80 nfsd4_encode_readdir+0xdf/0x1e0 [nfsd] > > > 7) 856 80 nfsd4_encode_operation+0xcf/0x3d0 [nfsd] > > > 8) 776 80 nfsd4_proc_compound+0x1d6/0x7a0 [nfsd] > > > 9) 696 80 nfsd_dispatch+0xd9/0x240 [nfsd] > > > 10) 616 80 svc_process_common+0x4cb/0x6b0 [sunrpc] > > > 11) 536 40 svc_process+0x150/0x240 [sunrpc] > > > 12) 496 72 svc_handle_xprt+0x4b0/0x5f0 [sunrpc] > > > 13) 424 56 svc_recv+0x1b2/0x3a0 [sunrpc] > > > 14) 368 80 nfsd+0x11c/0x3d0 [nfsd] > > > 15) 288 56 kthread+0xe3/0x120 > > > 16) 232 40 ret_from_fork+0x1a1/0x270 > > > 17) 192 192 ret_from_fork_asm+0x1a/0x30 > > > > > > But that's capturing a vfs_getattr() call from encoding a readdir reply, > > > rather than the vfs_getattr() call I added to nfsd4_deleg_getattr_conflict(). > > > > > > Here's the stack depth for nfsd4_deleg_getattr_conflict(): > > > > > > # echo nfsd4_deleg_getattr_conflict > > > > /sys/kernel/debug/tracing/stack_trace_filter > > > # echo 0 >/sys/kernel/debug/tracing/stack_max_size > > > # cat /sys/kernel/debug/tracing/stack_trace > > > Depth Size Location (14 entries) > > > ----- ---- -------- > > > 0) 1472 48 nfsd4_deleg_getattr_conflict+0x9/0x410 [nfsd] > > > 1) 1424 552 nfsd4_encode_fattr4+0x191/0x7a0 [nfsd] > > > 2) 872 16 nfsd4_encode_getattr+0x2c/0x40 [nfsd] > > > 3) 856 80 nfsd4_encode_operation+0xcf/0x3d0 [nfsd] > > > 4) 776 80 nfsd4_proc_compound+0x1d6/0x7a0 [nfsd] > > > 5) 696 80 nfsd_dispatch+0xd9/0x240 [nfsd] > > > 6) 616 80 svc_process_common+0x4cb/0x6b0 [sunrpc] > > > 7) 536 40 svc_process+0x150/0x240 [sunrpc] > > > 8) 496 72 svc_handle_xprt+0x4b0/0x5f0 [sunrpc] > > > 9) 424 56 svc_recv+0x1b2/0x3a0 [sunrpc] > > > 10) 368 80 nfsd+0x11c/0x3d0 [nfsd] > > > 11) 288 56 kthread+0xe3/0x120 > > > 12) 232 40 ret_from_fork+0x1a1/0x270 > > > 13) 192 192 ret_from_fork_asm+0x1a/0x30 > > > > > > Manually inspecting function graphs of vfs_getattr(), it looks like the deepest > > > function (that we can trace) is avc_lookup(), so here's a bpftrace script that > > > prints the stack depth from avc_lookup() via nfsd4_deleg_getattr_conflict() > > > (I mostly punted to Gemini for this): > > > > > > # cat peak-usage.bt > > > kprobe:nfsd4_deleg_getattr_conflict { > > > @in_deleg_conflict[tid] = 1; > > > } > > > > > > kprobe:avc_lookup /@in_deleg_conflict[tid]/ { > > > $stack_size = (uint64)16384; > > > $sp = reg("sp"); > > > $stack_base = $sp & ~($stack_size - 1); > > > $total_used = $stack_base + $stack_size - $sp; > > > > > > if ($total_used > @max_depth_bytes) { > > > @max_depth_bytes = $total_used; > > > @deepest_stack = kstack; > > > } > > > } > > > > > > kretprobe:nfsd4_deleg_getattr_conflict { delete(@in_deleg_conflict[tid]); } > > > > > > And finally the result: > > > > > > # bpftrace peak-usage.bt > > > Attached 3 probes > > > ^C > > > > > > @deepest_stack: > > > avc_lookup+1 > > > avc_has_perm_noaudit+60 > > > avc_has_perm+89 > > > selinux_inode_getattr+203 > > > security_inode_getattr+70 > > > vfs_getattr+35 > > > nfsd4_deleg_getattr_conflict+958 > > > nfsd4_encode_fattr4+401 > > > nfsd4_encode_getattr+44 > > > nfsd4_encode_operation+207 > > > nfsd4_proc_compound+470 > > > nfsd_dispatch+217 > > > svc_process_common+1227 > > > svc_process+336 > > > svc_handle_xprt+1200 > > > svc_recv+434 > > > nfsd+284 > > > kthread+227 > > > ret_from_fork+417 > > > ret_from_fork_asm+26 > > > > > > @max_depth_bytes: 1792 > > > > Since the new code only needs the file's size, perhaps you can get > > away with > > > > if (i_size_read(inode) != ncf->ncf_cb_fsize) > > > > rather than > > > > err = vfs_getattr(path, stat, STATX_SIZE, AT_STATX_SYNC_AS_STAT); > > if (err) { > > status = nfserrno(err); > > goto out_status; > > } > > if (stat->size != ncf->ncf_cb_fsize) > > > > Then there's no longer a need for a struct kstat at all. The client is > > holding a delegation, so I would expect the file size to be stable. > > > > I hate relying on the size for this, since it's not a reliable > indicator of change, especially on something that has a fixed size (VM > image, for instance). You're right, it's not. I dislike how the pseudocode in 10.4.3 is only checking the change attribute to determine if the file has been modified and then 2 paragraphs down it talks about checking the file size. At any rate, the main point of this fix is to make sure we're comparing the size from the cb_getattr reply to the current size, not a cached value. For example: client 1 does open(O_RDWR|O_CREATE_O_TRUNC) client 1 writes some data and flushes it to the server client 2 does some operation that triggers a CB_GETATTR At this point client 1 doesn't have any modified data, so it sends the original change attribute and its current file size (which should match the current file size on the server). With the current code, the server compares the size client 1 sent with the value it cached on the delegation (in this case 0). The server treats it as if the client is has modified data, which it does not. That's all this patch is aiming to fix. > > In fact, I wonder if this ought to be looking at the mtime too. If the > mtime that the client reported is later than the mtime the server has, > then we can probably infer that there are buffered writes out there. But the spec doesn't say anything about comparing the mtime. Plus, if the client is sending the original change attribute but a different mtime than what the server has, wouldn't that be a client bug? Plus, wouldn't it sort of be crossing into delegated timestamps territory if we started checking the mtime? > -- > Jeff Layton >