* Blocking stat calls during (p)NFS write
@ 2025-10-09 10:57 Mkrtchyan, Tigran
2025-10-09 11:48 ` Jeff Layton
0 siblings, 1 reply; 2+ messages in thread
From: Mkrtchyan, Tigran @ 2025-10-09 10:57 UTC (permalink / raw)
To: linux-nfs
[-- Attachment #1: Type: text/plain, Size: 4571 bytes --]
Dear NFS fellows,
We have noticed that when data is written into a large file, a stat call on that file blocks.
This is quite simple to reproduce. Open two terminals, in one copy a 4GB file into NFS NFS-mounted
directory, in another window run `stat /path/to/file`.
By looking at the perf output, I can see:
stat 301583 [007] 265242.393512: sched:sched_wake_idle_without_ipi: cpu=2
ffffffffb5e0cfa0 call_function_single_prep_ipi+0x90 ([kernel.kallsyms])
ffffffffb5e0cfa0 call_function_single_prep_ipi+0x90 ([kernel.kallsyms])
ffffffffb5ef176b __smp_call_single_queue+0xdb ([kernel.kallsyms])
ffffffffb5ef1866 generic_exec_single+0x36 ([kernel.kallsyms])
ffffffffb5ef1bf2 smp_call_function_single_async+0x22 ([kernel.kallsyms])
ffffffffb5ec97f4 update_process_times+0xa4 ([kernel.kallsyms])
ffffffffb5ee279f tick_nohz_handler+0x8f ([kernel.kallsyms])
ffffffffb5eca7c0 __hrtimer_run_queues+0x110 ([kernel.kallsyms])
ffffffffb5ecb62c hrtimer_interrupt+0xfc ([kernel.kallsyms])
ffffffffb5d5deb5 __sysvec_apic_timer_interrupt+0x55 ([kernel.kallsyms])
ffffffffb6f6364c sysvec_apic_timer_interrupt+0x6c ([kernel.kallsyms])
ffffffffb5a0160a asm_sysvec_apic_timer_interrupt+0x1a ([kernel.kallsyms])
ffffffffb5e07b8f finish_task_switch.isra.0+0x9f ([kernel.kallsyms])
ffffffffb6f6ad91 __schedule+0x301 ([kernel.kallsyms])
ffffffffb6f6b277 schedule+0x27 ([kernel.kallsyms])
ffffffffb6f6b326 io_schedule+0x46 ([kernel.kallsyms])
ffffffffb608fddf folio_wait_bit+0xef ([kernel.kallsyms])
ffffffffb609c95e folio_wait_writeback+0x2e ([kernel.kallsyms])
ffffffffb608f0ff __filemap_fdatawait_range+0x7f ([kernel.kallsyms])
ffffffffb6091bc8 filemap_write_and_wait_range+0xc8 ([kernel.kallsyms])
ffffffffc1fc5cc7 nfs_getattr+0x567 ([kernel.kallsyms])
ffffffffb61eb3fe vfs_getattr_nosec+0xbe ([kernel.kallsyms])
ffffffffb61eb673 vfs_statx+0xa3 ([kernel.kallsyms])
ffffffffb61ec363 do_statx+0x63 ([kernel.kallsyms])
ffffffffb61ec5c0 __x64_sys_statx+0x90 ([kernel.kallsyms])
ffffffffb6f5e5ae do_syscall_64+0x7e ([kernel.kallsyms])
ffffffffb5a0012f entry_SYSCALL_64_after_hwframe+0x76 ([kernel.kallsyms])
7f17c3b1fb6e statx+0xe (/usr/lib64/libc.so.6)
55b6106f684f main+0x45f (/usr/bin/stat)
7f17c3a3a5b5 __libc_start_call_main+0x75 (/usr/lib64/libc.so.6)
7f17c3a3a668 __libc_start_main@@GLIBC_2.34+0x88 (/usr/lib64/libc.so.6)
55b6106f6bb5 _start+0x25 (/usr/bin/stat)
So I assume that blocking comes from inode invalidation in inode.c#nfs_getattr call:
1003 /* Flush out writes to the server in order to update c/mtime/version. */
1004 if ((request_mask & (STATX_CTIME | STATX_MTIME | STATX_CHANGE_COOKIE)) &&
1005 S_ISREG(inode->i_mode)) {
1006 if (nfs_have_delegated_mtime(inode))
1007 filemap_fdatawrite(inode->i_mapping);
1008 else
1009 filemap_write_and_wait(inode->i_mapping);
1010 }
The packets look as follows:
No. Time Rpc Time Source Destination Protocol Info
16 21.114244808 10.1.0.71 10.1.0.34 NFS V4 Call (Reply In 18) OPEN DH: 0xf359c2c5/f42.iso | LAYOUTGET
18 21.158017758 0.043772950 10.1.0.34 10.1.0.71 NFS V4 Reply (Call In 16) OPEN StateID: 0x7ca4 | LAYOUTGET
29 42.354912088 10.1.0.71 10.1.0.34 NFS V4 Call (Reply In 30) LAYOUTCOMMIT
30 42.357843007 0.002930919 10.1.0.34 10.1.0.71 NFS V4 Reply (Call In 29) LAYOUTCOMMIT
32 42.357993362 10.1.0.71 10.1.0.34 NFS V4 Call (Reply In 35) GETATTR FH: 0x7d6441d9
33 42.358016100 10.1.0.71 10.1.0.34 NFS V4 Call (Reply In 36) LAYOUTRETURN
35 42.359324091 0.001330729 10.1.0.34 10.1.0.71 NFS V4 Reply (Call In 32) GETATTR
36 42.378923839 0.020907739 10.1.0.34 10.1.0.71 NFS V4 Reply (Call In 33) LAYOUTRETURN
38 42.379133795 10.1.0.71 10.1.0.34 NFS V4 Call (Reply In 39) CLOSE StateID: 0x7ca4
39 42.380213999 0.001080204 10.1.0.34 10.1.0.71 NFS V4 Reply (Call In 38) CLOSE
So, GETATTR is sent after LAYOUTCOMMIT.
This behavior is observed with 6.17 and RHEL kernels.
[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 2826 bytes --]
^ permalink raw reply [flat|nested] 2+ messages in thread
* Re: Blocking stat calls during (p)NFS write
2025-10-09 10:57 Blocking stat calls during (p)NFS write Mkrtchyan, Tigran
@ 2025-10-09 11:48 ` Jeff Layton
0 siblings, 0 replies; 2+ messages in thread
From: Jeff Layton @ 2025-10-09 11:48 UTC (permalink / raw)
To: Mkrtchyan, Tigran, linux-nfs
On Thu, 2025-10-09 at 12:57 +0200, Mkrtchyan, Tigran wrote:
> Dear NFS fellows,
>
> We have noticed that when data is written into a large file, a stat call on that file blocks.
> This is quite simple to reproduce. Open two terminals, in one copy a 4GB file into NFS NFS-mounted
> directory, in another window run `stat /path/to/file`.
>
> By looking at the perf output, I can see:
>
> stat 301583 [007] 265242.393512: sched:sched_wake_idle_without_ipi: cpu=2
> ffffffffb5e0cfa0 call_function_single_prep_ipi+0x90 ([kernel.kallsyms])
> ffffffffb5e0cfa0 call_function_single_prep_ipi+0x90 ([kernel.kallsyms])
> ffffffffb5ef176b __smp_call_single_queue+0xdb ([kernel.kallsyms])
> ffffffffb5ef1866 generic_exec_single+0x36 ([kernel.kallsyms])
> ffffffffb5ef1bf2 smp_call_function_single_async+0x22 ([kernel.kallsyms])
> ffffffffb5ec97f4 update_process_times+0xa4 ([kernel.kallsyms])
> ffffffffb5ee279f tick_nohz_handler+0x8f ([kernel.kallsyms])
> ffffffffb5eca7c0 __hrtimer_run_queues+0x110 ([kernel.kallsyms])
> ffffffffb5ecb62c hrtimer_interrupt+0xfc ([kernel.kallsyms])
> ffffffffb5d5deb5 __sysvec_apic_timer_interrupt+0x55 ([kernel.kallsyms])
> ffffffffb6f6364c sysvec_apic_timer_interrupt+0x6c ([kernel.kallsyms])
> ffffffffb5a0160a asm_sysvec_apic_timer_interrupt+0x1a ([kernel.kallsyms])
> ffffffffb5e07b8f finish_task_switch.isra.0+0x9f ([kernel.kallsyms])
> ffffffffb6f6ad91 __schedule+0x301 ([kernel.kallsyms])
> ffffffffb6f6b277 schedule+0x27 ([kernel.kallsyms])
> ffffffffb6f6b326 io_schedule+0x46 ([kernel.kallsyms])
> ffffffffb608fddf folio_wait_bit+0xef ([kernel.kallsyms])
> ffffffffb609c95e folio_wait_writeback+0x2e ([kernel.kallsyms])
> ffffffffb608f0ff __filemap_fdatawait_range+0x7f ([kernel.kallsyms])
> ffffffffb6091bc8 filemap_write_and_wait_range+0xc8 ([kernel.kallsyms])
> ffffffffc1fc5cc7 nfs_getattr+0x567 ([kernel.kallsyms])
> ffffffffb61eb3fe vfs_getattr_nosec+0xbe ([kernel.kallsyms])
> ffffffffb61eb673 vfs_statx+0xa3 ([kernel.kallsyms])
> ffffffffb61ec363 do_statx+0x63 ([kernel.kallsyms])
> ffffffffb61ec5c0 __x64_sys_statx+0x90 ([kernel.kallsyms])
> ffffffffb6f5e5ae do_syscall_64+0x7e ([kernel.kallsyms])
> ffffffffb5a0012f entry_SYSCALL_64_after_hwframe+0x76 ([kernel.kallsyms])
> 7f17c3b1fb6e statx+0xe (/usr/lib64/libc.so.6)
> 55b6106f684f main+0x45f (/usr/bin/stat)
> 7f17c3a3a5b5 __libc_start_call_main+0x75 (/usr/lib64/libc.so.6)
> 7f17c3a3a668 __libc_start_main@@GLIBC_2.34+0x88 (/usr/lib64/libc.so.6)
> 55b6106f6bb5 _start+0x25 (/usr/bin/stat)
>
>
> So I assume that blocking comes from inode invalidation in inode.c#nfs_getattr call:
>
> 1003 /* Flush out writes to the server in order to update c/mtime/version. */
> 1004 if ((request_mask & (STATX_CTIME | STATX_MTIME | STATX_CHANGE_COOKIE)) &&
> 1005 S_ISREG(inode->i_mode)) {
> 1006 if (nfs_have_delegated_mtime(inode))
> 1007 filemap_fdatawrite(inode->i_mapping);
> 1008 else
> 1009 filemap_write_and_wait(inode->i_mapping);
> 1010 }
>
stat() requires updated mtime and ctime. The client must write any
dirty data back before it can report those accurately since the server
is authoritative for those attributes. If the server supports delegated
timestamps however, then you don't need to wait for writeback to
finish.
>
> The packets look as follows:
>
> No. Time Rpc Time Source Destination Protocol Info
> 16 21.114244808 10.1.0.71 10.1.0.34 NFS V4 Call (Reply In 18) OPEN DH: 0xf359c2c5/f42.iso | LAYOUTGET
> 18 21.158017758 0.043772950 10.1.0.34 10.1.0.71 NFS V4 Reply (Call In 16) OPEN StateID: 0x7ca4 | LAYOUTGET
> 29 42.354912088 10.1.0.71 10.1.0.34 NFS V4 Call (Reply In 30) LAYOUTCOMMIT
> 30 42.357843007 0.002930919 10.1.0.34 10.1.0.71 NFS V4 Reply (Call In 29) LAYOUTCOMMIT
> 32 42.357993362 10.1.0.71 10.1.0.34 NFS V4 Call (Reply In 35) GETATTR FH: 0x7d6441d9
> 33 42.358016100 10.1.0.71 10.1.0.34 NFS V4 Call (Reply In 36) LAYOUTRETURN
> 35 42.359324091 0.001330729 10.1.0.34 10.1.0.71 NFS V4 Reply (Call In 32) GETATTR
> 36 42.378923839 0.020907739 10.1.0.34 10.1.0.71 NFS V4 Reply (Call In 33) LAYOUTRETURN
> 38 42.379133795 10.1.0.71 10.1.0.34 NFS V4 Call (Reply In 39) CLOSE StateID: 0x7ca4
> 39 42.380213999 0.001080204 10.1.0.34 10.1.0.71 NFS V4 Reply (Call In 38) CLOSE
>
> So, GETATTR is sent after LAYOUTCOMMIT.
> This behavior is observed with 6.17 and RHEL kernels.
--
Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2025-10-09 11:48 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-10-09 10:57 Blocking stat calls during (p)NFS write Mkrtchyan, Tigran
2025-10-09 11:48 ` Jeff Layton
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox