Linux NFS development
 help / color / mirror / Atom feed
* Blocking stat calls during (p)NFS write
@ 2025-10-09 10:57 Mkrtchyan, Tigran
  2025-10-09 11:48 ` Jeff Layton
  0 siblings, 1 reply; 2+ messages in thread
From: Mkrtchyan, Tigran @ 2025-10-09 10:57 UTC (permalink / raw)
  To: linux-nfs

[-- Attachment #1: Type: text/plain, Size: 4571 bytes --]


Dear NFS fellows,

We have noticed that when data is written into a large file, a stat call on that file blocks.
This is quite simple to reproduce. Open two terminals, in one copy a 4GB file into NFS NFS-mounted
directory, in another window run `stat /path/to/file`.

By looking at the perf output, I can see:

stat  301583 [007] 265242.393512:      sched:sched_wake_idle_without_ipi: cpu=2
        ffffffffb5e0cfa0 call_function_single_prep_ipi+0x90 ([kernel.kallsyms])
        ffffffffb5e0cfa0 call_function_single_prep_ipi+0x90 ([kernel.kallsyms])
        ffffffffb5ef176b __smp_call_single_queue+0xdb ([kernel.kallsyms])
        ffffffffb5ef1866 generic_exec_single+0x36 ([kernel.kallsyms])
        ffffffffb5ef1bf2 smp_call_function_single_async+0x22 ([kernel.kallsyms])
        ffffffffb5ec97f4 update_process_times+0xa4 ([kernel.kallsyms])
        ffffffffb5ee279f tick_nohz_handler+0x8f ([kernel.kallsyms])
        ffffffffb5eca7c0 __hrtimer_run_queues+0x110 ([kernel.kallsyms])
        ffffffffb5ecb62c hrtimer_interrupt+0xfc ([kernel.kallsyms])
        ffffffffb5d5deb5 __sysvec_apic_timer_interrupt+0x55 ([kernel.kallsyms])
        ffffffffb6f6364c sysvec_apic_timer_interrupt+0x6c ([kernel.kallsyms])
        ffffffffb5a0160a asm_sysvec_apic_timer_interrupt+0x1a ([kernel.kallsyms])
        ffffffffb5e07b8f finish_task_switch.isra.0+0x9f ([kernel.kallsyms])
        ffffffffb6f6ad91 __schedule+0x301 ([kernel.kallsyms])
        ffffffffb6f6b277 schedule+0x27 ([kernel.kallsyms])
        ffffffffb6f6b326 io_schedule+0x46 ([kernel.kallsyms])
        ffffffffb608fddf folio_wait_bit+0xef ([kernel.kallsyms])
        ffffffffb609c95e folio_wait_writeback+0x2e ([kernel.kallsyms])
        ffffffffb608f0ff __filemap_fdatawait_range+0x7f ([kernel.kallsyms])
        ffffffffb6091bc8 filemap_write_and_wait_range+0xc8 ([kernel.kallsyms])
        ffffffffc1fc5cc7 nfs_getattr+0x567 ([kernel.kallsyms])
        ffffffffb61eb3fe vfs_getattr_nosec+0xbe ([kernel.kallsyms])
        ffffffffb61eb673 vfs_statx+0xa3 ([kernel.kallsyms])
        ffffffffb61ec363 do_statx+0x63 ([kernel.kallsyms])
        ffffffffb61ec5c0 __x64_sys_statx+0x90 ([kernel.kallsyms])
        ffffffffb6f5e5ae do_syscall_64+0x7e ([kernel.kallsyms])
        ffffffffb5a0012f entry_SYSCALL_64_after_hwframe+0x76 ([kernel.kallsyms])
            7f17c3b1fb6e statx+0xe (/usr/lib64/libc.so.6)
            55b6106f684f main+0x45f (/usr/bin/stat)
            7f17c3a3a5b5 __libc_start_call_main+0x75 (/usr/lib64/libc.so.6)
            7f17c3a3a668 __libc_start_main@@GLIBC_2.34+0x88 (/usr/lib64/libc.so.6)
            55b6106f6bb5 _start+0x25 (/usr/bin/stat)


So I assume that blocking comes from inode invalidation in inode.c#nfs_getattr call:

1003         /* Flush out writes to the server in order to update c/mtime/version.  */
1004         if ((request_mask & (STATX_CTIME | STATX_MTIME | STATX_CHANGE_COOKIE)) &&
1005             S_ISREG(inode->i_mode)) {
1006                 if (nfs_have_delegated_mtime(inode))
1007                         filemap_fdatawrite(inode->i_mapping);
1008                 else
1009                         filemap_write_and_wait(inode->i_mapping);
1010         }


The packets look as follows:

No.     Time           Rpc Time   Source                Destination           Protocol Info
     16 21.114244808              10.1.0.71       10.1.0.34         NFS      V4 Call (Reply In 18) OPEN DH: 0xf359c2c5/f42.iso | LAYOUTGET
     18 21.158017758   0.043772950 10.1.0.34         10.1.0.71       NFS      V4 Reply (Call In 16) OPEN StateID: 0x7ca4 | LAYOUTGET
     29 42.354912088              10.1.0.71       10.1.0.34         NFS      V4 Call (Reply In 30) LAYOUTCOMMIT
     30 42.357843007   0.002930919 10.1.0.34         10.1.0.71       NFS      V4 Reply (Call In 29) LAYOUTCOMMIT
     32 42.357993362              10.1.0.71       10.1.0.34         NFS      V4 Call (Reply In 35) GETATTR FH: 0x7d6441d9
     33 42.358016100              10.1.0.71       10.1.0.34         NFS      V4 Call (Reply In 36) LAYOUTRETURN
     35 42.359324091   0.001330729 10.1.0.34         10.1.0.71       NFS      V4 Reply (Call In 32) GETATTR
     36 42.378923839   0.020907739 10.1.0.34         10.1.0.71       NFS      V4 Reply (Call In 33) LAYOUTRETURN
     38 42.379133795              10.1.0.71       10.1.0.34         NFS      V4 Call (Reply In 39) CLOSE StateID: 0x7ca4
     39 42.380213999   0.001080204 10.1.0.34         10.1.0.71       NFS      V4 Reply (Call In 38) CLOSE 

So, GETATTR is sent after LAYOUTCOMMIT.
This behavior is observed with 6.17 and RHEL kernels.


[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 2826 bytes --]

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: Blocking stat calls during (p)NFS write
  2025-10-09 10:57 Blocking stat calls during (p)NFS write Mkrtchyan, Tigran
@ 2025-10-09 11:48 ` Jeff Layton
  0 siblings, 0 replies; 2+ messages in thread
From: Jeff Layton @ 2025-10-09 11:48 UTC (permalink / raw)
  To: Mkrtchyan, Tigran, linux-nfs

On Thu, 2025-10-09 at 12:57 +0200, Mkrtchyan, Tigran wrote:
> Dear NFS fellows,
> 
> We have noticed that when data is written into a large file, a stat call on that file blocks.
> This is quite simple to reproduce. Open two terminals, in one copy a 4GB file into NFS NFS-mounted
> directory, in another window run `stat /path/to/file`.
> 
> By looking at the perf output, I can see:
> 
> stat  301583 [007] 265242.393512:      sched:sched_wake_idle_without_ipi: cpu=2
>         ffffffffb5e0cfa0 call_function_single_prep_ipi+0x90 ([kernel.kallsyms])
>         ffffffffb5e0cfa0 call_function_single_prep_ipi+0x90 ([kernel.kallsyms])
>         ffffffffb5ef176b __smp_call_single_queue+0xdb ([kernel.kallsyms])
>         ffffffffb5ef1866 generic_exec_single+0x36 ([kernel.kallsyms])
>         ffffffffb5ef1bf2 smp_call_function_single_async+0x22 ([kernel.kallsyms])
>         ffffffffb5ec97f4 update_process_times+0xa4 ([kernel.kallsyms])
>         ffffffffb5ee279f tick_nohz_handler+0x8f ([kernel.kallsyms])
>         ffffffffb5eca7c0 __hrtimer_run_queues+0x110 ([kernel.kallsyms])
>         ffffffffb5ecb62c hrtimer_interrupt+0xfc ([kernel.kallsyms])
>         ffffffffb5d5deb5 __sysvec_apic_timer_interrupt+0x55 ([kernel.kallsyms])
>         ffffffffb6f6364c sysvec_apic_timer_interrupt+0x6c ([kernel.kallsyms])
>         ffffffffb5a0160a asm_sysvec_apic_timer_interrupt+0x1a ([kernel.kallsyms])
>         ffffffffb5e07b8f finish_task_switch.isra.0+0x9f ([kernel.kallsyms])
>         ffffffffb6f6ad91 __schedule+0x301 ([kernel.kallsyms])
>         ffffffffb6f6b277 schedule+0x27 ([kernel.kallsyms])
>         ffffffffb6f6b326 io_schedule+0x46 ([kernel.kallsyms])
>         ffffffffb608fddf folio_wait_bit+0xef ([kernel.kallsyms])
>         ffffffffb609c95e folio_wait_writeback+0x2e ([kernel.kallsyms])
>         ffffffffb608f0ff __filemap_fdatawait_range+0x7f ([kernel.kallsyms])
>         ffffffffb6091bc8 filemap_write_and_wait_range+0xc8 ([kernel.kallsyms])
>         ffffffffc1fc5cc7 nfs_getattr+0x567 ([kernel.kallsyms])
>         ffffffffb61eb3fe vfs_getattr_nosec+0xbe ([kernel.kallsyms])
>         ffffffffb61eb673 vfs_statx+0xa3 ([kernel.kallsyms])
>         ffffffffb61ec363 do_statx+0x63 ([kernel.kallsyms])
>         ffffffffb61ec5c0 __x64_sys_statx+0x90 ([kernel.kallsyms])
>         ffffffffb6f5e5ae do_syscall_64+0x7e ([kernel.kallsyms])
>         ffffffffb5a0012f entry_SYSCALL_64_after_hwframe+0x76 ([kernel.kallsyms])
>             7f17c3b1fb6e statx+0xe (/usr/lib64/libc.so.6)
>             55b6106f684f main+0x45f (/usr/bin/stat)
>             7f17c3a3a5b5 __libc_start_call_main+0x75 (/usr/lib64/libc.so.6)
>             7f17c3a3a668 __libc_start_main@@GLIBC_2.34+0x88 (/usr/lib64/libc.so.6)
>             55b6106f6bb5 _start+0x25 (/usr/bin/stat)
> 
> 
> So I assume that blocking comes from inode invalidation in inode.c#nfs_getattr call:
> 
> 1003         /* Flush out writes to the server in order to update c/mtime/version.  */
> 1004         if ((request_mask & (STATX_CTIME | STATX_MTIME | STATX_CHANGE_COOKIE)) &&
> 1005             S_ISREG(inode->i_mode)) {
> 1006                 if (nfs_have_delegated_mtime(inode))
> 1007                         filemap_fdatawrite(inode->i_mapping);
> 1008                 else
> 1009                         filemap_write_and_wait(inode->i_mapping);
> 1010         }
> 

stat() requires updated mtime and ctime. The client must write any
dirty data back before it can report those accurately since the server
is authoritative for those attributes. If the server supports delegated
timestamps however, then you don't need to wait for writeback to
finish.

> 
> The packets look as follows:
> 
> No.     Time           Rpc Time   Source                Destination           Protocol Info
>      16 21.114244808              10.1.0.71       10.1.0.34         NFS      V4 Call (Reply In 18) OPEN DH: 0xf359c2c5/f42.iso | LAYOUTGET
>      18 21.158017758   0.043772950 10.1.0.34         10.1.0.71       NFS      V4 Reply (Call In 16) OPEN StateID: 0x7ca4 | LAYOUTGET
>      29 42.354912088              10.1.0.71       10.1.0.34         NFS      V4 Call (Reply In 30) LAYOUTCOMMIT
>      30 42.357843007   0.002930919 10.1.0.34         10.1.0.71       NFS      V4 Reply (Call In 29) LAYOUTCOMMIT
>      32 42.357993362              10.1.0.71       10.1.0.34         NFS      V4 Call (Reply In 35) GETATTR FH: 0x7d6441d9
>      33 42.358016100              10.1.0.71       10.1.0.34         NFS      V4 Call (Reply In 36) LAYOUTRETURN
>      35 42.359324091   0.001330729 10.1.0.34         10.1.0.71       NFS      V4 Reply (Call In 32) GETATTR
>      36 42.378923839   0.020907739 10.1.0.34         10.1.0.71       NFS      V4 Reply (Call In 33) LAYOUTRETURN
>      38 42.379133795              10.1.0.71       10.1.0.34         NFS      V4 Call (Reply In 39) CLOSE StateID: 0x7ca4
>      39 42.380213999   0.001080204 10.1.0.34         10.1.0.71       NFS      V4 Reply (Call In 38) CLOSE 
> 
> So, GETATTR is sent after LAYOUTCOMMIT.
> This behavior is observed with 6.17 and RHEL kernels.

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2025-10-09 11:48 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-10-09 10:57 Blocking stat calls during (p)NFS write Mkrtchyan, Tigran
2025-10-09 11:48 ` Jeff Layton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox