The Linux Kernel Mailing List
* [BUG] NFSv4.1 client hang in nfs4_drain_slot_tbl under concurrent workload against Windows NFS server
@ 2026-05-06  7:46 郭玲兴
  2026-05-06 13:28 ` Lionel Cons
  0 siblings, 1 reply; 4+ messages in thread
From: 郭玲兴 @ 2026-05-06  7:46 UTC (permalink / raw)
  To: linux-nfs, anna.schumaker, linux-kernel; +Cc: trond.myklebust


[-- Attachment #1.1: Type: text/plain, Size: 1393 bytes --]

Hi,


We encountered a reproducible NFSv4.1 client hang under a concurrent workload.


Environment:
- Two independent Linux clients (VMs)
- Both mount the same Windows NFS server (NFSv4.1)
- Kernel version: 6.1.78
- Mount options: vers=4.1,soft,proto=tcp,timeo=60,retrans=10
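For reference, the mount corresponds to an invocation of roughly this form (server, export, and mountpoint names here are hypothetical, not taken from the report; the option string is the one listed above):

```shell
# Hypothetical names: winserver:/share and /mnt/nfs are placeholders.
mount -t nfs -o vers=4.1,soft,proto=tcp,timeo=60,retrans=10 \
    winserver:/share /mnt/nfs
```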


Workload:
- Each client copies ~5GB files to the same NFS share
- Copy runs every 30 minutes
- After ~41 iterations (~20 hours), both clients hang simultaneously
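The workload amounts to something like the following loop on each client (a sketch; the source file and mountpoint paths are hypothetical, not taken from the report):

```shell
# Hypothetical reproduction loop: copy ~5GB to the shared NFS export every
# 30 minutes; both clients run this concurrently against the same share.
while true; do
    cp /srv/testdata/5gb.bin "/mnt/nfs/$(hostname)-5gb.bin"
    sleep 1800   # 30 minutes between iterations
done
```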


Symptoms:
- All NFS operations (ls, df, rsync, cp) hang in D state
- No NFS RPC traffic observed (tcpdump shows only TCP ACKs)
- nfsstat shows retrans=0


Sysrq stack shows:


NFS state manager thread:
  nfs4_run_state_manager
  nfs4_drain_slot_tbl
  wait_for_completion_interruptible


User processes:
  rpc_wait_bit_killable
  nfs4_proc_getattr
  nfs4_run_open_task


Both clients exhibit identical behavior at the same time.


This suggests that:
- The client enters NFSv4.1 state recovery
- nfs4_drain_slot_tbl waits for slots to drain
- At least one slot never completes
- All further RPCs are blocked
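The stacks are consistent with a circular wait, which can be mimicked in userspace (a loose shell analogy, not the actual kernel logic): the drain blocks on an event that the stuck slot holder never signals.

```shell
# Userspace analogy only: "draining" is a blocking read on a FIFO that the
# (stuck) slot holder never opens for writing. The 2-second timeout stands
# in for "forever" so the sketch terminates.
fifo=/tmp/slot_released.$$
mkfifo "$fifo"
if timeout 2 cat "$fifo" > /dev/null; then
    result=drained
else
    result=blocked    # nfs4_drain_slot_tbl would wait here indefinitely
fi
rm -f "$fifo"
echo "$result"
```

With no writer, the read never completes, and without the timeout the reader would block forever, just as the state manager does in the stack above.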


Questions:
1. Is it expected that nfs4_drain_slot_tbl can block indefinitely?
2. What conditions can cause a slot to never be released?
3. Should the client force session reset instead of waiting forever?
4. Is this a known interoperability issue with Windows NFSv4.1 server?


We can provide additional logs if needed.


Thanks.
[-- Attachment #1.2: Type: text/html, Size: 3987 bytes --]

[-- Attachment #2: stack.txt --]
[-- Type: text/plain, Size: 16431 bytes --]

root@storagerepo:~# echo 1 > /proc/sys/kernel/sysrq
root@storagerepo:~# echo t > /proc/sysrq-trigger
root@storagerepo:~# dmesg -T | tail -2000 > /tmp/sysrq_t_nfs_hang.txt
root@storagerepo:~# grep -E -i "ls|nfs|rpc|sunrpc|rpciod|state|slot|session|wait|stack" /tmp/sysrq_t_nfs_hang.txt
[Wed May  6 15:05:59 2026] task:containerd-shim state:S stack:0     pid:1057172 ppid:1      flags:0x00000002
[Wed May  6 15:05:59 2026]  futex_wait_queue+0x64/0x90
[Wed May  6 15:05:59 2026]  futex_wait+0x177/0x270
[Wed May  6 15:05:59 2026] task:containerd-shim state:S stack:0     pid:1057173 ppid:1      flags:0x00000002
[Wed May  6 15:05:59 2026]  do_epoll_wait+0x620/0x780
[Wed May  6 15:05:59 2026]  __x64_sys_epoll_wait+0x6f/0x110
[Wed May  6 15:05:59 2026] task:containerd-shim state:S stack:0     pid:1057174 ppid:1      flags:0x00000002
[Wed May  6 15:05:59 2026]  futex_wait_queue+0x64/0x90
[Wed May  6 15:05:59 2026]  futex_wait+0x177/0x270
[Wed May  6 15:05:59 2026] task:containerd-shim state:S stack:0     pid:1057175 ppid:1      flags:0x00000002
[Wed May  6 15:05:59 2026]  futex_wait_queue+0x64/0x90
[Wed May  6 15:05:59 2026]  futex_wait+0x177/0x270
[Wed May  6 15:05:59 2026] task:containerd-shim state:S stack:0     pid:1057176 ppid:1      flags:0x00000002
[Wed May  6 15:05:59 2026]  futex_wait_queue+0x64/0x90
[Wed May  6 15:05:59 2026]  futex_wait+0x177/0x270
[Wed May  6 15:05:59 2026] task:containerd-shim state:S stack:0     pid:1057177 ppid:1      flags:0x00000002
[Wed May  6 15:05:59 2026]  do_epoll_wait+0x620/0x780
[Wed May  6 15:05:59 2026]  __x64_sys_epoll_wait+0x6f/0x110
[Wed May  6 15:05:59 2026] task:containerd-shim state:S stack:0     pid:1057178 ppid:1      flags:0x00000002
[Wed May  6 15:05:59 2026]  futex_wait_queue+0x64/0x90
[Wed May  6 15:05:59 2026]  futex_wait+0x177/0x270
[Wed May  6 15:05:59 2026] task:containerd-shim state:S stack:0     pid:1057179 ppid:1      flags:0x00000002
[Wed May  6 15:05:59 2026]  do_epoll_wait+0x620/0x780
[Wed May  6 15:05:59 2026]  do_compat_epoll_pwait.part.0+0x12/0x80
[Wed May  6 15:05:59 2026]  __x64_sys_epoll_pwait+0x96/0x140
[Wed May  6 15:05:59 2026] task:containerd-shim state:S stack:0     pid:1057180 ppid:1      flags:0x00000002
[Wed May  6 15:05:59 2026]  futex_wait_queue+0x64/0x90
[Wed May  6 15:05:59 2026]  futex_wait+0x177/0x270
[Wed May  6 15:05:59 2026] task:containerd-shim state:S stack:0     pid:1057197 ppid:1      flags:0x00000002
[Wed May  6 15:05:59 2026]  futex_wait_queue+0x64/0x90
[Wed May  6 15:05:59 2026]  futex_wait+0x177/0x270
[Wed May  6 15:05:59 2026] task:containerd-shim state:S stack:0     pid:1333826 ppid:1      flags:0x00000002
[Wed May  6 15:05:59 2026]  futex_wait_queue+0x64/0x90
[Wed May  6 15:05:59 2026]  futex_wait+0x177/0x270
[Wed May  6 15:05:59 2026] task:start_container state:S stack:0     pid:1057191 ppid:1057171 flags:0x00000002
[Wed May  6 15:05:59 2026]  do_wait+0x166/0x300
[Wed May  6 15:05:59 2026]  kernel_wait4+0xbd/0x160
[Wed May  6 15:05:59 2026]  __do_sys_wait4+0xa2/0xb0
[Wed May  6 15:05:59 2026]  __x64_sys_wait4+0x1c/0x30
[Wed May  6 15:05:59 2026] task:python3         state:S stack:0     pid:1057217 ppid:1057191 flags:0x00000002
[Wed May  6 15:05:59 2026]  ? __pollwait+0xe0/0xe0
[Wed May  6 15:05:59 2026]  ? __pollwait+0xe0/0xe0
[Wed May  6 15:05:59 2026]  ? __pollwait+0xe0/0xe0
[Wed May  6 15:05:59 2026]  ? __pollwait+0xe0/0xe0
[Wed May  6 15:05:59 2026] task:bash            state:D stack:0     pid:1057260 ppid:11873  flags:0x00000002
[Wed May  6 15:05:59 2026]  rpc_wait_bit_killable+0x11/0x70 [sunrpc]
[Wed May  6 15:05:59 2026]  __wait_on_bit+0x42/0x110
[Wed May  6 15:05:59 2026]  ? __bpf_trace_cache_event+0x10/0x10 [sunrpc]
[Wed May  6 15:05:59 2026]  out_of_line_wait_on_bit+0x8c/0xb0
[Wed May  6 15:05:59 2026]  rpc_wait_for_completion_task+0x23/0x30 [sunrpc]
[Wed May  6 15:05:59 2026]  nfs4_run_open_task+0x150/0x1e0 [nfsv4]
[Wed May  6 15:05:59 2026]  nfs4_do_open+0x2cf/0xcf0 [nfsv4]
[Wed May  6 15:05:59 2026]  ? alloc_nfs_open_context+0x2f/0x130 [nfs]
[Wed May  6 15:05:59 2026]  nfs4_atomic_open+0xf3/0x100 [nfsv4]
[Wed May  6 15:05:59 2026]  nfs_atomic_open+0x204/0x740 [nfs]
[Wed May  6 15:05:59 2026] task:rsync           state:S stack:0     pid:1057415 ppid:1057217 flags:0x00000002
[Wed May  6 15:05:59 2026]  ? __pollwait+0xe0/0xe0
[Wed May  6 15:05:59 2026] task:vsftpd          state:S stack:0     pid:1057416 ppid:1057217 flags:0x00000002
[Wed May  6 15:05:59 2026] task:172.50.0.120-ma state:S stack:0     pid:1065524 ppid:2      flags:0x00004000
[Wed May  6 15:05:59 2026]  wait_for_completion_interruptible+0x145/0x1b0
[Wed May  6 15:05:59 2026]  nfs4_drain_slot_tbl+0x4d/0x70 [nfsv4]
[Wed May  6 15:05:59 2026]  nfs4_run_state_manager+0x3ab/0xb50 [nfsv4]
[Wed May  6 15:05:59 2026]  ? nfs4_do_reclaim+0x970/0x970 [nfsv4]
[Wed May  6 15:05:59 2026] task:rsync           state:D stack:0     pid:1166689 ppid:1057415 flags:0x00000002
[Wed May  6 15:05:59 2026]  rpc_wait_bit_killable+0x11/0x70 [sunrpc]
[Wed May  6 15:05:59 2026]  __wait_on_bit+0x42/0x110
[Wed May  6 15:05:59 2026]  ? __bpf_trace_cache_event+0x10/0x10 [sunrpc]
[Wed May  6 15:05:59 2026]  out_of_line_wait_on_bit+0x8c/0xb0
[Wed May  6 15:05:59 2026]  __rpc_execute+0x137/0x4b0 [sunrpc]
[Wed May  6 15:05:59 2026]  ? rpc_new_task+0x172/0x1e0 [sunrpc]
[Wed May  6 15:05:59 2026]  rpc_execute+0xd2/0x100 [sunrpc]
[Wed May  6 15:05:59 2026]  rpc_run_task+0x12d/0x190 [sunrpc]
[Wed May  6 15:05:59 2026]  nfs4_do_call_sync+0x6b/0xa0 [nfsv4]
[Wed May  6 15:05:59 2026]  _nfs4_proc_getattr+0x13b/0x170 [nfsv4]
[Wed May  6 15:05:59 2026]  ? nfs_alloc_fattr_with_label+0x27/0xc0 [nfs]
[Wed May  6 15:05:59 2026]  nfs4_proc_getattr+0x6e/0x100 [nfsv4]
[Wed May  6 15:05:59 2026]  __nfs_revalidate_inode+0xa6/0x2b0 [nfs]
[Wed May  6 15:05:59 2026]  nfs_access_get_cached+0x13c/0x1d0 [nfs]
[Wed May  6 15:05:59 2026]  nfs_do_access+0x60/0x280 [nfs]
[Wed May  6 15:05:59 2026]  nfs_permission+0x99/0x190 [nfs]
[Wed May  6 15:05:59 2026] task:kworker/2:0     state:I stack:0     pid:1368373 ppid:2      flags:0x00004000
[Wed May  6 15:05:59 2026] task:sshd            state:S stack:0     pid:511119 ppid:1      flags:0x00000002
[Wed May  6 15:05:59 2026]  ? __pollwait+0xe0/0xe0
[Wed May  6 15:05:59 2026]  ? __pollwait+0xe0/0xe0
[Wed May  6 15:05:59 2026]  ? __pollwait+0xe0/0xe0
[Wed May  6 15:05:59 2026]  ? __pollwait+0xe0/0xe0
[Wed May  6 15:05:59 2026] task:sh              state:S stack:0     pid:511427 ppid:511119 flags:0x00000002
[Wed May  6 15:05:59 2026]  do_wait+0x166/0x300
[Wed May  6 15:05:59 2026]  kernel_wait4+0xbd/0x160
[Wed May  6 15:05:59 2026]  __do_sys_wait4+0xa2/0xb0
[Wed May  6 15:05:59 2026]  __x64_sys_wait4+0x1c/0x30
[Wed May  6 15:05:59 2026] task:kworker/1:2     state:I stack:0     pid:570066 ppid:2      flags:0x00004000
[Wed May  6 15:05:59 2026] task:sshd            state:S stack:0     pid:572471 ppid:1      flags:0x00000002
[Wed May  6 15:05:59 2026]  ? __pollwait+0xe0/0xe0
[Wed May  6 15:05:59 2026]  ? __pollwait+0xe0/0xe0
[Wed May  6 15:05:59 2026]  ? __pollwait+0xe0/0xe0
[Wed May  6 15:05:59 2026]  ? __pollwait+0xe0/0xe0
[Wed May  6 15:05:59 2026] task:sh              state:S stack:0     pid:572653 ppid:572471 flags:0x00000002
[Wed May  6 15:05:59 2026]  do_wait+0x166/0x300
[Wed May  6 15:05:59 2026]  kernel_wait4+0xbd/0x160
[Wed May  6 15:05:59 2026]  __do_sys_wait4+0xa2/0xb0
[Wed May  6 15:05:59 2026]  __x64_sys_wait4+0x1c/0x30
[Wed May  6 15:05:59 2026] task:ls              state:D stack:0     pid:572654 ppid:572653 flags:0x00000002
[Wed May  6 15:05:59 2026]  rpc_wait_bit_killable+0x11/0x70 [sunrpc]
[Wed May  6 15:05:59 2026]  __wait_on_bit+0x42/0x110
[Wed May  6 15:05:59 2026]  ? __bpf_trace_cache_event+0x10/0x10 [sunrpc]
[Wed May  6 15:05:59 2026]  out_of_line_wait_on_bit+0x8c/0xb0
[Wed May  6 15:05:59 2026]  __rpc_execute+0x137/0x4b0 [sunrpc]
[Wed May  6 15:05:59 2026]  ? rpc_new_task+0x172/0x1e0 [sunrpc]
[Wed May  6 15:05:59 2026]  rpc_execute+0xd2/0x100 [sunrpc]
[Wed May  6 15:05:59 2026]  rpc_run_task+0x12d/0x190 [sunrpc]
[Wed May  6 15:05:59 2026]  nfs4_do_call_sync+0x6b/0xa0 [nfsv4]
[Wed May  6 15:05:59 2026]  _nfs4_proc_getattr+0x13b/0x170 [nfsv4]
[Wed May  6 15:05:59 2026]  ? nfs_alloc_fattr_with_label+0x27/0xc0 [nfs]
[Wed May  6 15:05:59 2026]  nfs4_proc_getattr+0x6e/0x100 [nfsv4]
[Wed May  6 15:05:59 2026]  __nfs_revalidate_inode+0xa6/0x2b0 [nfs]
[Wed May  6 15:05:59 2026]  nfs_getattr+0x2f4/0x470 [nfs]
[Wed May  6 15:05:59 2026] task:tail            state:S stack:0     pid:572655 ppid:572653 flags:0x00000002
[Wed May  6 15:05:59 2026] task:sshd            state:S stack:0     pid:857023 ppid:1      flags:0x00000002
[Wed May  6 15:05:59 2026]  ? __pollwait+0xe0/0xe0
[Wed May  6 15:05:59 2026]  ? __pollwait+0xe0/0xe0
[Wed May  6 15:05:59 2026]  ? __pollwait+0xe0/0xe0
[Wed May  6 15:05:59 2026]  ? __pollwait+0xe0/0xe0
[Wed May  6 15:05:59 2026] task:sh              state:S stack:0     pid:857335 ppid:857023 flags:0x00000002
[Wed May  6 15:05:59 2026]  do_wait+0x166/0x300
[Wed May  6 15:05:59 2026]  kernel_wait4+0xbd/0x160
[Wed May  6 15:05:59 2026]  __do_sys_wait4+0xa2/0xb0
[Wed May  6 15:05:59 2026]  __x64_sys_wait4+0x1c/0x30
[Wed May  6 15:05:59 2026] task:kworker/1:1     state:I stack:0     pid:971568 ppid:2      flags:0x00004000
[Wed May  6 15:05:59 2026] task:kworker/2:2     state:I stack:0     pid:973219 ppid:2      flags:0x00004000
[Wed May  6 15:05:59 2026] task:kworker/u8:1    state:I stack:0     pid:2250065 ppid:2      flags:0x00004000
[Wed May  6 15:05:59 2026] task:kworker/3:0     state:I stack:0     pid:2330645 ppid:2      flags:0x00004000
[Wed May  6 15:05:59 2026] task:kworker/u8:2    state:I stack:0     pid:2503821 ppid:2      flags:0x00004000
[Wed May  6 15:05:59 2026] Workqueue:  0x0 (rpciod)
[Wed May  6 15:05:59 2026] task:kworker/0:0     state:I stack:0     pid:2559575 ppid:2      flags:0x00004000
[Wed May  6 15:05:59 2026] task:kworker/0:1     state:I stack:0     pid:2592413 ppid:2      flags:0x00004000
[Wed May  6 15:05:59 2026] task:kworker/u8:0    state:I stack:0     pid:2597836 ppid:2      flags:0x00004000
[Wed May  6 15:05:59 2026] task:sshd            state:S stack:0     pid:2603312 ppid:1      flags:0x00000002
[Wed May  6 15:05:59 2026]  ? __pollwait+0xe0/0xe0
[Wed May  6 15:05:59 2026]  ? __pollwait+0xe0/0xe0
[Wed May  6 15:05:59 2026]  ? __pollwait+0xe0/0xe0
[Wed May  6 15:05:59 2026]  ? __pollwait+0xe0/0xe0
[Wed May  6 15:05:59 2026] task:sh              state:R  running task     stack:0     pid:2603620 ppid:2603312 flags:0x00004002
[Wed May  6 15:05:59 2026]  show_state_filter+0x5e/0x100
[Wed May  6 15:05:59 2026]  sysrq_handle_showstate+0x10/0x20
[Wed May  6 15:05:59 2026] task:tcpdump         state:S stack:0     pid:2610467 ppid:857335 flags:0x00000002
[Wed May  6 15:05:59 2026]  ? __pollwait+0xe0/0xe0
[Wed May  6 15:05:59 2026]  ? __pollwait+0xe0/0xe0
[Wed May  6 15:05:59 2026] task:df              state:D stack:0     pid:2610988 ppid:511427 flags:0x00000002
[Wed May  6 15:05:59 2026]  rpc_wait_bit_killable+0x11/0x70 [sunrpc]
[Wed May  6 15:05:59 2026]  __wait_on_bit+0x42/0x110
[Wed May  6 15:05:59 2026]  ? __bpf_trace_cache_event+0x10/0x10 [sunrpc]
[Wed May  6 15:05:59 2026]  out_of_line_wait_on_bit+0x8c/0xb0
[Wed May  6 15:05:59 2026]  __rpc_execute+0x137/0x4b0 [sunrpc]
[Wed May  6 15:05:59 2026]  ? rpc_new_task+0x172/0x1e0 [sunrpc]
[Wed May  6 15:05:59 2026]  rpc_execute+0xd2/0x100 [sunrpc]
[Wed May  6 15:05:59 2026]  rpc_run_task+0x12d/0x190 [sunrpc]
[Wed May  6 15:05:59 2026]  nfs4_do_call_sync+0x6b/0xa0 [nfsv4]
[Wed May  6 15:05:59 2026]  _nfs4_proc_getattr+0x13b/0x170 [nfsv4]
[Wed May  6 15:05:59 2026]  nfs4_proc_getattr+0x6e/0x100 [nfsv4]
[Wed May  6 15:05:59 2026]  __nfs_revalidate_inode+0xa6/0x2b0 [nfs]
[Wed May  6 15:05:59 2026]  nfs_getattr+0x2f4/0x470 [nfs]
[Wed May  6 15:05:59 2026] task:kworker/0:2     state:I stack:0     pid:2625351 ppid:2      flags:0x00004000
[Wed May  6 15:05:59 2026] task:sleep           state:S stack:0     pid:2633473 ppid:11873  flags:0x00000002
[Wed May  6 15:05:59 2026] task:sleep           state:S stack:0     pid:2633632 ppid:11552  flags:0x00000002
[Wed May  6 15:05:59 2026] task:sleep           state:S stack:0     pid:2633633 ppid:11683  flags:0x00000002
[Wed May  6 15:05:59 2026] task:sleep           state:S stack:0     pid:2633634 ppid:11682  flags:0x00000002
[Wed May  6 15:05:59 2026] task:sleep           state:S stack:0     pid:2633690 ppid:11608  flags:0x00000002
[Wed May  6 15:05:59 2026] cfs_rq[0]:/user.slice/user-0.slice/session-c47.scope
[Wed May  6 15:05:59 2026]  S            task   PID         tree-key  switches  prio     wait-time             sum-exec        sum-sleep
[Wed May  6 15:05:59 2026]  S            sshd 572471        40.854415      2749   120         0.000000       187.624459         0.000000         0.000000 /user.slice/user-0.slice/session-c17.scope
[Wed May  6 15:05:59 2026]  S              sh 572653         9.976400         1   120         0.000000         2.024990         0.000000         0.000000 /user.slice/user-0.slice/session-c17.scope
[Wed May  6 15:05:59 2026]  S            sshd 857023        84.311169      6127   120         0.000000       315.874935         0.000000         0.000000 /user.slice/user-0.slice/session-c32.scope
[Wed May  6 15:05:59 2026]  S              sh 857335        74.620879       146   120         0.000000        26.479416         0.000000         0.000000 /user.slice/user-0.slice/session-c32.scope
[Wed May  6 15:05:59 2026]  S            sshd 2603312        43.116931      2279   120         0.000000       127.678161         0.000000         0.000000 /user.slice/user-0.slice/session-c47.scope
[Wed May  6 15:05:59 2026] >R              sh 2603620        45.102420       102   120         0.000000        49.337532         0.000000         0.000000 /user.slice/user-0.slice/session-c47.scope
[Wed May  6 15:05:59 2026]  D              df 2610988       112.369425         2   120         0.000000         0.976203         0.000000         0.000000 /user.slice/user-0.slice/session-c5.scope
[Wed May  6 15:05:59 2026]  S            task   PID         tree-key  switches  prio     wait-time             sum-exec        sum-sleep
[Wed May  6 15:05:59 2026]  I          nfsiod 11517     14249.162010         2   100         0.000000         0.014478         0.000000         0.000000 /
[Wed May  6 15:05:59 2026]  S  NFSv4 callback 1057043  10301690.367917         2   120         0.000000         0.010939         0.000000         0.000000 /
[Wed May  6 15:05:59 2026]  S            task   PID         tree-key  switches  prio     wait-time             sum-exec        sum-sleep
[Wed May  6 15:05:59 2026]  S            sshd  9880      1397.825913     69753   120         0.000000      4095.507095         0.000000         0.000000 /user.slice/user-0.slice/session-c2.scope
[Wed May  6 15:05:59 2026]  S            sshd 511119       188.588500      7090   120         0.000000       389.997546         0.000000         0.000000 /user.slice/user-0.slice/session-c5.scope
[Wed May  6 15:05:59 2026]  S              sh 511427       179.750924       213   120         0.000000        29.271168         0.000000         0.000000 /user.slice/user-0.slice/session-c5.scope
[Wed May  6 15:05:59 2026]  D              ls 572654         8.821982         1   120         0.000000         0.870567         0.000000         0.000000 /user.slice/user-0.slice/session-c17.scope
[Wed May  6 15:05:59 2026] cfs_rq[3]:/user.slice/user-0.slice/session-c47.scope
[Wed May  6 15:05:59 2026]  S            task   PID         tree-key  switches  prio     wait-time             sum-exec        sum-sleep
[Wed May  6 15:05:59 2026]  I          rpciod   338      1744.571935         2   100         0.000000         0.018540         0.000000         0.000000 /
[Wed May  6 15:05:59 2026]  S            tail 572655         9.339705         1   120         0.000000         0.607847         0.000000         0.000000 /user.slice/user-0.slice/session-c17.scope
[Wed May  6 15:05:59 2026]  S         tcpdump 2610467       110.272032        11   120         0.000000         2.436573         0.000000         0.000000 /user.slice/user-0.slice/session-c32.scope
[Wed May  6 15:05:59 2026] Showing busy workqueues and worker pools:

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [BUG] NFSv4.1 client hang in nfs4_drain_slot_tbl under concurrent workload against Windows NFS server
  2026-05-06  7:46 [BUG] NFSv4.1 client hang in nfs4_drain_slot_tbl under concurrent workload against Windows NFS server 郭玲兴
@ 2026-05-06 13:28 ` Lionel Cons
  2026-05-07  0:50   ` 郭玲兴
  0 siblings, 1 reply; 4+ messages in thread
From: Lionel Cons @ 2026-05-06 13:28 UTC (permalink / raw)
  To: 郭玲兴, linux-nfs, linux-kernel

On Wed, 6 May 2026 at 09:49, 郭玲兴 <guolingxing@supcon.com> wrote:
>
> Hi,
>
>
> We encountered a reproducible NFSv4.1 client hang issue under concurrent workload.
>
>
> Environment:
> - Two independent Linux clients (VMs)
> - Both mount the same Windows NFS server (NFSv4.1)
> - Kernel version: 6.1.78
> - Mount options: vers=4.1,soft,proto=tcp,timeo=60,retrans=10

Which version of Windows Server do you use, e.g. what does the "ver"
command in cmd.exe output? How did you set up the user accounts, and
which authentication flavor (AUTH_SYS, GSS, ...) do you use?
Which CPU architecture do you use? How much memory do you have on the
Linux NFS client?

Lionel

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Re: [BUG] NFSv4.1 client hang in nfs4_drain_slot_tbl under concurrent workload against Windows NFS server
  2026-05-06 13:28 ` Lionel Cons
@ 2026-05-07  0:50   ` 郭玲兴
  2026-05-07  1:22     ` 郭玲兴
  0 siblings, 1 reply; 4+ messages in thread
From: 郭玲兴 @ 2026-05-07  0:50 UTC (permalink / raw)
  To: Lionel Cons; +Cc: linux-nfs, linux-kernel

Hi Lionel,

Thanks for your response.

Here are the details:

1. Windows Server version:
Microsoft Windows Server 2022
Version 10.0.20348.587

2. User accounts:
No mapping mechanism is configured.
No AD, LDAP, or passwd mapping is used.

Unmapped users are handled by the default "Everyone" account.

3. Authentication:
sec=sys (AUTH_SYS), as reported by nfsstat -m

4. CPU architecture:
- Linux clients: x86_64
- Windows server: x86_64 (64-bit OS)

5. Memory:
Each Linux client VM has 16GB RAM

Thanks.


> -----Original Message-----
> From: "Lionel Cons" <lionelcons1972@gmail.com>
> Sent: 2026-05-06 21:28:33 (Wednesday)
> To: 郭玲兴 <guolingxing@supcon.com>, linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org
> Subject: Re: [BUG] NFSv4.1 client hang in nfs4_drain_slot_tbl under concurrent workload against Windows NFS server
> 
> On Wed, 6 May 2026 at 09:49, 郭玲兴 <guolingxing@supcon.com> wrote:
> >
> > Hi,
> >
> >
> > We encountered a reproducible NFSv4.1 client hang issue under concurrent workload.
> >
> >
> > Environment:
> > - Two independent Linux clients (VMs)
> > - Both mount the same Windows NFS server (NFSv4.1)
> > - Kernel version: 6.1.78
> > - Mount options: vers=4.1,soft,proto=tcp,timeo=60,retrans=10
> 
> Which version of WindowsServer do you use, e.g what does the "ver"
> command in cmd.exe output? How did you set up the user accounts, and
> which authentication (AUTH_SYS, GSS, ...) do you use?
> Which CPU architecture do you use? How much memory do you have on the
> Linux NFS client?
> 
> Lionel
^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Re: Re: [BUG] NFSv4.1 client hang in nfs4_drain_slot_tbl under concurrent workload against Windows NFS server
  2026-05-07  0:50   ` 郭玲兴
@ 2026-05-07  1:22     ` 郭玲兴
  0 siblings, 0 replies; 4+ messages in thread
From: 郭玲兴 @ 2026-05-07  1:22 UTC (permalink / raw)
  To: Lionel Cons; +Cc: linux-nfs, linux-kernel

Hi Lionel,

Thanks for your response.

Here are the details you requested:

1. Windows Server version:
Microsoft Windows Server 2022
Version 10.0.20348.587

2. User accounts:
No mapping mechanism is configured.
No AD, LDAP, or passwd mapping is used.

Unmapped users are handled by the default "Everyone" account.

3. Authentication:
sec=sys (AUTH_SYS), as reported by nfsstat -m

4. CPU architecture:
- Linux clients: x86_64
- Windows server: x86_64 (64-bit OS)

5. Memory:
Each Linux client VM has 16GB RAM

---

Additional observations from two independent clients:

Client A:
age: 498061
lease_time: 120
lease_expired: 497941

Client B:
age: 69598
lease_time: 120
lease_expired: 69478

In both cases, lease_expired equals (age - lease_time) exactly, which
suggests that the lease expired shortly after mount and has not
been successfully renewed since.
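The relation can be checked with shell arithmetic (numbers copied from the values above):

```shell
# age - lease_time should reproduce lease_expired for both clients.
echo "client A: $((498061 - 120))"   # 497941
echo "client B: $((69598 - 120))"    # 69478
```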

At the same time:

- Both clients hang simultaneously under concurrent workload
- Clients are stuck in nfs4_drain_slot_tbl
- No NFS RPC traffic is observed at hang time (only TCP ACKs)
- nfsstat shows retrans=0
- On the Windows server side, the NFS session state is reported as "Initialized"

We are currently tracing the RPC lifecycle to identify which RPC does not complete.
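As a sketch of that tracing (a suggestion, assuming nfs-utils' rpcdebug is available on the client; requires root, and the log path is a placeholder):

```shell
# Turn on verbose sunrpc and NFS client logging; messages go to the kernel log.
rpcdebug -m rpc -s all
rpcdebug -m nfs -s all

# Capture kernel messages while reproducing the hang.
dmesg -wT > /tmp/rpc_trace.log &

# ... after the hang occurs, switch the logging off again.
rpcdebug -m rpc -c all
rpcdebug -m nfs -c all
```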

Please let us know if further information would be helpful.

Thanks.


> -----Original Message-----
> From: 郭玲兴 <guolingxing@supcon.com>
> Sent: 2026-05-07 08:50:23 (Thursday)
> To: "Lionel Cons" <lionelcons1972@gmail.com>
> Cc: linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org
> Subject: Re: Re: [BUG] NFSv4.1 client hang in nfs4_drain_slot_tbl under concurrent workload against Windows NFS server
> 
> Hi Lionel,
> 
> Thanks for your response.
> 
> Here are the details:
> 
> 1. Windows Server version:
> Microsoft Windows Server 2022
> Version 10.0.20348.587
> 
> 2. User accounts:
> No mapping mechanism is configured.
> No AD, LDAP, or passwd mapping is used.
> 
> Unmapped users are handled by the default "Everyone" account.
> 
> 3. Authentication:
> sec=sys (AUTH_SYS), as reported by nfsstat -m
> 
> 4. CPU architecture:
> - Linux clients: x86_64
> - Windows server: x86_64 (64-bit OS)
> 
> 5. Memory:
> Each Linux client VM has 16GB RAM
> 
> Thanks.
> 
> 
> > -----Original Message-----
> > From: "Lionel Cons" <lionelcons1972@gmail.com>
> > Sent: 2026-05-06 21:28:33 (Wednesday)
> > To: 郭玲兴 <guolingxing@supcon.com>, linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org
> > Subject: Re: [BUG] NFSv4.1 client hang in nfs4_drain_slot_tbl under concurrent workload against Windows NFS server
> > 
> > On Wed, 6 May 2026 at 09:49, 郭玲兴 <guolingxing@supcon.com> wrote:
> > >
> > > Hi,
> > >
> > >
> > > We encountered a reproducible NFSv4.1 client hang issue under concurrent workload.
> > >
> > >
> > > Environment:
> > > - Two independent Linux clients (VMs)
> > > - Both mount the same Windows NFS server (NFSv4.1)
> > > - Kernel version: 6.1.78
> > > - Mount options: vers=4.1,soft,proto=tcp,timeo=60,retrans=10
> > 
> > Which version of WindowsServer do you use, e.g what does the "ver"
> > command in cmd.exe output? How did you set up the user accounts, and
> > which authentication (AUTH_SYS, GSS, ...) do you use?
> > Which CPU architecture do you use? How much memory do you have on the
> > Linux NFS client?
> > 
> > Lionel
^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2026-05-07  1:22 UTC | newest]

Thread overview: 4+ messages
2026-05-06  7:46 [BUG] NFSv4.1 client hang in nfs4_drain_slot_tbl under concurrent workload against Windows NFS server 郭玲兴
2026-05-06 13:28 ` Lionel Cons
2026-05-07  0:50   ` 郭玲兴
2026-05-07  1:22     ` 郭玲兴
