* [BUG] NFSv4.1 client hang in nfs4_drain_slot_tbl under concurrent workload against Windows NFS server
From: 郭玲兴 @ 2026-05-06 7:46 UTC (permalink / raw)
To: linux-nfs, anna.schumaker, linux-kernel; +Cc: trond.myklebust
Hi,
We encountered a reproducible NFSv4.1 client hang issue under concurrent workload.
Environment:
- Two independent Linux clients (VMs)
- Both mount the same Windows NFS server (NFSv4.1)
- Kernel version: 6.1.78
- Mount options: vers=4.1,soft,proto=tcp,timeo=60,retrans=10
Workload:
- Each client copies ~5GB files to the same NFS share
- Copy runs every 30 minutes
- After ~41 iterations (~20 hours), both clients hang simultaneously
Symptoms:
- All NFS operations (ls, df, rsync, cp) hang in D state
- No NFS RPC traffic observed (tcpdump shows only TCP ACK)
- nfsstat shows retrans=0
SysRq-t stack traces (see attached stack.txt) show:
NFS state manager thread:
nfs4_run_state_manager
nfs4_drain_slot_tbl
wait_for_completion_interruptible
User processes:
rpc_wait_bit_killable
nfs4_proc_getattr
nfs4_run_open_task
Both clients exhibit identical behavior at the same time.
This suggests the following sequence (see the sketch below):
- The client enters NFSv4.1 state recovery
- nfs4_drain_slot_tbl waits for all outstanding slots to drain
- At least one slot is never released
- All further RPCs are blocked behind the draining slot table
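For reference, here is our reading of the wait side in
fs/nfs/nfs4session.c on 6.1 (a paraphrased, simplified sketch, not a
verbatim copy of the kernel source):

    /* Sketch of nfs4_drain_slot_tbl(): the state manager marks the
     * slot table as draining, then waits, with no timeout, until the
     * last outstanding slot has been freed. */
    static int nfs4_drain_slot_tbl(struct nfs4_slot_table *tbl)
    {
            set_bit(NFS4_SLOT_TBL_DRAINING, &tbl->slot_tbl_state);
            spin_lock(&tbl->slot_tbl_lock);
            if (tbl->highest_used_slotid != NFS4_NO_SLOT) {
                    /* At least one slot is still in use: block until
                     * the slot-free path signals tbl->complete. */
                    reinit_completion(&tbl->complete);
                    spin_unlock(&tbl->slot_tbl_lock);
                    return wait_for_completion_interruptible(&tbl->complete);
            }
            spin_unlock(&tbl->slot_tbl_lock);
            return 0;
    }

If a single slot is never freed, this would account for the
wait_for_completion_interruptible frame in the state manager stack
above.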
Questions:
1. Is it expected that nfs4_drain_slot_tbl can block indefinitely?
2. What conditions can cause a slot to never be released? (see the sketch below)
3. Should the client force a session reset instead of waiting forever?
4. Is this a known interoperability issue with the Windows NFSv4.1 server?
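Regarding question 2: as far as we can tell, the completion is
signalled only from the slot-free path. A paraphrased sketch of
nfs4_free_slot() and nfs4_slot_tbl_drain_complete() as we read them in
fs/nfs/nfs4session.c on 6.1 (simplified, locking elided):

    /* The drain completion fires only once the table is empty. */
    static void nfs4_slot_tbl_drain_complete(struct nfs4_slot_table *tbl)
    {
            if (nfs4_slot_tbl_draining(tbl) &&
                tbl->highest_used_slotid == NFS4_NO_SLOT)
                    complete(&tbl->complete);
    }

    /* Freeing a slot lowers highest_used_slotid; freeing the last one
     * sets it to NFS4_NO_SLOT and wakes the draining state manager. */
    void nfs4_free_slot(struct nfs4_slot_table *tbl, struct nfs4_slot *slot)
    {
            u32 slotid = slot->slot_nr;

            __clear_bit(slotid, tbl->used_slots);
            if (slotid == tbl->highest_used_slotid) {
                    u32 new_max = find_last_bit(tbl->used_slots, slotid);

                    tbl->highest_used_slotid = (new_max < slotid) ?
                            new_max : NFS4_NO_SLOT;
            }
            nfs4_slot_tbl_drain_complete(tbl);
    }

So an RPC that holds a slot but never reaches its done/release path,
for example one waiting for a reply the server will never send, keeps
highest_used_slotid set and leaves the state manager blocked
indefinitely.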
We can provide additional logs if needed.
Thanks.
[-- Attachment #2: stack.txt --]
[-- Type: text/plain, Size: 16431 bytes --]
root@storagerepo:~# echo 1 > /proc/sys/kernel/sysrq
root@storagerepo:~# echo t > /proc/sysrq-trigger
root@storagerepo:~# dmesg -T | tail -2000 > /tmp/sysrq_t_nfs_hang.txt
root@storagerepo:~# grep -E -i "ls|nfs|rpc|sunrpc|rpciod|state|slot|session|wait|stack" /tmp/sysrq_t_nfs_hang.txt
[Wed May 6 15:05:59 2026] task:containerd-shim state:S stack:0 pid:1057172 ppid:1 flags:0x00000002
[Wed May 6 15:05:59 2026] futex_wait_queue+0x64/0x90
[Wed May 6 15:05:59 2026] futex_wait+0x177/0x270
[Wed May 6 15:05:59 2026] task:containerd-shim state:S stack:0 pid:1057173 ppid:1 flags:0x00000002
[Wed May 6 15:05:59 2026] do_epoll_wait+0x620/0x780
[Wed May 6 15:05:59 2026] __x64_sys_epoll_wait+0x6f/0x110
[Wed May 6 15:05:59 2026] task:containerd-shim state:S stack:0 pid:1057174 ppid:1 flags:0x00000002
[Wed May 6 15:05:59 2026] futex_wait_queue+0x64/0x90
[Wed May 6 15:05:59 2026] futex_wait+0x177/0x270
[Wed May 6 15:05:59 2026] task:containerd-shim state:S stack:0 pid:1057175 ppid:1 flags:0x00000002
[Wed May 6 15:05:59 2026] futex_wait_queue+0x64/0x90
[Wed May 6 15:05:59 2026] futex_wait+0x177/0x270
[Wed May 6 15:05:59 2026] task:containerd-shim state:S stack:0 pid:1057176 ppid:1 flags:0x00000002
[Wed May 6 15:05:59 2026] futex_wait_queue+0x64/0x90
[Wed May 6 15:05:59 2026] futex_wait+0x177/0x270
[Wed May 6 15:05:59 2026] task:containerd-shim state:S stack:0 pid:1057177 ppid:1 flags:0x00000002
[Wed May 6 15:05:59 2026] do_epoll_wait+0x620/0x780
[Wed May 6 15:05:59 2026] __x64_sys_epoll_wait+0x6f/0x110
[Wed May 6 15:05:59 2026] task:containerd-shim state:S stack:0 pid:1057178 ppid:1 flags:0x00000002
[Wed May 6 15:05:59 2026] futex_wait_queue+0x64/0x90
[Wed May 6 15:05:59 2026] futex_wait+0x177/0x270
[Wed May 6 15:05:59 2026] task:containerd-shim state:S stack:0 pid:1057179 ppid:1 flags:0x00000002
[Wed May 6 15:05:59 2026] do_epoll_wait+0x620/0x780
[Wed May 6 15:05:59 2026] do_compat_epoll_pwait.part.0+0x12/0x80
[Wed May 6 15:05:59 2026] __x64_sys_epoll_pwait+0x96/0x140
[Wed May 6 15:05:59 2026] task:containerd-shim state:S stack:0 pid:1057180 ppid:1 flags:0x00000002
[Wed May 6 15:05:59 2026] futex_wait_queue+0x64/0x90
[Wed May 6 15:05:59 2026] futex_wait+0x177/0x270
[Wed May 6 15:05:59 2026] task:containerd-shim state:S stack:0 pid:1057197 ppid:1 flags:0x00000002
[Wed May 6 15:05:59 2026] futex_wait_queue+0x64/0x90
[Wed May 6 15:05:59 2026] futex_wait+0x177/0x270
[Wed May 6 15:05:59 2026] task:containerd-shim state:S stack:0 pid:1333826 ppid:1 flags:0x00000002
[Wed May 6 15:05:59 2026] futex_wait_queue+0x64/0x90
[Wed May 6 15:05:59 2026] futex_wait+0x177/0x270
[Wed May 6 15:05:59 2026] task:start_container state:S stack:0 pid:1057191 ppid:1057171 flags:0x00000002
[Wed May 6 15:05:59 2026] do_wait+0x166/0x300
[Wed May 6 15:05:59 2026] kernel_wait4+0xbd/0x160
[Wed May 6 15:05:59 2026] __do_sys_wait4+0xa2/0xb0
[Wed May 6 15:05:59 2026] __x64_sys_wait4+0x1c/0x30
[Wed May 6 15:05:59 2026] task:python3 state:S stack:0 pid:1057217 ppid:1057191 flags:0x00000002
[Wed May 6 15:05:59 2026] ? __pollwait+0xe0/0xe0
[Wed May 6 15:05:59 2026] ? __pollwait+0xe0/0xe0
[Wed May 6 15:05:59 2026] ? __pollwait+0xe0/0xe0
[Wed May 6 15:05:59 2026] ? __pollwait+0xe0/0xe0
[Wed May 6 15:05:59 2026] task:bash state:D stack:0 pid:1057260 ppid:11873 flags:0x00000002
[Wed May 6 15:05:59 2026] rpc_wait_bit_killable+0x11/0x70 [sunrpc]
[Wed May 6 15:05:59 2026] __wait_on_bit+0x42/0x110
[Wed May 6 15:05:59 2026] ? __bpf_trace_cache_event+0x10/0x10 [sunrpc]
[Wed May 6 15:05:59 2026] out_of_line_wait_on_bit+0x8c/0xb0
[Wed May 6 15:05:59 2026] rpc_wait_for_completion_task+0x23/0x30 [sunrpc]
[Wed May 6 15:05:59 2026] nfs4_run_open_task+0x150/0x1e0 [nfsv4]
[Wed May 6 15:05:59 2026] nfs4_do_open+0x2cf/0xcf0 [nfsv4]
[Wed May 6 15:05:59 2026] ? alloc_nfs_open_context+0x2f/0x130 [nfs]
[Wed May 6 15:05:59 2026] nfs4_atomic_open+0xf3/0x100 [nfsv4]
[Wed May 6 15:05:59 2026] nfs_atomic_open+0x204/0x740 [nfs]
[Wed May 6 15:05:59 2026] task:rsync state:S stack:0 pid:1057415 ppid:1057217 flags:0x00000002
[Wed May 6 15:05:59 2026] ? __pollwait+0xe0/0xe0
[Wed May 6 15:05:59 2026] task:vsftpd state:S stack:0 pid:1057416 ppid:1057217 flags:0x00000002
[Wed May 6 15:05:59 2026] task:172.50.0.120-ma state:S stack:0 pid:1065524 ppid:2 flags:0x00004000
[Wed May 6 15:05:59 2026] wait_for_completion_interruptible+0x145/0x1b0
[Wed May 6 15:05:59 2026] nfs4_drain_slot_tbl+0x4d/0x70 [nfsv4]
[Wed May 6 15:05:59 2026] nfs4_run_state_manager+0x3ab/0xb50 [nfsv4]
[Wed May 6 15:05:59 2026] ? nfs4_do_reclaim+0x970/0x970 [nfsv4]
[Wed May 6 15:05:59 2026] task:rsync state:D stack:0 pid:1166689 ppid:1057415 flags:0x00000002
[Wed May 6 15:05:59 2026] rpc_wait_bit_killable+0x11/0x70 [sunrpc]
[Wed May 6 15:05:59 2026] __wait_on_bit+0x42/0x110
[Wed May 6 15:05:59 2026] ? __bpf_trace_cache_event+0x10/0x10 [sunrpc]
[Wed May 6 15:05:59 2026] out_of_line_wait_on_bit+0x8c/0xb0
[Wed May 6 15:05:59 2026] __rpc_execute+0x137/0x4b0 [sunrpc]
[Wed May 6 15:05:59 2026] ? rpc_new_task+0x172/0x1e0 [sunrpc]
[Wed May 6 15:05:59 2026] rpc_execute+0xd2/0x100 [sunrpc]
[Wed May 6 15:05:59 2026] rpc_run_task+0x12d/0x190 [sunrpc]
[Wed May 6 15:05:59 2026] nfs4_do_call_sync+0x6b/0xa0 [nfsv4]
[Wed May 6 15:05:59 2026] _nfs4_proc_getattr+0x13b/0x170 [nfsv4]
[Wed May 6 15:05:59 2026] ? nfs_alloc_fattr_with_label+0x27/0xc0 [nfs]
[Wed May 6 15:05:59 2026] nfs4_proc_getattr+0x6e/0x100 [nfsv4]
[Wed May 6 15:05:59 2026] __nfs_revalidate_inode+0xa6/0x2b0 [nfs]
[Wed May 6 15:05:59 2026] nfs_access_get_cached+0x13c/0x1d0 [nfs]
[Wed May 6 15:05:59 2026] nfs_do_access+0x60/0x280 [nfs]
[Wed May 6 15:05:59 2026] nfs_permission+0x99/0x190 [nfs]
[Wed May 6 15:05:59 2026] task:kworker/2:0 state:I stack:0 pid:1368373 ppid:2 flags:0x00004000
[Wed May 6 15:05:59 2026] task:sshd state:S stack:0 pid:511119 ppid:1 flags:0x00000002
[Wed May 6 15:05:59 2026] ? __pollwait+0xe0/0xe0
[Wed May 6 15:05:59 2026] ? __pollwait+0xe0/0xe0
[Wed May 6 15:05:59 2026] ? __pollwait+0xe0/0xe0
[Wed May 6 15:05:59 2026] ? __pollwait+0xe0/0xe0
[Wed May 6 15:05:59 2026] task:sh state:S stack:0 pid:511427 ppid:511119 flags:0x00000002
[Wed May 6 15:05:59 2026] do_wait+0x166/0x300
[Wed May 6 15:05:59 2026] kernel_wait4+0xbd/0x160
[Wed May 6 15:05:59 2026] __do_sys_wait4+0xa2/0xb0
[Wed May 6 15:05:59 2026] __x64_sys_wait4+0x1c/0x30
[Wed May 6 15:05:59 2026] task:kworker/1:2 state:I stack:0 pid:570066 ppid:2 flags:0x00004000
[Wed May 6 15:05:59 2026] task:sshd state:S stack:0 pid:572471 ppid:1 flags:0x00000002
[Wed May 6 15:05:59 2026] ? __pollwait+0xe0/0xe0
[Wed May 6 15:05:59 2026] ? __pollwait+0xe0/0xe0
[Wed May 6 15:05:59 2026] ? __pollwait+0xe0/0xe0
[Wed May 6 15:05:59 2026] ? __pollwait+0xe0/0xe0
[Wed May 6 15:05:59 2026] task:sh state:S stack:0 pid:572653 ppid:572471 flags:0x00000002
[Wed May 6 15:05:59 2026] do_wait+0x166/0x300
[Wed May 6 15:05:59 2026] kernel_wait4+0xbd/0x160
[Wed May 6 15:05:59 2026] __do_sys_wait4+0xa2/0xb0
[Wed May 6 15:05:59 2026] __x64_sys_wait4+0x1c/0x30
[Wed May 6 15:05:59 2026] task:ls state:D stack:0 pid:572654 ppid:572653 flags:0x00000002
[Wed May 6 15:05:59 2026] rpc_wait_bit_killable+0x11/0x70 [sunrpc]
[Wed May 6 15:05:59 2026] __wait_on_bit+0x42/0x110
[Wed May 6 15:05:59 2026] ? __bpf_trace_cache_event+0x10/0x10 [sunrpc]
[Wed May 6 15:05:59 2026] out_of_line_wait_on_bit+0x8c/0xb0
[Wed May 6 15:05:59 2026] __rpc_execute+0x137/0x4b0 [sunrpc]
[Wed May 6 15:05:59 2026] ? rpc_new_task+0x172/0x1e0 [sunrpc]
[Wed May 6 15:05:59 2026] rpc_execute+0xd2/0x100 [sunrpc]
[Wed May 6 15:05:59 2026] rpc_run_task+0x12d/0x190 [sunrpc]
[Wed May 6 15:05:59 2026] nfs4_do_call_sync+0x6b/0xa0 [nfsv4]
[Wed May 6 15:05:59 2026] _nfs4_proc_getattr+0x13b/0x170 [nfsv4]
[Wed May 6 15:05:59 2026] ? nfs_alloc_fattr_with_label+0x27/0xc0 [nfs]
[Wed May 6 15:05:59 2026] nfs4_proc_getattr+0x6e/0x100 [nfsv4]
[Wed May 6 15:05:59 2026] __nfs_revalidate_inode+0xa6/0x2b0 [nfs]
[Wed May 6 15:05:59 2026] nfs_getattr+0x2f4/0x470 [nfs]
[Wed May 6 15:05:59 2026] task:tail state:S stack:0 pid:572655 ppid:572653 flags:0x00000002
[Wed May 6 15:05:59 2026] task:sshd state:S stack:0 pid:857023 ppid:1 flags:0x00000002
[Wed May 6 15:05:59 2026] ? __pollwait+0xe0/0xe0
[Wed May 6 15:05:59 2026] ? __pollwait+0xe0/0xe0
[Wed May 6 15:05:59 2026] ? __pollwait+0xe0/0xe0
[Wed May 6 15:05:59 2026] ? __pollwait+0xe0/0xe0
[Wed May 6 15:05:59 2026] task:sh state:S stack:0 pid:857335 ppid:857023 flags:0x00000002
[Wed May 6 15:05:59 2026] do_wait+0x166/0x300
[Wed May 6 15:05:59 2026] kernel_wait4+0xbd/0x160
[Wed May 6 15:05:59 2026] __do_sys_wait4+0xa2/0xb0
[Wed May 6 15:05:59 2026] __x64_sys_wait4+0x1c/0x30
[Wed May 6 15:05:59 2026] task:kworker/1:1 state:I stack:0 pid:971568 ppid:2 flags:0x00004000
[Wed May 6 15:05:59 2026] task:kworker/2:2 state:I stack:0 pid:973219 ppid:2 flags:0x00004000
[Wed May 6 15:05:59 2026] task:kworker/u8:1 state:I stack:0 pid:2250065 ppid:2 flags:0x00004000
[Wed May 6 15:05:59 2026] task:kworker/3:0 state:I stack:0 pid:2330645 ppid:2 flags:0x00004000
[Wed May 6 15:05:59 2026] task:kworker/u8:2 state:I stack:0 pid:2503821 ppid:2 flags:0x00004000
[Wed May 6 15:05:59 2026] Workqueue: 0x0 (rpciod)
[Wed May 6 15:05:59 2026] task:kworker/0:0 state:I stack:0 pid:2559575 ppid:2 flags:0x00004000
[Wed May 6 15:05:59 2026] task:kworker/0:1 state:I stack:0 pid:2592413 ppid:2 flags:0x00004000
[Wed May 6 15:05:59 2026] task:kworker/u8:0 state:I stack:0 pid:2597836 ppid:2 flags:0x00004000
[Wed May 6 15:05:59 2026] task:sshd state:S stack:0 pid:2603312 ppid:1 flags:0x00000002
[Wed May 6 15:05:59 2026] ? __pollwait+0xe0/0xe0
[Wed May 6 15:05:59 2026] ? __pollwait+0xe0/0xe0
[Wed May 6 15:05:59 2026] ? __pollwait+0xe0/0xe0
[Wed May 6 15:05:59 2026] ? __pollwait+0xe0/0xe0
[Wed May 6 15:05:59 2026] task:sh state:R running task stack:0 pid:2603620 ppid:2603312 flags:0x00004002
[Wed May 6 15:05:59 2026] show_state_filter+0x5e/0x100
[Wed May 6 15:05:59 2026] sysrq_handle_showstate+0x10/0x20
[Wed May 6 15:05:59 2026] task:tcpdump state:S stack:0 pid:2610467 ppid:857335 flags:0x00000002
[Wed May 6 15:05:59 2026] ? __pollwait+0xe0/0xe0
[Wed May 6 15:05:59 2026] ? __pollwait+0xe0/0xe0
[Wed May 6 15:05:59 2026] task:df state:D stack:0 pid:2610988 ppid:511427 flags:0x00000002
[Wed May 6 15:05:59 2026] rpc_wait_bit_killable+0x11/0x70 [sunrpc]
[Wed May 6 15:05:59 2026] __wait_on_bit+0x42/0x110
[Wed May 6 15:05:59 2026] ? __bpf_trace_cache_event+0x10/0x10 [sunrpc]
[Wed May 6 15:05:59 2026] out_of_line_wait_on_bit+0x8c/0xb0
[Wed May 6 15:05:59 2026] __rpc_execute+0x137/0x4b0 [sunrpc]
[Wed May 6 15:05:59 2026] ? rpc_new_task+0x172/0x1e0 [sunrpc]
[Wed May 6 15:05:59 2026] rpc_execute+0xd2/0x100 [sunrpc]
[Wed May 6 15:05:59 2026] rpc_run_task+0x12d/0x190 [sunrpc]
[Wed May 6 15:05:59 2026] nfs4_do_call_sync+0x6b/0xa0 [nfsv4]
[Wed May 6 15:05:59 2026] _nfs4_proc_getattr+0x13b/0x170 [nfsv4]
[Wed May 6 15:05:59 2026] nfs4_proc_getattr+0x6e/0x100 [nfsv4]
[Wed May 6 15:05:59 2026] __nfs_revalidate_inode+0xa6/0x2b0 [nfs]
[Wed May 6 15:05:59 2026] nfs_getattr+0x2f4/0x470 [nfs]
[Wed May 6 15:05:59 2026] task:kworker/0:2 state:I stack:0 pid:2625351 ppid:2 flags:0x00004000
[Wed May 6 15:05:59 2026] task:sleep state:S stack:0 pid:2633473 ppid:11873 flags:0x00000002
[Wed May 6 15:05:59 2026] task:sleep state:S stack:0 pid:2633632 ppid:11552 flags:0x00000002
[Wed May 6 15:05:59 2026] task:sleep state:S stack:0 pid:2633633 ppid:11683 flags:0x00000002
[Wed May 6 15:05:59 2026] task:sleep state:S stack:0 pid:2633634 ppid:11682 flags:0x00000002
[Wed May 6 15:05:59 2026] task:sleep state:S stack:0 pid:2633690 ppid:11608 flags:0x00000002
[Wed May 6 15:05:59 2026] cfs_rq[0]:/user.slice/user-0.slice/session-c47.scope
[Wed May 6 15:05:59 2026] S task PID tree-key switches prio wait-time sum-exec sum-sleep
[Wed May 6 15:05:59 2026] S sshd 572471 40.854415 2749 120 0.000000 187.624459 0.000000 0.000000 /user.slice/user-0.slice/session-c17.scope
[Wed May 6 15:05:59 2026] S sh 572653 9.976400 1 120 0.000000 2.024990 0.000000 0.000000 /user.slice/user-0.slice/session-c17.scope
[Wed May 6 15:05:59 2026] S sshd 857023 84.311169 6127 120 0.000000 315.874935 0.000000 0.000000 /user.slice/user-0.slice/session-c32.scope
[Wed May 6 15:05:59 2026] S sh 857335 74.620879 146 120 0.000000 26.479416 0.000000 0.000000 /user.slice/user-0.slice/session-c32.scope
[Wed May 6 15:05:59 2026] S sshd 2603312 43.116931 2279 120 0.000000 127.678161 0.000000 0.000000 /user.slice/user-0.slice/session-c47.scope
[Wed May 6 15:05:59 2026] >R sh 2603620 45.102420 102 120 0.000000 49.337532 0.000000 0.000000 /user.slice/user-0.slice/session-c47.scope
[Wed May 6 15:05:59 2026] D df 2610988 112.369425 2 120 0.000000 0.976203 0.000000 0.000000 /user.slice/user-0.slice/session-c5.scope
[Wed May 6 15:05:59 2026] S task PID tree-key switches prio wait-time sum-exec sum-sleep
[Wed May 6 15:05:59 2026] I nfsiod 11517 14249.162010 2 100 0.000000 0.014478 0.000000 0.000000 /
[Wed May 6 15:05:59 2026] S NFSv4 callback 1057043 10301690.367917 2 120 0.000000 0.010939 0.000000 0.000000 /
[Wed May 6 15:05:59 2026] S task PID tree-key switches prio wait-time sum-exec sum-sleep
[Wed May 6 15:05:59 2026] S sshd 9880 1397.825913 69753 120 0.000000 4095.507095 0.000000 0.000000 /user.slice/user-0.slice/session-c2.scope
[Wed May 6 15:05:59 2026] S sshd 511119 188.588500 7090 120 0.000000 389.997546 0.000000 0.000000 /user.slice/user-0.slice/session-c5.scope
[Wed May 6 15:05:59 2026] S sh 511427 179.750924 213 120 0.000000 29.271168 0.000000 0.000000 /user.slice/user-0.slice/session-c5.scope
[Wed May 6 15:05:59 2026] D ls 572654 8.821982 1 120 0.000000 0.870567 0.000000 0.000000 /user.slice/user-0.slice/session-c17.scope
[Wed May 6 15:05:59 2026] cfs_rq[3]:/user.slice/user-0.slice/session-c47.scope
[Wed May 6 15:05:59 2026] S task PID tree-key switches prio wait-time sum-exec sum-sleep
[Wed May 6 15:05:59 2026] I rpciod 338 1744.571935 2 100 0.000000 0.018540 0.000000 0.000000 /
[Wed May 6 15:05:59 2026] S tail 572655 9.339705 1 120 0.000000 0.607847 0.000000 0.000000 /user.slice/user-0.slice/session-c17.scope
[Wed May 6 15:05:59 2026] S tcpdump 2610467 110.272032 11 120 0.000000 2.436573 0.000000 0.000000 /user.slice/user-0.slice/session-c32.scope
[Wed May 6 15:05:59 2026] Showing busy workqueues and worker pools:
* Re: [BUG] NFSv4.1 client hang in nfs4_drain_slot_tbl under concurrent workload against Windows NFS server
From: Lionel Cons @ 2026-05-06 13:28 UTC (permalink / raw)
To: 郭玲兴, linux-nfs, linux-kernel
On Wed, 6 May 2026 at 09:49, 郭玲兴 <guolingxing@supcon.com> wrote:
>
> Hi,
>
>
> We encountered a reproducible NFSv4.1 client hang issue under concurrent workload.
>
>
> Environment:
> - Two independent Linux clients (VMs)
> - Both mount the same Windows NFS server (NFSv4.1)
> - Kernel version: 6.1.78
> - Mount options: vers=4.1,soft,proto=tcp,timeo=60,retrans=10
Which version of Windows Server do you use, e.g. what does the "ver"
command in cmd.exe output? How did you set up the user accounts, and
which authentication (AUTH_SYS, GSS, ...) do you use?
Which CPU architecture do you use? How much memory do you have on the
Linux NFS client?
Lionel
* Re: Re: [BUG] NFSv4.1 client hang in nfs4_drain_slot_tbl under concurrent workload against Windows NFS server
From: 郭玲兴 @ 2026-05-07 0:50 UTC (permalink / raw)
To: Lionel Cons; +Cc: linux-nfs, linux-kernel
Hi Lionel,
Thanks for your response.
Here are the details:
1. Windows Server version:
Microsoft Windows Server 2022
Version 10.0.20348.587
2. User accounts:
No mapping mechanism is configured.
No AD, LDAP, or passwd mapping is used.
Unmapped users are handled by the default "Everyone" account.
3. Authentication:
sec=sys (AUTH_SYS), as reported by nfsstat -m
4. CPU architecture:
- Linux clients: x86_64
- Windows server: x86_64 (64-bit OS)
5. Memory:
Each Linux client VM has 16GB RAM
Thanks.
> -----Original Message-----
> From: "Lionel Cons" <lionelcons1972@gmail.com>
> Sent: 2026-05-06 21:28:33 (Wednesday)
> To: 郭玲兴 <guolingxing@supcon.com>, linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org
> Subject: Re: [BUG] NFSv4.1 client hang in nfs4_drain_slot_tbl under concurrent workload against Windows NFS server
>
> On Wed, 6 May 2026 at 09:49, 郭玲兴 <guolingxing@supcon.com> wrote:
> >
> > Hi,
> >
> >
> > We encountered a reproducible NFSv4.1 client hang issue under concurrent workload.
> >
> >
> > Environment:
> > - Two independent Linux clients (VMs)
> > - Both mount the same Windows NFS server (NFSv4.1)
> > - Kernel version: 6.1.78
> > - Mount options: vers=4.1,soft,proto=tcp,timeo=60,retrans=10
>
> Which version of Windows Server do you use, e.g. what does the "ver"
> command in cmd.exe output? How did you set up the user accounts, and
> which authentication (AUTH_SYS, GSS, ...) do you use?
> Which CPU architecture do you use? How much memory do you have on the
> Linux NFS client?
>
> Lionel
* Re: Re: Re: [BUG] NFSv4.1 client hang in nfs4_drain_slot_tbl under concurrent workload against Windows NFS server
From: 郭玲兴 @ 2026-05-07 1:22 UTC (permalink / raw)
To: Lionel Cons; +Cc: linux-nfs, linux-kernel
Hi Lionel,
Thanks for your response.
Here are the details you requested:
1. Windows Server version:
Microsoft Windows Server 2022
Version 10.0.20348.587
2. User accounts:
No mapping mechanism is configured.
No AD, LDAP, or passwd mapping is used.
Unmapped users are handled by the default "Everyone" account.
3. Authentication:
sec=sys (AUTH_SYS), as reported by nfsstat -m
4. CPU architecture:
- Linux clients: x86_64
- Windows server: x86_64 (64-bit OS)
5. Memory:
Each Linux client VM has 16GB RAM
---
Additional observations from two independent clients:
Client A:
age: 498061
lease_time: 120
lease_expired: 497941
Client B:
age: 69598
lease_time: 120
lease_expired: 69478
In both cases, lease_expired equals (age - lease_time) exactly
(498061 - 120 = 497941; 69598 - 120 = 69478), which suggests the lease
expired one lease period after mount and has not been successfully
renewed since.
At the same time:
- Both clients hang simultaneously under concurrent workload
- Clients are stuck in nfs4_drain_slot_tbl
- No NFS RPC traffic is observed at hang time (only TCP ACK)
- nfsstat shows retrans=0
- On the Windows server side, the NFS session state is reported as "Initialized"
We are currently tracing the RPC lifecycle to identify which RPC does not complete.
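For completeness, our current understanding of why new RPCs also hang:
while the table is draining, the sequence setup path parks every
non-privileged task before it reaches the wire. A simplified sketch
paraphrased from the nfs4_setup_sequence() logic in fs/nfs/nfs4proc.c
on 6.1 (setup_sequence_sketch is our name, not the kernel's):

    static int setup_sequence_sketch(struct nfs4_slot_table *tbl,
                                     struct nfs4_sequence_args *args,
                                     struct rpc_task *task)
    {
            spin_lock(&tbl->slot_tbl_lock);
            if (nfs4_slot_tbl_draining(tbl) && !args->sa_privileged) {
                    /* The task sleeps here, which would explain why
                     * tcpdump shows no NFS traffic and nfsstat shows
                     * retrans=0 during the hang. */
                    rpc_sleep_on(&tbl->slot_tbl_waitq, task, NULL);
                    spin_unlock(&tbl->slot_tbl_lock);
                    return -EAGAIN;
            }
            spin_unlock(&tbl->slot_tbl_lock);
            /* Normal slot allocation elided. */
            return 0;
    }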
Please let us know if further information would be helpful.
Thanks.
> -----Original Message-----
> From: 郭玲兴 <guolingxing@supcon.com>
> Sent: 2026-05-07 08:50:23 (Thursday)
> To: "Lionel Cons" <lionelcons1972@gmail.com>
> Cc: linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org
> Subject: Re: Re: [BUG] NFSv4.1 client hang in nfs4_drain_slot_tbl under concurrent workload against Windows NFS server
>
> Hi Lionel,
>
> Thanks for your response.
>
> Here are the details:
>
> 1. Windows Server version:
> Microsoft Windows Server 2022
> Version 10.0.20348.587
>
> 2. User accounts:
> No mapping mechanism is configured.
> No AD, LDAP, or passwd mapping is used.
>
> Unmapped users are handled by the default "Everyone" account.
>
> 3. Authentication:
> sec=sys (AUTH_SYS), as reported by nfsstat -m
>
> 4. CPU architecture:
> - Linux clients: x86_64
> - Windows server: x86_64 (64-bit OS)
>
> 5. Memory:
> Each Linux client VM has 16GB RAM
>
> Thanks.
>
>
> > -----Original Message-----
> > From: "Lionel Cons" <lionelcons1972@gmail.com>
> > Sent: 2026-05-06 21:28:33 (Wednesday)
> > To: 郭玲兴 <guolingxing@supcon.com>, linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org
> > Subject: Re: [BUG] NFSv4.1 client hang in nfs4_drain_slot_tbl under concurrent workload against Windows NFS server
> >
> > On Wed, 6 May 2026 at 09:49, 郭玲兴 <guolingxing@supcon.com> wrote:
> > >
> > > Hi,
> > >
> > >
> > > We encountered a reproducible NFSv4.1 client hang issue under concurrent workload.
> > >
> > >
> > > Environment:
> > > - Two independent Linux clients (VMs)
> > > - Both mount the same Windows NFS server (NFSv4.1)
> > > - Kernel version: 6.1.78
> > > - Mount options: vers=4.1,soft,proto=tcp,timeo=60,retrans=10
> >
> > Which version of Windows Server do you use, e.g. what does the "ver"
> > command in cmd.exe output? How did you set up the user accounts, and
> > which authentication (AUTH_SYS, GSS, ...) do you use?
> > Which CPU architecture do you use? How much memory do you have on the
> > Linux NFS client?
> >
> > Lionel