Hi,


We are seeing a reproducible NFSv4.1 client hang under a concurrent copy workload.


Environment:
- Two independent Linux clients (VMs)
- Both mount the same Windows NFS server (NFSv4.1)
- Kernel version: 6.1.78
- Mount options: vers=4.1,soft,proto=tcp,timeo=60,retrans=10


Workload:
- Each client copies ~5GB files to the same NFS share
- Copy runs every 30 minutes
- After ~41 iterations (~20 hours), both clients hang simultaneously


Symptoms:
- All NFS operations (ls, df, rsync, cp) hang in D state
- No NFS RPC traffic is observed on the wire (tcpdump shows only TCP ACKs)
- nfsstat shows retrans=0
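
For anyone who wants to check the same symptom on their own client, here is a minimal Python sketch of how the D-state tasks can be listed. It only reads /proc/<pid>/stat and /proc/<pid>/wchan; the function names (parse_stat_line, d_state_tasks) are ours and purely illustrative. It scans thread-group leaders and kernel threads only, and wchan may be empty or "0" depending on kernel configuration and privileges.

```python
import os


def parse_stat_line(line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' instead of splitting on whitespace.
    head, _, tail = line.rpartition(")")
    pid_str, _, comm = head.partition("(")
    return int(pid_str), comm, tail.split()[0]


def d_state_tasks(proc_root="/proc"):
    """List (pid, comm, wchan) for tasks in uninterruptible sleep (state D)."""
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return []  # no procfs at this path
    tasks = []
    for entry in entries:
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_line(f.read())
            with open(os.path.join(proc_root, entry, "wchan")) as f:
                wchan = f.read().strip()
        except OSError:
            continue  # task exited, or the file is not readable
        if state == "D":
            tasks.append((pid, comm, wchan))
    return tasks


if __name__ == "__main__":
    for pid, comm, wchan in d_state_tasks():
        print(f"{pid:>7} {comm:<16} {wchan}")
```

Run as root on a hung client, this should list the stuck ls/df/rsync/cp processes together with the kernel function each one is sleeping in, where the kernel exposes it.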


SysRq stack traces show:


NFS state manager thread:
  nfs4_run_state_manager
  nfs4_drain_slot_tbl
  wait_for_completion_interruptible


User processes:
  rpc_wait_bit_killable
  nfs4_proc_getattr
  nfs4_run_open_task


Both clients exhibit identical behavior at the same time.


This suggests that:
- The client enters NFSv4.1 state recovery
- nfs4_drain_slot_tbl waits for slots to drain
- At least one in-use slot is never released, so the drain never completes
- All further RPCs are blocked behind the drain
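
To make the hypothesis above concrete, here is a toy Python model of the situation we think we are in. It is not kernel code and does not mirror the kernel's data structures: SlotTable, acquire, release and drain are names we made up for illustration. The drain takes a timeout only so that the demo terminates; the stack above shows the real state manager waiting in wait_for_completion_interruptible. In this toy, new requests are refused during a drain rather than put to sleep.

```python
import threading


class SlotTable:
    """Toy model of a session slot table (illustrative only)."""

    def __init__(self, nslots):
        self._cond = threading.Condition()
        self._nslots = nslots
        self._free = nslots
        self._draining = False

    def acquire(self):
        """Take a slot for a new request; refused while draining or when full."""
        with self._cond:
            if self._draining or self._free == 0:
                return False
            self._free -= 1
            return True

    def release(self):
        """Return a slot, as happens when a request completes."""
        with self._cond:
            self._free += 1
            self._cond.notify_all()

    def drain(self, timeout):
        """Block new requests, then wait until every slot has been returned.

        Returns True if the table drained within `timeout` seconds.
        """
        with self._cond:
            self._draining = True
            return self._cond.wait_for(
                lambda: self._free == self._nslots, timeout
            )


if __name__ == "__main__":
    table = SlotTable(4)
    table.acquire()                 # request A takes a slot
    table.acquire()                 # request B takes a slot
    table.release()                 # A completes; B never does
    print(table.drain(timeout=0.2))  # drain cannot finish
    print(table.acquire())           # new requests stay blocked
```

With one slot permanently outstanding, both calls in the demo return False: the drain never finishes and nothing new can start, which matches what we observe on both clients.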


Questions:
1. Is it expected that nfs4_drain_slot_tbl can block indefinitely?
2. What conditions can cause a slot to never be released?
3. Should the client force a session reset instead of waiting forever?
4. Is this a known interoperability issue with the Windows NFSv4.1 server?


We can provide additional logs if needed.


Thanks.