public inbox for linux-nfs@vger.kernel.org
* NFS stuck in nfs_lookup_revalidate
@ 2025-06-03 11:57 Lukáš Hejtmánek
  2025-06-04 17:40 ` Anna Schumaker
       [not found] ` <CAOuNp5kzbyVfbdumXJF3bb=RKxdE5P8aKJDeSoLtgEV9=9xU+g@mail.gmail.com>
  0 siblings, 2 replies; 5+ messages in thread
From: Lukáš Hejtmánek @ 2025-06-03 11:57 UTC
  To: linux-nfs@vger.kernel.org; +Cc: Zdenek Salvet


Hello,

We are experiencing repeated PostgreSQL process freezes when using NFSv3-mounted storage on clients running Ubuntu kernels in the 6.x series (specifically tested on 6.2, 6.8, and 6.11).

Setup:
    - Storage: All-flash disk array exported via NFS (v3)
    - Client OS: Ubuntu with kernel versions 6.2, 6.8, 6.11
    - Application: PostgreSQL using the NFS volume as its data directory

Symptoms:
On affected systems, PostgreSQL processes (particularly autovacuum workers) intermittently hang. 
The stack trace shows a consistent pattern involving __nfs_lookup_revalidate:
[<0>] __nfs_lookup_revalidate+0x113/0x160 [nfs]
[<0>] nfs_lookup_revalidate+0x15/0x30 [nfs]
[<0>] lookup_fast+0x87/0x100
[<0>] open_last_lookups+0x5f/0x400
[<0>] path_openat+0x99/0x2d0
[<0>] do_filp_open+0xaf/0x170
[<0>] do_sys_openat2+0xb3/0xe0
[<0>] __x64_sys_openat+0x55/0xa0
[<0>] x64_sys_call+0x1eb1/0x25a0
[<0>] do_syscall_64+0x7f/0x180
[<0>] entry_SYSCALL_64_after_hwframe+0x78/0x80

Process Tree Example:
863813 ?        Zsl   12:24  \_ [manager] <defunct>
3644504 ?        Ds     0:00      \_ postgres: mlflow: autovacuum worker template1

The autovacuum worker is most commonly affected.
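
For anyone who wants to look for tasks in the same state, here is a minimal sketch in C of picking the state letter out of one line of /proc/<pid>/stat. Traces in the "[<0>]" form above are typically read from /proc/<pid>/stack as root. The function name proc_stat_state is hypothetical and only illustrates the parsing; it is not part of any tool used in this report.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the state character (for example 'D', 'Z' or 'S') from one line
 * of /proc/<pid>/stat, or '\0' if the line does not look like a stat line.
 * Hypothetical helper, for illustration only.
 */
char proc_stat_state(const char *stat_line)
{
    assert(stat_line != NULL);

    /*
     * The line has the form "pid (comm) state ...". The comm field may
     * itself contain spaces or parentheses, so search for the LAST ')'.
     */
    const char *close = strrchr(stat_line, ')');
    if (close == NULL || close[1] != ' ' || close[2] == '\0')
        return '\0';

    return close[2];
}
```

A caller would read each /proc/<pid>/stat, keep the PIDs for which this returns 'D', and then dump /proc/<pid>/stack for those PIDs.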

Workaround Attempt:
We observed some improvement by modifying the NFS client source fs/nfs/dir.c (around line 1833):

Change:
dentry->d_fsdata = NFS_FSDATA_BLOCKED;

To:
smp_store_release(&dentry->d_fsdata, NFS_FSDATA_BLOCKED);

While this mitigates the issue somewhat, it does not fully resolve the hangs.
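
As a general illustration of what a release store provides, here is a userspace sketch using C11 atomics and pthreads. It is an analogy only, not kernel code: payload, marker, blocked_token and run_release_acquire_demo are hypothetical names, and marker merely stands in for a field like dentry->d_fsdata. The point it shows is that a release store orders earlier writes only for a reader that uses a matching acquire load (or equivalent ordering). Whether the readers in fs/nfs/dir.c do so has not been verified here.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static int payload;               /* state written before the marker */
static _Atomic(void *) marker;    /* stand-in for a pointer-sized flag field */
static int blocked_token;         /* its address serves as the "blocked" value */

static void *writer(void *arg)
{
    (void)arg;
    payload = 42;                 /* ordinary store */
    /* Release: the store above becomes visible to whoever acquires marker. */
    atomic_store_explicit(&marker, (void *)&blocked_token,
                          memory_order_release);
    return NULL;
}

static void *reader(void *arg)
{
    int *seen = arg;
    /* Acquire pairs with the release; a relaxed load would not. */
    while (atomic_load_explicit(&marker, memory_order_acquire)
           != (void *)&blocked_token)
        ;                         /* spin until the marker is published */
    assert(payload == 42);        /* guaranteed by the release/acquire pair */
    *seen = payload;
    return NULL;
}

/* Returns the payload value the reader observed, or -1 on thread errors. */
int run_release_acquire_demo(void)
{
    pthread_t w, r;
    int seen = 0;

    payload = 0;
    atomic_store(&marker, (void *)NULL);

    if (pthread_create(&w, NULL, writer, NULL) != 0)
        return -1;
    if (pthread_create(&r, NULL, reader, &seen) != 0) {
        pthread_join(w, NULL);
        return -1;
    }
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return seen;
}
```

If the reader used a plain or relaxed load instead, the release on the writer side alone would give no ordering guarantee. That may be one reason a store-side change only partly helps, though this is speculation.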

Is this a known issue with NFS in 6.x kernels?
Is there a recommended patch or workaround?
Are there any known regressions related to __nfs_lookup_revalidate or dentry locking?

The problem may be related to the all-flash array, which delivers about 30k IOPS over NFS and 5633 TPS in pgbench (pgbench -T 300 -c100 -j20 -r).

Other NFS connections to the same NFS servers are not affected and remain usable. However, the hung process cannot be killed, so the client node has to be rebooted.

I believe the 5.x kernel series was more stable in this respect.

--
Lukáš Hejtmánek

Linux Administrator only because
  Full Time Multitasking Ninja 
  is not an official job title



Thread overview: 5+ messages
2025-06-03 11:57 NFS stuck in nfs_lookup_revalidate Lukáš Hejtmánek
2025-06-04 17:40 ` Anna Schumaker
     [not found] ` <CAOuNp5kzbyVfbdumXJF3bb=RKxdE5P8aKJDeSoLtgEV9=9xU+g@mail.gmail.com>
2025-06-04 19:11   ` Zdenek Salvet
2025-07-18 22:19     ` Lukáš Hejtmánek
2025-07-19  0:02       ` Trond Myklebust
