nfsd becomes a zombie

public inbox for linux-nfs@vger.kernel.org
 help / color / mirror / Atom feed

* nfsd becomes a zombie
@ 2024-06-17  6:55 Harald Dunkel
  2024-06-17 14:31 ` Chuck Lever III
  0 siblings, 1 reply; 31+ messages in thread
From: Harald Dunkel @ 2024-06-17  6:55 UTC (permalink / raw)
  To: linux-nfs

Hi folks,

what would be the reason for nfsd getting stuck somehow and becoming
an unkillable process? See

- https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1071562
- https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/2062568

Doesn't this mean that something inside the kernel gets stuck as
well? Seems odd to me.

Regards
Harri

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: nfsd becomes a zombie
  2024-06-17  6:55 nfsd becomes a zombie Harald Dunkel
@ 2024-06-17 14:31 ` Chuck Lever III
  2024-06-17 19:20   ` Calum Mackay
  2024-12-25  9:15   ` nfsd blocks indefinitely in nfsd4_destroy_session (was: Re: nfsd becomes a zombie) Salvatore Bonaccorso
  0 siblings, 2 replies; 31+ messages in thread
From: Chuck Lever III @ 2024-06-17 14:31 UTC (permalink / raw)
  To: Harald Dunkel; +Cc: Linux NFS Mailing List



> On Jun 17, 2024, at 2:55 AM, Harald Dunkel <harald.dunkel@aixigo.com> wrote:
> 
> Hi folks,
> 
> what would be the reason for nfsd getting stuck somehow and becoming
> an unkillable process? See
> 
> - https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1071562
> - https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/2062568
> 
> Doesn't this mean that something inside the kernel gets stuck as
> well? Seems odd to me.

I'm not familiar with the Debian or Ubuntu kernel packages. Can
the kernel release numbers be translated to LTS kernel releases
please? Need both "last known working" and "first broken" releases.

This:

[ 6596.911785] RPC: Could not send backchannel reply error: -110
[ 6596.972490] RPC: Could not send backchannel reply error: -110
[ 6837.281307] RPC: Could not send backchannel reply error: -110

is a known set of client backchannel bugs. Knowing the LTS kernel
releases (see above) will help us figure out what needs to be
backported to the LTS kernels kernels in question.

This:

[11183.290619] wait_for_completion+0x88/0x150
[11183.290623] __flush_workqueue+0x140/0x3e0
[11183.290629] nfsd4_probe_callback_sync+0x1a/0x30 [nfsd]
[11183.290689] nfsd4_destroy_session+0x186/0x260 [nfsd]

is probably related to the backchannel errors on the client, but
client bugs shouldn't cause the server to hang like this. We
might be able to say more if you can provide the kernel release
translations (see above).


--
Chuck Lever



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: nfsd becomes a zombie
  2024-06-17 14:31 ` Chuck Lever III
@ 2024-06-17 19:20   ` Calum Mackay
  2024-06-18 14:29     ` Harald Dunkel
  2024-12-25  9:15   ` nfsd blocks indefinitely in nfsd4_destroy_session (was: Re: nfsd becomes a zombie) Salvatore Bonaccorso
  1 sibling, 1 reply; 31+ messages in thread
From: Calum Mackay @ 2024-06-17 19:20 UTC (permalink / raw)
  To: Chuck Lever III, Harald Dunkel; +Cc: Calum Mackay, Linux NFS Mailing List


[-- Attachment #1.1: Type: text/plain, Size: 2243 bytes --]

On 17/06/2024 3:31 pm, Chuck Lever III wrote:
> 
> 
>> On Jun 17, 2024, at 2:55 AM, Harald Dunkel <harald.dunkel@aixigo.com> wrote:
>>
>> Hi folks,
>>
>> what would be the reason for nfsd getting stuck somehow and becoming
>> an unkillable process? See
>>
>> - https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1071562
>> - https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/2062568
>>
>> Doesn't this mean that something inside the kernel gets stuck as
>> well? Seems odd to me.
> 
> I'm not familiar with the Debian or Ubuntu kernel packages. Can
> the kernel release numbers be translated to LTS kernel releases

The Debian NFS server's broken kernel corresponds to upstream stable 
6.1.90. There doesn't seem to be any note of a working version.

Decoding the Ubuntu pkg version is harder (but is obviously the 6.8.y 
series; the actual corresponding 'y' value is printed during boot, but 
other than downloading the pkg source, I'm not sure how to find it 
otherwise). The older Ubuntu mentioned in the bug as seeing the nfsd 
issue will be 6.5.y.


Harald: do you have a Debian/Ubuntu kernel version that doesn't see the 
issue, please? i.e. ideally from the same 6.1.y series…

cheers,
c.



> please? Need both "last known working" and "first broken" releases.
> 
> This:
> 
> [ 6596.911785] RPC: Could not send backchannel reply error: -110
> [ 6596.972490] RPC: Could not send backchannel reply error: -110
> [ 6837.281307] RPC: Could not send backchannel reply error: -110
> 
> is a known set of client backchannel bugs. Knowing the LTS kernel
> releases (see above) will help us figure out what needs to be
> backported to the LTS kernels kernels in question.
> 
> This:
> 
> [11183.290619] wait_for_completion+0x88/0x150
> [11183.290623] __flush_workqueue+0x140/0x3e0
> [11183.290629] nfsd4_probe_callback_sync+0x1a/0x30 [nfsd]
> [11183.290689] nfsd4_destroy_session+0x186/0x260 [nfsd]
> 
> is probably related to the backchannel errors on the client, but
> client bugs shouldn't cause the server to hang like this. We
> might be able to say more if you can provide the kernel release
> translations (see above).
> 
> 
> --
> Chuck Lever
> 
> 

-- 
Calum Mackay
Linux Kernel Engineering
Oracle Linux and Virtualisation


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: nfsd becomes a zombie
  2024-06-17 19:20   ` Calum Mackay
@ 2024-06-18 14:29     ` Harald Dunkel
  2024-06-18 14:32       ` Harald Dunkel
  2024-06-18 14:52       ` Chuck Lever
  0 siblings, 2 replies; 31+ messages in thread
From: Harald Dunkel @ 2024-06-18 14:29 UTC (permalink / raw)
  To: Calum Mackay, Chuck Lever III; +Cc: Linux NFS Mailing List

On 2024-06-17 21:20:54, Calum Mackay wrote:
> 
> Harald: do you have a Debian/Ubuntu kernel version that doesn't see the
> issue, please? i.e. ideally from the same 6.1.y series…
> 

I had migrated the Server from Debian 11 to Debian 12 on Dec 11th 2023.
The kernel version provided by Debian 11 at that time was 5.10.197. I
can't remember having seen a frozen nfsd with this version.

Since Dec 11th I had these kernels in use (all provided by Debian):

	6.1.66			linux-image-6.1.0-15-amd64
	6.1.69			linux-image-6.1.0-17-amd64
	6.1.76			linux-image-6.1.0-18-amd64
	6.1.85			linux-image-6.1.0-20-amd64
	6.1.90			linux-image-6.1.0-21-amd64

These are all "upstream" version numbers. The second column lists
Debian's package names.

AFAIR nfsd got stuck only on the 6.1.90 kernel (installed and booted
on May 18th, first time freeze on May 26th). But I've got a weak
memory in cases like this. I haven't found older kernel log files.

Please remember that I have seen a frozen nfsd only 3 times since
May 26th. Its not easy to reproduce.

Regards

Harri

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: nfsd becomes a zombie
  2024-06-18 14:29     ` Harald Dunkel
@ 2024-06-18 14:32       ` Harald Dunkel
  2024-06-18 14:52       ` Chuck Lever
  1 sibling, 0 replies; 31+ messages in thread
From: Harald Dunkel @ 2024-06-18 14:32 UTC (permalink / raw)
  To: Calum Mackay, Chuck Lever III; +Cc: Linux NFS Mailing List

PS: Do you think the problem could be related to running
nfsd inside a container?

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: nfsd becomes a zombie
  2024-06-18 14:29     ` Harald Dunkel
  2024-06-18 14:32       ` Harald Dunkel
@ 2024-06-18 14:52       ` Chuck Lever
  2024-06-19  7:32         ` Harald Dunkel
  1 sibling, 1 reply; 31+ messages in thread
From: Chuck Lever @ 2024-06-18 14:52 UTC (permalink / raw)
  To: Harald Dunkel; +Cc: Calum Mackay, Linux NFS Mailing List

On Tue, Jun 18, 2024 at 04:29:55PM +0200, Harald Dunkel wrote:
> On 2024-06-17 21:20:54, Calum Mackay wrote:
> > 
> > Harald: do you have a Debian/Ubuntu kernel version that doesn't see the
> > issue, please? i.e. ideally from the same 6.1.y series…
> > 
> 
> I had migrated the Server from Debian 11 to Debian 12 on Dec 11th 2023.
> The kernel version provided by Debian 11 at that time was 5.10.197. I
> can't remember having seen a frozen nfsd with this version.
> 
> Since Dec 11th I had these kernels in use (all provided by Debian):
> 
> 	6.1.66			linux-image-6.1.0-15-amd64
> 	6.1.69			linux-image-6.1.0-17-amd64
> 	6.1.76			linux-image-6.1.0-18-amd64
> 	6.1.85			linux-image-6.1.0-20-amd64
> 	6.1.90			linux-image-6.1.0-21-amd64
> 
> These are all "upstream" version numbers. The second column lists
> Debian's package names.
> 
> AFAIR nfsd got stuck only on the 6.1.90 kernel (installed and booted
> on May 18th, first time freeze on May 26th). But I've got a weak
> memory in cases like this. I haven't found older kernel log files.
> 
> Please remember that I have seen a frozen nfsd only 3 times since
> May 26th. Its not easy to reproduce.

I can find no NFSD or SUNRPC changes between v6.1.85 and v6.1.90.
Between v6.1.76 and v6.1.85, there are 48 NFSD changes, 8 changes to
fs/nfs/, and 3 changes to net/sunrpc/.

Are there any NFS client changes during the second half of May that
correlate with the NFSD misbehavior?


-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: nfsd becomes a zombie
  2024-06-18 14:52       ` Chuck Lever
@ 2024-06-19  7:32         ` Harald Dunkel
  2024-06-19  7:56           ` Harald Dunkel
  2024-06-19 13:14           ` Chuck Lever III
  0 siblings, 2 replies; 31+ messages in thread
From: Harald Dunkel @ 2024-06-19  7:32 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Calum Mackay, Linux NFS Mailing List

On 2024-06-18 16:52:00, Chuck Lever wrote:
> 
> I can find no NFSD or SUNRPC changes between v6.1.85 and v6.1.90.
> Between v6.1.76 and v6.1.85, there are 48 NFSD changes, 8 changes to
 > fs/nfs/, and 3 changes to net/sunrpc/.
 > > Are there any NFS client changes during the second half of May that
> correlate with the NFSD misbehavior?
> 
> 
There was a big change: Most clients (>100) were upgraded from Debian
11 to Debian 12 within the last couple of months. A big chunk of
about 40 hosts have been upgraded on May 21st. This included the
kernel upgrade to 6.1.xx, and the upgrade of the "nfs-common" package
from version 1.3.4 to 2.6.2.

The current nfs-common package provides these tools:

/sbin/mount.nfs
/sbin/mount.nfs4
/sbin/rpc.statd
/sbin/showmount
/sbin/sm-notify
/sbin/umount.nfs
/sbin/umount.nfs4
/usr/sbin/blkmapd
/usr/sbin/mountstats
/usr/sbin/nfsconf
/usr/sbin/nfsidmap
/usr/sbin/nfsiostat
/usr/sbin/nfsstat
/usr/sbin/rpc.gssd
/usr/sbin/rpc.idmapd
/usr/sbin/rpc.svcgssd
/usr/sbin/rpcctl
/usr/sbin/rpcdebug
/usr/sbin/start-statd

I have 3 clients with more recent kernel versions 5.10.209, 5.10.218
and 6.7.12. The 5.10.x hosts are running Debian 12 and this kernel
for several months, but the host with 6.7.12 is running this kernel
since May 16th.

These "backports" kernels were installed for testing purposes. Would
you recommend to downgrade these kernels to 6.1.90?

As asked before, do you think the problem could be related to running
nfsd inside an LXC container?


Regards
Harri

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: nfsd becomes a zombie
  2024-06-19  7:32         ` Harald Dunkel
@ 2024-06-19  7:56           ` Harald Dunkel
  2024-06-19 13:14           ` Chuck Lever III
  1 sibling, 0 replies; 31+ messages in thread
From: Harald Dunkel @ 2024-06-19  7:56 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Calum Mackay, Linux NFS Mailing List

On 2024-06-19 09:32:23, Harald Dunkel wrote:
> 
> I have 3 clients with more recent kernel versions 5.10.209, 5.10.218
> and 6.7.12. The 5.10.x hosts are running Debian 12 and this kernel

5.10.x kernel means Debian *11*, of course.

> for several months, but the host with 6.7.12 is running this kernel
> since May 16th.
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: nfsd becomes a zombie
  2024-06-19  7:32         ` Harald Dunkel
  2024-06-19  7:56           ` Harald Dunkel
@ 2024-06-19 13:14           ` Chuck Lever III
  2024-06-20  5:29             ` Harald Dunkel
  2024-07-02 17:25             ` Harald Dunkel
  1 sibling, 2 replies; 31+ messages in thread
From: Chuck Lever III @ 2024-06-19 13:14 UTC (permalink / raw)
  To: Harald Dunkel; +Cc: Calum Mackay, Linux NFS Mailing List



> On Jun 19, 2024, at 3:32 AM, Harald Dunkel <harald.dunkel@aixigo.com> wrote:
> 
> On 2024-06-18 16:52:00, Chuck Lever wrote:
>> I can find no NFSD or SUNRPC changes between v6.1.85 and v6.1.90.
>> Between v6.1.76 and v6.1.85, there are 48 NFSD changes, 8 changes to
> > fs/nfs/, and 3 changes to net/sunrpc/.
> > > Are there any NFS client changes during the second half of May that
>> correlate with the NFSD misbehavior?
> There was a big change: Most clients (>100) were upgraded from Debian
> 11 to Debian 12 within the last couple of months. A big chunk of
> about 40 hosts have been upgraded on May 21st. This included the
> kernel upgrade to 6.1.xx, and the upgrade of the "nfs-common" package
> from version 1.3.4 to 2.6.2.
> 
> The current nfs-common package provides these tools:
> 
> /sbin/mount.nfs
> /sbin/mount.nfs4
> /sbin/rpc.statd
> /sbin/showmount
> /sbin/sm-notify
> /sbin/umount.nfs
> /sbin/umount.nfs4
> /usr/sbin/blkmapd
> /usr/sbin/mountstats
> /usr/sbin/nfsconf
> /usr/sbin/nfsidmap
> /usr/sbin/nfsiostat
> /usr/sbin/nfsstat
> /usr/sbin/rpc.gssd
> /usr/sbin/rpc.idmapd
> /usr/sbin/rpc.svcgssd
> /usr/sbin/rpcctl
> /usr/sbin/rpcdebug
> /usr/sbin/start-statd
> 
> I have 3 clients with more recent kernel versions 5.10.209, 5.10.218
> and 6.7.12. The 5.10.x hosts are running Debian 12 and this kernel
> for several months, but the host with 6.7.12 is running this kernel
> since May 16th.

This makes it very difficult to say what might have
introduced the problem in your environment.

My next step might be capturing network traces, but
you said the problem is hard to reproduce and you
seem to have a lot of clients, making such capture
impractical.


> These "backports" kernels were installed for testing purposes. Would
> you recommend to downgrade these kernels to 6.1.90?
> 
> As asked before, do you think the problem could be related to running
> nfsd inside an LXC container?

There's no way to answer either of these questions
since we have no idea what the problem is yet.


--
Chuck Lever



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: nfsd becomes a zombie
  2024-06-19 13:14           ` Chuck Lever III
@ 2024-06-20  5:29             ` Harald Dunkel
  2024-06-20 19:09               ` Chuck Lever III
  2024-07-02 17:25             ` Harald Dunkel
  1 sibling, 1 reply; 31+ messages in thread
From: Harald Dunkel @ 2024-06-20  5:29 UTC (permalink / raw)
  To: Chuck Lever III; +Cc: Calum Mackay, Linux NFS Mailing List

Hi Chuck,

On 2024-06-19 15:14:15, Chuck Lever III wrote:
> 
> There's no way to answer either of these questions
> since we have no idea what the problem is yet.
> 

I am getting these messages again, similar to previous
incidents:

[Wed Jun 19 16:46:14 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000a5ff94a6 xid f9d6a568
[Wed Jun 19 18:42:23 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000a5ff94a6 xid 59d8a568
[Wed Jun 19 18:43:15 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000a5ff94a6 xid 5cd8a568

nfsd is still running as usual. Assuming they are related to
NFS, Is there a way to map these weird messages to a remote IP
address? I could setup a similar system and use it to capture
the traffic without breaking privacy.


Regards

Harri

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: nfsd becomes a zombie
  2024-06-20  5:29             ` Harald Dunkel
@ 2024-06-20 19:09               ` Chuck Lever III
  0 siblings, 0 replies; 31+ messages in thread
From: Chuck Lever III @ 2024-06-20 19:09 UTC (permalink / raw)
  To: Harald Dunkel; +Cc: Calum Mackay, Linux NFS Mailing List



> On Jun 20, 2024, at 1:29 AM, Harald Dunkel <harald.dunkel@aixigo.com> wrote:
> 
> Hi Chuck,
> 
> On 2024-06-19 15:14:15, Chuck Lever III wrote:
>> There's no way to answer either of these questions
>> since we have no idea what the problem is yet.
> 
> I am getting these messages again, similar to previous
> incidents:
> 
> [Wed Jun 19 16:46:14 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000a5ff94a6 xid f9d6a568
> [Wed Jun 19 18:42:23 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000a5ff94a6 xid 59d8a568
> [Wed Jun 19 18:43:15 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000a5ff94a6 xid 5cd8a568
> 
> nfsd is still running as usual. Assuming they are related to
> NFS, Is there a way to map these weird messages to a remote IP
> address? I could setup a similar system and use it to capture
> the traffic without breaking privacy.

These messages do come from kernel NFS code, but it's still not
possible to say how they are related to the hang/deadlock.

The best you can do here is enable server tracing:

$ sudo su -
# trace-cmd record -e nfsd:nfsd_cb\*

That will pick up NFSv4 callback related events. When you are
done tracing, ^C.

# trace-cmd report | less

For trace points that record client address information, you
might also need the -R option to see everything in each trace
record.


--
Chuck Lever



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: nfsd becomes a zombie
  2024-06-19 13:14           ` Chuck Lever III
  2024-06-20  5:29             ` Harald Dunkel
@ 2024-07-02 17:25             ` Harald Dunkel
  2024-07-02 18:17               ` Chuck Lever III
  1 sibling, 1 reply; 31+ messages in thread
From: Harald Dunkel @ 2024-07-02 17:25 UTC (permalink / raw)
  To: Chuck Lever III; +Cc: Calum Mackay, Linux NFS Mailing List

Hi folks,

my NAS ran into this problem again. NFS got stuck somehow, and
the nfsd couldn't be killed :-(.

dmesg:
[Tue Jul  2 17:20:19 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000749c823f xid 5bf8d3d0
[Tue Jul  2 17:20:19 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000ce307050 xid 3b4fbd9f
[Tue Jul  2 17:20:19 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000f7f9161e xid 0a26635c
[Tue Jul  2 17:20:19 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 000000007c978512 xid 384cbf0c
[Tue Jul  2 17:20:19 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000dc3c09f6 xid 53cc0e3e
[Tue Jul  2 17:20:19 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000d1675728 xid 129006af
[Tue Jul  2 17:20:19 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 0000000047159b90 xid 0c06b6e0
[Tue Jul  2 17:20:19 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 0000000008b3b3ac xid 641bb0da
[Tue Jul  2 17:20:19 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 000000009eb832dc xid 005fcc99
[Tue Jul  2 17:20:19 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 0000000042dcce88 xid b3cf5de4
[Tue Jul  2 17:20:19 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000b66bbd6f xid d4f06b56
[Tue Jul  2 17:20:19 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000b5e5e5a3 xid c032dbba
[Tue Jul  2 17:20:19 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000e123efc9 xid 99fa75d9
[Tue Jul  2 17:20:19 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000ca43f6f0 xid e38d5b74
[Tue Jul  2 17:20:19 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000ad683927 xid 277cde8c
[Tue Jul  2 17:20:19 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000e8e01f09 xid 641df4a4
[Tue Jul  2 17:20:19 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 000000006223d195 xid 3dba2d2a
[Tue Jul  2 17:20:19 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000b73943aa xid a688e47f
[Tue Jul  2 17:20:19 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 000000004cd80e49 xid 64e688ca
[Tue Jul  2 17:20:19 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000ef92587f xid 70bf2e44
[Tue Jul  2 17:20:19 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000a5ff94a6 xid c0f7a668
[Tue Jul  2 17:20:19 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000fd9a0890 xid 0df7d2c7
[Tue Jul  2 17:20:19 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000c42ddaac xid 800e710e
[Tue Jul  2 17:20:19 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000f43275cf xid 8b05e704
[Tue Jul  2 17:20:19 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 000000009a1d5dcf xid 3c2ba924
[Tue Jul  2 17:20:19 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 000000007cad732d xid e73a0429
[Tue Jul  2 17:20:19 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 000000008e7d297f xid 075a98e5
[Tue Jul  2 17:20:19 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000ed964446 xid 8bb8e568
[Tue Jul  2 17:20:19 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000b14782f0 xid 4c4ae7c5
[Tue Jul  2 17:23:28 2024] INFO: task nfsd:3037 blocked for more than 120 seconds.
[Tue Jul  2 17:23:28 2024]       Not tainted 6.1.0-21-amd64 #1 Debian 6.1.90-1
[Tue Jul  2 17:23:28 2024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Tue Jul  2 17:23:28 2024] task:nfsd            state:D stack:0     pid:3037  ppid:2      flags:0x00004000
[Tue Jul  2 17:23:28 2024] Call Trace:
[Tue Jul  2 17:23:28 2024]  <TASK>
[Tue Jul  2 17:23:28 2024]  __schedule+0x34d/0x9e0
[Tue Jul  2 17:23:28 2024]  schedule+0x5a/0xd0
[Tue Jul  2 17:23:28 2024]  schedule_timeout+0x118/0x150
[Tue Jul  2 17:23:28 2024]  wait_for_completion+0x86/0x160
[Tue Jul  2 17:23:28 2024]  __flush_workqueue+0x152/0x420
[Tue Jul  2 17:23:28 2024]  nfsd4_destroy_session+0x1b6/0x250 [nfsd]
[Tue Jul  2 17:23:28 2024]  nfsd4_proc_compound+0x355/0x660 [nfsd]
[Tue Jul  2 17:23:28 2024]  nfsd_dispatch+0x1a1/0x2b0 [nfsd]
[Tue Jul  2 17:23:28 2024]  svc_process_common+0x289/0x5e0 [sunrpc]
[Tue Jul  2 17:23:28 2024]  ? svc_recv+0x4e5/0x890 [sunrpc]
[Tue Jul  2 17:23:28 2024]  ? nfsd_svc+0x360/0x360 [nfsd]
[Tue Jul  2 17:23:28 2024]  ? nfsd_shutdown_threads+0x90/0x90 [nfsd]
[Tue Jul  2 17:23:28 2024]  svc_process+0xad/0x100 [sunrpc]
[Tue Jul  2 17:23:28 2024]  nfsd+0xd5/0x190 [nfsd]
[Tue Jul  2 17:23:28 2024]  kthread+0xda/0x100
[Tue Jul  2 17:23:28 2024]  ? kthread_complete_and_exit+0x20/0x20
[Tue Jul  2 17:23:28 2024]  ret_from_fork+0x22/0x30
[Tue Jul  2 17:23:28 2024]  </TASK>
[Tue Jul  2 17:23:28 2024] INFO: task nfsd:3038 blocked for more than 120 seconds.
[Tue Jul  2 17:23:28 2024]       Not tainted 6.1.0-21-amd64 #1 Debian 6.1.90-1
[Tue Jul  2 17:23:28 2024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.


/var/log/kern.log:
2024-06-28T10:40:40.273493+02:00 nasl006b kernel: [959982.169372] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000d1675728 xid 372e06af
2024-06-28T10:40:40.273507+02:00 nasl006b kernel: [959982.169374] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000506887ca xid 5be3c4d4
2024-06-28T10:40:40.273508+02:00 nasl006b kernel: [959982.169379] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000b5e5e5a3 xid e5d0daba
2024-06-28T10:40:40.273509+02:00 nasl006b kernel: [959982.169423] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000b66bbd6f xid 69696b56
2024-06-28T10:40:40.273509+02:00 nasl006b kernel: [959982.169498] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 0000000008b3b3ac xid 89b9afda
2024-06-28T10:40:40.273510+02:00 nasl006b kernel: [959982.169504] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000a5ff94a6 xid e595a668
2024-06-28T10:40:40.273512+02:00 nasl006b kernel: [959982.169529] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000f7f9161e xid 2fc4625c
2024-06-28T10:40:40.273513+02:00 nasl006b kernel: [959982.169659] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000b73943aa xid cb26e47f
2024-06-28T10:40:40.273514+02:00 nasl006b kernel: [959982.169691] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 000000009a1d5dcf xid 61c9a824
2024-06-28T10:40:40.273514+02:00 nasl006b kernel: [959982.169697] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000749c823f xid 8096d3d0
2024-06-28T10:40:40.944609+02:00 nasl006b kernel: [959983.506736] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000190b801c xid bdd1dcd0
2024-06-28T10:40:40.948612+02:00 nasl006b kernel: [959983.512235] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 0000000042dcce88 xid d76d5de4
2024-06-28T10:40:40.952617+02:00 nasl006b kernel: [959983.514349] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 000000007cad732d xid 0bd90329
2024-06-28T10:40:40.952623+02:00 nasl006b kernel: [959983.514564] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 000000004cd80e49 xid 898488ca
2024-06-28T10:40:40.952624+02:00 nasl006b kernel: [959983.514951] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000abccd646 xid d0d28401
2024-06-28T10:40:40.952624+02:00 nasl006b kernel: [959983.515009] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000ef92587f xid 955d2e44
2024-06-28T10:40:40.952625+02:00 nasl006b kernel: [959983.515060] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000ed964446 xid b056e568
2024-07-02T17:20:23.113792+02:00 nasl006b kernel: [1329564.790305] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000b14782f0 xid 4c4ae7c5
2024-07-02T17:23:32.268700+02:00 nasl006b kernel: [1329753.944957]  nfsd_dispatch+0x1a1/0x2b0 [nfsd]
2024-07-02T17:23:32.300740+02:00 nasl006b kernel: [1329753.969482]  svc_process_common+0x289/0x5e0 [sunrpc]
2024-07-02T17:23:32.300757+02:00 nasl006b kernel: [1329753.969919]  nfsd+0xd5/0x190 [nfsd]
2024-07-02T17:23:32.364636+02:00 nasl006b kernel: [1329754.041100]  ? nfsd_svc+0x360/0x360 [nfsd]
2024-07-02T17:23:32.419012+02:00 nasl006b kernel: [1329754.088290]  svc_process+0xad/0x100 [sunrpc]
2024-07-02T17:23:32.419020+02:00 nasl006b kernel: [1329754.088337]  ret_from_fork+0x22/0x30
2024-07-02T17:23:32.443744+02:00 nasl006b kernel: [1329754.111842]  svc_process+0xad/0x100 [sunrpc]
2024-07-02T17:23:32.443749+02:00 nasl006b kernel: [1329754.111882]  ? kthread_complete_and_exit+0x20/0x20
2024-07-02T17:23:32.488628+02:00 nasl006b kernel: [1329754.161331]  ? kthread_complete_and_exit+0x20/0x20


Regards
Harri

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: nfsd becomes a zombie
  2024-07-02 17:25             ` Harald Dunkel
@ 2024-07-02 18:17               ` Chuck Lever III
  2024-07-03  4:14                 ` Harald Dunkel
  0 siblings, 1 reply; 31+ messages in thread
From: Chuck Lever III @ 2024-07-02 18:17 UTC (permalink / raw)
  To: Harald Dunkel; +Cc: Calum Mackay, Linux NFS Mailing List



> On Jul 2, 2024, at 1:25 PM, Harald Dunkel <harald.dunkel@aixigo.com> wrote:
> 
> Hi folks,
> 
> my NAS ran into this problem again. NFS got stuck somehow, and
> the nfsd couldn't be killed :-(.
> 
> dmesg:
> [Tue Jul  2 17:20:19 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000749c823f xid 5bf8d3d0
> [Tue Jul  2 17:20:19 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000ce307050 xid 3b4fbd9f
> [Tue Jul  2 17:20:19 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000f7f9161e xid 0a26635c
> [Tue Jul  2 17:20:19 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 000000007c978512 xid 384cbf0c
> [Tue Jul  2 17:20:19 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000dc3c09f6 xid 53cc0e3e
> [Tue Jul  2 17:20:19 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000d1675728 xid 129006af
> [Tue Jul  2 17:20:19 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 0000000047159b90 xid 0c06b6e0
> [Tue Jul  2 17:20:19 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 0000000008b3b3ac xid 641bb0da
> [Tue Jul  2 17:20:19 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 000000009eb832dc xid 005fcc99
> [Tue Jul  2 17:20:19 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 0000000042dcce88 xid b3cf5de4
> [Tue Jul  2 17:20:19 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000b66bbd6f xid d4f06b56
> [Tue Jul  2 17:20:19 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000b5e5e5a3 xid c032dbba
> [Tue Jul  2 17:20:19 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000e123efc9 xid 99fa75d9
> [Tue Jul  2 17:20:19 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000ca43f6f0 xid e38d5b74
> [Tue Jul  2 17:20:19 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000ad683927 xid 277cde8c
> [Tue Jul  2 17:20:19 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000e8e01f09 xid 641df4a4
> [Tue Jul  2 17:20:19 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 000000006223d195 xid 3dba2d2a
> [Tue Jul  2 17:20:19 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000b73943aa xid a688e47f
> [Tue Jul  2 17:20:19 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 000000004cd80e49 xid 64e688ca
> [Tue Jul  2 17:20:19 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000ef92587f xid 70bf2e44
> [Tue Jul  2 17:20:19 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000a5ff94a6 xid c0f7a668
> [Tue Jul  2 17:20:19 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000fd9a0890 xid 0df7d2c7
> [Tue Jul  2 17:20:19 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000c42ddaac xid 800e710e
> [Tue Jul  2 17:20:19 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000f43275cf xid 8b05e704
> [Tue Jul  2 17:20:19 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 000000009a1d5dcf xid 3c2ba924
> [Tue Jul  2 17:20:19 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 000000007cad732d xid e73a0429
> [Tue Jul  2 17:20:19 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 000000008e7d297f xid 075a98e5
> [Tue Jul  2 17:20:19 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000ed964446 xid 8bb8e568
> [Tue Jul  2 17:20:19 2024] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000b14782f0 xid 4c4ae7c5
> [Tue Jul  2 17:23:28 2024] INFO: task nfsd:3037 blocked for more than 120 seconds.
> [Tue Jul  2 17:23:28 2024]       Not tainted 6.1.0-21-amd64 #1 Debian 6.1.90-1
> [Tue Jul  2 17:23:28 2024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [Tue Jul  2 17:23:28 2024] task:nfsd            state:D stack:0     pid:3037  ppid:2      flags:0x00004000
> [Tue Jul  2 17:23:28 2024] Call Trace:
> [Tue Jul  2 17:23:28 2024]  <TASK>
> [Tue Jul  2 17:23:28 2024]  __schedule+0x34d/0x9e0
> [Tue Jul  2 17:23:28 2024]  schedule+0x5a/0xd0
> [Tue Jul  2 17:23:28 2024]  schedule_timeout+0x118/0x150
> [Tue Jul  2 17:23:28 2024]  wait_for_completion+0x86/0x160
> [Tue Jul  2 17:23:28 2024]  __flush_workqueue+0x152/0x420
> [Tue Jul  2 17:23:28 2024]  nfsd4_destroy_session+0x1b6/0x250 [nfsd]
> [Tue Jul  2 17:23:28 2024]  nfsd4_proc_compound+0x355/0x660 [nfsd]
> [Tue Jul  2 17:23:28 2024]  nfsd_dispatch+0x1a1/0x2b0 [nfsd]
> [Tue Jul  2 17:23:28 2024]  svc_process_common+0x289/0x5e0 [sunrpc]
> [Tue Jul  2 17:23:28 2024]  ? svc_recv+0x4e5/0x890 [sunrpc]
> [Tue Jul  2 17:23:28 2024]  ? nfsd_svc+0x360/0x360 [nfsd]
> [Tue Jul  2 17:23:28 2024]  ? nfsd_shutdown_threads+0x90/0x90 [nfsd]
> [Tue Jul  2 17:23:28 2024]  svc_process+0xad/0x100 [sunrpc]
> [Tue Jul  2 17:23:28 2024]  nfsd+0xd5/0x190 [nfsd]
> [Tue Jul  2 17:23:28 2024]  kthread+0xda/0x100
> [Tue Jul  2 17:23:28 2024]  ? kthread_complete_and_exit+0x20/0x20
> [Tue Jul  2 17:23:28 2024]  ret_from_fork+0x22/0x30
> [Tue Jul  2 17:23:28 2024]  </TASK>
> [Tue Jul  2 17:23:28 2024] INFO: task nfsd:3038 blocked for more than 120 seconds.
> [Tue Jul  2 17:23:28 2024]       Not tainted 6.1.0-21-amd64 #1 Debian 6.1.90-1
> [Tue Jul  2 17:23:28 2024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> 
> 
> /var/log/kern.log:
> 2024-06-28T10:40:40.273493+02:00 nasl006b kernel: [959982.169372] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000d1675728 xid 372e06af
> 2024-06-28T10:40:40.273507+02:00 nasl006b kernel: [959982.169374] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000506887ca xid 5be3c4d4
> 2024-06-28T10:40:40.273508+02:00 nasl006b kernel: [959982.169379] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000b5e5e5a3 xid e5d0daba
> 2024-06-28T10:40:40.273509+02:00 nasl006b kernel: [959982.169423] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000b66bbd6f xid 69696b56
> 2024-06-28T10:40:40.273509+02:00 nasl006b kernel: [959982.169498] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 0000000008b3b3ac xid 89b9afda
> 2024-06-28T10:40:40.273510+02:00 nasl006b kernel: [959982.169504] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000a5ff94a6 xid e595a668
> 2024-06-28T10:40:40.273512+02:00 nasl006b kernel: [959982.169529] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000f7f9161e xid 2fc4625c
> 2024-06-28T10:40:40.273513+02:00 nasl006b kernel: [959982.169659] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000b73943aa xid cb26e47f
> 2024-06-28T10:40:40.273514+02:00 nasl006b kernel: [959982.169691] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 000000009a1d5dcf xid 61c9a824
> 2024-06-28T10:40:40.273514+02:00 nasl006b kernel: [959982.169697] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000749c823f xid 8096d3d0
> 2024-06-28T10:40:40.944609+02:00 nasl006b kernel: [959983.506736] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000190b801c xid bdd1dcd0
> 2024-06-28T10:40:40.948612+02:00 nasl006b kernel: [959983.512235] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 0000000042dcce88 xid d76d5de4
> 2024-06-28T10:40:40.952617+02:00 nasl006b kernel: [959983.514349] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 000000007cad732d xid 0bd90329
> 2024-06-28T10:40:40.952623+02:00 nasl006b kernel: [959983.514564] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 000000004cd80e49 xid 898488ca
> 2024-06-28T10:40:40.952624+02:00 nasl006b kernel: [959983.514951] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000abccd646 xid d0d28401
> 2024-06-28T10:40:40.952624+02:00 nasl006b kernel: [959983.515009] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000ef92587f xid 955d2e44
> 2024-06-28T10:40:40.952625+02:00 nasl006b kernel: [959983.515060] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000ed964446 xid b056e568
> 2024-07-02T17:20:23.113792+02:00 nasl006b kernel: [1329564.790305] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000b14782f0 xid 4c4ae7c5
> 2024-07-02T17:23:32.268700+02:00 nasl006b kernel: [1329753.944957]  nfsd_dispatch+0x1a1/0x2b0 [nfsd]
> 2024-07-02T17:23:32.300740+02:00 nasl006b kernel: [1329753.969482]  svc_process_common+0x289/0x5e0 [sunrpc]
> 2024-07-02T17:23:32.300757+02:00 nasl006b kernel: [1329753.969919]  nfsd+0xd5/0x190 [nfsd]
> 2024-07-02T17:23:32.364636+02:00 nasl006b kernel: [1329754.041100]  ? nfsd_svc+0x360/0x360 [nfsd]
> 2024-07-02T17:23:32.419012+02:00 nasl006b kernel: [1329754.088290]  svc_process+0xad/0x100 [sunrpc]
> 2024-07-02T17:23:32.419020+02:00 nasl006b kernel: [1329754.088337]  ret_from_fork+0x22/0x30
> 2024-07-02T17:23:32.443744+02:00 nasl006b kernel: [1329754.111842]  svc_process+0xad/0x100 [sunrpc]
> 2024-07-02T17:23:32.443749+02:00 nasl006b kernel: [1329754.111882]  ? kthread_complete_and_exit+0x20/0x20
> 2024-07-02T17:23:32.488628+02:00 nasl006b kernel: [1329754.161331]  ? kthread_complete_and_exit+0x20/0x20

Harald, none of this is any more probative than the first
report you sent. We can't tell what's going on unless you
can help us debug the problem. We're just not set up as a
help desk. Have you contacted your Linux vendor and asked
for help?


--
Chuck Lever



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: nfsd becomes a zombie
  2024-07-02 18:17               ` Chuck Lever III
@ 2024-07-03  4:14                 ` Harald Dunkel
  0 siblings, 0 replies; 31+ messages in thread
From: Harald Dunkel @ 2024-07-03  4:14 UTC (permalink / raw)
  To: Chuck Lever III; +Cc: Calum Mackay, Linux NFS Mailing List

On 2024-07-02 20:17:11, Chuck Lever III wrote:
> 
> Harald, none of this is any more probative than the first
> report you sent. We can't tell what's going on unless you
> can help us debug the problem. We're just not set up as a
> help desk. Have you contacted your Linux vendor and asked
> for help?
> 

Understood. I highly appreciate your work on the NFS implementation.

Maybe it is not important, but did you notice that the most recent
"Got unrecognized reply" messages in dmesg did not show up in /var/\
log/kern.log, except for the first message immediately before NFS got
frozen? Usually both show the same.

Regards

Harri

^ permalink raw reply	[flat|nested] 31+ messages in thread

* nfsd blocks indefinitely in nfsd4_destroy_session (was: Re: nfsd becomes a zombie)
  2024-06-17 14:31 ` Chuck Lever III
  2024-06-17 19:20   ` Calum Mackay
@ 2024-12-25  9:15   ` Salvatore Bonaccorso
  2025-01-01 19:24     ` nfsd blocks indefinitely in nfsd4_destroy_session Chuck Lever
  1 sibling, 1 reply; 31+ messages in thread
From: Salvatore Bonaccorso @ 2024-12-25  9:15 UTC (permalink / raw)
  To: Chuck Lever III, Benjamin Coddington, Trond Myklebust
  Cc: Harald Dunkel, Linux NFS Mailing List, herzog, Martin Svec,
	Michael Gernoth, Pellegrin Baptiste

Hi Chuck, hi all,

[it was not ideal to pick one of the message for this followup, let me
know if you want a complete new thread, adding as well Benjamin and
Trond as they are involved in one mentioned patch]

On Mon, Jun 17, 2024 at 02:31:54PM +0000, Chuck Lever III wrote:
> 
> 
> > On Jun 17, 2024, at 2:55 AM, Harald Dunkel <harald.dunkel@aixigo.com> wrote:
> > 
> > Hi folks,
> > 
> > what would be the reason for nfsd getting stuck somehow and becoming
> > an unkillable process? See
> > 
> > - https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1071562
> > - https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/2062568
> > 
> > Doesn't this mean that something inside the kernel gets stuck as
> > well? Seems odd to me.
> 
> I'm not familiar with the Debian or Ubuntu kernel packages. Can
> the kernel release numbers be translated to LTS kernel releases
> please? Need both "last known working" and "first broken" releases.
> 
> This:
> 
> [ 6596.911785] RPC: Could not send backchannel reply error: -110
> [ 6596.972490] RPC: Could not send backchannel reply error: -110
> [ 6837.281307] RPC: Could not send backchannel reply error: -110
> 
> is a known set of client backchannel bugs. Knowing the LTS kernel
> releases (see above) will help us figure out what needs to be
> backported to the LTS kernels kernels in question.
> 
> This:
> 
> [11183.290619] wait_for_completion+0x88/0x150
> [11183.290623] __flush_workqueue+0x140/0x3e0
> [11183.290629] nfsd4_probe_callback_sync+0x1a/0x30 [nfsd]
> [11183.290689] nfsd4_destroy_session+0x186/0x260 [nfsd]
> 
> is probably related to the backchannel errors on the client, but
> client bugs shouldn't cause the server to hang like this. We
> might be able to say more if you can provide the kernel release
> translations (see above).

In Debian we hstill have the bug #1071562 open and one person notified
mye offlist that it appears that the issue get more frequent since
they updated on NFS client side from Ubuntu 20.04 to Debian bookworm
with a 6.1.y based kernel). 

Some people around those issues, seem to claim that the change
mentioned in
https://lists.proxmox.com/pipermail/pve-devel/2024-July/064614.html
would fix the issue, which is as well backchannel related.

This is upstream: 6ddc9deacc13 ("SUNRPC: Fix backchannel reply,
again"). While this commit fixes 57331a59ac0d ("NFSv4.1: Use the
nfs_client's rpc timeouts for backchannel") this is not something
which goes back to 6.1.y, could it be possible that hte backchannel
refactoring and this final fix indeeds fixes the issue?

As people report it is not easily reproducible, so this makes it
harder to identify fixes correctly.

I gave a (short) stance on trying to backport commits up to
6ddc9deacc13 ("SUNRPC: Fix backchannel reply, again") but this quickly
seems to indicate it is probably still not the right thing for
backporting to the older stable series.

As at least pre-requisites:

2009e32997ed568a305cf9bc7bf27d22e0f6ccda
4119bd0306652776cb0b7caa3aea5b2a93aecb89
163cdfca341b76c958567ae0966bd3575c5c6192
f4afc8fead386c81fda2593ad6162271d26667f8
6ed8cdf967f7e9fc96cd1c129719ef99db2f9afc
57331a59ac0d680f606403eb24edd3c35aecba31

and still there would be conflicting codepaths (and does not seem
right).

Chuck, Benjamin, Trond, is there anything we can provive on reporters
side that we can try to tackle this issue better?

Regards,
Salvatore

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: nfsd blocks indefinitely in nfsd4_destroy_session
  2024-12-25  9:15   ` nfsd blocks indefinitely in nfsd4_destroy_session (was: Re: nfsd becomes a zombie) Salvatore Bonaccorso
@ 2025-01-01 19:24     ` Chuck Lever
  2025-01-07 21:17       ` Salvatore Bonaccorso
  0 siblings, 1 reply; 31+ messages in thread
From: Chuck Lever @ 2025-01-01 19:24 UTC (permalink / raw)
  To: Salvatore Bonaccorso, Benjamin Coddington, Trond Myklebust
  Cc: Harald Dunkel, Linux NFS Mailing List, herzog, Martin Svec,
	Michael Gernoth, Pellegrin Baptiste

On 12/25/24 4:15 AM, Salvatore Bonaccorso wrote:
> Hi Chuck, hi all,
> 
> [it was not ideal to pick one of the message for this followup, let me
> know if you want a complete new thread, adding as well Benjamin and
> Trond as they are involved in one mentioned patch]
> 
> On Mon, Jun 17, 2024 at 02:31:54PM +0000, Chuck Lever III wrote:
>>
>>
>>> On Jun 17, 2024, at 2:55 AM, Harald Dunkel <harald.dunkel@aixigo.com> wrote:
>>>
>>> Hi folks,
>>>
>>> what would be the reason for nfsd getting stuck somehow and becoming
>>> an unkillable process? See
>>>
>>> - https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1071562
>>> - https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/2062568
>>>
>>> Doesn't this mean that something inside the kernel gets stuck as
>>> well? Seems odd to me.
>>
>> I'm not familiar with the Debian or Ubuntu kernel packages. Can
>> the kernel release numbers be translated to LTS kernel releases
>> please? Need both "last known working" and "first broken" releases.
>>
>> This:
>>
>> [ 6596.911785] RPC: Could not send backchannel reply error: -110
>> [ 6596.972490] RPC: Could not send backchannel reply error: -110
>> [ 6837.281307] RPC: Could not send backchannel reply error: -110
>>
>> is a known set of client backchannel bugs. Knowing the LTS kernel
>> releases (see above) will help us figure out what needs to be
>> backported to the LTS kernels kernels in question.
>>
>> This:
>>
>> [11183.290619] wait_for_completion+0x88/0x150
>> [11183.290623] __flush_workqueue+0x140/0x3e0
>> [11183.290629] nfsd4_probe_callback_sync+0x1a/0x30 [nfsd]
>> [11183.290689] nfsd4_destroy_session+0x186/0x260 [nfsd]
>>
>> is probably related to the backchannel errors on the client, but
>> client bugs shouldn't cause the server to hang like this. We
>> might be able to say more if you can provide the kernel release
>> translations (see above).
> 
> In Debian we hstill have the bug #1071562 open and one person notified
> mye offlist that it appears that the issue get more frequent since
> they updated on NFS client side from Ubuntu 20.04 to Debian bookworm
> with a 6.1.y based kernel).
> 
> Some people around those issues, seem to claim that the change
> mentioned in
> https://lists.proxmox.com/pipermail/pve-devel/2024-July/064614.html
> would fix the issue, which is as well backchannel related.
> 
> This is upstream: 6ddc9deacc13 ("SUNRPC: Fix backchannel reply,
> again"). While this commit fixes 57331a59ac0d ("NFSv4.1: Use the
> nfs_client's rpc timeouts for backchannel") this is not something
> which goes back to 6.1.y, could it be possible that hte backchannel
> refactoring and this final fix indeeds fixes the issue?
> 
> As people report it is not easily reproducible, so this makes it
> harder to identify fixes correctly.
> 
> I gave a (short) stance on trying to backport commits up to
> 6ddc9deacc13 ("SUNRPC: Fix backchannel reply, again") but this quickly
> seems to indicate it is probably still not the right thing for
> backporting to the older stable series.
> 
> As at least pre-requisites:
> 
> 2009e32997ed568a305cf9bc7bf27d22e0f6ccda
> 4119bd0306652776cb0b7caa3aea5b2a93aecb89
> 163cdfca341b76c958567ae0966bd3575c5c6192
> f4afc8fead386c81fda2593ad6162271d26667f8
> 6ed8cdf967f7e9fc96cd1c129719ef99db2f9afc
> 57331a59ac0d680f606403eb24edd3c35aecba31
> 
> and still there would be conflicting codepaths (and does not seem
> right).
> 
> Chuck, Benjamin, Trond, is there anything we can provive on reporters
> side that we can try to tackle this issue better?

As I've indicated before, NFSD should not block no matter what the
client may or may not be doing. I'd like to focus on the server first.

What is the result of:

$ cd <Bookworm's v6.1.90 kernel source >
$ unset KBUILD_OUTPUT
$ make -j `nproc`
$ scripts/faddr2line \
	fs/nfsd/nfs4state.o \
	nfsd4_destroy_session+0x16d

Since this issue appeared after v6.1.1, is it possible to bisect
between v6.1.1 and v6.1.90 ?


-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: nfsd blocks indefinitely in nfsd4_destroy_session
  2025-01-01 19:24     ` nfsd blocks indefinitely in nfsd4_destroy_session Chuck Lever
@ 2025-01-07 21:17       ` Salvatore Bonaccorso
  2025-01-08 13:33         ` Chuck Lever
  0 siblings, 1 reply; 31+ messages in thread
From: Salvatore Bonaccorso @ 2025-01-07 21:17 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Benjamin Coddington, Trond Myklebust, Harald Dunkel,
	Linux NFS Mailing List, herzog, Martin Svec, Michael Gernoth,
	Pellegrin Baptiste

Hi Chuck,

Thanks for your time on this, much appreciated.

On Wed, Jan 01, 2025 at 02:24:50PM -0500, Chuck Lever wrote:
> On 12/25/24 4:15 AM, Salvatore Bonaccorso wrote:
> > Hi Chuck, hi all,
> > 
> > [it was not ideal to pick one of the message for this followup, let me
> > know if you want a complete new thread, adding as well Benjamin and
> > Trond as they are involved in one mentioned patch]
> > 
> > On Mon, Jun 17, 2024 at 02:31:54PM +0000, Chuck Lever III wrote:
> > > 
> > > 
> > > > On Jun 17, 2024, at 2:55 AM, Harald Dunkel <harald.dunkel@aixigo.com> wrote:
> > > > 
> > > > Hi folks,
> > > > 
> > > > what would be the reason for nfsd getting stuck somehow and becoming
> > > > an unkillable process? See
> > > > 
> > > > - https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1071562
> > > > - https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/2062568
> > > > 
> > > > Doesn't this mean that something inside the kernel gets stuck as
> > > > well? Seems odd to me.
> > > 
> > > I'm not familiar with the Debian or Ubuntu kernel packages. Can
> > > the kernel release numbers be translated to LTS kernel releases
> > > please? Need both "last known working" and "first broken" releases.
> > > 
> > > This:
> > > 
> > > [ 6596.911785] RPC: Could not send backchannel reply error: -110
> > > [ 6596.972490] RPC: Could not send backchannel reply error: -110
> > > [ 6837.281307] RPC: Could not send backchannel reply error: -110
> > > 
> > > is a known set of client backchannel bugs. Knowing the LTS kernel
> > > releases (see above) will help us figure out what needs to be
> > > backported to the LTS kernels kernels in question.
> > > 
> > > This:
> > > 
> > > [11183.290619] wait_for_completion+0x88/0x150
> > > [11183.290623] __flush_workqueue+0x140/0x3e0
> > > [11183.290629] nfsd4_probe_callback_sync+0x1a/0x30 [nfsd]
> > > [11183.290689] nfsd4_destroy_session+0x186/0x260 [nfsd]
> > > 
> > > is probably related to the backchannel errors on the client, but
> > > client bugs shouldn't cause the server to hang like this. We
> > > might be able to say more if you can provide the kernel release
> > > translations (see above).
> > 
> > In Debian we hstill have the bug #1071562 open and one person notified
> > mye offlist that it appears that the issue get more frequent since
> > they updated on NFS client side from Ubuntu 20.04 to Debian bookworm
> > with a 6.1.y based kernel).
> > 
> > Some people around those issues, seem to claim that the change
> > mentioned in
> > https://lists.proxmox.com/pipermail/pve-devel/2024-July/064614.html
> > would fix the issue, which is as well backchannel related.
> > 
> > This is upstream: 6ddc9deacc13 ("SUNRPC: Fix backchannel reply,
> > again"). While this commit fixes 57331a59ac0d ("NFSv4.1: Use the
> > nfs_client's rpc timeouts for backchannel") this is not something
> > which goes back to 6.1.y, could it be possible that hte backchannel
> > refactoring and this final fix indeeds fixes the issue?
> > 
> > As people report it is not easily reproducible, so this makes it
> > harder to identify fixes correctly.
> > 
> > I gave a (short) stance on trying to backport commits up to
> > 6ddc9deacc13 ("SUNRPC: Fix backchannel reply, again") but this quickly
> > seems to indicate it is probably still not the right thing for
> > backporting to the older stable series.
> > 
> > As at least pre-requisites:
> > 
> > 2009e32997ed568a305cf9bc7bf27d22e0f6ccda
> > 4119bd0306652776cb0b7caa3aea5b2a93aecb89
> > 163cdfca341b76c958567ae0966bd3575c5c6192
> > f4afc8fead386c81fda2593ad6162271d26667f8
> > 6ed8cdf967f7e9fc96cd1c129719ef99db2f9afc
> > 57331a59ac0d680f606403eb24edd3c35aecba31
> > 
> > and still there would be conflicting codepaths (and does not seem
> > right).
> > 
> > Chuck, Benjamin, Trond, is there anything we can provive on reporters
> > side that we can try to tackle this issue better?
> 
> As I've indicated before, NFSD should not block no matter what the
> client may or may not be doing. I'd like to focus on the server first.
> 
> What is the result of:
> 
> $ cd <Bookworm's v6.1.90 kernel source >
> $ unset KBUILD_OUTPUT
> $ make -j `nproc`
> $ scripts/faddr2line \
> 	fs/nfsd/nfs4state.o \
> 	nfsd4_destroy_session+0x16d
> 
> Since this issue appeared after v6.1.1, is it possible to bisect
> between v6.1.1 and v6.1.90 ?

First please note, at least speaking of triggering the issue in
Debian, Debian has moved to 6.1.119 based kernel already (and soon in
the weekends point release move to 6.1.123).

That said, one of the users which regularly seems to be hit by the
issue was able to provide the above requested information, based for
6.1.119:

~/kernel/linux-source-6.1# make kernelversion
6.1.119
~/kernel/linux-source-6.1# scripts/faddr2line fs/nfsd/nfs4state.o nfsd4_destroy_session+0x16d
nfsd4_destroy_session+0x16d/0x250:
__list_del_entry at /root/kernel/linux-source-6.1/./include/linux/list.h:134
(inlined by) list_del at /root/kernel/linux-source-6.1/./include/linux/list.h:148
(inlined by) unhash_session at /root/kernel/linux-source-6.1/fs/nfsd/nfs4state.c:2062
(inlined by) nfsd4_destroy_session at /root/kernel/linux-source-6.1/fs/nfsd/nfs4state.c:3856

They could provide as well a decode_stacktrace output for the recent
hit (if that is helpful for you):

[Mon Jan 6 13:25:28 2025] INFO: task nfsd:55306 blocked for more than 6883 seconds.
[Mon Jan 6 13:25:28 2025]       Not tainted 6.1.0-28-amd64 #1 Debian 6.1.119-1
[Mon Jan 6 13:25:28 2025] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Mon Jan 6 13:25:28 2025] task:nfsd            state:D stack:0     pid:55306 ppid:2      flags:0x00004000
[Mon Jan 6 13:25:28 2025] Call Trace:
[Mon Jan 6 13:25:28 2025]  <TASK>
[Mon Jan 6 13:25:28 2025] __schedule+0x34d/0x9e0
[Mon Jan 6 13:25:28 2025] schedule+0x5a/0xd0
[Mon Jan 6 13:25:28 2025] schedule_timeout+0x118/0x150
[Mon Jan 6 13:25:28 2025] wait_for_completion+0x86/0x160
[Mon Jan 6 13:25:28 2025] __flush_workqueue+0x152/0x420
[Mon Jan 6 13:25:28 2025] nfsd4_destroy_session (debian/build/build_amd64_none_amd64/include/linux/spinlock.h:351 debian/build/build_amd64_none_amd64/fs/nfsd/nfs4state.c:3861) nfsd
[Mon Jan 6 13:25:28 2025] nfsd4_proc_compound (debian/build/build_amd64_none_amd64/fs/nfsd/nfs4proc.c:2680) nfsd
[Mon Jan 6 13:25:28 2025] nfsd_dispatch (debian/build/build_amd64_none_amd64/fs/nfsd/nfssvc.c:1022) nfsd
[Mon Jan 6 13:25:28 2025] svc_process_common (debian/build/build_amd64_none_amd64/net/sunrpc/svc.c:1344) sunrpc
[Mon Jan 6 13:25:28 2025] ? svc_recv (debian/build/build_amd64_none_amd64/net/sunrpc/svc_xprt.c:897) sunrpc
[Mon Jan 6 13:25:28 2025] ? nfsd_svc (debian/build/build_amd64_none_amd64/fs/nfsd/nfssvc.c:983) nfsd
[Mon Jan 6 13:25:28 2025] ? nfsd_inet6addr_event (debian/build/build_amd64_none_amd64/fs/nfsd/nfssvc.c:922) nfsd
[Mon Jan 6 13:25:28 2025] svc_process (debian/build/build_amd64_none_amd64/net/sunrpc/svc.c:1474) sunrpc
[Mon Jan 6 13:25:28 2025] nfsd (debian/build/build_amd64_none_amd64/fs/nfsd/nfssvc.c:960) nfsd
[Mon Jan 6 13:25:28 2025] kthread+0xd7/0x100
[Mon Jan 6 13:25:28 2025] ? kthread_complete_and_exit+0x20/0x20
[Mon Jan 6 13:25:28 2025] ret_from_fork+0x1f/0x30
[Mon Jan  6 13:25:28 2025]  </TASK>

The question about bisection is actually harder, those are production
systems and I understand it's not possible to do bisect there, while
unfortunately not reprodcing the issue on "lab conditions".

Does the above give us still a clue on what you were looking for?

Regards,
Salvatore

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: nfsd blocks indefinitely in nfsd4_destroy_session
  2025-01-07 21:17       ` Salvatore Bonaccorso
@ 2025-01-08 13:33         ` Chuck Lever
  2025-01-08 14:54           ` Christian Herzog
  0 siblings, 1 reply; 31+ messages in thread
From: Chuck Lever @ 2025-01-08 13:33 UTC (permalink / raw)
  To: Salvatore Bonaccorso
  Cc: Benjamin Coddington, Trond Myklebust, Harald Dunkel,
	Linux NFS Mailing List, herzog, Martin Svec, Michael Gernoth,
	Pellegrin Baptiste

On 1/7/25 4:17 PM, Salvatore Bonaccorso wrote:
> Hi Chuck,
> 
> Thanks for your time on this, much appreciated.
> 
> On Wed, Jan 01, 2025 at 02:24:50PM -0500, Chuck Lever wrote:
>> On 12/25/24 4:15 AM, Salvatore Bonaccorso wrote:
>>> Hi Chuck, hi all,
>>>
>>> [it was not ideal to pick one of the message for this followup, let me
>>> know if you want a complete new thread, adding as well Benjamin and
>>> Trond as they are involved in one mentioned patch]
>>>
>>> On Mon, Jun 17, 2024 at 02:31:54PM +0000, Chuck Lever III wrote:
>>>>
>>>>
>>>>> On Jun 17, 2024, at 2:55 AM, Harald Dunkel <harald.dunkel@aixigo.com> wrote:
>>>>>
>>>>> Hi folks,
>>>>>
>>>>> what would be the reason for nfsd getting stuck somehow and becoming
>>>>> an unkillable process? See
>>>>>
>>>>> - https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1071562
>>>>> - https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/2062568
>>>>>
>>>>> Doesn't this mean that something inside the kernel gets stuck as
>>>>> well? Seems odd to me.
>>>>
>>>> I'm not familiar with the Debian or Ubuntu kernel packages. Can
>>>> the kernel release numbers be translated to LTS kernel releases
>>>> please? Need both "last known working" and "first broken" releases.
>>>>
>>>> This:
>>>>
>>>> [ 6596.911785] RPC: Could not send backchannel reply error: -110
>>>> [ 6596.972490] RPC: Could not send backchannel reply error: -110
>>>> [ 6837.281307] RPC: Could not send backchannel reply error: -110
>>>>
>>>> is a known set of client backchannel bugs. Knowing the LTS kernel
>>>> releases (see above) will help us figure out what needs to be
>>>> backported to the LTS kernels kernels in question.
>>>>
>>>> This:
>>>>
>>>> [11183.290619] wait_for_completion+0x88/0x150
>>>> [11183.290623] __flush_workqueue+0x140/0x3e0
>>>> [11183.290629] nfsd4_probe_callback_sync+0x1a/0x30 [nfsd]
>>>> [11183.290689] nfsd4_destroy_session+0x186/0x260 [nfsd]
>>>>
>>>> is probably related to the backchannel errors on the client, but
>>>> client bugs shouldn't cause the server to hang like this. We
>>>> might be able to say more if you can provide the kernel release
>>>> translations (see above).
>>>
>>> In Debian we hstill have the bug #1071562 open and one person notified
>>> mye offlist that it appears that the issue get more frequent since
>>> they updated on NFS client side from Ubuntu 20.04 to Debian bookworm
>>> with a 6.1.y based kernel).
>>>
>>> Some people around those issues, seem to claim that the change
>>> mentioned in
>>> https://lists.proxmox.com/pipermail/pve-devel/2024-July/064614.html
>>> would fix the issue, which is as well backchannel related.
>>>
>>> This is upstream: 6ddc9deacc13 ("SUNRPC: Fix backchannel reply,
>>> again"). While this commit fixes 57331a59ac0d ("NFSv4.1: Use the
>>> nfs_client's rpc timeouts for backchannel") this is not something
>>> which goes back to 6.1.y, could it be possible that hte backchannel
>>> refactoring and this final fix indeeds fixes the issue?
>>>
>>> As people report it is not easily reproducible, so this makes it
>>> harder to identify fixes correctly.
>>>
>>> I gave a (short) stance on trying to backport commits up to
>>> 6ddc9deacc13 ("SUNRPC: Fix backchannel reply, again") but this quickly
>>> seems to indicate it is probably still not the right thing for
>>> backporting to the older stable series.
>>>
>>> As at least pre-requisites:
>>>
>>> 2009e32997ed568a305cf9bc7bf27d22e0f6ccda
>>> 4119bd0306652776cb0b7caa3aea5b2a93aecb89
>>> 163cdfca341b76c958567ae0966bd3575c5c6192
>>> f4afc8fead386c81fda2593ad6162271d26667f8
>>> 6ed8cdf967f7e9fc96cd1c129719ef99db2f9afc
>>> 57331a59ac0d680f606403eb24edd3c35aecba31
>>>
>>> and still there would be conflicting codepaths (and does not seem
>>> right).
>>>
>>> Chuck, Benjamin, Trond, is there anything we can provive on reporters
>>> side that we can try to tackle this issue better?
>>
>> As I've indicated before, NFSD should not block no matter what the
>> client may or may not be doing. I'd like to focus on the server first.
>>
>> What is the result of:
>>
>> $ cd <Bookworm's v6.1.90 kernel source >
>> $ unset KBUILD_OUTPUT
>> $ make -j `nproc`
>> $ scripts/faddr2line \
>> 	fs/nfsd/nfs4state.o \
>> 	nfsd4_destroy_session+0x16d
>>
>> Since this issue appeared after v6.1.1, is it possible to bisect
>> between v6.1.1 and v6.1.90 ?
> 
> First please note, at least speaking of triggering the issue in
> Debian, Debian has moved to 6.1.119 based kernel already (and soon in
> the weekends point release move to 6.1.123).
> 
> That said, one of the users which regularly seems to be hit by the
> issue was able to provide the above requested information, based for
> 6.1.119:
> 
> ~/kernel/linux-source-6.1# make kernelversion
> 6.1.119
> ~/kernel/linux-source-6.1# scripts/faddr2line fs/nfsd/nfs4state.o nfsd4_destroy_session+0x16d
> nfsd4_destroy_session+0x16d/0x250:
> __list_del_entry at /root/kernel/linux-source-6.1/./include/linux/list.h:134
> (inlined by) list_del at /root/kernel/linux-source-6.1/./include/linux/list.h:148
> (inlined by) unhash_session at /root/kernel/linux-source-6.1/fs/nfsd/nfs4state.c:2062
> (inlined by) nfsd4_destroy_session at /root/kernel/linux-source-6.1/fs/nfsd/nfs4state.c:3856
> 
> They could provide as well a decode_stacktrace output for the recent
> hit (if that is helpful for you):
> 
> [Mon Jan 6 13:25:28 2025] INFO: task nfsd:55306 blocked for more than 6883 seconds.
> [Mon Jan 6 13:25:28 2025]       Not tainted 6.1.0-28-amd64 #1 Debian 6.1.119-1
> [Mon Jan 6 13:25:28 2025] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [Mon Jan 6 13:25:28 2025] task:nfsd            state:D stack:0     pid:55306 ppid:2      flags:0x00004000
> [Mon Jan 6 13:25:28 2025] Call Trace:
> [Mon Jan 6 13:25:28 2025]  <TASK>
> [Mon Jan 6 13:25:28 2025] __schedule+0x34d/0x9e0
> [Mon Jan 6 13:25:28 2025] schedule+0x5a/0xd0
> [Mon Jan 6 13:25:28 2025] schedule_timeout+0x118/0x150
> [Mon Jan 6 13:25:28 2025] wait_for_completion+0x86/0x160
> [Mon Jan 6 13:25:28 2025] __flush_workqueue+0x152/0x420
> [Mon Jan 6 13:25:28 2025] nfsd4_destroy_session (debian/build/build_amd64_none_amd64/include/linux/spinlock.h:351 debian/build/build_amd64_none_amd64/fs/nfsd/nfs4state.c:3861) nfsd
> [Mon Jan 6 13:25:28 2025] nfsd4_proc_compound (debian/build/build_amd64_none_amd64/fs/nfsd/nfs4proc.c:2680) nfsd
> [Mon Jan 6 13:25:28 2025] nfsd_dispatch (debian/build/build_amd64_none_amd64/fs/nfsd/nfssvc.c:1022) nfsd
> [Mon Jan 6 13:25:28 2025] svc_process_common (debian/build/build_amd64_none_amd64/net/sunrpc/svc.c:1344) sunrpc
> [Mon Jan 6 13:25:28 2025] ? svc_recv (debian/build/build_amd64_none_amd64/net/sunrpc/svc_xprt.c:897) sunrpc
> [Mon Jan 6 13:25:28 2025] ? nfsd_svc (debian/build/build_amd64_none_amd64/fs/nfsd/nfssvc.c:983) nfsd
> [Mon Jan 6 13:25:28 2025] ? nfsd_inet6addr_event (debian/build/build_amd64_none_amd64/fs/nfsd/nfssvc.c:922) nfsd
> [Mon Jan 6 13:25:28 2025] svc_process (debian/build/build_amd64_none_amd64/net/sunrpc/svc.c:1474) sunrpc
> [Mon Jan 6 13:25:28 2025] nfsd (debian/build/build_amd64_none_amd64/fs/nfsd/nfssvc.c:960) nfsd
> [Mon Jan 6 13:25:28 2025] kthread+0xd7/0x100
> [Mon Jan 6 13:25:28 2025] ? kthread_complete_and_exit+0x20/0x20
> [Mon Jan 6 13:25:28 2025] ret_from_fork+0x1f/0x30
> [Mon Jan  6 13:25:28 2025]  </TASK>
> 
> The question about bisection is actually harder, those are production
> systems and I understand it's not possible to do bisect there, while
> unfortunately not reprodcing the issue on "lab conditions".
> 
> Does the above give us still a clue on what you were looking for?

Thanks for the update.

It's possible that 38f080f3cd19 ("NFSD: Move callback_wq into struct
nfs4_client"), while not a specific fix for this issue, might offer some
relief by preventing the DESTROY_SESSION hang from affecting all other
clients of the degraded server.

Not having a reproducer or the ability to bisect puts a damper on
things. The next step, then, is to enable tracing on servers where this
issue can come up, and wait for the hang to occur. The following command
captures information only about callback operation, so it should not
generate much data or impact server performance.

   # trace-cmd record -e nfsd:nfsd_cb\*

Let that run until the problem occurs, then ^C, compress the resulting
trace.dat file, and either attach it to 1071562 or email it to me
privately.


-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: nfsd blocks indefinitely in nfsd4_destroy_session
  2025-01-08 13:33         ` Chuck Lever
@ 2025-01-08 14:54           ` Christian Herzog
  2025-01-08 15:07             ` Chuck Lever
  0 siblings, 1 reply; 31+ messages in thread
From: Christian Herzog @ 2025-01-08 14:54 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Salvatore Bonaccorso, Benjamin Coddington, Trond Myklebust,
	Harald Dunkel, Linux NFS Mailing List, Martin Svec,
	Michael Gernoth, Pellegrin Baptiste

Dear Chuck,

On Wed, Jan 08, 2025 at 08:33:23AM -0500, Chuck Lever wrote:
> On 1/7/25 4:17 PM, Salvatore Bonaccorso wrote:
> > Hi Chuck,
> > 
> > Thanks for your time on this, much appreciated.
> > 
> > On Wed, Jan 01, 2025 at 02:24:50PM -0500, Chuck Lever wrote:
> > > On 12/25/24 4:15 AM, Salvatore Bonaccorso wrote:
> > > > Hi Chuck, hi all,
> > > > 
> > > > [it was not ideal to pick one of the message for this followup, let me
> > > > know if you want a complete new thread, adding as well Benjamin and
> > > > Trond as they are involved in one mentioned patch]
> > > > 
> > > > On Mon, Jun 17, 2024 at 02:31:54PM +0000, Chuck Lever III wrote:
> > > > > 
> > > > > 
> > > > > > On Jun 17, 2024, at 2:55 AM, Harald Dunkel <harald.dunkel@aixigo.com> wrote:
> > > > > > 
> > > > > > Hi folks,
> > > > > > 
> > > > > > what would be the reason for nfsd getting stuck somehow and becoming
> > > > > > an unkillable process? See
> > > > > > 
> > > > > > - https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1071562
> > > > > > - https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/2062568
> > > > > > 
> > > > > > Doesn't this mean that something inside the kernel gets stuck as
> > > > > > well? Seems odd to me.
> > > > > 
> > > > > I'm not familiar with the Debian or Ubuntu kernel packages. Can
> > > > > the kernel release numbers be translated to LTS kernel releases
> > > > > please? Need both "last known working" and "first broken" releases.
> > > > > 
> > > > > This:
> > > > > 
> > > > > [ 6596.911785] RPC: Could not send backchannel reply error: -110
> > > > > [ 6596.972490] RPC: Could not send backchannel reply error: -110
> > > > > [ 6837.281307] RPC: Could not send backchannel reply error: -110
> > > > > 
> > > > > is a known set of client backchannel bugs. Knowing the LTS kernel
> > > > > releases (see above) will help us figure out what needs to be
> > > > > backported to the LTS kernels kernels in question.
> > > > > 
> > > > > This:
> > > > > 
> > > > > [11183.290619] wait_for_completion+0x88/0x150
> > > > > [11183.290623] __flush_workqueue+0x140/0x3e0
> > > > > [11183.290629] nfsd4_probe_callback_sync+0x1a/0x30 [nfsd]
> > > > > [11183.290689] nfsd4_destroy_session+0x186/0x260 [nfsd]
> > > > > 
> > > > > is probably related to the backchannel errors on the client, but
> > > > > client bugs shouldn't cause the server to hang like this. We
> > > > > might be able to say more if you can provide the kernel release
> > > > > translations (see above).
> > > > 
> > > > In Debian we hstill have the bug #1071562 open and one person notified
> > > > mye offlist that it appears that the issue get more frequent since
> > > > they updated on NFS client side from Ubuntu 20.04 to Debian bookworm
> > > > with a 6.1.y based kernel).
> > > > 
> > > > Some people around those issues, seem to claim that the change
> > > > mentioned in
> > > > https://lists.proxmox.com/pipermail/pve-devel/2024-July/064614.html
> > > > would fix the issue, which is as well backchannel related.
> > > > 
> > > > This is upstream: 6ddc9deacc13 ("SUNRPC: Fix backchannel reply,
> > > > again"). While this commit fixes 57331a59ac0d ("NFSv4.1: Use the
> > > > nfs_client's rpc timeouts for backchannel") this is not something
> > > > which goes back to 6.1.y, could it be possible that hte backchannel
> > > > refactoring and this final fix indeeds fixes the issue?
> > > > 
> > > > As people report it is not easily reproducible, so this makes it
> > > > harder to identify fixes correctly.
> > > > 
> > > > I gave a (short) stance on trying to backport commits up to
> > > > 6ddc9deacc13 ("SUNRPC: Fix backchannel reply, again") but this quickly
> > > > seems to indicate it is probably still not the right thing for
> > > > backporting to the older stable series.
> > > > 
> > > > As at least pre-requisites:
> > > > 
> > > > 2009e32997ed568a305cf9bc7bf27d22e0f6ccda
> > > > 4119bd0306652776cb0b7caa3aea5b2a93aecb89
> > > > 163cdfca341b76c958567ae0966bd3575c5c6192
> > > > f4afc8fead386c81fda2593ad6162271d26667f8
> > > > 6ed8cdf967f7e9fc96cd1c129719ef99db2f9afc
> > > > 57331a59ac0d680f606403eb24edd3c35aecba31
> > > > 
> > > > and still there would be conflicting codepaths (and does not seem
> > > > right).
> > > > 
> > > > Chuck, Benjamin, Trond, is there anything we can provive on reporters
> > > > side that we can try to tackle this issue better?
> > > 
> > > As I've indicated before, NFSD should not block no matter what the
> > > client may or may not be doing. I'd like to focus on the server first.
> > > 
> > > What is the result of:
> > > 
> > > $ cd <Bookworm's v6.1.90 kernel source >
> > > $ unset KBUILD_OUTPUT
> > > $ make -j `nproc`
> > > $ scripts/faddr2line \
> > > 	fs/nfsd/nfs4state.o \
> > > 	nfsd4_destroy_session+0x16d
> > > 
> > > Since this issue appeared after v6.1.1, is it possible to bisect
> > > between v6.1.1 and v6.1.90 ?
> > 
> > First please note, at least speaking of triggering the issue in
> > Debian, Debian has moved to 6.1.119 based kernel already (and soon in
> > the weekends point release move to 6.1.123).
> > 
> > That said, one of the users which regularly seems to be hit by the
> > issue was able to provide the above requested information, based for
> > 6.1.119:
> > 
> > ~/kernel/linux-source-6.1# make kernelversion
> > 6.1.119
> > ~/kernel/linux-source-6.1# scripts/faddr2line fs/nfsd/nfs4state.o nfsd4_destroy_session+0x16d
> > nfsd4_destroy_session+0x16d/0x250:
> > __list_del_entry at /root/kernel/linux-source-6.1/./include/linux/list.h:134
> > (inlined by) list_del at /root/kernel/linux-source-6.1/./include/linux/list.h:148
> > (inlined by) unhash_session at /root/kernel/linux-source-6.1/fs/nfsd/nfs4state.c:2062
> > (inlined by) nfsd4_destroy_session at /root/kernel/linux-source-6.1/fs/nfsd/nfs4state.c:3856
> > 
> > They could provide as well a decode_stacktrace output for the recent
> > hit (if that is helpful for you):
> > 
> > [Mon Jan 6 13:25:28 2025] INFO: task nfsd:55306 blocked for more than 6883 seconds.
> > [Mon Jan 6 13:25:28 2025]       Not tainted 6.1.0-28-amd64 #1 Debian 6.1.119-1
> > [Mon Jan 6 13:25:28 2025] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [Mon Jan 6 13:25:28 2025] task:nfsd            state:D stack:0     pid:55306 ppid:2      flags:0x00004000
> > [Mon Jan 6 13:25:28 2025] Call Trace:
> > [Mon Jan 6 13:25:28 2025]  <TASK>
> > [Mon Jan 6 13:25:28 2025] __schedule+0x34d/0x9e0
> > [Mon Jan 6 13:25:28 2025] schedule+0x5a/0xd0
> > [Mon Jan 6 13:25:28 2025] schedule_timeout+0x118/0x150
> > [Mon Jan 6 13:25:28 2025] wait_for_completion+0x86/0x160
> > [Mon Jan 6 13:25:28 2025] __flush_workqueue+0x152/0x420
> > [Mon Jan 6 13:25:28 2025] nfsd4_destroy_session (debian/build/build_amd64_none_amd64/include/linux/spinlock.h:351 debian/build/build_amd64_none_amd64/fs/nfsd/nfs4state.c:3861) nfsd
> > [Mon Jan 6 13:25:28 2025] nfsd4_proc_compound (debian/build/build_amd64_none_amd64/fs/nfsd/nfs4proc.c:2680) nfsd
> > [Mon Jan 6 13:25:28 2025] nfsd_dispatch (debian/build/build_amd64_none_amd64/fs/nfsd/nfssvc.c:1022) nfsd
> > [Mon Jan 6 13:25:28 2025] svc_process_common (debian/build/build_amd64_none_amd64/net/sunrpc/svc.c:1344) sunrpc
> > [Mon Jan 6 13:25:28 2025] ? svc_recv (debian/build/build_amd64_none_amd64/net/sunrpc/svc_xprt.c:897) sunrpc
> > [Mon Jan 6 13:25:28 2025] ? nfsd_svc (debian/build/build_amd64_none_amd64/fs/nfsd/nfssvc.c:983) nfsd
> > [Mon Jan 6 13:25:28 2025] ? nfsd_inet6addr_event (debian/build/build_amd64_none_amd64/fs/nfsd/nfssvc.c:922) nfsd
> > [Mon Jan 6 13:25:28 2025] svc_process (debian/build/build_amd64_none_amd64/net/sunrpc/svc.c:1474) sunrpc
> > [Mon Jan 6 13:25:28 2025] nfsd (debian/build/build_amd64_none_amd64/fs/nfsd/nfssvc.c:960) nfsd
> > [Mon Jan 6 13:25:28 2025] kthread+0xd7/0x100
> > [Mon Jan 6 13:25:28 2025] ? kthread_complete_and_exit+0x20/0x20
> > [Mon Jan 6 13:25:28 2025] ret_from_fork+0x1f/0x30
> > [Mon Jan  6 13:25:28 2025]  </TASK>
> > 
> > The question about bisection is actually harder, those are production
> > systems and I understand it's not possible to do bisect there, while
> > unfortunately not reprodcing the issue on "lab conditions".
> > 
> > Does the above give us still a clue on what you were looking for?
> 
> Thanks for the update.
> 
> It's possible that 38f080f3cd19 ("NFSD: Move callback_wq into struct
> nfs4_client"), while not a specific fix for this issue, might offer some
> relief by preventing the DESTROY_SESSION hang from affecting all other
> clients of the degraded server.
> 
> Not having a reproducer or the ability to bisect puts a damper on
> things. The next step, then, is to enable tracing on servers where this
> issue can come up, and wait for the hang to occur. The following command
> captures information only about callback operation, so it should not
> generate much data or impact server performance.
> 
>   # trace-cmd record -e nfsd:nfsd_cb\*
> 
> Let that run until the problem occurs, then ^C, compress the resulting
> trace.dat file, and either attach it to 1071562 or email it to me
> privately.
thanks for the follow-up.

I am the "customer" with two affected file servers. We have since moved those
servers to the backports kernel (6.11.10) in the hope of forward fixing the
issue. If this kernel is stable, I'm afraid I can't go back to the 'bad'
kernel (700+ researchers affected..), and we're also not able to trigger the
issue on our test environment.

I'm sorry I can't be of more help.

best,
-Christian



-- 
Dr. Christian Herzog <herzog@phys.ethz.ch>  support: +41 44 633 26 68
Head, IT Services Group, HPT H 8              voice: +41 44 633 39 50
Department of Physics, ETH Zurich           
8093 Zurich, Switzerland                     http://isg.phys.ethz.ch/

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: nfsd blocks indefinitely in nfsd4_destroy_session
  2025-01-08 14:54           ` Christian Herzog
@ 2025-01-08 15:07             ` Chuck Lever
  2025-01-09 11:56               ` Christian Herzog
  0 siblings, 1 reply; 31+ messages in thread
From: Chuck Lever @ 2025-01-08 15:07 UTC (permalink / raw)
  To: Christian Herzog
  Cc: Salvatore Bonaccorso, Benjamin Coddington, Trond Myklebust,
	Harald Dunkel, Linux NFS Mailing List, Martin Svec,
	Michael Gernoth, Pellegrin Baptiste

On 1/8/25 9:54 AM, Christian Herzog wrote:
> Dear Chuck,
> 
> On Wed, Jan 08, 2025 at 08:33:23AM -0500, Chuck Lever wrote:
>> On 1/7/25 4:17 PM, Salvatore Bonaccorso wrote:
>>> Hi Chuck,
>>>
>>> Thanks for your time on this, much appreciated.
>>>
>>> On Wed, Jan 01, 2025 at 02:24:50PM -0500, Chuck Lever wrote:
>>>> On 12/25/24 4:15 AM, Salvatore Bonaccorso wrote:
>>>>> Hi Chuck, hi all,
>>>>>
>>>>> [it was not ideal to pick one of the message for this followup, let me
>>>>> know if you want a complete new thread, adding as well Benjamin and
>>>>> Trond as they are involved in one mentioned patch]
>>>>>
>>>>> On Mon, Jun 17, 2024 at 02:31:54PM +0000, Chuck Lever III wrote:
>>>>>>
>>>>>>
>>>>>>> On Jun 17, 2024, at 2:55 AM, Harald Dunkel <harald.dunkel@aixigo.com> wrote:
>>>>>>>
>>>>>>> Hi folks,
>>>>>>>
>>>>>>> what would be the reason for nfsd getting stuck somehow and becoming
>>>>>>> an unkillable process? See
>>>>>>>
>>>>>>> - https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1071562
>>>>>>> - https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/2062568
>>>>>>>
>>>>>>> Doesn't this mean that something inside the kernel gets stuck as
>>>>>>> well? Seems odd to me.
>>>>>>
>>>>>> I'm not familiar with the Debian or Ubuntu kernel packages. Can
>>>>>> the kernel release numbers be translated to LTS kernel releases
>>>>>> please? Need both "last known working" and "first broken" releases.
>>>>>>
>>>>>> This:
>>>>>>
>>>>>> [ 6596.911785] RPC: Could not send backchannel reply error: -110
>>>>>> [ 6596.972490] RPC: Could not send backchannel reply error: -110
>>>>>> [ 6837.281307] RPC: Could not send backchannel reply error: -110
>>>>>>
>>>>>> is a known set of client backchannel bugs. Knowing the LTS kernel
>>>>>> releases (see above) will help us figure out what needs to be
>>>>>> backported to the LTS kernels kernels in question.
>>>>>>
>>>>>> This:
>>>>>>
>>>>>> [11183.290619] wait_for_completion+0x88/0x150
>>>>>> [11183.290623] __flush_workqueue+0x140/0x3e0
>>>>>> [11183.290629] nfsd4_probe_callback_sync+0x1a/0x30 [nfsd]
>>>>>> [11183.290689] nfsd4_destroy_session+0x186/0x260 [nfsd]
>>>>>>
>>>>>> is probably related to the backchannel errors on the client, but
>>>>>> client bugs shouldn't cause the server to hang like this. We
>>>>>> might be able to say more if you can provide the kernel release
>>>>>> translations (see above).
>>>>>
>>>>> In Debian we hstill have the bug #1071562 open and one person notified
>>>>> mye offlist that it appears that the issue get more frequent since
>>>>> they updated on NFS client side from Ubuntu 20.04 to Debian bookworm
>>>>> with a 6.1.y based kernel).
>>>>>
>>>>> Some people around those issues, seem to claim that the change
>>>>> mentioned in
>>>>> https://lists.proxmox.com/pipermail/pve-devel/2024-July/064614.html
>>>>> would fix the issue, which is as well backchannel related.
>>>>>
>>>>> This is upstream: 6ddc9deacc13 ("SUNRPC: Fix backchannel reply,
>>>>> again"). While this commit fixes 57331a59ac0d ("NFSv4.1: Use the
>>>>> nfs_client's rpc timeouts for backchannel") this is not something
>>>>> which goes back to 6.1.y, could it be possible that hte backchannel
>>>>> refactoring and this final fix indeeds fixes the issue?
>>>>>
>>>>> As people report it is not easily reproducible, so this makes it
>>>>> harder to identify fixes correctly.
>>>>>
>>>>> I gave a (short) stance on trying to backport commits up to
>>>>> 6ddc9deacc13 ("SUNRPC: Fix backchannel reply, again") but this quickly
>>>>> seems to indicate it is probably still not the right thing for
>>>>> backporting to the older stable series.
>>>>>
>>>>> As at least pre-requisites:
>>>>>
>>>>> 2009e32997ed568a305cf9bc7bf27d22e0f6ccda
>>>>> 4119bd0306652776cb0b7caa3aea5b2a93aecb89
>>>>> 163cdfca341b76c958567ae0966bd3575c5c6192
>>>>> f4afc8fead386c81fda2593ad6162271d26667f8
>>>>> 6ed8cdf967f7e9fc96cd1c129719ef99db2f9afc
>>>>> 57331a59ac0d680f606403eb24edd3c35aecba31
>>>>>
>>>>> and still there would be conflicting codepaths (and does not seem
>>>>> right).
>>>>>
>>>>> Chuck, Benjamin, Trond, is there anything we can provive on reporters
>>>>> side that we can try to tackle this issue better?
>>>>
>>>> As I've indicated before, NFSD should not block no matter what the
>>>> client may or may not be doing. I'd like to focus on the server first.
>>>>
>>>> What is the result of:
>>>>
>>>> $ cd <Bookworm's v6.1.90 kernel source >
>>>> $ unset KBUILD_OUTPUT
>>>> $ make -j `nproc`
>>>> $ scripts/faddr2line \
>>>> 	fs/nfsd/nfs4state.o \
>>>> 	nfsd4_destroy_session+0x16d
>>>>
>>>> Since this issue appeared after v6.1.1, is it possible to bisect
>>>> between v6.1.1 and v6.1.90 ?
>>>
>>> First please note, at least speaking of triggering the issue in
>>> Debian, Debian has moved to 6.1.119 based kernel already (and soon in
>>> the weekends point release move to 6.1.123).
>>>
>>> That said, one of the users which regularly seems to be hit by the
>>> issue was able to provide the above requested information, based for
>>> 6.1.119:
>>>
>>> ~/kernel/linux-source-6.1# make kernelversion
>>> 6.1.119
>>> ~/kernel/linux-source-6.1# scripts/faddr2line fs/nfsd/nfs4state.o nfsd4_destroy_session+0x16d
>>> nfsd4_destroy_session+0x16d/0x250:
>>> __list_del_entry at /root/kernel/linux-source-6.1/./include/linux/list.h:134
>>> (inlined by) list_del at /root/kernel/linux-source-6.1/./include/linux/list.h:148
>>> (inlined by) unhash_session at /root/kernel/linux-source-6.1/fs/nfsd/nfs4state.c:2062
>>> (inlined by) nfsd4_destroy_session at /root/kernel/linux-source-6.1/fs/nfsd/nfs4state.c:3856
>>>
>>> They could provide as well a decode_stacktrace output for the recent
>>> hit (if that is helpful for you):
>>>
>>> [Mon Jan 6 13:25:28 2025] INFO: task nfsd:55306 blocked for more than 6883 seconds.
>>> [Mon Jan 6 13:25:28 2025]       Not tainted 6.1.0-28-amd64 #1 Debian 6.1.119-1
>>> [Mon Jan 6 13:25:28 2025] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> [Mon Jan 6 13:25:28 2025] task:nfsd            state:D stack:0     pid:55306 ppid:2      flags:0x00004000
>>> [Mon Jan 6 13:25:28 2025] Call Trace:
>>> [Mon Jan 6 13:25:28 2025]  <TASK>
>>> [Mon Jan 6 13:25:28 2025] __schedule+0x34d/0x9e0
>>> [Mon Jan 6 13:25:28 2025] schedule+0x5a/0xd0
>>> [Mon Jan 6 13:25:28 2025] schedule_timeout+0x118/0x150
>>> [Mon Jan 6 13:25:28 2025] wait_for_completion+0x86/0x160
>>> [Mon Jan 6 13:25:28 2025] __flush_workqueue+0x152/0x420
>>> [Mon Jan 6 13:25:28 2025] nfsd4_destroy_session (debian/build/build_amd64_none_amd64/include/linux/spinlock.h:351 debian/build/build_amd64_none_amd64/fs/nfsd/nfs4state.c:3861) nfsd
>>> [Mon Jan 6 13:25:28 2025] nfsd4_proc_compound (debian/build/build_amd64_none_amd64/fs/nfsd/nfs4proc.c:2680) nfsd
>>> [Mon Jan 6 13:25:28 2025] nfsd_dispatch (debian/build/build_amd64_none_amd64/fs/nfsd/nfssvc.c:1022) nfsd
>>> [Mon Jan 6 13:25:28 2025] svc_process_common (debian/build/build_amd64_none_amd64/net/sunrpc/svc.c:1344) sunrpc
>>> [Mon Jan 6 13:25:28 2025] ? svc_recv (debian/build/build_amd64_none_amd64/net/sunrpc/svc_xprt.c:897) sunrpc
>>> [Mon Jan 6 13:25:28 2025] ? nfsd_svc (debian/build/build_amd64_none_amd64/fs/nfsd/nfssvc.c:983) nfsd
>>> [Mon Jan 6 13:25:28 2025] ? nfsd_inet6addr_event (debian/build/build_amd64_none_amd64/fs/nfsd/nfssvc.c:922) nfsd
>>> [Mon Jan 6 13:25:28 2025] svc_process (debian/build/build_amd64_none_amd64/net/sunrpc/svc.c:1474) sunrpc
>>> [Mon Jan 6 13:25:28 2025] nfsd (debian/build/build_amd64_none_amd64/fs/nfsd/nfssvc.c:960) nfsd
>>> [Mon Jan 6 13:25:28 2025] kthread+0xd7/0x100
>>> [Mon Jan 6 13:25:28 2025] ? kthread_complete_and_exit+0x20/0x20
>>> [Mon Jan 6 13:25:28 2025] ret_from_fork+0x1f/0x30
>>> [Mon Jan  6 13:25:28 2025]  </TASK>
>>>
>>> The question about bisection is actually harder, those are production
>>> systems and I understand it's not possible to do bisect there, while
>>> unfortunately not reprodcing the issue on "lab conditions".
>>>
>>> Does the above give us still a clue on what you were looking for?
>>
>> Thanks for the update.
>>
>> It's possible that 38f080f3cd19 ("NFSD: Move callback_wq into struct
>> nfs4_client"), while not a specific fix for this issue, might offer some
>> relief by preventing the DESTROY_SESSION hang from affecting all other
>> clients of the degraded server.
>>
>> Not having a reproducer or the ability to bisect puts a damper on
>> things. The next step, then, is to enable tracing on servers where this
>> issue can come up, and wait for the hang to occur. The following command
>> captures information only about callback operation, so it should not
>> generate much data or impact server performance.
>>
>>    # trace-cmd record -e nfsd:nfsd_cb\*
>>
>> Let that run until the problem occurs, then ^C, compress the resulting
>> trace.dat file, and either attach it to 1071562 or email it to me
>> privately.
> thanks for the follow-up.
> 
> I am the "customer" with two affected file servers. We have since moved those
> servers to the backports kernel (6.11.10) in the hope of forward fixing the
> issue. If this kernel is stable, I'm afraid I can't go back to the 'bad'
> kernel (700+ researchers affected..), and we're also not able to trigger the
> issue on our test environment.

Hello Dr. Herzog -

If the problem recurs on 6.11, the trace-cmd I suggest above works
there as well.

If not, there are several "me too!" reports in the bug. Anyone who hits
this issue can try the trace-cmd and report back.


-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: nfsd blocks indefinitely in nfsd4_destroy_session
  2025-01-08 15:07             ` Chuck Lever
@ 2025-01-09 11:56               ` Christian Herzog
  2025-01-09 12:42                 ` Jeff Layton
  2025-01-09 15:49                 ` Chuck Lever
  0 siblings, 2 replies; 31+ messages in thread
From: Christian Herzog @ 2025-01-09 11:56 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Salvatore Bonaccorso, Benjamin Coddington, Trond Myklebust,
	Harald Dunkel, Linux NFS Mailing List, Martin Svec,
	Michael Gernoth, Pellegrin Baptiste

Dear Chuck,

On Wed, Jan 08, 2025 at 10:07:49AM -0500, Chuck Lever wrote:
> On 1/8/25 9:54 AM, Christian Herzog wrote:
> > Dear Chuck,
> > 
> > On Wed, Jan 08, 2025 at 08:33:23AM -0500, Chuck Lever wrote:
> > > On 1/7/25 4:17 PM, Salvatore Bonaccorso wrote:
> > > > Hi Chuck,
> > > > 
> > > > Thanks for your time on this, much appreciated.
> > > > 
> > > > On Wed, Jan 01, 2025 at 02:24:50PM -0500, Chuck Lever wrote:
> > > > > On 12/25/24 4:15 AM, Salvatore Bonaccorso wrote:
> > > > > > Hi Chuck, hi all,
> > > > > > 
> > > > > > [it was not ideal to pick one of the message for this followup, let me
> > > > > > know if you want a complete new thread, adding as well Benjamin and
> > > > > > Trond as they are involved in one mentioned patch]
> > > > > > 
> > > > > > On Mon, Jun 17, 2024 at 02:31:54PM +0000, Chuck Lever III wrote:
> > > > > > > 
> > > > > > > 
> > > > > > > > On Jun 17, 2024, at 2:55 AM, Harald Dunkel <harald.dunkel@aixigo.com> wrote:
> > > > > > > > 
> > > > > > > > Hi folks,
> > > > > > > > 
> > > > > > > > what would be the reason for nfsd getting stuck somehow and becoming
> > > > > > > > an unkillable process? See
> > > > > > > > 
> > > > > > > > - https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1071562
> > > > > > > > - https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/2062568
> > > > > > > > 
> > > > > > > > Doesn't this mean that something inside the kernel gets stuck as
> > > > > > > > well? Seems odd to me.
> > > > > > > 
> > > > > > > I'm not familiar with the Debian or Ubuntu kernel packages. Can
> > > > > > > the kernel release numbers be translated to LTS kernel releases
> > > > > > > please? Need both "last known working" and "first broken" releases.
> > > > > > > 
> > > > > > > This:
> > > > > > > 
> > > > > > > [ 6596.911785] RPC: Could not send backchannel reply error: -110
> > > > > > > [ 6596.972490] RPC: Could not send backchannel reply error: -110
> > > > > > > [ 6837.281307] RPC: Could not send backchannel reply error: -110
> > > > > > > 
> > > > > > > is a known set of client backchannel bugs. Knowing the LTS kernel
> > > > > > > releases (see above) will help us figure out what needs to be
> > > > > > > backported to the LTS kernels kernels in question.
> > > > > > > 
> > > > > > > This:
> > > > > > > 
> > > > > > > [11183.290619] wait_for_completion+0x88/0x150
> > > > > > > [11183.290623] __flush_workqueue+0x140/0x3e0
> > > > > > > [11183.290629] nfsd4_probe_callback_sync+0x1a/0x30 [nfsd]
> > > > > > > [11183.290689] nfsd4_destroy_session+0x186/0x260 [nfsd]
> > > > > > > 
> > > > > > > is probably related to the backchannel errors on the client, but
> > > > > > > client bugs shouldn't cause the server to hang like this. We
> > > > > > > might be able to say more if you can provide the kernel release
> > > > > > > translations (see above).
> > > > > > 
> > > > > > In Debian we hstill have the bug #1071562 open and one person notified
> > > > > > mye offlist that it appears that the issue get more frequent since
> > > > > > they updated on NFS client side from Ubuntu 20.04 to Debian bookworm
> > > > > > with a 6.1.y based kernel).
> > > > > > 
> > > > > > Some people around those issues, seem to claim that the change
> > > > > > mentioned in
> > > > > > https://lists.proxmox.com/pipermail/pve-devel/2024-July/064614.html
> > > > > > would fix the issue, which is as well backchannel related.
> > > > > > 
> > > > > > This is upstream: 6ddc9deacc13 ("SUNRPC: Fix backchannel reply,
> > > > > > again"). While this commit fixes 57331a59ac0d ("NFSv4.1: Use the
> > > > > > nfs_client's rpc timeouts for backchannel") this is not something
> > > > > > which goes back to 6.1.y, could it be possible that hte backchannel
> > > > > > refactoring and this final fix indeeds fixes the issue?
> > > > > > 
> > > > > > As people report it is not easily reproducible, so this makes it
> > > > > > harder to identify fixes correctly.
> > > > > > 
> > > > > > I gave a (short) stance on trying to backport commits up to
> > > > > > 6ddc9deacc13 ("SUNRPC: Fix backchannel reply, again") but this quickly
> > > > > > seems to indicate it is probably still not the right thing for
> > > > > > backporting to the older stable series.
> > > > > > 
> > > > > > As at least pre-requisites:
> > > > > > 
> > > > > > 2009e32997ed568a305cf9bc7bf27d22e0f6ccda
> > > > > > 4119bd0306652776cb0b7caa3aea5b2a93aecb89
> > > > > > 163cdfca341b76c958567ae0966bd3575c5c6192
> > > > > > f4afc8fead386c81fda2593ad6162271d26667f8
> > > > > > 6ed8cdf967f7e9fc96cd1c129719ef99db2f9afc
> > > > > > 57331a59ac0d680f606403eb24edd3c35aecba31
> > > > > > 
> > > > > > and still there would be conflicting codepaths (and does not seem
> > > > > > right).
> > > > > > 
> > > > > > Chuck, Benjamin, Trond, is there anything we can provive on reporters
> > > > > > side that we can try to tackle this issue better?
> > > > > 
> > > > > As I've indicated before, NFSD should not block no matter what the
> > > > > client may or may not be doing. I'd like to focus on the server first.
> > > > > 
> > > > > What is the result of:
> > > > > 
> > > > > $ cd <Bookworm's v6.1.90 kernel source >
> > > > > $ unset KBUILD_OUTPUT
> > > > > $ make -j `nproc`
> > > > > $ scripts/faddr2line \
> > > > > 	fs/nfsd/nfs4state.o \
> > > > > 	nfsd4_destroy_session+0x16d
> > > > > 
> > > > > Since this issue appeared after v6.1.1, is it possible to bisect
> > > > > between v6.1.1 and v6.1.90 ?
> > > > 
> > > > First please note, at least speaking of triggering the issue in
> > > > Debian, Debian has moved to 6.1.119 based kernel already (and soon in
> > > > the weekends point release move to 6.1.123).
> > > > 
> > > > That said, one of the users which regularly seems to be hit by the
> > > > issue was able to provide the above requested information, based for
> > > > 6.1.119:
> > > > 
> > > > ~/kernel/linux-source-6.1# make kernelversion
> > > > 6.1.119
> > > > ~/kernel/linux-source-6.1# scripts/faddr2line fs/nfsd/nfs4state.o nfsd4_destroy_session+0x16d
> > > > nfsd4_destroy_session+0x16d/0x250:
> > > > __list_del_entry at /root/kernel/linux-source-6.1/./include/linux/list.h:134
> > > > (inlined by) list_del at /root/kernel/linux-source-6.1/./include/linux/list.h:148
> > > > (inlined by) unhash_session at /root/kernel/linux-source-6.1/fs/nfsd/nfs4state.c:2062
> > > > (inlined by) nfsd4_destroy_session at /root/kernel/linux-source-6.1/fs/nfsd/nfs4state.c:3856
> > > > 
> > > > They could provide as well a decode_stacktrace output for the recent
> > > > hit (if that is helpful for you):
> > > > 
> > > > [Mon Jan 6 13:25:28 2025] INFO: task nfsd:55306 blocked for more than 6883 seconds.
> > > > [Mon Jan 6 13:25:28 2025]       Not tainted 6.1.0-28-amd64 #1 Debian 6.1.119-1
> > > > [Mon Jan 6 13:25:28 2025] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > > [Mon Jan 6 13:25:28 2025] task:nfsd            state:D stack:0     pid:55306 ppid:2      flags:0x00004000
> > > > [Mon Jan 6 13:25:28 2025] Call Trace:
> > > > [Mon Jan 6 13:25:28 2025]  <TASK>
> > > > [Mon Jan 6 13:25:28 2025] __schedule+0x34d/0x9e0
> > > > [Mon Jan 6 13:25:28 2025] schedule+0x5a/0xd0
> > > > [Mon Jan 6 13:25:28 2025] schedule_timeout+0x118/0x150
> > > > [Mon Jan 6 13:25:28 2025] wait_for_completion+0x86/0x160
> > > > [Mon Jan 6 13:25:28 2025] __flush_workqueue+0x152/0x420
> > > > [Mon Jan 6 13:25:28 2025] nfsd4_destroy_session (debian/build/build_amd64_none_amd64/include/linux/spinlock.h:351 debian/build/build_amd64_none_amd64/fs/nfsd/nfs4state.c:3861) nfsd
> > > > [Mon Jan 6 13:25:28 2025] nfsd4_proc_compound (debian/build/build_amd64_none_amd64/fs/nfsd/nfs4proc.c:2680) nfsd
> > > > [Mon Jan 6 13:25:28 2025] nfsd_dispatch (debian/build/build_amd64_none_amd64/fs/nfsd/nfssvc.c:1022) nfsd
> > > > [Mon Jan 6 13:25:28 2025] svc_process_common (debian/build/build_amd64_none_amd64/net/sunrpc/svc.c:1344) sunrpc
> > > > [Mon Jan 6 13:25:28 2025] ? svc_recv (debian/build/build_amd64_none_amd64/net/sunrpc/svc_xprt.c:897) sunrpc
> > > > [Mon Jan 6 13:25:28 2025] ? nfsd_svc (debian/build/build_amd64_none_amd64/fs/nfsd/nfssvc.c:983) nfsd
> > > > [Mon Jan 6 13:25:28 2025] ? nfsd_inet6addr_event (debian/build/build_amd64_none_amd64/fs/nfsd/nfssvc.c:922) nfsd
> > > > [Mon Jan 6 13:25:28 2025] svc_process (debian/build/build_amd64_none_amd64/net/sunrpc/svc.c:1474) sunrpc
> > > > [Mon Jan 6 13:25:28 2025] nfsd (debian/build/build_amd64_none_amd64/fs/nfsd/nfssvc.c:960) nfsd
> > > > [Mon Jan 6 13:25:28 2025] kthread+0xd7/0x100
> > > > [Mon Jan 6 13:25:28 2025] ? kthread_complete_and_exit+0x20/0x20
> > > > [Mon Jan 6 13:25:28 2025] ret_from_fork+0x1f/0x30
> > > > [Mon Jan  6 13:25:28 2025]  </TASK>
> > > > 
> > > > The question about bisection is actually harder, those are production
> > > > systems and I understand it's not possible to do bisect there, while
> > > > unfortunately not reprodcing the issue on "lab conditions".
> > > > 
> > > > Does the above give us still a clue on what you were looking for?
> > > 
> > > Thanks for the update.
> > > 
> > > It's possible that 38f080f3cd19 ("NFSD: Move callback_wq into struct
> > > nfs4_client"), while not a specific fix for this issue, might offer some
> > > relief by preventing the DESTROY_SESSION hang from affecting all other
> > > clients of the degraded server.
> > > 
> > > Not having a reproducer or the ability to bisect puts a damper on
> > > things. The next step, then, is to enable tracing on servers where this
> > > issue can come up, and wait for the hang to occur. The following command
> > > captures information only about callback operation, so it should not
> > > generate much data or impact server performance.
> > > 
> > >    # trace-cmd record -e nfsd:nfsd_cb\*
> > > 
> > > Let that run until the problem occurs, then ^C, compress the resulting
> > > trace.dat file, and either attach it to 1071562 or email it to me
> > > privately.
> > thanks for the follow-up.
> > 
> > I am the "customer" with two affected file servers. We have since moved those
> > servers to the backports kernel (6.11.10) in the hope of forward fixing the
> > issue. If this kernel is stable, I'm afraid I can't go back to the 'bad'
> > kernel (700+ researchers affected..), and we're also not able to trigger the
> > issue on our test environment.
> 
> Hello Dr. Herzog -
> 
> If the problem recurs on 6.11, the trace-cmd I suggest above works
> there as well.
the bad news is: it just happened again with the bpo kernel.

We then immediately started trace-cmd since usually there are several call
traces in a row and we did get a trace.dat:
http://people.phys.ethz.ch/~daduke/trace.dat

we did notice however that the stack trace looked a bit different this time:

    INFO: task nfsd:2566 blocked for more than 5799 seconds.
    Tainted: G        W          6.11.10+bpo-amd64 #1 Debia>
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables t>
    task:nfsd            state:D stack:0     pid:2566  tgid:2566 >
    Call Trace:
    <TASK>
    __schedule+0x400/0xad0
    schedule+0x27/0xf0
    nfsd4_shutdown_callback+0xfe/0x140 [nfsd]
    ? __pfx_var_wake_function+0x10/0x10
    __destroy_client+0x1f0/0x290 [nfsd]
    nfsd4_destroy_clientid+0xf1/0x1e0 [nfsd]
    ? svcauth_unix_set_client+0x586/0x5f0 [sunrpc]
    nfsd4_proc_compound+0x34d/0x670 [nfsd]
    nfsd_dispatch+0xfd/0x220 [nfsd]
    svc_process_common+0x2f7/0x700 [sunrpc]
    ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
    svc_process+0x131/0x180 [sunrpc]
    svc_recv+0x830/0xa10 [sunrpc]
    ? __pfx_nfsd+0x10/0x10 [nfsd]
    nfsd+0x87/0xf0 [nfsd]
    kthread+0xcf/0x100
    ? __pfx_kthread+0x10/0x10
    ret_from_fork+0x31/0x50
    ? __pfx_kthread+0x10/0x10
    ret_from_fork_asm+0x1a/0x30
    </TASK>

and also the state of the offending client in `/proc/fs/nfsd/clients/*/info`
used to be callback state: UNKNOWN while now it is DOWN or FAULT. No idea
what it means, but I thought it was worth mentioning.

thanks!
-Christian




-- 
Dr. Christian Herzog <herzog@phys.ethz.ch>  support: +41 44 633 26 68
Head, IT Services Group, HPT H 8              voice: +41 44 633 39 50
Department of Physics, ETH Zurich           
8093 Zurich, Switzerland                     http://isg.phys.ethz.ch/

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: nfsd blocks indefinitely in nfsd4_destroy_session
  2025-01-09 11:56               ` Christian Herzog
@ 2025-01-09 12:42                 ` Jeff Layton
  2025-01-09 13:56                   ` Chuck Lever
  2025-01-09 16:32                   ` Chuck Lever
  2025-01-09 15:49                 ` Chuck Lever
  1 sibling, 2 replies; 31+ messages in thread
From: Jeff Layton @ 2025-01-09 12:42 UTC (permalink / raw)
  To: Christian Herzog, Chuck Lever
  Cc: Salvatore Bonaccorso, Benjamin Coddington, Trond Myklebust,
	Harald Dunkel, Linux NFS Mailing List, Martin Svec,
	Michael Gernoth, Pellegrin Baptiste

On Thu, 2025-01-09 at 12:56 +0100, Christian Herzog wrote:
> Dear Chuck,
> 
> On Wed, Jan 08, 2025 at 10:07:49AM -0500, Chuck Lever wrote:
> > On 1/8/25 9:54 AM, Christian Herzog wrote:
> > > Dear Chuck,
> > > 
> > > On Wed, Jan 08, 2025 at 08:33:23AM -0500, Chuck Lever wrote:
> > > > On 1/7/25 4:17 PM, Salvatore Bonaccorso wrote:
> > > > > Hi Chuck,
> > > > > 
> > > > > Thanks for your time on this, much appreciated.
> > > > > 
> > > > > On Wed, Jan 01, 2025 at 02:24:50PM -0500, Chuck Lever wrote:
> > > > > > On 12/25/24 4:15 AM, Salvatore Bonaccorso wrote:
> > > > > > > Hi Chuck, hi all,
> > > > > > > 
> > > > > > > [it was not ideal to pick one of the message for this followup, let me
> > > > > > > know if you want a complete new thread, adding as well Benjamin and
> > > > > > > Trond as they are involved in one mentioned patch]
> > > > > > > 
> > > > > > > On Mon, Jun 17, 2024 at 02:31:54PM +0000, Chuck Lever III wrote:
> > > > > > > > 
> > > > > > > > 
> > > > > > > > > On Jun 17, 2024, at 2:55 AM, Harald Dunkel <harald.dunkel@aixigo.com> wrote:
> > > > > > > > > 
> > > > > > > > > Hi folks,
> > > > > > > > > 
> > > > > > > > > what would be the reason for nfsd getting stuck somehow and becoming
> > > > > > > > > an unkillable process? See
> > > > > > > > > 
> > > > > > > > > - https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1071562
> > > > > > > > > - https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/2062568
> > > > > > > > > 
> > > > > > > > > Doesn't this mean that something inside the kernel gets stuck as
> > > > > > > > > well? Seems odd to me.
> > > > > > > > 
> > > > > > > > I'm not familiar with the Debian or Ubuntu kernel packages. Can
> > > > > > > > the kernel release numbers be translated to LTS kernel releases
> > > > > > > > please? Need both "last known working" and "first broken" releases.
> > > > > > > > 
> > > > > > > > This:
> > > > > > > > 
> > > > > > > > [ 6596.911785] RPC: Could not send backchannel reply error: -110
> > > > > > > > [ 6596.972490] RPC: Could not send backchannel reply error: -110
> > > > > > > > [ 6837.281307] RPC: Could not send backchannel reply error: -110
> > > > > > > > 
> > > > > > > > is a known set of client backchannel bugs. Knowing the LTS kernel
> > > > > > > > releases (see above) will help us figure out what needs to be
> > > > > > > > backported to the LTS kernels kernels in question.
> > > > > > > > 
> > > > > > > > This:
> > > > > > > > 
> > > > > > > > [11183.290619] wait_for_completion+0x88/0x150
> > > > > > > > [11183.290623] __flush_workqueue+0x140/0x3e0
> > > > > > > > [11183.290629] nfsd4_probe_callback_sync+0x1a/0x30 [nfsd]
> > > > > > > > [11183.290689] nfsd4_destroy_session+0x186/0x260 [nfsd]
> > > > > > > > 
> > > > > > > > is probably related to the backchannel errors on the client, but
> > > > > > > > client bugs shouldn't cause the server to hang like this. We
> > > > > > > > might be able to say more if you can provide the kernel release
> > > > > > > > translations (see above).
> > > > > > > 
> > > > > > > In Debian we hstill have the bug #1071562 open and one person notified
> > > > > > > mye offlist that it appears that the issue get more frequent since
> > > > > > > they updated on NFS client side from Ubuntu 20.04 to Debian bookworm
> > > > > > > with a 6.1.y based kernel).
> > > > > > > 
> > > > > > > Some people around those issues, seem to claim that the change
> > > > > > > mentioned in
> > > > > > > https://lists.proxmox.com/pipermail/pve-devel/2024-July/064614.html
> > > > > > > would fix the issue, which is as well backchannel related.
> > > > > > > 
> > > > > > > This is upstream: 6ddc9deacc13 ("SUNRPC: Fix backchannel reply,
> > > > > > > again"). While this commit fixes 57331a59ac0d ("NFSv4.1: Use the
> > > > > > > nfs_client's rpc timeouts for backchannel") this is not something
> > > > > > > which goes back to 6.1.y, could it be possible that hte backchannel
> > > > > > > refactoring and this final fix indeeds fixes the issue?
> > > > > > > 
> > > > > > > As people report it is not easily reproducible, so this makes it
> > > > > > > harder to identify fixes correctly.
> > > > > > > 
> > > > > > > I gave a (short) stance on trying to backport commits up to
> > > > > > > 6ddc9deacc13 ("SUNRPC: Fix backchannel reply, again") but this quickly
> > > > > > > seems to indicate it is probably still not the right thing for
> > > > > > > backporting to the older stable series.
> > > > > > > 
> > > > > > > As at least pre-requisites:
> > > > > > > 
> > > > > > > 2009e32997ed568a305cf9bc7bf27d22e0f6ccda
> > > > > > > 4119bd0306652776cb0b7caa3aea5b2a93aecb89
> > > > > > > 163cdfca341b76c958567ae0966bd3575c5c6192
> > > > > > > f4afc8fead386c81fda2593ad6162271d26667f8
> > > > > > > 6ed8cdf967f7e9fc96cd1c129719ef99db2f9afc
> > > > > > > 57331a59ac0d680f606403eb24edd3c35aecba31
> > > > > > > 
> > > > > > > and still there would be conflicting codepaths (and does not seem
> > > > > > > right).
> > > > > > > 
> > > > > > > Chuck, Benjamin, Trond, is there anything we can provive on reporters
> > > > > > > side that we can try to tackle this issue better?
> > > > > > 
> > > > > > As I've indicated before, NFSD should not block no matter what the
> > > > > > client may or may not be doing. I'd like to focus on the server first.
> > > > > > 
> > > > > > What is the result of:
> > > > > > 
> > > > > > $ cd <Bookworm's v6.1.90 kernel source >
> > > > > > $ unset KBUILD_OUTPUT
> > > > > > $ make -j `nproc`
> > > > > > $ scripts/faddr2line \
> > > > > > 	fs/nfsd/nfs4state.o \
> > > > > > 	nfsd4_destroy_session+0x16d
> > > > > > 
> > > > > > Since this issue appeared after v6.1.1, is it possible to bisect
> > > > > > between v6.1.1 and v6.1.90 ?
> > > > > 
> > > > > First please note, at least speaking of triggering the issue in
> > > > > Debian, Debian has moved to 6.1.119 based kernel already (and soon in
> > > > > the weekends point release move to 6.1.123).
> > > > > 
> > > > > That said, one of the users which regularly seems to be hit by the
> > > > > issue was able to provide the above requested information, based for
> > > > > 6.1.119:
> > > > > 
> > > > > ~/kernel/linux-source-6.1# make kernelversion
> > > > > 6.1.119
> > > > > ~/kernel/linux-source-6.1# scripts/faddr2line fs/nfsd/nfs4state.o nfsd4_destroy_session+0x16d
> > > > > nfsd4_destroy_session+0x16d/0x250:
> > > > > __list_del_entry at /root/kernel/linux-source-6.1/./include/linux/list.h:134
> > > > > (inlined by) list_del at /root/kernel/linux-source-6.1/./include/linux/list.h:148
> > > > > (inlined by) unhash_session at /root/kernel/linux-source-6.1/fs/nfsd/nfs4state.c:2062
> > > > > (inlined by) nfsd4_destroy_session at /root/kernel/linux-source-6.1/fs/nfsd/nfs4state.c:3856
> > > > > 
> > > > > They could provide as well a decode_stacktrace output for the recent
> > > > > hit (if that is helpful for you):
> > > > > 
> > > > > [Mon Jan 6 13:25:28 2025] INFO: task nfsd:55306 blocked for more than 6883 seconds.
> > > > > [Mon Jan 6 13:25:28 2025]       Not tainted 6.1.0-28-amd64 #1 Debian 6.1.119-1
> > > > > [Mon Jan 6 13:25:28 2025] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > > > [Mon Jan 6 13:25:28 2025] task:nfsd            state:D stack:0     pid:55306 ppid:2      flags:0x00004000
> > > > > [Mon Jan 6 13:25:28 2025] Call Trace:
> > > > > [Mon Jan 6 13:25:28 2025]  <TASK>
> > > > > [Mon Jan 6 13:25:28 2025] __schedule+0x34d/0x9e0
> > > > > [Mon Jan 6 13:25:28 2025] schedule+0x5a/0xd0
> > > > > [Mon Jan 6 13:25:28 2025] schedule_timeout+0x118/0x150
> > > > > [Mon Jan 6 13:25:28 2025] wait_for_completion+0x86/0x160
> > > > > [Mon Jan 6 13:25:28 2025] __flush_workqueue+0x152/0x420
> > > > > [Mon Jan 6 13:25:28 2025] nfsd4_destroy_session (debian/build/build_amd64_none_amd64/include/linux/spinlock.h:351 debian/build/build_amd64_none_amd64/fs/nfsd/nfs4state.c:3861) nfsd
> > > > > [Mon Jan 6 13:25:28 2025] nfsd4_proc_compound (debian/build/build_amd64_none_amd64/fs/nfsd/nfs4proc.c:2680) nfsd
> > > > > [Mon Jan 6 13:25:28 2025] nfsd_dispatch (debian/build/build_amd64_none_amd64/fs/nfsd/nfssvc.c:1022) nfsd
> > > > > [Mon Jan 6 13:25:28 2025] svc_process_common (debian/build/build_amd64_none_amd64/net/sunrpc/svc.c:1344) sunrpc
> > > > > [Mon Jan 6 13:25:28 2025] ? svc_recv (debian/build/build_amd64_none_amd64/net/sunrpc/svc_xprt.c:897) sunrpc
> > > > > [Mon Jan 6 13:25:28 2025] ? nfsd_svc (debian/build/build_amd64_none_amd64/fs/nfsd/nfssvc.c:983) nfsd
> > > > > [Mon Jan 6 13:25:28 2025] ? nfsd_inet6addr_event (debian/build/build_amd64_none_amd64/fs/nfsd/nfssvc.c:922) nfsd
> > > > > [Mon Jan 6 13:25:28 2025] svc_process (debian/build/build_amd64_none_amd64/net/sunrpc/svc.c:1474) sunrpc
> > > > > [Mon Jan 6 13:25:28 2025] nfsd (debian/build/build_amd64_none_amd64/fs/nfsd/nfssvc.c:960) nfsd
> > > > > [Mon Jan 6 13:25:28 2025] kthread+0xd7/0x100
> > > > > [Mon Jan 6 13:25:28 2025] ? kthread_complete_and_exit+0x20/0x20
> > > > > [Mon Jan 6 13:25:28 2025] ret_from_fork+0x1f/0x30
> > > > > [Mon Jan  6 13:25:28 2025]  </TASK>
> > > > > 
> > > > > The question about bisection is actually harder, those are production
> > > > > systems and I understand it's not possible to do bisect there, while
> > > > > unfortunately not reprodcing the issue on "lab conditions".
> > > > > 
> > > > > Does the above give us still a clue on what you were looking for?
> > > > 
> > > > Thanks for the update.
> > > > 
> > > > It's possible that 38f080f3cd19 ("NFSD: Move callback_wq into struct
> > > > nfs4_client"), while not a specific fix for this issue, might offer some
> > > > relief by preventing the DESTROY_SESSION hang from affecting all other
> > > > clients of the degraded server.
> > > > 
> > > > Not having a reproducer or the ability to bisect puts a damper on
> > > > things. The next step, then, is to enable tracing on servers where this
> > > > issue can come up, and wait for the hang to occur. The following command
> > > > captures information only about callback operation, so it should not
> > > > generate much data or impact server performance.
> > > > 
> > > >    # trace-cmd record -e nfsd:nfsd_cb\*
> > > > 
> > > > Let that run until the problem occurs, then ^C, compress the resulting
> > > > trace.dat file, and either attach it to 1071562 or email it to me
> > > > privately.
> > > thanks for the follow-up.
> > > 
> > > I am the "customer" with two affected file servers. We have since moved those
> > > servers to the backports kernel (6.11.10) in the hope of forward fixing the
> > > issue. If this kernel is stable, I'm afraid I can't go back to the 'bad'
> > > kernel (700+ researchers affected..), and we're also not able to trigger the
> > > issue on our test environment.
> > 
> > Hello Dr. Herzog -
> > 
> > If the problem recurs on 6.11, the trace-cmd I suggest above works
> > there as well.
> the bad news is: it just happened again with the bpo kernel.
> 
> We then immediately started trace-cmd since usually there are several call
> traces in a row and we did get a trace.dat:
> http://people.phys.ethz.ch/~daduke/trace.dat
> 
> we did notice however that the stack trace looked a bit different this time:
> 
>     INFO: task nfsd:2566 blocked for more than 5799 seconds.
>     Tainted: G        W          6.11.10+bpo-amd64 #1 Debia>
>     "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables t>
>     task:nfsd            state:D stack:0     pid:2566  tgid:2566 >
>     Call Trace:
>     <TASK>
>     __schedule+0x400/0xad0
>     schedule+0x27/0xf0
>     nfsd4_shutdown_callback+0xfe/0x140 [nfsd]
>     ? __pfx_var_wake_function+0x10/0x10
>     __destroy_client+0x1f0/0x290 [nfsd]
>     nfsd4_destroy_clientid+0xf1/0x1e0 [nfsd]
>     ? svcauth_unix_set_client+0x586/0x5f0 [sunrpc]
>     nfsd4_proc_compound+0x34d/0x670 [nfsd]
>     nfsd_dispatch+0xfd/0x220 [nfsd]
>     svc_process_common+0x2f7/0x700 [sunrpc]
>     ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
>     svc_process+0x131/0x180 [sunrpc]
>     svc_recv+0x830/0xa10 [sunrpc]
>     ? __pfx_nfsd+0x10/0x10 [nfsd]
>     nfsd+0x87/0xf0 [nfsd]
>     kthread+0xcf/0x100
>     ? __pfx_kthread+0x10/0x10
>     ret_from_fork+0x31/0x50
>     ? __pfx_kthread+0x10/0x10
>     ret_from_fork_asm+0x1a/0x30
>     </TASK>
> 
> and also the state of the offending client in `/proc/fs/nfsd/clients/*/info`
> used to be callback state: UNKNOWN while now it is DOWN or FAULT. No idea
> what it means, but I thought it was worth mentioning.
> 

Looks like this is hung in nfsd41_cb_inflight_wait_complete() ? That
probably means that the cl_cb_inflight counter is stuck at >0. I'm
guessing that means that there is some callback that it's expecting to
complete that isn't. From nfsd4_shutdown_callback():

        /*
         * Note this won't actually result in a null callback;
         * instead, nfsd4_run_cb_null() will detect the killed
         * client, destroy the rpc client, and stop:
         */
        nfsd4_run_cb(&clp->cl_cb_null);
        flush_workqueue(clp->cl_callback_wq);
        nfsd41_cb_inflight_wait_complete(clp);

...it sounds like that isn't happening properly though.

It might be interesting to see if you can track down the callback
client in /sys/kernel/debug/sunrpc and see what it's doing.
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: nfsd blocks indefinitely in nfsd4_destroy_session
  2025-01-09 12:42                 ` Jeff Layton
@ 2025-01-09 13:56                   ` Chuck Lever
  2025-01-09 16:32                   ` Chuck Lever
  1 sibling, 0 replies; 31+ messages in thread
From: Chuck Lever @ 2025-01-09 13:56 UTC (permalink / raw)
  To: Jeff Layton, Christian Herzog
  Cc: Salvatore Bonaccorso, Benjamin Coddington, Trond Myklebust,
	Harald Dunkel, Linux NFS Mailing List, Martin Svec,
	Michael Gernoth, Pellegrin Baptiste

On 1/9/25 7:42 AM, Jeff Layton wrote:
> On Thu, 2025-01-09 at 12:56 +0100, Christian Herzog wrote:
>> Dear Chuck,
>>
>> On Wed, Jan 08, 2025 at 10:07:49AM -0500, Chuck Lever wrote:
>>> On 1/8/25 9:54 AM, Christian Herzog wrote:
>>>> Dear Chuck,
>>>>
>>>> On Wed, Jan 08, 2025 at 08:33:23AM -0500, Chuck Lever wrote:
>>>>> On 1/7/25 4:17 PM, Salvatore Bonaccorso wrote:
>>>>>> Hi Chuck,
>>>>>>
>>>>>> Thanks for your time on this, much appreciated.
>>>>>>
>>>>>> On Wed, Jan 01, 2025 at 02:24:50PM -0500, Chuck Lever wrote:
>>>>>>> On 12/25/24 4:15 AM, Salvatore Bonaccorso wrote:
>>>>>>>> Hi Chuck, hi all,
>>>>>>>>
>>>>>>>> [it was not ideal to pick one of the message for this followup, let me
>>>>>>>> know if you want a complete new thread, adding as well Benjamin and
>>>>>>>> Trond as they are involved in one mentioned patch]
>>>>>>>>
>>>>>>>> On Mon, Jun 17, 2024 at 02:31:54PM +0000, Chuck Lever III wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> On Jun 17, 2024, at 2:55 AM, Harald Dunkel <harald.dunkel@aixigo.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi folks,
>>>>>>>>>>
>>>>>>>>>> what would be the reason for nfsd getting stuck somehow and becoming
>>>>>>>>>> an unkillable process? See
>>>>>>>>>>
>>>>>>>>>> - https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1071562
>>>>>>>>>> - https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/2062568
>>>>>>>>>>
>>>>>>>>>> Doesn't this mean that something inside the kernel gets stuck as
>>>>>>>>>> well? Seems odd to me.
>>>>>>>>>
>>>>>>>>> I'm not familiar with the Debian or Ubuntu kernel packages. Can
>>>>>>>>> the kernel release numbers be translated to LTS kernel releases
>>>>>>>>> please? Need both "last known working" and "first broken" releases.
>>>>>>>>>
>>>>>>>>> This:
>>>>>>>>>
>>>>>>>>> [ 6596.911785] RPC: Could not send backchannel reply error: -110
>>>>>>>>> [ 6596.972490] RPC: Could not send backchannel reply error: -110
>>>>>>>>> [ 6837.281307] RPC: Could not send backchannel reply error: -110
>>>>>>>>>
>>>>>>>>> is a known set of client backchannel bugs. Knowing the LTS kernel
>>>>>>>>> releases (see above) will help us figure out what needs to be
>>>>>>>>> backported to the LTS kernels kernels in question.
>>>>>>>>>
>>>>>>>>> This:
>>>>>>>>>
>>>>>>>>> [11183.290619] wait_for_completion+0x88/0x150
>>>>>>>>> [11183.290623] __flush_workqueue+0x140/0x3e0
>>>>>>>>> [11183.290629] nfsd4_probe_callback_sync+0x1a/0x30 [nfsd]
>>>>>>>>> [11183.290689] nfsd4_destroy_session+0x186/0x260 [nfsd]
>>>>>>>>>
>>>>>>>>> is probably related to the backchannel errors on the client, but
>>>>>>>>> client bugs shouldn't cause the server to hang like this. We
>>>>>>>>> might be able to say more if you can provide the kernel release
>>>>>>>>> translations (see above).
>>>>>>>>
>>>>>>>> In Debian we hstill have the bug #1071562 open and one person notified
>>>>>>>> mye offlist that it appears that the issue get more frequent since
>>>>>>>> they updated on NFS client side from Ubuntu 20.04 to Debian bookworm
>>>>>>>> with a 6.1.y based kernel).
>>>>>>>>
>>>>>>>> Some people around those issues, seem to claim that the change
>>>>>>>> mentioned in
>>>>>>>> https://lists.proxmox.com/pipermail/pve-devel/2024-July/064614.html
>>>>>>>> would fix the issue, which is as well backchannel related.
>>>>>>>>
>>>>>>>> This is upstream: 6ddc9deacc13 ("SUNRPC: Fix backchannel reply,
>>>>>>>> again"). While this commit fixes 57331a59ac0d ("NFSv4.1: Use the
>>>>>>>> nfs_client's rpc timeouts for backchannel") this is not something
>>>>>>>> which goes back to 6.1.y, could it be possible that hte backchannel
>>>>>>>> refactoring and this final fix indeeds fixes the issue?
>>>>>>>>
>>>>>>>> As people report it is not easily reproducible, so this makes it
>>>>>>>> harder to identify fixes correctly.
>>>>>>>>
>>>>>>>> I gave a (short) stance on trying to backport commits up to
>>>>>>>> 6ddc9deacc13 ("SUNRPC: Fix backchannel reply, again") but this quickly
>>>>>>>> seems to indicate it is probably still not the right thing for
>>>>>>>> backporting to the older stable series.
>>>>>>>>
>>>>>>>> As at least pre-requisites:
>>>>>>>>
>>>>>>>> 2009e32997ed568a305cf9bc7bf27d22e0f6ccda
>>>>>>>> 4119bd0306652776cb0b7caa3aea5b2a93aecb89
>>>>>>>> 163cdfca341b76c958567ae0966bd3575c5c6192
>>>>>>>> f4afc8fead386c81fda2593ad6162271d26667f8
>>>>>>>> 6ed8cdf967f7e9fc96cd1c129719ef99db2f9afc
>>>>>>>> 57331a59ac0d680f606403eb24edd3c35aecba31
>>>>>>>>
>>>>>>>> and still there would be conflicting codepaths (and does not seem
>>>>>>>> right).
>>>>>>>>
>>>>>>>> Chuck, Benjamin, Trond, is there anything we can provive on reporters
>>>>>>>> side that we can try to tackle this issue better?
>>>>>>>
>>>>>>> As I've indicated before, NFSD should not block no matter what the
>>>>>>> client may or may not be doing. I'd like to focus on the server first.
>>>>>>>
>>>>>>> What is the result of:
>>>>>>>
>>>>>>> $ cd <Bookworm's v6.1.90 kernel source >
>>>>>>> $ unset KBUILD_OUTPUT
>>>>>>> $ make -j `nproc`
>>>>>>> $ scripts/faddr2line \
>>>>>>> 	fs/nfsd/nfs4state.o \
>>>>>>> 	nfsd4_destroy_session+0x16d
>>>>>>>
>>>>>>> Since this issue appeared after v6.1.1, is it possible to bisect
>>>>>>> between v6.1.1 and v6.1.90 ?
>>>>>>
>>>>>> First please note, at least speaking of triggering the issue in
>>>>>> Debian, Debian has moved to 6.1.119 based kernel already (and soon in
>>>>>> the weekends point release move to 6.1.123).
>>>>>>
>>>>>> That said, one of the users which regularly seems to be hit by the
>>>>>> issue was able to provide the above requested information, based for
>>>>>> 6.1.119:
>>>>>>
>>>>>> ~/kernel/linux-source-6.1# make kernelversion
>>>>>> 6.1.119
>>>>>> ~/kernel/linux-source-6.1# scripts/faddr2line fs/nfsd/nfs4state.o nfsd4_destroy_session+0x16d
>>>>>> nfsd4_destroy_session+0x16d/0x250:
>>>>>> __list_del_entry at /root/kernel/linux-source-6.1/./include/linux/list.h:134
>>>>>> (inlined by) list_del at /root/kernel/linux-source-6.1/./include/linux/list.h:148
>>>>>> (inlined by) unhash_session at /root/kernel/linux-source-6.1/fs/nfsd/nfs4state.c:2062
>>>>>> (inlined by) nfsd4_destroy_session at /root/kernel/linux-source-6.1/fs/nfsd/nfs4state.c:3856
>>>>>>
>>>>>> They could provide as well a decode_stacktrace output for the recent
>>>>>> hit (if that is helpful for you):
>>>>>>
>>>>>> [Mon Jan 6 13:25:28 2025] INFO: task nfsd:55306 blocked for more than 6883 seconds.
>>>>>> [Mon Jan 6 13:25:28 2025]       Not tainted 6.1.0-28-amd64 #1 Debian 6.1.119-1
>>>>>> [Mon Jan 6 13:25:28 2025] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>>> [Mon Jan 6 13:25:28 2025] task:nfsd            state:D stack:0     pid:55306 ppid:2      flags:0x00004000
>>>>>> [Mon Jan 6 13:25:28 2025] Call Trace:
>>>>>> [Mon Jan 6 13:25:28 2025]  <TASK>
>>>>>> [Mon Jan 6 13:25:28 2025] __schedule+0x34d/0x9e0
>>>>>> [Mon Jan 6 13:25:28 2025] schedule+0x5a/0xd0
>>>>>> [Mon Jan 6 13:25:28 2025] schedule_timeout+0x118/0x150
>>>>>> [Mon Jan 6 13:25:28 2025] wait_for_completion+0x86/0x160
>>>>>> [Mon Jan 6 13:25:28 2025] __flush_workqueue+0x152/0x420
>>>>>> [Mon Jan 6 13:25:28 2025] nfsd4_destroy_session (debian/build/build_amd64_none_amd64/include/linux/spinlock.h:351 debian/build/build_amd64_none_amd64/fs/nfsd/nfs4state.c:3861) nfsd
>>>>>> [Mon Jan 6 13:25:28 2025] nfsd4_proc_compound (debian/build/build_amd64_none_amd64/fs/nfsd/nfs4proc.c:2680) nfsd
>>>>>> [Mon Jan 6 13:25:28 2025] nfsd_dispatch (debian/build/build_amd64_none_amd64/fs/nfsd/nfssvc.c:1022) nfsd
>>>>>> [Mon Jan 6 13:25:28 2025] svc_process_common (debian/build/build_amd64_none_amd64/net/sunrpc/svc.c:1344) sunrpc
>>>>>> [Mon Jan 6 13:25:28 2025] ? svc_recv (debian/build/build_amd64_none_amd64/net/sunrpc/svc_xprt.c:897) sunrpc
>>>>>> [Mon Jan 6 13:25:28 2025] ? nfsd_svc (debian/build/build_amd64_none_amd64/fs/nfsd/nfssvc.c:983) nfsd
>>>>>> [Mon Jan 6 13:25:28 2025] ? nfsd_inet6addr_event (debian/build/build_amd64_none_amd64/fs/nfsd/nfssvc.c:922) nfsd
>>>>>> [Mon Jan 6 13:25:28 2025] svc_process (debian/build/build_amd64_none_amd64/net/sunrpc/svc.c:1474) sunrpc
>>>>>> [Mon Jan 6 13:25:28 2025] nfsd (debian/build/build_amd64_none_amd64/fs/nfsd/nfssvc.c:960) nfsd
>>>>>> [Mon Jan 6 13:25:28 2025] kthread+0xd7/0x100
>>>>>> [Mon Jan 6 13:25:28 2025] ? kthread_complete_and_exit+0x20/0x20
>>>>>> [Mon Jan 6 13:25:28 2025] ret_from_fork+0x1f/0x30
>>>>>> [Mon Jan  6 13:25:28 2025]  </TASK>
>>>>>>
>>>>>> The question about bisection is actually harder, those are production
>>>>>> systems and I understand it's not possible to do bisect there, while
>>>>>> unfortunately not reprodcing the issue on "lab conditions".
>>>>>>
>>>>>> Does the above give us still a clue on what you were looking for?
>>>>>
>>>>> Thanks for the update.
>>>>>
>>>>> It's possible that 38f080f3cd19 ("NFSD: Move callback_wq into struct
>>>>> nfs4_client"), while not a specific fix for this issue, might offer some
>>>>> relief by preventing the DESTROY_SESSION hang from affecting all other
>>>>> clients of the degraded server.
>>>>>
>>>>> Not having a reproducer or the ability to bisect puts a damper on
>>>>> things. The next step, then, is to enable tracing on servers where this
>>>>> issue can come up, and wait for the hang to occur. The following command
>>>>> captures information only about callback operation, so it should not
>>>>> generate much data or impact server performance.
>>>>>
>>>>>     # trace-cmd record -e nfsd:nfsd_cb\*
>>>>>
>>>>> Let that run until the problem occurs, then ^C, compress the resulting
>>>>> trace.dat file, and either attach it to 1071562 or email it to me
>>>>> privately.
>>>> thanks for the follow-up.
>>>>
>>>> I am the "customer" with two affected file servers. We have since moved those
>>>> servers to the backports kernel (6.11.10) in the hope of forward fixing the
>>>> issue. If this kernel is stable, I'm afraid I can't go back to the 'bad'
>>>> kernel (700+ researchers affected..), and we're also not able to trigger the
>>>> issue on our test environment.
>>>
>>> Hello Dr. Herzog -
>>>
>>> If the problem recurs on 6.11, the trace-cmd I suggest above works
>>> there as well.
>> the bad news is: it just happened again with the bpo kernel.
>>
>> We then immediately started trace-cmd since usually there are several call
>> traces in a row and we did get a trace.dat:
>> http://people.phys.ethz.ch/~daduke/trace.dat
>>
>> we did notice however that the stack trace looked a bit different this time:
>>
>>      INFO: task nfsd:2566 blocked for more than 5799 seconds.
>>      Tainted: G        W          6.11.10+bpo-amd64 #1 Debia>
>>      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables t>
>>      task:nfsd            state:D stack:0     pid:2566  tgid:2566 >
>>      Call Trace:
>>      <TASK>
>>      __schedule+0x400/0xad0
>>      schedule+0x27/0xf0
>>      nfsd4_shutdown_callback+0xfe/0x140 [nfsd]
>>      ? __pfx_var_wake_function+0x10/0x10
>>      __destroy_client+0x1f0/0x290 [nfsd]
>>      nfsd4_destroy_clientid+0xf1/0x1e0 [nfsd]
>>      ? svcauth_unix_set_client+0x586/0x5f0 [sunrpc]
>>      nfsd4_proc_compound+0x34d/0x670 [nfsd]
>>      nfsd_dispatch+0xfd/0x220 [nfsd]
>>      svc_process_common+0x2f7/0x700 [sunrpc]
>>      ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
>>      svc_process+0x131/0x180 [sunrpc]
>>      svc_recv+0x830/0xa10 [sunrpc]
>>      ? __pfx_nfsd+0x10/0x10 [nfsd]
>>      nfsd+0x87/0xf0 [nfsd]
>>      kthread+0xcf/0x100
>>      ? __pfx_kthread+0x10/0x10
>>      ret_from_fork+0x31/0x50
>>      ? __pfx_kthread+0x10/0x10
>>      ret_from_fork_asm+0x1a/0x30
>>      </TASK>
>>
>> and also the state of the offending client in `/proc/fs/nfsd/clients/*/info`
>> used to be callback state: UNKNOWN while now it is DOWN or FAULT. No idea
>> what it means, but I thought it was worth mentioning.
>>
> 
> Looks like this is hung in nfsd41_cb_inflight_wait_complete() ? That
> probably means that the cl_cb_inflight counter is stuck at >0. I'm
> guessing that means that there is some callback that it's expecting to
> complete that isn't. From nfsd4_shutdown_callback():
> 
>          /*
>           * Note this won't actually result in a null callback;
>           * instead, nfsd4_run_cb_null() will detect the killed
>           * client, destroy the rpc client, and stop:
>           */
>          nfsd4_run_cb(&clp->cl_cb_null);
>          flush_workqueue(clp->cl_callback_wq);
>          nfsd41_cb_inflight_wait_complete(clp);
> 
> ...it sounds like that isn't happening properly though.
> 
> It might be interesting to see if you can track down the callback
> client in /sys/kernel/debug/sunrpc and see what it's doing.

Could be that the RPC client was shutdown while there were outstanding
callbacks, and that means that counter can now never go to zero?



-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: nfsd blocks indefinitely in nfsd4_destroy_session
  2025-01-09 12:42                 ` Jeff Layton
  2025-01-09 13:56                   ` Chuck Lever
@ 2025-01-09 16:32                   ` Chuck Lever
       [not found]                     ` <f0705a65549ef253.67823675@ac-grenoble.fr>
  1 sibling, 1 reply; 31+ messages in thread
From: Chuck Lever @ 2025-01-09 16:32 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Salvatore Bonaccorso, Benjamin Coddington, Trond Myklebust,
	Harald Dunkel, Linux NFS Mailing List, Martin Svec,
	Michael Gernoth, Pellegrin Baptiste, Christian Herzog

On 1/9/25 7:42 AM, Jeff Layton wrote:
> On Thu, 2025-01-09 at 12:56 +0100, Christian Herzog wrote:
>> Dear Chuck,
>>
>> On Wed, Jan 08, 2025 at 10:07:49AM -0500, Chuck Lever wrote:
>>> On 1/8/25 9:54 AM, Christian Herzog wrote:
>>>> Dear Chuck,
>>>>
>>>> On Wed, Jan 08, 2025 at 08:33:23AM -0500, Chuck Lever wrote:
>>>>> On 1/7/25 4:17 PM, Salvatore Bonaccorso wrote:
>>>>>> Hi Chuck,
>>>>>>
>>>>>> Thanks for your time on this, much appreciated.
>>>>>>
>>>>>> On Wed, Jan 01, 2025 at 02:24:50PM -0500, Chuck Lever wrote:
>>>>>>> On 12/25/24 4:15 AM, Salvatore Bonaccorso wrote:
>>>>>>>> Hi Chuck, hi all,
>>>>>>>>
>>>>>>>> [it was not ideal to pick one of the message for this followup, let me
>>>>>>>> know if you want a complete new thread, adding as well Benjamin and
>>>>>>>> Trond as they are involved in one mentioned patch]
>>>>>>>>
>>>>>>>> On Mon, Jun 17, 2024 at 02:31:54PM +0000, Chuck Lever III wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> On Jun 17, 2024, at 2:55 AM, Harald Dunkel <harald.dunkel@aixigo.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi folks,
>>>>>>>>>>
>>>>>>>>>> what would be the reason for nfsd getting stuck somehow and becoming
>>>>>>>>>> an unkillable process? See
>>>>>>>>>>
>>>>>>>>>> - https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1071562
>>>>>>>>>> - https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/2062568
>>>>>>>>>>
>>>>>>>>>> Doesn't this mean that something inside the kernel gets stuck as
>>>>>>>>>> well? Seems odd to me.
>>>>>>>>>
>>>>>>>>> I'm not familiar with the Debian or Ubuntu kernel packages. Can
>>>>>>>>> the kernel release numbers be translated to LTS kernel releases
>>>>>>>>> please? Need both "last known working" and "first broken" releases.
>>>>>>>>>
>>>>>>>>> This:
>>>>>>>>>
>>>>>>>>> [ 6596.911785] RPC: Could not send backchannel reply error: -110
>>>>>>>>> [ 6596.972490] RPC: Could not send backchannel reply error: -110
>>>>>>>>> [ 6837.281307] RPC: Could not send backchannel reply error: -110
>>>>>>>>>
>>>>>>>>> is a known set of client backchannel bugs. Knowing the LTS kernel
>>>>>>>>> releases (see above) will help us figure out what needs to be
>>>>>>>>> backported to the LTS kernels kernels in question.
>>>>>>>>>
>>>>>>>>> This:
>>>>>>>>>
>>>>>>>>> [11183.290619] wait_for_completion+0x88/0x150
>>>>>>>>> [11183.290623] __flush_workqueue+0x140/0x3e0
>>>>>>>>> [11183.290629] nfsd4_probe_callback_sync+0x1a/0x30 [nfsd]
>>>>>>>>> [11183.290689] nfsd4_destroy_session+0x186/0x260 [nfsd]
>>>>>>>>>
>>>>>>>>> is probably related to the backchannel errors on the client, but
>>>>>>>>> client bugs shouldn't cause the server to hang like this. We
>>>>>>>>> might be able to say more if you can provide the kernel release
>>>>>>>>> translations (see above).
>>>>>>>>
>>>>>>>> In Debian we hstill have the bug #1071562 open and one person notified
>>>>>>>> mye offlist that it appears that the issue get more frequent since
>>>>>>>> they updated on NFS client side from Ubuntu 20.04 to Debian bookworm
>>>>>>>> with a 6.1.y based kernel).
>>>>>>>>
>>>>>>>> Some people around those issues, seem to claim that the change
>>>>>>>> mentioned in
>>>>>>>> https://lists.proxmox.com/pipermail/pve-devel/2024-July/064614.html
>>>>>>>> would fix the issue, which is as well backchannel related.
>>>>>>>>
>>>>>>>> This is upstream: 6ddc9deacc13 ("SUNRPC: Fix backchannel reply,
>>>>>>>> again"). While this commit fixes 57331a59ac0d ("NFSv4.1: Use the
>>>>>>>> nfs_client's rpc timeouts for backchannel") this is not something
>>>>>>>> which goes back to 6.1.y, could it be possible that hte backchannel
>>>>>>>> refactoring and this final fix indeeds fixes the issue?
>>>>>>>>
>>>>>>>> As people report it is not easily reproducible, so this makes it
>>>>>>>> harder to identify fixes correctly.
>>>>>>>>
>>>>>>>> I gave a (short) stance on trying to backport commits up to
>>>>>>>> 6ddc9deacc13 ("SUNRPC: Fix backchannel reply, again") but this quickly
>>>>>>>> seems to indicate it is probably still not the right thing for
>>>>>>>> backporting to the older stable series.
>>>>>>>>
>>>>>>>> As at least pre-requisites:
>>>>>>>>
>>>>>>>> 2009e32997ed568a305cf9bc7bf27d22e0f6ccda
>>>>>>>> 4119bd0306652776cb0b7caa3aea5b2a93aecb89
>>>>>>>> 163cdfca341b76c958567ae0966bd3575c5c6192
>>>>>>>> f4afc8fead386c81fda2593ad6162271d26667f8
>>>>>>>> 6ed8cdf967f7e9fc96cd1c129719ef99db2f9afc
>>>>>>>> 57331a59ac0d680f606403eb24edd3c35aecba31
>>>>>>>>
>>>>>>>> and still there would be conflicting codepaths (and does not seem
>>>>>>>> right).
>>>>>>>>
>>>>>>>> Chuck, Benjamin, Trond, is there anything we can provive on reporters
>>>>>>>> side that we can try to tackle this issue better?
>>>>>>>
>>>>>>> As I've indicated before, NFSD should not block no matter what the
>>>>>>> client may or may not be doing. I'd like to focus on the server first.
>>>>>>>
>>>>>>> What is the result of:
>>>>>>>
>>>>>>> $ cd <Bookworm's v6.1.90 kernel source >
>>>>>>> $ unset KBUILD_OUTPUT
>>>>>>> $ make -j `nproc`
>>>>>>> $ scripts/faddr2line \
>>>>>>> 	fs/nfsd/nfs4state.o \
>>>>>>> 	nfsd4_destroy_session+0x16d
>>>>>>>
>>>>>>> Since this issue appeared after v6.1.1, is it possible to bisect
>>>>>>> between v6.1.1 and v6.1.90 ?
>>>>>>
>>>>>> First please note, at least speaking of triggering the issue in
>>>>>> Debian, Debian has moved to 6.1.119 based kernel already (and soon in
>>>>>> the weekends point release move to 6.1.123).
>>>>>>
>>>>>> That said, one of the users which regularly seems to be hit by the
>>>>>> issue was able to provide the above requested information, based for
>>>>>> 6.1.119:
>>>>>>
>>>>>> ~/kernel/linux-source-6.1# make kernelversion
>>>>>> 6.1.119
>>>>>> ~/kernel/linux-source-6.1# scripts/faddr2line fs/nfsd/nfs4state.o nfsd4_destroy_session+0x16d
>>>>>> nfsd4_destroy_session+0x16d/0x250:
>>>>>> __list_del_entry at /root/kernel/linux-source-6.1/./include/linux/list.h:134
>>>>>> (inlined by) list_del at /root/kernel/linux-source-6.1/./include/linux/list.h:148
>>>>>> (inlined by) unhash_session at /root/kernel/linux-source-6.1/fs/nfsd/nfs4state.c:2062
>>>>>> (inlined by) nfsd4_destroy_session at /root/kernel/linux-source-6.1/fs/nfsd/nfs4state.c:3856
>>>>>>
>>>>>> They could provide as well a decode_stacktrace output for the recent
>>>>>> hit (if that is helpful for you):
>>>>>>
>>>>>> [Mon Jan 6 13:25:28 2025] INFO: task nfsd:55306 blocked for more than 6883 seconds.
>>>>>> [Mon Jan 6 13:25:28 2025]       Not tainted 6.1.0-28-amd64 #1 Debian 6.1.119-1
>>>>>> [Mon Jan 6 13:25:28 2025] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>>> [Mon Jan 6 13:25:28 2025] task:nfsd            state:D stack:0     pid:55306 ppid:2      flags:0x00004000
>>>>>> [Mon Jan 6 13:25:28 2025] Call Trace:
>>>>>> [Mon Jan 6 13:25:28 2025]  <TASK>
>>>>>> [Mon Jan 6 13:25:28 2025] __schedule+0x34d/0x9e0
>>>>>> [Mon Jan 6 13:25:28 2025] schedule+0x5a/0xd0
>>>>>> [Mon Jan 6 13:25:28 2025] schedule_timeout+0x118/0x150
>>>>>> [Mon Jan 6 13:25:28 2025] wait_for_completion+0x86/0x160
>>>>>> [Mon Jan 6 13:25:28 2025] __flush_workqueue+0x152/0x420
>>>>>> [Mon Jan 6 13:25:28 2025] nfsd4_destroy_session (debian/build/build_amd64_none_amd64/include/linux/spinlock.h:351 debian/build/build_amd64_none_amd64/fs/nfsd/nfs4state.c:3861) nfsd
>>>>>> [Mon Jan 6 13:25:28 2025] nfsd4_proc_compound (debian/build/build_amd64_none_amd64/fs/nfsd/nfs4proc.c:2680) nfsd
>>>>>> [Mon Jan 6 13:25:28 2025] nfsd_dispatch (debian/build/build_amd64_none_amd64/fs/nfsd/nfssvc.c:1022) nfsd
>>>>>> [Mon Jan 6 13:25:28 2025] svc_process_common (debian/build/build_amd64_none_amd64/net/sunrpc/svc.c:1344) sunrpc
>>>>>> [Mon Jan 6 13:25:28 2025] ? svc_recv (debian/build/build_amd64_none_amd64/net/sunrpc/svc_xprt.c:897) sunrpc
>>>>>> [Mon Jan 6 13:25:28 2025] ? nfsd_svc (debian/build/build_amd64_none_amd64/fs/nfsd/nfssvc.c:983) nfsd
>>>>>> [Mon Jan 6 13:25:28 2025] ? nfsd_inet6addr_event (debian/build/build_amd64_none_amd64/fs/nfsd/nfssvc.c:922) nfsd
>>>>>> [Mon Jan 6 13:25:28 2025] svc_process (debian/build/build_amd64_none_amd64/net/sunrpc/svc.c:1474) sunrpc
>>>>>> [Mon Jan 6 13:25:28 2025] nfsd (debian/build/build_amd64_none_amd64/fs/nfsd/nfssvc.c:960) nfsd
>>>>>> [Mon Jan 6 13:25:28 2025] kthread+0xd7/0x100
>>>>>> [Mon Jan 6 13:25:28 2025] ? kthread_complete_and_exit+0x20/0x20
>>>>>> [Mon Jan 6 13:25:28 2025] ret_from_fork+0x1f/0x30
>>>>>> [Mon Jan  6 13:25:28 2025]  </TASK>
>>>>>>
>>>>>> The question about bisection is actually harder, those are production
>>>>>> systems and I understand it's not possible to do bisect there, while
>>>>>> unfortunately not reprodcing the issue on "lab conditions".
>>>>>>
>>>>>> Does the above give us still a clue on what you were looking for?
>>>>>
>>>>> Thanks for the update.
>>>>>
>>>>> It's possible that 38f080f3cd19 ("NFSD: Move callback_wq into struct
>>>>> nfs4_client"), while not a specific fix for this issue, might offer some
>>>>> relief by preventing the DESTROY_SESSION hang from affecting all other
>>>>> clients of the degraded server.
>>>>>
>>>>> Not having a reproducer or the ability to bisect puts a damper on
>>>>> things. The next step, then, is to enable tracing on servers where this
>>>>> issue can come up, and wait for the hang to occur. The following command
>>>>> captures information only about callback operation, so it should not
>>>>> generate much data or impact server performance.
>>>>>
>>>>>     # trace-cmd record -e nfsd:nfsd_cb\*
>>>>>
>>>>> Let that run until the problem occurs, then ^C, compress the resulting
>>>>> trace.dat file, and either attach it to 1071562 or email it to me
>>>>> privately.
>>>> thanks for the follow-up.
>>>>
>>>> I am the "customer" with two affected file servers. We have since moved those
>>>> servers to the backports kernel (6.11.10) in the hope of forward fixing the
>>>> issue. If this kernel is stable, I'm afraid I can't go back to the 'bad'
>>>> kernel (700+ researchers affected..), and we're also not able to trigger the
>>>> issue on our test environment.
>>>
>>> Hello Dr. Herzog -
>>>
>>> If the problem recurs on 6.11, the trace-cmd I suggest above works
>>> there as well.
>> the bad news is: it just happened again with the bpo kernel.
>>
>> We then immediately started trace-cmd since usually there are several call
>> traces in a row and we did get a trace.dat:
>> http://people.phys.ethz.ch/~daduke/trace.dat
>>
>> we did notice however that the stack trace looked a bit different this time:
>>
>>      INFO: task nfsd:2566 blocked for more than 5799 seconds.
>>      Tainted: G        W          6.11.10+bpo-amd64 #1 Debia>
>>      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables t>
>>      task:nfsd            state:D stack:0     pid:2566  tgid:2566 >
>>      Call Trace:
>>      <TASK>
>>      __schedule+0x400/0xad0
>>      schedule+0x27/0xf0
>>      nfsd4_shutdown_callback+0xfe/0x140 [nfsd]
>>      ? __pfx_var_wake_function+0x10/0x10
>>      __destroy_client+0x1f0/0x290 [nfsd]
>>      nfsd4_destroy_clientid+0xf1/0x1e0 [nfsd]
>>      ? svcauth_unix_set_client+0x586/0x5f0 [sunrpc]
>>      nfsd4_proc_compound+0x34d/0x670 [nfsd]
>>      nfsd_dispatch+0xfd/0x220 [nfsd]
>>      svc_process_common+0x2f7/0x700 [sunrpc]
>>      ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
>>      svc_process+0x131/0x180 [sunrpc]
>>      svc_recv+0x830/0xa10 [sunrpc]
>>      ? __pfx_nfsd+0x10/0x10 [nfsd]
>>      nfsd+0x87/0xf0 [nfsd]
>>      kthread+0xcf/0x100
>>      ? __pfx_kthread+0x10/0x10
>>      ret_from_fork+0x31/0x50
>>      ? __pfx_kthread+0x10/0x10
>>      ret_from_fork_asm+0x1a/0x30
>>      </TASK>
>>
>> and also the state of the offending client in `/proc/fs/nfsd/clients/*/info`
>> used to be callback state: UNKNOWN while now it is DOWN or FAULT. No idea
>> what it means, but I thought it was worth mentioning.
>>
> 
> Looks like this is hung in nfsd41_cb_inflight_wait_complete() ? That
> probably means that the cl_cb_inflight counter is stuck at >0. I'm
> guessing that means that there is some callback that it's expecting to
> complete that isn't. From nfsd4_shutdown_callback():
> 
>          /*
>           * Note this won't actually result in a null callback;
>           * instead, nfsd4_run_cb_null() will detect the killed
>           * client, destroy the rpc client, and stop:
>           */
>          nfsd4_run_cb(&clp->cl_cb_null);
>          flush_workqueue(clp->cl_callback_wq);
>          nfsd41_cb_inflight_wait_complete(clp);
> 
> ...it sounds like that isn't happening properly though.
> 
> It might be interesting to see if you can track down the callback
> client in /sys/kernel/debug/sunrpc and see what it's doing.

Here's a possible scenario:

The trace.dat shows the server is sending a lot of CB_RECALL_ANY
operations.

Normally a callback occurs only due to some particular client request.
Such requests would be completed before a client unmounts.

CB_RECALL_ANY is an operation which does not occur due to a particular
client operation and can occur at any time. I believe this is the
first operation of this type we've added to NFSD.

My guess is that the server's laundromat has sent some CB_RECALL_ANY
operations while the client is umounting -- DESTROY_SESSION is
racing with those callback operations.

deleg_reaper() was backported to 6.1.y in 6.1.81, which lines up
more or less with when the issues started to show up.

A quick way to test this theory would be to make deleg_reaper() a
no-op. If this helps, then we know that shutting down the server's
callback client needs to be more careful about cleaning up outstanding
callbacks.


-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 31+ messages in thread

[parent not found: <f0705a65549ef253.67823675@ac-grenoble.fr>]

* Re: nfsd blocks indefinitely in nfsd4_destroy_session
       [not found]                     ` <f0705a65549ef253.67823675@ac-grenoble.fr>
@ 2025-01-16 20:07                       ` Chuck Lever
  2025-01-17 19:43                         ` Baptiste PELLEGRIN
  0 siblings, 1 reply; 31+ messages in thread
From: Chuck Lever @ 2025-01-16 20:07 UTC (permalink / raw)
  To: Pellegrin Baptiste, Rik Theys, Christian Herzog
  Cc: Salvatore Bonaccorso, Benjamin Coddington, Trond Myklebust,
	Harald Dunkel, Linux NFS Mailing List, Martin Svec,
	Michael Gernoth, Jeff Layton

On 1/11/25 3:14 AM, Pellegrin Baptiste wrote:
> On 09/01/25 17:32, *Chuck Lever * <chuck.lever@oracle.com> wrote:
>> On 1/9/25 7:42 AM, Jeff Layton wrote:
>> >On Thu, 2025-01-09 at 12:56 +0100, Christian Herzog wrote:
>> >>Dear Chuck,
>> >>
>> >>On Wed, Jan 08, 2025 at 10:07:49AM -0500, Chuck Lever wrote:
>> >>>On 1/8/25 9:54 AM, Christian Herzog wrote:
>> >>>>Dear Chuck,
>> >>>>
>> >>>>On Wed, Jan 08, 2025 at 08:33:23AM -0500, Chuck Lever wrote:
>> >>>>>On 1/7/25 4:17 PM, Salvatore Bonaccorso wrote:
>> >>>>>>Hi Chuck,
>> >>>>>>
>> >>>>>>Thanks for your time on this, much appreciated.
>> >>>>>>
>> >>>>>>On Wed, Jan 01, 2025 at 02:24:50PM -0500, Chuck Lever wrote:
>> >>>>>>>On 12/25/24 4:15 AM, Salvatore Bonaccorso wrote:
>> >>>>>>>>Hi Chuck, hi all,
>> >>>>>>>>
>> >>>>>>>>[it was not ideal to pick one of the message for this 
>> followup, let me
>> >>>>>>>>know if you want a complete new thread, adding as well 
>> Benjamin and
>> >>>>>>>>Trond as they are involved in one mentioned patch]
>> >>>>>>>>
>> >>>>>>>>On Mon, Jun 17, 2024 at 02:31:54PM +0000, Chuck Lever III wrote:
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>>On Jun 17, 2024, at 2:55 AM, Harald Dunkel 
>> <harald.dunkel@aixigo.com> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>>Hi folks,
>> >>>>>>>>>>
>> >>>>>>>>>>what would be the reason for nfsd getting stuck somehow and 
>> becoming
>> >>>>>>>>>>an unkillable process? See
>> >>>>>>>>>>
>> >>>>>>>>>>- https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1071562 
>> <https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1071562>
>> >>>>>>>>>>- https://bugs.launchpad.net/ubuntu/+source/nfs-utils/ 
>> +bug/2062568 <https://bugs.launchpad.net/ubuntu/+source/nfs-utils/ 
>> +bug/2062568>
>> >>>>>>>>>>
>> >>>>>>>>>>Doesn't this mean that something inside the kernel gets stuck as
>> >>>>>>>>>>well? Seems odd to me.
>> >>>>>>>>>
>> >>>>>>>>>I'm not familiar with the Debian or Ubuntu kernel packages. Can
>> >>>>>>>>>the kernel release numbers be translated to LTS kernel releases
>> >>>>>>>>>please? Need both "last known working" and "first broken" 
>> releases.
>> >>>>>>>>>
>> >>>>>>>>>This:
>> >>>>>>>>>
>> >>>>>>>>>[ 6596.911785] RPC: Could not send backchannel reply error: -110
>> >>>>>>>>>[ 6596.972490] RPC: Could not send backchannel reply error: -110
>> >>>>>>>>>[ 6837.281307] RPC: Could not send backchannel reply error: -110
>> >>>>>>>>>
>> >>>>>>>>>is a known set of client backchannel bugs. Knowing the LTS kernel
>> >>>>>>>>>releases (see above) will help us figure out what needs to be
>> >>>>>>>>>backported to the LTS kernels kernels in question.
>> >>>>>>>>>
>> >>>>>>>>>This:
>> >>>>>>>>>
>> >>>>>>>>>[11183.290619] wait_for_completion+0x88/0x150
>> >>>>>>>>>[11183.290623] __flush_workqueue+0x140/0x3e0
>> >>>>>>>>>[11183.290629] nfsd4_probe_callback_sync+0x1a/0x30 [nfsd]
>> >>>>>>>>>[11183.290689] nfsd4_destroy_session+0x186/0x260 [nfsd]
>> >>>>>>>>>
>> >>>>>>>>>is probably related to the backchannel errors on the client, but
>> >>>>>>>>>client bugs shouldn't cause the server to hang like this. We
>> >>>>>>>>>might be able to say more if you can provide the kernel release
>> >>>>>>>>>translations (see above).
>> >>>>>>>>
>> >>>>>>>>In Debian we hstill have the bug #1071562 open and one person 
>> notified
>> >>>>>>>>mye offlist that it appears that the issue get more frequent since
>> >>>>>>>>they updated on NFS client side from Ubuntu 20.04 to Debian 
>> bookworm
>> >>>>>>>>with a 6.1.y based kernel).
>> >>>>>>>>
>> >>>>>>>>Some people around those issues, seem to claim that the change
>> >>>>>>>>mentioned in
>> >>>>>>>>https://lists.proxmox.com/pipermail/pve-devel/2024- 
>> July/064614.html <https://lists.proxmox.com/pipermail/pve-devel/2024- 
>> July/064614.html>
>> >>>>>>>>would fix the issue, which is as well backchannel related.
>> >>>>>>>>
>> >>>>>>>>This is upstream: 6ddc9deacc13 ("SUNRPC: Fix backchannel reply,
>> >>>>>>>>again"). While this commit fixes 57331a59ac0d ("NFSv4.1: Use the
>> >>>>>>>>nfs_client's rpc timeouts for backchannel") this is not something
>> >>>>>>>>which goes back to 6.1.y, could it be possible that hte 
>> backchannel
>> >>>>>>>>refactoring and this final fix indeeds fixes the issue?
>> >>>>>>>>
>> >>>>>>>>As people report it is not easily reproducible, so this makes it
>> >>>>>>>>harder to identify fixes correctly.
>> >>>>>>>>
>> >>>>>>>>I gave a (short) stance on trying to backport commits up to
>> >>>>>>>>6ddc9deacc13 ("SUNRPC: Fix backchannel reply, again") but this 
>> quickly
>> >>>>>>>>seems to indicate it is probably still not the right thing for
>> >>>>>>>>backporting to the older stable series.
>> >>>>>>>>
>> >>>>>>>>As at least pre-requisites:
>> >>>>>>>>
>> >>>>>>>>2009e32997ed568a305cf9bc7bf27d22e0f6ccda
>> >>>>>>>>4119bd0306652776cb0b7caa3aea5b2a93aecb89
>> >>>>>>>>163cdfca341b76c958567ae0966bd3575c5c6192
>> >>>>>>>>f4afc8fead386c81fda2593ad6162271d26667f8
>> >>>>>>>>6ed8cdf967f7e9fc96cd1c129719ef99db2f9afc
>> >>>>>>>>57331a59ac0d680f606403eb24edd3c35aecba31
>> >>>>>>>>
>> >>>>>>>>and still there would be conflicting codepaths (and does not seem
>> >>>>>>>>right).
>> >>>>>>>>
>> >>>>>>>>Chuck, Benjamin, Trond, is there anything we can provive on 
>> reporters
>> >>>>>>>>side that we can try to tackle this issue better?
>> >>>>>>>
>> >>>>>>>As I've indicated before, NFSD should not block no matter what the
>> >>>>>>>client may or may not be doing. I'd like to focus on the server 
>> first.
>> >>>>>>>
>> >>>>>>>What is the result of:
>> >>>>>>>
>> >>>>>>>$ cd <Bookworm's v6.1.90 kernel source >
>> >>>>>>>$ unset KBUILD_OUTPUT
>> >>>>>>>$ make -j `nproc`
>> >>>>>>>$ scripts/faddr2line \
>> >>>>>>> fs/nfsd/nfs4state.o \
>> >>>>>>> nfsd4_destroy_session+0x16d
>> >>>>>>>
>> >>>>>>>Since this issue appeared after v6.1.1, is it possible to bisect
>> >>>>>>>between v6.1.1 and v6.1.90 ?
>> >>>>>>
>> >>>>>>First please note, at least speaking of triggering the issue in
>> >>>>>>Debian, Debian has moved to 6.1.119 based kernel already (and 
>> soon in
>> >>>>>>the weekends point release move to 6.1.123).
>> >>>>>>
>> >>>>>>That said, one of the users which regularly seems to be hit by the
>> >>>>>>issue was able to provide the above requested information, based for
>> >>>>>>6.1.119:
>> >>>>>>
>> >>>>>>~/kernel/linux-source-6.1# make kernelversion
>> >>>>>>6.1.119
>> >>>>>>~/kernel/linux-source-6.1# scripts/faddr2line fs/nfsd/ 
>> nfs4state.o nfsd4_destroy_session+0x16d
>> >>>>>>nfsd4_destroy_session+0x16d/0x250:
>> >>>>>>__list_del_entry at /root/kernel/linux-source-6.1/./include/ 
>> linux/list.h:134
>> >>>>>>(inlined by) list_del at /root/kernel/linux-source-6.1/./ 
>> include/linux/list.h:148
>> >>>>>>(inlined by) unhash_session at /root/kernel/linux-source-6.1/fs/ 
>> nfsd/nfs4state.c:2062
>> >>>>>>(inlined by) nfsd4_destroy_session at /root/kernel/linux- 
>> source-6.1/fs/nfsd/nfs4state.c:3856
>> >>>>>>
>> >>>>>>They could provide as well a decode_stacktrace output for the recent
>> >>>>>>hit (if that is helpful for you):
>> >>>>>>
>> >>>>>>[Mon Jan 6 13:25:28 2025] INFO: task nfsd:55306 blocked for more 
>> than 6883 seconds.
>> >>>>>>[Mon Jan 6 13:25:28 2025]       Not tainted 6.1.0-28-amd64 #1 
>> Debian 6.1.119-1
>> >>>>>>[Mon Jan 6 13:25:28 2025] "echo 0 > /proc/sys/kernel/ 
>> hung_task_timeout_secs" disables this message.
>> >>>>>>[Mon Jan 6 13:25:28 2025] task:nfsd            state:D 
>> stack:0     pid:55306 ppid:2      flags:0x00004000
>> >>>>>>[Mon Jan 6 13:25:28 2025] Call Trace:
>> >>>>>>[Mon Jan 6 13:25:28 2025]  <TASK>
>> >>>>>>[Mon Jan 6 13:25:28 2025] __schedule+0x34d/0x9e0
>> >>>>>>[Mon Jan 6 13:25:28 2025] schedule+0x5a/0xd0
>> >>>>>>[Mon Jan 6 13:25:28 2025] schedule_timeout+0x118/0x150
>> >>>>>>[Mon Jan 6 13:25:28 2025] wait_for_completion+0x86/0x160
>> >>>>>>[Mon Jan 6 13:25:28 2025] __flush_workqueue+0x152/0x420
>> >>>>>>[Mon Jan 6 13:25:28 2025] nfsd4_destroy_session (debian/build/ 
>> build_amd64_none_amd64/include/linux/spinlock.h:351 debian/build/ 
>> build_amd64_none_amd64/fs/nfsd/nfs4state.c:3861) nfsd
>> >>>>>>[Mon Jan 6 13:25:28 2025] nfsd4_proc_compound (debian/build/ 
>> build_amd64_none_amd64/fs/nfsd/nfs4proc.c:2680) nfsd
>> >>>>>>[Mon Jan 6 13:25:28 2025] nfsd_dispatch (debian/build/ 
>> build_amd64_none_amd64/fs/nfsd/nfssvc.c:1022) nfsd
>> >>>>>>[Mon Jan 6 13:25:28 2025] svc_process_common (debian/build/ 
>> build_amd64_none_amd64/net/sunrpc/svc.c:1344) sunrpc
>> >>>>>>[Mon Jan 6 13:25:28 2025] ? svc_recv (debian/build/ 
>> build_amd64_none_amd64/net/sunrpc/svc_xprt.c:897) sunrpc
>> >>>>>>[Mon Jan 6 13:25:28 2025] ? nfsd_svc (debian/build/ 
>> build_amd64_none_amd64/fs/nfsd/nfssvc.c:983) nfsd
>> >>>>>>[Mon Jan 6 13:25:28 2025] ? nfsd_inet6addr_event (debian/build/ 
>> build_amd64_none_amd64/fs/nfsd/nfssvc.c:922) nfsd
>> >>>>>>[Mon Jan 6 13:25:28 2025] svc_process (debian/build/ 
>> build_amd64_none_amd64/net/sunrpc/svc.c:1474) sunrpc
>> >>>>>>[Mon Jan 6 13:25:28 2025] nfsd (debian/build/ 
>> build_amd64_none_amd64/fs/nfsd/nfssvc.c:960) nfsd
>> >>>>>>[Mon Jan 6 13:25:28 2025] kthread+0xd7/0x100
>> >>>>>>[Mon Jan 6 13:25:28 2025] ? kthread_complete_and_exit+0x20/0x20
>> >>>>>>[Mon Jan 6 13:25:28 2025] ret_from_fork+0x1f/0x30
>> >>>>>>[Mon Jan  6 13:25:28 2025]  </TASK>
>> >>>>>>
>> >>>>>>The question about bisection is actually harder, those are 
>> production
>> >>>>>>systems and I understand it's not possible to do bisect there, while
>> >>>>>>unfortunately not reprodcing the issue on "lab conditions".
>> >>>>>>
>> >>>>>>Does the above give us still a clue on what you were looking for?
>> >>>>>
>> >>>>>Thanks for the update.
>> >>>>>
>> >>>>>It's possible that 38f080f3cd19 ("NFSD: Move callback_wq into struct
>> >>>>>nfs4_client"), while not a specific fix for this issue, might 
>> offer some
>> >>>>>relief by preventing the DESTROY_SESSION hang from affecting all 
>> other
>> >>>>>clients of the degraded server.
>> >>>>>
>> >>>>>Not having a reproducer or the ability to bisect puts a damper on
>> >>>>>things. The next step, then, is to enable tracing on servers 
>> where this
>> >>>>>issue can come up, and wait for the hang to occur. The following 
>> command
>> >>>>>captures information only about callback operation, so it should not
>> >>>>>generate much data or impact server performance.
>> >>>>>
>> >>>>>    # trace-cmd record -e nfsd:nfsd_cb\*
>> >>>>>
>> >>>>>Let that run until the problem occurs, then ^C, compress the 
>> resulting
>> >>>>>trace.dat file, and either attach it to 1071562 or email it to me
>> >>>>>privately.
>> >>>>thanks for the follow-up.
>> >>>>
>> >>>>I am the "customer" with two affected file servers. We have since 
>> moved those
>> >>>>servers to the backports kernel (6.11.10) in the hope of forward 
>> fixing the
>> >>>>issue. If this kernel is stable, I'm afraid I can't go back to the 
>> 'bad'
>> >>>>kernel (700+ researchers affected..), and we're also not able to 
>> trigger the
>> >>>>issue on our test environment.
>> >>>
>> >>>Hello Dr. Herzog -
>> >>>
>> >>>If the problem recurs on 6.11, the trace-cmd I suggest above works
>> >>>there as well.
>> >>the bad news is: it just happened again with the bpo kernel.
>> >>
>> >>We then immediately started trace-cmd since usually there are 
>> several call
>> >>traces in a row and we did get a trace.dat:
>> >>http://people.phys.ethz.ch/~daduke/trace.dat <http:// 
>> people.phys.ethz.ch/~daduke/trace.dat>
>> >>
>> >>we did notice however that the stack trace looked a bit different 
>> this time:
>> >>
>> >>     INFO: task nfsd:2566 blocked for more than 5799 seconds.
>> >>     Tainted: G        W          6.11.10+bpo-amd64 #1 Debia>
>> >>     "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables t>
>> >>     task:nfsd            state:D stack:0     pid:2566  tgid:2566 >
>> >>     Call Trace:
>> >>     <TASK>
>> >>     __schedule+0x400/0xad0
>> >>     schedule+0x27/0xf0
>> >>     nfsd4_shutdown_callback+0xfe/0x140 [nfsd]
>> >>     ? __pfx_var_wake_function+0x10/0x10
>> >>     __destroy_client+0x1f0/0x290 [nfsd]
>> >>     nfsd4_destroy_clientid+0xf1/0x1e0 [nfsd]
>> >>     ? svcauth_unix_set_client+0x586/0x5f0 [sunrpc]
>> >>     nfsd4_proc_compound+0x34d/0x670 [nfsd]
>> >>     nfsd_dispatch+0xfd/0x220 [nfsd]
>> >>     svc_process_common+0x2f7/0x700 [sunrpc]
>> >>     ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
>> >>     svc_process+0x131/0x180 [sunrpc]
>> >>     svc_recv+0x830/0xa10 [sunrpc]
>> >>     ? __pfx_nfsd+0x10/0x10 [nfsd]
>> >>     nfsd+0x87/0xf0 [nfsd]
>> >>     kthread+0xcf/0x100
>> >>     ? __pfx_kthread+0x10/0x10
>> >>     ret_from_fork+0x31/0x50
>> >>     ? __pfx_kthread+0x10/0x10
>> >>     ret_from_fork_asm+0x1a/0x30
>> >>     </TASK>
>> >>
>> >>and also the state of the offending client in `/proc/fs/nfsd/ 
>> clients/*/info`
>> >>used to be callback state: UNKNOWN while now it is DOWN or FAULT. No 
>> idea
>> >>what it means, but I thought it was worth mentioning.
>> >>
>> >
>> >Looks like this is hung in nfsd41_cb_inflight_wait_complete() ? That
>> >probably means that the cl_cb_inflight counter is stuck at >0. I'm
>> >guessing that means that there is some callback that it's expecting to
>> >complete that isn't. From nfsd4_shutdown_callback():
>> >
>> >         /*
>> >          * Note this won't actually result in a null callback;
>> >          * instead, nfsd4_run_cb_null() will detect the killed
>> >          * client, destroy the rpc client, and stop:
>> >          */
>> >         nfsd4_run_cb(&clp->cl_cb_null);
>> >         flush_workqueue(clp->cl_callback_wq);
>> >         nfsd41_cb_inflight_wait_complete(clp);
>> >
>> >...it sounds like that isn't happening properly though.
>> >
>> >It might be interesting to see if you can track down the callback
>> >client in /sys/kernel/debug/sunrpc and see what it's doing.
>>
>> Here's a possible scenario:
>>
>> The trace.dat shows the server is sending a lot of CB_RECALL_ANY
>> operations.
>>
>> Normally a callback occurs only due to some particular client request.
>> Such requests would be completed before a client unmounts.
>>
>> CB_RECALL_ANY is an operation which does not occur due to a particular
>> client operation and can occur at any time. I believe this is the
>> first operation of this type we've added to NFSD.
>>
>> My guess is that the server's laundromat has sent some CB_RECALL_ANY
>> operations while the client is umounting -- DESTROY_SESSION is
>> racing with those callback operations.
>>
>> deleg_reaper() was backported to 6.1.y in 6.1.81, which lines up
>> more or less with when the issues started to show up.
>>
>> A quick way to test this theory would be to make deleg_reaper() a
>> no-op. If this helps, then we know that shutting down the server's
>> callback client needs to be more careful about cleaning up outstanding
>> callbacks.
>>
>>
>> -- 
>> Chuck Lever
> 
> Hello.
> 
> Thanks a lot for all of this work !
> 
> I have started recording traces using :
> 
> # trace-cmd record -e nfsd:nfsd_cb\*
> 
> and currently waiting for the bug to trigger. This generally happen in 
> one/two weeks for me.
> 
> If you want I adjust some recording parameters let me know.

After looking at your data and Rik's data, I decided to try reproducing
the problem in my lab. No luck so far.

The signatures of these issues are slightly different from each other:

v6.1:
- nfsd4_destroy_session hangs on the flush_workqueue in
   nfsd4_shutdown_callback()
- There might be outstanding callback RPCs that are not exiting
- Lots of CB_RECALL_ANY traffic. Only place that's generated in
   v6.1 is the NFSD shrinker when memory is short
- There appear to be some unreachable clients -- evidence of client
   crashes or network outages, perhaps

v6.12:
- destroy_client() hangs on the wait_var_event in
   nfsd4_shtudown_callback()
- There are no callback RPCs left after the hang
- There are some stuck clients listed under /proc/fs/nfsd/

Similar, but not the same.

I'm still open to clues, so please continue the regimen of starting
the trace-cmd when the server boots, and doing an "echo t >
/proc/sysrq-trigger" when the hang appears.

  # trace-cmd record -e nfsd:nfsd_cb\* -e sunrpc:svc_xprt_detach \
    -p function -l nfsd4_destroy_session


-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: nfsd blocks indefinitely in nfsd4_destroy_session
  2025-01-16 20:07                       ` Chuck Lever
@ 2025-01-17 19:43                         ` Baptiste PELLEGRIN
  2025-01-17 20:27                           ` Chuck Lever
  0 siblings, 1 reply; 31+ messages in thread
From: Baptiste PELLEGRIN @ 2025-01-17 19:43 UTC (permalink / raw)
  To: Chuck Lever, Rik Theys, Christian Herzog
  Cc: Salvatore Bonaccorso, Benjamin Coddington, Trond Myklebust,
	Harald Dunkel, Linux NFS Mailing List, Martin Svec,
	Michael Gernoth, Jeff Layton



On 1/16/25 9:07 PM, Chuck Lever wrote:
> On 1/11/25 3:14 AM, Pellegrin Baptiste wrote:
>> On 09/01/25 17:32, *Chuck Lever * <chuck.lever@oracle.com> wrote:
>>> On 1/9/25 7:42 AM, Jeff Layton wrote:
>>> >On Thu, 2025-01-09 at 12:56 +0100, Christian Herzog wrote:
>>> >>Dear Chuck,
>>> >>
>>> >>On Wed, Jan 08, 2025 at 10:07:49AM -0500, Chuck Lever wrote:
>>> >>>On 1/8/25 9:54 AM, Christian Herzog wrote:
>>> >>>>Dear Chuck,
>>> >>>>
>>> >>>>On Wed, Jan 08, 2025 at 08:33:23AM -0500, Chuck Lever wrote:
>>> >>>>>On 1/7/25 4:17 PM, Salvatore Bonaccorso wrote:
>>> >>>>>>Hi Chuck,
>>> >>>>>>
>>> >>>>>>Thanks for your time on this, much appreciated.
>>> >>>>>>
>>> >>>>>>On Wed, Jan 01, 2025 at 02:24:50PM -0500, Chuck Lever wrote:
>>> >>>>>>>On 12/25/24 4:15 AM, Salvatore Bonaccorso wrote:
>>> >>>>>>>>Hi Chuck, hi all,
>>> >>>>>>>>
>>> >>>>>>>>[it was not ideal to pick one of the message for this 
>>> followup, let me
>>> >>>>>>>>know if you want a complete new thread, adding as well 
>>> Benjamin and
>>> >>>>>>>>Trond as they are involved in one mentioned patch]
>>> >>>>>>>>
>>> >>>>>>>>On Mon, Jun 17, 2024 at 02:31:54PM +0000, Chuck Lever III wrote:
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>>>On Jun 17, 2024, at 2:55 AM, Harald Dunkel 
>>> <harald.dunkel@aixigo.com> wrote:
>>> >>>>>>>>>>
>>> >>>>>>>>>>Hi folks,
>>> >>>>>>>>>>
>>> >>>>>>>>>>what would be the reason for nfsd getting stuck somehow and 
>>> becoming
>>> >>>>>>>>>>an unkillable process? See
>>> >>>>>>>>>>
>>> >>>>>>>>>>- https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1071562 
>>> <https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1071562>
>>> >>>>>>>>>>- https://bugs.launchpad.net/ubuntu/+source/nfs-utils/ 
>>> +bug/2062568 <https://bugs.launchpad.net/ubuntu/+source/nfs-utils/ 
>>> +bug/2062568>
>>> >>>>>>>>>>
>>> >>>>>>>>>>Doesn't this mean that something inside the kernel gets 
>>> stuck as
>>> >>>>>>>>>>well? Seems odd to me.
>>> >>>>>>>>>
>>> >>>>>>>>>I'm not familiar with the Debian or Ubuntu kernel packages. Can
>>> >>>>>>>>>the kernel release numbers be translated to LTS kernel releases
>>> >>>>>>>>>please? Need both "last known working" and "first broken" 
>>> releases.
>>> >>>>>>>>>
>>> >>>>>>>>>This:
>>> >>>>>>>>>
>>> >>>>>>>>>[ 6596.911785] RPC: Could not send backchannel reply error: 
>>> -110
>>> >>>>>>>>>[ 6596.972490] RPC: Could not send backchannel reply error: 
>>> -110
>>> >>>>>>>>>[ 6837.281307] RPC: Could not send backchannel reply error: 
>>> -110
>>> >>>>>>>>>
>>> >>>>>>>>>is a known set of client backchannel bugs. Knowing the LTS 
>>> kernel
>>> >>>>>>>>>releases (see above) will help us figure out what needs to be
>>> >>>>>>>>>backported to the LTS kernels kernels in question.
>>> >>>>>>>>>
>>> >>>>>>>>>This:
>>> >>>>>>>>>
>>> >>>>>>>>>[11183.290619] wait_for_completion+0x88/0x150
>>> >>>>>>>>>[11183.290623] __flush_workqueue+0x140/0x3e0
>>> >>>>>>>>>[11183.290629] nfsd4_probe_callback_sync+0x1a/0x30 [nfsd]
>>> >>>>>>>>>[11183.290689] nfsd4_destroy_session+0x186/0x260 [nfsd]
>>> >>>>>>>>>
>>> >>>>>>>>>is probably related to the backchannel errors on the client, 
>>> but
>>> >>>>>>>>>client bugs shouldn't cause the server to hang like this. We
>>> >>>>>>>>>might be able to say more if you can provide the kernel release
>>> >>>>>>>>>translations (see above).
>>> >>>>>>>>
>>> >>>>>>>>In Debian we hstill have the bug #1071562 open and one person 
>>> notified
>>> >>>>>>>>mye offlist that it appears that the issue get more frequent 
>>> since
>>> >>>>>>>>they updated on NFS client side from Ubuntu 20.04 to Debian 
>>> bookworm
>>> >>>>>>>>with a 6.1.y based kernel).
>>> >>>>>>>>
>>> >>>>>>>>Some people around those issues, seem to claim that the change
>>> >>>>>>>>mentioned in
>>> >>>>>>>>https://lists.proxmox.com/pipermail/pve-devel/2024- 
>>> July/064614.html <https://lists.proxmox.com/pipermail/pve-devel/2024- 
>>> July/064614.html>
>>> >>>>>>>>would fix the issue, which is as well backchannel related.
>>> >>>>>>>>
>>> >>>>>>>>This is upstream: 6ddc9deacc13 ("SUNRPC: Fix backchannel reply,
>>> >>>>>>>>again"). While this commit fixes 57331a59ac0d ("NFSv4.1: Use the
>>> >>>>>>>>nfs_client's rpc timeouts for backchannel") this is not 
>>> something
>>> >>>>>>>>which goes back to 6.1.y, could it be possible that hte 
>>> backchannel
>>> >>>>>>>>refactoring and this final fix indeeds fixes the issue?
>>> >>>>>>>>
>>> >>>>>>>>As people report it is not easily reproducible, so this makes it
>>> >>>>>>>>harder to identify fixes correctly.
>>> >>>>>>>>
>>> >>>>>>>>I gave a (short) stance on trying to backport commits up to
>>> >>>>>>>>6ddc9deacc13 ("SUNRPC: Fix backchannel reply, again") but 
>>> this quickly
>>> >>>>>>>>seems to indicate it is probably still not the right thing for
>>> >>>>>>>>backporting to the older stable series.
>>> >>>>>>>>
>>> >>>>>>>>As at least pre-requisites:
>>> >>>>>>>>
>>> >>>>>>>>2009e32997ed568a305cf9bc7bf27d22e0f6ccda
>>> >>>>>>>>4119bd0306652776cb0b7caa3aea5b2a93aecb89
>>> >>>>>>>>163cdfca341b76c958567ae0966bd3575c5c6192
>>> >>>>>>>>f4afc8fead386c81fda2593ad6162271d26667f8
>>> >>>>>>>>6ed8cdf967f7e9fc96cd1c129719ef99db2f9afc
>>> >>>>>>>>57331a59ac0d680f606403eb24edd3c35aecba31
>>> >>>>>>>>
>>> >>>>>>>>and still there would be conflicting codepaths (and does not 
>>> seem
>>> >>>>>>>>right).
>>> >>>>>>>>
>>> >>>>>>>>Chuck, Benjamin, Trond, is there anything we can provive on 
>>> reporters
>>> >>>>>>>>side that we can try to tackle this issue better?
>>> >>>>>>>
>>> >>>>>>>As I've indicated before, NFSD should not block no matter what 
>>> the
>>> >>>>>>>client may or may not be doing. I'd like to focus on the 
>>> server first.
>>> >>>>>>>
>>> >>>>>>>What is the result of:
>>> >>>>>>>
>>> >>>>>>>$ cd <Bookworm's v6.1.90 kernel source >
>>> >>>>>>>$ unset KBUILD_OUTPUT
>>> >>>>>>>$ make -j `nproc`
>>> >>>>>>>$ scripts/faddr2line \
>>> >>>>>>> fs/nfsd/nfs4state.o \
>>> >>>>>>> nfsd4_destroy_session+0x16d
>>> >>>>>>>
>>> >>>>>>>Since this issue appeared after v6.1.1, is it possible to bisect
>>> >>>>>>>between v6.1.1 and v6.1.90 ?
>>> >>>>>>
>>> >>>>>>First please note, at least speaking of triggering the issue in
>>> >>>>>>Debian, Debian has moved to 6.1.119 based kernel already (and 
>>> soon in
>>> >>>>>>the weekends point release move to 6.1.123).
>>> >>>>>>
>>> >>>>>>That said, one of the users which regularly seems to be hit by the
>>> >>>>>>issue was able to provide the above requested information, 
>>> based for
>>> >>>>>>6.1.119:
>>> >>>>>>
>>> >>>>>>~/kernel/linux-source-6.1# make kernelversion
>>> >>>>>>6.1.119
>>> >>>>>>~/kernel/linux-source-6.1# scripts/faddr2line fs/nfsd/ 
>>> nfs4state.o nfsd4_destroy_session+0x16d
>>> >>>>>>nfsd4_destroy_session+0x16d/0x250:
>>> >>>>>>__list_del_entry at /root/kernel/linux-source-6.1/./include/ 
>>> linux/list.h:134
>>> >>>>>>(inlined by) list_del at /root/kernel/linux-source-6.1/./ 
>>> include/linux/list.h:148
>>> >>>>>>(inlined by) unhash_session at /root/kernel/linux-source-6.1/ 
>>> fs/ nfsd/nfs4state.c:2062
>>> >>>>>>(inlined by) nfsd4_destroy_session at /root/kernel/linux- 
>>> source-6.1/fs/nfsd/nfs4state.c:3856
>>> >>>>>>
>>> >>>>>>They could provide as well a decode_stacktrace output for the 
>>> recent
>>> >>>>>>hit (if that is helpful for you):
>>> >>>>>>
>>> >>>>>>[Mon Jan 6 13:25:28 2025] INFO: task nfsd:55306 blocked for 
>>> more than 6883 seconds.
>>> >>>>>>[Mon Jan 6 13:25:28 2025]       Not tainted 6.1.0-28-amd64 #1 
>>> Debian 6.1.119-1
>>> >>>>>>[Mon Jan 6 13:25:28 2025] "echo 0 > /proc/sys/kernel/ 
>>> hung_task_timeout_secs" disables this message.
>>> >>>>>>[Mon Jan 6 13:25:28 2025] task:nfsd            state:D 
>>> stack:0     pid:55306 ppid:2      flags:0x00004000
>>> >>>>>>[Mon Jan 6 13:25:28 2025] Call Trace:
>>> >>>>>>[Mon Jan 6 13:25:28 2025]  <TASK>
>>> >>>>>>[Mon Jan 6 13:25:28 2025] __schedule+0x34d/0x9e0
>>> >>>>>>[Mon Jan 6 13:25:28 2025] schedule+0x5a/0xd0
>>> >>>>>>[Mon Jan 6 13:25:28 2025] schedule_timeout+0x118/0x150
>>> >>>>>>[Mon Jan 6 13:25:28 2025] wait_for_completion+0x86/0x160
>>> >>>>>>[Mon Jan 6 13:25:28 2025] __flush_workqueue+0x152/0x420
>>> >>>>>>[Mon Jan 6 13:25:28 2025] nfsd4_destroy_session (debian/build/ 
>>> build_amd64_none_amd64/include/linux/spinlock.h:351 debian/build/ 
>>> build_amd64_none_amd64/fs/nfsd/nfs4state.c:3861) nfsd
>>> >>>>>>[Mon Jan 6 13:25:28 2025] nfsd4_proc_compound (debian/build/ 
>>> build_amd64_none_amd64/fs/nfsd/nfs4proc.c:2680) nfsd
>>> >>>>>>[Mon Jan 6 13:25:28 2025] nfsd_dispatch (debian/build/ 
>>> build_amd64_none_amd64/fs/nfsd/nfssvc.c:1022) nfsd
>>> >>>>>>[Mon Jan 6 13:25:28 2025] svc_process_common (debian/build/ 
>>> build_amd64_none_amd64/net/sunrpc/svc.c:1344) sunrpc
>>> >>>>>>[Mon Jan 6 13:25:28 2025] ? svc_recv (debian/build/ 
>>> build_amd64_none_amd64/net/sunrpc/svc_xprt.c:897) sunrpc
>>> >>>>>>[Mon Jan 6 13:25:28 2025] ? nfsd_svc (debian/build/ 
>>> build_amd64_none_amd64/fs/nfsd/nfssvc.c:983) nfsd
>>> >>>>>>[Mon Jan 6 13:25:28 2025] ? nfsd_inet6addr_event (debian/build/ 
>>> build_amd64_none_amd64/fs/nfsd/nfssvc.c:922) nfsd
>>> >>>>>>[Mon Jan 6 13:25:28 2025] svc_process (debian/build/ 
>>> build_amd64_none_amd64/net/sunrpc/svc.c:1474) sunrpc
>>> >>>>>>[Mon Jan 6 13:25:28 2025] nfsd (debian/build/ 
>>> build_amd64_none_amd64/fs/nfsd/nfssvc.c:960) nfsd
>>> >>>>>>[Mon Jan 6 13:25:28 2025] kthread+0xd7/0x100
>>> >>>>>>[Mon Jan 6 13:25:28 2025] ? kthread_complete_and_exit+0x20/0x20
>>> >>>>>>[Mon Jan 6 13:25:28 2025] ret_from_fork+0x1f/0x30
>>> >>>>>>[Mon Jan  6 13:25:28 2025]  </TASK>
>>> >>>>>>
>>> >>>>>>The question about bisection is actually harder, those are 
>>> production
>>> >>>>>>systems and I understand it's not possible to do bisect there, 
>>> while
>>> >>>>>>unfortunately not reprodcing the issue on "lab conditions".
>>> >>>>>>
>>> >>>>>>Does the above give us still a clue on what you were looking for?
>>> >>>>>
>>> >>>>>Thanks for the update.
>>> >>>>>
>>> >>>>>It's possible that 38f080f3cd19 ("NFSD: Move callback_wq into 
>>> struct
>>> >>>>>nfs4_client"), while not a specific fix for this issue, might 
>>> offer some
>>> >>>>>relief by preventing the DESTROY_SESSION hang from affecting all 
>>> other
>>> >>>>>clients of the degraded server.
>>> >>>>>
>>> >>>>>Not having a reproducer or the ability to bisect puts a damper on
>>> >>>>>things. The next step, then, is to enable tracing on servers 
>>> where this
>>> >>>>>issue can come up, and wait for the hang to occur. The following 
>>> command
>>> >>>>>captures information only about callback operation, so it should 
>>> not
>>> >>>>>generate much data or impact server performance.
>>> >>>>>
>>> >>>>>    # trace-cmd record -e nfsd:nfsd_cb\*
>>> >>>>>
>>> >>>>>Let that run until the problem occurs, then ^C, compress the 
>>> resulting
>>> >>>>>trace.dat file, and either attach it to 1071562 or email it to me
>>> >>>>>privately.
>>> >>>>thanks for the follow-up.
>>> >>>>
>>> >>>>I am the "customer" with two affected file servers. We have since 
>>> moved those
>>> >>>>servers to the backports kernel (6.11.10) in the hope of forward 
>>> fixing the
>>> >>>>issue. If this kernel is stable, I'm afraid I can't go back to 
>>> the 'bad'
>>> >>>>kernel (700+ researchers affected..), and we're also not able to 
>>> trigger the
>>> >>>>issue on our test environment.
>>> >>>
>>> >>>Hello Dr. Herzog -
>>> >>>
>>> >>>If the problem recurs on 6.11, the trace-cmd I suggest above works
>>> >>>there as well.
>>> >>the bad news is: it just happened again with the bpo kernel.
>>> >>
>>> >>We then immediately started trace-cmd since usually there are 
>>> several call
>>> >>traces in a row and we did get a trace.dat:
>>> >>http://people.phys.ethz.ch/~daduke/trace.dat <http:// 
>>> people.phys.ethz.ch/~daduke/trace.dat>
>>> >>
>>> >>we did notice however that the stack trace looked a bit different 
>>> this time:
>>> >>
>>> >>     INFO: task nfsd:2566 blocked for more than 5799 seconds.
>>> >>     Tainted: G        W          6.11.10+bpo-amd64 #1 Debia>
>>> >>     "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables t>
>>> >>     task:nfsd            state:D stack:0     pid:2566  tgid:2566 >
>>> >>     Call Trace:
>>> >>     <TASK>
>>> >>     __schedule+0x400/0xad0
>>> >>     schedule+0x27/0xf0
>>> >>     nfsd4_shutdown_callback+0xfe/0x140 [nfsd]
>>> >>     ? __pfx_var_wake_function+0x10/0x10
>>> >>     __destroy_client+0x1f0/0x290 [nfsd]
>>> >>     nfsd4_destroy_clientid+0xf1/0x1e0 [nfsd]
>>> >>     ? svcauth_unix_set_client+0x586/0x5f0 [sunrpc]
>>> >>     nfsd4_proc_compound+0x34d/0x670 [nfsd]
>>> >>     nfsd_dispatch+0xfd/0x220 [nfsd]
>>> >>     svc_process_common+0x2f7/0x700 [sunrpc]
>>> >>     ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
>>> >>     svc_process+0x131/0x180 [sunrpc]
>>> >>     svc_recv+0x830/0xa10 [sunrpc]
>>> >>     ? __pfx_nfsd+0x10/0x10 [nfsd]
>>> >>     nfsd+0x87/0xf0 [nfsd]
>>> >>     kthread+0xcf/0x100
>>> >>     ? __pfx_kthread+0x10/0x10
>>> >>     ret_from_fork+0x31/0x50
>>> >>     ? __pfx_kthread+0x10/0x10
>>> >>     ret_from_fork_asm+0x1a/0x30
>>> >>     </TASK>
>>> >>
>>> >>and also the state of the offending client in `/proc/fs/nfsd/ 
>>> clients/*/info`
>>> >>used to be callback state: UNKNOWN while now it is DOWN or FAULT. 
>>> No idea
>>> >>what it means, but I thought it was worth mentioning.
>>> >>
>>> >
>>> >Looks like this is hung in nfsd41_cb_inflight_wait_complete() ? That
>>> >probably means that the cl_cb_inflight counter is stuck at >0. I'm
>>> >guessing that means that there is some callback that it's expecting to
>>> >complete that isn't. From nfsd4_shutdown_callback():
>>> >
>>> >         /*
>>> >          * Note this won't actually result in a null callback;
>>> >          * instead, nfsd4_run_cb_null() will detect the killed
>>> >          * client, destroy the rpc client, and stop:
>>> >          */
>>> >         nfsd4_run_cb(&clp->cl_cb_null);
>>> >         flush_workqueue(clp->cl_callback_wq);
>>> >         nfsd41_cb_inflight_wait_complete(clp);
>>> >
>>> >...it sounds like that isn't happening properly though.
>>> >
>>> >It might be interesting to see if you can track down the callback
>>> >client in /sys/kernel/debug/sunrpc and see what it's doing.
>>>
>>> Here's a possible scenario:
>>>
>>> The trace.dat shows the server is sending a lot of CB_RECALL_ANY
>>> operations.
>>>
>>> Normally a callback occurs only due to some particular client request.
>>> Such requests would be completed before a client unmounts.
>>>
>>> CB_RECALL_ANY is an operation which does not occur due to a particular
>>> client operation and can occur at any time. I believe this is the
>>> first operation of this type we've added to NFSD.
>>>
>>> My guess is that the server's laundromat has sent some CB_RECALL_ANY
>>> operations while the client is umounting -- DESTROY_SESSION is
>>> racing with those callback operations.
>>>
>>> deleg_reaper() was backported to 6.1.y in 6.1.81, which lines up
>>> more or less with when the issues started to show up.
>>>
>>> A quick way to test this theory would be to make deleg_reaper() a
>>> no-op. If this helps, then we know that shutting down the server's
>>> callback client needs to be more careful about cleaning up outstanding
>>> callbacks.
>>>
>>>
>>> -- 
>>> Chuck Lever
>>
>> Hello.
>>
>> Thanks a lot for all of this work !
>>
>> I have started recording traces using :
>>
>> # trace-cmd record -e nfsd:nfsd_cb\*
>>
>> and currently waiting for the bug to trigger. This generally happen in 
>> one/two weeks for me.
>>
>> If you want I adjust some recording parameters let me know.
> 
> After looking at your data and Rik's data, I decided to try reproducing
> the problem in my lab. No luck so far.
> 
> The signatures of these issues are slightly different from each other:
> 
> v6.1:
> - nfsd4_destroy_session hangs on the flush_workqueue in
>    nfsd4_shutdown_callback()
> - There might be outstanding callback RPCs that are not exiting
> - Lots of CB_RECALL_ANY traffic. Only place that's generated in
>    v6.1 is the NFSD shrinker when memory is short
> - There appear to be some unreachable clients -- evidence of client
>    crashes or network outages, perhaps
> 
> v6.12:
> - destroy_client() hangs on the wait_var_event in
>    nfsd4_shtudown_callback()
> - There are no callback RPCs left after the hang
> - There are some stuck clients listed under /proc/fs/nfsd/
> 
> Similar, but not the same.
> 
> I'm still open to clues, so please continue the regimen of starting
> the trace-cmd when the server boots, and doing an "echo t >
> /proc/sysrq-trigger" when the hang appears.
> 
>   # trace-cmd record -e nfsd:nfsd_cb\* -e sunrpc:svc_xprt_detach \
>     -p function -l nfsd4_destroy_session
> 
> 

Hello Chuck.

Sorry for my late reply but I need to find something
interesting for you.

Did you see my call trace analyze here ?

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1071562

On v6.1 did you see that nfsd4_shutdown_callback is reached from
both nfsd4 laundromat_main and nfsd4_state_shrinker_worker ?
For me the hang in V6.1 come also from wait_var_event.

Here the state of my RAM on my server with some load.

                total        used        free      shared  buff/cache 
available
Mem:           32083        4982         651           5       26919 
   27101
Swap:          16629         842       15787

My desktop clients use heavily suspend. The computer are almost
never rebooted. When a user logout, suspending is automatically 
launched. Do you think that some delegation can be accumulated,
locked by many suspended host, and can cause the
high memory/cache usage ?

Here my crash timeline between the desktop client salleprof13
(the one you think is unreachable) and the server fichdc02.
The failure come from suspend.


---
Client salleprof13 wake from sleep at 09:55:37
and was attached by the server.
The callback channel is initialized.
---

janv. 14 07:20:32 salleprofs13 kernel: PM: suspend entry (deep)

janv. 14 09:55:37 salleprofs13 kernel: PM: suspend exit
janv. 14 09:55:37 salleprofs13 ModemManager[621]: <info> 
[sleep-monitor-systemd] system is resuming
janv. 14 09:55:37 salleprofs13 NetworkManager[576]: <info> 
[1736844937.2382] manager: sleep: wake requested (sleeping: yes 
enabled: yes)
janv. 14 09:55:37 salleprofs13 systemd[1]: Stopping gssd-bug.service - 
Gssd credential cache hook...
janv. 14 09:55:37 salleprofs13 systemd[1]: Starting winbind.service - 
Samba Winbind Daemon...
janv. 14 09:55:37 salleprofs13 systemd[1]: Started winbind.service - 
Samba Winbind Daemon.
janv. 14 09:55:40 salleprofs13 NetworkManager[576]: <info> 
[1736844940.6304] device (enp2s0): carrier: link connected
janv. 14 09:55:40 salleprofs13 kernel: r8169 0000:02:00.0 enp2s0: Link 
is Up - 1Gbps/Full - flow control off

nfsd-1695  [047] ..... 190512.862590: nfsd_cb_probe: 
addr=172.16.118.121:0 client 67833e73:4dd32903 state=UNKNOWN
kworker/u96:8-225529 [047] ..... 190512.862782: nfsd_cb_setup: 
addr=172.16.118.121:0 client 67833e73:4dd32903 proto=tcp flavor=sys
kworker/u96:8-225529 [047] ..... 190512.862784: nfsd_cb_state: 
addr=172.16.118.121:0 client 67833e73:4dd32903 state=UP
2025-01-14T09:55:48.290546+01:00 fichdc02 rpc.mountd[1363]: v4.2 client 
attached: 0x4dd3290367833e73 from "172.16.118.121:825"


---
Someone logon on the station.
Immediately the server start sending nfsd_cb_recall_any.
Strange no ? What can be recalled on that host that just
wake up...
---

janv. 14 09:55:53 salleprofs13 nfsrahead[121742]: setting /dnfs/wine 
readahead to 128
janv. 14 09:56:13 salleprofs13 systemd-logind[569]: New session 36 of 
user xxxxxxxx.

kworker/u96:3-227781 [047] ..... 190541.974364: nfsd_cb_recall_any: 
addr=172.16.118.121:825 client 67833e73:4dd32903 keep=0 bmval0=RDATA_DLG
kworker/u96:2-220501 [013] ..... 190541.977732: nfsd_cb_recall_any_done: 
client 67833e73:4dd32903 status=0
kworker/u96:2-220501 [000] ..... 190550.387742: nfsd_cb_recall_any: 
addr=172.16.118.121:825 client 67833e73:4dd32903 keep=0 bmval0=RDATA_DLG
kworker/u96:0-228547 [003] ..... 190550.389602: nfsd_cb_recall_any_done: 
client 67833e73:4dd32903 status=0
kworker/u96:2-220501 [021] ..... 190565.367759: nfsd_cb_recall_any: 
addr=172.16.118.121:825 client 67833e73:4dd32903 keep=0 bmval0=RDATA_DLG
kworker/u96:1-230409 [000] ..... 190565.371391: nfsd_cb_recall_any_done: 
client 67833e73:4dd32903 status=0
kworker/u96:2-220501 [013] ..... 190579.748430: nfsd_cb_recall_any: 
addr=172.16.118.121:825 client 67833e73:4dd32903 keep=0 bmval0=RDATA_DLG
kworker/u96:8-225529 [026] ..... 190579.749946: nfsd_cb_recall_any_done: 
client 67833e73:4dd32903 status=0
kworker/u96:8-225529 [013] ..... 190592.749738: nfsd_cb_recall_any: 
addr=172.16.118.121:825 client 67833e73:4dd32903 keep=0 bmval0=RDATA_DLG
kworker/u96:2-220501 [001] ..... 190592.751192: nfsd_cb_recall_any_done: 
client 67833e73:4dd32903 status=0
kworker/u96:0-228547 [000] ..... 190599.849333: nfsd_cb_recall_any: 
addr=172.16.118.121:825 client 67833e73:4dd32903 keep=0 bmval0=RDATA_DLG
kworker/u96:0-228547 [000] ..... 190599.850872: nfsd_cb_recall_any_done: 
client 67833e73:4dd32903 status=0
kworker/u96:2-220501 [045] ..... 190686.760138: nfsd_cb_recall_any: 
addr=172.16.118.121:825 client 67833e73:4dd32903 keep=0 bmval0=RDATA_DLG
kworker/u96:1-230409 [014] ..... 190686.761181: nfsd_cb_recall_any_done: 
client 67833e73:4dd32903 status=0
kworker/u96:1-230409 [013] ..... 190694.103480: nfsd_cb_recall_any: 
addr=172.16.118.121:825 client 67833e73:4dd32903 keep=0 bmval0=RDATA_DLG
kworker/u96:8-225529 [022] ..... 190694.106120: nfsd_cb_recall_any_done: 
client 67833e73:4dd32903 status=0
kworker/u96:8-225529 [000] ..... 190749.678378: nfsd_cb_recall_any: 
addr=172.16.118.121:825 client 67833e73:4dd32903 keep=0 bmval0=RDATA_DLG
kworker/u96:2-220501 [007] ..... 190749.679870: nfsd_cb_recall_any_done: 
client 67833e73:4dd32903 status=0
kworker/u96:4-230705 [014] ..... 190782.830144: nfsd_cb_recall_any: 
addr=172.16.118.121:825 client 67833e73:4dd32903 keep=0 bmval0=RDATA_DLG
kworker/u96:1-230409 [010] ..... 190782.831220: nfsd_cb_recall_any_done: 
client 67833e73:4dd32903 status=0


---
Next the user suspend the host maybe by pressing
the power button without login out so the last
nfsd_cb_recall_any fail
Maybe the "unrecognized reply" related ?
---

janv. 14 10:01:18 salleprofs13 systemd-logind[569]: The system will 
suspend now!
janv. 14 10:01:18 salleprofs13 NetworkManager[576]: <info> 
[1736845278.5631] manager: sleep: sleep requested (sleeping: no 
enabled: yes)
janv. 14 10:01:18 salleprofs13 ModemManager[621]: <info> 
[sleep-monitor-systemd] system is about to suspend
janv. 14 10:01:20 salleprofs13 kernel: PM: suspend entry (deep)

kworker/u96:2-220501 [047] ..... 190851.843698: nfsd_cb_recall_any: 
addr=172.16.118.121:825 client 67833e73:4dd32903 keep=0 bmval0=RDATA_DLG
kworker/u96:2-220501 [047] ..... 190861.062828: nfsd_cb_state: 
addr=172.16.118.121:0 client 67833e73:4dd32903 state=FAULT

2025-01-14T10:06:06.586140+01:00 fichdc02 kernel: [191131.106081] 
receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 
00000000ad57929d xid 52751431


---
Got 5 unrecognized reply.
Seems not related .
---

2025-01-14T10:06:56.762137+01:00 fichdc02 kernel: [191181.279788] 
receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 
000000002dcd99c8 xid 5d819d7d
2025-01-14T10:06:56.762197+01:00 fichdc02 kernel: [191181.280005] 
receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 
000000002dcd99c8 xid 61819d7d
2025-01-14T10:06:56.762201+01:00 fichdc02 kernel: [191181.280054] 
receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 
000000002dcd99c8 xid 60819d7d
2025-01-14T10:06:56.762206+01:00 fichdc02 kernel: [191181.280223] 
receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 
000000002dcd99c8 xid 5f819d7d
2025-01-14T10:06:56.762209+01:00 fichdc02 kernel: [191181.280364] 
receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 
000000002dcd99c8 xid 5e819d7d


---
Just 10 minutes after suspend the host wake up (with xxxxxxxx still 
logged on).
The gssd-bug.service is a custom script that kill all the user processes
if the kerberos cache file has expired. This is due to a rpc.gssd bug
that hammer the server when the cache has expired during suspend.
It run "killall -u xxxxxxxx". But here, I don't know if the killall
was called or not.

Just after the wake. The server send again an nfsd_cb_recall_any. But 
this time
there is no nfsd_cb_state with state=FAULT. From this time there no more 
kworker
cb calls. I think this where the hang start.

And we got again an unrecognized reply.
---

janv. 14 10:10:24 salleprofs13 kernel: PM: suspend exit
janv. 14 10:10:24 salleprofs13 systemd-sleep[124910]: System returned 
from sleep state.
janv. 14 10:10:24 salleprofs13 systemd[1]: Stopping gssd-bug.service - 
Gssd credential cache hook...
janv. 14 10:10:24 salleprofs13 NetworkManager[576]: <info> 
[1736845824.4117] manager: sleep: wake requested (sleeping: yes 
enabled: yes)
janv. 14 10:10:24 salleprofs13 check-gssd-cache[124962]: id: 
« xxxxxxxx » : the user don't exist
janv. 14 10:10:24 salleprofs13 systemd[1]: Started winbind.service - 
Samba Winbind Daemon.
janv. 14 10:10:27 salleprofs13 kernel: r8169 0000:02:00.0 enp2s0: Link 
is Up - 1Gbps/Full - flow control off
janv. 14 10:10:27 salleprofs13 NetworkManager[576]: <info> 
[1736845827.6412] device (enp2s0): carrier: link connected
janv. 14 10:10:35 salleprofs13 systemd[1]: Stopped gssd-bug.service - 
Gssd credential cache hook.

kworker/u96:2-220501 [012] ..... 191417.544355: nfsd_cb_recall_any: 
addr=172.16.118.121:825 client 67833e73:4dd32903 keep=0 bmval0=RDATA_DLG

2025-01-14T10:10:58.426113+01:00 fichdc02 kernel: [191422.934976] 
receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 
000000007205cdb2 xid ee91a9d8


---
nfsd probe a failed State. Normal.
But as kworker hang, it can't reinit the callback channel
after the nfsd_cb_state "state=UNKNOWN".

I don't know why but nfsd started also
to send nfsd_cb_recall...
Note that is less than 120 second before the "BLOCKED" message.
so the thread may be already locked.
---

nfsd-1690  [014] ..... 191424.483249: nfsd_cb_probe: 
addr=172.16.118.121:0 client 67833e73:4dd32903 state=FAULT
nfsd-1690  [014] ..... 191424.483252: nfsd_cb_state: 
addr=172.16.118.121:0 client 67833e73:4dd32903 state=UNKNOWN
nfsd-1695  [013] ...1. 191441.265047: nfsd_cb_recall: 
addr=172.16.118.121:0 client 67833e73:4dd32903 stateid 00001c6e:00000001
nfsd-1695  [014] ...1. 191441.646104: nfsd_cb_recall: 
addr=172.16.118.121:0 client 67833e73:4dd32903 stateid 000015fc:00000001


---
Got some strange logs on the client
And 7 unrecognized reply on the server that seems not related.
---

janv. 14 10:11:00 salleprofs13 PackageKit[122249]: daemon quit
janv. 14 10:11:00 salleprofs13 systemd[1]: packagekit.service: 
Deactivated successfully.
janv. 14 10:11:00 salleprofs13 systemd[1]: packagekit.service: Consumed 
28.223s CPU time.
janv. 14 10:11:02 salleprofs13 gnome-shell[122127]: JS ERROR: 
Gio.IOErrorEnum: Erreur lors de l’appel de StartServiceByName pour 
org.vmware.viewagent.Credentials : Le délai d’>
  
_injectToMethod/klass[method]@resource:///org/gnome/gjs/modules/core/overrides/Gio.js:287:25
  
VmwareCredentials@resource:///org/gnome/shell/gdm/vmware.js:34:10
  
VmwareCredentialsManager@resource:///org/gnome/shell/gdm/vmware.js:41:29
  
getVmwareCredentialsManager@resource:///org/gnome/shell/gdm/vmware.js:51:37
  
ShellUserVerifier@resource:///org/gnome/shell/gdm/util.js:183:64
  
_init@resource:///org/gnome/shell/gdm/authPrompt.js:74:30
  
AuthPrompt@resource:///org/gnome/shell/gdm/authPrompt.js:51:4
  
_ensureAuthPrompt@resource:///org/gnome/shell/ui/unlockDialog.js:686:32
  
_showPrompt@resource:///org/gnome/shell/ui/unlockDialog.js:725:14
  
vfunc_key_press_event@resource:///org/gnome/shell/ui/unlockDialog.js:620:14
janv. 14 10:11:05 salleprofs13 gsd-power[122329]: Error setting property 
'PowerSaveMode' on interface org.gnome.Mutter.DisplayConfig: Le délai 
d’attente est dépassé (g-io-erro>
janv. 14 10:11:27 salleprofs13 gnome-shell[122127]: JS ERROR: 
Gio.IOErrorEnum: Erreur lors de l’appel de StartServiceByName pour 
org.vmware.viewagent.Credentials : Le délai d’>
  
_injectToMethod/klass[method]@resource:///org/gnome/gjs/modules/core/overrides/Gio.js:287:25
  
VmwareCredentials@resource:///org/gnome/shell/gdm/vmware.js:34:10
  
VmwareCredentialsManager@resource:///org/gnome/shell/gdm/vmware.js:41:29
  
getVmwareCredentialsManager@resource:///org/gnome/shell/gdm/vmware.js:51:37
  
ShellUserVerifier@resource:///org/gnome/shell/gdm/util.js:183:64
  
_init@resource:///org/gnome/shell/gdm/authPrompt.js:74:30
  
AuthPrompt@resource:///org/gnome/shell/gdm/authPrompt.js:51:4
  
_ensureAuthPrompt@resource:///org/gnome/shell/ui/unlockDialog.js:686:32
  
_showPrompt@resource:///org/gnome/shell/ui/unlockDialog.js:725:14
  
vfunc_key_press_event@resource:///org/gnome/shell/ui/unlockDialog.js:620:14



2025-01-14T10:12:36.730114+01:00 fichdc02 kernel: [191521.235822] 
receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 
00000000191d4459 xid d591b1ee
2025-01-14T10:12:36.730165+01:00 fichdc02 kernel: [191521.236547] 
receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 
00000000191d4459 xid d691b1ee
2025-01-14T10:12:36.730169+01:00 fichdc02 kernel: [191521.236666] 
receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 
00000000191d4459 xid da91b1ee
2025-01-14T10:12:36.730171+01:00 fichdc02 kernel: [191521.236743] 
receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 
00000000191d4459 xid db91b1ee
2025-01-14T10:12:36.730172+01:00 fichdc02 kernel: [191521.236769] 
receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 
00000000191d4459 xid d991b1ee
2025-01-14T10:12:36.730174+01:00 fichdc02 kernel: [191521.236824] 
receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 
00000000191d4459 xid d891b1ee
2025-01-14T10:12:36.730177+01:00 fichdc02 kernel: [191521.236840] 
receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 
00000000191d4459 xid d791b1ee


---
And finally the hang syslog message
---

2025-01-14T10:14:29.370164+01:00 fichdc02 kernel: [191633.871204] INFO: 
task nfsd:1690 blocked for more than 120 seconds.
2025-01-14T10:14:29.374846+01:00 fichdc02 kernel: [191633.875410] INFO: 
task nfsd:1691 blocked for more than 120 seconds.
2025-01-14T10:14:29.374966+01:00 fichdc02 kernel: [191633.878106] INFO: 
task nfsd:1693 blocked for more than 120 seconds.
2025-01-14T10:14:29.378303+01:00 fichdc02 kernel: [191633.880748] INFO: 
task nfsd:1695 blocked for more than 120 seconds.


Hope this help.

I'm still recording.

Regards,

Baptiste.



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: nfsd blocks indefinitely in nfsd4_destroy_session
  2025-01-17 19:43                         ` Baptiste PELLEGRIN
@ 2025-01-17 20:27                           ` Chuck Lever
  0 siblings, 0 replies; 31+ messages in thread
From: Chuck Lever @ 2025-01-17 20:27 UTC (permalink / raw)
  To: Baptiste PELLEGRIN, Rik Theys, Christian Herzog
  Cc: Salvatore Bonaccorso, Benjamin Coddington, Trond Myklebust,
	Harald Dunkel, Linux NFS Mailing List, Martin Svec,
	Michael Gernoth, Jeff Layton

On 1/17/25 2:43 PM, Baptiste PELLEGRIN wrote:
> 
> 
> On 1/16/25 9:07 PM, Chuck Lever wrote:
>> On 1/11/25 3:14 AM, Pellegrin Baptiste wrote:
>>> On 09/01/25 17:32, *Chuck Lever * <chuck.lever@oracle.com> wrote:
>>>> On 1/9/25 7:42 AM, Jeff Layton wrote:
>>>> >On Thu, 2025-01-09 at 12:56 +0100, Christian Herzog wrote:
>>>> >>Dear Chuck,
>>>> >>
>>>> >>On Wed, Jan 08, 2025 at 10:07:49AM -0500, Chuck Lever wrote:
>>>> >>>On 1/8/25 9:54 AM, Christian Herzog wrote:
>>>> >>>>Dear Chuck,
>>>> >>>>
>>>> >>>>On Wed, Jan 08, 2025 at 08:33:23AM -0500, Chuck Lever wrote:
>>>> >>>>>On 1/7/25 4:17 PM, Salvatore Bonaccorso wrote:
>>>> >>>>>>Hi Chuck,
>>>> >>>>>>
>>>> >>>>>>Thanks for your time on this, much appreciated.
>>>> >>>>>>
>>>> >>>>>>On Wed, Jan 01, 2025 at 02:24:50PM -0500, Chuck Lever wrote:
>>>> >>>>>>>On 12/25/24 4:15 AM, Salvatore Bonaccorso wrote:
>>>> >>>>>>>>Hi Chuck, hi all,
>>>> >>>>>>>>
>>>> >>>>>>>>[it was not ideal to pick one of the message for this 
>>>> followup, let me
>>>> >>>>>>>>know if you want a complete new thread, adding as well 
>>>> Benjamin and
>>>> >>>>>>>>Trond as they are involved in one mentioned patch]
>>>> >>>>>>>>
>>>> >>>>>>>>On Mon, Jun 17, 2024 at 02:31:54PM +0000, Chuck Lever III 
>>>> wrote:
>>>> >>>>>>>>>
>>>> >>>>>>>>>
>>>> >>>>>>>>>>On Jun 17, 2024, at 2:55 AM, Harald Dunkel 
>>>> <harald.dunkel@aixigo.com> wrote:
>>>> >>>>>>>>>>
>>>> >>>>>>>>>>Hi folks,
>>>> >>>>>>>>>>
>>>> >>>>>>>>>>what would be the reason for nfsd getting stuck somehow 
>>>> and becoming
>>>> >>>>>>>>>>an unkillable process? See
>>>> >>>>>>>>>>
>>>> >>>>>>>>>>- https://bugs.debian.org/cgi-bin/bugreport.cgi? 
>>>> bug=1071562 <https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1071562>
>>>> >>>>>>>>>>- https://bugs.launchpad.net/ubuntu/+source/nfs-utils/ 
>>>> +bug/2062568 <https://bugs.launchpad.net/ubuntu/+source/nfs-utils/ 
>>>> +bug/2062568>
>>>> >>>>>>>>>>
>>>> >>>>>>>>>>Doesn't this mean that something inside the kernel gets 
>>>> stuck as
>>>> >>>>>>>>>>well? Seems odd to me.
>>>> >>>>>>>>>
>>>> >>>>>>>>>I'm not familiar with the Debian or Ubuntu kernel packages. 
>>>> Can
>>>> >>>>>>>>>the kernel release numbers be translated to LTS kernel 
>>>> releases
>>>> >>>>>>>>>please? Need both "last known working" and "first broken" 
>>>> releases.
>>>> >>>>>>>>>
>>>> >>>>>>>>>This:
>>>> >>>>>>>>>
>>>> >>>>>>>>>[ 6596.911785] RPC: Could not send backchannel reply error: 
>>>> -110
>>>> >>>>>>>>>[ 6596.972490] RPC: Could not send backchannel reply error: 
>>>> -110
>>>> >>>>>>>>>[ 6837.281307] RPC: Could not send backchannel reply error: 
>>>> -110
>>>> >>>>>>>>>
>>>> >>>>>>>>>is a known set of client backchannel bugs. Knowing the LTS 
>>>> kernel
>>>> >>>>>>>>>releases (see above) will help us figure out what needs to be
>>>> >>>>>>>>>backported to the LTS kernels kernels in question.
>>>> >>>>>>>>>
>>>> >>>>>>>>>This:
>>>> >>>>>>>>>
>>>> >>>>>>>>>[11183.290619] wait_for_completion+0x88/0x150
>>>> >>>>>>>>>[11183.290623] __flush_workqueue+0x140/0x3e0
>>>> >>>>>>>>>[11183.290629] nfsd4_probe_callback_sync+0x1a/0x30 [nfsd]
>>>> >>>>>>>>>[11183.290689] nfsd4_destroy_session+0x186/0x260 [nfsd]
>>>> >>>>>>>>>
>>>> >>>>>>>>>is probably related to the backchannel errors on the 
>>>> client, but
>>>> >>>>>>>>>client bugs shouldn't cause the server to hang like this. We
>>>> >>>>>>>>>might be able to say more if you can provide the kernel 
>>>> release
>>>> >>>>>>>>>translations (see above).
>>>> >>>>>>>>
>>>> >>>>>>>>In Debian we hstill have the bug #1071562 open and one 
>>>> person notified
>>>> >>>>>>>>mye offlist that it appears that the issue get more frequent 
>>>> since
>>>> >>>>>>>>they updated on NFS client side from Ubuntu 20.04 to Debian 
>>>> bookworm
>>>> >>>>>>>>with a 6.1.y based kernel).
>>>> >>>>>>>>
>>>> >>>>>>>>Some people around those issues, seem to claim that the change
>>>> >>>>>>>>mentioned in
>>>> >>>>>>>>https://lists.proxmox.com/pipermail/pve-devel/2024- 
>>>> July/064614.html <https://lists.proxmox.com/pipermail/pve- 
>>>> devel/2024- July/064614.html>
>>>> >>>>>>>>would fix the issue, which is as well backchannel related.
>>>> >>>>>>>>
>>>> >>>>>>>>This is upstream: 6ddc9deacc13 ("SUNRPC: Fix backchannel reply,
>>>> >>>>>>>>again"). While this commit fixes 57331a59ac0d ("NFSv4.1: Use 
>>>> the
>>>> >>>>>>>>nfs_client's rpc timeouts for backchannel") this is not 
>>>> something
>>>> >>>>>>>>which goes back to 6.1.y, could it be possible that hte 
>>>> backchannel
>>>> >>>>>>>>refactoring and this final fix indeeds fixes the issue?
>>>> >>>>>>>>
>>>> >>>>>>>>As people report it is not easily reproducible, so this 
>>>> makes it
>>>> >>>>>>>>harder to identify fixes correctly.
>>>> >>>>>>>>
>>>> >>>>>>>>I gave a (short) stance on trying to backport commits up to
>>>> >>>>>>>>6ddc9deacc13 ("SUNRPC: Fix backchannel reply, again") but 
>>>> this quickly
>>>> >>>>>>>>seems to indicate it is probably still not the right thing for
>>>> >>>>>>>>backporting to the older stable series.
>>>> >>>>>>>>
>>>> >>>>>>>>As at least pre-requisites:
>>>> >>>>>>>>
>>>> >>>>>>>>2009e32997ed568a305cf9bc7bf27d22e0f6ccda
>>>> >>>>>>>>4119bd0306652776cb0b7caa3aea5b2a93aecb89
>>>> >>>>>>>>163cdfca341b76c958567ae0966bd3575c5c6192
>>>> >>>>>>>>f4afc8fead386c81fda2593ad6162271d26667f8
>>>> >>>>>>>>6ed8cdf967f7e9fc96cd1c129719ef99db2f9afc
>>>> >>>>>>>>57331a59ac0d680f606403eb24edd3c35aecba31
>>>> >>>>>>>>
>>>> >>>>>>>>and still there would be conflicting codepaths (and does not 
>>>> seem
>>>> >>>>>>>>right).
>>>> >>>>>>>>
>>>> >>>>>>>>Chuck, Benjamin, Trond, is there anything we can provive on 
>>>> reporters
>>>> >>>>>>>>side that we can try to tackle this issue better?
>>>> >>>>>>>
>>>> >>>>>>>As I've indicated before, NFSD should not block no matter 
>>>> what the
>>>> >>>>>>>client may or may not be doing. I'd like to focus on the 
>>>> server first.
>>>> >>>>>>>
>>>> >>>>>>>What is the result of:
>>>> >>>>>>>
>>>> >>>>>>>$ cd <Bookworm's v6.1.90 kernel source >
>>>> >>>>>>>$ unset KBUILD_OUTPUT
>>>> >>>>>>>$ make -j `nproc`
>>>> >>>>>>>$ scripts/faddr2line \
>>>> >>>>>>> fs/nfsd/nfs4state.o \
>>>> >>>>>>> nfsd4_destroy_session+0x16d
>>>> >>>>>>>
>>>> >>>>>>>Since this issue appeared after v6.1.1, is it possible to bisect
>>>> >>>>>>>between v6.1.1 and v6.1.90 ?
>>>> >>>>>>
>>>> >>>>>>First please note, at least speaking of triggering the issue in
>>>> >>>>>>Debian, Debian has moved to 6.1.119 based kernel already (and 
>>>> soon in
>>>> >>>>>>the weekends point release move to 6.1.123).
>>>> >>>>>>
>>>> >>>>>>That said, one of the users which regularly seems to be hit by 
>>>> the
>>>> >>>>>>issue was able to provide the above requested information, 
>>>> based for
>>>> >>>>>>6.1.119:
>>>> >>>>>>
>>>> >>>>>>~/kernel/linux-source-6.1# make kernelversion
>>>> >>>>>>6.1.119
>>>> >>>>>>~/kernel/linux-source-6.1# scripts/faddr2line fs/nfsd/ 
>>>> nfs4state.o nfsd4_destroy_session+0x16d
>>>> >>>>>>nfsd4_destroy_session+0x16d/0x250:
>>>> >>>>>>__list_del_entry at /root/kernel/linux-source-6.1/./include/ 
>>>> linux/list.h:134
>>>> >>>>>>(inlined by) list_del at /root/kernel/linux-source-6.1/./ 
>>>> include/linux/list.h:148
>>>> >>>>>>(inlined by) unhash_session at /root/kernel/linux-source-6.1/ 
>>>> fs/ nfsd/nfs4state.c:2062
>>>> >>>>>>(inlined by) nfsd4_destroy_session at /root/kernel/linux- 
>>>> source-6.1/fs/nfsd/nfs4state.c:3856
>>>> >>>>>>
>>>> >>>>>>They could provide as well a decode_stacktrace output for the 
>>>> recent
>>>> >>>>>>hit (if that is helpful for you):
>>>> >>>>>>
>>>> >>>>>>[Mon Jan 6 13:25:28 2025] INFO: task nfsd:55306 blocked for 
>>>> more than 6883 seconds.
>>>> >>>>>>[Mon Jan 6 13:25:28 2025]       Not tainted 6.1.0-28-amd64 #1 
>>>> Debian 6.1.119-1
>>>> >>>>>>[Mon Jan 6 13:25:28 2025] "echo 0 > /proc/sys/kernel/ 
>>>> hung_task_timeout_secs" disables this message.
>>>> >>>>>>[Mon Jan 6 13:25:28 2025] task:nfsd            state:D 
>>>> stack:0     pid:55306 ppid:2      flags:0x00004000
>>>> >>>>>>[Mon Jan 6 13:25:28 2025] Call Trace:
>>>> >>>>>>[Mon Jan 6 13:25:28 2025]  <TASK>
>>>> >>>>>>[Mon Jan 6 13:25:28 2025] __schedule+0x34d/0x9e0
>>>> >>>>>>[Mon Jan 6 13:25:28 2025] schedule+0x5a/0xd0
>>>> >>>>>>[Mon Jan 6 13:25:28 2025] schedule_timeout+0x118/0x150
>>>> >>>>>>[Mon Jan 6 13:25:28 2025] wait_for_completion+0x86/0x160
>>>> >>>>>>[Mon Jan 6 13:25:28 2025] __flush_workqueue+0x152/0x420
>>>> >>>>>>[Mon Jan 6 13:25:28 2025] nfsd4_destroy_session (debian/build/ 
>>>> build_amd64_none_amd64/include/linux/spinlock.h:351 debian/build/ 
>>>> build_amd64_none_amd64/fs/nfsd/nfs4state.c:3861) nfsd
>>>> >>>>>>[Mon Jan 6 13:25:28 2025] nfsd4_proc_compound (debian/build/ 
>>>> build_amd64_none_amd64/fs/nfsd/nfs4proc.c:2680) nfsd
>>>> >>>>>>[Mon Jan 6 13:25:28 2025] nfsd_dispatch (debian/build/ 
>>>> build_amd64_none_amd64/fs/nfsd/nfssvc.c:1022) nfsd
>>>> >>>>>>[Mon Jan 6 13:25:28 2025] svc_process_common (debian/build/ 
>>>> build_amd64_none_amd64/net/sunrpc/svc.c:1344) sunrpc
>>>> >>>>>>[Mon Jan 6 13:25:28 2025] ? svc_recv (debian/build/ 
>>>> build_amd64_none_amd64/net/sunrpc/svc_xprt.c:897) sunrpc
>>>> >>>>>>[Mon Jan 6 13:25:28 2025] ? nfsd_svc (debian/build/ 
>>>> build_amd64_none_amd64/fs/nfsd/nfssvc.c:983) nfsd
>>>> >>>>>>[Mon Jan 6 13:25:28 2025] ? nfsd_inet6addr_event (debian/ 
>>>> build/ build_amd64_none_amd64/fs/nfsd/nfssvc.c:922) nfsd
>>>> >>>>>>[Mon Jan 6 13:25:28 2025] svc_process (debian/build/ 
>>>> build_amd64_none_amd64/net/sunrpc/svc.c:1474) sunrpc
>>>> >>>>>>[Mon Jan 6 13:25:28 2025] nfsd (debian/build/ 
>>>> build_amd64_none_amd64/fs/nfsd/nfssvc.c:960) nfsd
>>>> >>>>>>[Mon Jan 6 13:25:28 2025] kthread+0xd7/0x100
>>>> >>>>>>[Mon Jan 6 13:25:28 2025] ? kthread_complete_and_exit+0x20/0x20
>>>> >>>>>>[Mon Jan 6 13:25:28 2025] ret_from_fork+0x1f/0x30
>>>> >>>>>>[Mon Jan  6 13:25:28 2025]  </TASK>
>>>> >>>>>>
>>>> >>>>>>The question about bisection is actually harder, those are 
>>>> production
>>>> >>>>>>systems and I understand it's not possible to do bisect there, 
>>>> while
>>>> >>>>>>unfortunately not reprodcing the issue on "lab conditions".
>>>> >>>>>>
>>>> >>>>>>Does the above give us still a clue on what you were looking for?
>>>> >>>>>
>>>> >>>>>Thanks for the update.
>>>> >>>>>
>>>> >>>>>It's possible that 38f080f3cd19 ("NFSD: Move callback_wq into 
>>>> struct
>>>> >>>>>nfs4_client"), while not a specific fix for this issue, might 
>>>> offer some
>>>> >>>>>relief by preventing the DESTROY_SESSION hang from affecting 
>>>> all other
>>>> >>>>>clients of the degraded server.
>>>> >>>>>
>>>> >>>>>Not having a reproducer or the ability to bisect puts a damper on
>>>> >>>>>things. The next step, then, is to enable tracing on servers 
>>>> where this
>>>> >>>>>issue can come up, and wait for the hang to occur. The 
>>>> following command
>>>> >>>>>captures information only about callback operation, so it 
>>>> should not
>>>> >>>>>generate much data or impact server performance.
>>>> >>>>>
>>>> >>>>>    # trace-cmd record -e nfsd:nfsd_cb\*
>>>> >>>>>
>>>> >>>>>Let that run until the problem occurs, then ^C, compress the 
>>>> resulting
>>>> >>>>>trace.dat file, and either attach it to 1071562 or email it to me
>>>> >>>>>privately.
>>>> >>>>thanks for the follow-up.
>>>> >>>>
>>>> >>>>I am the "customer" with two affected file servers. We have 
>>>> since moved those
>>>> >>>>servers to the backports kernel (6.11.10) in the hope of forward 
>>>> fixing the
>>>> >>>>issue. If this kernel is stable, I'm afraid I can't go back to 
>>>> the 'bad'
>>>> >>>>kernel (700+ researchers affected..), and we're also not able to 
>>>> trigger the
>>>> >>>>issue on our test environment.
>>>> >>>
>>>> >>>Hello Dr. Herzog -
>>>> >>>
>>>> >>>If the problem recurs on 6.11, the trace-cmd I suggest above works
>>>> >>>there as well.
>>>> >>the bad news is: it just happened again with the bpo kernel.
>>>> >>
>>>> >>We then immediately started trace-cmd since usually there are 
>>>> several call
>>>> >>traces in a row and we did get a trace.dat:
>>>> >>http://people.phys.ethz.ch/~daduke/trace.dat <http:// 
>>>> people.phys.ethz.ch/~daduke/trace.dat>
>>>> >>
>>>> >>we did notice however that the stack trace looked a bit different 
>>>> this time:
>>>> >>
>>>> >>     INFO: task nfsd:2566 blocked for more than 5799 seconds.
>>>> >>     Tainted: G        W          6.11.10+bpo-amd64 #1 Debia>
>>>> >>     "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables t>
>>>> >>     task:nfsd            state:D stack:0     pid:2566  tgid:2566 >
>>>> >>     Call Trace:
>>>> >>     <TASK>
>>>> >>     __schedule+0x400/0xad0
>>>> >>     schedule+0x27/0xf0
>>>> >>     nfsd4_shutdown_callback+0xfe/0x140 [nfsd]
>>>> >>     ? __pfx_var_wake_function+0x10/0x10
>>>> >>     __destroy_client+0x1f0/0x290 [nfsd]
>>>> >>     nfsd4_destroy_clientid+0xf1/0x1e0 [nfsd]
>>>> >>     ? svcauth_unix_set_client+0x586/0x5f0 [sunrpc]
>>>> >>     nfsd4_proc_compound+0x34d/0x670 [nfsd]
>>>> >>     nfsd_dispatch+0xfd/0x220 [nfsd]
>>>> >>     svc_process_common+0x2f7/0x700 [sunrpc]
>>>> >>     ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
>>>> >>     svc_process+0x131/0x180 [sunrpc]
>>>> >>     svc_recv+0x830/0xa10 [sunrpc]
>>>> >>     ? __pfx_nfsd+0x10/0x10 [nfsd]
>>>> >>     nfsd+0x87/0xf0 [nfsd]
>>>> >>     kthread+0xcf/0x100
>>>> >>     ? __pfx_kthread+0x10/0x10
>>>> >>     ret_from_fork+0x31/0x50
>>>> >>     ? __pfx_kthread+0x10/0x10
>>>> >>     ret_from_fork_asm+0x1a/0x30
>>>> >>     </TASK>
>>>> >>
>>>> >>and also the state of the offending client in `/proc/fs/nfsd/ 
>>>> clients/*/info`
>>>> >>used to be callback state: UNKNOWN while now it is DOWN or FAULT. 
>>>> No idea
>>>> >>what it means, but I thought it was worth mentioning.
>>>> >>
>>>> >
>>>> >Looks like this is hung in nfsd41_cb_inflight_wait_complete() ? That
>>>> >probably means that the cl_cb_inflight counter is stuck at >0. I'm
>>>> >guessing that means that there is some callback that it's expecting to
>>>> >complete that isn't. From nfsd4_shutdown_callback():
>>>> >
>>>> >         /*
>>>> >          * Note this won't actually result in a null callback;
>>>> >          * instead, nfsd4_run_cb_null() will detect the killed
>>>> >          * client, destroy the rpc client, and stop:
>>>> >          */
>>>> >         nfsd4_run_cb(&clp->cl_cb_null);
>>>> >         flush_workqueue(clp->cl_callback_wq);
>>>> >         nfsd41_cb_inflight_wait_complete(clp);
>>>> >
>>>> >...it sounds like that isn't happening properly though.
>>>> >
>>>> >It might be interesting to see if you can track down the callback
>>>> >client in /sys/kernel/debug/sunrpc and see what it's doing.
>>>>
>>>> Here's a possible scenario:
>>>>
>>>> The trace.dat shows the server is sending a lot of CB_RECALL_ANY
>>>> operations.
>>>>
>>>> Normally a callback occurs only due to some particular client request.
>>>> Such requests would be completed before a client unmounts.
>>>>
>>>> CB_RECALL_ANY is an operation which does not occur due to a particular
>>>> client operation and can occur at any time. I believe this is the
>>>> first operation of this type we've added to NFSD.
>>>>
>>>> My guess is that the server's laundromat has sent some CB_RECALL_ANY
>>>> operations while the client is umounting -- DESTROY_SESSION is
>>>> racing with those callback operations.
>>>>
>>>> deleg_reaper() was backported to 6.1.y in 6.1.81, which lines up
>>>> more or less with when the issues started to show up.
>>>>
>>>> A quick way to test this theory would be to make deleg_reaper() a
>>>> no-op. If this helps, then we know that shutting down the server's
>>>> callback client needs to be more careful about cleaning up outstanding
>>>> callbacks.
>>>>
>>>>
>>>> -- 
>>>> Chuck Lever
>>>
>>> Hello.
>>>
>>> Thanks a lot for all of this work !
>>>
>>> I have started recording traces using :
>>>
>>> # trace-cmd record -e nfsd:nfsd_cb\*
>>>
>>> and currently waiting for the bug to trigger. This generally happen 
>>> in one/two weeks for me.
>>>
>>> If you want I adjust some recording parameters let me know.
>>
>> After looking at your data and Rik's data, I decided to try reproducing
>> the problem in my lab. No luck so far.
>>
>> The signatures of these issues are slightly different from each other:
>>
>> v6.1:
>> - nfsd4_destroy_session hangs on the flush_workqueue in
>>    nfsd4_shutdown_callback()
>> - There might be outstanding callback RPCs that are not exiting
>> - Lots of CB_RECALL_ANY traffic. Only place that's generated in
>>    v6.1 is the NFSD shrinker when memory is short
>> - There appear to be some unreachable clients -- evidence of client
>>    crashes or network outages, perhaps
>>
>> v6.12:
>> - destroy_client() hangs on the wait_var_event in
>>    nfsd4_shtudown_callback()
>> - There are no callback RPCs left after the hang
>> - There are some stuck clients listed under /proc/fs/nfsd/
>>
>> Similar, but not the same.
>>
>> I'm still open to clues, so please continue the regimen of starting
>> the trace-cmd when the server boots, and doing an "echo t >
>> /proc/sysrq-trigger" when the hang appears.
>>
>>   # trace-cmd record -e nfsd:nfsd_cb\* -e sunrpc:svc_xprt_detach \
>>     -p function -l nfsd4_destroy_session
>>
>>
> 
> Hello Chuck.
> 
> Sorry for my late reply but I need to find something
> interesting for you.
> 
> Did you see my call trace analyze here ?
> 
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1071562
> 
> On v6.1 did you see that nfsd4_shutdown_callback is reached from
> both nfsd4 laundromat_main and nfsd4_state_shrinker_worker ?
> For me the hang in V6.1 come also from wait_var_event.
> 
> Here the state of my RAM on my server with some load.
> 
>                 total        used        free      shared  buff/cache 
> available
> Mem:           32083        4982         651           5       26919   
> 27101
> Swap:          16629         842       15787
> 
> My desktop clients use heavily suspend. The computer are almost
> never rebooted. When a user logout, suspending is automatically 
> launched. Do you think that some delegation can be accumulated,
> locked by many suspended host, and can cause the
> high memory/cache usage ?

v6.1 I think has courteous server support, meaning the server will
attempt to preserve all client state when the client stops sending
state-renewing operations.

Suspended clients probably don't clean up their network connections.
The server could see that as a network partition.

Still, the server is supposed to clear out an idle connection after
6 minutes. So these hangs should clear up if this were the only
issue.


> Here my crash timeline between the desktop client salleprof13
> (the one you think is unreachable) and the server fichdc02.
> The failure come from suspend.
> 
> 
> ---
> Client salleprof13 wake from sleep at 09:55:37
> and was attached by the server.
> The callback channel is initialized.
> ---
> 
> janv. 14 07:20:32 salleprofs13 kernel: PM: suspend entry (deep)
> 
> janv. 14 09:55:37 salleprofs13 kernel: PM: suspend exit
> janv. 14 09:55:37 salleprofs13 ModemManager[621]: <info> [sleep-monitor- 
> systemd] system is resuming
> janv. 14 09:55:37 salleprofs13 NetworkManager[576]: <info> 
> [1736844937.2382] manager: sleep: wake requested (sleeping: yes enabled: 
> yes)
> janv. 14 09:55:37 salleprofs13 systemd[1]: Stopping gssd-bug.service - 
> Gssd credential cache hook...
> janv. 14 09:55:37 salleprofs13 systemd[1]: Starting winbind.service - 
> Samba Winbind Daemon...
> janv. 14 09:55:37 salleprofs13 systemd[1]: Started winbind.service - 
> Samba Winbind Daemon.
> janv. 14 09:55:40 salleprofs13 NetworkManager[576]: <info> 
> [1736844940.6304] device (enp2s0): carrier: link connected
> janv. 14 09:55:40 salleprofs13 kernel: r8169 0000:02:00.0 enp2s0: Link 
> is Up - 1Gbps/Full - flow control off
> 
> nfsd-1695  [047] ..... 190512.862590: nfsd_cb_probe: 
> addr=172.16.118.121:0 client 67833e73:4dd32903 state=UNKNOWN
> kworker/u96:8-225529 [047] ..... 190512.862782: nfsd_cb_setup: 
> addr=172.16.118.121:0 client 67833e73:4dd32903 proto=tcp flavor=sys
> kworker/u96:8-225529 [047] ..... 190512.862784: nfsd_cb_state: 
> addr=172.16.118.121:0 client 67833e73:4dd32903 state=UP
> 2025-01-14T09:55:48.290546+01:00 fichdc02 rpc.mountd[1363]: v4.2 client 
> attached: 0x4dd3290367833e73 from "172.16.118.121:825"
> 
> 
> ---
> Someone logon on the station.
> Immediately the server start sending nfsd_cb_recall_any.
> Strange no ? What can be recalled on that host that just
> wake up...

The server attempts to invoke CB_RECALL_ANY on all clients it knows
about. If it believes the client has state (even when in courtesy)
then the server will send this request to that client.


> ---
> 
> janv. 14 09:55:53 salleprofs13 nfsrahead[121742]: setting /dnfs/wine 
> readahead to 128
> janv. 14 09:56:13 salleprofs13 systemd-logind[569]: New session 36 of 
> user xxxxxxxx.
> 
> kworker/u96:3-227781 [047] ..... 190541.974364: nfsd_cb_recall_any: 
> addr=172.16.118.121:825 client 67833e73:4dd32903 keep=0 bmval0=RDATA_DLG
> kworker/u96:2-220501 [013] ..... 190541.977732: nfsd_cb_recall_any_done: 
> client 67833e73:4dd32903 status=0
> kworker/u96:2-220501 [000] ..... 190550.387742: nfsd_cb_recall_any: 
> addr=172.16.118.121:825 client 67833e73:4dd32903 keep=0 bmval0=RDATA_DLG
> kworker/u96:0-228547 [003] ..... 190550.389602: nfsd_cb_recall_any_done: 
> client 67833e73:4dd32903 status=0
> kworker/u96:2-220501 [021] ..... 190565.367759: nfsd_cb_recall_any: 
> addr=172.16.118.121:825 client 67833e73:4dd32903 keep=0 bmval0=RDATA_DLG
> kworker/u96:1-230409 [000] ..... 190565.371391: nfsd_cb_recall_any_done: 
> client 67833e73:4dd32903 status=0
> kworker/u96:2-220501 [013] ..... 190579.748430: nfsd_cb_recall_any: 
> addr=172.16.118.121:825 client 67833e73:4dd32903 keep=0 bmval0=RDATA_DLG
> kworker/u96:8-225529 [026] ..... 190579.749946: nfsd_cb_recall_any_done: 
> client 67833e73:4dd32903 status=0
> kworker/u96:8-225529 [013] ..... 190592.749738: nfsd_cb_recall_any: 
> addr=172.16.118.121:825 client 67833e73:4dd32903 keep=0 bmval0=RDATA_DLG
> kworker/u96:2-220501 [001] ..... 190592.751192: nfsd_cb_recall_any_done: 
> client 67833e73:4dd32903 status=0
> kworker/u96:0-228547 [000] ..... 190599.849333: nfsd_cb_recall_any: 
> addr=172.16.118.121:825 client 67833e73:4dd32903 keep=0 bmval0=RDATA_DLG
> kworker/u96:0-228547 [000] ..... 190599.850872: nfsd_cb_recall_any_done: 
> client 67833e73:4dd32903 status=0
> kworker/u96:2-220501 [045] ..... 190686.760138: nfsd_cb_recall_any: 
> addr=172.16.118.121:825 client 67833e73:4dd32903 keep=0 bmval0=RDATA_DLG
> kworker/u96:1-230409 [014] ..... 190686.761181: nfsd_cb_recall_any_done: 
> client 67833e73:4dd32903 status=0
> kworker/u96:1-230409 [013] ..... 190694.103480: nfsd_cb_recall_any: 
> addr=172.16.118.121:825 client 67833e73:4dd32903 keep=0 bmval0=RDATA_DLG
> kworker/u96:8-225529 [022] ..... 190694.106120: nfsd_cb_recall_any_done: 
> client 67833e73:4dd32903 status=0
> kworker/u96:8-225529 [000] ..... 190749.678378: nfsd_cb_recall_any: 
> addr=172.16.118.121:825 client 67833e73:4dd32903 keep=0 bmval0=RDATA_DLG
> kworker/u96:2-220501 [007] ..... 190749.679870: nfsd_cb_recall_any_done: 
> client 67833e73:4dd32903 status=0
> kworker/u96:4-230705 [014] ..... 190782.830144: nfsd_cb_recall_any: 
> addr=172.16.118.121:825 client 67833e73:4dd32903 keep=0 bmval0=RDATA_DLG
> kworker/u96:1-230409 [010] ..... 190782.831220: nfsd_cb_recall_any_done: 
> client 67833e73:4dd32903 status=0
> 
> 
> ---
> Next the user suspend the host maybe by pressing
> the power button without login out so the last
> nfsd_cb_recall_any fail
> Maybe the "unrecognized reply" related ?
> ---
> 
> janv. 14 10:01:18 salleprofs13 systemd-logind[569]: The system will 
> suspend now!
> janv. 14 10:01:18 salleprofs13 NetworkManager[576]: <info> 
> [1736845278.5631] manager: sleep: sleep requested (sleeping: no enabled: 
> yes)
> janv. 14 10:01:18 salleprofs13 ModemManager[621]: <info> [sleep-monitor- 
> systemd] system is about to suspend
> janv. 14 10:01:20 salleprofs13 kernel: PM: suspend entry (deep)
> 
> kworker/u96:2-220501 [047] ..... 190851.843698: nfsd_cb_recall_any: 
> addr=172.16.118.121:825 client 67833e73:4dd32903 keep=0 bmval0=RDATA_DLG
> kworker/u96:2-220501 [047] ..... 190861.062828: nfsd_cb_state: 
> addr=172.16.118.121:0 client 67833e73:4dd32903 state=FAULT
> 
> 2025-01-14T10:06:06.586140+01:00 fichdc02 kernel: [191131.106081] 
> receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 
> 00000000ad57929d xid 52751431
> 
> 
> ---
> Got 5 unrecognized reply.
> Seems not related .
> ---
> 
> 2025-01-14T10:06:56.762137+01:00 fichdc02 kernel: [191181.279788] 
> receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 
> 000000002dcd99c8 xid 5d819d7d
> 2025-01-14T10:06:56.762197+01:00 fichdc02 kernel: [191181.280005] 
> receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 
> 000000002dcd99c8 xid 61819d7d
> 2025-01-14T10:06:56.762201+01:00 fichdc02 kernel: [191181.280054] 
> receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 
> 000000002dcd99c8 xid 60819d7d
> 2025-01-14T10:06:56.762206+01:00 fichdc02 kernel: [191181.280223] 
> receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 
> 000000002dcd99c8 xid 5f819d7d
> 2025-01-14T10:06:56.762209+01:00 fichdc02 kernel: [191181.280364] 
> receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 
> 000000002dcd99c8 xid 5e819d7d
> 
> 
> ---
> Just 10 minutes after suspend the host wake up (with xxxxxxxx still 
> logged on).
> The gssd-bug.service is a custom script that kill all the user processes
> if the kerberos cache file has expired. This is due to a rpc.gssd bug
> that hammer the server when the cache has expired during suspend.
> It run "killall -u xxxxxxxx". But here, I don't know if the killall
> was called or not.
> 
> Just after the wake. The server send again an nfsd_cb_recall_any. But 
> this time
> there is no nfsd_cb_state with state=FAULT. From this time there no more 
> kworker
> cb calls. I think this where the hang start.

I'm thinking the hang started a little earlier, but I agree that this is
the first symptom of it.

I was attempting to reproduce by periodically dropping the connection,
but it appears there needs to be one or two more steps to make it behave
like a suspended client. I will see what can be done.


> And we got again an unrecognized reply.
> ---
> 
> janv. 14 10:10:24 salleprofs13 kernel: PM: suspend exit
> janv. 14 10:10:24 salleprofs13 systemd-sleep[124910]: System returned 
> from sleep state.
> janv. 14 10:10:24 salleprofs13 systemd[1]: Stopping gssd-bug.service - 
> Gssd credential cache hook...
> janv. 14 10:10:24 salleprofs13 NetworkManager[576]: <info> 
> [1736845824.4117] manager: sleep: wake requested (sleeping: yes enabled: 
> yes)
> janv. 14 10:10:24 salleprofs13 check-gssd-cache[124962]: id: 
> « xxxxxxxx » : the user don't exist
> janv. 14 10:10:24 salleprofs13 systemd[1]: Started winbind.service - 
> Samba Winbind Daemon.
> janv. 14 10:10:27 salleprofs13 kernel: r8169 0000:02:00.0 enp2s0: Link 
> is Up - 1Gbps/Full - flow control off
> janv. 14 10:10:27 salleprofs13 NetworkManager[576]: <info> 
> [1736845827.6412] device (enp2s0): carrier: link connected
> janv. 14 10:10:35 salleprofs13 systemd[1]: Stopped gssd-bug.service - 
> Gssd credential cache hook.
> 
> kworker/u96:2-220501 [012] ..... 191417.544355: nfsd_cb_recall_any: 
> addr=172.16.118.121:825 client 67833e73:4dd32903 keep=0 bmval0=RDATA_DLG
> 
> 2025-01-14T10:10:58.426113+01:00 fichdc02 kernel: [191422.934976] 
> receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 
> 000000007205cdb2 xid ee91a9d8
> 
> 
> ---
> nfsd probe a failed State. Normal.
> But as kworker hang, it can't reinit the callback channel
> after the nfsd_cb_state "state=UNKNOWN".
> 
> I don't know why but nfsd started also
> to send nfsd_cb_recall...
> Note that is less than 120 second before the "BLOCKED" message.
> so the thread may be already locked.
> ---
> 
> nfsd-1690  [014] ..... 191424.483249: nfsd_cb_probe: 
> addr=172.16.118.121:0 client 67833e73:4dd32903 state=FAULT
> nfsd-1690  [014] ..... 191424.483252: nfsd_cb_state: 
> addr=172.16.118.121:0 client 67833e73:4dd32903 state=UNKNOWN
> nfsd-1695  [013] ...1. 191441.265047: nfsd_cb_recall: 
> addr=172.16.118.121:0 client 67833e73:4dd32903 stateid 00001c6e:00000001
> nfsd-1695  [014] ...1. 191441.646104: nfsd_cb_recall: 
> addr=172.16.118.121:0 client 67833e73:4dd32903 stateid 000015fc:00000001
> 
> 
> ---
> Got some strange logs on the client
> And 7 unrecognized reply on the server that seems not related.
> ---
> 
> janv. 14 10:11:00 salleprofs13 PackageKit[122249]: daemon quit
> janv. 14 10:11:00 salleprofs13 systemd[1]: packagekit.service: 
> Deactivated successfully.
> janv. 14 10:11:00 salleprofs13 systemd[1]: packagekit.service: Consumed 
> 28.223s CPU time.
> janv. 14 10:11:02 salleprofs13 gnome-shell[122127]: JS ERROR: 
> Gio.IOErrorEnum: Erreur lors de l’appel de StartServiceByName pour 
> org.vmware.viewagent.Credentials : Le délai d’>
> 
> _injectToMethod/klass[method]@resource:///org/gnome/gjs/modules/core/ 
> overrides/Gio.js:287:25
> 
> VmwareCredentials@resource:///org/gnome/shell/gdm/vmware.js:34:10
> 
> VmwareCredentialsManager@resource:///org/gnome/shell/gdm/vmware.js:41:29
> 
> getVmwareCredentialsManager@resource:///org/gnome/shell/gdm/vmware.js:51:37
> 
> ShellUserVerifier@resource:///org/gnome/shell/gdm/util.js:183:64
> 
> _init@resource:///org/gnome/shell/gdm/authPrompt.js:74:30
> 
> AuthPrompt@resource:///org/gnome/shell/gdm/authPrompt.js:51:4
> 
> _ensureAuthPrompt@resource:///org/gnome/shell/ui/unlockDialog.js:686:32
> 
> _showPrompt@resource:///org/gnome/shell/ui/unlockDialog.js:725:14
> 
> vfunc_key_press_event@resource:///org/gnome/shell/ui/unlockDialog.js:620:14
> janv. 14 10:11:05 salleprofs13 gsd-power[122329]: Error setting property 
> 'PowerSaveMode' on interface org.gnome.Mutter.DisplayConfig: Le délai 
> d’attente est dépassé (g-io-erro>
> janv. 14 10:11:27 salleprofs13 gnome-shell[122127]: JS ERROR: 
> Gio.IOErrorEnum: Erreur lors de l’appel de StartServiceByName pour 
> org.vmware.viewagent.Credentials : Le délai d’>
> 
> _injectToMethod/klass[method]@resource:///org/gnome/gjs/modules/core/ 
> overrides/Gio.js:287:25
> 
> VmwareCredentials@resource:///org/gnome/shell/gdm/vmware.js:34:10
> 
> VmwareCredentialsManager@resource:///org/gnome/shell/gdm/vmware.js:41:29
> 
> getVmwareCredentialsManager@resource:///org/gnome/shell/gdm/vmware.js:51:37
> 
> ShellUserVerifier@resource:///org/gnome/shell/gdm/util.js:183:64
> 
> _init@resource:///org/gnome/shell/gdm/authPrompt.js:74:30
> 
> AuthPrompt@resource:///org/gnome/shell/gdm/authPrompt.js:51:4
> 
> _ensureAuthPrompt@resource:///org/gnome/shell/ui/unlockDialog.js:686:32
> 
> _showPrompt@resource:///org/gnome/shell/ui/unlockDialog.js:725:14
> 
> vfunc_key_press_event@resource:///org/gnome/shell/ui/unlockDialog.js:620:14
> 
> 
> 
> 2025-01-14T10:12:36.730114+01:00 fichdc02 kernel: [191521.235822] 
> receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 
> 00000000191d4459 xid d591b1ee
> 2025-01-14T10:12:36.730165+01:00 fichdc02 kernel: [191521.236547] 
> receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 
> 00000000191d4459 xid d691b1ee
> 2025-01-14T10:12:36.730169+01:00 fichdc02 kernel: [191521.236666] 
> receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 
> 00000000191d4459 xid da91b1ee
> 2025-01-14T10:12:36.730171+01:00 fichdc02 kernel: [191521.236743] 
> receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 
> 00000000191d4459 xid db91b1ee
> 2025-01-14T10:12:36.730172+01:00 fichdc02 kernel: [191521.236769] 
> receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 
> 00000000191d4459 xid d991b1ee
> 2025-01-14T10:12:36.730174+01:00 fichdc02 kernel: [191521.236824] 
> receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 
> 00000000191d4459 xid d891b1ee
> 2025-01-14T10:12:36.730177+01:00 fichdc02 kernel: [191521.236840] 
> receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 
> 00000000191d4459 xid d791b1ee
> 
> 
> ---
> And finally the hang syslog message
> ---
> 
> 2025-01-14T10:14:29.370164+01:00 fichdc02 kernel: [191633.871204] INFO: 
> task nfsd:1690 blocked for more than 120 seconds.
> 2025-01-14T10:14:29.374846+01:00 fichdc02 kernel: [191633.875410] INFO: 
> task nfsd:1691 blocked for more than 120 seconds.
> 2025-01-14T10:14:29.374966+01:00 fichdc02 kernel: [191633.878106] INFO: 
> task nfsd:1693 blocked for more than 120 seconds.
> 2025-01-14T10:14:29.378303+01:00 fichdc02 kernel: [191633.880748] INFO: 
> task nfsd:1695 blocked for more than 120 seconds.
> 
> 
> Hope this help.
> 
> I'm still recording.
> 
> Regards,
> 
> Baptiste.
> 
> 


-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: nfsd blocks indefinitely in nfsd4_destroy_session
  2025-01-09 11:56               ` Christian Herzog
  2025-01-09 12:42                 ` Jeff Layton
@ 2025-01-09 15:49                 ` Chuck Lever
  2025-01-09 15:58                   ` Christian Herzog
  1 sibling, 1 reply; 31+ messages in thread
From: Chuck Lever @ 2025-01-09 15:49 UTC (permalink / raw)
  To: Christian Herzog
  Cc: Salvatore Bonaccorso, Benjamin Coddington, Trond Myklebust,
	Harald Dunkel, Linux NFS Mailing List, Martin Svec,
	Michael Gernoth, Pellegrin Baptiste

On 1/9/25 6:56 AM, Christian Herzog wrote:
> Dear Chuck,
> 
> On Wed, Jan 08, 2025 at 10:07:49AM -0500, Chuck Lever wrote:
>> On 1/8/25 9:54 AM, Christian Herzog wrote:
>>> Dear Chuck,
>>>
>>> On Wed, Jan 08, 2025 at 08:33:23AM -0500, Chuck Lever wrote:
>>>> On 1/7/25 4:17 PM, Salvatore Bonaccorso wrote:
>>>>> Hi Chuck,
>>>>>
>>>>> Thanks for your time on this, much appreciated.
>>>>>
>>>>> On Wed, Jan 01, 2025 at 02:24:50PM -0500, Chuck Lever wrote:
>>>>>> On 12/25/24 4:15 AM, Salvatore Bonaccorso wrote:
>>>>>>> Hi Chuck, hi all,
>>>>>>>
>>>>>>> [it was not ideal to pick one of the message for this followup, let me
>>>>>>> know if you want a complete new thread, adding as well Benjamin and
>>>>>>> Trond as they are involved in one mentioned patch]
>>>>>>>
>>>>>>> On Mon, Jun 17, 2024 at 02:31:54PM +0000, Chuck Lever III wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>> On Jun 17, 2024, at 2:55 AM, Harald Dunkel <harald.dunkel@aixigo.com> wrote:
>>>>>>>>>
>>>>>>>>> Hi folks,
>>>>>>>>>
>>>>>>>>> what would be the reason for nfsd getting stuck somehow and becoming
>>>>>>>>> an unkillable process? See
>>>>>>>>>
>>>>>>>>> - https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1071562
>>>>>>>>> - https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/2062568
>>>>>>>>>
>>>>>>>>> Doesn't this mean that something inside the kernel gets stuck as
>>>>>>>>> well? Seems odd to me.
>>>>>>>>
>>>>>>>> I'm not familiar with the Debian or Ubuntu kernel packages. Can
>>>>>>>> the kernel release numbers be translated to LTS kernel releases
>>>>>>>> please? Need both "last known working" and "first broken" releases.
>>>>>>>>
>>>>>>>> This:
>>>>>>>>
>>>>>>>> [ 6596.911785] RPC: Could not send backchannel reply error: -110
>>>>>>>> [ 6596.972490] RPC: Could not send backchannel reply error: -110
>>>>>>>> [ 6837.281307] RPC: Could not send backchannel reply error: -110
>>>>>>>>
>>>>>>>> is a known set of client backchannel bugs. Knowing the LTS kernel
>>>>>>>> releases (see above) will help us figure out what needs to be
>>>>>>>> backported to the LTS kernels kernels in question.
>>>>>>>>
>>>>>>>> This:
>>>>>>>>
>>>>>>>> [11183.290619] wait_for_completion+0x88/0x150
>>>>>>>> [11183.290623] __flush_workqueue+0x140/0x3e0
>>>>>>>> [11183.290629] nfsd4_probe_callback_sync+0x1a/0x30 [nfsd]
>>>>>>>> [11183.290689] nfsd4_destroy_session+0x186/0x260 [nfsd]
>>>>>>>>
>>>>>>>> is probably related to the backchannel errors on the client, but
>>>>>>>> client bugs shouldn't cause the server to hang like this. We
>>>>>>>> might be able to say more if you can provide the kernel release
>>>>>>>> translations (see above).
>>>>>>>
>>>>>>> In Debian we hstill have the bug #1071562 open and one person notified
>>>>>>> mye offlist that it appears that the issue get more frequent since
>>>>>>> they updated on NFS client side from Ubuntu 20.04 to Debian bookworm
>>>>>>> with a 6.1.y based kernel).
>>>>>>>
>>>>>>> Some people around those issues, seem to claim that the change
>>>>>>> mentioned in
>>>>>>> https://lists.proxmox.com/pipermail/pve-devel/2024-July/064614.html
>>>>>>> would fix the issue, which is as well backchannel related.
>>>>>>>
>>>>>>> This is upstream: 6ddc9deacc13 ("SUNRPC: Fix backchannel reply,
>>>>>>> again"). While this commit fixes 57331a59ac0d ("NFSv4.1: Use the
>>>>>>> nfs_client's rpc timeouts for backchannel") this is not something
>>>>>>> which goes back to 6.1.y, could it be possible that hte backchannel
>>>>>>> refactoring and this final fix indeeds fixes the issue?
>>>>>>>
>>>>>>> As people report it is not easily reproducible, so this makes it
>>>>>>> harder to identify fixes correctly.
>>>>>>>
>>>>>>> I gave a (short) stance on trying to backport commits up to
>>>>>>> 6ddc9deacc13 ("SUNRPC: Fix backchannel reply, again") but this quickly
>>>>>>> seems to indicate it is probably still not the right thing for
>>>>>>> backporting to the older stable series.
>>>>>>>
>>>>>>> As at least pre-requisites:
>>>>>>>
>>>>>>> 2009e32997ed568a305cf9bc7bf27d22e0f6ccda
>>>>>>> 4119bd0306652776cb0b7caa3aea5b2a93aecb89
>>>>>>> 163cdfca341b76c958567ae0966bd3575c5c6192
>>>>>>> f4afc8fead386c81fda2593ad6162271d26667f8
>>>>>>> 6ed8cdf967f7e9fc96cd1c129719ef99db2f9afc
>>>>>>> 57331a59ac0d680f606403eb24edd3c35aecba31
>>>>>>>
>>>>>>> and still there would be conflicting codepaths (and does not seem
>>>>>>> right).
>>>>>>>
>>>>>>> Chuck, Benjamin, Trond, is there anything we can provive on reporters
>>>>>>> side that we can try to tackle this issue better?
>>>>>>
>>>>>> As I've indicated before, NFSD should not block no matter what the
>>>>>> client may or may not be doing. I'd like to focus on the server first.
>>>>>>
>>>>>> What is the result of:
>>>>>>
>>>>>> $ cd <Bookworm's v6.1.90 kernel source >
>>>>>> $ unset KBUILD_OUTPUT
>>>>>> $ make -j `nproc`
>>>>>> $ scripts/faddr2line \
>>>>>> 	fs/nfsd/nfs4state.o \
>>>>>> 	nfsd4_destroy_session+0x16d
>>>>>>
>>>>>> Since this issue appeared after v6.1.1, is it possible to bisect
>>>>>> between v6.1.1 and v6.1.90 ?
>>>>>
>>>>> First please note, at least speaking of triggering the issue in
>>>>> Debian, Debian has moved to 6.1.119 based kernel already (and soon in
>>>>> the weekends point release move to 6.1.123).
>>>>>
>>>>> That said, one of the users which regularly seems to be hit by the
>>>>> issue was able to provide the above requested information, based for
>>>>> 6.1.119:
>>>>>
>>>>> ~/kernel/linux-source-6.1# make kernelversion
>>>>> 6.1.119
>>>>> ~/kernel/linux-source-6.1# scripts/faddr2line fs/nfsd/nfs4state.o nfsd4_destroy_session+0x16d
>>>>> nfsd4_destroy_session+0x16d/0x250:
>>>>> __list_del_entry at /root/kernel/linux-source-6.1/./include/linux/list.h:134
>>>>> (inlined by) list_del at /root/kernel/linux-source-6.1/./include/linux/list.h:148
>>>>> (inlined by) unhash_session at /root/kernel/linux-source-6.1/fs/nfsd/nfs4state.c:2062
>>>>> (inlined by) nfsd4_destroy_session at /root/kernel/linux-source-6.1/fs/nfsd/nfs4state.c:3856
>>>>>
>>>>> They could provide as well a decode_stacktrace output for the recent
>>>>> hit (if that is helpful for you):
>>>>>
>>>>> [Mon Jan 6 13:25:28 2025] INFO: task nfsd:55306 blocked for more than 6883 seconds.
>>>>> [Mon Jan 6 13:25:28 2025]       Not tainted 6.1.0-28-amd64 #1 Debian 6.1.119-1
>>>>> [Mon Jan 6 13:25:28 2025] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>> [Mon Jan 6 13:25:28 2025] task:nfsd            state:D stack:0     pid:55306 ppid:2      flags:0x00004000
>>>>> [Mon Jan 6 13:25:28 2025] Call Trace:
>>>>> [Mon Jan 6 13:25:28 2025]  <TASK>
>>>>> [Mon Jan 6 13:25:28 2025] __schedule+0x34d/0x9e0
>>>>> [Mon Jan 6 13:25:28 2025] schedule+0x5a/0xd0
>>>>> [Mon Jan 6 13:25:28 2025] schedule_timeout+0x118/0x150
>>>>> [Mon Jan 6 13:25:28 2025] wait_for_completion+0x86/0x160
>>>>> [Mon Jan 6 13:25:28 2025] __flush_workqueue+0x152/0x420
>>>>> [Mon Jan 6 13:25:28 2025] nfsd4_destroy_session (debian/build/build_amd64_none_amd64/include/linux/spinlock.h:351 debian/build/build_amd64_none_amd64/fs/nfsd/nfs4state.c:3861) nfsd
>>>>> [Mon Jan 6 13:25:28 2025] nfsd4_proc_compound (debian/build/build_amd64_none_amd64/fs/nfsd/nfs4proc.c:2680) nfsd
>>>>> [Mon Jan 6 13:25:28 2025] nfsd_dispatch (debian/build/build_amd64_none_amd64/fs/nfsd/nfssvc.c:1022) nfsd
>>>>> [Mon Jan 6 13:25:28 2025] svc_process_common (debian/build/build_amd64_none_amd64/net/sunrpc/svc.c:1344) sunrpc
>>>>> [Mon Jan 6 13:25:28 2025] ? svc_recv (debian/build/build_amd64_none_amd64/net/sunrpc/svc_xprt.c:897) sunrpc
>>>>> [Mon Jan 6 13:25:28 2025] ? nfsd_svc (debian/build/build_amd64_none_amd64/fs/nfsd/nfssvc.c:983) nfsd
>>>>> [Mon Jan 6 13:25:28 2025] ? nfsd_inet6addr_event (debian/build/build_amd64_none_amd64/fs/nfsd/nfssvc.c:922) nfsd
>>>>> [Mon Jan 6 13:25:28 2025] svc_process (debian/build/build_amd64_none_amd64/net/sunrpc/svc.c:1474) sunrpc
>>>>> [Mon Jan 6 13:25:28 2025] nfsd (debian/build/build_amd64_none_amd64/fs/nfsd/nfssvc.c:960) nfsd
>>>>> [Mon Jan 6 13:25:28 2025] kthread+0xd7/0x100
>>>>> [Mon Jan 6 13:25:28 2025] ? kthread_complete_and_exit+0x20/0x20
>>>>> [Mon Jan 6 13:25:28 2025] ret_from_fork+0x1f/0x30
>>>>> [Mon Jan  6 13:25:28 2025]  </TASK>
>>>>>
>>>>> The question about bisection is actually harder, those are production
>>>>> systems and I understand it's not possible to do bisect there, while
>>>>> unfortunately not reprodcing the issue on "lab conditions".
>>>>>
>>>>> Does the above give us still a clue on what you were looking for?
>>>>
>>>> Thanks for the update.
>>>>
>>>> It's possible that 38f080f3cd19 ("NFSD: Move callback_wq into struct
>>>> nfs4_client"), while not a specific fix for this issue, might offer some
>>>> relief by preventing the DESTROY_SESSION hang from affecting all other
>>>> clients of the degraded server.
>>>>
>>>> Not having a reproducer or the ability to bisect puts a damper on
>>>> things. The next step, then, is to enable tracing on servers where this
>>>> issue can come up, and wait for the hang to occur. The following command
>>>> captures information only about callback operation, so it should not
>>>> generate much data or impact server performance.
>>>>
>>>>     # trace-cmd record -e nfsd:nfsd_cb\*
>>>>
>>>> Let that run until the problem occurs, then ^C, compress the resulting
>>>> trace.dat file, and either attach it to 1071562 or email it to me
>>>> privately.
>>> thanks for the follow-up.
>>>
>>> I am the "customer" with two affected file servers. We have since moved those
>>> servers to the backports kernel (6.11.10) in the hope of forward fixing the
>>> issue. If this kernel is stable, I'm afraid I can't go back to the 'bad'
>>> kernel (700+ researchers affected..), and we're also not able to trigger the
>>> issue on our test environment.
>>
>> Hello Dr. Herzog -
>>
>> If the problem recurs on 6.11, the trace-cmd I suggest above works
>> there as well.
> the bad news is: it just happened again with the bpo kernel.
> 
> We then immediately started trace-cmd since usually there are several call
> traces in a row and we did get a trace.dat:
> http://people.phys.ethz.ch/~daduke/trace.dat

I'd prefer to have the trace running from before the first occurrence
of the problem. Even better would be to get it started before the
first client mounts the server.

But I will have a look at your trace soon.

It would be extremely helpful if we could reproduce this problem in
our own labs.


> we did notice however that the stack trace looked a bit different this time:
> 
>      INFO: task nfsd:2566 blocked for more than 5799 seconds.
>      Tainted: G        W          6.11.10+bpo-amd64 #1 Debia>
>      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables t>
>      task:nfsd            state:D stack:0     pid:2566  tgid:2566 >
>      Call Trace:
>      <TASK>
>      __schedule+0x400/0xad0
>      schedule+0x27/0xf0
>      nfsd4_shutdown_callback+0xfe/0x140 [nfsd]
>      ? __pfx_var_wake_function+0x10/0x10
>      __destroy_client+0x1f0/0x290 [nfsd]
>      nfsd4_destroy_clientid+0xf1/0x1e0 [nfsd]
>      ? svcauth_unix_set_client+0x586/0x5f0 [sunrpc]
>      nfsd4_proc_compound+0x34d/0x670 [nfsd]
>      nfsd_dispatch+0xfd/0x220 [nfsd]
>      svc_process_common+0x2f7/0x700 [sunrpc]
>      ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
>      svc_process+0x131/0x180 [sunrpc]
>      svc_recv+0x830/0xa10 [sunrpc]
>      ? __pfx_nfsd+0x10/0x10 [nfsd]
>      nfsd+0x87/0xf0 [nfsd]
>      kthread+0xcf/0x100
>      ? __pfx_kthread+0x10/0x10
>      ret_from_fork+0x31/0x50
>      ? __pfx_kthread+0x10/0x10
>      ret_from_fork_asm+0x1a/0x30
>      </TASK>
> 
> and also the state of the offending client in `/proc/fs/nfsd/clients/*/info`
> used to be callback state: UNKNOWN while now it is DOWN or FAULT. No idea
> what it means, but I thought it was worth mentioning.


-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: nfsd blocks indefinitely in nfsd4_destroy_session
  2025-01-09 15:49                 ` Chuck Lever
@ 2025-01-09 15:58                   ` Christian Herzog
  2025-01-09 16:09                     ` Chuck Lever
  0 siblings, 1 reply; 31+ messages in thread
From: Christian Herzog @ 2025-01-09 15:58 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Salvatore Bonaccorso, Benjamin Coddington, Trond Myklebust,
	Harald Dunkel, Linux NFS Mailing List, Martin Svec,
	Michael Gernoth, Pellegrin Baptiste

On Thu, Jan 09, 2025 at 10:49:37AM -0500, Chuck Lever wrote:
> On 1/9/25 6:56 AM, Christian Herzog wrote:
> > Dear Chuck,
> > 
> > On Wed, Jan 08, 2025 at 10:07:49AM -0500, Chuck Lever wrote:
> > > On 1/8/25 9:54 AM, Christian Herzog wrote:
> > > > Dear Chuck,
> > > > 
> > > > On Wed, Jan 08, 2025 at 08:33:23AM -0500, Chuck Lever wrote:
> > > > > On 1/7/25 4:17 PM, Salvatore Bonaccorso wrote:
> > > > > > Hi Chuck,
> > > > > > 
> > > > > > Thanks for your time on this, much appreciated.
> > > > > > 
> > > > > > On Wed, Jan 01, 2025 at 02:24:50PM -0500, Chuck Lever wrote:
> > > > > > > On 12/25/24 4:15 AM, Salvatore Bonaccorso wrote:
> > > > > > > > Hi Chuck, hi all,
> > > > > > > > 
> > > > > > > > [it was not ideal to pick one of the message for this followup, let me
> > > > > > > > know if you want a complete new thread, adding as well Benjamin and
> > > > > > > > Trond as they are involved in one mentioned patch]
> > > > > > > > 
> > > > > > > > On Mon, Jun 17, 2024 at 02:31:54PM +0000, Chuck Lever III wrote:
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > > On Jun 17, 2024, at 2:55 AM, Harald Dunkel <harald.dunkel@aixigo.com> wrote:
> > > > > > > > > > 
> > > > > > > > > > Hi folks,
> > > > > > > > > > 
> > > > > > > > > > what would be the reason for nfsd getting stuck somehow and becoming
> > > > > > > > > > an unkillable process? See
> > > > > > > > > > 
> > > > > > > > > > - https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1071562
> > > > > > > > > > - https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/2062568
> > > > > > > > > > 
> > > > > > > > > > Doesn't this mean that something inside the kernel gets stuck as
> > > > > > > > > > well? Seems odd to me.
> > > > > > > > > 
> > > > > > > > > I'm not familiar with the Debian or Ubuntu kernel packages. Can
> > > > > > > > > the kernel release numbers be translated to LTS kernel releases
> > > > > > > > > please? Need both "last known working" and "first broken" releases.
> > > > > > > > > 
> > > > > > > > > This:
> > > > > > > > > 
> > > > > > > > > [ 6596.911785] RPC: Could not send backchannel reply error: -110
> > > > > > > > > [ 6596.972490] RPC: Could not send backchannel reply error: -110
> > > > > > > > > [ 6837.281307] RPC: Could not send backchannel reply error: -110
> > > > > > > > > 
> > > > > > > > > is a known set of client backchannel bugs. Knowing the LTS kernel
> > > > > > > > > releases (see above) will help us figure out what needs to be
> > > > > > > > > backported to the LTS kernels kernels in question.
> > > > > > > > > 
> > > > > > > > > This:
> > > > > > > > > 
> > > > > > > > > [11183.290619] wait_for_completion+0x88/0x150
> > > > > > > > > [11183.290623] __flush_workqueue+0x140/0x3e0
> > > > > > > > > [11183.290629] nfsd4_probe_callback_sync+0x1a/0x30 [nfsd]
> > > > > > > > > [11183.290689] nfsd4_destroy_session+0x186/0x260 [nfsd]
> > > > > > > > > 
> > > > > > > > > is probably related to the backchannel errors on the client, but
> > > > > > > > > client bugs shouldn't cause the server to hang like this. We
> > > > > > > > > might be able to say more if you can provide the kernel release
> > > > > > > > > translations (see above).
> > > > > > > > 
> > > > > > > > In Debian we hstill have the bug #1071562 open and one person notified
> > > > > > > > mye offlist that it appears that the issue get more frequent since
> > > > > > > > they updated on NFS client side from Ubuntu 20.04 to Debian bookworm
> > > > > > > > with a 6.1.y based kernel).
> > > > > > > > 
> > > > > > > > Some people around those issues, seem to claim that the change
> > > > > > > > mentioned in
> > > > > > > > https://lists.proxmox.com/pipermail/pve-devel/2024-July/064614.html
> > > > > > > > would fix the issue, which is as well backchannel related.
> > > > > > > > 
> > > > > > > > This is upstream: 6ddc9deacc13 ("SUNRPC: Fix backchannel reply,
> > > > > > > > again"). While this commit fixes 57331a59ac0d ("NFSv4.1: Use the
> > > > > > > > nfs_client's rpc timeouts for backchannel") this is not something
> > > > > > > > which goes back to 6.1.y, could it be possible that hte backchannel
> > > > > > > > refactoring and this final fix indeeds fixes the issue?
> > > > > > > > 
> > > > > > > > As people report it is not easily reproducible, so this makes it
> > > > > > > > harder to identify fixes correctly.
> > > > > > > > 
> > > > > > > > I gave a (short) stance on trying to backport commits up to
> > > > > > > > 6ddc9deacc13 ("SUNRPC: Fix backchannel reply, again") but this quickly
> > > > > > > > seems to indicate it is probably still not the right thing for
> > > > > > > > backporting to the older stable series.
> > > > > > > > 
> > > > > > > > As at least pre-requisites:
> > > > > > > > 
> > > > > > > > 2009e32997ed568a305cf9bc7bf27d22e0f6ccda
> > > > > > > > 4119bd0306652776cb0b7caa3aea5b2a93aecb89
> > > > > > > > 163cdfca341b76c958567ae0966bd3575c5c6192
> > > > > > > > f4afc8fead386c81fda2593ad6162271d26667f8
> > > > > > > > 6ed8cdf967f7e9fc96cd1c129719ef99db2f9afc
> > > > > > > > 57331a59ac0d680f606403eb24edd3c35aecba31
> > > > > > > > 
> > > > > > > > and still there would be conflicting codepaths (and does not seem
> > > > > > > > right).
> > > > > > > > 
> > > > > > > > Chuck, Benjamin, Trond, is there anything we can provive on reporters
> > > > > > > > side that we can try to tackle this issue better?
> > > > > > > 
> > > > > > > As I've indicated before, NFSD should not block no matter what the
> > > > > > > client may or may not be doing. I'd like to focus on the server first.
> > > > > > > 
> > > > > > > What is the result of:
> > > > > > > 
> > > > > > > $ cd <Bookworm's v6.1.90 kernel source >
> > > > > > > $ unset KBUILD_OUTPUT
> > > > > > > $ make -j `nproc`
> > > > > > > $ scripts/faddr2line \
> > > > > > > 	fs/nfsd/nfs4state.o \
> > > > > > > 	nfsd4_destroy_session+0x16d
> > > > > > > 
> > > > > > > Since this issue appeared after v6.1.1, is it possible to bisect
> > > > > > > between v6.1.1 and v6.1.90 ?
> > > > > > 
> > > > > > First please note, at least speaking of triggering the issue in
> > > > > > Debian, Debian has moved to 6.1.119 based kernel already (and soon in
> > > > > > the weekends point release move to 6.1.123).
> > > > > > 
> > > > > > That said, one of the users which regularly seems to be hit by the
> > > > > > issue was able to provide the above requested information, based for
> > > > > > 6.1.119:
> > > > > > 
> > > > > > ~/kernel/linux-source-6.1# make kernelversion
> > > > > > 6.1.119
> > > > > > ~/kernel/linux-source-6.1# scripts/faddr2line fs/nfsd/nfs4state.o nfsd4_destroy_session+0x16d
> > > > > > nfsd4_destroy_session+0x16d/0x250:
> > > > > > __list_del_entry at /root/kernel/linux-source-6.1/./include/linux/list.h:134
> > > > > > (inlined by) list_del at /root/kernel/linux-source-6.1/./include/linux/list.h:148
> > > > > > (inlined by) unhash_session at /root/kernel/linux-source-6.1/fs/nfsd/nfs4state.c:2062
> > > > > > (inlined by) nfsd4_destroy_session at /root/kernel/linux-source-6.1/fs/nfsd/nfs4state.c:3856
> > > > > > 
> > > > > > They could provide as well a decode_stacktrace output for the recent
> > > > > > hit (if that is helpful for you):
> > > > > > 
> > > > > > [Mon Jan 6 13:25:28 2025] INFO: task nfsd:55306 blocked for more than 6883 seconds.
> > > > > > [Mon Jan 6 13:25:28 2025]       Not tainted 6.1.0-28-amd64 #1 Debian 6.1.119-1
> > > > > > [Mon Jan 6 13:25:28 2025] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > > > > [Mon Jan 6 13:25:28 2025] task:nfsd            state:D stack:0     pid:55306 ppid:2      flags:0x00004000
> > > > > > [Mon Jan 6 13:25:28 2025] Call Trace:
> > > > > > [Mon Jan 6 13:25:28 2025]  <TASK>
> > > > > > [Mon Jan 6 13:25:28 2025] __schedule+0x34d/0x9e0
> > > > > > [Mon Jan 6 13:25:28 2025] schedule+0x5a/0xd0
> > > > > > [Mon Jan 6 13:25:28 2025] schedule_timeout+0x118/0x150
> > > > > > [Mon Jan 6 13:25:28 2025] wait_for_completion+0x86/0x160
> > > > > > [Mon Jan 6 13:25:28 2025] __flush_workqueue+0x152/0x420
> > > > > > [Mon Jan 6 13:25:28 2025] nfsd4_destroy_session (debian/build/build_amd64_none_amd64/include/linux/spinlock.h:351 debian/build/build_amd64_none_amd64/fs/nfsd/nfs4state.c:3861) nfsd
> > > > > > [Mon Jan 6 13:25:28 2025] nfsd4_proc_compound (debian/build/build_amd64_none_amd64/fs/nfsd/nfs4proc.c:2680) nfsd
> > > > > > [Mon Jan 6 13:25:28 2025] nfsd_dispatch (debian/build/build_amd64_none_amd64/fs/nfsd/nfssvc.c:1022) nfsd
> > > > > > [Mon Jan 6 13:25:28 2025] svc_process_common (debian/build/build_amd64_none_amd64/net/sunrpc/svc.c:1344) sunrpc
> > > > > > [Mon Jan 6 13:25:28 2025] ? svc_recv (debian/build/build_amd64_none_amd64/net/sunrpc/svc_xprt.c:897) sunrpc
> > > > > > [Mon Jan 6 13:25:28 2025] ? nfsd_svc (debian/build/build_amd64_none_amd64/fs/nfsd/nfssvc.c:983) nfsd
> > > > > > [Mon Jan 6 13:25:28 2025] ? nfsd_inet6addr_event (debian/build/build_amd64_none_amd64/fs/nfsd/nfssvc.c:922) nfsd
> > > > > > [Mon Jan 6 13:25:28 2025] svc_process (debian/build/build_amd64_none_amd64/net/sunrpc/svc.c:1474) sunrpc
> > > > > > [Mon Jan 6 13:25:28 2025] nfsd (debian/build/build_amd64_none_amd64/fs/nfsd/nfssvc.c:960) nfsd
> > > > > > [Mon Jan 6 13:25:28 2025] kthread+0xd7/0x100
> > > > > > [Mon Jan 6 13:25:28 2025] ? kthread_complete_and_exit+0x20/0x20
> > > > > > [Mon Jan 6 13:25:28 2025] ret_from_fork+0x1f/0x30
> > > > > > [Mon Jan  6 13:25:28 2025]  </TASK>
> > > > > > 
> > > > > > The question about bisection is actually harder, those are production
> > > > > > systems and I understand it's not possible to do bisect there, while
> > > > > > unfortunately not reprodcing the issue on "lab conditions".
> > > > > > 
> > > > > > Does the above give us still a clue on what you were looking for?
> > > > > 
> > > > > Thanks for the update.
> > > > > 
> > > > > It's possible that 38f080f3cd19 ("NFSD: Move callback_wq into struct
> > > > > nfs4_client"), while not a specific fix for this issue, might offer some
> > > > > relief by preventing the DESTROY_SESSION hang from affecting all other
> > > > > clients of the degraded server.
> > > > > 
> > > > > Not having a reproducer or the ability to bisect puts a damper on
> > > > > things. The next step, then, is to enable tracing on servers where this
> > > > > issue can come up, and wait for the hang to occur. The following command
> > > > > captures information only about callback operation, so it should not
> > > > > generate much data or impact server performance.
> > > > > 
> > > > >     # trace-cmd record -e nfsd:nfsd_cb\*
> > > > > 
> > > > > Let that run until the problem occurs, then ^C, compress the resulting
> > > > > trace.dat file, and either attach it to 1071562 or email it to me
> > > > > privately.
> > > > thanks for the follow-up.
> > > > 
> > > > I am the "customer" with two affected file servers. We have since moved those
> > > > servers to the backports kernel (6.11.10) in the hope of forward fixing the
> > > > issue. If this kernel is stable, I'm afraid I can't go back to the 'bad'
> > > > kernel (700+ researchers affected..), and we're also not able to trigger the
> > > > issue on our test environment.
> > > 
> > > Hello Dr. Herzog -
> > > 
> > > If the problem recurs on 6.11, the trace-cmd I suggest above works
> > > there as well.
> > the bad news is: it just happened again with the bpo kernel.
> > 
> > We then immediately started trace-cmd since usually there are several call
> > traces in a row and we did get a trace.dat:
> > http://people.phys.ethz.ch/~daduke/trace.dat
> 
> I'd prefer to have the trace running from before the first occurrence
> of the problem. Even better would be to get it started before the
> first client mounts the server.
> 
> But I will have a look at your trace soon.
before you invest significant time in the trace: there is a chance it's a
false alarm: we do see nfsd in D, but they're not spreading like they used to.
And since someone is pulling 400 Mb/s from that server via globus gridftp,
it might just be real IO overload. We're still trying to figure out what we're
dealing with...

> It would be extremely helpful if we could reproduce this problem in
> our own labs.
easy: just hire 700 physicists and let 'em loose

thanks,
-Christian


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: nfsd blocks indefinitely in nfsd4_destroy_session
  2025-01-09 15:58                   ` Christian Herzog
@ 2025-01-09 16:09                     ` Chuck Lever
  0 siblings, 0 replies; 31+ messages in thread
From: Chuck Lever @ 2025-01-09 16:09 UTC (permalink / raw)
  To: Christian Herzog
  Cc: Salvatore Bonaccorso, Benjamin Coddington, Trond Myklebust,
	Harald Dunkel, Linux NFS Mailing List, Martin Svec,
	Michael Gernoth, Pellegrin Baptiste

On 1/9/25 10:58 AM, Christian Herzog wrote:
> On Thu, Jan 09, 2025 at 10:49:37AM -0500, Chuck Lever wrote:
>> On 1/9/25 6:56 AM, Christian Herzog wrote:
>>> Dear Chuck,
>>>
>>> On Wed, Jan 08, 2025 at 10:07:49AM -0500, Chuck Lever wrote:
>>>> On 1/8/25 9:54 AM, Christian Herzog wrote:
>>>>> Dear Chuck,
>>>>>
>>>>> On Wed, Jan 08, 2025 at 08:33:23AM -0500, Chuck Lever wrote:
>>>>>> On 1/7/25 4:17 PM, Salvatore Bonaccorso wrote:
>>>>>>> Hi Chuck,
>>>>>>>
>>>>>>> Thanks for your time on this, much appreciated.
>>>>>>>
>>>>>>> On Wed, Jan 01, 2025 at 02:24:50PM -0500, Chuck Lever wrote:
>>>>>>>> On 12/25/24 4:15 AM, Salvatore Bonaccorso wrote:
>>>>>>>>> Hi Chuck, hi all,
>>>>>>>>>
>>>>>>>>> [it was not ideal to pick one of the message for this followup, let me
>>>>>>>>> know if you want a complete new thread, adding as well Benjamin and
>>>>>>>>> Trond as they are involved in one mentioned patch]
>>>>>>>>>
>>>>>>>>> On Mon, Jun 17, 2024 at 02:31:54PM +0000, Chuck Lever III wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> On Jun 17, 2024, at 2:55 AM, Harald Dunkel <harald.dunkel@aixigo.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi folks,
>>>>>>>>>>>
>>>>>>>>>>> what would be the reason for nfsd getting stuck somehow and becoming
>>>>>>>>>>> an unkillable process? See
>>>>>>>>>>>
>>>>>>>>>>> - https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1071562
>>>>>>>>>>> - https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/2062568
>>>>>>>>>>>
>>>>>>>>>>> Doesn't this mean that something inside the kernel gets stuck as
>>>>>>>>>>> well? Seems odd to me.
>>>>>>>>>>
>>>>>>>>>> I'm not familiar with the Debian or Ubuntu kernel packages. Can
>>>>>>>>>> the kernel release numbers be translated to LTS kernel releases
>>>>>>>>>> please? Need both "last known working" and "first broken" releases.
>>>>>>>>>>
>>>>>>>>>> This:
>>>>>>>>>>
>>>>>>>>>> [ 6596.911785] RPC: Could not send backchannel reply error: -110
>>>>>>>>>> [ 6596.972490] RPC: Could not send backchannel reply error: -110
>>>>>>>>>> [ 6837.281307] RPC: Could not send backchannel reply error: -110
>>>>>>>>>>
>>>>>>>>>> is a known set of client backchannel bugs. Knowing the LTS kernel
>>>>>>>>>> releases (see above) will help us figure out what needs to be
>>>>>>>>>> backported to the LTS kernels kernels in question.
>>>>>>>>>>
>>>>>>>>>> This:
>>>>>>>>>>
>>>>>>>>>> [11183.290619] wait_for_completion+0x88/0x150
>>>>>>>>>> [11183.290623] __flush_workqueue+0x140/0x3e0
>>>>>>>>>> [11183.290629] nfsd4_probe_callback_sync+0x1a/0x30 [nfsd]
>>>>>>>>>> [11183.290689] nfsd4_destroy_session+0x186/0x260 [nfsd]
>>>>>>>>>>
>>>>>>>>>> is probably related to the backchannel errors on the client, but
>>>>>>>>>> client bugs shouldn't cause the server to hang like this. We
>>>>>>>>>> might be able to say more if you can provide the kernel release
>>>>>>>>>> translations (see above).
>>>>>>>>>
>>>>>>>>> In Debian we hstill have the bug #1071562 open and one person notified
>>>>>>>>> mye offlist that it appears that the issue get more frequent since
>>>>>>>>> they updated on NFS client side from Ubuntu 20.04 to Debian bookworm
>>>>>>>>> with a 6.1.y based kernel).
>>>>>>>>>
>>>>>>>>> Some people around those issues, seem to claim that the change
>>>>>>>>> mentioned in
>>>>>>>>> https://lists.proxmox.com/pipermail/pve-devel/2024-July/064614.html
>>>>>>>>> would fix the issue, which is as well backchannel related.
>>>>>>>>>
>>>>>>>>> This is upstream: 6ddc9deacc13 ("SUNRPC: Fix backchannel reply,
>>>>>>>>> again"). While this commit fixes 57331a59ac0d ("NFSv4.1: Use the
>>>>>>>>> nfs_client's rpc timeouts for backchannel") this is not something
>>>>>>>>> which goes back to 6.1.y, could it be possible that hte backchannel
>>>>>>>>> refactoring and this final fix indeeds fixes the issue?
>>>>>>>>>
>>>>>>>>> As people report it is not easily reproducible, so this makes it
>>>>>>>>> harder to identify fixes correctly.
>>>>>>>>>
>>>>>>>>> I gave a (short) stance on trying to backport commits up to
>>>>>>>>> 6ddc9deacc13 ("SUNRPC: Fix backchannel reply, again") but this quickly
>>>>>>>>> seems to indicate it is probably still not the right thing for
>>>>>>>>> backporting to the older stable series.
>>>>>>>>>
>>>>>>>>> As at least pre-requisites:
>>>>>>>>>
>>>>>>>>> 2009e32997ed568a305cf9bc7bf27d22e0f6ccda
>>>>>>>>> 4119bd0306652776cb0b7caa3aea5b2a93aecb89
>>>>>>>>> 163cdfca341b76c958567ae0966bd3575c5c6192
>>>>>>>>> f4afc8fead386c81fda2593ad6162271d26667f8
>>>>>>>>> 6ed8cdf967f7e9fc96cd1c129719ef99db2f9afc
>>>>>>>>> 57331a59ac0d680f606403eb24edd3c35aecba31
>>>>>>>>>
>>>>>>>>> and still there would be conflicting codepaths (and does not seem
>>>>>>>>> right).
>>>>>>>>>
>>>>>>>>> Chuck, Benjamin, Trond, is there anything we can provive on reporters
>>>>>>>>> side that we can try to tackle this issue better?
>>>>>>>>
>>>>>>>> As I've indicated before, NFSD should not block no matter what the
>>>>>>>> client may or may not be doing. I'd like to focus on the server first.
>>>>>>>>
>>>>>>>> What is the result of:
>>>>>>>>
>>>>>>>> $ cd <Bookworm's v6.1.90 kernel source >
>>>>>>>> $ unset KBUILD_OUTPUT
>>>>>>>> $ make -j `nproc`
>>>>>>>> $ scripts/faddr2line \
>>>>>>>> 	fs/nfsd/nfs4state.o \
>>>>>>>> 	nfsd4_destroy_session+0x16d
>>>>>>>>
>>>>>>>> Since this issue appeared after v6.1.1, is it possible to bisect
>>>>>>>> between v6.1.1 and v6.1.90 ?
>>>>>>>
>>>>>>> First please note, at least speaking of triggering the issue in
>>>>>>> Debian, Debian has moved to 6.1.119 based kernel already (and soon in
>>>>>>> the weekends point release move to 6.1.123).
>>>>>>>
>>>>>>> That said, one of the users which regularly seems to be hit by the
>>>>>>> issue was able to provide the above requested information, based for
>>>>>>> 6.1.119:
>>>>>>>
>>>>>>> ~/kernel/linux-source-6.1# make kernelversion
>>>>>>> 6.1.119
>>>>>>> ~/kernel/linux-source-6.1# scripts/faddr2line fs/nfsd/nfs4state.o nfsd4_destroy_session+0x16d
>>>>>>> nfsd4_destroy_session+0x16d/0x250:
>>>>>>> __list_del_entry at /root/kernel/linux-source-6.1/./include/linux/list.h:134
>>>>>>> (inlined by) list_del at /root/kernel/linux-source-6.1/./include/linux/list.h:148
>>>>>>> (inlined by) unhash_session at /root/kernel/linux-source-6.1/fs/nfsd/nfs4state.c:2062
>>>>>>> (inlined by) nfsd4_destroy_session at /root/kernel/linux-source-6.1/fs/nfsd/nfs4state.c:3856
>>>>>>>
>>>>>>> They could provide as well a decode_stacktrace output for the recent
>>>>>>> hit (if that is helpful for you):
>>>>>>>
>>>>>>> [Mon Jan 6 13:25:28 2025] INFO: task nfsd:55306 blocked for more than 6883 seconds.
>>>>>>> [Mon Jan 6 13:25:28 2025]       Not tainted 6.1.0-28-amd64 #1 Debian 6.1.119-1
>>>>>>> [Mon Jan 6 13:25:28 2025] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>>>> [Mon Jan 6 13:25:28 2025] task:nfsd            state:D stack:0     pid:55306 ppid:2      flags:0x00004000
>>>>>>> [Mon Jan 6 13:25:28 2025] Call Trace:
>>>>>>> [Mon Jan 6 13:25:28 2025]  <TASK>
>>>>>>> [Mon Jan 6 13:25:28 2025] __schedule+0x34d/0x9e0
>>>>>>> [Mon Jan 6 13:25:28 2025] schedule+0x5a/0xd0
>>>>>>> [Mon Jan 6 13:25:28 2025] schedule_timeout+0x118/0x150
>>>>>>> [Mon Jan 6 13:25:28 2025] wait_for_completion+0x86/0x160
>>>>>>> [Mon Jan 6 13:25:28 2025] __flush_workqueue+0x152/0x420
>>>>>>> [Mon Jan 6 13:25:28 2025] nfsd4_destroy_session (debian/build/build_amd64_none_amd64/include/linux/spinlock.h:351 debian/build/build_amd64_none_amd64/fs/nfsd/nfs4state.c:3861) nfsd
>>>>>>> [Mon Jan 6 13:25:28 2025] nfsd4_proc_compound (debian/build/build_amd64_none_amd64/fs/nfsd/nfs4proc.c:2680) nfsd
>>>>>>> [Mon Jan 6 13:25:28 2025] nfsd_dispatch (debian/build/build_amd64_none_amd64/fs/nfsd/nfssvc.c:1022) nfsd
>>>>>>> [Mon Jan 6 13:25:28 2025] svc_process_common (debian/build/build_amd64_none_amd64/net/sunrpc/svc.c:1344) sunrpc
>>>>>>> [Mon Jan 6 13:25:28 2025] ? svc_recv (debian/build/build_amd64_none_amd64/net/sunrpc/svc_xprt.c:897) sunrpc
>>>>>>> [Mon Jan 6 13:25:28 2025] ? nfsd_svc (debian/build/build_amd64_none_amd64/fs/nfsd/nfssvc.c:983) nfsd
>>>>>>> [Mon Jan 6 13:25:28 2025] ? nfsd_inet6addr_event (debian/build/build_amd64_none_amd64/fs/nfsd/nfssvc.c:922) nfsd
>>>>>>> [Mon Jan 6 13:25:28 2025] svc_process (debian/build/build_amd64_none_amd64/net/sunrpc/svc.c:1474) sunrpc
>>>>>>> [Mon Jan 6 13:25:28 2025] nfsd (debian/build/build_amd64_none_amd64/fs/nfsd/nfssvc.c:960) nfsd
>>>>>>> [Mon Jan 6 13:25:28 2025] kthread+0xd7/0x100
>>>>>>> [Mon Jan 6 13:25:28 2025] ? kthread_complete_and_exit+0x20/0x20
>>>>>>> [Mon Jan 6 13:25:28 2025] ret_from_fork+0x1f/0x30
>>>>>>> [Mon Jan  6 13:25:28 2025]  </TASK>
>>>>>>>
>>>>>>> The question about bisection is actually harder, those are production
>>>>>>> systems and I understand it's not possible to do bisect there, while
>>>>>>> unfortunately not reprodcing the issue on "lab conditions".
>>>>>>>
>>>>>>> Does the above give us still a clue on what you were looking for?
>>>>>>
>>>>>> Thanks for the update.
>>>>>>
>>>>>> It's possible that 38f080f3cd19 ("NFSD: Move callback_wq into struct
>>>>>> nfs4_client"), while not a specific fix for this issue, might offer some
>>>>>> relief by preventing the DESTROY_SESSION hang from affecting all other
>>>>>> clients of the degraded server.
>>>>>>
>>>>>> Not having a reproducer or the ability to bisect puts a damper on
>>>>>> things. The next step, then, is to enable tracing on servers where this
>>>>>> issue can come up, and wait for the hang to occur. The following command
>>>>>> captures information only about callback operation, so it should not
>>>>>> generate much data or impact server performance.
>>>>>>
>>>>>>      # trace-cmd record -e nfsd:nfsd_cb\*
>>>>>>
>>>>>> Let that run until the problem occurs, then ^C, compress the resulting
>>>>>> trace.dat file, and either attach it to 1071562 or email it to me
>>>>>> privately.
>>>>> thanks for the follow-up.
>>>>>
>>>>> I am the "customer" with two affected file servers. We have since moved those
>>>>> servers to the backports kernel (6.11.10) in the hope of forward fixing the
>>>>> issue. If this kernel is stable, I'm afraid I can't go back to the 'bad'
>>>>> kernel (700+ researchers affected..), and we're also not able to trigger the
>>>>> issue on our test environment.
>>>>
>>>> Hello Dr. Herzog -
>>>>
>>>> If the problem recurs on 6.11, the trace-cmd I suggest above works
>>>> there as well.
>>> the bad news is: it just happened again with the bpo kernel.
>>>
>>> We then immediately started trace-cmd since usually there are several call
>>> traces in a row and we did get a trace.dat:
>>> http://people.phys.ethz.ch/~daduke/trace.dat
>>
>> I'd prefer to have the trace running from before the first occurrence
>> of the problem. Even better would be to get it started before the
>> first client mounts the server.
>>
>> But I will have a look at your trace soon.
> before you invest significant time in the trace: there is a chance it's a
> false alarm: we do see nfsd in D, but they're not spreading like they used to.
> And since someone is pulling 400 Mb/s from that server via globus gridftp,
> it might just be real IO overload. We're still trying to figure out what we're
> dealing with...

6.11 is going to behave a little differently, IIRC. It should allow
other clients to continue working normally after one client hits this
issue.


>> It would be extremely helpful if we could reproduce this problem in
>> our own labs.
> easy: just hire 700 physicists and let 'em loose

Yeah, that's the problem. I ain't got 700 physicists in my back
pocket who would put up with NFS server outages ;-)

Jeff and I have discussed a potential pynfs-based reproducer,
but no guarantee it's the same issue as yours.


-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: nfsd blocks indefinitely in nfsd4_destroy_session
@ 2025-01-14 20:31 Baptiste PELLEGRIN
  0 siblings, 0 replies; 31+ messages in thread
From: Baptiste PELLEGRIN @ 2025-01-14 20:31 UTC (permalink / raw)
  To: Chuck Lever, Jeff Layton
  Cc: Salvatore Bonaccorso, Benjamin Coddington, Trond Myklebust,
	Harald Dunkel, Linux NFS Mailing List, Martin Svec,
	Michael Gernoth, Christian Herzog

[-- Attachment #1: Type: text/plain, Size: 15145 bytes --]

On 09/01/25 17:32, Chuck Lever <chuck.lever@oracle.com> wrote:
 > On 1/9/25 7:42 AM, Jeff Layton wrote:
 > >On Thu, 2025-01-09 at 12:56 +0100, Christian Herzog wrote:
 > >>Dear Chuck,
 > >>
 > >>On Wed, Jan 08, 2025 at 10:07:49AM -0500, Chuck Lever wrote:
 > >>>On 1/8/25 9:54 AM, Christian Herzog wrote:
 > >>>>Dear Chuck,
 > >>>>
 > >>>>On Wed, Jan 08, 2025 at 08:33:23AM -0500, Chuck Lever wrote:
 > >>>>>On 1/7/25 4:17 PM, Salvatore Bonaccorso wrote:
 > >>>>>>Hi Chuck,
 > >>>>>>
 > >>>>>>Thanks for your time on this, much appreciated.
 > >>>>>>
 > >>>>>>On Wed, Jan 01, 2025 at 02:24:50PM -0500, Chuck Lever wrote:
 > >>>>>>>On 12/25/24 4:15 AM, Salvatore Bonaccorso wrote:
 > >>>>>>>>Hi Chuck, hi all,
 > >>>>>>>>
 > >>>>>>>>[it was not ideal to pick one of the message for this 
followup, let me
 > >>>>>>>>know if you want a complete new thread, adding as well 
Benjamin and
 > >>>>>>>>Trond as they are involved in one mentioned patch]
 > >>>>>>>>
 > >>>>>>>>On Mon, Jun 17, 2024 at 02:31:54PM +0000, Chuck Lever III wrote:
 > >>>>>>>>>
 > >>>>>>>>>
 > >>>>>>>>>>On Jun 17, 2024, at 2:55 AM, Harald Dunkel 
<harald.dunkel@aixigo.com> wrote:
 > >>>>>>>>>>
 > >>>>>>>>>>Hi folks,
 > >>>>>>>>>>
 > >>>>>>>>>>what would be the reason for nfsd getting stuck somehow and 
becoming
 > >>>>>>>>>>an unkillable process? See
 > >>>>>>>>>>
 > >>>>>>>>>>- https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1071562
 > >>>>>>>>>>- 
https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/2062568
 > >>>>>>>>>>
 > >>>>>>>>>>Doesn't this mean that something inside the kernel gets 
stuck as
 > >>>>>>>>>>well? Seems odd to me.
 > >>>>>>>>>
 > >>>>>>>>>I'm not familiar with the Debian or Ubuntu kernel packages. Can
 > >>>>>>>>>the kernel release numbers be translated to LTS kernel releases
 > >>>>>>>>>please? Need both "last known working" and "first broken" 
releases.
 > >>>>>>>>>
 > >>>>>>>>>This:
 > >>>>>>>>>
 > >>>>>>>>>[ 6596.911785] RPC: Could not send backchannel reply error: -110
 > >>>>>>>>>[ 6596.972490] RPC: Could not send backchannel reply error: -110
 > >>>>>>>>>[ 6837.281307] RPC: Could not send backchannel reply error: -110
 > >>>>>>>>>
 > >>>>>>>>>is a known set of client backchannel bugs. Knowing the LTS 
kernel
 > >>>>>>>>>releases (see above) will help us figure out what needs to be
 > >>>>>>>>>backported to the LTS kernels kernels in question.
 > >>>>>>>>>
 > >>>>>>>>>This:
 > >>>>>>>>>
 > >>>>>>>>>[11183.290619] wait_for_completion+0x88/0x150
 > >>>>>>>>>[11183.290623] __flush_workqueue+0x140/0x3e0
 > >>>>>>>>>[11183.290629] nfsd4_probe_callback_sync+0x1a/0x30 [nfsd]
 > >>>>>>>>>[11183.290689] nfsd4_destroy_session+0x186/0x260 [nfsd]
 > >>>>>>>>>
 > >>>>>>>>>is probably related to the backchannel errors on the client, but
 > >>>>>>>>>client bugs shouldn't cause the server to hang like this. We
 > >>>>>>>>>might be able to say more if you can provide the kernel release
 > >>>>>>>>>translations (see above).
 > >>>>>>>>
 > >>>>>>>>In Debian we hstill have the bug #1071562 open and one person 
notified
 > >>>>>>>>mye offlist that it appears that the issue get more frequent 
since
 > >>>>>>>>they updated on NFS client side from Ubuntu 20.04 to Debian 
bookworm
 > >>>>>>>>with a 6.1.y based kernel).
 > >>>>>>>>
 > >>>>>>>>Some people around those issues, seem to claim that the change
 > >>>>>>>>mentioned in
 > 
 >>>>>>>>https://lists.proxmox.com/pipermail/pve-devel/2024-July/064614.html
 > >>>>>>>>would fix the issue, which is as well backchannel related.
 > >>>>>>>>
 > >>>>>>>>This is upstream: 6ddc9deacc13 ("SUNRPC: Fix backchannel reply,
 > >>>>>>>>again"). While this commit fixes 57331a59ac0d ("NFSv4.1: Use the
 > >>>>>>>>nfs_client's rpc timeouts for backchannel") this is not something
 > >>>>>>>>which goes back to 6.1.y, could it be possible that hte 
backchannel
 > >>>>>>>>refactoring and this final fix indeeds fixes the issue?
 > >>>>>>>>
 > >>>>>>>>As people report it is not easily reproducible, so this makes it
 > >>>>>>>>harder to identify fixes correctly.
 > >>>>>>>>
 > >>>>>>>>I gave a (short) stance on trying to backport commits up to
 > >>>>>>>>6ddc9deacc13 ("SUNRPC: Fix backchannel reply, again") but 
this quickly
 > >>>>>>>>seems to indicate it is probably still not the right thing for
 > >>>>>>>>backporting to the older stable series.
 > >>>>>>>>
 > >>>>>>>>As at least pre-requisites:
 > >>>>>>>>
 > >>>>>>>>2009e32997ed568a305cf9bc7bf27d22e0f6ccda
 > >>>>>>>>4119bd0306652776cb0b7caa3aea5b2a93aecb89
 > >>>>>>>>163cdfca341b76c958567ae0966bd3575c5c6192
 > >>>>>>>>f4afc8fead386c81fda2593ad6162271d26667f8
 > >>>>>>>>6ed8cdf967f7e9fc96cd1c129719ef99db2f9afc
 > >>>>>>>>57331a59ac0d680f606403eb24edd3c35aecba31
 > >>>>>>>>
 > >>>>>>>>and still there would be conflicting codepaths (and does not seem
 > >>>>>>>>right).
 > >>>>>>>>
 > >>>>>>>>Chuck, Benjamin, Trond, is there anything we can provive on 
reporters
 > >>>>>>>>side that we can try to tackle this issue better?
 > >>>>>>>
 > >>>>>>>As I've indicated before, NFSD should not block no matter what the
 > >>>>>>>client may or may not be doing. I'd like to focus on the 
server first.
 > >>>>>>>
 > >>>>>>>What is the result of:
 > >>>>>>>
 > >>>>>>>$ cd <Bookworm's v6.1.90 kernel source >
 > >>>>>>>$ unset KBUILD_OUTPUT
 > >>>>>>>$ make -j `nproc`
 > >>>>>>>$ scripts/faddr2line \
 > >>>>>>> fs/nfsd/nfs4state.o \
 > >>>>>>> nfsd4_destroy_session+0x16d
 > >>>>>>>
 > >>>>>>>Since this issue appeared after v6.1.1, is it possible to bisect
 > >>>>>>>between v6.1.1 and v6.1.90 ?
 > >>>>>>
 > >>>>>>First please note, at least speaking of triggering the issue in
 > >>>>>>Debian, Debian has moved to 6.1.119 based kernel already (and 
soon in
 > >>>>>>the weekends point release move to 6.1.123).
 > >>>>>>
 > >>>>>>That said, one of the users which regularly seems to be hit by the
 > >>>>>>issue was able to provide the above requested information, 
based for
 > >>>>>>6.1.119:
 > >>>>>>
 > >>>>>>~/kernel/linux-source-6.1# make kernelversion
 > >>>>>>6.1.119
 > >>>>>>~/kernel/linux-source-6.1# scripts/faddr2line 
fs/nfsd/nfs4state.o nfsd4_destroy_session+0x16d
 > >>>>>>nfsd4_destroy_session+0x16d/0x250:
 > >>>>>>__list_del_entry at 
/root/kernel/linux-source-6.1/./include/linux/list.h:134
 > >>>>>>(inlined by) list_del at 
/root/kernel/linux-source-6.1/./include/linux/list.h:148
 > >>>>>>(inlined by) unhash_session at 
/root/kernel/linux-source-6.1/fs/nfsd/nfs4state.c:2062
 > >>>>>>(inlined by) nfsd4_destroy_session at 
/root/kernel/linux-source-6.1/fs/nfsd/nfs4state.c:3856
 > >>>>>>
 > >>>>>>They could provide as well a decode_stacktrace output for the 
recent
 > >>>>>>hit (if that is helpful for you):
 > >>>>>>
 > >>>>>>[Mon Jan 6 13:25:28 2025] INFO: task nfsd:55306 blocked for 
more than 6883 seconds.
 > >>>>>>[Mon Jan 6 13:25:28 2025]       Not tainted 6.1.0-28-amd64 #1 
Debian 6.1.119-1
 > >>>>>>[Mon Jan 6 13:25:28 2025] "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
 > >>>>>>[Mon Jan 6 13:25:28 2025] task:nfsd            state:D stack:0 
    pid:55306 ppid:2      flags:0x00004000
 > >>>>>>[Mon Jan 6 13:25:28 2025] Call Trace:
 > >>>>>>[Mon Jan 6 13:25:28 2025]  <TASK>
 > >>>>>>[Mon Jan 6 13:25:28 2025] __schedule+0x34d/0x9e0
 > >>>>>>[Mon Jan 6 13:25:28 2025] schedule+0x5a/0xd0
 > >>>>>>[Mon Jan 6 13:25:28 2025] schedule_timeout+0x118/0x150
 > >>>>>>[Mon Jan 6 13:25:28 2025] wait_for_completion+0x86/0x160
 > >>>>>>[Mon Jan 6 13:25:28 2025] __flush_workqueue+0x152/0x420
 > >>>>>>[Mon Jan 6 13:25:28 2025] nfsd4_destroy_session 
(debian/build/build_amd64_none_amd64/include/linux/spinlock.h:351 
debian/build/build_amd64_none_amd64/fs/nfsd/nfs4state.c:3861) nfsd
 > >>>>>>[Mon Jan 6 13:25:28 2025] nfsd4_proc_compound 
(debian/build/build_amd64_none_amd64/fs/nfsd/nfs4proc.c:2680) nfsd
 > >>>>>>[Mon Jan 6 13:25:28 2025] nfsd_dispatch 
(debian/build/build_amd64_none_amd64/fs/nfsd/nfssvc.c:1022) nfsd
 > >>>>>>[Mon Jan 6 13:25:28 2025] svc_process_common 
(debian/build/build_amd64_none_amd64/net/sunrpc/svc.c:1344) sunrpc
 > >>>>>>[Mon Jan 6 13:25:28 2025] ? svc_recv 
(debian/build/build_amd64_none_amd64/net/sunrpc/svc_xprt.c:897) sunrpc
 > >>>>>>[Mon Jan 6 13:25:28 2025] ? nfsd_svc 
(debian/build/build_amd64_none_amd64/fs/nfsd/nfssvc.c:983) nfsd
 > >>>>>>[Mon Jan 6 13:25:28 2025] ? nfsd_inet6addr_event 
(debian/build/build_amd64_none_amd64/fs/nfsd/nfssvc.c:922) nfsd
 > >>>>>>[Mon Jan 6 13:25:28 2025] svc_process 
(debian/build/build_amd64_none_amd64/net/sunrpc/svc.c:1474) sunrpc
 > >>>>>>[Mon Jan 6 13:25:28 2025] nfsd 
(debian/build/build_amd64_none_amd64/fs/nfsd/nfssvc.c:960) nfsd
 > >>>>>>[Mon Jan 6 13:25:28 2025] kthread+0xd7/0x100
 > >>>>>>[Mon Jan 6 13:25:28 2025] ? kthread_complete_and_exit+0x20/0x20
 > >>>>>>[Mon Jan 6 13:25:28 2025] ret_from_fork+0x1f/0x30
 > >>>>>>[Mon Jan  6 13:25:28 2025]  </TASK>
 > >>>>>>
 > >>>>>>The question about bisection is actually harder, those are 
production
 > >>>>>>systems and I understand it's not possible to do bisect there, 
while
 > >>>>>>unfortunately not reprodcing the issue on "lab conditions".
 > >>>>>>
 > >>>>>>Does the above give us still a clue on what you were looking for?
 > >>>>>
 > >>>>>Thanks for the update.
 > >>>>>
 > >>>>>It's possible that 38f080f3cd19 ("NFSD: Move callback_wq into struct
 > >>>>>nfs4_client"), while not a specific fix for this issue, might 
offer some
 > >>>>>relief by preventing the DESTROY_SESSION hang from affecting all 
other
 > >>>>>clients of the degraded server.
 > >>>>>
 > >>>>>Not having a reproducer or the ability to bisect puts a damper on
 > >>>>>things. The next step, then, is to enable tracing on servers 
where this
 > >>>>>issue can come up, and wait for the hang to occur. The following 
command
 > >>>>>captures information only about callback operation, so it should not
 > >>>>>generate much data or impact server performance.
 > >>>>>
 > >>>>>    # trace-cmd record -e nfsd:nfsd_cb\*
 > >>>>>
 > >>>>>Let that run until the problem occurs, then ^C, compress the 
resulting
 > >>>>>trace.dat file, and either attach it to 1071562 or email it to me
 > >>>>>privately.
 > >>>>thanks for the follow-up.
 > >>>>
 > >>>>I am the "customer" with two affected file servers. We have since 
moved those
 > >>>>servers to the backports kernel (6.11.10) in the hope of forward 
fixing the
 > >>>>issue. If this kernel is stable, I'm afraid I can't go back to 
the 'bad'
 > >>>>kernel (700+ researchers affected..), and we're also not able to 
trigger the
 > >>>>issue on our test environment.
 > >>>
 > >>>Hello Dr. Herzog -
 > >>>
 > >>>If the problem recurs on 6.11, the trace-cmd I suggest above works
 > >>>there as well.
 > >>the bad news is: it just happened again with the bpo kernel.
 > >>
 > >>We then immediately started trace-cmd since usually there are 
several call
 > >>traces in a row and we did get a trace.dat:
 > >>http://people.phys.ethz.ch/~daduke/trace.dat
 > >>
 > >>we did notice however that the stack trace looked a bit different 
this time:
 > >>
 > >>     INFO: task nfsd:2566 blocked for more than 5799 seconds.
 > >>     Tainted: G        W          6.11.10+bpo-amd64 #1 Debia>
 > >>     "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables t>
 > >>     task:nfsd            state:D stack:0     pid:2566  tgid:2566 >
 > >>     Call Trace:
 > >>     <TASK>
 > >>     __schedule+0x400/0xad0
 > >>     schedule+0x27/0xf0
 > >>     nfsd4_shutdown_callback+0xfe/0x140 [nfsd]
 > >>     ? __pfx_var_wake_function+0x10/0x10
 > >>     __destroy_client+0x1f0/0x290 [nfsd]
 > >>     nfsd4_destroy_clientid+0xf1/0x1e0 [nfsd]
 > >>     ? svcauth_unix_set_client+0x586/0x5f0 [sunrpc]
 > >>     nfsd4_proc_compound+0x34d/0x670 [nfsd]
 > >>     nfsd_dispatch+0xfd/0x220 [nfsd]
 > >>     svc_process_common+0x2f7/0x700 [sunrpc]
 > >>     ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
 > >>     svc_process+0x131/0x180 [sunrpc]
 > >>     svc_recv+0x830/0xa10 [sunrpc]
 > >>     ? __pfx_nfsd+0x10/0x10 [nfsd]
 > >>     nfsd+0x87/0xf0 [nfsd]
 > >>     kthread+0xcf/0x100
 > >>     ? __pfx_kthread+0x10/0x10
 > >>     ret_from_fork+0x31/0x50
 > >>     ? __pfx_kthread+0x10/0x10
 > >>     ret_from_fork_asm+0x1a/0x30
 > >>     </TASK>
 > >>
 > >>and also the state of the offending client in 
`/proc/fs/nfsd/clients/*/info`
 > >>used to be callback state: UNKNOWN while now it is DOWN or FAULT. 
No idea
 > >>what it means, but I thought it was worth mentioning.
 > >>
 > >
 > >Looks like this is hung in nfsd41_cb_inflight_wait_complete() ? That
 > >probably means that the cl_cb_inflight counter is stuck at >0. I'm
 > >guessing that means that there is some callback that it's expecting to
 > >complete that isn't. From nfsd4_shutdown_callback():
 > >
 > >         /*
 > >          * Note this won't actually result in a null callback;
 > >          * instead, nfsd4_run_cb_null() will detect the killed
 > >          * client, destroy the rpc client, and stop:
 > >          */
 > >         nfsd4_run_cb(&clp->cl_cb_null);
 > >         flush_workqueue(clp->cl_callback_wq);
 > >         nfsd41_cb_inflight_wait_complete(clp);
 > >
 > >...it sounds like that isn't happening properly though.
 > >
 > >It might be interesting to see if you can track down the callback
 > >client in /sys/kernel/debug/sunrpc and see what it's doing.
 >
 > Here's a possible scenario:
 >
 > The trace.dat shows the server is sending a lot of CB_RECALL_ANY
 > operations.
 >
 > Normally a callback occurs only due to some particular client request.
 > Such requests would be completed before a client unmounts.
 >
 > CB_RECALL_ANY is an operation which does not occur due to a particular
 > client operation and can occur at any time. I believe this is the
 > first operation of this type we've added to NFSD.
 >
 > My guess is that the server's laundromat has sent some CB_RECALL_ANY
 > operations while the client is umounting -- DESTROY_SESSION is
 > racing with those callback operations.
 >
 > deleg_reaper() was backported to 6.1.y in 6.1.81, which lines up
 > more or less with when the issues started to show up.
 >
 > A quick way to test this theory would be to make deleg_reaper() a
 > no-op. If this helps, then we know that shutting down the server's
 > callback client needs to be more careful about cleaning up outstanding
 > callbacks.
 >
 >
 > --
 > Chuck Lever

Hello !

Got It !

Strangely today my two NFS servers hangs "almost" simultaneously :

One the Jan 14 at 09:14
One the Jan 14 at 10:14

The kernel record is in attachment with my syslog.

Here some additional info :

I my case, during the high school working time, the crash occur the same 
day every week, but the hour vary. After a vacation period the day of 
the week change but stay the same during the working period.

On September the crash was every Monday.
On December the crash was every Friday.
And now the crash occur every Tuesday.

So I suspect that some log rotation or automated Weekly job on the 
client side produce the problem. I can't find anything on the server 
side checking systemd timers.

Rebooting the server the morning before the "planned" crash avoid 
generally the hang... very strange.

Thanks a lot !

Regards,

Baptiste.

[-- Attachment #2: record_nfsdc_20250114.tar.gz --]
[-- Type: application/gzip, Size: 2196981 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2025-01-17 20:28 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-17  6:55 nfsd becomes a zombie Harald Dunkel
2024-06-17 14:31 ` Chuck Lever III
2024-06-17 19:20   ` Calum Mackay
2024-06-18 14:29     ` Harald Dunkel
2024-06-18 14:32       ` Harald Dunkel
2024-06-18 14:52       ` Chuck Lever
2024-06-19  7:32         ` Harald Dunkel
2024-06-19  7:56           ` Harald Dunkel
2024-06-19 13:14           ` Chuck Lever III
2024-06-20  5:29             ` Harald Dunkel
2024-06-20 19:09               ` Chuck Lever III
2024-07-02 17:25             ` Harald Dunkel
2024-07-02 18:17               ` Chuck Lever III
2024-07-03  4:14                 ` Harald Dunkel
2024-12-25  9:15   ` nfsd blocks indefinitely in nfsd4_destroy_session (was: Re: nfsd becomes a zombie) Salvatore Bonaccorso
2025-01-01 19:24     ` nfsd blocks indefinitely in nfsd4_destroy_session Chuck Lever
2025-01-07 21:17       ` Salvatore Bonaccorso
2025-01-08 13:33         ` Chuck Lever
2025-01-08 14:54           ` Christian Herzog
2025-01-08 15:07             ` Chuck Lever
2025-01-09 11:56               ` Christian Herzog
2025-01-09 12:42                 ` Jeff Layton
2025-01-09 13:56                   ` Chuck Lever
2025-01-09 16:32                   ` Chuck Lever
     [not found]                     ` <f0705a65549ef253.67823675@ac-grenoble.fr>
2025-01-16 20:07                       ` Chuck Lever
2025-01-17 19:43                         ` Baptiste PELLEGRIN
2025-01-17 20:27                           ` Chuck Lever
2025-01-09 15:49                 ` Chuck Lever
2025-01-09 15:58                   ` Christian Herzog
2025-01-09 16:09                     ` Chuck Lever
  -- strict thread matches above, loose matches on Subject: below --
2025-01-14 20:31 Baptiste PELLEGRIN

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox