* Re: nfsd stuck in D (disk sleep) state
2024-10-28 12:46 ` Tom Talpey
@ 2024-10-28 13:32 ` Chuck Lever III
2024-10-28 17:01 ` Benoît Gschwind
2024-10-28 17:27 ` Jeff Layton
2 siblings, 0 replies; 12+ messages in thread
From: Chuck Lever III @ 2024-10-28 13:32 UTC (permalink / raw)
To: Tom Talpey; +Cc: Benoît Gschwind, Linux NFS Mailing List
> On Oct 28, 2024, at 8:46 AM, Tom Talpey <tom@talpey.com> wrote:
>
> On 10/28/2024 5:18 AM, Benoît Gschwind wrote:
>> Hello,
>> The issue triggered again; I attached the result of:
>> # dmesg -W | tee dmesg.txt
>> using:
>> # echo t > /proc/sysrq-trigger
>> I have the following PIDs stuck:
>> 1474 D (disk sleep) 0:54:58.602 [nfsd]
>> 1475 D (disk sleep) 0:54:58.602 [nfsd]
>> 1484 D (disk sleep) 0:54:58.602 [nfsd]
>> 1495 D (disk sleep) 0:54:58.602 [nfsd]
>
> Hmm, 1495 is stuck in nfsd4_create_session
>
> > [427468.304955] task:nfsd state:D stack:0 pid:1495 ppid:2 flags:0x00004000
> > [427468.304962] Call Trace:
> > [427468.304965] <TASK>
> > [427468.304971] __schedule+0x34d/0x9e0
> > [427468.304983] schedule+0x5a/0xd0
> > [427468.304991] schedule_timeout+0x118/0x150
> > [427468.305003] wait_for_completion+0x86/0x160
> > [427468.305015] __flush_workqueue+0x152/0x420
> > [427468.305031] nfsd4_create_session+0x79f/0xba0 [nfsd]
> > [427468.305092] nfsd4_proc_compound+0x34c/0x660 [nfsd]
> > [427468.305147] nfsd_dispatch+0x1a1/0x2b0 [nfsd]
> > [427468.305199] svc_process_common+0x295/0x610 [sunrpc]
> > [427468.305269] ? svc_recv+0x491/0x810 [sunrpc]
> > [427468.305337] ? nfsd_svc+0x370/0x370 [nfsd]
> > [427468.305389] ? nfsd_shutdown_threads+0x90/0x90 [nfsd]
> > [427468.305437] svc_process+0xad/0x100 [sunrpc]
> > [427468.305505] nfsd+0x99/0x140 [nfsd]
> > [427468.305555] kthread+0xda/0x100
> > [427468.305562] ? kthread_complete_and_exit+0x20/0x20
> > [427468.305572] ret_from_fork+0x22/0x30
>
> and the other three are stuck in nfsd4_destroy_session
>
> > [427468.298315] task:nfsd state:D stack:0 pid:1474 ppid:2 flags:0x00004000
> > [427468.298322] Call Trace:
> > [427468.298326] <TASK>
> > [427468.298332] __schedule+0x34d/0x9e0
> > [427468.298343] schedule+0x5a/0xd0
> > [427468.298350] schedule_timeout+0x118/0x150
> > [427468.298362] wait_for_completion+0x86/0x160
> > [427468.298375] __flush_workqueue+0x152/0x420
> > [427468.298392] nfsd4_destroy_session+0x1b6/0x250 [nfsd]
> > [427468.298456] nfsd4_proc_compound+0x34c/0x660 [nfsd]
> > [427468.298515] nfsd_dispatch+0x1a1/0x2b0 [nfsd]
> > [427468.298568] svc_process_common+0x295/0x610 [sunrpc]
> > [427468.298643] ? svc_recv+0x491/0x810 [sunrpc]
> > [427468.298711] ? nfsd_svc+0x370/0x370 [nfsd]
> > [427468.298776] ? nfsd_shutdown_threads+0x90/0x90 [nfsd]
> > [427468.298825] svc_process+0xad/0x100 [sunrpc]
> > [427468.298896] nfsd+0x99/0x140 [nfsd]
> > [427468.298946] kthread+0xda/0x100
> > [427468.298954] ? kthread_complete_and_exit+0x20/0x20
> > [427468.298963] ret_from_fork+0x22/0x30
>
> There aren't a lot of 6.1-era changes in either of these, but there
> are some interesting behavior updates around session create replay
> from early this year. I wonder if the 6.1 server is mishandling an
> nfserr_jukebox situation in nfsd4_create_session.
For reference:
10396f4df8b7 ("nfsd: hold a lighter-weight client reference over CB_RECALL_ANY")
99dc2ef0397d ("NFSD: CREATE_SESSION must never cache NFS4ERR_DELAY replies")
e4469c6cc69b ("NFSD: Fix the NFSv4.1 CREATE_SESSION operation")
A network capture, once these processes are stuck, might
confirm CREATE_SESSION slot replay as a culprit.
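For example, such a capture could be taken with tcpdump (the interface and output file names below are illustrative). NFSv4.1 runs the forechannel and backchannel over the same TCP connection on port 2049, so a single port filter sees both directions:

```shell
# Illustrative sketch: build the capture command. Adjust the interface
# and output file for the server in question.
iface=eth0
pcap=nfsd-stuck.pcap
# -s 0: capture full frames so the CREATE_SESSION slot ID and sequence
# number fields can be decoded later in wireshark.
cmd="tcpdump -i $iface -s 0 -w $pcap tcp port 2049"
echo "$cmd"
```

Opening the resulting capture in wireshark while the nfsd threads are stuck should show whether the client keeps retransmitting CREATE_SESSION with the same slot and sequence number.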
> Was the client actually attempting to mount or unmount?
>
> Tom.
>
>> Thanks in advance,
>> Best regards
>> On Wednesday, 23 October 2024 at 19:38 +0000, Chuck Lever III wrote:
>>>
>>>
>>>> On Oct 23, 2024, at 3:27 PM, Benoît Gschwind
>>>> <benoit.gschwind@minesparis.psl.eu> wrote:
>>>>
>>>> Hello,
>>>>
>>>> I have an NFS server running Debian 11 (Linux hostname 6.1.0-25-amd64
>>>> #1 SMP PREEMPT_DYNAMIC Debian 6.1.106-3 (2024-08-26) x86_64 GNU/Linux).
>>>>
>>>> Under some heavy workloads, some nfsd threads go into the D state and
>>>> seem never to leave it. I wrote a Python script to monitor how long a
>>>> process stays in a particular state, and I use it to monitor the nfsd
>>>> threads. I get the following result:
>>>>
>>>> [...]
>>>> 178056 I (idle) 0:25:24.475 [nfsd]
>>>> 178057 I (idle) 0:25:24.475 [nfsd]
>>>> 178058 I (idle) 0:25:24.475 [nfsd]
>>>> 178059 I (idle) 0:25:24.475 [nfsd]
>>>> 178060 I (idle) 0:25:24.475 [nfsd]
>>>> 178061 I (idle) 0:25:24.475 [nfsd]
>>>> 178062 I (idle) 0:24:15.638 [nfsd]
>>>> 178063 I (idle) 0:24:13.488 [nfsd]
>>>> 178064 I (idle) 0:24:13.488 [nfsd]
>>>> 178065 I (idle) 0:00:00.000 [nfsd]
>>>> 178066 I (idle) 0:00:00.000 [nfsd]
>>>> 178067 I (idle) 0:00:00.000 [nfsd]
>>>> 178068 I (idle) 0:00:00.000 [nfsd]
>>>> 178069 S (sleeping) 0:00:02.147 [nfsd]
>>>> 178070 S (sleeping) 0:00:02.147 [nfsd]
>>>> 178071 S (sleeping) 0:00:02.147 [nfsd]
>>>> 178072 S (sleeping) 0:00:02.147 [nfsd]
>>>> 178073 S (sleeping) 0:00:02.147 [nfsd]
>>>> 178074 D (disk sleep) 1:29:25.809 [nfsd]
>>>> 178075 S (sleeping) 0:00:02.147 [nfsd]
>>>> 178076 S (sleeping) 0:00:02.147 [nfsd]
>>>> 178077 S (sleeping) 0:00:02.147 [nfsd]
>>>> 178078 S (sleeping) 0:00:02.147 [nfsd]
>>>> 178079 S (sleeping) 0:00:02.147 [nfsd]
>>>> 178080 D (disk sleep) 1:29:25.809 [nfsd]
>>>> 178081 D (disk sleep) 1:29:25.809 [nfsd]
>>>> 178082 D (disk sleep) 0:28:04.444 [nfsd]
>>>>
>>>> All processes not shown are in the idle state. The columns are:
>>>> PID, state code, state name, the amount of time the state has not
>>>> changed (and the process was not interrupted), and the
>>>> /proc/PID/status Name entry.
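A minimal sketch of such a state monitor (the names and structure here are illustrative, not Benoît's actual script) could read the state letter from /proc/<pid>/stat and track how long it has stayed unchanged:

```python
# Illustrative sketch of a per-PID state monitor: sample the state
# letter from /proc/<pid>/stat and report how long it has stayed put.
import time

def task_state(pid):
    """Return the one-letter state code (R, S, D, I, ...) for a PID."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The state field follows the parenthesised comm; rsplit guards
    # against comm values that themselves contain ')' or spaces.
    return data.rsplit(")", 1)[1].split()[0]

def watch(pids, samples=3, interval=1.0):
    """Print each PID's state and how long it has been unchanged."""
    last, since = {}, {}
    for _ in range(samples):
        now = time.monotonic()
        for pid in pids:
            st = task_state(pid)
            if last.get(pid) != st:
                last[pid], since[pid] = st, now
            print(pid, st, f"{now - since[pid]:.3f}s")
        time.sleep(interval)
```

A thread stuck in an uninterruptible sleep would show a steadily growing duration next to the "D" state, which is exactly the pattern in the listing above.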
>>>>
>>>> As you can see, some nfsd processes have been in the disk-sleep
>>>> state for more than an hour, but looking at the disk activity,
>>>> there is almost no I/O.
>>>>
>>>> I tried to restart nfs-server, but I got the following errors from
>>>> the kernel:
>>>>
>>>> oct. 23 11:59:49 hostname kernel: rpc-srv/tcp: nfsd: got error -104
>>>> when sending 20 bytes - shutting down socket
>>>> oct. 23 11:59:49 hostname kernel: rpc-srv/tcp: nfsd: got error -104
>>>> when sending 20 bytes - shutting down socket
>>>> oct. 23 11:59:49 hostname kernel: rpc-srv/tcp: nfsd: got error -104
>>>> when sending 20 bytes - shutting down socket
>>>> oct. 23 11:59:49 hostname kernel: rpc-srv/tcp: nfsd: got error -104
>>>> when sending 20 bytes - shutting down socket
>>>> oct. 23 11:59:49 hostname kernel: rpc-srv/tcp: nfsd: got error -104
>>>> when sending 20 bytes - shutting down socket
>>>> oct. 23 11:59:59 hostname kernel: rpc-srv/tcp: nfsd: got error -104
>>>> when sending 20 bytes - shutting down socket
>>>> oct. 23 11:59:59 hostname kernel: rpc-srv/tcp: nfsd: got error -104
>>>> when sending 20 bytes - shutting down socket
>>>> oct. 23 11:59:59 hostname kernel: rpc-srv/tcp: nfsd: got error -104
>>>> when sending 20 bytes - shutting down socket
>>>> oct. 23 11:59:59 hostname kernel: rpc-srv/tcp: nfsd: got error -104
>>>> when sending 20 bytes - shutting down socket
>>>> oct. 23 11:59:59 hostname kernel: rpc-srv/tcp: nfsd: got error -104
>>>> when sending 20 bytes - shutting down socket
>>>> oct. 23 12:00:09 hostname kernel: rpc-srv/tcp: nfsd: got error -104
>>>> when sending 20 bytes - shutting down socket
>>>> oct. 23 12:00:09 hostname kernel: rpc-srv/tcp: nfsd: got error -104
>>>> when sending 20 bytes - shutting down socket
>>>> oct. 23 12:00:09 hostname kernel: rpc-srv/tcp: nfsd: got error -104
>>>> when sending 20 bytes - shutting down socket
>>>> oct. 23 12:00:10 hostname kernel: rpc-srv/tcp: nfsd: got error -32
>>>> when sending 20 bytes - shutting down socket
>>>> oct. 23 12:00:10 hostname kernel: rpc-srv/tcp: nfsd: got error -32
>>>> when sending 20 bytes - shutting down socket
>>>>
>>>> The only way to recover seems to be rebooting. I guess that works
>>>> because the kernel forces the reboot after a given timeout.
>>>>
>>>> My setup involves, in order:
>>>> - the SCSI driver
>>>> - mdraid (RAID 6) on top of SCSI
>>>> - btrfs on top of mdraid
>>>> - nfsd on top of btrfs
>>>>
>>>>
>>>> As expected, the setup is not very fast, but it seems that in some
>>>> situations nfsd never leaves the disk-sleep state. The export options
>>>> are: gss/krb5i(rw,sync,no_wdelay,no_subtree_check,fsid=XXXXX). The
>>>> situation is not common, but it always happens at some point. For
>>>> instance, in the case I report here, the server booted on 2024-10-01
>>>> and was stuck around 2024-10-23. Using no_wdelay greatly reduced the
>>>> frequency of the issue (I initially thought that option had solved
>>>> it).
>>>>
>>>> My guess: a hardware bug, a SCSI bug, a btrfs bug, or an nfsd bug?
>>>>
>>>> Any clue or advice on this topic is welcome.
>>>
>>> Generate stack traces for each process on the system
>>> using "echo t | sudo tee /proc/sysrq-trigger" and then
>>> examine the output in the system journal. Note the
>>> stack contents for the processes that look stuck.
>>>
>>> --
>>> Chuck Lever
>>>
>>>
>
--
Chuck Lever
^ permalink raw reply [flat|nested] 12+ messages in thread

* Re: nfsd stuck in D (disk sleep) state
2024-10-28 12:46 ` Tom Talpey
2024-10-28 13:32 ` Chuck Lever III
@ 2024-10-28 17:01 ` Benoît Gschwind
2024-10-28 17:38 ` Tom Talpey
2024-10-28 17:27 ` Jeff Layton
2 siblings, 1 reply; 12+ messages in thread
From: Benoît Gschwind @ 2024-10-28 17:01 UTC (permalink / raw)
To: Tom Talpey, Chuck Lever III; +Cc: Linux NFS Mailing List
On Monday, 28 October 2024 at 08:46 -0400, Tom Talpey wrote:
>
> Was the client actually attempting to mount or unmount?
The servers had been running for a while, so, as far as I know my setup,
there was no ongoing mount or unmount. Moreover, none of the clients were
rebooting, starting up, or shutting down.
>
> Tom.
>
>
--
Benoît Gschwind
* Re: nfsd stuck in D (disk sleep) state
2024-10-28 17:01 ` Benoît Gschwind
@ 2024-10-28 17:38 ` Tom Talpey
0 siblings, 0 replies; 12+ messages in thread
From: Tom Talpey @ 2024-10-28 17:38 UTC (permalink / raw)
To: Benoît Gschwind, Chuck Lever III; +Cc: Linux NFS Mailing List
On 10/28/2024 1:01 PM, Benoît Gschwind wrote:
> On Monday, 28 October 2024 at 08:46 -0400, Tom Talpey wrote:
>>
>> Was the client actually attempting to mount or unmount?
>
> The servers had been running for a while, so, as far as I know my setup,
> there was no ongoing mount or unmount. Moreover, none of the clients were
> rebooting, starting up, or shutting down.
Interesting. I would therefore suspect the create_session was simply a
reconnect after some kind of transport connection error/loss, and the
destroy_session callers are hung for the same reason. And I agree with
Chuck that a network trace will help identify the patch which is needed
here.
You may have to let the trace run for a minute or two to detect all the
retry traffic between the client and server threads.
Tom.
* Re: nfsd stuck in D (disk sleep) state
2024-10-28 12:46 ` Tom Talpey
2024-10-28 13:32 ` Chuck Lever III
2024-10-28 17:01 ` Benoît Gschwind
@ 2024-10-28 17:27 ` Jeff Layton
2024-10-29 13:49 ` Jeff Layton
2 siblings, 1 reply; 12+ messages in thread
From: Jeff Layton @ 2024-10-28 17:27 UTC (permalink / raw)
To: Tom Talpey, Benoît Gschwind, Chuck Lever III; +Cc: Linux NFS Mailing List
On Mon, 2024-10-28 at 08:46 -0400, Tom Talpey wrote:
> On 10/28/2024 5:18 AM, Benoît Gschwind wrote:
> > Hello,
> >
> > The issue triggered again; I attached the result of:
> >
> > # dmesg -W | tee dmesg.txt
> >
> > using:
> >
> > # echo t > /proc/sysrq-trigger
> >
> > I have the following PIDs stuck:
> >
> > 1474 D (disk sleep) 0:54:58.602 [nfsd]
> > 1475 D (disk sleep) 0:54:58.602 [nfsd]
> > 1484 D (disk sleep) 0:54:58.602 [nfsd]
> > 1495 D (disk sleep) 0:54:58.602 [nfsd]
>
> Hmm, 1495 is stuck in nfsd4_create_session
>
> > [427468.304955] task:nfsd state:D stack:0 pid:1495 ppid:2 flags:0x00004000
> > [427468.304962] Call Trace:
> > [427468.304965] <TASK>
> > [427468.304971] __schedule+0x34d/0x9e0
> > [427468.304983] schedule+0x5a/0xd0
> > [427468.304991] schedule_timeout+0x118/0x150
> > [427468.305003] wait_for_completion+0x86/0x160
> > [427468.305015] __flush_workqueue+0x152/0x420
> > [427468.305031] nfsd4_create_session+0x79f/0xba0 [nfsd]
> > [427468.305092] nfsd4_proc_compound+0x34c/0x660 [nfsd]
> > [427468.305147] nfsd_dispatch+0x1a1/0x2b0 [nfsd]
> > [427468.305199] svc_process_common+0x295/0x610 [sunrpc]
> > [427468.305269] ? svc_recv+0x491/0x810 [sunrpc]
> > [427468.305337] ? nfsd_svc+0x370/0x370 [nfsd]
> > [427468.305389] ? nfsd_shutdown_threads+0x90/0x90 [nfsd]
> > [427468.305437] svc_process+0xad/0x100 [sunrpc]
> > [427468.305505] nfsd+0x99/0x140 [nfsd]
> > [427468.305555] kthread+0xda/0x100
> > [427468.305562] ? kthread_complete_and_exit+0x20/0x20
> > [427468.305572] ret_from_fork+0x22/0x30
>
> and the other three are stuck in nfsd4_destroy_session
>
All 4 processes are stuck waiting on a flush_workqueue call. It's
probably one of these:
flush_workqueue(clp->cl_callback_wq);
So the question here is really -- why are the callback workqueue jobs
stuck? I do see this:
[427468.316839] Workqueue: nfsd4_callbacks nfsd4_run_cb_work [nfsd]
[427468.316899] Call Trace:
[427468.316902] <TASK>
[427468.316908] __schedule+0x34d/0x9e0
[427468.316919] schedule+0x5a/0xd0
[427468.316927] schedule_timeout+0x94/0x150
[427468.316937] ? __bpf_trace_tick_stop+0x10/0x10
[427468.316947] rpc_shutdown_client+0xf2/0x150 [sunrpc]
[427468.317015] ? cpuusage_read+0x10/0x10
[427468.317026] nfsd4_process_cb_update+0x4c/0x270 [nfsd]
[427468.317097] nfsd4_run_cb_work+0x9f/0x150 [nfsd]
[427468.317146] process_one_work+0x1c7/0x380
[427468.317158] worker_thread+0x4d/0x380
[427468.317170] ? rescuer_thread+0x3a0/0x3a0
[427468.317177] kthread+0xda/0x100
[427468.317185] ? kthread_complete_and_exit+0x20/0x20
[427468.317195] ret_from_fork+0x22/0x30
[427468.317213] </TASK>
Maybe the RPC client is having trouble clearing clnt->cl_tasks?
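The shape of that hang can be shown in miniature with a userspace analogue (plain Python threads, not the kernel code): flushing a workqueue cannot complete while a queued work item is itself blocked on something that never finishes — here, the callback RPC tasks exiting:

```python
import threading

# Stands in for "all callback RPC tasks have exited" (cl_tasks empty).
tasks_dead = threading.Event()

def cb_work():
    # Analogue of nfsd4_run_cb_work -> nfsd4_process_cb_update ->
    # rpc_shutdown_client: block until the RPC tasks are gone.
    tasks_dead.wait()

# Analogue of queueing work on clp->cl_callback_wq.
worker = threading.Thread(target=cb_work)
worker.start()

# Analogue of nfsd4_create_session -> flush_workqueue(): it cannot
# return until cb_work does, so while tasks_dead is never set this
# "flush" hangs, just like the nfsd threads in the traces above.
worker.join(timeout=0.1)
print("flush completed:", not worker.is_alive())

tasks_dead.set()   # once the RPC tasks die, everything unwedges
worker.join()
print("flush completed:", not worker.is_alive())
```

The first join times out with the worker still alive; only after the event fires does the "flush" complete, which is why every nfsd thread that calls flush_workqueue() piles up behind the stuck callback work item.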
> > [427468.298315] task:nfsd state:D stack:0 pid:1474 ppid:2 flags:0x00004000
> > [427468.298322] Call Trace:
> > [427468.298326] <TASK>
> > [427468.298332] __schedule+0x34d/0x9e0
> > [427468.298343] schedule+0x5a/0xd0
> > [427468.298350] schedule_timeout+0x118/0x150
> > [427468.298362] wait_for_completion+0x86/0x160
> > [427468.298375] __flush_workqueue+0x152/0x420
> > [427468.298392] nfsd4_destroy_session+0x1b6/0x250 [nfsd]
> > [427468.298456] nfsd4_proc_compound+0x34c/0x660 [nfsd]
> > [427468.298515] nfsd_dispatch+0x1a1/0x2b0 [nfsd]
> > [427468.298568] svc_process_common+0x295/0x610 [sunrpc]
> > [427468.298643] ? svc_recv+0x491/0x810 [sunrpc]
> > [427468.298711] ? nfsd_svc+0x370/0x370 [nfsd]
> > [427468.298776] ? nfsd_shutdown_threads+0x90/0x90 [nfsd]
> > [427468.298825] svc_process+0xad/0x100 [sunrpc]
> > [427468.298896] nfsd+0x99/0x140 [nfsd]
> > [427468.298946] kthread+0xda/0x100
> > [427468.298954] ? kthread_complete_and_exit+0x20/0x20
> > [427468.298963] ret_from_fork+0x22/0x30
>
> There aren't a lot of 6.1-era changes in either of these, but there
> are some interesting behavior updates around session create replay
> from early this year. I wonder if the 6.1 server is mishandling an
> nfserr_jukebox situation in nfsd4_create_session.
>
> Was the client actually attempting to mount or unmount?
>
> Tom.
>
> >
> > Thanks in advance,
> > Best regards
> >
> > On Wednesday, 23 October 2024 at 19:38 +0000, Chuck Lever III wrote:
> > >
> > >
> > > > On Oct 23, 2024, at 3:27 PM, Benoît Gschwind
> > > > <benoit.gschwind@minesparis.psl.eu> wrote:
> > > >
> > > > Hello,
> > > >
> > > > I have an NFS server running Debian 11 (Linux hostname 6.1.0-25-amd64
> > > > #1 SMP PREEMPT_DYNAMIC Debian 6.1.106-3 (2024-08-26) x86_64 GNU/Linux).
> > > >
> > > > Under some heavy workloads, some nfsd threads go into the D state and
> > > > seem never to leave it. I wrote a Python script to monitor how long a
> > > > process stays in a particular state, and I use it to monitor the nfsd
> > > > threads. I get the following result:
> > > >
> > > > [...]
> > > > 178056 I (idle) 0:25:24.475 [nfsd]
> > > > 178057 I (idle) 0:25:24.475 [nfsd]
> > > > 178058 I (idle) 0:25:24.475 [nfsd]
> > > > 178059 I (idle) 0:25:24.475 [nfsd]
> > > > 178060 I (idle) 0:25:24.475 [nfsd]
> > > > 178061 I (idle) 0:25:24.475 [nfsd]
> > > > 178062 I (idle) 0:24:15.638 [nfsd]
> > > > 178063 I (idle) 0:24:13.488 [nfsd]
> > > > 178064 I (idle) 0:24:13.488 [nfsd]
> > > > 178065 I (idle) 0:00:00.000 [nfsd]
> > > > 178066 I (idle) 0:00:00.000 [nfsd]
> > > > 178067 I (idle) 0:00:00.000 [nfsd]
> > > > 178068 I (idle) 0:00:00.000 [nfsd]
> > > > 178069 S (sleeping) 0:00:02.147 [nfsd]
> > > > 178070 S (sleeping) 0:00:02.147 [nfsd]
> > > > 178071 S (sleeping) 0:00:02.147 [nfsd]
> > > > 178072 S (sleeping) 0:00:02.147 [nfsd]
> > > > 178073 S (sleeping) 0:00:02.147 [nfsd]
> > > > 178074 D (disk sleep) 1:29:25.809 [nfsd]
> > > > 178075 S (sleeping) 0:00:02.147 [nfsd]
> > > > 178076 S (sleeping) 0:00:02.147 [nfsd]
> > > > 178077 S (sleeping) 0:00:02.147 [nfsd]
> > > > 178078 S (sleeping) 0:00:02.147 [nfsd]
> > > > 178079 S (sleeping) 0:00:02.147 [nfsd]
> > > > 178080 D (disk sleep) 1:29:25.809 [nfsd]
> > > > 178081 D (disk sleep) 1:29:25.809 [nfsd]
> > > > 178082 D (disk sleep) 0:28:04.444 [nfsd]
> > > >
> > > > All processes not shown are in the idle state. The columns are:
> > > > PID, state code, state name, the amount of time the state has not
> > > > changed (and the process was not interrupted), and the
> > > > /proc/PID/status Name entry.
> > > >
> > > > As you can see, some nfsd processes have been in the disk-sleep
> > > > state for more than an hour, but looking at the disk activity,
> > > > there is almost no I/O.
> > > >
> > > > I tried to restart nfs-server, but I got the following errors from
> > > > the kernel:
> > > >
> > > > oct. 23 11:59:49 hostname kernel: rpc-srv/tcp: nfsd: got error -104
> > > > when sending 20 bytes - shutting down socket
> > > > oct. 23 11:59:49 hostname kernel: rpc-srv/tcp: nfsd: got error -104
> > > > when sending 20 bytes - shutting down socket
> > > > oct. 23 11:59:49 hostname kernel: rpc-srv/tcp: nfsd: got error -104
> > > > when sending 20 bytes - shutting down socket
> > > > oct. 23 11:59:49 hostname kernel: rpc-srv/tcp: nfsd: got error -104
> > > > when sending 20 bytes - shutting down socket
> > > > oct. 23 11:59:49 hostname kernel: rpc-srv/tcp: nfsd: got error -104
> > > > when sending 20 bytes - shutting down socket
> > > > oct. 23 11:59:59 hostname kernel: rpc-srv/tcp: nfsd: got error -104
> > > > when sending 20 bytes - shutting down socket
> > > > oct. 23 11:59:59 hostname kernel: rpc-srv/tcp: nfsd: got error -104
> > > > when sending 20 bytes - shutting down socket
> > > > oct. 23 11:59:59 hostname kernel: rpc-srv/tcp: nfsd: got error -104
> > > > when sending 20 bytes - shutting down socket
> > > > oct. 23 11:59:59 hostname kernel: rpc-srv/tcp: nfsd: got error -104
> > > > when sending 20 bytes - shutting down socket
> > > > oct. 23 11:59:59 hostname kernel: rpc-srv/tcp: nfsd: got error -104
> > > > when sending 20 bytes - shutting down socket
> > > > oct. 23 12:00:09 hostname kernel: rpc-srv/tcp: nfsd: got error -104
> > > > when sending 20 bytes - shutting down socket
> > > > oct. 23 12:00:09 hostname kernel: rpc-srv/tcp: nfsd: got error -104
> > > > when sending 20 bytes - shutting down socket
> > > > oct. 23 12:00:09 hostname kernel: rpc-srv/tcp: nfsd: got error -104
> > > > when sending 20 bytes - shutting down socket
> > > > oct. 23 12:00:10 hostname kernel: rpc-srv/tcp: nfsd: got error -32
> > > > when sending 20 bytes - shutting down socket
> > > > oct. 23 12:00:10 hostname kernel: rpc-srv/tcp: nfsd: got error -32
> > > > when sending 20 bytes - shutting down socket
> > > >
> > > > The only way to recover seems to be rebooting. I guess that works
> > > > because the kernel forces the reboot after a given timeout.
> > > >
> > > > My setup involves, in order:
> > > > - the SCSI driver
> > > > - mdraid (RAID 6) on top of SCSI
> > > > - btrfs on top of mdraid
> > > > - nfsd on top of btrfs
> > > >
> > > >
> > > > As expected, the setup is not very fast, but it seems that in some
> > > > situations nfsd never leaves the disk-sleep state. The export options
> > > > are: gss/krb5i(rw,sync,no_wdelay,no_subtree_check,fsid=XXXXX). The
> > > > situation is not common, but it always happens at some point. For
> > > > instance, in the case I report here, the server booted on 2024-10-01
> > > > and was stuck around 2024-10-23. Using no_wdelay greatly reduced the
> > > > frequency of the issue (I initially thought that option had solved
> > > > it).
> > > >
> > > > My guess: a hardware bug, a SCSI bug, a btrfs bug, or an nfsd bug?
> > > >
> > > > Any clue or advice on this topic is welcome.
> > >
> > > Generate stack traces for each process on the system
> > > using "echo t | sudo tee /proc/sysrq-trigger" and then
> > > examine the output in the system journal. Note the
> > > stack contents for the processes that look stuck.
> > >
> > > --
> > > Chuck Lever
> > >
> > >
> >
>
>
--
Jeff Layton <jlayton@kernel.org>
* Re: nfsd stuck in D (disk sleep) state
2024-10-28 17:27 ` Jeff Layton
@ 2024-10-29 13:49 ` Jeff Layton
2024-10-29 20:04 ` Jeff Layton
0 siblings, 1 reply; 12+ messages in thread
From: Jeff Layton @ 2024-10-29 13:49 UTC (permalink / raw)
To: Tom Talpey, Benoît Gschwind, Chuck Lever III; +Cc: Linux NFS Mailing List
On Mon, 2024-10-28 at 13:27 -0400, Jeff Layton wrote:
> On Mon, 2024-10-28 at 08:46 -0400, Tom Talpey wrote:
> > On 10/28/2024 5:18 AM, Benoît Gschwind wrote:
> > > Hello,
> > >
> > > The issue triggered again; I attached the result of:
> > >
> > > # dmesg -W | tee dmesg.txt
> > >
> > > using:
> > >
> > > # echo t > /proc/sysrq-trigger
> > >
> > > I have the following PIDs stuck:
> > >
> > > 1474 D (disk sleep) 0:54:58.602 [nfsd]
> > > 1475 D (disk sleep) 0:54:58.602 [nfsd]
> > > 1484 D (disk sleep) 0:54:58.602 [nfsd]
> > > 1495 D (disk sleep) 0:54:58.602 [nfsd]
> >
> > Hmm, 1495 is stuck in nfsd4_create_session
> >
> > > [427468.304955] task:nfsd state:D stack:0 pid:1495 ppid:2 flags:0x00004000
> > > [427468.304962] Call Trace:
> > > [427468.304965] <TASK>
> > > [427468.304971] __schedule+0x34d/0x9e0
> > > [427468.304983] schedule+0x5a/0xd0
> > > [427468.304991] schedule_timeout+0x118/0x150
> > > [427468.305003] wait_for_completion+0x86/0x160
> > > [427468.305015] __flush_workqueue+0x152/0x420
> > > [427468.305031] nfsd4_create_session+0x79f/0xba0 [nfsd]
> > > [427468.305092] nfsd4_proc_compound+0x34c/0x660 [nfsd]
> > > [427468.305147] nfsd_dispatch+0x1a1/0x2b0 [nfsd]
> > > [427468.305199] svc_process_common+0x295/0x610 [sunrpc]
> > > [427468.305269] ? svc_recv+0x491/0x810 [sunrpc]
> > > [427468.305337] ? nfsd_svc+0x370/0x370 [nfsd]
> > > [427468.305389] ? nfsd_shutdown_threads+0x90/0x90 [nfsd]
> > > [427468.305437] svc_process+0xad/0x100 [sunrpc]
> > > [427468.305505] nfsd+0x99/0x140 [nfsd]
> > > [427468.305555] kthread+0xda/0x100
> > > [427468.305562] ? kthread_complete_and_exit+0x20/0x20
> > > [427468.305572] ret_from_fork+0x22/0x30
> >
> > and the other three are stuck in nfsd4_destroy_session
> >
>
> All 4 processes are stuck waiting on a flush_workqueue call. It's
> probably one of these:
>
> flush_workqueue(clp->cl_callback_wq);
>
> So the question here is really -- why are the callback workqueue jobs
> stuck? I do see this:
>
> [427468.316839] Workqueue: nfsd4_callbacks nfsd4_run_cb_work [nfsd]
> [427468.316899] Call Trace:
> [427468.316902] <TASK>
> [427468.316908] __schedule+0x34d/0x9e0
> [427468.316919] schedule+0x5a/0xd0
> [427468.316927] schedule_timeout+0x94/0x150
> [427468.316937] ? __bpf_trace_tick_stop+0x10/0x10
> [427468.316947] rpc_shutdown_client+0xf2/0x150 [sunrpc]
> [427468.317015] ? cpuusage_read+0x10/0x10
> [427468.317026] nfsd4_process_cb_update+0x4c/0x270 [nfsd]
> [427468.317097] nfsd4_run_cb_work+0x9f/0x150 [nfsd]
> [427468.317146] process_one_work+0x1c7/0x380
> [427468.317158] worker_thread+0x4d/0x380
> [427468.317170] ? rescuer_thread+0x3a0/0x3a0
> [427468.317177] kthread+0xda/0x100
> [427468.317185] ? kthread_complete_and_exit+0x20/0x20
> [427468.317195] ret_from_fork+0x22/0x30
> [427468.317213] </TASK>
>
> Maybe the RPC client is having trouble clearing clnt->cl_tasks?
>
Interestingly, I hit this today while testing the patches that widen
the backchannel:
[ 484.550350] INFO: task nfsd:1168 blocked for more than 120 seconds.
[ 484.551746] Not tainted 6.12.0-rc5+ #188
[ 484.552854] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 484.586316] task:nfsd state:D stack:0 pid:1168 tgid:1168 ppid:2 flags:0x00004000
[ 484.588417] Call Trace:
[ 484.589070] <TASK>
[ 484.589624] __schedule+0x4e4/0xb90
[ 484.622306] schedule+0x26/0xf0
[ 484.623162] schedule_timeout+0x121/0x150
[ 484.624344] ? __smp_call_single_queue+0xa7/0x110
[ 484.625375] ? x2apic_send_IPI+0x45/0x50
[ 484.654299] ? ttwu_queue_wakelist+0xbf/0x110
[ 484.655454] __wait_for_common+0x91/0x1c0
[ 484.656372] ? __pfx_schedule_timeout+0x10/0x10
[ 484.657429] __flush_workqueue+0x13a/0x400
[ 484.686304] nfsd4_create_session+0x9cd/0xd90 [nfsd]
[ 484.687562] nfsd4_proc_compound+0x396/0x750 [nfsd]
[ 484.688746] nfsd_dispatch+0xc6/0x210 [nfsd]
[ 484.689835] svc_process_common+0x4e6/0x6d0 [sunrpc]
[ 484.722305] ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
[ 484.723532] svc_process+0x12d/0x1c0 [sunrpc]
[ 484.724619] svc_recv+0x7cf/0x980 [sunrpc]
[ 484.725618] ? __pfx_nfsd+0x10/0x10 [nfsd]
[ 484.758308] nfsd+0x9f/0x100 [nfsd]
[ 484.759184] kthread+0xdd/0x110
[ 484.759905] ? __pfx_kthread+0x10/0x10
[ 484.760870] ret_from_fork+0x30/0x50
[ 484.761603] ? __pfx_kthread+0x10/0x10
[ 484.790302] ret_from_fork_asm+0x1a/0x30
[ 484.791151] </TASK>
The host was hung after this, so I can't fully confirm the scenario
below, but I think it's probably correct:
The client might want to create a new session. When that happens, the
create_session call tries to flush_workqueue() on the client-wide
backchannel workqueue.
That workqueue is stuck, as it's waiting for all of the callback RPCs
to die off (as evidenced by the stack from Benoît's info that shows it
hung in rpc_shutdown_client). So the question I guess at this point is
why these RPC tasks aren't dying like they're supposed to when
rpc_killall_tasks() is run.
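A userspace sketch (illustrative Python, not the sunrpc implementation) of how a "killall" can fail to take effect: marking a task killed only helps if the task is also woken so it can observe the mark. A task parked on a queue whose wakeup never arrives stays put, and anything waiting for it, such as rpc_shutdown_client() draining cl_tasks, waits forever:

```python
import threading

class FakeRpcTask:
    """Userspace analogue of an async RPC task parked on a waitqueue."""
    def __init__(self):
        self.killed = threading.Event()   # analogue of RPC_TASK_KILLED
        self.wake = threading.Event()     # analogue of a queued-task wakeup

    def run(self):
        while True:
            self.wake.wait()              # only makes progress when woken
            self.wake.clear()
            if self.killed.is_set():      # honours the kill once it runs
                return

task = FakeRpcTask()
t = threading.Thread(target=task.run)
t.start()

# Setting the kill flag alone does nothing: the task never runs again
# to observe it, and a shutdown waiting for it would hang forever.
task.killed.set()
t.join(timeout=0.1)
print("exited after flag only:", not t.is_alive())

# Flag *plus* wakeup, which is what rpc_killall_tasks is meant to
# deliver; a lost or missing wakeup reproduces the hang above.
task.wake.set()
t.join()
print("exited after wakeup:", not t.is_alive())
```

In this sketch the flag alone leaves the task parked; only the wakeup lets it observe the kill and exit, which is the behavior to check for in the real backchannel tasks.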
I'm guessing that the backchannel-widening patches may make this a bit
easier to reproduce? In any case, this problem doesn't seem to be
confined to v6.1.
I'm still looking at this, so stay tuned...
--
Jeff Layton <jlayton@kernel.org>
* Re: nfsd stuck in D (disk sleep) state
2024-10-29 13:49 ` Jeff Layton
@ 2024-10-29 20:04 ` Jeff Layton
2025-01-20 9:22 ` Benoît Gschwind
0 siblings, 1 reply; 12+ messages in thread
From: Jeff Layton @ 2024-10-29 20:04 UTC (permalink / raw)
To: Tom Talpey, Benoît Gschwind, Chuck Lever III; +Cc: Linux NFS Mailing List
On Tue, 2024-10-29 at 09:49 -0400, Jeff Layton wrote:
> On Mon, 2024-10-28 at 13:27 -0400, Jeff Layton wrote:
> > On Mon, 2024-10-28 at 08:46 -0400, Tom Talpey wrote:
> > > On 10/28/2024 5:18 AM, Benoît Gschwind wrote:
> > > > Hello,
> > > >
> > > > The issue triggered again; I attached the result of:
> > > >
> > > > # dmesg -W | tee dmesg.txt
> > > >
> > > > using:
> > > >
> > > > # echo t > /proc/sysrq-trigger
> > > >
> > > > I have the following PIDs stuck:
> > > >
> > > > 1474 D (disk sleep) 0:54:58.602 [nfsd]
> > > > 1475 D (disk sleep) 0:54:58.602 [nfsd]
> > > > 1484 D (disk sleep) 0:54:58.602 [nfsd]
> > > > 1495 D (disk sleep) 0:54:58.602 [nfsd]
> > >
> > > Hmm, 1495 is stuck in nfsd4_create_session
> > >
> > > > [427468.304955] task:nfsd state:D stack:0 pid:1495 ppid:2 flags:0x00004000
> > > > [427468.304962] Call Trace:
> > > > [427468.304965] <TASK>
> > > > [427468.304971] __schedule+0x34d/0x9e0
> > > > [427468.304983] schedule+0x5a/0xd0
> > > > [427468.304991] schedule_timeout+0x118/0x150
> > > > [427468.305003] wait_for_completion+0x86/0x160
> > > > [427468.305015] __flush_workqueue+0x152/0x420
> > > > [427468.305031] nfsd4_create_session+0x79f/0xba0 [nfsd]
> > > > [427468.305092] nfsd4_proc_compound+0x34c/0x660 [nfsd]
> > > > [427468.305147] nfsd_dispatch+0x1a1/0x2b0 [nfsd]
> > > > [427468.305199] svc_process_common+0x295/0x610 [sunrpc]
> > > > [427468.305269] ? svc_recv+0x491/0x810 [sunrpc]
> > > > [427468.305337] ? nfsd_svc+0x370/0x370 [nfsd]
> > > > [427468.305389] ? nfsd_shutdown_threads+0x90/0x90 [nfsd]
> > > > [427468.305437] svc_process+0xad/0x100 [sunrpc]
> > > > [427468.305505] nfsd+0x99/0x140 [nfsd]
> > > > [427468.305555] kthread+0xda/0x100
> > > > [427468.305562] ? kthread_complete_and_exit+0x20/0x20
> > > > [427468.305572] ret_from_fork+0x22/0x30
> > >
> > > and the other three are stuck in nfsd4_destroy_session
> > >
> >
> > All 4 processes are stuck waiting on a flush_workqueue call. It's
> > probably one of these:
> >
> > flush_workqueue(clp->cl_callback_wq);
> >
> > So the question here is really -- why are the callback workqueue jobs
> > stuck? I do see this:
> >
> > [427468.316839] Workqueue: nfsd4_callbacks nfsd4_run_cb_work [nfsd]
> > [427468.316899] Call Trace:
> > [427468.316902] <TASK>
> > [427468.316908] __schedule+0x34d/0x9e0
> > [427468.316919] schedule+0x5a/0xd0
> > [427468.316927] schedule_timeout+0x94/0x150
> > [427468.316937] ? __bpf_trace_tick_stop+0x10/0x10
> > [427468.316947] rpc_shutdown_client+0xf2/0x150 [sunrpc]
> > [427468.317015] ? cpuusage_read+0x10/0x10
> > [427468.317026] nfsd4_process_cb_update+0x4c/0x270 [nfsd]
> > [427468.317097] nfsd4_run_cb_work+0x9f/0x150 [nfsd]
> > [427468.317146] process_one_work+0x1c7/0x380
> > [427468.317158] worker_thread+0x4d/0x380
> > [427468.317170] ? rescuer_thread+0x3a0/0x3a0
> > [427468.317177] kthread+0xda/0x100
> > [427468.317185] ? kthread_complete_and_exit+0x20/0x20
> > [427468.317195] ret_from_fork+0x22/0x30
> > [427468.317213] </TASK>
> >
> > Maybe the RPC client is having trouble clearing clnt->cl_tasks?
> >
>
> Interestingly, I hit this today while testing the patches that widen
> the backchannel:
>
> [ 484.550350] INFO: task nfsd:1168 blocked for more than 120 seconds.
> [ 484.551746] Not tainted 6.12.0-rc5+ #188
> [ 484.552854] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 484.586316] task:nfsd state:D stack:0 pid:1168 tgid:1168 ppid:2 flags:0x00004000
> [ 484.588417] Call Trace:
> [ 484.589070] <TASK>
> [ 484.589624] __schedule+0x4e4/0xb90
> [ 484.622306] schedule+0x26/0xf0
> [ 484.623162] schedule_timeout+0x121/0x150
> [ 484.624344] ? __smp_call_single_queue+0xa7/0x110
> [ 484.625375] ? x2apic_send_IPI+0x45/0x50
> [ 484.654299] ? ttwu_queue_wakelist+0xbf/0x110
> [ 484.655454] __wait_for_common+0x91/0x1c0
> [ 484.656372] ? __pfx_schedule_timeout+0x10/0x10
> [ 484.657429] __flush_workqueue+0x13a/0x400
> [ 484.686304] nfsd4_create_session+0x9cd/0xd90 [nfsd]
> [ 484.687562] nfsd4_proc_compound+0x396/0x750 [nfsd]
> [ 484.688746] nfsd_dispatch+0xc6/0x210 [nfsd]
> [ 484.689835] svc_process_common+0x4e6/0x6d0 [sunrpc]
> [ 484.722305] ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
> [ 484.723532] svc_process+0x12d/0x1c0 [sunrpc]
> [ 484.724619] svc_recv+0x7cf/0x980 [sunrpc]
> [ 484.725618] ? __pfx_nfsd+0x10/0x10 [nfsd]
> [ 484.758308] nfsd+0x9f/0x100 [nfsd]
> [ 484.759184] kthread+0xdd/0x110
> [ 484.759905] ? __pfx_kthread+0x10/0x10
> [ 484.760870] ret_from_fork+0x30/0x50
> [ 484.761603] ? __pfx_kthread+0x10/0x10
> [ 484.790302] ret_from_fork_asm+0x1a/0x30
> [ 484.791151] </TASK>
>
> The host was hung after this, so I can't fully confirm the scenario
> below, but I think it's probably correct:
>
> The client might want to create a new session. When that happens, the
> create_session call tries to flush_workqueue() on the client-wide
> backchannel workqueue.
>
> That workqueue is stuck, as it's waiting for all of the callback RPCs
> to die off (as evidenced by the stack from Benoît's info that shows it
> hung in rpc_shutdown_client). So the question I guess at this point is
> why these RPC tasks aren't dying like they're supposed to when
> rpc_killall_tasks() is run.
>
> I'm guessing that the backchannel-widening patches may make this a bit
> easier to reproduce? In any case, this problem doesn't seem to be
> confined to v6.1.
>
> I'm still looking at this, so stay tuned...
The root cause in my case was a bug in the backchannel-widening patch,
which was causing the client to recreate the session occasionally.
I can't reproduce it now with the v2 version of that patch, but I still
suspect that this is related to problems killing backchannel RPC tasks.
--
Jeff Layton <jlayton@kernel.org>