From: Tom Talpey <tom@talpey.com>
To: "Benoît Gschwind" <benoit.gschwind@minesparis.psl.eu>,
"Chuck Lever III" <chuck.lever@oracle.com>
Cc: Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: nfsd stuck in D (disk sleep) state
Date: Mon, 28 Oct 2024 08:46:37 -0400 [thread overview]
Message-ID: <509abdda-9143-40da-bbf2-9bfdf8044d4c@talpey.com> (raw)
In-Reply-To: <f7aec65eab6180cf5e97bbfd3fad2a6b633d4613.camel@minesparis.psl.eu>
On 10/28/2024 5:18 AM, Benoît Gschwind wrote:
> Hello,
>
> The issue triggered again; I have attached the result of:
>
> # dmesg -W | tee dmesg.txt
>
> using:
>
> # echo t > /proc/sysrq-trigger
>
> I have the following PIDs stuck:
>
> 1474 D (disk sleep) 0:54:58.602 [nfsd]
> 1475 D (disk sleep) 0:54:58.602 [nfsd]
> 1484 D (disk sleep) 0:54:58.602 [nfsd]
> 1495 D (disk sleep) 0:54:58.602 [nfsd]
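A quick way to spot tasks like these, as an illustrative one-liner rather than anything from the original report:

```shell
# Print all tasks currently in uninterruptible sleep (state D).
# Note: etime is time since the process started, not time spent in D.
ps -eo pid,state,etime,comm --no-headers | awk '$2 == "D"'
```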
Hmm, 1495 is stuck in nfsd4_create_session:
> [427468.304955] task:nfsd state:D stack:0 pid:1495 ppid:2 flags:0x00004000
> [427468.304962] Call Trace:
> [427468.304965] <TASK>
> [427468.304971] __schedule+0x34d/0x9e0
> [427468.304983] schedule+0x5a/0xd0
> [427468.304991] schedule_timeout+0x118/0x150
> [427468.305003] wait_for_completion+0x86/0x160
> [427468.305015] __flush_workqueue+0x152/0x420
> [427468.305031] nfsd4_create_session+0x79f/0xba0 [nfsd]
> [427468.305092] nfsd4_proc_compound+0x34c/0x660 [nfsd]
> [427468.305147] nfsd_dispatch+0x1a1/0x2b0 [nfsd]
> [427468.305199] svc_process_common+0x295/0x610 [sunrpc]
> [427468.305269] ? svc_recv+0x491/0x810 [sunrpc]
> [427468.305337] ? nfsd_svc+0x370/0x370 [nfsd]
> [427468.305389] ? nfsd_shutdown_threads+0x90/0x90 [nfsd]
> [427468.305437] svc_process+0xad/0x100 [sunrpc]
> [427468.305505] nfsd+0x99/0x140 [nfsd]
> [427468.305555] kthread+0xda/0x100
> [427468.305562] ? kthread_complete_and_exit+0x20/0x20
> [427468.305572] ret_from_fork+0x22/0x30
and the other three are stuck in nfsd4_destroy_session:
> [427468.298315] task:nfsd state:D stack:0 pid:1474 ppid:2 flags:0x00004000
> [427468.298322] Call Trace:
> [427468.298326] <TASK>
> [427468.298332] __schedule+0x34d/0x9e0
> [427468.298343] schedule+0x5a/0xd0
> [427468.298350] schedule_timeout+0x118/0x150
> [427468.298362] wait_for_completion+0x86/0x160
> [427468.298375] __flush_workqueue+0x152/0x420
> [427468.298392] nfsd4_destroy_session+0x1b6/0x250 [nfsd]
> [427468.298456] nfsd4_proc_compound+0x34c/0x660 [nfsd]
> [427468.298515] nfsd_dispatch+0x1a1/0x2b0 [nfsd]
> [427468.298568] svc_process_common+0x295/0x610 [sunrpc]
> [427468.298643] ? svc_recv+0x491/0x810 [sunrpc]
> [427468.298711] ? nfsd_svc+0x370/0x370 [nfsd]
> [427468.298776] ? nfsd_shutdown_threads+0x90/0x90 [nfsd]
> [427468.298825] svc_process+0xad/0x100 [sunrpc]
> [427468.298896] nfsd+0x99/0x140 [nfsd]
> [427468.298946] kthread+0xda/0x100
> [427468.298954] ? kthread_complete_and_exit+0x20/0x20
> [427468.298963] ret_from_fork+0x22/0x30
There aren't a lot of 6.1-era changes in either of these code paths, but
there are some interesting behavior updates around session-create replay
from early this year. I wonder if the 6.1 server is mishandling an
nfserr_jukebox situation in nfsd4_create_session.
Was the client actually attempting to mount or unmount?
Tom.
>
> Thanks in advance,
> Best regards
>
> On Wednesday, October 23, 2024 at 19:38 +0000, Chuck Lever III wrote:
>>
>>
>>> On Oct 23, 2024, at 3:27 PM, Benoît Gschwind
>>> <benoit.gschwind@minesparis.psl.eu> wrote:
>>>
>>> Hello,
>>>
>>> I have a nfs server using debian 11 (Linux hostname 6.1.0-25-amd64
>>> #1
>>> SMP PREEMPT_DYNAMIC Debian 6.1.106-3 (2024-08-26) x86_64 GNU/Linux)
>>>
>>> Under some heavy workloads, some nfsd threads go into the D state and
>>> seem to never leave it. I wrote a Python script to monitor how long a
>>> process stays in a particular state, and I used it to watch the nfsd
>>> threads. I get the following result:
>>>
>>> [...]
>>> 178056 I (idle) 0:25:24.475 [nfsd]
>>> 178057 I (idle) 0:25:24.475 [nfsd]
>>> 178058 I (idle) 0:25:24.475 [nfsd]
>>> 178059 I (idle) 0:25:24.475 [nfsd]
>>> 178060 I (idle) 0:25:24.475 [nfsd]
>>> 178061 I (idle) 0:25:24.475 [nfsd]
>>> 178062 I (idle) 0:24:15.638 [nfsd]
>>> 178063 I (idle) 0:24:13.488 [nfsd]
>>> 178064 I (idle) 0:24:13.488 [nfsd]
>>> 178065 I (idle) 0:00:00.000 [nfsd]
>>> 178066 I (idle) 0:00:00.000 [nfsd]
>>> 178067 I (idle) 0:00:00.000 [nfsd]
>>> 178068 I (idle) 0:00:00.000 [nfsd]
>>> 178069 S (sleeping) 0:00:02.147 [nfsd]
>>> 178070 S (sleeping) 0:00:02.147 [nfsd]
>>> 178071 S (sleeping) 0:00:02.147 [nfsd]
>>> 178072 S (sleeping) 0:00:02.147 [nfsd]
>>> 178073 S (sleeping) 0:00:02.147 [nfsd]
>>> 178074 D (disk sleep) 1:29:25.809 [nfsd]
>>> 178075 S (sleeping) 0:00:02.147 [nfsd]
>>> 178076 S (sleeping) 0:00:02.147 [nfsd]
>>> 178077 S (sleeping) 0:00:02.147 [nfsd]
>>> 178078 S (sleeping) 0:00:02.147 [nfsd]
>>> 178079 S (sleeping) 0:00:02.147 [nfsd]
>>> 178080 D (disk sleep) 1:29:25.809 [nfsd]
>>> 178081 D (disk sleep) 1:29:25.809 [nfsd]
>>> 178082 D (disk sleep) 0:28:04.444 [nfsd]
>>>
>>> All processes not shown are in the idle state. The columns are: PID,
>>> state, state name, amount of time the state has not changed (without
>>> the process being interrupted), and the /proc/PID/status Name entry.
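The monitoring script itself was not posted; a minimal sketch of that kind of monitor, assuming Linux /proc semantics (the state letter is the third field of /proc/<pid>/stat), might look like:

```python
# Hypothetical reconstruction of a per-PID state-duration monitor (the
# original script was not posted). It polls /proc/<pid>/stat and reports
# tasks that have stayed in uninterruptible sleep (state "D").
import os
import time

def task_states():
    """Return {pid: (state_letter, comm)} for all current tasks."""
    states = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                data = f.read()
        except OSError:
            continue  # task exited while we were scanning
        # comm may contain spaces, so parse around the parentheses;
        # the state letter follows the closing ")" and a space
        comm = data[data.index("(") + 1:data.rindex(")")]
        state = data[data.rindex(")") + 2]
        states[int(entry)] = (state, comm)
    return states

def monitor(interval=1.0):
    """Report any task that has sat in D state across polls."""
    since = {}  # pid -> (state, time the state was first observed)
    while True:
        now = time.monotonic()
        for pid, (state, comm) in task_states().items():
            if pid not in since or since[pid][0] != state:
                since[pid] = (state, now)
            elif state == "D":
                print(f"{pid} D for {now - since[pid][1]:.1f}s [{comm}]")
        time.sleep(interval)
```

A real version would also prune PIDs that have exited from the `since` table.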
>>>
>>> As you can see, some nfsd processes have been in the disk-sleep state
>>> for more than an hour, but looking at disk activity there is almost
>>> no I/O.
>>>
>>> I tried to restart nfs-server, but I got the following errors from
>>> the kernel:
>>>
>>> oct. 23 11:59:49 hostname kernel: rpc-srv/tcp: nfsd: got error -104
>>> when sending 20 bytes - shutting down socket
>>> oct. 23 11:59:49 hostname kernel: rpc-srv/tcp: nfsd: got error -104
>>> when sending 20 bytes - shutting down socket
>>> oct. 23 11:59:49 hostname kernel: rpc-srv/tcp: nfsd: got error -104
>>> when sending 20 bytes - shutting down socket
>>> oct. 23 11:59:49 hostname kernel: rpc-srv/tcp: nfsd: got error -104
>>> when sending 20 bytes - shutting down socket
>>> oct. 23 11:59:49 hostname kernel: rpc-srv/tcp: nfsd: got error -104
>>> when sending 20 bytes - shutting down socket
>>> oct. 23 11:59:59 hostname kernel: rpc-srv/tcp: nfsd: got error -104
>>> when sending 20 bytes - shutting down socket
>>> oct. 23 11:59:59 hostname kernel: rpc-srv/tcp: nfsd: got error -104
>>> when sending 20 bytes - shutting down socket
>>> oct. 23 11:59:59 hostname kernel: rpc-srv/tcp: nfsd: got error -104
>>> when sending 20 bytes - shutting down socket
>>> oct. 23 11:59:59 hostname kernel: rpc-srv/tcp: nfsd: got error -104
>>> when sending 20 bytes - shutting down socket
>>> oct. 23 11:59:59 hostname kernel: rpc-srv/tcp: nfsd: got error -104
>>> when sending 20 bytes - shutting down socket
>>> oct. 23 12:00:09 hostname kernel: rpc-srv/tcp: nfsd: got error -104
>>> when sending 20 bytes - shutting down socket
>>> oct. 23 12:00:09 hostname kernel: rpc-srv/tcp: nfsd: got error -104
>>> when sending 20 bytes - shutting down socket
>>> oct. 23 12:00:09 hostname kernel: rpc-srv/tcp: nfsd: got error -104
>>> when sending 20 bytes - shutting down socket
>>> oct. 23 12:00:10 hostname kernel: rpc-srv/tcp: nfsd: got error -32
>>> when sending 20 bytes - shutting down socket
>>> oct. 23 12:00:10 hostname kernel: rpc-srv/tcp: nfsd: got error -32
>>> when sending 20 bytes - shutting down socket
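Those codes are negated errno values: -104 is ECONNRESET and -32 is EPIPE, i.e. the client end of each TCP connection was already gone while nfsd was trying to send its 20-byte reply, which is consistent with the clients having given up on the stuck server. A quick way to decode such codes (assumes a Linux Python environment):

```python
# Decode the negated errno values seen in the rpc-srv/tcp log lines.
import errno
import os

for code in (104, 32):
    print(f"-{code}: {errno.errorcode[code]} ({os.strerror(code)})")
# -104: ECONNRESET (Connection reset by peer)
# -32: EPIPE (Broken pipe)
```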
>>>
>>> The only way to recover seems to be a reboot, which I guess only
>>> completes because the kernel forces it after a given timeout.
>>>
>>> My setup involves, in order:
>>> - SCSI driver
>>> - mdraid on top of SCSI (RAID 6)
>>> - btrfs on top of mdraid
>>> - nfsd on top of btrfs
>>>
>>>
>>> The setup is not very fast, as expected, but it seems that in some
>>> situations nfsd never leaves the disk-sleep state. The export options
>>> are: gss/krb5i(rw,sync,no_wdelay,no_subtree_check,fsid=XXXXX). The
>>> situation is not common, but it always happens at some point. For
>>> instance, in the case I report here, the server booted on 2024-10-01
>>> and got stuck around 2024-10-23. Using no_wdelay reduced the frequency
>>> of the issue by a large amount (at first I thought that option had
>>> solved the issue).
>>>
>>> My guess: a hardware bug, a SCSI bug, a btrfs bug, or an nfsd bug?
>>>
>>> Any clue or advice on this topic is welcome.
>>
>> Generate stack traces for each process on the system
>> using "sudo echo t > /proc/sysrq-trigger" and then
>> examine the output in the system journal. Note the
>> stack contents for the processes that look stuck.
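One caveat with that command: "sudo echo t > /proc/sysrq-trigger" performs the redirection in the caller's unprivileged shell, so it only works from a shell that is already root (as in Benoît's "#" prompt). A form that works either way, assuming a Linux system with sysrq task dumps enabled:

```shell
# "sudo echo t > /proc/sysrq-trigger" fails for a non-root user: the shell
# opens the file (and gets EACCES) before sudo ever runs. Writing through
# tee keeps the file open inside the privileged process instead:
#
#   echo t | sudo tee /proc/sysrq-trigger > /dev/null
#
# Guarded form: only trigger the dump when the file is actually writable.
if [ -w /proc/sysrq-trigger ]; then
    echo t > /proc/sysrq-trigger
    # the task dump lands in the kernel ring buffer
    dmesg | grep -A 20 'task:nfsd' || true   # empty if no nfsd is stuck
else
    echo "not root: use the tee form above" >&2
fi
```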
>>
>> --
>> Chuck Lever
>>
>>
>
Thread overview: 12+ messages
2024-10-23 19:27 nfsd stuck in D (disk sleep) state Benoît Gschwind
2024-10-23 19:38 ` Chuck Lever III
2024-10-24 8:07 ` Benoît Gschwind
2024-10-28 9:18 ` Benoît Gschwind
2024-10-28 12:46 ` Tom Talpey [this message]
2024-10-28 13:32 ` Chuck Lever III
2024-10-28 17:01 ` Benoît Gschwind
2024-10-28 17:38 ` Tom Talpey
2024-10-28 17:27 ` Jeff Layton
2024-10-29 13:49 ` Jeff Layton
2024-10-29 20:04 ` Jeff Layton
2025-01-20 9:22 ` Benoît Gschwind