From: Chuck Lever <chuck.lever@oracle.com>
To: Rik Theys <rik.theys@gmail.com>,
Christian Herzog <herzog@phys.ethz.ch>,
Salvatore Bonaccorso <carnil@debian.org>
Cc: linux-nfs@vger.kernel.org
Subject: Re: nfsd4 laundromat_main hung tasks
Date: Mon, 13 Jan 2025 17:12:12 -0500
Message-ID: <cbc55c4a-ac98-4121-b590-13f32a257d65@oracle.com>
In-Reply-To: <CAPwv0J=ju3fZ8C_FFeDnzzKT-ppXaLCde64hQof3=g641Daudw@mail.gmail.com>

On 1/12/25 7:42 AM, Rik Theys wrote:
> On Fri, Jan 10, 2025 at 11:07 PM Chuck Lever <chuck.lever@oracle.com> wrote:
>>
>> On 1/10/25 3:51 PM, Rik Theys wrote:
>>> Are there any debugging commands we can run once the issue happens
>>> that can help to determine the cause of this issue?
>>
>> Once the issue happens, the precipitating bug has already done its
>> damage, so at that point it is too late.

I've studied the code and bug reports a bit. I see one intriguing
mention in comment #5:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1071562#5

/proc/130/stack:
[<0>] rpc_shutdown_client+0xf2/0x150 [sunrpc]
[<0>] nfsd4_process_cb_update+0x4c/0x270 [nfsd]
[<0>] nfsd4_run_cb_work+0x9f/0x150 [nfsd]
[<0>] process_one_work+0x1c7/0x380
[<0>] worker_thread+0x4d/0x380
[<0>] kthread+0xda/0x100
[<0>] ret_from_fork+0x22/0x30

This tells me that the active item on the callback_wq is waiting for the
backchannel RPC client to shut down. This is probably the proximal cause
of the callback workqueue stall.
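To look for this same signature on other hung servers, one approach is to scan each kernel thread's stack for an rpc_shutdown_client frame. Below is a rough C sketch, not an existing tool: stack_has_symbol() is a hypothetical helper, it assumes the "[<addr>] symbol+0xoff/0xlen [module]" frame format shown in the trace above, and it matches whole symbol names so that a prefix such as "rpc_shutdown" does not count as a hit.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: returns true if any frame in a /proc/<pid>/stack
 * style dump names exactly `symbol`. Assumes frame lines look like:
 *   [<0>] rpc_shutdown_client+0xf2/0x150 [sunrpc]
 */
bool stack_has_symbol(const char *stack_text, const char *symbol)
{
	size_t symlen = strlen(symbol);
	const char *line = stack_text;

	while (line && *line) {
		const char *end = strchr(line, '\n');
		const char *p = strstr(line, "] ");

		/* Only accept a "] " that lies within the current line. */
		if (p && (!end || p < end)) {
			p += 2; /* skip "] " to reach the start of the symbol */
			/* Whole-name match: the symbol must be followed by '+'. */
			if (strncmp(p, symbol, symlen) == 0 && p[symlen] == '+')
				return true;
		}
		line = end ? end + 1 : NULL;
	}
	return false;
}
```

A caller would read /proc/<pid>/stack (which typically requires root) into a NUL-terminated buffer and pass it in, for example once per kworker thread on the server.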

rpc_shutdown_client() is waiting for the client's cl_tasks to become
empty. Typically this is a short wait. But here, there are one or more
RPC requests that are not completing.
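To make that failure mode concrete, here is a toy, single-threaded C model of the wait; it is not kernel code. It assumes the shape of the loop in recent mainline kernels, which is roughly: while cl_tasks is non-empty, call rpc_killall_tasks(), then wait up to one second for the list to drain. All toy_* names are hypothetical, and max_rounds exists only so the model terminates; the kernel loop has no such cap.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an RPC task on a client's cl_tasks list (hypothetical). */
struct toy_task {
	bool stuck;	/* true: ignores the kill request and never completes */
	bool done;	/* true: task has exited and left the list */
};

/* Count tasks still "on the list". */
size_t toy_pending(const struct toy_task *tasks, size_t n)
{
	size_t pending = 0;

	for (size_t i = 0; i < n; i++)
		if (!tasks[i].done)
			pending++;
	return pending;
}

/*
 * Models rpc_killall_tasks(). Simplification: here a killable task exits
 * immediately; in the kernel, killed tasks are woken and exit asynchronously.
 */
void toy_killall(struct toy_task *tasks, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!tasks[i].stuck)
			tasks[i].done = true;
}

/*
 * Models the rpc_shutdown_client() wait loop. Returns the number of
 * kill/wait rounds needed to drain the list, or -1 if tasks remain after
 * max_rounds. The kernel loop has no max_rounds: it keeps waiting.
 */
int toy_shutdown(struct toy_task *tasks, size_t n, int max_rounds)
{
	for (int round = 0; round < max_rounds; round++) {
		if (toy_pending(tasks, n) == 0)
			return round;
		toy_killall(tasks, n);
		/* Kernel: wait up to one second for cl_tasks to become empty. */
	}
	return toy_pending(tasks, n) == 0 ? max_rounds : -1;
}
```

With one stuck task, toy_shutdown() never drains the list, which corresponds to the callback work item parked in rpc_shutdown_client() in the trace above.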

Please issue these two commands on your server once it gets into the
hung state:

# rpcdebug -m rpc -c
# echo t > /proc/sysrq-trigger

Then gift-wrap the server's system journal and send it to me. I need to
see only the output from these two commands, so if you want to
anonymize the journal and truncate it to just the day of the failure,
I think that should be fine.

--
Chuck Lever