Re: [Question]Is a Kernel Timeout Recovery Mechanism Needed for Prolonged User-Space Downcall Unresponsiveness?

Linux NFS development

From: Jeff Layton <jlayton@kernel.org>
To: Li Lingfeng <lilingfeng3@huawei.com>,
	Chuck Lever <chuck.lever@oracle.com>,  NeilBrown <neilb@suse.de>,
	Olga Kornievskaia <okorniev@redhat.com>,
	Dai Ngo <Dai.Ngo@oracle.com>,  Tom Talpey <tom@talpey.com>
Cc: "linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
	Yu Kuai <yukuai1@huaweicloud.com>, Hou Tao <houtao1@huawei.com>,
	"zhangyi (F)" <yi.zhang@huawei.com>,
	yangerkun <yangerkun@huawei.com>,
	Li Lingfeng <lilingfeng@huaweicloud.com>,
	"zhangjian (CG)" <zhangjian496@huawei.com>
Subject: Re: [Question]Is a Kernel Timeout Recovery Mechanism Needed for Prolonged User-Space Downcall Unresponsiveness?
Date: Tue, 25 Feb 2025 06:52:02 -0500
Message-ID: <090bbf28ace1a6f7c05da726b62cd642f24b01d0.camel@kernel.org>
In-Reply-To: <c44533cc-f625-4eda-b47b-c6f6dd01c991@huawei.com>

On Tue, 2025-02-25 at 16:51 +0800, Li Lingfeng wrote:
>
> Hi.
>  Recently, during fault injection testing, we found an issue where the nfsd
>  process cannot exit when 0 is written to /proc/fs/nfsd/threads. Other
>  processes are then unable to acquire nfsd_mutex, which leads to a hung task.
>  This is the stack trace of the nfsd process:
>  PID: 107326  TASK: ffff8881013a4040  CPU: 1   COMMAND: "nfsd"
>   #0 [ffffc900077077d8] __schedule at ffffffff9c6434b6
>   #1 [ffffc900077078d8] schedule at ffffffff9c643e28
>   #2 [ffffc90007707900] schedule_timeout at ffffffff9c64bf16
>   #3 [ffffc90007707a68] wait_for_common at ffffffff9c645346
>   #4 [ffffc90007707b38] nfsd4_cld_create at ffffffff9b80626a
>   #5 [ffffc90007707c40] nfsd4_open_confirm at ffffffff9b7f41d9
>   #6 [ffffc90007707ce0] nfsd4_proc_compound at ffffffff9b7c872a
>   #7 [ffffc90007707d80] nfsd_dispatch at ffffffff9b79f20d
>   #8 [ffffc90007707dc8] svc_process_common at ffffffff9c4ad9fb
>   #9 [ffffc90007707ea0] svc_process at ffffffff9c4adf15
>  #10 [ffffc90007707ed8] nfsd at ffffffff9b79ba18
>  #11 [ffffc90007707f10] kthread at ffffffff9af908c4
>  #12 [ffffc90007707f50] ret_from_fork at ffffffff9ae048df
>
>  This is because the nfsdcld process exited abnormally, causing the nfsd
>  process to wait indefinitely for a downcall response after initiating an
>  upcall.
>  Here is the log of nfsdcld:
>  Jan  4 02:22:29 localhost nfsdcld[696]: cld_message_size invalid upcall version 0
>  Jan  4 02:22:29 localhost systemd[1]: nfsdcld.service: Main process exited, code=exited, status=1/FAILURE
>  Jan  4 02:22:29 localhost systemd[1]: nfsdcld.service: Failed with result 'exit-code'.
>
>  Memory fault injection caused the kernel to report cld_msg in v1 format,
>  and nfsdcld parsed it incorrectly, leading to an abnormal exit.
>
>  // Expected Scenario
>  nfsd4_client_tracking_init
>   nn->client_tracking_ops = &nfsd4_cld_tracking_ops; // Initialize to v1
>   nfsd4_cld_tracking_init
>    nfsd4_cld_get_version
>     cld_pipe_upcall // Request version information from user space
>     nn->client_tracking_ops = &nfsd4_cld_tracking_ops_v2; // Initialize to v2
>
>  // Actual Scenario
>  nfsd4_client_tracking_init
>   nn->client_tracking_ops = &nfsd4_cld_tracking_ops; // Initialize to v1
>   nfsd4_cld_tracking_init
>    nfsd4_cld_get_version
>     alloc_cld_upcall // A failure is returned due to memory fault
>                      // injection, and the upcall is skipped.
>    nfsd4_cld_grace_start
>     alloc_cld_upcall // A failure is returned due to memory fault
>                      // injection, and the upcall is skipped.
>   nn->client_tracking_ops = &nfsd4_cld_tracking_ops_v0 // Initialize to v1
>
> I was wondering if the kernel might benefit from having a timeout mechanism
>  in place to gracefully handle situations where nfsdcld is unable to send a
>  downcall for certain reasons, ensuring that the nfsd process can exit properly.
> 
>  Link: https://lore.kernel.org/all/3e26c767-f347-4dbe-ae04-aabe8e87af12@huawei.com/
>

That does sound like a real bug to me. Looks like there is a similar
problem in the client-side block layout upcall (bl_resolve_deviceid)
too.

In practice, this could even happen while the server was running, which
would probably cause a RECLAIM_COMPLETE or OPEN operation to hang
indefinitely. Adding a timeout sounds reasonable.
-- 
Jeff Layton <jlayton@kernel.org>

Thread overview: 2+ messages
     [not found] <c44533cc-f625-4eda-b47b-c6f6dd01c991@huawei.com>
2025-02-25 11:52 ` Jeff Layton [this message]
2025-02-25 14:09 ` [Question]Is a Kernel Timeout Recovery Mechanism Needed for Prolonged User-Space Downcall Unresponsiveness? Chuck Lever
