From: Jeff Layton <jlayton@kernel.org>
To: Chuck Lever <chuck.lever@oracle.com>, NeilBrown <neilb@suse.de>
Cc: syzbot <syzbot+d1e76d963f757db40f91@syzkaller.appspotmail.com>,
	Dai.Ngo@oracle.com, kolga@netapp.com,
	linux-kernel@vger.kernel.org, linux-nfs@vger.kernel.org,
	lorenzo@kernel.org, netdev@vger.kernel.org, okorniev@redhat.com,
	syzkaller-bugs@googlegroups.com, tom@talpey.com
Subject: Re: [syzbot] [nfs?] INFO: task hung in nfsd_nl_listener_set_doit
Date: Wed, 09 Oct 2024 16:26:56 -0400
Message-ID: <f37c0b837fd947362eb9d5bf7873347fbc5aa567.camel@kernel.org>
In-Reply-To: <ZthtZ4omOnFnhXXr@tissot.1015granger.net>

On Wed, 2024-09-04 at 10:23 -0400, Chuck Lever wrote:
> On Mon, Sep 02, 2024 at 11:57:55AM +1000, NeilBrown wrote:
> > On Sun, 01 Sep 2024, syzbot wrote:
> > > syzbot has found a reproducer for the following issue on:
> >
> > I had a poke around using the provided disk image and kernel for
> > exploring.
> >
> > I think the problem is demonstrated by this stack :
> >
> > [<0>] rpc_wait_bit_killable+0x1b/0x160
> > [<0>] __rpc_execute+0x723/0x1460
> > [<0>] rpc_execute+0x1ec/0x3f0
> > [<0>] rpc_run_task+0x562/0x6c0
> > [<0>] rpc_call_sync+0x197/0x2e0
> > [<0>] rpcb_register+0x36b/0x670
> > [<0>] svc_unregister+0x208/0x730
> > [<0>] svc_bind+0x1bb/0x1e0
> > [<0>] nfsd_create_serv+0x3f0/0x760
> > [<0>] nfsd_nl_listener_set_doit+0x135/0x1a90
> > [<0>] genl_rcv_msg+0xb16/0xec0
> > [<0>] netlink_rcv_skb+0x1e5/0x430
> >
> > No rpcbind is running on this host so that "svc_unregister" takes a
> > long time. Maybe not forever but if a few of these get queued up all
> > blocking some other thread, then maybe that pushed it over the limit.
> >
> > The fact that rpcbind is not running might not be relevant as the test
> > messes up the network. "ping 127.0.0.1" stops working.
> >
> > So this bug comes down to "we try to contact rpcbind while holding a
> > mutex and if that gets no response and no error, then we can hold the
> > mutex for a long time".
> >
> > Are we surprised? Do we want to fix this? Any suggestions how?
>
> In the past, we've tried to address "hanging upcall" issues where
> the kernel part of an administrative command needs a user space
> service that isn't working or present. (eg mount needing a running
> gssd)
>
> If NFSD is using the kernel RPC client for the upcall, then maybe
> adding the RPC_TASK_SOFTCONN flag might turn the hang into an
> immediate failure.
>
> IMO this should be addressed.
>
>

I sent a patch that does the above, but now I'm wondering if we ought
to take another approach. The listener array can be pretty long. What
if we instead were to just drop and reacquire the mutex in the loop at
strategic points? Then we wouldn't squat on the mutex for so long.

Something like this maybe? It's ugly but it might prevent hung task
warnings, and listener setup isn't a fastpath anyway.

diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 3adbc05ebaac..5de01fb4c557 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -2042,7 +2042,9 @@ int nfsd_nl_listener_set_doit(struct sk_buff *skb, struct genl_info *info)
 		set_bit(XPT_CLOSE, &xprt->xpt_flags);
 		spin_unlock_bh(&serv->sv_lock);
 		svc_xprt_close(xprt);
+
+		/* ensure we don't squat on the mutex for too long */
+		mutex_unlock(&nfsd_mutex);
+		mutex_lock(&nfsd_mutex);
 		spin_lock_bh(&serv->sv_lock);
 	}
@@ -2082,6 +2084,10 @@ int nfsd_nl_listener_set_doit(struct sk_buff *skb, struct genl_info *info)
 		/* always save the latest error */
 		if (ret < 0)
 			err = ret;
+
+		/* ensure we don't squat on the mutex for too long */
+		mutex_unlock(&nfsd_mutex);
+		mutex_lock(&nfsd_mutex);
 	}
 	if (!serv->sv_nrthreads && list_empty(&nn->nfsd_serv->sv_permsocks))
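
For illustration only, the drop-and-reacquire idea in the patch above
can be sketched in user space with pthreads. This is not kernel code
and not the patch itself: big_lock, process_items_locked(), run_all()
and fail_on_negative() are hypothetical names standing in for
nfsd_mutex, the listener loop, its caller and the per-listener work.
Two caveats apply to the sketch: anything protected by the lock may
have changed by the time it is retaken, and whether a waiter actually
gets in during the gap depends on how fair the mutex implementation is.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for nfsd_mutex; not the kernel object. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical per-item work that may block for a long time. */
typedef int (*item_fn)(int item);

/*
 * Run fn on every item. The caller must hold big_lock on entry and
 * still holds it on return, but the lock is dropped and retaken after
 * each item so other threads are not shut out for the whole loop.
 * Returns the most recent negative result, or 0 if there was none.
 */
static int process_items_locked(const int *items, size_t n, item_fn fn)
{
	int err = 0;

	for (size_t i = 0; i < n; i++) {
		int ret = fn(items[i]);

		/* always save the latest error */
		if (ret < 0)
			err = ret;

		/*
		 * Yield point: state guarded by big_lock may change
		 * here, so nothing cached across it can be trusted.
		 */
		pthread_mutex_unlock(&big_lock);
		pthread_mutex_lock(&big_lock);
	}
	return err;
}

/* Hypothetical caller: take the lock, run the loop, release it. */
static int run_all(const int *items, size_t n, item_fn fn)
{
	int err;

	pthread_mutex_lock(&big_lock);
	err = process_items_locked(items, n, fn);
	pthread_mutex_unlock(&big_lock);
	return err;
}

/* Hypothetical work function: negative items count as errors. */
static int fail_on_negative(int item)
{
	return item < 0 ? item : 0;
}
```

Calling run_all() on the items {1, -5, 2, -7, 3} with
fail_on_negative() returns -7, the latest error seen, which mirrors
the "always save the latest error" behaviour in the second hunk.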