From: Greg KH <gregkh@linuxfoundation.org>
To: Ido Schimmel <idosch@nvidia.com>
Cc: stable@vger.kernel.org, davem@davemloft.net, kuba@kernel.org,
pabeni@redhat.com, edumazet@google.com, jiri@nvidia.com,
sashal@kernel.org, vkarri@nvidia.com
Subject: Re: [PATCH stable 6.1] devlink: Fix RCU stall when unregistering a devlink instance
Date: Tue, 1 Oct 2024 14:11:27 +0200 [thread overview]
Message-ID: <2024100135-siren-vocalist-0299@gregkh> (raw)
In-Reply-To: <20241001112035.973187-1-idosch@nvidia.com>
On Tue, Oct 01, 2024 at 02:20:35PM +0300, Ido Schimmel wrote:
> When a devlink instance is unregistered the following happens (among
> other things):
>
> t0 - The instance is marked with 'DEVLINK_UNREGISTERING'.
> t1 - Blocking until an RCU grace period passes.
> t2 - The 'DEVLINK_UNREGISTERING' mark is cleared from the instance.
>
> When iterating over devlink instances (f.e., when requesting a dump of
> available instances) and encountering an instance that is currently
> being unregistered, the current code will loop around until the
> 'DEVLINK_UNREGISTERING' mark is cleared.
>
> The iteration over devlink instances happens in an RCU critical section,
> so if the instance that is currently being unregistered was encountered
> between t0 and t1, the system will deadlock and RCU stalls will be
> reported [1]. The task unregistering the instance will forever wait for an
> RCU grace period to pass and the task iterating over the instances will
> forever wait for the mark to be cleared.
>
> The issue can be reliably reproduced by increasing the time window
> between t0 and t1 (used a 60 seconds sleep) and running the following
> reproducer [2].
>
> Fix by skipping over instances that are currently being unregistered.
>
> [1]
> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> rcu: Tasks blocked on level-0 rcu_node (CPUs 0-7): P344
> (detected by 4, t=26002 jiffies, g=5773, q=12 ncpus=8)
> task:devlink state:R running task stack:25568 pid:344 ppid:260 flags:0x00004002
> [...]
> Call Trace:
> xa_get_mark+0x184/0x3e0
> devlinks_xa_find_get.constprop.0+0xc6/0x2e0
> devlink_nl_cmd_get_dumpit+0x105/0x3f0
> netlink_dump+0x568/0xff0
> __netlink_dump_start+0x651/0x900
> genl_family_rcv_msg_dumpit+0x201/0x340
> genl_rcv_msg+0x573/0x780
> netlink_rcv_skb+0x15f/0x430
> genl_rcv+0x29/0x40
> netlink_unicast+0x546/0x800
> netlink_sendmsg+0x958/0xe60
> __sys_sendto+0x3a2/0x480
> __x64_sys_sendto+0xe1/0x1b0
> do_syscall_64+0x35/0x80
> entry_SYSCALL_64_after_hwframe+0x68/0xd2
>
> [2]
> # echo 10 > /sys/bus/netdevsim/new_device
> # echo 10 > /sys/bus/netdevsim/del_device &
> # devlink dev
>
> Fixes: c2368b19807a ("net: devlink: introduce "unregistering" mark and use it during devlinks iteration")
> Reported-by: Vivek Reddy Karri <vkarri@nvidia.com>
> Signed-off-by: Ido Schimmel <idosch@nvidia.com>
> ---
> I read the stable rules and I am not providing an "upstream commit ID"
> since the code in upstream has been reworked, making this fix
> irrelevant. The only affected stable kernel is 6.1.y.
You need to document the heck out of why this is only relevant for this
one specific kernel branch IN the changelog text, so that we understand
what is going on, AND you need to get acks from the relevant maintainers
of this area of the kernel to accept something that is not in Linus's
tree.
But first of, why? Why not just take the upstrema commits instead?
thanks,
greg k-h
next prev parent reply other threads:[~2024-10-01 12:11 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-10-01 11:20 [PATCH stable 6.1] devlink: Fix RCU stall when unregistering a devlink instance Ido Schimmel
2024-10-01 12:11 ` Greg KH [this message]
2024-10-01 13:38 ` Ido Schimmel
2024-10-01 16:47 ` Aleksandr Nogikh
2024-10-06 8:51 ` Ido Schimmel
2024-10-01 22:39 ` Jakub Kicinski
2024-10-06 8:44 ` Ido Schimmel
2024-10-06 10:11 ` Greg KH
2024-10-10 14:08 ` Sasha Levin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=2024100135-siren-vocalist-0299@gregkh \
--to=gregkh@linuxfoundation.org \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=idosch@nvidia.com \
--cc=jiri@nvidia.com \
--cc=kuba@kernel.org \
--cc=pabeni@redhat.com \
--cc=sashal@kernel.org \
--cc=stable@vger.kernel.org \
--cc=vkarri@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox