From: Vadim Fedorenko <vadim.fedorenko@linux.dev>
To: Jiri Pirko <jiri@resnulli.us>, netdev@vger.kernel.org
Cc: arkadiusz.kubalewski@intel.com
Subject: Re: [patch net] dpll: fix possible deadlock during netlink dump operation
Date: Tue, 6 Feb 2024 14:10:05 +0000 [thread overview]
Message-ID: <dcbd7098-be09-4309-a95f-c613977be389@linux.dev> (raw)
In-Reply-To: <20240206125145.354557-1-jiri@resnulli.us>
On 06/02/2024 12:51, Jiri Pirko wrote:
> From: Jiri Pirko <jiri@nvidia.com>
>
> Recently, I've been hitting following deadlock warning during dpll pin
> dump:
>
> [52804.637962] ======================================================
> [52804.638536] WARNING: possible circular locking dependency detected
> [52804.639111] 6.8.0-rc2jiri+ #1 Not tainted
> [52804.639529] ------------------------------------------------------
> [52804.640104] python3/2984 is trying to acquire lock:
> [52804.640581] ffff88810e642678 (nlk_cb_mutex-GENERIC){+.+.}-{3:3}, at: netlink_dump+0xb3/0x780
> [52804.641417]
> but task is already holding lock:
> [52804.642010] ffffffff83bde4c8 (dpll_lock){+.+.}-{3:3}, at: dpll_lock_dumpit+0x13/0x20
> [52804.642747]
> which lock already depends on the new lock.
>
> [52804.643551]
> the existing dependency chain (in reverse order) is:
> [52804.644259]
> -> #1 (dpll_lock){+.+.}-{3:3}:
> [52804.644836] lock_acquire+0x174/0x3e0
> [52804.645271] __mutex_lock+0x119/0x1150
> [52804.645723] dpll_lock_dumpit+0x13/0x20
> [52804.646169] genl_start+0x266/0x320
> [52804.646578] __netlink_dump_start+0x321/0x450
> [52804.647056] genl_family_rcv_msg_dumpit+0x155/0x1e0
> [52804.647575] genl_rcv_msg+0x1ed/0x3b0
> [52804.648001] netlink_rcv_skb+0xdc/0x210
> [52804.648440] genl_rcv+0x24/0x40
> [52804.648831] netlink_unicast+0x2f1/0x490
> [52804.649290] netlink_sendmsg+0x36d/0x660
> [52804.649742] __sock_sendmsg+0x73/0xc0
> [52804.650165] __sys_sendto+0x184/0x210
> [52804.650597] __x64_sys_sendto+0x72/0x80
> [52804.651045] do_syscall_64+0x6f/0x140
> [52804.651474] entry_SYSCALL_64_after_hwframe+0x46/0x4e
> [52804.652001]
> -> #0 (nlk_cb_mutex-GENERIC){+.+.}-{3:3}:
> [52804.652650] check_prev_add+0x1ae/0x1280
> [52804.653107] __lock_acquire+0x1ed3/0x29a0
> [52804.653559] lock_acquire+0x174/0x3e0
> [52804.653984] __mutex_lock+0x119/0x1150
> [52804.654423] netlink_dump+0xb3/0x780
> [52804.654845] __netlink_dump_start+0x389/0x450
> [52804.655321] genl_family_rcv_msg_dumpit+0x155/0x1e0
> [52804.655842] genl_rcv_msg+0x1ed/0x3b0
> [52804.656272] netlink_rcv_skb+0xdc/0x210
> [52804.656721] genl_rcv+0x24/0x40
> [52804.657119] netlink_unicast+0x2f1/0x490
> [52804.657570] netlink_sendmsg+0x36d/0x660
> [52804.658022] __sock_sendmsg+0x73/0xc0
> [52804.658450] __sys_sendto+0x184/0x210
> [52804.658877] __x64_sys_sendto+0x72/0x80
> [52804.659322] do_syscall_64+0x6f/0x140
> [52804.659752] entry_SYSCALL_64_after_hwframe+0x46/0x4e
> [52804.660281]
> other info that might help us debug this:
>
> [52804.661077] Possible unsafe locking scenario:
>
> [52804.661671] CPU0 CPU1
> [52804.662129] ---- ----
> [52804.662577] lock(dpll_lock);
> [52804.662924] lock(nlk_cb_mutex-GENERIC);
> [52804.663538] lock(dpll_lock);
> [52804.664073] lock(nlk_cb_mutex-GENERIC);
> [52804.664490]
>
> The issue as follows: __netlink_dump_start() calls control->start(cb)
> with nlk->cb_mutex held. In control->start(cb) the dpll_lock is taken.
> Then nlk->cb_mutex is released and taken again in netlink_dump(), while
> dpll_lock still being held. That leads to ABBA deadlock when another
> CPU races with the same operation.
>
> Fix this by moving dpll_lock taking into dumpit() callback which ensures
> correct lock taking order.
>
> Fixes: 9d71b54b65b1 ("dpll: netlink: Add DPLL framework base functions")
> Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Good catch, thanks!
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
> ---
> drivers/dpll/dpll_netlink.c | 20 ++++++--------------
> drivers/dpll/dpll_nl.c | 4 ----
> 2 files changed, 6 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/dpll/dpll_netlink.c b/drivers/dpll/dpll_netlink.c
> index 314bb3775465..4ca9ad16cd95 100644
> --- a/drivers/dpll/dpll_netlink.c
> +++ b/drivers/dpll/dpll_netlink.c
> @@ -1199,6 +1199,7 @@ int dpll_nl_pin_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb)
> unsigned long i;
> int ret = 0;
>
> + mutex_lock(&dpll_lock);
> xa_for_each_marked_start(&dpll_pin_xa, i, pin, DPLL_REGISTERED,
> ctx->idx) {
> if (!dpll_pin_available(pin))
> @@ -1218,6 +1219,8 @@ int dpll_nl_pin_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb)
> }
> genlmsg_end(skb, hdr);
> }
> + mutex_unlock(&dpll_lock);
> +
> if (ret == -EMSGSIZE) {
> ctx->idx = i;
> return skb->len;
> @@ -1373,6 +1376,7 @@ int dpll_nl_device_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb)
> unsigned long i;
> int ret = 0;
>
> + mutex_lock(&dpll_lock);
> xa_for_each_marked_start(&dpll_device_xa, i, dpll, DPLL_REGISTERED,
> ctx->idx) {
> hdr = genlmsg_put(skb, NETLINK_CB(cb->skb).portid,
> @@ -1389,6 +1393,8 @@ int dpll_nl_device_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb)
> }
> genlmsg_end(skb, hdr);
> }
> + mutex_unlock(&dpll_lock);
> +
> if (ret == -EMSGSIZE) {
> ctx->idx = i;
> return skb->len;
> @@ -1439,20 +1445,6 @@ dpll_unlock_doit(const struct genl_split_ops *ops, struct sk_buff *skb,
> mutex_unlock(&dpll_lock);
> }
>
> -int dpll_lock_dumpit(struct netlink_callback *cb)
> -{
> - mutex_lock(&dpll_lock);
> -
> - return 0;
> -}
> -
> -int dpll_unlock_dumpit(struct netlink_callback *cb)
> -{
> - mutex_unlock(&dpll_lock);
> -
> - return 0;
> -}
> -
> int dpll_pin_pre_doit(const struct genl_split_ops *ops, struct sk_buff *skb,
> struct genl_info *info)
> {
> diff --git a/drivers/dpll/dpll_nl.c b/drivers/dpll/dpll_nl.c
> index eaee5be7aa64..1e95f5397cfc 100644
> --- a/drivers/dpll/dpll_nl.c
> +++ b/drivers/dpll/dpll_nl.c
> @@ -95,9 +95,7 @@ static const struct genl_split_ops dpll_nl_ops[] = {
> },
> {
> .cmd = DPLL_CMD_DEVICE_GET,
> - .start = dpll_lock_dumpit,
> .dumpit = dpll_nl_device_get_dumpit,
> - .done = dpll_unlock_dumpit,
> .flags = GENL_ADMIN_PERM | GENL_CMD_CAP_DUMP,
> },
> {
> @@ -129,9 +127,7 @@ static const struct genl_split_ops dpll_nl_ops[] = {
> },
> {
> .cmd = DPLL_CMD_PIN_GET,
> - .start = dpll_lock_dumpit,
> .dumpit = dpll_nl_pin_get_dumpit,
> - .done = dpll_unlock_dumpit,
> .policy = dpll_pin_get_dump_nl_policy,
> .maxattr = DPLL_A_PIN_ID,
> .flags = GENL_ADMIN_PERM | GENL_CMD_CAP_DUMP,
next prev parent reply other threads:[~2024-02-06 14:10 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-02-06 12:51 [patch net] dpll: fix possible deadlock during netlink dump operation Jiri Pirko
2024-02-06 14:10 ` Vadim Fedorenko [this message]
2024-02-07 3:07 ` Jakub Kicinski
2024-02-07 11:46 ` Jiri Pirko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=dcbd7098-be09-4309-a95f-c613977be389@linux.dev \
--to=vadim.fedorenko@linux.dev \
--cc=arkadiusz.kubalewski@intel.com \
--cc=jiri@resnulli.us \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.