netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: William McVicker <willmcvicker@google.com>
To: Johannes Berg <johannes.berg@intel.com>, linux-wireless@vger.kernel.org
Cc: Marek Szyprowski <m.szyprowski@samsung.com>,
	Kalle Valo <kvalo@codeaurora.org>,
	"David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	netdev@vger.kernel.org, Amitkumar Karwar <amitkarwar@gmail.com>,
	Ganapathi Bhat <ganapathi.bhat@nxp.com>,
	Xinming Hu <huxinming820@gmail.com>,
	"Cc: Android Kernel" <kernel-team@android.com>
Subject: Re: [BUG] deadlock in nl80211_vendor_cmd
Date: Mon, 21 Mar 2022 17:00:14 +0000	[thread overview]
Message-ID: <YjivHrPnzG7h2Sdf@google.com> (raw)
In-Reply-To: <CABYd82Z=YXmZPTQhf0K1M4nS2wk3dPBSqx91D8SoUd59AUzpHg@mail.gmail.com>

On 03/21/2022, Will McVicker wrote:
> On Thu, Mar 17, 2022 at 10:09 AM <willmcvicker@google.com> wrote:
> 
> > Hi,
> >
> > I wanted to report a deadlock that I'm hitting as a result of the upstream
> > commit a05829a7222e ("cfg80211: avoid holding the RTNL when calling the
> > driver"). I'm using the Pixel 6 with downstream version of the 5.15 kernel,
> > but I'm pretty sure this will happen on the upstream tip-of-tree kernel as
> > well.
> >
> > Basically, my wlan driver uses the wiphy_vendor_command ops to handle
> > a number of vendor specific operations. One of them in particular deletes
> > a cfg80211 interface. The deadlock happens when thread 1 tries to take the
> > RTNL lock before calling cfg80211_unregister_device() while thread 2 is
> > inside nl80211_pre_doit(), holding the RTNL lock, and waiting on
> > wiphy_lock().
> >
> > Here is the call flow:
> >
> > Thread 1:                         Thread 2:
> >
> > nl80211_pre_doit():
> >   -> rtnl_lock()
> >                                       nl80211_pre_doit():
> >                                        -> rtnl_lock()
> >                                        -> <blocked by Thread 1>
> >   -> wiphy_lock()
> >   -> rtnl_unlock()
> >   -> <unblock Thread 1>
> > exit nl80211_pre_doit()
> >                                        <Thread 2 got the RTNL lock>
> >                                        -> wiphy_lock()
> >                                        -> <blocked by Thread 1>
> > nl80211_doit()
> >   -> nl80211_vendor_cmd()
> >       -> rtnl_lock() <DEADLOCK>
> >       -> cfg80211_unregister_device()
> >       -> rtnl_unlock()
> >
> >
> > To be complete, here are the kernel call traces when the deadlock occurs:
> >
> > Thread 1 Call trace:
> >     <Take rtnl before calling cfg80211_unregister_device()>
> >     nl80211_vendor_cmd+0x210/0x218
> >     genl_rcv_msg+0x3ac/0x45c
> >     netlink_rcv_skb+0x130/0x168
> >     genl_rcv+0x38/0x54
> >     netlink_unicast_kernel+0xe4/0x1f4
> >     netlink_unicast+0x128/0x21c
> >     netlink_sendmsg+0x2d8/0x3d8
> >
> > Thread 2 Call trace:
> >     <Take wiphy_lock>
> >     nl80211_pre_doit+0x1b0/0x250
> >     genl_rcv_msg+0x37c/0x45c
> >     netlink_rcv_skb+0x130/0x168
> >     genl_rcv+0x38/0x54
> >     netlink_unicast_kernel+0xe4/0x1f4
> >     netlink_unicast+0x128/0x21c
> >     netlink_sendmsg+0x2d8/0x3d8
> >
> > I'm not an networking expert. So my main question is if I'm allowed to take
> > the RTNL lock inside the nl80211_vendor_cmd callbacks? If so, then
> > regardless of why I take it, we shouldn't be allowing this deadlock
> > situation, right?
> >
> > I hope that helps explain the issue. Let me know if you need any more
> > details.
> >
> > Thanks,
> > Will
> >
> 
> Sorry my CC list got dropped. Adding the following:
> 
> Kalle Valo <kvalo@codeaurora.org>
> "David S. Miller" <davem@davemloft.net>
> Jakub Kicinski <kuba@kernel.org>
> netdev@vger.kernel.org
> Amitkumar Karwar <amitkarwar@gmail.com>
> Ganapathi Bhat <ganapathi.bhat@nxp.com>
> Xinming Hu <huxinming820@gmail.com>
> kernel-team@android.com

Sorry for the noise. The lists bounced due to html. Resending with mutt to make
sure everyone gets this message.

As an update, I was able to fix the deadlock by updating nl80211_pre_doit() to
not hold the RTNL lock while waiting to get the wiphy_lock. This allows us to
take the RTNL lock within nl80211_doit() and have parallel calls to
nl80211_doit(). Below is the logic I tested. Please let me know if I'm heading
in the right direction.

Thanks,
Will

diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index 686a69381731..bb4ad746509b 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -15227,7 +15227,24 @@ static int nl80211_pre_doit(const struct genl_ops *ops, struct sk_buff *skb,
 	}
 
 	if (rdev && !(ops->internal_flags & NL80211_FLAG_NO_WIPHY_MTX)) {
-		wiphy_lock(&rdev->wiphy);
+		while (!mutex_trylock(&rdev->wiphy.mtx)) {
+			/* Holding the RTNL lock while waiting for the wiphy lock can lead to
+			 * a deadlock within doit() ops that don't hold the RTNL in pre_doit. So
+			 * we need to release the RTNL lock first while we wait for the wiphy
+			 * lock.
+			 */
+			rtnl_unlock();
+			wiphy_lock(&rdev->wiphy);
+
+			/* Once we get the wiphy_lock, we need to grab the RTNL lock. If we can't
+			 * get it, then we need to unlock the wiphy to avoid a deadlock in
+			 * pre_doit and then retry taking the locks again. */
+			if (!rtnl_trylock()) {
+				wiphy_unlock(&rdev->wiphy);
+				rtnl_lock();
+			} else
+				break;
+		}
 		/* we keep the mutex locked until post_doit */
 		__release(&rdev->wiphy.mtx);
 	}

       reply	other threads:[~2022-03-21 17:00 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <0000000000009e9b7105da6d1779@google.com>
     [not found] ` <CABYd82Z=YXmZPTQhf0K1M4nS2wk3dPBSqx91D8SoUd59AUzpHg@mail.gmail.com>
2022-03-21 17:00   ` William McVicker [this message]
     [not found] ` <99eda6d1dad3ff49435b74e539488091642b10a8.camel@sipsolutions.net>
     [not found]   ` <5d5cf050-7de0-7bad-2407-276970222635@quicinc.com>
     [not found]     ` <YjpGlRvcg72zNo8s@google.com>
     [not found]       ` <dc556455-51a2-06e8-8ec5-b807c2901b7e@quicinc.com>
2022-03-24 21:58         ` [BUG] deadlock in nl80211_vendor_cmd William McVicker
2022-03-25 12:04           ` Johannes Berg
2022-03-25 12:06             ` Johannes Berg
2022-03-25 16:49             ` Jakub Kicinski
2022-03-25 17:01               ` Johannes Berg
2022-03-25 18:08                 ` William McVicker
2022-03-25 20:21                   ` Johannes Berg
2022-03-25 20:36                   ` William McVicker
2022-03-25 21:16                     ` Johannes Berg
2022-03-25 21:54                       ` Johannes Berg
2022-03-25 22:18                       ` Jeff Johnson
2022-03-25 23:57                       ` William McVicker
2022-03-26  0:07                         ` Jakub Kicinski
2022-03-26  0:12                           ` William McVicker
2022-03-25 20:40                 ` Jakub Kicinski
2022-03-25 21:25                   ` Johannes Berg
2022-03-25 21:48                     ` Jakub Kicinski
2022-03-25 21:50                       ` Johannes Berg

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YjivHrPnzG7h2Sdf@google.com \
    --to=willmcvicker@google.com \
    --cc=amitkarwar@gmail.com \
    --cc=davem@davemloft.net \
    --cc=ganapathi.bhat@nxp.com \
    --cc=huxinming820@gmail.com \
    --cc=johannes.berg@intel.com \
    --cc=kernel-team@android.com \
    --cc=kuba@kernel.org \
    --cc=kvalo@codeaurora.org \
    --cc=linux-wireless@vger.kernel.org \
    --cc=m.szyprowski@samsung.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).