All of lore.kernel.org
 help / color / mirror / Atom feed
From: William McVicker <willmcvicker@google.com>
To: Johannes Berg <johannes@sipsolutions.net>
Cc: linux-wireless@vger.kernel.org,
	Marek Szyprowski <m.szyprowski@samsung.com>
Subject: Re: [BUG] deadlock in nl80211_vendor_cmd
Date: Tue, 22 Mar 2022 00:15:23 +0000	[thread overview]
Message-ID: <YjkVG1ofyUg2IJP0@google.com> (raw)
In-Reply-To: <99eda6d1dad3ff49435b74e539488091642b10a8.camel@sipsolutions.net>

On 03/21/2022, Johannes Berg wrote:
> Hi,
> 
> 
> > Basically, my wlan driver uses the wiphy_vendor_command ops to handle
> > a number of vendor specific operations.
> > 
> 
> I guess it's an out-of-tree driver, since I (hope I) fixed all the
> issues in the code here ... :)
> 
> > One of them in particular deletes
> > a cfg80211 interface.
> 
> There's quite normal API for that, why would you do that?!

Yeah, unfortunately this is the code I was given.

> 
> > The deadlock happens when thread 1 tries to take the
> > RTNL lock before calling cfg80211_unregister_device() while thread 2 is
> > inside nl80211_pre_doit(), holding the RTNL lock, and waiting on
> > wiphy_lock().
> > 
> > Here is the call flow:
> > 
> > Thread 1:                         Thread 2:
> > 
> > nl80211_pre_doit():
> >   -> rtnl_lock()
> >                                       nl80211_pre_doit():
> >                                        -> rtnl_lock()
> >                                        -> <blocked by Thread 1>
> >   -> wiphy_lock()
> >   -> rtnl_unlock()
> >   -> <unblock Thread 1>
> > exit nl80211_pre_doit()
> >                                        <Thread 2 got the RTNL lock>
> >                                        -> wiphy_lock()
> >                                        -> <blocked by Thread 1>
> > nl80211_doit()
> >   -> nl80211_vendor_cmd()
> >       -> rtnl_lock() <DEADLOCK>
> 
> Yeah, I guess the way we invoke vendor commands now w/o RTNL held means
> you cannot safely acquire RTNL in them.
> 
> I mean, the whole above thing basically collapses down to
> 
>  Thread 1                           Thread 2
>   wiphy_lock(); // nl80211
>                                      rtnl_lock();
>                                      wiphy_lock();
>   rtnl_lock();  // your driver
> 
> The correct order to _acquire_ these is rtnl -> wiphy, and we do it that
> way around everywhere (else).
> 
> > I'm not an networking expert. So my main question is if I'm allowed to take
> > the RTNL lock inside the nl80211_vendor_cmd callbacks?
> 
> Evidently, you're not. It's interesting though, it used to be that we
> called these with the RTNL held, now we don't, and the driver you're
> using somehow "got fixed" to take it, but whoever fixed it didn't take
> into account that this is not possible?

So in my quest to upgrade the Pixel 6 kernel from 5.10 to 5.15, I noticed that
I was hitting several ASSERT_RTNL() warnings during wlan testing. When I dug
into those asserts, I found commit a05829a7222e ("cfg80211: avoid holding the
RTNL when calling the driver") was causing these issues. So I went about adding
the necessary locks in the driver which led me to find this ABBA deadlock
scenario.

> 
> > I hope that helps explain the issue. Let me know if you need any more
> > details.
> 
> It does, but I don't think there's any way to fix it. You just
> fundamentally cannot acquire the RTNL in a vendor command operation
> since that introduced the ABBA deadlock you observed.
> 
> Since it's an out-of-tree driver that's about as much as I can help.
> 
> johannes

Yeah, I understand.  Thanks for the response!

--Will

  reply	other threads:[~2022-03-22  0:18 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-17 17:09 [BUG] deadlock in nl80211_vendor_cmd willmcvicker
     [not found] ` <CABYd82Z=YXmZPTQhf0K1M4nS2wk3dPBSqx91D8SoUd59AUzpHg@mail.gmail.com>
2022-03-21 17:00   ` William McVicker
2022-03-21 20:07 ` Johannes Berg
2022-03-22  0:15   ` William McVicker [this message]
2022-03-22 21:31   ` Jeff Johnson
2022-03-22 21:58     ` William McVicker
2022-03-23 16:06       ` Jeff Johnson
2022-03-24 21:58         ` William McVicker
2022-03-25 12:04           ` Johannes Berg
2022-03-25 12:06             ` Johannes Berg
2022-03-25 16:49             ` Jakub Kicinski
2022-03-25 17:01               ` Johannes Berg
2022-03-25 18:08                 ` William McVicker
2022-03-25 20:21                   ` Johannes Berg
2022-03-25 20:36                   ` William McVicker
2022-03-25 21:16                     ` Johannes Berg
2022-03-25 21:54                       ` Johannes Berg
2022-03-25 22:18                       ` Jeff Johnson
2022-03-25 23:57                       ` William McVicker
2022-03-26  0:07                         ` Jakub Kicinski
2022-03-26  0:12                           ` William McVicker
2022-03-25 20:40                 ` Jakub Kicinski
2022-03-25 21:25                   ` Johannes Berg
2022-03-25 21:48                     ` Jakub Kicinski
2022-03-25 21:50                       ` Johannes Berg
2022-03-25 12:09       ` Johannes Berg
2022-03-25 15:59         ` Jeff Johnson
2022-03-25 16:04           ` Johannes Berg
2022-03-25 17:14             ` Jeff Johnson
2022-03-25 12:08     ` Johannes Berg

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YjkVG1ofyUg2IJP0@google.com \
    --to=willmcvicker@google.com \
    --cc=johannes@sipsolutions.net \
    --cc=linux-wireless@vger.kernel.org \
    --cc=m.szyprowski@samsung.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.