linux-wireless.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Brian Norris <briannorris@chromium.org>
To: Xinming Hu <huxm@marvell.com>, Kalle Valo <kvalo@codeaurora.org>
Cc: Kalle Valo <kvalo@codeaurora.org>,
	Linux Wireless <linux-wireless@vger.kernel.org>,
	Dmitry Torokhov <dtor@google.com>,
	"rajatja@google.com" <rajatja@google.com>,
	Zhiyuan Yang <yangzy@marvell.com>, Tim Song <songtao@marvell.com>,
	Cathy Luo <cluo@marvell.com>, James Cao <jcao@marvell.com>,
	Ganapathi Bhat <gbhat@marvell.com>,
	Ellie Reeves <ellierevves@gmail.com>,
	Christoph Hellwig <hch@lst.de>
Subject: Re: Re: [PATCH] mwifiex: cancel pcie/sdio work in remove/shutdown handler
Date: Tue, 9 Jan 2018 14:01:29 -0800	[thread overview]
Message-ID: <20180109220128.GA148512@google.com> (raw)
In-Reply-To: <120235b5b7fb4b2286a32dcca2a3878a@SC-EXCH02.marvell.com>

+ Christopher

Hi Simon and Kalle,

On Tue, Jan 09, 2018 at 11:42:21AM +0000, Xinming Hu wrote:
> Hi,
> 
> > -----Original Message-----
> > From: Kalle Valo [mailto:kvalo@codeaurora.org]
> > Sent: 2018年1月9日 15:40
> > To: Brian Norris <briannorris@chromium.org>
> > Cc: Xinming Hu <huxm@marvell.com>; Linux Wireless
> > <linux-wireless@vger.kernel.org>; Dmitry Torokhov <dtor@google.com>;
> > rajatja@google.com; Zhiyuan Yang <yangzy@marvell.com>; Tim Song
> > <songtao@marvell.com>; Cathy Luo <cluo@marvell.com>; James Cao
> > <jcao@marvell.com>; Ganapathi Bhat <gbhat@marvell.com>; Ellie Reeves
> > <ellierevves@gmail.com>
> > Subject: [EXT] Re: [PATCH] mwifiex: cancel pcie/sdio work in remove/shutdown
> > handler
> > 
> > External Email
> > 
> > ----------------------------------------------------------------------
> > Brian Norris <briannorris@chromium.org> writes:
> > 
> > >> --- a/drivers/net/wireless/marvell/mwifiex/pcie.c
> > >> +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
> > >> @@ -310,6 +310,8 @@ static void mwifiex_pcie_remove(struct pci_dev
> > *pdev)
> > >>  		mwifiex_init_shutdown_fw(priv, MWIFIEX_FUNC_SHUTDOWN);
> > >>  	}
> > >>
> > >> +	cancel_work_sync(&card->work);
> > >> +
> > >
> > > Just FYI, this "fix" is not a real fix. It will likely paper over some
> > > of your bugs (where, e.g., the FW shutdown command times out in the
> > > previous couple of lines), but this highlights the fact that there are
> > > other races that could trigger the same behavior. You're not fixing
> > > those.
> > >
> > > For example, what if somebody initiates a scan or other nl80211
> > > command between the above line and mwifiex_remove_card()? That
> > command
> > > could potentially time out too.
> > >
> 
> The hardware status have been reset before downloading the last
> command(FUNC SHUTDOWN), in this way, follow commands  download will be
> ignored and warned.

Hmm, I suppose that's true. So the race I'm talking about probably can't
happen usually. What about in manufacturing mode or !FIRMWARE_READY_PCIE
though? Those cases don't shut down the firmware. Can we still have
outstanding timeouts in those cases?

Anyway, I still think there's a problem though, and this patch is just
going to make things worse. See below.

> > > The proper fix would be to institute some kind of mutual exclusion
> > > (locking) between mwifiex_shutdown_sw() and mwifiex_remove_card(), so
> > > that they can't occur at the same time.
> > >
> 
> I am not sure whether there is any mutual exclusion protect between pcie_reset and pcie_remove in pcie core.
> But it looks a different race.
> We still need this fix, right?

Good point. Previously, there wasn't any such exclusion, and that's why
races like the above were even more likely. But as of 4.13, now there
*is* exclusion. See commit b014e96d1abb ("PCI: Protect
pci_error_handlers->reset_notify() usage with device_lock()"). That
incidentally means that you're creating a deadlock with this patch! [1]

If we start a timeout/reset sequence in mwifiex_init_shutdown_fw()
(called from remove()), then you'll eventually have pci_reset_function()
waiting on the device lock, but mwifiex_pcie_remove() will be holding
the device lock already, and now (with your patch), remove() will be
waiting on the worker thread to finish pci_reset_function()...deadlock!

I actually think that the above patch (adding device_lock()) resolves
most of the race but introduces a possible deadlock. I believe an easy
solution is just to switch to pci_try_reset_function() instead? That
will just abort a reset attempt if we're in the middle of removing the
device. Problem solved? Diff appended, but I'll send out a real version
if that looks right. Can you test your original problem with the above
commit from Christopher, as well as the appended diff?

> Regards,
> Simon
> > > Unfortunately, I only paid attention to this after Kalle already
> > > applied this patch. Personally, I'd prefer this patch not get applied,
> > > since it's a bad solution to an obvious problem, which instead leaves
> > > a subtle problem that perhaps no one will bother fixing.
> > 
> > I can revert it, that's not a problem. Can I use the text below as explanation for
> > the revert?
> > 
> > ----------------------------------------------------------------------
> > Brian Norris <briannorris@chromium.org> says:
> > 
> > Just FYI, this "fix" is not a real fix. It will likely paper over some of your bugs
> > (where, e.g., the FW shutdown command times out in the previous couple of
> > lines), but this highlights the fact that there are other races that could trigger
> > the same behavior. You're not fixing those.
> > 
> > For example, what if somebody initiates a scan or other nl80211 command
> > between the above line and mwifiex_remove_card()? That command could
> > potentially time out too.
> > 
> > The proper fix would be to institute some kind of mutual exclusion
> > (locking) between mwifiex_shutdown_sw() and mwifiex_remove_card(), so that
> > they can't occur at the same time.
> > 
> > ----------------------------------------------------------------------

Hi Kalle, thanks for the above. With the new information from above, I
think that there's a more accurate description, like the following:

---
Brian Norris <briannorris@chromium.org> says:

The "fix" in question might not actually fix all related problems,
and it also looks like it can cause a deadlock. Since commit
b014e96d1abb ("PCI: Protect pci_error_handlers->reset_notify() usage
with device_lock()"), the race in question is actually resolved (PCIe
reset cannot happen at the same time as remove()). Instead, this "fix"
just introduces a deadlock where mwifiex_pcie_card_reset_work() is
waiting on device_lock, which is held by PCIe device remove(), which is
waiting on...mwifiex_pcie_card_reset_work().

The proper thing to do is just to fix the deadlock. Patch for this will
come separately.
---

Brian

[1] Technically, the deadlock is already there, since
mwifiex_remove_card() eventually calls pcie.c's ->cleanup_if(), which
also calls cancel_work_sync(). But your patch doesn't help...

Untested diff:

diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c
index cd314946452c..5da3d6ccf5f2 100644
--- a/drivers/net/wireless/marvell/mwifiex/pcie.c
+++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
@@ -2781,7 +2781,7 @@ static void mwifiex_pcie_card_reset_work(struct mwifiex_adapter *adapter)
 {
 	struct pcie_service_card *card = adapter->card;
 
-	pci_reset_function(card->dev);
+	pci_try_reset_function(card->dev);
 }
 
 static void mwifiex_pcie_work(struct work_struct *work)

      reply	other threads:[~2018-01-09 22:01 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-01-09 11:42 Re: [PATCH] mwifiex: cancel pcie/sdio work in remove/shutdown handler Xinming Hu
2018-01-09 22:01 ` Brian Norris [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180109220128.GA148512@google.com \
    --to=briannorris@chromium.org \
    --cc=cluo@marvell.com \
    --cc=dtor@google.com \
    --cc=ellierevves@gmail.com \
    --cc=gbhat@marvell.com \
    --cc=hch@lst.de \
    --cc=huxm@marvell.com \
    --cc=jcao@marvell.com \
    --cc=kvalo@codeaurora.org \
    --cc=linux-wireless@vger.kernel.org \
    --cc=rajatja@google.com \
    --cc=songtao@marvell.com \
    --cc=yangzy@marvell.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).