From: Brian Norris <briannorris@chromium.org>
To: Amitkumar Karwar <akarwar@marvell.com>
Cc: "linux-wireless@vger.kernel.org" <linux-wireless@vger.kernel.org>,
Cathy Luo <cluo@marvell.com>,
Nishant Sarmukadam <nishants@marvell.com>,
"rajatja@google.com" <rajatja@google.com>,
Xinming Hu <huxm@marvell.com>
Subject: Re: [PATCH v2 2/2] mwifiex: check hw_status in suspend and resume handlers
Date: Wed, 5 Oct 2016 09:41:38 -0700 [thread overview]
Message-ID: <20161005164138.GB54237@google.com> (raw)
In-Reply-To: <39b7881045ec42628a4ffa654840681e@SC-EXCH04.marvell.com>
Hi Amit,
On Wed, Oct 05, 2016 at 12:26:36PM +0000, Amitkumar Karwar wrote:
> > From: Brian Norris [mailto:briannorris@chromium.org]
> > Sent: Wednesday, October 05, 2016 2:35 AM
> > To: Amitkumar Karwar
> > Cc: linux-wireless@vger.kernel.org; Cathy Luo; Nishant Sarmukadam;
> > rajatja@google.com; briannorris@google.com; Xinming Hu
> > Subject: Re: [PATCH v2 2/2] mwifiex: check hw_status in suspend and
> > resume handlers
> >
> > On Tue, Oct 04, 2016 at 10:38:25PM +0530, Amitkumar Karwar wrote:
> > > From: Xinming Hu <huxm@marvell.com>
> > >
> > > We have observed a kernel crash when system immediately suspends after
> > > booting. There is a race between suspend and driver initialization
> > > paths.
> > > This patch adds hw_status checks to fix the problem
> > >
> > > Signed-off-by: Xinming Hu <huxm@marvell.com>
> > > Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
> > > ---
> > > v2: Return failure in suspend/resume handler in this scenario.
> > > ---
> > > drivers/net/wireless/marvell/mwifiex/pcie.c | 10 ++++++----
> > > 1 file changed, 6 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c
> > > b/drivers/net/wireless/marvell/mwifiex/pcie.c
> > > index ba9e068..fa6bf85 100644
> > > --- a/drivers/net/wireless/marvell/mwifiex/pcie.c
> > > +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
> > > @@ -122,9 +122,10 @@ static int mwifiex_pcie_suspend(struct device
> > > *dev)
> > >
> > > if (pdev) {
> > > card = pci_get_drvdata(pdev);
> > > - if (!card || !card->adapter) {
> > > + if (!card || !card->adapter ||
> > > + card->adapter->hw_status != MWIFIEX_HW_STATUS_READY) {
> >
> > Wait, is there no locking on the 'hw_status' field? That is inherently
> > an unsafe race all on its own; you're not guaranteed that this will be
> > read/written atomically. And you also aren't guaranteed that writes to
> > this happen in the order they appear in the code -- in other words,
> > reading this flag doesn't necessarily guarantee that initialization is
> > actually complete (even if that's very likely to be true, given that
> > it's probably just a single-instruction word-access, and any prior HW
> > polling or interrupts likely have done some synchronization and can't be
> > reordered).
> > actually complete
>
> Here is the brief info on how "hw_status" flag is updated.
> 1) It gets changed incrementally during initialization.
> MWIFIEX_HW_STATUS_INITIALIZING -> MWIFIEX_HW_STATUS_INIT_DONE -> MWIFIEX_HW_STATUS_READY
>
> 2) Status will remain READY once driver+firmware is up and running.
>
> 3) Below is status during teardown
> MWIFIEX_HW_STATUS_READY -> MWIFIEX_HW_STATUS_RESET -> MWIFIEX_HW_STATUS_CLOSING -> MWIFIEX_HW_STATUS_NOT_READY
>
> As the events occur one after another, we don't expect a race and
> don't need locking here for the flag. Flag status
> MWIFIEX_HW_STATUS_READY guarantees that initialization is completed.
It seems like, as with patch 1, you're mostly arguing about the writes
to this variable. But writes race with reads as well; how do you
guarantee that you're not seeing incorrect values of 'hw_status' here in
suspend() -- e.g., either old or new values of it, or even
partially-written values, if for some reason the compiler decides it
can't read/write this all in one go?
> In worst case scenario, only first system suspend attempt issued
> immediately after system boot will be aborted with BUSY error. I
> think, that should be fine.
(For the record, my concern about -EBUSY is separate from my concern
about the potential race condition.)
> Let me know if you have any concerns.
Sorry, I probably didn't completely flesh out my thought here.
I think I was concerned about a failed initialization causing the system
to never enter suspend again. So specifically: what happens if (e.g.)
the firmware fails to load? AFAICT, the device doesn't actually unbind
itself from the driver, so instead, you have a device in limbo that will
always return -EBUSY in suspend(), and your system can never again enter
suspend. Am I correct? If so, that doesn't sound great.
> > This is probably better than nothing, but it's not very good.
> >
> > > pr_err("Card or adapter structure is not valid\n");
> > > - return 0;
> > > + return -EBUSY;
> >
> > So the above cases all mean that the driver hasn't finished loading,
> > right?
> >
> > !card => can't happen (PCIe probe() would have failed
> > mwifiex_add_card()), but fine to check
>
> NULL "card" or "card->adapter" pointers are expected in below cases
> 1) Driver initialization failed after downloading the firmware. Driver is performing cleanup in init failure path. Now system suspends.
> 2) Race of teardown + suspend
>
> >
> > !card->adapter => only happens after patch 1; i.e., when tearing down
> > the device and detaching it from the driver
> >
> > card->adapter->hw_status != MWIFIEX_HW_STATUS_READY => FW is not loaded
> > (i.e., in the process of starting or stopping FW?)
> >
> > I guess all of those cases make sense to be -EBUSY.
^^ I'm second-guessing my claim here then.
> Yes. we can keep -EBUSY for all of the cases.
Brian
next prev parent reply other threads:[~2016-10-05 16:41 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-10-04 17:08 [PATCH v2 1/2] mwifiex: reset card->adapter during device unregister Amitkumar Karwar
2016-10-04 17:08 ` [PATCH v2 2/2] mwifiex: check hw_status in suspend and resume handlers Amitkumar Karwar
2016-10-04 21:04 ` Brian Norris
2016-10-05 12:26 ` Amitkumar Karwar
2016-10-05 16:41 ` Brian Norris [this message]
2016-10-05 17:19 ` Cathy Luo
2016-10-06 17:28 ` Amitkumar Karwar
2016-10-04 21:58 ` [PATCH v2 1/2] mwifiex: reset card->adapter during device unregister Brian Norris
2016-10-05 14:04 ` Amitkumar Karwar
2016-10-05 16:30 ` Brian Norris
2016-10-06 13:03 ` Amitkumar Karwar
2016-10-10 20:43 ` Brian Norris
2016-10-10 23:43 ` Dmitry Torokhov
2016-10-10 23:47 ` Brian Norris
2016-10-10 23:53 ` Dmitry Torokhov
2016-10-10 23:54 ` Brian Norris
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20161005164138.GB54237@google.com \
--to=briannorris@chromium.org \
--cc=akarwar@marvell.com \
--cc=cluo@marvell.com \
--cc=huxm@marvell.com \
--cc=linux-wireless@vger.kernel.org \
--cc=nishants@marvell.com \
--cc=rajatja@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).