linux-wireless.vger.kernel.org archive mirror
From: Brian Norris <briannorris@chromium.org>
To: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Amitkumar Karwar <akarwar@marvell.com>,
	linux-wireless@vger.kernel.org, Cathy Luo <cluo@marvell.com>,
	Nishant Sarmukadam <nishants@marvell.com>,
	rajatja@google.com
Subject: Re: [PATCH v2] mwifiex: fix kernel crash after shutdown command timeout
Date: Thu, 16 Mar 2017 12:38:58 -0700
Message-ID: <20170316193857.GB105900@google.com>
In-Reply-To: <20170316184115.GA105900@google.com>

Hi Dmitry and Amit,

On Thu, Mar 16, 2017 at 11:41:15AM -0700, Brian Norris wrote:
> On Thu, Mar 16, 2017 at 11:33:17AM -0700, Dmitry Torokhov wrote:
> > On Thu, Mar 16, 2017 at 03:58:52PM +0530, Amitkumar Karwar wrote:
> > > We observed a SHUTDOWN command timeout during reboot stress test
> > > due to a corner case firmware bug. It leads to use-after-free on
> > > adapter structure pointer and crash.
> > > 
> > > Let's add MWIFIEX_IFACE_WORK_DONT_RUN work flag to avoid executing

BTW, the 'DONT_RUN' suggestion was more of a pseudo-code suggestion than
a real name, but I guess it's not terrible :)

> > > any work scheduled after cancel_work_sync() call in teardown path
> > > to resolve the issue.
> > > 
> > > Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
> > > ---
> > > v2: New work_flag has been added to resolve the issue cleanly as per
> > > Brian's suggestion.
> > > ---
> > >  drivers/net/wireless/marvell/mwifiex/main.h | 1 +
> > >  drivers/net/wireless/marvell/mwifiex/pcie.c | 4 ++++
> > >  drivers/net/wireless/marvell/mwifiex/sdio.c | 4 ++++
> > >  3 files changed, 9 insertions(+)
> > > 
> > > diff --git a/drivers/net/wireless/marvell/mwifiex/main.h b/drivers/net/wireless/marvell/mwifiex/main.h
> > > index 5c82972..d5b1fd6 100644
> > > --- a/drivers/net/wireless/marvell/mwifiex/main.h
> > > +++ b/drivers/net/wireless/marvell/mwifiex/main.h
> > > @@ -510,6 +510,7 @@ struct mwifiex_roc_cfg {
> > >  enum mwifiex_iface_work_flags {
> > >  	MWIFIEX_IFACE_WORK_DEVICE_DUMP,
> > >  	MWIFIEX_IFACE_WORK_CARD_RESET,
> > > +	MWIFIEX_IFACE_WORK_DONT_RUN,
> > >  };
> > >  
> > >  struct mwifiex_private {
> > > diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c
> > > index a0d9180..bb3d798 100644
> > > --- a/drivers/net/wireless/marvell/mwifiex/pcie.c
> > > +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
> > > @@ -294,6 +294,7 @@ static void mwifiex_pcie_remove(struct pci_dev *pdev)
> > >  	if (!adapter || !adapter->priv_num)
> > >  		return;
> > >  
> > > +	set_bit(MWIFIEX_IFACE_WORK_DONT_RUN, &card->work_flags);
> > >  	cancel_work_sync(&card->work);
> > >  
> > >  	reg = card->pcie.reg;
> > > @@ -2721,6 +2722,9 @@ static void mwifiex_pcie_work(struct work_struct *work)
> > >  	struct pcie_service_card *card =
> > >  		container_of(work, struct pcie_service_card, work);
> > >  
> > > +	if (test_bit(MWIFIEX_IFACE_WORK_DONT_RUN, &card->work_flags))
> > > +		return;
> > 
> > I do not see how this could possibly prevent use-after-free, assuming
> > that the "card" memory is gone by the time mwifiex_pcie_work() gets to
> > run.
> 
> The 'card' memory isn't getting freed; it's the 'adapter' memory we're
> worried about. This is either already freed (because the FW init
> procedure failed), or else it's freed later in this function via
> mwifiex_remove_card().

I guess there was a slight miscommunication here: Dmitry pointed out to
me that he *was* actually talking about 'card' getting freed -- when it
gets freed after remove() finishes.

So the sequence would have to go like:

1. enter remove()
2. set DONT_RUN flag; cancel_work_sync()
3. begin to shutdown firmware
4. hit, e.g., a command timeout that schedules the work again
5. ** scheduler decides not to schedule the work for a while **
6. we finish mwifiex_remove_card(), and exit from remove() successfully
7. devm_* frees the pcie_service_card (and enclosed work_struct)
8. scheduler tries to run our work item
9. use-after-free!

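However unlikely the delay between steps 4 and 8 might be, this is indeed a
race condition, and the DONT_RUN check inside the work function does not
help with it: by step 8 the test_bit() itself would be reading freed
'card' memory.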
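
The sequence above can be modeled in a few lines of user-space C. This is
only a sketch under stated assumptions: 'model_card', 'model_remove()' and
the other model_* names are hypothetical stand-ins, not mwifiex or workqueue
APIs, and a 'released' boolean stands in for the devm_* free so that the
model can observe the late run without touching freed memory.

```c
#include <stdbool.h>

/* Hypothetical stand-in for pcie_service_card; not the real structure. */
struct model_card {
	bool dont_run;     /* models MWIFIEX_IFACE_WORK_DONT_RUN */
	bool work_pending; /* models a work item sitting on the workqueue */
	bool released;     /* models devm_* having freed the card */
	int handler_runs;  /* how often the handler body actually ran */
	int late_runs;     /* handler entries after the card was released */
};

/* Models schedule_work(): just marks the item pending. */
static void model_schedule_work(struct model_card *c)
{
	c->work_pending = true;
}

/* Models cancel_work_sync(): drops whatever is pending right now. */
static void model_cancel_work_sync(struct model_card *c)
{
	c->work_pending = false;
}

/* Models the workqueue finally getting around to the item (step 8). */
static void model_run_pending(struct model_card *c)
{
	if (!c->work_pending)
		return;
	c->work_pending = false;

	if (c->released) {
		/* In the kernel, even reading dont_run here would be a
		 * use-after-free; the model only counts the event. */
		c->late_runs++;
		return;
	}
	if (c->dont_run)
		return; /* the check the v2 patch adds inside the handler */
	c->handler_runs++;
}

/* Steps 1-7 of the sequence; returns with the card marked as freed. */
static void model_remove(struct model_card *c, bool timeout_reschedules)
{
	c->dont_run = true;              /* step 2: set DONT_RUN ... */
	model_cancel_work_sync(c);       /* ... and cancel_work_sync() */
	if (timeout_reschedules)
		model_schedule_work(c);  /* step 4: timeout re-queues work */
	c->released = true;              /* steps 6-7: remove() exits, card freed */
}

/* Runs the whole sequence and returns the number of late handler entries. */
int model_race(bool timeout_reschedules)
{
	struct model_card c = { 0 };

	model_remove(&c, timeout_reschedules);
	model_run_pending(&c);           /* step 8 */
	return c.late_runs;
}
```

In this model, model_race(true) reports one late handler entry and
model_race(false) reports none. The flag check inside the handler never gets
a chance to matter, because the late entry happens only after the card is
gone.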

> (We're also worried about having the FW dump race with the FW shutdown
> sequence, which can begin later in this function. This patch blocks both
> races AFAICT.)
> 
> > You need to check this flag before queueing firmware dump work, and
> > make sure it is not racy with setting this flag in mwifiex_pcie_remove()
> > (and sdio).
> 
> That's another approach that could work, but it's a little more
> invasive.

Never mind, that isn't too invasive. There's only one schedule_work() in
pcie.c and two in sdio.c. We could even factor out a helper, that knows
how to check the appropriate MWIFIEX_IFACE_* flags, if we really wanted
to...
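
One possible shape for such a helper, sketched in user-space C so that it
compiles on its own. Everything here is an assumption made for illustration:
'sketch_card', 'sketch_queue_work()' and 'sketch_begin_teardown()' are
hypothetical names, a pthread mutex stands in for whatever lock the driver
would really use, and a counter stands in for schedule_work(). The only
point is that the flag test and the queueing happen under the same lock that
protects setting the flag, so nothing new can be queued once teardown has
started.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical work types, loosely mirroring MWIFIEX_IFACE_WORK_*. */
enum sketch_work {
	SKETCH_WORK_DEVICE_DUMP,
	SKETCH_WORK_CARD_RESET,
};

/* Hypothetical stand-in for the card structure; not driver code. */
struct sketch_card {
	pthread_mutex_t lock;     /* stands in for a driver lock */
	bool dont_run;            /* set once teardown has begun */
	unsigned long work_flags; /* which work types were requested */
	int queued;               /* stands in for schedule_work() calls */
};

void sketch_card_init(struct sketch_card *c)
{
	pthread_mutex_init(&c->lock, NULL);
	c->dont_run = false;
	c->work_flags = 0;
	c->queued = 0;
}

/*
 * Queue work unless teardown has begun. Returns true if work was queued.
 * Testing dont_run and queueing under one lock closes the window between
 * "flag looked clear" and "work got scheduled".
 */
bool sketch_queue_work(struct sketch_card *c, enum sketch_work what)
{
	bool queued = false;

	pthread_mutex_lock(&c->lock);
	if (!c->dont_run) {
		c->work_flags |= 1UL << what;
		c->queued++; /* a real driver would call schedule_work() */
		queued = true;
	}
	pthread_mutex_unlock(&c->lock);
	return queued;
}

/* Called first thing in remove(): block new work, then drop pending work. */
void sketch_begin_teardown(struct sketch_card *c)
{
	pthread_mutex_lock(&c->lock);
	c->dont_run = true;
	pthread_mutex_unlock(&c->lock);

	/* Models cancel_work_sync(); done outside the lock, as the real
	 * call may sleep. No new work can be queued past this point. */
	c->queued = 0;
	c->work_flags = 0;
}
```

With this arrangement, a command timeout that fires after
sketch_begin_teardown() gets 'false' back from sketch_queue_work() and
leaves nothing behind for the workqueue to run later.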

Brian

Thread overview: 5+ messages
2017-03-16 10:28 [PATCH v2] mwifiex: fix kernel crash after shutdown command timeout Amitkumar Karwar
2017-03-16 18:33 ` Dmitry Torokhov
2017-03-16 18:41   ` Brian Norris
2017-03-16 19:38     ` Brian Norris [this message]
2017-03-16 20:52       ` Brian Norris
