* Re: Re: [PATCH] mwifiex: cancel pcie/sdio work in remove/shutdown handler
@ 2018-01-09 11:42 Xinming Hu
2018-01-09 22:01 ` Brian Norris
0 siblings, 1 reply; 2+ messages in thread
From: Xinming Hu @ 2018-01-09 11:42 UTC (permalink / raw)
To: Kalle Valo, Brian Norris
Cc: Linux Wireless, Dmitry Torokhov, rajatja@google.com, Zhiyuan Yang,
Tim Song, Cathy Luo, James Cao, Ganapathi Bhat, Ellie Reeves
SGksDQoNCj4gLS0tLS1PcmlnaW5hbCBNZXNzYWdlLS0tLS0NCj4gRnJvbTogS2FsbGUgVmFsbyBb
bWFpbHRvOmt2YWxvQGNvZGVhdXJvcmEub3JnXQ0KPiBTZW50OiAyMDE4xOox1MI5yNUgMTU6NDAN
Cj4gVG86IEJyaWFuIE5vcnJpcyA8YnJpYW5ub3JyaXNAY2hyb21pdW0ub3JnPg0KPiBDYzogWGlu
bWluZyBIdSA8aHV4bUBtYXJ2ZWxsLmNvbT47IExpbnV4IFdpcmVsZXNzDQo+IDxsaW51eC13aXJl
bGVzc0B2Z2VyLmtlcm5lbC5vcmc+OyBEbWl0cnkgVG9yb2tob3YgPGR0b3JAZ29vZ2xlLmNvbT47
DQo+IHJhamF0amFAZ29vZ2xlLmNvbTsgWmhpeXVhbiBZYW5nIDx5YW5nenlAbWFydmVsbC5jb20+
OyBUaW0gU29uZw0KPiA8c29uZ3Rhb0BtYXJ2ZWxsLmNvbT47IENhdGh5IEx1byA8Y2x1b0BtYXJ2
ZWxsLmNvbT47IEphbWVzIENhbw0KPiA8amNhb0BtYXJ2ZWxsLmNvbT47IEdhbmFwYXRoaSBCaGF0
IDxnYmhhdEBtYXJ2ZWxsLmNvbT47IEVsbGllIFJlZXZlcw0KPiA8ZWxsaWVyZXZ2ZXNAZ21haWwu
Y29tPg0KPiBTdWJqZWN0OiBbRVhUXSBSZTogW1BBVENIXSBtd2lmaWV4OiBjYW5jZWwgcGNpZS9z
ZGlvIHdvcmsgaW4gcmVtb3ZlL3NodXRkb3duDQo+IGhhbmRsZXINCj4gDQo+IEV4dGVybmFsIEVt
YWlsDQo+IA0KPiAtLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0t
LS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tDQo+IEJyaWFuIE5vcnJpcyA8YnJpYW5ub3JyaXNAY2hy
b21pdW0ub3JnPiB3cml0ZXM6DQo+IA0KPiA+PiAtLS0gYS9kcml2ZXJzL25ldC93aXJlbGVzcy9t
YXJ2ZWxsL213aWZpZXgvcGNpZS5jDQo+ID4+ICsrKyBiL2RyaXZlcnMvbmV0L3dpcmVsZXNzL21h
cnZlbGwvbXdpZmlleC9wY2llLmMNCj4gPj4gQEAgLTMxMCw2ICszMTAsOCBAQCBzdGF0aWMgdm9p
ZCBtd2lmaWV4X3BjaWVfcmVtb3ZlKHN0cnVjdCBwY2lfZGV2DQo+ICpwZGV2KQ0KPiA+PiAgCQlt
d2lmaWV4X2luaXRfc2h1dGRvd25fZncocHJpdiwgTVdJRklFWF9GVU5DX1NIVVRET1dOKTsNCj4g
Pj4gIAl9DQo+ID4+DQo+ID4+ICsJY2FuY2VsX3dvcmtfc3luYygmY2FyZC0+d29yayk7DQo+ID4+
ICsNCj4gPg0KPiA+IEp1c3QgRllJLCB0aGlzICJmaXgiIGlzIG5vdCBhIHJlYWwgZml4LiBJdCB3
aWxsIGxpa2VseSBwYXBlciBvdmVyIHNvbWUNCj4gPiBvZiB5b3VyIGJ1Z3MgKHdoZXJlLCBlLmcu
LCB0aGUgRlcgc2h1dGRvd24gY29tbWFuZCB0aW1lcyBvdXQgaW4gdGhlDQo+ID4gcHJldmlvdXMg
Y291cGxlIG9mIGxpbmVzKSwgYnV0IHRoaXMgaGlnaGxpZ2h0cyB0aGUgZmFjdCB0aGF0IHRoZXJl
IGFyZQ0KPiA+IG90aGVyIHJhY2VzIHRoYXQgY291bGQgdHJpZ2dlciB0aGUgc2FtZSBiZWhhdmlv
ci4gWW91J3JlIG5vdCBmaXhpbmcNCj4gPiB0aG9zZS4NCj4gPg0KPiA+IEZvciBleGFtcGxlLCB3
aGF0IGlmIHNvbWVib2R5IGluaXRpYXRlcyBhIHNjYW4gb3Igb3RoZXIgbmw4MDIxMQ0KPiA+IGNv
bW1hbmQgYmV0d2VlbiB0aGUgYWJvdmUgbGluZSBhbmQgbXdpZmlleF9yZW1vdmVfY2FyZCgpPyBU
aGF0DQo+IGNvbW1hbmQNCj4gPiBjb3VsZCBwb3RlbnRpYWxseSB0aW1lIG91dCB0b28uDQo+ID4N
Cg0KVGhlIGhhcmR3YXJlIHN0YXR1cyBoYXZlIGJlZW4gcmVzZXQgYmVmb3JlIGRvd25sb2FkaW5n
IHRoZSBsYXN0IGNvbW1hbmQoRlVOQyBTSFVURE9XTiksIGluIHRoaXMgd2F5LCBmb2xsb3cgY29t
bWFuZHMgIGRvd25sb2FkIHdpbGwgYmUgaWdub3JlZCBhbmQgd2FybmVkLg0KDQo+ID4gVGhlIHBy
b3BlciBmaXggd291bGQgYmUgdG8gaW5zdGl0dXRlIHNvbWUga2luZCBvZiBtdXR1YWwgZXhjbHVz
aW9uDQo+ID4gKGxvY2tpbmcpIGJldHdlZW4gbXdpZmlleF9zaHV0ZG93bl9zdygpIGFuZCBtd2lm
aWV4X3JlbW92ZV9jYXJkKCksIHNvDQo+ID4gdGhhdCB0aGV5IGNhbid0IG9jY3VyIGF0IHRoZSBz
YW1lIHRpbWUuDQo+ID4NCg0KSSBhbSBub3Qgc3VyZSB3aGV0aGVyIHRoZXJlIGlzIGFueSBtdXR1
YWwgZXhjbHVzaW9uIHByb3RlY3QgYmV0d2VlbiBwY2llX3Jlc2V0IGFuZCBwY2llX3JlbW92ZSBp
biBwY2llIGNvcmUuDQpCdXQgaXQgbG9va3MgYSBkaWZmZXJlbnQgcmFjZS4NCldlIHN0aWxsIG5l
ZWQgdGhpcyBmaXgsIHJpZ2h0Pw0KDQpSZWdhcmRzLA0KU2ltb24NCj4gPiBVbmZvcnR1bmF0ZWx5
LCBJIG9ubHkgcGFpZCBhdHRlbnRpb24gdG8gdGhpcyBhZnRlciBLYWxsZSBhbHJlYWR5DQo+ID4g
YXBwbGllZCB0aGlzIHBhdGNoLiBQZXJzb25hbGx5LCBJJ2QgcHJlZmVyIHRoaXMgcGF0Y2ggbm90
IGdldCBhcHBsaWVkLA0KPiA+IHNpbmNlIGl0J3MgYSBiYWQgc29sdXRpb24gdG8gYW4gb2J2aW91
cyBwcm9ibGVtLCB3aGljaCBpbnN0ZWFkIGxlYXZlcw0KPiA+IGEgc3VidGxlIHByb2JsZW0gdGhh
dCBwZXJoYXBzIG5vIG9uZSB3aWxsIGJvdGhlciBmaXhpbmcuDQo+IA0KPiBJIGNhbiByZXZlcnQg
aXQsIHRoYXQncyBub3QgYSBwcm9ibGVtLiBDYW4gSSB1c2UgdGhlIHRleHQgYmVsb3cgYXMgZXhw
bGFuYXRpb24gZm9yDQo+IHRoZSByZXZlcnQ/DQo+IA0KPiAtLS0tLS0tLS0tLS0tLS0tLS0tLS0t
LS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tDQo+IEJyaWFu
IE5vcnJpcyA8YnJpYW5ub3JyaXNAY2hyb21pdW0ub3JnPiBzYXlzOg0KPiANCj4gSnVzdCBGWUks
IHRoaXMgImZpeCIgaXMgbm90IGEgcmVhbCBmaXguIEl0IHdpbGwgbGlrZWx5IHBhcGVyIG92ZXIg
c29tZSBvZiB5b3VyIGJ1Z3MNCj4gKHdoZXJlLCBlLmcuLCB0aGUgRlcgc2h1dGRvd24gY29tbWFu
ZCB0aW1lcyBvdXQgaW4gdGhlIHByZXZpb3VzIGNvdXBsZSBvZg0KPiBsaW5lcyksIGJ1dCB0aGlz
IGhpZ2hsaWdodHMgdGhlIGZhY3QgdGhhdCB0aGVyZSBhcmUgb3RoZXIgcmFjZXMgdGhhdCBjb3Vs
ZCB0cmlnZ2VyDQo+IHRoZSBzYW1lIGJlaGF2aW9yLiBZb3UncmUgbm90IGZpeGluZyB0aG9zZS4N
Cj4gDQo+IEZvciBleGFtcGxlLCB3aGF0IGlmIHNvbWVib2R5IGluaXRpYXRlcyBhIHNjYW4gb3Ig
b3RoZXIgbmw4MDIxMSBjb21tYW5kDQo+IGJldHdlZW4gdGhlIGFib3ZlIGxpbmUgYW5kIG13aWZp
ZXhfcmVtb3ZlX2NhcmQoKT8gVGhhdCBjb21tYW5kIGNvdWxkDQo+IHBvdGVudGlhbGx5IHRpbWUg
b3V0IHRvby4NCj4gDQo+IFRoZSBwcm9wZXIgZml4IHdvdWxkIGJlIHRvIGluc3RpdHV0ZSBzb21l
IGtpbmQgb2YgbXV0dWFsIGV4Y2x1c2lvbg0KPiAobG9ja2luZykgYmV0d2VlbiBtd2lmaWV4X3No
dXRkb3duX3N3KCkgYW5kIG13aWZpZXhfcmVtb3ZlX2NhcmQoKSwgc28gdGhhdA0KPiB0aGV5IGNh
bid0IG9jY3VyIGF0IHRoZSBzYW1lIHRpbWUuDQo+IA0KPiAtLS0tLS0tLS0tLS0tLS0tLS0tLS0t
LS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tDQo+IA0KPiAt
LQ0KPiBLYWxsZSBWYWxvDQo=
^ permalink raw reply [flat|nested] 2+ messages in thread
* Re: Re: [PATCH] mwifiex: cancel pcie/sdio work in remove/shutdown handler
2018-01-09 11:42 Re: [PATCH] mwifiex: cancel pcie/sdio work in remove/shutdown handler Xinming Hu
@ 2018-01-09 22:01 ` Brian Norris
0 siblings, 0 replies; 2+ messages in thread
From: Brian Norris @ 2018-01-09 22:01 UTC (permalink / raw)
To: Xinming Hu, Kalle Valo
Cc: Kalle Valo, Linux Wireless, Dmitry Torokhov, rajatja@google.com,
Zhiyuan Yang, Tim Song, Cathy Luo, James Cao, Ganapathi Bhat,
Ellie Reeves, Christoph Hellwig
+ Christopher
Hi Simon and Kalle,
On Tue, Jan 09, 2018 at 11:42:21AM +0000, Xinming Hu wrote:
> Hi,
>
> > -----Original Message-----
> > From: Kalle Valo [mailto:kvalo@codeaurora.org]
> > Sent: 2018年1月9日 15:40
> > To: Brian Norris <briannorris@chromium.org>
> > Cc: Xinming Hu <huxm@marvell.com>; Linux Wireless
> > <linux-wireless@vger.kernel.org>; Dmitry Torokhov <dtor@google.com>;
> > rajatja@google.com; Zhiyuan Yang <yangzy@marvell.com>; Tim Song
> > <songtao@marvell.com>; Cathy Luo <cluo@marvell.com>; James Cao
> > <jcao@marvell.com>; Ganapathi Bhat <gbhat@marvell.com>; Ellie Reeves
> > <ellierevves@gmail.com>
> > Subject: [EXT] Re: [PATCH] mwifiex: cancel pcie/sdio work in remove/shutdown
> > handler
> >
> > External Email
> >
> > ----------------------------------------------------------------------
> > Brian Norris <briannorris@chromium.org> writes:
> >
> > >> --- a/drivers/net/wireless/marvell/mwifiex/pcie.c
> > >> +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
> > >> @@ -310,6 +310,8 @@ static void mwifiex_pcie_remove(struct pci_dev
> > *pdev)
> > >> mwifiex_init_shutdown_fw(priv, MWIFIEX_FUNC_SHUTDOWN);
> > >> }
> > >>
> > >> + cancel_work_sync(&card->work);
> > >> +
> > >
> > > Just FYI, this "fix" is not a real fix. It will likely paper over some
> > > of your bugs (where, e.g., the FW shutdown command times out in the
> > > previous couple of lines), but this highlights the fact that there are
> > > other races that could trigger the same behavior. You're not fixing
> > > those.
> > >
> > > For example, what if somebody initiates a scan or other nl80211
> > > command between the above line and mwifiex_remove_card()? That
> > command
> > > could potentially time out too.
> > >
>
> The hardware status have been reset before downloading the last
> command(FUNC SHUTDOWN), in this way, follow commands download will be
> ignored and warned.
Hmm, I suppose that's true. So the race I'm talking about probably can't
happen usually. What about in manufacturing mode or !FIRMWARE_READY_PCIE
though? Those cases don't shut down the firmware. Can we still have
outstanding timeouts in those cases?
Anyway, I still think there's a problem though, and this patch is just
going to make things worse. See below.
> > > The proper fix would be to institute some kind of mutual exclusion
> > > (locking) between mwifiex_shutdown_sw() and mwifiex_remove_card(), so
> > > that they can't occur at the same time.
> > >
>
> I am not sure whether there is any mutual exclusion protect between pcie_reset and pcie_remove in pcie core.
> But it looks a different race.
> We still need this fix, right?
Good point. Previously, there wasn't any such exclusion, and that's why
races like the above were even more likely. But as of 4.13, now there
*is* exclusion. See commit b014e96d1abb ("PCI: Protect
pci_error_handlers->reset_notify() usage with device_lock()"). That
incidentally means that you're creating a deadlock with this patch! [1]
If we start a timeout/reset sequence in mwifiex_init_shutdown_fw()
(called from remove()), then you'll eventually have pci_reset_function()
waiting on the device lock, but mwifiex_pcie_remove() will be holding
the device lock already, and now (with your patch), remove() will be
waiting on the worker thread to finish pci_reset_function()...deadlock!
I actually think that the above patch (adding device_lock()) resolves
most of the race but introduces a possible deadlock. I believe an easy
solution is just to switch to pci_try_reset_function() instead? That
will just abort a reset attempt if we're in the middle of removing the
device. Problem solved? Diff appended, but I'll send out a real version
if that looks right. Can you test your original problem with the above
commit from Christopher, as well as the appended diff?
> Regards,
> Simon
> > > Unfortunately, I only paid attention to this after Kalle already
> > > applied this patch. Personally, I'd prefer this patch not get applied,
> > > since it's a bad solution to an obvious problem, which instead leaves
> > > a subtle problem that perhaps no one will bother fixing.
> >
> > I can revert it, that's not a problem. Can I use the text below as explanation for
> > the revert?
> >
> > ----------------------------------------------------------------------
> > Brian Norris <briannorris@chromium.org> says:
> >
> > Just FYI, this "fix" is not a real fix. It will likely paper over some of your bugs
> > (where, e.g., the FW shutdown command times out in the previous couple of
> > lines), but this highlights the fact that there are other races that could trigger
> > the same behavior. You're not fixing those.
> >
> > For example, what if somebody initiates a scan or other nl80211 command
> > between the above line and mwifiex_remove_card()? That command could
> > potentially time out too.
> >
> > The proper fix would be to institute some kind of mutual exclusion
> > (locking) between mwifiex_shutdown_sw() and mwifiex_remove_card(), so that
> > they can't occur at the same time.
> >
> > ----------------------------------------------------------------------
Hi Kalle, thanks for the above. With the new information from above, I
think that there's a more accurate description, like the following:
---
Brian Norris <briannorris@chromium.org> says:
The "fix" in question might not actually fix all related problems,
and it also looks like it can cause a deadlock. Since commit
b014e96d1abb ("PCI: Protect pci_error_handlers->reset_notify() usage
with device_lock()"), the race in question is actually resolved (PCIe
reset cannot happen at the same time as remove()). Instead, this "fix"
just introduces a deadlock where mwifiex_pcie_card_reset_work() is
waiting on device_lock, which is held by PCIe device remove(), which is
waiting on...mwifiex_pcie_card_reset_work().
The proper thing to do is just to fix the deadlock. Patch for this will
come separately.
---
Brian
[1] Technically, the deadlock is already there, since
mwifiex_remove_card() eventually calls pcie.c's ->cleanup_if(), which
also calls cancel_work_sync(). But your patch doesn't help...
Untested diff:
diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c
index cd314946452c..5da3d6ccf5f2 100644
--- a/drivers/net/wireless/marvell/mwifiex/pcie.c
+++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
@@ -2781,7 +2781,7 @@ static void mwifiex_pcie_card_reset_work(struct mwifiex_adapter *adapter)
{
struct pcie_service_card *card = adapter->card;
- pci_reset_function(card->dev);
+ pci_try_reset_function(card->dev);
}
static void mwifiex_pcie_work(struct work_struct *work)
^ permalink raw reply related [flat|nested] 2+ messages in thread
end of thread, other threads:[~2018-01-09 22:01 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-01-09 11:42 Re: [PATCH] mwifiex: cancel pcie/sdio work in remove/shutdown handler Xinming Hu
2018-01-09 22:01 ` Brian Norris
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).