* [PATCH] wifi: wilc1000: fix kernel oops during interface down during background scan
@ 2023-04-04 1:20 Ajay.Kathat
2023-04-05 11:40 ` Michael Walle
2023-05-05 15:47 ` Kalle Valo
0 siblings, 2 replies; 7+ messages in thread
From: Ajay.Kathat @ 2023-04-04 1:20 UTC (permalink / raw)
To: linux-wireless; +Cc: Claudiu.Beznea, Sripad.Balwadgi, Ajay.Kathat, mwalle
Fix for a kernel crash observed with the following test procedure [1]:
while true;
do ifconfig wlan0 up;
iw dev wlan0 scan &
ifconfig wlan0 down;
done
During the above test procedure, scan results received from the firmware
for the 'iw scan' command get queued even while the interface is going down.
This caused a kernel oops when the freed pointers were dereferenced.
For synchronization, 'mac_close()' calls flush_workqueue() to block its
execution until all pending work is completed. Afterwards, the 'wilc->close'
flag, which is set before the flush_workqueue() call, should prevent new work
from being added. Added a 'wilc->close' check in wilc_handle_isr(), which is
common to the SPI and SDIO buses, to ignore firmware interrupts that would in
turn add work while the interface is being closed.
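As an illustration of the ordering described above, here is a user-space
sketch only. It assumes a single-threaded model in which an array stands in
for the workqueue; fake_dev, fake_handle_isr, fake_flush and fake_mac_close
are hypothetical names that do not exist in the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAP 8

/* Hypothetical stand-in for the driver state; not the real struct wilc. */
struct fake_dev {
	bool close;             /* set once teardown has started */
	int pending[QUEUE_CAP]; /* queued work items (e.g. scan result ids) */
	size_t npending;        /* number of items currently queued */
	int handled;            /* number of work items executed so far */
};

/* Models the check added to wilc_handle_isr(): drop events once closing. */
static bool fake_handle_isr(struct fake_dev *dev, int event)
{
	if (dev->close)
		return false;   /* ignored: interface is going down */
	if (dev->npending == QUEUE_CAP)
		return false;   /* queue full, drop the event */
	dev->pending[dev->npending++] = event;
	return true;
}

/* Models flush_workqueue(): run everything that was queued earlier. */
static void fake_flush(struct fake_dev *dev)
{
	for (size_t i = 0; i < dev->npending; i++)
		dev->handled++;
	dev->npending = 0;
}

/* Models the close path: set the flag first, then flush, then tear down. */
static void fake_mac_close(struct fake_dev *dev)
{
	dev->close = true;          /* no new work is queued after this */
	fake_flush(dev);            /* drain work queued before the flag */
	assert(dev->npending == 0); /* nothing left that could touch freed state */
}
```

The point of the sketch is only the order of operations: because the flag is
raised before the flush, anything that arrives after the flush is rejected
instead of being queued against state that is about to be freed.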
1. https://lore.kernel.org/linux-wireless/20221024135407.7udo3dwl3mqyv2yj@0002.3ffe.de/
Reported-by: Michael Walle <mwalle@kernel.org>
Signed-off-by: Ajay Singh <ajay.kathat@microchip.com>
---
drivers/net/wireless/microchip/wilc1000/netdev.c | 9 +++------
drivers/net/wireless/microchip/wilc1000/wlan.c | 3 +++
2 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/drivers/net/wireless/microchip/wilc1000/netdev.c b/drivers/net/wireless/microchip/wilc1000/netdev.c
index e9f59de31b0b..40edee10a81f 100644
--- a/drivers/net/wireless/microchip/wilc1000/netdev.c
+++ b/drivers/net/wireless/microchip/wilc1000/netdev.c
@@ -38,11 +38,6 @@ static irqreturn_t isr_bh_routine(int irq, void *userdata)
 {
 	struct wilc *wilc = userdata;
 
-	if (wilc->close) {
-		pr_err("Can't handle BH interrupt\n");
-		return IRQ_HANDLED;
-	}
-
 	wilc_handle_isr(wilc);
 
 	return IRQ_HANDLED;
@@ -781,13 +776,15 @@ static int wilc_mac_close(struct net_device *ndev)
 	if (vif->ndev) {
 		netif_stop_queue(vif->ndev);
 
+		if (wl->open_ifcs == 0)
+			wl->close = 1;
+
 		wilc_handle_disconnect(vif);
 		wilc_deinit_host_int(vif->ndev);
 	}
 
 	if (wl->open_ifcs == 0) {
 		netdev_dbg(ndev, "Deinitializing wilc1000\n");
-		wl->close = 1;
 		wilc_wlan_deinitialize(ndev);
 	}
 
diff --git a/drivers/net/wireless/microchip/wilc1000/wlan.c b/drivers/net/wireless/microchip/wilc1000/wlan.c
index 58bbf50081e4..700cb657be00 100644
--- a/drivers/net/wireless/microchip/wilc1000/wlan.c
+++ b/drivers/net/wireless/microchip/wilc1000/wlan.c
@@ -1066,6 +1066,9 @@ void wilc_handle_isr(struct wilc *wilc)
 {
 	u32 int_status;
 
+	if (wilc->close)
+		return;
+
 	acquire_bus(wilc, WILC_BUS_ACQUIRE_AND_WAKEUP);
 	wilc->hif_func->hif_read_int(wilc, &int_status);
 
--
2.34.1
* Re: [PATCH] wifi: wilc1000: fix kernel oops during interface down during background scan
2023-04-04 1:20 [PATCH] wifi: wilc1000: fix kernel oops during interface down during background scan Ajay.Kathat
@ 2023-04-05 11:40 ` Michael Walle
2023-04-11 11:24 ` Johannes Berg
2023-04-12 0:04 ` Ajay.Kathat
2023-05-05 15:47 ` Kalle Valo
1 sibling, 2 replies; 7+ messages in thread
From: Michael Walle @ 2023-04-05 11:40 UTC (permalink / raw)
To: Ajay.Kathat, Johannes Berg, Kalle Valo
Cc: linux-wireless, Claudiu.Beznea, Sripad.Balwadgi, mwalle
Hi,
[+ wireless and cfg80211 maintainers because I'm not familiar with
cfg80211 ]
> Fix for kernel crash observed with following test procedure [1]:
> while true;
> do ifconfig wlan0 up;
> iw dev wlan0 scan &
> ifconfig wlan0 down;
> done
>
> During the above test procedure, the scan results are received from
> firmware
> for 'iw scan' command gets queued even when the interface is going
> down. It
> was causing the kernel oops when dereferencing the freed pointers.
>
> For synchronization, 'mac_close()' calls flush_workqueue() to block its
> execution till all pending work is completed. Afterwards 'wilc->close'
> flag
> which is set before the flush_workqueue() should avoid adding new work.
> Added 'wilc->close' check in wilc_handle_isr() which is common for
> SPI/SDIO bus to ignore the interrupts from firmware that inturns adds
> the
> work since the interface is getting closed.
With this patch I'm now getting
  wilc1000_sdio mmc0:0001:1 wlan0: Failed to send setup multicast
when closing the interface.
>
> 1.
> https://lore.kernel.org/linux-wireless/20221024135407.7udo3dwl3mqyv2yj@0002.3ffe.de/
should be Link:
> Reported-by: Michael Walle <mwalle@kernel.org>
> Signed-off-by: Ajay Singh <ajay.kathat@microchip.com>
Missing Fixes: tag. In this regard, most of the previous wilc fix patches
miss a proper Fixes tag, which makes the wilc1000 pretty unusable on stable
kernels IMHO :/
> ---
> drivers/net/wireless/microchip/wilc1000/netdev.c | 9 +++------
> drivers/net/wireless/microchip/wilc1000/wlan.c | 3 +++
> 2 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/net/wireless/microchip/wilc1000/netdev.c
> b/drivers/net/wireless/microchip/wilc1000/netdev.c
> index e9f59de31b0b..40edee10a81f 100644
> --- a/drivers/net/wireless/microchip/wilc1000/netdev.c
> +++ b/drivers/net/wireless/microchip/wilc1000/netdev.c
> @@ -38,11 +38,6 @@ static irqreturn_t isr_bh_routine(int irq, void
> *userdata)
> {
> struct wilc *wilc = userdata;
>
> - if (wilc->close) {
> - pr_err("Can't handle BH interrupt\n");
> - return IRQ_HANDLED;
> - }
> -
This check is still in the top half of the interrupt processing.
Shouldn't it be removed there, too? That way you can get rid of
the top half entirely and just let the irq subsys use the default
top half implementation.
> wilc_handle_isr(wilc);
>
> return IRQ_HANDLED;
> @@ -781,13 +776,15 @@ static int wilc_mac_close(struct net_device
> *ndev)
> if (vif->ndev) {
> netif_stop_queue(vif->ndev);
>
> + if (wl->open_ifcs == 0)
> + wl->close = 1;
Ignoring the fact that this isn't protected somehow and that
there is no write barrier (maybe I'm overthinking this and
it isn't really needed for an 'int' field), this and your
reasoning with the flush_workqueue() sounds legit.
But I'm still not convinced a lock is not required.
wilc_user_scan_req::scan_result is at least updated in
wilc_disconnect() and wilc_deinit().
wilc_disconnect() is called from the cfg80211_ops::disconnect
callback. wilc_deinit() is called from net_device_ops::ndo_stop.
Is there any lock which prevents both functions from being called in
parallel? wl->close is checked in the .disconnect op, but as
mentioned above, it is not protected by any lock.
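To illustrate the concern: one possible way to make the "take the callback
and clear it" step safe without a common lock is an atomic exchange, so that
only one of the two paths ever sees a non-NULL pointer. The following is a
user-space C11 sketch under that assumption. It is not the driver's code and
not necessarily the fix the driver should use; fake_scan_req, fake_abort_scan
and count_abort are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical callback type; the real scan_result signature differs. */
typedef void (*scan_cb_t)(int *ctx);

/* Hypothetical stand-in for the scan request state shared by both paths. */
struct fake_scan_req {
	_Atomic(scan_cb_t) scan_result; /* NULL when no scan is pending */
	int ctx;                        /* counts how often the callback ran */
};

/* Example callback: records that the scan was reported as aborted. */
static void count_abort(int *ctx)
{
	(*ctx)++;
}

/*
 * Called from both the disconnect path and the deinit path in this model.
 * The exchange hands the pointer to exactly one caller; everyone else
 * gets NULL and returns without touching the callback.
 */
static int fake_abort_scan(struct fake_scan_req *req)
{
	scan_cb_t cb;

	assert(req != NULL);
	cb = atomic_exchange(&req->scan_result, (scan_cb_t)NULL);
	if (!cb)
		return 0; /* another path already reported the abort */
	cb(&req->ctx);
	return 1;
}
```

This only covers the handoff of the single pointer. Any other scan state
that both paths modify would still need a lock or similar protection.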
-michael
> +
> wilc_handle_disconnect(vif);
> wilc_deinit_host_int(vif->ndev);
> }
>
> if (wl->open_ifcs == 0) {
> netdev_dbg(ndev, "Deinitializing wilc1000\n");
> - wl->close = 1;
> wilc_wlan_deinitialize(ndev);
> }
>
> diff --git a/drivers/net/wireless/microchip/wilc1000/wlan.c
> b/drivers/net/wireless/microchip/wilc1000/wlan.c
> index 58bbf50081e4..700cb657be00 100644
> --- a/drivers/net/wireless/microchip/wilc1000/wlan.c
> +++ b/drivers/net/wireless/microchip/wilc1000/wlan.c
> @@ -1066,6 +1066,9 @@ void wilc_handle_isr(struct wilc *wilc)
> {
> u32 int_status;
>
> + if (wilc->close)
> + return;
> +
> acquire_bus(wilc, WILC_BUS_ACQUIRE_AND_WAKEUP);
> wilc->hif_func->hif_read_int(wilc, &int_status);
>
> --
> 2.34.1
* Re: [PATCH] wifi: wilc1000: fix kernel oops during interface down during background scan
2023-04-05 11:40 ` Michael Walle
@ 2023-04-11 11:24 ` Johannes Berg
2023-04-12 0:04 ` Ajay.Kathat
1 sibling, 0 replies; 7+ messages in thread
From: Johannes Berg @ 2023-04-11 11:24 UTC (permalink / raw)
To: Michael Walle, Ajay.Kathat, Kalle Valo
Cc: linux-wireless, Claudiu.Beznea, Sripad.Balwadgi, mwalle
On Wed, 2023-04-05 at 13:40 +0200, Michael Walle wrote:
>
> wilc_disconnect() is called from the cfg80211_ops::disconnect
> callback. wilc_deinit() is called from net_device_ops::ndo_stop.
> Is there any lock which prevents both functions be called in
> parallel?
I don't _think_ there's any common lock, ndo_stop() holds the RTNL, but
cfg80211 for a normal nl80211 disconnect command will only briefly hold
the RTNL and drop it again before calling into the driver.
The internal flags here don't indicate requiring RTNL and that wouldn't
make much sense either:
	{
		.cmd = NL80211_CMD_DISCONNECT,
		.validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
		.doit = nl80211_disconnect,
		.flags = GENL_UNS_ADMIN_PERM,
		.internal_flags = IFLAGS(NL80211_FLAG_NEED_NETDEV_UP),
	},
See commit a05829a7222e ("cfg80211: avoid holding the RTNL when calling
the driver").
johannes
* Re: [PATCH] wifi: wilc1000: fix kernel oops during interface down during background scan
2023-04-05 11:40 ` Michael Walle
2023-04-11 11:24 ` Johannes Berg
@ 2023-04-12 0:04 ` Ajay.Kathat
1 sibling, 0 replies; 7+ messages in thread
From: Ajay.Kathat @ 2023-04-12 0:04 UTC (permalink / raw)
To: michael, johannes, kvalo
Cc: linux-wireless, Claudiu.Beznea, Sripad.Balwadgi, mwalle
Hi Michael,
On 4/5/23 04:40, Michael Walle wrote:
> Hi,
>
> [+ wireless and cfg80211 maintainers because I'm not familiar with
> cfg80211 ]
>
>> Fix for kernel crash observed with following test procedure [1]:
>> while true;
>> do ifconfig wlan0 up;
>> iw dev wlan0 scan &
>> ifconfig wlan0 down;
>> done
>>
>> During the above test procedure, the scan results are received from
>> firmware
>> for 'iw scan' command gets queued even when the interface is going
>> down. It
>> was causing the kernel oops when dereferencing the freed pointers.
>>
>> For synchronization, 'mac_close()' calls flush_workqueue() to block its
>> execution till all pending work is completed. Afterwards 'wilc->close'
>> flag
>> which is set before the flush_workqueue() should avoid adding new work.
>> Added 'wilc->close' check in wilc_handle_isr() which is common for
>> SPI/SDIO bus to ignore the interrupts from firmware that inturns adds
>> the
>> work since the interface is getting closed.
>
> With this patch I'm now getting
> wilc1000_sdio mmc0:0001:1 wlan0: Failed to send setup multicast
>
> when you close the interface.
>
This is a false alarm. I will modify the patch to suppress this debug
message while mac_close() is in progress.
>>
>> 1.
>> https://lore.kernel.org/linux-wireless/20221024135407.7udo3dwl3mqyv2yj@0002.3ffe.de/
>
> should be Link:
Okay
>
>> Reported-by: Michael Walle <mwalle@kernel.org>
>> Signed-off-by: Ajay Singh <ajay.kathat@microchip.com>
>
> Missing Fixes: tag. In this regard, most of the previous wilc fixes
> patches
> miss a proper Fixes tag which makes the wilc1000 pretty unusable on
> stable
> kernels IMHO :/
>
Sure. I will include the Fixes tag in the updated version.
>> ---
>> drivers/net/wireless/microchip/wilc1000/netdev.c | 9 +++------
>> drivers/net/wireless/microchip/wilc1000/wlan.c | 3 +++
>> 2 files changed, 6 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/net/wireless/microchip/wilc1000/netdev.c
>> b/drivers/net/wireless/microchip/wilc1000/netdev.c
>> index e9f59de31b0b..40edee10a81f 100644
>> --- a/drivers/net/wireless/microchip/wilc1000/netdev.c
>> +++ b/drivers/net/wireless/microchip/wilc1000/netdev.c
>> @@ -38,11 +38,6 @@ static irqreturn_t isr_bh_routine(int irq, void
>> *userdata)
>> {
>> struct wilc *wilc = userdata;
>>
>> - if (wilc->close) {
>> - pr_err("Can't handle BH interrupt\n");
>> - return IRQ_HANDLED;
>> - }
>> -
>
> This check is still in the top half of the interrupt processing.
> Shouldn't it be removed there, too? That way you can get rid of
> the top half entirely and just let the irq subsys use the default
> top half implementation.
>
Yeah, it makes sense. I will include this change in the updated version.
>> wilc_handle_isr(wilc);
>>
>> return IRQ_HANDLED;
>> @@ -781,13 +776,15 @@ static int wilc_mac_close(struct net_device
>> *ndev)
>> if (vif->ndev) {
>> netif_stop_queue(vif->ndev);
>>
>> + if (wl->open_ifcs == 0)
>> + wl->close = 1;
>
> Ignoring the fact that this isn't protected somehow and that
> there is no write barrier (maybe I'm overthinking this and
> it isn't really needed for an 'int' field), this and your
> reasoning with the flush_workqueue() sounds legit.
>
> But I'm still not convinced a lock is not required.
> wilc_user_scan_req::scan_result is at least updated in
> wilc_disconnect() and wilc_deinit().
>
> wilc_disconnect() is called from the cfg80211_ops::disconnect
> callback. wilc_deinit() is called from net_device_ops::ndo_stop.
> Is there any lock which prevents both functions be called in
> parallel? wl->close is checked in the .disconnect op, but as
> mentioned above, it is not protected by any lock.
Sure, I will prepare a separate patch to handle this.
>
> -michael
>
>> +
>> wilc_handle_disconnect(vif);
>> wilc_deinit_host_int(vif->ndev);
>> }
>>
>> if (wl->open_ifcs == 0) {
>> netdev_dbg(ndev, "Deinitializing wilc1000\n");
>> - wl->close = 1;
>> wilc_wlan_deinitialize(ndev);
>> }
>>
>> diff --git a/drivers/net/wireless/microchip/wilc1000/wlan.c
>> b/drivers/net/wireless/microchip/wilc1000/wlan.c
>> index 58bbf50081e4..700cb657be00 100644
>> --- a/drivers/net/wireless/microchip/wilc1000/wlan.c
>> +++ b/drivers/net/wireless/microchip/wilc1000/wlan.c
>> @@ -1066,6 +1066,9 @@ void wilc_handle_isr(struct wilc *wilc)
>> {
>> u32 int_status;
>>
>> + if (wilc->close)
>> + return;
>> +
>> acquire_bus(wilc, WILC_BUS_ACQUIRE_AND_WAKEUP);
>> wilc->hif_func->hif_read_int(wilc, &int_status);
>>
>> --
>> 2.34.1
* Re: [PATCH] wifi: wilc1000: fix kernel oops during interface down during background scan
2023-04-04 1:20 [PATCH] wifi: wilc1000: fix kernel oops during interface down during background scan Ajay.Kathat
2023-04-05 11:40 ` Michael Walle
@ 2023-05-05 15:47 ` Kalle Valo
2023-05-05 20:53 ` Ajay.Kathat
1 sibling, 1 reply; 7+ messages in thread
From: Kalle Valo @ 2023-05-05 15:47 UTC (permalink / raw)
To: Ajay.Kathat; +Cc: linux-wireless, Claudiu.Beznea, Sripad.Balwadgi, mwalle
<Ajay.Kathat@microchip.com> writes:
> Fix for kernel crash observed with following test procedure [1]:
> while true;
> do ifconfig wlan0 up;
> iw dev wlan0 scan &
> ifconfig wlan0 down;
> done
>
> During the above test procedure, the scan results are received from firmware
> for 'iw scan' command gets queued even when the interface is going down. It
> was causing the kernel oops when dereferencing the freed pointers.
>
> For synchronization, 'mac_close()' calls flush_workqueue() to block its
> execution till all pending work is completed. Afterwards 'wilc->close' flag
> which is set before the flush_workqueue() should avoid adding new work.
> Added 'wilc->close' check in wilc_handle_isr() which is common for
> SPI/SDIO bus to ignore the interrupts from firmware that inturns adds the
> work since the interface is getting closed.
>
> 1. https://lore.kernel.org/linux-wireless/20221024135407.7udo3dwl3mqyv2yj@0002.3ffe.de/
>
> Reported-by: Michael Walle <mwalle@kernel.org>
> Signed-off-by: Ajay Singh <ajay.kathat@microchip.com>
[...]
> @@ -781,13 +776,15 @@ static int wilc_mac_close(struct net_device *ndev)
> if (vif->ndev) {
> netif_stop_queue(vif->ndev);
>
> + if (wl->open_ifcs == 0)
> + wl->close = 1;
> +
wl->close is an int; I wonder if it's racy to use an int as a flag like this?
In cases like this I usually use set_bit() & co because those guarantee
atomicity, though I don't know if that's overkill.
--
https://patchwork.kernel.org/project/linux-wireless/list/
https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
* Re: [PATCH] wifi: wilc1000: fix kernel oops during interface down during background scan
2023-05-05 15:47 ` Kalle Valo
@ 2023-05-05 20:53 ` Ajay.Kathat
2023-05-06 5:50 ` Kalle Valo
0 siblings, 1 reply; 7+ messages in thread
From: Ajay.Kathat @ 2023-05-05 20:53 UTC (permalink / raw)
To: kvalo; +Cc: linux-wireless, Claudiu.Beznea, Sripad.Balwadgi, mwalle
Hi Kalle,
On 5/5/23 08:47, Kalle Valo wrote:
> <Ajay.Kathat@microchip.com> writes:
>
>> Fix for kernel crash observed with following test procedure [1]:
>> while true;
>> do ifconfig wlan0 up;
>> iw dev wlan0 scan &
>> ifconfig wlan0 down;
>> done
>>
>> During the above test procedure, the scan results are received from firmware
>> for 'iw scan' command gets queued even when the interface is going down. It
>> was causing the kernel oops when dereferencing the freed pointers.
>>
>> For synchronization, 'mac_close()' calls flush_workqueue() to block its
>> execution till all pending work is completed. Afterwards 'wilc->close' flag
>> which is set before the flush_workqueue() should avoid adding new work.
>> Added 'wilc->close' check in wilc_handle_isr() which is common for
>> SPI/SDIO bus to ignore the interrupts from firmware that inturns adds the
>> work since the interface is getting closed.
>>
>> 1. https://lore.kernel.org/linux-wireless/20221024135407.7udo3dwl3mqyv2yj@0002.3ffe.de/
>>
>> Reported-by: Michael Walle <mwalle@kernel.org>
>> Signed-off-by: Ajay Singh <ajay.kathat@microchip.com>
>
> [...]
>
>> @@ -781,13 +776,15 @@ static int wilc_mac_close(struct net_device *ndev)
>> if (vif->ndev) {
>> netif_stop_queue(vif->ndev);
>>
>> + if (wl->open_ifcs == 0)
>> + wl->close = 1;
>> +
>
> wl-close is an int, I wonder if it's racy to int as a flag like this? In
> cases like this I usually use set_bit() & co because those guarantee
> atomicity, though don't know if that's overkill.
>
I think it's a good idea to use an atomic operation but I am not sure if
using atomic for 'wl->close' will have much impact. For instance, if any
new work gets added to the workqueue before the 'wl->close=1' is fully
completed, then that work would get executed as normal.
However, I feel it's safe to define 'wl->close' as atomic_t type. I will
prepare the conversion patch and will try to include it along with the
updated version of this patch.
Regards,
Ajay
* Re: [PATCH] wifi: wilc1000: fix kernel oops during interface down during background scan
2023-05-05 20:53 ` Ajay.Kathat
@ 2023-05-06 5:50 ` Kalle Valo
0 siblings, 0 replies; 7+ messages in thread
From: Kalle Valo @ 2023-05-06 5:50 UTC (permalink / raw)
To: Ajay.Kathat; +Cc: linux-wireless, Claudiu.Beznea, Sripad.Balwadgi, mwalle
<Ajay.Kathat@microchip.com> writes:
>>> @@ -781,13 +776,15 @@ static int wilc_mac_close(struct net_device *ndev)
>>> if (vif->ndev) {
>>> netif_stop_queue(vif->ndev);
>>>
>>> + if (wl->open_ifcs == 0)
>>> + wl->close = 1;
>>> +
>>
>> wl-close is an int, I wonder if it's racy to int as a flag like this? In
>> cases like this I usually use set_bit() & co because those guarantee
>> atomicity, though don't know if that's overkill.
>>
>
> I think it's a good idea to use an atomic operation but I am not sure if
> using atomic for 'wl->close' will have much impact. For instance, if any
> new work gets added to the workqueue before the 'wl->close=1' is fully
> completed, then that work would get executed as normal.
Sure, this is most likely a small race condition. But still a race.
> However, I feel it's safe to define 'wl->close' as atomic_t type. I will
> prepare the conversion patch and will try to include it along with the
> updated version of this patch.
Why atomic_t? You only use values 0 and 1, so test_bit() and set_bit()
sound more appropriate to me.
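For illustration, a user-space sketch of the bit-flag pattern meant here.
It assumes C11 atomics as a rough stand-in for the kernel's set_bit(),
clear_bit() and test_bit(); the memory-ordering details are not identical to
the kernel helpers. FAKE_FLAG_CLOSE, fake_flags and the fake_*_bit functions
are hypothetical names, not part of the driver.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag index; the driver defines no such enum in this thread. */
enum fake_flag { FAKE_FLAG_CLOSE = 0 };

/* Hypothetical flags word, playing the role of an unsigned long bitmap. */
struct fake_flags {
	atomic_ulong bits;
};

/* Rough analogue of set_bit(): atomically OR the bit into the word. */
static void fake_set_bit(int nr, struct fake_flags *f)
{
	assert(nr >= 0 && nr < (int)(8 * sizeof(unsigned long)));
	atomic_fetch_or(&f->bits, 1UL << nr);
}

/* Rough analogue of clear_bit(): atomically mask the bit out. */
static void fake_clear_bit(int nr, struct fake_flags *f)
{
	assert(nr >= 0 && nr < (int)(8 * sizeof(unsigned long)));
	atomic_fetch_and(&f->bits, ~(1UL << nr));
}

/* Rough analogue of test_bit(): atomically read the word, test one bit. */
static bool fake_test_bit(int nr, struct fake_flags *f)
{
	assert(nr >= 0 && nr < (int)(8 * sizeof(unsigned long)));
	return (atomic_load(&f->bits) >> nr) & 1UL;
}
```

With a layout like this, further single-bit state can share the same word
later without adding another field.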
--
https://patchwork.kernel.org/project/linux-wireless/list/
https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
Thread overview: 7+ messages
2023-04-04 1:20 [PATCH] wifi: wilc1000: fix kernel oops during interface down during background scan Ajay.Kathat
2023-04-05 11:40 ` Michael Walle
2023-04-11 11:24 ` Johannes Berg
2023-04-12 0:04 ` Ajay.Kathat
2023-05-05 15:47 ` Kalle Valo
2023-05-05 20:53 ` Ajay.Kathat
2023-05-06 5:50 ` Kalle Valo