linux-wireless.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mwifiex: fix command timeout with SDIO interrupts enabled
@ 2013-09-16 11:39 Daniel Mack
  2013-09-16 11:39 ` [PATCH 1/1] mwifiex: queue main work from main process when bailing on races Daniel Mack
  0 siblings, 1 reply; 5+ messages in thread
From: Daniel Mack @ 2013-09-16 11:39 UTC (permalink / raw)
  To: linux-wireless
  Cc: s.neumann, afenkart, bzhao, linville, johannes.berg, Daniel Mack

We've been hunting a command timeout issue in the mwifiex driver which
occurs on an AM33xx platform when Andreas Fenkart's omap_hsmmc SDIO IRQ
patches are applied (they are not yet in the mainline kernel). We first
suspected the mmc host driver to be racy, but that wasn't the case.

I dug a little through the changes between 3.7 and 3.10 on the mwifiex
driver, and it turned out that the culprit is 601216e12 ("mwifiex:
process RX packets in SDIO IRQ thread directly"), especially the
situation where mwifiex_main_process() bails when mwifiex_processing is
set.

In other words: if an SDIO irq arrives while the driver is processing
mwifiex_main_process() from the work queue, we effectively loose the
interrupt, which then results in a command timeout.

I've written a little test case scenario that calls
mwifiex_dump_station_info() through the SIOCGIWSTATS ioctl excessively,
and that reproduces the problem after a couple of seconds already:

  https://gist.github.com/zonque/6579314


The fix for this issue is quite simple and works very well.


Thanks,
Daniel


Daniel Mack (1):
  mwifiex: queue main work from main process when bailing on races

 drivers/net/wireless/mwifiex/main.c | 1 +
 1 file changed, 1 insertion(+)

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 1/1] mwifiex: queue main work from main process when bailing on races
  2013-09-16 11:39 [PATCH] mwifiex: fix command timeout with SDIO interrupts enabled Daniel Mack
@ 2013-09-16 11:39 ` Daniel Mack
  2013-09-16 21:14   ` Bing Zhao
  0 siblings, 1 reply; 5+ messages in thread
From: Daniel Mack @ 2013-09-16 11:39 UTC (permalink / raw)
  To: linux-wireless
  Cc: s.neumann, afenkart, bzhao, linville, johannes.berg, Daniel Mack,
	stable

Queue main_work in case mwifiex_main_process() bails due to an already
processed transaction. This is particularly necessary because
mwifiex_main_process() is called from both the SDIO interrupt handler
and the workqueue. In case an interrupt occurs while the main process
is currently executed from the workqueue, the interrupt is lost,
resulting in a command timeout and consequently a card reset.

I'm marking this for stable kernel in version 3.7+, because on our
platform, the issue appears since 601216e12c ("mwifiex: process RX
packets in SDIO IRQ thread directly") went in.

Signed-off-by: Daniel Mack <zonque@gmail.com>
Reported-by: Sven Neumann <s.neumann@raumfeld.com>
Reported-by: Andreas Fenkart <andreas.fenkart@streamunlimited.com>
Cc: Bing Zhao <bzhao@marvell.com>
Cc: <stable@vger.kernel.org> [v3.7+]
---
 drivers/net/wireless/mwifiex/main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/wireless/mwifiex/main.c b/drivers/net/wireless/mwifiex/main.c
index ff4ed96..0700bc2 100644
--- a/drivers/net/wireless/mwifiex/main.c
+++ b/drivers/net/wireless/mwifiex/main.c
@@ -235,6 +235,7 @@ int mwifiex_main_process(struct mwifiex_adapter *adapter)
 	/* Check if already processing */
 	if (adapter->mwifiex_processing) {
 		spin_unlock_irqrestore(&adapter->main_proc_lock, flags);
+		queue_work(adapter->workqueue, &adapter->main_work);
 		goto exit_main_proc;
 	} else {
 		adapter->mwifiex_processing = true;
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* RE: [PATCH 1/1] mwifiex: queue main work from main process when bailing on races
  2013-09-16 11:39 ` [PATCH 1/1] mwifiex: queue main work from main process when bailing on races Daniel Mack
@ 2013-09-16 21:14   ` Bing Zhao
  2013-09-16 23:00     ` Daniel Mack
  0 siblings, 1 reply; 5+ messages in thread
From: Bing Zhao @ 2013-09-16 21:14 UTC (permalink / raw)
  To: Daniel Mack, linux-wireless@vger.kernel.org
  Cc: s.neumann@raumfeld.com, afenkart@gmail.com,
	linville@tuxdriver.com, johannes.berg@intel.com,
	stable@vger.kernel.org

Hi Daniel,

Thanks for the patch.

> Queue main_work in case mwifiex_main_process() bails due to an already
> processed transaction. This is particularly necessary because
> mwifiex_main_process() is called from both the SDIO interrupt handler
> and the workqueue. In case an interrupt occurs while the main process
> is currently executed from the workqueue, the interrupt is lost,
> resulting in a command timeout and consequently a card reset.
> 
> I'm marking this for stable kernel in version 3.7+, because on our
> platform, the issue appears since 601216e12c ("mwifiex: process RX
> packets in SDIO IRQ thread directly") went in.
> 
> Signed-off-by: Daniel Mack <zonque@gmail.com>
> Reported-by: Sven Neumann <s.neumann@raumfeld.com>
> Reported-by: Andreas Fenkart <andreas.fenkart@streamunlimited.com>
> Cc: Bing Zhao <bzhao@marvell.com>
> Cc: <stable@vger.kernel.org> [v3.7+]
> ---
>  drivers/net/wireless/mwifiex/main.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/net/wireless/mwifiex/main.c b/drivers/net/wireless/mwifiex/main.c
> index ff4ed96..0700bc2 100644
> --- a/drivers/net/wireless/mwifiex/main.c
> +++ b/drivers/net/wireless/mwifiex/main.c
> @@ -235,6 +235,7 @@ int mwifiex_main_process(struct mwifiex_adapter *adapter)
>  	/* Check if already processing */
>  	if (adapter->mwifiex_processing) {
>  		spin_unlock_irqrestore(&adapter->main_proc_lock, flags);
> +		queue_work(adapter->workqueue, &adapter->main_work);

This is specific to SDIO interface, so we'd add a check here.

+               if (adapter->iface_type == MWIFIEX_SDIO)
+                       queue_work(adapter->workqueue, &adapter->main_work);

Thanks,
Bing

>  		goto exit_main_proc;
>  	} else {
>  		adapter->mwifiex_processing = true;
> --
> 1.8.3.1


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/1] mwifiex: queue main work from main process when bailing on races
  2013-09-16 21:14   ` Bing Zhao
@ 2013-09-16 23:00     ` Daniel Mack
  2013-09-17 18:19       ` Bing Zhao
  0 siblings, 1 reply; 5+ messages in thread
From: Daniel Mack @ 2013-09-16 23:00 UTC (permalink / raw)
  To: Bing Zhao
  Cc: linux-wireless@vger.kernel.org, s.neumann@raumfeld.com,
	afenkart@gmail.com, linville@tuxdriver.com,
	johannes.berg@intel.com, stable@vger.kernel.org

Hi Bing,

On 16.09.2013 23:14, Bing Zhao wrote:
>> diff --git a/drivers/net/wireless/mwifiex/main.c b/drivers/net/wireless/mwifiex/main.c
>> index ff4ed96..0700bc2 100644
>> --- a/drivers/net/wireless/mwifiex/main.c
>> +++ b/drivers/net/wireless/mwifiex/main.c
>> @@ -235,6 +235,7 @@ int mwifiex_main_process(struct mwifiex_adapter *adapter)
>>  	/* Check if already processing */
>>  	if (adapter->mwifiex_processing) {
>>  		spin_unlock_irqrestore(&adapter->main_proc_lock, flags);
>> +		queue_work(adapter->workqueue, &adapter->main_work);
> 
> This is specific to SDIO interface,

Is it really? By checking adapter->mwifiex_processing, the driver seems
to expect mwifiex_main_process() to be called from multiple execution
paths, and in that case, we will always loose one execution cycle in
case we bail early. I actually wonder why this didn't hit us earlier,
but I might miss a detail.

OTOH, the worst thing that can happen if the function is executed too
often is that it exits early and does nothing.

> +               if (adapter->iface_type == MWIFIEX_SDIO)
> +                       queue_work(adapter->workqueue, &adapter->main_work);

I can of course add this, but I don't fully understand why the driver
takes care of concurrently running executing paths and then just bails
silently in case a race is detected.


Best regards,
Daniel


^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: [PATCH 1/1] mwifiex: queue main work from main process when bailing on races
  2013-09-16 23:00     ` Daniel Mack
@ 2013-09-17 18:19       ` Bing Zhao
  0 siblings, 0 replies; 5+ messages in thread
From: Bing Zhao @ 2013-09-17 18:19 UTC (permalink / raw)
  To: Daniel Mack
  Cc: linux-wireless@vger.kernel.org, s.neumann@raumfeld.com,
	afenkart@gmail.com, linville@tuxdriver.com,
	johannes.berg@intel.com, stable@vger.kernel.org

Hi Daniel,

> >>  	/* Check if already processing */
> >>  	if (adapter->mwifiex_processing) {
> >>  		spin_unlock_irqrestore(&adapter->main_proc_lock, flags);
> >> +		queue_work(adapter->workqueue, &adapter->main_work);
> >
> > This is specific to SDIO interface,
> 
> Is it really? By checking adapter->mwifiex_processing, the driver seems
> to expect mwifiex_main_process() to be called from multiple execution
> paths, and in that case, we will always loose one execution cycle in

You are right. I overlooked it.

> case we bail early. I actually wonder why this didn't hit us earlier,
> but I might miss a detail.

I guess, in your case, the interrupt comes in at line 363 where you have passed the int_status or RX_RCVD checking but the mwifiex_processing flag is still true.

 361         if ((adapter->int_status) || IS_CARD_RX_RCVD(adapter))
 362                 goto process_start;
 363
 364         spin_lock_irqsave(&adapter->main_proc_lock, flags);
 365         adapter->mwifiex_processing = false;
 366         spin_unlock_irqrestore(&adapter->main_proc_lock, flags);

The interrupt thread exits because mwifiex_processing is true.
Therefore the mwifiex_main_process misses this interrupt.

> 
> OTOH, the worst thing that can happen if the function is executed too
> often is that it exits early and does nothing.
> 
> > +               if (adapter->iface_type == MWIFIEX_SDIO)
> > +                       queue_work(adapter->workqueue, &adapter->main_work);
> 
> I can of course add this, but I don't fully understand why the driver
> takes care of concurrently running executing paths and then just bails
> silently in case a race is detected.

No. Your original patch is fine.
Could you resend it as [PATCH 3.12]? I will ACK in that thread.

Thanks,
Bing




^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2013-09-17 18:20 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-09-16 11:39 [PATCH] mwifiex: fix command timeout with SDIO interrupts enabled Daniel Mack
2013-09-16 11:39 ` [PATCH 1/1] mwifiex: queue main work from main process when bailing on races Daniel Mack
2013-09-16 21:14   ` Bing Zhao
2013-09-16 23:00     ` Daniel Mack
2013-09-17 18:19       ` Bing Zhao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).