linux-wireless.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mac80211: fix race condition caused by late addBA resp
@ 2011-11-27  7:23 Nikolay Martynov
  2011-11-27  7:23 ` Nikolay Martynov
  0 siblings, 1 reply; 15+ messages in thread
From: Nikolay Martynov @ 2011-11-27  7:23 UTC (permalink / raw)
  To: linville; +Cc: linux-wireless, Nikolay Martynov

  Currently if addBA respones comes in just after addba_resp_timer has
expired we still accept addBA response and (try to) open agg
session. This patch fixes this race condition and makes sure that if
addba_resp_timer has expired addBA response is not longer accepted and
we do not try to open half-closed session.

  It looks like this is related to discussion of iwlagn being
'shaky'. I have intel wifi 5300 and experienced complete system locks
from time to time. And sometimes I get the following in logs
(i.e. this is the last thing which gets into log before system freezes):

Nov 26 23:50:09 kolya-laptop kernel: [147253.552830] Open BA session requested for 00:14:d1:53:50:2d tid 0
Nov 26 23:50:09 kolya-laptop kernel: [147253.564129] activated addBA response timer on tid 0
Nov 26 23:50:10 kolya-laptop kernel: [147254.564025] addBA response timer expired on tid 0
Nov 26 23:50:10 kolya-laptop kernel: [147254.564048] Tx BA session stop requested for 00:14:d1:53:50:2d tid 0
Nov 26 23:50:13 kolya-laptop kernel: [147257.089647] switched off addBA timer for tid 0
Nov 26 23:50:13 kolya-laptop kernel: [147257.089653] Aggregation is on for tid 0
Nov 26 23:50:13 kolya-laptop kernel: [147257.089817] iwlwifi 0000:0b:00.0: Tx aggregation enabled on ra = 00:14:d1:53:50:2d tid = 0
Nov 26 23:50:13 kolya-laptop kernel: [147257.089829] ------------[ cut here ]------------
Nov 26 23:50:13 kolya-laptop kernel: [147257.089849] WARNING: at /home/kolya/projects/wireless/compat-wireless/drivers/net/wireless/iwlwifi/iwl-trans-pcie.c:1106 iwl_trans_pcie_tx+0x9d2/0x9e0 [iwlwifi]()
... part of stack trace which got to disk before lock ...

  It can be seen here that session response timer gets expired which
brings agg session down. Shorty after addba resp arrives which tries to bring
that same session up. And it looks like this causes driver confusion and
subsequent system freeze.
  This patch seems to fix the problem for me. Didn't have system
freezes since I've applied it couple of days ago.

  All comments and suggestions are really appreciated.
  Thanks.

Nikolay Martynov (1):
  mac80211: fix race condition caused by late addBA resp

 net/mac80211/agg-tx.c |   19 +++++++++++++++++--
 1 files changed, 17 insertions(+), 2 deletions(-)

-- 
1.7.4.1


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH] mac80211: fix race condition caused by late addBA resp
  2011-11-27  7:23 [PATCH] mac80211: fix race condition caused by late addBA resp Nikolay Martynov
@ 2011-11-27  7:23 ` Nikolay Martynov
  2011-11-27  9:38   ` Johannes Berg
  2011-11-27 11:50   ` [PATCH v2] mac80211: fix race condition caused by late addBA response Johannes Berg
  0 siblings, 2 replies; 15+ messages in thread
From: Nikolay Martynov @ 2011-11-27  7:23 UTC (permalink / raw)
  To: linville; +Cc: linux-wireless, Nikolay Martynov

  Currently if addBA respones comes in just after addba_resp_timer has
expired we still accept addBA response and (try to) open agg
session. This patch fixes this race condition and makes sure that if
addba_resp_timer has expired addBA response is not longer accepted and
we do not try to open half-closed session.

Signed-off-by: Nikolay Martynov <mar.kolya@gmail.com>
---
 net/mac80211/agg-tx.c |   19 +++++++++++++++++--
 1 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/net/mac80211/agg-tx.c b/net/mac80211/agg-tx.c
index 39d72cc..683effe 100644
--- a/net/mac80211/agg-tx.c
+++ b/net/mac80211/agg-tx.c
@@ -746,6 +746,23 @@ void ieee80211_process_addba_resp(struct ieee80211_local *local,
 	if (!tid_tx)
 		goto out;
 
+	del_timer_sync(&tid_tx->addba_resp_timer);
+
+	/*
+	 * Test that we are not stopping agg session now.
+	 * Since addba_resp_timer may have just finished we need to
+	 * check HT_AGG_STATE_STOPPING too.
+	 */
+	if (test_bit(HT_AGG_STATE_WANT_STOP, &tid_tx->state)
+	    || test_bit(HT_AGG_STATE_STOPPING, &tid_tx->state)) {
+#ifdef CONFIG_MAC80211_HT_DEBUG
+		printk(KERN_DEBUG "got addBA resp for tid %d but we are not "
+				"(or no longer) expecting expecting it\n",
+			tid);
+#endif
+		goto out;
+	}
+
 	if (mgmt->u.action.u.addba_resp.dialog_token != tid_tx->dialog_token) {
 #ifdef CONFIG_MAC80211_HT_DEBUG
 		printk(KERN_DEBUG "wrong addBA response token, tid %d\n", tid);
@@ -753,8 +770,6 @@ void ieee80211_process_addba_resp(struct ieee80211_local *local,
 		goto out;
 	}
 
-	del_timer(&tid_tx->addba_resp_timer);
-
 #ifdef CONFIG_MAC80211_HT_DEBUG
 	printk(KERN_DEBUG "switched off addBA timer for tid %d\n", tid);
 #endif
-- 
1.7.4.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] mac80211: fix race condition caused by late addBA resp
  2011-11-27  7:23 ` Nikolay Martynov
@ 2011-11-27  9:38   ` Johannes Berg
  2011-11-27 10:42     ` Emmanuel Grumbach
  2011-11-27 15:43     ` Nikolay Martynov
  2011-11-27 11:50   ` [PATCH v2] mac80211: fix race condition caused by late addBA response Johannes Berg
  1 sibling, 2 replies; 15+ messages in thread
From: Johannes Berg @ 2011-11-27  9:38 UTC (permalink / raw)
  To: Nikolay Martynov; +Cc: linville, linux-wireless

On Sun, 2011-11-27 at 02:23 -0500, Nikolay Martynov wrote:
> Currently if addBA respones comes in just after addba_resp_timer has
> expired we still accept addBA response and (try to) open agg
> session. This patch fixes this race condition and makes sure that if
> addba_resp_timer has expired addBA response is not longer accepted and
> we do not try to open half-closed session.

I wonder what happened that we hit all these races now ... kinda
strange.


> +	del_timer_sync(&tid_tx->addba_resp_timer);
> +
> +	/*
> +	 * Test that we are not stopping agg session now.
> +	 * Since addba_resp_timer may have just finished we need to
> +	 * check HT_AGG_STATE_STOPPING too.
> +	 */
> +	if (test_bit(HT_AGG_STATE_WANT_STOP, &tid_tx->state)
> +	    || test_bit(HT_AGG_STATE_STOPPING, &tid_tx->state)) {
> +#ifdef CONFIG_MAC80211_HT_DEBUG
> +		printk(KERN_DEBUG "got addBA resp for tid %d but we are not "
> +				"(or no longer) expecting expecting it\n",
> +			tid);
> +#endif
> +		goto out;
> +	}
> +
>  	if (mgmt->u.action.u.addba_resp.dialog_token != tid_tx->dialog_token) {
>  #ifdef CONFIG_MAC80211_HT_DEBUG
>  		printk(KERN_DEBUG "wrong addBA response token, tid %d\n", tid);

You can't do this -- if this happens and the dialog token is bad the
session will linger forever since the timer is now dead.

Also please move the bool operators to the end of the line rather than
the start:
	if (test_bit(...) ||
	    test_bit(...)) {

Other than that the patch looks fine.

johannes


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mac80211: fix race condition caused by late addBA resp
  2011-11-27  9:38   ` Johannes Berg
@ 2011-11-27 10:42     ` Emmanuel Grumbach
  2011-11-27 15:43     ` Nikolay Martynov
  1 sibling, 0 replies; 15+ messages in thread
From: Emmanuel Grumbach @ 2011-11-27 10:42 UTC (permalink / raw)
  To: Johannes Berg; +Cc: Nikolay Martynov, linville, linux-wireless

On Sun, Nov 27, 2011 at 11:38, Johannes Berg <johannes@sipsolutions.net> wrote:
> On Sun, 2011-11-27 at 02:23 -0500, Nikolay Martynov wrote:
>> Currently if addBA respones comes in just after addba_resp_timer has
>> expired we still accept addBA response and (try to) open agg
>> session. This patch fixes this race condition and makes sure that if
>> addba_resp_timer has expired addBA response is not longer accepted and
>> we do not try to open half-closed session.
>

I just sent a patch to Norbert that looks a bit the same.
I prefer yours though, as Johannes said in the other thread, it checks
the stop condition better.
I also saw a few other holes in the state machine. Basically we don't
clear flags: i.e. I didn't see where we clear the driver ready flag.
This is the kind of things that can lead to weird stuff.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2] mac80211: fix race condition caused by late addBA response
  2011-11-27  7:23 ` Nikolay Martynov
  2011-11-27  9:38   ` Johannes Berg
@ 2011-11-27 11:50   ` Johannes Berg
  2011-11-27 11:53     ` [PATCH v3] " Johannes Berg
  1 sibling, 1 reply; 15+ messages in thread
From: Johannes Berg @ 2011-11-27 11:50 UTC (permalink / raw)
  To: Nikolay Martynov
  Cc: linville, linux-wireless, Norbert Preining, Emmanuel Grumbach

From: Nikolay Martynov <mar.kolya@gmail.com>

If addBA responses comes in just after addba_resp_timer has
expired mac80211 will still accept it and try to open the
aggregation session. This causes drivers to be confused and
in some cases even crash.

This patch fixes the race condition and makes sure that if
addba_resp_timer has expired addBA response is not longer
accepted and we do not try to open half-closed session.

Cc: stable@vger.kernel.org
Signed-off-by: Nikolay Martynov <mar.kolya@gmail.com>
[some adjustments]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
---
 net/mac80211/agg-tx.c |   18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

--- a/net/mac80211/agg-tx.c	2011-11-27 12:41:41.000000000 +0100
+++ b/net/mac80211/agg-tx.c	2011-11-27 12:48:03.000000000 +0100
@@ -762,11 +762,27 @@ void ieee80211_process_addba_resp(struct
 		goto out;
 	}
 
-	del_timer(&tid_tx->addba_resp_timer);
+	del_timer_sync(&tid_tx->addba_resp_timer);
 
 #ifdef CONFIG_MAC80211_HT_DEBUG
 	printk(KERN_DEBUG "switched off addBA timer for tid %d\n", tid);
 #endif
+
+	/*
+	 * addba_resp_timer may have fired before we got here, and
+	 * caused WANT_STOP to be set. STOPPING should not be set
+	 * as we're under the mutex, but check it anyway.
+	 */
+	if (test_bit(HT_AGG_STATE_WANT_STOP, &tid_tx->state) ||
+	    test_bit(HT_AGG_STATE_STOPPING, &tid_tx->state)) {
+#ifdef CONFIG_MAC80211_HT_DEBUG
+		printk(KERN_DEBUG "got addBA resp for tid %d but we are not "
+				"(or no longer) expecting expecting it\n",
+			tid);
+#endif
+		goto out;
+	}
+
 	/*
 	 * IEEE 802.11-2007 7.3.1.14:
 	 * In an ADDBA Response frame, when the Status Code field



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v3] mac80211: fix race condition caused by late addBA response
  2011-11-27 11:50   ` [PATCH v2] mac80211: fix race condition caused by late addBA response Johannes Berg
@ 2011-11-27 11:53     ` Johannes Berg
  2011-11-27 16:12       ` Nikolay Martynov
  2011-11-28  8:18       ` [PATCH v4] " Johannes Berg
  0 siblings, 2 replies; 15+ messages in thread
From: Johannes Berg @ 2011-11-27 11:53 UTC (permalink / raw)
  To: Nikolay Martynov
  Cc: linville, linux-wireless, Norbert Preining, Emmanuel Grumbach

From: Nikolay Martynov <mar.kolya@gmail.com>

If addBA responses comes in just after addba_resp_timer has
expired mac80211 will still accept it and try to open the
aggregation session. This causes drivers to be confused and
in some cases even crash.

This patch fixes the race condition and makes sure that if
addba_resp_timer has expired addBA response is not longer
accepted and we do not try to open half-closed session.

Cc: stable@vger.kernel.org
Signed-off-by: Nikolay Martynov <mar.kolya@gmail.com>
[some adjustments]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
---
v3: adjust message

 net/mac80211/agg-tx.c |   18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

--- a/net/mac80211/agg-tx.c	2011-11-27 12:51:16.000000000 +0100
+++ b/net/mac80211/agg-tx.c	2011-11-27 12:52:30.000000000 +0100
@@ -762,11 +762,27 @@ void ieee80211_process_addba_resp(struct
 		goto out;
 	}
 
-	del_timer(&tid_tx->addba_resp_timer);
+	del_timer_sync(&tid_tx->addba_resp_timer);
 
 #ifdef CONFIG_MAC80211_HT_DEBUG
 	printk(KERN_DEBUG "switched off addBA timer for tid %d\n", tid);
 #endif
+
+	/*
+	 * addba_resp_timer may have fired before we got here, and
+	 * caused WANT_STOP to be set. STOPPING should not be set
+	 * as we're under the mutex, but check it anyway.
+	 */
+	if (test_bit(HT_AGG_STATE_WANT_STOP, &tid_tx->state) ||
+	    test_bit(HT_AGG_STATE_STOPPING, &tid_tx->state)) {
+#ifdef CONFIG_MAC80211_HT_DEBUG
+		printk(KERN_DEBUG
+		       "got addBA resp for tid %d but we already gave up\n",
+		       tid);
+#endif
+		goto out;
+	}
+
 	/*
 	 * IEEE 802.11-2007 7.3.1.14:
 	 * In an ADDBA Response frame, when the Status Code field



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mac80211: fix race condition caused by late addBA resp
  2011-11-27  9:38   ` Johannes Berg
  2011-11-27 10:42     ` Emmanuel Grumbach
@ 2011-11-27 15:43     ` Nikolay Martynov
  1 sibling, 0 replies; 15+ messages in thread
From: Nikolay Martynov @ 2011-11-27 15:43 UTC (permalink / raw)
  To: Johannes Berg; +Cc: linville, linux-wireless

Hi,

>
>
>> +     del_timer_sync(&tid_tx->addba_resp_timer);
>> +
>> +     /*
>> +      * Test that we are not stopping agg session now.
>> +      * Since addba_resp_timer may have just finished we need to
>> +      * check HT_AGG_STATE_STOPPING too.
>> +      */
>> +     if (test_bit(HT_AGG_STATE_WANT_STOP, &tid_tx->state)
>> +         || test_bit(HT_AGG_STATE_STOPPING, &tid_tx->state)) {
>> +#ifdef CONFIG_MAC80211_HT_DEBUG
>> +             printk(KERN_DEBUG "got addBA resp for tid %d but we are not "
>> +                             "(or no longer) expecting expecting it\n",
>> +                     tid);
>> +#endif
>> +             goto out;
>> +     }
>> +
>>       if (mgmt->u.action.u.addba_resp.dialog_token != tid_tx->dialog_token) {
>>  #ifdef CONFIG_MAC80211_HT_DEBUG
>>               printk(KERN_DEBUG "wrong addBA response token, tid %d\n", tid);
>
> You can't do this -- if this happens and the dialog token is bad the
> session will linger forever since the timer is now dead.

   I'm not sure I understand what you mean by this and how my patch
affects dialog_token. I'd appreciate if you could explain this in some
detail.

>
> Also please move the bool operators to the end of the line rather than
> the start:
>        if (test_bit(...) ||
>            test_bit(...)) {
>
> Other than that the patch looks fine.

  I saw you have posted couple of newer versions of this. Shall I
still make this change and repost?

  Thanks!

-- 
Truthfully yours,
Martynov Nikolay.
Email: mar.kolya@gmail.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3] mac80211: fix race condition caused by late addBA response
  2011-11-27 11:53     ` [PATCH v3] " Johannes Berg
@ 2011-11-27 16:12       ` Nikolay Martynov
  2011-11-27 16:55         ` Johannes Berg
  2011-11-28  8:18       ` [PATCH v4] " Johannes Berg
  1 sibling, 1 reply; 15+ messages in thread
From: Nikolay Martynov @ 2011-11-27 16:12 UTC (permalink / raw)
  To: Johannes Berg
  Cc: linville, linux-wireless, Norbert Preining, Emmanuel Grumbach

Hi,

2011/11/27 Johannes Berg <johannes@sipsolutions.net>:
> From: Nikolay Martynov <mar.kolya@gmail.com>
>
> If addBA responses comes in just after addba_resp_timer has
> expired mac80211 will still accept it and try to open the
> aggregation session. This causes drivers to be confused and
> in some cases even crash.
>
> This patch fixes the race condition and makes sure that if
> addba_resp_timer has expired addBA response is not longer
> accepted and we do not try to open half-closed session.
>
> Cc: stable@vger.kernel.org
> Signed-off-by: Nikolay Martynov <mar.kolya@gmail.com>
> [some adjustments]
> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
> ---
> v3: adjust message
>
>  net/mac80211/agg-tx.c |   18 +++++++++++++++++-
>  1 file changed, 17 insertions(+), 1 deletion(-)
>
> --- a/net/mac80211/agg-tx.c     2011-11-27 12:51:16.000000000 +0100
> +++ b/net/mac80211/agg-tx.c     2011-11-27 12:52:30.000000000 +0100
> @@ -762,11 +762,27 @@ void ieee80211_process_addba_resp(struct
>                goto out;
>        }
>
> -       del_timer(&tid_tx->addba_resp_timer);
> +       del_timer_sync(&tid_tx->addba_resp_timer);
>
>  #ifdef CONFIG_MAC80211_HT_DEBUG
>        printk(KERN_DEBUG "switched off addBA timer for tid %d\n", tid);
>  #endif
> +
> +       /*
> +        * addba_resp_timer may have fired before we got here, and
> +        * caused WANT_STOP to be set. STOPPING should not be set
> +        * as we're under the mutex, but check it anyway.
> +        */
> +       if (test_bit(HT_AGG_STATE_WANT_STOP, &tid_tx->state) ||
> +           test_bit(HT_AGG_STATE_STOPPING, &tid_tx->state)) {

  Just a small comment about comment :).
  If I understand correctly process of stopping tx agg looks as following:
  1) Call to ___ieee80211_stop_tx_ba_session. This removes
HT_AGG_STATE_WANT_STOP (in ht.c) and adds HT_AGG_STATE_STOPPING. It
holds mutex during duration of the call.
  2) Call to ieee80211_stop_tx_ba_cb. This destroys the actual tx_tid.
It holds mutex during duration of the call.

  But between these two calls we have HT_AGG_STATE_STOPPING and no
mutex. I think at this time we can receive addBA resp and that's why I
was checking for HT_AGG_STATE_STOPPING in my original patch.

  I'd appreciate if you could let me know if my understanding is wrong.
  Thanks!

-- 
Truthfully yours,
Martynov Nikolay.
Email: mar.kolya@gmail.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3] mac80211: fix race condition caused by late addBA response
  2011-11-27 16:12       ` Nikolay Martynov
@ 2011-11-27 16:55         ` Johannes Berg
  2011-11-28  6:35           ` Emmanuel Grumbach
  0 siblings, 1 reply; 15+ messages in thread
From: Johannes Berg @ 2011-11-27 16:55 UTC (permalink / raw)
  To: Nikolay Martynov
  Cc: linville, linux-wireless, Norbert Preining, Emmanuel Grumbach

On Sun, 2011-11-27 at 11:12 -0500, Nikolay Martynov wrote:

> > -       del_timer(&tid_tx->addba_resp_timer);
> > +       del_timer_sync(&tid_tx->addba_resp_timer);
> >
> >  #ifdef CONFIG_MAC80211_HT_DEBUG
> >        printk(KERN_DEBUG "switched off addBA timer for tid %d\n", tid);
> >  #endif
> > +
> > +       /*
> > +        * addba_resp_timer may have fired before we got here, and
> > +        * caused WANT_STOP to be set. STOPPING should not be set
> > +        * as we're under the mutex, but check it anyway.
> > +        */
> > +       if (test_bit(HT_AGG_STATE_WANT_STOP, &tid_tx->state) ||
> > +           test_bit(HT_AGG_STATE_STOPPING, &tid_tx->state)) {
> 
>   Just a small comment about comment :).
>   If I understand correctly process of stopping tx agg looks as following:
>   1) Call to ___ieee80211_stop_tx_ba_session. This removes
> HT_AGG_STATE_WANT_STOP (in ht.c) and adds HT_AGG_STATE_STOPPING. It
> holds mutex during duration of the call.
>   2) Call to ieee80211_stop_tx_ba_cb. This destroys the actual tx_tid.
> It holds mutex during duration of the call.

>   But between these two calls we have HT_AGG_STATE_STOPPING and no
> mutex. I think at this time we can receive addBA resp and that's why I
> was checking for HT_AGG_STATE_STOPPING in my original patch.

Indeed, I missed that. It's due to the driver callback again.

How about this?

/*
 * addba_resp_timer may have fired before we got here, and
 * caused WANT_STOP to be set. If the stop then was already
 * processed further, STOPPING might be set.
 */


Did you notice that I moved this code to after the dialog token check?

johannes


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3] mac80211: fix race condition caused by late addBA response
  2011-11-27 16:55         ` Johannes Berg
@ 2011-11-28  6:35           ` Emmanuel Grumbach
  2011-11-28  8:16             ` Johannes Berg
  0 siblings, 1 reply; 15+ messages in thread
From: Emmanuel Grumbach @ 2011-11-28  6:35 UTC (permalink / raw)
  To: Johannes Berg
  Cc: Nikolay Martynov, linville, linux-wireless, Norbert Preining,
	Emmanuel Grumbach

On Sun, Nov 27, 2011 at 18:55, Johannes Berg <johannes@sipsolutions.net> wrote:
> On Sun, 2011-11-27 at 11:12 -0500, Nikolay Martynov wrote:
>
>> > -       del_timer(&tid_tx->addba_resp_timer);
>> > +       del_timer_sync(&tid_tx->addba_resp_timer);
>> >
>> >  #ifdef CONFIG_MAC80211_HT_DEBUG
>> >        printk(KERN_DEBUG "switched off addBA timer for tid %d\n", tid);
>> >  #endif
>> > +
>> > +       /*
>> > +        * addba_resp_timer may have fired before we got here, and
>> > +        * caused WANT_STOP to be set. STOPPING should not be set
>> > +        * as we're under the mutex, but check it anyway.
>> > +        */
>> > +       if (test_bit(HT_AGG_STATE_WANT_STOP, &tid_tx->state) ||
>> > +           test_bit(HT_AGG_STATE_STOPPING, &tid_tx->state)) {
>>
>>   Just a small comment about comment :).
>>   If I understand correctly process of stopping tx agg looks as following:
>>   1) Call to ___ieee80211_stop_tx_ba_session. This removes
>> HT_AGG_STATE_WANT_STOP (in ht.c) and adds HT_AGG_STATE_STOPPING. It
>> holds mutex during duration of the call.
>>   2) Call to ieee80211_stop_tx_ba_cb. This destroys the actual tx_tid.
>> It holds mutex during duration of the call.
>
>>   But between these two calls we have HT_AGG_STATE_STOPPING and no
>> mutex. I think at this time we can receive addBA resp and that's why I
>> was checking for HT_AGG_STATE_STOPPING in my original patch.
>
> Indeed, I missed that. It's due to the driver callback again.
>
> How about this?
>
> /*
>  * addba_resp_timer may have fired before we got here, and
>  * caused WANT_STOP to be set. If the stop then was already
>  * processed further, STOPPING might be set.
>  */
>
>
> Did you notice that I moved this code to after the dialog token check?
>

Don't you think we should also send a delBA ? The AP thinks we will Tx
in Agg and basically we are now out of sync.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3] mac80211: fix race condition caused by late addBA response
  2011-11-28  6:35           ` Emmanuel Grumbach
@ 2011-11-28  8:16             ` Johannes Berg
  2011-11-28 12:34               ` Emmanuel Grumbach
  0 siblings, 1 reply; 15+ messages in thread
From: Johannes Berg @ 2011-11-28  8:16 UTC (permalink / raw)
  To: Emmanuel Grumbach
  Cc: Nikolay Martynov, linville, linux-wireless, Norbert Preining,
	Emmanuel Grumbach

On Mon, 2011-11-28 at 08:35 +0200, Emmanuel Grumbach wrote:

> > /*
> >  * addba_resp_timer may have fired before we got here, and
> >  * caused WANT_STOP to be set. If the stop then was already
> >  * processed further, STOPPING might be set.
> >  */
> >
> >
> > Did you notice that I moved this code to after the dialog token check?
> >
> 
> Don't you think we should also send a delBA ? The AP thinks we will Tx
> in Agg and basically we are now out of sync.

We do, during the timer stop path, afaict.

johannes


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v4] mac80211: fix race condition caused by late addBA response
  2011-11-27 11:53     ` [PATCH v3] " Johannes Berg
  2011-11-27 16:12       ` Nikolay Martynov
@ 2011-11-28  8:18       ` Johannes Berg
  1 sibling, 0 replies; 15+ messages in thread
From: Johannes Berg @ 2011-11-28  8:18 UTC (permalink / raw)
  To: Nikolay Martynov
  Cc: linville, linux-wireless, Norbert Preining, Emmanuel Grumbach

From: Nikolay Martynov <mar.kolya@gmail.com>

If addBA responses comes in just after addba_resp_timer has
expired mac80211 will still accept it and try to open the
aggregation session. This causes drivers to be confused and
in some cases even crash.

This patch fixes the race condition and makes sure that if
addba_resp_timer has expired addBA response is not longer
accepted and we do not try to open half-closed session.

Cc: stable@vger.kernel.org
Signed-off-by: Nikolay Martynov <mar.kolya@gmail.com>
[some adjustments]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
---
v3: adjust message
v4: adjust message again

 net/mac80211/agg-tx.c |   18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

--- a/net/mac80211/agg-tx.c	2011-11-27 12:51:16.000000000 +0100
+++ b/net/mac80211/agg-tx.c	2011-11-28 09:17:14.000000000 +0100
@@ -762,11 +762,27 @@ void ieee80211_process_addba_resp(struct
 		goto out;
 	}
 
-	del_timer(&tid_tx->addba_resp_timer);
+	del_timer_sync(&tid_tx->addba_resp_timer);
 
 #ifdef CONFIG_MAC80211_HT_DEBUG
 	printk(KERN_DEBUG "switched off addBA timer for tid %d\n", tid);
 #endif
+
+	/*
+	 * addba_resp_timer may have fired before we got here, and
+	 * caused WANT_STOP to be set. If the stop then was already
+	 * processed further, STOPPING might be set.
+	 */
+	if (test_bit(HT_AGG_STATE_WANT_STOP, &tid_tx->state) ||
+	    test_bit(HT_AGG_STATE_STOPPING, &tid_tx->state)) {
+#ifdef CONFIG_MAC80211_HT_DEBUG
+		printk(KERN_DEBUG
+		       "got addBA resp for tid %d but we already gave up\n",
+		       tid);
+#endif
+		goto out;
+	}
+
 	/*
 	 * IEEE 802.11-2007 7.3.1.14:
 	 * In an ADDBA Response frame, when the Status Code field



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3] mac80211: fix race condition caused by late addBA response
  2011-11-28  8:16             ` Johannes Berg
@ 2011-11-28 12:34               ` Emmanuel Grumbach
  2011-11-28 13:18                 ` Johannes Berg
  0 siblings, 1 reply; 15+ messages in thread
From: Emmanuel Grumbach @ 2011-11-28 12:34 UTC (permalink / raw)
  To: Johannes Berg
  Cc: Nikolay Martynov, linville, linux-wireless, Norbert Preining,
	Emmanuel Grumbach

On Mon, Nov 28, 2011 at 10:16, Johannes Berg <johannes@sipsolutions.net> wrote:
> On Mon, 2011-11-28 at 08:35 +0200, Emmanuel Grumbach wrote:
>
>> > /*
>> >  * addba_resp_timer may have fired before we got here, and
>> >  * caused WANT_STOP to be set. If the stop then was already
>> >  * processed further, STOPPING might be set.
>> >  */
>> >
>> >
>> > Did you notice that I moved this code to after the dialog token check?
>> >
>>
>> Don't you think we should also send a delBA ? The AP thinks we will Tx
>> in Agg and basically we are now out of sync.
>
> We do, during the timer stop path, afaict.
>

Yes, but obviously the AP didn't hear it since it sent the addBA resp.
IMHO, we should send it again.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3] mac80211: fix race condition caused by late addBA response
  2011-11-28 12:34               ` Emmanuel Grumbach
@ 2011-11-28 13:18                 ` Johannes Berg
  2011-11-28 14:21                   ` Nikolay Martynov
  0 siblings, 1 reply; 15+ messages in thread
From: Johannes Berg @ 2011-11-28 13:18 UTC (permalink / raw)
  To: Emmanuel Grumbach
  Cc: Nikolay Martynov, linville, linux-wireless, Norbert Preining,
	Emmanuel Grumbach

On Mon, 2011-11-28 at 14:34 +0200, Emmanuel Grumbach wrote:
> On Mon, Nov 28, 2011 at 10:16, Johannes Berg <johannes@sipsolutions.net> wrote:
> > On Mon, 2011-11-28 at 08:35 +0200, Emmanuel Grumbach wrote:
> >
> >> > /*
> >> >  * addba_resp_timer may have fired before we got here, and
> >> >  * caused WANT_STOP to be set. If the stop then was already
> >> >  * processed further, STOPPING might be set.
> >> >  */
> >> >
> >> >
> >> > Did you notice that I moved this code to after the dialog token check?
> >> >
> >>
> >> Don't you think we should also send a delBA ? The AP thinks we will Tx
> >> in Agg and basically we are now out of sync.
> >
> > We do, during the timer stop path, afaict.
> >
> 
> Yes, but obviously the AP didn't hear it since it sent the addBA resp.
> IMHO, we should send it again.

I don't think so? This is addressing a race between the timer &
receiving the frame, so typically the delBA would not have gone out yet.

johannes


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3] mac80211: fix race condition caused by late addBA response
  2011-11-28 13:18                 ` Johannes Berg
@ 2011-11-28 14:21                   ` Nikolay Martynov
  0 siblings, 0 replies; 15+ messages in thread
From: Nikolay Martynov @ 2011-11-28 14:21 UTC (permalink / raw)
  To: Johannes Berg
  Cc: Emmanuel Grumbach, linville, linux-wireless, Norbert Preining,
	Emmanuel Grumbach

2011/11/28 Johannes Berg <johannes@sipsolutions.net>:
> On Mon, 2011-11-28 at 14:34 +0200, Emmanuel Grumbach wrote:
>> On Mon, Nov 28, 2011 at 10:16, Johannes Berg <johannes@sipsolutions.net> wrote:
>> > On Mon, 2011-11-28 at 08:35 +0200, Emmanuel Grumbach wrote:
>> >
>> >> > /*
>> >> >  * addba_resp_timer may have fired before we got here, and
>> >> >  * caused WANT_STOP to be set. If the stop then was already
>> >> >  * processed further, STOPPING might be set.
>> >> >  */
>> >> >
>> >> >
>> >> > Did you notice that I moved this code to after the dialog token check?
>> >> >
>> >>
>> >> Don't you think we should also send a delBA ? The AP thinks we will Tx
>> >> in Agg and basically we are now out of sync.
>> >
>> > We do, during the timer stop path, afaict.
>> >
>>
>> Yes, but obviously the AP didn't hear it since it sent the addBA resp.
>> IMHO, we should send it again.
>
> I don't think so? This is addressing a race between the timer &
> receiving the frame, so typically the delBA would not have gone out yet.

  IMHO if we hit the fact that timer has already gone off and we got
addba this normally means that we did send dellba. But it just didn't
have a chance to reach remote side before addba left if.
  On the other hand, even if other side doesn't get dellba eventually
(~5 seconds) it'll timeout rx session and we come to sync again. The
whole timer-vs-addba race doesn't happen that often, so this seems
acceptable.

-- 
Truthfully yours,
Martynov Nikolay.
Email: mar.kolya@gmail.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2011-11-28 14:21 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-11-27  7:23 [PATCH] mac80211: fix race condition caused by late addBA resp Nikolay Martynov
2011-11-27  7:23 ` Nikolay Martynov
2011-11-27  9:38   ` Johannes Berg
2011-11-27 10:42     ` Emmanuel Grumbach
2011-11-27 15:43     ` Nikolay Martynov
2011-11-27 11:50   ` [PATCH v2] mac80211: fix race condition caused by late addBA response Johannes Berg
2011-11-27 11:53     ` [PATCH v3] " Johannes Berg
2011-11-27 16:12       ` Nikolay Martynov
2011-11-27 16:55         ` Johannes Berg
2011-11-28  6:35           ` Emmanuel Grumbach
2011-11-28  8:16             ` Johannes Berg
2011-11-28 12:34               ` Emmanuel Grumbach
2011-11-28 13:18                 ` Johannes Berg
2011-11-28 14:21                   ` Nikolay Martynov
2011-11-28  8:18       ` [PATCH v4] " Johannes Berg

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).