kmemleak report related to ieee80211_start_tx_ba_session, tid_start_tx locking issues?

linux-wireless.vger.kernel.org archive mirror

* kmemleak report related to ieee80211_start_tx_ba_session, tid_start_tx locking issues?
@ 2013-06-12 18:21 Ben Greear
  2013-06-12 20:46 ` Johannes Berg
  0 siblings, 1 reply; 4+ messages in thread
From: Ben Greear @ 2013-06-12 18:21 UTC (permalink / raw)
  To: linux-wireless@vger.kernel.org

This is on 3.9.5+

I see a fair amount of these.  Once again, the locking is tricky
for my poor brain, but I am suspicious of this part of things.

It seems that ampdu_mlme.mtx is used to protect the tid
arrays (although sta->lock also applies to part of it).

In ieee80211_start_tx_ba_session we are accessing and assigning the tid_start_tx
without holding the ampdu_mlme.mtx mutex.

	spin_lock_bh(&sta->lock);
.....
	tid_tx = rcu_dereference_protected_tid_tx(sta, tid);
	/* check if the TID is not in aggregation flow already */
	if (tid_tx || sta->ampdu_mlme.tid_start_tx[tid]) {

....

	/*
	 * Finally, assign it to the start array; the work item will
	 * collect it and move it to the normal array.
	 */
	sta->ampdu_mlme.tid_start_tx[tid] = tid_tx;


Elsewhere, in ieee80211_ba_session_work, we access the tid_start_tx
without the sta->lock held, but with the ampdu_mlme.mtx held.

I think we should probably hold ampdu_mlme.mtx in ieee80211_start_tx_ba_session
or make sure we hold sta->lock in ieee80211_ba_session_work.


unreferenced object 0xffff880219b4de40 (size 192):
   comm "softirq", pid 0, jiffies 4296416789 (age 1257.971s)
   hex dump (first 32 bytes):
     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
   backtrace:
     [<ffffffff815bc02c>] kmemleak_alloc+0x73/0x98
     [<ffffffff8117d4b4>] slab_post_alloc_hook+0x28/0x2a
     [<ffffffff8117f4a6>] kmem_cache_alloc_trace+0xa5/0xcc
     [<ffffffffa0365221>] ieee80211_start_tx_ba_session+0x24b/0x360 [mac80211]
     [<ffffffffa03a98f3>] minstrel_ht_tx_status+0x79a/0x7a9 [mac80211]
     [<ffffffffa035d1cd>] ieee80211_tx_status+0x3af/0x947 [mac80211]
     [<ffffffffa06e86fa>] ath_txq_unlock_complete+0xb0/0xbb [ath9k]
     [<ffffffffa06e8992>] ath_tx_edma_tasklet+0x28d/0x2a4 [ath9k]
     [<ffffffffa06e33cd>] ath9k_tasklet+0x111/0x150 [ath9k]
     [<ffffffff8109d6d3>] tasklet_action+0x7d/0xcc
     [<ffffffff8109db2c>] __do_softirq+0x114/0x254
     [<ffffffff8109dcfe>] irq_exit+0x4b/0xa8
     [<ffffffff815d481d>] do_IRQ+0x9d/0xb4
     [<ffffffff815cc8ed>] ret_from_intr+0x0/0x15
     [<ffffffff814c8efb>] cpuidle_enter_tk+0x10/0x12
     [<ffffffff814c89b5>] cpuidle_enter_state+0x17/0x3f

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com



* Re: kmemleak report related to ieee80211_start_tx_ba_session, tid_start_tx locking issues?
  2013-06-12 18:21 kmemleak report related to ieee80211_start_tx_ba_session, tid_start_tx locking issues? Ben Greear
@ 2013-06-12 20:46 ` Johannes Berg
  2013-06-12 20:58   ` Ben Greear
  0 siblings, 1 reply; 4+ messages in thread
From: Johannes Berg @ 2013-06-12 20:46 UTC (permalink / raw)
  To: Ben Greear; +Cc: linux-wireless@vger.kernel.org

On Wed, 2013-06-12 at 11:21 -0700, Ben Greear wrote:

> In ieee80211_start_tx_ba_session we are accessing and assigning the tid_start_tx
> without holding the ampdu_mlme.mtx mutex.
> 
> 	spin_lock_bh(&sta->lock);
> .....
> 	tid_tx = rcu_dereference_protected_tid_tx(sta, tid);
> 	/* check if the TID is not in aggregation flow already */
> 	if (tid_tx || sta->ampdu_mlme.tid_start_tx[tid]) {
> 
> ....
> 
> 	/*
> 	 * Finally, assign it to the start array; the work item will
> 	 * collect it and move it to the normal array.
> 	 */
> 	sta->ampdu_mlme.tid_start_tx[tid] = tid_tx;
> 
> 
> Elsewhere, in ieee80211_ba_session_work, we access the tid_start_tx
> without the sta->lock held, but with the ampdu_mlme.mtx held.

Yeah, that seems wrong.

> I think we should probably hold ampdu_mlme.mtx in ieee80211_start_tx_ba_session
> or make sure we hold sta->lock in ieee80211_ba_session_work.

We can't hold the mutex there, but we can take sta->lock in the work
function (I'll comment on your patch separately).
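
The backtrace in this thread shows ieee80211_start_tx_ba_session being
reached from a tasklet, where a sleeping mutex cannot be taken, so the
remaining option is for the work function to take the same lock as the
request path. Below is a minimal user-space sketch of that idea, not the
mac80211 code: a pthread mutex stands in for sta->lock, and
sta_model, tid_tx_model, start_session and collect_started are
hypothetical names invented for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define NUM_TIDS 16	/* hypothetical stand-in for IEEE80211_NUM_TIDS */

struct tid_tx_model { int tid; };

struct sta_model {
	pthread_mutex_t lock;				/* plays the role of sta->lock */
	struct tid_tx_model *tid_start_tx[NUM_TIDS];	/* requested, not yet collected */
	struct tid_tx_model *tid_tx[NUM_TIDS];		/* collected by the worker */
};

static void sta_model_init(struct sta_model *sta)
{
	pthread_mutex_init(&sta->lock, NULL);
	for (int i = 0; i < NUM_TIDS; i++) {
		sta->tid_start_tx[i] = NULL;
		sta->tid_tx[i] = NULL;
	}
}

/* Request path: check both arrays and publish the entry under the lock. */
static int start_session(struct sta_model *sta, int tid)
{
	struct tid_tx_model *t;

	assert(tid >= 0 && tid < NUM_TIDS);
	t = calloc(1, sizeof(*t));
	if (!t)
		return -1;
	t->tid = tid;

	pthread_mutex_lock(&sta->lock);
	if (sta->tid_tx[tid] || sta->tid_start_tx[tid]) {
		pthread_mutex_unlock(&sta->lock);
		free(t);
		return -1;	/* TID is already in an aggregation flow */
	}
	sta->tid_start_tx[tid] = t;
	pthread_mutex_unlock(&sta->lock);
	return 0;
}

/* Worker path: take each pending entry under the same lock. */
static int collect_started(struct sta_model *sta)
{
	int moved = 0;

	for (int tid = 0; tid < NUM_TIDS; tid++) {
		pthread_mutex_lock(&sta->lock);
		struct tid_tx_model *t = sta->tid_start_tx[tid];
		sta->tid_start_tx[tid] = NULL;
		if (t) {
			sta->tid_tx[tid] = t;
			moved++;
		}
		pthread_mutex_unlock(&sta->lock);
	}
	return moved;
}
```

Because both paths read and write tid_start_tx[] under one lock, the
worker can never observe a half-published entry, and a second request
for the same TID is rejected until the first has been torn down.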

> unreferenced object 0xffff880219b4de40 (size 192):
>    comm "softirq", pid 0, jiffies 4296416789 (age 1257.971s)
>    hex dump (first 32 bytes):
>      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>    backtrace:
>      [<ffffffff815bc02c>] kmemleak_alloc+0x73/0x98
>      [<ffffffff8117d4b4>] slab_post_alloc_hook+0x28/0x2a
>      [<ffffffff8117f4a6>] kmem_cache_alloc_trace+0xa5/0xcc
>      [<ffffffffa0365221>] ieee80211_start_tx_ba_session+0x24b/0x360 [mac80211]
>      [<ffffffffa03a98f3>] minstrel_ht_tx_status+0x79a/0x7a9 [mac80211]
>      [<ffffffffa035d1cd>] ieee80211_tx_status+0x3af/0x947 [mac80211]

When did this report get printed?

I have a feeling what happens is that start is requested, and then
before ieee80211_ba_session_work() gets a chance to run the station is
destroyed.

Should probably have something like this:

diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c
index b429798..aaf68d2 100644
--- a/net/mac80211/sta_info.c
+++ b/net/mac80211/sta_info.c
@@ -149,6 +149,7 @@ static void cleanup_single_sta(struct sta_info *sta)
 	 * directly by station destruction.
 	 */
 	for (i = 0; i < IEEE80211_NUM_TIDS; i++) {
+		kfree(sta->ampdu_mlme.tid_start_tx[i]);
 		tid_tx = rcu_dereference_raw(sta->ampdu_mlme.tid_tx[i]);
 		if (!tid_tx)
 			continue;
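
The suspected sequence (start requested, station destroyed before the
work item collects the entry) can be modeled in user space. This is only
a sketch under that assumption: sta_sketch, tid_tx_sketch and
cleanup_sta are hypothetical names, and free() stands in for kfree().
Like kfree(NULL), free(NULL) is a no-op, so the pending slot can be
freed unconditionally.

```c
#include <assert.h>
#include <stdlib.h>

#define NUM_TIDS 16	/* hypothetical stand-in for IEEE80211_NUM_TIDS */

struct tid_tx_sketch { int tid; };

struct sta_sketch {
	struct tid_tx_sketch *tid_start_tx[NUM_TIDS];	/* pending, never collected */
	struct tid_tx_sketch *tid_tx[NUM_TIDS];		/* active sessions */
};

/*
 * Teardown that releases both arrays and returns how many allocations
 * it freed. Without the first free() below, an entry still sitting in
 * tid_start_tx[] when the station goes away would be leaked.
 */
static int cleanup_sta(struct sta_sketch *sta)
{
	int freed = 0;

	assert(sta != NULL);
	for (int i = 0; i < NUM_TIDS; i++) {
		if (sta->tid_start_tx[i])
			freed++;
		free(sta->tid_start_tx[i]);	/* no-op when the slot is NULL */
		sta->tid_start_tx[i] = NULL;

		if (!sta->tid_tx[i])
			continue;
		free(sta->tid_tx[i]);
		sta->tid_tx[i] = NULL;
		freed++;
	}
	return freed;
}
```

Calling cleanup_sta() on a station with one pending and one active
entry releases both, and a second call finds nothing left to free.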

johannes



* Re: kmemleak report related to ieee80211_start_tx_ba_session, tid_start_tx locking issues?
  2013-06-12 20:46 ` Johannes Berg
@ 2013-06-12 20:58   ` Ben Greear
  2013-06-12 21:01     ` Johannes Berg
  0 siblings, 1 reply; 4+ messages in thread
From: Ben Greear @ 2013-06-12 20:58 UTC (permalink / raw)
  To: Johannes Berg; +Cc: linux-wireless@vger.kernel.org

On 06/12/2013 01:46 PM, Johannes Berg wrote:
> On Wed, 2013-06-12 at 11:21 -0700, Ben Greear wrote:
>
>> In ieee80211_start_tx_ba_session we are accessing and assigning the tid_start_tx
>> without holding the ampdu_mlme.mtx mutex.
>>
>> 	spin_lock_bh(&sta->lock);
>> .....
>> 	tid_tx = rcu_dereference_protected_tid_tx(sta, tid);
>> 	/* check if the TID is not in aggregation flow already */
>> 	if (tid_tx || sta->ampdu_mlme.tid_start_tx[tid]) {
>>
>> ....
>>
>> 	/*
>> 	 * Finally, assign it to the start array; the work item will
>> 	 * collect it and move it to the normal array.
>> 	 */
>> 	sta->ampdu_mlme.tid_start_tx[tid] = tid_tx;
>>
>>
>> Elsewhere, in ieee80211_ba_session_work, we access the tid_start_tx
>> without the sta->lock held, but with the ampdu_mlme.mtx held.
>
> Yeah, that seems wrong.
>
>> I think we should probably hold ampdu_mlme.mtx in ieee80211_start_tx_ba_session
>> or make sure we hold sta->lock in ieee80211_ba_session_work.
>
> We can't hold the mutex there, but we can take sta->lock in the work
> function (I'll comment on your patch separately).
>
>> unreferenced object 0xffff880219b4de40 (size 192):
>>     comm "softirq", pid 0, jiffies 4296416789 (age 1257.971s)
>>     hex dump (first 32 bytes):
>>       00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>>       00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>>     backtrace:
>>       [<ffffffff815bc02c>] kmemleak_alloc+0x73/0x98
>>       [<ffffffff8117d4b4>] slab_post_alloc_hook+0x28/0x2a
>>       [<ffffffff8117f4a6>] kmem_cache_alloc_trace+0xa5/0xcc
>>       [<ffffffffa0365221>] ieee80211_start_tx_ba_session+0x24b/0x360 [mac80211]
>>       [<ffffffffa03a98f3>] minstrel_ht_tx_status+0x79a/0x7a9 [mac80211]
>>       [<ffffffffa035d1cd>] ieee80211_tx_status+0x3af/0x947 [mac80211]
>
> When did this report get printed?

I have a system with 100 or so stations constantly trying to
associate with a set of APs that can handle < 100.  This
effectively causes constant churn of re-associations and
associated logic...

Good for shaking out bugs it seems :)

These and other leaks show up after a few minutes of
running this test scenario.  It's not a huge number of
leaks, however...so usually stations go away w/out leaking.

> I have a feeling what happens is that start is requested, and then
> before ieee80211_ba_session_work() gets a chance to run the station is
> destroyed.
>
> Should probably have something like this:
>
> diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c
> index b429798..aaf68d2 100644
> --- a/net/mac80211/sta_info.c
> +++ b/net/mac80211/sta_info.c
> @@ -149,6 +149,7 @@ static void cleanup_single_sta(struct sta_info *sta)
>   	 * directly by station destruction.
>   	 */
>   	for (i = 0; i < IEEE80211_NUM_TIDS; i++) {
> +		kfree(sta->ampdu_mlme.tid_start_tx[i]);
>   		tid_tx = rcu_dereference_raw(sta->ampdu_mlme.tid_tx[i]);
>   		if (!tid_tx)
>   			continue;

Looks reasonable to me.  I was about to start testing similar logic
in sta_info_free(), but likely your patch is more proper.

I'll give it a try now.

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com



* Re: kmemleak report related to ieee80211_start_tx_ba_session, tid_start_tx locking issues?
  2013-06-12 20:58   ` Ben Greear
@ 2013-06-12 21:01     ` Johannes Berg
  0 siblings, 0 replies; 4+ messages in thread
From: Johannes Berg @ 2013-06-12 21:01 UTC (permalink / raw)
  To: Ben Greear; +Cc: linux-wireless@vger.kernel.org

On Wed, 2013-06-12 at 13:58 -0700, Ben Greear wrote:

> > When did this report get printed?
> 
> I have a system with 100 or so stations constantly trying to
> associate with a set of APs that can handle < 100.  This
> effectively causes constant churn of re-associations and
> associated logic...

Right ... I figured it was this.

> Good for shaking out bugs it seems :)
> 
> These and other leaks show up after a few minutes of
> running this test scenario.  It's not a huge number of
> leaks, however...so usually stations go away w/out leaking.

That's not all too surprising really, the work should run quickly I
guess.

Anyway, I guess kmemleak doesn't actually let you pinpoint when the leak
occurred, because it only scans periodically rather than on every kfree,
so never mind my question.
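
A scan can also be requested on demand, which narrows the window in
which a leak appeared: per the kernel's kmemleak documentation, writing
"scan" to /sys/kernel/debug/kmemleak triggers an immediate scan, and
reading the same file lists the suspected leaks. A small sketch of a
helper for that follows; kmemleak_write_cmd is a hypothetical name, and
on a real system it needs root, a kernel built with kmemleak, and
debugfs mounted at /sys/kernel/debug.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write one command line (e.g. "scan") to the kmemleak control file.
 * The path is a parameter so the helper can be pointed at an ordinary
 * file for testing. Returns 0 on success, -1 on failure.
 */
static int kmemleak_write_cmd(const char *path, const char *cmd)
{
	FILE *f;
	int ok;

	assert(path != NULL && cmd != NULL);
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fprintf(f, "%s\n", cmd) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

Typical use would be kmemleak_write_cmd("/sys/kernel/debug/kmemleak",
"scan") before and after a test step, then reading the file to see
which reports are new.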


> >   	for (i = 0; i < IEEE80211_NUM_TIDS; i++) {
> > +		kfree(sta->ampdu_mlme.tid_start_tx[i]);
> >   		tid_tx = rcu_dereference_raw(sta->ampdu_mlme.tid_tx[i]);
> >   		if (!tid_tx)
> >   			continue;
> 
> Looks reasonable to me.  I was about to start testing similar logic
> in sta_info_free(), but likely your patch is more proper.
> 
> I'll give it a try now.

Thanks.

johannes


