* kmemleak report related to ieee80211_start_tx_ba_session, tid_start_tx locking issues?
From: Ben Greear @ 2013-06-12 18:21 UTC (permalink / raw)
To: linux-wireless@vger.kernel.org
This is on kernel 3.9.5+.
I see a fair amount of these. Once again, the locking is tricky
for my poor brain, but I am suspicious of this part of things.
It seems that ampdu_mlme.mtx is used to protect the tid
arrays (although sta->lock also applies to part of it).
In ieee80211_start_tx_ba_session we are accessing and assigning the tid_start_tx
without holding the ampdu_mlme.mtx mutex.
	spin_lock_bh(&sta->lock);
	.....
	tid_tx = rcu_dereference_protected_tid_tx(sta, tid);
	/* check if the TID is not in aggregation flow already */
	if (tid_tx || sta->ampdu_mlme.tid_start_tx[tid]) {
	....
	/*
	 * Finally, assign it to the start array; the work item will
	 * collect it and move it to the normal array.
	 */
	sta->ampdu_mlme.tid_start_tx[tid] = tid_tx;
Elsewhere, in ieee80211_ba_session_work, we access the tid_start_tx
without the sta->lock held, but with the ampdu_mlme.mtx held.
I think we should probably hold ampdu_mlme.mtx in ieee80211_start_tx_ba_session
or make sure we hold sta->lock in ieee80211_ba_session_work.
unreferenced object 0xffff880219b4de40 (size 192):
  comm "softirq", pid 0, jiffies 4296416789 (age 1257.971s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff815bc02c>] kmemleak_alloc+0x73/0x98
    [<ffffffff8117d4b4>] slab_post_alloc_hook+0x28/0x2a
    [<ffffffff8117f4a6>] kmem_cache_alloc_trace+0xa5/0xcc
    [<ffffffffa0365221>] ieee80211_start_tx_ba_session+0x24b/0x360 [mac80211]
    [<ffffffffa03a98f3>] minstrel_ht_tx_status+0x79a/0x7a9 [mac80211]
    [<ffffffffa035d1cd>] ieee80211_tx_status+0x3af/0x947 [mac80211]
    [<ffffffffa06e86fa>] ath_txq_unlock_complete+0xb0/0xbb [ath9k]
    [<ffffffffa06e8992>] ath_tx_edma_tasklet+0x28d/0x2a4 [ath9k]
    [<ffffffffa06e33cd>] ath9k_tasklet+0x111/0x150 [ath9k]
    [<ffffffff8109d6d3>] tasklet_action+0x7d/0xcc
    [<ffffffff8109db2c>] __do_softirq+0x114/0x254
    [<ffffffff8109dcfe>] irq_exit+0x4b/0xa8
    [<ffffffff815d481d>] do_IRQ+0x9d/0xb4
    [<ffffffff815cc8ed>] ret_from_intr+0x0/0x15
    [<ffffffff814c8efb>] cpuidle_enter_tk+0x10/0x12
    [<ffffffff814c89b5>] cpuidle_enter_state+0x17/0x3f
Thanks,
Ben
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com
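The second option raised in the message above (taking the same lock on both the request path and the work path) can be illustrated with a small userspace sketch. This is not mac80211 code. The names `sta_model`, `tid_tx_model`, `sta_model_init`, `start_session`, `session_work`, and `NUM_TIDS` are hypothetical, and a pthread mutex stands in for sta->lock. It only shows the locking discipline, assuming both paths can legally take that lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

#define NUM_TIDS 16 /* hypothetical stand-in for IEEE80211_NUM_TIDS */

struct tid_tx_model {
	int tid;
};

struct sta_model {
	pthread_mutex_t lock;                        /* stand-in for sta->lock */
	struct tid_tx_model *tid_start_tx[NUM_TIDS]; /* requested, not yet started */
	struct tid_tx_model *tid_tx[NUM_TIDS];       /* collected by the work item */
};

void sta_model_init(struct sta_model *sta)
{
	pthread_mutex_init(&sta->lock, NULL);
	for (int i = 0; i < NUM_TIDS; i++) {
		sta->tid_start_tx[i] = NULL;
		sta->tid_tx[i] = NULL;
	}
}

/*
 * Request path: check and assign tid_start_tx under the lock.
 * Returns 0 on success, -1 if a session already exists or allocation fails.
 */
int start_session(struct sta_model *sta, int tid)
{
	int ret = -1;

	assert(tid >= 0 && tid < NUM_TIDS);
	pthread_mutex_lock(&sta->lock);
	if (!sta->tid_tx[tid] && !sta->tid_start_tx[tid]) {
		struct tid_tx_model *t = malloc(sizeof(*t));

		if (t) {
			t->tid = tid;
			sta->tid_start_tx[tid] = t;
			ret = 0;
		}
	}
	pthread_mutex_unlock(&sta->lock);
	return ret;
}

/*
 * Work path: take the same lock while moving each pending entry to the
 * normal array. Returns the number of entries moved.
 */
int session_work(struct sta_model *sta)
{
	int moved = 0;

	for (int tid = 0; tid < NUM_TIDS; tid++) {
		pthread_mutex_lock(&sta->lock);
		if (sta->tid_start_tx[tid]) {
			sta->tid_tx[tid] = sta->tid_start_tx[tid];
			sta->tid_start_tx[tid] = NULL;
			moved++;
		}
		pthread_mutex_unlock(&sta->lock);
	}
	return moved;
}
```

Because both functions touch tid_start_tx only while holding the same lock, a request can never observe a half-moved entry. In the kernel the request path runs in softirq context, so only a spinlock (not a mutex) would be usable there.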
* Re: kmemleak report related to ieee80211_start_tx_ba_session, tid_start_tx locking issues?
From: Johannes Berg @ 2013-06-12 20:46 UTC (permalink / raw)
To: Ben Greear; +Cc: linux-wireless@vger.kernel.org
On Wed, 2013-06-12 at 11:21 -0700, Ben Greear wrote:
> In ieee80211_start_tx_ba_session we are accessing and assigning the tid_start_tx
> without holding the ampdu_mlme.mtx mutex.
>
> spin_lock_bh(&sta->lock);
> .....
> tid_tx = rcu_dereference_protected_tid_tx(sta, tid);
> /* check if the TID is not in aggregation flow already */
> if (tid_tx || sta->ampdu_mlme.tid_start_tx[tid]) {
>
> ....
>
> /*
> * Finally, assign it to the start array; the work item will
> * collect it and move it to the normal array.
> */
> sta->ampdu_mlme.tid_start_tx[tid] = tid_tx;
>
>
> Elsewhere, in ieee80211_ba_session_work, we access the tid_start_tx
> without the sta->lock held, but with the ampdu_mlme.mtx held.
Yeah, that seems wrong.
> I think we should probably hold ampdu_mlme.mtx in ieee80211_start_tx_ba_session
> or make sure we hold sta->lock in ieee80211_ba_session_work.
Can't hold the mutex there, but we can do the lock (I'll comment on your
patch separately)
> unreferenced object 0xffff880219b4de40 (size 192):
> comm "softirq", pid 0, jiffies 4296416789 (age 1257.971s)
> hex dump (first 32 bytes):
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> backtrace:
> [<ffffffff815bc02c>] kmemleak_alloc+0x73/0x98
> [<ffffffff8117d4b4>] slab_post_alloc_hook+0x28/0x2a
> [<ffffffff8117f4a6>] kmem_cache_alloc_trace+0xa5/0xcc
> [<ffffffffa0365221>] ieee80211_start_tx_ba_session+0x24b/0x360 [mac80211]
> [<ffffffffa03a98f3>] minstrel_ht_tx_status+0x79a/0x7a9 [mac80211]
> [<ffffffffa035d1cd>] ieee80211_tx_status+0x3af/0x947 [mac80211]
When did this report get printed?
I have a feeling what happens is that start is requested, and then
before ieee80211_ba_session_work() gets a chance to run the station is
destroyed.
Should probably have something like this:
diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c
index b429798..aaf68d2 100644
--- a/net/mac80211/sta_info.c
+++ b/net/mac80211/sta_info.c
@@ -149,6 +149,7 @@ static void cleanup_single_sta(struct sta_info *sta)
 	 * directly by station destruction.
 	 */
 	for (i = 0; i < IEEE80211_NUM_TIDS; i++) {
+		kfree(sta->ampdu_mlme.tid_start_tx[i]);
 		tid_tx = rcu_dereference_raw(sta->ampdu_mlme.tid_tx[i]);
 		if (!tid_tx)
 			continue;
johannes
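The suspected sequence above (a start is requested, then the station is destroyed before the work item collects the pending entry) can be reproduced in a small userspace model. This is only a sketch under stated assumptions, not the mac80211 implementation. `sta_model`, `request_start`, `run_work`, `sta_destroy`, `model_alloc`, `model_free`, and `reap_outstanding` are hypothetical names, and the tracking table is a crude stand-in for what kmemleak reports. The `free_pending` flag mirrors the kfree() added by the proposed patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NUM_TIDS    16 /* hypothetical stand-in for IEEE80211_NUM_TIDS */
#define MAX_TRACKED 64 /* size of the toy allocation tracker */

static void *tracked[MAX_TRACKED]; /* every allocation not yet freed */

static void *model_alloc(size_t size)
{
	for (size_t i = 0; i < MAX_TRACKED; i++) {
		if (!tracked[i]) {
			tracked[i] = calloc(1, size);
			return tracked[i];
		}
	}
	return NULL; /* tracker full */
}

static void model_free(void *p)
{
	if (!p)
		return; /* like kfree(), freeing NULL is a no-op */
	for (size_t i = 0; i < MAX_TRACKED; i++) {
		if (tracked[i] == p) {
			tracked[i] = NULL;
			break;
		}
	}
	free(p);
}

/* Free whatever is still tracked and return how many objects that was. */
int reap_outstanding(void)
{
	int n = 0;

	for (size_t i = 0; i < MAX_TRACKED; i++) {
		if (tracked[i]) {
			free(tracked[i]);
			tracked[i] = NULL;
			n++;
		}
	}
	return n;
}

struct tid_tx_model {
	int tid;
};

struct sta_model {
	struct tid_tx_model *tid_start_tx[NUM_TIDS]; /* pending */
	struct tid_tx_model *tid_tx[NUM_TIDS];       /* collected by the work */
};

struct sta_model *sta_create(void)
{
	return model_alloc(sizeof(struct sta_model));
}

/* Request a session: park the new object in the pending array. */
int request_start(struct sta_model *sta, int tid)
{
	struct tid_tx_model *t;

	assert(tid >= 0 && tid < NUM_TIDS);
	if (sta->tid_tx[tid] || sta->tid_start_tx[tid])
		return -1;
	t = model_alloc(sizeof(*t));
	if (!t)
		return -1;
	t->tid = tid;
	sta->tid_start_tx[tid] = t;
	return 0;
}

/* Work item: move pending entries to the normal array. */
void run_work(struct sta_model *sta)
{
	for (int i = 0; i < NUM_TIDS; i++) {
		if (sta->tid_start_tx[i]) {
			sta->tid_tx[i] = sta->tid_start_tx[i];
			sta->tid_start_tx[i] = NULL;
		}
	}
}

/* Destroy the station; free_pending != 0 models the proposed kfree(). */
void sta_destroy(struct sta_model *sta, int free_pending)
{
	for (int i = 0; i < NUM_TIDS; i++) {
		if (free_pending)
			model_free(sta->tid_start_tx[i]);
		model_free(sta->tid_tx[i]);
	}
	model_free(sta);
}
```

Calling request_start() followed directly by sta_destroy(sta, 0) leaves one object behind for reap_outstanding() to find, while sta_destroy(sta, 1) leaves none. If run_work() gets to run first, nothing leaks either way, which would be consistent with the leak only showing up occasionally.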
* Re: kmemleak report related to ieee80211_start_tx_ba_session, tid_start_tx locking issues?
From: Ben Greear @ 2013-06-12 20:58 UTC (permalink / raw)
To: Johannes Berg; +Cc: linux-wireless@vger.kernel.org
On 06/12/2013 01:46 PM, Johannes Berg wrote:
> On Wed, 2013-06-12 at 11:21 -0700, Ben Greear wrote:
>
>> In ieee80211_start_tx_ba_session we are accessing and assigning the tid_start_tx
>> without holding the ampdu_mlme.mtx mutex.
>>
>> spin_lock_bh(&sta->lock);
>> .....
>> tid_tx = rcu_dereference_protected_tid_tx(sta, tid);
>> /* check if the TID is not in aggregation flow already */
>> if (tid_tx || sta->ampdu_mlme.tid_start_tx[tid]) {
>>
>> ....
>>
>> /*
>> * Finally, assign it to the start array; the work item will
>> * collect it and move it to the normal array.
>> */
>> sta->ampdu_mlme.tid_start_tx[tid] = tid_tx;
>>
>>
>> Elsewhere, in ieee80211_ba_session_work, we access the tid_start_tx
>> without the sta->lock held, but with the ampdu_mlme.mtx held.
>
> Yeah, that seems wrong.
>
>> I think we should probably hold ampdu_mlme.mtx in ieee80211_start_tx_ba_session
>> or make sure we hold sta->lock in ieee80211_ba_session_work.
>
> Can't hold the mutex there, but we can do the lock (I'll comment on your
> patch separately)
>
>> unreferenced object 0xffff880219b4de40 (size 192):
>> comm "softirq", pid 0, jiffies 4296416789 (age 1257.971s)
>> hex dump (first 32 bytes):
>> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
>> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
>> backtrace:
>> [<ffffffff815bc02c>] kmemleak_alloc+0x73/0x98
>> [<ffffffff8117d4b4>] slab_post_alloc_hook+0x28/0x2a
>> [<ffffffff8117f4a6>] kmem_cache_alloc_trace+0xa5/0xcc
>> [<ffffffffa0365221>] ieee80211_start_tx_ba_session+0x24b/0x360 [mac80211]
>> [<ffffffffa03a98f3>] minstrel_ht_tx_status+0x79a/0x7a9 [mac80211]
>> [<ffffffffa035d1cd>] ieee80211_tx_status+0x3af/0x947 [mac80211]
>
> When did this report get printed?
I have a system with 100 or so stations constantly trying to
associate with a set of APs that can handle < 100. This
effectively causes constant churn of re-associations and
associated logic...
Good for shaking out bugs it seems :)
These and other leaks show up after a few minutes of
running this test scenario. It's not a huge number of
leaks, however...so usually stations go away w/out leaking.
> I have a feeling what happens is that start is requested, and then
> before ieee80211_ba_session_work() gets a chance to run the station is
> destroyed.
>
> Should probably have something like this:
>
> diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c
> index b429798..aaf68d2 100644
> --- a/net/mac80211/sta_info.c
> +++ b/net/mac80211/sta_info.c
> @@ -149,6 +149,7 @@ static void cleanup_single_sta(struct sta_info *sta)
> * directly by station destruction.
> */
> for (i = 0; i < IEEE80211_NUM_TIDS; i++) {
> + kfree(sta->ampdu_mlme.tid_start_tx[i]);
> tid_tx = rcu_dereference_raw(sta->ampdu_mlme.tid_tx[i]);
> if (!tid_tx)
> continue;
Looks reasonable to me. I was about to start testing similar logic
in sta_info_free(), but likely your patch is more proper.
I'll give it a try now.
Thanks,
Ben
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com
* Re: kmemleak report related to ieee80211_start_tx_ba_session, tid_start_tx locking issues?
From: Johannes Berg @ 2013-06-12 21:01 UTC (permalink / raw)
To: Ben Greear; +Cc: linux-wireless@vger.kernel.org
On Wed, 2013-06-12 at 13:58 -0700, Ben Greear wrote:
> > When did this report get printed?
>
> I have a system with 100 or so stations constantly trying to
> associate with a set of APs that can handle < 100. This
> effectively causes constant churn of re-associations and
> associated logic...
Right ... I figured it was this.
> Good for shaking out bugs it seems :)
>
> These and other leaks show up after a few minutes of
> running this test scenario. It's not a huge number of
> leaks, however...so usually stations go away w/out leaking.
That's not too surprising really; the work should normally run quickly,
I guess.
Anyway, I guess kmemleak doesn't actually let you pinpoint when the leak
occurred, because it only scans periodically rather than on every kfree, so
never mind my question.
> > for (i = 0; i < IEEE80211_NUM_TIDS; i++) {
> > + kfree(sta->ampdu_mlme.tid_start_tx[i]);
> > tid_tx = rcu_dereference_raw(sta->ampdu_mlme.tid_tx[i]);
> > if (!tid_tx)
> > continue;
>
> Looks reasonable to me. I was about to start testing similar logic
> in sta_info_free(), but likely your patch is more proper.
>
> I'll give it a try now.
Thanks.
johannes