From mboxrd@z Thu Jan 1 00:00:00 1970 Return-path: Received: from mail.candelatech.com ([208.74.158.172]:42318 "EHLO ns3.lanforge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751890Ab3FLU6q (ORCPT ); Wed, 12 Jun 2013 16:58:46 -0400 Message-ID: <51B8E101.2050503@candelatech.com> (sfid-20130612_225849_144928_67A5EEEC) Date: Wed, 12 Jun 2013 13:58:41 -0700 From: Ben Greear MIME-Version: 1.0 To: Johannes Berg CC: "linux-wireless@vger.kernel.org" Subject: Re: kmemleak report related to ieee80211_start_tx_ba_session, tid_start_tx locking issues? References: <51B8BC3E.40905@candelatech.com> (sfid-20130612_202155_810155_8B14A60D) <1371069960.8601.35.camel@jlt4.sipsolutions.net> In-Reply-To: <1371069960.8601.35.camel@jlt4.sipsolutions.net> Content-Type: text/plain; charset=UTF-8; format=flowed Sender: linux-wireless-owner@vger.kernel.org List-ID: On 06/12/2013 01:46 PM, Johannes Berg wrote: > On Wed, 2013-06-12 at 11:21 -0700, Ben Greear wrote: > >> In ieee80211_start_tx_ba_session we are accessing and assigning the tid_start_tx >> without holding the ampdu_mlme.mtx mutex. >> >> spin_lock_bh(&sta->lock); >> ..... >> tid_tx = rcu_dereference_protected_tid_tx(sta, tid); >> /* check if the TID is not in aggregation flow already */ >> if (tid_tx || sta->ampdu_mlme.tid_start_tx[tid]) { >> >> .... >> >> /* >> * Finally, assign it to the start array; the work item will >> * collect it and move it to the normal array. >> */ >> sta->ampdu_mlme.tid_start_tx[tid] = tid_tx; >> >> >> Elsewhere, in ieee80211_ba_session_work, we access the tid_start_tx >> without the sta->lock held, but with the ampdu_mlme.mtx held. > > Yeah, that seems wrong. > >> I think we should probably hold ampdu_mlme.mtx in ieee80211_start_tx_ba_session >> or make sure we hold sta->lock in ieee80211_ba_session_work. > > Can't hold the mutex there, but we can do the lock (I'll comment on your > patch separately) > >> unreferenced object 0xffff880219b4de40 (size 192): >> comm "softirq", pid 0, jiffies 4296416789 (age 1257.971s) >> hex dump (first 32 bytes): >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ >> backtrace: >> [] kmemleak_alloc+0x73/0x98 >> [] slab_post_alloc_hook+0x28/0x2a >> [] kmem_cache_alloc_trace+0xa5/0xcc >> [] ieee80211_start_tx_ba_session+0x24b/0x360 [mac80211] >> [] minstrel_ht_tx_status+0x79a/0x7a9 [mac80211] >> [] ieee80211_tx_status+0x3af/0x947 [mac80211] > > When did this report get printed? I have a system with 100 or so stations constantly trying to associate with a set of APs that can handle < 100. This effectively causes constant churn of re-associations and associated logic... Good for shaking out bugs it seems :) These and other leaks show up after a few minutes of running this test scenario. It's not a huge number of leaks, however...so usually stations go away w/out leaking. > I have a feeling what happens is that start is requested, and then > before ieee80211_ba_session_work() gets a chance to run the station is > destroyed. > > Should probably have something like this: > > diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c > index b429798..aaf68d2 100644 > --- a/net/mac80211/sta_info.c > +++ b/net/mac80211/sta_info.c > @@ -149,6 +149,7 @@ static void cleanup_single_sta(struct sta_info *sta) > * directly by station destruction. > */ > for (i = 0; i < IEEE80211_NUM_TIDS; i++) { > + kfree(sta->ampdu_mlme.tid_start_tx[i]); > tid_tx = rcu_dereference_raw(sta->ampdu_mlme.tid_tx[i]); > if (!tid_tx) > continue; Looks reasonable to me. I was about to start testing similar logic in sta_info_free(), but likely your patch is more proper. I'll give it a try now. Thanks, Ben -- Ben Greear Candela Technologies Inc http://www.candelatech.com