* [PATCH 1/3] ath10k: Ensure there are no stale ar->txqs entries.
@ 2016-08-19 1:26 greearb
2016-08-19 1:26 ` [PATCH 2/3] ath10k: Grab rcu_read_lock before the txqs spinlock greearb
` (3 more replies)
0 siblings, 4 replies; 20+ messages in thread
From: greearb @ 2016-08-19 1:26 UTC (permalink / raw)
To: ath10k; +Cc: Ben Greear, linux-wireless
From: Ben Greear <greearb@candelatech.com>
I was seeing kernel crashes due to accessing freed memory
while debugging a 9984 firmware that was crashing often.
This patch fixes the crashes. I am not certain if there
is a better way or not.
Signed-off-by: Ben Greear <greearb@candelatech.com>
---
drivers/net/wireless/ath/ath10k/mac.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
index 5659ef1..916119c 100644
--- a/drivers/net/wireless/ath/ath10k/mac.c
+++ b/drivers/net/wireless/ath/ath10k/mac.c
@@ -4172,8 +4172,10 @@ static void ath10k_mac_txq_init(struct ieee80211_txq *txq)
static void ath10k_mac_txq_unref(struct ath10k *ar, struct ieee80211_txq *txq)
{
struct ath10k_txq *artxq = (void *)txq->drv_priv;
+ struct ath10k_txq *tmp, *walker;
struct ath10k_skb_cb *cb;
struct sk_buff *msdu;
+ struct ieee80211_txq *txq_tmp;
int msdu_id;
if (!txq)
@@ -4182,6 +4184,14 @@ static void ath10k_mac_txq_unref(struct ath10k *ar, struct ieee80211_txq *txq)
spin_lock_bh(&ar->txqs_lock);
if (!list_empty(&artxq->list))
list_del_init(&artxq->list);
+
+ /* Remove from ar->txqs in case it still exists there. */
+ list_for_each_entry_safe(walker, tmp, &ar->txqs, list) {
+ txq_tmp = container_of((void *)walker, struct ieee80211_txq,
+ drv_priv);
+ if (txq_tmp == txq)
+ list_del(&walker->list);
+ }
spin_unlock_bh(&ar->txqs_lock);
spin_lock_bh(&ar->htt.tx_lock);
--
2.4.11
_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k
^ permalink raw reply related [flat|nested] 20+ messages in thread* [PATCH 2/3] ath10k: Grab rcu_read_lock before the txqs spinlock. 2016-08-19 1:26 [PATCH 1/3] ath10k: Ensure there are no stale ar->txqs entries greearb @ 2016-08-19 1:26 ` greearb 2016-08-19 3:01 ` Manoharan, Rajkumar 2016-09-09 13:36 ` Valo, Kalle 2016-08-19 1:26 ` [PATCH 3/3] ath10k: Improve logging message greearb ` (2 subsequent siblings) 3 siblings, 2 replies; 20+ messages in thread From: greearb @ 2016-08-19 1:26 UTC (permalink / raw) To: ath10k; +Cc: Ben Greear, linux-wireless From: Ben Greear <greearb@candelatech.com> I was seeing some spin-lock hangs in this area of the code, and it seems more proper to do the rcu-read-lock outside of the spin lock. I am not sure how much this matters, however. Signed-off-by: Ben Greear <greearb@candelatech.com> --- drivers/net/wireless/ath/ath10k/mac.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c index 916119c..d96c06e 100644 --- a/drivers/net/wireless/ath/ath10k/mac.c +++ b/drivers/net/wireless/ath/ath10k/mac.c @@ -4307,8 +4307,8 @@ void ath10k_mac_tx_push_pending(struct ath10k *ar) int max; int loop_max = 2000; - spin_lock_bh(&ar->txqs_lock); rcu_read_lock(); + spin_lock_bh(&ar->txqs_lock); last = list_last_entry(&ar->txqs, struct ath10k_txq, list); while (!list_empty(&ar->txqs)) { @@ -4342,8 +4342,8 @@ void ath10k_mac_tx_push_pending(struct ath10k *ar) break; } - rcu_read_unlock(); spin_unlock_bh(&ar->txqs_lock); + rcu_read_unlock(); } /************/ -- 2.4.11 _______________________________________________ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH 2/3] ath10k: Grab rcu_read_lock before the txqs spinlock. 2016-08-19 1:26 ` [PATCH 2/3] ath10k: Grab rcu_read_lock before the txqs spinlock greearb @ 2016-08-19 3:01 ` Manoharan, Rajkumar 2016-08-19 3:28 ` Ben Greear 2016-09-09 13:36 ` Valo, Kalle 1 sibling, 1 reply; 20+ messages in thread From: Manoharan, Rajkumar @ 2016-08-19 3:01 UTC (permalink / raw) To: greearb@candelatech.com, ath10k@lists.infradead.org Cc: linux-wireless@vger.kernel.org > diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c > index 916119c..d96c06e 100644 > --- a/drivers/net/wireless/ath/ath10k/mac.c > +++ b/drivers/net/wireless/ath/ath10k/mac.c > @@ -4307,8 +4307,8 @@ void ath10k_mac_tx_push_pending(struct ath10k *ar) > int max; > int loop_max = 2000; > > - spin_lock_bh(&ar->txqs_lock); > rcu_read_lock(); > + spin_lock_bh(&ar->txqs_lock); > Ben, It is quite possible that push_pending can be preempted after acquiring rcu_lock which may lead to deadlock. no? I assume to prevent that spin_lock is taken first. Could you please explain how this reordering is fixing dead lock? -Rajkumar _______________________________________________ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/3] ath10k: Grab rcu_read_lock before the txqs spinlock. 2016-08-19 3:01 ` Manoharan, Rajkumar @ 2016-08-19 3:28 ` Ben Greear 0 siblings, 0 replies; 20+ messages in thread From: Ben Greear @ 2016-08-19 3:28 UTC (permalink / raw) To: Manoharan, Rajkumar, ath10k@lists.infradead.org Cc: linux-wireless@vger.kernel.org On 08/18/2016 08:01 PM, Manoharan, Rajkumar wrote: >> diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c >> index 916119c..d96c06e 100644 >> --- a/drivers/net/wireless/ath/ath10k/mac.c >> +++ b/drivers/net/wireless/ath/ath10k/mac.c >> @@ -4307,8 +4307,8 @@ void ath10k_mac_tx_push_pending(struct ath10k *ar) >> int max; >> int loop_max = 2000; >> >> - spin_lock_bh(&ar->txqs_lock); >> rcu_read_lock(); >> + spin_lock_bh(&ar->txqs_lock); >> > Ben, > > It is quite possible that push_pending can be preempted after acquiring rcu_lock which > may lead to deadlock. no? I assume to prevent that spin_lock is taken first. > > Could you please explain how this reordering is fixing dead lock? It did not obviously fix the spin lock issue, but the issue went away. Maybe it was because I fixed the stale memory access issues at around the same time. But, I don't think you can deadlock by taking rcu lock first and then the spinlock. I checked all places where the spinlock is held, and I do not see any way for any of them to block forever (In my kernel, I have a 2000 time limit on spinning in the push pending method, which can help make sure we don't spin forever). http://dmz2.candelatech.com/?p=linux-4.7.dev.y/.git;a=commitdiff;h=5d192657269d8e20fce733f894bb1b68df71db00 I was also worried that some calls from mac80211 might be holding rcu_read_lock while calling into ath10k code that would grab the spinlock. If that is the case (and I didn't verify it was), then you could have a lock inversion by taking spinlock before rcu read lock in the push-pending method. Anyway, someone that understands locking subtleties better might have more clue about this code than I do. Thanks, Ben -- Ben Greear <greearb@candelatech.com> Candela Technologies Inc http://www.candelatech.com _______________________________________________ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/3] ath10k: Grab rcu_read_lock before the txqs spinlock. 2016-08-19 1:26 ` [PATCH 2/3] ath10k: Grab rcu_read_lock before the txqs spinlock greearb 2016-08-19 3:01 ` Manoharan, Rajkumar @ 2016-09-09 13:36 ` Valo, Kalle 2016-09-09 14:47 ` Ben Greear 1 sibling, 1 reply; 20+ messages in thread From: Valo, Kalle @ 2016-09-09 13:36 UTC (permalink / raw) To: greearb@candelatech.com Cc: linux-wireless@vger.kernel.org, ath10k@lists.infradead.org greearb@candelatech.com writes: > From: Ben Greear <greearb@candelatech.com> > > I was seeing some spin-lock hangs in this area of the code, > and it seems more proper to do the rcu-read-lock outside of > the spin lock. I am not sure how much this matters, however. > > Signed-off-by: Ben Greear <greearb@candelatech.com> > --- > drivers/net/wireless/ath/ath10k/mac.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c > index 916119c..d96c06e 100644 > --- a/drivers/net/wireless/ath/ath10k/mac.c > +++ b/drivers/net/wireless/ath/ath10k/mac.c > @@ -4307,8 +4307,8 @@ void ath10k_mac_tx_push_pending(struct ath10k *ar) > int max; > int loop_max = 2000; > > - spin_lock_bh(&ar->txqs_lock); > rcu_read_lock(); > + spin_lock_bh(&ar->txqs_lock); > > last = list_last_entry(&ar->txqs, struct ath10k_txq, list); > while (!list_empty(&ar->txqs)) { > @@ -4342,8 +4342,8 @@ void ath10k_mac_tx_push_pending(struct ath10k *ar) > break; > } > > - rcu_read_unlock(); > spin_unlock_bh(&ar->txqs_lock); > + rcu_read_unlock(); I'm no RCU expert but this isn't making any sense. Maybe it changes timings on your kernel so that it hides the real problem? -- Kalle Valo _______________________________________________ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/3] ath10k: Grab rcu_read_lock before the txqs spinlock. 2016-09-09 13:36 ` Valo, Kalle @ 2016-09-09 14:47 ` Ben Greear 2016-09-12 6:41 ` Johannes Berg 0 siblings, 1 reply; 20+ messages in thread From: Ben Greear @ 2016-09-09 14:47 UTC (permalink / raw) To: Valo, Kalle; +Cc: linux-wireless@vger.kernel.org, ath10k@lists.infradead.org On 09/09/2016 06:36 AM, Valo, Kalle wrote: > greearb@candelatech.com writes: > >> From: Ben Greear <greearb@candelatech.com> >> >> I was seeing some spin-lock hangs in this area of the code, >> and it seems more proper to do the rcu-read-lock outside of >> the spin lock. I am not sure how much this matters, however. >> >> Signed-off-by: Ben Greear <greearb@candelatech.com> >> --- >> drivers/net/wireless/ath/ath10k/mac.c | 4 ++-- >> 1 file changed, 2 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c >> index 916119c..d96c06e 100644 >> --- a/drivers/net/wireless/ath/ath10k/mac.c >> +++ b/drivers/net/wireless/ath/ath10k/mac.c >> @@ -4307,8 +4307,8 @@ void ath10k_mac_tx_push_pending(struct ath10k *ar) >> int max; >> int loop_max = 2000; >> >> - spin_lock_bh(&ar->txqs_lock); >> rcu_read_lock(); >> + spin_lock_bh(&ar->txqs_lock); >> >> last = list_last_entry(&ar->txqs, struct ath10k_txq, list); >> while (!list_empty(&ar->txqs)) { >> @@ -4342,8 +4342,8 @@ void ath10k_mac_tx_push_pending(struct ath10k *ar) >> break; >> } >> >> - rcu_read_unlock(); >> spin_unlock_bh(&ar->txqs_lock); >> + rcu_read_unlock(); > > I'm no RCU expert but this isn't making any sense. Maybe it changes > timings on your kernel so that it hides the real problem? I'm not sure this fixed anything or not, it just seemed weird so I changed it. I was hoping someone that understood rcu locking would comment... Thanks, Ben -- Ben Greear <greearb@candelatech.com> Candela Technologies Inc http://www.candelatech.com _______________________________________________ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/3] ath10k: Grab rcu_read_lock before the txqs spinlock. 2016-09-09 14:47 ` Ben Greear @ 2016-09-12 6:41 ` Johannes Berg 2016-09-12 16:37 ` Ben Greear 0 siblings, 1 reply; 20+ messages in thread From: Johannes Berg @ 2016-09-12 6:41 UTC (permalink / raw) To: Ben Greear, Valo, Kalle Cc: linux-wireless@vger.kernel.org, ath10k@lists.infradead.org > > > - rcu_read_unlock(); > > > spin_unlock_bh(&ar->txqs_lock); > > > + rcu_read_unlock(); > > > > I'm no RCU expert but this isn't making any sense. Maybe it changes > > timings on your kernel so that it hides the real problem? > > I'm not sure this fixed anything or not, it just seemed weird so I > changed it. > > I was hoping someone that understood rcu locking would comment... > RCU is no "locking". The sooner you get over that notion, the better. This therefore make no sense whatsoever. In fact, you want to keep the RCU protected section *small*, so having the spinlock inside hurts overall system performance. johannes _______________________________________________ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 2/3] ath10k: Grab rcu_read_lock before the txqs spinlock. 2016-09-12 6:41 ` Johannes Berg @ 2016-09-12 16:37 ` Ben Greear 0 siblings, 0 replies; 20+ messages in thread From: Ben Greear @ 2016-09-12 16:37 UTC (permalink / raw) To: Johannes Berg, Valo, Kalle Cc: linux-wireless@vger.kernel.org, ath10k@lists.infradead.org On 09/11/2016 11:41 PM, Johannes Berg wrote: > >>>> - rcu_read_unlock(); >>>> spin_unlock_bh(&ar->txqs_lock); >>>> + rcu_read_unlock(); >>> >>> I'm no RCU expert but this isn't making any sense. Maybe it changes >>> timings on your kernel so that it hides the real problem? >> >> I'm not sure this fixed anything or not, it just seemed weird so I >> changed it. >> >> I was hoping someone that understood rcu locking would comment... >> > > RCU is no "locking". The sooner you get over that notion, the better. > > This therefore make no sense whatsoever. > > In fact, you want to keep the RCU protected section *small*, so having > the spinlock inside hurts overall system performance. Ok, thanks for the review. I'll drop this patch from my tree. Thanks, Ben -- Ben Greear <greearb@candelatech.com> Candela Technologies Inc http://www.candelatech.com _______________________________________________ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k ^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH 3/3] ath10k: Improve logging message. 2016-08-19 1:26 [PATCH 1/3] ath10k: Ensure there are no stale ar->txqs entries greearb 2016-08-19 1:26 ` [PATCH 2/3] ath10k: Grab rcu_read_lock before the txqs spinlock greearb @ 2016-08-19 1:26 ` greearb 2016-08-19 6:35 ` Mohammed Shafi Shajakhan 2016-09-13 12:29 ` [3/3] " Kalle Valo 2016-08-19 6:59 ` [PATCH 1/3] ath10k: Ensure there are no stale ar->txqs entries Michal Kazior 2016-09-09 17:25 ` Felix Fietkau 3 siblings, 2 replies; 20+ messages in thread From: greearb @ 2016-08-19 1:26 UTC (permalink / raw) To: ath10k; +Cc: Ben Greear, linux-wireless From: Ben Greear <greearb@candelatech.com> Helps to know the sta pointer. Signed-off-by: Ben Greear <greearb@candelatech.com> --- drivers/net/wireless/ath/ath10k/mac.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c index d96c06e..82bc0da 100644 --- a/drivers/net/wireless/ath/ath10k/mac.c +++ b/drivers/net/wireless/ath/ath10k/mac.c @@ -6491,8 +6491,8 @@ static int ath10k_sta_state(struct ieee80211_hw *hw, * Existing station deletion. */ ath10k_dbg(ar, ATH10K_DBG_MAC, - "mac vdev %d peer delete %pM (sta gone)\n", - arvif->vdev_id, sta->addr); + "mac vdev %d peer delete %pM (sta gone) sta: %p\n", + arvif->vdev_id, sta->addr, sta); ret = ath10k_peer_delete(ar, arvif->vdev_id, sta->addr); if (ret) -- 2.4.11 _______________________________________________ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH 3/3] ath10k: Improve logging message. 2016-08-19 1:26 ` [PATCH 3/3] ath10k: Improve logging message greearb @ 2016-08-19 6:35 ` Mohammed Shafi Shajakhan 2016-09-09 13:30 ` Valo, Kalle 2016-09-13 12:29 ` [3/3] " Kalle Valo 1 sibling, 1 reply; 20+ messages in thread From: Mohammed Shafi Shajakhan @ 2016-08-19 6:35 UTC (permalink / raw) To: greearb; +Cc: linux-wireless, ath10k Hi Ben, On Thu, Aug 18, 2016 at 06:26:35PM -0700, greearb@candelatech.com wrote: > From: Ben Greear <greearb@candelatech.com> > > Helps to know the sta pointer. > > Signed-off-by: Ben Greear <greearb@candelatech.com> > --- > drivers/net/wireless/ath/ath10k/mac.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c > index d96c06e..82bc0da 100644 > --- a/drivers/net/wireless/ath/ath10k/mac.c > +++ b/drivers/net/wireless/ath/ath10k/mac.c > @@ -6491,8 +6491,8 @@ static int ath10k_sta_state(struct ieee80211_hw *hw, > * Existing station deletion. > */ > ath10k_dbg(ar, ATH10K_DBG_MAC, > - "mac vdev %d peer delete %pM (sta gone)\n", > - arvif->vdev_id, sta->addr); > + "mac vdev %d peer delete %pM (sta gone) sta: %p\n", > + arvif->vdev_id, sta->addr, sta); good to rebase over Maharaja's change (%p to %pK) https://patchwork.kernel.org/patch/9263691/ > > ret = ath10k_peer_delete(ar, arvif->vdev_id, sta->addr); > if (ret) > -- > 2.4.11 > > > _______________________________________________ > ath10k mailing list > ath10k@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/ath10k _______________________________________________ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 3/3] ath10k: Improve logging message. 2016-08-19 6:35 ` Mohammed Shafi Shajakhan @ 2016-09-09 13:30 ` Valo, Kalle 0 siblings, 0 replies; 20+ messages in thread From: Valo, Kalle @ 2016-09-09 13:30 UTC (permalink / raw) To: Mohammed Shafi Shajakhan Cc: greearb@candelatech.com, linux-wireless@vger.kernel.org, ath10k@lists.infradead.org Mohammed Shafi Shajakhan <mohammed@codeaurora.org> writes: > Hi Ben, > > On Thu, Aug 18, 2016 at 06:26:35PM -0700, greearb@candelatech.com wrote: >> From: Ben Greear <greearb@candelatech.com> >> >> Helps to know the sta pointer. >> >> Signed-off-by: Ben Greear <greearb@candelatech.com> >> --- >> drivers/net/wireless/ath/ath10k/mac.c | 4 ++-- >> 1 file changed, 2 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c >> index d96c06e..82bc0da 100644 >> --- a/drivers/net/wireless/ath/ath10k/mac.c >> +++ b/drivers/net/wireless/ath/ath10k/mac.c >> @@ -6491,8 +6491,8 @@ static int ath10k_sta_state(struct ieee80211_hw *hw, >> * Existing station deletion. >> */ >> ath10k_dbg(ar, ATH10K_DBG_MAC, >> - "mac vdev %d peer delete %pM (sta gone)\n", >> - arvif->vdev_id, sta->addr); >> + "mac vdev %d peer delete %pM (sta gone) sta: %p\n", >> + arvif->vdev_id, sta->addr, sta); > > good to rebase over Maharaja's change (%p to %pK) > https://patchwork.kernel.org/patch/9263691/ I added that to the patch in the pending branch. -- Kalle Valo _______________________________________________ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [3/3] ath10k: Improve logging message. 2016-08-19 1:26 ` [PATCH 3/3] ath10k: Improve logging message greearb 2016-08-19 6:35 ` Mohammed Shafi Shajakhan @ 2016-09-13 12:29 ` Kalle Valo 1 sibling, 0 replies; 20+ messages in thread From: Kalle Valo @ 2016-09-13 12:29 UTC (permalink / raw) To: Ben Greear; +Cc: linux-wireless, ath10k Ben Greear <greearb@candelatech.com> wrote: > From: Ben Greear <greearb@candelatech.com> > > Helps to know the sta pointer. > > Signed-off-by: Ben Greear <greearb@candelatech.com> Thanks, 1 patch applied to ath-next branch of ath.git: 3040420158c1 ath10k: improve logging message -- Sent by pwcli https://patchwork.kernel.org/patch/9289043/ _______________________________________________ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 1/3] ath10k: Ensure there are no stale ar->txqs entries. 2016-08-19 1:26 [PATCH 1/3] ath10k: Ensure there are no stale ar->txqs entries greearb 2016-08-19 1:26 ` [PATCH 2/3] ath10k: Grab rcu_read_lock before the txqs spinlock greearb 2016-08-19 1:26 ` [PATCH 3/3] ath10k: Improve logging message greearb @ 2016-08-19 6:59 ` Michal Kazior 2016-08-19 13:34 ` Ben Greear 2016-09-09 17:25 ` Felix Fietkau 3 siblings, 1 reply; 20+ messages in thread From: Michal Kazior @ 2016-08-19 6:59 UTC (permalink / raw) To: Ben Greear; +Cc: linux-wireless, ath10k@lists.infradead.org On 19 August 2016 at 03:26, <greearb@candelatech.com> wrote: > From: Ben Greear <greearb@candelatech.com> > > I was seeing kernel crashes due to accessing freed memory > while debugging a 9984 firmware that was crashing often. > > This patch fixes the crashes. I am not certain if there > is a better way or not. > > Signed-off-by: Ben Greear <greearb@candelatech.com> > --- > drivers/net/wireless/ath/ath10k/mac.c | 10 ++++++++++ > 1 file changed, 10 insertions(+) > > diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c > index 5659ef1..916119c 100644 > --- a/drivers/net/wireless/ath/ath10k/mac.c > +++ b/drivers/net/wireless/ath/ath10k/mac.c > @@ -4172,8 +4172,10 @@ static void ath10k_mac_txq_init(struct ieee80211_txq *txq) > static void ath10k_mac_txq_unref(struct ath10k *ar, struct ieee80211_txq *txq) > { > struct ath10k_txq *artxq = (void *)txq->drv_priv; > + struct ath10k_txq *tmp, *walker; > struct ath10k_skb_cb *cb; > struct sk_buff *msdu; > + struct ieee80211_txq *txq_tmp; > int msdu_id; > > if (!txq) > @@ -4182,6 +4184,14 @@ static void ath10k_mac_txq_unref(struct ath10k *ar, struct ieee80211_txq *txq) > spin_lock_bh(&ar->txqs_lock); > if (!list_empty(&artxq->list)) > list_del_init(&artxq->list); > + > + /* Remove from ar->txqs in case it still exists there. */ > + list_for_each_entry_safe(walker, tmp, &ar->txqs, list) { > + txq_tmp = container_of((void *)walker, struct ieee80211_txq, > + drv_priv); > + if (txq_tmp == txq) > + list_del(&walker->list); > + } How could this even happen? All artxq->list accesses (add/del) are protected by txqs_lock so this shouldn't happen, no? Do you perhaps have the logic around txqs reworked in your tree? Michał _______________________________________________ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 1/3] ath10k: Ensure there are no stale ar->txqs entries. 2016-08-19 6:59 ` [PATCH 1/3] ath10k: Ensure there are no stale ar->txqs entries Michal Kazior @ 2016-08-19 13:34 ` Ben Greear 2016-12-01 22:52 ` Ben Greear 0 siblings, 1 reply; 20+ messages in thread From: Ben Greear @ 2016-08-19 13:34 UTC (permalink / raw) To: Michal Kazior; +Cc: linux-wireless, ath10k@lists.infradead.org On 08/18/2016 11:59 PM, Michal Kazior wrote: > On 19 August 2016 at 03:26, <greearb@candelatech.com> wrote: >> From: Ben Greear <greearb@candelatech.com> >> >> I was seeing kernel crashes due to accessing freed memory >> while debugging a 9984 firmware that was crashing often. >> >> This patch fixes the crashes. I am not certain if there >> is a better way or not. >> >> Signed-off-by: Ben Greear <greearb@candelatech.com> >> --- >> drivers/net/wireless/ath/ath10k/mac.c | 10 ++++++++++ >> 1 file changed, 10 insertions(+) >> >> diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c >> index 5659ef1..916119c 100644 >> --- a/drivers/net/wireless/ath/ath10k/mac.c >> +++ b/drivers/net/wireless/ath/ath10k/mac.c >> @@ -4172,8 +4172,10 @@ static void ath10k_mac_txq_init(struct ieee80211_txq *txq) >> static void ath10k_mac_txq_unref(struct ath10k *ar, struct ieee80211_txq *txq) >> { >> struct ath10k_txq *artxq = (void *)txq->drv_priv; >> + struct ath10k_txq *tmp, *walker; >> struct ath10k_skb_cb *cb; >> struct sk_buff *msdu; >> + struct ieee80211_txq *txq_tmp; >> int msdu_id; >> >> if (!txq) >> @@ -4182,6 +4184,14 @@ static void ath10k_mac_txq_unref(struct ath10k *ar, struct ieee80211_txq *txq) >> spin_lock_bh(&ar->txqs_lock); >> if (!list_empty(&artxq->list)) >> list_del_init(&artxq->list); >> + >> + /* Remove from ar->txqs in case it still exists there. */ >> + list_for_each_entry_safe(walker, tmp, &ar->txqs, list) { >> + txq_tmp = container_of((void *)walker, struct ieee80211_txq, >> + drv_priv); >> + if (txq_tmp == txq) >> + list_del(&walker->list); >> + } > > How could this even happen? All artxq->list accesses (add/del) are > protected by txqs_lock so this shouldn't happen, no? > > Do you perhaps have the logic around txqs reworked in your tree? I don't have any significant changes as far as I can tell. I can build you a buggy 9984 firmware to reproduce the problem if you want... Maybe the upstream patch could WARN_ON in this case to see if anyone else ever hits it? I did see a comment in the mac80211 about some assumptions on the driver with regard to station teardown...I am not 100% sure ath10k meets that assumption, so maybe that is why I could see this problem. Thanks, Ben -- Ben Greear <greearb@candelatech.com> Candela Technologies Inc http://www.candelatech.com _______________________________________________ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 1/3] ath10k: Ensure there are no stale ar->txqs entries. 2016-08-19 13:34 ` Ben Greear @ 2016-12-01 22:52 ` Ben Greear 2016-12-02 0:24 ` Ben Greear 0 siblings, 1 reply; 20+ messages in thread From: Ben Greear @ 2016-12-01 22:52 UTC (permalink / raw) To: Michal Kazior; +Cc: linux-wireless, ath10k@lists.infradead.org On 08/19/2016 06:34 AM, Ben Greear wrote: > > > On 08/18/2016 11:59 PM, Michal Kazior wrote: >> On 19 August 2016 at 03:26, <greearb@candelatech.com> wrote: >>> From: Ben Greear <greearb@candelatech.com> >>> >>> I was seeing kernel crashes due to accessing freed memory >>> while debugging a 9984 firmware that was crashing often. >>> >>> This patch fixes the crashes. I am not certain if there >>> is a better way or not. I did some more hacking on this today. I think I found some better clue on this. I added this code: static void ath10k_mac_txq_init(struct ath10k *ar, struct ieee80211_txq *txq) { struct ath10k_txq *artxq = (void *)txq->drv_priv; struct ath10k_txq *tmp, *walker; struct ieee80211_txq *txq_tmp; int i = 0; if (!txq) return; spin_lock_bh(&ar->txqs_lock); ar->txqs_lock.rlock.dbg1 = 104; /* Remove from ar->txqs in case it still exists there. */ list_for_each_entry_safe(walker, tmp, &ar->txqs, list) { txq_tmp = container_of((void *)walker, struct ieee80211_txq, drv_priv); if ((++i % 10000) == 0) { ath10k_err(ar, "txq-init: Checking txq_tmp: %p i: %d\n", txq_tmp, i); ath10k_err(ar, "txq-init: txqs: %p walker->list: %p w->next: %p w->prev: %p ar->txqs: %p\n", &ar->txqs, &(walker->list), walker->list.next, walker->list.prev, &ar->txqs); } if (txq_tmp == txq) { WARN_ON_ONCE(1); ath10k_err(ar, "txq-init: Found txq when it should be deleted, txq_tmp: %p txq: %p\n", txq_tmp, txq); list_del(&walker->list); } } spin_unlock_bh(&ar->txqs_lock); INIT_LIST_HEAD(&artxq->list); } [firmware has just crashed] Dec 01 14:43:06 wave2 kernel: ------------[ cut here ]------------ Dec 01 14:43:06 wave2 kernel: WARNING: CPU: 0 PID: 193 at /home/greearb/git/linux-4.7.dev.y/drivers/net/wireless/ath/ath10k/mac.c:4217 ath10k_mac_txq_init+0x1a7/0x1b0 [ath10k_core] Dec 01 14:43:06 wave2 kernel: Modules linked in: nf_conntrack_netlink nf_conntrack nfnetlink nf_defrag_ipv4 bridge 8021q garp mrp stp llc bnep bluetooth fuse macvlan pktgen rpcsec_gss_krb5 nfsv4 nfs fscache coretemp hwmon intel_rapl x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic kvm iTCO_wdt irqbypass iTCO_vendor_support ath10k_pci ath10k_core joydev ath snd_hda_intel mac80211 snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device pcspkr cfg80211 snd_pcm snd_timer snd i2c_i801 lpc_ich shpchp soundcore tpm_tis tpm nfsd auth_rpcgss nfs_acl lockd grace sunrpc serio_raw i915 i2c_algo_bit ata_generic drm_kms_helper pata_acpi e1000e ptp drm pps_core i2c_core fjes video ipv6 [last unloaded: nf_conntrack] Dec 01 14:43:06 wave2 kernel: CPU: 0 PID: 193 Comm: kworker/0:1 Not tainted 4.7.10+ #14 Dec 01 14:43:06 wave2 kernel: Hardware name: To be filled by O.E.M. To be filled by O.E.M./ChiefRiver, BIOS 4.6.5 06/07/2013 Dec 01 14:43:06 wave2 kernel: Workqueue: events_freezable ieee80211_restart_work [mac80211] Dec 01 14:43:06 wave2 kernel: ffffffffa0e29507 ffff8801d14f7920 ffffffff8169ed08 0000000000000000 Dec 01 14:43:06 wave2 kernel: 0000000000000000 ffff8801d14f7968 ffffffff811569bc ffff8801d14e4f00 Dec 01 14:43:06 wave2 kernel: 0000107900000001 ffff8800c43ec9a0 0000000000000027 ffff8800c43ec988 Dec 01 14:43:06 wave2 kernel: Call Trace: Dec 01 14:43:06 wave2 kernel: [<ffffffffa0e29507>] ? ath10k_mac_txq_init+0x1a7/0x1b0 [ath10k_core] Dec 01 14:43:06 wave2 kernel: [<ffffffff8169ed08>] dump_stack+0x85/0xcd Dec 01 14:43:06 wave2 kernel: [<ffffffff811569bc>] __warn+0x10c/0x130 Dec 01 14:43:06 wave2 kernel: [<ffffffff81156b58>] warn_slowpath_null+0x18/0x20 Dec 01 14:43:06 wave2 kernel: [<ffffffffa0e29507>] ath10k_mac_txq_init+0x1a7/0x1b0 [ath10k_core] Dec 01 14:43:06 wave2 kernel: [<ffffffffa0e3766f>] ath10k_sta_state+0x4ef/0x1350 [ath10k_core] Dec 01 14:43:06 wave2 kernel: [<ffffffff811e10ed>] ? mark_lock+0x6d/0x8a0 Dec 01 14:43:06 wave2 kernel: [<ffffffffa0e37180>] ? ath10k_station_assoc+0x920/0x920 [ath10k_core] Dec 01 14:43:06 wave2 kernel: [<ffffffff811ddd14>] ? __lock_is_held+0x84/0xc0 Dec 01 14:43:06 wave2 kernel: [<ffffffffa0c0095f>] drv_sta_state+0xef/0xc50 [mac80211] Dec 01 14:43:06 wave2 kernel: [<ffffffffa0c6b1b0>] ieee80211_reconfig+0x10a0/0x2890 [mac80211] Dec 01 14:43:06 wave2 kernel: [<ffffffffa0bf8361>] ieee80211_restart_work+0xb1/0xf0 [mac80211] Dec 01 14:43:06 wave2 kernel: [<ffffffff81184dad>] process_one_work+0x42d/0xac0 Dec 01 14:43:06 wave2 kernel: [<ffffffff81184cf4>] ? process_one_work+0x374/0xac0 Dec 01 14:43:06 wave2 kernel: [<ffffffff81184980>] ? pwq_dec_nr_in_flight+0x110/0x110 Dec 01 14:43:06 wave2 kernel: [<ffffffff811854c6>] worker_thread+0x86/0x730 Dec 01 14:43:06 wave2 kernel: [<ffffffff81df25aa>] ? _raw_spin_unlock_irqrestore+0x5a/0x70 Dec 01 14:43:06 wave2 kernel: [<ffffffff81185440>] ? process_one_work+0xac0/0xac0 Dec 01 14:43:06 wave2 kernel: [<ffffffff8118f381>] kthread+0x191/0x1b0 Dec 01 14:43:06 wave2 kernel: [<ffffffff8118f1f0>] ? kthread_create_on_node+0x320/0x320 Dec 01 14:43:06 wave2 kernel: [<ffffffff8119baa3>] ? preempt_count_sub+0x13/0xd0 Dec 01 14:43:06 wave2 kernel: [<ffffffff81df2f8f>] ret_from_fork+0x1f/0x40 Dec 01 14:43:06 wave2 kernel: [<ffffffff8118f1f0>] ? kthread_create_on_node+0x320/0x320 Dec 01 14:43:06 wave2 kernel: ---[ end trace e64bc8f0c1a2531b ]--- Dec 01 14:43:06 wave2 kernel: ath10k_pci 0000:05:00.0: txq-init: Found txq when it should be deleted, txq_tmp: ffff8800c43ec988 txq: ffff8800c43ec988 Dec 01 14:43:07 wave2 kernel: ath10k_pci 0000:05:00.0: dropping dbg buffer due to crash since read Thanks, Ben -- Ben Greear <greearb@candelatech.com> Candela Technologies Inc http://www.candelatech.com _______________________________________________ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 1/3] ath10k: Ensure there are no stale ar->txqs entries. 2016-12-01 22:52 ` Ben Greear @ 2016-12-02 0:24 ` Ben Greear 2016-12-05 8:50 ` Michal Kazior 0 siblings, 1 reply; 20+ messages in thread From: Ben Greear @ 2016-12-02 0:24 UTC (permalink / raw) To: Michal Kazior; +Cc: linux-wireless, ath10k@lists.infradead.org On 12/01/2016 02:52 PM, Ben Greear wrote: > On 08/19/2016 06:34 AM, Ben Greear wrote: >> >> >> On 08/18/2016 11:59 PM, Michal Kazior wrote: >>> On 19 August 2016 at 03:26, <greearb@candelatech.com> wrote: >>>> From: Ben Greear <greearb@candelatech.com> >>>> >>>> I was seeing kernel crashes due to accessing freed memory >>>> while debugging a 9984 firmware that was crashing often. >>>> >>>> This patch fixes the crashes. I am not certain if there >>>> is a better way or not. Michal, thanks for the help on IRC. I added this logic: static void ieee80211_drv_tx(struct ieee80211_local *local, struct ieee80211_vif *vif, struct ieee80211_sta *pubsta, struct sk_buff *skb) { struct ieee80211_hdr *hdr = (struct ieee80211_hdr *) skb->data; struct ieee80211_sub_if_data *sdata = vif_to_sdata(vif); struct ieee80211_tx_info *info = IEEE80211_SKB_CB(skb); struct ieee80211_tx_control control = { .sta = pubsta, }; struct ieee80211_txq *txq = NULL; struct txq_info *txqi; u8 ac; if ((info->flags & IEEE80211_TX_CTL_SEND_AFTER_DTIM) || (info->control.flags & IEEE80211_TX_CTRL_PS_RESPONSE)) goto tx_normal; if (!ieee80211_is_data(hdr->frame_control)) goto tx_normal; if (unlikely(!ieee80211_sdata_running(sdata))) { WARN_ON_ONCE(1); goto delete_and_return; } ... if (atomic_read(&sdata->txqs_len[ac]) >= (local->hw.txq_ac_max_pending * 2)) { /* Must be that something is not paying attention to * max-pending, like pktgen, so just drop this frame. */ delete_and_return: ieee80211_free_txskb(&local->hw, skb); return; } But, I still see the txq entries on the ar->txqs list in the ath10k_mac_txq_init after firmware crash in my test case. Is this the test you were suggesting? Thanks, Ben > > > I did some more hacking on this today. I think I found some better clue on this. > > I added this code: > > static void ath10k_mac_txq_init(struct ath10k *ar, struct ieee80211_txq *txq) > { > struct ath10k_txq *artxq = (void *)txq->drv_priv; > struct ath10k_txq *tmp, *walker; > struct ieee80211_txq *txq_tmp; > int i = 0; > > if (!txq) > return; > > spin_lock_bh(&ar->txqs_lock); > ar->txqs_lock.rlock.dbg1 = 104; > > /* Remove from ar->txqs in case it still exists there. */ > list_for_each_entry_safe(walker, tmp, &ar->txqs, list) { > txq_tmp = container_of((void *)walker, struct ieee80211_txq, > drv_priv); > if ((++i % 10000) == 0) { > ath10k_err(ar, "txq-init: Checking txq_tmp: %p i: %d\n", txq_tmp, i); > ath10k_err(ar, "txq-init: txqs: %p walker->list: %p w->next: %p w->prev: %p ar->txqs: %p\n", > &ar->txqs, &(walker->list), walker->list.next, walker->list.prev, &ar->txqs); > } > > if (txq_tmp == txq) { > WARN_ON_ONCE(1); > ath10k_err(ar, "txq-init: Found txq when it should be deleted, txq_tmp: %p txq: %p\n", > txq_tmp, txq); > list_del(&walker->list); > } > } > spin_unlock_bh(&ar->txqs_lock); > > INIT_LIST_HEAD(&artxq->list); > } > > > [firmware has just crashed] > > Dec 01 14:43:06 wave2 kernel: ------------[ cut here ]------------ > Dec 01 14:43:06 wave2 kernel: WARNING: CPU: 0 PID: 193 at /home/greearb/git/linux-4.7.dev.y/drivers/net/wireless/ath/ath10k/mac.c:4217 > ath10k_mac_txq_init+0x1a7/0x1b0 [ath10k_core] > Dec 01 14:43:06 wave2 kernel: Modules linked in: nf_conntrack_netlink nf_conntrack nfnetlink nf_defrag_ipv4 bridge 8021q garp mrp stp llc bnep bluetooth fuse > macvlan pktgen rpcsec_gss_krb5 nfsv4 nfs fscache coretemp hwmon intel_rapl x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_hdmi snd_hda_codec_realtek > snd_hda_codec_generic kvm iTCO_wdt irqbypass iTCO_vendor_support ath10k_pci ath10k_core joydev ath snd_hda_intel mac80211 snd_hda_codec snd_hda_core snd_hwdep > snd_seq snd_seq_device pcspkr cfg80211 snd_pcm snd_timer snd i2c_i801 lpc_ich shpchp soundcore tpm_tis tpm nfsd auth_rpcgss nfs_acl lockd grace sunrpc serio_raw > i915 i2c_algo_bit ata_generic drm_kms_helper pata_acpi e1000e ptp drm pps_core i2c_core fjes video ipv6 [last unloaded: nf_conntrack] > Dec 01 14:43:06 wave2 kernel: CPU: 0 PID: 193 Comm: kworker/0:1 Not tainted 4.7.10+ #14 > Dec 01 14:43:06 wave2 kernel: Hardware name: To be filled by O.E.M. To be filled by O.E.M./ChiefRiver, BIOS 4.6.5 06/07/2013 > Dec 01 14:43:06 wave2 kernel: Workqueue: events_freezable ieee80211_restart_work [mac80211] > Dec 01 14:43:06 wave2 kernel: ffffffffa0e29507 ffff8801d14f7920 ffffffff8169ed08 0000000000000000 > Dec 01 14:43:06 wave2 kernel: 0000000000000000 ffff8801d14f7968 ffffffff811569bc ffff8801d14e4f00 > Dec 01 14:43:06 wave2 kernel: 0000107900000001 ffff8800c43ec9a0 0000000000000027 ffff8800c43ec988 > Dec 01 14:43:06 wave2 kernel: Call Trace: > Dec 01 14:43:06 wave2 kernel: [<ffffffffa0e29507>] ? ath10k_mac_txq_init+0x1a7/0x1b0 [ath10k_core] > Dec 01 14:43:06 wave2 kernel: [<ffffffff8169ed08>] dump_stack+0x85/0xcd > Dec 01 14:43:06 wave2 kernel: [<ffffffff811569bc>] __warn+0x10c/0x130 > Dec 01 14:43:06 wave2 kernel: [<ffffffff81156b58>] warn_slowpath_null+0x18/0x20 > Dec 01 14:43:06 wave2 kernel: [<ffffffffa0e29507>] ath10k_mac_txq_init+0x1a7/0x1b0 [ath10k_core] > Dec 01 14:43:06 wave2 kernel: [<ffffffffa0e3766f>] ath10k_sta_state+0x4ef/0x1350 [ath10k_core] > Dec 01 14:43:06 wave2 kernel: [<ffffffff811e10ed>] ? mark_lock+0x6d/0x8a0 > Dec 01 14:43:06 wave2 kernel: [<ffffffffa0e37180>] ? ath10k_station_assoc+0x920/0x920 [ath10k_core] > Dec 01 14:43:06 wave2 kernel: [<ffffffff811ddd14>] ? __lock_is_held+0x84/0xc0 > Dec 01 14:43:06 wave2 kernel: [<ffffffffa0c0095f>] drv_sta_state+0xef/0xc50 [mac80211] > Dec 01 14:43:06 wave2 kernel: [<ffffffffa0c6b1b0>] ieee80211_reconfig+0x10a0/0x2890 [mac80211] > Dec 01 14:43:06 wave2 kernel: [<ffffffffa0bf8361>] ieee80211_restart_work+0xb1/0xf0 [mac80211] > Dec 01 14:43:06 wave2 kernel: [<ffffffff81184dad>] process_one_work+0x42d/0xac0 > Dec 01 14:43:06 wave2 kernel: [<ffffffff81184cf4>] ? process_one_work+0x374/0xac0 > Dec 01 14:43:06 wave2 kernel: [<ffffffff81184980>] ? pwq_dec_nr_in_flight+0x110/0x110 > Dec 01 14:43:06 wave2 kernel: [<ffffffff811854c6>] worker_thread+0x86/0x730 > Dec 01 14:43:06 wave2 kernel: [<ffffffff81df25aa>] ? _raw_spin_unlock_irqrestore+0x5a/0x70 > Dec 01 14:43:06 wave2 kernel: [<ffffffff81185440>] ? process_one_work+0xac0/0xac0 > Dec 01 14:43:06 wave2 kernel: [<ffffffff8118f381>] kthread+0x191/0x1b0 > Dec 01 14:43:06 wave2 kernel: [<ffffffff8118f1f0>] ? kthread_create_on_node+0x320/0x320 > Dec 01 14:43:06 wave2 kernel: [<ffffffff8119baa3>] ? preempt_count_sub+0x13/0xd0 > Dec 01 14:43:06 wave2 kernel: [<ffffffff81df2f8f>] ret_from_fork+0x1f/0x40 > Dec 01 14:43:06 wave2 kernel: [<ffffffff8118f1f0>] ? kthread_create_on_node+0x320/0x320 > Dec 01 14:43:06 wave2 kernel: ---[ end trace e64bc8f0c1a2531b ]--- > Dec 01 14:43:06 wave2 kernel: ath10k_pci 0000:05:00.0: txq-init: Found txq when it should be deleted, txq_tmp: ffff8800c43ec988 txq: ffff8800c43ec988 > Dec 01 14:43:07 wave2 kernel: ath10k_pci 0000:05:00.0: dropping dbg buffer due to crash since read > > > Thanks, > Ben > -- Ben Greear <greearb@candelatech.com> Candela Technologies Inc http://www.candelatech.com _______________________________________________ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 1/3] ath10k: Ensure there are no stale ar->txqs entries. 2016-12-02 0:24 ` Ben Greear @ 2016-12-05 8:50 ` Michal Kazior 2016-12-05 18:19 ` Ben Greear 0 siblings, 1 reply; 20+ messages in thread From: Michal Kazior @ 2016-12-05 8:50 UTC (permalink / raw) To: Ben Greear; +Cc: linux-wireless, ath10k@lists.infradead.org On 2 December 2016 at 01:24, Ben Greear <greearb@candelatech.com> wrote: > On 12/01/2016 02:52 PM, Ben Greear wrote: >> >> On 08/19/2016 06:34 AM, Ben Greear wrote: >>> >>> >>> >>> On 08/18/2016 11:59 PM, Michal Kazior wrote: >>>> >>>> On 19 August 2016 at 03:26, <greearb@candelatech.com> wrote: >>>>> >>>>> From: Ben Greear <greearb@candelatech.com> >>>>> >>>>> I was seeing kernel crashes due to accessing freed memory >>>>> while debugging a 9984 firmware that was crashing often. >>>>> >>>>> This patch fixes the crashes. I am not certain if there >>>>> is a better way or not. > > > Michal, thanks for the help on IRC. I added this logic: > > static void ieee80211_drv_tx(struct ieee80211_local *local, > struct ieee80211_vif *vif, > struct ieee80211_sta *pubsta, > struct sk_buff *skb) > { > struct ieee80211_hdr *hdr = (struct ieee80211_hdr *) skb->data; > struct ieee80211_sub_if_data *sdata = vif_to_sdata(vif); > struct ieee80211_tx_info *info = IEEE80211_SKB_CB(skb); > struct ieee80211_tx_control control = { > .sta = pubsta, > }; > struct ieee80211_txq *txq = NULL; > struct txq_info *txqi; > u8 ac; > > if ((info->flags & IEEE80211_TX_CTL_SEND_AFTER_DTIM) || > (info->control.flags & IEEE80211_TX_CTRL_PS_RESPONSE)) > goto tx_normal; > > if (!ieee80211_is_data(hdr->frame_control)) > goto tx_normal; > > if (unlikely(!ieee80211_sdata_running(sdata))) { > WARN_ON_ONCE(1); > goto delete_and_return; > } > > ... > > if (atomic_read(&sdata->txqs_len[ac]) >= > (local->hw.txq_ac_max_pending * 2)) { > /* Must be that something is not paying attention to > * max-pending, like pktgen, so just drop this frame. > */ > delete_and_return: > ieee80211_free_txskb(&local->hw, skb); > return; > } > > > But, I still see the txq entries on the ar->txqs list in the > ath10k_mac_txq_init > after firmware crash in my test case. Is this the test you were suggesting? Yes. Now that I think about it mac80211 doesn't call anything in driver during hw_restart that would unref txqs. This means you'll have them still linked when add_interface/sta_state is called, no? This means that either: (a) txq (re-)init should be smarter in ath10k (b) txqs should be purged during hw_restart in ath10k Michał _______________________________________________ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 1/3] ath10k: Ensure there are no stale ar->txqs entries. 2016-12-05 8:50 ` Michal Kazior @ 2016-12-05 18:19 ` Ben Greear 0 siblings, 0 replies; 20+ messages in thread From: Ben Greear @ 2016-12-05 18:19 UTC (permalink / raw) To: Michal Kazior; +Cc: linux-wireless, ath10k@lists.infradead.org On 12/05/2016 12:50 AM, Michal Kazior wrote: > On 2 December 2016 at 01:24, Ben Greear <greearb@candelatech.com> wrote: >> On 12/01/2016 02:52 PM, Ben Greear wrote: >>> >>> On 08/19/2016 06:34 AM, Ben Greear wrote: >>>> >>>> >>>> >>>> On 08/18/2016 11:59 PM, Michal Kazior wrote: >>>>> >>>>> On 19 August 2016 at 03:26, <greearb@candelatech.com> wrote: >>>>>> >>>>>> From: Ben Greear <greearb@candelatech.com> >>>>>> >>>>>> I was seeing kernel crashes due to accessing freed memory >>>>>> while debugging a 9984 firmware that was crashing often. >>>>>> >>>>>> This patch fixes the crashes. I am not certain if there >>>>>> is a better way or not. >> >> >> Michal, thanks for the help on IRC. I added this logic: >> >> static void ieee80211_drv_tx(struct ieee80211_local *local, >> struct ieee80211_vif *vif, >> struct ieee80211_sta *pubsta, >> struct sk_buff *skb) >> { >> struct ieee80211_hdr *hdr = (struct ieee80211_hdr *) skb->data; >> struct ieee80211_sub_if_data *sdata = vif_to_sdata(vif); >> struct ieee80211_tx_info *info = IEEE80211_SKB_CB(skb); >> struct ieee80211_tx_control control = { >> .sta = pubsta, >> }; >> struct ieee80211_txq *txq = NULL; >> struct txq_info *txqi; >> u8 ac; >> >> if ((info->flags & IEEE80211_TX_CTL_SEND_AFTER_DTIM) || >> (info->control.flags & IEEE80211_TX_CTRL_PS_RESPONSE)) >> goto tx_normal; >> >> if (!ieee80211_is_data(hdr->frame_control)) >> goto tx_normal; >> >> if (unlikely(!ieee80211_sdata_running(sdata))) { >> WARN_ON_ONCE(1); >> goto delete_and_return; >> } >> >> ... >> >> if (atomic_read(&sdata->txqs_len[ac]) >= >> (local->hw.txq_ac_max_pending * 2)) { >> /* Must be that something is not paying attention to >> * max-pending, like pktgen, so just drop this frame. >> */ >> delete_and_return: >> ieee80211_free_txskb(&local->hw, skb); >> return; >> } >> >> >> But, I still see the txq entries on the ar->txqs list in the >> ath10k_mac_txq_init >> after firmware crash in my test case. Is this the test you were suggesting? > > Yes. > > Now that I think about it mac80211 doesn't call anything in driver > during hw_restart that would unref txqs. This means you'll have them > still linked when add_interface/sta_state is called, no? > > This means that either: > (a) txq (re-)init should be smarter in ath10k > (b) txqs should be purged during hw_restart in ath10k I posted a patch that does (a) last Friday: "ath10k: work-around for stale txq in ar->txqs" I realized it will not apply upstream because it is also patching the previous work-around I had in the patch that originated this email thread. With these patches and the iterate hack to mac80211, then I no longer see crashes in my test case that previously crashed very readily. Thanks, Ben -- Ben Greear <greearb@candelatech.com> Candela Technologies Inc http://www.candelatech.com _______________________________________________ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 1/3] ath10k: Ensure there are no stale ar->txqs entries. 2016-08-19 1:26 [PATCH 1/3] ath10k: Ensure there are no stale ar->txqs entries greearb ` (2 preceding siblings ...) 2016-08-19 6:59 ` [PATCH 1/3] ath10k: Ensure there are no stale ar->txqs entries Michal Kazior @ 2016-09-09 17:25 ` Felix Fietkau 2016-09-09 17:46 ` Ben Greear 3 siblings, 1 reply; 20+ messages in thread From: Felix Fietkau @ 2016-09-09 17:25 UTC (permalink / raw) To: greearb, ath10k; +Cc: linux-wireless On 2016-08-19 03:26, greearb@candelatech.com wrote: > From: Ben Greear <greearb@candelatech.com> > > I was seeing kernel crashes due to accessing freed memory > while debugging a 9984 firmware that was crashing often. > > This patch fixes the crashes. I am not certain if there > is a better way or not. > > Signed-off-by: Ben Greear <greearb@candelatech.com> > --- > drivers/net/wireless/ath/ath10k/mac.c | 10 ++++++++++ > 1 file changed, 10 insertions(+) > > diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c > index 5659ef1..916119c 100644 > --- a/drivers/net/wireless/ath/ath10k/mac.c > +++ b/drivers/net/wireless/ath/ath10k/mac.c > @@ -4172,8 +4172,10 @@ static void ath10k_mac_txq_init(struct ieee80211_txq *txq) > static void ath10k_mac_txq_unref(struct ath10k *ar, struct ieee80211_txq *txq) > { > struct ath10k_txq *artxq = (void *)txq->drv_priv; > + struct ath10k_txq *tmp, *walker; > struct ath10k_skb_cb *cb; > struct sk_buff *msdu; > + struct ieee80211_txq *txq_tmp; > int msdu_id; > > if (!txq) > @@ -4182,6 +4184,14 @@ static void ath10k_mac_txq_unref(struct ath10k *ar, struct ieee80211_txq *txq) > spin_lock_bh(&ar->txqs_lock); > if (!list_empty(&artxq->list)) > list_del_init(&artxq->list); > + > + /* Remove from ar->txqs in case it still exists there. */ > + list_for_each_entry_safe(walker, tmp, &ar->txqs, list) { > + txq_tmp = container_of((void *)walker, struct ieee80211_txq, > + drv_priv); > + if (txq_tmp == txq) > + list_del(&walker->list); > + } This makes no sense at all. From txq_tmp == txq we can deduce that walker == artxq. In the context above, it already does a list_del_init(&artxq->list). - Felix _______________________________________________ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 1/3] ath10k: Ensure there are no stale ar->txqs entries. 2016-09-09 17:25 ` Felix Fietkau @ 2016-09-09 17:46 ` Ben Greear 0 siblings, 0 replies; 20+ messages in thread From: Ben Greear @ 2016-09-09 17:46 UTC (permalink / raw) To: Felix Fietkau, ath10k; +Cc: linux-wireless On 09/09/2016 10:25 AM, Felix Fietkau wrote: > On 2016-08-19 03:26, greearb@candelatech.com wrote: >> From: Ben Greear <greearb@candelatech.com> >> >> I was seeing kernel crashes due to accessing freed memory >> while debugging a 9984 firmware that was crashing often. >> >> This patch fixes the crashes. I am not certain if there >> is a better way or not. >> >> Signed-off-by: Ben Greear <greearb@candelatech.com> >> --- >> drivers/net/wireless/ath/ath10k/mac.c | 10 ++++++++++ >> 1 file changed, 10 insertions(+) >> >> diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c >> index 5659ef1..916119c 100644 >> --- a/drivers/net/wireless/ath/ath10k/mac.c >> +++ b/drivers/net/wireless/ath/ath10k/mac.c >> @@ -4172,8 +4172,10 @@ static void ath10k_mac_txq_init(struct ieee80211_txq *txq) >> static void ath10k_mac_txq_unref(struct ath10k *ar, struct ieee80211_txq *txq) >> { >> struct ath10k_txq *artxq = (void *)txq->drv_priv; >> + struct ath10k_txq *tmp, *walker; >> struct ath10k_skb_cb *cb; >> struct sk_buff *msdu; >> + struct ieee80211_txq *txq_tmp; >> int msdu_id; >> >> if (!txq) >> @@ -4182,6 +4184,14 @@ static void ath10k_mac_txq_unref(struct ath10k *ar, struct ieee80211_txq *txq) >> spin_lock_bh(&ar->txqs_lock); >> if (!list_empty(&artxq->list)) >> list_del_init(&artxq->list); >> + >> + /* Remove from ar->txqs in case it still exists there. */ >> + list_for_each_entry_safe(walker, tmp, &ar->txqs, list) { >> + txq_tmp = container_of((void *)walker, struct ieee80211_txq, >> + drv_priv); >> + if (txq_tmp == txq) >> + list_del(&walker->list); >> + } > This makes no sense at all. From txq_tmp == txq we can deduce that > walker == artxq. In the context above, it already does a > list_del_init(&artxq->list). This fixed my problem, so something about this matters. Possibly it works around some other race, just possibly it is because of some other regression/bug in my driver/kernel. I thought maybe the issue was that flushing doesn't really work for ath10k, so when the upper stack tried to flush and delete a station there were still skbs in the driver that were referencing the txqs up in mac80211. Also, this bug was triggered by firmware that crashed very often on transmit of an skb, so in general there were skbs that were not properly transmitted and maybe that also triggers some other bug/race in the driver. Thanks, Ben -- Ben Greear <greearb@candelatech.com> Candela Technologies Inc http://www.candelatech.com _______________________________________________ ath10k mailing list ath10k@lists.infradead.org http://lists.infradead.org/mailman/listinfo/ath10k ^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2016-12-05 18:47 UTC | newest] Thread overview: 20+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2016-08-19 1:26 [PATCH 1/3] ath10k: Ensure there are no stale ar->txqs entries greearb 2016-08-19 1:26 ` [PATCH 2/3] ath10k: Grab rcu_read_lock before the txqs spinlock greearb 2016-08-19 3:01 ` Manoharan, Rajkumar 2016-08-19 3:28 ` Ben Greear 2016-09-09 13:36 ` Valo, Kalle 2016-09-09 14:47 ` Ben Greear 2016-09-12 6:41 ` Johannes Berg 2016-09-12 16:37 ` Ben Greear 2016-08-19 1:26 ` [PATCH 3/3] ath10k: Improve logging message greearb 2016-08-19 6:35 ` Mohammed Shafi Shajakhan 2016-09-09 13:30 ` Valo, Kalle 2016-09-13 12:29 ` [3/3] " Kalle Valo 2016-08-19 6:59 ` [PATCH 1/3] ath10k: Ensure there are no stale ar->txqs entries Michal Kazior 2016-08-19 13:34 ` Ben Greear 2016-12-01 22:52 ` Ben Greear 2016-12-02 0:24 ` Ben Greear 2016-12-05 8:50 ` Michal Kazior 2016-12-05 18:19 ` Ben Greear 2016-09-09 17:25 ` Felix Fietkau 2016-09-09 17:46 ` Ben Greear
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).