* [RFC] ath9k: Fix softlockup in AR9485
@ 2012-05-30 16:10 Mohammed Shafi Shajakhan
2012-05-30 18:48 ` Adrian Chadd
2012-05-31 2:44 ` [ath9k-devel] " Sujith Manoharan
0 siblings, 2 replies; 9+ messages in thread
From: Mohammed Shafi Shajakhan @ 2012-05-30 16:10 UTC (permalink / raw)
To: John W. Linville
Cc: linux-wireless, Rodriguez Luis, ath9k-devel,
Mohammed Shafi Shajakhan, stable
From: Mohammed Shafi Shajakhan <mohammed@qca.qualcomm.com>
steps to recreate:
load latest ath9k driver with AR9485
stop the network-manager and wpa_supplicant
bring the interface up
Call Trace:
[<ffffffffa0517490>] ? ath_hw_check+0xe0/0xe0 [ath9k]
[<ffffffff812cd1e8>] __const_udelay+0x28/0x30
[<ffffffffa03bae7a>] ar9003_get_pll_sqsum_dvc+0x4a/0x80 [ath9k_hw]
[<ffffffffa05174eb>] ath_hw_pll_work+0x5b/0xe0 [ath9k]
[<ffffffff810744fe>] process_one_work+0x11e/0x470
[<ffffffff8107530f>] worker_thread+0x15f/0x360
[<ffffffff810751b0>] ? manage_workers+0x230/0x230
[<ffffffff81079af3>] kthread+0x93/0xa0
[<ffffffff815fd3a4>] kernel_thread_helper+0x4/0x10
[<ffffffff81079a60>] ? kthread_freezable_should_stop+0x70/0x70
[<ffffffff815fd3a0>] ? gs_change+0x13/0x13
before the STA is associated PLL4(0x1618c) always seem to dump
zero, which causes a softlockup as we keep on polling infinitely
PLL4's 8th bit. Once PLL4'S 8th bit is set indicates we can take
PLL3(0x16188) readings which tells us whether PLL is locked or not.
if the PLL is not locked we have a rx hang.
It does not looks safe to poll PLL4 before the STA is associated.
so it is safe to check this rx hang after association, where we might
have a possibility of rx hang under stress testing. digging into the
PLL stuff did not yield anything much better.
previously the register dumped 0xdeadbeef and the hw_pll_work
itself is cancelled by ath_radio_disable(obselete)
fixes https://bugzilla.redhat.com/show_bug.cgi?id=811142
Reported-by: Rolf Offermanns <rolf.offermanns@gmx.net>
Cc: stable@vger.kernel.org
Signed-off-by: Mohammed Shafi Shajakhan <mohammed@qca.qualcomm.com>
---
drivers/net/wireless/ath/ath9k/main.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)
diff --git a/drivers/net/wireless/ath/ath9k/main.c b/drivers/net/wireless/ath/ath9k/main.c
index dfa78e8..a03ad3e 100644
--- a/drivers/net/wireless/ath/ath9k/main.c
+++ b/drivers/net/wireless/ath/ath9k/main.c
@@ -970,6 +970,9 @@ void ath_hw_pll_work(struct work_struct *work)
hw_pll_work.work);
u32 pll_sqsum;
+ if (!(sc->sc_flags & SC_OP_PRIM_STA_VIF))
+ return;
+
if (AR_SREV_9485(sc->sc_ah)) {
ath9k_ps_wakeup(sc);
--
1.7.0.4
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [RFC] ath9k: Fix softlockup in AR9485
2012-05-30 16:10 [RFC] ath9k: Fix softlockup in AR9485 Mohammed Shafi Shajakhan
@ 2012-05-30 18:48 ` Adrian Chadd
2012-05-31 4:49 ` Mohammed Shafi Shajakhan
2012-05-31 2:44 ` [ath9k-devel] " Sujith Manoharan
1 sibling, 1 reply; 9+ messages in thread
From: Adrian Chadd @ 2012-05-30 18:48 UTC (permalink / raw)
To: Mohammed Shafi Shajakhan
Cc: John W. Linville, linux-wireless, Rodriguez Luis, ath9k-devel,
stable
(Yes, this is a public response.)
Hi,
That has me a bit concerned. Waiting for the STA to be associated is
really just some kind of "settling" delay. You're still TX/RXing at
that point.
I think we need to speak to the baseband/clock teams and figure out
why this is actually occuring.
Can we please leave this out of the tree (but in the bug report) until
this gets root caused internally?
Thanks,
adrian
^ permalink raw reply [flat|nested] 9+ messages in thread
* [ath9k-devel] [RFC] ath9k: Fix softlockup in AR9485
2012-05-30 16:10 [RFC] ath9k: Fix softlockup in AR9485 Mohammed Shafi Shajakhan
2012-05-30 18:48 ` Adrian Chadd
@ 2012-05-31 2:44 ` Sujith Manoharan
2012-05-31 5:34 ` Mohammed Shafi Shajakhan
1 sibling, 1 reply; 9+ messages in thread
From: Sujith Manoharan @ 2012-05-31 2:44 UTC (permalink / raw)
To: Mohammed Shafi Shajakhan
Cc: John W. Linville, linux-wireless, Mohammed, stable,
Rodriguez Luis, ath9k-devel
Mohammed Shafi Shajakhan wrote:
> before the STA is associated PLL4(0x1618c) always seem to dump
> zero, which causes a softlockup as we keep on polling infinitely
> PLL4's 8th bit. Once PLL4'S 8th bit is set indicates we can take
> PLL3(0x16188) readings which tells us whether PLL is locked or not.
> if the PLL is not locked we have a rx hang.
> It does not looks safe to poll PLL4 before the STA is associated.
> so it is safe to check this rx hang after association, where we might
> have a possibility of rx hang under stress testing. digging into the
> PLL stuff did not yield anything much better.
>
> previously the register dumped 0xdeadbeef and the hw_pll_work
> itself is cancelled by ath_radio_disable(obselete)
>
> fixes https://bugzilla.redhat.com/show_bug.cgi?id=811142
>
> Reported-by: Rolf Offermanns <rolf.offermanns@gmx.net>
> Cc: stable@vger.kernel.org
> Signed-off-by: Mohammed Shafi Shajakhan <mohammed@qca.qualcomm.com>
> ---
> drivers/net/wireless/ath/ath9k/main.c | 3 +++
> 1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/net/wireless/ath/ath9k/main.c b/drivers/net/wireless/ath/ath9k/main.c
> index dfa78e8..a03ad3e 100644
> --- a/drivers/net/wireless/ath/ath9k/main.c
> +++ b/drivers/net/wireless/ath/ath9k/main.c
> @@ -970,6 +970,9 @@ void ath_hw_pll_work(struct work_struct *work)
> hw_pll_work.work);
> u32 pll_sqsum;
>
> + if (!(sc->sc_flags & SC_OP_PRIM_STA_VIF))
> + return;
> +
> if (AR_SREV_9485(sc->sc_ah)) {
>
> ath9k_ps_wakeup(sc);
All this is really racy, I think. In the various callsites where we issue a HW
reset by queuing a work, there is no guarantee that the caller would not be
invoked once again before the reset work has had a chance to kick in. And there is
nothing that prevents the various poll routines from checking the HW while a
reset in progress.
And why are we using HZ for the pll_work ?
Sujith
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC] ath9k: Fix softlockup in AR9485
2012-05-30 18:48 ` Adrian Chadd
@ 2012-05-31 4:49 ` Mohammed Shafi Shajakhan
0 siblings, 0 replies; 9+ messages in thread
From: Mohammed Shafi Shajakhan @ 2012-05-31 4:49 UTC (permalink / raw)
To: Adrian Chadd
Cc: John W. Linville, linux-wireless, Rodriguez Luis, ath9k-devel,
stable
Hi Adrian,
On Thursday 31 May 2012 12:18 AM, Adrian Chadd wrote:
> (Yes, this is a public response.)
>
> Hi,
>
> That has me a bit concerned. Waiting for the STA to be associated is
> really just some kind of "settling" delay. You're still TX/RXing at
> that point.
yeah, but i am pretty sure in the STA mode we have this PLL registers
having sane values once the STA is associated. i checked if i am missing
hardware code/init_pll routine or if its actually overwritten by INI
values, but no. i also checked this issue with IBSS and monitor mode
even under shield room, still it did not show much improvements or more
clue.
>
> I think we need to speak to the baseband/clock teams and figure out
> why this is actually occuring.
oh yes and i had been already sending some mails internally, checking
with people who might have good knowledge about this.
>
> Can we please leave this out of the tree (but in the bug report) until
> this gets root caused internally?
one thing i observed was this rx hang detection is based on the 'delta
of rx unicast packets' for a particular time. if its lesser than 5 this
rx hang detection kicks off. another system expert told me this rx hang
itself can be specific to AP rather than STA.
finally i would check with one system expert and another Engineer to get
some thoughts, then send it as a PATCH.
>
> Thanks,
>
>
>
> adrian
--
thanks,
shafi
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [ath9k-devel] [RFC] ath9k: Fix softlockup in AR9485
2012-05-31 2:44 ` [ath9k-devel] " Sujith Manoharan
@ 2012-05-31 5:34 ` Mohammed Shafi Shajakhan
2012-05-31 5:41 ` Sujith Manoharan
0 siblings, 1 reply; 9+ messages in thread
From: Mohammed Shafi Shajakhan @ 2012-05-31 5:34 UTC (permalink / raw)
To: Sujith Manoharan
Cc: John W. Linville, linux-wireless, Mohammed Shafi Shajakhan,
stable, Rodriguez Luis, ath9k-devel
Hi Sujith,
On Thursday 31 May 2012 08:14 AM, Sujith Manoharan wrote:
> Mohammed Shafi Shajakhan wrote:
>> before the STA is associated PLL4(0x1618c) always seem to dump
>> zero, which causes a softlockup as we keep on polling infinitely
>> PLL4's 8th bit. Once PLL4'S 8th bit is set indicates we can take
>> PLL3(0x16188) readings which tells us whether PLL is locked or not.
>> if the PLL is not locked we have a rx hang.
>> It does not looks safe to poll PLL4 before the STA is associated.
>> so it is safe to check this rx hang after association, where we might
>> have a possibility of rx hang under stress testing. digging into the
>> PLL stuff did not yield anything much better.
>>
>> previously the register dumped 0xdeadbeef and the hw_pll_work
>> itself is cancelled by ath_radio_disable(obselete)
>>
>> fixes https://bugzilla.redhat.com/show_bug.cgi?id=811142
>>
>> Reported-by: Rolf Offermanns<rolf.offermanns@gmx.net>
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Mohammed Shafi Shajakhan<mohammed@qca.qualcomm.com>
>> ---
>> drivers/net/wireless/ath/ath9k/main.c | 3 +++
>> 1 files changed, 3 insertions(+), 0 deletions(-)
>>
>> diff --git a/drivers/net/wireless/ath/ath9k/main.c b/drivers/net/wireless/ath/ath9k/main.c
>> index dfa78e8..a03ad3e 100644
>> --- a/drivers/net/wireless/ath/ath9k/main.c
>> +++ b/drivers/net/wireless/ath/ath9k/main.c
>> @@ -970,6 +970,9 @@ void ath_hw_pll_work(struct work_struct *work)
>> hw_pll_work.work);
>> u32 pll_sqsum;
>>
>> + if (!(sc->sc_flags& SC_OP_PRIM_STA_VIF))
>> + return;
>> +
>> if (AR_SREV_9485(sc->sc_ah)) {
>>
>> ath9k_ps_wakeup(sc);
>
>
> All this is really racy, I think. In the various callsites where we issue a HW
> reset by queuing a work, there is no guarantee that the caller would not be
> invoked once again before the reset work has had a chance to kick in. And there is
> nothing that prevents the various poll routines from checking the HW while a
> reset in progress.
should we check if (work_pending(&sc->hw_reset_work)) like we do in
ath_beacon_tasklet in all the other work routine that access hardware
registers in case the hw_reset_work is not cancelled(possible in
ath_set_channel)
>
> And why are we using HZ for the pll_work ?
don't know, i put a printk to see the routine seems to get executed for
every 200ms(HZ/5). scan work in mac80211 seems to use HZ
ieee80211_queue_delayed_work(&local->hw, &local->scan_work, next_delay);
>
> Sujith
--
thanks,
shafi
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [ath9k-devel] [RFC] ath9k: Fix softlockup in AR9485
2012-05-31 5:34 ` Mohammed Shafi Shajakhan
@ 2012-05-31 5:41 ` Sujith Manoharan
2012-05-31 5:52 ` Mohammed Shafi Shajakhan
0 siblings, 1 reply; 9+ messages in thread
From: Sujith Manoharan @ 2012-05-31 5:41 UTC (permalink / raw)
To: Mohammed Shafi Shajakhan
Cc: Sujith Manoharan, John W. Linville, linux-wireless, stable,
Rodriguez Luis, ath9k-devel
Mohammed Shafi Shajakhan wrote:
> should we check if (work_pending(&sc->hw_reset_work)) like we do in
> ath_beacon_tasklet in all the other work routine that access hardware
> registers in case the hw_reset_work is not cancelled(possible in
> ath_set_channel)
Probably, but this needs to be done in a clean manner.
> don't know, i put a printk to see the routine seems to get executed for
> every 200ms(HZ/5). scan work in mac80211 seems to use HZ
> ieee80211_queue_delayed_work(&local->hw, &local->scan_work, next_delay);
HZ can be configured to a user-defined value, so we need to make sure
that the timing for the PLL work is what we want.
Sujith
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [ath9k-devel] [RFC] ath9k: Fix softlockup in AR9485
2012-05-31 5:41 ` Sujith Manoharan
@ 2012-05-31 5:52 ` Mohammed Shafi Shajakhan
2012-06-08 12:16 ` Sujith Manoharan
0 siblings, 1 reply; 9+ messages in thread
From: Mohammed Shafi Shajakhan @ 2012-05-31 5:52 UTC (permalink / raw)
To: Sujith Manoharan
Cc: John W. Linville, linux-wireless, stable, Rodriguez Luis,
ath9k-devel, Mohammed Shafi Shajakhan
On Thursday 31 May 2012 11:11 AM, Sujith Manoharan wrote:
> Mohammed Shafi Shajakhan wrote:
>> should we check if (work_pending(&sc->hw_reset_work)) like we do in
>> ath_beacon_tasklet in all the other work routine that access hardware
>> registers in case the hw_reset_work is not cancelled(possible in
>> ath_set_channel)
>
> Probably, but this needs to be done in a clean manner.
ok.
>
>> don't know, i put a printk to see the routine seems to get executed for
>> every 200ms(HZ/5). scan work in mac80211 seems to use HZ
>> ieee80211_queue_delayed_work(&local->hw,&local->scan_work, next_delay);
>
> HZ can be configured to a user-defined value, so we need to make sure
> that the timing for the PLL work is what we want.
ok, then we will change it.
>
> Sujith
--
thanks,
shafi
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [ath9k-devel] [RFC] ath9k: Fix softlockup in AR9485
2012-05-31 5:52 ` Mohammed Shafi Shajakhan
@ 2012-06-08 12:16 ` Sujith Manoharan
2012-06-11 5:03 ` Mohammed Shafi Shajakhan
0 siblings, 1 reply; 9+ messages in thread
From: Sujith Manoharan @ 2012-06-08 12:16 UTC (permalink / raw)
To: Mohammed Shafi Shajakhan
Cc: Sujith Manoharan, John W. Linville, linux-wireless, stable,
Rodriguez Luis, ath9k-devel
Mohammed Shafi Shajakhan wrote:
> > HZ can be configured to a user-defined value, so we need to make sure
> > that the timing for the PLL work is what we want.
>
> ok, then we will change it.
Shafi, will you be sending an updated version of this patch ?
AR9485 is unusable currently - just bringing up the interface is
enough to hang the machine...
Sujith
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [ath9k-devel] [RFC] ath9k: Fix softlockup in AR9485
2012-06-08 12:16 ` Sujith Manoharan
@ 2012-06-11 5:03 ` Mohammed Shafi Shajakhan
0 siblings, 0 replies; 9+ messages in thread
From: Mohammed Shafi Shajakhan @ 2012-06-11 5:03 UTC (permalink / raw)
To: Sujith Manoharan
Cc: John W. Linville, linux-wireless, stable, Rodriguez Luis,
ath9k-devel
Hi Sujith,
On Friday 08 June 2012 05:46 PM, Sujith Manoharan wrote:
> Mohammed Shafi Shajakhan wrote:
>>> HZ can be configured to a user-defined value, so we need to make sure
>>> that the timing for the PLL work is what we want.
>>
>> ok, then we will change it.
>
> Shafi, will you be sending an updated version of this patch ?
> AR9485 is unusable currently - just bringing up the interface is
> enough to hang the machine...
>
i will send an updated version based on the latest wireless-testing
tree. thanks.
--
thanks,
shafi
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2012-06-11 5:03 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-05-30 16:10 [RFC] ath9k: Fix softlockup in AR9485 Mohammed Shafi Shajakhan
2012-05-30 18:48 ` Adrian Chadd
2012-05-31 4:49 ` Mohammed Shafi Shajakhan
2012-05-31 2:44 ` [ath9k-devel] " Sujith Manoharan
2012-05-31 5:34 ` Mohammed Shafi Shajakhan
2012-05-31 5:41 ` Sujith Manoharan
2012-05-31 5:52 ` Mohammed Shafi Shajakhan
2012-06-08 12:16 ` Sujith Manoharan
2012-06-11 5:03 ` Mohammed Shafi Shajakhan
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).