* Re: ath10k + INTEL_IDLE aka. cstates == firmware crash
2015-02-23 13:08 ath10k + INTEL_IDLE aka. cstates == firmware crash Fabian Wittenberg
@ 2015-02-23 13:32 ` Michal Kazior
2015-02-23 13:44 ` Fabian Wittenberg
2015-02-23 16:58 ` Ben Greear
2015-03-08 13:45 ` Jeremias Blendin
2 siblings, 1 reply; 16+ messages in thread
From: Michal Kazior @ 2015-02-23 13:32 UTC (permalink / raw)
To: Fabian Wittenberg; +Cc: ath10k@lists.infradead.org
On 23 February 2015 at 14:08, Fabian Wittenberg
<Fabian.Wittenberg@sophos.com> wrote:
> Hi all,
>
> We are using the brand-new QCA988x chipset on mini-PCIe cards in our newest Wi-Fi-enabled firewall appliance, and we have had
> a lot of problems getting it running (Intel Rangeley platform; Intel(R) Atom(TM) CPU C2558 @ 2.40GHz).
I recall someone complaining that his Atom-based laptop wasn't happy
running ath10k either, but I think that was an electrical incompatibility:
the machine didn't even POST when the card was plugged into the mPCIe
slot.
> The card crashes after a few minutes with the ath10k driver (backports-3.19-rc1). Older versions are affected as well,
> at least down to 3.12.20. After intensive debugging I found that there
> are major issues as soon as the Intel processor C-states are used. This
> option is called "CONFIG_INTEL_IDLE" in the kernel config. This seems to be
> a very serious issue, as it can even lead to low-memory corruption and
> kernel freezes. The low-memory corruption doesn't always occur, just sometimes, which makes it hard to debug.
> Also, you need a multiprocessor system to trigger the issue.
> If you set the kernel parameter "maxcpus=1", the error doesn't occur even if you enable CONFIG_INTEL_IDLE.
A quick search turned up this:
https://bugzilla.redhat.com/show_bug.cgi?id=715485
It looks like some BIOSes have buggy C-state handling. Maybe
that's the root cause? In my experience QCA988x can sometimes be
quirky when it comes to PCIe, so I wouldn't be surprised if other
devices don't crash.
> Kernel output looks like this if the card stops working:
>
>
> [ 3715.145865] ath10k: failed to install key for vdev 2 peer 00:1a:8c:0a:b5:01: -11
> [ 3715.145876] wifi1: failed to remove key (1, ff:ff:ff:ff:ff:ff) from hardware (-11)
> [ 3718.148226] ath10k: failed to install key for vdev 2 peer 00:1a:8c:0a:b5:01: -11
> [ 3718.148236] wifi1: failed to set key (1, ff:ff:ff:ff:ff:ff) to hardware (-11)
> [ 3723.152167] ath10k: failed to install key for vdev 0 peer 00:1a:8c:0a:34:01: -11
> [ 3723.152178] wifi0: failed to remove key (1, ff:ff:ff:ff:ff:ff) from hardware (-11)
> [ 3723.152185] ath10k: failed to transmit management frame via WMI: -11
> [ 3726.154524] ath10k: failed to install key for vdev 0 peer 00:1a:8c:0a:34:01: -11
> [ 3726.154535] wifi0: failed to set key (1, ff:ff:ff:ff:ff:ff) to hardware (-11)
> [ 3729.156884] ath10k: failed to install key for vdev 0 peer 00:0e:8e:ae:5c:1c: -11
> [ 3729.156890] ath10k: failed to transmit management frame via WMI: -11
> [ 3729.156904] wifi0: failed to remove key (0, 00:0e:8e:ae:5c:1c) from hardware (-11)
> [ 3732.159255] ath10k: failed to remove peer wep key 0: -11
> [ 3732.159265] ath10k: failed to clear all peer wep keys for vdev 0: -11
> [ 3732.159273] ath10k: failed to disassociate station: 00:0e:8e:ae:5c:1c vdev 0: -11
[...]
It seems the firmware stopped replenishing WMI-HTC Tx credits. This is most
likely not the management-frame-related Tx credit starvation; instead,
communication with the device is really broken.
> Sometimes, but not always, dmesg shows the message "firmware crashed!", but whichever error message appears,
> the behavior is always the same: the card stops working until reboot. Unloading/reloading ath10k_pci, ath10k_core and ath doesn't help in this case.
> The underlying problem behind all the error messages I have seen so far is a broken link between the card's firmware and the ath10k driver.
> Depending on when this "connection loss" happens, the error messages differ a little,
> as they are closely tied to the driver's current state while it is trying to talk to the card's firmware via WMI.
>
> To reproduce, you have to wait between 3 and 60 minutes to see the crash. You can increase the likelihood of a crash by increasing
> the amount of Wi-Fi traffic from foreign networks on the same channel.
> I tested with four laptops connected to four QCA988x cards (AP mode). With that setup it takes around 3-10 minutes to reproduce.
>
> If you need more information I'm at your disposal.
It'd be nice to know which firmware you're using. Generally I would
discourage using 999.999.0.636 because it's very old.
Michał
_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k
* Re: ath10k + INTEL_IDLE aka. cstates == firmware crash
2015-02-23 13:32 ` Michal Kazior
@ 2015-02-23 13:44 ` Fabian Wittenberg
2015-02-23 14:20 ` Michal Kazior
0 siblings, 1 reply; 16+ messages in thread
From: Fabian Wittenberg @ 2015-02-23 13:44 UTC (permalink / raw)
To: Michal Kazior; +Cc: ath10k
Hi Michal,
I used firmware versions 10.1 and 10.2 from here:
https://github.com/kvalo/ath10k-firmware. Both show the same behavior.
You are right. Some BIOSes handle C-states strangely, but we have no
influence on the BIOS, as it is provided by our hardware vendor. We
experimented a lot with the MSI masking bit of the PCIe root bridge that
the ac card is connected to, but playing around with this bit brought no
noticeable improvement.
We have tested the same boards with cards that use ath9k as well. They
work just fine, with and without INTEL_IDLE enabled...
Regards,
Fabian
* Re: ath10k + INTEL_IDLE aka. cstates == firmware crash
2015-02-23 13:44 ` Fabian Wittenberg
@ 2015-02-23 14:20 ` Michal Kazior
2015-02-23 14:41 ` Fabian Wittenberg
0 siblings, 1 reply; 16+ messages in thread
From: Michal Kazior @ 2015-02-23 14:20 UTC (permalink / raw)
To: Fabian Wittenberg; +Cc: ath10k@lists.infradead.org
On 23 February 2015 at 14:44, Fabian Wittenberg
<Fabian.Wittenberg@sophos.com> wrote:
> Hi Michal,
>
> I used firmware versions 10.1 and 10.2 from here:
> https://github.com/kvalo/ath10k-firmware. Both show the same behavior.
>
> You are right. Some BIOSes handle C-states strangely, but we have no
> influence on the BIOS, as it is provided by our hardware vendor. We
> experimented a lot with the MSI masking bit of the PCIe root bridge that
> the ac card is connected to, but playing around with this bit brought no
> noticeable improvement.
If you don't have a BIOS/UEFI upgrade available, you can try appending
`intel_idle.max_cstate=0` to the kernel boot parameters. Keep in mind this
will disable CPU power management.
> We have tested the same boards with cards that use ath9k as well. They
> work just fine, with and without INTEL_IDLE enabled...
My laptops (i5-3320M and i5-2520M) run fine with INTEL_IDLE and QCA988x.
I'm not really an expert on this stuff, but since C-states alter some
voltages, perhaps the Atom SoC has voltage deviations/instabilities
that prevent quirky devices like QCA988x from working reliably.
Michał
* Re: ath10k + INTEL_IDLE aka. cstates == firmware crash
2015-02-23 14:20 ` Michal Kazior
@ 2015-02-23 14:41 ` Fabian Wittenberg
2015-03-02 12:20 ` Michal Kazior
0 siblings, 1 reply; 16+ messages in thread
From: Fabian Wittenberg @ 2015-02-23 14:41 UTC (permalink / raw)
To: Michal Kazior; +Cc: ath10k@lists.infradead.org
Hi Michal,
I already tried this approach. It works fine and is our current
workaround to get the product out, but I would like to know what the
underlying problem is.
Disabling C-states increases the power consumption of idle devices by
~1.25 W. That is not a real problem, but low-memory corruption is.
So I suspect a bug in the ath10k driver/firmware.
Regards,
Fabian
* Re: ath10k + INTEL_IDLE aka. cstates == firmware crash
2015-02-23 14:41 ` Fabian Wittenberg
@ 2015-03-02 12:20 ` Michal Kazior
2015-03-19 9:20 ` Fabian Wittenberg
0 siblings, 1 reply; 16+ messages in thread
From: Michal Kazior @ 2015-03-02 12:20 UTC (permalink / raw)
To: Fabian Wittenberg; +Cc: ath10k@lists.infradead.org
On 23 February 2015 at 15:41, Fabian Wittenberg
<Fabian.Wittenberg@sophos.com> wrote:
> Hi Michal,
>
> I already tried this approach. It works fine and is our current
> workaround to get the product out, but I would like to know what the
> underlying problem is.
> Disabling C-states increases the power consumption of idle devices by
> ~1.25 W. That is not a real problem, but low-memory corruption is.
> So I suspect a bug in the ath10k driver/firmware.
Hi Fabian,
Can you try the following diff with CONFIG_INTEL_IDLE=y, please?
--- a/drivers/net/wireless/ath/ath10k/pci.c
+++ b/drivers/net/wireless/ath/ath10k/pci.c
@@ -2531,6 +2531,11 @@ static int ath10k_pci_claim(struct ath10k *ar)
 
 	pci_set_master(pdev);
 
+	/* Disable RETRY_TIMEOUT register to prevent PCI Tx retries from
+	 * interfering with C3 CPU state.
+	 */
+	pci_write_config_byte(pdev, 0x41, 0);
+
 	/* Workaround: Disable ASPM */
 	pci_read_config_dword(pdev, 0x80, &lcr_val);
 	pci_write_config_dword(pdev, 0x80, (lcr_val & 0xffffff00));
Michał
* Re: ath10k + INTEL_IDLE aka. cstates == firmware crash
2015-03-02 12:20 ` Michal Kazior
@ 2015-03-19 9:20 ` Fabian Wittenberg
2015-03-19 15:44 ` Adrian Chadd
0 siblings, 1 reply; 16+ messages in thread
From: Fabian Wittenberg @ 2015-03-19 9:20 UTC (permalink / raw)
To: Michal Kazior; +Cc: ath10k@lists.infradead.org
Hi Michal,
Thank you for the patch. Unfortunately I didn't have time to test it
until yesterday.
The patch has almost no effect on the reported behavior. As soon as
I enable intel_idle,
the firmware stops working.
We also did even more intensive testing and found that, even with
intel_idle disabled,
the card sometimes stops working in AP mode after a long period or
under really heavy load.
Usually SWBA overruns are reported shortly before the card
stops responding.
If we set
CONFIG_IRQ_DOMAIN=n
CONFIG_IRQ_DOMAIN_DEBUG=n
CONFIG_PM_RUNTIME=n
the crash still occurs after 1-3 days, but the card keeps working.
At the moment we are investigating the behavior with hibernation and
cpu_idle disabled.
For now it seems to work, but this could change in the next few days...
This problem is really driving me crazy.
Regards,
Fabian
* Re: ath10k + INTEL_IDLE aka. cstates == firmware crash
2015-03-19 9:20 ` Fabian Wittenberg
@ 2015-03-19 15:44 ` Adrian Chadd
2015-03-19 15:57 ` Fabian Wittenberg
0 siblings, 1 reply; 16+ messages in thread
From: Adrian Chadd @ 2015-03-19 15:44 UTC (permalink / raw)
To: Fabian Wittenberg; +Cc: Michal Kazior, ath10k@lists.infradead.org
It's possible that you're entering a sleep state where the whole socket +
DRAM controller goes to sleep, and the latency caused by the wakeup
is confusing the firmware and/or DMA engine.
-adrian
* Re: ath10k + INTEL_IDLE aka. cstates == firmware crash
2015-03-19 15:44 ` Adrian Chadd
@ 2015-03-19 15:57 ` Fabian Wittenberg
2015-03-19 16:05 ` Adrian Chadd
0 siblings, 1 reply; 16+ messages in thread
From: Fabian Wittenberg @ 2015-03-19 15:57 UTC (permalink / raw)
To: Adrian Chadd; +Cc: Michal Kazior, ath10k@lists.infradead.org
Yes, I guessed something like that, but then this should be a firmware bug :-\
I'm quite surprised that nobody else has this problem!?
There are so many configuration combinations that trigger this...
Fabian
* Re: ath10k + INTEL_IDLE aka. cstates == firmware crash
2015-03-19 15:57 ` Fabian Wittenberg
@ 2015-03-19 16:05 ` Adrian Chadd
2015-03-19 16:18 ` Fabian Wittenberg
2015-03-20 10:46 ` Fabian Wittenberg
0 siblings, 2 replies; 16+ messages in thread
From: Adrian Chadd @ 2015-03-19 16:05 UTC (permalink / raw)
To: Fabian Wittenberg; +Cc: Michal Kazior, ath10k@lists.infradead.org
On 19 March 2015 at 08:57, Fabian Wittenberg
<Fabian.Wittenberg@sophos.com> wrote:
> Yes, I guessed something like that, but then this should be a firmware bug :-\
> I'm quite surprised that nobody else has this problem!?
> There are so many configuration combinations that trigger this...
The sleep depth, and the time a socket-sleep state takes to wake up
to do DMA, are highly variable. They depend on the chipset, BIOS and
sleep settings.
IIRC the ath10k firmware wasn't really debugged with hostap-on-Intel
as a supported option, with all the variation that entails. So yeah,
someone with more detailed DMA/PCIe bridge documentation for QCA988x
is going to have to dig into the DMA register settings to see what's
going on. Maybe it's just exceeding the transaction timeout, and that
should be easy to fix.
(I currently don't have all of the register documentation for the
QCA988x as I do for the pre-11ac chips.)
adrian
* Re: ath10k + INTEL_IDLE aka. cstates == firmware crash
2015-03-19 16:05 ` Adrian Chadd
@ 2015-03-19 16:18 ` Fabian Wittenberg
2015-03-19 16:23 ` Adrian Chadd
2015-03-20 10:46 ` Fabian Wittenberg
1 sibling, 1 reply; 16+ messages in thread
From: Fabian Wittenberg @ 2015-03-19 16:18 UTC (permalink / raw)
To: Adrian Chadd; +Cc: Michal Kazior, ath10k@lists.infradead.org
I don't have them either, even though we have an NDA with QCA.
There seem to be several NDA levels at QCA. It's really hard to get these
documents.
It's a pain in the butt...
Regards,
Fabian
* Re: ath10k + INTEL_IDLE aka. cstates == firmware crash
2015-03-19 16:05 ` Adrian Chadd
2015-03-19 16:18 ` Fabian Wittenberg
@ 2015-03-20 10:46 ` Fabian Wittenberg
1 sibling, 0 replies; 16+ messages in thread
From: Fabian Wittenberg @ 2015-03-20 10:46 UTC (permalink / raw)
To: Adrian Chadd; +Cc: Michal Kazior, ath10k@lists.infradead.org
Today we encountered a similar issue on a QCA9558 as well,
although it's really rare to see it on this chipset.
This is a MIPS-based SoC, widely used in access points.
I really think your guess could be right,
but that should be a task for QCA, as it's really related to their hardware.
We are using backports/ath10k for the PCI card and the SoC.
Regards,
Fabian
* Re: ath10k + INTEL_IDLE aka. cstates == firmware crash
2015-02-23 13:08 ath10k + INTEL_IDLE aka. cstates == firmware crash Fabian Wittenberg
2015-02-23 13:32 ` Michal Kazior
@ 2015-02-23 16:58 ` Ben Greear
2015-03-08 13:45 ` Jeremias Blendin
2 siblings, 0 replies; 16+ messages in thread
From: Ben Greear @ 2015-02-23 16:58 UTC (permalink / raw)
To: Fabian Wittenberg; +Cc: ath10k
On 02/23/2015 05:08 AM, Fabian Wittenberg wrote:
> Hi all,
>
> We are using the brand-new QCA988x chipset on mini-PCIe cards in our newest Wi-Fi-enabled firewall appliance, and we have had
> a lot of problems getting it running (Intel Rangeley platform; Intel(R) Atom(TM) CPU C2558 @ 2.40GHz).
> The card crashes after a few minutes with the ath10k driver (backports-3.19-rc1). Older versions are affected as well,
> at least down to 3.12.20. After intensive debugging I found that there
> are major issues as soon as the Intel processor C-states are used. This
> option is called "CONFIG_INTEL_IDLE" in the kernel config. This seems to be
> a very serious issue, as it can even lead to low-memory corruption and
> kernel freezes. The low-memory corruption doesn't always occur, just sometimes, which makes it hard to debug.
> Also, you need a multiprocessor system to trigger the issue.
> If you set the kernel parameter "maxcpus=1", the error doesn't occur even if you enable CONFIG_INTEL_IDLE.
> Kernel output looks like this if the card stops working:
If you want, try using my CT firmware. If you can crash it, send me the kernel
stack dump and I'll try to see if I can figure out what is crashing.
We do see WMI hangs in some cases (probably due to stuck WMI mgt frames). If you want to
patch your driver with my patches, then my firmware might give some extra
debug info if/when it crashes.
http://www.candelatech.com/ath10k.php
We have also seen at least one case where the firmware/NIC reported the equivalent
of DMA engine errors and shortly afterwards the host dereferenced a NULL pointer. I have
not yet been able to get enough debug info to make sense of the stack dump for that, however.
Thanks,
Ben
>
>
> [ 3715.145865] ath10k: failed to install key for vdev 2 peer 00:1a:8c:0a:b5:01: -11
> [ 3715.145876] wifi1: failed to remove key (1, ff:ff:ff:ff:ff:ff) from hardware (-11)
> [ 3718.148226] ath10k: failed to install key for vdev 2 peer 00:1a:8c:0a:b5:01: -11
> [ 3718.148236] wifi1: failed to set key (1, ff:ff:ff:ff:ff:ff) to hardware (-11)
> [ 3723.152167] ath10k: failed to install key for vdev 0 peer 00:1a:8c:0a:34:01: -11
> [ 3723.152178] wifi0: failed to remove key (1, ff:ff:ff:ff:ff:ff) from hardware (-11)
> [ 3723.152185] ath10k: failed to transmit management frame via WMI: -11
> [ 3726.154524] ath10k: failed to install key for vdev 0 peer 00:1a:8c:0a:34:01: -11
> [ 3726.154535] wifi0: failed to set key (1, ff:ff:ff:ff:ff:ff) to hardware (-11)
> [ 3729.156884] ath10k: failed to install key for vdev 0 peer 00:0e:8e:ae:5c:1c: -11
> [ 3729.156890] ath10k: failed to transmit management frame via WMI: -11
> [ 3729.156904] wifi0: failed to remove key (0, 00:0e:8e:ae:5c:1c) from hardware (-11)
> [ 3732.159255] ath10k: failed to remove peer wep key 0: -11
> [ 3732.159265] ath10k: failed to clear all peer wep keys for vdev 0: -11
> [ 3732.159273] ath10k: failed to disassociate station: 00:0e:8e:ae:5c:1c vdev 0: -11
>
> [ 3732.159278] ------------[ cut here ]------------
>
> [ 3732.159317] WARNING: CPU: 1 PID: 5813 at
> /usr/src/packages/BUILD/kernel-smp-3.12.20/modules-3.12.20/backports/net/mac80211/sta_info.c:885
> __sta_info_destroy_part2+0x4f/0xde [mac80211]()
>
> [ 3732.159322] Modules linked in: sr_mod cdrom xt_multidev xt_connmark
> xt_REDIRECT ipt_MASQUERADE xt_policy xt_set xt_multiport xt_addrtype
> ip_set_hash_ip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_ftp
> nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_irc
> nf_conntrack_ftp ctr aesni_intel ablk_helper cryptd lrw aes_i586 xts
> gf128mul aes_generic ebtable_filter ebtables bridge stp llc af_packet
> redv2_netlink(O) ip6table_ips ip6table_mangle ip6table_nat nf_nat_ipv6
> iptable_ips iptable_mangle iptable_nat nf_nat_ipv4 nf_nat xt_NFLOG
> xt_condition(O) xt_tcpudp xt_logmark xt_confirmed xt_owner ip6t_REJECT
> ipt_REJECT xt_state ip_set red2(O) ip_scheduler red nfnetlink_log
> nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6table_raw
> nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack iptable_filter iptable_raw
> xt_CT nf_conntrack_netlink nfnetlink nf_conntrack ip6_tables ip_tables
> x_tables ipv6 loop arc4 ath10k_pci(O) ath10k_core(O) mac80211(O) ath(O)
> cfg80211(O) ehci_pci evdev igb(O) rfkill sg ehci_hcd rtc_cmos pcspkr
> acpi_cpufreq i2c_i801 i2c_ismt button compat(O) dca sd_mod processor
> thermal_sys hwmon edd ahci libahci libata scsi_mod hid_generic usbhid
>
>
> Sometimes, but not always, dmesg shows the message "firmware crashed!", but whichever error message appears,
> the behavior is always the same: the card stops working until reboot. Unloading/reloading ath10k_pci, ath10k_core and ath doesn't help in this case.
> The underlying problem behind all the error messages I have seen so far is a broken link between the card's firmware and the ath10k driver.
> Depending on when this "connection loss" happens, the error messages differ a little,
> as they are closely tied to the driver's current state while it is trying to talk to the card's firmware via WMI.
>
> To reproduce, you have to wait between 3 and 60 minutes to see the crash. You can increase the likelihood of a crash by increasing
> the amount of Wi-Fi traffic from foreign networks on the same channel.
> I tested with four laptops connected to four QCA988x cards (AP mode). With that setup it takes around 3-10 minutes to reproduce.
>
> If you need more information I'm at your disposal.
>
> Regards,
> Fabian Wittenberg
>
>
>
> _______________________________________________
> ath10k mailing list
> ath10k@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/ath10k
>
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com
* Re: ath10k + INTEL_IDLE aka. cstates == firmware crash
2015-02-23 13:08 ath10k + INTEL_IDLE aka. cstates == firmware crash Fabian Wittenberg
2015-02-23 13:32 ` Michal Kazior
2015-02-23 16:58 ` Ben Greear
@ 2015-03-08 13:45 ` Jeremias Blendin
2015-03-08 18:27 ` Ben Greear
2 siblings, 1 reply; 16+ messages in thread
From: Jeremias Blendin @ 2015-03-08 13:45 UTC
To: Fabian Wittenberg; +Cc: ath10k@lists.infradead.org
Hi,
A small update on the issue: it seems I am experiencing the same issue as
Fabian, on a similar Intel Atom system. I have not yet applied the fix
proposed on this list.
However, I also see the issue with CONFIG_INTEL_IDLE disabled and only a
single CPU core enabled (maxcpus=1). In that configuration, though, it
takes much, much longer for the error to occur.
Here is the crash info (unfortunately I haven't yet had time to install
the Candela kernel, which might report more details):
[160447.707659] ath10k_pci 0000:04:00.0: SWBA overrun on vdev 0
[160447.810144] ath10k_pci 0000:04:00.0: SWBA overrun on vdev 0
[160447.912619] ath10k_pci 0000:04:00.0: SWBA overrun on vdev 0
[160449.822016] wlan1: failed to remove key (0, xx:xx:xx:xx:xx:xx) from hardware (-11)
[160449.822148] ------------[ cut here ]------------
[160449.822170] WARNING: CPU: 0 PID: 2195 at /home/xxx/install/linux-3.18.0/net/mac80211/sta_info.c:886 __sta_info_destroy_part2+0x136/0x2b0 [mac80211]()
[160449.822173] Modules linked in: ctr ccm arc4 openvswitch geneve gre
vxlan ip6_udp_tunnel udp_tunnel libcrc32c gpio_ich coretemp kvm_intel
ath10k_pci ath10k_core kvm ath crct10dif_pclmul crc32_pclmul
ghash_clmulni_intel mac80211 aesni_intel aes_x86_64 lrw gf128mul
glue_helper ablk_helper cryptd ast lpc_ich ttm drm_kms_helper drm
syscopyarea joydev nls_iso8859_1 cfg80211 sysfillrect sysimgblt
ipmi_si 8250_fintek ipmi_msghandler mac_hid i2c_ismt shpchp btrfs xor
raid6_pq uas usb_storage hid_generic usbhid hid igb i2c_algo_bit ahci
libahci dca ptp pps_core
[160449.822221] CPU: 0 PID: 2195 Comm: hostapd Not tainted 3.18.0-13-generic #14
[160449.822223] Hardware name: Supermicro A1SAi/A1SRi, BIOS 1.0c 02/27/2014
[160449.822225] 0000000000000009 ffff880468a73908 ffffffff817aa408 0000000000000007
[160449.822230] 0000000000000000 ffff880468a73948 ffffffff81074921 0000000368a73958
[160449.822233] ffff88044d9cc800 ffff880467b14680 ffff8804672608c0 ffff880467260000
[160449.822237] Call Trace:
[160449.822246] [<ffffffff817aa408>] dump_stack+0x46/0x58
[160449.822251] [<ffffffff81074921>] warn_slowpath_common+0x81/0xa0
[160449.822255] [<ffffffff810749fa>] warn_slowpath_null+0x1a/0x20
[160449.822268] [<ffffffffc055b5e6>] __sta_info_destroy_part2+0x136/0x2b0 [mac80211]
[160449.822282] [<ffffffffc055b78a>] __sta_info_destroy+0x2a/0x40 [mac80211]
[160449.822296] [<ffffffffc055b838>] sta_info_destroy_addr_bss+0x38/0x60 [mac80211]
[160449.822313] [<ffffffffc057076d>] ieee80211_del_station+0x1d/0x30 [mac80211]
[160449.822330] [<ffffffffc040b6dc>] nl80211_del_station+0x7c/0x130 [cfg80211]
[160449.822336] [<ffffffff816d762a>] genl_family_rcv_msg+0x19a/0x390
[160449.822341] [<ffffffff816d7820>] ? genl_family_rcv_msg+0x390/0x390
[160449.822345] [<ffffffff816d7899>] genl_rcv_msg+0x79/0xc0
[160449.822348] [<ffffffff816d6ee9>] netlink_rcv_skb+0xb9/0xe0
[160449.822352] [<ffffffff816d747c>] genl_rcv+0x2c/0x40
[160449.822355] [<ffffffff816d6621>] netlink_unicast+0x111/0x1b0
[160449.822359] [<ffffffff816d69ca>] netlink_sendmsg+0x30a/0x650
[160449.822364] [<ffffffff8135ba71>] ? aa_sk_perm.isra.4+0x71/0x170
[160449.822369] [<ffffffff8168b4e3>] sock_sendmsg+0x93/0xd0
[160449.822374] [<ffffffff8108c046>] ? __queue_work+0x136/0x330
[160449.822378] [<ffffffff8168b1be>] ? move_addr_to_kernel.part.20+0x1e/0x70
[160449.822382] [<ffffffff8168c0f1>] ? move_addr_to_kernel+0x21/0x30
[160449.822386] [<ffffffff81699ea7>] ? verify_iovec+0x47/0xd0
[160449.822390] [<ffffffff8168b980>] ___sys_sendmsg+0x410/0x420
[160449.822395] [<ffffffff8120e3cc>] ? destroy_inode+0x3c/0x70
[160449.822399] [<ffffffff8120e51f>] ? evict+0x11f/0x1b0
[160449.822403] [<ffffffff812091df>] ? dentry_free+0x5f/0xb0
[160449.822407] [<ffffffff81209b65>] ? __dentry_kill+0x155/0x200
[160449.822411] [<ffffffff81209d90>] ? dput+0x180/0x1c0
[160449.822415] [<ffffffff81213114>] ? mntput+0x24/0x40
[160449.822420] [<ffffffff811f39f0>] ? __fput+0x190/0x240
[160449.822424] [<ffffffff8168c7d2>] __sys_sendmsg+0x42/0x80
[160449.822427] [<ffffffff8168c822>] SyS_sendmsg+0x12/0x20
[160449.822432] [<ffffffff817b1c6d>] system_call_fastpath+0x16/0x1b
[160449.822435] ---[ end trace b1009dc2519db816 ]---
[160452.114371] ath10k_warn: 45 callbacks suppressed
[160452.114384] ath10k_pci 0000:04:00.0: SWBA overrun on vdev 0
....
[208686.051467] ath10k_pci 0000:04:00.0: failed to delete peer xx:xx:xx:xx:xx:xx for vdev 0: -110
....
and finally:
[388206.713817] ath10k_pci 0000:04:00.0: number of peers exceeded: peers number 127 (max peers 127)
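The negative numbers at the end of these ath10k/mac80211 messages are Linux kernel errno values: -11 is EAGAIN ("Resource temporarily unavailable") and -110 is ETIMEDOUT ("Connection timed out"). A minimal sketch for tallying them in a saved dmesg capture follows. It is not part of any ath10k tooling; the helper name `tally_errors` is hypothetical, it assumes each message sits on one (unwrapped) line ending in either `: -N` or `(-N)` as in the logs in this thread, and its errno table is a hand-copied subset of the Linux generic errno headers covering only the two codes seen here.

```python
import re
from collections import Counter

# Hypothetical helper, not part of ath10k. Hand-copied subset of the Linux
# generic errno values (asm-generic errno-base.h / errno.h); only the codes
# that appear in the logs in this thread are listed.
LINUX_ERRNO = {11: "EAGAIN", 110: "ETIMEDOUT"}

# A trailing negative return code, written either as "...: -11" or "... (-11)".
ERR_RE = re.compile(r"(?::\s*-(\d+)|\(-(\d+)\))\s*$")


def tally_errors(lines):
    """Count trailing negative errno codes in dmesg lines, keyed by errno name."""
    counts = Counter()
    for line in lines:
        match = ERR_RE.search(line)
        if match is None:
            continue  # e.g. "SWBA overrun on vdev 0" carries no return code
        code = int(match.group(1) or match.group(2))
        counts[LINUX_ERRNO.get(code, f"errno {code}")] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "ath10k: failed to install key for vdev 2 peer 00:1a:8c:0a:b5:01: -11",
        "wlan1: failed to remove key (0, xx:xx:xx:xx:xx:xx) from hardware (-11)",
        "ath10k_pci 0000:04:00.0: failed to delete peer xx:xx:xx:xx:xx:xx for vdev 0: -110",
        "ath10k_pci 0000:04:00.0: SWBA overrun on vdev 0",
    ]
    print(tally_errors(sample))
```

For a real capture, something like `with open("dmesg.txt") as f: print(tally_errors(f))` (file name hypothetical) works the same way; codes missing from the table are reported as `errno N`.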
* Re: ath10k + INTEL_IDLE aka. cstates == firmware crash
2015-03-08 13:45 ` Jeremias Blendin
@ 2015-03-08 18:27 ` Ben Greear
0 siblings, 0 replies; 16+ messages in thread
From: Ben Greear @ 2015-03-08 18:27 UTC
To: Jeremias Blendin, Fabian Wittenberg; +Cc: ath10k@lists.infradead.org
There is no particular crash here, but maybe the WMI transport
is hung. Possibly my firmware and kernel will help with that, or at least
help the system recover more quickly by asserting in the firmware
if WMI is truly hung.
Thanks,
Ben
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com