From: Fabian Wittenberg <Fabian.Wittenberg@sophos.com>
To: Michal Kazior <michal.kazior@tieto.com>
Cc: ath10k@lists.infradead.org
Subject: Re: ath10k + INTEL_IDLE aka. cstates == firmware crash
Date: Mon, 23 Feb 2015 14:44:53 +0100
Message-ID: <54EB2ED5.8040104@sophos.com>
In-Reply-To: <CA+BoTQnaNmRs9NcXUtJddhD1SfxAy+Ct-s-hD4mvsi9cqh1eSw@mail.gmail.com>
Hi Michal,
I used firmware versions 10.1 and 10.2 from here:
https://github.com/kvalo/ath10k-firmware. Both show the same behavior.
You are right, some BIOSes handle C-states strangely, but we have no
influence on the BIOS since it is supplied by our hardware vendor. We
experimented a lot with the MSI masking bit of the PCIe root bridge
that the 802.11ac card is connected to, without any noticeable
improvement. We have also tested the same boards with ath9k-based
cards. They work just fine, with and without INTEL_IDLE enabled...
Regards,
Fabian
On 23.02.2015 at 14:32, Michal Kazior wrote:
> On 23 February 2015 at 14:08, Fabian Wittenberg
> <Fabian.Wittenberg@sophos.com> wrote:
>> Hi all,
>>
>> We are using mini-PCIe cards based on the brand-new QCA988x chipset in our newest
>> wifi-enabled firewall appliance, and we have had a lot of problems getting them running
>> (Intel Rangeley platform; Intel(R) Atom(TM) CPU C2558 @ 2.40GHz).
> I recall one person complaining that his Atom-based laptop wasn't happy
> running ath10k either, but I think it was some electrical incompatibility:
> the machine didn't even POST when the card was plugged into the mPCIe
> slot.
>
>
>> The card crashed after a few minutes with the ath10k driver (backports-3.19-rc1). Older
>> versions, at least down to 3.12.20, are affected as well. I did intensive debugging and
>> found that there are major issues as soon as Intel's processor C-states are used via the
>> intel_idle driver ("CONFIG_INTEL_IDLE" in the kernel config). This is a severe issue, as
>> it can even lead to low-memory corruption and kernel freezes. The low-memory corruption
>> only occurs sometimes, which makes it hard to debug.
>> You also need a multiprocessor system to trigger the issue: with the kernel parameter
>> "maxcpus=1" the error does not occur, even with CONFIG_INTEL_IDLE enabled.
> A quick search turned up this:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=715485
>
> It looks like some BIOSes can have buggy C-state handling. Maybe
> that's the root cause? In my experience QCA988x can sometimes be
> quirky when it comes to PCIe, so I wouldn't be surprised if other
> devices don't crash.
>
>
>> The kernel output looks like this when the card stops working:
>>
>>
>> [ 3715.145865] ath10k: failed to install key for vdev 2 peer 00:1a:8c:0a:b5:01: -11
>> [ 3715.145876] wifi1: failed to remove key (1, ff:ff:ff:ff:ff:ff) from hardware (-11)
>> [ 3718.148226] ath10k: failed to install key for vdev 2 peer 00:1a:8c:0a:b5:01: -11
>> [ 3718.148236] wifi1: failed to set key (1, ff:ff:ff:ff:ff:ff) to hardware (-11)
>> [ 3723.152167] ath10k: failed to install key for vdev 0 peer 00:1a:8c:0a:34:01: -11
>> [ 3723.152178] wifi0: failed to remove key (1, ff:ff:ff:ff:ff:ff) from hardware (-11)
>> [ 3723.152185] ath10k: failed to transmit management frame via WMI: -11
>> [ 3726.154524] ath10k: failed to install key for vdev 0 peer 00:1a:8c:0a:34:01: -11
>> [ 3726.154535] wifi0: failed to set key (1, ff:ff:ff:ff:ff:ff) to hardware (-11)
>> [ 3729.156884] ath10k: failed to install key for vdev 0 peer 00:0e:8e:ae:5c:1c: -11
>> [ 3729.156890] ath10k: failed to transmit management frame via WMI: -11
>> [ 3729.156904] wifi0: failed to remove key (0, 00:0e:8e:ae:5c:1c) from hardware (-11)
>> [ 3732.159255] ath10k: failed to remove peer wep key 0: -11
>> [ 3732.159265] ath10k: failed to clear all peer wep keys for vdev 0: -11
>> [ 3732.159273] ath10k: failed to disassociate station: 00:0e:8e:ae:5c:1c vdev 0: -11
> [...]
>
> It seems the firmware stopped replenishing WMI-HTC Tx credits. This is
> most likely not the mgmt-related Tx credit starvation; instead,
> communication with the device is genuinely broken.
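To make the credit-starvation diagnosis concrete, here is a minimal user-space sketch in C. It assumes a simplified HTC endpoint model: the names htc_ep_model, model_send and model_credit_report are hypothetical and are not ath10k's actual code. Each WMI command consumes credits granted by the firmware, and when no credit reports arrive, every further send fails with -EAGAIN, which is errno 11 on Linux and matches the -11 throughout the log above.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of an HTC endpoint; field names are illustrative
 * and do not reflect ath10k's real structures. */
struct htc_ep_model {
    int tx_credits;      /* credits currently granted by the firmware */
    int credits_per_msg; /* credits one WMI command consumes */
};

/* Try to send one command: consume credits, or fail with -EAGAIN
 * (-11 on Linux) when the firmware has not granted enough. */
int model_send(struct htc_ep_model *ep)
{
    assert(ep->tx_credits >= 0);
    if (ep->tx_credits < ep->credits_per_msg)
        return -EAGAIN;
    ep->tx_credits -= ep->credits_per_msg;
    return 0;
}

/* Called when the firmware reports completed transmissions and hands
 * credits back. If host-target communication is broken, this never
 * happens again and the endpoint stays starved. */
void model_credit_report(struct htc_ep_model *ep, int credits)
{
    ep->tx_credits += credits;
}
```

In this model, once the link breaks, model_credit_report() is simply never called again, so every command fails the same way regardless of what it is (key installation, management frame, peer teardown). That is consistent with the mix of operations failing with -11 in the log.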
>
>
>> Sometimes, but not always, dmesg shows the message "firmware crashed!", but whichever error
>> message appears, the behavior is always the same: the card stops working until reboot.
>> Unloading and reloading ath10k_pci, ath10k_core and ath does not help in this case.
>> The underlying problem behind all the error messages I have seen so far is a broken link
>> between the card's firmware and the ath10k driver. Depending on when this "connection loss"
>> happens, the error messages differ slightly, since they reflect the driver's current state
>> while it is trying to talk to the card's firmware via WMI.
>>
>> If you try to reproduce it, expect to wait between 3 and 60 minutes to see the crash. You can
>> increase the likelihood of a crash by increasing the amount of wifi traffic from foreign
>> networks on the same channel. I tested with four laptops connected to four QCA988x cards
>> (AP mode); with this setup the crash reproduces in around 3-10 minutes.
>>
>> If you need more information I'm at your disposal.
> It'd be nice to know which firmware you're using. Generally I would
> discourage using 999.999.0.636 because it's very old.
>
>
> Michał
_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k
Thread overview: 16+ messages
2015-02-23 13:08 ath10k + INTEL_IDLE aka. cstates == firmware crash Fabian Wittenberg
2015-02-23 13:32 ` Michal Kazior
2015-02-23 13:44 ` Fabian Wittenberg [this message]
2015-02-23 14:20 ` Michal Kazior
2015-02-23 14:41 ` Fabian Wittenberg
2015-03-02 12:20 ` Michal Kazior
2015-03-19 9:20 ` Fabian Wittenberg
2015-03-19 15:44 ` Adrian Chadd
2015-03-19 15:57 ` Fabian Wittenberg
2015-03-19 16:05 ` Adrian Chadd
2015-03-19 16:18 ` Fabian Wittenberg
2015-03-19 16:23 ` Adrian Chadd
2015-03-20 10:46 ` Fabian Wittenberg
2015-02-23 16:58 ` Ben Greear
2015-03-08 13:45 ` Jeremias Blendin
2015-03-08 18:27 ` Ben Greear