From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: Zefir Kurtisi <zefir.kurtisi@westermo.com>,
Zefir Kurtisi <zefku@westermo.com>,
linux-wireless@vger.kernel.org
Cc: Felix Fietkau <nbd@nbd.name>,
qca-developer-program@qualcomm.com,
Adrian Chadd <adrian@freebsd.org>
Subject: Re: [RFT] ath9k: multi-rate-retry fails at HW level
Date: Fri, 11 Dec 2020 11:37:56 +0100
Message-ID: <878sa44ohn.fsf@toke.dk>
In-Reply-To: <57d98dc9-7e5d-4d2e-335e-5948ef3645ad@westermo.com>

Zefir Kurtisi <zefir.kurtisi@westermo.com> writes:
> On 01.12.20 14:33, Toke Høiland-Jørgensen wrote:
>> Zefir Kurtisi <zefir.kurtisi@westermo.com> writes:
>>
>>> CC += adrian
>>>
>>> On 24.11.20 15:45, Toke Høiland-Jørgensen wrote:
>>>> Zefir Kurtisi <zefku@westermo.com> writes:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am running into a strange issue with ath9k operating a 9590
>>>>> device, which to me looks like a HW issue; but since work on rate
>>>>> controllers has been going on for decades, I can hardly imagine this
>>>>> has never shown up.
>>>>>
>>>>> The issue observed is this: the TX status descriptors never report
>>>>> rateindex 1, it is always 0, 2, or 3, but never 1.
>>>>>
>>>>> I noticed this by overriding the rate configuration provided by
>>>>> minstrel with a static setup, e.g. (7,3)(5,3)(3,3)(1,3), all MCS. The
>>>>> device operates as an iperf client to a connected AP and continuously
>>>>> transmits data. Meanwhile, the attenuation between the endpoints
>>>>> is gradually increased, with the expectation of a gradual shift in
>>>>> the reported TX status rateindex from 0 to 3. But nada: the values
>>>>> reported are 0, 2, and 3, never 1.
>>>>>
>>>>> I double checked that the TX descriptors are correctly set with the
>>>>> rates and retry counts - all looking sane.
>>>>>
>>>>> More obviously, after changing the rate configuration to
>>>>> (7,3)(1,3)(5,3)(3,3), the expectation would be to see either 0 or 1
>>>>> reported as rateidx, since the transmission ought to succeed at the
>>>>> lowest rate or not at all. Again, every rate index except 1 is reported.
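The expected behaviour described above can be modelled with a small simulation. This is a hypothetical sketch, not ath9k or minstrel code: it assumes a link on which every try at MCS 7 fails and every try at MCS 5 or below succeeds (the attenuation used later in the thread), and the hypothetical `skip_slot` parameter models the suspected behaviour of the hardware never trying one slot of the series.

```python
from typing import Callable, Optional, Sequence, Tuple

# (MCS index, number of tries) for each slot of a multi-rate-retry series.
Series = Sequence[Tuple[int, int]]


def final_rate_index(series: Series,
                     delivers: Callable[[int], bool],
                     skip_slot: Optional[int] = None) -> Optional[int]:
    """Return the slot on which the frame gets through, or None if it never does.

    Hypothetical model: `delivers(mcs)` is deterministic, so the first try at a
    rate that works succeeds and every try at a rate that does not work fails.
    `skip_slot` marks a slot the hardware is assumed never to try.
    """
    for slot, (mcs, tries) in enumerate(series):
        if slot == skip_slot or tries == 0:
            continue  # this slot is never put on air
        if delivers(mcs):
            return slot
    return None


def attenuated_link(mcs: int) -> bool:
    """Assumed link: MCS 7 fails, MCS 5 and below succeed."""
    return mcs <= 5
```

For both (7,3)(5,3)(3,3)(1,3) and (7,3)(1,3)(5,3)(3,3), this model returns slot 1 when the series is walked in order and slot 2 when slot 1 is skipped, so under these assumptions a skipped second slot would show up exactly as "rateindex 1 never reported".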
>>>>>
>>>>> Now the question for me is: what is the HW exactly doing with such a
>>>>> configuration? Is it skipping the second rate, or is it just
>>>>> misreporting it?
>>>>
>>>> You should be able to see this by looking at the rates the frames are
>>>> being sent at, shouldn't you?
>>>>
>>> Yes, I did that, and it indicates that the second rate is simply skipped.
>>>
>>> Here are some test cases and their sniffing results. The setup is an 11ng STA
>>> connected to an AP, with the attenuation adjusted such that MCS 7 fails while
>>> MCS 5 and below succeed. A monitor sniffs while the AP sends a single ping to
>>> the STA.
>>>
>>> With a rate configuration of (7/2)(3/2)(1/2) we get:
>>> 14:02:42.923880 9481489761us tsft 2412 MHz 11n -68dBm signal 65.0 Mb/s MCS 7 20 MHz long GI RX-STBC0 -68dBm signal antenna 0 Data IV: e Pad 20 KeyID 0
>>> 14:02:42.923909 9481490037us tsft 2412 MHz 11n -69dBm signal 65.0 Mb/s MCS 7 20 MHz long GI RX-STBC0 -69dBm signal antenna 0 Data IV: e Pad 20 KeyID 0
>>> 14:02:42.925244 9481491044us tsft 2412 MHz 11n -68dBm signal 13.0 Mb/s MCS 1 20 MHz long GI RX-STBC0 -68dBm signal antenna 0 Data IV: e Pad 20 KeyID 0
>>>
>>>
>>> with (7/2)(1/2)(3/2):
>>> 13:59:37.073147 9295637087us tsft 2412 MHz 11n -69dBm signal 65.0 Mb/s MCS 7 20 MHz long GI RX-STBC0 -69dBm signal antenna 0 Data IV: c Pad 20 KeyID 0
>>> 13:59:37.073467 9295637438us tsft 2412 MHz 11n -69dBm signal 65.0 Mb/s MCS 7 20 MHz long GI RX-STBC0 -69dBm signal antenna 0 Data IV: c Pad 20 KeyID 0
>>> 13:59:37.074591 9295638498us tsft 2412 MHz 11n -68dBm signal 26.0 Mb/s MCS 3 20 MHz long GI RX-STBC0 -68dBm signal antenna 0 Data IV: c Pad 20 KeyID 0
>>>
>>> and with (7/2)(3/2):
>>> 14:04:27.269806 9585836783us tsft 2412 MHz 11n -69dBm signal 65.0 Mb/s MCS 7 20 MHz long GI RX-STBC0 -69dBm signal antenna 0 Data IV: 10 Pad 20 KeyID 0
>>> 14:04:27.270342 9585837344us tsft 2412 MHz 11n -68dBm signal 65.0 Mb/s MCS 7 20 MHz long GI RX-STBC0 -68dBm signal antenna 0 Data IV: 10 Pad 20 KeyID 0
>>> 14:04:27.271368 9585838370us tsft 2412 MHz 11n -68dBm signal 65.0 Mb/s MCS 7 20 MHz long GI RX-STBC0 -68dBm signal antenna 0 Data IV: 10 Pad 20 KeyID 0
>>> [..]
>>>
>>> That is a total of 14 attempts at MCS 7, with the ping finally failing.
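The captures above can be checked mechanically. The hypothetical helper below takes a rate series and the MCS values seen on air for one frame, and reports slots that were passed over even though a later slot was used. It assumes each slot uses a distinct MCS, and it cannot flag the (7/2)(3/2) case, where no later slot ever appears on air.

```python
from typing import List, Sequence, Tuple


def skipped_slots(series: Sequence[Tuple[int, int]],
                  observed_mcs: Sequence[int]) -> List[int]:
    """Return slots whose MCS never appears although a later slot's MCS does.

    `series` holds (MCS index, tries) per slot; `observed_mcs` lists the MCS of
    each sniffed transmission of one frame. Assumes distinct MCS per slot.
    """
    seen = set(observed_mcs)
    used = [slot for slot, (mcs, _tries) in enumerate(series) if mcs in seen]
    if not used:
        return []
    last_used = max(used)
    # Only slots before the last one seen on air can have been skipped.
    return [slot for slot in range(last_used) if series[slot][0] not in seen]
```

Feeding in the first two captures, `skipped_slots([(7, 2), (3, 2), (1, 2)], [7, 7, 1])` and `skipped_slots([(7, 2), (1, 2), (3, 2)], [7, 7, 3])` both return `[1]`.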
>>>
>>>>> Both possibilities have a big impact, since upper layers (like
>>>>> airtime) use the returned rateidx to calculate and configure operating
>>>>> parameters at runtime.
>>>>
>>>> Have you actually observed any issues from this? If it's just skipping a
>>>> rate, minstrel should still be able to make decisions based on the
>>>> actual values returned, no?
>>>>
>>> The issues arise from the fact that the TX status descriptor reports a
>>> (tx-rateindex/tx-attempt-index) pair, leaving the driver to calculate
>>> what was put on air based on these two values. If one had rates set to
>>> (7/2)(3/7)(1/2) and the TX status reported (tx-rateindex=2/tx-attempt-index=0),
>>> the driver would assume 10 attempts in total (2 + 7 + 1), while in fact there
>>> were only 3 (2 + 1) if the second rate is skipped. I can't judge what direct
>>> effect this has on RC, but it definitely skews the statistics.
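The attempt arithmetic in the example can be written out as below. This is a sketch that mirrors the reasoning in the message, not the actual ath9k status-processing code: it assumes the driver treats every slot before the reported rate index as fully exhausted and adds tx-attempt-index + 1 tries on the final slot, while `skipped` is a hypothetical parameter naming slots the hardware never tried.

```python
from typing import Collection, Sequence


def inferred_attempts(counts: Sequence[int], rate_index: int,
                      attempt_index: int) -> int:
    """Attempts the driver infers: all earlier slots used up, plus the final one."""
    return sum(counts[:rate_index]) + attempt_index + 1


def actual_attempts(counts: Sequence[int], rate_index: int, attempt_index: int,
                    skipped: Collection[int] = ()) -> int:
    """Attempts really put on air if the slots in `skipped` were never tried."""
    earlier = sum(c for slot, c in enumerate(counts[:rate_index])
                  if slot not in skipped)
    return earlier + attempt_index + 1
```

For the (7/2)(3/7)(1/2) series with tx-rateindex=2 and tx-attempt-index=0, `inferred_attempts([2, 7, 2], 2, 0)` gives 10, whereas `actual_attempts([2, 7, 2], 2, 0, skipped={1})` gives 3.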
>>>
>>> The same goes for airtime: see how this skews the calculation in
>>> ath_tx_count_airtime().
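The same over-count carries into airtime accounting. The sketch below assumes airtime is accumulated as per-try duration times try count for each slot before the reported index, plus attempt-index + 1 tries on the final slot. This is an assumption about the general shape of ath_tx_count_airtime(), not a copy of it, and the per-try durations in the usage note are made-up numbers for illustration.

```python
from typing import Collection, Sequence


def airtime_us(counts: Sequence[int], durations_us: Sequence[int],
               rate_index: int, attempt_index: int,
               skipped: Collection[int] = ()) -> int:
    """Sum per-try airtime (microseconds) over a multi-rate-retry series.

    Slots before `rate_index` contribute counts[slot] tries each, unless they
    are listed in `skipped`; the final slot contributes attempt_index + 1 tries.
    """
    total = (attempt_index + 1) * durations_us[rate_index]
    for slot in range(rate_index):
        if slot not in skipped:
            total += counts[slot] * durations_us[slot]
    return total
```

With hypothetical per-try durations of 100, 250 and 500 microseconds for the three slots of (7/2)(3/7)(1/2), `airtime_us([2, 7, 2], [100, 250, 500], 2, 0)` yields 2450, but only 700 microseconds were spent on air if slot 1 was skipped (`skipped={1}`).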
>>
>> Ah, right, I was assuming that rates[1].count would be reset to zero
>> somehow. Have you confirmed that the attempts actually go up in the
>> Minstrel stats for the skipped rate?
>>
>>> Also, the above has an immediately visible effect: if RC provides
>>> two rates, e.g. (7/3)(5/3), of which the first is too high and the
>>> second is never even attempted, frames don't make it through.
>>
>> Yeah, rate control would likely take longer to converge to the right
>> rate. I suppose if this is a hardware-model-specific issue, a quirks
>> bit could be added to instruct Minstrel to disregard the second index.
>> But it does sound a bit odd; have you verified that it's consistent
>> across different units of the same model (and not just a busted device)?
>>
>
> False alarm.
>
> We got confirmation that the observed failure with that exact chip
> revision does not occur on a different platform. It might still be a
> HW issue specific to our rarely used PPC platform, but it is not an
> ath9k malfunction. I'll dig further into that and report back if it
> turns out to be relevant for the list.
>
> Thanks, Toke, for the feedback and insights, and sorry for the noise.

You're welcome, and great to hear that you got closer to a resolution :)
-Toke
Thread overview: 6+ messages
2020-10-23 14:06 [RFT] ath9k: multi-rate-retry fails at HW level Zefir Kurtisi
2020-11-24 14:45 ` Toke Høiland-Jørgensen
2020-11-27 15:38 ` Zefir Kurtisi
2020-12-01 13:33 ` Toke Høiland-Jørgensen
2020-12-11 9:00 ` Zefir Kurtisi
2020-12-11 10:37 ` Toke Høiland-Jørgensen [this message]