From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: Zefir Kurtisi <zefir.kurtisi@westermo.com>,
Zefir Kurtisi <zefku@westermo.com>,
linux-wireless@vger.kernel.org
Cc: Felix Fietkau <nbd@nbd.name>,
qca-developer-program@qualcomm.com,
Adrian Chadd <adrian@freebsd.org>
Subject: Re: [RFT] ath9k: multi-rate-retry fails at HW level
Date: Tue, 01 Dec 2020 14:33:37 +0100
Message-ID: <87r1o91wi6.fsf@toke.dk>
In-Reply-To: <d05e928a-c78d-d191-7ae0-6342e05d892a@westermo.com>
Zefir Kurtisi <zefir.kurtisi@westermo.com> writes:
> CC += adrian
>
> On 24.11.20 15:45, Toke Høiland-Jørgensen wrote:
>> Zefir Kurtisi <zefku@westermo.com> writes:
>>
>>> Hi,
>>>
>>> I am running into a strange issue with ath9k operating a 9590
>>> device. To me it looks like a HW issue, but since work on rate
>>> controllers has been going on for decades, I can hardly imagine this
>>> has never shown up before.
>>>
>>> The issue observed is this: the TX status descriptors never report
>>> rateindex 1; it is always 0, 2, or 3.
>>>
>>> I noticed this by overriding the rate configuration provided by
>>> minstrel with a static setup, e.g. (7,3)(5,3)(3,3)(1,3), all MCS. The
>>> device operates as an iperf client to a connected AP and continuously
>>> transmits data. Meanwhile, the attenuation between the endpoints
>>> is gradually increased, expecting to see a gradual shift in the
>>> reported TX status rateindex from 0 to 3. But no: the values
>>> reported are 0, 2 and 3, never 1.
>>>
>>> I double checked that the TX descriptors are correctly set with the
>>> rates and retry counts - all looking sane.
>>>
>>> More obviously, after changing the rate configuration to
>>> (7,3)(1,3)(5,3)(3,3), the expectation would be to see only 0 or 1
>>> reported as rateidx, since the transmission ought to succeed either
>>> at the lowest rate or not at all. Again, every rate index except 1 is reported.
>>>
>>> Now the question for me is: what is the HW exactly doing with such a
>>> configuration? Is it skipping the second rate, or is it just reporting
>>> wrong?
>>
>> You should be able to see this by looking at the rates the frames are
>> being sent at, shouldn't you?
>>
> Yes, I did that, and it indicates that the second rate is simply skipped.
>
> Here are some use cases and their sniffing results. The setup is an 11ng STA
> connected to an AP, with the attenuation adjusted such that MCS 7 fails while MCS 5
> and below succeed. A monitor sniffs while a single ping is sent from the AP to the STA.
>
> With a rate configuration of (7/2)(3/2)(1/2) we get:
> 14:02:42.923880 9481489761us tsft 2412 MHz 11n -68dBm signal 65.0 Mb/s MCS 7 20
> MHz long GI RX-STBC0 -68dBm signal antenna 0 Data IV: e Pad 20 KeyID 0
> 14:02:42.923909 9481490037us tsft 2412 MHz 11n -69dBm signal 65.0 Mb/s MCS 7 20
> MHz long GI RX-STBC0 -69dBm signal antenna 0 Data IV: e Pad 20 KeyID 0
> 14:02:42.925244 9481491044us tsft 2412 MHz 11n -68dBm signal 13.0 Mb/s MCS 1 20
> MHz long GI RX-STBC0 -68dBm signal antenna 0 Data IV: e Pad 20 KeyID 0
>
>
> with (7/2)(1/2)(3/2):
> 13:59:37.073147 9295637087us tsft 2412 MHz 11n -69dBm signal 65.0 Mb/s MCS 7 20
> MHz long GI RX-STBC0 -69dBm signal antenna 0 Data IV: c Pad 20 KeyID 0
> 13:59:37.073467 9295637438us tsft 2412 MHz 11n -69dBm signal 65.0 Mb/s MCS 7 20
> MHz long GI RX-STBC0 -69dBm signal antenna 0 Data IV: c Pad 20 KeyID 0
> 13:59:37.074591 9295638498us tsft 2412 MHz 11n -68dBm signal 26.0 Mb/s MCS 3 20
> MHz long GI RX-STBC0 -68dBm signal antenna 0 Data IV: c Pad 20 KeyID 0
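A minimal C model of what the two three-rate traces suggest, assuming the hardware walks the retry series in order but never transmits series 1, and assuming (per the setup described) that any MCS at or below 5 succeeds on its first try. struct mrr_series, simulate_tx() and max_ok_mcs are hypothetical names for illustration, not ath9k code.

```c
#include <assert.h>
#include <stddef.h>

/* One multi-rate-retry series: MCS index and number of tries. */
struct mrr_series {
	int mcs;
	int count;
};

/*
 * Hypothetical model: walk the series in order, but skip series 1
 * (the suspected hardware behaviour). An attempt succeeds if its MCS
 * is <= max_ok_mcs. The MCS of each on-air attempt is written to
 * out[] (at most max entries); *ok is set to 1 on success, else 0.
 * Returns the number of attempts made.
 */
static size_t simulate_tx(const struct mrr_series *s, size_t n, int max_ok_mcs,
			  int *out, size_t max, int *ok)
{
	size_t written = 0;

	assert(s != NULL && out != NULL && ok != NULL);
	*ok = 0;
	for (size_t i = 0; i < n; i++) {
		if (i == 1)
			continue; /* series 1 never goes on air in this model */
		for (int t = 0; t < s[i].count && written < max; t++) {
			out[written++] = s[i].mcs;
			if (s[i].mcs <= max_ok_mcs) {
				*ok = 1; /* frame acknowledged, stop retrying */
				return written;
			}
		}
	}
	return written;
}
```

With (7/2)(3/2)(1/2) the model yields MCS 7, MCS 7, MCS 1, and with (7/2)(1/2)(3/2) it yields MCS 7, MCS 7, MCS 3, matching the captured sequences.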
>
> and with (7/2)(3/2):
> 14:04:27.269806 9585836783us tsft 2412 MHz 11n -69dBm signal 65.0 Mb/s MCS 7 20
> MHz long GI RX-STBC0 -69dBm signal antenna 0 Data IV: 10 Pad 20 KeyID 0
> 14:04:27.270342 9585837344us tsft 2412 MHz 11n -68dBm signal 65.0 Mb/s MCS 7 20
> MHz long GI RX-STBC0 -68dBm signal antenna 0 Data IV: 10 Pad 20 KeyID 0
> 14:04:27.271368 9585838370us tsft 2412 MHz 11n -68dBm signal 65.0 Mb/s MCS 7 20
> MHz long GI RX-STBC0 -68dBm signal antenna 0 Data IV: 10 Pad 20 KeyID 0
> [..]
>
> a total of 14 attempts at MCS 7 with the ping finally failing.
>
>>> Both possibilities have great impact, since upper layers (like
>>> airtime) use the returned rateidx to calculate and configure operating
>>> parameters at runtime.
>>
>> Have you actually observed any issues from this? If it's just skipping a
>> rate, minstrel should still be able to make decisions based on the
>> actual values returned, no?
>>
> The issues arise from the fact that the TX status reports a
> (tx-rateindex/tx-attempt-index) pair per TX descriptor, leaving the driver to calculate
> what was put on air from these two values. If one had rates set to
> (7/2)(3/7)(1/2) and the TX status reports (tx-rateindex=2/tx-attempt-index=0),
> the driver assumes there were 10 attempts in total, while in fact there were 3 if the
> second rate is skipped. What direct effect this has on RC I can't tell, but it
> definitely skews the statistics.
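The arithmetic in the example above, sketched in C under the accounting rule stated in the thread: every series before the reported rate index is assumed fully used, plus tx-attempt-index + 1 tries on the final series. Both functions and MRR_SERIES are hypothetical helpers, not ath9k or mac80211 functions.

```c
#include <assert.h>

#define MRR_SERIES 4 /* assumed number of retry series */

/* Attempts as inferred from (rateindex, attempt_index) if all series ran. */
static int attempts_assumed(const int count[MRR_SERIES], int rateindex,
			    int attempt_index)
{
	int total = attempt_index + 1; /* tries on the final series */

	assert(rateindex >= 0 && rateindex < MRR_SERIES);
	for (int i = 0; i < rateindex; i++)
		total += count[i]; /* earlier series assumed fully used */
	return total;
}

/* Same calculation under the hypothesis that series 1 is never sent. */
static int attempts_if_series1_skipped(const int count[MRR_SERIES],
				       int rateindex, int attempt_index)
{
	int total = attempt_index + 1;

	assert(rateindex >= 0 && rateindex < MRR_SERIES);
	for (int i = 0; i < rateindex; i++)
		if (i != 1)
			total += count[i];
	return total;
}
```

For counts {2, 7, 2, 0} with tx-rateindex=2 and tx-attempt-index=0, attempts_assumed() returns 10 (2 + 7 + 1), whereas attempts_if_series1_skipped() returns 3 (2 + 1).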
>
> The same goes for airtime: see how this distorts the calculation in
> ath_tx_count_airtime().
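The same over-count carries into airtime once each attempt is weighted by its duration. The sketch below is a simplified model of per-series accounting, not the actual ath_tx_count_airtime() code; airtime_us() and the per-attempt durations are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define MRR_SERIES 4 /* assumed number of retry series */

/*
 * Hypothetical airtime estimate: series before the final one are charged
 * count[i] attempts, the final series attempt_index + 1 attempts, each
 * weighted by an assumed per-attempt duration dur_us[i]. With
 * skip_series1 set, series 1 is treated as never transmitted.
 */
static unsigned int airtime_us(const unsigned int dur_us[MRR_SERIES],
			       const unsigned int count[MRR_SERIES],
			       int rateindex, unsigned int attempt_index,
			       bool skip_series1)
{
	unsigned int airtime;

	assert(rateindex >= 0 && rateindex < MRR_SERIES);
	airtime = dur_us[rateindex] * (attempt_index + 1);
	for (int i = 0; i < rateindex; i++) {
		if (skip_series1 && i == 1)
			continue;
		airtime += dur_us[i] * count[i];
	}
	return airtime;
}
```

With assumed durations of 100, 250 and 500 us for MCS 7, 3 and 1, counts {2, 7, 2} and a final report of (2/0), this accounting charges 2450 us, while only 700 us were spent on air if series 1 was skipped.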
Ah, right, I was assuming that rates[1].count would be reset to zero
somehow. Have you confirmed that the attempts actually go up in the
Minstrel stats for the skipped rate?
> Also, the above is an immediately visible issue: if RC
> provides two rates, e.g. (7/3)(5/3), of which the first is too high and
> the second is never even attempted, frames don't make it through.
Yeah, rate control would likely take longer to converge to the right
rate. I suppose that if this is a hardware model-specific issue, a quirks
bit could be added to instruct Minstrel to disregard the second index.
But it does sound a bit odd; have you verified that it's consistent across
different units of the same model (and not just a busted device)?
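One way such a quirk could look, sketched in C: a hypothetical fix-up of the per-series TX status that drops series 1 and shifts the later series down, so that statistics and airtime are only charged for series that went on air. The quirk flag, struct rate_status and mrr_compact_status() are invented for illustration; shifting rather than only zeroing rates[1].count is a guess at keeping consumers that stop at the first empty slot working.

```c
#include <assert.h>
#include <stdbool.h>

#define MRR_SERIES 4 /* assumed number of retry series */

/* Hypothetical per-series TX status entry (not the mac80211 struct). */
struct rate_status {
	int idx;   /* rate/MCS index, -1 marks an unused slot */
	int count; /* attempts made on this series */
};

/*
 * Fill in the final series from (rateindex, attempt_index), clear later
 * slots and, if the (hypothetical) quirk is set, remove series 1 by
 * shifting series 2..rateindex down one slot.
 * Returns the index of the final series after compaction.
 */
static int mrr_compact_status(struct rate_status r[MRR_SERIES], int rateindex,
			      int attempt_index, bool quirk_skip_series1)
{
	assert(rateindex >= 0 && rateindex < MRR_SERIES);

	r[rateindex].count = attempt_index + 1; /* tries on final series */
	for (int i = rateindex + 1; i < MRR_SERIES; i++) {
		r[i].idx = -1; /* series after the final one were not used */
		r[i].count = 0;
	}

	if (!quirk_skip_series1 || rateindex < 2)
		return rateindex;

	for (int i = 1; i < rateindex; i++)
		r[i] = r[i + 1]; /* drop the skipped series 1 */
	r[rateindex].idx = -1;
	r[rateindex].count = 0;
	return rateindex - 1;
}
```

For (7/2)(3/7)(1/2) reported as (2/0), the compacted status becomes (7/2)(1/1), i.e. 3 attempts, and the function returns a final index of 1.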
-Toke
Thread overview: 6+ messages
2020-10-23 14:06 [RFT] ath9k: multi-rate-retry fails at HW level Zefir Kurtisi
2020-11-24 14:45 ` Toke Høiland-Jørgensen
2020-11-27 15:38 ` Zefir Kurtisi
2020-12-01 13:33 ` Toke Høiland-Jørgensen [this message]
2020-12-11 9:00 ` Zefir Kurtisi
2020-12-11 10:37 ` Toke Høiland-Jørgensen