From: "Toke Høiland-Jørgensen" <toke@toke.dk>
To: Ben Greear <greearb@candelatech.com>, linux-wireless@vger.kernel.org
Subject: Re: [PATCH] ath10k: Restart xmit queues below low-water mark.
Date: Wed, 29 Apr 2020 11:28:11 +0200 [thread overview]
Message-ID: <87sggmtzdg.fsf@toke.dk> (raw)
In-Reply-To: <e6ee8635-b45f-c5fe-d32a-1d695b3a7934@candelatech.com>
Ben Greear <greearb@candelatech.com> writes:
> On 04/28/2020 01:39 PM, Toke Høiland-Jørgensen wrote:
>> Ben Greear <greearb@candelatech.com> writes:
>>
>>> On 04/28/2020 12:37 PM, Toke Høiland-Jørgensen wrote:
>>>> greearb@candelatech.com writes:
>>>>
>>>>> From: Ben Greear <greearb@candelatech.com>
>>>>>
>>>>> While running tcp upload + download tests with ~200
>>>>> concurrent TCP streams, 1-2 processes, and 30 station
>>>>> vdevs, I noticed that the __ieee80211_stop_queue was taking
>>>>> around 20% of the CPU according to perf-top, which other locking
>>>>> taking an additional ~15%.
>>>>>
>>>>> I believe the issue is that the ath10k driver would unlock the
>>>>> txqueue when a single frame could be transmitted, instead of
>>>>> waiting for a low water mark.
>>>>>
>>>>> So, this patch adds a low-water mark that is 1/4 of the total
>>>>> tx buffers allowed.
>>>>>
>>>>> This appears to resolve the performance problem that I saw.
>>>>>
>>>>> Tested with recent wave-1 ath10k-ct firmware.
>>>>>
>>>>> Signed-off-by: Ben Greear <greearb@candelatech.com>
>>>>> ---
>>>>> drivers/net/wireless/ath/ath10k/htt.h | 1 +
>>>>> drivers/net/wireless/ath/ath10k/htt_tx.c | 8 ++++++--
>>>>> 2 files changed, 7 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/net/wireless/ath/ath10k/htt.h b/drivers/net/wireless/ath/ath10k/htt.h
>>>>> index 31c4ddbf45cb..b5634781c0dc 100644
>>>>> --- a/drivers/net/wireless/ath/ath10k/htt.h
>>>>> +++ b/drivers/net/wireless/ath/ath10k/htt.h
>>>>> @@ -1941,6 +1941,7 @@ struct ath10k_htt {
>>>>>
>>>>> u8 target_version_major;
>>>>> u8 target_version_minor;
>>>>> + bool needs_unlock;
>>>>> struct completion target_version_received;
>>>>> u8 max_num_amsdu;
>>>>> u8 max_num_ampdu;
>>>>> diff --git a/drivers/net/wireless/ath/ath10k/htt_tx.c b/drivers/net/wireless/ath/ath10k/htt_tx.c
>>>>> index 9b3c3b080e92..44795d9a7c0c 100644
>>>>> --- a/drivers/net/wireless/ath/ath10k/htt_tx.c
>>>>> +++ b/drivers/net/wireless/ath/ath10k/htt_tx.c
>>>>> @@ -145,8 +145,10 @@ void ath10k_htt_tx_dec_pending(struct ath10k_htt *htt)
>>>>> lockdep_assert_held(&htt->tx_lock);
>>>>>
>>>>> htt->num_pending_tx--;
>>>>> - if (htt->num_pending_tx == htt->max_num_pending_tx - 1)
>>>>> + if ((htt->num_pending_tx <= (htt->max_num_pending_tx / 4)) && htt->needs_unlock) {
>>>>
>>>> Why /4? Seems a bit arbitrary?
>>>
>>> Yes, arbitrary for sure. I figure restart filling the queue when 1/4
>>> full so that it is unlikely to run dry. Possibly it should restart
>>> sooner to keep it more full on average?
>>
>> Theoretically, the "keep the queue at the lowest possible level that
>> keeps it from underflowing" is what BQL is supposed to do. The diff
>> below uses the dynamic adjustment bit (from dynamic_queue_limits.h) in
>> place of num_pending_tx. I've only compile tested it, and I'm a bit
>> skeptical that it will work right for this, but if anyone wants to give
>> it a shot, there it is.
>>
>> BTW, while doing that, I noticed there's a similar arbitrary limit in
>> ath10k_mac_tx_push_pending() at max_num_pending_tx/2. So if you're going
>> to keep the arbitrary limit maybe use the same one? :)
>>
>>> Before my patch, the behaviour would be to try to keep it as full as
>>> possible, as in restart the queues as soon as a single slot opens up
>>> in the tx queue.
>>
>> Yeah, that seems somewhat misguided as well, from a latency perspective,
>> at least. But I guess that's what we're fixing with AQL. What does the
>> firmware do with the frames queued within? Do they just go on a FIFO
>> queue altogether, or something smarter?
>
> Sort of like a mini-mac80211 stack inside the firmware is used to
> create ampdu/amsdu chains and schedule them with its own scheduler.
>
> For optimal throughput with 200 users steaming video,
> the ath10k driver should think that it has only a few active peers wanting
> to send data at a time (and so firmware would think the same), and the driver should
> be fed a large chunk of pkts for those peers. And then the next few peers.
> That should let firmware send large ampdu/amsdu to each peer, increasing throughput
> over all.
Yes, but also increasing latency because all other stations have to wait
for a longer TXOP (see my other reply).
> If you feed a few frames to each of the 200 peers, then even if
> firmware has 2000 tx buffers, that is only 10 frames per peer at best,
> leading to small ampdu/amsdu and thus worse over-all throughput and
> utilization of airtime.
Do you have any data on exactly how long (in time) each txop becomes in
these highly congested scenarios?
> It would be nice to be able to set certain traffic flows to have the
> throughput optimization and others to have the latency optimization.
> For instance, high latency on a streaming download is a good trade-off
> if it increases total throughput.
For the individual flows to a peer, fq_codel actually does a pretty good
job at putting the latency-sensitive flows first. Which is why we want
the queueing to happen in mac80211 (where fq_codel is active) instead of
in the firmware.
> Maybe some of the AI folks training their AI to categorize cat
> pictures could instead start categorizing traffic flows and adjusting
> the stack realtime...
I know there are people trying to classify traffic with machine learning
(although usually for more nefarious purposes), but I'm not sure it's
feasible to do in a queue management algorithm (if at all). :)
-Toke
next prev parent reply other threads:[~2020-04-29 9:28 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-04-27 14:54 [PATCH] ath10k: Restart xmit queues below low-water mark greearb
2020-04-28 14:56 ` Steve deRosier
2020-04-28 14:58 ` Ben Greear
2020-04-28 19:36 ` Toke Høiland-Jørgensen
2020-04-28 19:39 ` Dave Taht
2020-04-28 20:00 ` Ben Greear
2020-04-28 20:58 ` Toke Høiland-Jørgensen
2020-04-28 21:23 ` Steve deRosier
2020-04-28 16:27 ` Dave Taht
2020-04-28 16:35 ` Ben Greear
2020-04-28 17:10 ` Dave Taht
2020-04-28 19:43 ` Toke Høiland-Jørgensen
2020-04-28 19:37 ` Toke Høiland-Jørgensen
2020-04-28 19:51 ` Ben Greear
2020-04-28 20:39 ` Toke Høiland-Jørgensen
2020-04-28 21:18 ` Ben Greear
2020-04-29 9:28 ` Toke Høiland-Jørgensen [this message]
2020-04-29 13:54 ` Ben Greear
2020-04-29 14:56 ` Toke Høiland-Jørgensen
2020-04-29 15:26 ` Ben Greear
2020-04-29 15:42 ` Toke Høiland-Jørgensen
2020-04-30 23:31 ` John Deere
2020-05-01 2:16 ` Ben Greear
2020-05-01 15:50 ` John Deere
2020-05-01 17:51 ` Mark Baker
2020-05-02 5:49 ` John Deere
2020-05-02 15:56 ` Ben Greear
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87sggmtzdg.fsf@toke.dk \
--to=toke@toke.dk \
--cc=greearb@candelatech.com \
--cc=linux-wireless@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).