Re: [PATCH] ath10k: Restart xmit queues below low-water mark.

linux-wireless.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: "Toke Høiland-Jørgensen" <toke@toke.dk>
To: Ben Greear <greearb@candelatech.com>, linux-wireless@vger.kernel.org
Subject: Re: [PATCH] ath10k: Restart xmit queues below low-water mark.
Date: Wed, 29 Apr 2020 11:28:11 +0200	[thread overview]
Message-ID: <87sggmtzdg.fsf@toke.dk> (raw)
In-Reply-To: <e6ee8635-b45f-c5fe-d32a-1d695b3a7934@candelatech.com>

Ben Greear <greearb@candelatech.com> writes:

> On 04/28/2020 01:39 PM, Toke Høiland-Jørgensen wrote:
>> Ben Greear <greearb@candelatech.com> writes:
>>
>>> On 04/28/2020 12:37 PM, Toke Høiland-Jørgensen wrote:
>>>> greearb@candelatech.com writes:
>>>>
>>>>> From: Ben Greear <greearb@candelatech.com>
>>>>>
>>>>> While running tcp upload + download tests with ~200
>>>>> concurrent TCP streams, 1-2 processes, and 30 station
>>>>> vdevs, I noticed that the __ieee80211_stop_queue was taking
>>>>> around 20% of the CPU according to perf-top, which other locking
>>>>> taking an additional ~15%.
>>>>>
>>>>> I believe the issue is that the ath10k driver would unlock the
>>>>> txqueue when a single frame could be transmitted, instead of
>>>>> waiting for a low water mark.
>>>>>
>>>>> So, this patch adds a low-water mark that is 1/4 of the total
>>>>> tx buffers allowed.
>>>>>
>>>>> This appears to resolve the performance problem that I saw.
>>>>>
>>>>> Tested with recent wave-1 ath10k-ct firmware.
>>>>>
>>>>> Signed-off-by: Ben Greear <greearb@candelatech.com>
>>>>> ---
>>>>>  drivers/net/wireless/ath/ath10k/htt.h    | 1 +
>>>>>  drivers/net/wireless/ath/ath10k/htt_tx.c | 8 ++++++--
>>>>>  2 files changed, 7 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/net/wireless/ath/ath10k/htt.h b/drivers/net/wireless/ath/ath10k/htt.h
>>>>> index 31c4ddbf45cb..b5634781c0dc 100644
>>>>> --- a/drivers/net/wireless/ath/ath10k/htt.h
>>>>> +++ b/drivers/net/wireless/ath/ath10k/htt.h
>>>>> @@ -1941,6 +1941,7 @@ struct ath10k_htt {
>>>>>
>>>>>  	u8 target_version_major;
>>>>>  	u8 target_version_minor;
>>>>> +	bool needs_unlock;
>>>>>  	struct completion target_version_received;
>>>>>  	u8 max_num_amsdu;
>>>>>  	u8 max_num_ampdu;
>>>>> diff --git a/drivers/net/wireless/ath/ath10k/htt_tx.c b/drivers/net/wireless/ath/ath10k/htt_tx.c
>>>>> index 9b3c3b080e92..44795d9a7c0c 100644
>>>>> --- a/drivers/net/wireless/ath/ath10k/htt_tx.c
>>>>> +++ b/drivers/net/wireless/ath/ath10k/htt_tx.c
>>>>> @@ -145,8 +145,10 @@ void ath10k_htt_tx_dec_pending(struct ath10k_htt *htt)
>>>>>  	lockdep_assert_held(&htt->tx_lock);
>>>>>
>>>>>  	htt->num_pending_tx--;
>>>>> -	if (htt->num_pending_tx == htt->max_num_pending_tx - 1)
>>>>> +	if ((htt->num_pending_tx <= (htt->max_num_pending_tx / 4)) && htt->needs_unlock) {
>>>>
>>>> Why /4? Seems a bit arbitrary?
>>>
>>> Yes, arbitrary for sure. I figure restart filling the queue when 1/4
>>> full so that it is unlikely to run dry. Possibly it should restart
>>> sooner to keep it more full on average?
>>
>> Theoretically, the "keep the queue at the lowest possible level that
>> keeps it from underflowing" is what BQL is supposed to do. The diff
>> below uses the dynamic adjustment bit (from dynamic_queue_limits.h) in
>> place of num_pending_tx. I've only compile tested it, and I'm a bit
>> skeptical that it will work right for this, but if anyone wants to give
>> it a shot, there it is.
>>
>> BTW, while doing that, I noticed there's a similar arbitrary limit in
>> ath10k_mac_tx_push_pending() at max_num_pending_tx/2. So if you're going
>> to keep the arbitrary limit maybe use the same one? :)
>>
>>> Before my patch, the behaviour would be to try to keep it as full as
>>> possible, as in restart the queues as soon as a single slot opens up
>>> in the tx queue.
>>
>> Yeah, that seems somewhat misguided as well, from a latency perspective,
>> at least. But I guess that's what we're fixing with AQL. What does the
>> firmware do with the frames queued within? Do they just go on a FIFO
>> queue altogether, or something smarter?
>
> Sort of like a mini-mac80211 stack inside the firmware is used to
> create ampdu/amsdu chains and schedule them with its own scheduler.
>
> For optimal throughput with 200 users steaming video,
> the ath10k driver should think that it has only a few active peers wanting
> to send data at a time (and so firmware would think the same), and the driver should
> be fed a large chunk of pkts for those peers.  And then the next few peers.
> That should let firmware send large ampdu/amsdu to each peer, increasing throughput
> over all.

Yes, but also increasing latency because all other stations have to wait
for a longer TXOP (see my other reply).

> If you feed a few frames to each of the 200 peers, then even if
> firmware has 2000 tx buffers, that is only 10 frames per peer at best,
> leading to small ampdu/amsdu and thus worse over-all throughput and
> utilization of airtime.

Do you have any data on exactly how long (in time) each txop becomes in
these highly congested scenarios?

> It would be nice to be able to set certain traffic flows to have the
> throughput optimization and others to have the latency optimization.
> For instance, high latency on a streaming download is a good trade-off
> if it increases total throughput.

For the individual flows to a peer, fq_codel actually does a pretty good
job at putting the latency-sensitive flows first. Which is why we want
the queueing to happen in mac80211 (where fq_codel is active) instead of
in the firmware.

> Maybe some of the AI folks training their AI to categorize cat
> pictures could instead start categorizing traffic flows and adjusting
> the stack realtime...

I know there are people trying to classify traffic with machine learning
(although usually for more nefarious purposes), but I'm not sure it's
feasible to do in a queue management algorithm (if at all). :)

-Toke

next prev parent reply	other threads:[~2020-04-29  9:28 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-27 14:54 [PATCH] ath10k: Restart xmit queues below low-water mark greearb
2020-04-28 14:56 ` Steve deRosier
2020-04-28 14:58   ` Ben Greear
2020-04-28 19:36     ` Toke Høiland-Jørgensen
2020-04-28 19:39       ` Dave Taht
2020-04-28 20:00         ` Ben Greear
2020-04-28 20:58           ` Toke Høiland-Jørgensen
2020-04-28 21:23             ` Steve deRosier
2020-04-28 16:27   ` Dave Taht
2020-04-28 16:35     ` Ben Greear
2020-04-28 17:10       ` Dave Taht
2020-04-28 19:43       ` Toke Høiland-Jørgensen
2020-04-28 19:37 ` Toke Høiland-Jørgensen
2020-04-28 19:51   ` Ben Greear
2020-04-28 20:39     ` Toke Høiland-Jørgensen
2020-04-28 21:18       ` Ben Greear
2020-04-29  9:28         ` Toke Høiland-Jørgensen [this message]
2020-04-29 13:54           ` Ben Greear
2020-04-29 14:56             ` Toke Høiland-Jørgensen
2020-04-29 15:26               ` Ben Greear
2020-04-29 15:42                 ` Toke Høiland-Jørgensen
2020-04-30 23:31       ` John Deere
2020-05-01  2:16         ` Ben Greear
2020-05-01 15:50           ` John Deere
2020-05-01 17:51             ` Mark Baker
2020-05-02  5:49               ` John Deere
2020-05-02 15:56                 ` Ben Greear

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87sggmtzdg.fsf@toke.dk \
    --to=toke@toke.dk \
    --cc=greearb@candelatech.com \
    --cc=linux-wireless@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).