* [RFC] ath9k: improve aggregation throughput by using only first rate
@ 2010-07-26 17:10 Björn Smedman
From: Björn Smedman @ 2010-07-26 17:10 UTC
To: ath9k-devel, linux-wireless

Hi all,

I've been running a lot of iperf on AR913x /
compat-wireless-2010-07-16 (w/ openwrt/trunk@22388).

I think there are some (in theory) simple improvements that can be
done to the tx aggregation / rate control logic. A proof of concept of
one such improvement is provided below. Basically, it's a hack that
makes ath9k output aggregates with only the first rate in the rate
series. The reasoning is that a failure is not a problem for
aggregates because there is software retry. Retrying in hardware at a
slower rate is counterproductive, so it's better to fail and do a
software retry, possibly at another rate. Also, since the aggregate
size is often limited by the slowest rate in the MRR series (4 ms txop
limit), having a slow rate in the series may affect performance even if
it is never used by the hardware.
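
A rough sketch of the 4 ms argument above, under simplifying assumptions
(this is not the ath9k code, which, judging by the `modeidx` in the patch,
looks frame lengths up in a per-mode table): each active entry in the MRR
series caps the aggregate at the bytes that fit in 4 ms at that rate, so the
slowest active entry wins. PHY preamble and MAC overhead are ignored, and
`mrr_entry`, `max_bytes_in_txop`, `aggr_byte_limit` and the 65535-byte cap
standing in for ATH_AMPDU_LIMIT_MAX are hypothetical.

```c
#include <stdint.h>

#define TXOP_LIMIT_US   4000u  /* the 4 ms txop limit from the mail */
#define AMPDU_BYTES_MAX 65535u /* assumed stand-in for ATH_AMPDU_LIMIT_MAX */

/* One entry of a hypothetical 4-entry multi-rate retry (MRR) series. */
struct mrr_entry {
	uint32_t rate_kbps; /* PHY rate in kbit/s */
	uint8_t count;      /* hardware tries at this rate, 0 = unused */
};

/* Upper bound on bytes that fit in the txop at this rate (no overhead). */
static uint32_t max_bytes_in_txop(uint32_t rate_kbps)
{
	/* kbit/s * us / 8000 = bytes */
	uint64_t bytes = (uint64_t)rate_kbps * TXOP_LIMIT_US / 8000u;

	return bytes > AMPDU_BYTES_MAX ? AMPDU_BYTES_MAX : (uint32_t)bytes;
}

/*
 * The aggregate has to fit in the txop at every rate the hardware may
 * fall back to, so the slowest active entry sets the limit.
 * Pass n = 4 for the full series, n = 1 to mimic the patch's loop.
 */
static uint32_t aggr_byte_limit(const struct mrr_entry *series, int n)
{
	uint32_t limit = AMPDU_BYTES_MAX;

	for (int i = 0; i < n; i++) {
		uint32_t b;

		if (!series[i].count)
			continue;
		b = max_bytes_in_txop(series[i].rate_kbps);
		if (b < limit)
			limit = b;
	}
	return limit;
}
```

For an HT20 long-GI series of MCS15, MCS14 and MCS0 (130, 117 and 6.5
Mbit/s), scanning all four entries gives 3250 bytes, roughly two 1500-byte
MPDUs, while scanning only the first entry, as the patch's `i < 1` loop
does, gives 65000 bytes.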

In my (not so scientific) tests, max AP downstream throughput increases
about 30-40% with the patch below (from 33.9 to 55.7 Mbit/s with HT20
in a noisy environment, with 20 meters and a few walls between AP and
client).

Of course, if all rates in the series are high, then this patch has no effect.

/Björn
---
diff -urpN a/drivers/net/wireless/ath/ath9k/xmit.c b/drivers/net/wireless/ath/ath9k/xmit.c
--- a/drivers/net/wireless/ath/ath9k/xmit.c 2010-07-26 15:35:17.000000000 +0200
+++ b/drivers/net/wireless/ath/ath9k/xmit.c 2010-07-26 17:11:33.000000000 +0200
@@ -565,7 +565,7 @@ static u32 ath_lookup_rate(struct ath_so
 	 */
 	max_4ms_framelen = ATH_AMPDU_LIMIT_MAX;
 
-	for (i = 0; i < 4; i++) {
+	for (i = 0; i < 1; i++) {
 		if (rates[i].count) {
 			int modeidx;
 			if (!(rates[i].flags & IEEE80211_TX_RC_MCS)) {
@@ -1553,6 +1553,9 @@ static void ath_buf_set_rate(struct ath_
 	if (sc->sc_flags & SC_OP_PREAMBLE_SHORT)
 		ctsrate |= rate->hw_value_short;
 
+	if (bf_isaggr(bf))
+		rates[1].count = rates[2].count = rates[3].count = 0;
+
 	for (i = 0; i < 4; i++) {
 		bool is_40, is_sgi, is_sp;
 		int phy;
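
The second hunk can be read as the following stand-alone sketch: for an
aggregate, every fallback entry gets a zero try count, so the only hardware
tries left are the ones at the first rate and everything beyond that is left
to software retry. `struct rate_try`, `strip_fallback_for_aggr` and
`hw_try_budget` are hypothetical names; the real code works on mac80211's
`struct ieee80211_tx_rate` array.

```c
#include <stdbool.h>
#include <stdint.h>

#define MRR_ENTRIES 4

/* Hypothetical mirror of the two rate-series fields the patch relies on. */
struct rate_try {
	int8_t idx;    /* rate index, -1 = unused */
	uint8_t count; /* hardware tries at this rate */
};

/* Same effect as the second hunk: aggregates keep only the first rate. */
static void strip_fallback_for_aggr(struct rate_try rates[MRR_ENTRIES],
				    bool is_aggr)
{
	if (!is_aggr)
		return;
	for (int i = 1; i < MRR_ENTRIES; i++)
		rates[i].count = 0;
}

/* Total hardware attempts across the series before the hardware gives up. */
static unsigned hw_try_budget(const struct rate_try rates[MRR_ENTRIES])
{
	unsigned total = 0;

	for (int i = 0; i < MRR_ENTRIES; i++)
		total += rates[i].count;
	return total;
}
```

With a series of MCS15, MCS14 and MCS0 at two tries each, an aggregate is
left with a hardware budget of 2 tries instead of 6, while non-aggregated
frames keep the full series.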

* Re: [ath9k-devel] [RFC] ath9k: improve aggregation throughput by using only first rate
From: Felix Fietkau @ 2010-07-26 17:44 UTC
To: Björn Smedman; +Cc: ath9k-devel, linux-wireless

On 2010-07-26 7:10 PM, Björn Smedman wrote:
> Hi all,
>
> I've been running a lot of iperf on AR913x /
> compat-wireless-2010-07-16 (w/ openwrt/trunk@22388).
>
> I think there are some (in theory) simple improvements that can be
> done to the tx aggregation / rate control logic. A proof of concept of
> one such improvement is provided below. Basically, it's a hack that
> makes ath9k output aggregates with only the first rate in the rate
> series. The reasoning is that a failure is not a problem for
> aggregates because there is software retry. Retrying in hardware at a
> slower rate is counter productive. So, better to fail and do a
> software retry at possibly another rate. Also, since the aggregate
> size is often limited by the slowest rate in the MRR series (4 ms txop
> limit) having a slow rate in the series may affect performance even if
> it is never used by the hardware.
>
> In my (not so scientific) tests max AP downstream throughput increases
> about 30-40% with the patch below (from 33.9 to 55.7 Mbit/s with HT20
> in noisy environment with 20 meters and a few walls between AP and
> client).
>
> Of course, if all rates in the series are high then this patch has no effect.

I think it makes sense to rely less on on-chip MRR for fallback, but to
make this workable, I think we really should use the MRR table for
something; otherwise the rate control algorithm will take much longer to
adapt.

It's probably better to fix this properly after I'm done with my A-MPDU
rewrite, because then I can more easily push parts of the software
retransmission behaviour into minstrel_ht directly.

- Felix

* Re: [ath9k-devel] [RFC] ath9k: improve aggregation throughput by using only first rate
From: Björn Smedman @ 2010-07-26 19:23 UTC
To: Felix Fietkau; +Cc: ath9k-devel, linux-wireless

2010/7/26 Felix Fietkau <nbd@openwrt.org>:
> On 2010-07-26 7:10 PM, Björn Smedman wrote:
>> I think there are some (in theory) simple improvements that can be
>> done to the tx aggregation / rate control logic. A proof of concept of
>> one such improvement is provided below. Basically, it's a hack that
> I think it makes sense to rely less on on-chip MRR for fallback, but I
> think to make this workable, we really should use the MRR table for
> something, otherwise the rate control algorithm will take much longer to
> adapt.
> It's probably better to fix this properly after I'm done with my A-MPDU
> rewrite, because then I can more easily push parts of the software
> retransmission behaviour into minstrel_ht directly.

Sounds very reasonable. I'm sure you've thought of it, but now that
it's fresh in my head: it would be great if the new aggregation design
allowed us to experiment with things like this:

* The rate control logic treats the average aggregate length as a
measured independent variable, when in fact it depends heavily on the
rates selected (via the 4 ms txop limit).

* When tx is aggregated, most rate control probe frames end up inside
aggregates and are never used for probing (the effective probe
frequency is divided by the average aggregate length).

* When setting up a hardware MRR for an aggregate, the focus should be
on throughput (as explained earlier in this thread). But there are
situations when reliability is important, e.g. when a subframe in the
aggregate is about to expire (because of time or the block ack window).
It may even be advantageous to tx the subframes that are about to
expire in their own aggregate with a lower / more reliable bitrate?

* In many busy radio environments the packet success rate depends very
much on the protection method being used (none, cts-to-self or
rts-cts), often more so than on the bitrate itself. It would be
interesting to experiment with including the protection method in the
rate selection, i.e. to probe for the optimal combination of
protection method and bitrate.

* In order to have the best possible rate control in very dynamic RF
environments, it's important to keep the hardware queue short and
select rates as late as possible (to avoid introducing unnecessary
delay when selecting new rates). I have no idea how to do this, but it
would be great if the tx queue could be kept long enough to never stall
tx, but no longer.

* If I understand correctly, the Atheros hardware does not adjust the
rts / cts-to-self duration field when going through the MRR
(correct?). In that case it may be even more advantageous to use
software retry as much as possible when some form of protection is
enabled.

Looking forward to the new aggregation code!

/Björn
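
To make the protection idea a little more concrete, here is a sketch (not
minstrel_ht code) that treats each (rate, protection) pair as a separate
candidate with its own success probability, and charges the RTS/CTS or
CTS-to-self exchange as extra airtime. The enum, the struct fields, the
per-mille probability scale and the goodput model are all assumptions for
illustration.

```c
#include <stddef.h>
#include <stdint.h>

enum prot { PROT_NONE, PROT_CTS_TO_SELF, PROT_RTS_CTS };

/* Hypothetical per-candidate statistics kept by a rate controller. */
struct candidate {
	uint32_t rate_kbps;   /* PHY rate in kbit/s */
	enum prot prot;       /* protection method probed with this rate */
	uint32_t ewma_prob;   /* success probability, 0..1000 (per mille) */
	uint32_t overhead_us; /* assumed airtime cost of the protection */
	uint32_t payload_us;  /* airtime of the data itself at this rate */
};

/* Expected goodput in kbit/s: rate * success prob * share of airtime on data. */
static uint64_t expected_goodput(const struct candidate *c)
{
	uint64_t total_us = (uint64_t)c->overhead_us + c->payload_us;

	if (total_us == 0)
		return 0;
	return (uint64_t)c->rate_kbps * c->ewma_prob / 1000u
	       * c->payload_us / total_us;
}

/* Index of the (rate, protection) pair with the best goodput; n must be > 0. */
static size_t best_candidate(const struct candidate *c, size_t n)
{
	size_t best = 0;

	for (size_t i = 1; i < n; i++)
		if (expected_goodput(&c[i]) > expected_goodput(&c[best]))
			best = i;
	return best;
}
```

With made-up inputs, a 130 Mbit/s candidate without protection at 40%
success scores 52000 kbit/s, while the same rate with RTS/CTS at 90% success
and 200 us of protection overhead on 4000 us of data scores about 111
Mbit/s, so the pair, not just the rate, decides which candidate wins.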

* Re: [ath9k-devel] [RFC] ath9k: improve aggregation throughput by using only first rate
From: Felix Fietkau @ 2010-07-26 19:41 UTC
To: Björn Smedman; +Cc: ath9k-devel, linux-wireless

On 2010-07-26 9:23 PM, Björn Smedman wrote:
> 2010/7/26 Felix Fietkau <nbd@openwrt.org>:
>> On 2010-07-26 7:10 PM, Björn Smedman wrote:
>>> I think there are some (in theory) simple improvements that can be
>>> done to the tx aggregation / rate control logic. A proof of concept of
>>> one such improvement is provided below. Basically, it's a hack that
>> I think it makes sense to rely less on on-chip MRR for fallback, but I
>> think to make this workable, we really should use the MRR table for
>> something, otherwise the rate control algorithm will take much longer to
>> adapt.
>> It's probably better to fix this properly after I'm done with my A-MPDU
>> rewrite, because then I can more easily push parts of the software
>> retransmission behaviour into minstrel_ht directly.
>
> Sounds very reasonable. I'm sure you've thought of it but now that
> it's fresh in my head it would be great if the new aggregation design
> allowed us to experiment with stuff like this:
>
> * The rate control logic treats the average aggregate length as a
> measured independent variable, when in fact it depends heavily on the
> rates selected (via the 4 ms txop limit).
Yes, with the new design maybe we could use the initial rate lookup only
for setting the sampling flag, and then do a separate per-AMPDU lookup,
which properly takes the AMPDU length into account.

> * When tx is aggregated most rate control probe frames end up inside
> aggregates and are never used for probing (effective probe frequency
> is divided by average aggregate length).
Nope, a probing frame never ends up inside an aggregate. It's always
sent out as a single frame, which is why I had to make the decision
about sending a probing frame more complex in minstrel_ht, compared to
minstrel: the previous 10% sampling rule was limiting aggregation size.

> * When setting up a hardware MRR for an aggregate the focus should be
> on throughput (as explained earlier in this thread). But there are
> situations when reliability is important: e.g. when a subframe in the
> aggregate is about to expire (because of time or block ack window). It
> may even be advantageous to tx the subframes that are about to expire
> in their own aggregate with lower / more reliable bitrate?
Yes, that's what I was thinking as well. We should probably make this
decision based on the number of sw-retransmitted frames, and maybe
consider the offset of seqno vs baw_tail as well.

> * In many busy radio environments the packet success rate depends very
> much on the protection method being used (none, cts-to-self or
> rts-cts), often more so than on the bitrate itself. It would be
> interesting to experiment with including the protection method in the
> rate selection, i.e. to probe for the optimal protection method and
> bitrate combination.
Sounds good.

> * In order to have the best possible rate control in very dynamic rf
> environments it's important to keep the hardware queue short and
> select rates as late as possible (to not introduce unnecessary delay
> when selecting new rates). I have no idea how to do this but it would
> be great if the tx queue could be kept long enough to never stall tx,
> but no longer.
This would work with what I suggested above: per-AMPDU rate lookup.
With software scheduling that's easy to do, since we already restrict
the queue to a maximum of 2 AMPDUs.

> * If I understand correctly the Atheros hardware does not adjust the
> rts / cts-to-self duration field when going through the MRR
> (correct?). In that case it may be even more advantageous to use
> software retry as much as possible when some form of protection is
> enabled.
Not sure, but I think it does adjust the duration field according to the
rate while transmitting.

> Looking forward to the new aggregation code!
That will still take some time. I recently came up with some better
design ideas, which require some larger changes to the code I've
already written.

- Felix
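
One way the expiry decision Felix describes could look, as a hedged sketch:
decide per subframe whether it should go into a separate, more reliable
aggregate, using its software retry count and how far its sequence number
lags the newest one in the block-ack window. 802.11 sequence numbers are 12
bits wide and wrap at 4096, so the distance is taken modulo 4096. `baw_age`,
`needs_reliable_rate`, `tail_seqno` and both thresholds are hypothetical,
and the sketch does not model ath9k's actual `baw_tail` bookkeeping.

```c
#include <stdbool.h>
#include <stdint.h>

#define SEQ_MODULO 4096u /* 802.11 sequence numbers are 12 bits */

/* How many sequence numbers this subframe lags the newest one, mod 4096. */
static unsigned baw_age(uint16_t seqno, uint16_t tail_seqno)
{
	return ((unsigned)tail_seqno - (unsigned)seqno) & (SEQ_MODULO - 1u);
}

/*
 * Hypothetical heuristic: move a subframe into its own, more robust
 * aggregate once it has been software-retried too often, or once it lags
 * so far behind the newest subframe that it is what keeps the block-ack
 * window from advancing.
 */
static bool needs_reliable_rate(uint16_t seqno, uint16_t tail_seqno,
				unsigned sw_retries, unsigned max_sw_retries,
				unsigned max_age)
{
	return sw_retries >= max_sw_retries ||
	       baw_age(seqno, tail_seqno) >= max_age;
}
```

For example, `baw_age(4090, 5)` is 11, because the distance wraps around at
4096 instead of going negative.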

* Re: [ath9k-devel] [RFC] ath9k: improve aggregation throughput by using only first rate
From: Björn Smedman @ 2010-07-26 20:37 UTC
To: Felix Fietkau; +Cc: ath9k-devel, linux-wireless

2010/7/26 Felix Fietkau <nbd@openwrt.org>:
> On 2010-07-26 9:23 PM, Björn Smedman wrote:
>> 2010/7/26 Felix Fietkau <nbd@openwrt.org>:
>> * When tx is aggregated most rate control probe frames end up inside
>> aggregates and are never used for probing (effective probe frequency
>> is divided by average aggregate length).
> Nope, a probing frame never ends up inside an aggregate. It's always
> sent out as a single frame, which is why I had to make the decision
> about sending a probing frame more complex in minstrel_ht, compared to
> minstrel - the previous 10% stuff was limiting aggregation size.

OK, I must have jumped to conclusions. I looked quickly at the code
and had the impression that it only cared about the RATE_PROBE flag if
it was on the first subframe of the aggregate, and then I compared
debug output from rc and xmit like this:

root@OpenWrt:/sys/kernel/debug# cat ieee80211/phy0/stations/00\:1e\:52\:c7\:cf\:63/rc_stats ; cat ath9k/phy0/xmit
type        rate   throughput  ewma prob  this prob  this succ/attempt  success  attempts
HT20/LGI    MCS0          5.8       87.3       50.0             0(  0)       48        54
HT20/LGI    MCS1         12.6       94.6      100.0             0(  0)       46        48
HT20/LGI    MCS2         18.9       95.8      100.0             0(  0)       52        73
HT20/LGI    MCS3         24.8       94.8      100.0             0(  0)       53        62
HT20/LGI    MCS4         38.4       99.2      100.0             0(  0)       45        55
HT20/LGI    MCS5         47.4       94.0      100.0             0(  0)       56        72
HT20/LGI    MCS6         55.4       98.7      100.0             0(  0)       60        78
HT20/LGI   PMCS7         56.2       88.8       66.6             0(  0)      112       143
HT20/LGI    MCS8         10.8       81.4       50.0             0(  0)       50        62
HT20/LGI    MCS9         23.6       90.4      100.0             0(  0)       66        81
HT20/LGI    MCS10        30.6       79.0       50.0             0(  0)       51        64
HT20/LGI    MCS11        50.1       99.2      100.0             0(  0)       56        63
HT20/LGI    MCS12        60.1       80.6      100.0             0(  0)      217       382
HT20/LGI    MCS13        66.6       70.6       50.0             0(  0)     2440      3042
HT20/LGI  t MCS14        82.9       77.9       65.9             0(  0)    70446     86949
HT20/LGI T  MCS15        85.5       73.5       77.1           264(342)    31170     43240

Total packet count::    ideal 117093      lookaround 1322
Average A-MPDU length: 10.6

                        BE      BK      VI      VO
MPDUs Queued:          120       0       0     224
MPDUs Completed:       120       0       0     224
Aggregates:           7555       0       0       0
AMPDUs Queued:      118358       0       0      50
AMPDUs Completed:   118247       0       0      20
AMPDUs Retried:      15406       0       0     300
AMPDUs XRetried:        21       0       0      30
FIFO Underrun:           0       0       0       0
TXOP Exceeded:           0       0       0       0
TXTIMER Expiry:          0       0       0       0
DESC CFG Error:          0       0       0       0
DATA Underrun:           0       0       0       0
DELIM Underrun:          0       0       0       0

Rate control reports 1322 lookaround frames (= probe frames?), but ath9k
xmit reports only 120 + 224 non-aggregated MPDUs, so most probe frames
apparently went out inside aggregates.

/Björn

* Re: [ath9k-devel] [RFC] ath9k: improve aggregation throughput by using only first rate
From: Felix Fietkau @ 2010-07-26 20:41 UTC
To: Björn Smedman; +Cc: ath9k-devel, linux-wireless

On 2010-07-26 10:37 PM, Björn Smedman wrote:
> 2010/7/26 Felix Fietkau <nbd@openwrt.org>:
>> On 2010-07-26 9:23 PM, Björn Smedman wrote:
>>> 2010/7/26 Felix Fietkau <nbd@openwrt.org>:
>>> * When tx is aggregated most rate control probe frames end up inside
>>> aggregates and are never used for probing (effective probe frequency
>>> is divided by average aggregate length).
>> Nope, a probing frame never ends up inside an aggregate. It's always
>> sent out as a single frame, which is why I had to make the decision
>> about sending a probing frame more complex in minstrel_ht, compared to
>> minstrel - the previous 10% stuff was limiting aggregation size.
>
> Ok, I must have jumped to conclusions. I looked quickly at the code
> and had the impression that it only cared about the RATE_PROBE flag if
> it was on the first subframe of the aggregate, and then I compared
> debug output from rc and xmit like this:

Oh, wait. It seems that you may be right after all. I think I was
remembering stuff from the wrong codebase again.
Well, at least what I described is what I think the code should be doing ;)

- Felix

* Re: [ath9k-devel] [RFC] ath9k: improve aggregation throughput by using only first rate
From: Ranga Rao Ravuri @ 2010-07-27 4:48 UTC
To: Felix Fietkau
Cc: Björn Smedman, ath9k-devel@lists.ath9k.org, linux-wireless

On 07/27/2010 01:11 AM, Felix Fietkau wrote:
> On 2010-07-26 9:23 PM, Björn Smedman wrote:
> [...]
>> * If I understand correctly the Atheros hardware does not adjust the
>> rts / cts-to-self duration field when going through the MRR
>> (correct?). In that case it may be even more advantageous to use
>> software retry as much as possible when some form of protection is
>> enabled.
> Not sure, but I think it does adjust the duration field according to the
> rate, while transmitting.
[ranga] Yes, it does. If you enable RTS on all rates, you would see RTS
frames going out with different durations.
> [...]
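
The reason the duration would differ per MRR entry is that, in the basic
case, the RTS Duration/ID field reserves three SIFS intervals plus the CTS,
the data PPDU and the ACK (or BlockAck), so a slower fallback rate needs a
longer reservation. This is a simplified sketch: the fixed-preamble airtime
model, the function names and all numeric inputs are placeholders, not HT
PPDU timing.

```c
#include <stdint.h>

/*
 * Simplified data airtime in us: a fixed preamble plus the payload bits at
 * rate_kbps, rounded up. Real HT PPDU timing (symbols, padding) is ignored.
 * rate_kbps must be non-zero.
 */
static uint32_t data_airtime_us(uint32_t bytes, uint32_t rate_kbps,
				uint32_t preamble_us)
{
	uint64_t bits = (uint64_t)bytes * 8u;

	/* bits / (kbit/s) gives ms, so scale by 1000 for us */
	return preamble_us +
	       (uint32_t)((bits * 1000u + rate_kbps - 1u) / rate_kbps);
}

/* RTS Duration/ID: three SIFS plus the CTS, the data PPDU and the ACK/BA. */
static uint32_t rts_duration_us(uint32_t sifs_us, uint32_t cts_us,
				uint32_t data_us, uint32_t ack_us)
{
	return 3u * sifs_us + cts_us + data_us + ack_us;
}
```

For example, with 3000 bytes, a 40 us preamble, 10 us SIFS and 44 us for
both CTS and ACK (all placeholder values), `rts_duration_us` gives 343 us
for the 130 Mbit/s entry and 3851 us for the 6.5 Mbit/s entry.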