* Re: [ath9k-devel] [RFC] ath9k: improve aggregation throughput by using only first rate
@ 2010-07-26 19:41 ` Felix Fietkau
0 siblings, 0 replies; 14+ messages in thread
From: Felix Fietkau @ 2010-07-26 19:41 UTC (permalink / raw)
To: Björn Smedman; +Cc: ath9k-devel, linux-wireless
On 2010-07-26 9:23 PM, Björn Smedman wrote:
> 2010/7/26 Felix Fietkau <nbd@openwrt.org>:
>> On 2010-07-26 7:10 PM, Björn Smedman wrote:
>>> I think there are some (in theory) simple improvements that can be
>>> done to the tx aggregation / rate control logic. A proof of concept of
>>> one such improvement is provided below. Basically, it's a hack that
>> I think it makes sense to rely less on on-chip MRR for fallback, but I
>> think to make this workable, we really should use the MRR table for
>> something, otherwise the rate control algorithm will take much longer to
>> adapt.
>> It's probably better to fix this properly after I'm done with my A-MPDU
>> rewrite, because then I can more easily push parts of the software
>> retransmission behaviour into minstrel_ht directly.
>
> Sounds very reasonable. I'm sure you've thought of it but now that
> it's fresh in my head it would be great if the new aggregation design
> allowed us to experiment with stuff like this:
>
> * The rate control logic treats the average aggregate length as a
> measured independent variable, when in fact it depends heavily on the
> rates selected (via the 4 ms txop limit).
Yes, with the new design maybe we could use the initial rate lookup only
for setting the sampling flag, and then do a separate per-AMPDU
lookup, which properly takes the AMPDU length into account.
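
To illustrate why the aggregate length is not independent of the rate, here is a rough sketch of how many subframes fit into one 4 ms txop at a given PHY rate. The subframe size and the function name are assumptions made for illustration; the model ignores preamble, MPDU delimiters, padding and block-ack time, so it is an upper bound and not what ath9k actually computes.

```python
# Rough model only: ignores preamble, delimiters, padding and block-ack time.
TXOP_LIMIT_US = 4000     # the 4 ms txop limit mentioned above
BA_WINDOW = 64           # at most 64 subframes per block-ack window
SUBFRAME_BYTES = 1500    # hypothetical subframe size, chosen for illustration


def max_subframes(rate_mbps, subframe_bytes=SUBFRAME_BYTES):
    """Upper bound on subframes that fit into one txop at this PHY rate."""
    bits_in_txop = TXOP_LIMIT_US * rate_mbps      # Mbit/s * us = bits
    fit = int(bits_in_txop // (8 * subframe_bytes))
    return max(1, min(fit, BA_WINDOW))


# HT20 long-GI PHY rates for MCS0, MCS7 and MCS15
for name, rate in (("MCS0", 6.5), ("MCS7", 65.0), ("MCS15", 130.0)):
    print(name, max_subframes(rate))
```

In this model the low rates allow only a couple of subframes per aggregate, so a per-AMPDU lookup would see very different aggregate lengths depending on the rate it picks.
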
> * When tx is aggregated most rate control probe frames end up inside
> aggregates and are never used for probing (effective probe frequency
> is divided by average aggregate length).
Nope, a probing frame never ends up inside an aggregate. It's always
sent out as a single frame, which is why I had to make the decision
about sending a probing frame more complex in minstrel_ht, compared to
minstrel - the previous 10% stuff was limiting aggregation size.
> * When setting up a hardware MRR for an aggregate the focus should be
> on throughput (as explained earlier in this thread). But there are
> situations when reliability is important: e.g. when a subframe in the
> aggregate is about to expire (because of time or block ack window). It
> may even be advantageous to tx the subframes that are about to expire
> in their own aggregate with lower / more reliable bitrate?
Yes, that's what I was thinking as well. We should probably make this
decision based on the number of sw-retransmitted frames, and maybe
consider the offset of seqno vs baw_tail as well.
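
A minimal sketch of such a decision, assuming hypothetical thresholds and a hypothetical `tail_seqno` argument that stands in for the sequence number at baw_tail. It only shows the modulo-4096 arithmetic and the shape of the check, not the driver's actual data structures.

```python
SEQ_MODULO = 4096   # 802.11 sequence numbers are 12 bits wide


def seq_distance(newer, older):
    """How far 'newer' is ahead of 'older', with wrap-around at 4096."""
    return (newer - older) % SEQ_MODULO


def wants_reliable_rate(sw_retries, tail_seqno, seqno,
                        max_retries=3, max_lag=48):
    """True if the subframe looks close to expiring.

    max_retries and max_lag are made-up thresholds; max_lag is meant to
    sit somewhat below the 64-frame block-ack window.
    """
    lag = seq_distance(tail_seqno, seqno)
    return sw_retries >= max_retries or lag >= max_lag


# A subframe 50 sequence numbers behind the tail, across the wrap-around
print(wants_reliable_rate(sw_retries=0, tail_seqno=44, seqno=4090))
```
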
> * In many busy radio environments the packet success rate depends very
> much on the protection method being used (none, cts-to-self or
> rts-cts), often more so than on the bitrate itself. It would be
> interesting to experiment with including the protection method in the
> rate selection, i.e. to probe for the optimal protection method and
> bitrate combination.
Sounds good.
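
One way to picture this is to treat each (protection, rate) pair as its own candidate and rank the candidates by expected throughput. The sketch below does that with invented success probabilities and invented per-method overheads; none of the numbers are measurements, and the function names are hypothetical.

```python
def expected_throughput(prob, rate_mbps, overhead_us, payload_bytes=15000):
    """Delivered Mbit/s for one aggregate, given success probability."""
    airtime_us = 8 * payload_bytes / rate_mbps
    return prob * 8 * payload_bytes / (airtime_us + overhead_us)


def best_combination(success_prob, overhead_us):
    """Pick the (protection, rate) key with the highest expected throughput."""
    return max(
        success_prob,
        key=lambda k: expected_throughput(success_prob[k], k[1],
                                          overhead_us[k[0]]),
    )


# Invented numbers: protection costs airtime but may raise the success rate.
overhead = {"none": 0, "cts-to-self": 60, "rts-cts": 120}
stats = {
    ("none", 130.0): 0.40,
    ("rts-cts", 130.0): 0.85,
    ("none", 65.0): 0.70,
    ("rts-cts", 65.0): 0.90,
}
print(best_combination(stats, overhead))
```

With these made-up inputs the fastest rate with rts-cts wins, even though the same rate without protection ranks below a slower one.
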
> * In order to have the best possible rate control in very dynamic rf
> environments it's important to keep the hardware queue short and
> select rates as late as possible (to not introduce unnecessary delay
> when selecting new rates). I have no idea how to do this but it would
> be great if the tx queue could be kept long enough to never stall tx,
> but no longer.
This would work with what I suggested above - per-AMPDU rate lookup.
With software scheduling that's easy to do, since we already restrict
the queue to max. 2 AMPDUs.
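
The idea of a late, per-AMPDU rate lookup with a short hardware queue can be sketched as a toy scheduler. The class, its method names and the `rate_lookup` callback are hypothetical; only the depth limit of 2 comes from the message above.

```python
from collections import deque

HW_QUEUE_DEPTH = 2   # "max. 2 AMPDUs" from the message above


class TxScheduler:
    """Toy model: rates are chosen only when an aggregate enters the hw queue."""

    def __init__(self, rate_lookup):
        self.rate_lookup = rate_lookup   # hypothetical: n_subframes -> rate
        self.pending = deque()           # software queue, no rate chosen yet
        self.hw_queue = deque()          # (frames, rate) handed to hardware

    def enqueue(self, frames):
        self.pending.append(frames)
        self._fill_hw_queue()

    def tx_complete(self):
        done = self.hw_queue.popleft()
        self._fill_hw_queue()
        return done

    def _fill_hw_queue(self):
        while self.pending and len(self.hw_queue) < HW_QUEUE_DEPTH:
            frames = self.pending.popleft()
            rate = self.rate_lookup(len(frames))   # looked up as late as possible
            self.hw_queue.append((frames, rate))


current = {"rate": 130.0}
sched = TxScheduler(lambda n_subframes: current["rate"])
for i in range(3):
    sched.enqueue([i])
current["rate"] = 65.0        # rate control changes its mind
first = sched.tx_complete()   # frees a slot; the third aggregate gets the new rate
```

Only the aggregates already in the hardware queue carry the stale rate; everything still in software picks up the new one.
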
> * If I understand correctly the Atheros hardware does not adjust the
> rts / cts-to-self duration field when going through the MRR
> (correct?). In that case it may be even more advantageous to use
> software retry as much as possible when some form of protection is
> enabled.
Not sure, but I think it does adjust the duration field according to the
rate, while transmitting.
> Looking forward to the new aggregation code!
That will still take some time, I recently came up with some better
design ideas, which require some larger changes to the code that I
already wrote.
- Felix
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [ath9k-devel] [RFC] ath9k: improve aggregation throughput by using only first rate
@ 2010-07-26 20:37 ` Björn Smedman
0 siblings, 0 replies; 14+ messages in thread
From: Björn Smedman @ 2010-07-26 20:37 UTC (permalink / raw)
To: Felix Fietkau; +Cc: ath9k-devel, linux-wireless
2010/7/26 Felix Fietkau <nbd@openwrt.org>:
> On 2010-07-26 9:23 PM, Björn Smedman wrote:
>> 2010/7/26 Felix Fietkau <nbd@openwrt.org>:
>> * When tx is aggregated most rate control probe frames end up inside
>> aggregates and are never used for probing (effective probe frequency
>> is divided by average aggregate length).
> Nope, a probing frame never ends up inside an aggregate. It's always
> sent out as a single frame, which is why I had to make the decision
> about sending a probing frame more complex in minstrel_ht, compared to
> minstrel - the previous 10% stuff was limiting aggregation size.
Ok, I must have jumped to conclusions. I looked quickly at the code
and had the impression that it only cared about the RATE_PROBE flag if
it was on the first subframe of the aggregate, and then I compared
debug output from rc and xmit like this:
root@OpenWrt:/sys/kernel/debug# cat ieee80211/phy0/stations/00\:1e\:52\:c7\:cf\:63/rc_stats ; cat ath9k/phy0/xmit
type     rate     throughput ewma prob this prob this succ/attempt  success attempts
HT20/LGI    MCS0         5.8      87.3      50.0             0( 0)       48       54
HT20/LGI    MCS1        12.6      94.6     100.0             0( 0)       46       48
HT20/LGI    MCS2        18.9      95.8     100.0             0( 0)       52       73
HT20/LGI    MCS3        24.8      94.8     100.0             0( 0)       53       62
HT20/LGI    MCS4        38.4      99.2     100.0             0( 0)       45       55
HT20/LGI    MCS5        47.4      94.0     100.0             0( 0)       56       72
HT20/LGI    MCS6        55.4      98.7     100.0             0( 0)       60       78
HT20/LGI   PMCS7        56.2      88.8      66.6             0( 0)      112      143
HT20/LGI    MCS8        10.8      81.4      50.0             0( 0)       50       62
HT20/LGI    MCS9        23.6      90.4     100.0             0( 0)       66       81
HT20/LGI    MCS10       30.6      79.0      50.0             0( 0)       51       64
HT20/LGI    MCS11       50.1      99.2     100.0             0( 0)       56       63
HT20/LGI    MCS12       60.1      80.6     100.0             0( 0)      217      382
HT20/LGI    MCS13       66.6      70.6      50.0             0( 0)     2440     3042
HT20/LGI  t MCS14       82.9      77.9      65.9             0( 0)    70446    86949
HT20/LGI T  MCS15       85.5      73.5      77.1          264(342)    31170    43240
Total packet count:: ideal 117093 lookaround 1322
Average A-MPDU length: 10.6
                      BE     BK     VI     VO
MPDUs Queued:        120      0      0    224
MPDUs Completed:     120      0      0    224
Aggregates:         7555      0      0      0
AMPDUs Queued:    118358      0      0     50
AMPDUs Completed: 118247      0      0     20
AMPDUs Retried:    15406      0      0    300
AMPDUs XRetried:      21      0      0     30
FIFO Underrun:         0      0      0      0
TXOP Exceeded:         0      0      0      0
TXTIMER Expiry:        0      0      0      0
DESC CFG Error:        0      0      0      0
DATA Underrun:         0      0      0      0
DELIM Underrun:        0      0      0      0
Rate control says 1322 lookaround (=probe frames?) but ath9k xmit says
only 120 + 224 MPDUs.
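
The arithmetic behind that comparison, as a short sketch. It assumes that "lookaround" counts probe frames and that "MPDUs Queued" counts frames sent outside aggregates; both are readings of the debug output, not confirmed facts.

```python
# Counters copied from the rc_stats and ath9k xmit output above.
ideal = 117093
lookaround = 1322
single_mpdus = 120 + 224          # "MPDUs Queued", BE + VO

# Share of packets that rate control marked as lookaround (sampling).
sample_share = lookaround / (ideal + lookaround)

# If every single MPDU had been a probe, this many probes are still
# unaccounted for and would have gone out inside aggregates.
min_probes_in_aggregates = max(0, lookaround - single_mpdus)

print(f"lookaround share: {sample_share:.2%}")
print(f"single MPDUs: {single_mpdus}")
print(f"probes not sent as single frames (at least): {min_probes_in_aggregates}")
```
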
/Björn
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [ath9k-devel] [RFC] ath9k: improve aggregation throughput by using only first rate
@ 2010-07-26 20:41 ` Felix Fietkau
0 siblings, 0 replies; 14+ messages in thread
From: Felix Fietkau @ 2010-07-26 20:41 UTC (permalink / raw)
To: Björn Smedman; +Cc: ath9k-devel, linux-wireless
On 2010-07-26 10:37 PM, Björn Smedman wrote:
> 2010/7/26 Felix Fietkau <nbd@openwrt.org>:
>> On 2010-07-26 9:23 PM, Björn Smedman wrote:
>>> 2010/7/26 Felix Fietkau <nbd@openwrt.org>:
>>> * When tx is aggregated most rate control probe frames end up inside
>>> aggregates and are never used for probing (effective probe frequency
>>> is divided by average aggregate length).
>> Nope, a probing frame never ends up inside an aggregate. It's always
>> sent out as a single frame, which is why I had to make the decision
>> about sending a probing frame more complex in minstrel_ht, compared to
>> minstrel - the previous 10% stuff was limiting aggregation size.
>
> Ok, I must have jumped to conclusions. I looked quickly at the code
> and had the impression that it only cared about the RATE_PROBE flag if
> it was on the first subframe of the aggregate, and then I compared
> debug output from rc and xmit like this:
Oh, wait. It seems that you may be right after all. I think I was
remembering stuff from the wrong codebase again. Well, at least what I
described is what I think the code should be doing ;)
- Felix
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [ath9k-devel] [RFC] ath9k: improve aggregation throughput by using only first rate
@ 2010-07-27 4:48 ` Ranga Rao Ravuri
0 siblings, 0 replies; 14+ messages in thread
From: Ranga Rao Ravuri @ 2010-07-27 4:48 UTC (permalink / raw)
To: Felix Fietkau
Cc: Björn Smedman, ath9k-devel@lists.ath9k.org, linux-wireless
On 07/27/2010 01:11 AM, Felix Fietkau wrote:
> On 2010-07-26 9:23 PM, Björn Smedman wrote:
>> 2010/7/26 Felix Fietkau <nbd@openwrt.org>:
>>> On 2010-07-26 7:10 PM, Björn Smedman wrote:
>>>> I think there are some (in theory) simple improvements that can be
>>>> done to the tx aggregation / rate control logic. A proof of concept of
>>>> one such improvement is provided below. Basically, it's a hack that
>>> I think it makes sense to rely less on on-chip MRR for fallback, but I
>>> think to make this workable, we really should use the MRR table for
>>> something, otherwise the rate control algorithm will take much longer to
>>> adapt.
>>> It's probably better to fix this properly after I'm done with my A-MPDU
>>> rewrite, because then I can more easily push parts of the software
>>> retransmission behaviour into minstrel_ht directly.
>> Sounds very reasonable. I'm sure you've thought of it but now that
>> it's fresh in my head it would be great if the new aggregation design
>> allowed us to experiment with stuff like this:
>>
>> * The rate control logic treats the average aggregate length as a
>> measured independent variable, when in fact it depends heavily on the
>> rates selected (via the 4 ms txop limit).
> Yes, with the new design maybe we could use the initial rate lookup only
> for setting the sampling flag, and then do a separate per-AMPDU
> lookup, which properly takes the AMPDU length into account.
>
>> * When tx is aggregated most rate control probe frames end up inside
>> aggregates and are never used for probing (effective probe frequency
>> is divided by average aggregate length).
> Nope, a probing frame never ends up inside an aggregate. It's always
> sent out as a single frame, which is why I had to make the decision
> about sending a probing frame more complex in minstrel_ht, compared to
> minstrel - the previous 10% stuff was limiting aggregation size.
>
>> * When setting up a hardware MRR for an aggregate the focus should be
>> on throughput (as explained earlier in this thread). But there are
>> situations when reliability is important: e.g. when a subframe in the
>> aggregate is about to expire (because of time or block ack window). It
>> may even be advantageous to tx the subframes that are about to expire
>> in their own aggregate with lower / more reliable bitrate?
> Yes, that's what I was thinking as well. We should probably make this
> decision based on the number of sw-retransmitted frames, and maybe
> consider the offset of seqno vs baw_tail as well.
>
>> * In many busy radio environments the packet success rate depends very
>> much on the protection method being used (none, cts-to-self or
>> rts-cts), often more so than on the bitrate itself. It would be
>> interesting to experiment with including the protection method in the
>> rate selection, i.e. to probe for the optimal protection method and
>> bitrate combination.
> Sounds good.
>
>> * In order to have the best possible rate control in very dynamic rf
>> environments it's important to keep the hardware queue short and
>> select rates as late as possible (to not introduce unnecessary delay
>> when selecting new rates). I have no idea how to do this but it would
>> be great if the tx queue could be kept long enough to never stall tx,
>> but no longer.
> This would work with what I suggested above - per-AMPDU rate lookup.
> With software scheduling that's easy to do, since we already restrict
> the queue to max. 2 AMPDUs.
>
>> * If I understand correctly the Atheros hardware does not adjust the
>> rts / cts-to-self duration field when going through the MRR
>> (correct?). In that case it may be even more advantageous to use
>> software retry as much as possible when some form of protection is
>> enabled.
> Not sure, but I think it does adjust the duration field according to the
> rate, while transmitting.
[ranga] Yes, it does. If you enable RTS on all rates, you will see RTS
frames sent with different duration values.
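
For reference, the duration carried in an RTS covers the CTS, the data frame and its acknowledgement plus three SIFS intervals, so it has to change when the data rate changes. The sketch below shows that dependency; all timing constants are assumptions (5 GHz OFDM timing, control frames at 6 Mbit/s, a fixed preamble estimate), and the function names are hypothetical.

```python
import math

SIFS_US = 16        # assumption: 5 GHz OFDM timing (10 us in 2.4 GHz)
CTS_US = 44         # assumption: 14-byte CTS at 6 Mbit/s OFDM
BLOCK_ACK_US = 68   # assumption: 32-byte compressed block ack at 6 Mbit/s
PREAMBLE_US = 36    # assumption: rough HT preamble estimate


def data_airtime_us(n_bytes, rate_mbps):
    """Approximate airtime of the data frame or aggregate."""
    return PREAMBLE_US + math.ceil(8 * n_bytes / rate_mbps)


def rts_duration_us(n_bytes, rate_mbps):
    """Duration value for an RTS: CTS + data + block ack + 3 * SIFS."""
    return (3 * SIFS_US + CTS_US
            + data_airtime_us(n_bytes, rate_mbps) + BLOCK_ACK_US)


# Same 15000-byte aggregate at two MRR stages: the RTS duration differs.
for rate in (130.0, 65.0):
    print(rate, rts_duration_us(15000, rate))
```
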
>> Looking forward to the new aggregation code!
> That will still take some time, I recently came up with some better
> design ideas, which require some larger changes to the code that I
> already wrote.
>
> - Felix
^ permalink raw reply [flat|nested] 14+ messages in thread