* bug: iwlwifi, aggregation, and mac80211's reorder buffer @ 2011-03-11 8:11 Daniel Halperin 2011-03-11 8:13 ` Guy, Wey-Yi 2011-03-14 16:24 ` Johannes Berg 0 siblings, 2 replies; 16+ messages in thread From: Daniel Halperin @ 2011-03-11 8:11 UTC (permalink / raw) To: ipw3945-devel, linux-wireless I'm doing some performance debugging of iwlwifi. My test setup has one machine with iwlwifi-5300 (3 TX streams) and ar9280 (2 TX streams) connected to a 3-stream iwl5300 AP. There's no/not much external interference in my setup. I'm doing bandwidth tests using iperf. Both cards gets something like 200 Mbps using UDP, and Atheros gets 130--150 Mbps using TCP while iwlwifi gets ~60 Mbps(!!). This seems bad, especially since iwl5300 has 3 TX streams. And they're connecting from the same computer to the same AP while getting similar UDP performance, so I don't think it's a hardware difference. In looking at various TCP effects, I noticed that with ath9k+minstrel_ht, there are never any TCP retransmissions or selective acknowledgements. However, with iwlwifi these are rampant and large bursts of these are highly correlated with network-layer dead time even up to ~100ms(!!). I first suspected rate selection, so I hacked iwlwifi to only use MCS that work very well for my test link. I confirmed that there is very little loss in my experiments. I then moved on to aggregation. One thing that Intel does that ath9k does not is transmit packets out of sequence number order inside a batch. (This is legal in the 802.11 standard). I figured that one explanation for the TCP SACKs would be if, somehow, frames got released to the network stack out of order; indeed, many of the "holes" covered by the SACKs are filled quickly (within ~4ms, about the length of one aggregation batch). Note that iwlwifi defaults to an aggregation frame limit, hence buffer size, of 31 frames. mac80211 honors this buffer size specification by releasing frames to the network stack that are >= 31 sequence numbers below the highest received frame. It looks like Intel doesn't honor its own frame limit, as I often saw it have more than 31 frames outstanding, causing mac80211 on the receiver to release many frames early. Changing iwlwifi's default agg limit to 63 frames on both ends dramatically reduced the prevalence of SACKs/TCP retransmissions and improved avg TCP performance to ~100 Mbps (ranging 83-110). A few questions: (1) Why does iwlwifi default to an aggregation frame limit of 31? I didn't see any negative effects from 63 frame limit and performance improved dramatically (2) Is there a way to make iwlwifi honor the aggregation limit? I know that agg is controlled by a hardware scheduler, so this may be difficult. (3) mac80211's reorder buffer code could probably also be relaxed. It probably wouldn't hurt to have the buffer be twice the transmitter's advertised buffer, and might compensate for devices that don't properly honor frame limits. Also, mac80211 only flushes the reorder buffer every 100 ms. That seems like a LONG time given that batches are limited to 4ms -- 100ms is room for at least 10, maybe 20 retransmissions to attempt to fill in the holes(!). (4) even after this fix, I see a few SACKs, and even when there aren't SACKs I still see TCP "dead time" up to ~100ms. What else would you use to debug this setup? Thanks Dan ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: bug: iwlwifi, aggregation, and mac80211's reorder buffer 2011-03-11 8:11 bug: iwlwifi, aggregation, and mac80211's reorder buffer Daniel Halperin @ 2011-03-11 8:13 ` Guy, Wey-Yi 2011-03-13 19:59 ` Daniel Halperin 2011-03-14 16:24 ` Johannes Berg 1 sibling, 1 reply; 16+ messages in thread From: Guy, Wey-Yi @ 2011-03-11 8:13 UTC (permalink / raw) To: Daniel Halperin Cc: ipw3945-devel@lists.sourceforge.net, linux-wireless@vger.kernel.org Hi Daniel, On Fri, 2011-03-11 at 00:11 -0800, Daniel Halperin wrote: > I'm doing some performance debugging of iwlwifi. My test setup has > one machine with iwlwifi-5300 (3 TX streams) and ar9280 (2 TX streams) > connected to a 3-stream iwl5300 AP. There's no/not much external > interference in my setup. I'm doing bandwidth tests using iperf. > Both cards gets something like 200 Mbps using UDP, and Atheros gets > 130--150 Mbps using TCP while iwlwifi gets ~60 Mbps(!!). This seems > bad, especially since iwl5300 has 3 TX streams. And they're > connecting from the same computer to the same AP while getting similar > UDP performance, so I don't think it's a hardware difference. > > In looking at various TCP effects, I noticed that with > ath9k+minstrel_ht, there are never any TCP retransmissions or > selective acknowledgements. However, with iwlwifi these are rampant > and large bursts of these are highly correlated with network-layer > dead time even up to ~100ms(!!). > > I first suspected rate selection, so I hacked iwlwifi to only use MCS > that work very well for my test link. I confirmed that there is very > little loss in my experiments. I then moved on to aggregation. > > One thing that Intel does that ath9k does not is transmit packets out > of sequence number order inside a batch. (This is legal in the 802.11 > standard). I figured that one explanation for the TCP SACKs would be > if, somehow, frames got released to the network stack out of order; > indeed, many of the "holes" covered by the SACKs are filled quickly > (within ~4ms, about the length of one aggregation batch). Note that > iwlwifi defaults to an aggregation frame limit, hence buffer size, of > 31 frames. mac80211 honors this buffer size specification by > releasing frames to the network stack that are >= 31 sequence numbers > below the highest received frame. > > It looks like Intel doesn't honor its own frame limit, as I often saw > it have more than 31 frames outstanding, causing mac80211 on the > receiver to release many frames early. Changing iwlwifi's default agg > limit to 63 frames on both ends dramatically reduced the prevalence of > SACKs/TCP retransmissions and improved avg TCP performance to ~100 > Mbps (ranging 83-110). > > A few questions: > > (1) Why does iwlwifi default to an aggregation frame limit of 31? I > didn't see any negative effects from 63 frame limit and performance > improved dramatically if I remember correctly, we change from 63 to 31 while we have some 11n performance issue. even later we found out frame limit is not the main reason of low tpt, but we did not change it back since at the time we did not see any performance different, I believe we can use different frame limit, but I will prefer make it more flexible, maybe something could be change by either module parameter or debugfs. Also I am not sure are there any behavior differences between different devices and different versions of ucode. > (2) Is there a way to make iwlwifi honor the aggregation limit? I > know that agg is controlled by a hardware scheduler, so this may be > difficult. Agree, I will try to find more information from our PHY engineer. > (3) mac80211's reorder buffer code could probably also be relaxed. It > probably wouldn't hurt to have the buffer be twice the transmitter's > advertised buffer, and might compensate for devices that don't > properly honor frame limits. Also, mac80211 only flushes the reorder > buffer every 100 ms. That seems like a LONG time given that batches > are limited to 4ms -- 100ms is room for at least 10, maybe 20 > retransmissions to attempt to fill in the holes(!). did you try it and do you have any data? Thanks Wey ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: bug: iwlwifi, aggregation, and mac80211's reorder buffer 2011-03-11 8:13 ` Guy, Wey-Yi @ 2011-03-13 19:59 ` Daniel Halperin 2011-03-14 0:47 ` wwguy 0 siblings, 1 reply; 16+ messages in thread From: Daniel Halperin @ 2011-03-13 19:59 UTC (permalink / raw) To: Guy, Wey-Yi Cc: ipw3945-devel@lists.sourceforge.net, linux-wireless@vger.kernel.org On Fri, Mar 11, 2011 at 12:13 AM, Guy, Wey-Yi <wey-yi.w.guy@intel.com> wrote: >> (1) Why does iwlwifi default to an aggregation frame limit of 31? I >> didn't see any negative effects from 63 frame limit and performance >> improved dramatically > if I remember correctly, we change from 63 to 31 while we have some 11n > performance issue. even later we found out frame limit is not the main > reason of low tpt, but we did not change it back since at the time we > did not see any performance different, I believe we can use different > frame limit, but I will prefer make it more flexible, maybe something > could be change by either module parameter or debugfs. Also I am not > sure are there any behavior differences between different devices and > different versions of ucode. 63 hasn't hurt me yet, though the scheduler still does occasionally send a 64th frame that shifts the reorder buffer's window. Is there a chance that 64 will work as a max? 63 seems an odd choice. >> (3) mac80211's reorder buffer code could probably also be relaxed. It >> probably wouldn't hurt to have the buffer be twice the transmitter's >> advertised buffer, and might compensate for devices that don't >> properly honor frame limits. Also, mac80211 only flushes the reorder >> buffer every 100 ms. That seems like a LONG time given that batches >> are limited to 4ms -- 100ms is room for at least 10, maybe 20 >> retransmissions to attempt to fill in the holes(!). Removing the check here <https://github.com/mirrors/linux-2.6/blob/master/net/mac80211/agg-rx.c#L231> essentially forces the receive buffer to be 64 frames long. This works just fine for me (iwl5300 TXer and RXer) and would seem to be appropriate for any transmitter window. If the transmitter uses a window of, say, 31 frames and also honors its limit, then I think the only wasted space would be 33 pointers to skb's at the transmitter side. I didn't see any degradation using an ath9k+minstrel_ht transmitter with this change. Dan ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: bug: iwlwifi, aggregation, and mac80211's reorder buffer 2011-03-13 19:59 ` Daniel Halperin @ 2011-03-14 0:47 ` wwguy 2011-03-14 18:16 ` Daniel Halperin 0 siblings, 1 reply; 16+ messages in thread From: wwguy @ 2011-03-14 0:47 UTC (permalink / raw) To: Daniel Halperin Cc: ipw3945-devel@lists.sourceforge.net, linux-wireless@vger.kernel.org On Sun, 2011-03-13 at 12:59 -0700, Daniel Halperin wrote: > On Fri, Mar 11, 2011 at 12:13 AM, Guy, Wey-Yi <wey-yi.w.guy@intel.com> wrote: > >> (1) Why does iwlwifi default to an aggregation frame limit of 31? I > >> didn't see any negative effects from 63 frame limit and performance > >> improved dramatically > > if I remember correctly, we change from 63 to 31 while we have some 11n > > performance issue. even later we found out frame limit is not the main > > reason of low tpt, but we did not change it back since at the time we > > did not see any performance different, I believe we can use different > > frame limit, but I will prefer make it more flexible, maybe something > > could be change by either module parameter or debugfs. Also I am not > > sure are there any behavior differences between different devices and > > different versions of ucode. > > 63 hasn't hurt me yet, though the scheduler still does occasionally > send a 64th frame that shifts the reorder buffer's window. Is there a > chance that 64 will work as a max? 63 seems an odd choice. it is limited by uCode, did not have chance to look at what the reason is yet. my initial guess without look at the code, 63 = 0x3F which use 6 bits and 64=0x40 is 7 bits, but I am not sure, I will check Thanks > > >> (3) mac80211's reorder buffer code could probably also be relaxed. It > >> probably wouldn't hurt to have the buffer be twice the transmitter's > >> advertised buffer, and might compensate for devices that don't > >> properly honor frame limits. Also, mac80211 only flushes the reorder > >> buffer every 100 ms. That seems like a LONG time given that batches > >> are limited to 4ms -- 100ms is room for at least 10, maybe 20 > >> retransmissions to attempt to fill in the holes(!). > > Removing the check here > <https://github.com/mirrors/linux-2.6/blob/master/net/mac80211/agg-rx.c#L231> > essentially forces the receive buffer to be 64 frames long. This > works just fine for me (iwl5300 TXer and RXer) and would seem to be > appropriate for any transmitter window. If the transmitter uses a > window of, say, 31 frames and also honors its limit, then I think the > only wasted space would be 33 pointers to skb's at the transmitter > side. I didn't see any degradation using an ath9k+minstrel_ht > transmitter with this change. > > Dan ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: bug: iwlwifi, aggregation, and mac80211's reorder buffer 2011-03-14 0:47 ` wwguy @ 2011-03-14 18:16 ` Daniel Halperin 2011-03-15 11:51 ` Johannes Berg 0 siblings, 1 reply; 16+ messages in thread From: Daniel Halperin @ 2011-03-14 18:16 UTC (permalink / raw) To: wwguy; +Cc: ipw3945-devel@lists.sourceforge.net, linux-wireless@vger.kernel.org On Sun, Mar 13, 2011 at 5:47 PM, wwguy <wey-yi.w.guy@intel.com> wrote: > On Sun, 2011-03-13 at 12:59 -0700, Daniel Halperin wrote: >> On Fri, Mar 11, 2011 at 12:13 AM, Guy, Wey-Yi <wey-yi.w.guy@intel.com> wrote: >> >> (1) Why does iwlwifi default to an aggregation frame limit of 31? I >> >> didn't see any negative effects from 63 frame limit and performance >> >> improved dramatically >> > if I remember correctly, we change from 63 to 31 while we have some 11n >> > performance issue. even later we found out frame limit is not the main >> > reason of low tpt, but we did not change it back since at the time we >> > did not see any performance different, I believe we can use different >> > frame limit, but I will prefer make it more flexible, maybe something >> > could be change by either module parameter or debugfs. Also I am not >> > sure are there any behavior differences between different devices and >> > different versions of ucode. >> >> 63 hasn't hurt me yet, though the scheduler still does occasionally >> send a 64th frame that shifts the reorder buffer's window. Is there a >> chance that 64 will work as a max? 63 seems an odd choice. > > it is limited by uCode, did not have chance to look at what the reason > is yet. my initial guess without look at the code, 63 = 0x3F which use 6 > bits and 64=0x40 is 7 bits, but I am not sure, I will check A few followups: in iwl-prph.h, we see that the default SCD window size and AGG FRAME LIMIT are both 64 [1]. Indeed, the various masks on these fields are 7 bits long. I bet an agg frame limit of 64 would work just fine. BUT, it looks like when we set up aggregation, we set the scheduler to ALWAYS use the default maximum of 64 frames for both these parameters [2] when configuring the agg queue. This is probably why the scheduler is willing to send long batches and have many outstanding frames. I bet that the fix is to make these parameters match the ones that come down from mac80211. If I get time later, I will see if I can fix this. Dan [1] https://github.com/mirrors/linux-2.6/blob/master/drivers/net/wireless/iwlwifi/iwl-prph.h#L336 [2] https://github.com/mirrors/linux-2.6/blob/master/drivers/net/wireless/iwlwifi/iwl-agn-tx.c#L270 ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: bug: iwlwifi, aggregation, and mac80211's reorder buffer 2011-03-14 18:16 ` Daniel Halperin @ 2011-03-15 11:51 ` Johannes Berg 2011-03-15 12:52 ` Johannes Berg 0 siblings, 1 reply; 16+ messages in thread From: Johannes Berg @ 2011-03-15 11:51 UTC (permalink / raw) To: Daniel Halperin Cc: wwguy, ipw3945-devel@lists.sourceforge.net, linux-wireless@vger.kernel.org On Mon, 2011-03-14 at 11:16 -0700, Daniel Halperin wrote: > BUT, it looks like when we set up aggregation, we set the scheduler to > ALWAYS use the default maximum of 64 frames for both these parameters > [2] when configuring the agg queue. This is probably why the scheduler > is willing to send long batches and have many outstanding frames. I > bet that the fix is to make these parameters match the ones that come > down from mac80211. If I get time later, I will see if I can fix > this. Yeah, you're absolutely right, the window size and frame limit should both match the information about the aggregation session, and be limited to what we and the peer would like to see. The bad thing is that we set up the queue when we get IEEE80211_AMPDU_TX_START, but we only know the window size when we get IEEE80211_AMPDU_TX_OPERATIONAL. However, I think we can change this since the first TX to the queue will only happen after _OPERATIONAL. We just need to refactor iwlagn_tx_agg_start() into the parts that can fail, and the parts that don't. Before that though, I suggest the patch below. johannes --- drivers/net/wireless/iwlwifi/iwl-1000.c | 2 -- drivers/net/wireless/iwlwifi/iwl-2000.c | 2 -- drivers/net/wireless/iwlwifi/iwl-5000.c | 4 ---- drivers/net/wireless/iwlwifi/iwl-6000.c | 4 ---- drivers/net/wireless/iwlwifi/iwl-agn-tx.c | 17 +++++++---------- drivers/net/wireless/iwlwifi/iwl-agn.h | 4 ---- drivers/net/wireless/iwlwifi/iwl-core.h | 5 ----- 7 files changed, 7 insertions(+), 31 deletions(-) --- a/drivers/net/wireless/iwlwifi/iwl-1000.c 2011-03-15 12:33:56.000000000 +0100 +++ b/drivers/net/wireless/iwlwifi/iwl-1000.c 2011-03-15 12:34:00.000000000 +0100 @@ -179,8 +179,6 @@ static struct iwl_lib_ops iwl1000_lib = .txq_update_byte_cnt_tbl = iwlagn_txq_update_byte_cnt_tbl, .txq_inval_byte_cnt_tbl = iwlagn_txq_inval_byte_cnt_tbl, .txq_set_sched = iwlagn_txq_set_sched, - .txq_agg_enable = iwlagn_txq_agg_enable, - .txq_agg_disable = iwlagn_txq_agg_disable, .txq_attach_buf_to_tfd = iwl_hw_txq_attach_buf_to_tfd, .txq_free_tfd = iwl_hw_txq_free_tfd, .txq_init = iwl_hw_tx_queue_init, --- a/drivers/net/wireless/iwlwifi/iwl-2000.c 2011-03-15 12:33:56.000000000 +0100 +++ b/drivers/net/wireless/iwlwifi/iwl-2000.c 2011-03-15 12:34:02.000000000 +0100 @@ -259,8 +259,6 @@ static struct iwl_lib_ops iwl2000_lib = .txq_update_byte_cnt_tbl = iwlagn_txq_update_byte_cnt_tbl, .txq_inval_byte_cnt_tbl = iwlagn_txq_inval_byte_cnt_tbl, .txq_set_sched = iwlagn_txq_set_sched, - .txq_agg_enable = iwlagn_txq_agg_enable, - .txq_agg_disable = iwlagn_txq_agg_disable, .txq_attach_buf_to_tfd = iwl_hw_txq_attach_buf_to_tfd, .txq_free_tfd = iwl_hw_txq_free_tfd, .txq_init = iwl_hw_tx_queue_init, --- a/drivers/net/wireless/iwlwifi/iwl-5000.c 2011-03-15 12:33:56.000000000 +0100 +++ b/drivers/net/wireless/iwlwifi/iwl-5000.c 2011-03-15 12:34:05.000000000 +0100 @@ -348,8 +348,6 @@ static struct iwl_lib_ops iwl5000_lib = .txq_update_byte_cnt_tbl = iwlagn_txq_update_byte_cnt_tbl, .txq_inval_byte_cnt_tbl = iwlagn_txq_inval_byte_cnt_tbl, .txq_set_sched = iwlagn_txq_set_sched, - .txq_agg_enable = iwlagn_txq_agg_enable, - .txq_agg_disable = iwlagn_txq_agg_disable, .txq_attach_buf_to_tfd = iwl_hw_txq_attach_buf_to_tfd, .txq_free_tfd = iwl_hw_txq_free_tfd, .txq_init = iwl_hw_tx_queue_init, @@ -416,8 +414,6 @@ static struct iwl_lib_ops iwl5150_lib = .txq_update_byte_cnt_tbl = iwlagn_txq_update_byte_cnt_tbl, .txq_inval_byte_cnt_tbl = iwlagn_txq_inval_byte_cnt_tbl, .txq_set_sched = iwlagn_txq_set_sched, - .txq_agg_enable = iwlagn_txq_agg_enable, - .txq_agg_disable = iwlagn_txq_agg_disable, .txq_attach_buf_to_tfd = iwl_hw_txq_attach_buf_to_tfd, .txq_free_tfd = iwl_hw_txq_free_tfd, .txq_init = iwl_hw_tx_queue_init, --- a/drivers/net/wireless/iwlwifi/iwl-6000.c 2011-03-15 12:33:56.000000000 +0100 +++ b/drivers/net/wireless/iwlwifi/iwl-6000.c 2011-03-15 12:34:08.000000000 +0100 @@ -288,8 +288,6 @@ static struct iwl_lib_ops iwl6000_lib = .txq_update_byte_cnt_tbl = iwlagn_txq_update_byte_cnt_tbl, .txq_inval_byte_cnt_tbl = iwlagn_txq_inval_byte_cnt_tbl, .txq_set_sched = iwlagn_txq_set_sched, - .txq_agg_enable = iwlagn_txq_agg_enable, - .txq_agg_disable = iwlagn_txq_agg_disable, .txq_attach_buf_to_tfd = iwl_hw_txq_attach_buf_to_tfd, .txq_free_tfd = iwl_hw_txq_free_tfd, .txq_init = iwl_hw_tx_queue_init, @@ -357,8 +355,6 @@ static struct iwl_lib_ops iwl6030_lib = .txq_update_byte_cnt_tbl = iwlagn_txq_update_byte_cnt_tbl, .txq_inval_byte_cnt_tbl = iwlagn_txq_inval_byte_cnt_tbl, .txq_set_sched = iwlagn_txq_set_sched, - .txq_agg_enable = iwlagn_txq_agg_enable, - .txq_agg_disable = iwlagn_txq_agg_disable, .txq_attach_buf_to_tfd = iwl_hw_txq_attach_buf_to_tfd, .txq_free_tfd = iwl_hw_txq_free_tfd, .txq_init = iwl_hw_tx_queue_init, --- a/drivers/net/wireless/iwlwifi/iwl-agn-tx.c 2011-03-15 12:32:58.000000000 +0100 +++ b/drivers/net/wireless/iwlwifi/iwl-agn-tx.c 2011-03-15 12:33:39.000000000 +0100 @@ -222,8 +222,8 @@ void iwlagn_tx_queue_set_status(struct i scd_retry ? "BA" : "AC/CMD", txq_id, tx_fifo_id); } -int iwlagn_txq_agg_enable(struct iwl_priv *priv, int txq_id, - int tx_fifo, int sta_id, int tid, u16 ssn_idx) +static int iwlagn_txq_agg_enable(struct iwl_priv *priv, int txq_id, + int tx_fifo, int sta_id, int tid, u16 ssn_idx) { unsigned long flags; u16 ra_tid; @@ -288,8 +288,8 @@ int iwlagn_txq_agg_enable(struct iwl_pri return 0; } -int iwlagn_txq_agg_disable(struct iwl_priv *priv, u16 txq_id, - u16 ssn_idx, u8 tx_fifo) +static int iwlagn_txq_agg_disable(struct iwl_priv *priv, u16 txq_id, + u16 ssn_idx, u8 tx_fifo) { if ((IWLAGN_FIRST_AMPDU_QUEUE > txq_id) || (IWLAGN_FIRST_AMPDU_QUEUE + @@ -1037,8 +1037,7 @@ int iwlagn_tx_agg_start(struct iwl_priv iwl_set_swq_id(&priv->txq[txq_id], get_ac_from_tid(tid), txq_id); spin_unlock_irqrestore(&priv->sta_lock, flags); - ret = priv->cfg->ops->lib->txq_agg_enable(priv, txq_id, tx_fifo, - sta_id, tid, *ssn); + ret = iwlagn_txq_agg_enable(priv, txq_id, tx_fifo, sta_id, tid, *ssn); if (ret) return ret; @@ -1125,8 +1124,7 @@ int iwlagn_tx_agg_stop(struct iwl_priv * * to deactivate the uCode queue, just return "success" to allow * mac80211 to clean up it own data. */ - priv->cfg->ops->lib->txq_agg_disable(priv, txq_id, ssn, - tx_fifo_id); + iwlagn_txq_agg_disable(priv, txq_id, ssn, tx_fifo_id); spin_unlock_irqrestore(&priv->lock, flags); ieee80211_stop_tx_ba_cb_irqsafe(vif, sta->addr, tid); @@ -1155,8 +1153,7 @@ int iwlagn_txq_check_empty(struct iwl_pr u16 ssn = SEQ_TO_SN(tid_data->seq_number); int tx_fifo = get_fifo_from_tid(ctx, tid); IWL_DEBUG_HT(priv, "HW queue empty: continue DELBA flow\n"); - priv->cfg->ops->lib->txq_agg_disable(priv, txq_id, - ssn, tx_fifo); + iwlagn_txq_agg_disable(priv, txq_id, ssn, tx_fifo); tid_data->agg.state = IWL_AGG_OFF; ieee80211_stop_tx_ba_cb_irqsafe(ctx->vif, addr, tid); } --- a/drivers/net/wireless/iwlwifi/iwl-agn.h 2011-03-15 12:32:43.000000000 +0100 +++ b/drivers/net/wireless/iwlwifi/iwl-agn.h 2011-03-15 12:33:42.000000000 +0100 @@ -133,10 +133,6 @@ void iwlagn_txq_update_byte_cnt_tbl(stru u16 byte_cnt); void iwlagn_txq_inval_byte_cnt_tbl(struct iwl_priv *priv, struct iwl_tx_queue *txq); -int iwlagn_txq_agg_enable(struct iwl_priv *priv, int txq_id, - int tx_fifo, int sta_id, int tid, u16 ssn_idx); -int iwlagn_txq_agg_disable(struct iwl_priv *priv, u16 txq_id, - u16 ssn_idx, u8 tx_fifo); void iwlagn_txq_set_sched(struct iwl_priv *priv, u32 mask); void iwl_free_tfds_in_queue(struct iwl_priv *priv, int sta_id, int tid, int freed); --- a/drivers/net/wireless/iwlwifi/iwl-core.h 2011-03-15 12:32:32.000000000 +0100 +++ b/drivers/net/wireless/iwlwifi/iwl-core.h 2011-03-15 12:32:39.000000000 +0100 @@ -171,11 +171,6 @@ struct iwl_lib_ops { struct iwl_tx_queue *txq); int (*txq_init)(struct iwl_priv *priv, struct iwl_tx_queue *txq); - /* aggregations */ - int (*txq_agg_enable)(struct iwl_priv *priv, int txq_id, int tx_fifo, - int sta_id, int tid, u16 ssn_idx); - int (*txq_agg_disable)(struct iwl_priv *priv, u16 txq_id, u16 ssn_idx, - u8 tx_fifo); /* setup Rx handler */ void (*rx_handler_setup)(struct iwl_priv *priv); /* setup deferred work */ ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: bug: iwlwifi, aggregation, and mac80211's reorder buffer 2011-03-15 11:51 ` Johannes Berg @ 2011-03-15 12:52 ` Johannes Berg 2011-03-15 18:16 ` Daniel Halperin 0 siblings, 1 reply; 16+ messages in thread From: Johannes Berg @ 2011-03-15 12:52 UTC (permalink / raw) To: Daniel Halperin Cc: wwguy, ipw3945-devel@lists.sourceforge.net, linux-wireless@vger.kernel.org On Tue, 2011-03-15 at 12:51 +0100, Johannes Berg wrote: > On Mon, 2011-03-14 at 11:16 -0700, Daniel Halperin wrote: > > > BUT, it looks like when we set up aggregation, we set the scheduler to > > ALWAYS use the default maximum of 64 frames for both these parameters > > [2] when configuring the agg queue. This is probably why the scheduler > > is willing to send long batches and have many outstanding frames. I > > bet that the fix is to make these parameters match the ones that come > > down from mac80211. If I get time later, I will see if I can fix > > this. > > Yeah, you're absolutely right, the window size and frame limit should > both match the information about the aggregation session, and be limited > to what we and the peer would like to see. > > The bad thing is that we set up the queue when we get > IEEE80211_AMPDU_TX_START, but we only know the window size when we get > IEEE80211_AMPDU_TX_OPERATIONAL. However, I think we can change this > since the first TX to the queue will only happen after _OPERATIONAL. We > just need to refactor iwlagn_tx_agg_start() into the parts that can > fail, and the parts that don't. On top of the previous patch, does this work? Note that I haven't tested it at all yet, sorry. johannes --- drivers/net/wireless/iwlwifi/iwl-agn-tx.c | 47 +++++++++++++++++++----------- drivers/net/wireless/iwlwifi/iwl-agn.c | 4 ++ drivers/net/wireless/iwlwifi/iwl-agn.h | 3 + drivers/net/wireless/iwlwifi/iwl-dev.h | 1 4 files changed, 39 insertions(+), 16 deletions(-) --- a/drivers/net/wireless/iwlwifi/iwl-agn-tx.c 2011-03-15 13:22:04.000000000 +0100 +++ b/drivers/net/wireless/iwlwifi/iwl-agn-tx.c 2011-03-15 13:50:34.000000000 +0100 @@ -222,13 +222,8 @@ void iwlagn_tx_queue_set_status(struct i scd_retry ? "BA" : "AC/CMD", txq_id, tx_fifo_id); } -static int iwlagn_txq_agg_enable(struct iwl_priv *priv, int txq_id, - int tx_fifo, int sta_id, int tid, u16 ssn_idx) +static int iwlagn_txq_agg_enable(struct iwl_priv *priv, int txq_id, int sta_id, int tid) { - unsigned long flags; - u16 ra_tid; - int ret; - if ((IWLAGN_FIRST_AMPDU_QUEUE > txq_id) || (IWLAGN_FIRST_AMPDU_QUEUE + priv->cfg->base_params->num_of_ampdu_queues <= txq_id)) { @@ -240,12 +235,33 @@ static int iwlagn_txq_agg_enable(struct return -EINVAL; } - ra_tid = BUILD_RAxTID(sta_id, tid); - /* Modify device's station table to Tx this TID */ - ret = iwl_sta_tx_modify_enable_tid(priv, sta_id, tid); - if (ret) - return ret; + return iwl_sta_tx_modify_enable_tid(priv, sta_id, tid); +} + +void iwlagn_txq_agg_queue_setup(struct iwl_priv *priv, + struct ieee80211_sta *sta, + int tid, int frame_limit) +{ + int sta_id, tx_fifo, txq_id, ssn_idx; + u16 ra_tid; + unsigned long flags; + struct iwl_tid_data *tid_data; + + sta_id = iwl_sta_id(sta); + if (WARN_ON(sta_id == IWL_INVALID_STATION)) + return; + if (WARN_ON(tid >= MAX_TID_COUNT)) + return; + + spin_lock_irqsave(&priv->sta_lock, flags); + tid_data = &priv->stations[sta_id].tid[tid]; + ssn_idx = SEQ_TO_SN(tid_data->seq_number); + txq_id = tid_data->agg.txq_id; + tx_fifo = tid_data->agg.tx_fifo; + spin_unlock_irqrestore(&priv->sta_lock, flags); + + ra_tid = BUILD_RAxTID(sta_id, tid); spin_lock_irqsave(&priv->lock, flags); @@ -271,10 +287,10 @@ static int iwlagn_txq_agg_enable(struct iwl_write_targ_mem(priv, priv->scd_base_addr + IWLAGN_SCD_CONTEXT_QUEUE_OFFSET(txq_id) + sizeof(u32), - ((SCD_WIN_SIZE << + ((frame_limit << IWLAGN_SCD_QUEUE_CTX_REG2_WIN_SIZE_POS) & IWLAGN_SCD_QUEUE_CTX_REG2_WIN_SIZE_MSK) | - ((SCD_FRAME_LIMIT << + ((frame_limit << IWLAGN_SCD_QUEUE_CTX_REG2_FRAME_LIMIT_POS) & IWLAGN_SCD_QUEUE_CTX_REG2_FRAME_LIMIT_MSK)); @@ -284,8 +300,6 @@ static int iwlagn_txq_agg_enable(struct iwlagn_tx_queue_set_status(priv, &priv->txq[txq_id], tx_fifo, 1); spin_unlock_irqrestore(&priv->lock, flags); - - return 0; } static int iwlagn_txq_agg_disable(struct iwl_priv *priv, u16 txq_id, @@ -1034,10 +1048,11 @@ int iwlagn_tx_agg_start(struct iwl_priv tid_data = &priv->stations[sta_id].tid[tid]; *ssn = SEQ_TO_SN(tid_data->seq_number); tid_data->agg.txq_id = txq_id; + tid_data->agg.tx_fifo = tx_fifo; iwl_set_swq_id(&priv->txq[txq_id], get_ac_from_tid(tid), txq_id); spin_unlock_irqrestore(&priv->sta_lock, flags); - ret = iwlagn_txq_agg_enable(priv, txq_id, tx_fifo, sta_id, tid, *ssn); + ret = iwlagn_txq_agg_enable(priv, txq_id, sta_id, tid); if (ret) return ret; --- a/drivers/net/wireless/iwlwifi/iwl-agn.c 2011-03-15 13:21:53.000000000 +0100 +++ b/drivers/net/wireless/iwlwifi/iwl-agn.c 2011-03-15 13:47:49.000000000 +0100 @@ -3344,6 +3344,10 @@ int iwlagn_mac_ampdu_action(struct ieee8 } break; case IEEE80211_AMPDU_TX_OPERATIONAL: + buf_size = min_t(int, buf_size, LINK_QUAL_AGG_FRAME_LIMIT_DEF); + + iwlagn_txq_agg_queue_setup(priv, sta, tid, buf_size); + /* * If the limit is 0, then it wasn't initialised yet, * use the default. We can do that since we take the --- a/drivers/net/wireless/iwlwifi/iwl-agn.h 2011-03-15 13:50:03.000000000 +0100 +++ b/drivers/net/wireless/iwlwifi/iwl-agn.h 2011-03-15 13:50:24.000000000 +0100 @@ -202,6 +202,9 @@ int iwlagn_tx_agg_start(struct iwl_priv struct ieee80211_sta *sta, u16 tid, u16 *ssn); int iwlagn_tx_agg_stop(struct iwl_priv *priv, struct ieee80211_vif *vif, struct ieee80211_sta *sta, u16 tid); +void iwlagn_txq_agg_queue_setup(struct iwl_priv *priv, + struct ieee80211_sta *sta, + int tid, int frame_limit); int iwlagn_txq_check_empty(struct iwl_priv *priv, int sta_id, u8 tid, int txq_id); void iwlagn_rx_reply_compressed_ba(struct iwl_priv *priv, --- a/drivers/net/wireless/iwlwifi/iwl-dev.h 2011-03-15 13:48:12.000000000 +0100 +++ b/drivers/net/wireless/iwlwifi/iwl-dev.h 2011-03-15 13:48:22.000000000 +0100 @@ -417,6 +417,7 @@ struct iwl_ht_agg { #define IWL_EMPTYING_HW_QUEUE_ADDBA 2 #define IWL_EMPTYING_HW_QUEUE_DELBA 3 u8 state; + u8 tx_fifo; }; ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: bug: iwlwifi, aggregation, and mac80211's reorder buffer 2011-03-15 12:52 ` Johannes Berg @ 2011-03-15 18:16 ` Daniel Halperin 2011-03-15 18:29 ` Johannes Berg 2011-03-15 18:41 ` Johannes Berg 0 siblings, 2 replies; 16+ messages in thread From: Daniel Halperin @ 2011-03-15 18:16 UTC (permalink / raw) To: Johannes Berg Cc: wwguy, ipw3945-devel@lists.sourceforge.net, linux-wireless@vger.kernel.org I applied these two patches and reverted all my previous changes to mac80211 (force window to be 64 frames) and to iwlwifi (use 64-frame agg limit). + TCP performance goes up from ~60 to ~80 Mbps. + I see no more of these 100ms timeout "ieee80211 phy0: release an RX reorder frame due to timeout on earlier frames (skipped=0)" messages from the receiver's mac80211 stack. Big improvements! There's still something off about the window size: unlike with ath9k, the receiver does continually print out warnings about advancing its window. There's something very fishy here: for any given aggregation session, the part of the sequence number space this happens is always the same! Really weird. Here's a log from 3 agg sessions in a row, just running simple TCP iperf: [341221.679710] New seq exceeds buffering window: 283, 250 [341223.960270] New seq exceeds buffering window: 281, 249 <snip many similar lines> [341237.880568] New seq exceeds buffering window: 282, 251 [341238.849064] New seq exceeds buffering window: 280, 249 [341238.849072] New seq exceeds buffering window: 281, 250 [341239.903488] New seq exceeds buffering window: 281, 250 [341241.468187] New seq exceeds buffering window: 290, 251 [341242.009078] New seq exceeds buffering window: 280, 249 [341242.009085] New seq exceeds buffering window: 281, 250 [341340.865528] New seq exceeds buffering window: 2794, 2763 [341340.865536] New seq exceeds buffering window: 2795, 2764 [341340.865542] New seq exceeds buffering window: 2796, 2765 [341340.865547] New seq exceeds buffering window: 2797, 2766 [341341.451867] New seq exceeds buffering window: 2795, 2764 <snip> [341349.271337] New seq exceeds buffering window: 2797, 2766 [341350.290501] New seq exceeds buffering window: 2796, 2765 [341354.010402] New seq exceeds buffering window: 2795, 2764 [341356.055030] New seq exceeds buffering window: 2795, 2764 [341356.055037] New seq exceeds buffering window: 2797, 2765 [341358.444524] New seq exceeds buffering window: 2794, 2763 [341358.444547] New seq exceeds buffering window: 2797, 2766 This one is complete including agg start/stop. [341398.950019] iwlagn 0000:03:00.0: iwlagn_tx_agg_start on ra = 00:16:ea:c3:b3:8e tid = 0 [341401.790964] New seq exceeds buffering window: 2335, 2304 [341402.918351] New seq exceeds buffering window: 2334, 2303 [341402.918360] New seq exceeds buffering window: 2335, 2304 [341406.182184] New seq exceeds buffering window: 2336, 2305 [341407.730557] New seq exceeds buffering window: 2335, 2304 [341410.996986] New seq exceeds buffering window: 2333, 2302 [341410.996994] New seq exceeds buffering window: 2334, 2303 [341410.997000] New seq exceeds buffering window: 2335, 2304 [341410.997005] New seq exceeds buffering window: 2336, 2305 [341411.664894] New seq exceeds buffering window: 2333, 2302 [341411.664911] New seq exceeds buffering window: 2335, 2304 [341411.664917] New seq exceeds buffering window: 2336, 2305 [341412.696554] New seq exceeds buffering window: 2333, 2302 [341412.696562] New seq exceeds buffering window: 2334, 2303 [341412.696568] New seq exceeds buffering window: 2335, 2304 [341412.696574] New seq exceeds buffering window: 2336, 2305 [341414.828173] New seq exceeds buffering window: 2333, 2302 [341414.828192] New seq exceeds buffering window: 2335, 2304 [341415.849728] New seq exceeds buffering window: 2333, 2302 [341415.851536] New seq exceeds buffering window: 2335, 2304 [341415.851542] New seq exceeds buffering window: 2336, 2305 [341417.085491] New seq exceeds buffering window: 2334, 2303 [341422.440517] iwlagn 0000:03:00.0: iwlagn_tx_agg_stop on ra = 00:16:ea:c3:b3:8e tid = 0 Note that I didn't even remove modules/reassociate anything like that, I just let agg sessions start and stop between iperf. Any thoughts? Dan On Tue, Mar 15, 2011 at 5:52 AM, Johannes Berg <johannes@sipsolutions.net> wrote: > On Tue, 2011-03-15 at 12:51 +0100, Johannes Berg wrote: >> On Mon, 2011-03-14 at 11:16 -0700, Daniel Halperin wrote: >> >> > BUT, it looks like when we set up aggregation, we set the scheduler to >> > ALWAYS use the default maximum of 64 frames for both these parameters >> > [2] when configuring the agg queue. This is probably why the scheduler >> > is willing to send long batches and have many outstanding frames. I >> > bet that the fix is to make these parameters match the ones that come >> > down from mac80211. If I get time later, I will see if I can fix >> > this. >> >> Yeah, you're absolutely right, the window size and frame limit should >> both match the information about the aggregation session, and be limited >> to what we and the peer would like to see. >> >> The bad thing is that we set up the queue when we get >> IEEE80211_AMPDU_TX_START, but we only know the window size when we get >> IEEE80211_AMPDU_TX_OPERATIONAL. However, I think we can change this >> since the first TX to the queue will only happen after _OPERATIONAL. We >> just need to refactor iwlagn_tx_agg_start() into the parts that can >> fail, and the parts that don't. > > On top of the previous patch, does this work? Note that I haven't tested > it at all yet, sorry. > > johannes > > --- > drivers/net/wireless/iwlwifi/iwl-agn-tx.c | 47 +++++++++++++++++++----------- > drivers/net/wireless/iwlwifi/iwl-agn.c | 4 ++ > drivers/net/wireless/iwlwifi/iwl-agn.h | 3 + > drivers/net/wireless/iwlwifi/iwl-dev.h | 1 > 4 files changed, 39 insertions(+), 16 deletions(-) > > --- a/drivers/net/wireless/iwlwifi/iwl-agn-tx.c 2011-03-15 13:22:04.000000000 +0100 > +++ b/drivers/net/wireless/iwlwifi/iwl-agn-tx.c 2011-03-15 13:50:34.000000000 +0100 > @@ -222,13 +222,8 @@ void iwlagn_tx_queue_set_status(struct i > scd_retry ? "BA" : "AC/CMD", txq_id, tx_fifo_id); > } > > -static int iwlagn_txq_agg_enable(struct iwl_priv *priv, int txq_id, > - int tx_fifo, int sta_id, int tid, u16 ssn_idx) > +static int iwlagn_txq_agg_enable(struct iwl_priv *priv, int txq_id, int sta_id, int tid) > { > - unsigned long flags; > - u16 ra_tid; > - int ret; > - > if ((IWLAGN_FIRST_AMPDU_QUEUE > txq_id) || > (IWLAGN_FIRST_AMPDU_QUEUE + > priv->cfg->base_params->num_of_ampdu_queues <= txq_id)) { > @@ -240,12 +235,33 @@ static int iwlagn_txq_agg_enable(struct > return -EINVAL; > } > > - ra_tid = BUILD_RAxTID(sta_id, tid); > - > /* Modify device's station table to Tx this TID */ > - ret = iwl_sta_tx_modify_enable_tid(priv, sta_id, tid); > - if (ret) > - return ret; > + return iwl_sta_tx_modify_enable_tid(priv, sta_id, tid); > +} > + > +void iwlagn_txq_agg_queue_setup(struct iwl_priv *priv, > + struct ieee80211_sta *sta, > + int tid, int frame_limit) > +{ > + int sta_id, tx_fifo, txq_id, ssn_idx; > + u16 ra_tid; > + unsigned long flags; > + struct iwl_tid_data *tid_data; > + > + sta_id = iwl_sta_id(sta); > + if (WARN_ON(sta_id == IWL_INVALID_STATION)) > + return; > + if (WARN_ON(tid >= MAX_TID_COUNT)) > + return; > + > + spin_lock_irqsave(&priv->sta_lock, flags); > + tid_data = &priv->stations[sta_id].tid[tid]; > + ssn_idx = SEQ_TO_SN(tid_data->seq_number); > + txq_id = tid_data->agg.txq_id; > + tx_fifo = tid_data->agg.tx_fifo; > + spin_unlock_irqrestore(&priv->sta_lock, flags); > + > + ra_tid = BUILD_RAxTID(sta_id, tid); > > spin_lock_irqsave(&priv->lock, flags); > > @@ -271,10 +287,10 @@ static int iwlagn_txq_agg_enable(struct > iwl_write_targ_mem(priv, priv->scd_base_addr + > IWLAGN_SCD_CONTEXT_QUEUE_OFFSET(txq_id) + > sizeof(u32), > - ((SCD_WIN_SIZE << > + ((frame_limit << > IWLAGN_SCD_QUEUE_CTX_REG2_WIN_SIZE_POS) & > IWLAGN_SCD_QUEUE_CTX_REG2_WIN_SIZE_MSK) | > - ((SCD_FRAME_LIMIT << > + ((frame_limit << > IWLAGN_SCD_QUEUE_CTX_REG2_FRAME_LIMIT_POS) & > IWLAGN_SCD_QUEUE_CTX_REG2_FRAME_LIMIT_MSK)); > > @@ -284,8 +300,6 @@ static int iwlagn_txq_agg_enable(struct > iwlagn_tx_queue_set_status(priv, &priv->txq[txq_id], tx_fifo, 1); > > spin_unlock_irqrestore(&priv->lock, flags); > - > - return 0; > } > > static int iwlagn_txq_agg_disable(struct iwl_priv *priv, u16 txq_id, > @@ -1034,10 +1048,11 @@ int iwlagn_tx_agg_start(struct iwl_priv > tid_data = &priv->stations[sta_id].tid[tid]; > *ssn = SEQ_TO_SN(tid_data->seq_number); > tid_data->agg.txq_id = txq_id; > + tid_data->agg.tx_fifo = tx_fifo; > iwl_set_swq_id(&priv->txq[txq_id], get_ac_from_tid(tid), txq_id); > spin_unlock_irqrestore(&priv->sta_lock, flags); > > - ret = iwlagn_txq_agg_enable(priv, txq_id, tx_fifo, sta_id, tid, *ssn); > + ret = iwlagn_txq_agg_enable(priv, txq_id, sta_id, tid); > if (ret) > return ret; > > --- a/drivers/net/wireless/iwlwifi/iwl-agn.c 2011-03-15 13:21:53.000000000 +0100 > +++ b/drivers/net/wireless/iwlwifi/iwl-agn.c 2011-03-15 13:47:49.000000000 +0100 > @@ -3344,6 +3344,10 @@ int iwlagn_mac_ampdu_action(struct ieee8 > } > break; > case IEEE80211_AMPDU_TX_OPERATIONAL: > + buf_size = min_t(int, buf_size, LINK_QUAL_AGG_FRAME_LIMIT_DEF); > + > + iwlagn_txq_agg_queue_setup(priv, sta, tid, buf_size); > + > /* > * If the limit is 0, then it wasn't initialised yet, > * use the default. We can do that since we take the > --- a/drivers/net/wireless/iwlwifi/iwl-agn.h 2011-03-15 13:50:03.000000000 +0100 > +++ b/drivers/net/wireless/iwlwifi/iwl-agn.h 2011-03-15 13:50:24.000000000 +0100 > @@ -202,6 +202,9 @@ int iwlagn_tx_agg_start(struct iwl_priv > struct ieee80211_sta *sta, u16 tid, u16 *ssn); > int iwlagn_tx_agg_stop(struct iwl_priv *priv, struct ieee80211_vif *vif, > struct ieee80211_sta *sta, u16 tid); > +void iwlagn_txq_agg_queue_setup(struct iwl_priv *priv, > + struct ieee80211_sta *sta, > + int tid, int frame_limit); > int iwlagn_txq_check_empty(struct iwl_priv *priv, > int sta_id, u8 tid, int txq_id); > void iwlagn_rx_reply_compressed_ba(struct iwl_priv *priv, > --- a/drivers/net/wireless/iwlwifi/iwl-dev.h 2011-03-15 13:48:12.000000000 +0100 > +++ b/drivers/net/wireless/iwlwifi/iwl-dev.h 2011-03-15 13:48:22.000000000 +0100 > @@ -417,6 +417,7 @@ struct iwl_ht_agg { > #define IWL_EMPTYING_HW_QUEUE_ADDBA 2 > #define IWL_EMPTYING_HW_QUEUE_DELBA 3 > u8 state; > + u8 tx_fifo; > }; > > > > > ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: bug: iwlwifi, aggregation, and mac80211's reorder buffer 2011-03-15 18:16 ` Daniel Halperin @ 2011-03-15 18:29 ` Johannes Berg 2011-03-15 18:31 ` Daniel Halperin 2011-03-15 18:41 ` Johannes Berg 1 sibling, 1 reply; 16+ messages in thread From: Johannes Berg @ 2011-03-15 18:29 UTC (permalink / raw) To: Daniel Halperin Cc: wwguy, ipw3945-devel@lists.sourceforge.net, linux-wireless@vger.kernel.org On Tue, 2011-03-15 at 11:16 -0700, Daniel Halperin wrote: > [341349.271337] New seq exceeds buffering window: 2797, 2766 quick q (on the phone right now) - where's that message from? johannes ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: bug: iwlwifi, aggregation, and mac80211's reorder buffer 2011-03-15 18:29 ` Johannes Berg @ 2011-03-15 18:31 ` Daniel Halperin 2011-03-15 18:33 ` Daniel Halperin 0 siblings, 1 reply; 16+ messages in thread From: Daniel Halperin @ 2011-03-15 18:31 UTC (permalink / raw) To: Johannes Berg Cc: wwguy, ipw3945-devel@lists.sourceforge.net, linux-wireless@vger.kernel.org At the receiver (AP in this case) side, hacked from mac80211. The following diff includes one more mac80211 fix I haven't yet sent to the list. Dan diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c index a6701ed..0467ec8 100644 --- a/net/mac80211/rx.c +++ b/net/mac80211/rx.c @@ -605,13 +605,14 @@ static void ieee80211_sta_reorder_release(struct ieee80211 continue; } if (!time_after(jiffies, tid_agg_rx->reorder_time[j] + - HT_RX_REORDER_BUF_TIMEOUT)) + HT_RX_REORDER_BUF_TIMEOUT) && skipped) goto set_release_timer; -#ifdef CONFIG_MAC80211_HT_DEBUG - if (net_ratelimit()) +#if 1 +//#ifdef CONFIG_MAC80211_HT_DEBUG +// if (net_ratelimit()) wiphy_debug(hw->wiphy, - "release an RX reorder frame due to + "release an RX reorder frame due to #endif ieee80211_release_reorder_frame(hw, tid_agg_rx, j); @@ -680,6 +681,7 @@ static bool ieee80211_sta_manage_reorder_buf(struct ieee8021 * size release some previous frames to make room for this one. */ if (!seq_less(mpdu_seq_num, head_seq_num + buf_size)) { + printk("New seq exceeds buffering window: %d, %d\n", mpdu_seq_nu head_seq_num = seq_inc(seq_sub(mpdu_seq_num, buf_size)); /* release stored frames up to new head to stack */ ieee80211_release_reorder_frames(hw, tid_agg_rx, head_seq_num); On Tue, Mar 15, 2011 at 11:29 AM, Johannes Berg <johannes@sipsolutions.net> wrote: > On Tue, 2011-03-15 at 11:16 -0700, Daniel Halperin wrote: > >> [341349.271337] New seq exceeds buffering window: 2797, 2766 > > quick q (on the phone right now) - where's that message from? > > johannes > > > ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: bug: iwlwifi, aggregation, and mac80211's reorder buffer 2011-03-15 18:31 ` Daniel Halperin @ 2011-03-15 18:33 ` Daniel Halperin 0 siblings, 0 replies; 16+ messages in thread From: Daniel Halperin @ 2011-03-15 18:33 UTC (permalink / raw) To: Johannes Berg Cc: wwguy, ipw3945-devel@lists.sourceforge.net, linux-wireless@vger.kernel.org Sorry, terminal cut off. diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c index a6701ed..0467ec8 100644 --- a/net/mac80211/rx.c +++ b/net/mac80211/rx.c @@ -605,13 +605,14 @@ static void ieee80211_sta_reorder_release(struct ieee80211_hw *hw, continue; } if (!time_after(jiffies, tid_agg_rx->reorder_time[j] + - HT_RX_REORDER_BUF_TIMEOUT)) + HT_RX_REORDER_BUF_TIMEOUT) && skipped) goto set_release_timer; -#ifdef CONFIG_MAC80211_HT_DEBUG - if (net_ratelimit()) +#if 1 +//#ifdef CONFIG_MAC80211_HT_DEBUG +// if (net_ratelimit()) wiphy_debug(hw->wiphy, - "release an RX reorder frame due to timeout on earlier frames\n"); + "release an RX reorder frame due to timeout on earlier frames (skipped=%d)\n", skipped); #endif ieee80211_release_reorder_frame(hw, tid_agg_rx, j); @@ -680,6 +681,7 @@ static bool ieee80211_sta_manage_reorder_buf(struct ieee80211_hw *hw, * size release some previous frames to make room for this one. */ if (!seq_less(mpdu_seq_num, head_seq_num + buf_size)) { + printk("New seq exceeds buffering window: %d, %d\n", mpdu_seq_num, head_seq_num); head_seq_num = seq_inc(seq_sub(mpdu_seq_num, buf_size)); /* release stored frames up to new head to stack */ ieee80211_release_reorder_frames(hw, tid_agg_rx, head_seq_num); On Tue, Mar 15, 2011 at 11:31 AM, Daniel Halperin <dhalperi@cs.washington.edu> wrote: > At the receiver (AP in this case) side, hacked from mac80211. The > following diff includes one more mac80211 fix I haven't yet sent to > the list. > > Dan > > diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c > index a6701ed..0467ec8 100644 > --- a/net/mac80211/rx.c > +++ b/net/mac80211/rx.c > @@ -605,13 +605,14 @@ static void ieee80211_sta_reorder_release(struct > ieee80211 continue; > } > if (!time_after(jiffies, tid_agg_rx->reorder_time[j] + > - HT_RX_REORDER_BUF_TIMEOUT)) > + HT_RX_REORDER_BUF_TIMEOUT) && skipped) > goto set_release_timer; > > -#ifdef CONFIG_MAC80211_HT_DEBUG > - if (net_ratelimit()) > +#if 1 > +//#ifdef CONFIG_MAC80211_HT_DEBUG > +// if (net_ratelimit()) > wiphy_debug(hw->wiphy, > - "release an RX reorder > frame due to + "release an > RX reorder frame due to #endif > ieee80211_release_reorder_frame(hw, tid_agg_rx, j); > > @@ -680,6 +681,7 @@ static bool > ieee80211_sta_manage_reorder_buf(struct ieee8021 * size > release some previous frames to make room for this one. > */ > if (!seq_less(mpdu_seq_num, head_seq_num + buf_size)) { > + printk("New seq exceeds buffering window: %d, %d\n", > mpdu_seq_nu head_seq_num = > seq_inc(seq_sub(mpdu_seq_num, buf_size)); > /* release stored frames up to new head to stack */ > ieee80211_release_reorder_frames(hw, tid_agg_rx, head_seq_num); > > > On Tue, Mar 15, 2011 at 11:29 AM, Johannes Berg > <johannes@sipsolutions.net> wrote: >> On Tue, 2011-03-15 at 11:16 -0700, Daniel Halperin wrote: >> >>> [341349.271337] New seq exceeds buffering window: 2797, 2766 >> >> quick q (on the phone right now) - where's that message from? >> >> johannes >> >> >> > ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: bug: iwlwifi, aggregation, and mac80211's reorder buffer 2011-03-15 18:16 ` Daniel Halperin 2011-03-15 18:29 ` Johannes Berg @ 2011-03-15 18:41 ` Johannes Berg 2011-03-15 18:47 ` Daniel Halperin 1 sibling, 1 reply; 16+ messages in thread From: Johannes Berg @ 2011-03-15 18:41 UTC (permalink / raw) To: Daniel Halperin Cc: wwguy, ipw3945-devel@lists.sourceforge.net, linux-wireless@vger.kernel.org On Tue, 2011-03-15 at 11:16 -0700, Daniel Halperin wrote: > There's still something off about the window size: unlike with ath9k, > the receiver does continually print out warnings about advancing its > window. There's something very fishy here: for any given aggregation > session, the part of the sequence number space this happens is always > the same! Really weird. Here's a log from 3 agg sessions in a row, > just running simple TCP iperf: > > [341221.679710] New seq exceeds buffering window: 283, 250 Ok thanks for the patch, I'll still reply here though. > [341398.950019] iwlagn 0000:03:00.0: iwlagn_tx_agg_start on ra = > 00:16:ea:c3:b3:8e tid = 0 > [341401.790964] New seq exceeds buffering window: 2335, 2304 So the delta is always 31, it seems? Is iwlwifi consistently sending batches of 32 instead of the advertised 31? Also -- don't get confused, the tx_agg_start is on its own TX agg session, but the new seq stuff is on its RX agg session. johannes ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: bug: iwlwifi, aggregation, and mac80211's reorder buffer 2011-03-15 18:41 ` Johannes Berg @ 2011-03-15 18:47 ` Daniel Halperin 2011-03-15 18:52 ` Johannes Berg 0 siblings, 1 reply; 16+ messages in thread From: Daniel Halperin @ 2011-03-15 18:47 UTC (permalink / raw) To: Johannes Berg Cc: wwguy, ipw3945-devel@lists.sourceforge.net, linux-wireless@vger.kernel.org On Tue, Mar 15, 2011 at 11:41 AM, Johannes Berg <johannes@sipsolutions.net> wrote: > On Tue, 2011-03-15 at 11:16 -0700, Daniel Halperin wrote: > >> There's still something off about the window size: unlike with ath9k, >> the receiver does continually print out warnings about advancing its >> window. There's something very fishy here: for any given aggregation >> session, the part of the sequence number space this happens is always >> the same! Really weird. Here's a log from 3 agg sessions in a row, >> just running simple TCP iperf: >> >> [341221.679710] New seq exceeds buffering window: 283, 250 > > Ok thanks for the patch, I'll still reply here though. > > >> [341398.950019] iwlagn 0000:03:00.0: iwlagn_tx_agg_start on ra = >> 00:16:ea:c3:b3:8e tid = 0 >> [341401.790964] New seq exceeds buffering window: 2335, 2304 > > So the delta is always 31, it seems? Is iwlwifi consistently sending > batches of 32 instead of the advertised 31? I don't think that's the case, in fact I've never seen the hardware send a larger batch than frame limit. I think it's from a missed frame early in one batch being pushed out by a late frame in the next batch. I know how to get the logs to find out. I have a bunch of meetings soon, but I'll dig into this more later. > Also -- don't get confused, the tx_agg_start is on its own TX agg > session, but the new seq stuff is on its RX agg session. You're absolutely true; but both sides pretty much start and stop aggregation in sync, they both have the same timeouts and I'm only using tcp traffic which is bidirectional. They're running the same software on the same hardware, modulo one being an AP and one a client. [Yes, I know that AP mode is broken on the iwl5300 but I disable power save, etc., and it seems to work well enough ... Actually, does AP (not P2P) mode work properly on the P2P-supporting 6300? I could switch to using those.] Dan ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: bug: iwlwifi, aggregation, and mac80211's reorder buffer 2011-03-15 18:47 ` Daniel Halperin @ 2011-03-15 18:52 ` Johannes Berg 0 siblings, 0 replies; 16+ messages in thread From: Johannes Berg @ 2011-03-15 18:52 UTC (permalink / raw) To: Daniel Halperin Cc: wwguy, ipw3945-devel@lists.sourceforge.net, linux-wireless@vger.kernel.org On Tue, 2011-03-15 at 11:47 -0700, Daniel Halperin wrote: > >> [341398.950019] iwlagn 0000:03:00.0: iwlagn_tx_agg_start on ra = > >> 00:16:ea:c3:b3:8e tid = 0 > >> [341401.790964] New seq exceeds buffering window: 2335, 2304 > > > > So the delta is always 31, it seems? Is iwlwifi consistently sending > > batches of 32 instead of the advertised 31? > > I don't think that's the case, in fact I've never seen the hardware > send a larger batch than frame limit. Ok. > I think it's from a missed > frame early in one batch being pushed out by a late frame in the next > batch. Ok, yeah, that'd have the same effect. > I know how to get the logs to find out. I have a bunch of > meetings soon, but I'll dig into this more later. Thanks. > > Also -- don't get confused, the tx_agg_start is on its own TX agg > > session, but the new seq stuff is on its RX agg session. > > You're absolutely true; but both sides pretty much start and stop > aggregation in sync, they both have the same timeouts and I'm only > using tcp traffic which is bidirectional. They're running the same > software on the same hardware, modulo one being an AP and one a > client. Right, if you have bidi traffic like TCP iwlwifi's way of starting/stopping aggregation will be in sync. > [Yes, I know that AP mode is broken on the iwl5300 but I disable power > save, etc., and it seems to work well enough ... Actually, does AP > (not P2P) mode work properly on the P2P-supporting 6300? I could > switch to using those.] Yeah, but I wouldn't worry about it in your case right now -- the only broken thing that I know of is all about powersave. johannes ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: bug: iwlwifi, aggregation, and mac80211's reorder buffer 2011-03-11 8:11 bug: iwlwifi, aggregation, and mac80211's reorder buffer Daniel Halperin 2011-03-11 8:13 ` Guy, Wey-Yi @ 2011-03-14 16:24 ` Johannes Berg 2011-03-14 17:38 ` Daniel Halperin 1 sibling, 1 reply; 16+ messages in thread From: Johannes Berg @ 2011-03-14 16:24 UTC (permalink / raw) To: Daniel Halperin; +Cc: ipw3945-devel, linux-wireless On Fri, 2011-03-11 at 00:11 -0800, Daniel Halperin wrote: > One thing that Intel does that ath9k does not is transmit packets out > of sequence number order inside a batch. (This is legal in the 802.11 > standard). Even if that's legal it seems very strange? Do you have packet captures of this by any chance? > I figured that one explanation for the TCP SACKs would be > if, somehow, frames got released to the network stack out of order; > indeed, many of the "holes" covered by the SACKs are filled quickly > (within ~4ms, about the length of one aggregation batch). Note that > iwlwifi defaults to an aggregation frame limit, hence buffer size, of > 31 frames. mac80211 honors this buffer size specification by > releasing frames to the network stack that are >= 31 sequence numbers > below the highest received frame. > > It looks like Intel doesn't honor its own frame limit, as I often saw > it have more than 31 frames outstanding, causing mac80211 on the > receiver to release many frames early. Changing iwlwifi's default agg > limit to 63 frames on both ends dramatically reduced the prevalence of > SACKs/TCP retransmissions and improved avg TCP performance to ~100 > Mbps (ranging 83-110). Great ... what if you just change the mac80211 code like you suggested? Does that already help, by making the receiver have a larger window? > (1) Why does iwlwifi default to an aggregation frame limit of 31? I > didn't see any negative effects from 63 frame limit and performance > improved dramatically Wey-Yi answered this fully afaik. > (2) Is there a way to make iwlwifi honor the aggregation limit? I > know that agg is controlled by a hardware scheduler, so this may be > difficult. Heh. I'd hope it already does but I guess if it doesn't then there's little we can do (but internally report a bug). This is actually not just a throughput problem, but also a correctness issue, since some devices do not allow for receiving long aggregates! > (3) mac80211's reorder buffer code could probably also be relaxed. It > probably wouldn't hurt to have the buffer be twice the transmitter's > advertised buffer, and might compensate for devices that don't > properly honor frame limits. Well, it doesn't make sense to go above 64, no? Can't have reordering across aggregates. > Also, mac80211 only flushes the reorder > buffer every 100 ms. That seems like a LONG time given that batches > are limited to 4ms -- 100ms is room for at least 10, maybe 20 > retransmissions to attempt to fill in the holes(!). Yeah that's true, but you don't really know how much time there is between retries, bluetooth for example might block the retransmission for quite a while. > (4) even after this fix, I see a few SACKs, and even when there aren't > SACKs I still see TCP "dead time" up to ~100ms. What else would you > use to debug this setup? What's "after this fix"? johannes ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: bug: iwlwifi, aggregation, and mac80211's reorder buffer 2011-03-14 16:24 ` Johannes Berg @ 2011-03-14 17:38 ` Daniel Halperin 0 siblings, 0 replies; 16+ messages in thread From: Daniel Halperin @ 2011-03-14 17:38 UTC (permalink / raw) To: Johannes Berg; +Cc: ipw3945-devel, linux-wireless On Mon, Mar 14, 2011 at 9:24 AM, Johannes Berg <johannes@sipsolutions.net> wrote: > On Fri, 2011-03-11 at 00:11 -0800, Daniel Halperin wrote: > >> One thing that Intel does that ath9k does not is transmit packets out >> of sequence number order inside a batch. (This is legal in the 802.11 >> standard). > > Even if that's legal it seems very strange? Do you have packet captures > of this by any chance? I can acquire and send you some logs. I'm not sure whether packet logs work -- last I checked [3 yrs ago] iwlwifi monitor mode didn't pick up agg batches to other destinations -- and I'm not sure where in the stack packet logs fall when taken on the host (e.g., before or after the reorder buffer?). What I did log was the reported sequence numbers from iwlagn_tx_status_reply_tx. What you see is: enqueued frame 1 enqueued frame 2 enqueued frame 3 enqueued frame 4 ... enqueued frame 31 <got compressed block-ack only missing frame 2> enqueued frame 32 enqueued frame 33 enqueued frame 34 enqueued frame 35 enqueued frame 36 enqueued frame 2 enqueued frame 37 enqueued frame 38 In other words, it looks like the scheduler hardware is preloading some of these frames before it gets the compressed BA, allowing the window to be larger than 31 frames. Here, even if frame 2 is received, mac80211 will have already dumped through frame 36: RX of frame 33 causes it to shift the window past 2 and thus dump the entirely successfully received window from 3 through 36. I'll get logs to confirm this a bit later. (this example is made up and assumes iwlwifi's default limit of 31 frames) >> I figured that one explanation for the TCP SACKs would be >> if, somehow, frames got released to the network stack out of order; >> indeed, many of the "holes" covered by the SACKs are filled quickly >> (within ~4ms, about the length of one aggregation batch). Note that >> iwlwifi defaults to an aggregation frame limit, hence buffer size, of >> 31 frames. mac80211 honors this buffer size specification by >> releasing frames to the network stack that are >= 31 sequence numbers >> below the highest received frame. >> >> It looks like Intel doesn't honor its own frame limit, as I often saw >> it have more than 31 frames outstanding, causing mac80211 on the >> receiver to release many frames early. Changing iwlwifi's default agg >> limit to 63 frames on both ends dramatically reduced the prevalence of >> SACKs/TCP retransmissions and improved avg TCP performance to ~100 >> Mbps (ranging 83-110). > > Great ... what if you just change the mac80211 code like you suggested? > Does that already help, by making the receiver have a larger window? Yes. The mac80211 code change I suggested makes Intel work better regardless of Intel's frame limit. >> (2) Is there a way to make iwlwifi honor the aggregation limit? I >> know that agg is controlled by a hardware scheduler, so this may be >> difficult. > > Heh. I'd hope it already does but I guess if it doesn't then there's > little we can do (but internally report a bug). This is actually not > just a throughput problem, but also a correctness issue, since some > devices do not allow for receiving long aggregates! Yep. It looks like Intel correctly sends only 31 frames in a batch, but lets more than 31 frames be in the **window** as described above. >> (3) mac80211's reorder buffer code could probably also be relaxed. It >> probably wouldn't hurt to have the buffer be twice the transmitter's >> advertised buffer, and might compensate for devices that don't >> properly honor frame limits. > > Well, it doesn't make sense to go above 64, no? Can't have reordering > across aggregates. I think that's true, yes. When I said "twice" I meant min(2*RX bufsize, 64). Twice makes sense when the default is 31 ;). >> Also, mac80211 only flushes the reorder >> buffer every 100 ms. That seems like a LONG time given that batches >> are limited to 4ms -- 100ms is room for at least 10, maybe 20 >> retransmissions to attempt to fill in the holes(!). > > Yeah that's true, but you don't really know how much time there is > between retries, bluetooth for example might block the retransmission > for quite a while. Fair enough. >> (4) even after this fix, I see a few SACKs, and even when there aren't >> SACKs I still see TCP "dead time" up to ~100ms. What else would you >> use to debug this setup? It's been several days more of debugging so I'm not *positive*, but I believe "after this fix" probably meant the iwlwifi->63 change, and maybe also the mac80211->64 change. Dan ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2011-03-15 18:52 UTC | newest] Thread overview: 16+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2011-03-11 8:11 bug: iwlwifi, aggregation, and mac80211's reorder buffer Daniel Halperin 2011-03-11 8:13 ` Guy, Wey-Yi 2011-03-13 19:59 ` Daniel Halperin 2011-03-14 0:47 ` wwguy 2011-03-14 18:16 ` Daniel Halperin 2011-03-15 11:51 ` Johannes Berg 2011-03-15 12:52 ` Johannes Berg 2011-03-15 18:16 ` Daniel Halperin 2011-03-15 18:29 ` Johannes Berg 2011-03-15 18:31 ` Daniel Halperin 2011-03-15 18:33 ` Daniel Halperin 2011-03-15 18:41 ` Johannes Berg 2011-03-15 18:47 ` Daniel Halperin 2011-03-15 18:52 ` Johannes Berg 2011-03-14 16:24 ` Johannes Berg 2011-03-14 17:38 ` Daniel Halperin
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).