* Re: [PATCH] wifi: iwlwifi: Fix spurious packet drops with RSS
       [not found] <20230430001348.3552-1-sultan@kerneltoast.com>
@ 2023-05-04 12:10 ` Johannes Berg
  2023-05-04 17:55   ` Sultan Alsawaf
  0 siblings, 1 reply; 3+ messages in thread
From: Johannes Berg @ 2023-05-04 12:10 UTC
  To: Sultan Alsawaf
  Cc: Greenman, Gregory, Kalle Valo, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Goodstein, Mordechay, Coelho, Luciano,
	Sisodiya, Mukesh, linux-wireless@vger.kernel.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org

[let's see if my reply will make it to the list, the original seems to
not have]

On Sun, 2023-04-30 at 00:13 +0000, Sultan Alsawaf wrote:
> From: Sultan Alsawaf <sultan@kerneltoast.com>
> 
> When RSS is used and one of the RX queues lags behind others by more than
> 2048 frames, then new frames arriving on the lagged RX queue are
> incorrectly treated as old rather than new by the reorder buffer, and are
> thus spuriously dropped. This is because the reorder buffer treats frames
> as old when they have an SN that is more than 2048 away from the head SN,
> which causes the reorder buffer to drop frames that are actually valid.
> 
> The odds of this occurring naturally increase with the number of
> RX queues used, so CPUs with many threads are more susceptible to
> encountering spurious packet drops caused by this issue.
> 
> As it turns out, the firmware already detects when a frame is either old or
> duplicated and exports this information, but it's currently unused. Using
> these firmware bits to decide when frames are old or duplicated fixes the
> spurious drops.

So I assume you tested it now, and it works? Somehow I had been under
the impression we never got it to work back when...

> Johannes mentions that the 9000 series' firmware doesn't support these
> bits, so disable RSS on the 9000 series chipsets since they lack a
> mechanism to properly detect old and duplicated frames.

Indeed, I checked this again, I also somehow thought it was backported
to some versions but doesn't look like. We can either leave those old
ones broken (they only shipped with fewer cores anyway), or just disable
it as you did here, not sure. RSS is probably not as relevant with those
slower speeds anyway.

> +++ b/drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c
> @@ -918,7 +918,6 @@ static bool iwl_mvm_reorder(struct iwl_mvm *mvm,
>  	struct iwl_mvm_sta *mvm_sta;
>  	struct iwl_mvm_baid_data *baid_data;
>  	struct iwl_mvm_reorder_buffer *buffer;
> -	struct sk_buff *tail;
>  	u32 reorder = le32_to_cpu(desc->reorder_data);
>  	bool amsdu = desc->mac_flags2 & IWL_RX_MPDU_MFLG2_AMSDU;
>  	bool last_subframe =
> @@ -1020,7 +1019,7 @@ static bool iwl_mvm_reorder(struct iwl_mvm *mvm,
>  				 rx_status->device_timestamp, queue);
>  
>  	/* drop any oudated packets */
> -	if (ieee80211_sn_less(sn, buffer->head_sn))
> +	if (reorder & IWL_RX_MPDU_REORDER_BA_OLD_SN)
>  		goto drop;
>  
>  	/* release immediately if allowed by nssn and no stored frames */
> @@ -1068,24 +1067,12 @@ static bool iwl_mvm_reorder(struct iwl_mvm *mvm,
>  		return false;
>  	}

All that "send queue sync" code in the middle that was _meant_ to fix
this issue but I guess never really did can also be removed, no? And the
timer, etc. etc.

johannes

[leaving full quote for the benefit of the mailing list]

>  
> -	index = sn % buffer->buf_size;
> -
> -	/*
> -	 * Check if we already stored this frame
> -	 * As AMSDU is either received or not as whole, logic is simple:
> -	 * If we have frames in that position in the buffer and the last frame
> -	 * originated from AMSDU had a different SN then it is a retransmission.
> -	 * If it is the same SN then if the subframe index is incrementing it
> -	 * is the same AMSDU - otherwise it is a retransmission.
> -	 */
> -	tail = skb_peek_tail(&entries[index].e.frames);
> -	if (tail && !amsdu)
> -		goto drop;
> -	else if (tail && (sn != buffer->last_amsdu ||
> -			  buffer->last_sub_index >= sub_frame_idx))
> +	/* drop any duplicated packets */
> +	if (desc->status & cpu_to_le32(IWL_RX_MPDU_STATUS_DUPLICATE))
>  		goto drop;
>  
>  	/* put in reorder buffer */
> +	index = sn % buffer->buf_size;
>  	__skb_queue_tail(&entries[index].e.frames, skb);
>  	buffer->num_stored++;
>  	entries[index].e.reorder_time = jiffies;
> -- 
> 2.40.1
> 
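To make the commit message's point about the 2048-frame window concrete,
here is a minimal standalone sketch of a wraparound-aware comparison on
12-bit sequence numbers. It models the kind of check the removed
ieee80211_sn_less() call performs; sn_less(), sn_advance() and
check_examples() are names made up for this illustration, and the exact
threshold handling is an assumption rather than a copy of the kernel
helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SN_MODULO 0x1000u            /* 802.11 sequence numbers are 12 bits wide */
#define SN_MASK   (SN_MODULO - 1u)

/* Illustrative model: true if sn1 is considered "older" than sn2. */
static bool sn_less(uint16_t sn1, uint16_t sn2)
{
	return (((unsigned)sn1 - (unsigned)sn2) & SN_MASK) > (SN_MODULO >> 1);
}

/* Hypothetical helper: SN of the frame that is `ahead` frames past `head`. */
static uint16_t sn_advance(uint16_t head, unsigned ahead)
{
	return (uint16_t)((head + ahead) & SN_MASK);
}

/* Walks through the cases discussed in the commit message. */
void check_examples(void)
{
	const uint16_t head = 100;   /* head SN of the lagging queue's buffer */

	/* A frame exactly 2048 ahead of the head is still seen as new. */
	assert(!sn_less(sn_advance(head, 2048), head));

	/* One frame further and the same check calls it old: spurious drop. */
	assert(sn_less(sn_advance(head, 2049), head));

	/* A frame just behind the head really is old. */
	assert(sn_less(99, head));

	/* Wraparound: SN 5 is 11 frames ahead of SN 4090, so it is new. */
	assert(!sn_less(5, 4090));
}
```

Because 12-bit SNs wrap, a purely software check like this cannot tell
"2049 frames ahead" from "2047 frames behind", which is why the patch
uses the firmware's per-frame verdict instead.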
* Re: [PATCH] wifi: iwlwifi: Fix spurious packet drops with RSS
  2023-05-04 12:10 ` [PATCH] wifi: iwlwifi: Fix spurious packet drops with RSS Johannes Berg
@ 2023-05-04 17:55   ` Sultan Alsawaf
  2023-05-05  6:40     ` Johannes Berg
  0 siblings, 1 reply; 3+ messages in thread
From: Sultan Alsawaf @ 2023-05-04 17:55 UTC
  To: Johannes Berg
  Cc: Greenman, Gregory, Kalle Valo, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Goodstein, Mordechay, Coelho, Luciano,
	Sisodiya, Mukesh, linux-wireless@vger.kernel.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org

On Thu, May 04, 2023 at 02:10:50PM +0200, Johannes Berg wrote:
> [let's see if my reply will make it to the list, the original seems to
> not have]
> 
> On Sun, 2023-04-30 at 00:13 +0000, Sultan Alsawaf wrote:
> > From: Sultan Alsawaf <sultan@kerneltoast.com>
> > 
> > When RSS is used and one of the RX queues lags behind others by more than
> > 2048 frames, then new frames arriving on the lagged RX queue are
> > incorrectly treated as old rather than new by the reorder buffer, and are
> > thus spuriously dropped. This is because the reorder buffer treats frames
> > as old when they have an SN that is more than 2048 away from the head SN,
> > which causes the reorder buffer to drop frames that are actually valid.
> > 
> > The odds of this occurring naturally increase with the number of
> > RX queues used, so CPUs with many threads are more susceptible to
> > encountering spurious packet drops caused by this issue.
> > 
> > As it turns out, the firmware already detects when a frame is either old or
> > duplicated and exports this information, but it's currently unused. Using
> > these firmware bits to decide when frames are old or duplicated fixes the
> > spurious drops.
> 
> So I assume you tested it now, and it works? Somehow I had been under
> the impression we never got it to work back when...

Yep, I've been using this for about a year and have let it run through the
original iperf3 reproducer I mentioned on bugzilla for hours with no stalls. My
big git clones don't freeze anymore either. :)

What I wasn't able to get working was the big reorder buffer cleanup that's made
possible by using these firmware bits. The explicit queue sync can be removed
easily, but there were further potential cleanups you had mentioned that I
wasn't able to get working.

I hadn't submitted this patch until now because I was hoping to get the big
cleanup done simultaneously but I got too busy until now. Since this small patch
does fix the issue, my thought is that this could be merged and sent to stable,
and with subsequent patches I can chip away at cleaning up the reorder buffer.

> > Johannes mentions that the 9000 series' firmware doesn't support these
> > bits, so disable RSS on the 9000 series chipsets since they lack a
> > mechanism to properly detect old and duplicated frames.
> 
> Indeed, I checked this again, I also somehow thought it was backported
> to some versions but doesn't look like. We can either leave those old
> ones broken (they only shipped with fewer cores anyway), or just disable
> it as you did here, not sure. RSS is probably not as relevant with those
> slower speeds anyway.

Agreed, I think it's worth disabling RSS on 9000 series to fix it there. If the
RX queues are heavily backed up and incoming packets are not released fast
enough due to a slow CPU, then I think the spurious drops could happen somewhat
regularly on slow devices using 9000 series.

It's probably also difficult to judge the impact/frequency of these spurious
drops in the wild due to TCP retries potentially masking them. The issue can be
very noticeable when a lot of packets are spuriously dropped at once though, so
I think it's certainly worth the tradeoff to disable RSS on the older chipsets.

> > +++ b/drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c
> > @@ -918,7 +918,6 @@ static bool iwl_mvm_reorder(struct iwl_mvm *mvm,
> >  	struct iwl_mvm_sta *mvm_sta;
> >  	struct iwl_mvm_baid_data *baid_data;
> >  	struct iwl_mvm_reorder_buffer *buffer;
> > -	struct sk_buff *tail;
> >  	u32 reorder = le32_to_cpu(desc->reorder_data);
> >  	bool amsdu = desc->mac_flags2 & IWL_RX_MPDU_MFLG2_AMSDU;
> >  	bool last_subframe =
> > @@ -1020,7 +1019,7 @@ static bool iwl_mvm_reorder(struct iwl_mvm *mvm,
> >  				 rx_status->device_timestamp, queue);
> >  
> >  	/* drop any oudated packets */
> > -	if (ieee80211_sn_less(sn, buffer->head_sn))
> > +	if (reorder & IWL_RX_MPDU_REORDER_BA_OLD_SN)
> >  		goto drop;
> >  
> >  	/* release immediately if allowed by nssn and no stored frames */
> > @@ -1068,24 +1067,12 @@ static bool iwl_mvm_reorder(struct iwl_mvm *mvm,
> >  		return false;
> >  	}
> 
> All that "send queue sync" code in the middle that was _meant_ to fix
> this issue but I guess never really did can also be removed, no? And the
> timer, etc. etc.

Indeed, and removing the queue sync + timer are easy. Would you prefer I send
additional patches for at least those cleanups before the fix itself can be
considered for merging?

Sultan
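The two replaced checks in the quoted diff each boil down to testing one
firmware-provided bit. A rough host-endian sketch of that decision
follows; struct demo_rx_desc, the DEMO_* bit positions and
demo_should_drop() are hypothetical stand-ins chosen for illustration,
not the driver's definitions, and the real descriptor fields are
little-endian (hence the le32_to_cpu()/cpu_to_le32() calls in the patch).

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit positions for this sketch only; not from the driver headers. */
#define DEMO_REORDER_BA_OLD_SN (1u << 31)
#define DEMO_STATUS_DUPLICATE  (1u << 22)

/* Hypothetical, host-endian stand-in for the RX MPDU descriptor. */
struct demo_rx_desc {
	uint32_t reorder_data;
	uint32_t status;
};

/* Drop only when the firmware itself flagged the frame as old or duplicated. */
bool demo_should_drop(const struct demo_rx_desc *desc)
{
	if (desc->reorder_data & DEMO_REORDER_BA_OLD_SN)
		return true;    /* firmware says the SN is behind the window */
	if (desc->status & DEMO_STATUS_DUPLICATE)
		return true;    /* firmware says this frame was already received */
	return false;           /* otherwise hand it to the reorder buffer */
}
```

The point of the design is that the verdict no longer depends on how far
an individual RX queue's head SN has fallen behind, since the firmware
makes the call before the frame is distributed to a queue.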
* Re: [PATCH] wifi: iwlwifi: Fix spurious packet drops with RSS
  2023-05-04 17:55   ` Sultan Alsawaf
@ 2023-05-05  6:40     ` Johannes Berg
  0 siblings, 0 replies; 3+ messages in thread
From: Johannes Berg @ 2023-05-05 6:40 UTC
  To: Sultan Alsawaf
  Cc: Greenman, Gregory, Kalle Valo, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Goodstein, Mordechay, Coelho, Luciano,
	Sisodiya, Mukesh, linux-wireless@vger.kernel.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org

On Thu, 2023-05-04 at 10:55 -0700, Sultan Alsawaf wrote:
> > 
> > So I assume you tested it now, and it works? Somehow I had been under
> > the impression we never got it to work back when...
> 
> Yep, I've been using this for about a year and have let it run through the
> original iperf3 reproducer I mentioned on bugzilla for hours with no stalls. My
> big git clones don't freeze anymore either. :)

Oh! OK, great.

> What I wasn't able to get working was the big reorder buffer cleanup that's made
> possible by using these firmware bits. The explicit queue sync can be removed
> easily, but there were further potential cleanups you had mentioned that I
> wasn't able to get working.

Fair enough.

> I hadn't submitted this patch until now because I was hoping to get the big
> cleanup done simultaneously but I got too busy until now. Since this small patch
> does fix the issue, my thought is that this could be merged and sent to stable,
> and with subsequent patches I can chip away at cleaning up the reorder buffer.

Sure, that makes sense.

> > > Johannes mentions that the 9000 series' firmware doesn't support these
> > > bits, so disable RSS on the 9000 series chipsets since they lack a
> > > mechanism to properly detect old and duplicated frames.
> > 
> > Indeed, I checked this again, I also somehow thought it was backported
> > to some versions but doesn't look like. We can either leave those old
> > ones broken (they only shipped with fewer cores anyway), or just disable
> > it as you did here, not sure. RSS is probably not as relevant with those
> > slower speeds anyway.
> 
> Agreed, I think it's worth disabling RSS on 9000 series to fix it there. If the
> RX queues are heavily backed up and incoming packets are not released fast
> enough due to a slow CPU, then I think the spurious drops could happen somewhat
> regularly on slow devices using 9000 series.
> 
> It's probably also difficult to judge the impact/frequency of these spurious
> drops in the wild due to TCP retries potentially masking them. The issue can be
> very noticeable when a lot of packets are spuriously dropped at once though, so
> I think it's certainly worth the tradeoff to disable RSS on the older chipsets.

:)

> Indeed, and removing the queue sync + timer are easy. Would you prefer I send
> additional patches for at least those cleanups before the fix itself can be
> considered for merging?
> 

No, you know, maybe this is easier since it's the smallest possible
change that fixes issues. Just have to see what Emmanuel says, he had
said he sees issues with this change.

johannes