* Allocating more RX descriptors that can fit in their related rings
@ 2024-09-04 18:01 Remi Pommarel
2024-09-05 18:36 ` Kalle Valo
2024-09-06 3:57 ` Karthikeyan Periyasamy
0 siblings, 2 replies; 7+ messages in thread
From: Remi Pommarel @ 2024-09-04 18:01 UTC (permalink / raw)
To: ath12k; +Cc: Nicolas Escande
Hello,
As far as I understand, a bunch (ATH12K_RX_DESC_COUNT) of RX descriptors
get allocated, then CMEM is configured for cookie conversion of those
descriptors, and they are kept available in the dp->rx_desc_free_list
pool.
Those descriptors seem to be used to feed two different rings: the
rx_refill_buf_ring ring via ath12k_dp_rx_bufs_replenish(), and the
reo_reinject_ring one via ath12k_dp_rx_h_defrag_reo_reinject(). While
the former is kept as full as possible, the latter is only used on
demand (i.e. for reinjection of defragmented MPDUs).
It seems that the number of RX descriptors, ATH12K_RX_DESC_COUNT (12288),
is higher than what those two rings can hold (DP_REO_REINJECT_RING_SIZE +
DP_RXDMA_BUF_RING_SIZE = 32 + 4096 = 4128).
My question is: why are we allocating that many (12288) buffers if only
a small part (4128) can be used in the worst case?
Wouldn't it be OK to allocate just enough RX descriptors to fill both
rings (with proper 512-entry alignment to ease CMEM configuration), as
below?
#define ATH12K_RX_DESC_COUNT ALIGN(DP_REO_REINJECT_RING_SIZE + \
DP_RXDMA_BUF_RING_SIZE, \
ATH12K_MAX_SPT_ENTRIES)
Or am I missing something, and is this going to impact performance?
Thanks
--
Remi
^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Allocating more RX descriptors that can fit in their related rings
  2024-09-04 18:01 Allocating more RX descriptors that can fit in their related rings Remi Pommarel
@ 2024-09-05 18:36 ` Kalle Valo
  2024-09-06  3:57 ` Karthikeyan Periyasamy
  1 sibling, 0 replies; 7+ messages in thread
From: Kalle Valo @ 2024-09-05 18:36 UTC (permalink / raw)
To: Remi Pommarel; +Cc: ath12k, Nicolas Escande

Remi Pommarel <repk@triplefau.lt> writes:

> As far as I understand a bunch (ATH12K_RX_DESC_COUNT) of rx descriptors
> gets allocated, then CMEM is configured for those descriptors cookie
> conversion and is kept available in dp->rx_desc_free_list pool.
>
> [...]
>
> My question is why are we allocating that much (12288) buffer if only a
> small part (4128) can be used in worst case ?
>
> Wouldn't it be ok to only allocate just enough RX descriptors to fill
> both ring (with proper 512 alignment to ease CMEM configuration) as
> below ?
>
> #define ATH12K_RX_DESC_COUNT ALIGN(DP_REO_REINJECT_RING_SIZE + \
>                                    DP_RXDMA_BUF_RING_SIZE, \
>                                    ATH12K_MAX_SPT_ENTRIES)
>
> Or am I missing something and this is going to impact performances ?

I don't know why it is so, and there have been no replies from others.
I recommend just sending a patch, preferably with numbers explaining
how much memory is saved, and let's see if anyone reacts.
-- 
https://patchwork.kernel.org/project/linux-wireless/list/
https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
* Re: Allocating more RX descriptors that can fit in their related rings
  2024-09-04 18:01 Allocating more RX descriptors that can fit in their related rings Remi Pommarel
  2024-09-05 18:36 ` Kalle Valo
@ 2024-09-06  3:57 ` Karthikeyan Periyasamy
  2024-09-06  4:00 ` Karthikeyan Periyasamy
  1 sibling, 1 reply; 7+ messages in thread
From: Karthikeyan Periyasamy @ 2024-09-06 3:57 UTC (permalink / raw)
To: ath12k

On 9/4/2024 11:31 PM, Remi Pommarel wrote:
> Hello,
>
> As far as I understand a bunch (ATH12K_RX_DESC_COUNT) of rx descriptors
> gets allocated, then CMEM is configured for those descriptors cookie
> conversion and is kept available in dp->rx_desc_free_list pool.
>
> [...]
>
> Or am I missing something and this is going to impact performances ?
>

Yes, it will impact performance.

The host replenishes RxDMA buffers to the HW, and after processing they
are given back to the Rx path (REO, WBM Error, Rx Error). So it can be
related as a one-to-one direct mapping. How many in-flight Rx buffers
the HW consumes depends on the data rate. If RxDMA buffers are not
available, performance suffers through out-of-order Rx errors caused by
RxDMA buffer unavailability.

-- 
Karthikeyan Periyasamy
-- 
கார்த்திகேயன் பெரியசாமி
* Re: Allocating more RX descriptors that can fit in their related rings
  2024-09-06  3:57 ` Karthikeyan Periyasamy
@ 2024-09-06  4:00 ` Karthikeyan Periyasamy
  2024-09-06 16:09 ` Remi Pommarel
  0 siblings, 1 reply; 7+ messages in thread
From: Karthikeyan Periyasamy @ 2024-09-06 4:00 UTC (permalink / raw)
To: ath12k

corrected the statement below

On 9/6/2024 9:27 AM, Karthikeyan Periyasamy wrote:
> On 9/4/2024 11:31 PM, Remi Pommarel wrote:
>> [...]
>> Or am I missing something and this is going to impact performances ?
>
> Yes, it will impact performance.
>
> Host replenish RxDMA buffers to the HW and after processing it given
> back to Rx path (REO, WBM Error, Rx Error). So it can be relate to
> one-to-one direct mapping. [...]

Yes, it will impact performance.

The host replenishes RxDMA buffers to the HW, and after processing they
are given back to the Rx path (REO, WBM Error, Rx Error). So it cannot
be related as a one-to-one direct mapping. How many in-flight Rx buffers
the HW consumes depends on the data rate. If RxDMA buffers are not
available, performance suffers through out-of-order Rx errors caused by
RxDMA buffer unavailability.

-- 
Karthikeyan Periyasamy
-- 
கார்த்திகேயன் பெரியசாமி
* Re: Allocating more RX descriptors that can fit in their related rings
  2024-09-06  4:00 ` Karthikeyan Periyasamy
@ 2024-09-06 16:09 ` Remi Pommarel
  2024-09-07  2:39 ` Karthikeyan Periyasamy
  0 siblings, 1 reply; 7+ messages in thread
From: Remi Pommarel @ 2024-09-06 16:09 UTC (permalink / raw)
To: ath12k

On Fri, Sep 06, 2024 at 09:30:33AM +0530, Karthikeyan Periyasamy wrote:
> [...]
>
> Yes, it will impact performance.
>
> Host replenish RxDMA buffers to the HW and after processing it given
> back to Rx path (REO, WBM Error, Rx Error). So it cannot be relate to
> one-to-one direct mapping. HW consume in-progress Rx buffer depend on
> Data rate mode. If RxDMA buffers not available then it impact
> performance due to Out-of-order Rx error due to RxDMA buffer unavailable.

Thanks for the clarification.

I think I do see your point. I thought the only way to give descriptors
back to the HW was in ath12k_dp_rx_process(), by returning the rx desc
after it has been used. In that case having extra buffers wouldn't be
needed, as it wouldn't be possible to refill faster than those
descriptors are processed.

But it seems that there is a distinct irq group (i.e. pci*_wlan_dp_3)
that is used to process REO, WBM Error and Rx Error, but also to
replenish buffers when the refill ring is 3/4 empty (called host2rxdma).

So hypothetically here, if you isolate this irq to a specific CPU (e.g.
having more than 4 CPUs: one for each RX data ring and an extra one for
error handling and the host2rxdma refill), you could have scenarios
where the data ring processing in ath12k_dp_rx_process() lags enough to
need this extra buffer refilling, is that correct?

If this is right, then that would explain why I didn't see any
performance difference: with only 4 CPUs (one RX ring processed per
CPU), the extra buffer refilling couldn't be faster than just giving the
used descriptors back.

Thanks

-- 
Remi
* Re: Allocating more RX descriptors that can fit in their related rings
  2024-09-06 16:09 ` Remi Pommarel
@ 2024-09-07  2:39 ` Karthikeyan Periyasamy
  2024-09-09 13:12 ` Remi Pommarel
  0 siblings, 1 reply; 7+ messages in thread
From: Karthikeyan Periyasamy @ 2024-09-07 2:39 UTC (permalink / raw)
To: ath12k

On 9/6/2024 9:39 PM, Remi Pommarel wrote:
> [...]
>
> I think I do see your point. I thought the only way to give descriptors
> back to the HW was in ath12k_dp_rx_process(), by returning the rx desc
> after it has been used. In that case having extra buffers wouldn't be
> needed, as it wouldn't be possible to refill faster than those
> descriptors are processed.

There is an explicit HW irq request to refill the Rx buffers, processed
by host2rxdma[grp_id] under ath12k_dp_service_srng(). Whenever the HW
needs a refill, it raises that explicit irq.

> But it seems that there is a distinct irq group (i.e. pci*_wlan_dp_3)
> that is used to process REO, WBM Error and Rx Error, but also to
> replenish buffers when the refill ring is 3/4 empty (called host2rxdma).

The above one.

> So hypothetically here, if you isolate this irq to a specific CPU (e.g.
> having more than 4 CPUs: one for each RX data ring and an extra one for
> error handling and the host2rxdma refill), you could have scenarios
> where the data ring processing in ath12k_dp_rx_process() lags enough to
> need this extra buffer refilling, is that correct?

It depends on the data rate of the traffic you pump. But you can
experiment with a reduced Rx buffer count, observe the behavior, and
conclude the performance impact from that. Also consider small-frame
traffic at the highest data rate; there, more Rx descriptors are in use
for the traffic.

> If this is right, then that would explain why I didn't see any
> performance difference: with only 4 CPUs (one RX ring processed per
> CPU), the extra buffer refilling couldn't be faster than just giving
> the used descriptors back.

-- 
Karthikeyan Periyasamy
-- 
கார்த்திகேயன் பெரியசாமி
* Re: Allocating more RX descriptors that can fit in their related rings
  2024-09-07  2:39 ` Karthikeyan Periyasamy
@ 2024-09-09 13:12 ` Remi Pommarel
  0 siblings, 0 replies; 7+ messages in thread
From: Remi Pommarel @ 2024-09-09 13:12 UTC (permalink / raw)
To: ath12k

On Sat, Sep 07, 2024 at 08:09:15AM +0530, Karthikeyan Periyasamy wrote:
> [...]
>
> There is an explicit HW irq request to refill the Rx buffers, processed
> by host2rxdma[grp_id] under ath12k_dp_service_srng(). Whenever the HW
> needs a refill, it raises that explicit irq.
>
> > But it seems that there is a distinct irq group (i.e. pci*_wlan_dp_3)
> > that is used to process REO, WBM Error and Rx Error, but also to
> > replenish buffers when the refill ring is 3/4 empty (called
> > host2rxdma).
>
> The above one.

I do think it is when the refill ring is 3/4 empty, if I understand the
following excerpt from ath12k_dp_rx_bufs_replenish() correctly:

	if (!req_entries && (num_free > (rx_ring->bufs_max * 3) / 4))
		req_entries = num_free;

> > So hypothetically here, if you isolate this irq to a specific CPU
> > [...] is that correct?
>
> It depends on the data rate of the traffic you pump. But you can
> experiment with a reduced Rx buffer count, observe the behavior, and
> conclude the performance impact from that. Also consider small-frame
> traffic at the highest data rate; there, more Rx descriptors are in
> use for the traffic.

You're right, small-packet traffic could increase the pressure on the rx
descriptor rings, and there rx buffer starvation could possibly happen
with a one-to-one mapping.

Thanks for your clarifications.

This question came from the fact that we are using a Qualcomm internal
patch (not applied to mainline yet) that introduces a 512MB memory
profile config for which the opposite situation happens (fewer RX
descriptors than room in DP_RXDMA_BUF_RING_SIZE), causing fragmented
packets to be dropped (all rx descriptors being used for rxdma buf
reception, and none left for ath12k_dp_rx_h_defrag_reo_reinject()).

So as long as ATH12K_RX_DESC_COUNT is higher than the sum of the
rx_refill_buf_ring and reo_reinject_ring sizes, that is fine with me.
Maybe that is worth asserting at build time?

-- 
Remi
end of thread, other threads:[~2024-09-09 13:12 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-09-04 18:01 Allocating more RX descriptors that can fit in their related rings Remi Pommarel
2024-09-05 18:36 ` Kalle Valo
2024-09-06  3:57 ` Karthikeyan Periyasamy
2024-09-06  4:00   ` Karthikeyan Periyasamy
2024-09-06 16:09     ` Remi Pommarel
2024-09-07  2:39       ` Karthikeyan Periyasamy
2024-09-09 13:12         ` Remi Pommarel