* [PATCH net-next 0/3] net: bcmgenet: collapse TX priority queues
@ 2026-06-12 20:59 Nicolai Buchwitz
2026-06-12 20:59 ` [PATCH net-next 1/3] net: bcmgenet: collapse TX priority queues to a single queue Nicolai Buchwitz
` (3 more replies)
0 siblings, 4 replies; 9+ messages in thread
From: Nicolai Buchwitz @ 2026-06-12 20:59 UTC (permalink / raw)
To: Doug Berger, Florian Fainelli, bcm-kernel-feedback-list,
Andrew Lunn, David S . Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni
Cc: Justin Chen, Ovidiu Panait, netdev, linux-kernel,
Nicolai Buchwitz
The strict-priority TX queues can starve under multi-queue load and
trip NETDEV_WATCHDOG. Justin's earlier series [1] tried to mitigate
the timeouts but kept the multi-queue design. Ovidiu Panait recently
proposed a WRR stop-gap [2]. This series drops the priority queues
entirely. Justin confirmed they are no longer required.
Patch 1 collapses v2-v4 hw_params to the same single-queue path v1
already uses. Patch 2 removes the now-dead priority register writes,
helper macros, and the dead "flow period for ring != 0" branch in
bcmgenet_init_tx_ring(); the DMA_ARBITER_{RR,WRR,SP} and
DMA_RING_BUF_PRIORITY_* HW defines are kept as register
documentation. Patch 3 switches the netdev allocation from
alloc_etherdev_mqs(., 5, 5) to alloc_etherdev(), since only one
TX/RX queue is ever used.
Tested on Raspberry Pi CM4 (BCM2711):
- Ovidiu's reproducer (iperf3 -u -b0 -P16 -t60) no longer trips
NETDEV_WATCHDOG.
- UDP sustains 956 Mbit/s line rate over 60 s with 0 datagrams
lost (0/4952890).
- Single-stream TCP throughput unchanged at 943 Mbit/s.
[1] https://lore.kernel.org/netdev/20260406175756.134567-1-justin.chen@broadcom.com/
[2] https://lore.kernel.org/netdev/20260610085238.56300-1-ovidiu.panait.rb@renesas.com/
Nicolai Buchwitz (3):
net: bcmgenet: collapse TX priority queues to a single queue
net: bcmgenet: remove dead priority queue plumbing
net: bcmgenet: allocate a single-queue netdev
.../net/ethernet/broadcom/genet/bcmgenet.c | 100 +++---------------
.../net/ethernet/broadcom/genet/bcmgenet.h | 2 -
2 files changed, 16 insertions(+), 86 deletions(-)
--
2.53.0
^ permalink raw reply [flat|nested] 9+ messages in thread* [PATCH net-next 1/3] net: bcmgenet: collapse TX priority queues to a single queue 2026-06-12 20:59 [PATCH net-next 0/3] net: bcmgenet: collapse TX priority queues Nicolai Buchwitz @ 2026-06-12 20:59 ` Nicolai Buchwitz 2026-06-15 18:40 ` Justin Chen 2026-06-12 20:59 ` [PATCH net-next 2/3] net: bcmgenet: remove dead priority queue plumbing Nicolai Buchwitz ` (2 subsequent siblings) 3 siblings, 1 reply; 9+ messages in thread From: Nicolai Buchwitz @ 2026-06-12 20:59 UTC (permalink / raw) To: Doug Berger, Florian Fainelli, bcm-kernel-feedback-list, Andrew Lunn, David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni Cc: Justin Chen, Ovidiu Panait, netdev, linux-kernel, Nicolai Buchwitz The strict-priority TX queues can starve under multi-queue load and trip NETDEV_WATCHDOG. Justin's earlier series [1] worked around the symptom but kept the design. The multi-queue design was originally used for STB use cases that are no longer needed, as confirmed by Justin. v1 hw_params already exercises a single-queue path. Point v2-v4 at the same configuration: ring 0 takes the full BD pool, every per-ring loop collapses to one iteration, and netif_set_real_num_tx_queues drops to 1 via the existing tx_queues + 1 arithmetic. Tested on Raspberry Pi CM4 (BCM2711). The baseline kernel trips NETDEV_WATCHDOG within seconds under iperf3 UDP saturation (-u -b0 -P16 -t60). After the change the same test completes without a watchdog, and a single-stream 60 s UDP run sustains 956 Mbit/s with 0/4952890 datagrams lost. Single-stream TCP throughput is unchanged at 943 Mbit/s. [1] https://lore.kernel.org/netdev/20260406175756.134567-1-justin.chen@broadcom.com/ Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de> --- drivers/net/ethernet/broadcom/genet/bcmgenet.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c index ca403581357d..c892734b4cd0 100644 --- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c +++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c @@ -3751,8 +3751,8 @@ static const struct bcmgenet_hw_params bcmgenet_hw_params_v1 = { }; static const struct bcmgenet_hw_params bcmgenet_hw_params_v2 = { - .tx_queues = 4, - .tx_bds_per_q = 32, + .tx_queues = 0, + .tx_bds_per_q = 0, .rx_queues = 0, .rx_bds_per_q = 0, .bp_in_en_shift = 16, @@ -3769,8 +3769,8 @@ static const struct bcmgenet_hw_params bcmgenet_hw_params_v2 = { }; static const struct bcmgenet_hw_params bcmgenet_hw_params_v3 = { - .tx_queues = 4, - .tx_bds_per_q = 32, + .tx_queues = 0, + .tx_bds_per_q = 0, .rx_queues = 0, .rx_bds_per_q = 0, .bp_in_en_shift = 17, @@ -3787,8 +3787,8 @@ static const struct bcmgenet_hw_params bcmgenet_hw_params_v3 = { }; static const struct bcmgenet_hw_params bcmgenet_hw_params_v4 = { - .tx_queues = 4, - .tx_bds_per_q = 32, + .tx_queues = 0, + .tx_bds_per_q = 0, .rx_queues = 0, .rx_bds_per_q = 0, .bp_in_en_shift = 17, -- 2.53.0 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH net-next 1/3] net: bcmgenet: collapse TX priority queues to a single queue 2026-06-12 20:59 ` [PATCH net-next 1/3] net: bcmgenet: collapse TX priority queues to a single queue Nicolai Buchwitz @ 2026-06-15 18:40 ` Justin Chen 0 siblings, 0 replies; 9+ messages in thread From: Justin Chen @ 2026-06-15 18:40 UTC (permalink / raw) To: Nicolai Buchwitz, Doug Berger, Florian Fainelli, bcm-kernel-feedback-list, Andrew Lunn, David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni Cc: Ovidiu Panait, netdev, linux-kernel On 6/12/26 1:59 PM, Nicolai Buchwitz wrote: > The strict-priority TX queues can starve under multi-queue load and > trip NETDEV_WATCHDOG. Justin's earlier series [1] worked around the > symptom but kept the design. > > The multi-queue design was originally used for STB use cases that are > no longer needed, as confirmed by Justin. v1 hw_params already > exercises a single-queue path. Point v2-v4 at the same configuration: > ring 0 takes the full BD pool, every per-ring loop collapses to one > iteration, and netif_set_real_num_tx_queues drops to 1 via the > existing tx_queues + 1 arithmetic. > > Tested on Raspberry Pi CM4 (BCM2711). The baseline kernel trips > NETDEV_WATCHDOG within seconds under iperf3 UDP saturation > (-u -b0 -P16 -t60). After the change the same test completes > without a watchdog, and a single-stream 60 s UDP run sustains > 956 Mbit/s with 0/4952890 datagrams lost. Single-stream TCP > throughput is unchanged at 943 Mbit/s. > > [1] https://lore.kernel.org/netdev/20260406175756.134567-1-justin.chen@broadcom.com/ > > Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de> Reviewed-by: Justin Chen <justin.chen@broadcom.com> Tested-by: Justin Chen <justin.chen@broadcom.com> ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH net-next 2/3] net: bcmgenet: remove dead priority queue plumbing 2026-06-12 20:59 [PATCH net-next 0/3] net: bcmgenet: collapse TX priority queues Nicolai Buchwitz 2026-06-12 20:59 ` [PATCH net-next 1/3] net: bcmgenet: collapse TX priority queues to a single queue Nicolai Buchwitz @ 2026-06-12 20:59 ` Nicolai Buchwitz 2026-06-15 18:40 ` Justin Chen 2026-06-12 20:59 ` [PATCH net-next 3/3] net: bcmgenet: allocate a single-queue netdev Nicolai Buchwitz 2026-06-13 21:57 ` [PATCH net-next 0/3] net: bcmgenet: collapse TX priority queues Jakub Kicinski 3 siblings, 1 reply; 9+ messages in thread From: Nicolai Buchwitz @ 2026-06-12 20:59 UTC (permalink / raw) To: Doug Berger, Florian Fainelli, bcm-kernel-feedback-list, Andrew Lunn, David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni Cc: Justin Chen, Ovidiu Panait, netdev, linux-kernel, Nicolai Buchwitz With a single TX ring there is nothing left to prioritize. Drop the unused register writes, enum entries, helper macros, and the dead "flow period for ring != 0" branch in bcmgenet_init_tx_ring(). The DMA_ARBITER_{RR,WRR,SP} and DMA_RING_BUF_PRIORITY_* HW defines are kept as register documentation. No functional change. Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de> --- .../net/ethernet/broadcom/genet/bcmgenet.c | 84 ++----------------- .../net/ethernet/broadcom/genet/bcmgenet.h | 2 - 2 files changed, 9 insertions(+), 77 deletions(-) diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c index c892734b4cd0..25f339eb304f 100644 --- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c +++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c @@ -40,10 +40,6 @@ #include "bcmgenet.h" -/* Default highest priority queue for multi queue support */ -#define GENET_Q1_PRIORITY 0 -#define GENET_Q0_PRIORITY 1 - #define GENET_Q0_RX_BD_CNT \ (TOTAL_DESC - priv->hw_params->rx_queues * priv->hw_params->rx_bds_per_q) #define GENET_Q0_TX_BD_CNT \ @@ -187,10 +183,6 @@ enum dma_reg { DMA_CTRL, DMA_STATUS, DMA_SCB_BURST_SIZE, - DMA_ARB_CTRL, - DMA_PRIORITY_0, - DMA_PRIORITY_1, - DMA_PRIORITY_2, DMA_INDEX2RING_0, DMA_INDEX2RING_1, DMA_INDEX2RING_2, @@ -223,10 +215,6 @@ static const u8 bcmgenet_dma_regs_v3plus[] = { [DMA_CTRL] = 0x04, [DMA_STATUS] = 0x08, [DMA_SCB_BURST_SIZE] = 0x0C, - [DMA_ARB_CTRL] = 0x2C, - [DMA_PRIORITY_0] = 0x30, - [DMA_PRIORITY_1] = 0x34, - [DMA_PRIORITY_2] = 0x38, [DMA_RING0_TIMEOUT] = 0x2C, [DMA_RING1_TIMEOUT] = 0x30, [DMA_RING2_TIMEOUT] = 0x34, @@ -259,10 +247,6 @@ static const u8 bcmgenet_dma_regs_v2[] = { [DMA_CTRL] = 0x04, [DMA_STATUS] = 0x08, [DMA_SCB_BURST_SIZE] = 0x0C, - [DMA_ARB_CTRL] = 0x30, - [DMA_PRIORITY_0] = 0x34, - [DMA_PRIORITY_1] = 0x38, - [DMA_PRIORITY_2] = 0x3C, [DMA_RING0_TIMEOUT] = 0x2C, [DMA_RING1_TIMEOUT] = 0x30, [DMA_RING2_TIMEOUT] = 0x34, @@ -286,10 +270,6 @@ static const u8 bcmgenet_dma_regs_v1[] = { [DMA_CTRL] = 0x00, [DMA_STATUS] = 0x04, [DMA_SCB_BURST_SIZE] = 0x0C, - [DMA_ARB_CTRL] = 0x30, - [DMA_PRIORITY_0] = 0x34, - [DMA_PRIORITY_1] = 0x38, - [DMA_PRIORITY_2] = 0x3C, [DMA_RING0_TIMEOUT] = 0x2C, [DMA_RING1_TIMEOUT] = 0x30, [DMA_RING2_TIMEOUT] = 0x34, @@ -2126,13 +2106,6 @@ static netdev_tx_t bcmgenet_xmit(struct sk_buff *skb, struct net_device *dev) int i; index = skb_get_queue_mapping(skb); - /* Mapping strategy: - * queue_mapping = 0, unclassified, packet xmited through ring 0 - * queue_mapping = 1, goes to ring 1. (highest priority queue) - * queue_mapping = 2, goes to ring 2. - * queue_mapping = 3, goes to ring 3. - * queue_mapping = 4, goes to ring 4. - */ ring = &priv->tx_rings[index]; txq = netdev_get_tx_queue(dev, index); @@ -2712,7 +2685,6 @@ static void bcmgenet_init_tx_ring(struct bcmgenet_priv *priv, { struct bcmgenet_tx_ring *ring = &priv->tx_rings[index]; u32 words_per_bd = WORDS_PER_BD(priv); - u32 flow_period_val = 0; spin_lock_init(&ring->lock); ring->priv = priv; @@ -2727,16 +2699,11 @@ static void bcmgenet_init_tx_ring(struct bcmgenet_priv *priv, ring->end_ptr = end_ptr - 1; ring->prod_index = 0; - /* Set flow period for ring != 0 */ - if (index) - flow_period_val = ENET_MAX_MTU_SIZE << 16; - bcmgenet_tdma_ring_writel(priv, index, 0, TDMA_PROD_INDEX); bcmgenet_tdma_ring_writel(priv, index, 0, TDMA_CONS_INDEX); bcmgenet_tdma_ring_writel(priv, index, 1, DMA_MBUF_DONE_THRESH); - /* Disable rate control for now */ - bcmgenet_tdma_ring_writel(priv, index, flow_period_val, - TDMA_FLOW_PERIOD); + /* Rate control disabled */ + bcmgenet_tdma_ring_writel(priv, index, 0, TDMA_FLOW_PERIOD); bcmgenet_tdma_ring_writel(priv, index, ((size << DMA_RING_SIZE_SHIFT) | RX_BUF_LENGTH), DMA_RING_BUF_SIZE); @@ -2919,52 +2886,20 @@ static int bcmgenet_rdma_disable(struct bcmgenet_priv *priv) return -ETIMEDOUT; } -/* Initialize Tx queues +/* Initialize the single Tx queue. * - * Queues 1-4 are priority-based, each one has 32 descriptors, - * with queue 1 being the highest priority queue. - * - * Queue 0 is the default Tx queue with - * GENET_Q0_TX_BD_CNT = 256 - 4 * 32 = 128 descriptors. - * - * The transmit control block pool is then partitioned as follows: - * - Tx queue 0 uses tx_cbs[0..127] - * - Tx queue 1 uses tx_cbs[128..159] - * - Tx queue 2 uses tx_cbs[160..191] - * - Tx queue 3 uses tx_cbs[192..223] - * - Tx queue 4 uses tx_cbs[224..255] + * Queue 0 owns the full TX descriptor pool (GENET_Q0_TX_BD_CNT BDs) + * and is the only ring enabled in DMA_RING_CFG / DMA_CTRL. */ static void bcmgenet_init_tx_queues(struct net_device *dev) { struct bcmgenet_priv *priv = netdev_priv(dev); - unsigned int start = 0, end = GENET_Q0_TX_BD_CNT; - u32 i, ring_mask, dma_priority[3] = {0, 0, 0}; - - /* Enable strict priority arbiter mode */ - bcmgenet_tdma_writel(priv, DMA_ARBITER_SP, DMA_ARB_CTRL); - /* Initialize Tx priority queues */ - for (i = 0; i <= priv->hw_params->tx_queues; i++) { - bcmgenet_init_tx_ring(priv, i, end - start, start, end); - start = end; - end += priv->hw_params->tx_bds_per_q; - dma_priority[DMA_PRIO_REG_INDEX(i)] |= - (i ? GENET_Q1_PRIORITY : GENET_Q0_PRIORITY) - << DMA_PRIO_REG_SHIFT(i); - } + bcmgenet_init_tx_ring(priv, 0, GENET_Q0_TX_BD_CNT, 0, + GENET_Q0_TX_BD_CNT); - /* Set Tx queue priorities */ - bcmgenet_tdma_writel(priv, dma_priority[0], DMA_PRIORITY_0); - bcmgenet_tdma_writel(priv, dma_priority[1], DMA_PRIORITY_1); - bcmgenet_tdma_writel(priv, dma_priority[2], DMA_PRIORITY_2); - - /* Configure Tx queues as descriptor rings */ - ring_mask = (1 << (priv->hw_params->tx_queues + 1)) - 1; - bcmgenet_tdma_writel(priv, ring_mask, DMA_RING_CFG); - - /* Enable Tx rings */ - ring_mask <<= DMA_RING_BUF_EN_SHIFT; - bcmgenet_tdma_writel(priv, ring_mask, DMA_CTRL); + bcmgenet_tdma_writel(priv, BIT(0), DMA_RING_CFG); + bcmgenet_tdma_writel(priv, BIT(DMA_RING_BUF_EN_SHIFT), DMA_CTRL); } static void bcmgenet_enable_rx_napi(struct bcmgenet_priv *priv) @@ -4123,7 +4058,6 @@ static int bcmgenet_probe(struct platform_device *pdev) if (err) goto err_clk_disable; - /* setup number of real queues + 1 */ netif_set_real_num_tx_queues(priv->dev, priv->hw_params->tx_queues + 1); netif_set_real_num_rx_queues(priv->dev, priv->hw_params->rx_queues + 1); diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.h b/drivers/net/ethernet/broadcom/genet/bcmgenet.h index 22a958ba9902..ce449ea0b40b 100644 --- a/drivers/net/ethernet/broadcom/genet/bcmgenet.h +++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.h @@ -431,8 +431,6 @@ struct bcmgenet_rx_stats64 { #define DMA_ARBITER_MODE_MASK 0x03 #define DMA_RING_BUF_PRIORITY_MASK 0x1F #define DMA_RING_BUF_PRIORITY_SHIFT 5 -#define DMA_PRIO_REG_INDEX(q) ((q) / 6) -#define DMA_PRIO_REG_SHIFT(q) (((q) % 6) * DMA_RING_BUF_PRIORITY_SHIFT) #define DMA_RATE_ADJ_MASK 0xFF /* Tx/Rx Dma Descriptor common bits*/ -- 2.53.0 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH net-next 2/3] net: bcmgenet: remove dead priority queue plumbing 2026-06-12 20:59 ` [PATCH net-next 2/3] net: bcmgenet: remove dead priority queue plumbing Nicolai Buchwitz @ 2026-06-15 18:40 ` Justin Chen 0 siblings, 0 replies; 9+ messages in thread From: Justin Chen @ 2026-06-15 18:40 UTC (permalink / raw) To: Nicolai Buchwitz, Doug Berger, Florian Fainelli, bcm-kernel-feedback-list, Andrew Lunn, David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni Cc: Ovidiu Panait, netdev, linux-kernel On 6/12/26 1:59 PM, Nicolai Buchwitz wrote: > With a single TX ring there is nothing left to prioritize. Drop the > unused register writes, enum entries, helper macros, and the dead > "flow period for ring != 0" branch in bcmgenet_init_tx_ring(). > > The DMA_ARBITER_{RR,WRR,SP} and DMA_RING_BUF_PRIORITY_* HW defines > are kept as register documentation. > > No functional change. > > Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de> Reviewed-by: Justin Chen <justin.chen@broadcom.com> Tested-by: Justin Chen <justin.chen@broadcom.com> ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH net-next 3/3] net: bcmgenet: allocate a single-queue netdev 2026-06-12 20:59 [PATCH net-next 0/3] net: bcmgenet: collapse TX priority queues Nicolai Buchwitz 2026-06-12 20:59 ` [PATCH net-next 1/3] net: bcmgenet: collapse TX priority queues to a single queue Nicolai Buchwitz 2026-06-12 20:59 ` [PATCH net-next 2/3] net: bcmgenet: remove dead priority queue plumbing Nicolai Buchwitz @ 2026-06-12 20:59 ` Nicolai Buchwitz 2026-06-15 18:40 ` Justin Chen 2026-06-13 21:57 ` [PATCH net-next 0/3] net: bcmgenet: collapse TX priority queues Jakub Kicinski 3 siblings, 1 reply; 9+ messages in thread From: Nicolai Buchwitz @ 2026-06-12 20:59 UTC (permalink / raw) To: Doug Berger, Florian Fainelli, bcm-kernel-feedback-list, Andrew Lunn, David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni Cc: Justin Chen, Ovidiu Panait, netdev, linux-kernel, Nicolai Buchwitz The driver only uses TX ring 0 and RX ring 0, so allocating a netdev with GENET_MAX_MQ_CNT + 1 = 5 TX and 5 RX slots leaves four of each unused. Switch to alloc_etherdev() which allocates exactly one queue of each kind. No functional change: netif_set_real_num_{tx,rx}_queues() already clamps the visible queue count to 1. Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de> --- drivers/net/ethernet/broadcom/genet/bcmgenet.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c index 25f339eb304f..001bd445b110 100644 --- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c +++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c @@ -3915,9 +3915,7 @@ static int bcmgenet_probe(struct platform_device *pdev) unsigned int i; int err = -EIO; - /* Up to GENET_MAX_MQ_CNT + 1 TX queues and RX queues */ - dev = alloc_etherdev_mqs(sizeof(*priv), GENET_MAX_MQ_CNT + 1, - GENET_MAX_MQ_CNT + 1); + dev = alloc_etherdev(sizeof(*priv)); if (!dev) { dev_err(&pdev->dev, "can't allocate net device\n"); return -ENOMEM; -- 2.53.0 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH net-next 3/3] net: bcmgenet: allocate a single-queue netdev 2026-06-12 20:59 ` [PATCH net-next 3/3] net: bcmgenet: allocate a single-queue netdev Nicolai Buchwitz @ 2026-06-15 18:40 ` Justin Chen 0 siblings, 0 replies; 9+ messages in thread From: Justin Chen @ 2026-06-15 18:40 UTC (permalink / raw) To: Nicolai Buchwitz, Doug Berger, Florian Fainelli, bcm-kernel-feedback-list, Andrew Lunn, David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni Cc: Ovidiu Panait, netdev, linux-kernel On 6/12/26 1:59 PM, Nicolai Buchwitz wrote: > The driver only uses TX ring 0 and RX ring 0, so allocating a netdev > with GENET_MAX_MQ_CNT + 1 = 5 TX and 5 RX slots leaves four of each > unused. Switch to alloc_etherdev() which allocates exactly one queue > of each kind. > > No functional change: netif_set_real_num_{tx,rx}_queues() already > clamps the visible queue count to 1. > > Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de> Reviewed-by: Justin Chen <justin.chen@broadcom.com> Tested-by: Justin Chen <justin.chen@broadcom.com> ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net-next 0/3] net: bcmgenet: collapse TX priority queues 2026-06-12 20:59 [PATCH net-next 0/3] net: bcmgenet: collapse TX priority queues Nicolai Buchwitz ` (2 preceding siblings ...) 2026-06-12 20:59 ` [PATCH net-next 3/3] net: bcmgenet: allocate a single-queue netdev Nicolai Buchwitz @ 2026-06-13 21:57 ` Jakub Kicinski 2026-06-15 11:35 ` Nicolai Buchwitz 3 siblings, 1 reply; 9+ messages in thread From: Jakub Kicinski @ 2026-06-13 21:57 UTC (permalink / raw) To: Nicolai Buchwitz Cc: Doug Berger, Florian Fainelli, bcm-kernel-feedback-list, Andrew Lunn, David S . Miller, Eric Dumazet, Paolo Abeni, Justin Chen, Ovidiu Panait, netdev, linux-kernel On Fri, 12 Jun 2026 22:59:12 +0200 Nicolai Buchwitz wrote: > Tested on Raspberry Pi CM4 (BCM2711): > - Ovidiu's reproducer (iperf3 -u -b0 -P16 -t60) no longer trips > NETDEV_WATCHDOG. > - UDP sustains 956 Mbit/s line rate over 60 s with 0 datagrams > lost (0/4952890). > - Single-stream TCP throughput unchanged at 943 Mbit/s. Of course it has no impact on a single TCP stream test, since TCP stream can only use one queue. If anything it should help. The testing here is not very convincing. At least install a realistic qdisc (fq/fq_codel/cake) and run multi-stream test with multiple cores? What's the CPU idle delta in such a test? The reason for this change is not coming thru from the submission. Ovidiu's patch makes much more intuitive sense. I'll apply that, please rebase. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net-next 0/3] net: bcmgenet: collapse TX priority queues 2026-06-13 21:57 ` [PATCH net-next 0/3] net: bcmgenet: collapse TX priority queues Jakub Kicinski @ 2026-06-15 11:35 ` Nicolai Buchwitz 0 siblings, 0 replies; 9+ messages in thread From: Nicolai Buchwitz @ 2026-06-15 11:35 UTC (permalink / raw) To: Jakub Kicinski Cc: Doug Berger, Florian Fainelli, bcm-kernel-feedback-list, Andrew Lunn, David S . Miller, Eric Dumazet, Paolo Abeni, Justin Chen, Ovidiu Panait, netdev, linux-kernel Hi Jakub On 13.6.2026 23:57, Jakub Kicinski wrote: > On Fri, 12 Jun 2026 22:59:12 +0200 Nicolai Buchwitz wrote: >> Tested on Raspberry Pi CM4 (BCM2711): >> - Ovidiu's reproducer (iperf3 -u -b0 -P16 -t60) no longer trips >> NETDEV_WATCHDOG. >> - UDP sustains 956 Mbit/s line rate over 60 s with 0 datagrams >> lost (0/4952890). >> - Single-stream TCP throughput unchanged at 943 Mbit/s. > > Of course it has no impact on a single TCP stream test, since TCP > stream can only use one queue. If anything it should help. > The testing here is not very convincing. At least install a realistic > qdisc (fq/fq_codel/cake) and run multi-stream test with multiple cores? > What's the CPU idle delta in such a test? Fair. Tests I ran with fq_codel: # TCP iperf3 -c <remote> -P 16 # UDP iperf3 -c <remote> -u -b 1000M -P 16 # RR iperf3 -c <remote> -t 60 & netperf -H <remote> -t TCP_RR -l 30 -- -r 1,1 With the following results (all based on net-next with PP): PP PP+WRR PP+series TCP -P16 Mbit/s 938 941 941 TCP retransmits 56228 57679 48606 UDP -b1000M -P16 Mbit/s 956 956 956 TCP_RR under TCP load 451.7 453.8 596.8 CPU idle 91.55% 90.53% 90.97% CPU0 softirq 32.6% 33.4% 33.5% CPU1-3 idle (avg) 97.6% 97.7% 97.7% So WRR fixes the watchdog issue, otherwise is within noise. This series adds ~31% TCP_RR under load and reduces retransmits by ~14%. Note, that bcmgenet on BCM2711 has only two IRQs and no per-queue affinity, so all HW interrupts are handled by one core regardless of queue count. > > The reason for this change is not coming thru from the submission. > Ovidiu's patch makes much more intuitive sense. I'll apply that, > please rebase. Will rebase and resend with a hopefully better reasoning :) Thanks Nicolai ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2026-06-15 18:40 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-06-12 20:59 [PATCH net-next 0/3] net: bcmgenet: collapse TX priority queues Nicolai Buchwitz 2026-06-12 20:59 ` [PATCH net-next 1/3] net: bcmgenet: collapse TX priority queues to a single queue Nicolai Buchwitz 2026-06-15 18:40 ` Justin Chen 2026-06-12 20:59 ` [PATCH net-next 2/3] net: bcmgenet: remove dead priority queue plumbing Nicolai Buchwitz 2026-06-15 18:40 ` Justin Chen 2026-06-12 20:59 ` [PATCH net-next 3/3] net: bcmgenet: allocate a single-queue netdev Nicolai Buchwitz 2026-06-15 18:40 ` Justin Chen 2026-06-13 21:57 ` [PATCH net-next 0/3] net: bcmgenet: collapse TX priority queues Jakub Kicinski 2026-06-15 11:35 ` Nicolai Buchwitz
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox