public inbox for netdev@vger.kernel.org
From: Sean Anderson <sean.anderson@linux.dev>
To: "Gupta, Suraj" <Suraj.Gupta2@amd.com>,
	"andrew+netdev@lunn.ch" <andrew+netdev@lunn.ch>,
	"davem@davemloft.net" <davem@davemloft.net>,
	"edumazet@google.com" <edumazet@google.com>,
	"kuba@kernel.org" <kuba@kernel.org>,
	"pabeni@redhat.com" <pabeni@redhat.com>,
	"Simek, Michal" <michal.simek@amd.com>,
	"Pandey, Radhey Shyam" <radhey.shyam.pandey@amd.com>,
	"horms@kernel.org" <horms@kernel.org>
Cc: "netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Katakam, Harini" <harini.katakam@amd.com>
Subject: Re: [PATCH net 2/2] net: xilinx: axienet: Fix BQL accounting for multi-BD TX packets
Date: Thu, 26 Mar 2026 15:56:07 -0400	[thread overview]
Message-ID: <c7f22eac-3098-4833-b247-3a615f49a5ea@linux.dev> (raw)
In-Reply-To: <SA1PR12MB679810BDE0FD91B702A869C2C956A@SA1PR12MB6798.namprd12.prod.outlook.com>

On 3/26/26 15:51, Gupta, Suraj wrote:
>> On 3/25/26 01:30, Gupta, Suraj wrote:
>> >> On 3/24/26 10:53, Suraj Gupta wrote:
>> >> > When a TX packet spans multiple buffer descriptors
>> >> > (scatter-gather), the per-BD byte count is accumulated into a local
>> >> > variable that resets on each NAPI poll. If the BDs for a single
>> >> > packet complete across different polls, the earlier bytes are lost and never
>> credited to BQL.
>> >> > This causes BQL to think bytes are permanently in-flight,
>> >> > eventually stalling the TX queue.
>> >> >
>> >> > Fix this by replacing the local accumulator with a persistent
>> >> > counter
>> >> > (tx_compl_bytes) that survives across polls and is reset only after
>> >> > updating BQL and stats.
>> >>
>> >> Do we need this? Can't we just do something like
>> >>
>> >
>> > Nope, the 'size' variable passed to axienet_free_tx_chain() is local
>> > to axienet_tx_poll() and goes out of scope between different polls.
>> > This means it can't track completion bytes across multiple NAPI polls.
>>
>> Yes, but that's fine since we only update completed bytes when we retire at
>> least one packet:
>>
>>         packets = axienet_free_tx_chain(lp, &size, budget);
>>         if (packets) {
>>                 netdev_completed_queue(ndev, packets, size);
>>                 u64_stats_update_begin(&lp->tx_stat_sync);
>>                 u64_stats_add(&lp->tx_packets, packets);
>>                 u64_stats_add(&lp->tx_bytes, size);
>>                 u64_stats_update_end(&lp->tx_stat_sync);
>>
>>                 /* Matches barrier in axienet_start_xmit */
>>                 smp_mb();
>>                 if (((tail - ci) & (lp->tx_bd_num - 1)) >= MAX_SKB_FRAGS + 1)
>>                         netif_wake_queue(ndev);
>>         }
>>
>> and this matches the value we use when enqueuing a packet
>>
>>         tail_p = lp->tx_bd_p + sizeof(*lp->tx_bd_v) * (last_p - lp->tx_bd_v);
>>         smp_store_release(&lp->tx_bd_tail, new_tail_ptr);
>>         netdev_sent_queue(ndev, skb->len);
>>
> 
> Let's say we have a packet with 4 fragments. When xmit() is called, only the last buffer descriptor (the 4th one) is marked with the completion flag[1]. Now suppose the first 3 buffer descriptors are processed in one NAPI poll and the 4th descriptor in a subsequent poll. In that case 'size' never gets updated for the first 3 BDs, because no packet retires in that poll; the update in the second poll then reflects only the completion length of the 4th BD.

That's why I suggested using skb->len and not the size from the descriptor.

--Sean

> This means that the bytes corresponding to the first 3 fragments are never credited to BQL, which leads to incorrect accounting and, eventually, to the TX queue stalling as described.
> The proposed change, which accumulates the completion length across polls until the entire packet is processed, directly addresses this gap.
> 
> [1]: https://elixir.bootlin.com/linux/v7.0-rc5/source/drivers/net/ethernet/xilinx/xilinx_axienet_main.c#L1127
> 
> 
> Regards,
> Suraj
> 
>> > Regards,
>> > Suraj
>> >
>> >> diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
>> >> b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
>> >> index 415e9bc252527..1ea8a6592bce1 100644
>> >> --- a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
>> >> +++ b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
>> >> @@ -768,6 +768,7 @@ static int axienet_free_tx_chain(struct
>> >> axienet_local *lp, u32 *sizep, int budge
>> >>                 if (cur_p->skb) {
>> >>                         struct axienet_cb *cb = (void
>> >> *)cur_p->skb->cb;
>> >>
>> >> +                       *sizep += cur_p->skb->len;
>> >>                         dma_unmap_sgtable(lp->dev, &cb->sgt, DMA_TO_DEVICE, 0);
>> >>                         sg_free_table_chained(&cb->sgt, XAE_INLINE_SG_CNT);
>> >>                         napi_consume_skb(cur_p->skb, budget); @@
>> >> -783,8 +784,6 @@ static int axienet_free_tx_chain(struct axienet_local *lp,
>> u32 *sizep, int budge
>> >>                 wmb();
>> >>                 cur_p->cntrl = 0;
>> >>                 cur_p->status = 0;
>> >> -
>> >> -               *sizep += status & XAXIDMA_BD_STS_ACTUAL_LEN_MASK;
>> >>         }
>> >>
>> >>         smp_store_release(&lp->tx_bd_ci, (ci + i) & (lp->tx_bd_num -
>> >> 1));
>> >>
>> >> > Fixes: c900e49d58eb ("net: xilinx: axienet: Implement BQL")
>> >> > Signed-off-by: Suraj Gupta <suraj.gupta2@amd.com>
>> >> > ---
>> >> >  drivers/net/ethernet/xilinx/xilinx_axienet.h  |  3 +++
>> >> > .../net/ethernet/xilinx/xilinx_axienet_main.c | 20
>> >> > +++++++++----------
>> >> >  2 files changed, 13 insertions(+), 10 deletions(-)
>> >> >
>> >> > diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet.h
>> >> > b/drivers/net/ethernet/xilinx/xilinx_axienet.h
>> >> > index 602389843342..a4444c939451 100644
>> >> > --- a/drivers/net/ethernet/xilinx/xilinx_axienet.h
>> >> > +++ b/drivers/net/ethernet/xilinx/xilinx_axienet.h
>> >> > @@ -509,6 +509,8 @@ struct skbuf_dma_descriptor {
>> >> >   *           complete. Only updated at runtime by TX NAPI poll.
>> >> >   * @tx_bd_tail:      Stores the index of the next Tx buffer descriptor in the
>> ring
>> >> >   *              to be populated.
>> >> > + * @tx_compl_bytes: Accumulates TX completion length until a full
>> packet is
>> >> > + *              reported to the stack.
>> >> >   * @tx_packets: TX packet count for statistics
>> >> >   * @tx_bytes:        TX byte count for statistics
>> >> >   * @tx_stat_sync: Synchronization object for TX stats @@ -592,6
>> >> > +594,7 @@ struct axienet_local {
>> >> >       u32 tx_bd_num;
>> >> >       u32 tx_bd_ci;
>> >> >       u32 tx_bd_tail;
>> >> > +     u32 tx_compl_bytes;
>> >> >       u64_stats_t tx_packets;
>> >> >       u64_stats_t tx_bytes;
>> >> >       struct u64_stats_sync tx_stat_sync; diff --git
>> >> > a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
>> >> > b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
>> >> > index b06e4c37ff61..95bf61986cb7 100644
>> >> > --- a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
>> >> > +++ b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
>> >> > @@ -692,6 +692,8 @@ static void axienet_dma_stop(struct
>> >> > axienet_local
>> >> *lp)
>> >> >       axienet_lock_mii(lp);
>> >> >       __axienet_device_reset(lp);
>> >> >       axienet_unlock_mii(lp);
>> >> > +
>> >> > +     lp->tx_compl_bytes = 0;
>> >> >  }
>> >> >
>> >> >  /**
>> >> > @@ -770,8 +772,6 @@ static int axienet_device_reset(struct
>> >> > net_device
>> >> *ndev)
>> >> >   * @first_bd:        Index of first descriptor to clean up
>> >> >   * @nr_bds:  Max number of descriptors to clean up
>> >> >   * @force:   Whether to clean descriptors even if not complete
>> >> > - * @sizep:   Pointer to a u32 filled with the total sum of all bytes
>> >> > - *           in all cleaned-up descriptors. Ignored if NULL.
>> >> >   * @budget:  NAPI budget (use 0 when not called from NAPI poll)
>> >> >   *
>> >> >   * Would either be called after a successful transmit operation,
>> >> > or after @@ -780,7 +780,7 @@ static int axienet_device_reset(struct
>> >> > net_device
>> >> *ndev)
>> >> >   * Return: The number of packets handled.
>> >> >   */
>> >> >  static int axienet_free_tx_chain(struct axienet_local *lp, u32 first_bd,
>> >> > -                              int nr_bds, bool force, u32 *sizep, int budget)
>> >> > +                              int nr_bds, bool force, int budget)
>> >> >  {
>> >> >       struct axidma_bd *cur_p;
>> >> >       unsigned int status;
>> >> > @@ -819,8 +819,8 @@ static int axienet_free_tx_chain(struct
>> >> > axienet_local
>> >> *lp, u32 first_bd,
>> >> >               cur_p->cntrl = 0;
>> >> >               cur_p->status = 0;
>> >> >
>> >> > -             if (sizep)
>> >> > -                     *sizep += status & XAXIDMA_BD_STS_ACTUAL_LEN_MASK;
>> >> > +             if (!force)
>> >> > +                     lp->tx_compl_bytes += status &
>> >> > + XAXIDMA_BD_STS_ACTUAL_LEN_MASK;
>> >> >       }
>> >> >
>> >> >       if (!force) {
>> >> > @@ -999,18 +999,18 @@ static int axienet_tx_poll(struct napi_struct
>> >> > *napi, int budget)  {
>> >> >       struct axienet_local *lp = container_of(napi, struct axienet_local,
>> napi_tx);
>> >> >       struct net_device *ndev = lp->ndev;
>> >> > -     u32 size = 0;
>> >> >       int packets;
>> >> >
>> >> >       packets = axienet_free_tx_chain(lp, lp->tx_bd_ci, lp->tx_bd_num, false,
>> >> > -                                     &size, budget);
>> >> > +                                     budget);
>> >> >
>> >> >       if (packets) {
>> >> > -             netdev_completed_queue(ndev, packets, size);
>> >> > +             netdev_completed_queue(ndev, packets,
>> >> > + lp->tx_compl_bytes);
>> >> >               u64_stats_update_begin(&lp->tx_stat_sync);
>> >> >               u64_stats_add(&lp->tx_packets, packets);
>> >> > -             u64_stats_add(&lp->tx_bytes, size);
>> >> > +             u64_stats_add(&lp->tx_bytes, lp->tx_compl_bytes);
>> >> >               u64_stats_update_end(&lp->tx_stat_sync);
>> >> > +             lp->tx_compl_bytes = 0;
>> >> >
>> >> >               /* Matches barrier in axienet_start_xmit */
>> >> >               smp_mb();
>> >> > @@ -1115,7 +1115,7 @@ axienet_start_xmit(struct sk_buff *skb,
>> >> > struct
>> >> net_device *ndev)
>> >> >                               netdev_err(ndev, "TX DMA mapping error\n");
>> >> >                       ndev->stats.tx_dropped++;
>> >> >                       axienet_free_tx_chain(lp, orig_tail_ptr, ii + 1,
>> >> > -                                           true, NULL, 0);
>> >> > +                                           true, 0);
>> >> >                       dev_kfree_skb_any(skb);
>> >> >                       return NETDEV_TX_OK;
>> >> >               }
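[Editorial note: the accounting gap debated in this thread can be sketched as a small, self-contained simulation. This is illustrative Python, not driver code; the function names, the per-poll BD lists, and the 100-byte fragment sizes are invented for the example.]

```python
def complete_polls_per_bd_size(polls):
    """Old scheme: credit per-BD byte counts from a per-poll local.

    'size' resets on every poll and is only reported to BQL when at
    least one packet retires in that poll, so bytes on BDs completed
    in a poll that retires zero packets are lost.
    """
    credited = 0
    for bds in polls:                  # each poll sees a list of (bytes, is_last_bd)
        size = 0                       # local accumulator, reset each poll
        packets = 0
        for nbytes, is_last in bds:
            size += nbytes
            if is_last:                # only the packet's last BD retires it
                packets += 1
        if packets:                    # report only when a packet retired
            credited += size
    return credited


def complete_polls_persistent(polls):
    """Fixed scheme: accumulate into a counter that survives polls
    (the role tx_compl_bytes plays in the patch)."""
    credited = 0
    pending = 0                        # persists across polls
    for bds in polls:
        packets = 0
        for nbytes, is_last in bds:
            pending += nbytes
            if is_last:
                packets += 1
        if packets:
            credited += pending
            pending = 0
    return credited


# One 4-fragment packet: 3 BDs complete in poll 1, the last BD in poll 2.
polls = [[(100, False), (100, False), (100, False)], [(100, True)]]
print(complete_polls_per_bd_size(polls))   # 100: first 300 bytes never credited
print(complete_polls_persistent(polls))    # 400: all bytes credited
```

Crediting skb->len on the retiring descriptor, as suggested above, reaches the same total without a persistent counter, since the full packet length is known once its last BD completes.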


      reply	other threads:[~2026-03-26 19:56 UTC|newest]

Thread overview: 9+ messages
2026-03-24 14:53 [PATCH net 0/2] Correct BD length masks and BQL accounting for multi-BD TX packets Suraj Gupta
2026-03-24 14:53 ` [PATCH net 1/2] net: xilinx: axienet: Correct BD length masks to match AXIDMA IP spec Suraj Gupta
2026-03-24 16:09   ` Sean Anderson
2026-03-24 14:53 ` [PATCH net 2/2] net: xilinx: axienet: Fix BQL accounting for multi-BD TX packets Suraj Gupta
2026-03-24 16:09   ` Sean Anderson
2026-03-25  5:30     ` Gupta, Suraj
2026-03-26 15:38       ` Sean Anderson
2026-03-26 19:51         ` Gupta, Suraj
2026-03-26 19:56           ` Sean Anderson [this message]
