From: Abdulrahman Alshawi <ashawi@wirefilter.com>
To: dev <dev@dpdk.org>
Cc: bharat <bharat@chelsio.com>, stable <stable@dpdk.org>, vipinpv85
Date: Tue, 28 Apr 2026 10:05:00 +0300
Message-ID: <430997417.15909938.1777359900023.JavaMail.zimbra@wirefilter.com>
In-Reply-To: <1991720957.15868387.1777308056746.JavaMail.zimbra@wirefilter.com>
Subject: Re: [PATCH 1/2] net/cxgbe: fix Rx handling for packed responses

Adding Vipin Varghese

-----Original Message-----
From: Abdulrahman <ashawi@wirefilter.com>
To: dev <dev@dpdk.org>
Cc: bharat <bharat@chelsio.com>; stable <stable@dpdk.org>
Date: Monday, 27 April 2026 7:40 PM +03
Subject: [PATCH 1/2] net/cxgbe: fix Rx handling for packed responses

The Rx path assumes every SGE response starts a new Free List buffer:

    BUG_ON(!(len & F_RSPD_NEWBUF));

That is not always true. On T5/T6, small packets can be packed into the
same FL buffer. Only the first response for that buffer has NEWBUF set;
later responses can refer to the same buffer with a different payload
offset.

The current PMD consumes one FL buffer per response. When packed
responses are delivered, the FL consumer state goes out of sync with the
hardware and the affected ingress queue stops making progress. From user
space this shows up as:

  - rx_bgN_dropped_packets increasing at line rate
  - q_ipackets for the affected queue staying at 0
  - imissed increasing
  - no recovery until the port is restarted

Fix this by tracking packed-buffer alignment and copying packet payload
from the current FL buffer at q->offset, only freeing the FL buffer once
it has actually been consumed.

This patch:
  - adds sge::fl_align from the ingress pad/pack settings
  - introduces cxgbe_copy_rx_pkt() to copy from packed FL buffers
  - advances q->offset by len rounded up to fl_align
  - frees an FL buffer only after full consumption
  - drains the previous buffer when a later response arrives with NEWBUF

Reproduce (T62100-LP-CR, FW 2.1.19.0 / TP 0.1.23.2):

  dpdk-testpmd -l 1-9 -a 0000:18:00.4 -- \
             --rxq=32 --txq=32 --nb-cores=8 --forward-mode=rxonly -i

  testpmd> port stop all
  testpmd> flow flush 0
  testpmd> flow create 0 ingress pattern eth \
           / vlan tci spec 0 tci mask 0x0007 \
           / end actions queue index 5 / end
  testpmd> port start all
  testpmd> start

Without this patch, rx_qN_packets stays at 0 while rx_bgN_dropped_packets
rises at line rate. With the patch, rx_qN_packets tracks received traffic
and rx_bgN_dropped_packets stays at 0.

Signed-off-by: Abdulrahman Alshawi <ashawi@wirefilter.com>
---
 drivers/net/cxgbe/base/adapter.h |   1 +
 drivers/net/cxgbe/sge.c          | 122 ++++++++++++++++++++++++-------
 2 files changed, 98 insertions(+), 25 deletions(-)

diff --git a/drivers/net/cxgbe/base/adapter.h b/drivers/net/cxgbe/base/adapter.h
index 207f3ecb88..e67cf22950 100644
--- a/drivers/net/cxgbe/base/adapter.h
+++ b/drivers/net/cxgbe/base/adapter.h
@@ -280,6 +280,7 @@ struct sge {
    u16 max_ethqsets;           /* # of available Ethernet queue sets */
    u32 stat_len;               /* length of status page at ring end */
    u32 pktshift;               /* padding between CPL & packet data */
+   u32 fl_align;               /* packed Rx packet alignment */
 
    /* response queue interrupt parameters */
    u16 timer_val[SGE_NTIMERS];

diff --git a/drivers/net/cxgbe/sge.c b/drivers/net/cxgbe/sge.c
index e9d45f24c4..7cf5c70775 100644
--- a/drivers/net/cxgbe/sge.c
+++ b/drivers/net/cxgbe/sge.c
@@ -1492,6 +1492,90 @@ static inline void cxgbe_fill_mbuf_info(struct adapter *adap,
                   RTE_MBUF_F_RX_L4_CKSUM_BAD);
 }
 

+static int cxgbe_copy_rx_pkt(struct sge_rspq *q, struct sge_eth_rxq *rxq,
+               u32 len, struct rte_mbuf **out)
+{
+   struct sge *s = &q->adapter->sge;
+   struct rte_mbuf *pkt;
+   char *dst;
+   u32 copied = 0;
+   u32 remaining = len;
+
+   pkt = rte_pktmbuf_alloc(q->mb_pool);
+   if (unlikely(!pkt)) {
+       rxq->rspq.eth_dev->data->rx_mbuf_alloc_failed++;
+       rxq->stats.rx_drops++;
+       return -ENOMEM;
+   }
+
+   dst = rte_pktmbuf_append(pkt, len);
+   if (unlikely(!dst)) {
+       rte_pktmbuf_free(pkt);
+       rxq->stats.rx_drops++;
+       return -ENOMEM;
+   }
+
+   while (remaining) {
+       const struct rx_sw_desc *rsd = &rxq->fl.sdesc[rxq->fl.cidx];
+       struct rte_mbuf *src = rsd->buf;
+       u32 bufsz = get_buf_size(q->adapter, rsd);
+       u32 copy_len;
+
+       if (unlikely(!src || q->offset < 0 ||
+               (u32)q->offset >= bufsz)) {
+           rte_pktmbuf_free(pkt);
+           rxq->stats.rx_drops++;
+           return -EINVAL;
+       }
+
+       copy_len = RTE_MIN(bufsz - (u32)q->offset, remaining);
+       rte_memcpy(dst + copied,
+             rte_pktmbuf_mtod_offset(src, const void *, q->offset),
+             copy_len);
+
+       copied += copy_len;
+       remaining -= copy_len;
+       q->offset += copy_len;
+
+       if (remaining) {
+           free_rx_bufs(&rxq->fl, 1);
+           q->offset = 0;
+           continue;
+       }
+
+       q->offset = RTE_ALIGN_CEIL(q->offset, s->fl_align);
+       if ((u32)q->offset >= bufsz) {
+           free_rx_bufs(&rxq->fl, 1);
+           q->offset = 0;
+       }
+   }
+
+   *out = pkt;
+   return 0;
+}
+

+static unsigned int cxgbe_fl_pkt_align(struct adapter *adap)
+{
+   u32 sge_control = t4_read_reg(adap, A_SGE_CONTROL);
+   unsigned int ingpad_shift, ingpad, fl_align;
+
+   ingpad_shift = CHELSIO_CHIP_VERSION(adap->params.chip) <= CHELSIO_T5 ?
+             X_INGPADBOUNDARY_SHIFT : X_T6_INGPADBOUNDARY_SHIFT;
+   ingpad = 1U << (G_INGPADBOUNDARY(sge_control) + ingpad_shift);
+   fl_align = ingpad;
+
+   if (!is_t4(adap->params.chip)) {
+       u32 sge_control2 = t4_read_reg(adap, A_SGE_CONTROL2);
+       unsigned int ingpack = G_INGPACKBOUNDARY(sge_control2);
+
+       ingpack = ingpack == X_INGPACKBOUNDARY_16B ?
+            16 : 1U << (ingpack + X_INGPACKBOUNDARY_SHIFT);
+       fl_align = RTE_MAX(ingpad, ingpack);
+   }
+
+   return fl_align ? fl_align : RTE_CACHE_LINE_SIZE;
+}
+

 /**
  * process_responses - process responses from an SGE response queue
  * @q: the ingress queue to process
@@ -1535,14 +1619,12 @@ static int process_responses(struct sge_rspq *q, int budget,
            stat_pidx = ntohs(q->stat->pidx);
            stat_pidx_diff = P_IDXDIFF(q, stat_pidx);
            while (stat_pidx_diff && budget_left) {
-               const struct rx_sw_desc *rsd =
-                   &rxq->fl.sdesc[rxq->fl.cidx];
                const struct rss_header *rss_hdr =
                    (const void *)q->cur_desc;
                const struct cpl_rx_pkt *cpl =
                    (const void *)&q->cur_desc[1];
-               struct rte_mbuf *pkt, *npkt;
-               u32 len, bufsz;
+               struct rte_mbuf *pkt;
+               u32 len;
 

                rc = (const struct rsp_ctrl *)
                    ((const char *)q->cur_desc +
@@ -1553,28 +1635,16 @@ static int process_responses(struct sge_rspq *q, int budget,
                    break;
 
                len = ntohl(rc->pldbuflen_qid);
-               BUG_ON(!(len & F_RSPD_NEWBUF));
-               pkt = rsd->buf;
-               npkt = pkt;
-               len = G_RSPD_LEN(len);
-               pkt->pkt_len = len;
-
-               /* Chain mbufs into len if necessary */
-               while (len) {
-                   struct rte_mbuf *new_pkt = rsd->buf;
-
-                   bufsz = min(get_buf_size(q->adapter,
-                               rsd), len);
-                   new_pkt->data_len = bufsz;
-                   unmap_rx_buf(&rxq->fl);
-                   len -= bufsz;
-                   npkt->next = new_pkt;
-                   npkt = new_pkt;
-                   pkt->nb_segs++;
-                   rsd = &rxq->fl.sdesc[rxq->fl.cidx];
+               if (len & F_RSPD_NEWBUF) {
+                   if (q->offset > 0) {
+                       free_rx_bufs(&rxq->fl, 1);
+                       q->offset = 0;
+                   }
                }
-               npkt->next = NULL;
-               pkt->nb_segs--;
+               len = G_RSPD_LEN(len);
+               ret = cxgbe_copy_rx_pkt(q, rxq, len, &pkt);
+               if (unlikely(ret))
+                   break;
 
                cxgbe_fill_mbuf_info(q->adapter, cpl, pkt);
 

@@ -2379,6 +2449,7 @@ int t4_sge_init(struct adapter *adap)
    sge_control = t4_read_reg(adap, A_SGE_CONTROL);
    s->pktshift = G_PKTSHIFT(sge_control);
    s->stat_len = (sge_control & F_EGRSTATUSPAGESIZE) ? 128 : 64;
+   s->fl_align = cxgbe_fl_pkt_align(adap);
    ret = t4_sge_init_soft(adap);
    if (ret < 0) {
        dev_err(adap, "%s: t4_sge_init_soft failed, error %d\n",
@@ -2516,6 +2587,7 @@ int t4vf_sge_init(struct adapter *adap)
    s->stat_len = ((sge_control & F_EGRSTATUSPAGESIZE)
            ? 128 : 64);
    s->pktshift = G_PKTSHIFT(sge_control);
+   s->fl_align = RTE_CACHE_LINE_SIZE;
 
    /*
    * A FL with <= fl_starve_thres buffers is starving and a periodic
-- 
2.39.5
