From mboxrd@z Thu Jan 1 00:00:00 1970 From: Olivier MATZ Subject: Re: Issue with non-scattered rx in ixgbe and i40e when mbuf private area size is odd Date: Thu, 30 Jul 2015 10:12:25 +0200 Message-ID: <55B9DC69.3080508@6wind.com> References: <55B8EC16.60404@allegro-packets.com> <2601191342CEEE43887BDE71AB97725836A69E3D@irsmsx105.ger.corp.intel.com> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: quoted-printable Cc: "dev@dpdk.org" To: "Zhang, Helin" , "Ananyev, Konstantin" , Martin Weiser Return-path: Received: from mail-wi0-f169.google.com (mail-wi0-f169.google.com [209.85.212.169]) by dpdk.org (Postfix) with ESMTP id 00472C47A for ; Thu, 30 Jul 2015 10:12:35 +0200 (CEST) Received: by wibud3 with SMTP id ud3so10700872wib.1 for ; Thu, 30 Jul 2015 01:12:35 -0700 (PDT) In-Reply-To: List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Hi, On 07/29/2015 10:24 PM, Zhang, Helin wrote: > Hi Martin > > Thank you very much for the good catch! > > The similar situation in i40e, as explained by Konstantin. > As header split hasn't been supported by DPDK till now. It would be bet= ter to put the header address in RX descriptor to 0. > But in the future, during header split enabling. We may need to pay ext= ra attention to that. As at least x710 datasheet said specifically as bel= ow. > "The header address should be set by the software to an even number (wo= rd aligned address)". We may need to find a way to ensure that during mem= pool/mbuf allocation. Indeed it would be good to force the priv_size to be aligned. The priv_size could be aligned automatically in rte_pktmbuf_pool_create(). The only possible problem I could see is that it would break applications that access to the data buffer by doing (sizeof(mbuf) + sizeof(priv)), which is probably not the best thing to do (I didn't find any applications like this in dpdk). For applications that directly use rte_mempool_create() instead of rte_pktmbuf_pool_create(), we could add a check using an assert in=20 rte_pktmbuf_init() and some words in the documentation. The question is: what should be the proper alignment? I would say at least 8 bytes, but maybe cache_aligned is an option too. Regards, Olivier > > Regards, > Helin > >> -----Original Message----- >> From: Ananyev, Konstantin >> Sent: Wednesday, July 29, 2015 11:12 AM >> To: Martin Weiser; Zhang, Helin; olivier.matz@6wind.com >> Cc: dev@dpdk.org >> Subject: RE: [dpdk-dev] Issue with non-scattered rx in ixgbe and i40e = when mbuf >> private area size is odd >> >> Hi Martin, >> >>> -----Original Message----- >>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Martin Weiser >>> Sent: Wednesday, July 29, 2015 4:07 PM >>> To: Zhang, Helin; olivier.matz@6wind.com >>> Cc: dev@dpdk.org >>> Subject: [dpdk-dev] Issue with non-scattered rx in ixgbe and i40e whe= n >>> mbuf private area size is odd >>> >>> Hi Helin, Hi Olivier, >>> >>> we are seeing an issue with the ixgbe and i40e drivers which we could >>> track down to our setting of the private area size of the mbufs. >>> The issue can be easily reproduced with the l2fwd example application >>> when a small modification is done: just set the priv_size parameter i= n >>> the call to the rte_pktmbuf_pool_create function to an odd number lik= e >>> 1. In our tests this causes every call to rte_eth_rx_burst to return >>> 32 (which is the setting of nb_pkts) nonsense mbufs although no >>> packets are received on the interface and the hardware counters do no= t >>> report any received packets. >> >> From Niantic datasheet: >> >> "7.1.6.1 Advanced Receive Descriptors =E2=80=94 Read Format Table 7-15= lists the >> advanced receive descriptor programming by the software. The ... >> Packet Buffer Address (64) >> This is the physical address of the packet buffer. The lowest bit is A= 0 (LSB of the >> address). >> Header Buffer Address (64) >> The physical address of the header buffer with the lowest bit being De= scriptor >> Done (DD). >> When a packet spans in multiple descriptors, the header buffer address= is used >> only on the first descriptor. During the programming phase, software m= ust set >> the DD bit to zero (see the description of the DD bit in this section)= . This means >> that header buffer addresses are always word aligned." >> >> Right now, in ixgbe PMD we always setup Packet Buffer Address (PBA)an= d >> Header Buffer Address (HBA) to the same value: >> buf_physaddr + RTE_PKTMBUF_HEADROOM >> So when pirv_size=3D=3D1, DD bit in RXD is always set to one by SW its= elf, and then >> SW considers that HW already done with it. >> In other words, right now for ixgbe you can't use RX buffer that is no= t aligned on >> word boundary. >> >> So the advice would be, right now - don't set priv_size to the odd val= ue. >> As we don't support split header feature anyway, I think we can fix it= just by >> always setting HBA in the RXD to zero. >> Could you try the fix for ixgbe below? >> >> Same story with FVL, I believe. >> Konstantin >> >> >>> Interestingly this does not happen if we force the scattered rx path. >>> >>> I assume the drivers have some expectations regarding the alignment o= f >>> the buf_addr in the mbuf and setting an odd private are size breaks >>> this alignment in the rte_pktmbuf_init function. If this is the case >>> then one possible fix might be to enforce an alignment on the private= area size. >>> >>> Best regards, >>> Martin >> >> diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_= rxtx.c >> index a0c8847..94967c5 100644 >> --- a/drivers/net/ixgbe/ixgbe_rxtx.c >> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c >> @@ -1183,7 +1183,7 @@ ixgbe_rx_alloc_bufs(struct ixgbe_rx_queue *rxq, = bool >> reset_mbuf) >> >> /* populate the descriptors */ >> dma_addr =3D >> rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(mb)); >> - rxdp[i].read.hdr_addr =3D dma_addr; >> + rxdp[i].read.hdr_addr =3D 0; >> rxdp[i].read.pkt_addr =3D dma_addr; >> } >> >> @@ -1414,7 +1414,7 @@ ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf >> **rx_pkts, >> rxe->mbuf =3D nmb; >> dma_addr =3D >> >> rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(nmb)); >> - rxdp->read.hdr_addr =3D dma_addr; >> + rxdp->read.hdr_addr =3D 0; >> rxdp->read.pkt_addr =3D dma_addr; >> >> /* >> @@ -1741,7 +1741,7 @@ next_desc: >> rxe->mbuf =3D nmb; >> >> rxm->data_off =3D RTE_PKTMBUF_HEADROOM; >> - rxdp->read.hdr_addr =3D dma; >> + rxdp->read.hdr_addr =3D 0; >> rxdp->read.pkt_addr =3D dma; >> } else >> rxe->mbuf =3D NULL; @@ -3633,7 +3633,7 @@ >> ixgbe_alloc_rx_queue_mbufs(struct ixgbe_rx_queue *rxq) >> dma_addr =3D >> >> rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(mbuf)); >> rxd =3D &rxq->rx_ring[i]; >> - rxd->read.hdr_addr =3D dma_addr; >> + rxd->read.hdr_addr =3D 0; >> rxd->read.pkt_addr =3D dma_addr; >> rxe[i].mbuf =3D mbuf; >> } >> diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c >> b/drivers/net/ixgbe/ixgbe_rxtx_vec.c >> index 6c1647e..16a9c64 100644 >> --- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c >> +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c >> @@ -56,6 +56,8 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) >> RTE_PKTMBUF_HEADROOM); >> __m128i dma_addr0, dma_addr1; >> >> + const __m128i hba_msk =3D _mm_set_epi64x(0, UINT64_MAX); >> + >> rxdp =3D rxq->rx_ring + rxq->rxrearm_start; >> >> /* Pull 'n' more MBUFs into the software ring */ @@ -108,6 +1= 10,9 @@ >> ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) >> dma_addr0 =3D _mm_add_epi64(dma_addr0, hdr_room); >> dma_addr1 =3D _mm_add_epi64(dma_addr1, hdr_room); >> >> + dma_addr0 =3D _mm_and_si128(dma_addr0, hba_msk); >> + dma_addr1 =3D _mm_and_si128(dma_addr1, hba_msk); >> + >> /* flush desc with pa dma_addr */ >> _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); >> _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); >> bash-4.2$ cat patch1 diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c >> b/drivers/net/ixgbe/ixgbe_rxtx.c index a0c8847..94967c5 100644 >> --- a/drivers/net/ixgbe/ixgbe_rxtx.c >> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c >> @@ -1183,7 +1183,7 @@ ixgbe_rx_alloc_bufs(struct ixgbe_rx_queue *rxq, = bool >> reset_mbuf) >> >> /* populate the descriptors */ >> dma_addr =3D >> rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(mb)); >> - rxdp[i].read.hdr_addr =3D dma_addr; >> + rxdp[i].read.hdr_addr =3D 0; >> rxdp[i].read.pkt_addr =3D dma_addr; >> } >> >> @@ -1414,7 +1414,7 @@ ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf >> **rx_pkts, >> rxe->mbuf =3D nmb; >> dma_addr =3D >> >> rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(nmb)); >> - rxdp->read.hdr_addr =3D dma_addr; >> + rxdp->read.hdr_addr =3D 0; >> rxdp->read.pkt_addr =3D dma_addr; >> >> /* >> @@ -1741,7 +1741,7 @@ next_desc: >> rxe->mbuf =3D nmb; >> >> rxm->data_off =3D RTE_PKTMBUF_HEADROOM; >> - rxdp->read.hdr_addr =3D dma; >> + rxdp->read.hdr_addr =3D 0; >> rxdp->read.pkt_addr =3D dma; >> } else >> rxe->mbuf =3D NULL; @@ -3633,7 +3633,7 @@ >> ixgbe_alloc_rx_queue_mbufs(struct ixgbe_rx_queue *rxq) >> dma_addr =3D >> >> rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(mbuf)); >> rxd =3D &rxq->rx_ring[i]; >> - rxd->read.hdr_addr =3D dma_addr; >> + rxd->read.hdr_addr =3D 0; >> rxd->read.pkt_addr =3D dma_addr; >> rxe[i].mbuf =3D mbuf; >> } >> diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c >> b/drivers/net/ixgbe/ixgbe_rxtx_vec.c >> index 6c1647e..16a9c64 100644 >> --- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c >> +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c >> @@ -56,6 +56,8 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) >> RTE_PKTMBUF_HEADROOM); >> __m128i dma_addr0, dma_addr1; >> >> + const __m128i hba_msk =3D _mm_set_epi64x(0, UINT64_MAX); >> + >> rxdp =3D rxq->rx_ring + rxq->rxrearm_start; >> >> /* Pull 'n' more MBUFs into the software ring */ @@ -108,6 +1= 10,9 @@ >> ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) >> dma_addr0 =3D _mm_add_epi64(dma_addr0, hdr_room); >> dma_addr1 =3D _mm_add_epi64(dma_addr1, hdr_room); >> >> + dma_addr0 =3D _mm_and_si128(dma_addr0, hba_msk); >> + dma_addr1 =3D _mm_and_si128(dma_addr1, hba_msk); >> + >> /* flush desc with pa dma_addr */ >> _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); >> _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1);