From mboxrd@z Thu Jan 1 00:00:00 1970 From: Olivier MATZ Subject: Re: Issue with non-scattered rx in ixgbe and i40e when mbuf private area size is odd Date: Thu, 30 Jul 2015 11:10:13 +0200 Message-ID: <55B9E9F5.8080102@6wind.com> References: <55B8EC16.60404@allegro-packets.com> <2601191342CEEE43887BDE71AB97725836A69E3D@irsmsx105.ger.corp.intel.com> <55B9DC69.3080508@6wind.com> <2601191342CEEE43887BDE71AB97725836A6A082@irsmsx105.ger.corp.intel.com> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: quoted-printable Cc: "dev@dpdk.org" To: "Ananyev, Konstantin" , "Zhang, Helin" , Martin Weiser Return-path: Received: from mail-wi0-f179.google.com (mail-wi0-f179.google.com [209.85.212.179]) by dpdk.org (Postfix) with ESMTP id 83F76C492 for ; Thu, 30 Jul 2015 11:10:22 +0200 (CEST) Received: by wicmv11 with SMTP id mv11so12634243wic.0 for ; Thu, 30 Jul 2015 02:10:22 -0700 (PDT) In-Reply-To: <2601191342CEEE43887BDE71AB97725836A6A082@irsmsx105.ger.corp.intel.com> List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Hi Konstantin, On 07/30/2015 11:00 AM, Ananyev, Konstantin wrote: > Hi Olivier, > >> -----Original Message----- >> From: Olivier MATZ [mailto:olivier.matz@6wind.com] >> Sent: Thursday, July 30, 2015 9:12 AM >> To: Zhang, Helin; Ananyev, Konstantin; Martin Weiser >> Cc: dev@dpdk.org >> Subject: Re: [dpdk-dev] Issue with non-scattered rx in ixgbe and i40e = when mbuf private area size is odd >> >> Hi, >> >> On 07/29/2015 10:24 PM, Zhang, Helin wrote: >>> Hi Martin >>> >>> Thank you very much for the good catch! >>> >>> The similar situation in i40e, as explained by Konstantin. >>> As header split hasn't been supported by DPDK till now. It would be b= etter to put the header address in RX descriptor to 0. >>> But in the future, during header split enabling. We may need to pay e= xtra attention to that. As at least x710 datasheet said >> specifically as below. >>> "The header address should be set by the software to an even number (= word aligned address)". We may need to find a way to >> ensure that during mempool/mbuf allocation. >> >> Indeed it would be good to force the priv_size to be aligned. >> >> The priv_size could be aligned automatically in >> rte_pktmbuf_pool_create(). The only possible problem I could see >> is that it would break applications that access to the data buffer >> by doing (sizeof(mbuf) + sizeof(priv)), which is probably not the >> best thing to do (I didn't find any applications like this in dpdk). > > > Might be just make rte_pktmbuf_pool_create() fail if input priv_size % = MIN_ALIGN !=3D 0? Hmm maybe it would break more applications: an odd priv_size is probably rare, but a priv_size that is not aligned to 8 bytes is maybe more common. It's maybe safer to align the size transparently? Regards, Olivier > >> >> For applications that directly use rte_mempool_create() instead of >> rte_pktmbuf_pool_create(), we could add a check using an assert in >> rte_pktmbuf_init() and some words in the documentation. >> >> The question is: what should be the proper alignment? I would say >> at least 8 bytes, but maybe cache_aligned is an option too. > > 8 bytes seems enough to me. > > Konstantin > >> >> Regards, >> Olivier >> >> >>> >>> Regards, >>> Helin >>> >>>> -----Original Message----- >>>> From: Ananyev, Konstantin >>>> Sent: Wednesday, July 29, 2015 11:12 AM >>>> To: Martin Weiser; Zhang, Helin; olivier.matz@6wind.com >>>> Cc: dev@dpdk.org >>>> Subject: RE: [dpdk-dev] Issue with non-scattered rx in ixgbe and i40= e when mbuf >>>> private area size is odd >>>> >>>> Hi Martin, >>>> >>>>> -----Original Message----- >>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Martin Weiser >>>>> Sent: Wednesday, July 29, 2015 4:07 PM >>>>> To: Zhang, Helin; olivier.matz@6wind.com >>>>> Cc: dev@dpdk.org >>>>> Subject: [dpdk-dev] Issue with non-scattered rx in ixgbe and i40e w= hen >>>>> mbuf private area size is odd >>>>> >>>>> Hi Helin, Hi Olivier, >>>>> >>>>> we are seeing an issue with the ixgbe and i40e drivers which we cou= ld >>>>> track down to our setting of the private area size of the mbufs. >>>>> The issue can be easily reproduced with the l2fwd example applicati= on >>>>> when a small modification is done: just set the priv_size parameter= in >>>>> the call to the rte_pktmbuf_pool_create function to an odd number l= ike >>>>> 1. In our tests this causes every call to rte_eth_rx_burst to retur= n >>>>> 32 (which is the setting of nb_pkts) nonsense mbufs although no >>>>> packets are received on the interface and the hardware counters do = not >>>>> report any received packets. >>>> >>>> From Niantic datasheet: >>>> >>>> "7.1.6.1 Advanced Receive Descriptors =E2=80=94 Read Format Table 7-= 15 lists the >>>> advanced receive descriptor programming by the software. The ... >>>> Packet Buffer Address (64) >>>> This is the physical address of the packet buffer. The lowest bit is= A0 (LSB of the >>>> address). >>>> Header Buffer Address (64) >>>> The physical address of the header buffer with the lowest bit being = Descriptor >>>> Done (DD). >>>> When a packet spans in multiple descriptors, the header buffer addre= ss is used >>>> only on the first descriptor. During the programming phase, software= must set >>>> the DD bit to zero (see the description of the DD bit in this sectio= n). This means >>>> that header buffer addresses are always word aligned." >>>> >>>> Right now, in ixgbe PMD we always setup Packet Buffer Address (PBA)= and >>>> Header Buffer Address (HBA) to the same value: >>>> buf_physaddr + RTE_PKTMBUF_HEADROOM >>>> So when pirv_size=3D=3D1, DD bit in RXD is always set to one by SW i= tself, and then >>>> SW considers that HW already done with it. >>>> In other words, right now for ixgbe you can't use RX buffer that is = not aligned on >>>> word boundary. >>>> >>>> So the advice would be, right now - don't set priv_size to the odd v= alue. >>>> As we don't support split header feature anyway, I think we can fix = it just by >>>> always setting HBA in the RXD to zero. >>>> Could you try the fix for ixgbe below? >>>> >>>> Same story with FVL, I believe. >>>> Konstantin >>>> >>>> >>>>> Interestingly this does not happen if we force the scattered rx pat= h. >>>>> >>>>> I assume the drivers have some expectations regarding the alignment= of >>>>> the buf_addr in the mbuf and setting an odd private are size breaks >>>>> this alignment in the rte_pktmbuf_init function. If this is the cas= e >>>>> then one possible fix might be to enforce an alignment on the priva= te area size. >>>>> >>>>> Best regards, >>>>> Martin >>>> >>>> diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgb= e_rxtx.c >>>> index a0c8847..94967c5 100644 >>>> --- a/drivers/net/ixgbe/ixgbe_rxtx.c >>>> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c >>>> @@ -1183,7 +1183,7 @@ ixgbe_rx_alloc_bufs(struct ixgbe_rx_queue *rxq= , bool >>>> reset_mbuf) >>>> >>>> /* populate the descriptors */ >>>> dma_addr =3D >>>> rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(mb)); >>>> - rxdp[i].read.hdr_addr =3D dma_addr; >>>> + rxdp[i].read.hdr_addr =3D 0; >>>> rxdp[i].read.pkt_addr =3D dma_addr; >>>> } >>>> >>>> @@ -1414,7 +1414,7 @@ ixgbe_recv_pkts(void *rx_queue, struct rte_mbu= f >>>> **rx_pkts, >>>> rxe->mbuf =3D nmb; >>>> dma_addr =3D >>>> >>>> rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(nmb)); >>>> - rxdp->read.hdr_addr =3D dma_addr; >>>> + rxdp->read.hdr_addr =3D 0; >>>> rxdp->read.pkt_addr =3D dma_addr; >>>> >>>> /* >>>> @@ -1741,7 +1741,7 @@ next_desc: >>>> rxe->mbuf =3D nmb; >>>> >>>> rxm->data_off =3D RTE_PKTMBUF_HEADROOM; >>>> - rxdp->read.hdr_addr =3D dma; >>>> + rxdp->read.hdr_addr =3D 0; >>>> rxdp->read.pkt_addr =3D dma; >>>> } else >>>> rxe->mbuf =3D NULL; @@ -3633,7 +3633,7 @@ >>>> ixgbe_alloc_rx_queue_mbufs(struct ixgbe_rx_queue *rxq) >>>> dma_addr =3D >>>> >>>> rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(mbuf)); >>>> rxd =3D &rxq->rx_ring[i]; >>>> - rxd->read.hdr_addr =3D dma_addr; >>>> + rxd->read.hdr_addr =3D 0; >>>> rxd->read.pkt_addr =3D dma_addr; >>>> rxe[i].mbuf =3D mbuf; >>>> } >>>> diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c >>>> b/drivers/net/ixgbe/ixgbe_rxtx_vec.c >>>> index 6c1647e..16a9c64 100644 >>>> --- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c >>>> +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c >>>> @@ -56,6 +56,8 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) >>>> RTE_PKTMBUF_HEADROOM); >>>> __m128i dma_addr0, dma_addr1; >>>> >>>> + const __m128i hba_msk =3D _mm_set_epi64x(0, UINT64_MAX); >>>> + >>>> rxdp =3D rxq->rx_ring + rxq->rxrearm_start; >>>> >>>> /* Pull 'n' more MBUFs into the software ring */ @@ -108,6= +110,9 @@ >>>> ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) >>>> dma_addr0 =3D _mm_add_epi64(dma_addr0, hdr_room); >>>> dma_addr1 =3D _mm_add_epi64(dma_addr1, hdr_room); >>>> >>>> + dma_addr0 =3D _mm_and_si128(dma_addr0, hba_msk); >>>> + dma_addr1 =3D _mm_and_si128(dma_addr1, hba_msk); >>>> + >>>> /* flush desc with pa dma_addr */ >>>> _mm_store_si128((__m128i *)&rxdp++->read, dma_addr= 0); >>>> _mm_store_si128((__m128i *)&rxdp++->read, dma_addr= 1); >>>> bash-4.2$ cat patch1 diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c >>>> b/drivers/net/ixgbe/ixgbe_rxtx.c index a0c8847..94967c5 100644 >>>> --- a/drivers/net/ixgbe/ixgbe_rxtx.c >>>> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c >>>> @@ -1183,7 +1183,7 @@ ixgbe_rx_alloc_bufs(struct ixgbe_rx_queue *rxq= , bool >>>> reset_mbuf) >>>> >>>> /* populate the descriptors */ >>>> dma_addr =3D >>>> rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(mb)); >>>> - rxdp[i].read.hdr_addr =3D dma_addr; >>>> + rxdp[i].read.hdr_addr =3D 0; >>>> rxdp[i].read.pkt_addr =3D dma_addr; >>>> } >>>> >>>> @@ -1414,7 +1414,7 @@ ixgbe_recv_pkts(void *rx_queue, struct rte_mbu= f >>>> **rx_pkts, >>>> rxe->mbuf =3D nmb; >>>> dma_addr =3D >>>> >>>> rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(nmb)); >>>> - rxdp->read.hdr_addr =3D dma_addr; >>>> + rxdp->read.hdr_addr =3D 0; >>>> rxdp->read.pkt_addr =3D dma_addr; >>>> >>>> /* >>>> @@ -1741,7 +1741,7 @@ next_desc: >>>> rxe->mbuf =3D nmb; >>>> >>>> rxm->data_off =3D RTE_PKTMBUF_HEADROOM; >>>> - rxdp->read.hdr_addr =3D dma; >>>> + rxdp->read.hdr_addr =3D 0; >>>> rxdp->read.pkt_addr =3D dma; >>>> } else >>>> rxe->mbuf =3D NULL; @@ -3633,7 +3633,7 @@ >>>> ixgbe_alloc_rx_queue_mbufs(struct ixgbe_rx_queue *rxq) >>>> dma_addr =3D >>>> >>>> rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(mbuf)); >>>> rxd =3D &rxq->rx_ring[i]; >>>> - rxd->read.hdr_addr =3D dma_addr; >>>> + rxd->read.hdr_addr =3D 0; >>>> rxd->read.pkt_addr =3D dma_addr; >>>> rxe[i].mbuf =3D mbuf; >>>> } >>>> diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c >>>> b/drivers/net/ixgbe/ixgbe_rxtx_vec.c >>>> index 6c1647e..16a9c64 100644 >>>> --- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c >>>> +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c >>>> @@ -56,6 +56,8 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) >>>> RTE_PKTMBUF_HEADROOM); >>>> __m128i dma_addr0, dma_addr1; >>>> >>>> + const __m128i hba_msk =3D _mm_set_epi64x(0, UINT64_MAX); >>>> + >>>> rxdp =3D rxq->rx_ring + rxq->rxrearm_start; >>>> >>>> /* Pull 'n' more MBUFs into the software ring */ @@ -108,6= +110,9 @@ >>>> ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) >>>> dma_addr0 =3D _mm_add_epi64(dma_addr0, hdr_room); >>>> dma_addr1 =3D _mm_add_epi64(dma_addr1, hdr_room); >>>> >>>> + dma_addr0 =3D _mm_and_si128(dma_addr0, hba_msk); >>>> + dma_addr1 =3D _mm_and_si128(dma_addr1, hba_msk); >>>> + >>>> /* flush desc with pa dma_addr */ >>>> _mm_store_si128((__m128i *)&rxdp++->read, dma_addr= 0); >>>> _mm_store_si128((__m128i *)&rxdp++->read, dma_addr= 1); >