From mboxrd@z Thu Jan 1 00:00:00 1970
From: Younghwan Go
Subject: Re: no hugepage with UIO poll-mode driver
Date: Thu, 26 Nov 2015 13:47:03 +0900
Message-ID: <56568EC7.6010905@ndsl.kaist.edu>
References: <56554B08.3040400@ndsl.kaist.edu> <49956413.am4JoMJyVU@xps13>
 <20151125120239.GA23268@bricha3-MOBL3> <5690109.niDVrFKdOE@xps13>
 <5655BB2C.4090806@intel.com>
 <2601191342CEEE43887BDE71AB97725836ACD0DB@irsmsx105.ger.corp.intel.com>
In-Reply-To: <2601191342CEEE43887BDE71AB97725836ACD0DB@irsmsx105.ger.corp.intel.com>
To: dev@dpdk.org
List-Id: patches and discussions about DPDK

Hello,

Thank you all for helping us understand the issues with the no-hugepage
option.

As Konstantin mentioned at the end, I tried using the VFIO module instead
of the IGB UIO module. I enabled all the necessary settings (IOMMU,
virtualization, vfio-pci, VFIO permissions) and ran my code with the
no-hugepage option.

At first it seemed to receive packets fine, but after a while it stopped
receiving packets. I could temporarily work around this by not calling
rte_eth_tx_burst().
Also, when I looked at the received packets, they all contained 0s
instead of actual data. Was there anything that I missed in running with
VFIO? I'm curious whether the no-hugepage option has been confirmed to
work with VFIO.

Thank you,
Younghwan

On 2015-11-25 at 11:12 PM, Ananyev, Konstantin wrote:
>
>> -----Original Message-----
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Sergio Gonzalez Monroy
>> Sent: Wednesday, November 25, 2015 1:44 PM
>> To: Thomas Monjalon
>> Cc: dev@dpdk.org
>> Subject: Re: [dpdk-dev] no hugepage with UIO poll-mode driver
>>
>> On 25/11/2015 13:22, Thomas Monjalon wrote:
>>> 2015-11-25 12:02, Bruce Richardson:
>>>> On Wed, Nov 25, 2015 at 12:03:05PM +0100, Thomas Monjalon wrote:
>>>>> 2015-11-25 11:00, Bruce Richardson:
>>>>>> On Wed, Nov 25, 2015 at 11:23:57AM +0100, Thomas Monjalon wrote:
>>>>>>> 2015-11-25 10:08, Bruce Richardson:
>>>>>>>> On Wed, Nov 25, 2015 at 03:39:17PM +0900, Younghwan Go wrote:
>>>>>>>>> Hi Jianfeng,
>>>>>>>>>
>>>>>>>>> Thanks for the email. rte mempool was successfully created without any
>>>>>>>>> error. Now the next problem is that rte_eth_rx_burst() is always returning 0
>>>>>>>>> as if there was no packet to receive... Do you have any suggestion on what
>>>>>>>>> might be causing this issue? In the meantime, I will be digging through
>>>>>>>>> ixgbe driver code to see what's going on.
>>>>>>>>>
>>>>>>>>> Thank you,
>>>>>>>>> Younghwan
>>>>>>>>>
>>>>>>>> The problem is that with --no-huge we don't have the physical address of the memory
>>>>>>>> to write to the network card. That's why it's marked as for testing only.
>>>>>>> Even with rte_mem_virt2phy() + rte_mem_lock_page() ?
>>>>>>>
>>>>>> With no-huge, we just set up a single memory segment at startup and set its
>>>>>> "physaddr" to be the virtual address.
>>>>>>
>>>>>>         /* hugetlbfs can be disabled */
>>>>>>         if (internal_config.no_hugetlbfs) {
>>>>>>                 addr = mmap(NULL, internal_config.memory, PROT_READ | PROT_WRITE,
>>>>>>                                 MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
>>>>>>                 if (addr == MAP_FAILED) {
>>>>>>                         RTE_LOG(ERR, EAL, "%s: mmap() failed: %s\n", __func__,
>>>>>>                                         strerror(errno));
>>>>>>                         return -1;
>>>>>>                 }
>>>>>>                 mcfg->memseg[0].phys_addr = (phys_addr_t)(uintptr_t)addr;
>>>>> rte_mem_virt2phy() does not use memseg.phys_addr but /proc/self/pagemap:
>>>>>
>>>>>         /*
>>>>>          * the pfn (page frame number) are bits 0-54 (see
>>>>>          * pagemap.txt in linux Documentation)
>>>>>          */
>>>>>         physaddr = ((page & 0x7fffffffffffffULL) * page_size)
>>>>>                 + ((unsigned long)virtaddr % page_size);
>>>>>
>>>> Yes, you are right. I was not aware that that function was used as part of the
>>>> mempool init, but now I see that "rte_mempool_virt2phy()" does indeed call that
>>>> function if hugepages are disabled, so my bad.
>>> Do you think we could move --no-huge in the main section (not only for testing)?
>> Hi,
>>
>> I think the main issue is going to be the HW descriptor queues.
>> AFAIK drivers now call rte_eth_dma_zone_reserve, which is basically a
>> wrapper around rte_memzone_reserve, to get a chunk of phys memory, which
>> in the case of --no-huge is not going to be really phys contiguous.
>>
>> Ideally we would move and expand the functionality of the dma_zone reserve
>> API to the EAL, so we could detect what page size we have and set the
>> boundary for such page size.
>> dma_zone_reserve does something similar to work on the Xen target by
>> reserving memzones on a 2MB boundary.
> With xen we have a special kernel driver that allocates physically contiguous
> chunks of memory for us.
> So we can guarantee that each such chunk would be at least 2MB long.
> That's enough to allocate HW rings (max HW ring size for, let's say, ixgbe is ~64KB).
> Here there is absolutely no guarantee that memory allocated by the kernel will be physically contiguous.
> Of course you can search through all the pages that you allocated and most likely you'll find a contiguous
> chunk big enough for that.
> Another problem - mbufs.
> You need to be sure that each mbuf doesn't cross a page boundary
> (in case the next page is not adjacent to the current one).
> So you'll probably need to use rte_mempool_xmem_create() to allocate mbufs from non-hugepage memory.
> BTW, as I remember, with vfio in place you should be able to do IO with the no-hugepages option, no?
> As it relies on vfio's ability to set up IOMMU tables for you.
> Konstantin
>
>> Sergio
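
P.S. For reference, here is a minimal sketch of the /proc/self/pagemap
lookup that the quoted rte_mem_virt2phy() excerpt performs. This is not
the DPDK implementation: the names pagemap_entry_to_phys() and
virt_to_phys() are hypothetical, and it assumes Linux with 64-bit
pagemap entries (PFN in bits 0-54, "present" flag in bit 63). On newer
kernels the PFN field reads as zero without CAP_SYS_ADMIN, so the sketch
treats a zero PFN as "unknown" rather than as a valid address.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for pread() under strict -std= modes */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

#define PAGEMAP_PFN_MASK 0x7fffffffffffffULL /* bits 0-54: page frame number */
#define PAGEMAP_PRESENT  (1ULL << 63)        /* bit 63: page present in RAM */

/*
 * Decode one 64-bit pagemap entry (hypothetical helper).
 * Returns 0 and stores the physical address in *phys, or -1 if the page
 * is not present or the PFN is zero (assumed hidden by the kernel).
 */
int
pagemap_entry_to_phys(uint64_t entry, uintptr_t virt, uint64_t page_size,
		uint64_t *phys)
{
	uint64_t pfn = entry & PAGEMAP_PFN_MASK;

	assert(phys != NULL && page_size != 0);
	if (!(entry & PAGEMAP_PRESENT) || pfn == 0)
		return -1;
	/* same formula as the quoted excerpt: frame base + offset in page */
	*phys = pfn * page_size + (virt % page_size);
	return 0;
}

/*
 * Look up the physical address backing virt (hypothetical helper).
 * The page must already be touched (and ideally locked) by the caller.
 */
int
virt_to_phys(const void *virt, uint64_t *phys)
{
	long page_size = sysconf(_SC_PAGESIZE);
	uintptr_t va = (uintptr_t)virt;
	uint64_t entry;
	off_t offset;
	ssize_t n;
	int fd;

	if (page_size <= 0)
		return -1;
	fd = open("/proc/self/pagemap", O_RDONLY);
	if (fd < 0)
		return -1;
	/* one 8-byte entry per virtual page, indexed by virtual page number */
	offset = (off_t)(va / (uintptr_t)page_size) * (off_t)sizeof(entry);
	n = pread(fd, &entry, sizeof(entry), offset);
	close(fd);
	if (n != (ssize_t)sizeof(entry))
		return -1;
	return pagemap_entry_to_phys(entry, va, (uint64_t)page_size, phys);
}
```

Calling virt_to_phys() on the first and last byte of a buffer and on each
page in between is one way to check whether the buffer is really
physically contiguous, which is the concern raised above for HW rings and
mbufs under --no-huge.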