From: "Burakov, Anatoly"
Subject: Re: [RFC v2 00/23] Dynamic memory allocation for DPDK
Date: Fri, 22 Dec 2017 09:13:04 +0000
To: "Walker, Benjamin", "dev@dpdk.org"
Cc: "thomas@monjalon.net", "andras.kovacs@ericsson.com", "Wiles, Keith", "Richardson, Bruce"
In-Reply-To: <1513892309.2658.80.camel@intel.com>

On 21-Dec-17 9:38 PM, Walker, Benjamin wrote:
> On Tue, 2017-12-19 at 11:14 +0000, Anatoly Burakov wrote:
>>
>> Quick outline of all changes done as part of this patchset:
>>
>> * Malloc heap adjusted to handle holes in address space
>> * Single memseg list replaced by multiple expandable memseg lists
>> * VA space for hugepages is preallocated in advance
>> * Added dynamic alloc/free for pages, happening as needed on malloc/free
>
> SPDK will need some way to register for a notification when pages are
> allocated or freed. For storage, the number of requests per second is
> (relative to networking) fairly small (hundreds of thousands per second
> in a traditional block storage stack, or a few million per second with
> SPDK). Given that, we can afford to do a dynamic lookup from va to
> pa/iova on each request in order to greatly simplify our APIs (users can
> just pass pointers around instead of mbufs). DPDK has a way to lookup
> the pa from a given va, but it does so by scanning /proc/self/pagemap
> and is very slow. SPDK instead handles this by implementing a lookup
> table of va to pa/iova which we populate by scanning through the DPDK
> memory segments at start up, so the lookup in our table is sufficiently
> fast for storage use cases. If the list of memory segments changes, we
> need to know about it in order to update our map.

Hi Benjamin,

So, in other words, we need callbacks on alloc/free. What information
would SPDK need when receiving such a notification? Since we can't really
know in advance how many pages we will allocate (it may be one, it may be
a thousand), and they are no longer guaranteed to be contiguous, would a
per-page callback be OK? Alternatively, we could have one callback per
operation, but only provide the VA and size of the allocated memory,
leaving everything else to the user. A rough sketch of both options is
below.

I do add a virt2memseg() function, which would allow you to look up
segment physical addresses more easily, so you won't have to manually
scan memseg lists to get the IOVA for a given VA.
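To make the two options a bit more concrete, here is a rough sketch in C
of what each callback shape could carry. This is purely illustrative:
none of these names exist in DPDK, and whatever ends up in the patchset
may look quite different.

/*
 * Illustrative only: hypothetical names, not part of DPDK or of this
 * patchset. The point is just to show what information each callback
 * variant would carry.
 */
#include <stdint.h>
#include <stddef.h>

enum hypothetical_mem_event {
        HYPOTHETICAL_MEM_EVENT_ALLOC,
        HYPOTHETICAL_MEM_EVENT_FREE,
};

/*
 * Option 1: one callback per page. The application gets VA, IOVA and
 * page size for every page allocated or freed, at the cost of many
 * invocations for large allocations.
 */
typedef void (*hypothetical_mem_event_page_cb_t)(
                enum hypothetical_mem_event event,
                const void *va, uint64_t iova, size_t page_sz,
                void *user_arg);

/*
 * Option 2: one callback per alloc/free operation. Only the VA and total
 * length are provided; resolving IOVAs (e.g. via virt2memseg()) is left
 * to the application.
 */
typedef void (*hypothetical_mem_event_op_cb_t)(
                enum hypothetical_mem_event event,
                const void *va, size_t len, void *user_arg);

/* Registration would then be something along these lines. */
int hypothetical_mem_event_cb_register(hypothetical_mem_event_op_cb_t cb,
                void *user_arg);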
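And a sketch of how the va to iova lookup could then look on your side.
The virt2memseg() prototype and the memseg fields used here are
assumptions about roughly how the final API might look, not code from the
actual patches:

/*
 * Sketch only: assumes virt2memseg() returns the memseg containing the
 * given VA (or NULL), and that a memseg exposes addr and iova fields.
 * Either assumption may not hold in the final patches.
 */
#include <stdint.h>
#include <rte_memory.h>

/* assumed prototype of the new helper; real name/signature may differ */
const struct rte_memseg *virt2memseg(const void *va);

static rte_iova_t
va_to_iova(const void *va)
{
        const struct rte_memseg *ms = virt2memseg(va);

        if (ms == NULL)
                return RTE_BAD_IOVA;

        /* IOVA of the segment plus the offset of va within the segment */
        return ms->iova + ((uintptr_t)va - (uintptr_t)ms->addr);
}

With something like that, you could either resolve the IOVA at submission
time, or keep your existing lookup table and just update it from the
alloc/free callbacks.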
Thanks for your feedback and suggestions!

> Having the map also enables a number of other nice things - for
> instance we allow users to register memory that wasn't allocated
> through DPDK and use it for DMA operations. We keep that va to pa/iova
> mapping in the same map. I appreciate you adding APIs to dynamically
> register this type of memory with the IOMMU on our behalf. That allows
> us to eliminate a nasty hack where we were looking up the vfio file
> descriptor through sysfs in order to send the registration ioctl.
>
>> * Added contiguous memory allocation APIs for rte_malloc and rte_memzone
>> * Integrated Pawel Wodkowski's patch [1] for registering/unregistering
>> memory with VFIO

-- 
Thanks,
Anatoly