From mboxrd@z Thu Jan 1 00:00:00 1970 Content-Type: multipart/mixed; boundary="===============1665521325962744648==" MIME-Version: 1.0 From: Walker, Benjamin Subject: Re: [SPDK] spdk_malloc vs. malloc Date: Tue, 08 May 2018 17:53:25 +0000 Message-ID: <1525802004.2784.33.camel@intel.com> In-Reply-To: 068974EF-8372-491B-8CC2-3223755F251F@netapp.com List-ID: To: spdk@lists.01.org --===============1665521325962744648== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable On Tue, 2018-05-08 at 17:35 +0000, Rodriguez, Edwin wrote: > Something like this would work for NetApp: > = > enum spdk_malloc_flags { > SPDK_MALLOC_DMA, > SPDK_MALLOC_SHARE, > SPDK_MALLOC_DMA_SHARE > }; > void *spdk_malloc(size_t size, size_t align, uint64_t *phys_addr, int > socket_id, enum spdk_malloc_flags flags); This is very close to the code that is already merged. The code today uses #defines instead of an enum, and DMA is 1, SHARE is 2, and DMA+SHARE is 3, = but it's otherwise the same thing. Although in the current code it is technical= ly possible to pass no flags while in your definition it is not possible, it is already the case that passing no flags is not valid and the call should fai= l. The flags are definitely intended to be prohibited from being 0, which I th= ink is the key requirement for you guys. We can make that clearer in the documentation for sure. > = > Ed R > = > = > From: SPDK on behalf of Ed Rodriguez @netapp.com> > Reply-To: Storage Performance Development Kit > Date: Monday, May 7, 2018 at 1:39 PM > To: "Walker, Benjamin" , "Harris, James R" <= james.r > .harris(a)intel.com>, "spdk(a)lists.01.org" , "z.kha= tami88(a)gmai > l.com" > Subject: Re: [SPDK] spdk_malloc vs. malloc > = > spdk_malloc could work for cases 2-4 if the flags param were prohibited f= rom > being 0 - in our implementation we don't do anything special for shared m= emory > so spdk_free can continue to call the equivalent of spdk_dma_free. > = > Ed R > = > On 5/7/18, 1:19 PM, "Walker, Benjamin" wrot= e: > = > SPDK needs four different types of memory allocation operations today: > = > 1) Standard virtual memory > 2) Process-shared memory > 3) DMA-safe memory ("pinned") > 4) DMA-safe and process-shared memory > = > For #1, SPDK is using POSIX to do that allocation and free. That one = is a > very > separate discussion than the other 3, so I'll leave that alone. > = > For numbers 2-4, SPDK could either have 3 separate functions or one > function > with flags to do the allocations. I don't think the allocation side c= hoice > makes > much of a difference. On the free side, however, we need to decide if > we're > going to do one generic spdk_free(void *buf), an spdk_free with flags= , or > 3 > separate free functions corresponding to 3 separate allocation functi= ons. > We're > currently moving toward spdk_malloc with flags and spdk_free without,= but > that > is up for debate. > = > There are at least 3 known implementations of the SPDK env layer toda= y, so > I'll > attempt to outline what each of those needs so we have some concrete > examples. > = > The primary implementation of the env layer, which ships with SPDK, i= s the > one > based on DPDK. DPDK internally always allocates all memory as both DM= A- > safe and > shared, so the distinction there isn't necessary. Essentially, alloca= tion > types > 2-4 are all degenerate and map to a single DPDK call. On the free sid= e, > DPDK > does not need to know what type of allocation it was (again, because > they're all > degenerate). > = > One known alternative implementation (Oracle) has three separate > operations > corresponding to memory allocation types 2, 3, and 4. However, my > understanding > is that they also have internal tracking structures such that they can > figure > out which type of allocation an address came from originally, so they= are > able > to implement spdk_free(void *buf) by doing internal look-ups and don't > need the > user to pass them flags. > = > Another known alternative implementation (NetApp) has two memory > allocation > operations internally. One corresponds to both types 1 and 2, and the > other > corresponds to types 3 and 4. Further, on the free side there are > different free > functions for types 1 and 2 compared to types 3 and 4. The current > abstraction > in SPDK has a single call for types 2-4 with no flags, so it's valid > criticism > that this API is difficult for NetApp to implement. We need to figure= out > a way > to address this. > = > The easiest solution today, in my opinion, is for the NetApp > implementation to > always allocate DMA-safe memory in response to calls to spdk_malloc, = even > if > only the shared flag was specified (case #2). Note that SPDK doesn't = call > spdk_malloc for case #1 already, so that's a separate code path. Then > always do > the DMA-safe memory free in response to spdk_free. That will definite= ly > make it > work correctly today, and then the NetApp implementation doesn't need= the > flags > on free either. > = > I'd love to have as many people weigh in here as possible. The more > diverse > opinions we get, the better the ultimate design will be. We absolutely > have to > make sure this memory abstraction is solid and meets everyone's needs, > while > still being reasonably simple to use. > = > Thanks, > Ben > = > = > On Fri, 2018-05-04 at 17:17 -0700, Harris, James R wrote: > > You=E2=80=99re right Ed. We=E2=80=99ll need to hold off on adding = usages of this new > API > > until we get this sorted out. = > > > > What if we just add an spdk_dma_shared_malloc() and got rid of the = flags > > completely? And then just keep all of the existing malloc/free cal= ls > where > > DMA/shared memory is not required. This should alleviate the impac= t to > your > > code base and I *think* would work for Oracle =E2=80=93 but would n= eed Zahra to > > confirm. This adds an explicit assumption of course that we won=E2= =80=99t have > a need > > for some other kind of malloc =E2=80=98flag=E2=80=99 in the future. > > > > Sorry John =E2=80=93 I know you wanted to end this thread but I=E2= =80=99ve decided to > continue > > it. ( > > > > -Jim > > > > > > > > On 5/4/18, 3:37 PM, "Rodriguez, Edwin" = wrote: > > > > The problem I have with the new code is that it makes distingui= shing > > between dma and non-dma free difficult. In the kernel, dma memory = comes > from > > a different pool than regular allocations, i.e. contigmalloc/contig= free > in > > FreeBsd. How does spdk_free distinguish between a block allocated = with > > SPDK_MALLOC_DMA and a regular block? > > = > > Ed R > > = > > On 5/4/18, 6:30 PM, "SPDK on behalf of Meneghini, John" s(a)list > > s.01.org on behalf of John.Meneghini(a)netapp.com> wrote: > > = > > Thanks, Ben, et al, for your reply. > > = > > > you may want to avoid calling them at run-time because th= ey > can > > fail, but SPDK generally already does > > > that for performance reasons anyway. Can you outline what= you > mean? > > = > > What is run-time? We are using the SPDK libraries inside = of a > kernel > > kmod. So, all time is run-time. > > = > > I think we can end this thread. I tried to win this argume= nt > once > > before. I guess calloc() and malloc() are here to stay... so > > it doesn't matter what we do with spdk_malloc(). > > = > > :-) > > = > > If anyone wants to ask me about my views on this topic, I'l= l be > at the > > SPDK Developer's Summit in two weeks. > > = > > See you then. > > = > > /John > > = > > On 5/3/18, 7:24 PM, "Walker, Benjamin" m> > > wrote: > > = > > On Thu, 2018-05-03 at 20:14 +0000, Meneghini, John wrot= e: > > > The issue is: spdk_malloc and spdk_calloc are logical= APIs > to > > abstract the > > > POSIX malloc and calloc calls. We chose to NOT use t= hose > API > > names originally > > > and went with the spdk_dma_{} api names. My concern is > that > > taking those API > > > names now will eliminate the possibility of any other= use > in the > > future. > > > > > > I am not asking to "fully abstract POSIX". I am simp= ly > saying > > that the > > > malloc(), calloc() and free() POSIX APIs suffer from= some > > problems. This is > > > why DPDK implemented rte_malloc() and rte_calloc(). = As > we move > > forward to > > > build production quality products out of SPDK we will= want > to > > abstract malloc, > > > calloc and free; especially in the libraries. I don't > care > > about what the > > > applications do. I want a better memory management A= PIs > in the > > libraries. > > = > > DPDK implemented rte_malloc and rte_calloc to allocate = DMA- > safe > > and shared > > memory. They still call POSIX malloc and calloc for all > other > > types of > > allocations. This is the same thing that SPDK has done. > > = > > I've never heard of malloc, calloc, and free having pro= blems > to > > the extent that > > you'd want to reimplement them in your application some > other way. > > Certainly, > > you may want to avoid calling them at run-time because = they > can > > fail, but SPDK > > generally already does that for performance reasons any= way. > Can > > you outline what > > you mean? > > = > > > > > > I'd be happy to simply map spdk_malloc/calloc() to > > rte_malloc/calloc() and I > > > don't see the point introducing an spdk_malloc() that= only > > extends > > > spdk_dma_malloc(). I'd rather absorb an API change to > > spdk_dma_malloc(). We > > > change many APIs in SPDK from release to release. I d= on't > see > > why this API > > > can't be changed - with agreement from the SPDK commu= nity. > > > > > > /* > > > * Allocate memory on default heap. > > > */ > > > void * > > > rte_malloc(const char *type, size_t size, unsigned al= ign) > > > { > > > return rte_malloc_socket(type, size, align, > > SOCKET_ID_ANY); > > > } > > > > > > /* = > > > * Allocate zero'd memory on default heap. > > > */ > > > void * > > > rte_calloc(const char *type, size_t num, size_t size, > unsigned > > align) > > > { > > > return rte_zmalloc(type, num * size, align); > > > } > > > > > > /John > > > > > > On 5/3/18, 3:21 PM, "Walker, Benjamin" tel.co > > m> wrote: > > > > > > On Thu, 2018-05-03 at 18:23 +0000, Meneghini, John > wrote: > > > > > If the user passes flags =3D=3D 0 to the new > spdk_malloc() > > call, this > > > could be > > > > implemented by malloc() or equivalent behind the > scenes, > > > > = > > > > So, does this mean you=E2=80=99re willing to ch= ange all > calls to > > malloc(size) > > > with > > > > spdk_malloc(size, 0, NULL, SPDK_ENV_SOCKET_ID_A= NY, > > SPDK_MALLOC_SHARE)? > > > > = > > > > Is that the plan? > > > > > > > = > > > Hi John, > > > = > > > SPDK explicitly depends on POSIX throughout the c= ode > base. > > We do have an > > > environment abstraction layer designed to make po= rting > SPDK > > to various > > > environments (i.e. away from DPDK) easier, but th= is > only > > abstracts > > > operations > > > that are not available through standard POSIX cal= ls. > We're > > concerned that > > > fully > > > abstracting POSIX would introduce a significant > barrier to > > entry for new > > > contributors to the project, while only enabling = one > > additional use case > > > that > > > we're aware of. I understand that this one use ca= se > happens > > to be yours, > > > and so > > > this is a challenge for you. > > > = > > > In your code today, I assume you carry patches to > replace > > the POSIX calls > > > with > > > your available alternatives. We've attempted to m= ake > > carrying these > > > patches > > > reasonably easy by moving all of the POSIX includ= es > into a > > single header > > > (include/stdinc.h). Since you're already carrying > those > > patches, can you > > > simply > > > choose a different name for your replacement for > > malloc/calloc? > > > = > > > Thanks, > > > Ben > > > = > > > > > = > > = > > _______________________________________________ > > SPDK mailing list > > SPDK(a)lists.01.org > > https://lists.01.org/mailman/listinfo/spdk > > = > > = > > = > > > = > = > _______________________________________________ > SPDK mailing list > SPDK(a)lists.01.org > https://lists.01.org/mailman/listinfo/spdk > =20 --===============1665521325962744648==--