From: Aravindh Puthiyaparambil
Subject: Re: [PATCH] libxc: Replace alloca() with mmap() in linux_privcmd_map_foreign_bulk()
Date: Fri, 20 Apr 2012 08:12:19 -0700
To: Ian Campbell
Cc: xen-devel@lists.xensource.com, andres@lagarcavilla.org, Santosh Jodh
In-Reply-To: <1334931321.28331.97.camel@zakaz.uk.xensource.com>
List-Id: xen-devel@lists.xenproject.org

On Apr 20, 2012 7:15 AM, "Ian Campbell" wrote:
>
> On Fri, 2012-04-20 at 15:08 +0100, Andres Lagar-Cavilla wrote:
> > > On Thu, 2012-04-19 at 23:24 +0100, Aravindh Puthiyaparambil wrote:
> > >> When mapping large numbers of pages (in the GB range) from a guest
> > >> into Dom0 using xc_map_foreign_bulk(), a segfault occurs in the libxc
> > >> client application. This is because the pfn array in
> > >> linux_privcmd_map_foreign_bulk() is allocated with alloca(), and the
> > >> subsequent memcpy overflows the stack. This patch replaces the
> > >> alloca() with mmap().
> > >
> > > The original reason for switching from malloc to alloca was that the
> > > similar gntmap path was a hot path, and it seemed that if alloca was
> > > reasonable there it would be reasonable here too.
> > >
> > > So I have a couple of questions...
> > >
> > > Is it possible that we also introduced this same issue at the other
> > > call sites which were changed in that patch, or are there other
> > > constraints on the size of the allocation which make it not a problem?
> >
> > I don't know how arbitrarily large the buffer sizes can be at the other
> > call sites. It would seem that they can be large enough.
> >
> > > Where does this mmap approach fall in terms of performance relative to
> > > just switching back to malloc? I presume that alloca is faster than
> > > both, but if it doesn't work reliably then it's not an option.
> >
> > It's hard to dig out exactly how alloca is implemented, but it seems to
> > simply move the stack pointer and return the old one (it's arguable
> > whether there is a buggy check for limits or a buggy lack of a check
> > against the guard page -- the man page says "behavior undefined"). So,
> > if the stack pointer motion means new stack pages are needed, those
> > pages will be demand allocated.
> >
> > mmap just short-circuits straight to page allocation. Note that malloc
> > may or may not wind up calling mmap if it needs more pages, depending
> > on its internal machinations.
> >
> > MAP_POPULATE tells the mmap syscall to allocate right now, avoiding
> > demand allocation. Presumably this batches all page allocations in a
> > single syscall and avoids page faults down the line.
> >
> > Short story, I suggested mmap with MAP_POPULATE because it will do the
> > allocation of pages in the most efficient fashion, even better than
> > malloc or alloca. But it's only worthwhile if the buffer is big enough
> > that new pages will need to be allocated.
>
> I think in these cases the whole buffer will always be used, since they
> are sized precisely for what they need to contain.
>
> > I think a reasonable compromise would be to do alloca if the buffer is
> > less than one page (a fairly common case, single-pfn buffers, etc), and
> > do mmap(MAP_POPULATE) for buffers larger than a page.
>
> I agree.

Sounds good. I will send out a patch that does this.
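Something along these lines, perhaps -- an untested sketch of the
compromise, assuming the ROUNDUP macro from the patch below and the
XC_PAGE_SHIFT/XC_PAGE_SIZE definitions from xenctrl.h:

    size_t size = num * sizeof(xen_pfn_t);
    int use_mmap = size > XC_PAGE_SIZE;
    xen_pfn_t *pfn;

    /* Small (sub-page) arrays stay on the stack; anything larger gets
     * pre-populated pages from mmap so no demand faults occur later. */
    if ( use_mmap )
    {
        pfn = mmap(NULL, ROUNDUP(size, XC_PAGE_SHIFT),
                   PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANON | MAP_POPULATE, -1, 0);
        if ( pfn == MAP_FAILED )
        {
            PERROR("xc_map_foreign_bulk: mmap of pfn array failed");
            return NULL;
        }
    }
    else
        pfn = alloca(size);

    /* ... existing memcpy and IOCTL_PRIVCMD_MMAPBATCH logic unchanged ... */

    /* alloca() storage is reclaimed on return, so only the mmap path
     * needs an explicit munmap. */
    if ( use_mmap )
        munmap(pfn, ROUNDUP(size, XC_PAGE_SHIFT));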
> Ian.
>
> >
> > Andres
> >
> > > If mmap and malloc have roughly equivalent performance properties, or
> > > the fast one is still too slow for Santosh's use case, then maybe we
> > > need to have a think about other options.
> > >
> > > e.g. use alloca for small numbers of mappings and mmap/malloc for
> > > larger ones. Or do large numbers of mappings in multiple batches.
> > >
> > > Ian.
> > >
> > >> Signed-off-by: Aravindh Puthiyaparambil
> > >> Acked-by: Andres Lagar-Cavilla
> > >>
> > >> diff -r 7c777cb8f705 -r 2f68aefc46c3 tools/libxc/xc_linux_osdep.c
> > >> --- a/tools/libxc/xc_linux_osdep.c Wed Apr 18 16:49:55 2012 +0100
> > >> +++ b/tools/libxc/xc_linux_osdep.c Thu Apr 19 15:21:43 2012 -0700
> > >> @@ -39,6 +39,7 @@
> > >>  #include "xenctrl.h"
> > >>  #include "xenctrlosdep.h"
> > >>
> > >> +#define ROUNDUP(_x,_w) (((unsigned long)(_x)+(1UL<<(_w))-1) & ~((1UL<<(_w))-1))
> > >>  #define ERROR(_m, _a...)  xc_osdep_log(xch,XTL_ERROR,XC_INTERNAL_ERROR,_m , ## _a )
> > >>  #define PERROR(_m, _a...) xc_osdep_log(xch,XTL_ERROR,XC_INTERNAL_ERROR,_m \
> > >>                  " (%d = %s)", ## _a , errno, xc_strerror(xch, errno))
> > >> @@ -286,7 +287,14 @@ static void *linux_privcmd_map_foreign_b
> > >>           * IOCTL_PRIVCMD_MMAPBATCH.
> > >>           */
> > >>          privcmd_mmapbatch_t ioctlx;
> > >> -        xen_pfn_t *pfn = alloca(num * sizeof(*pfn));
> > >> +        xen_pfn_t *pfn = mmap(NULL, ROUNDUP((num * sizeof(*pfn)), XC_PAGE_SHIFT),
> > >> +                              PROT_READ | PROT_WRITE,
> > >> +                              MAP_PRIVATE | MAP_ANON | MAP_POPULATE, -1, 0);
> > >> +        if ( pfn == MAP_FAILED )
> > >> +        {
> > >> +            PERROR("xc_map_foreign_bulk: mmap of pfn array failed");
> > >> +            return NULL;
> > >> +        }
> > >>
> > >>          memcpy(pfn, arr, num * sizeof(*arr));
> > >>
> > >> @@ -328,6 +336,8 @@ static void *linux_privcmd_map_foreign_b
> > >>              break;
> > >>          }
> > >>
> > >> +        munmap(pfn, ROUNDUP((num * sizeof(*pfn)), XC_PAGE_SHIFT));
> > >> +
> > >>          if ( rc == -ENOENT && i == num )
> > >>              rc = 0;
> > >>          else if ( rc )
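P.S. In case the ROUNDUP macro in the patch looks opaque: it rounds _x up
to the next multiple of 2^_w, so the mmap/munmap length is always a whole
number of pages. A quick standalone check (assuming 4 KiB pages, i.e.
XC_PAGE_SHIFT == 12):

    #include <assert.h>

    #define ROUNDUP(_x,_w) (((unsigned long)(_x)+(1UL<<(_w))-1) & ~((1UL<<(_w))-1))

    int main(void)
    {
        /* Anything from one byte up to a full page rounds to one page... */
        assert(ROUNDUP(1, 12) == 4096);
        assert(ROUNDUP(4096, 12) == 4096);
        /* ...and a 5000-byte pfn array needs two pages. */
        assert(ROUNDUP(5000, 12) == 8192);
        return 0;
    }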
