From: Mats Petersson
Subject: Re: VMX status report. Xen:26323 & Dom0:3.7.1
Date: Thu, 10 Jan 2013 17:27:04 +0000
Message-ID: <50EEF9E8.9040508@citrix.com>
References: <1B4B44D9196EFF41AE41FDA404FC0A1024486E@SHSMSX101.ccr.corp.intel.com> <50EE908602000078000B44CE@nat28.tlf.novell.com>
To: xen-devel@lists.xen.org
List-Id: xen-devel@lists.xenproject.org

On 10/01/13 17:10, Andres Lagar-Cavilla wrote:
> On Jan 10, 2013, at 3:57 AM, "Jan Beulich" wrote:
>
>>>>> On 10.01.13 at 08:51, "Ren, Yongjie" wrote:
>>> New issue(1)
>>> ==============
>>> 1. sometimes live migration failed and reported call trace in dom0
>>> http://bugzilla.xen.org/bugzilla/show_bug.cgi?id=1841
>> For the failed allocation, the only obvious candidate appears to be
>>
>>    err_array = kcalloc(m.num, sizeof(int), GFP_KERNEL);
>>
>> which quite obviously can be of (almost) arbitrary size because
>>
>>    nr_pages = m.num;
>>    if ((m.num <= 0) || (nr_pages > (LONG_MAX >> PAGE_SHIFT)))
>>        return -EINVAL;
>>
>> really only checks for completely insane values.
>>
>> This got introduced by Andres' "xen/privcmd: add PRIVCMD_MMAPBATCH_V2
>> ioctl" and is becoming worse with Mukesh's recent "xen: privcmd:
>> support autotranslated physmap guests", which added another
>> similar (twice as large) allocation in alloc_empty_pages().
> Perhaps the err_array in this case, since alloc_empty_pages only
> happens for auto translated dom0s.
>
> Not familiar whether libxl changes (or is even capable of changing)
> parameters of the migration code. But right now in libxc, mapping is
> done in MAX_BATCH_SIZE batches, which are of size 1024.
> So we are talking about 1024 ints, which is *one* page.
>
> So is the kernel really incapable of allocating one measly page?
>
> This leads me to think that it might be gather_array, but that one
> would allocate a grand total of two pages.
>
> In any case, both functions allocate an arbitrary number of pages, and
> that is the fundamental problem.
>
> What is the approach in the forward ported kernel wrt gather_array?
>
> The cleanest alternative I can think of is to refactor the body of
> mmap_batch to allocate one page for each array, and iteratively call
> traverse_pages, recycling the local arrays and increasing the pointers
> in the source user space arrays.
>
> Having said that, that would allocate two pages (always), and the code
> right now allocates max three (for libxc driven migrations). So maybe
> the problem is elsewhere….
>
> Thanks,
> Andres

Whilst this may not add much to the discussion: while working on the
improved privcmd.c, I have been using 3.7.0rc5 and 3.8.0. Both of these
seem to work fine for migration using the libxc interface (since I've
been using the XenServer build, the migration is not done through
libxl).

I have not had a single failure to allocate pages during migration. I
have a script that loops around migrating the guest to the same host as
quickly as it can, and I have used guests up to 64GB (and left that
running overnight; it takes about 3 minutes, so a night gives several
hundred iterations).

So I'm wondering what is different between my setup and this one...

--
Mats

>> I'd like to note that the forward ported kernels don't appear to
>> have a similar issue, as they never allocate more than a page at
>> a time. Was that code consulted at all when that addition was
>> done?
>>
>> Jan
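
The chunked approach Andres describes in the quoted text (one page-sized
array, recycled across iterations while the source pointers advance) can
be sketched in user-space C. This is only an illustration, not the
privcmd code: it assumes 4 KiB pages and 4-byte ints, the names
process_in_chunks, chunk_fn and demo_worker are hypothetical, and a plain
memcpy stands in for whatever copy back to user space the kernel would
perform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: 4 KiB pages and 4-byte ints, so one page holds 1024 ints. */
#define SKETCH_PAGE_SIZE 4096u
#define INTS_PER_PAGE (SKETCH_PAGE_SIZE / sizeof(int))

/* Hypothetical stand-in for the per-batch mapping step: fills err[i] for
 * each of the n frames and returns 0, or nonzero to abort. */
typedef int (*chunk_fn)(const unsigned long *frames, int *err, size_t n);

/* Toy worker for illustration only: flags odd frame numbers as failed. */
int demo_worker(const unsigned long *frames, int *err, size_t n)
{
    for (size_t i = 0; i < n; i++)
        err[i] = (frames[i] % 2) ? -1 : 0;
    return 0;
}

/* Process `total` entries with a single page-sized scratch array, however
 * large `total` is. Returns the number of chunks handled, or -1 on error. */
long process_in_chunks(const unsigned long *frames, int *err_out,
                       size_t total, chunk_fn fn)
{
    int *scratch = calloc(INTS_PER_PAGE, sizeof(int));
    size_t done = 0;
    long chunks = 0;

    if (!scratch)
        return -1;

    while (done < total) {
        size_t n = total - done;

        if (n > INTS_PER_PAGE)
            n = INTS_PER_PAGE;
        /* The scratch array never needs more than one page. */
        assert(n * sizeof(int) <= SKETCH_PAGE_SIZE);

        if (fn(frames + done, scratch, n) != 0) {
            free(scratch);
            return -1;
        }
        /* Hand this chunk's results back, then reuse the same buffer. */
        memcpy(err_out + done, scratch, n * sizeof(int));
        done += n;
        chunks++;
    }

    free(scratch);
    return chunks;
}
```

With 2500 entries this makes three passes (1024, 1024 and 452 entries)
through the same buffer, so the allocation size no longer depends on the
size of the request.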