From mboxrd@z Thu Jan 1 00:00:00 1970 From: Mats Petersson Subject: Re: [PATCH V5] xen/privcmd.c: improve performance of MMAPBATCH_V2. Date: Thu, 20 Dec 2012 11:00:30 +0000 Message-ID: <50D2EFCE.8070706@citrix.com> References: <1355943187-4167-1-git-send-email-mats.petersson@citrix.com> <1356000020.26722.39.camel@zakaz.uk.xensource.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; Format="flowed" Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <1356000020.26722.39.camel@zakaz.uk.xensource.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Ian Campbell Cc: "xen-devel@lists.xensource.com" , "mats@planetcatfish.com" , Andres Lagar-Cavilla , "JBeulich@suse.com" , "konrad@darnok.org" , David Vrabel List-Id: xen-devel@lists.xenproject.org On 20/12/12 10:40, Ian Campbell wrote: > I've added Andres since I think this accumulation of ENOENT in err_ptr > is a paging related thing, or at least I think he understands this > code ;-) > >> @@ -89,37 +89,112 @@ static int remap_pte_fn(pte_t *ptep, pgtable_t token, unsigned long addr, >> struct page *page = info->pages[info->index++]; >> unsigned long pfn = page_to_pfn(page); >> pte_t pte = pfn_pte(pfn, info->prot); >> - >> - if (map_foreign_page(pfn, info->fgmfn, info->domid)) >> - return -EFAULT; >> + int err; >> + // TODO: We should really batch these updates. >> + err = map_foreign_page(pfn, *info->fgmfn, info->domid); >> + *info->err_ptr++ = err; >> set_pte_at(info->vma->vm_mm, addr, ptep, pte); >> + info->fgmfn++; >> >> - return 0; >> + return err; > This will cause apply_to_page_range to stop walking and return an error. > AIUI the intention of the err_ptr array is to accumulate the individual > success/error for the entire range. The caller can then take care of the > successes/failures and ENOENTs as appropriate (in particular it doesn't > want to abort a batch because of an ENOENT, it wants to do as much as > possible) > > On x86 (when err_ptr != NULL) you accumulate all of the errors from > HYPERVISOR_mmu_update rather than aborting on the first one and this > seems correct to me. Ok, will rework that bit. > >> int xen_unmap_domain_mfn_range(struct vm_area_struct *vma, >> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c >> index 01de35c..323a2ab 100644 >> --- a/arch/x86/xen/mmu.c >> +++ b/arch/x86/xen/mmu.c >> [...] >> @@ -2528,23 +2557,98 @@ int xen_remap_domain_mfn_range(struct vm_area_struct *vma, >> if (err) >> goto out; >> >> - err = HYPERVISOR_mmu_update(mmu_update, batch, NULL, domid); >> - if (err < 0) >> - goto out; >> + /* We record the error for each page that gives an error, but >> + * continue mapping until the whole set is done */ >> + do { >> + err = HYPERVISOR_mmu_update(&mmu_update[index], >> + batch_left, &done, domid); >> + if (err < 0) { >> + if (!err_ptr) >> + goto out; >> + /* increment done so we skip the error item */ >> + done++; >> + last_err = err_ptr[index] = err; >> + } >> + batch_left -= done; >> + index += done; >> + } while (batch_left); >> >> nr -= batch; >> addr += range; >> + if (err_ptr) >> + err_ptr += batch; >> } >> >> - err = 0; >> + err = last_err; > This means that if you have 100 failures followed by one success you > return success overall. Is that intentional? That doesn't seem right. As far as I see, it doesn't mean that. last_err is only set at the beginning of the call (to zero) and if there is an error. > >> out: >> >> xen_flush_tlb_all(); >> >> return err; >> } > [...] >> diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c >> index 0bbbccb..8f86a44 100644 >> --- a/drivers/xen/privcmd.c >> +++ b/drivers/xen/privcmd.c > [...] >> @@ -283,12 +317,10 @@ static int mmap_batch_fn(void *data, void *state) >> if (xen_feature(XENFEAT_auto_translated_physmap)) >> cur_page = pages[st->index++]; >> >> - ret = xen_remap_domain_mfn_range(st->vma, st->va & PAGE_MASK, *mfnp, 1, >> - st->vma->vm_page_prot, st->domain, >> - &cur_page); >> - >> - /* Store error code for second pass. */ >> - *(st->err++) = ret; >> + BUG_ON(nr < 0); >> + ret = xen_remap_domain_mfn_array(st->vma, st->va & PAGE_MASK, mfnp, nr, >> + st->err, st->vma->vm_page_prot, >> + st->domain, &cur_page); >> >> /* And see if it affects the global_error. */ > The code which follows needs adjustment to cope with the fact that we > now batch. I think it needs to walk st->err and set global_error as > appropriate. This is related to my comment about the return value of > xen_remap_domain_mfn_range too. The return value, as commented above, is "either 0 or the last error we saw". There should be no need to walk the st->err, as we know if there was some error or not. > > I think rather than trying to fabricate some sort of meaningful error > code for an entire batch xen_remap_domain_mfn_range should just return > an indication about whether there were any errors or not and leave it to > the caller to figure out the overall result by looking at the err array. > > Perhaps return either the number of errors or the number of successes > (which turns the following if into either (ret) or (ret < nr) > respectively). I'm trying to not change how the code above expects things to work. Whilst it would be lovely to rewrite the entire code dealing with mapping memory, I don't think that's within the scope of my current project. And if I don't wish to rewrite all of libxc's memory management code, I don't want to alter what values are returned or when. The current code follows what WAS happening before my changes - which isn't exactly the most fantastic thing, and I think there may actually be bugs in there, such as: if (ret < 0) { if (ret == -ENOENT) st->global_error = -ENOENT; else { /* Record that at least one error has happened. */ if (st->global_error == 0) st->global_error = 1; } } if we enter this once with -EFAULT, and then after that with -ENOENT, global_error will say -ENOENT. I think knowing that we got an EFAULT is "higher importance" than ENOENT, but that's how the old code was working, and I'm not sure I should change it at this point. >> if (ret < 0) { >> @@ -300,7 +332,7 @@ static int mmap_batch_fn(void *data, void *state) >> st->global_error = 1; >> } >> } >> - st->va += PAGE_SIZE; >> + st->va += PAGE_SIZE * nr; >> >> return 0; >> } >> @@ -430,8 +462,8 @@ static long privcmd_ioctl_mmap_batch(void __user *udata, int version) >> state.err = err_array; >> >> /* mmap_batch_fn guarantees ret == 0 */ >> - BUG_ON(traverse_pages(m.num, sizeof(xen_pfn_t), >> - &pagelist, mmap_batch_fn, &state)); >> + BUG_ON(traverse_pages_block(m.num, sizeof(xen_pfn_t), >> + &pagelist, mmap_batch_fn, &state)); > Can we make traverse_pages and _block common by passing a block size > parameter? Yes of course. Is there much benefit from that? I understand that it's less code, but it also makes the original traverse_pages more complex. Not convinced it helps much - it's quite a small function, so not much extra code. Additionally, all of the callback functions will have to deal with an extra parameter (that is probably ignored in all but one place). -- Mats > > Ian. >