From: Mats Petersson <mats.petersson@citrix.com>
To: Ian Campbell <Ian.Campbell@citrix.com>
Cc: "xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>,
"mats@planetcatfish.com" <mats@planetcatfish.com>,
Andres Lagar-Cavilla <andres@lagarcavilla.org>,
"JBeulich@suse.com" <JBeulich@suse.com>,
"konrad@darnok.org" <konrad@darnok.org>,
David Vrabel <david.vrabel@citrix.com>
Subject: Re: [PATCH V5] xen/privcmd.c: improve performance of MMAPBATCH_V2.
Date: Thu, 20 Dec 2012 11:00:30 +0000 [thread overview]
Message-ID: <50D2EFCE.8070706@citrix.com> (raw)
In-Reply-To: <1356000020.26722.39.camel@zakaz.uk.xensource.com>
On 20/12/12 10:40, Ian Campbell wrote:
> I've added Andres since I think this accumulation of ENOENT in err_ptr
> is a paging related thing, or at least I think he understands this
> code ;-)
>
>> @@ -89,37 +89,112 @@ static int remap_pte_fn(pte_t *ptep, pgtable_t token, unsigned long addr,
>> struct page *page = info->pages[info->index++];
>> unsigned long pfn = page_to_pfn(page);
>> pte_t pte = pfn_pte(pfn, info->prot);
>> -
>> - if (map_foreign_page(pfn, info->fgmfn, info->domid))
>> - return -EFAULT;
>> + int err;
>> + // TODO: We should really batch these updates.
>> + err = map_foreign_page(pfn, *info->fgmfn, info->domid);
>> + *info->err_ptr++ = err;
>> set_pte_at(info->vma->vm_mm, addr, ptep, pte);
>> + info->fgmfn++;
>>
>> - return 0;
>> + return err;
> This will cause apply_to_page_range to stop walking and return an error.
> AIUI the intention of the err_ptr array is to accumulate the individual
> success/error for the entire range. The caller can then take care of the
> successes/failures and ENOENTs as appropriate (in particular it doesn't
> want to abort a batch because of an ENOENT, it wants to do as much as
> possible)
>
> On x86 (when err_ptr != NULL) you accumulate all of the errors from
> HYPERVISOR_mmu_update rather than aborting on the first one and this
> seems correct to me.
Ok, will rework that bit.
>
>> int xen_unmap_domain_mfn_range(struct vm_area_struct *vma,
>> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
>> index 01de35c..323a2ab 100644
>> --- a/arch/x86/xen/mmu.c
>> +++ b/arch/x86/xen/mmu.c
>> [...]
>> @@ -2528,23 +2557,98 @@ int xen_remap_domain_mfn_range(struct vm_area_struct *vma,
>> if (err)
>> goto out;
>>
>> - err = HYPERVISOR_mmu_update(mmu_update, batch, NULL, domid);
>> - if (err < 0)
>> - goto out;
>> + /* We record the error for each page that gives an error, but
>> + * continue mapping until the whole set is done */
>> + do {
>> + err = HYPERVISOR_mmu_update(&mmu_update[index],
>> + batch_left, &done, domid);
>> + if (err < 0) {
>> + if (!err_ptr)
>> + goto out;
>> + /* increment done so we skip the error item */
>> + done++;
>> + last_err = err_ptr[index] = err;
>> + }
>> + batch_left -= done;
>> + index += done;
>> + } while (batch_left);
>>
>> nr -= batch;
>> addr += range;
>> + if (err_ptr)
>> + err_ptr += batch;
>> }
>>
>> - err = 0;
>> + err = last_err;
> This means that if you have 100 failures followed by one success you
> return success overall. Is that intentional? That doesn't seem right.
As far as I see, it doesn't mean that. last_err is only set at the
beginning of the call (to zero) and if there is an error.
>
>> out:
>>
>> xen_flush_tlb_all();
>>
>> return err;
>> }
> [...]
>> diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
>> index 0bbbccb..8f86a44 100644
>> --- a/drivers/xen/privcmd.c
>> +++ b/drivers/xen/privcmd.c
> [...]
>> @@ -283,12 +317,10 @@ static int mmap_batch_fn(void *data, void *state)
>> if (xen_feature(XENFEAT_auto_translated_physmap))
>> cur_page = pages[st->index++];
>>
>> - ret = xen_remap_domain_mfn_range(st->vma, st->va & PAGE_MASK, *mfnp, 1,
>> - st->vma->vm_page_prot, st->domain,
>> - &cur_page);
>> -
>> - /* Store error code for second pass. */
>> - *(st->err++) = ret;
>> + BUG_ON(nr < 0);
>> + ret = xen_remap_domain_mfn_array(st->vma, st->va & PAGE_MASK, mfnp, nr,
>> + st->err, st->vma->vm_page_prot,
>> + st->domain, &cur_page);
>>
>> /* And see if it affects the global_error. */
> The code which follows needs adjustment to cope with the fact that we
> now batch. I think it needs to walk st->err and set global_error as
> appropriate. This is related to my comment about the return value of
> xen_remap_domain_mfn_range too.
The return value, as commented above, is "either 0 or the last error we
saw". There should be no need to walk the st->err, as we know if there
was some error or not.
>
> I think rather than trying to fabricate some sort of meaningful error
> code for an entire batch xen_remap_domain_mfn_range should just return
> an indication about whether there were any errors or not and leave it to
> the caller to figure out the overall result by looking at the err array.
>
> Perhaps return either the number of errors or the number of successes
> (which turns the following if into either (ret) or (ret < nr)
> respectively).
I'm trying to not change how the code above expects things to work.
Whilst it would be lovely to rewrite the entire code dealing with
mapping memory, I don't think that's within the scope of my current
project. And if I don't wish to rewrite all of libxc's memory management
code, I don't want to alter what values are returned or when. The
current code follows what WAS happening before my changes - which isn't
exactly the most fantastic thing, and I think there may actually be bugs
in there, such as:
if (ret < 0) {
if (ret == -ENOENT)
st->global_error = -ENOENT;
else {
/* Record that at least one error has happened. */
if (st->global_error == 0)
st->global_error = 1;
}
}
if we enter this once with -EFAULT, and then after that with -ENOENT,
global_error will say -ENOENT. I think knowing that we got an EFAULT is
"higher importance" than ENOENT, but that's how the old code was
working, and I'm not sure I should change it at this point.
>> if (ret < 0) {
>> @@ -300,7 +332,7 @@ static int mmap_batch_fn(void *data, void *state)
>> st->global_error = 1;
>> }
>> }
>> - st->va += PAGE_SIZE;
>> + st->va += PAGE_SIZE * nr;
>>
>> return 0;
>> }
>> @@ -430,8 +462,8 @@ static long privcmd_ioctl_mmap_batch(void __user *udata, int version)
>> state.err = err_array;
>>
>> /* mmap_batch_fn guarantees ret == 0 */
>> - BUG_ON(traverse_pages(m.num, sizeof(xen_pfn_t),
>> - &pagelist, mmap_batch_fn, &state));
>> + BUG_ON(traverse_pages_block(m.num, sizeof(xen_pfn_t),
>> + &pagelist, mmap_batch_fn, &state));
> Can we make traverse_pages and _block common by passing a block size
> parameter?
Yes of course. Is there much benefit from that? I understand that it's
less code, but it also makes the original traverse_pages more complex.
Not convinced it helps much - it's quite a small function, so not much
extra code. Additionally, all of the callback functions will have to
deal with an extra parameter (that is probably ignored in all but one
place).
--
Mats
>
> Ian.
>
next prev parent reply other threads:[~2012-12-20 11:00 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-12-19 18:53 [PATCH V5] xen/privcmd.c: improve performance of MMAPBATCH_V2 Mats Petersson
2012-12-20 10:40 ` Ian Campbell
2012-12-20 11:00 ` Mats Petersson [this message]
2012-12-20 11:32 ` Ian Campbell
2013-01-02 15:47 ` Andres Lagar-Cavilla
2013-01-02 16:45 ` Mats Petersson
2013-01-02 16:52 ` Andres Lagar-Cavilla
2012-12-20 11:26 ` Ian Campbell
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=50D2EFCE.8070706@citrix.com \
--to=mats.petersson@citrix.com \
--cc=Ian.Campbell@citrix.com \
--cc=JBeulich@suse.com \
--cc=andres@lagarcavilla.org \
--cc=david.vrabel@citrix.com \
--cc=konrad@darnok.org \
--cc=mats@planetcatfish.com \
--cc=xen-devel@lists.xensource.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).