Re: [PATCH V5] xen/privcmd.c: improve performance of MMAPBATCH_V2.

xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed

From: Mats Petersson <mats.petersson@citrix.com>
To: Ian Campbell <Ian.Campbell@citrix.com>
Cc: "xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>,
	"mats@planetcatfish.com" <mats@planetcatfish.com>,
	Andres Lagar-Cavilla <andres@lagarcavilla.org>,
	"JBeulich@suse.com" <JBeulich@suse.com>,
	"konrad@darnok.org" <konrad@darnok.org>,
	David Vrabel <david.vrabel@citrix.com>
Subject: Re: [PATCH V5] xen/privcmd.c: improve performance of MMAPBATCH_V2.
Date: Thu, 20 Dec 2012 11:00:30 +0000	[thread overview]
Message-ID: <50D2EFCE.8070706@citrix.com> (raw)
In-Reply-To: <1356000020.26722.39.camel@zakaz.uk.xensource.com>

On 20/12/12 10:40, Ian Campbell wrote:
> I've added Andres since I think this accumulation of ENOENT in err_ptr
> is a paging related thing, or at least I think he understands this
> code ;-)
>
>> @@ -89,37 +89,112 @@ static int remap_pte_fn(pte_t *ptep, pgtable_t token, unsigned long addr,
>>          struct page *page = info->pages[info->index++];
>>          unsigned long pfn = page_to_pfn(page);
>>          pte_t pte = pfn_pte(pfn, info->prot);
>> -
>> -       if (map_foreign_page(pfn, info->fgmfn, info->domid))
>> -               return -EFAULT;
>> +       int err;
>> +       // TODO: We should really batch these updates.
>> +       err = map_foreign_page(pfn, *info->fgmfn, info->domid);
>> +       *info->err_ptr++ = err;
>>          set_pte_at(info->vma->vm_mm, addr, ptep, pte);
>> +       info->fgmfn++;
>>
>> -       return 0;
>> +       return err;
> This will cause apply_to_page_range to stop walking and return an error.
> AIUI the intention of the err_ptr array is to accumulate the individual
> success/error for the entire range. The caller can then take care of the
> successes/failures and ENOENTs as appropriate (in particular it doesn't
> want to abort a batch because of an ENOENT, it wants to do as much as
> possible)
>
> On x86 (when err_ptr != NULL) you accumulate all of the errors from
> HYPERVISOR_mmu_update rather than aborting on the first one  and this
> seems correct to me.
Ok, will rework that bit.
>
>>   int xen_unmap_domain_mfn_range(struct vm_area_struct *vma,
>> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
>> index 01de35c..323a2ab 100644
>> --- a/arch/x86/xen/mmu.c
>> +++ b/arch/x86/xen/mmu.c
>> [...]
>> @@ -2528,23 +2557,98 @@ int xen_remap_domain_mfn_range(struct vm_area_struct *vma,
>>                  if (err)
>>                          goto out;
>>
>> -               err = HYPERVISOR_mmu_update(mmu_update, batch, NULL, domid);
>> -               if (err < 0)
>> -                       goto out;
>> +               /* We record the error for each page that gives an error, but
>> +                * continue mapping until the whole set is done */
>> +               do {
>> +                       err = HYPERVISOR_mmu_update(&mmu_update[index],
>> +                                                   batch_left, &done, domid);
>> +                       if (err < 0) {
>> +                               if (!err_ptr)
>> +                                       goto out;
>> +                               /* increment done so we skip the error item */
>> +                               done++;
>> +                               last_err = err_ptr[index] = err;
>> +                       }
>> +                       batch_left -= done;
>> +                       index += done;
>> +               } while (batch_left);
>>
>>                  nr -= batch;
>>                  addr += range;
>> +               if (err_ptr)
>> +                       err_ptr += batch;
>>          }
>>
>> -       err = 0;
>> +       err = last_err;
> This means that if you have 100 failures followed by one success you
> return success overall. Is that intentional? That doesn't seem right.
As far as I see, it doesn't mean that. last_err is only set at the 
beginning of the call (to zero) and if there is an error.

>
>>   out:
>>
>>          xen_flush_tlb_all();
>>
>>          return err;
>>   }
> [...]
>> diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
>> index 0bbbccb..8f86a44 100644
>> --- a/drivers/xen/privcmd.c
>> +++ b/drivers/xen/privcmd.c
> [...]
>> @@ -283,12 +317,10 @@ static int mmap_batch_fn(void *data, void *state)
>>          if (xen_feature(XENFEAT_auto_translated_physmap))
>>                  cur_page = pages[st->index++];
>>
>> -       ret = xen_remap_domain_mfn_range(st->vma, st->va & PAGE_MASK, *mfnp, 1,
>> -                                        st->vma->vm_page_prot, st->domain,
>> -                                        &cur_page);
>> -
>> -       /* Store error code for second pass. */
>> -       *(st->err++) = ret;
>> +       BUG_ON(nr < 0);
>> +       ret = xen_remap_domain_mfn_array(st->vma, st->va & PAGE_MASK, mfnp, nr,
>> +                                        st->err, st->vma->vm_page_prot,
>> +                                        st->domain, &cur_page);
>>
>>          /* And see if it affects the global_error. */
> The code which follows needs adjustment to cope with the fact that we
> now batch. I think it needs to walk st->err and set global_error as
> appropriate. This is related to my comment about the return value of
> xen_remap_domain_mfn_range too.
The return value, as commented above, is "either 0 or the last error we 
saw". There should be no need to walk the st->err, as we know if there 
was some error or not.
>
> I think rather than trying to fabricate some sort of meaningful error
> code for an entire batch xen_remap_domain_mfn_range should just return
> an indication about whether there were any errors or not and leave it to
> the caller to figure out the overall result by looking at the err array.
>
> Perhaps return either the number of errors or the number of successes
> (which turns the following if into either (ret) or (ret < nr)
> respectively).
I'm trying to not change how the code above expects things to work. 
Whilst it would be lovely to rewrite the entire code dealing with 
mapping memory, I don't think that's within the scope of my current 
project. And if I don't wish to rewrite all of libxc's memory management 
code, I don't want to alter what values are returned or when. The 
current code follows what WAS happening before my changes - which isn't 
exactly the most fantastic thing, and I think there may actually be bugs 
in there, such as:
     if (ret < 0) {
         if (ret == -ENOENT)
             st->global_error = -ENOENT;
         else {
             /* Record that at least one error has happened. */
             if (st->global_error == 0)
                 st->global_error = 1;
         }
     }
if we enter this once with -EFAULT, and then after that with -ENOENT, 
global_error will say -ENOENT. I think knowing that we got an EFAULT is 
"higher importance" than ENOENT, but that's how the old code was 
working, and I'm not sure I should change it at this point.

>>          if (ret < 0) {
>> @@ -300,7 +332,7 @@ static int mmap_batch_fn(void *data, void *state)
>>                                  st->global_error = 1;
>>                  }
>>          }
>> -       st->va += PAGE_SIZE;
>> +       st->va += PAGE_SIZE * nr;
>>
>>          return 0;
>>   }
>> @@ -430,8 +462,8 @@ static long privcmd_ioctl_mmap_batch(void __user *udata, int version)
>>          state.err           = err_array;
>>
>>          /* mmap_batch_fn guarantees ret == 0 */
>> -       BUG_ON(traverse_pages(m.num, sizeof(xen_pfn_t),
>> -                            &pagelist, mmap_batch_fn, &state));
>> +       BUG_ON(traverse_pages_block(m.num, sizeof(xen_pfn_t),
>> +                                   &pagelist, mmap_batch_fn, &state));
> Can we make traverse_pages and _block common by passing a block size
> parameter?
Yes of course. Is there much benefit from that? I understand that it's 
less code, but it also makes the original traverse_pages more complex. 
Not convinced it helps much - it's quite a small function, so not much 
extra code. Additionally, all of the callback functions will have to 
deal with an extra parameter (that is probably ignored in all but one 
place).

--
Mats
>
> Ian.
>

next prev parent reply	other threads:[~2012-12-20 11:00 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-12-19 18:53 [PATCH V5] xen/privcmd.c: improve performance of MMAPBATCH_V2 Mats Petersson
2012-12-20 10:40 ` Ian Campbell
2012-12-20 11:00   ` Mats Petersson [this message]
2012-12-20 11:32     ` Ian Campbell
2013-01-02 15:47       ` Andres Lagar-Cavilla
2013-01-02 16:45         ` Mats Petersson
2013-01-02 16:52           ` Andres Lagar-Cavilla
2012-12-20 11:26   ` Ian Campbell

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=50D2EFCE.8070706@citrix.com \
    --to=mats.petersson@citrix.com \
    --cc=Ian.Campbell@citrix.com \
    --cc=JBeulich@suse.com \
    --cc=andres@lagarcavilla.org \
    --cc=david.vrabel@citrix.com \
    --cc=konrad@darnok.org \
    --cc=mats@planetcatfish.com \
    --cc=xen-devel@lists.xensource.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).