Re: [PATCH V5] xen/privcmd.c: improve performance of MMAPBATCH_V2.

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Mats Petersson <mats.petersson@citrix.com>
To: Ian Campbell <Ian.Campbell@citrix.com>
Cc: "xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>,
	"mats@planetcatfish.com" <mats@planetcatfish.com>,
	Andres Lagar-Cavilla <andres@lagarcavilla.org>,
	"JBeulich@suse.com" <JBeulich@suse.com>,
	"konrad@darnok.org" <konrad@darnok.org>,
	David Vrabel <david.vrabel@citrix.com>
Subject: Re: [PATCH V5] xen/privcmd.c: improve performance of MMAPBATCH_V2.
Date: Thu, 20 Dec 2012 11:00:30 +0000	[thread overview]
Message-ID: <50D2EFCE.8070706@citrix.com> (raw)
In-Reply-To: <1356000020.26722.39.camel@zakaz.uk.xensource.com>

On 20/12/12 10:40, Ian Campbell wrote:
> I've added Andres since I think this accumulation of ENOENT in err_ptr
> is a paging related thing, or at least I think he understands this
> code ;-)
>
>> @@ -89,37 +89,112 @@ static int remap_pte_fn(pte_t *ptep, pgtable_t token, unsigned long addr,
>>          struct page *page = info->pages[info->index++];
>>          unsigned long pfn = page_to_pfn(page);
>>          pte_t pte = pfn_pte(pfn, info->prot);
>> -
>> -       if (map_foreign_page(pfn, info->fgmfn, info->domid))
>> -               return -EFAULT;
>> +       int err;
>> +       // TODO: We should really batch these updates.
>> +       err = map_foreign_page(pfn, *info->fgmfn, info->domid);
>> +       *info->err_ptr++ = err;
>>          set_pte_at(info->vma->vm_mm, addr, ptep, pte);
>> +       info->fgmfn++;
>>
>> -       return 0;
>> +       return err;
> This will cause apply_to_page_range to stop walking and return an error.
> AIUI the intention of the err_ptr array is to accumulate the individual
> success/error for the entire range. The caller can then take care of the
> successes/failures and ENOENTs as appropriate (in particular it doesn't
> want to abort a batch because of an ENOENT, it wants to do as much as
> possible)
>
> On x86 (when err_ptr != NULL) you accumulate all of the errors from
> HYPERVISOR_mmu_update rather than aborting on the first one  and this
> seems correct to me.
Ok, will rework that bit.
>
>>   int xen_unmap_domain_mfn_range(struct vm_area_struct *vma,
>> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
>> index 01de35c..323a2ab 100644
>> --- a/arch/x86/xen/mmu.c
>> +++ b/arch/x86/xen/mmu.c
>> [...]
>> @@ -2528,23 +2557,98 @@ int xen_remap_domain_mfn_range(struct vm_area_struct *vma,
>>                  if (err)
>>                          goto out;
>>
>> -               err = HYPERVISOR_mmu_update(mmu_update, batch, NULL, domid);
>> -               if (err < 0)
>> -                       goto out;
>> +               /* We record the error for each page that gives an error, but
>> +                * continue mapping until the whole set is done */
>> +               do {
>> +                       err = HYPERVISOR_mmu_update(&mmu_update[index],
>> +                                                   batch_left, &done, domid);
>> +                       if (err < 0) {
>> +                               if (!err_ptr)
>> +                                       goto out;
>> +                               /* increment done so we skip the error item */
>> +                               done++;
>> +                               last_err = err_ptr[index] = err;
>> +                       }
>> +                       batch_left -= done;
>> +                       index += done;
>> +               } while (batch_left);
>>
>>                  nr -= batch;
>>                  addr += range;
>> +               if (err_ptr)
>> +                       err_ptr += batch;
>>          }
>>
>> -       err = 0;
>> +       err = last_err;
> This means that if you have 100 failures followed by one success you
> return success overall. Is that intentional? That doesn't seem right.
As far as I see, it doesn't mean that. last_err is only set at the 
beginning of the call (to zero) and if there is an error.

>
>>   out:
>>
>>          xen_flush_tlb_all();
>>
>>          return err;
>>   }
> [...]
>> diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
>> index 0bbbccb..8f86a44 100644
>> --- a/drivers/xen/privcmd.c
>> +++ b/drivers/xen/privcmd.c
> [...]
>> @@ -283,12 +317,10 @@ static int mmap_batch_fn(void *data, void *state)
>>          if (xen_feature(XENFEAT_auto_translated_physmap))
>>                  cur_page = pages[st->index++];
>>
>> -       ret = xen_remap_domain_mfn_range(st->vma, st->va & PAGE_MASK, *mfnp, 1,
>> -                                        st->vma->vm_page_prot, st->domain,
>> -                                        &cur_page);
>> -
>> -       /* Store error code for second pass. */
>> -       *(st->err++) = ret;
>> +       BUG_ON(nr < 0);
>> +       ret = xen_remap_domain_mfn_array(st->vma, st->va & PAGE_MASK, mfnp, nr,
>> +                                        st->err, st->vma->vm_page_prot,
>> +                                        st->domain, &cur_page);
>>
>>          /* And see if it affects the global_error. */
> The code which follows needs adjustment to cope with the fact that we
> now batch. I think it needs to walk st->err and set global_error as
> appropriate. This is related to my comment about the return value of
> xen_remap_domain_mfn_range too.
The return value, as commented above, is "either 0 or the last error we 
saw". There should be no need to walk the st->err, as we know if there 
was some error or not.
>
> I think rather than trying to fabricate some sort of meaningful error
> code for an entire batch xen_remap_domain_mfn_range should just return
> an indication about whether there were any errors or not and leave it to
> the caller to figure out the overall result by looking at the err array.
>
> Perhaps return either the number of errors or the number of successes
> (which turns the following if into either (ret) or (ret < nr)
> respectively).
I'm trying to not change how the code above expects things to work. 
Whilst it would be lovely to rewrite the entire code dealing with 
mapping memory, I don't think that's within the scope of my current 
project. And if I don't wish to rewrite all of libxc's memory management 
code, I don't want to alter what values are returned or when. The 
current code follows what WAS happening before my changes - which isn't 
exactly the most fantastic thing, and I think there may actually be bugs 
in there, such as:
     if (ret < 0) {
         if (ret == -ENOENT)
             st->global_error = -ENOENT;
         else {
             /* Record that at least one error has happened. */
             if (st->global_error == 0)
                 st->global_error = 1;
         }
     }
if we enter this once with -EFAULT, and then after that with -ENOENT, 
global_error will say -ENOENT. I think knowing that we got an EFAULT is 
"higher importance" than ENOENT, but that's how the old code was 
working, and I'm not sure I should change it at this point.

>>          if (ret < 0) {
>> @@ -300,7 +332,7 @@ static int mmap_batch_fn(void *data, void *state)
>>                                  st->global_error = 1;
>>                  }
>>          }
>> -       st->va += PAGE_SIZE;
>> +       st->va += PAGE_SIZE * nr;
>>
>>          return 0;
>>   }
>> @@ -430,8 +462,8 @@ static long privcmd_ioctl_mmap_batch(void __user *udata, int version)
>>          state.err           = err_array;
>>
>>          /* mmap_batch_fn guarantees ret == 0 */
>> -       BUG_ON(traverse_pages(m.num, sizeof(xen_pfn_t),
>> -                            &pagelist, mmap_batch_fn, &state));
>> +       BUG_ON(traverse_pages_block(m.num, sizeof(xen_pfn_t),
>> +                                   &pagelist, mmap_batch_fn, &state));
> Can we make traverse_pages and _block common by passing a block size
> parameter?
Yes of course. Is there much benefit from that? I understand that it's 
less code, but it also makes the original traverse_pages more complex. 
Not convinced it helps much - it's quite a small function, so not much 
extra code. Additionally, all of the callback functions will have to 
deal with an extra parameter (that is probably ignored in all but one 
place).

--
Mats
>
> Ian.
>

next prev parent reply	other threads:[~2012-12-20 11:00 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-12-19 18:53 [PATCH V5] xen/privcmd.c: improve performance of MMAPBATCH_V2 Mats Petersson
2012-12-20 10:40 ` Ian Campbell
2012-12-20 11:00   ` Mats Petersson [this message]
2012-12-20 11:32     ` Ian Campbell
2013-01-02 15:47       ` Andres Lagar-Cavilla
2013-01-02 16:45         ` Mats Petersson
2013-01-02 16:52           ` Andres Lagar-Cavilla
2012-12-20 11:26   ` Ian Campbell

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=50D2EFCE.8070706@citrix.com \
    --to=mats.petersson@citrix.com \
    --cc=Ian.Campbell@citrix.com \
    --cc=JBeulich@suse.com \
    --cc=andres@lagarcavilla.org \
    --cc=david.vrabel@citrix.com \
    --cc=konrad@darnok.org \
    --cc=mats@planetcatfish.com \
    --cc=xen-devel@lists.xensource.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.