From mboxrd@z Thu Jan  1 00:00:00 1970
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: Re: [PATCH v5] x86/p2m: use large pages for MMIO
	mappings
Date: Wed, 27 Jan 2016 14:28:17 +0000
Message-ID: <56A8D401.6080100@citrix.com>
References: <56A25C0602000078000CA367@prv-mh.provo.novell.com>
	<1453724207.4320.137.camel@citrix.com>
	<56A6371802000078000CAA6B@prv-mh.provo.novell.com>
	<1453730752.4320.164.camel@citrix.com>
	<56A63C4002000078000CAAA7@prv-mh.provo.novell.com>
	<1453731704.4320.173.camel@citrix.com>
	<56A658FE02000078000CAC3D@prv-mh.provo.novell.com>
	<56A8B8C2.5010905@citrix.com>
	<56A8D61202000078000CB8EF@prv-mh.provo.novell.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xen.org>
Received: from mail6.bemta5.messagelabs.com ([195.245.231.135])
	by lists.xen.org with esmtp (Exim 4.72)
	(envelope-from <prvs=82734b35b=Andrew.Cooper3@citrix.com>)
	id 1aOR57-0007U8-4e
	for xen-devel@lists.xenproject.org; Wed, 27 Jan 2016 14:28:25 +0000
In-Reply-To: <56A8D61202000078000CB8EF@prv-mh.provo.novell.com>
List-Unsubscribe: <http://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <http://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Jan Beulich <JBeulich@suse.com>
Cc: Kevin Tian <kevin.tian@intel.com>, Wei Liu <wei.liu2@citrix.com>, Ian Campbell <ian.campbell@citrix.com>, Stefano Stabellini <stefano.stabellini@eu.citrix.com>, George Dunlap <George.Dunlap@eu.citrix.com>, Tim Deegan <tim@xen.org>, Ian Jackson <Ian.Jackson@eu.citrix.com>, Jun Nakajima <jun.nakajima@intel.com>, xen-devel <xen-devel@lists.xenproject.org>, Keir Fraser <keir@xen.org>
List-Id: xen-devel@lists.xenproject.org

On 27/01/16 13:37, Jan Beulich wrote:
>>>> On 27.01.16 at 13:32, <andrew.cooper3@citrix.com> wrote:
>> On 25/01/16 16:18, Jan Beulich wrote:
>>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>>> @@ -2491,7 +2491,7 @@ static int vmx_alloc_vlapic_mapping(stru
>>>      share_xen_page_with_guest(pg, d, XENSHARE_writable);
>>>      d->arch.hvm_domain.vmx.apic_access_mfn = mfn;
>>>      set_mmio_p2m_entry(d, paddr_to_pfn(APIC_DEFAULT_PHYS_BASE), _mfn(mfn),
>>> -                       p2m_get_hostp2m(d)->default_access);
>>> +                       PAGE_ORDER_4K, p2m_get_hostp2m(d)->default_access);
>>>  
>> This should ASSERT() success, in case we make further changes to the
>> error handling.
> Maybe, but since it didn't before I don't see why this couldn't /
> shouldn't be an independent future patch.

It can be.  IMO it is a bug that the return value isn't already checked.
(It could fail with -ENOMEM when allocating p2m leaves, perhaps?)

>
>>> --- a/xen/arch/x86/mm/p2m.c
>>> +++ b/xen/arch/x86/mm/p2m.c
>>> @@ -899,48 +899,62 @@ void p2m_change_type_range(struct domain
>>>      p2m_unlock(p2m);
>>>  }
>>>  
>>> -/* Returns: 0 for success, -errno for failure */
>>> +/*
>>> + * Returns:
>>> + *    0        for success
>>> + *    -errno   for failure
>>> + *    order+1  for caller to retry with order (guaranteed smaller than
>>> + *             the order value passed in)
>>> + */
>>>  static int set_typed_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn,
>>> -                               p2m_type_t gfn_p2mt, p2m_access_t access)
>>> +                               unsigned int order, p2m_type_t gfn_p2mt,
>>> +                               p2m_access_t access)
>>>  {
>>>      int rc = 0;
>>>      p2m_access_t a;
>>>      p2m_type_t ot;
>>>      mfn_t omfn;
>>> +    unsigned int cur_order = 0;
>>>      struct p2m_domain *p2m = p2m_get_hostp2m(d);
>>>  
>>>      if ( !paging_mode_translate(d) )
>>>          return -EIO;
>>>  
>>> -    gfn_lock(p2m, gfn, 0);
>>> -    omfn = p2m->get_entry(p2m, gfn, &ot, &a, 0, NULL, NULL);
>>> +    gfn_lock(p2m, gfn, order);
>>> +    omfn = p2m->get_entry(p2m, gfn, &ot, &a, 0, &cur_order, NULL);
>>> +    if ( cur_order < order )
>>> +    {
>>> +        gfn_unlock(p2m, gfn, order);
>>> +        return cur_order + 1;
>> Your comment states that the return value is guaranteed to be less than
>> the passed-in order, but this is not the case here.  cur_order could, in
>> principle, be only 1 less than order, at which point your documentation
>> is incorrect.
>>
>> Does this rely on the x86 architectural orders to function as documented?
> No. Maybe the comment text is ambiguous, but I don't see how to
> improve it without making it too lengthy: The return value is
> <order>+1, telling the caller to retry with <order>, which is
> guaranteed to be less than the order that got passed in. I.e. taking
> the variable naming above, the caller would have to retry with
> cur_order, which - due to the if() - is smaller than order.

Ah - I see.  The text is indeed confusing.  How about:

"1 + new order: for caller to retry with smaller order (guaranteed to be
smaller than order passed in)"
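
To illustrate how I read that convention from the caller's side, here is a
minimal, self-contained sketch.  It is not the patch's code:
mock_set_entry() and map_with_retry() are hypothetical names, and the
"existing_order" parameter merely stands in for whatever order
p2m->get_entry() would report.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for set_typed_p2m_entry(): pretends the existing
 * mapping at gfn has order existing_order.
 * Returns 0 on success, or (new order + 1) asking the caller to retry.
 */
static int mock_set_entry(unsigned long gfn, unsigned int order,
                          unsigned int existing_order)
{
    (void)gfn;
    if ( existing_order < order )
        return existing_order + 1;
    return 0;
}

/*
 * Caller-side loop: returns 0 or a negative error, and reports the order
 * that was finally used via *used_order.
 */
static int map_with_retry(unsigned long gfn, unsigned int order,
                          unsigned int existing_order,
                          unsigned int *used_order)
{
    for ( ; ; )
    {
        int rc = mock_set_entry(gfn, order, existing_order);

        if ( rc <= 0 )
        {
            if ( rc == 0 )
                *used_order = order;
            return rc;
        }

        /* rc - 1 is the order to retry with; it must have shrunk. */
        assert((unsigned int)rc - 1 < order);
        order = rc - 1;
    }
}
```

Because every retry strictly lowers the order, the loop terminates after
at most as many iterations as the initial order.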

>
>>> +    }
>>>      if ( p2m_is_grant(ot) || p2m_is_foreign(ot) )
>>>      {
>>> -        gfn_unlock(p2m, gfn, 0);
>>> +        gfn_unlock(p2m, gfn, order);
>>>          domain_crash(d);
>>>          return -ENOENT;
>>>      }
>>>      else if ( p2m_is_ram(ot) )
>>>      {
>>> +        unsigned long i;
>>> +
>>>          ASSERT(mfn_valid(omfn));
>> Shouldn't this check be extended to the top of the order?
> Well, yes, perhaps better to move it into ...
>
>>> -        set_gpfn_from_mfn(mfn_x(omfn), INVALID_M2P_ENTRY);
>>> +        for ( i = 0; i < (1UL << order); ++i )
>>> +            set_gpfn_from_mfn(mfn_x(omfn) + i, INVALID_M2P_ENTRY);
> ... the body of the for(). But I'll hold off on v6 until we have settled
> the other aspects you raise.
>
>>>  int set_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn,
>>> -                       p2m_access_t access)
>>> +                       unsigned int order, p2m_access_t access)
>>>  {
>>> -    return set_typed_p2m_entry(d, gfn, mfn, p2m_mmio_direct, access);
>>> +    if ( order &&
>>> +         rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
>>> +                                 mfn_x(mfn) + (1UL << order) - 1) &&
>>> +         !rangeset_contains_range(mmio_ro_ranges, mfn_x(mfn),
>>> +                                  mfn_x(mfn) + (1UL << order) - 1) )
>>> +        return order;
>> Should this not be a hard error?  Even retrying with a lower order is
>> going to fail.
> Why? At the latest when order == 0, rangeset_overlaps_range()
> will return the same as rangeset_contains_range(), and hence
> the condition above will always be false (one of the two reasons
> for checking order first here).

It isn't the order check which is an issue.

One way or another, if the original (mfn/order) fails the rangeset
checks, the overall call is going to fail, but it will be re-executed
repeatedly with an order decreasing to 0.  Wouldn't it be better just to
short-circuit this back and forth?
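
For reference, the condition under discussion can be modelled as below.
This is only a sketch under simplifying assumptions: struct ro_range is a
hypothetical single inclusive range standing in for mmio_ro_ranges, and
overlaps() / contains() are simplified stand-ins for
rangeset_overlaps_range() / rangeset_contains_range(), not the real
rangeset implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-range stand-in for a rangeset: inclusive [s, e]. */
struct ro_range {
    unsigned long s, e;
};

/* True if [s, e] shares at least one frame with the range. */
static bool overlaps(const struct ro_range *r,
                     unsigned long s, unsigned long e)
{
    return s <= r->e && e >= r->s;
}

/* True if [s, e] lies entirely inside the range. */
static bool contains(const struct ro_range *r,
                     unsigned long s, unsigned long e)
{
    return s >= r->s && e <= r->e;
}

/*
 * Mirrors the quoted condition: true when the (mfn, order) block only
 * partially overlaps the read-only range, i.e. when the caller would be
 * told to retry with a smaller order.
 */
static bool needs_smaller_order(const struct ro_range *r,
                                unsigned long mfn, unsigned int order)
{
    unsigned long last = mfn + (1UL << order) - 1;

    return order && overlaps(r, mfn, last) && !contains(r, mfn, last);
}
```

In this model a single frame either lies inside the range or outside it,
so at order 0 overlaps() and contains() agree and the condition cannot be
true.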

Relatedly, is there actually anything wrong with making a superpage
read-only mapping over some scattered read-only 4K pages?

~Andrew