From mboxrd@z Thu Jan  1 00:00:00 1970
From: David Vrabel <david.vrabel@citrix.com>
Subject: Re: [PATCHv6 2/2] x86/ept: defer the invalidation until
 the p2m lock is released
Date: Tue, 22 Dec 2015 14:20:01 +0000
Message-ID: <56795C11.4050204@citrix.com>
References: <1450446634-8762-1-git-send-email-david.vrabel@citrix.com>
	<1450446634-8762-3-git-send-email-david.vrabel@citrix.com>
	<567940A4.6040900@citrix.com> <567957A8.6020702@citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xen.org>
Received: from mail6.bemta4.messagelabs.com ([85.158.143.247])
	by lists.xen.org with esmtp (Exim 4.72)
	(envelope-from <prvs=791ab8ba9=david.vrabel@citrix.com>)
	id 1aBNnL-0006yc-51
	for xen-devel@lists.xenproject.org; Tue, 22 Dec 2015 14:20:07 +0000
In-Reply-To: <567957A8.6020702@citrix.com>
List-Unsubscribe: <http://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <http://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Andrew Cooper <andrew.cooper3@citrix.com>, George Dunlap <george.dunlap@citrix.com>, David Vrabel <david.vrabel@citrix.com>, xen-devel@lists.xenproject.org
Cc: George Dunlap <george.dunlap@eu.citrix.com>, Kevin Tian <kevin.tian@intel.com>, Tim Deegan <tim@xen.org>, Jun Nakajima <jun.nakajima@intel.com>, Jan Beulich <jbeulich@suse.com>
List-Id: xen-devel@lists.xenproject.org

On 22/12/15 14:01, Andrew Cooper wrote:
> On 22/12/15 12:23, George Dunlap wrote:
>> On 18/12/15 13:50, David Vrabel wrote:
>>> Holding the p2m lock while calling ept_sync_domain() is very expensive
>>> since it does a on_selected_cpus() call.  IPIs on many socket machines
>>> can be very slows and on_selected_cpus() is serialized.
>>>
>>> It is safe to defer the invalidate until the p2m lock is released
>>> except for two cases:
>>>
>>> 1. When freeing a page table page (since partial translations may be
>>>    cached).
>>> 2. When reclaiming a zero page as part of PoD.
>>>
>>> For these cases, add p2m_tlb_flush_sync() calls which will immediately
>>> perform the invalidate before the page is freed or reclaimed.
>> There are at least two other places in the PoD code where the "remove ->
>> check -> add to cache -> unlock" pattern exist; and it looks to me like
>> there are other places where races might occur (e.g.,
>> p2m_paging_evict(), which does remove -> scrub -> put -> unlock;
>> p2m_altp2m_propagate_change(), which does remove -> put -> unlock).
>>
>> Part of me wonders whether, rather than making callers that need it
>> remember to do a flush, it might be better to explicitly pass in
>> P2M_FLUSH or P2M_CAN_DEFER when calling p2m_set_entry, just to make
>> people think about the fact that the p2m change may not actually take
>> effect until later.  Any thoughts on that?
>>
>> Comments on the current approach inline.
>>
>>> diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
>>> index c094320..43c7f1b 100644
>>> --- a/xen/arch/x86/mm/p2m-ept.c
>>> +++ b/xen/arch/x86/mm/p2m-ept.c
>>> @@ -263,6 +263,7 @@ static void ept_free_entry(struct p2m_domain *p2m, ept_entry_t *ept_entry, int l
>>>          unmap_domain_page(epte);
>>>      }
>>>      
>>> +    p2m_tlb_flush_sync(p2m);
>>>      p2m_free_ptp(p2m, mfn_to_page(ept_entry->mfn));
>> It's probably worth a comment here pointing out that even if this
>> function is called several times (e.g., if you replace a load of 4k
>> entries with a 1G entry), the actual flush will only happen the first time.
>>
>>> +static void ept_flush_and_unlock(struct p2m_domain *p2m, bool_t unlock)
>>> +{
>>> +    p2m->need_flush = 0;
>>> +    if ( unlock )
>>> +        mm_write_unlock(&p2m->lock);
>>> +    ept_sync_domain_mask(p2m, p2m->domain->domain_dirty_cpumask);
>>>  }
>> Having a function called "flush_and_unlock", with a boolean as to
>> whether to unlock or not, just seems a bit wonky.
>>
>> Wouldn't it make more sense to have the hook just named "flush_sync()",
>> and move the unlocking out in the generic p2m code (where you already
>> have the check for need_flush)?
>>
>>> diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
>>> index fa46dd9..9c394c2 100644
>>> --- a/xen/include/asm-x86/p2m.h
>>> +++ b/xen/include/asm-x86/p2m.h
>>> @@ -261,6 +261,10 @@ struct p2m_domain {
>>>                                            unsigned long gfn, l1_pgentry_t *p,
>>>                                            l1_pgentry_t new, unsigned int level);
>>>      long               (*audit_p2m)(struct p2m_domain *p2m);
>>> +    void               (*flush_and_unlock)(struct p2m_domain *p2m, bool_t unlock);
>>> +
>>> +    unsigned int defer_flush;
>>> +    bool_t need_flush;
>> It's probably worth a comment that at the moment calling
>> flush_and_unlock() is gated on need_flush; so it's OK not to implement
>> flush_and_unlock() as long as you never set need_flush.
> 
> This is just one small accident (in code elsewhere) away from a code
> injection vulnerability.
> 
> Either we should require that all function pointers are filled in, or
> BUG() if the pointer is missing when we attempt to use it.

Jan asked for the call to be conditional on need_flush and to not test
flush_and_unlock.

David