From: Andrew Cooper
Subject: Re: [PATCH] x86/PV: fix unintended dependency of m2p-strict mode on migration-v2
Date: Wed, 13 Jan 2016 15:25:22 +0000
Message-ID: <56966C62.5010201@citrix.com>
In-Reply-To: <5695279002000078000C5F80@prv-mh.provo.novell.com>
To: Jan Beulich
Cc: xen-devel <xen-devel@lists.xenproject.org>, Keir Fraser

On 12/01/16 15:19, Jan Beulich wrote:
>>>> On 12.01.16 at 12:55, wrote:
>> On 12/01/16 10:08, Jan Beulich wrote:
>>> This went unnoticed until a backport of this to an older Xen got used,
>>> causing migration of guests enabling this VM assist to fail, because
>>> page table pinning there precedes vCPU context loading, and hence L4
>>> tables get initialized for the wrong mode. Fix this by post-processing
>>> L4 tables when setting the intended VM assist flags for the guest.
>>>
>>> Note that this leaves in place a dependency on vCPU 0 getting its guest
>>> context restored first, but afaict the logic here is not the only thing
>>> depending on that.
>>>
>>> Signed-off-by: Jan Beulich
>>>
>>> --- a/xen/arch/x86/domain.c
>>> +++ b/xen/arch/x86/domain.c
>>> @@ -1067,8 +1067,48 @@ int arch_set_info_guest(
>>>          goto out;
>>>
>>>      if ( v->vcpu_id == 0 )
>>> +    {
>>>          d->vm_assist = c(vm_assist);
>>>
>>> +        /*
>>> +         * In the restore case we need to deal with L4 pages which got
>>> +         * initialized with m2p_strict still clear (and which hence lack the
>>> +         * correct initial RO_MPT_VIRT_{START,END} L4 entry).
>>> +         */
>>> +        if ( d != current->domain && VM_ASSIST(d, m2p_strict) &&
>>> +             is_pv_domain(d) && !is_pv_32bit_domain(d) &&
>>> +             atomic_read(&d->arch.pv_domain.nr_l4_pages) )
>>> +        {
>>> +            bool_t done = 0;
>>> +
>>> +            spin_lock_recursive(&d->page_alloc_lock);
>>> +
>>> +            for ( i = 0; ; )
>>> +            {
>>> +                struct page_info *page = page_list_remove_head(&d->page_list);
>>> +
>>> +                if ( page_lock(page) )
>>> +                {
>>> +                    if ( (page->u.inuse.type_info & PGT_type_mask) ==
>>> +                         PGT_l4_page_table )
>>> +                        done = !fill_ro_mpt(page_to_mfn(page));
>>> +
>>> +                    page_unlock(page);
>>> +                }
>>> +
>>> +                page_list_add_tail(page, &d->page_list);
>>> +
>>> +                if ( done || (!(++i & 0xff) && hypercall_preempt_check()) )
>>> +                    break;
>>> +            }
>>> +
>>> +            spin_unlock_recursive(&d->page_alloc_lock);
>>> +
>>> +            if ( !done )
>>> +                return -ERESTART;
>> This is a long loop. It is preemptible, but will incur a time delay
>> proportional to the size of the domain during the VM downtime.
>>
>> Could you defer the loop until after %cr3 has been set up, and only
>> enter the loop if the kernel L4 table is missing the RO mappings? That
>> way, domains migrated with migration v2 will skip the loop entirely.
> Well, first of all this would be the result only as long as you or
> someone else don't re-think and possibly move pinning ahead of
> context load again.

A second set_context() will unconditionally hit the loop though.
> Deferring until after CR3 got set up is - afaict - not an option, as
> it would defeat the purpose of m2p-strict mode as much as doing
> the fixup e.g. in the #PF handler. This mode, when enabled, needs to
> strictly mean "L4s start with the slot filled, and user-mode uses
> clear it", as documented.

I just meant a little further down this function, i.e. gate the loop on
whether v->arch.guest_table has inappropriate m2p strictness; something
like the sketch below.
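Completely untested and from memory, so names/API details may not match
staging exactly, but the shape I have in mind is:

    if ( is_pv_domain(d) && !is_pv_32bit_domain(d) &&
         VM_ASSIST(d, m2p_strict) )
    {
        /*
         * Sketch only: look at the RO_MPT_VIRT_START slot of the
         * kernel L4 (which is what fill_ro_mpt() establishes), and
         * only enter the fixup loop if it is missing.
         */
        unsigned long mfn = pagetable_get_pfn(v->arch.guest_table);
        l4_pgentry_t *l4t = map_domain_page(_mfn(mfn));
        bool_t strict_ok =
            !!(l4e_get_flags(l4t[l4_table_offset(RO_MPT_VIRT_START)]) &
               _PAGE_PRESENT);

        unmap_domain_page(l4t);

        if ( !strict_ok )
        {
            /* ... run the preemptible loop from your patch ... */
        }
    }

That way a second set_context(), or a domain whose L4s were already set
up in strict mode (e.g. via a migration-v2 stream), skips the walk
entirely.

~Andrew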