From: Daniel De Graaf
Subject: Re: (4.5-rc1) Problems using xl migrate
Date: Mon, 24 Nov 2014 17:05:22 -0500
Message-ID: <5473ABA2.6080901@tycho.nsa.gov>
References: <20141124124143.GA11483@zion.uk.xensource.com> <54732F8E.4060507@citrix.com> <547343F4.80509@citrix.com>
To: M A Young, Andrew Cooper
Cc: Wei Liu, Ian Campbell, xen-devel@lists.xen.org

On 11/24/2014 03:12 PM, M A Young wrote:
> On Mon, 24 Nov 2014, Andrew Cooper wrote:
>> On 24/11/14 14:32, M A Young wrote:
>>> On Mon, 24 Nov 2014, Andrew Cooper wrote:
>>>> On 24/11/14 12:41, Wei Liu wrote:
>>>>> On Sat, Nov 22, 2014 at 07:24:21PM +0000, M A Young wrote:
>>>>>> While investigating a bug reported on Red Hat Bugzilla
>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1166461
>>>>>> I discovered the following
>>>>>>
>>>>>> xl migrate --debug domid localhost does indeed fail for Xen 4.4 pv
>>>>>> (the bug report is for Xen 4.3 hvm) when xl migrate domid localhost
>>>>>> works. There are actually two issues here
>>>>>>
>>>>>> * the segfault in libxl-save-helper --restore-domain (as reported in
>>>>>> the bug above) occurs if the guest memory is 1024M (on my 4G box) and
>>>>>> is presumably because the allocated memory eventually runs out
>>>>>>
>>>>>> * the segfault doesn't occur if the guest memory is 128M, but the
>>>>>> migration still fails. The first attached file contains the log from
>>>>>> a run with xl -v migrate --debug domid localhost (with mfn and
>>>>>> duplicated lines stripped out to make the size manageable).
>>>>>>
>>>>>> I then tried xen 4.5-rc1 to see if the bug was fixed and found that xl
>>>>>> migrate doesn't work for me at all - see the second attached file for
>>>>>> the output of xl -v migrate domid localhost.
>>>>>>
>>>>>> Michael Young
>>>>> [...]
>>>>>> xc: detail: delta 15801ms, dom0 95%, target 0%, sent 543Mb/s, dirtied 0Mb/s 314 pages
>>>>>> xc: detail: Mapping order 0, 268; first pfn 3fcf4
>>>>>> xc: detail: delta 23ms, dom0 100%, target 0%, sent 447Mb/s, dirtied 0Mb/s 0 pages
>>>>>> xc: detail: Start last iteration
>>>>>> xc: Reloading memory pages: 262213/262144 100%xc: detail: SUSPEND shinfo 00082fbc
>>>>>> xc: detail: delta 17ms, dom0 58%, target 58%, sent 0Mb/s, dirtied 1033Mb/s 536 pages
>>>>>> xc: detail: delta 8ms, dom0 100%, target 0%, sent 2195Mb/s, dirtied 2195Mb/s 536 pages
>>>>>> xc: detail: Total pages sent= 262749 (1.00x)
>>>>>> xc: detail: (of which 0 were fixups)
>>>>>> xc: detail: All memory is saved
>>>>>> xc: error: Error querying maximum number of MSRs for VCPU0 (1 = Operation not permitted): Internal error
>>>>> Per your description this is the output of "xl -v migrate domid
>>>>> localhost", so no "--debug" is involved. (Just to make sure...)
>>>>>
>>>>> This error message means a domctl fails, which should be addressed
>>>>> first?
>>>>>
>>>>> FWIW I tried "xl -v migrate domid localhost" for a PV guest and it
>>>>> worked for me. :-(
>>>>>
>>>>> Is there anything I need to do to trigger this failure?
>>>>
>>>> Is XSM in use? I can't think of any other reason why that hypercall
>>>> would fail with EPERM.
>>>
>>> XSM is built in (I wanted to allow the option of people using it) but
>>> I didn't think it was active.
>>>
>>> Michael Young
>>
>> I don't believe there is any concept of "available but not active",
>> which probably means that the default policy is missing an entry for
>> this hypercall.
>>
>> Can you check the hypervisor console around this failure and see whether
>> a flask error concerning domctl 72 is reported?
>
> I do. The error is
> (XEN) flask_domctl: Unknown op 72
>
> Incidentally, Flask is running in permissive mode.
>
> Michael Young
>

This means that the new domctl needs to be added to the switch statement
in flask/hooks.c. The error is triggered even in permissive mode because
it is a code error rather than a policy error (policy errors are what
permissive mode is intended to debug).

It looks like neither XEN_DOMCTL_get_vcpu_msrs nor XEN_DOMCTL_set_vcpu_msrs
has a FLASK hook. Andrew, did you want to add these, since you introduced
the ops? Unless you can think of a reason to split the access, I think it
makes sense to reuse the permissions used for
XEN_DOMCTL_{get,set}_ext_vcpucontext.

--
Daniel De Graaf
National Security Agency