From: Jason Andryuk <andryuk@aero.org>
To: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: xen-devel@lists.xen.org, Ian Jackson <ian.jackson@eu.citrix.com>,
Ian Campbell <ian.campbell@citrix.com>,
Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Subject: Re: [PATCH RFC] libxc: Protect xc_domain_resume from clobbering domain registers
Date: Mon, 19 May 2014 08:07:10 -0400
Message-ID: <5379F3EE.4050809@aero.org>
In-Reply-To: <5379D0C7.6020309@citrix.com>
On 5/19/2014 5:37 AM, Andrew Cooper wrote:
> On 17/05/14 17:01, Jason Andryuk wrote:
>> xc_domain_resume() expects the guest to be in state SHUTDOWN_suspend.
>> However, nothing verifies the state before modify_returncode() modifies
>> the domain's registers. This will crash guest processes or the kernel
>> itself.
>>
>> This can be demonstrated with `LIBXL_SAVE_HELPER=/bin/false xl migrate`.
>>
>> Signed-off-by: Jason Andryuk <andryuk@aero.org>
>
> Hmm.
>
> There is no possible way whatsoever that migration can work if a PV
> guest is not in SHUTDOWN_suspend. PV guests have to leave an MFN in edx
> which the toolstack rewrites with a new MFN on resume.
>
> By default, there is no need for knowledge from the HVM guest for
> migrate. XenServer is perfectly capable of migrating HVM VMs without PV
> drivers. I suspect therefore that we never use cooperative resume.
I've only used 64-bit PV domUs, so I haven't really thought about HVM. If info.shutdown_reason == SHUTDOWN_suspend is expected in all HVM cases, then the hunk can stay where it is. Otherwise the check should move later, after the path for HVM guests without PV drivers has returned.
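To make the alternative placement concrete, here is a minimal standalone sketch with the suspend check moved after the early return for HVM guests without PV drivers. It is not the libxc code: struct fake_info, its has_pv_drivers flag, and fake_check_after_hvm_exit() are hypothetical stand-ins so that the ordering can be compiled and tried in isolation.

```c
/* Hypothetical snapshot of the fields the check looks at (not xc_dominfo_t). */
struct fake_info {
    int hvm;             /* non-zero for an HVM guest */
    int has_pv_drivers;  /* hypothetical flag: HVM guest has PV drivers loaded */
    int shutdown;        /* non-zero if the domain is shut down */
    int suspend_reason;  /* non-zero if the shutdown reason is SHUTDOWN_suspend */
};

/*
 * Returns 0 when there is nothing to modify or when modifying the return
 * code would be safe, and 1 when the domain is not suspended.
 */
int fake_check_after_hvm_exit(const struct fake_info *info)
{
    /* HVM without PV drivers: no return code to modify, so leave early. */
    if (info->hvm && !info->has_pv_drivers)
        return 0;

    /* Every remaining case must be suspended before registers are touched. */
    if (!info->shutdown || !info->suspend_reason)
        return 1;

    return 0;
}
```

With this ordering, a running HVM guest without PV drivers yields 0, while a running PV guest (or a running HVM guest with PV drivers) yields 1.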
> This cooperative resume which modifies guest register state therefore
> imposes the same SHUTDOWN_suspend restriction on HVM guests as it does
> for PV guests. As a result, your patch below is correct as a fallback
> safety measure, and should be taken.
>
> However the caller of modify_returncode is also at fault for attempting
> to resume an already-running domain. I think there needs to be a bugfix
> there as well. I presume that some piece of code is assuming that
> despite libxl-save-helper failing, xc_domain_save() paused the guest,
> which is clearly not true in this case.
Agreed. modify_returncode was already making the call to xc_domain_getinfo (and doing the damage), so adding a check there was easy.
The patch was posted as an RFC because I was looking for guidance: is calling xc_domain_resume on a running domain an error, or should it be treated as success? The original modify_returncode returns 0 on success or -1 on error; this patch returns 1 for the already-running case. The caller could handle that value differently, bypassing XEN_DOMCTL_resumedomain without returning an error.
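The "treat it as success" option could look roughly like the sketch below. This is an untested illustration under stated assumptions, not the libxc implementation: the fake_* functions, struct fake_domain, and the MRC_* constants are all hypothetical stand-ins for modify_returncode, the XEN_DOMCTL_resumedomain call, and their caller.

```c
/* Hypothetical result codes mirroring the patched modify_returncode(). */
enum { MRC_ERROR = -1, MRC_OK = 0, MRC_NOT_SUSPENDED = 1 };

/* Stand-in for domain state; not a real libxc type. */
struct fake_domain {
    int suspended;      /* 1 if shut down with reason SHUTDOWN_suspend */
    int resume_issued;  /* counts simulated XEN_DOMCTL_resumedomain calls */
};

/* Stub: reports 1 instead of touching registers when not suspended. */
int fake_modify_returncode(const struct fake_domain *d)
{
    if (!d->suspended)
        return MRC_NOT_SUSPENDED;
    return MRC_OK;  /* the real code would set the guest's return register here */
}

/* Stub for issuing the resume domctl. */
int fake_resumedomain(struct fake_domain *d)
{
    d->resume_issued++;
    d->suspended = 0;
    return 0;
}

/* Caller: a running domain counts as success and the domctl is skipped. */
int fake_resume_cooperative(struct fake_domain *d)
{
    int rc = fake_modify_returncode(d);

    if (rc < 0)
        return rc;  /* genuine failure is propagated */
    if (rc == MRC_NOT_SUSPENDED)
        return 0;   /* already running: nothing to resume */
    return fake_resumedomain(d);
}
```

Calling fake_resume_cooperative() on a running domain returns 0 without issuing the resume, while a suspended domain goes through the resume path as before. Whether a silent success is the right policy is exactly the open question.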
-Jason
> ~Andrew
>
>> ---
>>
>> This change stops xc_domain_resume from killing my domUs on a failed
>> migration. I'm using a wrapper around libxl-save-helper which may fail
>> before libxl-save-helper is invoked, so xc_domain_save has not been
>> called. The idle Linux domU kernels would BUG coming out of
>> SCHEDOP_block in xen_safe_halt() since modify_returncode set EAX to 1.
>> journald was also observed to segfault.
>>
>> As written, this code treats calling xc_domain_resume on a running
>> domain as an error. Do we want it silently ignored? Output with this
>> patch looks like:
>>
>> """
>> Migration failed, resuming at sender.
>> xc: error: Domain not in suspended state: Internal error
>> libxl: error: libxl.c:402:libxl__domain_resume: xc_domain_resume failed for domain 92: Interrupted system call
>> """
>>
>> libxl__domain_resume prints errno, but it is stale for this case.
>> xc_domain_resume_cooperative could swallow modify_returncode's error,
>> bypass issuing XEN_DOMCTL_resumedomain, and return success to avoid the
>> libxl error message.
>>
>> ---
>> tools/libxc/xc_resume.c | 6 ++++++
>> 1 file changed, 6 insertions(+)
>>
>> diff --git a/tools/libxc/xc_resume.c b/tools/libxc/xc_resume.c
>> index 18b4818..9ec6a59 100644
>> --- a/tools/libxc/xc_resume.c
>> +++ b/tools/libxc/xc_resume.c
>> @@ -39,6 +39,12 @@ static int modify_returncode(xc_interface *xch, uint32_t domid)
>>          return -1;
>>      }
>>
>> +    if ( !info.shutdown || (info.shutdown_reason != SHUTDOWN_suspend) )
>> +    {
>> +        ERROR("Domain not in suspended state");
>> +        return 1;
>> +    }
>> +
>>      if ( info.hvm )
>>      {
>>          /* HVM guests without PV drivers have no return code to modify. */
>
Thread overview: 6+ messages
2014-05-17 16:01 [PATCH RFC] libxc: Protect xc_domain_resume from clobbering domain registers Jason Andryuk
2014-05-19 9:37 ` Andrew Cooper
2014-05-19 9:44 ` Andrew Cooper
2014-05-19 12:07 ` Jason Andryuk [this message]
2014-05-19 12:26 ` Andrew Cooper
2014-05-19 11:35 ` Ian Jackson