* [PATCH RFC] libxc: Protect xc_domain_resume from clobbering domain registers
@ 2014-05-17 16:01 Jason Andryuk
2014-05-19 9:37 ` Andrew Cooper
2014-05-19 11:35 ` Ian Jackson
0 siblings, 2 replies; 6+ messages in thread
From: Jason Andryuk @ 2014-05-17 16:01 UTC
To: xen-devel; +Cc: Jason Andryuk, Ian Jackson, Ian Campbell, Stefano Stabellini
xc_domain_resume() expects the guest to be in state SHUTDOWN_suspend.
However, nothing verifies the state before modify_returncode() modifies
the domain's registers. This will crash guest processes or the kernel
itself.

This can be demonstrated with `LIBXL_SAVE_HELPER=/bin/false xl migrate`.

Signed-off-by: Jason Andryuk <andryuk@aero.org>
---
This change stops xc_domain_resume from killing my domUs on a failed
migration. I'm using a wrapper around libxl-save-helper which may fail
before libxl-save-helper is invoked, so xc_domain_save has not been
called. The idle Linux domU kernels would BUG coming out of
SCHEDOP_block in xen_safe_halt() since modify_returncode set EAX to 1.
journald was also observed to segfault.

As written, this code treats calling xc_domain_resume on a running
domain as an error. Do we want it silently ignored? Output with this
patch looks like:

"""
Migration failed, resuming at sender.
xc: error: Domain not in suspended state: Internal error
libxl: error: libxl.c:402:libxl__domain_resume: xc_domain_resume failed for domain 92: Interrupted system call
"""

libxl__domain_resume prints errno, but it is stale for this case.
xc_domain_resume_cooperative could swallow modify_returncode's error,
bypass issuing XEN_DOMCTL_resumedomain, and return success to avoid the
libxl error message.
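
A small sketch of the stale-errno effect described above. This is not
libxc or libxl code: `fake_resume` and `report_resume` are hypothetical
stand-ins that only model a callee which returns failure without setting
errno, and a caller which reports errno anyway.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in: fails for a non-suspended domain but, like the
 * new check in this patch, leaves errno untouched. */
static int fake_resume(int domain_is_suspended)
{
    return domain_is_suspended ? 0 : 1;
}

/* Hypothetical caller that reports errno on any failure, in the style of
 * libxl__domain_resume. Formats a message into buf and returns the errno
 * value it reported (0 on success). */
static int report_resume(int domain_is_suspended, char *buf, size_t len)
{
    int rc = fake_resume(domain_is_suspended);
    int saved_errno = errno;   /* whatever an earlier call left behind */

    if (rc == 0) {
        snprintf(buf, len, "resumed");
        return 0;
    }
    snprintf(buf, len, "resume failed: %s", strerror(saved_errno));
    return saved_errno;
}
```

If errno happened to hold EINTR from some earlier call, the message
blames an interrupted system call even though none occurred during the
resume attempt.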
---
tools/libxc/xc_resume.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/tools/libxc/xc_resume.c b/tools/libxc/xc_resume.c
index 18b4818..9ec6a59 100644
--- a/tools/libxc/xc_resume.c
+++ b/tools/libxc/xc_resume.c
@@ -39,6 +39,12 @@ static int modify_returncode(xc_interface *xch, uint32_t domid)
         return -1;
     }
 
+    if ( !info.shutdown || (info.shutdown_reason != SHUTDOWN_suspend) )
+    {
+        ERROR("Domain not in suspended state");
+        return 1;
+    }
+
     if ( info.hvm )
     {
         /* HVM guests without PV drivers have no return code to modify. */
--
1.8.3.1
* Re: [PATCH RFC] libxc: Protect xc_domain_resume from clobbering domain registers
2014-05-17 16:01 [PATCH RFC] libxc: Protect xc_domain_resume from clobbering domain registers Jason Andryuk
@ 2014-05-19 9:37 ` Andrew Cooper
2014-05-19 9:44 ` Andrew Cooper
2014-05-19 12:07 ` Jason Andryuk
2014-05-19 11:35 ` Ian Jackson
1 sibling, 2 replies; 6+ messages in thread
From: Andrew Cooper @ 2014-05-19 9:37 UTC
To: Jason Andryuk; +Cc: Stefano Stabellini, Ian Jackson, Ian Campbell, xen-devel
On 17/05/14 17:01, Jason Andryuk wrote:
> xc_domain_resume() expects the guest to be in state SHUTDOWN_suspend.
> However, nothing verifies the state before modify_returncode() modifies
> the domain's registers. This will crash guest processes or the kernel
> itself.
>
> This can be demonstrated with `LIBXL_SAVE_HELPER=/bin/false xl migrate`.
>
> Signed-off-by: Jason Andryuk <andryuk@aero.org>
Hmm.

There is no possible way whatsoever that migration can work if a PV
guest is not in SHUTDOWN_suspend. PV guests have to leave an MFN in edx
which the toolstack rewrites with a new MFN on resume.

By default, there is no need for knowledge from the HVM guest for
migrate. XenServer is perfectly capable of migrating HVM VMs without PV
drivers. I suspect therefore that we never use cooperative resume.

This cooperative resume which modifies guest register state therefore
imposes the same SHUTDOWN_suspend restriction on HVM guests as it does
for PV guests. As a result, your patch below is correct as a fallback
safety measure, and should be taken.

However the caller of modify_returncode is also at fault for attempting
to resume an already-running domain. I think there needs to be a bugfix
there as well. I presume that some piece of code is assuming that
despite libxl-save-helper failing, xc_domain_safe() paused the guest,
which is clearly not true in this case.
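
One possible shape for such a caller-side fix, as a minimal sketch and
not actual libxl code. `struct model_dominfo`, `MODEL_SHUTDOWN_SUSPEND`
and `choose_recovery` are hypothetical names; the struct is a reduced
stand-in for the fields the patch reads from the domain info, and the
value 2 for the suspend reason is an assumption local to this example.

```c
#include <stdbool.h>

/* Local stand-in for SHUTDOWN_suspend; the value is an assumption made
 * for this sketch only. */
#define MODEL_SHUTDOWN_SUSPEND 2u

/* Hypothetical, reduced view of the domain info fields used by the
 * check in the patch. */
struct model_dominfo {
    bool shutdown;
    unsigned int shutdown_reason;
};

enum recovery_action {
    RECOVERY_SKIP_RESUME,   /* guest never suspended: leave it alone */
    RECOVERY_RESUME         /* guest is suspended: resuming is appropriate */
};

/* Decide, after a failed migration, whether the sender should resume
 * the guest at all. */
static enum recovery_action choose_recovery(const struct model_dominfo *info)
{
    if (info->shutdown && info->shutdown_reason == MODEL_SHUTDOWN_SUSPEND)
        return RECOVERY_RESUME;
    return RECOVERY_SKIP_RESUME;
}
```

With a guard like this in the caller, the check inside
modify_returncode remains only as a last line of defence.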
~Andrew
> ---
>
> This change stops xc_domain_resume from killing my domUs on a failed
> migration. I'm using a wrapper around libxl-save-helper which may fail
> before libxl-save-helper is invoked, so xc_domain_save has not been
> called. The idle Linux domU kernels would BUG coming out of
> SCHEDOP_block in xen_safe_halt() since modify_returncode set EAX to 1.
> journald was also observed to segfault.
>
> As written, this code treats calling xc_domain_resume on a running
> domain as an error. Do we want it silently ignored? Output with this
> patch looks like:
>
> """
> Migration failed, resuming at sender.
> xc: error: Domain not in suspended state: Internal error
> libxl: error: libxl.c:402:libxl__domain_resume: xc_domain_resume failed for domain 92: Interrupted system call
> """
>
> libxl__domain_resume prints errno, but it is stale for this case.
> xc_domain_resume_cooperative could swallow modify_returncode's error,
> bypass issuing XEN_DOMCTL_resumedomain, and return success to avoid the
> libxl error message.
>
> ---
> tools/libxc/xc_resume.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/tools/libxc/xc_resume.c b/tools/libxc/xc_resume.c
> index 18b4818..9ec6a59 100644
> --- a/tools/libxc/xc_resume.c
> +++ b/tools/libxc/xc_resume.c
> @@ -39,6 +39,12 @@ static int modify_returncode(xc_interface *xch, uint32_t domid)
>          return -1;
>      }
>
> +    if ( !info.shutdown || (info.shutdown_reason != SHUTDOWN_suspend) )
> +    {
> +        ERROR("Domain not in suspended state");
> +        return 1;
> +    }
> +
>      if ( info.hvm )
>      {
>          /* HVM guests without PV drivers have no return code to modify. */
* Re: [PATCH RFC] libxc: Protect xc_domain_resume from clobbering domain registers
2014-05-19 9:37 ` Andrew Cooper
@ 2014-05-19 9:44 ` Andrew Cooper
2014-05-19 12:07 ` Jason Andryuk
1 sibling, 0 replies; 6+ messages in thread
From: Andrew Cooper @ 2014-05-19 9:44 UTC
To: Jason Andryuk; +Cc: xen-devel, Ian Jackson, Ian Campbell, Stefano Stabellini
On 19/05/14 10:37, Andrew Cooper wrote:
> On 17/05/14 17:01, Jason Andryuk wrote:
>> xc_domain_resume() expects the guest to be in state SHUTDOWN_suspend.
>> However, nothing verifies the state before modify_returncode() modifies
>> the domain's registers. This will crash guest processes or the kernel
>> itself.
>>
>> This can be demonstrated with `LIBXL_SAVE_HELPER=/bin/false xl migrate`.
>>
>> Signed-off-by: Jason Andryuk <andryuk@aero.org>
> Hmm.
>
> There is no possible way whatsoever that migration can work if a PV
> guest is not in SHUTDOWN_suspend. PV guests have to leave an MFN in edx
> which the toolstack rewrites with a new MFN on resume.
>
> By default, there is no need for knowledge from the HVM guest for
> migrate. XenServer is perfectly capable of migrating HVM VMs without PV
> drivers. I suspect therefore that we never use cooperative resume.
>
> This cooperative resume which modifies guest register state therefore
> imposes the same SHUTDOWN_suspend restriction on HVM guests as it does
> for PV guests. As a result, your patch below is correct as a fallback
> safety measure, and should be taken.
>
> However the caller of modify_returncode is also at fault for attempting
> to resume an already-running domain. I think there needs to be a bugfix
> there as well. I presume that some piece of code is assuming that
> despite libxl-save-helper failing, xc_domain_safe() paused the guest,
> which is clearly not true in this case.
>
> ~Andrew
And here, I actually mean xc_domain_save()
~Andrew
>
>> ---
>>
>> This change stops xc_domain_resume from killing my domUs on a failed
>> migration. I'm using a wrapper around libxl-save-helper which may fail
>> before libxl-save-helper is invoked, so xc_domain_save has not been
>> called. The idle Linux domU kernels would BUG coming out of
>> SCHEDOP_block in xen_safe_halt() since modify_returncode set EAX to 1.
>> journald was also observed to segfault.
>>
>> As written, this code treats calling xc_domain_resume on a running
>> domain as an error. Do we want it silently ignored? Output with this
>> patch looks like:
>>
>> """
>> Migration failed, resuming at sender.
>> xc: error: Domain not in suspended state: Internal error
>> libxl: error: libxl.c:402:libxl__domain_resume: xc_domain_resume failed for domain 92: Interrupted system call
>> """
>>
>> libxl__domain_resume prints errno, but it is stale for this case.
>> xc_domain_resume_cooperative could swallow modify_returncode's error,
>> bypass issuing XEN_DOMCTL_resumedomain, and return success to avoid the
>> libxl error message.
>>
>> ---
>> tools/libxc/xc_resume.c | 6 ++++++
>> 1 file changed, 6 insertions(+)
>>
>> diff --git a/tools/libxc/xc_resume.c b/tools/libxc/xc_resume.c
>> index 18b4818..9ec6a59 100644
>> --- a/tools/libxc/xc_resume.c
>> +++ b/tools/libxc/xc_resume.c
>> @@ -39,6 +39,12 @@ static int modify_returncode(xc_interface *xch, uint32_t domid)
>>          return -1;
>>      }
>>
>> +    if ( !info.shutdown || (info.shutdown_reason != SHUTDOWN_suspend) )
>> +    {
>> +        ERROR("Domain not in suspended state");
>> +        return 1;
>> +    }
>> +
>>      if ( info.hvm )
>>      {
>>          /* HVM guests without PV drivers have no return code to modify. */
* Re: [PATCH RFC] libxc: Protect xc_domain_resume from clobbering domain registers
2014-05-17 16:01 [PATCH RFC] libxc: Protect xc_domain_resume from clobbering domain registers Jason Andryuk
2014-05-19 9:37 ` Andrew Cooper
@ 2014-05-19 11:35 ` Ian Jackson
1 sibling, 0 replies; 6+ messages in thread
From: Ian Jackson @ 2014-05-19 11:35 UTC
To: Jason Andryuk; +Cc: Stefano Stabellini, Ian Campbell, xen-devel
Jason Andryuk writes ("[PATCH RFC] libxc: Protect xc_domain_resume from clobbering domain registers"):
> xc_domain_resume() expects the guest to be in state SHUTDOWN_suspend.
> However, nothing verifies the state before modify_returncode() modifies
> the domain's registers. This will crash guest processes or the kernel
> itself.
I think this patch is a step in the right direction and would have
applied it if you hadn't tagged it "RFC".
Thanks,
Ian.
* Re: [PATCH RFC] libxc: Protect xc_domain_resume from clobbering domain registers
2014-05-19 9:37 ` Andrew Cooper
2014-05-19 9:44 ` Andrew Cooper
@ 2014-05-19 12:07 ` Jason Andryuk
2014-05-19 12:26 ` Andrew Cooper
1 sibling, 1 reply; 6+ messages in thread
From: Jason Andryuk @ 2014-05-19 12:07 UTC
To: Andrew Cooper; +Cc: xen-devel, Ian Jackson, Ian Campbell, Stefano Stabellini
On 5/19/2014 5:37 AM, Andrew Cooper wrote:
> On 17/05/14 17:01, Jason Andryuk wrote:
>> xc_domain_resume() expects the guest to be in state SHUTDOWN_suspend.
>> However, nothing verifies the state before modify_returncode() modifies
>> the domain's registers. This will crash guest processes or the kernel
>> itself.
>>
>> This can be demonstrated with `LIBXL_SAVE_HELPER=/bin/false xl migrate`.
>>
>> Signed-off-by: Jason Andryuk <andryuk@aero.org>
>
> Hmm.
>
> There is no possible way whatsoever that migration can work if a PV
> guest is not in SHUTDOWN_suspend. PV guests have to leave an MFN in edx
> which the toolstack rewrites with a new MFN on resume.
>
> By default, there is no need for knowledge from the HVM guest for
> migrate. XenServer is perfectly capable of migrating HVM VMs without PV
> drivers. I suspect therefore that we never use cooperative resume.
I've only used 64-bit PV domUs, so I haven't really thought about HVM. If info.shutdown_reason == SHUTDOWN_suspend is expected for all HVM cases, then the hunk can stand. Otherwise it should be moved later after HVM without PV drivers has exited.
> This cooperative resume which modifies guest register state therefore
> imposes the same SHUTDOWN_suspend restriction on HVM guests as it does
> for PV guests. As a result, your patch below is correct as a fallback
> safety measure, and should be taken.
>
> However the caller of modify_returncode is also at fault for attempting
> to resume an already-running domain. I think there needs to be a bugfix
> there as well. I presume that some piece of code is assuming that
> despite libxl-save-helper failing, xc_domain_safe() paused the guest,
> which is clearly not true in this case.
Agreed. modify_returncode was already making the call to xc_domain_info (and doing the damage), so adding a check there was easy.
The patch was posted as an RFC because I was looking for guidance: is calling xc_domain_resume on a running domain an error, or should it be treated as success? The original modify_returncode returns 0 on success or -1 on error. This patch returns 1 for the already-running case. The caller could handle that value differently to bypass XEN_DOMCTL_resumedomain without returning an error.
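
The two options can be sketched as follows. This is only a model of the
decision, not the real xc_domain_resume_cooperative: the enum names,
`cooperative_resume_model`, and the `running_is_success` switch are all
hypothetical, and `issue_domctl` merely records whether the caller would
go on to issue XEN_DOMCTL_resumedomain.

```c
#include <stdbool.h>

/* Return values of modify_returncode() as described in this thread. */
enum {
    MRC_OK = 0,             /* registers updated */
    MRC_ERROR = -1,         /* genuine failure */
    MRC_NOT_SUSPENDED = 1   /* new: domain was not in SHUTDOWN_suspend */
};

/* Hypothetical model of the caller. Returns 0 for success and -1 for
 * failure; *issue_domctl is set only when the resume domctl should
 * follow. */
static int cooperative_resume_model(int modify_rc, bool running_is_success,
                                    bool *issue_domctl)
{
    *issue_domctl = false;

    switch (modify_rc) {
    case MRC_OK:
        *issue_domctl = true;
        return 0;
    case MRC_NOT_SUSPENDED:
        /* Never touch a running domain; only the reported result
         * differs between the two policies. */
        return running_is_success ? 0 : -1;
    default:
        return -1;
    }
}
```

In both policies the domctl is skipped for a running domain; the only
open question is what the caller reports upward.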
-Jason
> ~Andrew
>
>> ---
>>
>> This change stops xc_domain_resume from killing my domUs on a failed
>> migration. I'm using a wrapper around libxl-save-helper which may fail
>> before libxl-save-helper is invoked, so xc_domain_save has not been
>> called. The idle Linux domU kernels would BUG coming out of
>> SCHEDOP_block in xen_safe_halt() since modify_returncode set EAX to 1.
>> journald was also observed to segfault.
>>
>> As written, this code treats calling xc_domain_resume on a running
>> domain as an error. Do we want it silently ignored? Output with this
>> patch looks like:
>>
>> """
>> Migration failed, resuming at sender.
>> xc: error: Domain not in suspended state: Internal error
>> libxl: error: libxl.c:402:libxl__domain_resume: xc_domain_resume failed for domain 92: Interrupted system call
>> """
>>
>> libxl__domain_resume prints errno, but it is stale for this case.
>> xc_domain_resume_cooperative could swallow modify_returncode's error,
>> bypass issuing XEN_DOMCTL_resumedomain, and return success to avoid the
>> libxl error message.
>>
>> ---
>> tools/libxc/xc_resume.c | 6 ++++++
>> 1 file changed, 6 insertions(+)
>>
>> diff --git a/tools/libxc/xc_resume.c b/tools/libxc/xc_resume.c
>> index 18b4818..9ec6a59 100644
>> --- a/tools/libxc/xc_resume.c
>> +++ b/tools/libxc/xc_resume.c
>> @@ -39,6 +39,12 @@ static int modify_returncode(xc_interface *xch, uint32_t domid)
>>          return -1;
>>      }
>>
>> +    if ( !info.shutdown || (info.shutdown_reason != SHUTDOWN_suspend) )
>> +    {
>> +        ERROR("Domain not in suspended state");
>> +        return 1;
>> +    }
>> +
>>      if ( info.hvm )
>>      {
>>          /* HVM guests without PV drivers have no return code to modify. */
* Re: [PATCH RFC] libxc: Protect xc_domain_resume from clobbering domain registers
2014-05-19 12:07 ` Jason Andryuk
@ 2014-05-19 12:26 ` Andrew Cooper
0 siblings, 0 replies; 6+ messages in thread
From: Andrew Cooper @ 2014-05-19 12:26 UTC
To: Jason Andryuk; +Cc: xen-devel, Ian Jackson, Ian Campbell, Stefano Stabellini
On 19/05/14 13:07, Jason Andryuk wrote:
> On 5/19/2014 5:37 AM, Andrew Cooper wrote:
>> On 17/05/14 17:01, Jason Andryuk wrote:
>>> xc_domain_resume() expects the guest to be in state SHUTDOWN_suspend.
>>> However, nothing verifies the state before modify_returncode() modifies
>>> the domain's registers. This will crash guest processes or the kernel
>>> itself.
>>>
>>> This can be demonstrated with `LIBXL_SAVE_HELPER=/bin/false xl migrate`.
>>>
>>> Signed-off-by: Jason Andryuk <andryuk@aero.org>
>> Hmm.
>>
>> There is no possible way whatsoever that migration can work if a PV
>> guest is not in SHUTDOWN_suspend. PV guests have to leave an MFN in edx
>> which the toolstack rewrites with a new MFN on resume.
>>
>> By default, there is no need for knowledge from the HVM guest for
>> migrate. XenServer is perfectly capable of migrating HVM VMs without PV
>> drivers. I suspect therefore that we never use cooperative resume.
> I've only used 64-bit PV domUs, so I haven't really thought about HVM. If info.shutdown_reason == SHUTDOWN_suspend is expected for all HVM cases, then the hunk can stand. Otherwise it should be moved later after HVM without PV drivers has exited.
I disagree. It is unconditionally an error to be in this function with
a guest which is not in SHUTDOWN_suspend, even if there is a codepath
through the function which would not corrupt state. I would leave the
hunk as it stands, although you might consider setting errno to EINVAL
(or something more appropriate).
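
A minimal sketch of that suggestion, assuming EINVAL is the chosen
value. `struct sketch_dominfo`, `SKETCH_SHUTDOWN_SUSPEND` and
`require_suspended` are hypothetical names used only here; the real
check lives inside modify_returncode and reads the libxc domain info.

```c
#include <errno.h>
#include <stdbool.h>

/* Local stand-in for SHUTDOWN_suspend; the value is an assumption made
 * for this sketch only. */
#define SKETCH_SHUTDOWN_SUSPEND 2u

/* Hypothetical, reduced view of the domain info fields the check uses. */
struct sketch_dominfo {
    bool shutdown;
    unsigned int shutdown_reason;
};

/* Same condition as the hunk, but errno is set on the failure path so
 * that a caller printing errno reports something meaningful. Returns 1
 * for a non-suspended domain (as the patch does) and 0 otherwise. */
static int require_suspended(const struct sketch_dominfo *info)
{
    if (!info->shutdown ||
        info->shutdown_reason != SKETCH_SHUTDOWN_SUSPEND) {
        errno = EINVAL;
        return 1;
    }
    return 0;   /* errno deliberately left untouched on success */
}
```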
>
>> This cooperative resume which modifies guest register state therefore
>> imposes the same SHUTDOWN_suspend restriction on HVM guests as it does
>> for PV guests. As a result, your patch below is correct as a fallback
>> safety measure, and should be taken.
>>
>> However the caller of modify_returncode is also at fault for attempting
>> to resume an already-running domain. I think there needs to be a bugfix
>> there as well. I presume that some piece of code is assuming that
>> despite libxl-save-helper failing, xc_domain_safe() paused the guest,
>> which is clearly not true in this case.
> Agreed. modify_returncode was already making the call to xc_domain_info (and doing the damage), so adding a check there was easy.
>
> The patch was posted as an RFC because I was looking for guidance: is calling xc_domain_resume on a running domain an error, or should it be treated as success? The original modify_returncode returns 0 on success or -1 on error. This patch returns 1 for the already-running case. The caller could handle that value differently to bypass XEN_DOMCTL_resumedomain without returning an error.
>
> -Jason
I still think the real problem is higher up. The toolstack absolutely
shouldn't be running xc_domain_resume() on a running domain.
~Andrew
Thread overview: 6 messages
2014-05-17 16:01 [PATCH RFC] libxc: Protect xc_domain_resume from clobbering domain registers Jason Andryuk
2014-05-19 9:37 ` Andrew Cooper
2014-05-19 9:44 ` Andrew Cooper
2014-05-19 12:07 ` Jason Andryuk
2014-05-19 12:26 ` Andrew Cooper
2014-05-19 11:35 ` Ian Jackson