* [PATCH] x86/domctl: Adjust size calculations for XEN_DOMCTL_get{_ext_vcpucontext, vcpuextstate}
From: Andrew Cooper @ 2014-04-28 9:43 UTC
To: Xen-devel
Cc: Ian Jackson, Andrew Cooper, Keir Fraser, Ian Campbell,
Jan Beulich
When teaching valgrind about XEN_DOMCTL_getvcpuextstate, I discovered that
once the hypercall has completed, there is no indication in the structure as
to whether Xen wrote to the buffer or not, as Xen unconditionally clobbers
the size field. While this can be argued to be an issue in valgrind (the
post-action handler does not have the pre-action state to hand), it is
contrary to the prevailing style in Xen of using a NULL guest handle as a
request for size.
This patch explicitly adds support in Xen for a NULL guest handle to
indicate a request for size. None of the in-tree callers have their
behaviour changed by this, and no toolstack should reasonably expect to have
a usable guest handle at 0.
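For illustration, here is a minimal sketch of this probe-then-fetch pattern
from a caller's point of view. The get_vcpuextstate() stub below is invented
for this example (real callers go through libxc and the domctl interface);
only the NULL-handle-means-size-request protocol it models comes from the
patch:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for XEN_DOMCTL_getvcpuextstate.  With this patch,
 * a NULL buffer handle is a request for the required size in bytes. */
static int get_vcpuextstate(uint64_t *buffer, uint64_t *size)
{
    const uint64_t required = 576;      /* stand-in for PV_XSAVE_SIZE(...) */

    if ( buffer == NULL )               /* size query: report, don't write */
    {
        *size = required;
        return 0;
    }
    if ( *size < required )
        return -1;                      /* caller's buffer is too small */

    /* ... Xen would copy the xsave state into the buffer here ... */
    *size = required;
    return 0;
}

int main(void)
{
    uint64_t size = 0;
    uint64_t *buf;

    get_vcpuextstate(NULL, &size);      /* pass 1: NULL handle asks for size */

    buf = malloc(size);                 /* pass 2: allocate and fetch */
    if ( !buf || get_vcpuextstate(buf, &size) )
        return 1;

    printf("fetched %llu bytes of xsave state\n", (unsigned long long)size);
    free(buf);
    return 0;
}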
XEN_DOMCTL_get_ext_vcpucontext suffers from the same issue, but while trying
to fix that in a similar way, I discovered that it had a genuine bug when
returning the count of MSRs to the toolstack: when the hypercall is run on
an active vcpu, the vcpu can arbitrarily alter the count returned to the
toolstack by clearing and setting the relevant MSRs.
As this behaviour is of little use in practice, and because at the time of
writing the implementation of this part of the hypercall is still in staging
without any toolstack users, I am taking this opportunity to fix the
behaviour.
The behaviour is now as follows (a usage sketch follows the list):
* A NULL guest handle for 'msrs' is a request for msr_count.
* A hypercall with insufficient size will fail with -ENOBUFS. The previous
behaviour of being able to fill a partial buffer with some of the MSRs
serves no practical purpose given the unordered nature of the contents.
* After writing MSRs to the buffer, Xen will update the msr_count field
with the number of MSRs written, which may be fewer than the maximum if
some MSRs are empty.
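As a usage sketch of the above (a self-contained model; the stub and all
names are invented for illustration and are not the real hypercall
plumbing):

#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

struct msr_entry { uint32_t index; uint32_t reserved; uint64_t value; };

/* Stub modelling the new XEN_DOMCTL_get_ext_vcpucontext MSR semantics. */
static int get_ext_vcpucontext_msrs(struct msr_entry *msrs,
                                    uint16_t *msr_count)
{
    const uint16_t max_msrs = 4;     /* e.g. the four DBEXT address masks */
    const uint16_t in_use   = 2;     /* pretend only two are non-empty */

    if ( msrs == NULL )              /* NULL handle: report the maximum */
    {
        *msr_count = max_msrs;
        return 0;
    }
    if ( *msr_count < max_msrs )     /* insufficient space: no partial fill */
        return -ENOBUFS;

    /* ... write the in-use MSRs to msrs[0..in_use-1] ... */
    *msr_count = in_use;             /* may be fewer than the maximum */
    return 0;
}

int main(void)
{
    uint16_t count = 0;

    get_ext_vcpucontext_msrs(NULL, &count);          /* probe: count == 4 */

    struct msr_entry *buf = calloc(count, sizeof(*buf));
    if ( !buf || get_ext_vcpucontext_msrs(buf, &count) )
        return 1;

    printf("Xen wrote %u MSR entries\n", count);     /* count == 2 */
    free(buf);
    return 0;
}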
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <JBeulich@suse.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
---
xen/arch/x86/domctl.c | 58 +++++++++++++++++++++++++++----------------
xen/include/public/domctl.h | 16 ++++++------
2 files changed, 46 insertions(+), 28 deletions(-)
diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index ae29a56..cdf5c2b 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -830,6 +830,8 @@ long arch_do_domctl(
if ( domctl->cmd == XEN_DOMCTL_get_ext_vcpucontext )
{
+ uint16_t nr_msrs = 0;
+
if ( v == current ) /* no vcpu_pause() */
break;
@@ -865,41 +867,54 @@ long arch_do_domctl(
evc->vmce.mci_ctl2_bank0 = v->arch.vmce.bank[0].mci_ctl2;
evc->vmce.mci_ctl2_bank1 = v->arch.vmce.bank[1].mci_ctl2;
- i = ret = 0;
+ /* Count maximum number of optional msrs */
if ( boot_cpu_has(X86_FEATURE_DBEXT) )
+ nr_msrs += 4;
+
+ if ( guest_handle_is_null(evc->msrs) ||
+ (evc->msr_count < nr_msrs) )
+ {
+ evc->msr_count = nr_msrs;
+ ret = guest_handle_is_null(evc->msrs) ? 0 : -ENOBUFS;
+ }
+ else
{
- unsigned int j;
+ i = 0;
- if ( v->arch.pv_vcpu.dr_mask[0] )
+ if ( boot_cpu_has(X86_FEATURE_DBEXT) )
{
- if ( i < evc->msr_count && !ret )
+ unsigned int j;
+
+ if ( v->arch.pv_vcpu.dr_mask[0] )
{
msr.index = MSR_AMD64_DR0_ADDRESS_MASK;
msr.reserved = 0;
msr.value = v->arch.pv_vcpu.dr_mask[0];
if ( copy_to_guest_offset(evc->msrs, i, &msr, 1) )
ret = -EFAULT;
+ else
+ ++i;
}
- ++i;
- }
- for ( j = 0; j < 3; ++j )
- {
- if ( !v->arch.pv_vcpu.dr_mask[1 + j] )
- continue;
- if ( i < evc->msr_count && !ret )
+ for ( j = 0; j < 3; ++j )
{
- msr.index = MSR_AMD64_DR1_ADDRESS_MASK + j;
- msr.reserved = 0;
- msr.value = v->arch.pv_vcpu.dr_mask[1 + j];
- if ( copy_to_guest_offset(evc->msrs, i, &msr, 1) )
- ret = -EFAULT;
+ if ( !v->arch.pv_vcpu.dr_mask[1 + j] )
+ continue;
+ if ( !ret )
+ {
+ msr.index = MSR_AMD64_DR1_ADDRESS_MASK + j;
+ msr.reserved = 0;
+ msr.value = v->arch.pv_vcpu.dr_mask[1 + j];
+ if ( copy_to_guest_offset(evc->msrs, i, &msr, 1) )
+ ret = -EFAULT;
+ else
+ ++i;
+ }
}
- ++i;
}
+ evc->msr_count = i;
+ /* Check we didn't lie to userspace then overflow their buffer */
+ BUG_ON(i > nr_msrs);
}
- if ( i > evc->msr_count && !ret )
- ret = -ENOBUFS;
- evc->msr_count = i;
vcpu_unpause(v);
copyback = 1;
@@ -1177,7 +1192,8 @@ long arch_do_domctl(
{
unsigned int size = PV_XSAVE_SIZE(v->arch.xcr0_accum);
- if ( !evc->size && !evc->xfeature_mask )
+ if ( (evc->size == 0 && evc->xfeature_mask == 0) ||
+ guest_handle_is_null(domctl->u.vcpuextstate.buffer) )
{
evc->xfeature_mask = xfeature_mask;
evc->size = size;
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 1b75ab2..e39e436 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -594,11 +594,13 @@ struct xen_domctl_ext_vcpucontext {
uint8_t syscall32_disables_events;
uint8_t sysenter_disables_events;
/*
- * When, for the "get" version, msr_count is too small to cover all MSRs
- * the hypervisor needs to be saved, the call will return -ENOBUFS and
- * set msr_count to the required (minimum) value. Furthermore, for both
- * "get" and "set", that field as well as the msrs one only get looked at
- * if the size field above covers the structure up to the entire msrs one.
+ * Number of msr entries pointed to by the 'msrs' guest handle.
+ *
+ * For the "get" subop, if the guest handle is NULL, Xen shall write back
+ * the maximum number of msrs it might save. If msr_count is fewer than
+ * the maximum, Xen shall return -ENOBUFS. When Xen actually writes into
+ * the buffer, this field shall reflect the actual number of msrs written
+ * which might be fewer than the maximum, if some MSRs are not in use.
*/
uint16_t msr_count;
#if defined(__GNUC__)
@@ -868,8 +870,8 @@ struct xen_domctl_vcpuextstate {
*/
uint64_aligned_t xfeature_mask;
/*
- * SET: Size of struct (IN)
- * GET: Size of struct (IN/OUT)
+ * Size (in bytes) of 'buffer'. For the "get" subop, if the guest handle
+ * is NULL, Xen shall write back the maximum size required.
*/
uint64_aligned_t size;
XEN_GUEST_HANDLE_64(uint64) buffer;
--
1.7.10.4
* Re: [PATCH] x86/domctl: Adjust size calculations for XEN_DOMCTL_get{_ext_vcpucontext, vcpuextstate}
From: Jan Beulich @ 2014-04-28 10:34 UTC
To: Andrew Cooper; +Cc: Keir Fraser, Ian Jackson, Ian Campbell, Xen-devel
>>> On 28.04.14 at 11:43, <andrew.cooper3@citrix.com> wrote:
> XEN_DOMCTL_get_ext_vcpucontext suffers from the same issue but while trying to
> fix that in similar way, I discovered that it had a genuine bug when returning
> the count of MSRs to the toolstack. When running the hypercall on an active
> vcpu, the vcpu can arbitrarily alter the count returned to the toolstack by
> clearing and setting relevant MSRs.
Did you perhaps overlook the vcpu_pause() there?
> @@ -865,41 +867,54 @@ long arch_do_domctl(
> evc->vmce.mci_ctl2_bank0 = v->arch.vmce.bank[0].mci_ctl2;
> evc->vmce.mci_ctl2_bank1 = v->arch.vmce.bank[1].mci_ctl2;
>
> - i = ret = 0;
> + /* Count maximum number of optional msrs */
> if ( boot_cpu_has(X86_FEATURE_DBEXT) )
> + nr_msrs += 4;
> +
> + if ( guest_handle_is_null(evc->msrs) ||
> + (evc->msr_count < nr_msrs) )
> + {
> + evc->msr_count = nr_msrs;
> + ret = guest_handle_is_null(evc->msrs) ? 0 : -ENOBUFS;
> + }
Won't this, with the migration part still not implemented in libxc,
result in guests using any of these getting migrated with a non-
zero MSR count, rather than the migration failing on the sender
side?
I'm also not really in favor of forcing the tools to allocate memory
for the array if in fact no MSRs are being used by the guest.
> @@ -1177,7 +1192,8 @@ long arch_do_domctl(
> {
> unsigned int size = PV_XSAVE_SIZE(v->arch.xcr0_accum);
>
> - if ( !evc->size && !evc->xfeature_mask )
> + if ( (evc->size == 0 && evc->xfeature_mask == 0) ||
> + guest_handle_is_null(domctl->u.vcpuextstate.buffer) )
Any strong reason not to keep the shorter ! operators?
> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -594,11 +594,13 @@ struct xen_domctl_ext_vcpucontext {
> uint8_t syscall32_disables_events;
> uint8_t sysenter_disables_events;
> /*
> - * When, for the "get" version, msr_count is too small to cover all MSRs
> - * the hypervisor needs to be saved, the call will return -ENOBUFS and
> - * set msr_count to the required (minimum) value. Furthermore, for both
> - * "get" and "set", that field as well as the msrs one only get looked at
> - * if the size field above covers the structure up to the entire msrs one.
> + * Number of msr entries pointed to by the 'msrs' guest handle.
> + *
> + * For the "get" subop, if the guest handle is NULL, Xen shall write back
> + * the maximum number of msrs it might save. If msr_count is fewer than
I think there's "If the handle is non-NULL" missing at the beginning of
the second sentence.
Jan
> + * the maximum, Xen shall return -ENOBUFS. When Xen actually writes into
> + * the buffer, this field shall reflect the actual number of msrs written
> + * which might be fewer than the maximum, if some MSRs are not in use.
* Re: [PATCH] x86/domctl: Adjust size calculations for XEN_DOMCTL_get{_ext_vcpucontext, vcpuextstate}
From: Andrew Cooper @ 2014-04-28 10:59 UTC
To: Jan Beulich; +Cc: Keir Fraser, Ian Jackson, Ian Campbell, Xen-devel
On 28/04/14 11:34, Jan Beulich wrote:
>>>> On 28.04.14 at 11:43, <andrew.cooper3@citrix.com> wrote:
>> XEN_DOMCTL_get_ext_vcpucontext suffers from the same issue but while trying to
>> fix that in similar way, I discovered that it had a genuine bug when returning
>> the count of MSRs to the toolstack. When running the hypercall on an active
>> vcpu, the vcpu can arbitrarily alter the count returned to the toolstack by
>> clearing and setting relevant MSRs.
> Did you perhaps overlook the vcpu_pause() there?
There is a vcpu pause in the hypercall, so for the duration of the
hypercall the returned value will be consistent.
However, without the toolstack pausing the domain, issuing this hypercall
twice (first to get the size, and second to get the data) might still
result in -ENOBUFS if the vcpu suddenly writes non-zero values to the MSRs.
>
>> @@ -865,41 +867,54 @@ long arch_do_domctl(
>> evc->vmce.mci_ctl2_bank0 = v->arch.vmce.bank[0].mci_ctl2;
>> evc->vmce.mci_ctl2_bank1 = v->arch.vmce.bank[1].mci_ctl2;
>>
>> - i = ret = 0;
>> + /* Count maximum number of optional msrs */
>> if ( boot_cpu_has(X86_FEATURE_DBEXT) )
>> + nr_msrs += 4;
>> +
>> + if ( guest_handle_is_null(evc->msrs) ||
>> + (evc->msr_count < nr_msrs) )
>> + {
>> + evc->msr_count = nr_msrs;
>> + ret = guest_handle_is_null(evc->msrs) ? 0 : -ENOBUFS;
>> + }
> Won't this, with the migration part still not implemented in libxc,
> result in guests using any of these getting migrated with a non-
> zero MSR count, rather than the migration failing on the sender
> side?
Hmm. As this lack of msrs is only transitory, I could patch
xc_domain_save() to explicitly fail if it finds msr_count != 0.
The current state of play with migration v2 is that we have PV and HVM
migration working and I am currently collating the work into a series
for submission.
>
> I'm also not really in favor of forcing the tools to allocate memory
> for the array if in fact no MSRs are being used by the guest.
If there are no msrs to receive, then passing a NULL guest handle is
still fine.
>
>> @@ -1177,7 +1192,8 @@ long arch_do_domctl(
>> {
>> unsigned int size = PV_XSAVE_SIZE(v->arch.xcr0_accum);
>>
>> - if ( !evc->size && !evc->xfeature_mask )
>> + if ( (evc->size == 0 && evc->xfeature_mask == 0) ||
>> + guest_handle_is_null(domctl->u.vcpuextstate.buffer) )
> And strong reason not to keep the shorter ! operators?
I personally find it clearer than using !, but can change back.
>
>> --- a/xen/include/public/domctl.h
>> +++ b/xen/include/public/domctl.h
>> @@ -594,11 +594,13 @@ struct xen_domctl_ext_vcpucontext {
>> uint8_t syscall32_disables_events;
>> uint8_t sysenter_disables_events;
>> /*
>> - * When, for the "get" version, msr_count is too small to cover all MSRs
>> - * the hypervisor needs to be saved, the call will return -ENOBUFS and
>> - * set msr_count to the required (minimum) value. Furthermore, for both
>> - * "get" and "set", that field as well as the msrs one only get looked at
>> - * if the size field above covers the structure up to the entire msrs one.
>> + * Number of msr entries pointed to by the 'msrs' guest handle.
>> + *
>> + * For the "get" subop, if the guest handle is NULL, Xen shall write back
>> + * the maximum number of msrs it might save. If msr_count is fewer than
> I think there's "If the handle is non-NULL" missing at the beginning of
> the second sentence.
>
> Jan
Ok - that is more clear.
~Andrew
>
>> + * the maximum, Xen shall return -ENOBUFS. When Xen actually writes into
>> + * the buffer, this field shall reflect the actual number of msrs written
>> + * which might be fewer than the maximum, if some MSRs are not in use.
>
>
* Re: [PATCH] x86/domctl: Adjust size calculations for XEN_DOMCTL_get{_ext_vcpucontext, vcpuextstate}
From: Jan Beulich @ 2014-04-28 11:37 UTC
To: Andrew Cooper; +Cc: Keir Fraser, Ian Jackson, Ian Campbell, Xen-devel
>>> On 28.04.14 at 12:59, <andrew.cooper3@citrix.com> wrote:
> On 28/04/14 11:34, Jan Beulich wrote:
>>>>> On 28.04.14 at 11:43, <andrew.cooper3@citrix.com> wrote:
>>> XEN_DOMCTL_get_ext_vcpucontext suffers from the same issue but while trying to
>>> fix that in similar way, I discovered that it had a genuine bug when returning
>>> the count of MSRs to the toolstack. When running the hypercall on an active
>>> vcpu, the vcpu can arbitrarily alter the count returned to the toolstack by
>>> clearing and setting relevant MSRs.
>> Did you perhaps overlook the vcpu_pause() there?
>
> There is a vcpu pause in the hypercall, so for the duration of the
> hypercall the returned value will be consistent.
>
> However without the toolstack pausing the domain, issuing this hypercall
> twice, first to get the size and second to get the data might still
> result in -ENOBUFS if the vcpu suddenly writes non-0 values to the MSRs.
And in what way is this different from e.g. XEN_DOMCTL_get_vcpuextstate?
>> I'm also not really in favor of forcing the tools to allocate memory
>> for the array if in fact no MSRs are being used by the guest.
>
> If there are no msrs to receive, then passing a NULL guest handle is
> still fine.
But the caller can't know whether the count was non-zero just because
that's the theoretical maximum or because some MSR really is in use.
Jan
* Re: [PATCH] x86/domctl: Adjust size calculations for XEN_DOMCTL_get{_ext_vcpucontext, vcpuextstate}
From: Andrew Cooper @ 2014-04-28 12:26 UTC
To: Jan Beulich; +Cc: Keir Fraser, Ian Jackson, Ian Campbell, Xen-devel
On 28/04/14 12:37, Jan Beulich wrote:
>>>> On 28.04.14 at 12:59, <andrew.cooper3@citrix.com> wrote:
>> On 28/04/14 11:34, Jan Beulich wrote:
>>>>>> On 28.04.14 at 11:43, <andrew.cooper3@citrix.com> wrote:
>>>> XEN_DOMCTL_get_ext_vcpucontext suffers from the same issue but while trying to
>>>> fix that in similar way, I discovered that it had a genuine bug when returning
>>>> the count of MSRs to the toolstack. When running the hypercall on an active
>>>> vcpu, the vcpu can arbitrarily alter the count returned to the toolstack by
>>>> clearing and setting relevant MSRs.
>>> Did you perhaps overlook the vcpu_pause() there?
>> There is a vcpu pause in the hypercall, so for the duration of the
>> hypercall the returned value will be consistent.
>>
>> However without the toolstack pausing the domain, issuing this hypercall
>> twice, first to get the size and second to get the data might still
>> result in -ENOBUFS if the vcpu suddenly writes non-0 values to the MSRs.
> And in what way is this different from e.g. XEN_DOMCTL_get_vcpuextstate?
As xcr0_accum is strictly increasing, and grows only in a few possible
steps, the size returned can never decrease. As it is context-switch
material, the chances are very good that it will reach the maximum the
guest kernel is willing to use long before migration happens.
But this does highlight a slight race condition in the migration code
which I will fix up in the v2 series.
>
>>> I'm also not really in favor of forcing the tools to allocate memory
>>> for the array if in fact no MSRs are being used by the guest.
>> If there are no msrs to receive, then passing a NULL guest handle is
>> still fine.
> But the caller can't know whether the count was non-zero just because
> that's the theoretical maximum or because some MSR really is in use.
>
> Jan
>
Why is that a problem?
If the toolstack wants to save any possible MSRs the guest is using,
then it is going to have to provide a buffer large enough for any
eventual number of MSRs. In the case that the buffer is sufficiently
sized, Xen writes back msr_count with the number of MSRs written, so the
toolstack can detect when fewer MSRs have been written back.
~Andrew
* Re: [PATCH] x86/domctl: Adjust size calculations for XEN_DOMCTL_get{_ext_vcpucontext, vcpuextstate}
From: Jan Beulich @ 2014-04-28 13:45 UTC
To: Andrew Cooper; +Cc: Keir Fraser, Ian Jackson, Ian Campbell, Xen-devel
>>> On 28.04.14 at 14:26, <andrew.cooper3@citrix.com> wrote:
> On 28/04/14 12:37, Jan Beulich wrote:
>>>>> On 28.04.14 at 12:59, <andrew.cooper3@citrix.com> wrote:
>>> On 28/04/14 11:34, Jan Beulich wrote:
>>>>>>> On 28.04.14 at 11:43, <andrew.cooper3@citrix.com> wrote:
>>>>> XEN_DOMCTL_get_ext_vcpucontext suffers from the same issue but while trying
> to
>>>>> fix that in similar way, I discovered that it had a genuine bug when
> returning
>>>>> the count of MSRs to the toolstack. When running the hypercall on an active
>>>>> vcpu, the vcpu can arbitrarily alter the count returned to the toolstack by
>>>>> clearing and setting relevant MSRs.
>>>> Did you perhaps overlook the vcpu_pause() there?
>>> There is a vcpu pause in the hypercall, so for the duration of the
>>> hypercall the returned value will be consistent.
>>>
>>> However without the toolstack pausing the domain, issuing this hypercall
>>> twice, first to get the size and second to get the data might still
>>> result in -ENOBUFS if the vcpu suddenly writes non-0 values to the MSRs.
>> And in what way is this different from e.g. XEN_DOMCTL_get_vcpuextstate?
>
> As xcr0_accum is strictly increasing and only in a few possible steps,
> the size returned can never decrease. As it is context switch material,
> the chances are very good that it will reach the maximum the guest
> kernel is willing to use a long time before migration happens.
Chances you say. But we need guarantees, or rely on the tool stack
knowing to re-issue such requests upon certain kinds of failures (or
accept that migration may not work occasionally, with a retry helping).
>>>> I'm also not really in favor of forcing the tools to allocate memory
>>>> for the array if in fact no MSRs are being used by the guest.
>>> If there are no msrs to receive, then passing a NULL guest handle is
>>> still fine.
>> But the caller can't know whether the count was non-zero just because
>> that's the theoretical maximum or because some MSR really is in use.
>
> Why is that a problem?
The problem is with the first half of your earlier reply: "If there are
no msrs to receive ..." - the caller just can't tell this with your change
in place.
> If the toolstack wants to save any possible MSRs the guest is using,
> then it is going to have to provide a buffer large enough for any
> eventual number of MSRs. In the case that the buffer is sufficiently
> sized, Xen writes back msr_count with the number of MSRs written, so the
> toolstack can detect when fewer MSRs have been written back.
In the end all I want to be assured is that migration would fail at the
sending side if there are MSRs that need transmitting.
Jan
* Re: [PATCH] x86/domctl: Adjust size calculations for XEN_DOMCTL_get{_ext_vcpucontext, vcpuextstate}
From: Andrew Cooper @ 2014-04-28 13:53 UTC
To: Jan Beulich; +Cc: Keir Fraser, Ian Jackson, Ian Campbell, Xen-devel
On 28/04/14 14:45, Jan Beulich wrote:
>>>> On 28.04.14 at 14:26, <andrew.cooper3@citrix.com> wrote:
>> On 28/04/14 12:37, Jan Beulich wrote:
>>>>>> On 28.04.14 at 12:59, <andrew.cooper3@citrix.com> wrote:
>>>> On 28/04/14 11:34, Jan Beulich wrote:
>>>>>>>> On 28.04.14 at 11:43, <andrew.cooper3@citrix.com> wrote:
>>>>>> XEN_DOMCTL_get_ext_vcpucontext suffers from the same issue but while trying
>> to
>>>>>> fix that in similar way, I discovered that it had a genuine bug when
>> returning
>>>>>> the count of MSRs to the toolstack. When running the hypercall on an active
>>>>>> vcpu, the vcpu can arbitrarily alter the count returned to the toolstack by
>>>>>> clearing and setting relevant MSRs.
>>>>> Did you perhaps overlook the vcpu_pause() there?
>>>> There is a vcpu pause in the hypercall, so for the duration of the
>>>> hypercall the returned value will be consistent.
>>>>
>>>> However without the toolstack pausing the domain, issuing this hypercall
>>>> twice, first to get the size and second to get the data might still
>>>> result in -ENOBUFS if the vcpu suddenly writes non-0 values to the MSRs.
>>> And in what way is this different from e.g. XEN_DOMCTL_get_vcpuextstate?
>> As xcr0_accum is strictly increasing and only in a few possible steps,
>> the size returned can never decrease. As it is context switch material,
>> the chances are very good that it will reach the maximum the guest
>> kernel is willing to use a long time before migration happens.
> Chances you say. But we need guarantees, or rely on the tool stack
> knowing to re-issue such requests upon certain kinds of failures (or
> accept that migration may not work occasionally, with a retry helping).
Yes - that is the fix I intend to use: in the case that -EINVAL is returned
and the size is now larger, realloc the buffer to the new size and retry.
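A minimal sketch of that retry loop, with a stub standing in for the
hypercall (all names here are invented for illustration; the stub simulates
the guest growing its xsave state between the size query and the fetch,
which the real hypercall reports as -EINVAL):

#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

static uint64_t cur_size = 512;         /* stub state: current xsave size */
static int grow_once = 1;               /* simulate one race with the guest */

/* Hypothetical stand-in for the hypercall: a NULL handle queries the size,
 * and a stale size on a real fetch fails with -EINVAL. */
static int get_vcpuextstate(void *buf, uint64_t *size)
{
    if ( buf && grow_once )             /* guest grows xcr0_accum mid-save */
    {
        grow_once = 0;
        cur_size = 832;
    }
    if ( buf == NULL )                  /* NULL handle: size query */
    {
        *size = cur_size;
        return 0;
    }
    if ( *size != cur_size )
        return -EINVAL;                 /* stale size: re-query and retry */
    return 0;
}

int main(void)
{
    uint64_t size = 0;
    void *buf = NULL;
    int rc;

    do {
        get_vcpuextstate(NULL, &size);  /* (re)query the required size */

        void *tmp = realloc(buf, size); /* grow to the reported size */
        if ( !tmp )
        {
            free(buf);
            return 1;
        }
        buf = tmp;

        rc = get_vcpuextstate(buf, &size);
    } while ( rc == -EINVAL );          /* raced with the guest: retry */

    printf("saved %llu bytes after retrying\n", (unsigned long long)size);
    free(buf);
    return rc ? 1 : 0;
}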
>
>>>>> I'm also not really in favor of forcing the tools to allocate memory
>>>>> for the array if in fact no MSRs are being used by the guest.
>>>> If there are no msrs to receive, then passing a NULL guest handle is
>>>> still fine.
>>> But the caller can't know whether the count was non-zero just because
>>> that's the theoretical maximum or because some MSR really is in use.
>> Why is that a problem?
> The problem is with the first half of your earlier reply: "If there are
> no msrs to receive ..." - the caller just can't tell this with your change
> in place.
>
>> If the toolstack wants to save any possible MSRs the guest is using,
>> then it is going to have to provide a buffer large enough for any
>> eventual number of MSRs. In the case that the buffer is sufficiently
>> sized, Xen writes back msr_count with the number of MSRs written, so the
>> toolstack can detect when fewer MSRs have been written back.
> In the end all I want to be assured is that migration would fail at the
> sending side if there are MSRs that need transmitting.
>
> Jan
>
Ah, I see. Given the sole caller in xc_domain_save(), I will add a hunk in
v2 which explicitly fails the migration if MSRs would need transmitting,
making this safe for the short period before proper MSR transmission can be
added.
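Something like the following (a hypothetical sketch of that hunk, with
trimmed-down stand-ins for the real types and the usual libxc ERROR logging
macro):

#include <stdint.h>
#include <stdio.h>

/* Stand-ins for the real definitions used by xc_domain_save(). */
struct ext_vcpucontext { uint16_t msr_count; /* ... */ };
#define ERROR(fmt, ...) fprintf(stderr, fmt "\n", ##__VA_ARGS__)

/* Called after fetching the ext_vcpucontext for each vcpu during save. */
static int check_msrs(unsigned int vcpu, const struct ext_vcpucontext *evc)
{
    if ( evc->msr_count != 0 )
    {
        ERROR("Unable to migrate: vcpu%u has %u MSRs needing transmission",
              vcpu, evc->msr_count);
        return -1;    /* fail the save on the sending side */
    }
    return 0;
}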
~Andrew