From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: Xen-devel <xen-devel@lists.xen.org>
Subject: Re: [PATCH 2/6] x86/domctl: Fix TOCTOU race with the use of XEN_DOMCTL_getvcpuextstate
Date: Mon, 12 Sep 2016 14:09:17 +0100 [thread overview]
Message-ID: <89cd0ea4-27b2-57bc-a21e-a47f930198d5@citrix.com> (raw)
In-Reply-To: <57D6BCCA020000780010E05F@prv-mh.provo.novell.com>
On 12/09/16 13:33, Jan Beulich wrote:
>>>> On 12.09.16 at 14:02, <andrew.cooper3@citrix.com> wrote:
>> On 12/09/16 12:17, Jan Beulich wrote:
>>>>>> On 12.09.16 at 11:51, <andrew.cooper3@citrix.com> wrote:
>>>> A toolstack must call XEN_DOMCTL_getvcpuextstate twice; first to find the
>> size
>>>> of the buffer to use, and a second time to get the actual content.
>>>>
>>>> The reported size was based on v->arch.xcr0_accum, but a guest which extends
>>>> its xcr0_accum between the two hypercalls will cause the toolstack to fail
>> the
>>>> evc->size != size check, as the provided buffer is now too small. This causes
>>>> a hard error during the final phase of migration.
>>>>
>>>> Instead, return return a size based on xfeature_mask, which is the maximum
>>>> size Xen will ever permit. The hypercall must now tolerate a
>>>> toolstack-provided buffer which is overly large (for the case where a guest
>>>> isn't using all available xsave states), and should write back how much data
>>>> was actually written into the buffer.
>>> To be honest, I'm of two minds here. Part of me thinks this is an
>>> okay change. However, in particular ...
>>>
>>>> --- a/xen/arch/x86/domctl.c
>>>> +++ b/xen/arch/x86/domctl.c
>>>> @@ -1054,19 +1054,25 @@ long arch_do_domctl(
>>>> unsigned int size;
>>>>
>>>> ret = 0;
>>>> - vcpu_pause(v);
>>>>
>>>> - size = PV_XSAVE_SIZE(v->arch.xcr0_accum);
>>>> if ( (!evc->size && !evc->xfeature_mask) ||
>>>> guest_handle_is_null(evc->buffer) )
>>>> {
>>>> + /*
>>>> + * A query for the size of buffer to use. Must return the
>>>> + * maximum size we ever might hand back to userspace,
>> bearing
>>>> + * in mind that the vcpu might increase its xcr0_accum
>> between
>>>> + * this query for size, and the following query for data.
>>>> + */
>>>> evc->xfeature_mask = xfeature_mask;
>>>> - evc->size = size;
>>>> - vcpu_unpause(v);
>>>> + evc->size = PV_XSAVE_SIZE(xfeature_mask);
>>>> goto vcpuextstate_out;
>>>> }
>>>>
>>>> - if ( evc->size != size || evc->xfeature_mask != xfeature_mask )
>>>> + vcpu_pause(v);
>>>> + size = PV_XSAVE_SIZE(v->arch.xcr0_accum);
>>>> +
>>>> + if ( evc->size < size || evc->xfeature_mask != xfeature_mask )
>>> ... the relaxation from != to < looks somewhat fragile to me, going
>>> forward.
>> What is fragile about it? It is a very common pattern in general, and
>> we use it elsewhere.
> If we get handed too large a buffer, what meaning do you want to
> assign to the extra bytes? Them being meaningless is just one
> possible interpretation.
I am confused, and I think you are too. There is no meaning to the
bytes; this is getvcpuextstate, not setvcpuextstate.
All it means is that Xen doesn't need to use all the buffer space
provided by the toolstack to write the current state into.
It is exactly the same concept as read(fd, buf, 4096) returning 1024.
Only some of the potential buffer was actually filled by the call.
>
>>> Did you consider dealing with the issue in the tool stack?
>> Yes, and rejected doing so.
>>
>>> It can't be that hard to repeat the size query in case data retrieval fails.
>> In principle, this is true if the size case was distinguishable from all
>> the other cases which currently return -EINVAL.
> It easily is, as long as the caller know what size it used on the most
> recent earlier call (since it would then notice it having grown).
>
>>>> @@ -1103,6 +1109,10 @@ long arch_do_domctl(
>>>> }
>>>>
>>>> vcpu_unpause(v);
>>>> +
>>>> + /* Specify how much data we actually wrote into the buffer. */
>>>> + if ( !ret )
>>>> + evc->size = size;
>>> ... if this got written on the earlier error path, there wouldn't even
>>> be a need to retry the size query: Data retrieval could be retried
>>> with the new size right after enlarging the buffer.
>> True, but userspace hypercalls are expensive for the domain in question,
>> and take a global spinlock in Xen which is expensive for the system as a
>> whole. Looking forward to PVH control domains, any hypercall will be
>> far more expensive (a vmexit/entry vs syscall/sysret).
>>
>> As such, I am not happy introducing unnecessary hypercalls, as they are
>> entirely detrimental to dom0 and Xen.
> Let's be realistic: How often do you expect xcr0_accum to change
> in practice between the size query and the data retrieval?
Very very slim, but not 0.
> I'm perfectly fine fixing the theoretical issue, but I don't think there's
> a performance aspect to be considered when deciding how to deal
> with it.
This doesn't mean we should fix it in an inefficient way.
~Andrew
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
next prev parent reply other threads:[~2016-09-12 13:09 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-09-12 9:51 [PATCH 0/6] Fix multiple issues with xsave state handling on migrate Andrew Cooper
2016-09-12 9:51 ` [PATCH 1/6] x86/domctl: Introduce PV_XSAVE_HDR_SIZE and remove its opencoding Andrew Cooper
2016-09-12 11:05 ` Jan Beulich
2016-09-12 9:51 ` [PATCH 2/6] x86/domctl: Fix TOCTOU race with the use of XEN_DOMCTL_getvcpuextstate Andrew Cooper
2016-09-12 11:17 ` Jan Beulich
2016-09-12 12:02 ` Andrew Cooper
2016-09-12 12:33 ` Jan Beulich
2016-09-12 13:09 ` Andrew Cooper [this message]
2016-09-12 13:35 ` Jan Beulich
2016-09-12 13:37 ` Jan Beulich
2016-09-12 9:51 ` [PATCH 3/6] x86/domctl: Simplfy XEN_DOMCTL_getvcpuextstate when xsave is not in use Andrew Cooper
2016-09-12 11:26 ` Jan Beulich
2016-09-12 9:51 ` [PATCH 4/6] x86/xstate: Fix latent bugs in expand_xsave_states() Andrew Cooper
2016-09-12 11:41 ` Jan Beulich
2016-09-12 12:29 ` Andrew Cooper
2016-09-12 12:41 ` Jan Beulich
2016-09-12 12:43 ` Jan Beulich
2016-09-12 13:57 ` Andrew Cooper
2016-09-12 14:13 ` Jan Beulich
2016-09-12 9:51 ` [PATCH 5/6] x86/domctl: Fix migration of guests which are not using xsave Andrew Cooper
2016-09-12 12:14 ` Jan Beulich
2016-09-12 12:46 ` Andrew Cooper
2016-09-12 13:41 ` Jan Beulich
2016-09-12 9:51 ` [PATCH 6/6] x86/xstate: Fix latent bugs in compress_xsave_states() Andrew Cooper
2016-09-12 12:27 ` Jan Beulich
2016-09-12 12:59 ` Andrew Cooper
2016-09-12 13:47 ` Jan Beulich
2016-09-12 15:28 ` Andrew Cooper
2016-09-12 16:10 ` Jan Beulich
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=89cd0ea4-27b2-57bc-a21e-a47f930198d5@citrix.com \
--to=andrew.cooper3@citrix.com \
--cc=JBeulich@suse.com \
--cc=xen-devel@lists.xen.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).