From: George Dunlap <george.dunlap@eu.citrix.com>
To: Andrew Cooper <andrew.cooper3@citrix.com>,
Julien Grall <julien.grall@linaro.org>,
Ian Campbell <Ian.Campbell@citrix.com>
Cc: jgross@suse.com,
Stefano Stabellini <stefano.stabellini@eu.citrix.com>,
Dario Faggioli <dario.faggioli@citrix.com>,
Tim Deegan <tim@xen.org>,
george.dunlap@citrix.com, xen-devel <xen-devel@lists.xen.org>
Subject: Re: Xen crashing when killing a domain with no VCPUs allocated
Date: Mon, 21 Jul 2014 11:49:18 +0100
Message-ID: <53CCF02E.7000607@eu.citrix.com>
In-Reply-To: <53CCEEA3.5080305@citrix.com>
On 07/21/2014 11:42 AM, Andrew Cooper wrote:
> On 21/07/14 11:33, George Dunlap wrote:
>> On 07/18/2014 09:26 PM, Julien Grall wrote:
>>> On 18/07/14 17:39, Ian Campbell wrote:
>>>> On Fri, 2014-07-18 at 14:27 +0100, Julien Grall wrote:
>>>>> Hi all,
>>>>>
>>>>> I've been playing with the function alloc_vcpu on ARM, and I hit one
>>>>> case where this function can fail.
>>>>>
>>>>> During domain creation, the toolstack will call DOMCTL_max_vcpus, which
>>>>> may fail, for instance because alloc_vcpu didn't succeed. In this case,
>>>>> the toolstack will call DOMCTL_domaindestroy, and I got the stack trace
>>>>> below.
>>>>>
>>>>> It can be reproduced on Xen 4.5 (and I also suspect Xen 4.4) by
>>>>> returning an error in vcpu_initialize.
>>>>>
>>>>> I'm not sure how to correctly fix it.
>>>> I think a simple check at the head of the function would be ok.
>>>>
>>>> Alternatively, perhaps in sched_move_domain, which could either detect
>>>> this or could detect a domain in pool0 being moved to pool0 and short
>>>> circuit.
>>> I was thinking about the small fix below. If it's fine for everyone,
>>> I can send a patch next week.
>>>
>>> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
>>> index e9eb0bc..c44d047 100644
>>> --- a/xen/common/schedule.c
>>> +++ b/xen/common/schedule.c
>>> @@ -311,7 +311,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
>>>      }
>>>      /* Do we have vcpus already? If not, no need to update node-affinity */
>>> -    if ( d->vcpu )
>>> +    if ( d->vcpu && d->vcpu[0] != NULL )
>>>          domain_update_node_affinity(d);
>> So is the problem that we're allocating the vcpu array area, but not
>> putting any vcpus in it?
> The problem (as I recall) was that domain_create() got midway through
> and alloc_vcpu(0) failed with -ENOMEM. Following that failure, the
> toolstack called domain_destroy().
>
> Having d->vcpu properly allocated and containing fully NULL pointers is
> a valid position to be in, especially in error or teardown paths.
>
>> Overall it seems like those checks for the existence of cpus should be
>> moved into domain_update_node_affinity(). The ASSERT() there I think
>> is just a sanity check to make sure we're not getting a ridiculous
>> result out of our calculation; but of course if there actually are no
>> vcpus, it's not ridiculous at all.
>>
>> One solution might be to change the ASSERT to
>> ASSERT(!cpumask_empty(dom_cpumask) || !d->vcpu || !d->vcpu[0]). Then
>> we could probably even remove the d->vcpu conditional when calling it.
> If you were going along this line, the pointer checks are substantially
> less expensive than cpumask_empty(), so the ||'s should be reordered.
> However, I am not convinced that it is necessarily the best solution,
> given my previous observation.
Er, I was with you until the last part. What's wrong with changing the
assert from "Make sure I have *something* in there" to "Make sure I have
*something* in there *if I have any vcpus*"? That seems to be accepting
that having d->vcpu allocated but full of null pointers is a valid
condition.
-George
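
The relaxed assertion discussed above can be sketched in a small standalone model. This is an illustration under stated assumptions, not Xen code: `struct vcpu`, `struct domain`, `domain_has_vcpus()` and `model_update_node_affinity()` are hypothetical stand-ins, and a single `unsigned long` takes the place of a real cpumask. It puts the cheap pointer checks first, as suggested in the quoted reply, and only treats an empty mask as a bug when vcpu 0 actually exists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for Xen's types; not the real definitions. */
struct vcpu {
    unsigned long affinity;   /* toy one-word "cpumask" (assumption) */
};

struct domain {
    struct vcpu **vcpu;       /* NULL, or an array of max_vcpus slots */
    unsigned int max_vcpus;
};

/* Cheap pointer checks: does the domain have at least vcpu 0? */
static bool domain_has_vcpus(const struct domain *d)
{
    return d->vcpu != NULL && d->vcpu[0] != NULL;
}

/* Toy model of the affinity calculation: OR together each vcpu's mask. */
static unsigned long model_update_node_affinity(const struct domain *d)
{
    unsigned long dom_mask = 0;

    if ( d->vcpu != NULL )
        for ( unsigned int i = 0; i < d->max_vcpus; i++ )
            if ( d->vcpu[i] != NULL )
                dom_mask |= d->vcpu[i]->affinity;

    /*
     * Relaxed sanity check: an empty mask is only a bug if vcpus exist.
     * The pointer tests run first because they are cheaper than scanning
     * a mask.
     */
    assert(!domain_has_vcpus(d) || dom_mask != 0);

    return dom_mask;
}
```

With this shape, a domain whose vcpu array is allocated but holds only NULL pointers passes through the calculation without tripping the assertion, which is the state reached when vcpu allocation fails partway through domain creation.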