From: Juergen Gross <juergen.gross@ts.fujitsu.com>
To: George Dunlap <dunlapg@umich.edu>, Jan Beulich <JBeulich@suse.com>
Cc: "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Subject: Re: [PATCH] move domain to cpupool0 before destroying it
Date: Tue, 20 May 2014 06:28:54 +0200 [thread overview]
Message-ID: <537ADA06.4070903@ts.fujitsu.com> (raw)
In-Reply-To: <CAFLBxZY3oBK8ZVKoW3Db3D+U=Ps+9dwy+X7tA7Gf5p=vFoH0Gg@mail.gmail.com>
On 19.05.2014 18:19, George Dunlap wrote:
> On Mon, May 19, 2014 at 4:34 PM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>> On 19.05.14 at 16:57, <dunlapg@umich.edu> wrote:
>>> On Thu, May 15, 2014 at 5:59 AM, Juergen Gross
>>> <juergen.gross@ts.fujitsu.com> wrote:
>>>> Currently when a domain is destroyed it is removed from the domain_list
>>>> before all of its resources, including the cpupool membership, are freed.
>>>> This can lead to a situation where the domain is still a member of a
>>>> cpupool without for_each_domain_in_cpupool() (or even for_each_domain())
>>>> being able to find it any more. This in turn can result in a request to
>>>> remove the last cpu from a cpupool being rejected, because there still
>>>> seems to be a domain in the cpupool, even though it can't be found by
>>>> scanning through all domains.
>>>>
>>>> This situation can be avoided by moving the domain to be destroyed to
>>>> cpupool0 first and then removing it from this cpupool BEFORE deleting it
>>>> from the domain_list. As cpupool0 is always active and a domain without
>>>> any cpupool membership is implicitly regarded as belonging to cpupool0,
>>>> this poses no problem.
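
A minimal, self-contained toy model of the ordering issue described above
(the struct names, the n_dom counter and the helpers are invented stand-ins,
not Xen's real cpupool code):

#include <stdio.h>
#include <stdbool.h>

struct cpupool { int n_dom; };
struct domain  { struct cpupool *cpupool; bool on_domain_list; };

/* Removing the last cpu is refused while any domain still counts as a
 * member of the pool. */
static bool last_cpu_removable(const struct cpupool *p)
{
    return p->n_dom == 0;
}

static void report(const char *when, const struct domain *d,
                   const struct cpupool *p)
{
    printf("%s: findable=%s, last cpu removable=%s\n", when,
           d->on_domain_list ? "yes" : "no",
           last_cpu_removable(p) ? "yes" : "no");
}

int main(void)
{
    struct cpupool pool = { .n_dom = 1 };
    struct domain d = { .cpupool = &pool, .on_domain_list = true };

    /* Current order: the domain is unlinked from the domain_list while it
     * still counts as a pool member -> the pool looks occupied, yet no
     * for_each_domain_in_cpupool() scan can find the domain. */
    d.on_domain_list = false;
    report("unlinked before leaving the pool", &d, &pool);

    /* Order described above: leave the pool (falling back to implicit
     * cpupool0 membership) before the unlink, so the membership count and
     * the domain_list can no longer disagree in this direction. */
    pool.n_dom--;
    d.cpupool = NULL;
    report("after leaving the pool first", &d, &pool);

    return 0;
}

With the first ordering the report shows an unfindable domain still blocking
last-cpu removal; with the second the pool is already empty by the time the
domain vanishes from the list.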
>>>
>>> I'm a bit unclear on why we're doing *both* a sched_move_domain() *and*
>>> moving the cpupool_rm_domain() call.
>>>
>>> The sched_move_domain() only happens in domain_kill(), which is only
>>> initiated (AFAICT) by hypercall: does that mean that if a VM dies for
>>> some other reason (e.g., crashes), you may still have the same race?
>>> If not, then this change alone should be sufficient. If it does, then
>>> this change is redundant.
>>
>> No, a crashed domain is merely reported as crashed to the tool stack.
>> It's then up to the tool stack to actually invoke the killing of it
>> (or else e.g. "on_crash=preserve" would be rather hard to handle).
>
> Right, I see.
>
>>
>>> Moving the cpupool_rm_domain() will change things so that there is now
>>> a period of time where the VM is no longer listed as being in cpupool0,
>>> but may still be in that pool's scheduler's list of domains. Is that
>>> OK? If it is, it seems like that change alone should be sufficient.
>>
>> Moving this earlier was a requirement to avoid the race that the
>> earlier (much different) patch tried to address. Also I think the
>> patch's description already addresses that question (see the last
>> sentence of the quoted original mail contents above).
>
> But we're avoiding that race by instead moving the dying domain to
> cpupool0, which is never going to disappear -- in which case, why also
> move the cpupool_rm_domain() earlier?
>
> Or, moving the domain to cpupool0 *won't* sufficiently solve the race,
> and this will -- in which case, why are we bothering to move it to
> cpupool0 at all? Why not just remove it from the cpupool when
> removing it from the domain list? Wouldn't that also solve the
> original problem?
>
> Regarding the last bit, "...a domain without any cpupool membership is
> implicitly regarded as belonging to cpupool0...":
>
> 1. At a quick glance through the code, I couldn't find any evidence
> that this was the case; I couldn't find an instance where d->cpupool
> == NULL => assumed cpupool0.
xen/common/schedule.c:

#define DOM2OP(_d) (((_d)->cpupool == NULL) ? &ops : ((_d)->cpupool->sched))

together with:

struct scheduler *scheduler_get_default(void)
{
    return &ops;
}
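
A small standalone sketch of that fallback, using toy stand-ins for Xen's
types (only the NULL check matters here): with d->cpupool == NULL, DOM2OP()
resolves to the default scheduler, i.e. the one that also serves cpupool0.

#include <stdio.h>
#include <stddef.h>

/* Toy stand-ins, not Xen's real headers. */
struct scheduler { const char *name; };
struct cpupool   { struct scheduler *sched; };
struct domain    { struct cpupool *cpupool; };

/* "ops" plays the role of the default scheduler, which is also the one
 * driving cpupool0. */
static struct scheduler ops = { "default scheduler (cpupool0)" };

#define DOM2OP(_d) (((_d)->cpupool == NULL) ? &ops : ((_d)->cpupool->sched))

int main(void)
{
    struct domain d = { .cpupool = NULL };   /* no cpupool membership */

    /* Every scheduler hook dispatched through DOM2OP() lands on the
     * default scheduler, so the pool-less domain is effectively treated
     * as a cpupool0 member. */
    printf("d is scheduled by: %s\n", DOM2OP(&d)->name);
    return 0;
}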
>
> 2. If in reality d->cpupool is never (or almost never) actually NULL,
> then the "implicitly belongs to cpupool0" assumption will bitrot.
> Having that kind of assumption without some way of making sure it's
> maintained is a bug waiting to happen.
That's not going to happen: the assumption is exercised every time the idle
domain (which has no cpupool membership) is referenced by the scheduler...
>
>>> I've been trying to trace through the twisty little passages of domain
>>> destruction, and I'm still not quite sure: would it be OK if we just
>>> called sched_move_domain() in domain_destroy() instead of calling
>>> cpupool_rm_domain()?
>>
>> No, it would not, because then we would again be unable to deal with a
>> potential failure that requires re-invocation of the function.
>
> Right.
Juergen
--
Juergen Gross Principal Developer Operating Systems
PSO PM&D ES&S SWE OS6 Telephone: +49 (0) 89 62060 2932
Fujitsu e-mail: juergen.gross@ts.fujitsu.com
Mies-van-der-Rohe-Str. 8 Internet: ts.fujitsu.com
D-80807 Muenchen Company details: ts.fujitsu.com/imprint.html