* [PATCH] move domain to cpupool0 before destroying it
@ 2014-05-15  4:59 Juergen Gross
  2014-05-19 14:57 ` George Dunlap
From: Juergen Gross @ 2014-05-15  4:59 UTC
  To: xen-devel, JBeulich; +Cc: Juergen Gross

Currently, when a domain is destroyed it is removed from the domain_list
before all of its resources, including its cpupool membership, are freed.
This can lead to a situation where the domain is still a member of a
cpupool even though for_each_domain_in_cpupool() (or even
for_each_domain()) can no longer find it. This in turn can result in the
rejection of removing the last cpu from a cpupool, because there still
seems to be a domain in the cpupool, even though it can't be found by
scanning through all domains.

This situation can be avoided by moving the domain to be destroyed to
cpupool0 first and then removing it from this cpupool BEFORE deleting it
from the domain_list. As cpupool0 is always active and a domain without
any cpupool membership is implicitly regarded as belonging to cpupool0,
this poses no problem.

Signed-off-by: Juergen Gross <juergen.gross@ts.fujitsu.com>
---
 xen/common/domain.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/xen/common/domain.c b/xen/common/domain.c
index 4291e29..d4bcf6b 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -593,6 +593,8 @@ int domain_kill(struct domain *d)
             BUG_ON(rc != -EAGAIN);
             break;
         }
+        if ( sched_move_domain(d, cpupool0) )
+            return -EAGAIN;
         for_each_vcpu ( d, v )
             unmap_vcpu_info(v);
         d->is_dying = DOMDYING_dead;
@@ -775,8 +777,6 @@ static void complete_domain_destroy(struct rcu_head *head)
 
     sched_destroy_domain(d);
 
-    cpupool_rm_domain(d);
-
     /* Free page used by xen oprofile buffer. */
 #ifdef CONFIG_XENOPROF
     free_xenoprof_pages(d);
@@ -823,6 +823,8 @@ void domain_destroy(struct domain *d)
     if ( _atomic_read(old) != 0 )
         return;
 
+    cpupool_rm_domain(d);
+
     /* Delete from task list and task hashtable. */
     TRACE_1D(TRC_SCHED_DOM_REM, d->domain_id);
     spin_lock(&domlist_update_lock);
-- 
1.7.10.4
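
For reference, the net effect of the patch can be modelled in a few lines
of standalone C. Everything below is a stand-in written for this summary,
not Xen source; it only mirrors the call ordering visible in the diff
above (move to cpupool0 in domain_kill(), drop the cpupool membership in
domain_destroy() before the domain leaves the domain_list):

#include <stdio.h>

struct cpupool { const char *name; };
struct domain  { struct cpupool *cpupool; int on_domain_list; };

static struct cpupool cpupool0 = { "Pool-0" };

static int sched_move_domain_model(struct domain *d, struct cpupool *c)
{
    d->cpupool = c;        /* domain_kill(): park the domain in cpupool0 */
    return 0;              /* in Xen this may fail, yielding -EAGAIN     */
}

static void cpupool_rm_domain_model(struct domain *d)
{
    d->cpupool = NULL;     /* domain_destroy(): drop pool membership...  */
}

static void domlist_remove_model(struct domain *d)
{
    d->on_domain_list = 0; /* ...BEFORE deleting from the domain_list    */
}

int main(void)
{
    struct domain d = { NULL, 1 };

    if ( sched_move_domain_model(&d, &cpupool0) )
        return 1;                  /* domain_kill() would return -EAGAIN */
    cpupool_rm_domain_model(&d);
    domlist_remove_model(&d);
    printf("pool=%s, listed=%d\n",
           d.cpupool ? d.cpupool->name : "none", d.on_domain_list);
    return 0;
}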


* Re: [PATCH] move domain to cpupool0 before destroying it
  2014-05-15  4:59 [PATCH] move domain to cpupool0 before destroying it Juergen Gross
@ 2014-05-19 14:57 ` George Dunlap
  2014-05-19 15:34   ` Jan Beulich
From: George Dunlap @ 2014-05-19 14:57 UTC
  To: Juergen Gross; +Cc: Jan Beulich, xen-devel@lists.xen.org

On Thu, May 15, 2014 at 5:59 AM, Juergen Gross
<juergen.gross@ts.fujitsu.com> wrote:
> Currently, when a domain is destroyed it is removed from the domain_list
> before all of its resources, including its cpupool membership, are freed.
> This can lead to a situation where the domain is still a member of a
> cpupool even though for_each_domain_in_cpupool() (or even
> for_each_domain()) can no longer find it. This in turn can result in the
> rejection of removing the last cpu from a cpupool, because there still
> seems to be a domain in the cpupool, even though it can't be found by
> scanning through all domains.
>
> This situation can be avoided by moving the domain to be destroyed to
> cpupool0 first and then removing it from this cpupool BEFORE deleting it
> from the domain_list. As cpupool0 is always active and a domain without
> any cpupool membership is implicitly regarded as belonging to cpupool0,
> this poses no problem.

I'm a bit unclear why we're doing *both* a sched_move_domain(), *and*
moving the "cpupool_rm_domain()".

The sched_move_domain() only happens in domain_kill(), which is only
initiated (AFAICT) by hypercall: does that mean that if a VM dies for some
other reason (e.g., crashes), you may still have the same race?
If not, then just this change alone should be sufficient.  If it does,
then this change is redundant.

Moving the cpupool_rm_domain() will change things so that there is now
a period of time where the VM is no longer listed as being in cpupool0,
but may still be in that pool's scheduler's list of domains.  Is that
OK?  If it is OK, it seems like that change alone should be
sufficient.

I've been trying to trace through the twisty little passages of domain
destruction, and I'm still not quite sure: would it be OK if we just
called sched_move_domain() in domain_destroy() instead of calling
cpupool_rm_domain()?

 -George


>
> Signed-off-by: Juergen Gross <juergen.gross@ts.fujitsu.com>
> ---
>  xen/common/domain.c |    6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index 4291e29..d4bcf6b 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -593,6 +593,8 @@ int domain_kill(struct domain *d)
>              BUG_ON(rc != -EAGAIN);
>              break;
>          }
> +        if ( sched_move_domain(d, cpupool0) )
> +            return -EAGAIN;
>          for_each_vcpu ( d, v )
>              unmap_vcpu_info(v);
>          d->is_dying = DOMDYING_dead;
> @@ -775,8 +777,6 @@ static void complete_domain_destroy(struct rcu_head *head)
>
>      sched_destroy_domain(d);
>
> -    cpupool_rm_domain(d);
> -
>      /* Free page used by xen oprofile buffer. */
>  #ifdef CONFIG_XENOPROF
>      free_xenoprof_pages(d);
> @@ -823,6 +823,8 @@ void domain_destroy(struct domain *d)
>      if ( _atomic_read(old) != 0 )
>          return;
>
> +    cpupool_rm_domain(d);
> +
>      /* Delete from task list and task hashtable. */
>      TRACE_1D(TRC_SCHED_DOM_REM, d->domain_id);
>      spin_lock(&domlist_update_lock);
> --
> 1.7.10.4
>


* Re: [PATCH] move domain to cpupool0 before destroying it
  2014-05-19 14:57 ` George Dunlap
@ 2014-05-19 15:34   ` Jan Beulich
  2014-05-19 16:19     ` George Dunlap
From: Jan Beulich @ 2014-05-19 15:34 UTC
  To: George Dunlap; +Cc: Juergen Gross, xen-devel@lists.xen.org

>>> On 19.05.14 at 16:57, <dunlapg@umich.edu> wrote:
> On Thu, May 15, 2014 at 5:59 AM, Juergen Gross
> <juergen.gross@ts.fujitsu.com> wrote:
>> Currently, when a domain is destroyed it is removed from the domain_list
>> before all of its resources, including its cpupool membership, are freed.
>> This can lead to a situation where the domain is still a member of a
>> cpupool even though for_each_domain_in_cpupool() (or even
>> for_each_domain()) can no longer find it. This in turn can result in the
>> rejection of removing the last cpu from a cpupool, because there still
>> seems to be a domain in the cpupool, even though it can't be found by
>> scanning through all domains.
>>
>> This situation can be avoided by moving the domain to be destroyed to
>> cpupool0 first and then removing it from this cpupool BEFORE deleting it
>> from the domain_list. As cpupool0 is always active and a domain without
>> any cpupool membership is implicitly regarded as belonging to cpupool0,
>> this poses no problem.
> 
> I'm a bit unclear why we're doing *both* a sched_move_domain(), *and*
> moving the "cpupool_rm_domain()".
> 
> The sched_move_domain() only happens in domain_kill(), which is only
> initiated (AFAICT) by hypercall: does that mean that if a VM dies for some
> other reason (e.g., crashes), you may still have the same race?
> If not, then just this change alone should be sufficient.  If it does,
> then this change is redundant.

No, a crashed domain is merely reported as crashed to the tool stack.
It's the tool stack that then actually invokes the killing of it (or
else e.g. "on_crash=preserve" would be rather hard to handle).

> Moving the cpupool_rm_domain() will change things so that there is now
> a period of time where the VM is no longer listed as being in cpupool0,
> but may still be in that pool's scheduler's list of domains.  Is that
> OK?  If it is OK, it seems like that change alone should be
> sufficient.

Moving this earlier was a requirement to avoid the race that the
earlier (much different) patch tried to address. Also I think the
patch's description already addresses that question (see the last
sentence of the quoted original mail contents above).

> I've been trying to trace through the twisty little passages of domain
> destruction, and I'm still not quite sure: would it be OK if we just
> called sched_move_domain() in domain_destroy() instead of calling
> cpupool_rm_domain()?

No, it would not, because then we again wouldn't be able to deal
with a potential failure requiring re-invocation of the function.

Jan
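
To make Jan's constraint concrete: a toy model, with invented names (not
Xen source), of why domain_kill() may use a fallible call while
domain_destroy() may not. domain_kill() is a hypercall handler the tool
stack re-invokes on -EAGAIN; domain_destroy() runs exactly once when the
last reference is dropped, with no caller left to retry.

#include <errno.h>

static int move_to_cpupool0_model(void)
{
    return -EAGAIN;          /* e.g. a transient allocation failure */
}

int domain_kill_model(void)
{
    int rc = move_to_cpupool0_model();
    if ( rc )
        return -EAGAIN;      /* tool stack re-invokes the hypercall */
    /* ... continue tearing the domain down ... */
    return 0;
}

void domain_destroy_model(void)
{
    /* Last reference just dropped: nobody is left to retry on error,
     * so only infallible (void) cleanups like cpupool_rm_domain()
     * are allowed here. */
}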


* Re: [PATCH] move domain to cpupool0 before destroying it
  2014-05-19 15:34   ` Jan Beulich
@ 2014-05-19 16:19     ` George Dunlap
  2014-05-20  4:28       ` Juergen Gross
  2014-05-20  4:44       ` Juergen Gross
From: George Dunlap @ 2014-05-19 16:19 UTC
  To: Jan Beulich; +Cc: Juergen Gross, xen-devel@lists.xen.org

On Mon, May 19, 2014 at 4:34 PM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 19.05.14 at 16:57, <dunlapg@umich.edu> wrote:
>> On Thu, May 15, 2014 at 5:59 AM, Juergen Gross
>> <juergen.gross@ts.fujitsu.com> wrote:
>>> Currently, when a domain is destroyed it is removed from the domain_list
>>> before all of its resources, including its cpupool membership, are freed.
>>> This can lead to a situation where the domain is still a member of a
>>> cpupool even though for_each_domain_in_cpupool() (or even
>>> for_each_domain()) can no longer find it. This in turn can result in the
>>> rejection of removing the last cpu from a cpupool, because there still
>>> seems to be a domain in the cpupool, even though it can't be found by
>>> scanning through all domains.
>>>
>>> This situation can be avoided by moving the domain to be destroyed to
>>> cpupool0 first and then removing it from this cpupool BEFORE deleting it
>>> from the domain_list. As cpupool0 is always active and a domain without
>>> any cpupool membership is implicitly regarded as belonging to cpupool0,
>>> this poses no problem.
>>
>> I'm a bit unclear why we're doing *both* a sched_move_domain(), *and*
>> moving the "cpupool_rm_domain()".
>>
>> The sched_move_domain() only happens in domain_kill(), which is only
>> initiated (AFAICT) by hypercall: does that mean that if a VM dies for some
>> other reason (e.g., crashes), you may still have the same race?
>> If not, then just this change alone should be sufficient.  If it does,
>> then this change is redundant.
>
> No, a crashed domain is merely reported as crashed to the tool stack.
> It's the tool stack that then actually invokes the killing of it (or
> else e.g. "on_crash=preserve" would be rather hard to handle).

Right, I see.

>
>> Moving the cpupool_rm_domain() will change things so that there is now
>> a period of time where the VM is no longer listed as being in cpupool0,
>> but may still be in that pool's scheduler's list of domains.  Is that
>> OK?  If it is OK, it seems like that change alone should be
>> sufficient.
>
> Moving this earlier was a requirement to avoid the race that the
> earlier (much different) patch tried to address. Also I think the
> patch's description already addresses that question (see the last
> sentence of the quoted original mail contents above).

But we're avoiding that race by instead moving the dying domain to
cpupool0, which is never going to disappear.

Or, moving the domain to cpupool0 *won't* sufficiently solve the race,
and this will -- in which case, why are we bothering to move it to
cpupool0 at all?  Why not just remove it from the cpupool when
removing it from the domain list?  Wouldn't that also solve the
original problem?

Regarding the last bit, "...a domain without any cpupool membership is
implicitly regarded as belonging to cpupool0...":

1. At a quick glance through the code, I couldn't find any evidence
that this was the case; I couldn't find an instance where d->cpupool
== NULL => assumed cpupool0.

2. If in reality d->cpupool is never (or almost never) actually NULL,
then the "implicitly belongs to cpupool0" assumption will bitrot.
Having that kind of assumption without some way of making sure it's
maintained is a bug waiting to happen.

>> I've been trying to trace through the twisty little passages of domain
>> destruction, and I'm still not quite sure: would it be OK if we just
>> called sched_move_domain() in domain_destroy() instead of calling
>> cpupool_rm_domain()?
>
> No, it would not, because then we again wouldn't be able to deal
> with a potential failure requiring re-invocation of the function.

Right.

 -George


* Re: [PATCH] move domain to cpupool0 before destroying it
  2014-05-19 16:19     ` George Dunlap
@ 2014-05-20  4:28       ` Juergen Gross
  2014-05-20  9:56         ` George Dunlap
  2014-05-20  4:44       ` Juergen Gross
From: Juergen Gross @ 2014-05-20  4:28 UTC
  To: George Dunlap, Jan Beulich; +Cc: xen-devel@lists.xen.org

On 19.05.2014 18:19, George Dunlap wrote:
> On Mon, May 19, 2014 at 4:34 PM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>> On 19.05.14 at 16:57, <dunlapg@umich.edu> wrote:
>>> On Thu, May 15, 2014 at 5:59 AM, Juergen Gross
>>> <juergen.gross@ts.fujitsu.com> wrote:
>>>> Currently, when a domain is destroyed it is removed from the domain_list
>>>> before all of its resources, including its cpupool membership, are freed.
>>>> This can lead to a situation where the domain is still a member of a
>>>> cpupool even though for_each_domain_in_cpupool() (or even
>>>> for_each_domain()) can no longer find it. This in turn can result in the
>>>> rejection of removing the last cpu from a cpupool, because there still
>>>> seems to be a domain in the cpupool, even though it can't be found by
>>>> scanning through all domains.
>>>>
>>>> This situation can be avoided by moving the domain to be destroyed to
>>>> cpupool0 first and then removing it from this cpupool BEFORE deleting it
>>>> from the domain_list. As cpupool0 is always active and a domain without
>>>> any cpupool membership is implicitly regarded as belonging to cpupool0,
>>>> this poses no problem.
>>>
>>> I'm a bit unclear why we're doing *both* a sched_move_domain(), *and*
>>> moving the "cpupool_rm_domain()".
>>>
>>> The sched_move_domain() only happens in domain_kill(), which is only
>>> initiated (AFAICT) by hypercall: does that mean that if a VM dies for some
>>> other reason (e.g., crashes), you may still have the same race?
>>> If not, then just this change alone should be sufficient.  If it does,
>>> then this change is redundant.
>>
>> No, a crashed domain is merely reported as crashed to the tool stack.
>> It's the tool stack that then actually invokes the killing of it (or
>> else e.g. "on_crash=preserve" would be rather hard to handle).
>
> Right, I see.
>
>>
>>> Moving the cpupool_rm_domain() will change things so that there is now
>>> a period of time where the VM is no longer listed as being in cpupool0,
>>> but may still be in that pool's scheduler's list of domains.  Is that
>>> OK?  If it is OK, it seems like that change alone should be
>>> sufficient.
>>
>> Moving this earlier was a requirement to avoid the race that the
>> earlier (much different) patch tried to address. Also I think the
>> patch's description already addresses that question (see the last
>> sentence of the quoted original mail contents above).
>
> But we're avoiding that race by instead moving the dying domain to
> cpupool0, which is never going to disappear.
>
> Or, moving the domain to cpupool0 *won't* sufficiently solve the race,
> and this will -- in which case, why are we bothering to move it to
> cpupool0 at all?  Why not just remove it from the cpupool when
> removing it from the domain list?  Wouldn't that also solve the
> original problem?
>
> Regarding the last bit, "...a domain without any cpupool membership is
> implicitly regarded as belonging to cpupool0...":
>
> 1. At a quick glance through the code, I couldn't find any evidence
> that this was the case; I couldn't find an instance where d->cpupool
> == NULL => assumed cpupool0.

xen/common/schedule.c:

#define DOM2OP(_d)    (((_d)->cpupool == NULL) ? &ops : ((_d)->cpupool->sched))

together with:

struct scheduler *scheduler_get_default(void)
{
     return &ops;
}
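
In other words, a compilable restatement of the fallback: DOM2OP and ops
are the real names from xen/common/schedule.c quoted above, while the
struct definitions are stand-ins invented for illustration.

struct scheduler { const char *name; };
struct cpupool   { struct scheduler *sched; };
struct domain    { struct cpupool *cpupool; };

/* 'ops' is the default scheduler -- the one cpupool0 uses. */
static struct scheduler ops = { "default" };

#define DOM2OP(_d) (((_d)->cpupool == NULL) ? &ops : ((_d)->cpupool->sched))

/* A domain whose cpupool pointer is NULL dispatches to &ops, i.e. it is
 * scheduled exactly as if it were still a member of cpupool0. */
static struct scheduler *domain_scheduler(struct domain *d)
{
    return DOM2OP(d);
}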

>
> 2. If in reality d->cpupool is never (or almost never) actually NULL,
> then the "implicitly belongs to cpupool0" assumption will bitrot.
> Having that kind of assumption without some way of making sure it's
> maintained is a bug waiting to happen.

That's not going to happen: this assumption is tested every time the
idle domain is referenced by the scheduler...

>
>>> I've been trying to trace through the twisty little passages of domain
>>> destruction, and I'm still not quite sure: would it be OK if we just
>>> called sched_move_domain() in domain_destroy() instead of calling
>>> cpupool_rm_domain()?
>>
>> No, it would not, because then we again wouldn't be able to deal
>> with a potential failure requiring re-invocation of the function.
>
> Right.

Juergen

-- 
Juergen Gross                 Principal Developer Operating Systems
PSO PM&D ES&S SWE OS6                  Telephone: +49 (0) 89 62060 2932
Fujitsu                                   e-mail: juergen.gross@ts.fujitsu.com
Mies-van-der-Rohe-Str. 8                Internet: ts.fujitsu.com
D-80807 Muenchen                 Company details: ts.fujitsu.com/imprint.html


* Re: [PATCH] move domain to cpupool0 before destroying it
  2014-05-19 16:19     ` George Dunlap
  2014-05-20  4:28       ` Juergen Gross
@ 2014-05-20  4:44       ` Juergen Gross
From: Juergen Gross @ 2014-05-20  4:44 UTC
  To: George Dunlap, Jan Beulich; +Cc: xen-devel@lists.xen.org

On 19.05.2014 18:19, George Dunlap wrote:
> On Mon, May 19, 2014 at 4:34 PM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>> On 19.05.14 at 16:57, <dunlapg@umich.edu> wrote:
>>> On Thu, May 15, 2014 at 5:59 AM, Juergen Gross
>>> <juergen.gross@ts.fujitsu.com> wrote:
>>>> Currently, when a domain is destroyed it is removed from the domain_list
>>>> before all of its resources, including its cpupool membership, are freed.
>>>> This can lead to a situation where the domain is still a member of a
>>>> cpupool even though for_each_domain_in_cpupool() (or even
>>>> for_each_domain()) can no longer find it. This in turn can result in the
>>>> rejection of removing the last cpu from a cpupool, because there still
>>>> seems to be a domain in the cpupool, even though it can't be found by
>>>> scanning through all domains.
>>>>
>>>> This situation can be avoided by moving the domain to be destroyed to
>>>> cpupool0 first and then removing it from this cpupool BEFORE deleting it
>>>> from the domain_list. As cpupool0 is always active and a domain without
>>>> any cpupool membership is implicitly regarded as belonging to cpupool0,
>>>> this poses no problem.
>>>
>>> I'm a bit unclear why we're doing *both* a sched_move_domain(), *and*
>>> moving the "cpupool_rm_domain()".
>>>
>>> The sched_move_domain() only happens in domain_kill(), which is only
>>> initiated (AFAICT) by hypercall: does that mean that if a VM dies for some
>>> other reason (e.g., crashes), you may still have the same race?
>>> If not, then just this change alone should be sufficient.  If it does,
>>> then this change is redundant.
>>
>> No, a crashed domain is merely reported as crashed to the tool stack.
>> It's the tool stack that then actually invokes the killing of it (or
>> else e.g. "on_crash=preserve" would be rather hard to handle).
>
> Right, I see.
>
>>
>>> Moving the cpupool_rm_domain() will change things so that there is now
>>> a period of time where the VM is no longer listed as being in cpupool0,
>>> but may still be in that pool's scheduler's list of domains.  Is that
>>> OK?  If it is OK, it seems like that change alone should be
>>> sufficient.
>>
>> Moving this earlier was a requirement to avoid the race that the
>> earlier (much different) patch tried to address. Also I think the
>> patch's description already addresses that question (see the last
>> sentence of the quoted original mail contents above).
>
> But we're avoiding that race by instead moving the dying domain to
> cpupool0, which is never going to disappear.
>
> Or, moving the domain to cpupool0 *won't* sufficiently solve the race,
> and this will -- in which case, why are we bothering to move it to
> cpupool0 at all?  Why not just remove it from the cpupool when
> removing it from the domain list?  Wouldn't that also solve the
> original problem?

No. sched_destroy_domain() has to be called with the domain in the
correct cpupool. Otherwise the selection of the scheduler to use for
freeing the scheduler data won't be correct: it will fall back to the
default scheduler if the domain isn't registered to any cpupool.
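
A sketch of the alloc/free pairing hazard this implies; free_domdata is
an invented hook name, while the DOM2OP dispatch mirrors the macro
quoted earlier in the thread:

#include <stdlib.h>

struct scheduler { void (*free_domdata)(void *priv); };
struct cpupool   { struct scheduler *sched; };
struct domain    { struct cpupool *cpupool; void *sched_priv; };

static void default_free(void *priv) { free(priv); }
static struct scheduler ops = { default_free };  /* cpupool0's scheduler */

#define DOM2OP(_d) (((_d)->cpupool == NULL) ? &ops : ((_d)->cpupool->sched))

void sched_destroy_domain_model(struct domain *d)
{
    /* If d->cpupool has already been cleared, this dispatches to &ops,
     * even if d->sched_priv was allocated by a different pool's
     * scheduler -- exactly the alloc/free mismatch described above. */
    DOM2OP(d)->free_domdata(d->sched_priv);
}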

Juergen

-- 
Juergen Gross                 Principal Developer Operating Systems
PSO PM&D ES&S SWE OS6                  Telephone: +49 (0) 89 62060 2932
Fujitsu                                   e-mail: juergen.gross@ts.fujitsu.com
Mies-van-der-Rohe-Str. 8                Internet: ts.fujitsu.com
D-80807 Muenchen                 Company details: ts.fujitsu.com/imprint.html


* Re: [PATCH] move domain to cpupool0 before destroying it
  2014-05-20  4:28       ` Juergen Gross
@ 2014-05-20  9:56         ` George Dunlap
From: George Dunlap @ 2014-05-20  9:56 UTC
  To: Juergen Gross; +Cc: Jan Beulich, xen-devel@lists.xen.org

On Tue, May 20, 2014 at 5:28 AM, Juergen Gross
<juergen.gross@ts.fujitsu.com> wrote:
> On 19.05.2014 18:19, George Dunlap wrote:
>>
>> On Mon, May 19, 2014 at 4:34 PM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>>
>>>>>> On 19.05.14 at 16:57, <dunlapg@umich.edu> wrote:
>>>>
>>>> On Thu, May 15, 2014 at 5:59 AM, Juergen Gross
>>>> <juergen.gross@ts.fujitsu.com> wrote:
>>>>>
>>>>> Currently, when a domain is destroyed it is removed from the domain_list
>>>>> before all of its resources, including its cpupool membership, are freed.
>>>>> This can lead to a situation where the domain is still a member of a
>>>>> cpupool even though for_each_domain_in_cpupool() (or even
>>>>> for_each_domain()) can no longer find it. This in turn can result in the
>>>>> rejection of removing the last cpu from a cpupool, because there still
>>>>> seems to be a domain in the cpupool, even though it can't be found by
>>>>> scanning through all domains.
>>>>>
>>>>> This situation can be avoided by moving the domain to be destroyed to
>>>>> cpupool0 first and then removing it from this cpupool BEFORE deleting it
>>>>> from the domain_list. As cpupool0 is always active and a domain without
>>>>> any cpupool membership is implicitly regarded as belonging to cpupool0,
>>>>> this poses no problem.
>>>>
>>>>
>>>> I'm a bit unclear why we're doing *both* a sched_move_domain(), *and*
>>>> moving the "cpupool_rm_domain()".
>>>>
>>>> The sched_move_domain() only happens in domain_kill(), which is only
>>>> initiated (AFAICT) by hypercall: does that mean that if a VM dies for some
>>>> other reason (e.g., crashes), you may still have the same race?
>>>> If not, then just this change alone should be sufficient.  If it does,
>>>> then this change is redundant.
>>>
>>>
>>> No, a crashed domain is merely reported as crashed to the tool stack.
>>> It's the tool stack that then actually invokes the killing of it (or
>>> else e.g. "on_crash=preserve" would be rather hard to handle).
>>
>>
>> Right, I see.
>>
>>>
>>>> Moving the cpupool_rm_domain() will change things so that there is now
>>>> a period of time where the VM is no longer listed as being in cpupool0,
>>>> but may still be in that pool's scheduler's list of domains.  Is that
>>>> OK?  If it is OK, it seems like that change alone should be
>>>> sufficient.
>>>
>>>
>>> Moving this earlier was a requirement to avoid the race that the
>>> earlier (much different) patch tried to address. Also I think the
>>> patch's description already addresses that question (see the last
>>> sentence of the quoted original mail contents above).
>>
>>
>> But we're avoiding that race by instead moving the dying domain to
>> cpupool0, which is never going to disappear.
>>
>> Or, moving the domain to cpupool0 *won't* sufficiently solve the race,
>> and this will -- in which case, why are we bothering to move it to
>> cpupool0 at all?  Why not just remove it from the cpupool when
>> removing it from the domain list?  Wouldn't that also solve the
>> original problem?
>>
>> Regarding the last bit, "...a domain without any cpupool membership is
>> implicitly regarded as belonging to cpupool0...":
>>
>> 1. At a quick glance through the code, I couldn't find any evidence
>> that this was the case; I couldn't find an instance where d->cpupool
>> == NULL => assumed cpupool0.
>
>
> xen/common/schedule.c:
>
> #define DOM2OP(_d)    (((_d)->cpupool == NULL) ? &ops : ((_d)->cpupool->sched))
>
> together with:
>
> struct scheduler *scheduler_get_default(void)
> {
>     return &ops;
> }
>
>>
>> 2. If in reality d->cpupool is never (or almost never) actually NULL,
>> then the "implicitly belongs to cpupool0" assumption will bitrot.
>> Having that kind of assumption without some way of making sure it's
>> maintained is a bug waiting to happen.
>
>
> That's not going to happen: this assumption is tested every time the
> idle domain is referenced by the scheduler...

Great.  That's what I needed to know.

In that case:

Acked-by: George Dunlap <george.dunlap@eu.citrix.com>

Thanks for putting up with my skepticism. :-)

 -George
