* [PATCH 0/2] xen: cpupool (small) improvement and (latent) bug fix
@ 2016-07-14  6:41 Dario Faggioli
  2016-07-14  6:41 ` [PATCH 1/2] xen: fix a (latent) cpupool-related race during domain destroy Dario Faggioli
  2016-07-14  6:41 ` [PATCH 2/2] xen: cpupool: small optimization when moving between pools Dario Faggioli
  0 siblings, 2 replies; 6+ messages in thread
From: Dario Faggioli @ 2016-07-14  6:41 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, Andrew Cooper, George Dunlap, Jan Beulich

Hi,

A couple of small cpupool patches.

One is a small optimization, while the other fixes a nasty (although currently
only theoretical) bug in the domain destruction path.

Regards, Dario
---
Dario Faggioli (2):
      xen: fix a (latent) cpupool-related race during domain destroy
      xen: cpupool: small optimization when moving between pools


 xen/common/cpupool.c |    3 +++
 xen/common/domain.c  |    4 ++--
 2 files changed, 5 insertions(+), 2 deletions(-)
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


* [PATCH 1/2] xen: fix a (latent) cpupool-related race during domain destroy
  2016-07-14  6:41 [PATCH 0/2] xen: cpupool (small) improvement and (latent) bug fix Dario Faggioli
@ 2016-07-14  6:41 ` Dario Faggioli
  2016-07-14  9:37   ` Andrew Cooper
  2016-07-14  6:41 ` [PATCH 2/2] xen: cpupool: small optimization when moving between pools Dario Faggioli
  1 sibling, 1 reply; 6+ messages in thread
From: Dario Faggioli @ 2016-07-14  6:41 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross, Andrew Cooper, George Dunlap, Jan Beulich

So, during domain destruction, we do:
 cpupool_rm_domain()    [ in domain_destroy() ]
 sched_destroy_domain() [ in complete_domain_destroy() ]

Therefore, there's a window during which, from the
scheduler's point of view, a domain is still there, but
without it being part of any cpupool.

In fact, cpupool_rm_domain() sets d->cpupool=NULL,
and we don't allow that to hold for any domain, with
the only exception of the idle one. If we stumble
upon such a domain, there are ASSERT()s and BUG_ON()s
that do trigger.

This never happens right now, but only because none
of the functions containing one of those sanity checks
is called during the window described above. However,
Credit2 goes (during load balancing) through the list
of domains assigned to a certain runqueue, and not only
the ones that are running or runnable. If one of those
domains has had cpupool_rm_domain() called on it
already, and we then call one of those functions that
check for d->cpupool!=NULL... Boom!

For example, while doing Credit2 development, calling
something similar to __vcpu_has_soft_affinity() from
balance_load() makes `xl shutdown <domid>' reliably
crash (which is how I discovered this).
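
Just to give an idea of what I mean, here's a minimal
sketch (purely illustrative, not the actual Xen helper,
and the field names may not be exact) of the kind of
check that dereferences d->cpupool, and hence blows up
if it runs inside the window described above:

  /* Illustrative sketch only, not the real __vcpu_has_soft_affinity(). */
  static inline bool_t vcpu_soft_affinity_check_sketch(const struct vcpu *v)
  {
      /*
       * This implicitly assumes v->domain->cpupool != NULL, which stops
       * being true as soon as cpupool_rm_domain() has run on the domain.
       */
      return !cpumask_subset(v->domain->cpupool->cpu_valid,
                             v->cpu_soft_affinity);
  }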

On the other hand, cpupool_rm_domain() "only" does
cpupool related bookkeeping, and there's no harm
postponing it a little bit.

Finally, considering that, during domain initialization,
we do:
 cpupool_add_domain()
 sched_init_domain()

It looks like it makes much more sense for the domain
destroy path to look like the opposite of it, i.e.:
 sched_destroy_domain()
 cpupool_rm_domain()

This patch does that, and it's considered worthwhile,
as it fixes a bug, even if only a latent one.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: Juergen Gross <jgross@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
---
 xen/common/domain.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/common/domain.c b/xen/common/domain.c
index 42c07ee..f8096d3 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -811,6 +811,8 @@ static void complete_domain_destroy(struct rcu_head *head)
         destroy_waitqueue_vcpu(v);
     }
 
+    cpupool_rm_domain(d);
+
     grant_table_destroy(d);
 
     arch_domain_destroy(d);
@@ -868,8 +870,6 @@ void domain_destroy(struct domain *d)
 
     TRACE_1D(TRC_DOM0_DOM_REM, d->domain_id);
 
-    cpupool_rm_domain(d);
-
     /* Delete from task list and task hashtable. */
     spin_lock(&domlist_update_lock);
     pd = &domain_list;



* [PATCH 2/2] xen: cpupool: small optimization when moving between pools
  2016-07-14  6:41 [PATCH 0/2] xen: cpupool (small) improvement and (latent) bug fix Dario Faggioli
  2016-07-14  6:41 ` [PATCH 1/2] xen: fix a (latent) cpupool-related race during domain destroy Dario Faggioli
@ 2016-07-14  6:41 ` Dario Faggioli
  1 sibling, 0 replies; 6+ messages in thread
From: Dario Faggioli @ 2016-07-14  6:41 UTC (permalink / raw)
  To: xen-devel; +Cc: Juergen Gross

If the domain is already in the cpupool where we want
it to go, there's really not much to do: just return
success right away.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: Juergen Gross <jgross@suse.com>
---
 xen/common/cpupool.c |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c
index 5dacc61..9998394 100644
--- a/xen/common/cpupool.c
+++ b/xen/common/cpupool.c
@@ -232,6 +232,9 @@ static int cpupool_move_domain_locked(struct domain *d, struct cpupool *c)
 {
     int ret;
 
+    if ( unlikely(d->cpupool == c) )
+        return 0;
+
     d->cpupool->n_dom--;
     ret = sched_move_domain(d, c);
     if ( ret )



* Re: [PATCH 1/2] xen: fix a (latent) cpupool-related race during domain destroy
  2016-07-14  6:41 ` [PATCH 1/2] xen: fix a (latent) cpupool-related race during domain destroy Dario Faggioli
@ 2016-07-14  9:37   ` Andrew Cooper
  2016-07-14 14:54     ` Dario Faggioli
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Cooper @ 2016-07-14  9:37 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: Juergen Gross, George Dunlap, Jan Beulich

On 14/07/16 07:41, Dario Faggioli wrote:
> So, during domain destruction, we do:
>  cpupool_rm_domain()    [ in domain_destroy() ]
>  sched_destroy_domain() [ in complete_domain_destroy() ]
>
> Therefore, there's a window during which, from the
> scheduler's point of view, a domain is still there, but
> without it being part of any cpupool.
>
> In fact, cpupool_rm_domain() sets d->cpupool=NULL,
> and we don't allow that to hold for any domain, with
> the only exception of the idle one. If we stumble
> upon such a domain, there are ASSERT()s and BUG_ON()s
> that do trigger.
>
> This never happens right now, but only because none
> of the functions containing one of those sanity checks
> is called during the window described above. However,
> Credit2 goes (during load balancing) through the list
> of domains assigned to a certain runqueue, and not only
> the ones that are running or runnable. If one of those
> domains has had cpupool_rm_domain() called on it
> already, and we then call one of those functions that
> check for d->cpupool!=NULL... Boom!
>
> For example, while doing Credit2 development, calling
> something similar to __vcpu_has_soft_affinity() from
> balance_load() makes `xl shutdown <domid>' reliably
> crash (which is how I discovered this).
>
> On the other hand, cpupool_rm_domain() "only" does
> cpupool related bookkeeping, and there's no harm
> postponing it a little bit.
>
> Finally, considering that, during domain initialization,
> we do:
>  cpupool_add_domain()
>  sched_init_domain()
>
> It looks like it makes much more sense for the domain
> destroy path to look like the opposite of it, i.e.:
>  sched_destroy_domain()
>  cpupool_rm_domain()
>
> This patch does that, and it's considered worthwhile,
> as it fixes a bug, even if only a latent one.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

As the cpupool bookkeeping is very closely related to the scheduler
bookkeeping, how about having the sched_*_domain() functions involve the
cpupool_*_domain() functions?

That way it can't get out of order like this in the future.

~Andrew


* Re: [PATCH 1/2] xen: fix a (latent) cpupool-related race during domain destroy
  2016-07-14  9:37   ` Andrew Cooper
@ 2016-07-14 14:54     ` Dario Faggioli
  2016-07-14 14:55       ` Andrew Cooper
  0 siblings, 1 reply; 6+ messages in thread
From: Dario Faggioli @ 2016-07-14 14:54 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel; +Cc: Juergen Gross, George Dunlap, Jan Beulich



On Thu, 2016-07-14 at 10:37 +0100, Andrew Cooper wrote:
> On 14/07/16 07:41, Dario Faggioli wrote:
> > 
> > So, during domain destruction, we do:
> >  cpupool_rm_domain()    [ in domain_destroy() ]
> >  sched_destroy_domain() [ in complete_domain_destroy() ]
> > 
> > Therefore, there's a window during which, from the
> > scheduler's point of view, a domain is still there, but
> > without it being part of any cpupool.
> > 
> > [...]
> >
> > On the other hand, cpupool_rm_domain() "only" does
> > cpupool related bookkeeping, and there's no harm
> > postponing it a little bit.
> > 
> > Finally, considering that, during domain initialization,
> > we do:
> >  cpupool_add_domain()
> >  sched_init_domain()
> > 
> > It looks like it makes much more sense for the domain
> > destroy path to look like the opposite of it, i.e.:
> >  sched_destroy_domain()
> >  cpupool_rm_domain()
> > 
> > This patch does that, and it's considered worthwhile,
> > as it fixes a bug, even if only a latent one.
> > 
> > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> As the cpupool bookkeeping is very closely related to the scheduler
> bookkeeping, how about having the sched_*_domain() functions involve
> the
> cpupool_*_domain() functions?
> 
That's certainly a good point.

At minimum, I certainly can (and probably should have :-P) put a couple
of ASSERT()-s in place.

However, both cpupool_add_domain() and cpupool_rm_domain() are called
only once, and I guess I can make them go into sched_init_domain() and
sched_destroy_domain(), respectively.
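
Something along these lines is what I have in mind (very
rough sketch, with macro and function names written down
from memory, so to be double checked against the tree; in
particular, sched_init_domain() would need to grow a poolid
parameter for this to work):

  void sched_destroy_domain(struct domain *d)
  {
      ASSERT(d->cpupool != NULL || is_idle_domain(d));

      SCHED_STAT_CRANK(dom_destroy);
      TRACE_1D(TRC_SCHED_DOM_REM, d->domain_id);
      SCHED_OP(DOM2OP(d), destroy_domain, d);

      /* The scheduler is done with the domain: drop the cpupool
       * bookkeeping here too, so the two can't get out of order. */
      cpupool_rm_domain(d);
  }

  int sched_init_domain(struct domain *d, int poolid)
  {
      int ret;

      /* Cpupool bookkeeping first, mirroring the destroy path above. */
      if ( (ret = cpupool_add_domain(d, poolid)) != 0 )
          return ret;

      SCHED_STAT_CRANK(dom_init);
      TRACE_1D(TRC_SCHED_DOM_ADD, d->domain_id);
      return SCHED_OP(DOM2OP(d), init_domain, d);
  }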

Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



* Re: [PATCH 1/2] xen: fix a (latent) cpupool-related race during domain destroy
  2016-07-14 14:54     ` Dario Faggioli
@ 2016-07-14 14:55       ` Andrew Cooper
  0 siblings, 0 replies; 6+ messages in thread
From: Andrew Cooper @ 2016-07-14 14:55 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel; +Cc: Juergen Gross, George Dunlap, Jan Beulich

On 14/07/16 15:54, Dario Faggioli wrote:
> On Thu, 2016-07-14 at 10:37 +0100, Andrew Cooper wrote:
>> On 14/07/16 07:41, Dario Faggioli wrote:
>>> So, during domain destruction, we do:
>>>  cpupool_rm_domain()    [ in domain_destroy() ]
>>>  sched_destroy_domain() [ in complete_domain_destroy() ]
>>>
>>> Therefore, there's a window during which, from the
>>> scheduler's point of view, a domain is still there, but
>>> without it being part of any cpupool.
>>>
>>> [...]
>>>
>>> On the other hand, cpupool_rm_domain() "only" does
>>> cpupool related bookkeeping, and there's no harm
>>> postponing it a little bit.
>>>
>>> Finally, considering that, during domain initialization,
>>> we do:
>>>  cpupool_add_domain()
>>>  sched_init_domain()
>>>
>>> It looks like it makes much more sense for the domain
>>> destroy path to look like the opposite of it, i.e.:
>>>  sched_destroy_domain()
>>>  cpupool_rm_domain()
>>>
>>> This patch does that, and it's considered worthwhile,
>>> as it fixes a bug, even if only a latent one.
>>>
>>> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
>> As the cpupool bookkeeping is very closely related to the scheduler
>> bookkeeping, how about having the sched_*_domain() functions involve
>> the
>> cpupool_*_domain() functions?
>>
> That's certainly a good point.
>
> At minimum, I certainly can (and probably should have :-P) put a couple
> of ASSERT()-s in place.
>
> However, both cpupool_add_domain() and cpupool_rm_domain() are called
> only once, and I guess I can make them go into sched_init_domain() and
> sched_destroy_domain(), respectively.

I think that would be the most robust solution.

~Andrew

