* [PATCH 0/2] xen: cpupool (small) improvement and (latent) bug fix
@ 2016-07-14 6:41 Dario Faggioli
2016-07-14 6:41 ` [PATCH 1/2] xen: fix a (latent) cpupool-related race during domain destroy Dario Faggioli
2016-07-14 6:41 ` [PATCH 2/2] xen: cpupool: small optimization when moving between pools Dario Faggioli
0 siblings, 2 replies; 6+ messages in thread
From: Dario Faggioli @ 2016-07-14 6:41 UTC (permalink / raw)
To: xen-devel; +Cc: Juergen Gross, Andrew Cooper, George Dunlap, Jan Beulich
Hi,
A couple of small cpupool patches.
One is a small optimization, while the other fixes a nasty --although currently
only theoretical-- bug in the domain destruction path.
Regards, Dario
---
Dario Faggioli (2):
xen: fix a (latent) cpupool-related race during domain destroy
xen: cpupool: small optimization when moving between pools
xen/common/cpupool.c | 3 +++
xen/common/domain.c | 4 ++--
2 files changed, 5 insertions(+), 2 deletions(-)
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
* [PATCH 1/2] xen: fix a (latent) cpupool-related race during domain destroy
2016-07-14 6:41 [PATCH 0/2] xen: cpupool (small) improvement and (latent) bug fix Dario Faggioli
@ 2016-07-14 6:41 ` Dario Faggioli
2016-07-14 9:37 ` Andrew Cooper
2016-07-14 6:41 ` [PATCH 2/2] xen: cpupool: small optimization when moving between pools Dario Faggioli
1 sibling, 1 reply; 6+ messages in thread
From: Dario Faggioli @ 2016-07-14 6:41 UTC (permalink / raw)
To: xen-devel; +Cc: Juergen Gross, Andrew Cooper, George Dunlap, Jan Beulich

So, during domain destruction, we do:
  cpupool_rm_domain()     [ in domain_destroy() ]
  sched_destroy_domain()  [ in complete_domain_destroy() ]

Therefore, there's a window during which, from the
scheduler's point of view, a domain is still there, but
without it being part of any cpupool.

In fact, cpupool_rm_domain() does d->cpupool=NULL,
and we don't allow anything like that to hold for
any domain, with the only exception of the idle one.
And if we stumble upon such a domain, there are ASSERT()s
and BUG_ON()s that do trigger.

This never happens, right now, but only because none
of the functions containing one of those sanity checks
are called during the above described window. However,
Credit2 goes (during load balancing) through the list
of domains assigned to a certain runqueue, and not only
the ones that are running or runnable. If one of those
domains had cpupool_rm_domain() called upon itself
already, and we call one of those functions which checks
for d->cpupool!=NULL... Boom!

For example, while doing Credit2 development, calling
something similar to __vcpu_has_soft_affinity() from
balance_load() makes `xl shutdown <domid>' reliably
crash (which is how I discovered this).

On the other hand, cpupool_rm_domain() "only" does
cpupool related bookkeeping, and there's no harm
postponing it a little bit.

Finally, considering that, during domain initialization,
we do:
  cpupool_add_domain()
  sched_init_domain()

It looks like it makes much more sense for the domain
destroy path to look like the opposite of it, i.e.:
  sched_destroy_domain()
  cpupool_rm_domain()

This patch does that, and it's considered worthwhile, as it
fixes a bug, even if only a latent one.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: Juergen Gross <jgross@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
---
 xen/common/domain.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/common/domain.c b/xen/common/domain.c
index 42c07ee..f8096d3 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -811,6 +811,8 @@ static void complete_domain_destroy(struct rcu_head *head)
             destroy_waitqueue_vcpu(v);
     }
 
+    cpupool_rm_domain(d);
+
     grant_table_destroy(d);
 
     arch_domain_destroy(d);
@@ -868,8 +870,6 @@ void domain_destroy(struct domain *d)
 
     TRACE_1D(TRC_DOM0_DOM_REM, d->domain_id);
 
-    cpupool_rm_domain(d);
-
     /* Delete from task list and task hashtable. */
     spin_lock(&domlist_update_lock);
     pd = &domain_list;
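The sanity checks the changelog refers to are helpers that dereference d->cpupool
unconditionally, on the assumption that only the idle domain can be outside any
pool. The snippet below is only a sketch of that pattern, with an assumed helper
name and field, not a quote of the tree at that time:

/* Sketch: the kind of check that fires inside the window described above,
 * i.e., on a domain that has already gone through cpupool_rm_domain() but
 * not yet through sched_destroy_domain(). */
static inline cpumask_t *cpupool_domain_cpumask(struct domain *d)
{
    /* Only the idle domain is supposed to have d->cpupool == NULL... */
    ASSERT(d->cpupool != NULL);
    /* ...so this dereference is a crash for a half-destroyed domain. */
    return d->cpupool->cpu_valid;
}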
* Re: [PATCH 1/2] xen: fix a (latent) cpupool-related race during domain destroy
2016-07-14 6:41 ` [PATCH 1/2] xen: fix a (latent) cpupool-related race during domain destroy Dario Faggioli
@ 2016-07-14 9:37 ` Andrew Cooper
2016-07-14 14:54 ` Dario Faggioli
0 siblings, 1 reply; 6+ messages in thread
From: Andrew Cooper @ 2016-07-14 9:37 UTC (permalink / raw)
To: Dario Faggioli, xen-devel; +Cc: Juergen Gross, George Dunlap, Jan Beulich

On 14/07/16 07:41, Dario Faggioli wrote:
> So, during domain destruction, we do:
>   cpupool_rm_domain()     [ in domain_destroy() ]
>   sched_destroy_domain()  [ in complete_domain_destroy() ]
>
> Therefore, there's a window during which, from the
> scheduler's point of view, a domain is still there, but
> without it being part of any cpupool.
>
> In fact, cpupool_rm_domain() does d->cpupool=NULL,
> and we don't allow anything like that to hold for
> any domain, with the only exception of the idle one.
> And if we stumble upon such a domain, there are ASSERT()s
> and BUG_ON()s that do trigger.
>
> This never happens, right now, but only because none
> of the functions containing one of those sanity checks
> are called during the above described window. However,
> Credit2 goes (during load balancing) through the list
> of domains assigned to a certain runqueue, and not only
> the ones that are running or runnable. If one of those
> domains had cpupool_rm_domain() called upon itself
> already, and we call one of those functions which checks
> for d->cpupool!=NULL... Boom!
>
> For example, while doing Credit2 development, calling
> something similar to __vcpu_has_soft_affinity() from
> balance_load() makes `xl shutdown <domid>' reliably
> crash (which is how I discovered this).
>
> On the other hand, cpupool_rm_domain() "only" does
> cpupool related bookkeeping, and there's no harm
> postponing it a little bit.
>
> Finally, considering that, during domain initialization,
> we do:
>   cpupool_add_domain()
>   sched_init_domain()
>
> It looks like it makes much more sense for the domain
> destroy path to look like the opposite of it, i.e.:
>   sched_destroy_domain()
>   cpupool_rm_domain()
>
> This patch does that, and it's considered worthwhile, as it
> fixes a bug, even if only a latent one.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

As the cpupool bookkeeping is very closely related to the scheduler
bookkeeping, how about having the sched_*_domain() functions involve
the cpupool_*_domain() functions?

That way it can't get out-of-order like this in the future.

~Andrew
* Re: [PATCH 1/2] xen: fix a (latent) cpupool-related race during domain destroy
2016-07-14 9:37 ` Andrew Cooper
@ 2016-07-14 14:54 ` Dario Faggioli
2016-07-14 14:55 ` Andrew Cooper
0 siblings, 1 reply; 6+ messages in thread
From: Dario Faggioli @ 2016-07-14 14:54 UTC (permalink / raw)
To: Andrew Cooper, xen-devel; +Cc: Juergen Gross, George Dunlap, Jan Beulich

On Thu, 2016-07-14 at 10:37 +0100, Andrew Cooper wrote:
> On 14/07/16 07:41, Dario Faggioli wrote:
> >
> > So, during domain destruction, we do:
> >   cpupool_rm_domain()     [ in domain_destroy() ]
> >   sched_destroy_domain()  [ in complete_domain_destroy() ]
> >
> > Therefore, there's a window during which, from the
> > scheduler's point of view, a domain is still there, but
> > without it being part of any cpupool.
> >
> > [...]
> >
> > On the other hand, cpupool_rm_domain() "only" does
> > cpupool related bookkeeping, and there's no harm
> > postponing it a little bit.
> >
> > Finally, considering that, during domain initialization,
> > we do:
> >   cpupool_add_domain()
> >   sched_init_domain()
> >
> > It looks like it makes much more sense for the domain
> > destroy path to look like the opposite of it, i.e.:
> >   sched_destroy_domain()
> >   cpupool_rm_domain()
> >
> > This patch does that, and it's considered worthwhile, as it
> > fixes a bug, even if only a latent one.
> >
> > Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> As the cpupool bookkeeping is very closely related to the scheduler
> bookkeeping, how about having the sched_*_domain() functions involve
> the cpupool_*_domain() functions?
>
That's certainly a good point.

At minimum, I certainly can (and probably should have :-P) put a couple
of ASSERT()-s in place.

However, both cpupool_add_domain() and cpupool_rm_domain() are called
only once, and I guess I can make them go into sched_init_domain() and
sched_destroy_domain(), respectively.

Dario

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
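What is being agreed on here can be sketched roughly as follows. The signatures
are assumptions for illustration only (in particular, sched_init_domain() would
have to learn which pool the domain goes into), not the actual prototypes:

/* Sketch: let the scheduler hooks own the cpupool bookkeeping, so that
 * the add/remove steps can no longer be reordered by their callers. */
int sched_init_domain(struct domain *d, int poolid)
{
    int ret;

    /* Enter the pool first: the scheduler relies on d->cpupool being set. */
    ret = cpupool_add_domain(d, poolid);
    if ( ret )
        return ret;

    /* ... existing scheduler-side initialisation ... */
    return 0;
}

void sched_destroy_domain(struct domain *d)
{
    /* ... existing scheduler-side teardown ... */

    /* Leave the pool only once the scheduler is done with the domain. */
    cpupool_rm_domain(d);
}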
* Re: [PATCH 1/2] xen: fix a (latent) cpupool-related race during domain destroy
2016-07-14 14:54 ` Dario Faggioli
@ 2016-07-14 14:55 ` Andrew Cooper
0 siblings, 0 replies; 6+ messages in thread
From: Andrew Cooper @ 2016-07-14 14:55 UTC (permalink / raw)
To: Dario Faggioli, xen-devel; +Cc: Juergen Gross, George Dunlap, Jan Beulich

On 14/07/16 15:54, Dario Faggioli wrote:
> On Thu, 2016-07-14 at 10:37 +0100, Andrew Cooper wrote:
>> On 14/07/16 07:41, Dario Faggioli wrote:
>>> So, during domain destruction, we do:
>>>   cpupool_rm_domain()     [ in domain_destroy() ]
>>>   sched_destroy_domain()  [ in complete_domain_destroy() ]
>>>
>>> Therefore, there's a window during which, from the
>>> scheduler's point of view, a domain is still there, but
>>> without it being part of any cpupool.
>>>
>>> [...]
>>>
>>> On the other hand, cpupool_rm_domain() "only" does
>>> cpupool related bookkeeping, and there's no harm
>>> postponing it a little bit.
>>>
>>> Finally, considering that, during domain initialization,
>>> we do:
>>>   cpupool_add_domain()
>>>   sched_init_domain()
>>>
>>> It looks like it makes much more sense for the domain
>>> destroy path to look like the opposite of it, i.e.:
>>>   sched_destroy_domain()
>>>   cpupool_rm_domain()
>>>
>>> This patch does that, and it's considered worthwhile, as it
>>> fixes a bug, even if only a latent one.
>>>
>>> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
>> As the cpupool bookkeeping is very closely related to the scheduler
>> bookkeeping, how about having the sched_*_domain() functions involve
>> the cpupool_*_domain() functions?
>>
> That's certainly a good point.
>
> At minimum, I certainly can (and probably should have :-P) put a couple
> of ASSERT()-s in place.
>
> However, both cpupool_add_domain() and cpupool_rm_domain() are called
> only once, and I guess I can make them go into sched_init_domain() and
> sched_destroy_domain(), respectively.

I think that would be the most robust solution.

~Andrew
* [PATCH 2/2] xen: cpupool: small optimization when moving between pools
2016-07-14 6:41 [PATCH 0/2] xen: cpupool (small) improvement and (latent) bug fix Dario Faggioli
2016-07-14 6:41 ` [PATCH 1/2] xen: fix a (latent) cpupool-related race during domain destroy Dario Faggioli
@ 2016-07-14 6:41 ` Dario Faggioli
1 sibling, 0 replies; 6+ messages in thread
From: Dario Faggioli @ 2016-07-14 6:41 UTC (permalink / raw)
To: xen-devel; +Cc: Juergen Gross

If the domain is already where we want it to go, there's not
much to do indeed.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: Juergen Gross <jgross@suse.com>
---
 xen/common/cpupool.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c
index 5dacc61..9998394 100644
--- a/xen/common/cpupool.c
+++ b/xen/common/cpupool.c
@@ -232,6 +232,9 @@ static int cpupool_move_domain_locked(struct domain *d, struct cpupool *c)
 {
     int ret;
 
+    if ( unlikely(d->cpupool == c) )
+        return 0;
+
     d->cpupool->n_dom--;
     ret = sched_move_domain(d, c);
     if ( ret )
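For context, the locked helper is reached with the cpupool lock already held,
via a wrapper along the lines of the sketch below (an assumed shape, not a
quote of cpupool.c); the early return added above simply skips the n_dom
accounting and the sched_move_domain() call when the domain already lives in
the target pool:

/* Sketch of the assumed calling context for the locked helper. */
int cpupool_move_domain(struct domain *d, struct cpupool *c)
{
    int ret;

    spin_lock(&cpupool_lock);
    ret = cpupool_move_domain_locked(d, c); /* no-op if d->cpupool == c */
    spin_unlock(&cpupool_lock);

    return ret;
}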
end of thread, other threads:[~2016-07-14 14:56 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-14 6:41 [PATCH 0/2] xen: cpupool (small) improvement and (latent) bug fix Dario Faggioli
2016-07-14 6:41 ` [PATCH 1/2] xen: fix a (latent) cpupool-related race during domain destroy Dario Faggioli
2016-07-14 9:37   ` Andrew Cooper
2016-07-14 14:54     ` Dario Faggioli
2016-07-14 14:55       ` Andrew Cooper
2016-07-14 6:41 ` [PATCH 2/2] xen: cpupool: small optimization when moving between pools Dario Faggioli
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).