From: George Dunlap
Subject: Re: [PATCH] cpupools: retry cpupool-destroy if domain in cpupool is dying
Date: Wed, 14 May 2014 11:19:36 +0100
Message-ID: <53734338.6090100@eu.citrix.com>
In-Reply-To: <53735E690200007800012106@mail.emea.novell.com>
References: <1399895385-18894-1-git-send-email-juergen.gross@ts.fujitsu.com> <53733DE5.9060406@ts.fujitsu.com> <53735E690200007800012106@mail.emea.novell.com>
To: Jan Beulich, Juergen Gross
Cc: "xen-devel@lists.xen.org"
List-Id: xen-devel@lists.xenproject.org

On 05/14/2014 11:15 AM, Jan Beulich wrote:
>>>> On 14.05.14 at 11:56, wrote:
>> On 14.05.2014 11:16, George Dunlap wrote:
>>> On Mon, May 12, 2014 at 12:49 PM, Juergen Gross wrote:
>>>> When a cpupool is destroyed just after the last domain has been
>>>> stopped, the domain might already be removed from the domain list
>>>> without being removed from the cpupool.
>>>> It is easy to detect this situation and to return EAGAIN in this
>>>> case, which is already handled in libxc by doing a retry.
>>>
>>> OK, I hate to be picky over two lines, but it still seems to me like
>>> this is papering over issues instead of dealing with them properly.
>>> The real problem here is that for_each_domain_in_cpupool() doesn't
>>> actually go over every domain in the cpupool. Instead of making it
>>> so that it actually does, you're compensating for that fact in an
>>> ad-hoc fashion.
>>>
>>> Now as it happens, it looks like all the other current uses of
>>> for_each_domain_in_cpupool() work just fine if there are domains in
>>> the pool it doesn't see, as long as they're about to disappear.
>>> But we've already seen a bug caused by a situation where "don't see
>>> domains that are about to disappear" *does* actually cause a
>>> problem; working around it is just setting a trap for future
>>> developers to fall into. (And who knows, there may already be a bug
>>> we haven't discovered in the other invocations of
>>> for_each_domain_in_cpupool().)
>>
>> This isn't unique to for_each_domain_in_cpupool(). It is a problem
>> for all uses of for_each_domain() which are related to resources
>> freed only in complete_domain_destroy().
>
> Of which there shouldn't be that many, if any at all.
>
> What prevents cpupool_rm_domain() getting moved from
> complete_domain_destroy() to domain_destroy(), before the domain gets
> taken off the list? I actually assume that there are more things here
> that may not really need deferring until the last possible moment...

Well, the domain needs to be taken out of the scheduler, for one, which
is done by sched_destroy_domain(). That's where I stopped the
"dependency tracking", as it seemed somewhat likely that somewhere there
would be things that depended on having a valid scheduler. (That may not
be correct, but it would take a careful reading to convince yourself it
was true.)

 -George