From: George Dunlap
Subject: Re: [PATCH] cpupools: retry cpupool-destroy if domain in cpupool is dying
Date: Wed, 14 May 2014 11:19:36 +0100
Message-ID: <53734338.6090100@eu.citrix.com>
In-Reply-To: <53735E690200007800012106@mail.emea.novell.com>
References: <1399895385-18894-1-git-send-email-juergen.gross@ts.fujitsu.com> <53733DE5.9060406@ts.fujitsu.com> <53735E690200007800012106@mail.emea.novell.com>
To: Jan Beulich, Juergen Gross
Cc: "xen-devel@lists.xen.org"
List-Id: xen-devel@lists.xenproject.org

On 05/14/2014 11:15 AM, Jan Beulich wrote:
>>>> On 14.05.14 at 11:56, wrote:
>> On 14.05.2014 11:16, George Dunlap wrote:
>>> On Mon, May 12, 2014 at 12:49 PM, Juergen Gross wrote:
>>>> When a cpupool is destroyed just after the last domain has been
>>>> stopped, the domain might already be removed from the domain list
>>>> without being removed from the cpupool.
>>>> It is easy to detect this situation and to return EAGAIN in this
>>>> case, which is already handled in libxc by doing a retry.
>>>
>>> OK, I hate to be picky over two lines, but it still seems to me like
>>> this is papering over issues instead of dealing with them properly.
>>> The real problem here is that for_each_domain_in_cpupool() doesn't
>>> actually go over every domain in the cpupool. Instead of making it
>>> so that it actually does, you're compensating for that fact in an
>>> ad-hoc fashion.
>>>
>>> Now as it happens, it looks like all the other current uses of
>>> for_each_domain_in_cpupool() work just fine if there are domains in
>>> the pool it doesn't see, as long as they're about to disappear.
>>> But we've already seen a bug caused by a situation where "don't see
>>> domains that are about to disappear" *does* actually cause a
>>> problem; working around it is just setting a trap for future
>>> developers to fall into. (And who knows, there may already be a bug
>>> we haven't discovered in the other invocations of
>>> for_each_domain_in_cpupool().)
>>
>> This isn't unique to for_each_domain_in_cpupool(). It is a problem
>> for all uses of for_each_domain() which are related to resources
>> freed only in complete_domain_destroy().
>
> Of which there shouldn't be that many, if any at all.
>
> What prevents cpupool_rm_domain() getting moved from
> complete_domain_destroy() to domain_destroy(), before the domain gets
> taken off the list? I actually assume that there are more things here
> that may not really need deferring until the last possible moment...

Well, the domain needs to be taken out of the scheduler, for one, which
is done by sched_destroy_domain(). That's where I stopped the
"dependency tracking", as it seemed somewhat likely that somewhere there
would be things that depended on having a valid scheduler. (That may not
be correct, but it would take a careful reading to convince yourself it
was true.)

 -George