From mboxrd@z Thu Jan  1 00:00:00 1970
From: Juergen Gross <juergen.gross@ts.fujitsu.com>
Subject: Re: [PATCH] cpupools: retry cpupool-destroy if domain
 in cpupool is dying
Date: Wed, 14 May 2014 12:35:07 +0200
Message-ID: <537346DB.9070009@ts.fujitsu.com>
References: <1399895385-18894-1-git-send-email-juergen.gross@ts.fujitsu.com>	<CAFLBxZZK7ZWZm_Tu-VUhHD8rHi-YwL=ghmztvr39y8T6TwsDoQ@mail.gmail.com>	<53733DE5.9060406@ts.fujitsu.com>
	<53735E690200007800012106@mail.emea.novell.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xen.org>
In-Reply-To: <53735E690200007800012106@mail.emea.novell.com>
List-Unsubscribe: <http://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <http://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Jan Beulich <JBeulich@suse.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>, "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
List-Id: xen-devel@lists.xenproject.org

On 14.05.2014 12:15, Jan Beulich wrote:
>>>> On 14.05.14 at 11:56, <juergen.gross@ts.fujitsu.com> wrote:
>> On 14.05.2014 11:16, George Dunlap wrote:
>>> On Mon, May 12, 2014 at 12:49 PM, Juergen Gross
>>> <juergen.gross@ts.fujitsu.com> wrote:
>>>> When a cpupool is destroyed just after the last domain has been stopped the
>>>> domain might already be removed from the domain list without being removed
>>>> from the cpupool.
>>>> It is easy to detect this situation and to return EAGAIN in this case which
>>>> is already handled in libxc by doing a retry.
>>>
>>> OK, I hate to be picky over two lines, but it still seems to me like
>>> this is papering over issues instead of dealing with them properly.
>>> The real problem here is that "for_each_domain_in_cpupool()" doesn't
>>> actually go over every domain in the cpupool.  Instead of making it so
>>> that it actually does, you're compensating for that fact in an ad-hoc
>>> fashion.
>>>
>>> Now as it happens, it looks like all the other current uses of
>>> for_each_domain_in_cpupool() work just fine if there are domains in
>>> the pool it doesn't see, as long as they're about to disappear.  But
>>> we've already seen a bug caused because of a situation where "don't
>>> see domains that are about to disappear" *does* actually cause a
>>> problem; working around it is just setting a trap for future
>>> developers to fall into.  (And who knows, there may already be a bug
>>> we haven't discovered in the other invocations of
>>> for_each_domain_in_cpupool()).
>>
>> This isn't unique to for_each_domain_in_cpupool(). It is a problem for all
>> uses of for_each_domain() which are related to resources freed only in
>> complete_domain_destroy().
>
> Of which there shouldn't be that many, if any at all.
>
> What prevents cpupool_rm_domain() getting moved from
> complete_domain_destroy() to domain_destroy(), before the
> domain gets taken off the list? I actually assume that there are
> more things here that may not really need deferring until the
> last possible moment...

sched_destroy_vcpu() and sched_destroy_domain() have to happen before
cpupool_rm_domain(). This could be avoided if the domain would be moved to
cpupool0 in domain_destroy().

Hmm, doesn't sound too bad. This would be just symmetrical to domain
creation. What do you think?


Juergen

-- 
Juergen Gross                 Principal Developer Operating Systems
PSO PM&D ES&S SWE OS6                  Telephone: +49 (0) 89 62060 2932
Fujitsu                                   e-mail: juergen.gross@ts.fujitsu.com
Mies-van-der-Rohe-Str. 8                Internet: ts.fujitsu.com
D-80807 Muenchen                 Company details: ts.fujitsu.com/imprint.html