From: Juergen Gross <juergen.gross@ts.fujitsu.com>
To: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Subject: Re: [PATCH] cpupools: retry cpupool-destroy if domain in cpupool is dying
Date: Mon, 12 May 2014 13:31:27 +0200 [thread overview]
Message-ID: <5370B10F.2080308@ts.fujitsu.com> (raw)
In-Reply-To: <CAFLBxZa_140Hkhw6RXqK=UVTREsmZJavRWzH2kvA+K9O+G=ZYw@mail.gmail.com>
On 12.05.2014 12:50, George Dunlap wrote:
> On Fri, May 9, 2014 at 6:01 AM, Juergen Gross
> <juergen.gross@ts.fujitsu.com> wrote:
>> On 08.05.2014 17:10, George Dunlap wrote:
>>>
>>> On Wed, May 7, 2014 at 2:23 PM, Juergen Gross
>>> <juergen.gross@ts.fujitsu.com> wrote:
>>>>
>>>> On 07.05.2014 15:10, George Dunlap wrote:
>>>>>
>>>>>
>>>>> On Wed, May 7, 2014 at 8:52 AM, Juergen Gross
>>>>> <juergen.gross@ts.fujitsu.com> wrote:
>>>>>>
>>>>>>
>>>>>> When a cpupool is destroyed just after the last domain has been stopped
>>>>>> the
>>>>>> domain might already be removed from the cpupool without having
>>>>>> decremented
>>>>>> the domain count of the cpupool. This will result in rejection of the
>>>>>> cpupool-destroy operation.
>>>>>
>>>>>
>>>>>
>>>>> I'm a bit confused. What's the sched_move_domain() for, then? If
>>>>> we're going to handle "dying domains" by doing a retry, could we just
>>>>> get rid of it?
>>>>
>>>>
>>>>
>>>> The sched_move_domain() is still needed for cases where a domain stays
>>>> dying for a longer time, e.g. when a dom0 process is still referencing
>>>> some of it's memory pages. This may be a rare situation, but being unable
>>>> to use a physical cpu for another cpupool just because of this case is
>>>> worse than this little piece of code, IMO.
>>>
>>>
>>> And I take it there are times when the move fails for whatever reason?
>>
>>
>> ENOMEM for example.
>>
>>
>>> Could you add a comment explaining this above the for() loop then, for
>>> posterity?
>>
>>
>> Could you define 'this', please? The reason for the sched_move_domain()
>> is mentioned in the head comment of the function (zombie domains). The
>> possibility of a failing sched_move_domain() is obvious by the return
>> value checking.
>
> Oh, sorry -- I misunderstood the patch. I thought you were adding
> code to handle the case when sched_move_domain() failed -- but
> actually, you're handling the case where you go through the
> for_each_domain_in_cpupool() loop, successfully call
> sched_move_domain() on each such domain (and decrement n_dom each
> time), but still somehow at the end have a positive n_dom.
>
> Do I have it right now, or am I still confused?
>
> So there are times when a domain might not come up in the
> for_each_domain_in_cpupool() loop, but for some reason still be in he
> n_dom reference count?
>
> That doesn't seem like it should be allowed to happen; and at the
> moment I'm failing to see how that should happen, unless you have a
> race somewhere.
>
> Sorry if I'm just being really dense here, but from what I can tell:
> * for_each_domain_in_cpupool() iterates through the domain list,
> looking for domains such that d->cpupool == c
> * d->cpupool is only modified when the cpupool_lock is held
> * Whenever d->cpupool is modified, n_dom for the appropriate cpupools
> are also modified
> * In this function, you hold cpupool_lock
>
> So how is it that you have a situation where d->cpupool != c, but
> c->n_dom is counting that domain?
Sorry, my explanation above seems to be wrong, the patch is correct. I should
have written the complete patch when I discovered the problem, not only the
source modification (it took some time to verify the solution works).
This is the correct problem description:
When a domain is destroyed, it is removed from the domain_list first,
then it is removed from the cpupool. So for_each_domain_in_cpupool() can
miss the domain while n_dom isn't yet decremented. This scenario will
happen as long as there are references to the domain.
I'll update the patch accordingly.
Thanks for trying to understand :-)
Juergen
--
Juergen Gross Principal Developer Operating Systems
PSO PM&D ES&S SWE OS6 Telephone: +49 (0) 89 62060 2932
Fujitsu e-mail: juergen.gross@ts.fujitsu.com
Mies-van-der-Rohe-Str. 8 Internet: ts.fujitsu.com
D-80807 Muenchen Company details: ts.fujitsu.com/imprint.html
next prev parent reply other threads:[~2014-05-12 11:31 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-05-07 7:52 [PATCH] cpupools: retry cpupool-destroy if domain in cpupool is dying Juergen Gross
2014-05-07 13:10 ` George Dunlap
2014-05-07 13:23 ` Juergen Gross
2014-05-08 15:10 ` George Dunlap
2014-05-09 5:01 ` Juergen Gross
2014-05-12 10:50 ` George Dunlap
2014-05-12 10:54 ` George Dunlap
2014-05-12 11:31 ` Juergen Gross [this message]
-- strict thread matches above, loose matches on Subject: below --
2014-05-12 11:49 Juergen Gross
2014-05-14 9:16 ` George Dunlap
2014-05-14 9:48 ` George Dunlap
2014-05-14 9:50 ` George Dunlap
2014-05-14 12:28 ` Tim Deegan
2014-05-14 9:56 ` Juergen Gross
2014-05-14 10:15 ` Jan Beulich
2014-05-14 10:19 ` George Dunlap
2014-05-14 10:35 ` Juergen Gross
2014-05-14 12:28 ` Jan Beulich
2014-05-14 13:05 ` Juergen Gross
2014-05-14 13:13 ` Jan Beulich
2014-05-14 13:22 ` Juergen Gross
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5370B10F.2080308@ts.fujitsu.com \
--to=juergen.gross@ts.fujitsu.com \
--cc=George.Dunlap@eu.citrix.com \
--cc=xen-devel@lists.xen.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.