From: Nate Studer <nate.studer@dornerworks.com>
To: Juergen Gross <juergen.gross@ts.fujitsu.com>,
Dario Faggioli <dario.faggioli@citrix.com>
Cc: George Dunlap <george.dunlap@eu.citrix.com>,
Keir Fraser <keir@xen.org>, Jan Beulich <JBeulich@suse.com>,
xen-devel@lists.xen.org
Subject: Re: [Patch] Call sched_destroy_domain before cpupool_rm_domain.
Date: Mon, 4 Nov 2013 10:22:30 -0500
Message-ID: <5277BBB6.5050604@dornerworks.com>
In-Reply-To: <52776FBC.50800@ts.fujitsu.com>
On 11/4/2013 4:58 AM, Juergen Gross wrote:
> On 04.11.2013 10:26, Dario Faggioli wrote:
>> On lun, 2013-11-04 at 07:30 +0100, Juergen Gross wrote:
>>> On 04.11.2013 04:03, Nathan Studer wrote:
>>>> From: Nathan Studer <nate.studer@dornerworks.com>
>>>>
>>>> The domain destruction code removes a domain from its cpupool
>>>> before attempting to destroy its scheduler information. Since
>>>> the scheduler framework uses the domain's cpupool information
>>>> to decide on which scheduler ops to use, this results in the
>>>> wrong scheduler's destroy domain function being called
>>>> when the cpupool scheduler and the initial scheduler are
>>>> different.
>>>>
>>>> Correct this by destroying the domain's scheduling information
>>>> before removing it from the pool.
>>>>
>>>> Signed-off-by: Nathan Studer <nate.studer@dornerworks.com>
>>>
>>> Reviewed-by: Juergen Gross <juergen.gross@ts.fujitsu.com>
>>>
>> I think this is a candidate for backports too, isn't it?
>>
>> Nathan, what was happening without this patch? Are you able to quickly
>> figure out which previous Xen versions suffer from the same bug?
Various things.

If I used the credit scheduler in Pool-0 and the arinc653 scheduler in the
other cpupool, it would do one of the following:
1. Hit a BUG_ON in the arinc653 scheduler.
2. Hit an assert in the scheduling framework code.
3. Crash in the credit scheduler's csched_free_domdata function.

The last of these clued me in that the wrong scheduler's destroy function was
somehow being called.

If I used the credit2 scheduler in the other pool, I would only ever see the
third failure. Similarly, if I used the sedf scheduler in the other pool, I
would only ever see the third failure. However, when using the sedf scheduler I
would have to create and destroy the domain twice, instead of just once.
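
To make the ordering problem concrete, here is a minimal sketch in C. It is a
simplified model and not the actual Xen code: every type and function name in
it (sched_ops, pool, ops_for, destroy_buggy, destroy_fixed, and so on) is
made up for illustration. It assumes only what the patch description states,
namely that the framework picks the scheduler ops from the domain's pool and
falls back to the initial scheduler once the domain has no pool.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified model; all names here are hypothetical, not Xen's own. */
struct domain;

struct sched_ops {
    const char *name;
    void (*destroy_domain)(struct domain *d);
};

struct pool {
    const struct sched_ops *sched;   /* scheduler running in this pool */
};

struct domain {
    struct pool *pool;               /* NULL once removed from its pool */
};

/* Name of the scheduler whose destroy hook ran most recently. */
static const char *last_destroy = "none";

static void credit_destroy(struct domain *d)   { (void)d; last_destroy = "credit"; }
static void arinc653_destroy(struct domain *d) { (void)d; last_destroy = "arinc653"; }

static const struct sched_ops credit_ops   = { "credit", credit_destroy };
static const struct sched_ops arinc653_ops = { "arinc653", arinc653_destroy };

/* The initial scheduler, used whenever a domain has no pool. */
static const struct sched_ops *const initial_ops = &credit_ops;

/* Choose the ops table from the domain's pool, as the framework is described to do. */
static const struct sched_ops *ops_for(const struct domain *d)
{
    return d->pool ? d->pool->sched : initial_ops;
}

static void pool_remove(struct domain *d)   { d->pool = NULL; }
static void sched_destroy(struct domain *d) { ops_for(d)->destroy_domain(d); }

/* Buggy order: the pool is gone by the time the ops table is looked up. */
const char *destroy_buggy(struct domain *d)
{
    pool_remove(d);
    sched_destroy(d);
    return last_destroy;
}

/* Fixed order: destroy the scheduler data while the pool is still known. */
const char *destroy_fixed(struct domain *d)
{
    sched_destroy(d);
    pool_remove(d);
    return last_destroy;
}
```

For a domain that lives in a pool running the arinc653 model, destroy_buggy()
ends up in the credit hook, while destroy_fixed() ends up in the arinc653 hook.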
>
> In theory this bug is present since 4.1.
>
> OTOH it will be hit only with arinc653 scheduler in a cpupool other than
> Pool-0. And I don't see how this is being supported by arinc653 today (pick_cpu
> will always return 0).
Correct, the arinc653 scheduler currently does not work with cpupools. We are
working on remedying that though, which is how I ran into this. I would have
just wrapped this patch in with the upcoming arinc653 ones, if I had not run
into the same issue with the other schedulers.
>
> All other schedulers will just call xfree() for the domain specific data (and
> maybe update some statistics, which is not critical).
The credit and credit2 schedulers do a bit more than that in their free_domdata
functions.
The credit scheduler frees the node_affinity_cpumask contained in the domain
data and the credit2 scheduler deletes a list element contained in the domain
data. Since with this bug they are accessing structures that do not belong to
them, bad things happen.
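
As a rough illustration of why this matters, the sketch below models two
schedulers whose per-domain data have different layouts: one owns a separately
allocated mask, the other sits on a linked list. All names are hypothetical,
and the owner tag is an assumption added purely so the mismatch can be detected
safely in a demo. Nothing in this thread suggests the real schedulers carry such
a tag, which is exactly why the wrong free function ends up freeing or
unlinking through pointers that were never its own.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical tag so this demo can detect a mismatch instead of crashing. */
enum owner { OWNER_A, OWNER_B };

/* Scheduler A: per-domain data owns a separately allocated mask. */
struct a_dom {
    enum owner owner;
    unsigned long *mask;
};

/* Scheduler B: per-domain data is an element of a doubly linked list. */
struct b_dom {
    enum owner owner;
    struct b_dom *prev, *next;
};

/* Circular list head for scheduler B's domains. */
static struct b_dom b_list = { OWNER_B, &b_list, &b_list };

void *a_alloc(void)
{
    struct a_dom *ad = malloc(sizeof *ad);
    if (!ad)
        return NULL;
    ad->owner = OWNER_A;
    ad->mask = calloc(1, sizeof *ad->mask);
    if (!ad->mask) {
        free(ad);
        return NULL;
    }
    return ad;
}

void *b_alloc(void)
{
    struct b_dom *bd = malloc(sizeof *bd);
    if (!bd)
        return NULL;
    bd->owner = OWNER_B;
    /* Insert right after the list head. */
    bd->prev = &b_list;
    bd->next = b_list.next;
    b_list.next->prev = bd;
    b_list.next = bd;
    return bd;
}

/* Returns 0 on success, -1 if the data belongs to another scheduler. */
int a_free(void *data)
{
    if (*(const enum owner *)data != OWNER_A)
        return -1;              /* without the tag: free() of a bogus pointer */
    struct a_dom *ad = data;
    free(ad->mask);
    free(ad);
    return 0;
}

/* Returns 0 on success, -1 if the data belongs to another scheduler. */
int b_free(void *data)
{
    if (*(const enum owner *)data != OWNER_B)
        return -1;              /* without the tag: unlink through bogus pointers */
    struct b_dom *bd = data;
    bd->prev->next = bd->next;
    bd->next->prev = bd->prev;
    free(bd);
    return 0;
}
```

Handing data from b_alloc() to a_free() is rejected here only because of the
tag; the matching b_free() call then succeeds and leaves the list consistent.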
With the credit scheduler in Pool-0, the result should be an invalid free and an
eventual crash.
With the credit2 scheduler in Pool-0, the effects might be a bit more
unpredictable. At best it should result in an invalid pointer dereference.
Likewise, since the other schedulers do not do this additional work, there would
probably be other issues if the sedf or arinc653 scheduler was running in Pool-0
and one of the credit schedulers was run in the other pool. I do not know
enough about the credit scheduler to make any predictions about what would
happen though.
>
>
> Juergen
>