Re: [Xen-devel] [PATCH RFC] xen: if on Xen, "flatten" the scheduling domain hierarchy

From: George Dunlap <george.dunlap@citrix.com>
To: Juergen Gross <jgross@suse.com>,
	Dario Faggioli <dario.faggioli@citrix.com>
Cc: "xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
	"Andrew Cooper" <Andrew.Cooper3@citrix.com>,
	"Luis R. Rodriguez" <mcgrof@do-not-panic.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	"David Vrabel" <david.vrabel@citrix.com>,
	Boris Ostrovsky <boris.ostrovsky@oracle.com>,
	Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Subject: Re: [Xen-devel] [PATCH RFC] xen: if on Xen, "flatten" the scheduling domain hierarchy
Date: Wed, 23 Sep 2015 11:23:46 +0100
Message-ID: <56027DB2.50203@citrix.com>
In-Reply-To: <56022C57.3000409@suse.com>

On 09/23/2015 05:36 AM, Juergen Gross wrote:
> On 09/22/2015 06:22 PM, George Dunlap wrote:
>> On 09/22/2015 05:42 AM, Juergen Gross wrote:
>>> One other thing I just discovered: there are other consumers of the
>>> topology sibling masks (e.g. topology_sibling_cpumask()) as well.
>>>
>>> I think we would want to avoid any optimizations based on those in
>>> drivers as well, not only in the scheduler.
>>
>> I'm beginning to lose the thread of the discussion here a bit.
>>
>> Juergen / Dario, could one of you summarize your two approaches, and the
>> (alleged) advantages and disadvantages of each one?
> 
> Okay, I'll have a try:
> 
> The problem we want to solve:
> -----------------------------
> 
> The Linux kernel gathers cpu topology data during boot via the CPUID
> instruction on each processor as it comes online. This data is
> primarily used by the scheduler to decide which cpu a thread should
> be migrated to when migration seems necessary. There are other users
> of the topology information in the kernel (e.g. some drivers try to
> do optimizations like core-specific queues/lists).
> 
> When started in a virtualized environment, the obtained data is next
> to useless or even wrong, as it reflects only the state at boot time.
> The hypervisor's scheduling of the (v)cpus changes the topology
> beneath the feet of the Linux kernel without this being reflected in
> the gathered topology information. So any decision based on that data
> is uninformed and possibly just wrong.
> 
> The minimal solution is to change the topology data in the kernel so
> that all cpus are regarded as equal in their relation to each other
> (e.g. when migrating a thread to another cpu, no cpu is preferred
> as a target).
> 
> The topology information of the CPUID instruction is, however, also
> accessible from user mode and might be used for licensing purposes by
> any user program (e.g. by limiting the software to run on a specific
> number of cores or sockets). So just mangling the data returned by
> CPUID in the hypervisor does not seem to be a general solution, though
> we might want to do it at least optionally in the future.
> 
> In the future we might want to support either dynamic topology updates
> or be able to tell the kernel to use some of the topology data, e.g.
> when pinning vcpus.
> 
> 
> Solution 1 (Dario):
> -------------------
> 
> Don't use the CPUID-derived topology information in the Linux scheduler,
> but let it use a simple "flat" topology by setting its own scheduler
> domain data under Xen.
> 
> Advantages:
> + very clean solution regarding the scheduler interface
> + scheduler decisions are based on a minimal data set
> + small patch
> 
> Disadvantages:
> - covers the scheduler only, drivers still use the "wrong" data
> - a little bit hacky regarding some NUMA architectures (needs either a
>   hook in the code dealing with that architecture or multiple scheduler
>   domain data overwrites)
> - future enhancements will make the solution less clean (they need
>   either duplicated scheduler domain data or some new hooks in the
>   scheduler domain interface)
> 
> 
> Solution 2 (Juergen):
> ---------------------
> 
> When booted as a Xen guest, modify the topology data built during boot,
> resulting in the same simple "flat" topology as in Dario's solution.
> 
> Advantages:
> + the simple topology is seen by all consumers of topology data as the
>   data itself is modified accordingly
> + small patch
> + future enhancements rather easy by selecting which data to modify
> 
> Disadvantages:
> - interface to scheduler not as clean as in Dario's approach
> - scheduler decisions are based on multiple layers of topology data
>   where one layer would be enough to describe the topology
> 
> 
> Dario, are you okay with this summary?

Thanks -- that's very helpful.
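
The difference between the two solutions summarized above can be
illustrated with a toy model. This is only a sketch under stated
assumptions, not code from either patch: all function names are
hypothetical, none of them are kernel APIs, and "flat" is modelled here
as every cpu being its own only sibling, which is one possible reading
of the summary.

```python
# Toy model only: every name here is hypothetical, not a kernel interface.

def cpuid_topology(n_cpus, threads_per_core):
    """Sibling sets as they might be derived from CPUID at boot.

    Assumption: consecutive cpu numbers share a core.
    """
    masks = {}
    for cpu in range(n_cpus):
        first = cpu - cpu % threads_per_core
        last = min(first + threads_per_core, n_cpus)
        masks[cpu] = set(range(first, last))
    return masks


def flat_topology(n_cpus):
    """Flat view (assumed): no cpu is a preferred target for another."""
    return {cpu: {cpu} for cpu in range(n_cpus)}


def solution_1(n_cpus, threads_per_core):
    """Scheduler gets its own flat data; the topology data is untouched."""
    topo = cpuid_topology(n_cpus, threads_per_core)
    return {"scheduler": flat_topology(n_cpus), "drivers": topo}


def solution_2(n_cpus, threads_per_core):
    """The topology data itself is flattened; all consumers read it.

    threads_per_core is accepted only to mirror solution_1; it is unused.
    """
    topo = flat_topology(n_cpus)
    return {"scheduler": topo, "drivers": topo}


s1 = solution_1(4, 2)
s2 = solution_2(4, 2)

# Under solution 1 a driver still sees cpu 1 as a sibling of cpu 0.
driver_sees_siblings_s1 = s1["drivers"][0] != {0}
# Under solution 2 the driver sees the same flat data as the scheduler.
driver_sees_siblings_s2 = s2["drivers"][0] != {0}
```

In this model the scheduler view is identical in both cases; only the
view of the other consumers (the drivers) differs, which matches the
first advantage listed for solution 2 and the first disadvantage listed
for solution 1.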

 -George
