From: Juergen Gross <juergen.gross@ts.fujitsu.com>
To: George Dunlap <dunlapg@umich.edu>
Cc: xen-devel@lists.xensource.com, Keir Fraser <keir.fraser@eu.citrix.com>
Subject: Re: Cpu pools discussion
Date: Tue, 28 Jul 2009 07:40:54 +0200
Message-ID: <4A6E8F66.1080506@ts.fujitsu.com>
In-Reply-To: <de76405a0907270820gd76458cs34354a61cc410acb@mail.gmail.com>

George Dunlap wrote:
> Keir (and community),
> 
> Any thoughts on Juergen Gross' patch on cpu pools?
> 
> As a reminder, the idea is to allow "pools" of cpus that would have
> separate schedulers.  Physical cpus and domains can be moved from one
> pool to another only by an explicit command.  The main purpose Fujitsu
> seems to have is to allow a simple machine "partitioning" that is more
> robust than using simple affinity masks.  Another potential advantage
> would be the ability to use different schedulers for different
> purposes.
> 
> For my part, it seems like they should be OK.  The main thing I don't
> like is the ugliness related to continue_hypercall_on_cpu(), described
> below.
> 
> Juergen, could you remind us what the advantages of pools in the
> hypervisor were, versus just having affinity masks (with maybe sugar in
> the toolstack)?

Sure.

Our main reason for introducing pools was that the current scheduler(s) cannot
schedule domains according to their weights once those domains are restricted
to a subset of the physical processors via pinning.
I think it is virtually impossible to find a general solution to this problem
without some sort of pooling (if somebody proves me wrong here, I'll gladly
take that "perfect" scheduler instead of pools :-) ).

So while pools were first introduced to fill a gap in functionality, they
bring some further benefits:
+ the possibility to use different schedulers for different domains on the
  same machine (do you remember the discussion about bcredit?). Zhigang has
  already posted a request for this feature.
+ fewer lock conflicts on huge machines with many processors
+ pools could be a good basis for NUMA-aware scheduling policies

> 
> Re the ugly part of the patch, relating to continue_hypercall_on_cpu():
> 
> Domains are assigned to a pool, so
> if continue_hypercall_on_cpu() is called for a cpu not in the domain's
> pool, you can't just run it normally.  Juergen's solution (IIRC) was to
> pause all domains in the other pool, temporarily move the cpu in
> question to the calling domain's pool, finish the hypercall, then move
> the cpu in question back to the other pool.
> 
> Since there are a lot of antecedents in that, let's take an example:
> 
> Two pools; Pool A has cpus 0 and 1, pool B has cpus 2 and 3.
> 
> Domain 0 is running in pool A, domain 1 is running in pool B.
> 
> Domain 0 calls "continue_hypercall_on_cpu()" for cpu 2.
> 
> Cpu 2 is in pool B, so Juergen's patch:
>  * Pauses domain 1
>  * Moves cpu 2 to pool A
>  * Finishes the hypercall
>  * Moves cpu 2 back to pool B
>  * Unpauses domain 1
> 
> That seemed a bit ugly to me, but I'm not familiar enough with the use
> cases or the code to know if there's a cleaner solution.

Some thoughts on this topic:

The continue_hypercall_on_cpu() function is needed on x86 for loading new
microcode into the processor. The source buffer holding the new microcode is
located in dom0 memory, so dom0 has to run on the physical processor the new
code is loaded into (otherwise the buffer wouldn't be accessible).
We could avoid continue_hypercall_on_cpu() entirely if the microcode were
copied into a hypervisor buffer and loaded via on_selected_cpus() instead.
The other users (cpu hotplug and acpi_enter_sleep) would have to switch to
other solutions as well.

BTW: continue_hypercall_on_cpu() exists only on x86, and it isn't really much
better than my use of it:
- remember old pinning state of current vcpu
- pin it temporarily to the cpu it should continue on
- continue the hypercall
- remove temporary pinning
- re-establish old pinning (if any)
Pretty much the same as my solution above ;-)

So I would suggest eliminating continue_hypercall_on_cpu() completely if you
are uneasy with my solution.


Juergen

-- 
Juergen Gross                 Principal Developer Operating Systems
TSP ES&S SWE OS6                       Telephone: +49 (0) 89 636 47950
Fujitsu Technology Solutions              e-mail: juergen.gross@ts.fujitsu.com
Otto-Hahn-Ring 6                        Internet: ts.fujitsu.com
D-81739 Muenchen                 Company details: ts.fujitsu.com/imprint.html
