* [DOC RFC] Heterogeneous Multi Processing Support in Xen
@ 2016-12-07 18:29 Dario Faggioli
From: Dario Faggioli @ 2016-12-07 18:29 UTC (permalink / raw)
To: Xen Devel
Cc: Jürgen Groß, Peng Fan, Stefano Stabellini,
George Dunlap, Andrew Cooper, Dario Faggioli, anastassios.nanos,
Jan Beulich, Peng Fan
% Heterogeneous Multi Processing Support in Xen
% Revision 1
\clearpage
# Basics
---------------- ------------------------
Status:          **Design Document**
Architecture(s): x86, arm
Component(s):    Hypervisor and toolstack
---------------- ------------------------
# Overview
HMP (Heterogeneous Multi Processing) and AMP (Asymmetric Multi Processing)
refer to systems where physical CPUs are not exactly equal. It may be that
they have different processing power, or capabilities, or that each is
specifically designed to run a particular system component.
Most of the time, the CPUs have different Instruction Set Architectures (ISA)
or Application Binary Interfaces (ABIs). But they may *just* be different
implementations of the same ISA, in which case they typically differ in
speed, power efficiency or the handling of special cases (e.g., errata).
An example is ARM big.LITTLE, which is, in fact, the use case that got the
discussion about HMP started. This document, however, is generic, and does
not target only big.LITTLE.
What needs proper Xen support are systems and use cases where virtual CPUs
cannot be seamlessly moved among all the physical CPUs. In fact, in these
cases, there must be a way to:
* decide and specify on what (set of) physical CPU(s), each vCPU can execute on;
* enforce that a vCPU that can only run on a certain (set of) pCPUs, is never
actually run anywhere else.
**N.B.:** it is becoming common to also refer to systems that have various
kinds of co-processors (from crypto engines to graphics hardware) integrated
with the CPUs on the same chip as AMP or HMP. This is not what this design
document is about.
# Classes of CPUs
A *class of CPUs* is defined as follows:
1. each pCPU in the system belongs to a class;
2. a class can consist of one or more pCPUs;
3. each pCPU can only be in one class;
4. CPUs belonging to the same class are homogeneous enough that a virtual
CPU that blocks/is preempted while running on a pCPU of a class can,
**seamlessly**, unblock/be scheduled on any pCPU of that same class;
5. when a virtual CPU is associated with a (set of) class(es) of CPUs, it
means that the vCPU can run on all the pCPUs belonging to the said
class(es).
So, for instance, suppose that in architecture Foobar two classes of CPUs
exist, class foo and class bar. If a virtual CPU running on CPU 0, which is of class
foo, blocks (or is preempted), it can, when it unblocks (or is selected by
the scheduler to run again), run on CPU 3, still of class foo, but not on
CPU 6, which is of class bar.
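To make rule 4 concrete, here is a minimal sketch (illustrative only, not
actual Xen code) of the check this rule implies, using the `cpu_to_class[]`
mapping introduced in the *Hypervisor* section below:

    /* A vCPU that blocked/was preempted on pCPU `from` can seamlessly
     * resume execution on pCPU `to` iff the two pCPUs are in the same
     * class. */
    static bool same_class(unsigned int from, unsigned int to)
    {
        return cpu_to_class[from] == cpu_to_class[to];
    }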
## Defining classes
How a class is defined, i.e., what the specific characteristics are that
determine which CPUs belong to which class, is highly architecture specific.
### x86
There is no HMP platform of relevance, for now, in x86 world. Therefore,
only one class will exist, and all the CPUs will be set to belong to it.
**TODO X86:** is this correct?
### ARM
**TODO ARM:** I know nothing about what specifically should be used to
form classes, so I'm deferring this to ARM people.
So far, in the original thread, the following ideas came up (well, there's
more, but I don't know enough about ARM to judge what is really relevant to
this topic):
* [Julien](https://lists.xenproject.org/archives/html/xen-devel/2016-09/msg02153.html)
"I don't think an hardcoded list of processor in Xen is the right solution.
There are many existing processors and combinations for big.LITTLE so it
will nearly be impossible to keep updated."
* [Julien](https://lists.xenproject.org/archives/html/xen-devel/2016-09/msg02256.html)
"Well, before trying to do something clever like that (i.e naming "big" and
"little"), we need to have upstreamed bindings available to acknowledge the
difference. AFAICT, it is not yet upstreamed for Device Tree and I don't
know any static ACPI tables providing the similar information."
* [Peng](https://lists.xenproject.org/archives/html/xen-devel/2016-09/msg02194.html)
"For how to differentiate cpus, I am looking the linaro eas cpu topology code"
# User details
## Classes of CPUs for the users
It will be possible, in a VM config file, to specify the (set of) class(es)
of each vCPU. This allows creating HMP VMs.
E.g., on ARM, it will be possible to create big.LITTLE VMs which, if run on
big.LITTLE hosts, could leverage the big.LITTLE support of the guest OS kernel
and tools.
For such purpose, a new option will be added to xl config file:
vcpus = "8"
vcpuclass = ["0-2:class0", "3,4:class1,class3", "5:class0, class2", "8:class4"]
with the following meaning:
* vCPUs 0, 1 and 2 can only run on pCPUs of class class0
* vCPUs 3 and 4 can run on pCPUs of class class1 **and** on pCPUs of class class3
* vCPU 5 can run on pCPUs of class class0 **and** on pCPUs of class class2
* for vCPUs 6 and 7, since they're not mentioned, the default applies
* vCPU 8 can only run on pCPUs of class class4
For the vCPUs for which no class is specified, default behavior applies.
**TODO:** note that I think it must be possible to associate more than
one class to a vCPU. This is expressed in the example above, and assumed
to be true throughout the document. It might be, though, that, at least at
early stages (see implementation phases below), we will enable only 1-to-1
mapping.
**TODO:** the default can be either:
1. the vCPU can run on any CPU of any class,
2. the vCPU can only run on a specific, arbitrarily decided, class (and I'd say
that should be class 0).
The former seems a better interface. It looks to me like the most natural
and least surprising, from the user's point of view, and the most future proof
(see phase 3 of implementation below).
The latter may be more practical, though. In fact, with the former, we risk
crashing (the guest or the hypervisor) if one creates a VM and forgets to
specify the vCPU classes --which does not look ideal.
It will be possible to gather information about what classes exist, and what
pCPUs belong to each class, by issuing the `xl info -n` command:
cpu_topology           :
cpu:    core    socket     node    class
  0:       0         1        0        0
  1:       0         1        0        1
  2:       1         1        0        2
  3:       1         1        0        3
  4:       9         1        0        3
  5:       9         1        0        0
  6:      10         1        0        1
  7:      10         1        0        2
  8:       0         0        1        3
  9:       0         0        1        3
 10:       1         0        1        1
 11:       1         0        1        0
 12:       9         0        1        1
 13:       9         0        1        0
 14:      10         0        1        2
 15:      10         0        1        2
**TODO:** do we want to keep using `-n`, or add another switch, like `-c` or
something? I'm not sure I like using `-n` as, e.g., on x86, this would most
of the time result in just a column full of `0`, and it may raise confusion
among users about what that actually means.
Also, do we want to print the class ids, or some more abstract class names?
(or support both, and have a way to decide which one to see)?
# Technical details
## Hypervisor
The hypervisor needs to know within which class each of the present CPUs
falls. At boot (or, in general, CPU bringup) time, while identifying the CPU,
a list of classes is constructed, and the mapping between each CPU and the
class it belongs to is established.
The list of classes is kept ordered from the more powerful to the less
powerful.
**TODO:** this has been [proposed by George](https://lists.xenproject.org/archives/html/xen-devel/2016-09/msg02212.html).
I like the idea; what do others think? If we agree on that, note that there
has been no discussion on defining what "more powerful" means, either on
x86 (although not really that interesting, for now, I'd say) or on ARM.
The mapping between CPUs and classes will be kept in memory in the following
data structures:
/* For each pCPU, the id of the class it belongs to. */
uint16_t cpu_to_class[NR_CPUS] __read_mostly;
/* For each class, the pCPUs belonging to it (NR_CPUS is just an upper
 * bound for the number of classes). */
cpumask_t class_to_cpumask[NR_CPUS] __read_mostly;
**TODO:** it's probably better to allocate the cpumask array dynamically,
to avoid wasting too much space.
**TODO:** if we want the ordering, structure needs to be kept ordered too
(or additional structures should be used for the purpose).
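For concreteness, here is a minimal sketch (illustrative, not actual Xen
code) of how these structures could be populated at CPU bringup; the
`arch_cpu_class()` hook is hypothetical, and stands for the architecture
specific identification logic discussed above:

    /* Register pCPU `cpu` in its class. arch_cpu_class() is assumed to
     * inspect the CPU (e.g., MIDR on ARM, CPUID on x86), allocate a new
     * class id if this is the first CPU of its kind, and return it. */
    static void cpu_class_add(unsigned int cpu)
    {
        uint16_t cls = arch_cpu_class(cpu);

        cpu_to_class[cpu] = cls;
        cpumask_set_cpu(cpu, &class_to_cpumask[cls]);
    }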
Each virtual CPU must know on what class(es) of CPUs it can run. Since a
vCPU can be associated with more than one class, the best way to keep track
of this information is a bitmap. That will be a new `cpumask` typed member
in `struct vcpu`, where having the i-th bit set means the vCPU can
run on CPUs of class i.
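For illustration, the new member could look like this (the name is made up):

    /* In struct vcpu: classes of pCPUs this vCPU can run on.
     * Bit i set means the vCPU can run on pCPUs of class i. */
    cpumask_t cpu_class_affinity;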
If a vCPU is found running on a pCPU of a class that is not associated to
the vCPU itself, an exception should be raised.
**TODO:** What kind? BUG_ON? Crash the guest? The guest would probably crash
--or become unreliable-- on its own, I guess.
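Whatever the reaction ends up being, the check itself would be cheap;
e.g. (sketch, reusing the illustrative names introduced above):

    /* On the scheduling path: the class of the pCPU we are about to run
     * on must be one of the classes associated with the vCPU. */
    ASSERT(cpumask_test_cpu(cpu_to_class[smp_processor_id()],
                            &v->cpu_class_affinity));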
Setting and getting the CPU class of a vCPU will happen via two new
hypercalls:
* `XEN_DOMCTL_setvcpuclass`
* `XEN_DOMCTL_setvcpuclass`
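A possible shape for the domctl payload, modeled on the existing
`XEN_DOMCTL_setvcpuaffinity` interface (purely illustrative, nothing is
final):

    struct xen_domctl_vcpuclass {
        uint32_t vcpu;                      /* IN: vCPU id              */
        struct xenctl_bitmap class_bitmap;  /* IN/OUT: bit i == class i */
    };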
Information about CPU classes will be propagated to the toolstack by adding a
new field to `xen_sysctl_cputopo`, which will become:
struct xen_sysctl_cputopo {
uint32_t core;
uint32_t socket;
uint32_t node;
uint32_t class;
};
For homogeneous and SMP systems, the value of the new class field will
be 0 for all the cores.
## Toolstack
It will be possible for the toolstack to retrieve from Xen the list of
existing CPU classes, their names, and the information about which
class each present CPU belongs to.
**TODO:** [George suggested](https://lists.xenproject.org/archives/html/xen-devel/2016-09/msg02212.html)
to allow a richer set of labels, at the toolstack level, and I like
the idea very much. It's not clear to me, though, in what component
this list of names, and the mapping between them and the classes as
they're known inside Xen should live.
Libxl and libxc interfaces will be introduced for associating a vCPU to
a (set of) class(es):
* `libxl_set_vcpuclass()`, `libxl_get_vcpuclass()`;
* `xc_vcpu_setclass()`, `xc_vcpu_getclass()`.
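The signatures below are only a sketch of what these could look like,
modeled on the existing affinity calls; nothing here is final:

    int libxl_set_vcpuclass(libxl_ctx *ctx, uint32_t domid,
                            uint32_t vcpuid, const libxl_bitmap *classmap);
    int libxl_get_vcpuclass(libxl_ctx *ctx, uint32_t domid,
                            uint32_t vcpuid, libxl_bitmap *classmap);
    int xc_vcpu_setclass(xc_interface *xch, uint32_t domid,
                         int vcpu, xc_cpumap_t classmap);
    int xc_vcpu_getclass(xc_interface *xch, uint32_t domid,
                         int vcpu, xc_cpumap_t classmap);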
In libxl, class information will be added in `struct libxl_cputopology`,
which is filled by `libxl_get_cpu_topology()`.
# Implementation
Implementation can proceed in phases.
## Phase 1
Class definition, identification and mapping of CPUs to classes will be
implemented inside Xen, and so will be the libxc and libxl interfaces
for retrieving such information.
Parsing of the new `vcpuclass` parameter will be implemented in `xl`. The
result of such parsing will then be used as if it were the hard-affinity of
the various vCPUs. That is, we will set the hard-affinity of each vCPU to
the pCPUs that are part of the class(es) the vCPU itself is being assigned
to, according to `vcpuclass`.
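In other words, the hard-affinity of a vCPU will be the union of the
cpumasks of its classes, along these lines (sketch; written, for brevity,
in terms of the hypervisor-side names introduced above):

    /* Turn a "bit i == class i" class bitmap into a hard-affinity mask. */
    static void classes_to_affinity(const cpumask_t *classes,
                                    cpumask_t *affinity)
    {
        unsigned int c;

        cpumask_clear(affinity);
        for_each_cpu(c, classes)
            cpumask_or(affinity, affinity, &class_to_cpumask[c]);
    }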
This would *Just Work(TM)*, as long as the user does not try to change the
hard-affinity during the VM lifetime (e.g., with `xl vcpu-pin`).
**TODO:** It may be useful, for preventing the above from happening, to add
another `xl` config option that, if set, disallows changing the affinity from
what it was at VM creation time (something like `immutable_affinity=1`).
Thoughts? I'm leaning toward doing that, as it may even be something useful
to have in other use cases.
### Phase 1.5
Library (libxc and libxl) calls and hypercalls that are necessary to associate
a class to the vCPUs will be implemented.
At which point, when parsing `vcpuclass` in `xl`, we will call both (with the
same bitmap as input):
* `libxl_set_vcpuclass()`
* `libxl_set_vcpuaffinity()`
`libxl__set_vcpuaffinity()` will be modified in such a way that, when setting
hard-affinity for a vCPU:
* it will get the CPU class(es) associated to the vCPU;
* it will check which pCPUs belong to the class(es);
* it will filter out, from the new hard-affinity being set, the pCPUs that
are not in the vCPU's class(es).
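The filtering step could look like this (sketch; `allowed` is assumed to
have been computed, by a hypothetical helper, as the union of the cpumaps
of the vCPU's classes):

    /* Drop, from the requested hard-affinity, every pCPU that is not
     * covered by the vCPU's class(es). i, nr_cpus, new_affinity and
     * allowed all come from the surrounding (hypothetical) context. */
    for (i = 0; i < nr_cpus; i++)
        if (libxl_bitmap_test(&new_affinity, i) &&
            !libxl_bitmap_test(&allowed, i))
            libxl_bitmap_reset(&new_affinity, i);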
As a safety measure, `vcpu_set_hard_affinity()` in Xen will also be modified
such that, if someone somehow manages to pass down a hard-affinity mask
which contains pCPUs outside the proper classes, it will error out
with -EINVAL.
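Something as simple as this would do (sketch; the helper computing the
union of the vCPU's classes is, again, hypothetical):

    /* In vcpu_set_hard_affinity(): reject affinity masks that reach
     * outside of the pCPUs covered by the vCPU's class(es). */
    cpumask_t allowed;

    vcpu_classes_to_cpumask(v, &allowed);
    if ( !cpumask_subset(affinity, &allowed) )
        return -EINVAL;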
### Phase 2
Inside Xen, the various schedulers will be modified to deal internally with
the fact that vCPUs can only run on pCPUs from the class(es) they are
associated with. This allows for more efficient implementation, and paves
the way for enabling more intelligent logic (e.g., for minimizing power
consumption) in *phase 3*.
Calling `libxl_set_vcpuaffinity()` from `xl` / libxl is therefore no longer
necessary and will be avoided (i.e., only `libxl_set_vcpuclass()` will be
called).
### Phase 3
Moving vCPUs between classes will be implemented. This means that, e.g.,
on ARM big.LITTLE, it will be possible for a vCPU to block on a big core
and wake up on a LITTLE core.
**TODO:** About what this takes, see [Julien's email](https://lists.xenproject.org/archives/html/xen-devel/2016-09/msg02345.html).
This means it will no longer be necessary to specify the class of the
vCPUs via `vcpuclass` in `xl`, although that will of course remain
supported. So:
1. if one wants (sticking with big.LITTLE as example) a big.LITTLE VM,
and wants to make sure that big vCPUs will run on big
pCPUs, and that LITTLE vCPUs will run on LITTLE pCPUs, she will use:
vcpus = "8"
vcpuclass = ["0-3:big", "4-7:little"]
2. if one does not care, and is happy to let the Xen scheduler decide
where to run the various vCPUs, in order, for instance, to be sure
to get the best power efficiency for the host as a whole, he can
just avoid specifying any `vcpuclass`, or do something like this:
vcpuclass = ["all:all"]
# Limitations
* While in *phase 1*, it won't be possible to use vCPU hard-affinity
for anything other than HMP support;
* until *phase 3*, since HMP support is basically the same as
setting hard-affinity, performance may not be ideal;
* until *phase 3*, vCPUs can't move between classes. This means,
for instance, that in the big.LITTLE world Xen's scheduler can't move a
vCPU running on a big core to a LITTLE core (e.g., to try to save power).
# Testing
Testing requires an actual AMP/HMP system. On such a system, we at least
want to:
* create a VM **without** specifying `vcpuclass` in its config file, and
check that the default policy is correctly applied to all vCPUs;
* create a VM **specifying** `vcpuclass` in its config file and check that
the classes are assigned to vCPUs appropriately;
* create a VM **specifying** `vcpuclass` in its config file and check that
the various vCPUs are not running on any pCPU outside of their respective
classes.
# Areas for improvement
* Make it possible to test even on non-HMP systems. That could be done by
making it possible to provide Xen with fake CPU classes for the system
CPUs (e.g., with boot time parameters);
* implement a way to view the class the vCPUs have been assigned (either as
part of the output of `xl vcpu-list`, or as a dedicated `xl` subcommand);
* make it possible to dynamically change the class of vCPUs at runtime, with
`xl` (either via a new parameter to `vcpu-pin` subcommand, or via a new
subcommand).
# Known issues
*TBD*.
# References
* [Asymmetric Multi Processing](https://en.wikipedia.org/wiki/Asymmetric_multiprocessing)
* [Heterogeneous Multi Processing](https://en.wikipedia.org/wiki/Heterogeneous_computing)
* [ARM big.LITTLE](https://www.arm.com/products/processors/technologies/biglittleprocessing.php)
# History
------------------------------------------------------------------------
Date        Revision  Version   Notes
----------  --------  --------  ----------------------------------------
2016-12-02  1                   RFC of design document
----------  --------  --------  ----------------------------------------
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
* Re: [DOC RFC] Heterogeneous Multi Processing Support in Xen
From: Juergen Gross @ 2016-12-08 6:12 UTC (permalink / raw)
To: Dario Faggioli, Xen Devel
Cc: Peng Fan, Stefano Stabellini, George Dunlap, Andrew Cooper,
anastassios.nanos, Jan Beulich, Peng Fan
On 07/12/16 19:29, Dario Faggioli wrote:
> Setting and getting the CPU class of a vCPU will happen via two new
> hypercalls:
>
> * `XEN_DOMCTL_setvcpuclass`
> * `XEN_DOMCTL_setvcpuclass`
XEN_DOMCTL_getvcpuclass
> ### Phase 2
>
> Inside Xen, the various schedulers will be modified to deal internally with
> the fact that vCPUs can only run on pCPUs from the class(es) they are
> associated with. This allows for more efficient implementation, and paves
> the way for enabling more intelligent logic (e.g., for minimizing power
> consumption) in *phase 3*.
>
> Calling `libxl_set_vcpuaffinity()` from `xl` / libxl is therefore no longer
> necessary and will be avoided (i.e., only `libxl_set_vcpuclass()` will be
> called).
Any idea how to avoid problems in the schedulers related to vcpus with
different weights? Remember, weights and pinning don't go well together,
that was the main reason for inventing cpupools. You should at least
name that problem. In case of vcpus being capable of running on pcpus of
more than one class, this problem might surface again.
Juergen
* Re: [DOC RFC] Heterogeneous Multi Processing Support in Xen
From: Dario Faggioli @ 2016-12-08 10:27 UTC (permalink / raw)
To: Juergen Gross, Xen Devel
Cc: Peng Fan, Stefano Stabellini, George Dunlap, Andrew Cooper,
anastassios.nanos, Jan Beulich, Peng Fan
On Thu, 2016-12-08 at 07:12 +0100, Juergen Gross wrote:
> On 07/12/16 19:29, Dario Faggioli wrote:
> >
> > Setting and getting the CPU class of a vCPU will happen via two new
> > hypercalls:
> >
> > * `XEN_DOMCTL_setvcpuclass`
> > * `XEN_DOMCTL_setvcpuclass`
>
> XEN_DOMCTL_getvcpuclass
>
Oops, thanks.
> > ### Phase 2
> >
> > Inside Xen, the various schedulers will be modified to deal
> > internally with
> > the fact that vCPUs can only run on pCPUs from the class(es) they
> > are
> > associated with. This allows for more efficient implementation, and
> > paves
> > the way for enabling more intelligent logic (e.g., for minimizing
> > power
> > consumption) in *phase 3*.
> >
> Any idea how to avoid problems in the schedulers related to vcpus
> with
> different weights?
>
Sure: use Credit2! :-P
And I'm not joking (not entirely, at least), as the alternative is to
re-engineer significantly the algorithm inside Credit, which I'm not
sure is doable or worthwhile, especially considering we have
alternatives.
> Remember, weights and pinning don't go well together,
> that was the main reason for inventing cpupools. You should at least
> name that problem.
>
Yes, that's true. I will add a paragraph about it.
Thanks and Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
* Re: [DOC RFC] Heterogeneous Multi Processing Support in Xen
From: Juergen Gross @ 2016-12-08 10:38 UTC (permalink / raw)
To: Dario Faggioli, Xen Devel
Cc: Peng Fan, Stefano Stabellini, George Dunlap, Andrew Cooper,
anastassios.nanos, Jan Beulich, Peng Fan
On 08/12/16 11:27, Dario Faggioli wrote:
> On Thu, 2016-12-08 at 07:12 +0100, Juergen Gross wrote:
>> On 07/12/16 19:29, Dario Faggioli wrote:
>>> ### Phase 2
>>>
>>> Inside Xen, the various schedulers will be modified to deal
>>> internally with
>>> the fact that vCPUs can only run on pCPUs from the class(es) they
>>> are
>>> associated with. This allows for more efficient implementation, and
>>> paves
>>> the way for enabling more intelligent logic (e.g., for minimizing
>>> power
>>> consumption) in *phase 3*.
>>>
>> Any idea how to avoid problems in the schedulers related to vcpus
>> with
>> different weights?
>>
> Sure: use Credit2! :-P
>
> And I'm not joking (not entirely, at least), as the alternative is to
> re-engineer significantly the algorithm inside Credit, which I'm not
> sure is doable or worthwhile, especially considering we have
> alternatives.
So you really solved the following problem in credit2?
You have three domains with 2 vcpus each and different weights. Run them
on 3 physical cpus with following pinning:
dom1: pcpu 1 and 2
dom2: pcpu 2 and 3
dom3: pcpu 1 and 3
How do you decide which vcpu to run on which pcpu for how long?
Juergen
>
>> Remember, weights and pinning don't go well together,
>> that was the main reason for inventing cpupools. You should at least
>> name that problem.
>>
> Yes, that's true. I will add a paragraph about it.
>
> Thanks and Regards,
> Dario
> --
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> -----------------------------------------------------------------
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
* Re: [DOC RFC] Heterogeneous Multi Processing Support in Xen
From: Dario Faggioli @ 2016-12-08 21:45 UTC (permalink / raw)
To: Juergen Gross, Xen Devel
Cc: Peng Fan, Stefano Stabellini, George Dunlap, Andrew Cooper,
anastassios.nanos, Jan Beulich, Peng Fan
On Thu, 2016-12-08 at 11:38 +0100, Juergen Gross wrote:
> On 08/12/16 11:27, Dario Faggioli wrote:
> > On Thu, 2016-12-08 at 07:12 +0100, Juergen Gross wrote:
> > > Any idea how to avoid problems in the schedulers related to vcpus
> > > with
> > > different weights?
> > >
> > Sure: use Credit2! :-P
> >
> > And I'm not joking (not entirely, at least), as the alternative is
> > to
> > re-engineer significantly the algorithm inside Credit, which I'm
> > not
> > sure is doable or worthwhile, especially considering we have
> > alternatives.
>
> So you really solved the following problem in credit2?
>
So, pinning will always _affect_ scheduling; that is actually its goal.
And in fact, it really should be used when there is no alternative, or
when the scenario is understood well enough that its effects are known
(or at least known to be beneficial for the workload running on the
host).
In Credit2, weights are used to make a vCPU burn credits faster or slower
than the other vCPUs, while in Credit1 the algorithm is much more
complex. Also, in Credit2, everything is computed per-runqueue. Pinning
of course interferes, but should really be less disruptive than in
Credit1.
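Just to sketch the idea (this is not the actual Credit2 code):

    /* Credit2's approach, roughly: elapsed time is scaled by the
     * inverse of the weight when burning credits, so a vCPU with twice
     * the weight burns credit at half the rate, and hence gets (close
     * to) twice the CPU time. */
    v->credit -= delta_exec * max_weight / v->weight;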
All this being said, I was not yet around when you came up with the
idea that pinning was disturbing weighted fairness, so I'm not sure
what the original argument was... I'll go back and check the email
conversation in the archive. And again, whenever one can use
cpupools, that should be the preferred solution, but there are
situations where that's just not suitable, and we need pinning.
This case is a little bit border-line. Sure, using pinning is not ideal,
and in fact it's only happening in the initial stages. When actually
modifying the scheduler, we will, in Credit2, do something like having
one runqueue per class (or more, but certainly not any runqueues that
"cross" classes, as that would not work), which puts us in a pretty
decent situation, I think. For Credit, let's see, but I'm afraid we
won't be able to guarantee much more than technical correctness (i.e.,
not scheduling on forbidden classes).
> You have three domains with 2 vcpus each and different weights. Run
> them
> on 3 physical cpus with following pinning:
>
> dom1: pcpu 1 and 2
> dom2: pcpu 2 and 3
> dom3: pcpu 1 and 3
>
> How do you decide which vcpu to run on which pcpu for how long?
>
Ok, it was a public holiday here today, so I did not really have time
to think about this example. And tomorrow I'm on PTO. I'll look at it
closely on Monday.
Thanks and Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
* Re: [DOC RFC] Heterogeneous Multi Processing Support in Xen
From: Dario Faggioli @ 2016-12-15 18:41 UTC (permalink / raw)
To: Juergen Gross, Xen Devel
Cc: Peng Fan, Stefano Stabellini, George Dunlap, Andrew Cooper,
anastassios.nanos, Jan Beulich, Peng Fan
On Thu, 2016-12-08 at 11:38 +0100, Juergen Gross wrote:
> So you really solved the following problem in credit2?
>
> You have three domains with 2 vcpus each and different weights. Run
> them
> on 3 physical cpus with following pinning:
>
> dom1: pcpu 1 and 2
> dom2: pcpu 2 and 3
> dom3: pcpu 1 and 3
>
> How do you decide which vcpu to run on which pcpu for how long?
>
Ok, back to this (sorry, a bit later than I'd hoped). So, I tried
to think a bit about the described scenario, but could not figure out what
you are hinting at.
There are missing pieces of information, such as what the vcpus do, and
what exactly the weights are (besides being different).
Therefore, I decided to put together a quick experiment. I've created
the domains, set up all their vcpus to run cpu-hog tasks, picked a
configuration of my choice for the weights, and run them under both
Credit1 and Credit2.
It's a very simple test, but it will hopefully be helpful in
understanding the situation better.
Here's the result.
On Credit1, equal weights, unpinned (i.e., plenty of pCPUs available):
NAME    CPU(%)    [1]
vm1     199.9
vm2     199.9
vm3     199.9
Pinning as you suggest (i.e., to 3 pCPUs):
NAME    CPU(%)    [2]
vm1     149.0
vm2      66.2
vm3      84.8
Changing the weights:
Name    ID    Weight    Cap    [3]
vm1      8       256      0
vm2      9       512      0
vm3      6      1024      0
NAME    CPU(%)
vm1     100.0
vm2     100.0
vm3     100.0
So, here in Credit1, things are ok when there's no pinning in place [1]. As soon as we pin, _even_without_ touching the weights [2], things become *crazy*. In fact, there's absolutely no reason why the CPU% numbers should look the way they do in [2].
This does not surprise me much, though. Credit1's load balancer basically moves vcpus around in a pseudo-random fashion, and having to enforce pinning constraints makes things even more unpredictable.
Then comes the amusing part. At this point, I wonder if I haven't done something wrong in setting up the experiments... because things really look too funny. :-O
In fact, for some reason, changing the weights as shown in [3] causes the CPU% numbers to fluctuate a bit (not visible above) and then stabilize at 100%. That may look like an improvement, but it certainly does not reflect the chosen set of weights.
So, I'd say you were right. Or, actually, things are even worse than you said: in Credit1, it's not only that pinning and weights do not play well together; even pinning alone works pretty badly.
Now, on Credit2, equal weights, unpinned (i.e., plenty of pCPUs
available):
NAME    CPU(%)    [4]
vm1     199.9
vm2     199.9
vm3     199.9
Pinning as you suggest (i.e., to 3 pCPUs):
NAME    CPU(%)    [5]
vm1     100.0
vm2     100.1
vm3     100.0
Changing the weights:
Name    ID    Weight    [6]
vm1      2       256
vm2      3       512
vm3      6      1024
NAME    CPU(%)
vm1      44.1
vm2      87.2
vm3     168.7
Which looks nearly *perfect* to me. :-)
In fact, with no constraints [4], each VM gets the 200% share it's
asking for.
When only 3 pCPUs can be used, by means of pinning [5], each VM gets
its fair share of 100%.
When setting up the weights in such a way that vm2 should get 2x the CPU
time of vm1, and vm3 2x the CPU time of vm2 [6], things look,
well, exactly like that! :-P
So, since I did not fully understand the problem, I'm not sure whether
this really answers your question, but it looks to me like it actually
could! :-D
For sure, it puts Credit2 in rather a good light :-P.
Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
* Re: [DOC RFC] Heterogeneous Multi Processing Support in Xen
From: Juergen Gross @ 2016-12-16 7:44 UTC (permalink / raw)
To: Dario Faggioli, Xen Devel
Cc: Peng Fan, Stefano Stabellini, George Dunlap, Andrew Cooper,
anastassios.nanos, Jan Beulich, Peng Fan
On 15/12/16 19:41, Dario Faggioli wrote:
> On Thu, 2016-12-08 at 11:38 +0100, Juergen Gross wrote:
>> So you really solved the following problem in credit2?
>>
>> You have three domains with 2 vcpus each and different weights. Run
>> them
>> on 3 physical cpus with following pinning:
>>
>> dom1: pcpu 1 and 2
>> dom2: pcpu 2 and 3
>> dom3: pcpu 1 and 3
>>
>> How do you decide which vcpu to run on which pcpu for how long?
>>
> Ok, back to this (sorry, a bit later than I'd hoped). So, I tried
> to think a bit about the described scenario, but could not figure out what
> you are hinting at.
>
> There are missing pieces of information, such as what the vcpus do, and
> what exactly the weights are (besides being different).
>
> Therefore, I decided to put together a quick experiment. I've created
> the domains, set up all their vcpus to run cpu-hog tasks, picked a
> configuration of my choice for the weights, and run them under both
> Credit1 and Credit2.
>
> It's a very simple test, but it will hopefully be helpful in
> understanding the situation better.
>
> Here's the result.
>
> On Credit1, equal weights, unpinned (i.e., plenty of pCPUs available):
> NAME CPU(%) [1]
> vm1 199.9
> vm2 199.9
> vm3 199.9
>
> Pinning as you suggest (i.e., to 3 pCPUs):
> NAME CPU(%) [2]
> vm1 149.0
> vm2 66.2
> vm3 84.8
>
> Changing the weights:
> Name ID Weight Cap [3]
> vm1 8 256 0
> vm2 9 512 0
> vm3 6 1024 0
> NAME CPU(%)
> vm1 100.0
> vm2 100.0
> vm3 100.0
>
> So, here in Credit1, things are ok when there's no pinning in place [1]. As soon as we pin, _even_without_ touching the weights [2], things become *crazy*. In fact, there's absolutely no reason why the CPU% numbers should look the way they do in [2].
>
> This does not surprise me much, though. Credit1's load balancer basically moves vcpus around in a pseudo-random fashion, and having to enforce pinning constraints makes things even more unpredictable.
>
> Then comes the amusing part. At this point, I wonder if I haven't done something wrong in setting up the experiments... because things really look too funny. :-O
> In fact, for some reason, changing the weights as shown in [3] causes the CPU% numbers to fluctuate a bit (not visible above) and then stabilize at 100%. That may look like an improvement, but it certainly does not reflect the chosen set of weights.
>
> So, I'd say you were right. Or, actually, things are even worse than you said: in Credit1, it's not only that pinning and weights do not play well together; even pinning alone works pretty badly.
I'd say: With credit1 pinning should be rather explicit in one of the
following ways:
- a vcpu should be pinned to only 1 pcpu, or
- a group of vcpus should be pinned to a group of pcpus no other
vcpu is allowed to run on (cpupools seem to be the better choice
in this case)
> Now, on Credit2, equal weights, unpinned (i.e., plenty of pCPUs
> available):
> NAME CPU(%) [4]
> vm1 199.9
> vm2 199.9
> vm3 199.9
>
> Pinning as you suggest (i.e., to 3 pCPUs):
> NAME CPU(%) [5]
> vm1 100.0
> vm2 100.1
> vm3 100.0
>
> Changing the weights:
> Name ID Weight [6]
> vm1 2 256
> vm2 3 512
> vm3 6 1024
> NAME CPU(%)
> vm1 44.1
> vm2 87.2
> vm3 168.7
>
> Which looks nearly *perfect* to me. :-)
_Really_ impressive!
> In fact, with no constraints [4], each VM gets the 200% share it's
> asking for.
>
> When only 3 pCPUs can be used, by means of pinning [5], each VM gets
> its fair share of 100%.
>
> When setting up the weights in such a way that vm2 should get 2x the CPU
> time of vm1, and vm3 2x the CPU time of vm2 [6], things look,
> well, exactly like that! :-P
>
> So, since I did not fully understand the problem, I'm not sure whether
> this really answers your question, but it looks to me like it actually
> could! :-D
>
> For sure, it puts Credit2 in rather a good light :-P.
Absolutely!
Juergen
* Re: [DOC RFC] Heterogeneous Multi Processing Support in Xen
From: Jan Beulich @ 2016-12-08 10:14 UTC (permalink / raw)
To: Dario Faggioli
Cc: Juergen Gross, Peng Fan, Stefano Stabellini, George Dunlap,
Andrew Cooper, Xen Devel, anastassios.nanos, Peng Fan
>>> On 07.12.16 at 19:29, <dario.faggioli@citrix.com> wrote:
> ### x86
>
> There is no HMP platform of relevance, for now, in x86 world. Therefore,
> only one class will exist, and all the CPUs will be set to belong to it.
> **TODO X86:** is this correct?
What about the original Xeon Phi (on a PCIe card)?
> ## Hypervisor
>
> The hypervisor needs to know within which class each of the present CPUs
> falls. At boot (or, in general, CPU bringup) time, while identifying the CPU,
> a list of classes is constructed, and the mapping between each CPU and the
> class it belongs to is established.
>
> The list of classes is kept ordered from the more powerful to the less
> powerful.
> **TODO:** this has been [proposed by
> George](https://lists.xenproject.org/archives/html/xen-devel/2016-09/msg02212.html).
> I like the idea, what do others think? If we agree on that, note that there
> has been no discussion on defining what "more powerful" means, neither on
> x86 (although, not really that interesting, for now, I'd say), nor on ARM.
Indeed I think there should be no assumption about the ability to
order things here: Even if for some initial set of hardware it may
be possible to clearly tell which one's more powerful and which
one's weaker, already the moment you extend this from
compute power to different ISA extensions you'll immediately end
up with the possibility of two CPUs having a distinct extra feature
compared to one another (say one a crypto extension and the
other a wider vector compute engine).
It may be possible to establish partial ordering though, but it's
not really clear to me what such ordering would be used for.
Jan
* Re: [DOC RFC] Heterogeneous Multi Processing Support in Xen
From: Dario Faggioli @ 2016-12-08 10:23 UTC (permalink / raw)
To: Jan Beulich
Cc: Juergen Gross, Peng Fan, Stefano Stabellini, George Dunlap,
Andrew Cooper, Xen Devel, anastassios.nanos, Peng Fan
On Thu, 2016-12-08 at 03:14 -0700, Jan Beulich wrote:
> > > > On 07.12.16 at 19:29, <dario.faggioli@citrix.com> wrote:
> > ### x86
> >
> > There is no HMP platform of relevance, for now, in x86 world.
> > Therefore,
> > only one class will exist, and all the CPUs will be set to belong
> > to it.
> > **TODO X86:** is this correct?
>
> What about the original Xeon Phi (on a PCIe card)?
>
Well, what I'd say about it is that I did not know about its existence.
:-)
Anyway, if we have HMP on x86 already, and we want to support them,
we'll have to define criteria for building classes there too. Once that
is done, the rest of this document should be general enough (or at
least that was the intent).
About defining those criteria, I'd appreciate whatever input you x86
experts will be able to share. :-)
Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
* Re: [DOC RFC] Heterogeneous Multi Processing Support in Xen
From: Jan Beulich @ 2016-12-08 10:41 UTC (permalink / raw)
To: Dario Faggioli
Cc: Juergen Gross, Peng Fan, Stefano Stabellini, George Dunlap,
Andrew Cooper, Xen Devel, anastassios.nanos, Peng Fan
>>> On 08.12.16 at 11:23, <dario.faggioli@citrix.com> wrote:
> On Thu, 2016-12-08 at 03:14 -0700, Jan Beulich wrote:
>> > > > On 07.12.16 at 19:29, <dario.faggioli@citrix.com> wrote:
>> > ### x86
>> >
>> > There is no HMP platform of relevance, for now, in x86 world.
>> > Therefore,
>> > only one class will exist, and all the CPUs will be set to belong
>> > to it.
>> > **TODO X86:** is this correct?
>>
>> What about the original Xeon Phi (on a PCIe card)?
>>
> Well, what I'd say about it is that I did not know about its existence.
> :-)
>
> Anyway, if we have HMP on x86 already, and we want to support them,
> we'll have to define criteria for building classes there too. Once that
> is done, the rest of this document should be general enough (or at
> least that was the intent).
>
> About defining those criteria, I'd appreciate whatever input you x86
> experts will be able to share. :-)
Well, the obvious part of the classification would be differences
in CPUID output - vendor, family, model, stepping, feature flags.
I'm not currently aware of ways to identify differing performance,
but I'm also unaware of systems built with CPUs varying in e.g.
clock speeds.
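(Just to illustrate, a sketch of what such a CPUID-based classification
key could look like; purely hypothetical:)

    /* Two pCPUs would fall in the same class iff all of this matches. */
    struct cpu_class_key {
        char     vendor[13];   /* CPUID leaf 0: vendor string             */
        uint32_t signature;    /* CPUID leaf 1 EAX: family/model/stepping */
        uint32_t features[4];  /* selected feature-flag words             */
    };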
Jan
* Re: [DOC RFC] Heterogeneous Multi Processing Support in Xen
From: Stefano Stabellini @ 2016-12-08 19:09 UTC (permalink / raw)
To: Jan Beulich
Cc: Juergen Gross, Peng Fan, Stefano Stabellini, George Dunlap,
Andrew Cooper, Dario Faggioli, Xen Devel, anastassios.nanos,
Peng Fan
On Thu, 8 Dec 2016, Jan Beulich wrote:
> >>> On 07.12.16 at 19:29, <dario.faggioli@citrix.com> wrote:
> > ### x86
> >
> > There is no HMP platform of relevance, for now, in x86 world. Therefore,
> > only one class will exist, and all the CPUs will be set to belong to it.
> > **TODO X86:** is this correct?
>
> What about the original Xeon Phi (on a PCIe card)?
>
> > ## Hypervisor
> >
> > The hypervisor needs to know within which class each of the present CPUs
> > falls. At boot (or, in general, CPU bringup) time, while identifying the CPU,
> > a list of classes is constructed, and the mapping between each CPU and the
> > class it belongs to is established.
> >
> > The list of classes is kept ordered from the more powerful to the less
> > powerful.
> > **TODO:** this has been [proposed by
> > George](https://lists.xenproject.org/archives/html/xen-devel/2016-09/msg02212.html).
> > I like the idea, what do others think? If we agree on that, note that there
> > has been no discussion on defining what "more powerful" means, neither on
> > x86 (although, not really that interesting, for now, I'd say), nor on ARM.
>
> Indeed I think there should be no assumption about the ability to
> order things here: Even if for some initial set of hardware it may
> be possible to clearly tell which one's more powerful and which
> one's weaker, already the moment you extend this from
> compute power to different ISA extensions you'll immediately end
> up with the possibility of two CPUs having a distinct extra feature
> compared to one another (say one a crypto extension and the
> other a wider vector compute engine).
>
> It may be possible to establish partial ordering though, but it's
> not really clear to me what such ordering would be used for.
I think you are right in saying that there might not be a
straightforward ordering from powerful to weak.
Maybe it is better to say that the Xen architecture-specific code will
pick a default class (not necessarily class0). The default class can be
changed with a Xen command line parameter or a hypercall.
This way we can have Xen use big cpus by default, but it can be changed
to LITTLE for example, without implying that big or LITTLE is more
powerful, which actually is difficult to determine even on ARM.
* Re: [DOC RFC] Heterogeneous Multi Processing Support in Xen
From: Dario Faggioli @ 2016-12-08 21:54 UTC (permalink / raw)
To: Jan Beulich
Cc: Juergen Gross, Peng Fan, Stefano Stabellini, George Dunlap,
Andrew Cooper, Xen Devel, anastassios.nanos, Peng Fan
On Thu, 2016-12-08 at 03:14 -0700, Jan Beulich wrote:
> > > > On 07.12.16 at 19:29, <dario.faggioli@citrix.com> wrote:
> > The list of classes is kept ordered from the more powerful to the
> > less
> > powerful.
> > **TODO:** this has been [proposed by
> > George](https://lists.xenproject.org/archives/html/xen-devel/2016-0
> > 9/msg02212.html).
> > I like the idea, what do others think? If we agree on that, note
> > that there
> > has been no discussion on defining what "more powerful" means,
> > neither on
> > x86 (although, not really that interesting, for now, I'd say), nor
> > on ARM.
>
> Indeed I think there should be no assumption about the ability to
> order things here: Even if for some initial set of hardware it may
> be possible to clearly tell which one's more powerful and which
> one's weaker, already the moment you extend this from
> compute power to different ISA extensions you'll immediately end
> up with the possibility of two CPUs having a distinct extra feature
> compared to one another (say one a crypto extension and the
> other a wider vector compute engine).
>
Yeah, that was what was puzzling me too. Keeping them ordered has the
nice property that if a user says the following in a config file:
vcpuclass=["0-3:class0", "4-7:class1"]
(assuming that class0 and class1 are the always available Xen names) it
would always be true that vCPUs 0-3 are 'more powerful', no matter
what host the VM runs on (ARM or x86, now or in 5 years, etc.), which
would be really nice.
But I really am not sure whether that is possible.
Perhaps George, which thought about this first, has it more clear...
Thanks and Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
* Re: [DOC RFC] Heterogeneous Multi Processing Support in Xen
From: Jan Beulich @ 2016-12-09 8:13 UTC (permalink / raw)
To: Dario Faggioli
Cc: Juergen Gross, Peng Fan, Stefano Stabellini, George Dunlap,
Andrew Cooper, Xen Devel, anastassios.nanos, Peng Fan
>>> On 08.12.16 at 22:54, <dario.faggioli@citrix.com> wrote:
> On Thu, 2016-12-08 at 03:14 -0700, Jan Beulich wrote:
>> > > > On 07.12.16 at 19:29, <dario.faggioli@citrix.com> wrote:
>> > The list of classes is kept ordered from the more powerful to the
>> > less
>> > powerful.
>> > **TODO:** this has been [proposed by
>> > George](https://lists.xenproject.org/archives/html/xen-devel/2016-0
>> > 9/msg02212.html).
>> > I like the idea, what do others think? If we agree on that, note
>> > that there
>> > has been no discussion on defining what "more powerful" means,
>> > neither on
>> > x86 (although, not really that interesting, for now, I'd say), nor
>> > on ARM.
>>
>> Indeed I think there should be no assumption about the ability to
>> order things here: Even if for some initial set of hardware it may
>> be possible to clearly tell which one's more powerful and which
>> one's weaker, already the moment you extend this from
>> compute power to different ISA extensions you'll immediately end
>> up with the possibility of two CPUs having a distinct extra feature
>> compared to one another (say one a crypto extension and the
>> other a wider vector compute engine).
>>
> Yeah, that was what was puzzling me too. Keeping them ordered has the
> nice property that if a user says the following in a config file:
>
> vcpuclass=["0-3:class0", "4-7:class1"]
>
> (assuming that class0 and class1 are the always available Xen names) it
This, btw, is another aspect I think has a basic problem: class0 and
class1 say nothing about the properties of a class, and hence are
tied to one particular host. I think class names need to be descriptive
and uniform across hosts. That would allow migration of such VMs as
well as prevent starting them on a host not having suitable hardware.
Jan
* Re: [DOC RFC] Heterogeneous Multi Processing Support in Xen
From: Dario Faggioli @ 2016-12-09 8:29 UTC (permalink / raw)
To: Jan Beulich
Cc: Juergen Gross, Peng Fan, Stefano Stabellini, George Dunlap,
Andrew Cooper, Xen Devel, anastassios.nanos, Peng Fan
On Fri, 2016-12-09 at 01:13 -0700, Jan Beulich wrote:
> > > > On 08.12.16 at 22:54, <dario.faggioli@citrix.com> wrote:
> > Yeah, that was what was puzzling me too. Keeping them ordered has
> > the
> > nice property that if a user says the following in a config file:
> >
> > vcpuclass=["0-3:class0", "4-7:class1"]
> >
> > (assuming that class0 and class1 are the always available Xen
> > names) it
>
> This, btw, is another aspect I think has a basic problem: class0 and
> class1 say nothing about the properties of a class, and hence are
> tied to one particular host.
>
The other way round, I'd say. I mean, since they say nothing, they're
_not_ host specific?
Anyway, naming was another thing on which the debate was not at all
closed, but the point is exactly the one you're making here, in fact...
> I think class names need to be descriptive
> and uniform across hosts. That would allow migration of such VMs as
> well as prevent starting them on a host not having suitable hardware.
>
...what George suggested (but please, George, when you're back, correct me if
I'm misrepresenting your ideas :-)) that:
- something generic, such as class0, class1 will always exist (well,
at least class0). They would basically constitute the Xen interface;
- toolstack will accept more specific names, such as 'big' and
'little', and also 'A57' and 'A43' (I'm making up the names), etc.
- a VM with vCPUs in class0 and class1 will always be created and run
on any 2-class system; a VM with big and little vCPUs will only
run on an ARM big.LITTLE incarnation; a VM with A57 and A43 vCPUs
will only run on a host that has at least one A57 and one A43
pCPUs.
What's not clear to me is how to establish:
- the ordering among classes;
- the mapping between Xen's neutral names and the toolstack's (arch)
specific ones.
All this being said, yes, if one specifies more than one class and
there's only one, as well as if one specifies a class that does not
exist, we should abort domain creation. I shall add this to the specs
(it was covered in the thread, I just forgot).
Thanks and Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
* Re: [DOC RFC] Heterogeneous Multi Processing Support in Xen
From: Jan Beulich @ 2016-12-09 9:09 UTC (permalink / raw)
To: Dario Faggioli
Cc: Juergen Gross, Peng Fan, Stefano Stabellini, George Dunlap,
Andrew Cooper, Xen Devel, anastassios.nanos, Peng Fan
>>> On 09.12.16 at 09:29, <dario.faggioli@citrix.com> wrote:
> On Fri, 2016-12-09 at 01:13 -0700, Jan Beulich wrote:
>> > > > On 08.12.16 at 22:54, <dario.faggioli@citrix.com> wrote:
>> > Yeah, that was what was puzzling me too. Keeping them ordered has
>> > the
>> > nice property that if a user says the following in a config file:
>> >
>> > vcpuclass=["0-3:class0", "4-7:class1"]
>> >
>> > (assuming that class0 and class1 are the always available Xen
>> > names) it
>>
>> This, btw, is another aspect I think has a basic problem: class0 and
>> class1 say nothing about the properties of a class, and hence are
>> tied to one particular host.
>>
> The other way round, I'd say. I mean, since they say nothing, they're
> _not_ host specific?
No, not really. Or perhaps we mean different things. The name
itself of course can be anything, but what is relevant here is
what it stands for. And "class0" may mean one thing on host 1
and a completely different thing on host 2. Yet we need a certain
name to always mean the same thing (or else we'd need
translation when moving VMs between hosts).
>> I think class names need to be descriptive
>> and uniform across hosts. That would allow migration of such VMs as
>> well as prevent starting them on a host not having suitable hardware.
>>
> ...what George suggested (but please, George, when back, correct me if
> I'm misrepresenting your ideas :-)) that:
> - something generic, such as class0, class1 will always exist (well,
> at least class0). They would basically constitute the Xen interface;
> - toolstack will accept more specific names, such as 'big' and
> 'little', and also 'A57' and 'A43' (I'm making up the names), etc.
> - a VM with vCPUs in class0 and class1 will always be created and run
> on any 2-class system;
How can that work, if you don't know what class1 represents?
> a VM with big and little vCPUs will only
> run on an ARM big.LITTLE incarnation; a VM with A57 and A43 vCPUs
> will only run on a host that has at least one A57 and one A43
> pCPUs.
>
> What's not clear to me is how to establish:
> - the ordering among classes;
As said before - there's at best some partial ordering going to be
possible.
> - the mapping between Xen's neutral names and the toolstack's (arch)
> specific ones.
Perhaps it needs re-consideration whether class names make
sense in the first place? What about, for example, making class
names something entirely local to the domain config file, and
besides specifying
vcpuclass=["0-3:class0", "4-7:class1"]
requiring for it to also specify the properties of the classes it
uses:
class0=["..."]
class1=["..."]
The specifiers then would be architecture specific, e.g.
class0=["arm64"]
class1=["arm64.big"]
or on x86
class0=["x86-64"]
class1=["x86.avx", "x86.avx2"]
class2=["x86.XeonPhi"]
Of course this goes quite a bit in the direction of CPUID handling,
so Andrew may have a word to say here.
Jan
* Re: [DOC RFC] Heterogeneous Multi Processing Support in Xen
From: Stefano Stabellini @ 2016-12-09 19:20 UTC (permalink / raw)
To: Jan Beulich
Cc: Juergen Gross, Peng Fan, Stefano Stabellini, George Dunlap,
Andrew Cooper, Dario Faggioli, Xen Devel, anastassios.nanos,
Peng Fan
On Fri, 9 Dec 2016, Jan Beulich wrote:
> >>> On 09.12.16 at 09:29, <dario.faggioli@citrix.com> wrote:
> > On Fri, 2016-12-09 at 01:13 -0700, Jan Beulich wrote:
> >> > > > On 08.12.16 at 22:54, <dario.faggioli@citrix.com> wrote:
> >> > Yeah, that was what was puzzling me too. Keeping them ordered has
> >> > the
> >> > nice property that if a user says the following in a config file:
> >> >
> >> > vcpuclass=["0-3:class0", "4-7:class1"]
> >> >
> >> > (assuming that class0 and class1 are the always available Xen
> >> > names) it
> >>
> >> This, btw, is another aspect I think has a basic problem: class0 and
> >> class1 say nothing about the properties of a class, and hence are
> >> tied to one particular host.
> >>
> > The other way round, I'd say. I mean, since they say nothing, they're
> > _not_ host specific?
>
> No, not really. Or perhaps we mean different things. The name
> itself of course can be anything, but what is relevant here is
> what it stands for. And "class0" may mean one thing on host 1
> and a completely different thing on host2. Yet we need a certain
> name to always mean the same thing (or else we'd need
> translation when moving VMs between hosts).
>
> >> I think class names need to be descriptive
> >> and uniform across hosts. That would allow migration of such VMs as
> >> well as prevent starting them on a host not having suitable hardware.
> >>
> > ...what George suggested (but please, George, when back, correct me if
> > I'm misrepresenting your ideas :-)) that:
> > - something generic, such as class0 and class1, will always exist (well,
> > at least class0). They would basically constitute the Xen interface;
> > - toolstack will accept more specific names, such as 'big' and
> > 'little', and also 'A57' and 'A43' (I'm making up the names), etc.
> > - a VM with vCPUs in class0 and class1 will always be created and run
> > on any 2-class system;
>
> How can that work, if you don't know what class1 represents?
>
> > a VM with big and little vCPUs will only
> > run on an ARM big.LITTLE incarnation; a VM with A57 and A43 vCPUs
> > will only run on a host that has at least one A57 and one A43
> > pCPUs.
> >
> > What's not clear to me is how to establish:
> > - the ordering among classes;
>
> As said before - at best, some partial ordering is going to be
> possible.
>
> > - the mapping between Xen's neutral names and the toolstack's (arch)
> > specific ones.
>
> Perhaps it needs re-consideration whether class names make
> sense in the first place? What about, for example, making class
> names something entirely local to the domain config file, and
> besides specifying
>
> vcpuclass=["0-3:class0", "4-7:class1"]
>
> requiring for it to also specify the properties of the classes it
> uses:
>
> class0=["..."]
> class1=["..."]
>
> The specifiers then would be architecture specific, e.g.
>
> class0=["arm64"]
> class1=["arm64.big"]
>
> or on x86
>
> class0=["x86-64"]
> class1=["x86.avx", "x86.avx2"]
> class2=["x86.XeonPhi"]
>
> Of course this goes quite a bit in the direction of CPUID handling,
> so Andrew may have a word to say here.
This is good, but given that we are not likely to support cross-arch
migration (i.e. ARM to x86), the xl parser can be smart enough to
accept the following syntax too, as an alias for the one you suggested:
vcpuclass=["0-3:arm64.big", "4-7:arm64.LITTLE"]
or even
vcpuclass=["0-3:big", "4-7:LITTLE"]
If the receiving end is not a big.LITTLE machine, it will be easy for it
to map "big" and "LITTLE" to two arbitrary classes, such as class0 and
class1.
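As a rough sketch (names invented, this is not existing xl code), the
alias handling could be little more than a per-arch lookup table:

    #include <string.h>

    #define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

    /* Map friendly names to the full specifier the user could also
     * have spelled out by hand. */
    static const struct {
        const char *alias;
        const char *spec;
    } arm64_aliases[] = {
        { "big",    "arm64.big"    },
        { "LITTLE", "arm64.LITTLE" },
    };

    static const char *resolve_class_alias(const char *name)
    {
        for (unsigned int i = 0; i < ARRAY_SIZE(arm64_aliases); i++)
            if (!strcmp(arm64_aliases[i].alias, name))
                return arm64_aliases[i].spec;
        return name; /* already a full specifier, or unknown */
    }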
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
* Re: [DOC RFC] Heterogeneous Multi Processing Support in Xen
2016-12-09 9:09 ` Jan Beulich
2016-12-09 19:20 ` Stefano Stabellini
@ 2016-12-16 8:00 ` George Dunlap
1 sibling, 0 replies; 22+ messages in thread
From: George Dunlap @ 2016-12-16 8:00 UTC (permalink / raw)
To: Jan Beulich
Cc: Juergen Gross, Peng Fan, Stefano Stabellini, Andrew Cooper,
Dario Faggioli, George Dunlap, Xen Devel,
anastassios.nanos@onapp.com, Peng Fan
> On Dec 9, 2016, at 5:09 PM, Jan Beulich <JBeulich@suse.com> wrote:
>
>>>> On 09.12.16 at 09:29, <dario.faggioli@citrix.com> wrote:
>> On Fri, 2016-12-09 at 01:13 -0700, Jan Beulich wrote:
>>>>>> On 08.12.16 at 22:54, <dario.faggioli@citrix.com> wrote:
>>>> Yeah, that was what was puzzling me too. Keeping them ordered has
>>>> the
>>>> nice property that if a user says the following in a config file:
>>>>
>>>> vcpuclass=["0-3:class0", "4-7:class1"]
>>>>
>>>> (assuming that class0 and class1 are the always available Xen
>>>> names) it
>>>
>>> This, btw, is another aspect I think has a basic problem: class0 and
>>> class1 say nothing about the properties of a class, and hence are
>>> tied to one particular host.
>>>
>> The other way round, I'd say. I mean, since they say nothing, they're
>> _not_ host specific?
>
> No, not really. Or perhaps we mean different things. The name
> itself of course can be anything, but what is relevant here is
> what it stands for. And "class0" may mean one thing on host 1
> and a completely different thing on host2. Yet we need a certain
> name to always mean the same thing (or else we'd need
> translation when moving VMs between hosts).
>
>>> I think class names need to be descriptive
>>> and uniform across hosts. That would allow migration of such VMs as
>>> well as prevent starting them on a host not having suitable hardware.
>>>
>> ...what George suggested (but please, George, when back, correct me if
>> I'm misrepresenting your ideas :-)) that:
>> - something generic, such as class0 and class1, will always exist (well,
>> at least class0). They would basically constitute the Xen interface;
>> - toolstack will accept more specific names, such as 'big' and
>> 'little', and also 'A57' and 'A43' (I'm making up the names), etc.
>> - a VM with vCPUs in class0 and class1 will always be created and run
>> on any 2-class system;
>
> How can that work, if you don't know what class1 represents?
>
>> a VM with big and little vCPUs will only
>> run on an ARM big.LITTLE incarnation; a VM with A57 and A43 vCPUs
>> will only run on a host that has at least one A57 and one A43
>> pCPUs.
>>
>> What's not clear to me is how to establish:
>> - the ordering among classes;
>
> As said before - at best, some partial ordering is going to be
> possible.
>
>> - the mapping between Xen's neutral names and the toolstack's (arch)
>> specific ones.
>
> Perhaps it needs re-consideration whether class names make
> sense in the first place? What about, for example, making class
> names something entirely local to the domain config file, and
> besides specifying
>
> vcpuclass=["0-3:class0", "4-7:class1"]
>
> requiring for it to also specify the properties of the classes it
> uses:
>
> class0=["..."]
> class1=["..."]
>
> The specifiers then would be architecture specific, e.g.
>
> class0=["arm64"]
> class1=["arm64.big"]
>
> or on x86
>
> class0=["x86-64"]
> class1=["x86.avx", "x86.avx2"]
> class2=["x86.XeonPhi"]
>
> Of course this goes quite a bit in the direction of CPUID handling,
> so Andrew may have a word to say here.
So my goal when I made my suggestion was that:
1. People who knew exactly what they wanted and knew what their hardware would be could specify exactly what they wanted to happen. This would probably include embedded chip vendors designing a custom system.
2. People who had a general preference but didn’t know the exact hardware could specify vague parameters (such as “large class” or “small class”) and get something approximating the vague parameters. This might include people who were writing a generic piece of software to be run on a large class of potential devices (automotive, routers, &c).
3. People who didn’t specify anything would get a default behavior which was sensible.
From what I remember of the last discussion, there is no “arm64.big”. You might have an A15 core and an A7 core; and in that case the A15 core would be “big”. But you also might have two A15 cores, one with a higher clock speed and/or more cache than the other. So “arm64.big” and “arm64.little” aren't actually any more precise than “class0” and “class1”.
So my idea was (to re-iterate):
1. Sort them into classes by power
2. Allow the user either to specify the class number (class0 > class1 > class2), *or* to make more specific requests (“arm64.A15”, &c).
That way, people described by #3 need not specify anything, and the toolstack can decide whether to give them class 0/1 based on some heuristic and/or policy; people described by #2 can just say “class 0” or “class 1”; and people described by #1 can specify exactly what they want.
Now I understand that it may not always be clear which of two processors is “more powerful” — but to accomplish the above-stated goal, it turns out that’s not necessary. If two processors are about equally powerful but in slightly different ways, then it doesn’t matter which one you get when you ask for “the bigger one”; so it doesn’t matter which order you put them in (although it should probably be repeatable).
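As a sketch of what “repeatable” could mean in code (the power score
here is a made-up per-class heuristic, e.g. derived from frequency and
cache sizes; none of this is existing Xen code):

    #include <stdint.h>
    #include <stdlib.h>

    struct cpu_class_info {
        uint16_t id;           /* class id, as assigned at boot */
        uint64_t power_score;  /* heuristic "bigness" metric */
    };

    /* Sort "bigger" classes first; break ties by id, so that two
     * roughly equal classes always come out in the same, repeatable
     * order across boots. */
    static int cmp_class(const void *a, const void *b)
    {
        const struct cpu_class_info *ca = a, *cb = b;

        if (ca->power_score != cb->power_score)
            return ca->power_score > cb->power_score ? -1 : 1;
        return ca->id < cb->id ? -1 : (ca->id > cb->id ? 1 : 0);
    }

    /* After qsort(classes, nr_classes, sizeof(*classes), cmp_class),
     * classes[0] is "class0", i.e. the most powerful one. */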
-George
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
* Re: [DOC RFC] Heterogeneous Multi Processing Support in Xen
2016-12-07 18:29 [DOC RFC] Heterogeneous Multi Processing Support in Xen Dario Faggioli
2016-12-08 6:12 ` Juergen Gross
2016-12-08 10:14 ` Jan Beulich
@ 2016-12-16 8:05 ` George Dunlap
2016-12-16 8:07 ` George Dunlap
2017-03-01 0:05 ` Anastassios Nanos
3 siblings, 1 reply; 22+ messages in thread
From: George Dunlap @ 2016-12-16 8:05 UTC (permalink / raw)
To: Dario Faggioli
Cc: Jürgen Groß, Peng Fan, Stefano Stabellini,
Andrew Cooper, George Dunlap, Xen Devel,
anastassios.nanos@onapp.com, Jan Beulich, Peng Fan
> On Dec 8, 2016, at 2:29 AM, Dario Faggioli <dario.faggioli@citrix.com> wrote:
> For the vCPUs for which no class is specified, default behavior applies.
>
> **TODO:** note that I think it must be possible to associate more than
> one class to a vCPU. This is expressed in the example above, and assumed
> to be true throughout the document. It might be, though, that, at least at
> early stages (see implementation phases below), we will enable only 1-to-1
> mapping.
>
> **TODO:** default can be, either:
>
> 1. the vCPU can run on any CPU of any class,
> 2. the vCPU can only run on a specific, arbitrarily decided, class (and I'd say
> that should be class 0).
I thought that one of the issues was that sometimes there are instructions available on one pcpu and not on another; in which case once the kernel initializes a particular vcpu on a particular pcpu it needs to stay there.
-George
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
* Re: [DOC RFC] Heterogeneous Multi Processing Support in Xen
2016-12-16 8:05 ` George Dunlap
@ 2016-12-16 8:07 ` George Dunlap
0 siblings, 0 replies; 22+ messages in thread
From: George Dunlap @ 2016-12-16 8:07 UTC (permalink / raw)
To: Dario Faggioli
Cc: Jürgen Groß, Peng Fan, Stefano Stabellini,
Andrew Cooper, Xen Devel, anastassios.nanos@onapp.com,
Jan Beulich, Peng Fan
> On Dec 16, 2016, at 4:05 PM, George Dunlap <george.dunlap@citrix.com> wrote:
>
>
>> On Dec 8, 2016, at 2:29 AM, Dario Faggioli <dario.faggioli@citrix.com> wrote:
>
>> For the vCPUs for which no class is specified, default behavior applies.
>>
>> **TODO:** note that I think it must be possible to associate more than
>> one class to a vCPU. This is expressed in the example above, and assumed
>> to be true throughout the document. It might be, though, that, at least at
>> early stages (see implementation phases below), we will enable only 1-to-1
>> mapping.
>>
>> **TODO:** default can be, either:
>>
>> 1. the vCPU can run on any CPU of any class,
>> 2. the vCPU can only run on a specific, arbitrarily decided, class (and I'd say
>> that should be class 0).
>
> I thought that one of the issues was that sometimes there are instructions available on one pcpu and not on another; in which case once the kernel initializes a particular vcpu on a particular pcpu it needs to stay there.
Sorry, this should say, “once a kernel initializes a particular vcpu on a particular *class of* pcpu, it needs to stay *within that class*.”
-G
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
* Re: [DOC RFC] Heterogeneous Multi Processing Support in Xen
2016-12-07 18:29 [DOC RFC] Heterogeneous Multi Processing Support in Xen Dario Faggioli
` (2 preceding siblings ...)
2016-12-16 8:05 ` George Dunlap
@ 2017-03-01 0:05 ` Anastassios Nanos
2017-03-01 17:38 ` Dario Faggioli
3 siblings, 1 reply; 22+ messages in thread
From: Anastassios Nanos @ 2017-03-01 0:05 UTC (permalink / raw)
To: Dario Faggioli
Cc: Jürgen Groß, Peng Fan, Stefano Stabellini,
George Dunlap, Andrew Cooper, thkatsios, Xen Devel, Jan Beulich,
Peng Fan
On Wed, Dec 7, 2016 at 8:29 PM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
> % Heterogeneous Multi Processing Support in Xen
> % Revision 1
>
> \clearpage
>
> # Basics
>
> ---------------- ------------------------
> Status: **Design Document**
>
> Architecture(s): x86, arm
>
> Component(s): Hypervisor and toolstack
> ---------------- ------------------------
>
> # Overview
>
> HMP (Heterogeneous Multi Processing) and AMP (Asymmetric Multi Processing)
> refer to systems where physical CPUs are not exactly equal. It may be that
> they have different processing power, or capabilities, or that each is
> specifically designed to run a particular system component.
> Most of the times the CPUs have different Instruction Set Architectures (ISA)
> or Application Binary Interfaces (ABIs). But they may *just* be different
> implementations of the same ISA, in which case they typically differ in
> speed, power efficiency or handling of special things (e.g., erratas).
>
> An example is ARM big.LITTLE, which in fact, is the use case that got the
> discussion about HMP started. This document, however, is generic, and does
> not target only big.LITTLE.
>
> What need proper Xen support are systems and use cases where virtual CPUs
> can not be seamlessly moved around all the physical CPUs. In fact, in these
> cases, there must be a way to:
>
> * decide and specify on what (set of) physical CPU(s), each vCPU can execute on;
> * enforce that a vCPU that can only run on a certain (set of) pCPUs, is never
> actually run anywhere else.
>
> **N.B.:** it is becoming common to refer as AMP or HMP also to systems which
> have various kind of co-processors (from crypto engines to graphic hardware),
> integrated with the CPUs on the same chip. This is not what this design
> document is about.
>
> # Classes of CPUs
>
> A *class of CPUs* is defined as follows:
>
> 1. each pCPU in the system belongs to a class;
> 2. a class can consist of one or more pCPUs;
> 3. each pCPU can only be in one class;
> 4. CPUs belonging to the same class are homogeneous enough that a virtual
> CPU that blocks/is preempted while running on a pCPU of a class can,
> **seamlessly**, unblock/be scheduler on any pCPU of that same class;
> 5. when a virtual CPU is associated with a (set of) class(es) of CPUs, it
> means that the vCPU can run on all the pCPUs belonging to the said
> class(es).
>
> So, for instance, in architecture Foobar two classes of CPUs exist, class
> foo and class bar. If a virtual CPU running on a CPU 0, which is of class
> foo, blocks (or is preempted), it can, when it unblocks (or is selected by
> the scheduler to run again), run on CPU 3, still of class foo, but not on
> CPU 6, which is of class bar.
>
> ## Defining classes
>
> How a class is defined, i.e., what are the specific characteristics that
> determine what CPUs belong to which class, is highly architecture specific.
>
> ### x86
>
> There is no HMP platform of relevance, for now, in x86 world. Therefore,
> only one class will exist, and all the CPUs will be set to belong to it.
> **TODO X86:** is this correct?
>
> ### ARM
>
> **TODO ARM:** I know nothing about what specifically should be used to
> form classes, so I'm deferring this to ARM people.
>
> So far, in the original thread the following ideas came up (well, there's
> more, but I don't know enough of ARM to judge what is really relevant about
> this topic):
>
> * [Julien](https://lists.xenproject.org/archives/html/xen-devel/2016-09/msg02153.html)
> "I don't think an hardcoded list of processor in Xen is the right solution.
> There are many existing processors and combinations for big.LITTLE so it
> will nearly be impossible to keep updated."
> * [Julien](https://lists.xenproject.org/archives/html/xen-devel/2016-09/msg02256.html)
> "Well, before trying to do something clever like that (i.e naming "big" and
> "little"), we need to have upstreamed bindings available to acknowledge the
> difference. AFAICT, it is not yet upstreamed for Device Tree and I don't
> know any static ACPI tables providing the similar information."
> * [Peng](https://lists.xenproject.org/archives/html/xen-devel/2016-09/msg02194.html)
> "For how to differentiate cpus, I am looking the linaro eas cpu topology code"
>
> # User details
>
> ## Classes of CPUs for the users
>
> It will be possible, in a VM config file, to specify the (set of) class(es)
> of each vCPU. This allows creating HMP VMs.
>
> E.g., on ARM, it will be possible to create big.LITTLE VMs which, if run on
> big.LITTLE hosts, could leverage the big.LITTLE support of the guest OS kernel
> and tools.
>
> For such purpose, a new option will be added to xl config file:
>
> vcpus = "8"
> vcpuclass = ["0-2:class0", "3,4:class1,class3", "5:class0, class2", "8:class4"]
>
> with the following meaning:
>
> * vCPUs 0, 1, 2 can only run on pCPUs of class class0
> * vCPUs 3, 4 can run on pCPUs of class class1 **and** on pCPUs of class class3
> * vCPU 5 can run on pCPUs of class class0 **and** on pCPUs of class class2
> * for vCPU 6, since it is not mentioned, the default applies
> * vCPU 7 can only run on pCPUs of class class4
>
> For the vCPUs for which no class is specified, default behavior applies.
>
> **TODO:** note that I think it must be possible to associate more than
> one class to a vCPU. This is expressed in the example above, and assumed
> to be true throughout the document. It might be, though, that, at least at
> early stages (see implementation phases below), we will enable only 1-to-1
> mapping.
>
> **TODO:** default can be, either:
>
> 1. the vCPU can run on any CPU of any class,
> 2. the vCPU can only run on a specific, arbitrarily decided, class (and I'd say
> that should be class 0).
>
> The former seems a better interface. It looks to me like the most natural
> and least surprising, from the user's point of view, and the most future proof
> (see phase 3 of implementation below).
> The latter may be more practical, though. In fact, with the former, we risk
> crashing (the guest or the hypervisor) if one creates a VM and forgets to
> specify the vCPU classes --which does not look ideal.
>
> It will be possible to gather information about what classes exist, and what
> pCPUs belong to each class, by issuing the `xl info -n` command:
>
> cpu_topology :
> cpu: core socket node class
> 0: 0 1 0 0
> 1: 0 1 0 1
> 2: 1 1 0 2
> 3: 1 1 0 3
> 4: 9 1 0 3
> 5: 9 1 0 0
> 6: 10 1 0 1
> 7: 10 1 0 2
> 8: 0 0 1 3
> 9: 0 0 1 3
> 10: 1 0 1 1
> 11: 1 0 1 0
> 12: 9 0 1 1
> 13: 9 0 1 0
> 14: 10 0 1 2
> 15: 10 0 1 2
>
> **TODO:** do we want to keep using `-n`, or add another switch, like -c or
> something? I'm not sure I like using `-n` as, e.g., on x86, this would most
> of the time result in just a column full of `0`, and it may raise confusion
> among users about what that actually means.
> Also, do we want to print the class ids, or some more abstract class names?
> (or support both, and have a way to decide which one to see)?
>
> # Technical details
>
> ## Hypervisor
>
> The hypervisor needs to know within which class each of the present CPUs
> falls. At boot (or, in general, CPU bringup) time, while identifying the CPU,
> a list of classes is constructed, and the mapping between each CPU and the
> class it belongs to is established.
>
> The list of classes is kept ordered from the most powerful to the least
> powerful.
> **TODO:** this has been [proposed by George](https://lists.xenproject.org/archives/html/xen-devel/2016-09/msg02212.html).
> I like the idea, what do others think? If we agree on that, note that there
> has been no discussion on defining what "more powerful" means, neither on
> x86 (although, not really that interesting, for now, I'd say), nor on ARM.
>
> The mapping between CPUs and classes will be kept in memory in the following
> data structures:
>
> uint16_t cpu_to_class[NR_CPUS] __read_mostly;
> cpumask_t class_to_cpumask[NR_CPUS] __read_mostly;
>
> **TODO:** it's probably better to allocate the cpumask array dynamically,
> to avoid wasting too much space.
>
> **TODO:** if we want the ordering, structure needs to be kept ordered too
> (or additional structures should be used for the purpose).
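>
> As a sketch (with `identify_class()` being a hypothetical, arch
> specific hook), the bringup-time bookkeeping could look like:
>
>     static unsigned int nr_classes;
>
>     void cpu_class_init(unsigned int cpu)
>     {
>         uint16_t cls = identify_class(cpu); /* arch specific */
>
>         if ( cls >= nr_classes )
>             nr_classes = cls + 1;
>         cpu_to_class[cpu] = cls;
>         cpumask_set_cpu(cpu, &class_to_cpumask[cls]);
>     }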
>
> Each virtual CPU must know what class(es) of CPUs it can run on. Since a
> vCPU can be associated with more than one class, the best way to keep track
> of this information is a bitmap. That will be a new `cpumask` typed member
> in `struct vcpu`, where the i-th bit being set means the vCPU can
> run on CPUs of class i.
>
> If a vCPU is found running on a pCPU of a class that is not associated with
> the vCPU itself, an exception should be raised.
> **TODO:** What kind? BUG_ON? Crash the guest? The guest would probably crash
> --or become unreliable-- on its own, I guess.
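>
> Just to illustrate (sketch only: names are made up, and whether the
> check should be an ASSERT, a guest crash, etc. is exactly the open
> question above), the new member and the check could look like:
>
>     struct vcpu {
>         /* ... existing fields ... */
>         cpumask_t class_mask;  /* bit i set => can run on class i */
>     };
>
>     /* To be called from the scheduling / context switch paths. */
>     static void check_vcpu_class(const struct vcpu *v, unsigned int cpu)
>     {
>         ASSERT(cpumask_test_cpu(cpu_to_class[cpu], &v->class_mask));
>     }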
>
> Setting and getting the CPU class of a vCPU will happen via two new
> hypercalls:
>
> * `XEN_DOMCTL_setvcpuclass`
> * `XEN_DOMCTL_getvcpuclass`
>
> Information about CPU classes will be propagated to the toolstack by adding a
> new field in xen_sysctl_cputopo, which will become:
>
> struct xen_sysctl_cputopo {
> uint32_t core;
> uint32_t socket;
> uint32_t node;
> uint32_t class;
> };
>
> For homogeneous and SMP systems, the value of the new class field will
> be 0 for all the cores.
>
> ## Toolstack
>
> It will be possible for the toolstack to retrieve from Xen the list of
> existing CPU classes, their names, and the information about which
> class each present CPU belongs to.
>
> **TODO:** [George suggested](https://lists.xenproject.org/archives/html/xen-devel/2016-09/msg02212.html)
> to allow a richer set of labels, at the toolstack level, and I like
> the idea very much. It's not clear to me, though, in what component
> this list of names, and the mapping between them and the classes as
> they're known inside Xen, should live.
>
> Libxl and libxc interfaces will be introduced for associating a vCPU to
> a (set of) class(es):
>
> * `libxl_set_vcpuclass()`, `libxl_get_vcpuclass()`;
> * `xc_vcpu_setclass()`, `xc_vcpu_getclass()`.
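>
> The exact signatures are still to be defined; one possibility,
> mirroring the existing affinity calls (these are guesses, not
> settled API), could be:
>
>     int xc_vcpu_setclass(xc_interface *xch, uint32_t domid,
>                          int vcpu, xc_cpumap_t classmap);
>     int xc_vcpu_getclass(xc_interface *xch, uint32_t domid,
>                          int vcpu, xc_cpumap_t classmap);
>
>     int libxl_set_vcpuclass(libxl_ctx *ctx, uint32_t domid,
>                             uint32_t vcpuid, const libxl_bitmap *classmap);
>     int libxl_get_vcpuclass(libxl_ctx *ctx, uint32_t domid,
>                             uint32_t vcpuid, libxl_bitmap *classmap);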
>
> In libxl, class information will be added in `struct libxl_cputopology`,
> which is filled by `libxl_get_cpu_topology()`.
>
> # Implementation
>
> Implementation can proceed in phases.
>
> ## Phase 1
>
> Class definition, identification and mapping of CPUs to classes, inside
> Xen, will be implemented. And so will the libxc and libxl interfaces
> for retrieving such information.
>
> Parsing of the new `vcpuclass` parameter will be implemented in `xl`. The
> result of such parsing will then be used as if it were the hard-affinity of
> the various vCPUs. That is, we will set the hard-affinity of each vCPU to
> the pCPUs that are part of the class(es) the vCPU itself is being assigned to,
> according to `vcpuclass`.
>
> This would *Just Work(TM)*, as long as the user does not try to change the
> hard-affinity during the VM lifetime (e.g., with `xl vcpu-pin`).
>
> **TODO:** It may be useful, to prevent the above from happening, to add another
> `xl` config option that, if set, disallows changing the affinity from what it
> was at VM creation time (something like `immutable_affinity=1`). Thoughts?
> I'm leaning toward doing that, as it may even be something useful to have
> in other usecases.
>
> ### Phase 1.5
>
> Library (libxc and libxl) calls and hypercalls that are necessary to associate
> a class with the vCPUs will be implemented.
>
> At which point, when parsing `vcpuclass` in `xl`, we will call both (with the
> same bitmap as input):
>
> * `libxl_set_vcpuclass()`
> * `libxl_set_vcpuaffinity()`
>
> `libxl__set_vcpuaffinity()` will be modified in such a way that, when setting
> hard-affinity for a vCPU:
>
> * it will get the CPU class(es) associated with the vCPU;
> * it will check what pCPUs belong to the class(es);
> * it will filter out, from the new hard-affinity being set, the pCPUs that
> are not in the vCPU's class(es).
>
> As a safety measure, `vcpu_set_hard_affinity()` in Xen will also be modified
> such that, if someone somehow manages to pass down a hard-affinity mask
> which contains pCPUs outside the proper classes, it will error out
> with -EINVAL.
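>
> A sketch of the filtering step (using the proposed new `class` field
> of `libxl_cputopology`; the helper name is made up):
>
>     static void filter_affinity_by_class(libxl_bitmap *hard_affinity,
>                                          const libxl_bitmap *classmap,
>                                          const libxl_cputopology *topo,
>                                          int nr_cpus)
>     {
>         for (int cpu = 0; cpu < nr_cpus; cpu++) {
>             if (!libxl_bitmap_test(hard_affinity, cpu))
>                 continue;
>             /* drop pCPUs whose class the vCPU is not allowed on */
>             if (!libxl_bitmap_test(classmap, topo[cpu].class))
>                 libxl_bitmap_reset(hard_affinity, cpu);
>         }
>     }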
>
> ### Phase 2
>
> Inside Xen, the various schedulers will be modified to deal internally with
> the fact that vCPUs can only run on pCPUs from the class(es) they are
> associated with. This allows for more efficient implementation, and paves
> the way for enabling more intelligent logic (e.g., for minimizing power
> consumption) in *phase 3*.
>
> Calling `libxl_set_vcpuaffinity()` from `xl` / libxl is therefore no longer
> necessary and will be avoided (i.e., only `libxl_set_vcpuclass()` will be
> called).
>
> ### Phase 3
>
> Moving vCPUs between classes will be implemented. This means that, e.g.,
> on ARM big.LITTLE, it will be possible for a vCPU to block on a big core
> and wake up on a LITTLE core.
>
> **TODO:** About what this takes, see [Julien's email](https://lists.xenproject.org/archives/html/xen-devel/2016-09/msg02345.html).
>
> This means it will no longer be necessary to specify the class of the
> vCPUs via `vcpuclass` in `xl`, although that will of course remain
> supported. So:
>
> 1. if one wants (sticking with big.LITTLE as example) a big.LITTLE VM,
> and wants to make sure that big vCPUs will run on big
> pCPUs, and that LITTLE vCPUs will run on LITTLE pCPUs, she will use:
>
> vcpus = "8"
> vcpuclass = ["0-3:big", "4-7:little"]
>
> 2. if one does not care, and is happy to let the Xen scheduler decide
> where to run the various vCPUs, in order, for instance, to be sure
> to get the best power efficiency for the host as a whole, he can
> just avoid specifying any `vcpuclass`, or do something like this:
>
> vcpuclass = ["all:all"]
>
> # Limitations
>
> * While in *phase 1*, it won't be possible to use vCPU hard-affinity
> for anything other than HMP support;
> * until *phase 3*, since HMP support is basically the same as
> setting hard-affinity, performance may not be ideal;
> * until *phase 3*, vCPUs can't move between classes. This means,
> for instance, in the big.LITTLE world, Xen's scheduler can't move a
> vCPU running on a big core to a LITTLE core (e.g., to try to save power).
>
> # Testing
>
> Testing requires an actual AMP/HMP system. On such a system, we at least
> want to:
>
> * create a VM **without** specifying `vcpuclass` in its config file, and
> check that the default policy is correctly applied to all vCPUs;
> * create a VM **specifying** `vcpuclass` in its config file and check that
> the classes are assigned to vCPUs appropriately;
> * create a VM **specifying** `vcpuclass` in its config file and check that
> the various vCPUs are not running on any pCPU outside of their respective
> classes.
>
> # Areas for improvement
>
> * Make it possible to test even on non-HMP systems. That could be done by
> making it possible to provide Xen with fake CPU classes for the system
> CPUs (e.g., with boot time parameters);
> * implement a way to view the class the vCPUs have been assigned (either as
> part of the output of `xl vcpu-list`, or as a dedicated `xl` subcommand);
> * make it possible to dynamically change the class of vCPUs at runtime, with
> `xl` (either via a new parameter to `vcpu-pin` subcommand, or via a new
> subcommand).
>
> # Known issues
>
> *TBD*.
>
> # References
>
> * [Asymmetric Multi Processing](https://en.wikipedia.org/wiki/Asymmetric_multiprocessing)
> * [Heterogeneous Multi Processing](https://en.wikipedia.org/wiki/Heterogeneous_computing)
> * [ARM big.LITTLE](https://www.arm.com/products/processors/technologies/biglittleprocessing.php)
>
> # History
>
> ------------------------------------------------------------------------
> Date Revision Version Notes
> ---------- -------- -------- -------------------------------------------
> 2016-12-02 1 RFC of design document
> ---------- -------- -------- -------------------------------------------
Hi all,
We are sending a branch[1] for comments on an initial implementation
of the above design document. Essentially it targets the ARM
big.LITTLE architecture. It would be great if you guys could comment
on the changes and provide some guidance for us to get it upstream.
We have tested it on an odroid xu4 [2] and we are able to boot guests
with mixed vcpu affinities (big and LITTLE).
We are more than happy to submit patches once we address the issues
and come up with a reviewable version of this implementation.
Thanks!
A.
[1] https://github.com/HPSI/xen/tree/big.LITTLE
[2] using this cherry pick: 8d56205455a4a1e0233421d3ee98e3c7dee20bd2
from: https://github.com/bkrepo/xen.git
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
* Re: [DOC RFC] Heterogeneous Multi Processing Support in Xen
2017-03-01 0:05 ` Anastassios Nanos
@ 2017-03-01 17:38 ` Dario Faggioli
2017-03-01 18:58 ` Stefano Stabellini
0 siblings, 1 reply; 22+ messages in thread
From: Dario Faggioli @ 2017-03-01 17:38 UTC (permalink / raw)
To: Anastassios Nanos
Cc: Jürgen Groß, Peng Fan, Stefano Stabellini,
George Dunlap, Andrew Cooper, thkatsios, Xen Devel, Jan Beulich,
Peng Fan
[-- Attachment #1.1: Type: text/plain, Size: 2509 bytes --]
On Wed, 2017-03-01 at 02:05 +0200, Anastassios Nanos wrote:
> On Wed, Dec 7, 2016 at 8:29 PM, Dario Faggioli
> <dario.faggioli@citrix.com> wrote:
> >
> > % Heterogeneous Multi Processing Support in Xen
> > % Revision 1
> >
> > [...]
> Hi all,
>
Hello,
> We are sending a branch[1] for comments on an initial implementation
> of the above design document. Essentially it targets the ARM
> big.LITTLE architecture.
>
W00t ?!?! Just the fact that you did this, it is just great... thanks
for that.
> It would be great if you guys could comment
> on the changes and provide some guidance for us to get it upstream.
>
I'm sure up for that. I already know I won't have time to look at it
until next week. But I'll make some space to look at the code then (I'm
travelling, so I won't be furiously doing my own development anyway).
> We have tested it on an odroid xu4 [2] and we are able to boot guests
> with mixed vcpu affinities (big and LITTLE).
>
Great to hear this too.
> We are more than happy to submit patches once we address the issues
> and come up with a review-able version of this implementation.
>
Sure. So, from just a very quick glance, I can see a single giant
commit. This is ok for now, and I will look at it as it is.
But, for sure, the first step toward making things reviewable, is to
split the big patch into a series of smaller patches, as you probably
know yourself already. :-)
Since you're touching different components (as in, hypervisor,
toolstack, build system, etc), splitting at the component boundaries is
quite often something we want and ask for.
Another criterion, orthogonal to the one cited above, is to separate
patches that change architecture specific code from patches that
touch common areas.
But, in general, the principle to follow is to split the patches at the
"logical boundary", as this tries to explain:
https://wiki.xenproject.org/wiki/Submitting_Xen_Project_Patches#Break_down_your_patches
It's a rather difficult call, especially for changes like this.
Therefore, as a first and fundamental step toward reviewability, I'd
suggest starting to think about how to do the splitup.
Anyway, I'll let you have my comments.
Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
[-- Attachment #2: Type: text/plain, Size: 127 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
* Re: [DOC RFC] Heterogeneous Multi Processing Support in Xen
2017-03-01 17:38 ` Dario Faggioli
@ 2017-03-01 18:58 ` Stefano Stabellini
0 siblings, 0 replies; 22+ messages in thread
From: Stefano Stabellini @ 2017-03-01 18:58 UTC (permalink / raw)
To: Dario Faggioli
Cc: Jürgen Groß, Peng Fan, Stefano Stabellini,
George Dunlap, Andrew Cooper, thkatsios, Xen Devel, julien.grall,
Jan Beulich, Peng Fan, Anastassios Nanos
CC'ing Julien.
On Wed, 1 Mar 2017, Dario Faggioli wrote:
> On Wed, 2017-03-01 at 02:05 +0200, Anastassios Nanos wrote:
> > On Wed, Dec 7, 2016 at 8:29 PM, Dario Faggioli
> > <dario.faggioli@citrix.com> wrote:
> > >
> > > % Heterogeneous Multi Processing Support in Xen
> > > % Revision 1
> > >
> > > [...]
> > Hi all,
> >
> Hello,
>
> > We are sending a branch[1] for comments on an initial implementation
> > of the above design document. Essentially it targets the ARM
> > big.LITTLE architecture.
> >
> W00t ?!?! Just the fact that you did this, it is just great... thanks
> for that.
Yes, thank you for your work!
> > It would be great if you guys could comment
> > on the changes and provide some guidance for us to get it upstream.
> >
> I'm sure up for that. I already know I won't have time to look at it
> until next week. But I'll make some space to look at the code then (I'm
> travelling, so I won't be furiously doing my own development anyway).
>
> > We have tested it on an odroid xu4 [2] and we are able to boot guests
> > with mixed vcpu affinities (big and LITTLE).
> >
> Great to hear this too.
>
> > We are more than happy to submit patches once we address the issues
> > and come up with a review-able version of this implementation.
> >
> Sure. So, from just a very quick glance, I can see an unique giant
> commit. This is ok for now, and I will look at it as it is.
>
> But, for sure, the first step toward making things reviewable, is to
> split the big patch in a series of smaller patches, as you probably
> know yourself already. :-)
>
> Since you're touching different component (as in, hypervisor,
> toolstack, build system, etc), splitting at the component boundaries is
> quite often something we want and ask for.
>
> Another criteria, orthogonal to the one cited above, is to separate
> patches that change architecture specific code, from patches that
> touches common areas.
>
> But, in general, the principle to follow is to split the patches at the
> "logical boundary", as this tries to explain:
> https://wiki.xenproject.org/wiki/Submitting_Xen_Project_Patches#Break_down_your_patches
It would also be nice if you could summarize the design, and the main
architectural choices, in your introductory 0/N patch.
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
Thread overview: 22+ messages
2016-12-07 18:29 [DOC RFC] Heterogeneous Multi Processing Support in Xen Dario Faggioli
2016-12-08 6:12 ` Juergen Gross
2016-12-08 10:27 ` Dario Faggioli
2016-12-08 10:38 ` Juergen Gross
2016-12-08 21:45 ` Dario Faggioli
2016-12-15 18:41 ` Dario Faggioli
2016-12-16 7:44 ` Juergen Gross
2016-12-08 10:14 ` Jan Beulich
2016-12-08 10:23 ` Dario Faggioli
2016-12-08 10:41 ` Jan Beulich
2016-12-08 19:09 ` Stefano Stabellini
2016-12-08 21:54 ` Dario Faggioli
2016-12-09 8:13 ` Jan Beulich
2016-12-09 8:29 ` Dario Faggioli
2016-12-09 9:09 ` Jan Beulich
2016-12-09 19:20 ` Stefano Stabellini
2016-12-16 8:00 ` George Dunlap
2016-12-16 8:05 ` George Dunlap
2016-12-16 8:07 ` George Dunlap
2017-03-01 0:05 ` Anastassios Nanos
2017-03-01 17:38 ` Dario Faggioli
2017-03-01 18:58 ` Stefano Stabellini