"cpus" config parameter broken?

* "cpus" config parameter broken?
@ 2008-01-08  1:09 Dan Magenheimer
  2008-01-08  1:57 ` Ian Pratt
  0 siblings, 1 reply; 16+ messages in thread
From: Dan Magenheimer @ 2008-01-08  1:09 UTC (permalink / raw)
  To: xen-devel@lists.xensource.com


I've been looking into a report that vcpu pinning doesn't get preserved across a save/restore
(or migrate) and, debugging this, it's starting to look like the "cpus" config file parameter doesn't
work very well -- on 3.1 or unstable!  If anything is specified other than a single integer,
the code reverts to "any cpu".  I think I found this specific problem but there seems to be some
bad bit rot hiding behind that.   So before I go any further, I thought I'd ask a few questions:

1) Is the "cpus" parameter expected to work in a config file or is it somehow deprecated?
   (I see there is an "xm vcpu-pin" command so perhaps this is the accepted way to
   pin cpu's?)
2) Pinning via the "cpus" parameter calls vcpu_set_affinity() but I've always thought the term
    "affinity" expresses a preference not a restriction.  If the call to setaffinity did get
   made properly, would the scheduler really restrict the vcpu to certain pcpu's?  And
   what happens if the vcpu is ready to schedule but none of the restricted set of
   pcpu's is available?
3) Does "cpus" really have any real-world usage anyhow?  E.g. are most uses probably just
    user misunderstanding where "vcpu_avail" should be used instead?

Thanks,
Dan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel


* RE: "cpus" config parameter broken?
  2008-01-08  1:09 "cpus" config parameter broken? Dan Magenheimer
@ 2008-01-08  1:57 ` Ian Pratt
  2008-01-09 18:40   ` Dan Magenheimer
  0 siblings, 1 reply; 16+ messages in thread
From: Ian Pratt @ 2008-01-08  1:57 UTC (permalink / raw)
  To: Dan Magenheimer, xen-devel; +Cc: Ian Pratt

> 1) Is the "cpus" parameter expected to work in a config file or is it
> somehow deprecated?
>    (I see there is an "xm vcpu-pin" command so perhaps this is the
> accepted way to
>    pin cpu's?)

It's expected to work.

> 2) Pinning via the "cpus" parameter calls vcpu_set_affinity() but I've
> always thought the term
>     "affinity" expresses a preference not a restriction.  If the call
> to setaffinity did get
>    made properly, would the scheduler really restrict the vcpu to
> certain pcpu's?  And
>    what happens if the vcpu is ready to schedule but none of the
> restricted set of pcpu's is available?

It's a restriction. Each of the values in the mask is processed modulo the number of physical CPUs.

> 3) Does "cpus" really have any real-world usage anyhow?  E.g. are most
> uses probably just
>     user misunderstanding where "vcpu_avail" should be used instead?

I'm sure some admins use it to good effect in hand placing domains on CPUs, especially in a NUMA context. In most cases it's best to be fully work conserving and give Xen's scheduler full flexibility.

There was an extension to the cpus= syntax proposed at one point; I'm not sure whether it ever got checked in. The idea was to allow the cpus= parameter to be a list of strings, enabling a different mask to be specified for each VCPU. This would enable an admin to pin individual VCPUs to CPUs rather than just at a domain level.

I'm not a huge fan of the cpus= mechanism. It would likely be more user friendly to allow physical CPUs to be put into groups then allow domains to be assigned to CPU groups. It would also be better if you could specify physical CPUs by a node.socket.core.thread hierarchy rather than the enumerated CPU number.

Ian


* RE: "cpus" config parameter broken?
  2008-01-08  1:57 ` Ian Pratt
@ 2008-01-09 18:40   ` Dan Magenheimer
  2008-01-09 19:17     ` Keir Fraser
  0 siblings, 1 reply; 16+ messages in thread
From: Dan Magenheimer @ 2008-01-09 18:40 UTC (permalink / raw)
  To: Ian Pratt, xen-devel@lists.xensource.com

Thanks for the reply and sorry for the delay in mine... I've been
having email problems.

Please note proposal and request for comments below
(marked with >>>>>> Comments? <<<<<)

> > 1) Is the "cpus" parameter expected to work in a config 
> file or is it
> > somehow deprecated?
> >    (I see there is an "xm vcpu-pin" command so perhaps this is the
> > accepted way to
> >    pin cpu's?)
> 
> It's expected to work.

Yes indeed it does work.  There were some syntax variations in the cpus
param that I didn't quite understand.  However, my misunderstanding
uncovered another interesting problem.  See below.

> > 3) Does "cpus" really have any real-world usage anyhow?  
> E.g. are most
> > uses probably just
> >     user misunderstanding where "vcpu_avail" should be used instead?
> 
> I'm sure some admins use it to good effect in hand placing 
> domains on CPUs, especially in a NUMA context. In most cases 
> its typically best to be fully work conserving and give Xen's 
> scheduler full flexibility.

Yeah, I guess if you think of it as "poor man's hard partitioning"
it makes a lot of sense.  But if you think of it in a utility data
center context, true affinity rather than restriction may make more
sense.

And vcpu_avail should cover most app licensing/pricing concerns.

> >    what happens if the vcpu is ready to schedule but none of the
> > restricted set of pcpu's is available?
> 
> It's a restriction. Each of the values in the mask is 
> processed modulo the number of physical CPUs.

The output from "xm vcpu-list" observes the "modulo" but apparently
the scheduler does not.  For example on a 2 pcpu system launching
a 2 vcpu guest with cpus=0,3 (noting that 3 mod 2 = 1), "xm vcpu-list"
shows that each of the guest's 2 vcpus has "any cpu" in the
"CPU Affinity" column, reflecting the fact that 0,3 is, modulo 2, the
same as 0,1 which is the same as 0-1 which is the same as all.

However, the cpu_mask is saved as 0,3 and the scheduler ignores
any pcpu's other than 0 and 1.  This can be observed in "xm vcpu-list"
in the above example by seeing that both guest vcpus are sharing
processor 0.

So the results displayed by "xm vcpu-list" and the actual scheduler
placement are different, but which one is the bug?  Consider:

If a 2 vcpu guest is running on an 8 pcpu machine and has been
restricted to cpus="2,3,4,5" and this 2 vcpu guest gets migrated
to a 4 pcpu system, to which pcpus should the migrated guest be
restricted?  Using the "xm vcpu-list" logic it gets all 4 pcpus,
but (if cpu_mask were preserved, which it currently isn't) the
scheduler logic would give it just two (2 and 3).  And suppose
this 2 vcpu guest on the 8 pcpu system were restricted to "5-8"
and migrated to a 4 pcpu system.  It wouldn't get any processor
time at all (though "xm vcpu-list" would say each vcpu's CPU Affinity
is "any").
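
The two interpretations above can be compared with a small sketch. It is
illustrative only: parse_cpus, by_modulo and by_masking are hypothetical
helpers, not functions from xend or the hypervisor, and the parser handles
only the simple "a,b" and "a-b" forms used in this thread.

```python
def parse_cpus(spec):
    """Parse a cpus string such as "2,3,4,5" or "5-8" into a set of ints.

    Hypothetical helper: only comma lists and inclusive ranges are handled.
    """
    cpus = set()
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def by_modulo(cpus, npcpus):
    """Fold every requested cpu into 0..npcpus-1 (the "xm vcpu-list" view)."""
    return {c % npcpus for c in cpus}


def by_masking(cpus, npcpus):
    """Drop cpus that do not exist on this host (the scheduler's view)."""
    return {c for c in cpus if c < npcpus}


# Guest restricted on an 8-pcpu source, then moved to a 4-pcpu target.
for spec in ("2,3,4,5", "5-8"):
    requested = parse_cpus(spec)
    print(spec, sorted(by_modulo(requested, 4)), sorted(by_masking(requested, 4)))
```

With either string the modulo view reports all four pcpus, whereas masking
leaves two pcpus in the first case and none in the second.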

Because affinity/cpu_restriction is not currently preserved across
save/restore or migration, this is a moot discussion.  But if I
were to "fix" it so it were preserved, the decision is important.

My opinion: CPU affinity/restriction should NOT be preserved
across migration.  Or if it is, it should only be preserved
when the source and target have the same number of pcpus
(thus allowing save/restore to work OK).  Or maybe it should
only be preserved for save/restore and not for migration.
>>>>>>>>>>>>>>>>> Comments? <<<<<<<<<<<<<<<<<<<<<<<<<<<<<

Note that vcpu_avail would still work across migration.
(Hmmm... have to look to see if vcpu_avail is currently
preserved across save/restore/migration. If not, I will
definitely need to find and fix that one.)

> There was an extension to the cpus= syntax proposed at one 
> point that I'm not sure whether it ever got checked in. The 
> idea was to allow the cpus= parameter to be a list of 
> strings, enabling a different mask to specified for each 
> VCPU. This would enable an admin to pin individual VCPUs to 
> CPUs rather than just at a domain level.

It looks like the internal vcpu data structure supports this
and "xm vcpu-pin" supports it, but afaict there's no way to
specify per-vcpu affinity at "xm create".

> I'm not a huge fan of the cpus= mechanism. It would likely be 
> more user friendly to allow physical CPUs to be put into 
> groups then allow domains to be assigned to CPU groups. It 
> would also be better if you could specify physical CPUs by a 
> node.socket.core.thread hierarchy rather than the enumerated 
> CPU number.

Agreed, though I'll bet that would take major scheduler surgery.
And this would also further increase the confusion for migration!

I'd also like to see affinity and restriction teased apart
because they are separate concepts with different uses.

Dan


* Re: "cpus" config parameter broken?
  2008-01-09 18:40   ` Dan Magenheimer
@ 2008-01-09 19:17     ` Keir Fraser
  2008-01-10 18:38       ` Dan Magenheimer
  0 siblings, 1 reply; 16+ messages in thread
From: Keir Fraser @ 2008-01-09 19:17 UTC (permalink / raw)
  To: dan.magenheimer@oracle.com, Ian Pratt,
	xen-devel@lists.xensource.com

On 9/1/08 18:40, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:

> My opinion: CPU affinity/restriction should NOT be preserved
> across migration.  Or if it is, it should only be preserved
> when the source and target have the same number of pcpus
> (thus allowing save/restore to work OK).  Or maybe it should
> only be preserved for save/restore and not for migration.
>>>>>>>>>>>> Comments? <<<<<<<<<<<<<<<<<<<<<<<<<<<<<

I agree with that. Unless save/restore is on the same machine (identified in
some way) or at least has identical CPU topology as far as we can see.
Otherwise some higher-level entity needs to be smart enough to work out
affinity during restore and issue the correct 'xm' commands (or equivalent).

 -- Keir


* RE: "cpus" config parameter broken?
  2008-01-09 19:17     ` Keir Fraser
@ 2008-01-10 18:38       ` Dan Magenheimer
  2008-01-10 20:50         ` Keir Fraser
  0 siblings, 1 reply; 16+ messages in thread
From: Dan Magenheimer @ 2008-01-10 18:38 UTC (permalink / raw)
  To: Keir Fraser, Ian Pratt, xen-devel@lists.xensource.com

As a logical consequence:

- the v->cpu_affinity mask should never have bits set for
  processors that don't exist on the current physical system
  (although all bits set == "any" is probably an OK exception)

- the modulo behavior currently implemented in "xm vcpu-pin"
  and the config file "cpus" parameter should be removed, and

- if cpu values are specified by "xm vcpu-pin" or "cpus"
  beyond the number of physical cpus, the xm command should
  fail.

Agreed?
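
A minimal sketch of the strict behavior proposed in the last bullet, written
as standalone Python rather than as a patch to xm. The name validate_cpus and
the use of ValueError are assumptions made for illustration; the real
toolstack may report errors differently.

```python
def validate_cpus(cpus, npcpus):
    """Return the requested cpus unchanged, or fail if any do not exist.

    cpus   -- iterable of requested physical cpu numbers
    npcpus -- number of physical cpus on this host (assumed known to the tools)
    """
    requested = set(cpus)
    bad = sorted(c for c in requested if c < 0 or c >= npcpus)
    if bad:
        raise ValueError(
            "cpu(s) %s do not exist on this %d-cpu host" % (bad, npcpus)
        )
    return requested
```

Under this rule a request for cpus 0 and 3 on a 2-cpu host fails outright
instead of being folded or trimmed.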


* Re: "cpus" config parameter broken?
  2008-01-10 18:38       ` Dan Magenheimer
@ 2008-01-10 20:50         ` Keir Fraser
  2008-01-10 21:10           ` Dan Magenheimer
  0 siblings, 1 reply; 16+ messages in thread
From: Keir Fraser @ 2008-01-10 20:50 UTC (permalink / raw)
  To: dan.magenheimer@oracle.com, Ian Pratt,
	xen-devel@lists.xensource.com




On 10/1/08 18:38, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:

> As a logical consequence:
> 
> - the v->cpu_affinity mask should never have bits set for
>   processors that don't exist on the current physical system
>   (although all bits set == "any" is probably an OK exception)

This is already the case.

> - the modulo behavior currently implemented in "xm vcpu-pin"
>   and the config file "cpus" parameter should be removed, and

Possibly.

> - if cpu values are specified by "xm vcpu-pin" or "cpus"
>   beyond the number of physical cpus, the xm command should
>   fail.

Again, possibly. I don't see much wrong with a liberal interpretation of
otherwise incorrect cpu config parameters though. If we tighten things up
then we need to make it easier to access CPU topology info from within
domain config files.

 -- Keir


* RE: "cpus" config parameter broken?
  2008-01-10 20:50         ` Keir Fraser
@ 2008-01-10 21:10           ` Dan Magenheimer
  2008-01-10 21:57             ` Keir Fraser
  0 siblings, 1 reply; 16+ messages in thread
From: Dan Magenheimer @ 2008-01-10 21:10 UTC (permalink / raw)
  To: Keir Fraser, Ian Pratt, xen-devel@lists.xensource.com

I have blinders on since this discussion started with
my trying to figure out the syntax and semantics for
the "cpus" parameter as used in a config file, but:

> > - the v->cpu_affinity mask should never have bits set for
> 
> This is already the case.

No, with the cpus parameter, it is currently possible to
set bits in v->cpu_affinity mask for processors that don't
exist.

Perhaps this is the real bug then.  I will spin a patch
to implement the modulo behavior from "xm vcpu-pin" for
the parsing of the cpus parameter and all will be well.

Thanks,
Dan


* Re: "cpus" config parameter broken?
  2008-01-10 21:10           ` Dan Magenheimer
@ 2008-01-10 21:57             ` Keir Fraser
  2008-01-10 22:40               ` Dan Magenheimer
  0 siblings, 1 reply; 16+ messages in thread
From: Keir Fraser @ 2008-01-10 21:57 UTC (permalink / raw)
  To: dan.magenheimer@oracle.com, Ian Pratt,
	xen-devel@lists.xensource.com

On 10/1/08 21:10, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:

>>> - the v->cpu_affinity mask should never have bits set for
>> 
>> This is already the case.
> 
> No, with the cpus parameter, it is currently possible to
> set bits in v->cpu_affinity mask for processors that don't
> exist.

Ah yes. But then the offline CPUs get masked out in vcpu_set_affinity(), and
the affinity mask is then rejected if the remaining CPU set is empty.
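
The behavior described here can be sketched with plain integer bitmasks.
This is a simplified model, not the hypervisor's C code: set_affinity and
online_mask are illustrative names, and bit n stands for physical CPU n.

```python
def set_affinity(requested_mask, online_mask):
    """Model of masking out offline CPUs, rejecting an empty result.

    Both arguments are ints used as bitmasks (bit n == physical CPU n).
    """
    effective = requested_mask & online_mask
    if effective == 0:
        raise ValueError("no online CPU in requested affinity mask")
    return effective


ONLINE_2P = 0b0011  # CPUs 0 and 1 online

# CPUs 0 and 3 requested: only bit 0 survives the mask.
print(bin(set_affinity(0b1001, ONLINE_2P)))
```

A request naming only CPUs 2 and 3 (0b1100) on the same machine would raise,
since nothing remains after masking.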

 -- Keir


* RE: "cpus" config parameter broken?
  2008-01-10 21:57             ` Keir Fraser
@ 2008-01-10 22:40               ` Dan Magenheimer
  2008-01-10 22:46                 ` Keir Fraser
  0 siblings, 1 reply; 16+ messages in thread
From: Dan Magenheimer @ 2008-01-10 22:40 UTC (permalink / raw)
  To: Keir Fraser, Ian Pratt, xen-devel@lists.xensource.com

> >>> - the v->cpu_affinity mask should never have bits set for
> >>
> >> This is already the case.
> >
> > No, with the cpus parameter, it is currently possible to
> > set bits in v->cpu_affinity mask for processors that don't
> > exist.
> 
> Ah yes. But then the offline CPUs get masked out in 
> vcpu_set_affinity(), and
> the affinity mask is then rejected if the remaining CPU set is empty.

I see you are correct that the v->cpu_affinity bits never do get set.
But if the list contains both online and offline cpus, the mask is not
rejected; instead, some bits are silently ignored.
So:

   cpus="0,3"

on a 2p machine will currently set only one bit (bit 0), but

   xm vcpu-pin domid all "0,3"

will set two bits.  Whereas

   cpus="2-3"

will cause an error on a 2p but

   xm vcpu-pin domid all "2-3"

will not.  This would become relevant if the "cpus" parameter
were preserved across a migration (rather than v->cpu_affinity),
which is what led to my original confusion.

So modulo-izing the cpus parameter code will eliminate this
case, but I still wonder if vcpu_set_affinity should reject any
mask that has bits set beyond max_pcpu instead of silently
ignoring those bits.  Seems like an accident waiting to happen
and indeed I got bitten by it.

Which is why I proposed tightening the definition of all affinity
masks (and strings representing masks) to "if you try to enable
a bit in the cpumask that refers to a non-existent processor, you
will get an error"

Thanks,
Dan


* Re: "cpus" config parameter broken?
  2008-01-10 22:40               ` Dan Magenheimer
@ 2008-01-10 22:46                 ` Keir Fraser
  2008-01-10 22:53                   ` Dan Magenheimer
  0 siblings, 1 reply; 16+ messages in thread
From: Keir Fraser @ 2008-01-10 22:46 UTC (permalink / raw)
  To: dan.magenheimer@oracle.com, Ian Pratt,
	xen-devel@lists.xensource.com

On 10/1/08 22:40, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:

> So modulo-izing the cpus parameter code will eliminate this
> case, but I still wonder if vcpu_set_affinity should reject any
> mask that has bits set beyond max_pcpu instead of silently
> ignoring those bits.  Seems like an accident waiting to happen
> and indeed I got bitten by it.
> 
> Which is why I proposed tightening the definition of all affinity
> masks (and strings representing masks) to "if you try to enable
> a bit in the cpumask that refers to a non-existent processor, you
> will get an error"

That doesn't play nicely with CPU hotplug (not supported yet, but could well
be in future) where the online_map could be continually changing. The model
I'm aiming for in Xen is to remember all the CPUs requested by the
toolstack, but only schedule onto the subset that are actually online right
now (obviously). The implementation of this is of course quite simple given
that CPU hotplug is not supported right now.
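
One way to picture this model is a small class that stores the requested set
untouched and intersects it with the online map only when a placement
decision is made. VcpuAffinity and its method are hypothetical names used
for illustration, not Xen data structures.

```python
class VcpuAffinity:
    """Remember every requested CPU; filter by the online set on demand."""

    def __init__(self, requested):
        self.requested = frozenset(requested)  # never modified afterwards

    def schedulable(self, online):
        """CPUs this vcpu may run on right now."""
        return self.requested & frozenset(online)


aff = VcpuAffinity({0, 3})
print(sorted(aff.schedulable({0, 1})))        # before hotplug
print(sorted(aff.schedulable({0, 1, 2, 3})))  # after CPUs 2 and 3 come online
```

Because the stored request is never trimmed, CPU 3 becomes usable as soon as
it appears in the online set, with no further action from the toolstack.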

 -- Keir


* RE: "cpus" config parameter broken?
  2008-01-10 22:46                 ` Keir Fraser
@ 2008-01-10 22:53                   ` Dan Magenheimer
  2008-01-10 22:55                     ` Keir Fraser
  0 siblings, 1 reply; 16+ messages in thread
From: Dan Magenheimer @ 2008-01-10 22:53 UTC (permalink / raw)
  To: Keir Fraser, Ian Pratt, xen-devel@lists.xensource.com

> > So modulo-izing the cpus parameter code will eliminate this
> > case, but I still wonder if vcpu_set_affinity should reject any
> > mask that has bits set beyond max_pcpu instead of silently
> > ignoring those bits.  Seems like an accident waiting to happen
> > and indeed I got bitten by it.
> >
> > Which is why I proposed tightening the definition of all affinity
> > masks (and strings representing masks) to "if you try to enable
> > a bit in the cpumask that refers to a non-existent processor, you
> > will get an error"
> 
> That doesn't play nicely with CPU hotplug (not supported yet, 
> but could well
> be in future) where the online_map could be continually 
> changing. The model
> I'm aiming for in Xen is to remember all the CPUs requested by the
> toolstack, but only schedule onto the subset that are 
> actually online right
> now (obviously). The implementation of this is of course 
> quite simple given
> the CPU hotplug is not supported right now.

Agreed, but even with CPU hotplug there will be some max_pcpu value
on any given machine.  That's why I said "non-existent processor"
in the proposal even though you said "offline processor".

Dan


* Re: "cpus" config parameter broken?
  2008-01-10 22:53                   ` Dan Magenheimer
@ 2008-01-10 22:55                     ` Keir Fraser
  2008-01-10 23:46                       ` Dan Magenheimer
  0 siblings, 1 reply; 16+ messages in thread
From: Keir Fraser @ 2008-01-10 22:55 UTC (permalink / raw)
  To: dan.magenheimer@oracle.com, Ian Pratt,
	xen-devel@lists.xensource.com

On 10/1/08 22:53, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:

>> That doesn't play nicely with CPU hotplug (not supported yet,
>> but could well
>> be in future) where the online_map could be continually
>> changing. The model
>> I'm aiming for in Xen is to remember all the CPUs requested by the
>> toolstack, but only schedule onto the subset that are
>> actually online right
>> now (obviously). The implementation of this is of course
>> quite simple given
>> the CPU hotplug is not supported right now.
> 
> Agreed, but even with CPU hotplug there will be some max_pcpu value
> on any given machine.  That's why I said "non-existent processor"
> in the proposal even though you said "offline processor".

You mean CPUs beyond NR_CPUS? All the cpumask iterators are careful not to
return values beyond NR_CPUS, regardless of what stray bits lie beyond that
range in the longword bitmap.

 -- Keir


* RE: "cpus" config parameter broken?
  2008-01-10 22:55                     ` Keir Fraser
@ 2008-01-10 23:46                       ` Dan Magenheimer
  2008-01-10 23:53                         ` Keir Fraser
  0 siblings, 1 reply; 16+ messages in thread
From: Dan Magenheimer @ 2008-01-10 23:46 UTC (permalink / raw)
  To: Keir Fraser, Ian Pratt, xen-devel@lists.xensource.com

> >> changing. The model
> >> I'm aiming for in Xen is to remember all the CPUs requested by the
> >> toolstack, but only schedule onto the subset that are
> >> actually online right
> >> now (obviously). The implementation of this is of course
> >> quite simple given
> >> the CPU hotplug is not supported right now.
> >
> > Agreed, but even with CPU hotplug there will be some max_pcpu value
> > on any given machine.  That's why I said "non-existent processor"
> > in the proposal even though you said "offline processor".
> 
> You mean CPUs beyond NR_CPUS? All the cpumask iterators are 
> careful not to
> return values beyond NR_CPUS, regardless of what stray bits 
> lie beyond that
> range in the longword bitmap.

I see... you are allowing for any future box to grow to NR_CPUS
and I am assuming that, even with future hot-add processors,
Xen will be told by the box the maximum number of processors
that will ever be online (call this max_pcpu), and that max_pcpu
is probably less than NR_CPUS.  So for these NR_CPUS-max_pcpu
processors that are "non-existent" (and especially for the
foreseeable future on the vast majority of machines for which
max_pcpu=npcpu=constant and npcpu << NR_CPUS), trying to set
bits for non-existent processors should not be silently ignored
and discarded, but should either be entirely
disallowed or, at least, should be retained and ignored.
I would propose "disallowed" for n > max_pcpu and retained
and ignored for online_pcpu < n < max_pcpu.
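
The proposal above can be sketched as follows. The function name
apply_proposal is hypothetical, and the sketch assumes zero-based cpu
numbers, so that any n >= max_pcpu counts as "can never exist"; the exact
boundary is an assumption, not something settled in this thread.

```python
def apply_proposal(requested, online, max_pcpu):
    """Return (retained, schedulable) under the proposed rules.

    requested -- cpu numbers asked for by the tools
    online    -- cpu numbers currently online
    max_pcpu  -- most cpus this machine could ever have online (assumed
                 zero-based, so valid numbers are 0..max_pcpu-1)
    """
    requested = set(requested)
    impossible = sorted(c for c in requested if c >= max_pcpu)
    if impossible:
        raise ValueError("cpus %s can never exist on this machine" % impossible)
    # Offline-but-possible cpus stay in the stored set and are simply
    # skipped when choosing where to run.
    return requested, requested & set(online)
```

For example, on a machine with max_pcpu of 4 and cpus 0 and 1 online, a
request for {0, 3} is retained in full but only cpu 0 is schedulable, while
a request for {0, 5} is refused.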

A related aside: for either model for hot-add (yours or mine),
the current modulo mechanism in "xm vcpu-pin" is not scalable
and imho should be removed now as well before anybody comes to
depend on it.

And lastly, this hot-add discussion reinforces in my mind the
difference between affinity and restriction (and pinning) which
are all muddled in the current hypervisor and tools.

Dan


* Re: "cpus" config parameter broken?
  2008-01-10 23:46                       ` Dan Magenheimer
@ 2008-01-10 23:53                         ` Keir Fraser
  2008-01-11  0:43                           ` Dan Magenheimer
  0 siblings, 1 reply; 16+ messages in thread
From: Keir Fraser @ 2008-01-10 23:53 UTC (permalink / raw)
  To: dan.magenheimer@oracle.com, Ian Pratt,
	xen-devel@lists.xensource.com

The current hypervisor interface has the advantage of flexibility. You can
easily enforce various policies (including strict checking, or modulo
arithmetic) in the toolstack on top of the current interface. But you can't
(easily) implement the current hypervisor policy in the toolstack on top of
strict checking or modulo arithmetic (if one of those policies becomes
hardcoded into the hypervisor).

The current interface assumes the lowest levels of the toolstack know what
they are doing, and presents a policy that is as permissive as possible.

 -- Keir


* RE: "cpus" config parameter broken?
  2008-01-10 23:53                         ` Keir Fraser
@ 2008-01-11  0:43                           ` Dan Magenheimer
  2008-01-11  0:53                             ` Keir Fraser
  0 siblings, 1 reply; 16+ messages in thread
From: Dan Magenheimer @ 2008-01-11  0:43 UTC (permalink / raw)
  To: Keir Fraser, Ian Pratt, xen-devel@lists.xensource.com

Sorry to belabo(u)r the point but I beg to differ: The current
hypervisor interface is a strange mixture of flexibility and
restriction (and policy and mechanism):  Some mask parameters
are left alone by vcpu_set_affinity, others are rejected entirely,
and still others are silently modified.  The advantage to the
existing interface is of course that it preserves the downward
interface to the schedulers, e.g. schedulers can assume that
any bit set represents a schedulable processor.

So if the toolstack knows what it is doing, why does
vcpu_set_affinity even look at the mask? IMHO either:

1) the policy belongs in the tools, in which case the and'ing
   of the mask should only be done by the scheduler whenever a
   vcpu is scheduled (thus allowing maximal flexibility
   for future highly dynamic hot-plug but ensuring a vcpu
   never gets scheduled on an offline or non-existent pcpu),
   or
2) the policy belongs in the hypervisor, in which case any
   attempt by the tools to allow scheduling (e.g. set affinity)
   on an offline or non-existent processor should be rejected
   (in which case the toolset is immediately notified that
   its understanding of the current online set is faulty).

Though it could be argued academically that "policy" doesn't
belong in the hypervisor, rejecting an attempt by the tools
to use a non-available processor isn't much different from
rejecting an SSE3 instruction on a non-SSE3 processor.
(In other words, it's really a processor enforcement mechanism.)
So I like #2.  #1 would be OK too.  I just don't like the
current muddle which has already led to misunderstandings
and inconsistent implementations in the current toolchain.

Dan

> -----Original Message-----
> From: Keir Fraser [mailto:Keir.Fraser@cl.cam.ac.uk]
> Sent: Thursday, January 10, 2008 4:53 PM
> To: dan.magenheimer@oracle.com; Ian Pratt; xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] "cpus" config parameter broken?
> 
> 
> The current hypervisor interface has the advantage of flexibility. You can
> easily enforce various policies (including strict checking, or modulo
> arithmetic) in the toolstack on top of the current interface. But you can't
> (easily) implement the current hypervisor policy in the toolstack on top of
> strict checking or modulo arithmetic (if one of those policies becomes
> hardcoded into the hypervisor).
> 
> The current interface assumes the lowest levels of the toolstack know what
> they are doing, and presents a policy that is as permissive as possible.
> 
>  -- Keir
> 
> On 10/1/08 23:46, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:
> 
> >> You mean CPUs beyond NR_CPUS? All the cpumask iterators are careful not to
> >> return values beyond NR_CPUS, regardless of what stray bits lie beyond that
> >> range in the longword bitmap.
> >
> > I see... you are allowing for any future box to grow to NR_CPUS
> > and I am assuming that, even with future hot-add processors,
> > Xen will be told by the box the maximum number of processors
> > that will ever be online (call this max_pcpu), and that max_pcpu
> > is probably less than NR_CPUS.  So for these NR_CPUS-max_pcpu
> > processors that are "non-existent" (and especially for the
> > foreseeable future on the vast majority of machines for which
> > max_pcpu=npcpu=constant and ncpu << NR_CPUS), trying to set
> > bits for non-existent processors should not be silently ignored
> > and discarded, but should either be entirely
> > disallowed or, at least, should be retained and ignored.
> > I would propose "disallowed" for n > max_pcpu and retained
> > and ignored for online_pcpu < n < max_pcpu.
> >
> > A related aside, for either model for hot-add (yours or mine),
> > the current modulo mechanism in xm_vcpu_pin is not scaleable
> > and imho should be removed now as well before anybody comes to
> > depend on it.
> >
> > And lastly, this hot-add discussion reinforces in my mind the
> > difference between affinity and restriction (and pinning) which
> > are all muddled in the current hypervisor and tools.

* Re: "cpus" config parameter broken?
  2008-01-11  0:43                           ` Dan Magenheimer
@ 2008-01-11  0:53                             ` Keir Fraser
  0 siblings, 0 replies; 16+ messages in thread
From: Keir Fraser @ 2008-01-11  0:53 UTC (permalink / raw)
  To: dan.magenheimer@oracle.com, Ian Pratt,
	xen-devel@lists.xensource.com

On 11/1/08 00:43, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:

> Though it could be argued academically that "policy" doesn't
> belong in the hypervisor, rejecting an attempt by the tools
> to use a non-available processor isn't much different than
> rejecting an SSE3 instruction on a non-SSE3 processor.
> (In other words, it's really processor enforcement mechanism.)
> So I like #2.  #1 would be OK too.  I just don't like the
> current muddle which has already led to misunderstandings
> and inconsistent implementations in the current toolchain.

Yes, we probably should not return an error if ANDing with online_map
yields an empty set; instead we should fall back to something sensible (such
as ignoring affinity altogether). That is what we would have to do in a CPU
hot-unplug case where the unplugged CPU was the only one in some vcpu's
affinity map. Either that or fail the CPU hot-unplug, I suppose.
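
A minimal sketch of that fallback, assuming a 64-bit word stands in for
the cpu bitmap. The names sketch_mask_t and effective_mask are
hypothetical and are not taken from the Xen source.

```c
#include <stdint.h>

/* Hypothetical stand-in for a cpu bitmap: bit n set => pcpu n. */
typedef uint64_t sketch_mask_t;

/*
 * Return the set of pcpus a vcpu may run on. If none of the pcpus in
 * its affinity mask is online, ignore affinity and allow every online
 * pcpu rather than reporting an error.
 */
static sketch_mask_t effective_mask(sketch_mask_t affinity,
                                    sketch_mask_t online)
{
    sketch_mask_t usable = affinity & online;

    return usable != 0 ? usable : online;
}
```

In the hot-unplug case described above, a vcpu whose affinity names
only the unplugged cpu simply becomes runnable on whatever remains
online.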

 -- Keir

Thread overview: 16+ messages
2008-01-08  1:09 "cpus" config parameter broken? Dan Magenheimer
2008-01-08  1:57 ` Ian Pratt
2008-01-09 18:40   ` Dan Magenheimer
2008-01-09 19:17     ` Keir Fraser
2008-01-10 18:38       ` Dan Magenheimer
2008-01-10 20:50         ` Keir Fraser
2008-01-10 21:10           ` Dan Magenheimer
2008-01-10 21:57             ` Keir Fraser
2008-01-10 22:40               ` Dan Magenheimer
2008-01-10 22:46                 ` Keir Fraser
2008-01-10 22:53                   ` Dan Magenheimer
2008-01-10 22:55                     ` Keir Fraser
2008-01-10 23:46                       ` Dan Magenheimer
2008-01-10 23:53                         ` Keir Fraser
2008-01-11  0:43                           ` Dan Magenheimer
2008-01-11  0:53                             ` Keir Fraser
