Scheduler follow-up: Design target (was [RFC] Scheduler work, part 1)

* Scheduler follow-up: Design target (was [RFC] Scheduler work, part 1)
@ 2009-04-14 12:38 George Dunlap
  2009-04-16  3:29 ` Tian, Kevin
  2009-04-17 12:11 ` Juergen Gross
  0 siblings, 2 replies; 3+ messages in thread
From: George Dunlap @ 2009-04-14 12:38 UTC
  To: xen-devel@lists.xensource.com; +Cc: Ian Pratt, Tian, Kevin, Jeremy Fitzhardinge

Hey all,

Thanks for the feedback; and, sorry for sending it just before a
holiday weekend so there was a delay in writing up a response.  (OTOH,
as I did read the e-mails as they came out, it's given me more time to
think and coalesce.)

A couple of high bits: This first e-mail was meant to lay out design
goals and discuss interface.  If we can agree (for example) that we
want latency-sensitive workloads (such as network, audio, and video)
to perform well, and use latency-sensitive workloads as test cases
while developing, then we don't need to agree on a specific algorithm
up-front.

OK, with that in mind, some specific responses:

* [Jeremy] Is that forward-looking enough?  That hardware is currently
available; what's going to be commonplace in 2-3 years?

I think we need to distinguish between "works optimally" and "works
well".  Obviously we want the design to be scalable, and we don't want
to have to do a major revision in a year because 16 logical cpus works
well but 32 tanks.  And it may be a good idea to "lead" the target, so
that when we actually ship something it will be right on, rather than
6 months behind.

Still, in 2-3 years, will the vast majority of servers have 32 logical
cpus, or still only 16 or less?

Any thoughts on a reasonable target?

* [Kevin Tian] How is 80%/800% chosen here?

Heuristics.  80% is a general rule of thumb for optimal server
performance.  Above 80% and you may get a higher total throughput (or
maybe not) but it will be common for individual VMs to have to wait
for CPU resources, which may cause significant performance impact.

(I should clarify, 80% means 80% of *all* resources, not 80% of one
cpu; i.e., if you have 4 cores, xenuse may report 360% of one cpu;
but 100% of all resources would be 400% of one cpu.)
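
To make the arithmetic concrete, here is a minimal sketch of the
conversion between "percent of one cpu" and "percent of all
resources".  The helper names are made up for this example and are
not part of xenuse or any other Xen tool; only the numbers (4 cores,
360%, 400%, 80%) come from the text above.

```python
# Illustrative only: these helpers are hypothetical and are not part of
# xenuse or any Xen tool.

def total_capacity_pct(num_cpus):
    """100% of all resources, expressed as a percentage of one cpu."""
    return 100 * num_cpus

def share_of_all_resources(reported_pct, num_cpus):
    """Convert a 'percent of one cpu' reading into a fraction of the host."""
    return reported_pct / total_capacity_pct(num_cpus)

def design_target_pct(num_cpus, target_pct_of_all=80):
    """The 80% design point, expressed as a percentage of one cpu."""
    return target_pct_of_all * num_cpus

print(total_capacity_pct(4))           # full capacity of a 4-core host
print(share_of_all_resources(360, 4))  # a 360% reading as a fraction of the host
print(design_target_pct(4))            # the 80% target on 4 cores
```

On a 4-core host, then, a 360% reading is 90% of all resources, and
the 80% design point corresponds to 320% of one cpu.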

800% was just a general boundary.  I think it's sometimes as important
to say what you *aren't* doing as what you are doing.  For example, if
someone comes in and says, "This new scheduler sucks if you have a
load average of 10 (i.e., 1000% utilization)", we can say, "Running
with a load average of 10 isn't what we're designing for.  Patches
will be accepted if they don't adversely impact performance at 80%.
Otherwise feel free to write your own scheduler for that kind of
system."  OTOH, if a hosting provider (for example) says, "Performance
really tanks around a load of 3", we should make an effort to
accommodate that.

* [Kevin Tian] How about VM number in total you'd like to support?

Good question.  I'll do some research for how many VMs a virtual
desktop system might want to support.

For servers, I think a reasonable design space would be between 1 VM
every 3 cores (for a few extremely high-load servers) to 8 VMs every
core (for highly aggregated servers).  I suppose server farms may want
more.
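
As a rough sketch of what that range means for a given host (the
function name and parameters are invented for illustration, and it
assumes "core" means whatever unit the host size is counted in):

```python
# Illustrative only: a hypothetical helper, not an existing Xen interface.

def vm_design_range(num_cores, cores_per_vm_low=3, vms_per_core_high=8):
    """Return (low, high) VM counts for the proposed server design space.

    low  : 1 VM every `cores_per_vm_low` cores (extremely high-load servers)
    high : `vms_per_core_high` VMs every core (highly aggregated servers)
    """
    low = num_cores / cores_per_vm_low
    high = num_cores * vms_per_core_high
    return low, high

low, high = vm_design_range(16)
print(low, high)
```

For a 16-core host this spans from about 5 VMs to 128 VMs, so the
scheduler would need to behave sensibly across more than an order of
magnitude of VM density on the same hardware.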

Does anyone else have any thoughts on this subject -- either
suggestions for different numbers, or other use cases they want
considered?

* RE: Scheduler follow-up: Design target (was [RFC] Scheduler work, part 1)
  2009-04-14 12:38 Scheduler follow-up: Design target (was [RFC] Scheduler work, part 1) George Dunlap
@ 2009-04-16  3:29 ` Tian, Kevin
  2009-04-17 12:11 ` Juergen Gross
  1 sibling, 0 replies; 3+ messages in thread
From: Tian, Kevin @ 2009-04-16  3:29 UTC
  To: George Dunlap, xen-devel@lists.xensource.com
  Cc: Ian Pratt, Jeremy Fitzhardinge

>From: George Dunlap
>Sent: April 14, 2009 20:38
>
>Hey all,
>
>Thanks for the feedback; and, sorry for sending it just before a
>holiday weekend so there was a delay in writing up a response.  (OTOH,
>as I did read the e-mails as they came out, it's given me more time to
>think and coalesce.)
>
>A couple of high bits: This first e-mail was meant to lay out design
>goals and discuss interface.  If we can agree (for example) that we
>want latency-sensitive workloads (such as network, audio, and video)
>to perform well, and use latency-sensitive workloads as test cases
>while developing, then we don't need to agree on a specific algorithm
>up-front.

That looks fine to me, but latency-sensitive workloads shouldn't be the
only concern. :-)

>
>* [Kevin Tian] How is 80%/800% chosen here?
>
>Heuristics.  80% is a general rule of thumb for optimal server
>performance.  Above 80% and you may get a higher total throughput (or
>maybe not) but it will be common for individual VMs to have to wait
>for CPU resources, which may cause significant performance impact.
>
>(I should clarify, 80% means 80% of *all* resources, not 80% of one
>cpu; i.e., if you have 4 cores, xenuse may report 360% of one cpu;
>but 100% of all resources would be 400% of one cpu.)
>
>800% was just a general boundary.  I think it's sometimes as important
>to say what you *aren't* doing as what you are doing.  For example, if
>someone comes in and says, "This new scheduler sucks if you have a
>load average of 10 (i.e., 1000% utilization)", we can say, "Running
>with a load average of 10 isn't what we're designing for.  Patches
>will be accepted if they don't adversely impact performance at 80%.
>Otherwise feel free to write your own scheduler for that kind of
>system."  OTOH, if a hosting provider (for example) says, "Performance
>really tanks around a load of 3", we should make an effort to
>accommodate that.

Got it. So one more interesting question: how do you define
"function reasonably well" under 800% utilization? Are there any criteria?

Thanks,
Kevin

* Re: Scheduler follow-up: Design target (was [RFC] Scheduler work, part 1)
  2009-04-14 12:38 Scheduler follow-up: Design target (was [RFC] Scheduler work, part 1) George Dunlap
  2009-04-16  3:29 ` Tian, Kevin
@ 2009-04-17 12:11 ` Juergen Gross
  1 sibling, 0 replies; 3+ messages in thread
From: Juergen Gross @ 2009-04-17 12:11 UTC
  To: George Dunlap
  Cc: Ian Pratt, Tian, Kevin, xen-devel@lists.xensource.com,
	Jeremy Fitzhardinge

George Dunlap wrote:
> * [Jeremy] Is that forward-looking enough?  That hardware is currently
> available; what's going to be commonplace in 2-3 years?
> 
> I think we need to distinguish between "works optimally" and "works
> well".  Obviously we want the design to be scalable, and we don't want
> to have to do a major revision in a year because 16 logical cpus works
> well but 32 tanks.  And it may be a good idea to "lead" the target, so
> that when we actually ship something it will be right on, rather than
> 6 months behind.

This problem might be less critical if cpupools are supported. On really
large systems it would be possible to limit the number of logical cpus
for a scheduler.

> 
> Still, in 2-3 years, will the vast majority of servers have 32 logical
> cpus, or still only 16 or less?

I think Nehalem-EX will have 16 on one socket (8 cores with 2 HT each).
With 4 sockets this would sum up to 64.
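
Spelled out as a small sketch (the function is hypothetical, and the
8 cores / 2 threads / 4 sockets figures are just the estimate above,
not confirmed specifications):

```python
# Illustrative only: the topology numbers are the estimate from this mail.

def logical_cpus(sockets, cores_per_socket, threads_per_core):
    """Number of logical cpus the scheduler would see."""
    return sockets * cores_per_socket * threads_per_core

print(logical_cpus(1, 8, 2))  # one socket
print(logical_cpus(4, 8, 2))  # four sockets
```

That gives 16 logical cpus per socket and 64 for a 4-socket box.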

> * [Kevin Tian] How about VM number in total you'd like to support?
> 
> Good question.  I'll do some research for how many VMs a virtual
> desktop system might want to support.
> 
> For servers, I think a reasonable design space would be between 1 VM
> every 3 cores (for a few extremely high-load servers) to 8 VMs every
> core (for highly aggregated servers).  I suppose server farms may want
> more.
> 
> Does anyone else have any thoughts on this subject -- either
> suggestions for different numbers, or other use cases they want
> considered?

For our BS2000 servers we would really appreciate support for cpupools :-)
Or, as an alternative, correct handling of weights with cpu-pinning.

Another question: do you plan to replace the current credit scheduler or will
the new scheduler be another alternative to credit and sedf?


Juergen

-- 
Juergen Gross                 Principal Developer Operating Systems
TSP ES&S SWE OS6                       Telephone: +49 (0) 89 636 47950
Fujitsu Technology Solutions              e-mail: juergen.gross@ts.fujitsu.com
Otto-Hahn-Ring 6                        Internet: ts.fujitsu.com
D-81739 Muenchen                 Company details: ts.fujitsu.com/imprint.html
