Re: schedulers and topology exposing questions

xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed

From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: George Dunlap <george.dunlap@citrix.com>
Cc: Elena Ufimtseva <elena.ufimtseva@oracle.com>,
	george.dunlap@eu.citrix.com, dario.faggioli@citrix.com,
	xen-devel@lists.xen.org, joao.m.martins@oracle.com,
	boris.ostrovsky@oracle.com
Subject: Re: schedulers and topology exposing questions
Date: Wed, 27 Jan 2016 11:12:16 -0500	[thread overview]
Message-ID: <20160127161216.GA1715@char.us.oracle.com> (raw)
In-Reply-To: <56A8E802.9090702@citrix.com>

On Wed, Jan 27, 2016 at 03:53:38PM +0000, George Dunlap wrote:
> On 27/01/16 15:27, Konrad Rzeszutek Wilk wrote:
> > On Wed, Jan 27, 2016 at 03:10:01PM +0000, George Dunlap wrote:
> >> On 27/01/16 14:33, Konrad Rzeszutek Wilk wrote:
> >>> On Xen - the schedule() would go HLT.. and then later be woken up by the
> >>> VIRQ_TIMER. And since the two applications were on seperate CPUs - the
> >>> single packet would just stick in the queue until the VIRQ_TIMER arrived.
> >>
> >> I'm not sure I understand the situation right, but it sounds a bit like
> >> what you're seeing is just a quirk of the fact that Linux doesn't always
> >> send IPIs to wake other processes up (either by design or by accident),
> > 
> > It does and it does not :-)
> > 
> >> but relies on scheduling timers to check for work to do.  Presumably
> > 
> > It .. I am not explaining it well. The Linux kernel scheduler when
> > called for 'schedule' (from the UDP sendmsg) would either pick the next
> > appliction and do a context swap - of if there were none - go to sleep.
> > [Kind of - it also may do an IPI to the other CPU if requested ,but that requires
> > some hints from underlaying layers]
> > Since there were only two apps on the runqueue - udp sender and udp receiver
> > it would run them back-to back (this is on baremetal)
> 
> I think I understand at a high level from your description what's
> happening (No IPIs -> happens to run if on the same cpu, waits until
> next timer tick if on a different cpu); but what I don't quite get is
> *why* Linux doesn't send an IPI.

Wait no no.

"happens to run if on the same cpu" - only if on baremetal or if
we expose SMT topology to a guest.

Otherwise the applications are not on the same CPU.

The sending IPI part is because there are two CPUs - and the apps on those
two runqeueus are not intertwined from the perspective of the scheduler.
(Unless the udp code has given the scheduler hints).


However if I tasket the applications on the same vCPU (this being without
exposing SMT threads or just the normal situation as today) - the scheduler
will send IPI context switches.

Then I found that if I enable vAPIC and disable event channels for IPIs
and only use the native APIC machinery for (aka vAPIC) we can even do less
VMEXITs, but that is a different story:
http://lists.xenproject.org/archives/html/xen-devel/2015-10/msg00897.html

> 
> It's been quite a while since I looked at the Linux scheduling code, so
> I'm trying to understand it based a lot on the Xen code. In Xen a vcpu
> can be "runnable" (has something to do) and "blocked" (waiting for
> something to do). Whenever a vcpu goes from "blocked" to "runnable", the
> scheduler will call vcpu_wake(), which sends an IPI to the appropriate
> pcpu to get it to run the vcpu.
> 
> What you're describing is a situation where a process is blocked (either
> in 'listen' or 'read'), and another process does something which should
> cause it to become 'runnable' (sends it a UDP message). If anyone
> happens to run the scheduler on its cpu, it will run; but no proactive
> actions are taken to wake it up (i.e., sending an IPI).

Right. And that is a UDP code decision. It called the schedule without
any timeout or hints.
> 
> The idea of not sending an IPI when a process goes from "waiting for
> something to do" to "has something to do" seems strange to me; and if it
> wasn't a mistake, then my only guess why they would choose to do that
> would be to reduce IPI traffic on large systems.
> 
> But whether it's a mistake or on purpose, it's a Linux thing, so...

Yes :-)
> 
> >> they knew that low performance on ping-pong workloads might be a
> >> possibility when they wrote the code that way; I don't see a reason why
> >> we should try to work around that in Xen.
> > 
> > Which is not what I am suggesting.
> 
> I'm glad we agree on this. :-)
> 
> > Our first ideas was that since this is a Linux kernel schduler characteristic
> > - let us give the guest all the information it needs to do this. That is
> > make it look as baremetal as possible - and that is where the vCPU
> > pinning and the exposing of SMT information came about. That (Elena
> > pls correct me if I am wrong) did indeed show that the guest was doing
> > what we expected.
> > 
> > But naturally that requires pinning and all that - and while it is a useful
> > case for those that have the vCPUs to spare and can do it - that is not
> > a general use-case.
> > 
> > So Elena started looking at the CPU bound and seeing how Xen behaves then
> > and if we can improve the floating situation as she saw some abnormal
> > behavious.
> 
> OK -- if the focus was on the two cases where the Xen credit1 scheduler
> (apparently) co-located two cpu-burning vcpus on sibling threads, then
> yeah, that's behavior we should probably try to get to the bottom of.

Right!

next prev parent reply	other threads:[~2016-01-27 16:12 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-01-22 16:54 schedulers and topology exposing questions Elena Ufimtseva
2016-01-22 17:29 ` Dario Faggioli
2016-01-22 23:58   ` Elena Ufimtseva
2016-01-26 11:21 ` George Dunlap
2016-01-27 14:25   ` Dario Faggioli
2016-01-27 14:33   ` Konrad Rzeszutek Wilk
2016-01-27 15:10     ` George Dunlap
2016-01-27 15:27       ` Konrad Rzeszutek Wilk
2016-01-27 15:53         ` George Dunlap
2016-01-27 16:12           ` Konrad Rzeszutek Wilk [this message]
2016-01-28  9:55           ` Dario Faggioli
2016-01-29 21:59             ` Elena Ufimtseva
2016-02-02 11:58               ` Dario Faggioli
2016-01-27 16:03         ` Elena Ufimtseva
2016-01-28  9:46           ` Dario Faggioli
2016-01-29 16:09             ` Elena Ufimtseva
2016-01-28 15:10         ` Dario Faggioli
2016-01-29  3:27           ` Konrad Rzeszutek Wilk
2016-02-02 11:45             ` Dario Faggioli
2016-02-03 18:05               ` Konrad Rzeszutek Wilk
2016-01-27 14:01 ` Dario Faggioli
2016-01-28 18:51   ` Elena Ufimtseva

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160127161216.GA1715@char.us.oracle.com \
    --to=konrad.wilk@oracle.com \
    --cc=boris.ostrovsky@oracle.com \
    --cc=dario.faggioli@citrix.com \
    --cc=elena.ufimtseva@oracle.com \
    --cc=george.dunlap@citrix.com \
    --cc=george.dunlap@eu.citrix.com \
    --cc=joao.m.martins@oracle.com \
    --cc=xen-devel@lists.xen.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).