From: George Dunlap <george.dunlap@citrix.com>
To: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Elena Ufimtseva <elena.ufimtseva@oracle.com>,
george.dunlap@eu.citrix.com, dario.faggioli@citrix.com,
xen-devel@lists.xen.org, joao.m.martins@oracle.com,
boris.ostrovsky@oracle.com
Subject: Re: schedulers and topology exposing questions
Date: Wed, 27 Jan 2016 15:53:38 +0000
Message-ID: <56A8E802.9090702@citrix.com>
In-Reply-To: <20160127152701.GF552@char.us.oracle.com>

On 27/01/16 15:27, Konrad Rzeszutek Wilk wrote:
> On Wed, Jan 27, 2016 at 03:10:01PM +0000, George Dunlap wrote:
>> On 27/01/16 14:33, Konrad Rzeszutek Wilk wrote:
>>> On Xen - the schedule() would go HLT.. and then later be woken up by the
>>> VIRQ_TIMER. And since the two applications were on separate CPUs - the
>>> single packet would just stick in the queue until the VIRQ_TIMER arrived.
>>
>> I'm not sure I understand the situation right, but it sounds a bit like
>> what you're seeing is just a quirk of the fact that Linux doesn't always
>> send IPIs to wake other processes up (either by design or by accident),
>
> It does and it does not :-)
>
>> but relies on scheduling timers to check for work to do. Presumably
>
> It .. I am not explaining it well. The Linux kernel scheduler, when
> called for 'schedule' (from the UDP sendmsg), would either pick the next
> application and do a context switch or, if there were none, go to sleep.
> [Kind of - it may also send an IPI to the other CPU if requested, but that
> requires some hints from the underlying layers.]
> Since there were only two apps on the runqueue (the UDP sender and the UDP
> receiver), it would run them back-to-back (this is on baremetal).

I think I understand at a high level from your description what's
happening (no IPIs -> the receiver happens to run if it's on the same
cpu, and waits until the next timer tick if it's on a different cpu);
but what I don't quite get is *why* Linux doesn't send an IPI.

It's been quite a while since I looked at the Linux scheduling code, so
I'm trying to understand it largely by analogy with the Xen code. In Xen
a vcpu can be "runnable" (has something to do) or "blocked" (waiting for
something to do). Whenever a vcpu goes from "blocked" to "runnable", the
scheduler will call vcpu_wake(), which sends an IPI to the appropriate
pcpu to get it to run the vcpu.
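
As a rough illustration of the Xen behaviour described above, here is a
toy Python model. It is not Xen code: the class and field names are made
up for the sketch, and only vcpu_wake() is a name taken from the text.
The "IPI" is just an entry appended to a list.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    BLOCKED = "blocked"    # waiting for something to do
    RUNNABLE = "runnable"  # has something to do


@dataclass
class Vcpu:
    vcpu_id: int
    pcpu: int                     # hypothetical: pcpu this vcpu would run on
    state: State = State.BLOCKED


@dataclass
class ToyScheduler:
    # Hypothetical stand-in for real IPIs: the pcpus that were poked.
    ipis_sent: list = field(default_factory=list)

    def vcpu_wake(self, v: Vcpu) -> None:
        """Move a blocked vcpu to runnable and poke its pcpu."""
        if v.state is State.RUNNABLE:
            return  # already runnable, so no second IPI is needed
        v.state = State.RUNNABLE
        self.ipis_sent.append(v.pcpu)


sched = ToyScheduler()
v = Vcpu(vcpu_id=0, pcpu=3)
sched.vcpu_wake(v)  # blocked -> runnable, one "IPI" to pcpu 3
```

The point of the sketch is only that the wake-up is proactive: the
state change and the poke to the target pcpu happen together.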

What you're describing is a situation where a process is blocked (either
in 'listen' or 'read'), and another process does something which should
cause it to become 'runnable' (sends it a UDP message). If anyone
happens to run the scheduler on its cpu, it will run; but no proactive
action is taken to wake it up (i.e., no IPI is sent).
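
To put rough numbers on that scenario, here is a toy latency model. It is
a sketch under stated assumptions, not Linux scheduler code: the function
name, the microsecond units and the 4000 us tick period are all
hypothetical, and a same-cpu switch or an IPI is treated as free.

```python
def wakeup_delay_us(send_time_us: int, same_cpu: bool, ipi_sent: bool,
                    tick_period_us: int = 4000) -> int:
    """Delay before the blocked receiver gets to run (toy model).

    Assumptions (not taken from Linux): timer ticks fire at exact
    multiples of tick_period_us, and a same-cpu context switch or an
    IPI costs no time at all.
    """
    if same_cpu or ipi_sent:
        # Either the sender's schedule() picks the receiver next, or
        # the IPI kicks the other cpu straight away.
        return 0
    # Otherwise the packet sits in the queue until the next timer tick
    # on the receiver's cpu.
    return tick_period_us - (send_time_us % tick_period_us)


same_cpu_delay = wakeup_delay_us(1000, same_cpu=True, ipi_sent=False)
cross_cpu_delay = wakeup_delay_us(1000, same_cpu=False, ipi_sent=False)
cross_cpu_ipi_delay = wakeup_delay_us(1000, same_cpu=False, ipi_sent=True)
```

In this model only the cross-cpu, no-IPI case pays anything, and what it
pays is the remainder of the tick period.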

The idea of not sending an IPI when a process goes from "waiting for
something to do" to "has something to do" seems strange to me; and if it
wasn't a mistake, then my only guess as to why they would choose to do
that is to reduce IPI traffic on large systems.

But whether it's a mistake or on purpose, it's a Linux thing, so...

>> they knew that low performance on ping-pong workloads might be a
>> possibility when they wrote the code that way; I don't see a reason why
>> we should try to work around that in Xen.
>
> Which is not what I am suggesting.

I'm glad we agree on this. :-)

> Our first idea was that since this is a Linux kernel scheduler characteristic
> - let us give the guest all the information it needs to do this. That is,
> make it look as baremetal as possible - and that is where the vCPU
> pinning and the exposing of SMT information came about. That (Elena,
> pls correct me if I am wrong) did indeed show that the guest was doing
> what we expected.
>
> But naturally that requires pinning and all that - and while it is a useful
> case for those that have the vCPUs to spare and can do it - that is not
> a general use-case.
>
> So Elena started looking at the CPU-bound case to see how Xen behaves then,
> and whether we can improve the floating situation, as she saw some abnormal
> behaviour.

OK -- if the focus was on the two cases where the Xen credit1 scheduler
(apparently) co-located two cpu-burning vcpus on sibling threads, then
yeah, that's behavior we should probably try to get to the bottom of.

 -George