From: Julien Grall <julien.grall@arm.com>
To: Dario Faggioli <dario.faggioli@citrix.com>,
	Stefano Stabellini <sstabellini@kernel.org>
Cc: edgar.iglesias@xilinx.com, george.dunlap@eu.citrix.com,
	nd@arm.com, Punit Agrawal <punit.agrawal@arm.com>,
	xen-devel@lists.xenproject.org
Subject: Re: [PATCH] xen/arm: introduce vwfi parameter
Date: Tue, 21 Feb 2017 12:30:11 +0000
Message-ID: <14575011-0042-8940-c19f-2482136ff91c@foss.arm.com>
In-Reply-To: <1487668158.6732.310.camel@citrix.com>

Hi Dario,

On 21/02/2017 09:09, Dario Faggioli wrote:
> On Tue, 2017-02-21 at 07:59 +0000, Julien Grall wrote:
>> On 20/02/2017 22:53, Dario Faggioli wrote:
>>> For instance, as you say, executing a WFI from a guest directly on
>>> hardware only makes sense if we have 1:1 static pinning. Which means
>>> it can't just be done by default, or with a boot parameter, because
>>> we need to check and enforce that there's only 1:1 pinning around.
>>
>> I agree it cannot be done by default. Similarly, the poll mode cannot
>> be done by default platform-wide nor per domain, because you need to
>> know that all vCPUs will be in polling mode.
>>
> No, that's the big difference. Polling (which, as far as this patch
> goes, is yielding, in this case) is generic in the sense that, no
> matter the pinned or non-pinned state, things work. Power is wasted,
> but nothing breaks.
>
> Not trapping WF* is not generic in the sense that, if you do it in the
> pinned case, it (probably) works. If you lift the pinning, but leave the
> direct WF* execution in place, everything breaks.
>
> This is all I'm saying: that if you say not trapping is an alternative
> to this patch, well, it is not. Not trapping _plus_ measures for
> preventing things from breaking is an alternative.
>
> Am I nitpicking? Perhaps... In which case, sorry. :-P

I am sorry, but I still don't understand why you say things will break if
you don't trap WFI/WFE. Could you give more details?

>
>> But as I said, if vCPUs are not pinned this patch has very little
>> advantage, because you may context switch between them when yielding.
>>
> Smaller advantage, sure. How much smaller, hard to tell. That is the
> reason why I see some potential value in this patch, especially if
> converted to doing its thing per-domain, as George suggested. One can
> try (and, when that happens, we'll show a big WARNING about wasting
> power and heating up the CPUs!), and decide whether the result is good
> or not for the specific use case.

I even think there will be no advantage at all in the multiple-vCPU case,
because I would not be surprised if the overhead of blocking a vCPU comes
from switching back and forth to the idle vCPU, which requires
saving/restoring the context of the same vCPU.

Anyway, having numbers here would help to confirm it.

My concern with a per-domain (or even system-wide) solution is that you may
have an idle vCPU where you don't expect any interrupt to come. In that
case the vCPU will waste power, and an unmodified (i.e. non-Xen-aware) app
has no way to suspend the vCPU on Xen today.

>
>>> Is it possible to decide whether to trap and emulate WFI, or just
>>> execute it, online, and change such decision dynamically? And even if
>>> yes, how would the whole thing work? When the direct execution is
>>> enabled for a domain we automatically enforce 1:1 pinning for that
>>> domain, and kick all the other domains out of its pCPUs? What if they
>>> have their own pinning, what if they also have 'direct WFI' behavior
>>> enabled?
>>
>> It can be changed online; the WFI/WFE trapping is per pCPU (see
>> HCR_EL2.{TWE,TWI}).
>>
> Ok, thanks for the info. Not bad. With added logic (perhaps in the nop
> scheduler), this looks like it could be useful.
>
>>> These are just examples, my point being that in theory, if we
>>> consider a very specific usecase or set of usecases, there's a lot we
>>> can do. But when you say "why don't you let the guest directly
>>> execute WFI", in response to a patch and a discussion like this,
>>> people may think that you are actually proposing doing it as a
>>> solution, which is not possible without figuring out all the open
>>> questions above (actually, probably, more) and without introducing a
>>> lot of cross-subsystem policing inside Xen, which is often something
>>> we don't want.
>>
>> I made this response because the patch sent by Stefano has a very
>> specific use case that can be solved the same way. Everyone here is
>> suggesting polling, but it has its own disadvantage: power consumption.
>>
>> Anyway, I still think in both cases we are solving a specific problem
>> without looking at what matters, i.e. why the scheduler takes so much
>> time to block/unblock.
>>
> Well, TBH, we still are not entirely sure who the culprit is for high
> latency. There are spikes in Credit2, and I'm investigating that. But
> apart from them? I think we need other numbers with which we can
> compare the numbers that Stefano has collected.

I think the problem is that we save/restore the vCPU state when
switching to the idle vCPU.

Let's say only one vCPU can run on the pCPU. When that vCPU issues a
WFI, the following steps happen:
      * WFI trapped and vCPU blocked
      * save vCPU state
      * run idle_loop
-> Interrupt incoming for the guest
      * restore vCPU state
      * back to the guest

Saving/restoring on ARM requires context switching all the state of the
VM (it is not saved in memory when entering the hypervisor). This
includes things like system registers, interrupt controller state, the
FPU...

Context switching the interrupt controller and the FPU can take some
time, as there are a lot of registers and some are only accessible
through the memory interface (see GICv2 for instance).

So a context switch will likely hurt the performance of blocking a vCPU
when only one vCPU runs per pCPU.
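
To make the sequence concrete, here is a rough C-like sketch of that path
(the helper names are invented for illustration; they are not the actual
Xen functions):

    /*
     * The guest executes WFI; because the trap bits (HCR_EL2.TWI/TWE) are
     * set, the instruction traps into the hypervisor instead of idling
     * the CPU directly.
     */
    void handle_wfi_trap(struct vcpu *v)
    {
        block_vcpu(v);          /* mark the vCPU as blocked           */
        save_vcpu_state(v);     /* sysregs, GIC state, FPU, ...       */
        run_idle_loop();        /* the pCPU now runs the idle vCPU    */
    }

    /* Later, an interrupt targeted at the guest arrives. */
    void handle_guest_interrupt(struct vcpu *v)
    {
        unblock_vcpu(v);
        restore_vcpu_state(v);  /* pay the GIC/FPU restore cost again */
        enter_guest(v);         /* back to the guest                  */
    }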

>
> I'll send code for the nop scheduler, and we will compare with what
> we'll get with it. Another interesting data point would be knowing how
> the numbers look on baremetal, on the same platform and under
> comparable conditions.
>
> And I guess there are other components and layers, in the Xen
> architecture, that may be causing increased latency, which we may have
> not identified yet.
>
> Anyway, nop scheduler is probably first thing we want to check. I'll
> send the patches soon.
>
>>>> So, yes, in the end the guest will waste its slot.
>>>>
>>> Did I say it already that this concept of "slots" does not apply
>>> here?
>>> :-D
>>
>> Sorry, I forgot about this :/. I guess you use the term credit? If so,
>> the guest will use its credit for nothing.
>>
> If the guest is alone, or in general the system is undersubscribed, it
> would, by continuously yielding in a busy loop, but that doesn't
> matter, because there are enough pCPUs to run even vCPUs that are out
> of credits.
>
> If the guest is not alone, and the system is oversubscribed, it would
> use a very tiny amount of its credits, every now and then, i.e., the
> ones that are necessary to execute a WFI, and, for Xen, to issue a call
> to sched_yield(). But after that, we will run someone else. This to say
> that the problem of this patch might be that, in the oversubscribed
> case, it relies too much on the behavior of yield, but not that it does
> nothing.
>
> But maybe I'm nitpicking again. Sorry. I don't get to talk about these
> inner (and very interesting, to me at least) scheduling details too
> often, and when it happens, I tend to get excited and exaggerate! :-P

Let's take a step back. The ARM ARM describes WFI as a "hint instruction
that permits the processor to enter a low-power state until one of a
number of asynchronous events occurs". Entering a low-power state means
there will be an impact (maybe small) on interrupt latency, because the
CPU has to leave the low-power state again.

A baremetal application that uses WFI is aware of the impact and wishes
to save power. If that application really cares about interrupt latency,
it will use polling and not WFI. It depends on how much interrupt
latency you can tolerate.
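
To make the trade-off concrete, here is a minimal baremetal-style sketch
(irq_pending is a hypothetical flag set by the application's interrupt
handler):

    #include <stdbool.h>

    volatile bool irq_pending;  /* hypothetical flag, set by the IRQ handler */

    /* Power-friendly: sleep until an asynchronous event wakes the CPU.
     * Leaving the low-power state adds a bit of interrupt latency. */
    void wait_for_irq_wfi(void)
    {
        while (!irq_pending)
            __asm__ volatile("wfi" ::: "memory");
    }

    /* Latency-friendly: spin, burning power, but react immediately. */
    void wait_for_irq_poll(void)
    {
        while (!irq_pending)
            ;
    }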

Now, the same baremetal application running as a Xen guest will expect
the same behavior. This is why WFI is implemented with block, but it has
a high impact today (see above for a possible explanation). Moving to
yield may have the same high impact because, as you said, the
implementation will depend on the scheduler, and when multiple vCPUs are
running on the same pCPU you have to context switch, which has a cost.

A user who wants to move his baremetal app into a guest will have to pay
the price of the virtualization overhead + power if he wants good
interrupt latency results, even when using WFI. I would be surprised if
that looks appealing to many people.

This is why, to me, implementing guest WFI as polling looks like an
attempt to muddy the waters.

If you want good interrupt latency with virtualization, you pin your
vCPU and ensure no other vCPU will run on that pCPU. Then you can play
with the scheduler to optimize things (e.g. avoiding pointless context
switches...).
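
For example, with the xl toolstack (domain names and CPU numbers below
are purely illustrative):

    # Pin the latency-sensitive guest's only vCPU to pCPU 3 ...
    xl vcpu-pin guest-rt 0 3
    # ... and keep dom0's vCPUs on the other pCPUs.
    xl vcpu-pin Domain-0 all 0-2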

Again, polling is not going to solve the real problem: the context
switch itself takes time.

Cheers,

-- 
Julien Grall
