From: Marc Zyngier <maz@kernel.org>
To: Julien Grall <julien@xen.org>,
Stefano Stabellini <stefano.stabellini@xilinx.com>
Cc: Peng Fan <peng.fan@nxp.com>,
Stefano Stabellini <sstabellini@kernel.org>,
George.Dunlap@citrix.com, Wei Xu <xuwei5@hisilicon.com>,
Bertrand.Marquis@arm.com, xen-devel@lists.xenproject.org
Subject: Re: [PATCH v2] xen/arm: implement GICD_I[S/C]ACTIVER reads
Date: Fri, 03 Apr 2020 09:47:05 +0100 [thread overview]
Message-ID: <85acdd9fa8248ddb93f2c5792bf5bd41@kernel.org> (raw)
In-Reply-To: <d457455f-a1ad-1964-ff15-45d794f1822a@xen.org>
On 2020-04-02 19:52, Julien Grall wrote:
> (+Marc)
Thanks for looping me in. Definitely an interesting read, but also a
very
puzzling one.
>
> Hi Stefano,
>
> On 02/04/2020 18:19, Stefano Stabellini wrote:
>> On Wed, 1 Apr 2020, Julien Grall wrote:
>>> On 01/04/2020 01:57, Stefano Stabellini wrote:
>>>> On Mon, 30 Mar 2020, Julien Grall wrote:
>>>>> Hi Stefano,
>>>>>
>>>>> On 30/03/2020 17:35, Stefano Stabellini wrote:
>>>>>> On Sat, 28 Mar 2020, Julien Grall wrote:
>>>>>>> qHi Stefano,
>>>>>>>
>>>>>>> On 27/03/2020 02:34, Stefano Stabellini wrote:
>>>>>>>> This is a simple implementation of GICD_ICACTIVER /
>>>>>>>> GICD_ISACTIVER
>>>>>>>> reads. It doesn't take into account the latest state of
>>>>>>>> interrupts
>>>>>>>> on
>>>>>>>> other vCPUs. Only the current vCPU is up-to-date. A full
>>>>>>>> solution is
>>>>>>>> not possible because it would require synchronization among all
>>>>>>>> vCPUs,
>>>>>>>> which would be very expensive in terms or latency.
>>>>>>>
>>>>>>> Your sentence suggests you have number showing that correctly
>>>>>>> emulating
>>>>>>> the
>>>>>>> registers would be too slow. Mind sharing them?
>>>>>>
>>>>>> No, I don't have any numbers. Would you prefer a different wording
>>>>>> or a
>>>>>> better explanation? I also realized there is a typo in there
>>>>>> (or/of).
>>>>> Let me start with I think correctness is more important than speed.
>>>>> So I would have expected your commit message to contain some fact
>>>>> why
>>>>> synchronization is going to be slow and why this is a problem.
>>>>>
>>>>> To give you a concrete example, the implementation of set/way
>>>>> instructions
>>>>> are
>>>>> really slow (it could take a few seconds depending on the setup).
>>>>> However,
>>>>> this was fine because not implementing them correctly would have a
>>>>> greater
>>>>> impact on the guest (corruption) and they are not used often.
>>>>>
>>>>> I don't think the performance in our case will be in same order
>>>>> magnitude.
>>>>> It
>>>>> is most likely to be in the range of milliseconds (if not less)
>>>>> which I
>>>>> think
>>>>> is acceptable for emulation (particularly for the vGIC) and the
>>>>> current
>>>>> uses.
>>>>
>>>> Writing on the mailing list some of our discussions today.
>>>>
>>>> Correctness is not just in terms of compliance to a specification
>>>> but it
>>>> is also about not breaking guests. Introducing latency in the range
>>>> of
>>>> milliseconds, or hundreds of microseconds, would break any latency
>>>> sensitive workloads. We don't have numbers so we don't know for
>>>> certain
>>>> the effect that your suggestion would have.
>>>
>>> You missed part of the discussion. I don't disagree that latency is
>>> important.
>>> However, if an implementation is only 95% reliable, then it means 5%
>>> of the
>>> time your guest may break (corruption, crash, deadlock...). At which
>>> point the
>>> latency is the last of your concern.
>>
>> Yeah I missed to highlight it, also because I look at it from a
>> slightly
>> different perspective: I think IRQ latency is part of correctness.
No. Low latency is a very desirable thing, but it doesn't matter at all
when
you don't even have functional correctness. To use my favourite car
analogy,
having a bigger engine doesn't help when you're about to hit the wall
and
have no breaks... You just hit the wall faster.
>>
>> If we have a solution that is not 100% faithful to the specification
>> we
>> are going to break guests that rely on a faithful implementation of
>> ISACTIVER.
>>
>> If we have a solution that is 100% faithful to the specification but
>> causes latency spikes it breaks RT guests.
>>
>> But they are different sets of guests, it is not like one is
>> necessarily
>> a subset of the other: there are guests that cannot tolerate any
>> latency
>> spikes but they are OK with an imprecise implementation of ISACTIVER.
s/imprecise/massively incorrect/
>>
>> My preference is a solution that is both spec-faithful and also
>> doesn't
>> cause any latency spikes. If that is not possible then we'll have to
>> compromise or come up with "creative" ideas.
You want your cake and eat it. Always a good thing to want.
As long as you don't pretend you implement the GIC architecture, expose
something else altogether to the guests and have specific drivers in all
the guest operating systems under the sun, by all mean, be creative.
> I do agree that latency is important. However, this needs to be based
> on numbers or a good grasp as to why this would be an issue. Neither
> of these have been provided so far.
>
> As we discussed on Tuesday, the cost for other vCPUs is only going to
> be a trap to the hypervisor and then back again. The cost is likely
> smaller than receiving and forwarding an interrupt.
>
> You actually agreed on this analysis. So can you enlighten me as to
> why receiving an interrupt is a not problem for latency but this is?
>
>>
>>>> It would be interesting to have those numbers, and I'll add to my
>>>> TODO
>>>> list to run the experiments you suggested, but I'll put it on the
>>>> back-burner (from a Xilinx perspective it is low priority as no
>>>> customers are affected.)
>>>
>>> How about we get a correct implementation merge first and then
>>> discuss about
>>> optimization? This would allow the community to check whether there
>>> are
>>> actually noticeable latency in their workload.
>>
>> A correct implementation to me means that it is correct from both the
>> specification point of view as well as from a latency point of view. A
>> patch that potentially introduces latency spikes could cause guest
>> breakage and I don't think it should be considered correct. The
>> tests would have to be done beforehand.
Please find anything that specifies latency in the GIC spec. Oh wait,
there is none. Because that's a quality of implementation thing, and
not a correctness issue.
>
> In all honesty, writing and testing the implementation would have
> likely took you less than trying to push for "creative" ideas or your
> patch.
>
>> In terms of other "creative" ideas, here are some:
>
> "creative" ideas should really be the last resort. Correct me if I am
> wrong, but I don't think we are there yet.
>
>>
>> One idea, as George suggested, would be to document the interface
>> deviation. The intention is still to remove any deviation but at least
>> we would be clear about what we have. Ideally in a single place
>> together
>> with other hypervisors. This is my preference.
>
> It is not without saying that deviation from specification should not
> be taken lightly and has risks.
>
> The main risk is you are never going to be able to reliably run an OS
> on Xen unless you manage to get the deviation accepted by the wider
> community and Arm.
There is just no way I'll ever accept a change to the GIC interrupt
state
machine for Linux. Feel free to try and convince other OS maintainers.
If you want to be creative, be really creative. Implement a fully PV
interrupt
controller that gives you simple enough semantics so that you can
actually be
deterministic. Don't pretend you implement the GIC architecture.
>> Another idea is that we could crash the guest if GICD_ISACTIVER is
>> read
>> from a multi-vcpu guest. It is similar to what we already do today but
>> better because we would do it purposely (not because of a typo) and
>> because it will work for single vcpu guests at least.
>>
>> We could also leave it as is (crash all the time) but it implies that
>> vendors that are seeing issues with Linux today will have to keep
>> maintaining patches in their private trees until a better solution is
>> found. This would also be the case if we crash multi-vcpus guests as
>> previously suggested.
OK, that's just insane. Suggesting that guests should be left crashing
because the are doing *the right thing* is just barking mad. I'm all for
exposing guest bugs when they take shortcuts with the architecture, but
blaming the guest for what is just a bug in Xen?
If I was someone developing a product using Xen/ARM, I'd be very worried
about what you have written above. Because it really reads "we don't
care
about reliability as long as we can show amazing numbers". I really hope
it isn't what you mean.
M.
--
Jazz is not dead. It just smells funny...
next prev parent reply other threads:[~2020-04-03 8:56 UTC|newest]
Thread overview: 41+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-03-27 2:34 [Xen-devel] [PATCH v2] xen/arm: implement GICD_I[S/C]ACTIVER reads Stefano Stabellini
2020-03-28 11:52 ` Julien Grall
2020-03-30 16:35 ` Stefano Stabellini
2020-03-30 21:19 ` Julien Grall
2020-04-01 0:57 ` Stefano Stabellini
2020-04-01 8:30 ` Julien Grall
2020-04-01 9:54 ` Bertrand Marquis
2020-04-01 17:02 ` George Dunlap
2020-04-01 17:56 ` Julien Grall
2020-04-01 17:23 ` Julien Grall
2020-04-02 8:17 ` Bertrand Marquis
2020-04-02 17:19 ` Stefano Stabellini
2020-04-02 18:52 ` Julien Grall
2020-04-03 8:47 ` Marc Zyngier [this message]
2020-04-03 10:43 ` George Dunlap
2020-04-03 10:59 ` Marc Zyngier
2020-04-03 11:15 ` George Dunlap
2020-04-03 11:16 ` Julien Grall
2020-04-03 16:18 ` Stefano Stabellini
2020-04-03 16:23 ` Julien Grall
2020-04-03 17:46 ` Julien Grall
2020-04-03 19:41 ` Stefano Stabellini
2020-04-03 20:27 ` Julien Grall
2020-04-06 17:58 ` George Dunlap
2020-04-06 18:47 ` Julien Grall
2020-04-07 16:16 ` George Dunlap
2020-04-07 16:25 ` Bertrand Marquis
2020-04-07 16:52 ` Julien Grall
2020-04-07 16:50 ` Julien Grall
2020-04-07 16:56 ` Julien Grall
2020-04-07 18:16 ` George Dunlap
2020-04-07 19:58 ` Julien Grall
2020-04-09 1:26 ` Stefano Stabellini
2020-04-09 12:56 ` Julien Grall
2020-04-09 13:00 ` Julien Grall
2020-04-17 15:16 ` Bertrand Marquis
2020-04-17 16:07 ` Julien Grall
2020-04-17 16:31 ` Stefano Stabellini
2020-04-17 16:47 ` Julien Grall
2020-04-08 15:54 ` Julien Grall
2020-03-31 0:05 ` [Xen-devel] " Julien Grall
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=85acdd9fa8248ddb93f2c5792bf5bd41@kernel.org \
--to=maz@kernel.org \
--cc=Bertrand.Marquis@arm.com \
--cc=George.Dunlap@citrix.com \
--cc=julien@xen.org \
--cc=peng.fan@nxp.com \
--cc=sstabellini@kernel.org \
--cc=stefano.stabellini@xilinx.com \
--cc=xen-devel@lists.xenproject.org \
--cc=xuwei5@hisilicon.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.