From: Andrew Cooper <andrew.cooper3@citrix.com>
To: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Tim Deegan <tim@xen.org>, Xen-devel <xen-devel@lists.xen.org>,
KeirFraser <keir@xen.org>, Jan Beulich <JBeulich@suse.com>,
roger.pau@citrix.com
Subject: Re: [Patch] x86/HVM: Fix RTC interrupt modelling
Date: Mon, 10 Feb 2014 18:43:24 +0000 [thread overview]
Message-ID: <52F91DCC.1060007@citrix.com> (raw)
In-Reply-To: <52F91071.1080007@eu.citrix.com>
On 10/02/14 17:46, George Dunlap wrote:
> On 02/10/2014 04:34 PM, Jan Beulich wrote:
>>> * The results from XenRT suggest that the new emulation is better
>>> than the
>>> old.
>> "Better" in the sense of the limited set of uses of the virtual hardware
>> by whatever selection of guest OSes is being run there. But very
>> likely not "better" in the sense on matching up with how the respective
>> hardware specification would require it to behave.
>
> The context of the above sentence was in a justification for including
> it in 4.4. Obviously "occasionally gets stuck during boot" is a
> pretty bad bug that we'd like to see fix. But given the tricky nature
> of this whole area, there's a risk that this will cause regressions in
> *other* situations or operating systems. What I understand Andy to be
> saying is that with the patch, the RTC appears to cause less problems
> than without it.
That was indeed the message I was trying to convey.
>
> What your analysis is missing, Andy, is what the effects might be if
> there were a bug. Obviously other guests might hang during boot; but
> what else? Might they hang at some point much later, perhaps when
> being pounded with interrupts due to heavy network traffic? Might the
> clock begin to drift or jump around? Would the XenRT testing catch
> that if it happened? And, would those potential bugs be worse than
> what we have now?
This is much more complicated to answer. Lets try.
The patch only touches RTC and Periodic Timer code. All of the Periodic
Timer code is reversions back to the previous code (mid Xen-4.3 and
before), along with an attempted justification as to why the old code
was more correct than the current code.
Some of the RTC changes are reversions, but there is also new logic.
All new logic is to do with how to update REG_B and REG_C correctly.
None of the other functionality is touched.
REG_B and REG_C are to do with interrupts, and which events
should(B)/have(C) generated interrupts. The worst case is that a guest
gets none/too-few/too-many interrupts when trying to drive the RTC.
None of this should lead to clock skew, as reading the time values
directly will still provide the same information as before, although any
guest which attempts to guess time based on counting periodic interrupts
from the RTC is a) already broken and b) already having massive skew as
a VM due to vcpu scheduling.
XenRT does have tests for clock drift, but don't know for certain
whether they have been run against the new code yet.
I will ensure they get run on v2 of the patch.
~Andrew
>
> There's a reason for trying to go through the whole exercise,
> particularly in bugs like this. I do have a lot of faith in our
> intuition to consider hundreds of individual factors and come up with
> a reasonably good judgement of the probabilities -- but only if it is
> guided properly. We are all very prone to only consider the things we
> happen to be thinking about, and to completely ignore all the things
> we don't happen to be looking at. My own temptation, looking at this
> bug, is to say, "Random hangs during boot, yeah, that's pretty bad; we
> should take it." But then I'm only looking at the positives of the
> patch: I'm not really making a balance of the positives versus the
> negatives.
>
> The goal of going through the "worst-case-scenario" exercise is to
> bring to our minds the potential outcomes we are prone to ignore. Only
> then can we reasonably trust our intuition to make a properly informed
> judgement.
>
> -George
next prev parent reply other threads:[~2014-02-10 18:43 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-02-10 11:17 [Patch] x86/HVM: Fix RTC interrupt modelling Andrew Cooper
2014-02-10 12:19 ` Tim Deegan
2014-02-10 15:17 ` Roger Pau Monné
2014-02-10 15:33 ` Keir Fraser
2014-02-10 16:34 ` Jan Beulich
2014-02-10 17:13 ` Andrew Cooper
2014-02-11 9:28 ` Jan Beulich
2014-02-10 17:21 ` Tim Deegan
2014-02-11 9:15 ` Jan Beulich
2014-02-11 12:11 ` Tim Deegan
2014-02-11 12:50 ` Jan Beulich
2014-02-11 13:15 ` Tim Deegan
2014-02-11 13:31 ` Jan Beulich
2014-02-11 13:57 ` Tim Deegan
2014-02-11 14:35 ` Jan Beulich
2014-02-11 13:59 ` Andrew Cooper
2014-02-11 14:10 ` Tim Deegan
2014-02-11 14:52 ` Andrew Cooper
2014-02-10 17:46 ` George Dunlap
2014-02-10 18:43 ` Andrew Cooper [this message]
2014-02-11 8:47 ` Tim Deegan
2014-02-11 9:02 ` Jan Beulich
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=52F91DCC.1060007@citrix.com \
--to=andrew.cooper3@citrix.com \
--cc=JBeulich@suse.com \
--cc=george.dunlap@eu.citrix.com \
--cc=keir@xen.org \
--cc=roger.pau@citrix.com \
--cc=tim@xen.org \
--cc=xen-devel@lists.xen.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).