xen-devel.lists.xenproject.org archive mirror
From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Kyle Huey <me@kylehuey.com>,
	Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: "Tian, Kevin" <kevin.tian@intel.com>,
	Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>,
	Robert O'Callahan <robert@ocallahan.org>,
	Jun Nakajima <jun.nakajima@intel.com>,
	xen-devel@lists.xen.org
Subject: Re: VPMU interrupt unreliability
Date: Thu, 19 Oct 2017 16:40:43 +0100
Message-ID: <e5b0b3d5-dc90-b95c-e4a3-44bb5606c93c@citrix.com>
In-Reply-To: <CAP045Aq5k6DE4v6GkR=B1J_vM8RJibhMH1OewKyh46isRt8Acw@mail.gmail.com>

On 19/10/17 16:09, Kyle Huey wrote:
> On Wed, Oct 11, 2017 at 7:09 AM, Boris Ostrovsky
> <boris.ostrovsky@oracle.com> wrote:
>> On 10/10/2017 12:54 PM, Kyle Huey wrote:
>>> On Mon, Jul 24, 2017 at 9:54 AM, Kyle Huey <me@kylehuey.com> wrote:
>>>> On Mon, Jul 24, 2017 at 8:07 AM, Boris Ostrovsky
>>>> <boris.ostrovsky@oracle.com> wrote:
>>>>>>> One thing I noticed is that the workaround doesn't appear to be
>>>>>>> complete: it is only checking PMC0 status and not other counters (fixed
>>>>>>> or architectural). Of course, without knowing what the actual problem
>>>>>>> was it's hard to say whether this was intentional.
>>>>>> handle_pmc_quirk appears to loop through all the counters ...
>>>>> Right, I didn't notice that it is shifting MSR_CORE_PERF_GLOBAL_STATUS
>>>>> value one by one and so it is looking at all bits.
>>>>>
>>>>>>>> 2. Intercepting MSR loads for counters that have the workaround
>>>>>>>> applied and giving the guest the correct counter value.
>>>>>>> We'd have to keep track of whether the counter has been reset (by the
>>>>>>> quirk) since the last MSR write.
>>>>>> Yes.
>>>>>>
>>>>>>>> 3. Or perhaps even changing the workaround to disable the PMI on that
>>>>>>>> counter until the guest acks via GLOBAL_OVF_CTRL, assuming that works
>>>>>>>> on the relevant hardware.
>>>>>>> MSR_CORE_PERF_GLOBAL_OVF_CTRL is written immediately after the quirk
>>>>>>> runs (in core2_vpmu_do_interrupt()) so we already do this, don't we?
>>>>>> I'm suggesting waiting until the *guest* writes to the (virtualized)
>>>>>> GLOBAL_OVF_CTRL.
>>>>> Wouldn't it be better to wait until the counter is reloaded?
>>>> Maybe!  I haven't thought through it a lot.  It's still not clear to
>>>> me whether MSR_CORE_PERF_GLOBAL_OVF_CTRL actually controls the
>>>> interrupt in any way or whether it just resets the bits in
>>>> MSR_CORE_PERF_GLOBAL_STATUS and acking the interrupt on the APIC is
>>>> all that's required to reenable it.
>>>>
>>>> - Kyle
>>> I wonder if it would be reasonable to just remove the workaround
>>> entirely at some point.  The set of people using 1) several year old
>>> hardware, 2) an up to date Xen, and 3) the off-by-default performance
>>> counters is probably rather small.
>> We'd probably want to only enable this for affected processors, not
>> remove it outright. But the problem is that we still don't know for sure
>> whether this issue affects NHM only, do we?
>>
>> (https://lists.xenproject.org/archives/html/xen-devel/2017-07/msg02242.html
>> is the original message)
> Yes, the basic problem is that we don't know where to draw the line.

vPMU is disabled by default for security reasons, and also broken, in a
way which demonstrates that vPMU isn't getting much real-world use.

As far as I'm concerned, all options (including rm -rf and start from
scratch) are acceptable, especially if this ends up giving us a better
overall subsystem.

Do we know how other hypervisors work around this issue?

I'm tempted to suggest just ripping it straight out.  NHM is ancient
these days, and if someone does manage to get a repro, we stand a better
chance of being able to debug it properly.

~Andrew
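
Editorial illustration of the point quoted above, that the quirk handler
shifts the MSR_CORE_PERF_GLOBAL_STATUS value one bit at a time and so
examines every counter rather than only PMC0. This is a minimal sketch,
not Xen's handle_pmc_quirk: the names scan_global_status, scan_bits and
overflow_fn are hypothetical, the counter counts are passed in by the
caller, and no MSR is actually read or written. It assumes the
architectural layout in which general-purpose counter overflow bits start
at bit 0 and fixed counter overflow bits start at bit 32.

```c
#include <stdint.h>

/* Assumed bit layout of the status value (see lead-in). */
#define GP_SHIFT     0   /* general-purpose counter overflow bits */
#define FIXED_SHIFT  32  /* fixed counter overflow bits */

/* Hypothetical callback invoked once per counter whose bit is set. */
typedef void (*overflow_fn)(int is_fixed, unsigned int idx, void *ctx);

/*
 * Walk `count` bits of `val`, lowest bit first, shifting right by one
 * after each test. Returns how many of those bits were set.
 */
static unsigned int scan_bits(uint64_t val, unsigned int count,
                              int is_fixed, overflow_fn fn, void *ctx)
{
    unsigned int i, hits = 0;

    for (i = 0; i < count; i++) {
        if (val & 1) {
            hits++;
            if (fn)
                fn(is_fixed, i, ctx);
        }
        val >>= 1;
    }
    return hits;
}

/*
 * Scan both counter groups of a status value. Bits outside the first
 * n_gp general-purpose and n_fixed fixed positions are ignored.
 */
unsigned int scan_global_status(uint64_t status, unsigned int n_gp,
                                unsigned int n_fixed,
                                overflow_fn fn, void *ctx)
{
    return scan_bits(status >> GP_SHIFT, n_gp, 0, fn, ctx) +
           scan_bits(status >> FIXED_SHIFT, n_fixed, 1, fn, ctx);
}
```

A caller would pass a callback that applies whatever per-counter action
is wanted; passing a null callback just counts the flagged counters.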


