Re: VPMU interrupt unreliability


From: Kyle Huey <me@kylehuey.com>
To: Meng Xu <xumengpanda@gmail.com>
Cc: "Tian, Kevin" <kevin.tian@intel.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>,
	"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>,
	Jun Nakajima <jun.nakajima@intel.com>,
	Boris Ostrovsky <boris.ostrovsky@oracle.com>,
	Robert O'Callahan <robert@ocallahan.org>
Subject: Re: VPMU interrupt unreliability
Date: Thu, 19 Oct 2017 11:24:59 -0700
Message-ID: <CAP045ArCGz9Ct1qmtDkaVmJHVk3p_Bptrjtd7tjtcDBUqpO4vg@mail.gmail.com>
In-Reply-To: <CAENZ-+m2dHbfwc5ZOfBTJcz5=QgrRG+1XBCNs3foSzU93UdBuA@mail.gmail.com>

On Thu, Oct 19, 2017 at 11:20 AM, Meng Xu <xumengpanda@gmail.com> wrote:
> On Thu, Oct 19, 2017 at 11:40 AM, Andrew Cooper
> <andrew.cooper3@citrix.com> wrote:
>>
>> On 19/10/17 16:09, Kyle Huey wrote:
>> > On Wed, Oct 11, 2017 at 7:09 AM, Boris Ostrovsky
>> > <boris.ostrovsky@oracle.com> wrote:
>> >> On 10/10/2017 12:54 PM, Kyle Huey wrote:
>> >>> On Mon, Jul 24, 2017 at 9:54 AM, Kyle Huey <me@kylehuey.com> wrote:
>> >>>> On Mon, Jul 24, 2017 at 8:07 AM, Boris Ostrovsky
>> >>>> <boris.ostrovsky@oracle.com> wrote:
>> >>>>>>> One thing I noticed is that the workaround doesn't appear to be
>> >>>>>>> complete: it is only checking PMC0 status and not other counters (fixed
>> >>>>>>> or architectural). Of course, without knowing what the actual problem
>> >>>>>>> was it's hard to say whether this was intentional.
>> >>>>>> handle_pmc_quirk appears to loop through all the counters ...
>> >>>>> Right, I didn't notice that it is shifting MSR_CORE_PERF_GLOBAL_STATUS
>> >>>>> value one by one and so it is looking at all bits.
>> >>>>>
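
To make the bit-by-bit scan described above concrete, here is a minimal sketch. It is not the actual handle_pmc_quirk() code; overflowed_bits() and its parameters are hypothetical names chosen for illustration. It assumes only that the status value is a 64-bit mask with one overflow bit per counter (as I understand the architectural layout, general-purpose counters occupy the low bits and fixed counters start at bit 32).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, NOT Xen's handle_pmc_quirk(): scan a 64-bit
 * overflow-status value by testing the low bit and shifting right,
 * recording the position of every set bit in out[].
 * Returns the number of positions stored (at most max).
 */
size_t overflowed_bits(uint64_t status, unsigned int *out, size_t max)
{
    size_t n = 0;

    assert(out != NULL || max == 0);

    /* The loop ends once no set bits remain, so at most 64 iterations. */
    for (unsigned int bit = 0; status != 0; bit++, status >>= 1) {
        if ((status & 1) && n < max)
            out[n++] = bit;
    }
    return n;
}
```

Because the loop tests the lowest bit and then shifts, every bit of the mask is visited, not just bit 0. For a status value with bits 0, 3 and 33 set, the helper stores 0, 3 and 33 in that order.
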
>> >>>>>>>> 2. Intercepting MSR loads for counters that have the workaround
>> >>>>>>>> applied and giving the guest the correct counter value.
>> >>>>>>> We'd have to keep track of whether the counter has been reset (by the
>> >>>>>>> quirk) since the last MSR write.
>> >>>>>> Yes.
>> >>>>>>
>> >>>>>>>> 3. Or perhaps even changing the workaround to disable the PMI on that
>> >>>>>>>> counter until the guest acks via GLOBAL_OVF_CTRL, assuming that works
>> >>>>>>>> on the relevant hardware.
>> >>>>>>> MSR_CORE_PERF_GLOBAL_OVF_CTRL is written immediately after the quirk
>> >>>>>>> runs (in core2_vpmu_do_interrupt()) so we already do this, don't we?
>> >>>>>> I'm suggesting waiting until the *guest* writes to the (virtualized)
>> >>>>>> GLOBAL_OVF_CTRL.
>> >>>>> Wouldn't it be better to wait until the counter is reloaded?
>> >>>> Maybe!  I haven't thought through it a lot.  It's still not clear to
>> >>>> me whether MSR_CORE_PERF_GLOBAL_OVF_CTRL actually controls the
>> >>>> interrupt in any way or whether it just resets the bits in
>> >>>> MSR_CORE_PERF_GLOBAL_STATUS and acking the interrupt on the APIC is
>> >>>> all that's required to reenable it.
>> >>>>
>> >>>> - Kyle
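
The bookkeeping discussed above for option 2 (tracking whether the quirk has reset a counter since the last guest MSR write, and answering guest reads accordingly) could look roughly like the sketch below. Everything here is hypothetical: struct shadow_ctr and the shadow_* functions do not exist in Xen, hw_value merely stands in for the hardware counter MSR, and treating the pre-reset value as "the correct counter value" is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-counter state; none of these names exist in Xen. */
struct shadow_ctr {
    uint64_t hw_value;    /* stand-in for the hardware counter MSR */
    uint64_t guest_value; /* counter value just before the quirk fired */
    bool     quirk_reset; /* quirk rewrote the counter since the last guest write */
};

/* Quirk path: remember what the guest should see, then overwrite the counter. */
void shadow_quirk_reset(struct shadow_ctr *c, uint64_t new_hw_value)
{
    assert(c != NULL);
    c->guest_value = c->hw_value;
    c->quirk_reset = true;
    c->hw_value = new_hw_value;
}

/* Guest MSR write: the guest's value is authoritative again. */
void shadow_guest_write(struct shadow_ctr *c, uint64_t value)
{
    assert(c != NULL);
    c->hw_value = value;
    c->quirk_reset = false;
}

/* Guest MSR read: hide the quirk's rewrite if one is still outstanding. */
uint64_t shadow_guest_read(const struct shadow_ctr *c)
{
    assert(c != NULL);
    return c->quirk_reset ? c->guest_value : c->hw_value;
}
```

This deliberately ignores any events counted between the quirk's reset and the guest's read; whether those should be added back is one of the open questions a real implementation would have to settle.
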
>> >>> I wonder if it would be reasonable to just remove the workaround
>> >>> entirely at some point.  The set of people using 1) several-year-old
>> >>> hardware, 2) an up-to-date Xen, and 3) the off-by-default performance
>> >>> counters is probably rather small.
>> >> We'd probably want to only enable this for affected processors, not
>> >> remove it outright. But the problem is that we still don't know for sure
>> >> whether this issue affects NHM only, do we?
>> >>
>> >> (https://lists.xenproject.org/archives/html/xen-devel/2017-07/msg02242.html
>> >> is the original message)
>> > Yes, the basic problem is that we don't know where to draw the line.
>>
>> vPMU is disabled by default for security reasons,
>
>
> Is there any document about the possible attacks via the vPMU? The
> documents I found (such as [1] and XSA-163) only briefly say that the
> vPMU should be disabled due to security concerns.
>
>
> [1] https://xenbits.xen.org/docs/unstable/misc/xen-command-line.html

Cross-guest information leaks, presumably.

>>
>> and also broken, in a
>> way which demonstrates that vPMU isn't getting much real-world use.
>
> I also noticed that AWS seems to support part of the vPMU
> functionality, which Netflix used to optimize their
> applications' performance, according to
> http://www.brendangregg.com/blog/2017-05-04/the-pmcs-of-ec2.html .
>
> I guess AWS must have solved the security issue? However, without
> knowing how the attack could be conducted, I'm not sure how AWS
> addresses the security concern for vPMU.

AWS only allows you to use the vPMU if you have the entire physical
machine your VM is running on dedicated to yourself.  Cross-guest
information leaks are not a big deal if the same tenant controls all
the guests.

>>
>> As far as I'm concerned, all options (including rm -rf and start from
>> scratch) are acceptable, especially if this ends up giving us a better
>> overall subsystem.
>>
>> Do we know how other hypervisors work around this issue?
>
> Maybe AWS's solution is an option? I'm not sure. I'm just thinking aloud. :)
>
> Thanks,
>
> Meng
>
> --
> Meng Xu
> Ph.D. Candidate in Computer and Information Science
> University of Pennsylvania
> http://www.cis.upenn.edu/~mengxu/

- Kyle


