From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: Andre Przywara <andre.przywara@amd.com>,
Jeremy Fitzhardinge <jeremy@goop.org>,
xen-devel <xen-devel@lists.xensource.com>,
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Subject: Re: [PATCH] RFC: Linux: disable APERF/MPERF feature in PV kernels
Date: Wed, 23 May 2012 12:11:47 +0100
Message-ID: <4FBCC5F3.1080804@citrix.com>
In-Reply-To: <4FBCAF1902000078000855C5@nat28.tlf.novell.com>
On 23/05/12 08:34, Jan Beulich wrote:
>>>> On 22.05.12 at 18:07, Andre Przywara <andre.przywara@amd.com> wrote:
>> while testing some APERF/MPERF semantics I discovered that this feature
>> is enabled in Xen Dom0, but is not reliable.
>> The Linux kernel's scheduler uses this feature if it sees the CPUID bit,
>> leading to costly RDMSR traps (a few 100,000s during a kernel compile)
>> and bogus values due to VCPU migration during the measurement.
>> The attached patch explicitly disables this CPU capability inside the
>> Linux kernel, I couldn't measure any APERF/MPERF reads anymore with the
>> patch applied.
>> I am not sure if the PVOPS code is the right place to fix this, we could
>> as well do it in the HV's xen/arch/x86/traps.c:pv_cpuid().
>> Also when the Dom0 VCPUs are pinned, we could allow this, but I am not
>> sure if it's worth to do so.
>>
>> Awaiting your comments.
> First of all I'm of the opinion that this indeed should not be
> masked in the hypervisor - there's no reason to disallow the
> guest to read these registers (but we should of course deny
> writes as long as Xen is controlling P-states, which we do).
I am sorry, but I am going to have to disagree with you on this point.
We should not be advertising this feature to any guest at all if we
can't provide an implementation which works as native expects. Otherwise
we are failing in our job of virtualisation.
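To illustrate what "works as native expects" means here, the following is a minimal sketch of how a consumer typically turns two APERF/MPERF samples into a frequency ratio. It is not the Linux scheduler's code: the struct and function names are hypothetical, and the samples are passed in as plain values rather than read with RDMSR. It assumes both samples come from the same physical CPU, which is exactly what breaks when a VCPU migrates between the two reads.

```c
#include <stdint.h>

/* One snapshot of the two counters (IA32_APERF is MSR 0xE8,
 * IA32_MPERF is MSR 0xE7). Hypothetical type, for illustration only. */
struct aperfmperf_sample {
    uint64_t aperf;
    uint64_t mperf;
};

/*
 * Ratio of delivered to reference cycles between two samples,
 * scaled by 1024. Returns 0 if MPERF did not advance.
 *
 * The deltas are only meaningful if both samples were taken on the
 * same physical CPU. If the VCPU was migrated in between, the
 * unsigned subtraction can wrap and the result is garbage.
 */
static uint64_t aperfmperf_ratio_x1024(struct aperfmperf_sample start,
                                       struct aperfmperf_sample end)
{
    uint64_t delta_aperf = end.aperf - start.aperf;
    uint64_t delta_mperf = end.mperf - start.mperf;

    if (delta_mperf == 0)
        return 0;

    return (delta_aperf * 1024) / delta_mperf;
}
```

With samples taken on one CPU the ratio stays in a sensible range; feeding it an end sample from a different CPU whose APERF happens to be lower produces a wildly large value, which is the kind of bogus reading described earlier in the thread.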
There is 'dom0_vcpus_pin'[1], which identity-pins dom0's vcpus, prevents
updates to their affinity masks, and appears to conditionally allow
access to certain MSRs. I think it would be fine to expose this
feature iff dom0's vcpus are pinned in this fashion. That way, the
measurement should succeed, even if dom0 only has read access to the MSRs.
The same logic applies to domU guests, although there is currently no
way I can see to get domU domains into a state where advertising it
would be safe.
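The policy proposed above could be sketched roughly as follows. This is an illustration only, not the code in xen/arch/x86/traps.c: the function name and the `vcpus_identity_pinned` parameter are hypothetical. The one architectural fact it relies on is that APERF/MPERF availability is reported in CPUID leaf 6, ECX bit 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.06H:ECX[0] advertises the APERF/MPERF MSRs. */
#define CPUID6_ECX_APERFMPERF (1u << 0)

/*
 * Hypothetical filter for the ECX value of CPUID leaf 6 as seen by a
 * guest: hide the APERF/MPERF bit unless the guest's vcpus are
 * identity-pinned, so that a measurement cannot straddle two
 * physical CPUs. All other bits pass through unchanged.
 */
static uint32_t guest_cpuid6_ecx(uint32_t host_ecx, bool vcpus_identity_pinned)
{
    if (!vcpus_identity_pinned)
        return host_ecx & ~CPUID6_ECX_APERFMPERF;

    return host_ecx;
}
```

Under this sketch a domU would always take the masked path today, since there is no state in which its vcpus are known to be pinned in this way.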
~Andrew
[1] I am, however, about to submit a patch (for inclusion after the
feature freeze) which adds more semantics to this command line option.
There is currently no way to ask Xen to pin dom0's vcpus from domain
creation time but allow their affinity masks to be updated later. The
XenServer performance team have been experimenting and have found
performance benefits from being able to do this.
>
> Next I'd like to note that in our kernels we simply don't build
> arch/x86/kernel/cpu/sched.o. Together with CPU_FREQ being
> suppressed, there's no consumer of the feature flag in our
> kernels.
>
> So I would think that your suggested change is appropriate,
> but I'm adding Konrad to Cc as these days he's the one to pick
> this up.
>
> Jan
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
--
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com