xen-devel.lists.xenproject.org archive mirror
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: Andre Przywara <andre.przywara@amd.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>,
	xen-devel <xen-devel@lists.xensource.com>
Subject: Re: [PATCH] RFC: Linux: disable APERF/MPERF feature in PV kernels
Date: Wed, 23 May 2012 09:26:14 -0400
Message-ID: <20120523132614.GA15660@phenom.dumpdata.com>
In-Reply-To: <4FBC16B7.9080307@amd.com>

On Wed, May 23, 2012 at 12:44:07AM +0200, Andre Przywara wrote:
> On 05/22/2012 11:00 PM, Konrad Rzeszutek Wilk wrote:
> >On Tue, May 22, 2012 at 11:02:01PM +0200, Andre Przywara wrote:
> >>On 05/22/2012 07:18 PM, Konrad Rzeszutek Wilk wrote:
> >>>On Tue, May 22, 2012 at 06:07:11PM +0200, Andre Przywara wrote:
> >>>>Hi,
> >>>>
> >>>>while testing some APERF/MPERF semantics I discovered that this
> >>>>feature is enabled in Xen Dom0, but is not reliable.
> >>>>The Linux kernel's scheduler uses this feature if it sees the CPUID
> >>>>bit, leading to costly RDMSR traps (a few 100,000s during a kernel
> >>>>compile) and bogus values due to VCPU migration during the
> >>>
> >>>Can you point me to the Linux scheduler code that does this? Thanks.
> >>
> >>arch/x86/kernel/cpu/sched.c contains code to read out and compute
> >>APERF/MPERF registers. I added a Xen debug-key to dump a usage
> >>counter added in traps.c and thus could prove that it is actually
> >>the kernel that accesses these registers.
> >>As far as I understood this the idea is to learn about boosting and
> >>down-clocking (P-states) to get a fairer view on the actual
> >>computing time a process consumed.
> >
> >Looks like it's looking for this:
> >
> >X86_FEATURE_APERFMPERF
> >
> >Perhaps masking that should do it? Something along this in enlighten.c:
> >
> >          cpuid_leaf1_edx_mask =
> >                  ~((1 << X86_FEATURE_MCE)  |  /* disable MCE */
> >                    (1 << X86_FEATURE_MCA)  |  /* disable MCA */
> >                    (1 << X86_FEATURE_MTRR) |  /* disable MTRR */
> >                    (1 << X86_FEATURE_ACC));   /* thermal monitoring */
> >
> >would be more appropriate?
> >
> >Or is that attribute on a different leaf?
> 
> Right, it is bit 0 on level 6. That's why I couldn't use any of the
> predefined masks and I didn't feel like inventing a new one just for
> this single bit.
> We could as well explicitly use clear_cpu_cap somewhere, but I
> didn't find any code place in the Xen tree already doing this,
> instead it looks like it belongs to where I put it (we handle leaf 5
> in a special way already here)

OK, can you please resend the patch? Keep it similar to what you sent
earlier, but use a #define if possible (the #define can live in that
file) and add a comment explaining why this is necessary, pointing to
the Linux source code that uses this feature.
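
As a minimal sketch of the requested shape (a named constant instead of
a bare number, plus a comment giving the reason), something along these
lines would do. The macro and function names are hypothetical and not
taken from the patch, and the hook into the existing CPUID filtering
code is not shown:

```c
#include <stdint.h>

/* Hypothetical names, chosen for this sketch only. */
#define CPUID_THERM_POWER_LEAF  0x6u       /* thermal and power management leaf */
#define CPUID6_ECX_APERFMPERF   (1u << 0)  /* ECX bit 0: APERF/MPERF present */

/*
 * Return the ECX value a PV kernel should see for the given CPUID leaf.
 *
 * APERF/MPERF reads trap in a PV guest and the values are unreliable
 * across VCPU migration, so the feature bit is hidden to keep the
 * scheduler from using it. All other leaves pass through unchanged.
 */
static uint32_t pv_filter_ecx(uint32_t leaf, uint32_t ecx)
{
	if (leaf == CPUID_THERM_POWER_LEAF)
		ecx &= ~CPUID6_ECX_APERFMPERF;
	return ecx;
}
```

Only bit 0 is cleared, so any other leaf 6 ECX bits the hardware
reports are left as they are.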

Thanks!
.. snip..
> >>>>P.S. Of course this doesn't fix pure userland software like
> >>>>cpupower, but I would consider this in the user's responsibility to
> >>>
> >>>Which would not work anymore as the cpufreq support is disabled
> >>>when it boots under Xen.
> >>
> >>Do you mean with "anymore" in a future kernel? I tested this on
> >>3.4.0 and cpupower monitor worked fine. Right, cpufreq is not
> >>enabled, but cpupower uses the /dev/cpu/<n>/msr device file to
> >>directly read the MSRs. So I get this output if run on an idle Dom0:
> >
> >Ahh. Neat. Will have to play with that.
> 
> Bad news is we cannot forbid cpupower querying the feature directly
> using the CPUID instruction in PV guests. Only we could patch it to
> use /proc/cpuinfo readout instead, as this reflects the kernel view
> of available features. With my patch aperfmperf is no longer there.

Looks like a patch to cpupower should be cooked up too?
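
One possible shape for the userland side, as a sketch only: test for
the flag in a "flags" line read from /proc/cpuinfo rather than
executing CPUID directly. The helper name is hypothetical and this is
not cpupower code; reading the file and picking out the "flags" line
are left to the caller:

```c
#include <stdbool.h>
#include <string.h>

/*
 * Return true if `flag` appears as a whole whitespace-delimited token
 * in `line` (for example a "flags" line from /proc/cpuinfo).
 * Hypothetical helper for illustration.
 */
static bool cpuinfo_has_flag(const char *line, const char *flag)
{
	size_t len = strlen(flag);
	const char *p = line;

	if (len == 0)
		return false;

	while ((p = strstr(p, flag)) != NULL) {
		/* The match must be bounded by whitespace or the string ends. */
		bool starts = (p == line) || p[-1] == ' ' || p[-1] == '\t';
		bool ends = p[len] == '\0' || p[len] == ' ' ||
			    p[len] == '\t' || p[len] == '\n';

		if (starts && ends)
			return true;
		p++;
	}
	return false;
}
```

Matching whole tokens matters here, since a plain substring search
would also accept a longer flag name that merely contains "aperfmperf".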

Thread overview: 21+ messages
2012-05-22 16:07 [PATCH] RFC: Linux: disable APERF/MPERF feature in PV kernels Andre Przywara
2012-05-22 16:52 ` Jeremy Fitzhardinge
2012-05-22 17:08   ` Malcolm Crossley
2012-05-23  8:10     ` Jan Beulich
2012-05-22 20:46   ` Andre Przywara
2012-05-22 17:18 ` Konrad Rzeszutek Wilk
2012-05-22 21:02   ` Andre Przywara
2012-05-22 21:00     ` Konrad Rzeszutek Wilk
2012-05-22 22:44       ` Andre Przywara
2012-05-23 13:26         ` Konrad Rzeszutek Wilk [this message]
2012-05-24 13:24           ` Andre Przywara
2012-05-29 10:54             ` Andre Przywara
2012-05-23  7:34 ` Jan Beulich
2012-05-23  9:14   ` Andre Przywara
2012-05-23  9:43     ` Jan Beulich
2012-05-23  9:52       ` Andre Przywara
2012-05-23 10:01         ` Jan Beulich
2012-05-23 11:11   ` Andrew Cooper
2012-05-23 12:18     ` Jan Beulich
2012-05-23 13:21       ` Andrew Cooper
2012-05-23 13:31         ` Andre Przywara
