xen-devel.lists.xenproject.org archive mirror
From: Andre Przywara <andre.przywara@amd.com>
To: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>,
	xen-devel <xen-devel@lists.xensource.com>
Subject: Re: [PATCH] RFC: Linux: disable APERF/MPERF feature in PV kernels
Date: Tue, 29 May 2012 12:54:04 +0200
Message-ID: <4FC4AACC.4000909@amd.com>
In-Reply-To: <4FBE36AA.3070903@amd.com>

On 05/24/2012 03:24 PM, Andre Przywara wrote:
> On 05/23/2012 03:26 PM, Konrad Rzeszutek Wilk wrote:
>> On Wed, May 23, 2012 at 12:44:07AM +0200, Andre Przywara wrote:
>>> On 05/22/2012 11:00 PM, Konrad Rzeszutek Wilk wrote:
>>>> On Tue, May 22, 2012 at 11:02:01PM +0200, Andre Przywara wrote:
>>>>> On 05/22/2012 07:18 PM, Konrad Rzeszutek Wilk wrote:
>>>>>> On Tue, May 22, 2012 at 06:07:11PM +0200, Andre Przywara wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> while testing some APERF/MPERF semantics I discovered that this
>>>>>>> feature is enabled in Xen Dom0, but is not reliable.
>>>>>>> The Linux kernel's scheduler uses this feature if it sees the CPUID
>>>>>>> bit, leading to costly RDMSR traps (a few 100,000s during a kernel
>>>>>>> compile) and bogus values due to VCPU migration during the
>>>>>>
>>>>>> Can you point me to the Linux scheduler code that does this? Thanks.
>>>>>
>>>>> arch/x86/kernel/cpu/sched.c contains code to read out and compute
>>>>> APERF/MPERF registers. I added a Xen debug-key to dump a usage
>>>>> counter added in traps.c and thus could prove that it is actually
>>>>> the kernel that accesses these registers.
>>>>> As far as I understood this the idea is to learn about boosting and
>>>>> down-clocking (P-states) to get a fairer view on the actual
>>>>> computing time a process consumed.
>>>>
>>>> Looks like it's looking for this:
>>>>
>>>> X86_FEATURE_APERFMPERF
>>>>
>>>> Perhaps masking that should do it? Something along these lines in enlighten.c:
>>>>
>>>> cpuid_leaf1_edx_mask =
>>>>     ~((1 << X86_FEATURE_MCE) |  /* disable MCE */
>>>>       (1 << X86_FEATURE_MCA) |  /* disable MCA */
>>>>       (1 << X86_FEATURE_MTRR) | /* disable MTRR */
>>>>       (1 << X86_FEATURE_ACC));  /* thermal monitoring */
>>>>
>>>> would be more appropriate?
>>>>
>>>> Or is that attribute on a different leaf?
>>>
>>> Right, it is bit 0 of ECX in leaf 6. That's why I couldn't use any of the
>>> predefined masks, and I didn't feel like inventing a new one just for
>>> this single bit.
>>> We could just as well use clear_cpu_cap explicitly somewhere, but I
>>> didn't find any place in the Xen tree already doing this. Instead, it
>>> looks like it belongs where I put it (we already handle leaf 5
>>> specially there).
>>
>> OK, can you resend the patch please, similar to what you sent
>> earlier, but using a #define if possible (you can have the #define
>> in that file) and a comment explaining why this is necessary,
>> pointing to the Linux source code that uses this.
>
> Well, I was about to do this and wanted to see if this has any
> performance impact, only to discover that 3.4 does not trigger it
> anymore. After some debugging it turns out the guy reading APERF/MPERF
> was not arch/x86/kernel/cpu/sched.c, but drivers/cpufreq/mperf.c. So
> with cpufreq disabled, the only real user is already gone.
> So the patch is kind of pointless as it is on 3.4, with cpufreq already
> disabled. It remains to be investigated why sched.c is not called (I added
> a usage counter; it stays at zero).

The scheduler code accessing the MSRs is disabled by default:
kernel/sched/features.h: SCHED_FEAT(ARCH_POWER, false)
I enabled this and saw the traps.
So the only kernel user remaining is/was cpufreq (and /dev/cpu/<n>/msr).

> To avoid future mis-uses of APERF/MPERF by the kernel, I'd like to add
> the patch anyway. I will send it again when I have a clearer picture of
> this.

The patch will arrive in a minute. Although it is kind of pointless 
for 3.4, I added a stable tag. I also find removing the aperfmperf flag 
from /proc/cpuinfo useful; this can be handy for Xen-aware power 
management tools.

> .. snip..
>> Looks like a patch to cpupower should be cooked up too?
>
> I will contact the author.

I have done this. We are still discussing the best solution with Thomas 
Renninger and Jan Beulich, but a simple warning (and maybe a hint to 
xenpm) looks best for the time being. I am not sure whether proper 
emulation/virtualization is worth the effort.

Regards,
Andre.

-- 
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany


Thread overview: 21+ messages
2012-05-22 16:07 [PATCH] RFC: Linux: disable APERF/MPERF feature in PV kernels Andre Przywara
2012-05-22 16:52 ` Jeremy Fitzhardinge
2012-05-22 17:08   ` Malcolm Crossley
2012-05-23  8:10     ` Jan Beulich
2012-05-22 20:46   ` Andre Przywara
2012-05-22 17:18 ` Konrad Rzeszutek Wilk
2012-05-22 21:02   ` Andre Przywara
2012-05-22 21:00     ` Konrad Rzeszutek Wilk
2012-05-22 22:44       ` Andre Przywara
2012-05-23 13:26         ` Konrad Rzeszutek Wilk
2012-05-24 13:24           ` Andre Przywara
2012-05-29 10:54             ` Andre Przywara [this message]
2012-05-23  7:34 ` Jan Beulich
2012-05-23  9:14   ` Andre Przywara
2012-05-23  9:43     ` Jan Beulich
2012-05-23  9:52       ` Andre Przywara
2012-05-23 10:01         ` Jan Beulich
2012-05-23 11:11   ` Andrew Cooper
2012-05-23 12:18     ` Jan Beulich
2012-05-23 13:21       ` Andrew Cooper
2012-05-23 13:31         ` Andre Przywara
