From: Andre Przywara <andre.przywara@amd.com>
To: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>,
xen-devel <xen-devel@lists.xensource.com>
Subject: Re: [PATCH] RFC: Linux: disable APERF/MPERF feature in PV kernels
Date: Tue, 29 May 2012 12:54:04 +0200
Message-ID: <4FC4AACC.4000909@amd.com>
In-Reply-To: <4FBE36AA.3070903@amd.com>

On 05/24/2012 03:24 PM, Andre Przywara wrote:
> On 05/23/2012 03:26 PM, Konrad Rzeszutek Wilk wrote:
>> On Wed, May 23, 2012 at 12:44:07AM +0200, Andre Przywara wrote:
>>> On 05/22/2012 11:00 PM, Konrad Rzeszutek Wilk wrote:
>>>> On Tue, May 22, 2012 at 11:02:01PM +0200, Andre Przywara wrote:
>>>>> On 05/22/2012 07:18 PM, Konrad Rzeszutek Wilk wrote:
>>>>>> On Tue, May 22, 2012 at 06:07:11PM +0200, Andre Przywara wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> while testing some APERF/MPERF semantics I discovered that this
>>>>>>> feature is enabled in Xen Dom0, but is not reliable.
>>>>>>> The Linux kernel's scheduler uses this feature if it sees the CPUID
>>>>>>> bit, leading to costly RDMSR traps (a few 100,000s during a kernel
>>>>>>> compile) and bogus values due to VCPU migration during the
>>>>>>
>>>>>> Can you point me to the Linux scheduler code that does this? Thanks.
>>>>>
>>>>> arch/x86/kernel/cpu/sched.c contains code to read out and compute
>>>>> APERF/MPERF registers. I added a Xen debug-key to dump a usage
>>>>> counter added in traps.c and thus could prove that it is actually
>>>>> the kernel that accesses these registers.
>>>>> As far as I understood this the idea is to learn about boosting and
>>>>> down-clocking (P-states) to get a fairer view on the actual
>>>>> computing time a process consumed.
>>>>
>>>> Looks like it's looking for this:
>>>>
>>>> X86_FEATURE_APERFMPERF
>>>>
>>>> Perhaps masking that should do it? Something along these lines in
>>>> enlighten.c:
>>>>
>>>>     cpuid_leaf1_edx_mask =
>>>>         ~((1 << X86_FEATURE_MCE)  |  /* disable MCE */
>>>>           (1 << X86_FEATURE_MCA)  |  /* disable MCA */
>>>>           (1 << X86_FEATURE_MTRR) |  /* disable MTRR */
>>>>           (1 << X86_FEATURE_ACC));   /* thermal monitoring */
>>>>
>>>> would be more appropriate?
>>>>
>>>> Or is that attribute on a different leaf?
>>>
>>> Right, it is bit 0 on level 6. That's why I couldn't use any of the
>>> predefined masks and I didn't feel like inventing a new one just for
>>> this single bit.
>>> We could as well explicitly use clear_cpu_cap somewhere, but I
>>> didn't find any place in the Xen code already doing this.
>>> Instead, it looks like it belongs where I put it (we already handle
>>> leaf 5 in a special way here).
>>
>> OK, can you resend the patch please, looking similar to what you sent
>> earlier, but do use a #define if possible (you can have the #define
>> in that file) and a comment explaining why this is necessary -
>> and point to the Linux source code that uses this.
>
> Well, I was about to do this and wanted to see if this has any
> performance impact - only to discover that 3.4 does not trigger it
> anymore. After some debugging it turns out the guy reading APERF/MPERF
> was not arch/x86/kernel/cpu/sched.c, but drivers/cpufreq/mperf.c. So
> with disabling cpufreq the only real user is gone already.
> So the patch is kind of pointless on 3.4, where cpufreq is already
> disabled. It remains to be investigated why sched.c is not called (I
> added a usage counter, it stays at zero).

The scheduler code accessing the MSRs is disabled by default:

    kernel/sched/features.h: SCHED_FEAT(ARCH_POWER, false)

I enabled this and saw the traps.
So the only kernel user remaining is/was cpufreq (and /dev/cpu/<n>/msr).

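For context, consumers of these MSRs typically look at the ratio of the
APERF and MPERF deltas over an interval. A minimal userspace sketch of
that arithmetic, assuming both samples were taken on the same physical
CPU (the struct and function names are made up for illustration; this is
not kernel code):

```c
#include <stdint.h>

/* Hypothetical snapshot of the two counters; not a kernel structure. */
struct aperfmperf_sample {
    uint64_t aperf;  /* advances at the actual (boosted/throttled) clock */
    uint64_t mperf;  /* advances at a fixed reference clock */
};

/*
 * Average frequency ratio between two samples, scaled by 1024
 * (1024 means "ran at the reference frequency").
 * Returns 0 if MPERF did not advance, so the caller can drop the sample.
 * Assumes the APERF delta is below 2^54 so the shift cannot overflow.
 */
static uint64_t aperfmperf_ratio_1024(const struct aperfmperf_sample *prev,
                                      const struct aperfmperf_sample *cur)
{
    uint64_t d_aperf = cur->aperf - prev->aperf;  /* unsigned, wrap-safe */
    uint64_t d_mperf = cur->mperf - prev->mperf;

    if (d_mperf == 0)
        return 0;
    return (d_aperf << 10) / d_mperf;
}
```

If the VCPU migrates between the two reads, prev and cur come from
different physical CPUs and the ratio is meaningless, which is the
"bogus values" problem from the start of this thread.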
> To avoid future mis-uses of APERF/MPERF by the kernel, I'd like to add
> the patch anyway. I will send it again when I have a clearer picture of
> this.

The patch will arrive in a minute. Although this is kind of pointless
for 3.4, I put in a stable tag. I also find it useful that the
aperfmperf flag is removed from /proc/cpuinfo; this can be handy for
Xen-aware power management tools.

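Roughly, the masking amounts to clearing ECX bit 0 of CPUID leaf 6
before the kernel evaluates it. A standalone sketch of such a filter in
plain C, not the patch itself (the macro and function names are
illustrative assumptions):

```c
#include <stdint.h>

#define CPUID_THERM_POWER_LEAF  0x00000006u  /* CPUID leaf 6 */
#define APERFMPERF_ECX_BIT      (1u << 0)    /* APERF/MPERF available */

/*
 * Hypothetical filter: given the leaf number and the raw ECX value
 * returned by CPUID, hide the APERF/MPERF bit on leaf 6 and pass
 * every other leaf through unchanged.
 */
static uint32_t filter_cpuid_ecx(uint32_t leaf, uint32_t ecx)
{
    if (leaf == CPUID_THERM_POWER_LEAF)
        ecx &= ~APERFMPERF_ECX_BIT;
    return ecx;
}
```

With the bit hidden, feature detection no longer sets the aperfmperf
flag, so it also disappears from /proc/cpuinfo.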
> .. snip..
>> Looks like a patch to cpupower should be cooked up too?
>
> I will contact the author.

Have done this. We are still discussing the best solution with Thomas
Renninger and Jan Beulich, but a simple warning (and maybe a hint to
xenpm) looks best for the time being. I am not sure whether proper
emulation/virtualization is worth the effort.

Regards,
Andre.
--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany