xen-devel.lists.xenproject.org archive mirror
From: Andre Przywara <andre.przywara@amd.com>
To: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>,
	xen-devel <xen-devel@lists.xensource.com>
Subject: Re: [PATCH] RFC: Linux: disable APERF/MPERF feature in PV kernels
Date: Tue, 22 May 2012 23:02:01 +0200
Message-ID: <4FBBFEC9.6040100@amd.com>
In-Reply-To: <20120522171858.GB19601@phenom.dumpdata.com>

On 05/22/2012 07:18 PM, Konrad Rzeszutek Wilk wrote:
> On Tue, May 22, 2012 at 06:07:11PM +0200, Andre Przywara wrote:
>> Hi,
>>
>> while testing some APERF/MPERF semantics I discovered that this
>> feature is enabled in Xen Dom0, but is not reliable.
>> The Linux kernel's scheduler uses this feature if it sees the CPUID
>> bit, leading to costly RDMSR traps (a few 100,000s during a kernel
>> compile) and bogus values due to VCPU migration during the
>
> Can you point me to the Linux scheduler code that does this? Thanks.

arch/x86/kernel/cpu/sched.c contains code to read the APERF/MPERF
registers and compute their ratio. I added a usage counter to Xen's
traps.c plus a debug key to dump it, and that confirmed that it is
really the kernel accessing these registers.
As far as I understand it, the idea is to detect boosting and
down-clocking (P-states) in order to get a fairer view of the actual
computing time a process has consumed.

>> measurement.
>> The attached patch explicitly disables this CPU capability inside
>> the Linux kernel; with the patch applied, I could no longer measure
>> any APERF/MPERF reads.
>> I am not sure if the PVOPS code is the right place to fix this, we
>> could as well do it in the HV's xen/arch/x86/traps.c:pv_cpuid().
>> Also, when the Dom0 VCPUs are pinned, we could allow this, but I am
>> not sure it is worth the effort.
>>
>> Awaiting your comments.
>>
>> Regards,
>> Andre.
>>
>> P.S. Of course this doesn't fix pure userland software like
>> cpupower, but I would consider it the user's responsibility to
>
> Which would not work anymore as the cpufreq support is disabled
> when it boots under Xen.

By "anymore", do you mean in a future kernel? I tested this on 3.4.0
and cpupower monitor worked fine. Right, cpufreq is not enabled, but
cpupower reads the MSRs directly through the /dev/cpu/<n>/msr device
file. On an idle Dom0 I get this output:

     |Mperf
CPU | C0   | Cx   | Freq
    0|  0.10| 99.90|  3473
....

Regards,
Andre.

>
>> not use these tools in Dom0 and to use xenpm instead.
>>
>> --
>> Andre Przywara
>> AMD-Operating System Research Center (OSRC), Dresden, Germany
>
>> commit e802e47d85314b4541288e4a19d057e2ea885a28
>> Author: Andre Przywara <andre.przywara@amd.com>
>> Date:   Tue May 22 15:13:07 2012 +0200
>>
>>      filter APERFMPERF feature in Xen to avoid kernel internal usage
>>
>>      Signed-off-by: Andre Przywara <andre.przywara@amd.com>
>>
>> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
>> index 95dccce..71252d5 100644
>> --- a/arch/x86/xen/enlighten.c
>> +++ b/arch/x86/xen/enlighten.c
>> @@ -240,6 +240,11 @@ static void xen_cpuid(unsigned int *ax, unsigned int *bx,
>>   		*dx = cpuid_leaf5_edx_val;
>>   		return;
>>
>> +	case 6:
>> +		/* Disabling APERFMPERF for kernel usage */
>> +		maskecx = ~(1U << 0);
>> +		break;
>> +
>>   	case 0xb:
>>   		/* Suppress extended topology stuff */
>>   		maskebx = 0;
>
>
>


-- 
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 448-3567-12

Thread overview: 21+ messages
2012-05-22 16:07 [PATCH] RFC: Linux: disable APERF/MPERF feature in PV kernels Andre Przywara
2012-05-22 16:52 ` Jeremy Fitzhardinge
2012-05-22 17:08   ` Malcolm Crossley
2012-05-23  8:10     ` Jan Beulich
2012-05-22 20:46   ` Andre Przywara
2012-05-22 17:18 ` Konrad Rzeszutek Wilk
2012-05-22 21:02   ` Andre Przywara [this message]
2012-05-22 21:00     ` Konrad Rzeszutek Wilk
2012-05-22 22:44       ` Andre Przywara
2012-05-23 13:26         ` Konrad Rzeszutek Wilk
2012-05-24 13:24           ` Andre Przywara
2012-05-29 10:54             ` Andre Przywara
2012-05-23  7:34 ` Jan Beulich
2012-05-23  9:14   ` Andre Przywara
2012-05-23  9:43     ` Jan Beulich
2012-05-23  9:52       ` Andre Przywara
2012-05-23 10:01         ` Jan Beulich
2012-05-23 11:11   ` Andrew Cooper
2012-05-23 12:18     ` Jan Beulich
2012-05-23 13:21       ` Andrew Cooper
2012-05-23 13:31         ` Andre Przywara
