From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: Borislav Petkov <bp@alien8.de>,
Stefan Bader <stefan.bader@canonical.com>,
Andre Przywara <andre@andrep.de>,
"xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
"Rafael J. Wysocki" <rjw@sisk.pl>,
Matthew Garrett <mjg@redhat.com>
Subject: Re: kernel 3.7+ cpufreq regression on AMD system running as dom0
Date: Fri, 18 Jan 2013 14:00:15 -0500
Message-ID: <20130118190015.GC11351@phenom.dumpdata.com>
In-Reply-To: <20130115181839.GC8101@liondog.tnic>
On Tue, Jan 15, 2013 at 07:18:39PM +0100, Borislav Petkov wrote:
> On Tue, Jan 15, 2013 at 12:53:05PM -0500, Konrad Rzeszutek Wilk wrote:
> > > I don't think that's the right change - this is fixing baremetal so that
> > > it works on xen. And besides, this code was in powernow-k8 before so I'm
> > > wondering why did it work then.
> >
> > Powernow-k8 only populated the cpufreq policy information. This library
> > (processor_perflib) is the generic library used for ACPI P-states parsing.
> > This specific function (acpi_processor_get_performance_states) is just
> > used to fetch and parse the P-states.
> >
> > Xen-acpi-processor (which we use to upload the P and C-states to the
> > hypervisor) ends up calling this library to parse the P-states
> > and this unfortunate quirk clamps the P-states based on the MSRS.
>
> Huh? This is a fix for _PSS frequency values which are rounded and thus
> imprecise. The _PSS objects are the unfortunate ones, as most of the
> other crap BIOS produces.
I did not explain myself well. The fix is OK - it is just that the hypervisor
causes the quirk to not work correctly. Hmm, I wonder if there are BIOSes
that do the same thing (cause the MSR to return 0). Per your estimation
of BIOS quality, it seems that this could happen.
>
> > It is odd that this CPU specific quirk got added in this generic
> > library. Is there no ACPI quirk system similar to how DMI quirks are
> > handled?
>
> Even if there were, do you know all the boards and BIOS revisions which
> have those rounded values? The fix addresses the hardware which has
> those 50MHz multiples and simply ignores the _PSS data but reads out the
> P-states directly from the hardware.
Oh, I was not thinking of DMI per se. I was thinking of something similar to
the DMI-quirk API, but for the ACPI subsystem, so it would be:
if (ARM)
... these quirks are necessary
if (AMD)
... these quirks
and then the ACPI code can make calls to this ACPI-quirk API to
figure out whether it needs to modulate values. But this is all
hand-waving at this point.
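To make the hand-waving a little more concrete, here is a minimal sketch of what such a quirk table could look like. Everything in it is hypothetical: the names acpi_px_quirk, acpi_px_apply_quirks, px_state and the hw_frequency field are made up for illustration and do not exist in the kernel or in Xen. The example fixup just copies a value that stands in for "what the hardware reports" over the _PSS value.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical vendor identifiers, for illustration only. */
enum cpu_vendor { VENDOR_AMD, VENDOR_ARM, VENDOR_OTHER };

/* Hypothetical, simplified P-state record. */
struct px_state {
    uint32_t core_frequency; /* MHz, as parsed from _PSS */
    uint32_t hw_frequency;   /* MHz, stand-in for a value read from hardware */
};

typedef void (*px_quirk_fn)(struct px_state *px);

/* One entry per quirk: which vendor it applies to and what it does. */
struct acpi_px_quirk {
    enum cpu_vendor vendor;
    px_quirk_fn fixup;
};

/* Example fixup: prefer the hardware value when one is available. */
static void amd_prefer_hw_frequency(struct px_state *px)
{
    if (px->hw_frequency != 0)
        px->core_frequency = px->hw_frequency;
}

static const struct acpi_px_quirk quirks[] = {
    { VENDOR_AMD, amd_prefer_hw_frequency },
};

/* Run every quirk registered for this vendor; return how many ran. */
static size_t acpi_px_apply_quirks(enum cpu_vendor vendor, struct px_state *px)
{
    size_t applied = 0;

    for (size_t i = 0; i < sizeof(quirks) / sizeof(quirks[0]); i++) {
        if (quirks[i].vendor == vendor) {
            quirks[i].fixup(px);
            applied++;
        }
    }
    return applied;
}
```

The generic parsing code would then call acpi_px_apply_quirks() once per parsed state and would not need to know anything vendor-specific itself.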
>
> > Anyhow, I think this patch makes sense - it makes sure that the MSR
> > value is sane.
>
> I agree to a certain degree. Testing the Valid bit is something we
> should do for P-state MSRs - and for all MSRs containing a Valid bit,
> for that matter - and the original code didn't do it.
OK.
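For reference, the Valid-bit test amounts to something like the sketch below. It assumes the MSR value arrives as two 32-bit halves (as rdmsr() hands them back) and that bit 63 is the valid bit, as for the P-state MSRs discussed here. The helper names msr_join and pstate_msr_valid are made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 63 of a P-state definition MSR says whether its contents are valid. */
#define PSTATE_VALID_BIT (UINT64_C(1) << 63)

/* Hypothetical helper: combine the low and high 32-bit halves of an MSR. */
static inline uint64_t msr_join(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}

/* True only if the valid bit is set; an all-zero read is rejected. */
static inline bool pstate_msr_valid(uint64_t val)
{
    return (val & PSTATE_VALID_BIT) != 0;
}
```

An MSR read that comes back as 0 fails this test, so the caller can keep the _PSS values instead of computing a bogus frequency.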
>
> However, you need to push down the *correct* frequencies *after* the
> quirk to the hypervisor (I'm looking at push_pxx_to_hypervisor()) so
> that it is aware of the exact P-state frequencies this CPU supports and
> not some rounded values.
Yes! It could be done in the hypervisor (it reads the MSRs and figures out
that the P-states need tweaking).
>
> AFAICT for the xen part, of course. But the baseline stands: you need to
> tell the thing that switches P-states the exact P-state frequencies of
> the CPU. :-).
Right, that information is gathered from the MSRs. I think Xen would
need to do this since it can read the MSRs correctly and modify the P-states.
So something like this in the hypervisor maybe (not even tested):
diff --git a/xen/arch/x86/acpi/cpufreq/powernow.c b/xen/arch/x86/acpi/cpufreq/powernow.c
index a9b7792..54e7808 100644
--- a/xen/arch/x86/acpi/cpufreq/powernow.c
+++ b/xen/arch/x86/acpi/cpufreq/powernow.c
@@ -146,7 +146,40 @@ static int powernow_cpufreq_target(struct cpufreq_policy *policy,
return 0;
}
+#define MSR_AMD_PSTATE_DEF_BASE 0xc0010064
+static void amd_fixup_frequency(struct xen_processor_px *px, int i)
+{
+ u32 hi, lo, fid, did;
+ int index = px->control & 0x00000007;
+
+ if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
+ return;
+
+ if ((boot_cpu_data.x86 == 0x10 && boot_cpu_data.x86_model < 10)
+ || boot_cpu_data.x86 == 0x11) {
+ rdmsr(MSR_AMD_PSTATE_DEF_BASE + index, lo, hi);
+ /* Bit 63 indicates whether contents are valid */
+ if (!(hi & 0x80000000))
+ return;
+
+ fid = lo & 0x3f;
+ did = (lo >> 6) & 7;
+ if (boot_cpu_data.x86 == 0x10)
+ px->core_frequency = (100 * (fid + 0x10)) >> did;
+ else
+ px->core_frequency = (100 * (fid + 8)) >> did;
+ }
+}
+
+static void amd_fixup_freq(struct processor_performance *perf)
+{
+ int i;
+
+ for (i = 0; i < perf->state_count; i++)
+ amd_fixup_frequency(&perf->states[i], i);
+
+}
static int powernow_cpufreq_verify(struct cpufreq_policy *policy)
{
struct acpi_cpufreq_data *data;
@@ -158,6 +191,8 @@ static int powernow_cpufreq_verify(struct cpufreq_policy *policy)
perf = &processor_pminfo[policy->cpu]->perf;
+ amd_fixup_freq(perf);
+
cpufreq_verify_within_limits(policy, 0,
perf->states[perf->platform_limit].core_frequency * 1000);
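
As a worked example of the arithmetic in the patch above, the frequency computation can be pulled out into a standalone function. This is only a sketch: pstate_core_mhz is a name made up for this example, and it takes the low 32 bits of the MSR as a plain argument instead of reading real hardware. The field layout and the two formulas are the ones used in the patch.

```c
#include <stdint.h>

/*
 * Core frequency in MHz from the low half of a P-state definition MSR.
 * FID is bits 5:0 and DID is bits 8:6, as in the patch. Family 0x10
 * adds 0x10 to FID; the other branch (family 0x11 in the patch) adds 8.
 */
static uint32_t pstate_core_mhz(unsigned int family, uint32_t lo)
{
    uint32_t fid = lo & 0x3f;
    uint32_t did = (lo >> 6) & 7;
    uint32_t base = (family == 0x10) ? 0x10 : 8;

    return (100 * (fid + base)) >> did;
}
```

With family 0x10, fid = 6 and did = 0 this gives 100 * 22 = 2200 MHz. With fid = 7 and did = 2 it gives 2300 >> 2 = 575 MHz, which is not a multiple of 50 MHz.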