* RE: [PATCH] acpi-cpufreq: Use IA32_APERF and IA32_MPERF and get freq feedback from hardware
@ 2006-09-25 14:41 Pallipadi, Venkatesh
2006-09-25 16:02 ` Erik Slagter
0 siblings, 1 reply; 7+ messages in thread
From: Pallipadi, Venkatesh @ 2006-09-25 14:41 UTC (permalink / raw)
To: Erik Slagter, cpufreq
>-----Original Message-----
>From: cpufreq-bounces@lists.linux.org.uk
>[mailto:cpufreq-bounces@lists.linux.org.uk] On Behalf Of Erik Slagter
>Sent: Monday, September 25, 2006 6:14 AM
>To: cpufreq@www.linux.org.uk
>Subject: Re: [PATCH] acpi-cpufreq: Use IA32_APERF and
>IA32_MPERF and get freq feedback from hardware
>
>Venkatesh Pallipadi wrote:
>>
>> Enable ondemand governor and acpi-cpufreq to use IA32_APERF and IA32_MPERF MSR
>> to get active frequency feedback for the last sampling interval. This will
>> make ondemand take right frequency decisions when hardware coordination of
>> frequency is going on.
>
>Does this mean there is actually no real clean approach to determine the
>current cpu speed?
>
The problem is that, a lot of the time, the current frequency may not mean
much, as the frequency can change immediately after a get-frequency call or
immediately before it. The frequency can change asynchronously for a number
of reasons, such as hardware coordination or TM2, and a get-frequency call
will not help to get the frequency over a period of time.
Also, there is no callback from hardware when the frequency changes
asynchronously. So, getting the average frequency over a period of time is
the best we can do in order to use the utilization in that period and
make a frequency target decision for the next sampling period.
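As a rough illustration of the averaging idea (a sketch only, not the code
in the patch): if the two counters are sampled at the start and end of an
interval, the ratio of their deltas scales the maximum frequency to an
average over that interval. The struct perf_sample type and the avg_khz()
helper are hypothetical names used only for this example.

```c
#include <stdint.h>

/* Hypothetical snapshot of the two counters at one point in time. */
struct perf_sample {
	uint64_t aperf;	/* counts at the actual CPU frequency */
	uint64_t mperf;	/* counts at the maximum advertised frequency */
};

/*
 * Average frequency in kHz between two snapshots, scaled from max_khz.
 * Returns 0 if MPERF did not advance, so the caller can fall back to
 * some other estimate.
 * Assumption: the interval is short enough that max_khz * delta does
 * not overflow 64 bits.
 */
static uint64_t avg_khz(struct perf_sample prev, struct perf_sample cur,
			uint64_t max_khz)
{
	uint64_t da = cur.aperf - prev.aperf;
	uint64_t dm = cur.mperf - prev.mperf;

	if (dm == 0)
		return 0;

	return max_khz * da / dm;
}
```

With a 2 GHz maximum, an interval in which APERF advanced by 600000 while
MPERF advanced by 1000000 gives an average of 1200000 kHz.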
Thanks,
Venki
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] acpi-cpufreq: Use IA32_APERF and IA32_MPERF and get freq feedback from hardware
2006-09-25 14:41 [PATCH] acpi-cpufreq: Use IA32_APERF and IA32_MPERF and get freq feedback from hardware Pallipadi, Venkatesh
@ 2006-09-25 16:02 ` Erik Slagter
0 siblings, 0 replies; 7+ messages in thread
From: Erik Slagter @ 2006-09-25 16:02 UTC (permalink / raw)
To: Pallipadi, Venkatesh; +Cc: cpufreq
Pallipadi, Venkatesh wrote:
> The problem is, a lot of times current frequency may not mean much as
> frequency can change immediately after one get frequency call or
> immediately before we did the call. The frequency can change
> asynchronously due to number of reasons like hardware coordination, TM2
> and get frequency will not help to get frequency over a period of time.
> Also, there is no call back from hardware when frequency changes
> asynchronously. So, getting average frequency over a period of time is
> the best we can do in order to use the utilization in that period and
> make some frequency target decision for next sampling period.
I get your drift ;-)
Actually what I am interested in is this:
- spotting frequency switching by tm2 because I suspect my processor
might be running hot sometimes (a little overclocking ;-))
- seeing the actual running frequency, as this may not be the value the
processor is set for (again, an overclocking issue, I know).
After booting, the value in /proc/cpuinfo is correct, i.e. it matches the
frequency the cpu is actually set to. After a switch by cpufreq, only
the "standard" values are shown there, although performance shows the
cpu is still running at its higher clock.
BTW I also noticed that on a CPU with C1e the performance increases when
the acpi-cpufreq module is loaded and set to ondemand. Could it be that
the module can make smarter decisions than the processor itself? Does
the module override the C1e functionality, actually? Output of the sys
stats directory indeed shows correct switching to lower speed when idle.
^ permalink raw reply [flat|nested] 7+ messages in thread
* RE: [PATCH] acpi-cpufreq: Use IA32_APERF and IA32_MPERF and get freq feedback from hardware
@ 2006-09-25 16:36 Pallipadi, Venkatesh
2006-09-25 19:07 ` Erik Slagter
2006-10-02 23:03 ` Dominik Brodowski
0 siblings, 2 replies; 7+ messages in thread
From: Pallipadi, Venkatesh @ 2006-09-25 16:36 UTC (permalink / raw)
To: Erik Slagter; +Cc: cpufreq
>-----Original Message-----
>From: Erik Slagter [mailto:erik@slagter.name]
>Pallipadi, Venkatesh wrote:
>> The problem is, a lot of times current frequency may not mean much as
>> frequency can change immediately after one get frequency call or
>> immediately before we did the call. The frequency can change
>> asynchronously due to number of reasons like hardware coordination, TM2
>> and get frequency will not help to get frequency over a period of time.
>> Also, there is no call back from hardware when frequency changes
>> asynchronously. So, getting average frequency over a period of time is
>> the best we can do in order to use the utilization in that period and
>> make some frequency target decision for next sampling period.
>
>I get your drift ;-)
>
>Actually what I am interested in is this:
>
> - spotting frequency switching by tm2 because I suspect my processor
>might be running hot sometimes (a little overclocking ;-))
> - seeing the actual running frequency, as this may not be the value the
>processor is set for (again, an overclocking issue, I know).
>
>After booting the value in /proc/cpuinfo is correct, related to the
>frequency the cpu is actually set to. After a switch by cpufreq, only
>the "standard" values are shown there, although performance shows the
>cpu is still running at its higher clock.
>
You can look at /sys/..../cpufreq/cpuinfo_cur_freq to get the
instantaneous frequency from hardware (but this only works when MSR-based
transitions are used; there is no way to get the current frequency when
IO-port-based transitions are being used).
/sys/..../cpufreq/scaling_cur_freq and /proc/cpuinfo show the last
value that cpufreq tried to set.
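For comparing the two attributes from user space, something along these
lines might do. It is a minimal sketch: read_khz() and khz_mismatch() are
made-up helper names, and the caller passes in the full path of the
attribute, so nothing about the sysfs layout is assumed here.

```c
#include <stdio.h>

/*
 * Read one unsigned integer (a frequency in kHz) from a text file such
 * as a cpufreq sysfs attribute. The caller supplies the complete path.
 * Returns 0 on success, -1 if the file cannot be opened or parsed.
 */
static int read_khz(const char *path, unsigned long *khz)
{
	FILE *f = fopen(path, "r");
	int parsed;

	if (!f)
		return -1;

	parsed = (fscanf(f, "%lu", khz) == 1);
	fclose(f);

	return parsed ? 0 : -1;
}

/*
 * Difference between what hardware reports and what cpufreq last
 * requested. A positive result means the CPU runs faster than asked.
 */
static long khz_mismatch(unsigned long hw_khz, unsigned long requested_khz)
{
	return (long)hw_khz - (long)requested_khz;
}
```

Calling read_khz() once on the cpuinfo_cur_freq file and once on the
scaling_cur_freq file, then passing both values to khz_mismatch(), would
show whether the hardware followed the last request.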
>BTW I also noticed that on a CPU with C1e the performance increases when
>the acpi-cpufreq module is loaded and set to ondemand. Could it be that
>the module can make smarter decisions than the processor itself? Does
>the module override the C1e functionality, actually? Output of the sys
>stats directory shows indeed correct switching to lower speed when idle.
I am not sure how ondemand can help with more performance with C1E. One
possible explanation:
With C1E, hardware goes to a lower frequency on idle by itself. The only
difference ondemand can be making is running at a lower frequency even
when the CPU is busy and average utilization is low. Due to this, the CPU
can run cooler than without ondemand, and so TM2 may not kick in as
frequently as it would without ondemand. But I am not sure how C1E
makes the difference here. This behavior should be the same with or without
a C1E-capable CPU. What workload do you have? Partially idle? What is
the CPU utilization over time?
Thanks,
Venki
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] acpi-cpufreq: Use IA32_APERF and IA32_MPERF and get freq feedback from hardware
2006-09-25 16:36 Pallipadi, Venkatesh
@ 2006-09-25 19:07 ` Erik Slagter
2006-10-02 23:03 ` Dominik Brodowski
1 sibling, 0 replies; 7+ messages in thread
From: Erik Slagter @ 2006-09-25 19:07 UTC (permalink / raw)
To: Pallipadi, Venkatesh; +Cc: cpufreq
Pallipadi, Venkatesh wrote:
> You can look at /sys/..../cpufreq/cpuinfo_cur_freq to get the
> instantaneous frequency from hardware (But only works when MSR based
> transitions are used. There is no way to get current frequency when IO
> port based transitions are being used).
> /sys/..../cpufreq/scaling_cur_freq and /proc/cpuinfo shows the last
> value that cpufreq tried to set.
These only give the frequencies the CPU was "designed" for, not the
actual frequency. Only the information in /proc/cpuinfo is correct, at
least, until cpufreq comes in.
> I am not sure how ondemand can help with more performance with C1E. One
> possible explanation:
> With C1E, hardware goes to lower frequency on idle by itself. Only
> difference ondemand can be making is running at a lower frequency even
> when CPU is busy and average utilization is low. Due to this CPU can run
> cooler than without ondemand. And due to that TM2 may not kick in as
> frequently as it would without the ondemand. But, I am not sure how C1E
> makes the difference here. This behavior should be same with or without
> C1E capable CPU. What is the workload you have. Partially idle? What is
> the CPU utilization over time?
I observe this while compiling the kernel. It shows a consistent
discrepancy of a few seconds on a total of ~2:45 compile time. I am not
complaining ;-)
Is there a way to know for sure if either tm1 or tm2 kicked in? I'd
really like to know, I want to have max performance. If tm? kicks in, I
won't get that of course. If I know if/when it happens, I can try a
lower frequency or better cooling.
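One possibility, offered as a sketch rather than a tested answer: going by
the Intel SDM, the IA32_THERM_STATUS MSR (0x19c) carries a thermal status
flag in bit 0 and a sticky log flag in bit 1 that stays set until software
clears it. How the raw value is obtained is left out here; struct
therm_state and decode_therm_status() are hypothetical names.

```c
#include <stdint.h>

/* Bit positions in IA32_THERM_STATUS, as described in the Intel SDM. */
#define THERM_STATUS_ACTIVE	(1u << 0)	/* thermal monitor active now */
#define THERM_STATUS_LOG	(1u << 1)	/* tripped since last cleared */

/* Hypothetical decoded view of the two flags. */
struct therm_state {
	int active_now;
	int tripped_since_clear;
};

/* Decode a raw IA32_THERM_STATUS value supplied by the caller. */
static struct therm_state decode_therm_status(uint64_t msr_val)
{
	struct therm_state s;

	s.active_now = (msr_val & THERM_STATUS_ACTIVE) != 0;
	s.tripped_since_clear = (msr_val & THERM_STATUS_LOG) != 0;

	return s;
}
```

Polling the log flag after a compile run would, under these assumptions,
show whether the thermal monitor engaged at any point during it.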
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] acpi-cpufreq: Use IA32_APERF and IA32_MPERF and get freq feedback from hardware
2006-09-25 16:36 Pallipadi, Venkatesh
2006-09-25 19:07 ` Erik Slagter
@ 2006-10-02 23:03 ` Dominik Brodowski
1 sibling, 0 replies; 7+ messages in thread
From: Dominik Brodowski @ 2006-10-02 23:03 UTC (permalink / raw)
To: Pallipadi, Venkatesh; +Cc: cpufreq
On Mon, Sep 25, 2006 at 09:36:13AM -0700, Pallipadi, Venkatesh wrote:
> >BTW I also noticed that on a CPU with C1e the performance increases when
> >the acpi-cpufreq module is loaded and set to ondemand. Could it be that
> >the module can make smarter decisions than the processor itself? Does
> >the module override the C1e functionality, actually? Output of the sys
> >stats directory shows indeed correct switching to lower speed when idle.
>
> I am not sure how ondemand can help with more performance with C1E. One
> possible explanation:
> With C1E, hardware goes to lower frequency on idle by itself. Only
> difference ondemand can be making is running at a lower frequency even
> when CPU is busy and average utilization is low. Due to this CPU can run
> cooler than without ondemand. And due to that TM2 may not kick in as
> frequently as it would without the ondemand. But, I am not sure how C1E
> makes the difference here. This behavior should be same with or without
> C1E capable CPU. What is the workload you have. Partially idle? What is
> the CPU utilization over time?
Does the CPU get out of C1E quicker if the CPU frequency is already lower?
Thanks,
Dominik
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH] acpi-cpufreq: Use IA32_APERF and IA32_MPERF and get freq feedback from hardware
@ 2006-09-23 0:28 Venkatesh Pallipadi
2006-09-25 13:14 ` Erik Slagter
0 siblings, 1 reply; 7+ messages in thread
From: Venkatesh Pallipadi @ 2006-09-23 0:28 UTC (permalink / raw)
To: Dave Jones; +Cc: cpufreq, Dominik Brodowski
Enable ondemand governor and acpi-cpufreq to use IA32_APERF and IA32_MPERF MSR
to get active frequency feedback for the last sampling interval. This will
make ondemand take right frequency decisions when hardware coordination of
frequency is going on.
Without APERF/MPERF, ondemand can make the wrong decision at times due
to underlying hardware coordination or TM2.
Example:
* CPU 0 and CPU 1 are hardware coordinated.
* CPU 1 is running at the highest frequency.
* CPU 0 was running at the highest frequency. Now ondemand reduces it to
some intermediate frequency based on utilization.
* Due to the underlying hardware coordination with CPU 1, CPU 0 continues to
run at the highest frequency (as long as the other CPU is at the highest).
* When ondemand samples CPU 0 the next time, without actual frequency
feedback from APERF/MPERF, it will think that the previous frequency change
was successful and can pick the wrong target frequency. This is because it
assumes that the utilization measured in this sampling interval was at the
intermediate frequency, rather than at the actual highest frequency.
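To put numbers on the example above, here is a small sketch of the target
calculation ondemand does when scaling down. The formula matches the
cpufreq_ondemand.c hunk in this patch; the up_threshold of 80 and the
frequencies are assumed values for illustration, and next_freq_khz() is a
made-up helper name.

```c
/*
 * Next target frequency when load is below (up_threshold - 10):
 *   freq_next = freq_cur * load / (up_threshold - 10)
 * freq_cur_khz is the frequency the load was measured at.
 */
static unsigned int next_freq_khz(unsigned int freq_cur_khz,
				  unsigned int load_pct,
				  unsigned int up_threshold)
{
	return (freq_cur_khz * load_pct) / (up_threshold - 10);
}
```

Suppose ondemand asked for 1500000 kHz but hardware coordination kept CPU 0
at 2000000 kHz, and the measured load was 30%. Using the requested value,
next_freq_khz(1500000, 30, 80) gives 642857 kHz. Using the measured value,
next_freq_khz(2000000, 30, 80) gives 857142 kHz. Without feedback the
target comes out too low for the work that was actually done.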
More information about the IA32_APERF and IA32_MPERF MSRs:
Refer to the IA-32 Intel® Architecture Software Developer's Manual at
http://developer.intel.com
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Index: linux-2.6.18-rc4/arch/i386/kernel/cpu/cpufreq/acpi-cpufreq.c
===================================================================
--- linux-2.6.18-rc4.orig/arch/i386/kernel/cpu/cpufreq/acpi-cpufreq.c
+++ linux-2.6.18-rc4/arch/i386/kernel/cpu/cpufreq/acpi-cpufreq.c
@@ -59,10 +59,12 @@ enum {
};
#define INTEL_MSR_RANGE (0xffff)
+#define CPUID_6_ECX_APERFMPERF_CAPABILITY (0x1)
struct acpi_cpufreq_data {
struct acpi_processor_performance *acpi_data;
struct cpufreq_frequency_table *freq_table;
+ unsigned int max_freq;
unsigned int resume;
unsigned int cpu_feature;
};
@@ -259,6 +261,100 @@ static u32 get_cur_val(cpumask_t mask)
return cmd.val;
}
+/*
+ * Return the measured active (C0) frequency on this CPU since last call
+ * to this function.
+ * Input: cpu number
+ * Return: Average CPU frequency in terms of max frequency (zero on error)
+ *
+ * We use IA32_MPERF and IA32_APERF MSRs to get the measured performance
+ * over a period of time, while CPU is in C0 state.
+ * IA32_MPERF counts at the rate of max advertised frequency
+ * IA32_APERF counts at the rate of actual CPU frequency
+ * Only IA32_APERF/IA32_MPERF ratio is architecturally defined and
+ * no meaning should be associated with absolute values of these MSRs.
+ */
+static unsigned int get_measured_perf(unsigned int cpu)
+{
+ union {
+ struct {
+ u32 lo;
+ u32 hi;
+ } split;
+ u64 whole;
+ } aperf_cur, mperf_cur;
+
+ cpumask_t saved_mask;
+ unsigned int perf_percent;
+ unsigned int retval;
+
+ saved_mask = current->cpus_allowed;
+ set_cpus_allowed(current, cpumask_of_cpu(cpu));
+ if (get_cpu() != cpu) {
+ /* We were not able to run on requested processor */
+ put_cpu();
+ return 0;
+ }
+
+ rdmsr(MSR_IA32_APERF, aperf_cur.split.lo, aperf_cur.split.hi);
+ rdmsr(MSR_IA32_MPERF, mperf_cur.split.lo, mperf_cur.split.hi);
+
+ wrmsr(MSR_IA32_APERF, 0,0);
+ wrmsr(MSR_IA32_MPERF, 0,0);
+
+#ifdef __i386__
+ /*
+ * We don't want to do 64 bit divide with 32 bit kernel
+ * Get an approximate value. Return failure in case we cannot get
+ * an approximate value.
+ */
+ if (unlikely(aperf_cur.split.hi || mperf_cur.split.hi)) {
+ int shift_count;
+ u32 h;
+
+ h = max_t(u32, aperf_cur.split.hi, mperf_cur.split.hi);
+ shift_count = fls(h);
+
+ aperf_cur.whole >>= shift_count;
+ mperf_cur.whole >>= shift_count;
+ }
+
+ if (((unsigned long)(-1) / 100) < aperf_cur.split.lo) {
+ int shift_count = 7;
+ aperf_cur.split.lo >>= shift_count;
+ mperf_cur.split.lo >>= shift_count;
+ }
+
+ if (aperf_cur.split.lo && mperf_cur.split.lo) {
+ perf_percent = (aperf_cur.split.lo * 100) / mperf_cur.split.lo;
+ } else {
+ perf_percent = 0;
+ }
+
+#else
+ if (unlikely(((unsigned long)(-1) / 100) < aperf_cur.whole)) {
+ int shift_count = 7;
+ aperf_cur.whole >>= shift_count;
+ mperf_cur.whole >>= shift_count;
+ }
+
+ if (aperf_cur.whole && mperf_cur.whole) {
+ perf_percent = (aperf_cur.whole * 100) / mperf_cur.whole;
+ } else {
+ perf_percent = 0;
+ }
+
+#endif
+
+ retval = drv_data[cpu]->max_freq * perf_percent / 100;
+
+ put_cpu();
+ set_cpus_allowed(current, saved_mask);
+
+ dprintk("cpu %d: performance percent %d\n", cpu, perf_percent);
+ return retval;
+}
+
static unsigned int get_cur_freq_on_cpu(unsigned int cpu)
{
struct acpi_cpufreq_data *data = drv_data[cpu];
@@ -498,7 +594,6 @@ static int acpi_cpufreq_cpu_init(struct
unsigned int valid_states = 0;
unsigned int cpu = policy->cpu;
struct acpi_cpufreq_data *data;
- unsigned int l, h;
unsigned int result = 0;
struct cpuinfo_x86 *c = &cpu_data[policy->cpu];
struct acpi_processor_performance *perf;
@@ -592,6 +687,7 @@ static int acpi_cpufreq_cpu_init(struct
}
policy->governor = CPUFREQ_DEFAULT_GOVERNOR;
+ data->max_freq = perf->states[0].core_frequency * 1000;
/* table init */
for (i = 0; i < perf->state_count; i++) {
if (i > 0 && perf->states[i].core_frequency ==
@@ -625,6 +721,15 @@ static int acpi_cpufreq_cpu_init(struct
/* notify BIOS that we exist */
acpi_processor_notify_smm(THIS_MODULE);
+ /* Check for APERF/MPERF support in hardware */
+ if (c->x86_vendor == X86_VENDOR_INTEL && c->cpuid_level >= 6) {
+ unsigned int ecx;
+ ecx = cpuid_ecx(6);
+ if (ecx & CPUID_6_ECX_APERFMPERF_CAPABILITY) {
+ acpi_cpufreq_driver.getavg = get_measured_perf;
+ }
+ }
+
dprintk("CPU%u - ACPI performance management activated.\n", cpu);
for (i = 0; i < perf->state_count; i++)
dprintk(" %cP%d: %d MHz, %d mW, %d uS\n",
Index: linux-2.6.18-rc4/drivers/cpufreq/cpufreq.c
===================================================================
--- linux-2.6.18-rc4.orig/drivers/cpufreq/cpufreq.c
+++ linux-2.6.18-rc4/drivers/cpufreq/cpufreq.c
@@ -1268,6 +1268,26 @@ int cpufreq_driver_target(struct cpufreq
}
EXPORT_SYMBOL_GPL(cpufreq_driver_target);
+int cpufreq_driver_getavg(struct cpufreq_policy *policy)
+{
+ int ret = 0;
+
+ policy = cpufreq_cpu_get(policy->cpu);
+ if (!policy)
+ return -EINVAL;
+
+ mutex_lock(&policy->lock);
+
+ if (cpu_online(policy->cpu) && cpufreq_driver->getavg)
+ ret = cpufreq_driver->getavg(policy->cpu);
+
+ mutex_unlock(&policy->lock);
+
+ cpufreq_cpu_put(policy);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(cpufreq_driver_getavg);
+
/*
* Locking: Must be called with the lock_cpu_hotplug() lock held
* when "event" is CPUFREQ_GOV_LIMITS
Index: linux-2.6.18-rc4/drivers/cpufreq/cpufreq_ondemand.c
===================================================================
--- linux-2.6.18-rc4.orig/drivers/cpufreq/cpufreq_ondemand.c
+++ linux-2.6.18-rc4/drivers/cpufreq/cpufreq_ondemand.c
@@ -293,8 +293,13 @@ static void dbs_check_cpu(struct cpu_dbs
* policy. To be safe, we focus 10 points under the threshold.
*/
if (load < (dbs_tuners_ins.up_threshold - 10)) {
- unsigned int freq_next;
- freq_next = (policy->cur * load) /
+ unsigned int freq_next, freq_cur;
+
+ freq_cur = cpufreq_driver_getavg(policy);
+ if (!freq_cur)
+ freq_cur = policy->cur;
+
+ freq_next = (freq_cur * load) /
(dbs_tuners_ins.up_threshold - 10);
__cpufreq_driver_target(policy, freq_next, CPUFREQ_RELATION_L);
Index: linux-2.6.18-rc4/include/asm-i386/msr.h
===================================================================
--- linux-2.6.18-rc4.orig/include/asm-i386/msr.h
+++ linux-2.6.18-rc4/include/asm-i386/msr.h
@@ -125,6 +125,9 @@ static inline void wrmsrl (unsigned long
#define MSR_IA32_PERF_STATUS 0x198
#define MSR_IA32_PERF_CTL 0x199
+#define MSR_IA32_MPERF 0xE7
+#define MSR_IA32_APERF 0xE8
+
#define MSR_IA32_THERM_CONTROL 0x19a
#define MSR_IA32_THERM_INTERRUPT 0x19b
#define MSR_IA32_THERM_STATUS 0x19c
Index: linux-2.6.18-rc4/include/asm-x86_64/msr.h
===================================================================
--- linux-2.6.18-rc4.orig/include/asm-x86_64/msr.h
+++ linux-2.6.18-rc4/include/asm-x86_64/msr.h
@@ -296,6 +296,9 @@ static inline unsigned int cpuid_edx(uns
#define MSR_IA32_PERF_STATUS 0x198
#define MSR_IA32_PERF_CTL 0x199
+#define MSR_IA32_MPERF 0xE7
+#define MSR_IA32_APERF 0xE8
+
#define MSR_IA32_THERM_CONTROL 0x19a
#define MSR_IA32_THERM_INTERRUPT 0x19b
#define MSR_IA32_THERM_STATUS 0x19c
Index: linux-2.6.18-rc4/include/linux/cpufreq.h
===================================================================
--- linux-2.6.18-rc4.orig/include/linux/cpufreq.h
+++ linux-2.6.18-rc4/include/linux/cpufreq.h
@@ -172,6 +172,8 @@ extern int __cpufreq_driver_target(struc
unsigned int relation);
+extern int cpufreq_driver_getavg(struct cpufreq_policy *policy);
+
int cpufreq_register_governor(struct cpufreq_governor *governor);
void cpufreq_unregister_governor(struct cpufreq_governor *governor);
@@ -204,6 +206,7 @@ struct cpufreq_driver {
unsigned int (*get) (unsigned int cpu);
/* optional */
+ unsigned int (*getavg) (unsigned int cpu);
int (*exit) (struct cpufreq_policy *policy);
int (*suspend) (struct cpufreq_policy *policy, pm_message_t pmsg);
int (*resume) (struct cpufreq_policy *policy);
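
For readers following the 32-bit path in get_measured_perf(), the
approximation can be tried out in user space. This is a sketch that
mirrors the logic of the patch, not the kernel code itself: fls32() is a
stand-in for the kernel's fls(), and perf_percent_32() is a hypothetical
name.

```c
#include <stdint.h>

/* Position of the highest set bit, 1-based; 0 if x is 0. */
static int fls32(uint32_t x)
{
	int n = 0;

	while (x) {
		n++;
		x >>= 1;
	}
	return n;
}

/*
 * Approximate (aperf * 100) / mperf using only 32-bit arithmetic for
 * the multiply and divide. Both counters are shifted right by the same
 * amount, so their ratio is roughly preserved.
 */
static unsigned int perf_percent_32(uint64_t aperf, uint64_t mperf)
{
	uint32_t a_hi = (uint32_t)(aperf >> 32);
	uint32_t m_hi = (uint32_t)(mperf >> 32);
	uint32_t hi = a_hi > m_hi ? a_hi : m_hi;
	uint32_t a, m;

	/* Drop enough low bits that both values fit in 32 bits. */
	if (hi) {
		int shift_count = fls32(hi);

		aperf >>= shift_count;
		mperf >>= shift_count;
	}

	a = (uint32_t)aperf;
	m = (uint32_t)mperf;

	/* Make sure a * 100 cannot overflow 32 bits. */
	if (a > UINT32_MAX / 100) {
		a >>= 7;
		m >>= 7;
	}

	if (a == 0 || m == 0)
		return 0;

	return (a * 100u) / m;
}
```

The precision lost by shifting is small compared to a percentage result,
which is presumably why the patch accepts it instead of pulling in a
64-bit division helper.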
^ permalink raw reply [flat|nested] 7+ messages in thread