* [RFC] perf, x86: Segregate PMU workarounds into x86_pmu_quirk_ops structure
@ 2010-05-29 18:24 Cyrill Gorcunov
2010-05-29 18:33 ` Peter Zijlstra
2010-05-31 13:00 ` Robert Richter
0 siblings, 2 replies; 7+ messages in thread
From: Cyrill Gorcunov @ 2010-05-29 18:24 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Frédéric Weisbecker,
Robert Richter, Arnaldo Carvalho de Melo
Cc: LKML
Hi,
I would appreciate comments/complaints on the following patch. The idea is to implement
PMU quirks with minimal impact. At the moment two quirks are addressed:
PEBS disabling on Clovertown and the P4 performance counter double write.
The PEBS disabling was already there; it is only moved into x86_pmu_quirk_ops. Note
that I intentionally did not use a pointer to the structure but embedded it into
x86_pmu; if the structure grows we will need to switch to a pointer.
Comments, complaints please? Perhaps there is some idea which would allow
handling all this more gently?
-- Cyrill
---
perf, x86: Segregate PMU workarounds into x86_pmu_quirk_ops structure
This allows us to handle both the Clovertown and Netburst quirks in
a general fashion.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
arch/x86/kernel/cpu/perf_event.c | 18 ++++++++--
arch/x86/kernel/cpu/perf_event_intel.c | 55 ++++++++++++++++++---------------
arch/x86/kernel/cpu/perf_event_p4.c | 21 ++++++++++++
3 files changed, 64 insertions(+), 30 deletions(-)
Index: linux-2.6.git/arch/x86/kernel/cpu/perf_event.c
=====================================================================
--- linux-2.6.git.orig/arch/x86/kernel/cpu/perf_event.c
+++ linux-2.6.git/arch/x86/kernel/cpu/perf_event.c
@@ -185,6 +185,11 @@ union perf_capabilities {
u64 capabilities;
};
+struct x86_pmu_quirk_ops {
+ void (*pmu_init)(void);
+ void (*perfctr_write)(unsigned long addr, u64 value);
+};
+
/*
* struct x86_pmu - generic x86 pmu
*/
@@ -218,7 +223,8 @@ struct x86_pmu {
void (*put_event_constraints)(struct cpu_hw_events *cpuc,
struct perf_event *event);
struct event_constraint *event_constraints;
- void (*quirks)(void);
+
+ struct x86_pmu_quirk_ops quirks;
int (*cpu_prepare)(int cpu);
void (*cpu_starting)(int cpu);
@@ -924,7 +930,11 @@ x86_perf_event_set_period(struct perf_ev
*/
atomic64_set(&hwc->prev_count, (u64)-left);
- wrmsrl(hwc->event_base + idx,
+ if (x86_pmu.quirks.perfctr_write)
+ x86_pmu.quirks.perfctr_write(hwc->event_base + idx,
+ (u64)(-left) & x86_pmu.cntval_mask);
+ else
+ wrmsrl(hwc->event_base + idx,
(u64)(-left) & x86_pmu.cntval_mask);
perf_event_update_userpage(event);
@@ -1316,8 +1326,8 @@ void __init init_hw_perf_events(void)
pr_cont("%s PMU driver.\n", x86_pmu.name);
- if (x86_pmu.quirks)
- x86_pmu.quirks();
+ if (x86_pmu.quirks.pmu_init)
+ x86_pmu.quirks.pmu_init();
if (x86_pmu.num_counters > X86_PMC_MAX_GENERIC) {
WARN(1, KERN_ERR "hw perf events %d > max(%d), clipping!",
Index: linux-2.6.git/arch/x86/kernel/cpu/perf_event_intel.c
=====================================================================
--- linux-2.6.git.orig/arch/x86/kernel/cpu/perf_event_intel.c
+++ linux-2.6.git/arch/x86/kernel/cpu/perf_event_intel.c
@@ -822,6 +822,36 @@ static void intel_pmu_cpu_dying(int cpu)
fini_debug_store_on_cpu(cpu);
}
+static void intel_clovertown_pmu_init(void)
+{
+ /*
+ * PEBS is unreliable due to:
+ *
+ * AJ67 - PEBS may experience CPL leaks
+ * AJ68 - PEBS PMI may be delayed by one event
+ * AJ69 - GLOBAL_STATUS[62] will only be set when DEBUGCTL[12]
+ * AJ106 - FREEZE_LBRS_ON_PMI doesn't work in combination with PEBS
+ *
+ * AJ67 could be worked around by restricting the OS/USR flags.
+ * AJ69 could be worked around by setting PMU_FREEZE_ON_PMI.
+ *
+ * AJ106 could possibly be worked around by not allowing LBR
+ * usage from PEBS, including the fixup.
+ * AJ68 could possibly be worked around by always programming
+ * a pebs_event_reset[0] value and coping with the lost events.
+ *
+ * But taken together it might just make sense to not enable PEBS on
+ * these chips.
+ */
+ printk(KERN_WARNING "PEBS disabled due to CPU errata.\n");
+ x86_pmu.pebs = 0;
+ x86_pmu.pebs_constraints = NULL;
+}
+
+static __initconst const struct x86_pmu_quirk_ops intel_clovertown_quirks = {
+ .pmu_init = intel_clovertown_pmu_init,
+};
+
static __initconst const struct x86_pmu intel_pmu = {
.name = "Intel",
.handle_irq = intel_pmu_handle_irq,
@@ -848,31 +878,6 @@ static __initconst const struct x86_pmu
.cpu_dying = intel_pmu_cpu_dying,
};
-static void intel_clovertown_quirks(void)
-{
- /*
- * PEBS is unreliable due to:
- *
- * AJ67 - PEBS may experience CPL leaks
- * AJ68 - PEBS PMI may be delayed by one event
- * AJ69 - GLOBAL_STATUS[62] will only be set when DEBUGCTL[12]
- * AJ106 - FREEZE_LBRS_ON_PMI doesn't work in combination with PEBS
- *
- * AJ67 could be worked around by restricting the OS/USR flags.
- * AJ69 could be worked around by setting PMU_FREEZE_ON_PMI.
- *
- * AJ106 could possibly be worked around by not allowing LBR
- * usage from PEBS, including the fixup.
- * AJ68 could possibly be worked around by always programming
- * a pebs_event_reset[0] value and coping with the lost events.
- *
- * But taken together it might just make sense to not enable PEBS on
- * these chips.
- */
- printk(KERN_WARNING "PEBS disabled due to CPU errata.\n");
- x86_pmu.pebs = 0;
- x86_pmu.pebs_constraints = NULL;
-}
static __init int intel_pmu_init(void)
{
Index: linux-2.6.git/arch/x86/kernel/cpu/perf_event_p4.c
=====================================================================
--- linux-2.6.git.orig/arch/x86/kernel/cpu/perf_event_p4.c
+++ linux-2.6.git/arch/x86/kernel/cpu/perf_event_p4.c
@@ -804,6 +804,24 @@ done:
return num ? -ENOSPC : 0;
}
+/*
+ * This handles erratum N15 in Intel doc 249199-029:
+ * the counter may not be updated correctly on write,
+ * so we perform the write operation twice to do the trick
+ * (the official workaround didn't work)
+ *
+ * the former idea is taken from the OProfile code
+ */
+static void p4_perfctr_write(unsigned long addr, u64 value)
+{
+ wrmsrl(addr, value);
+ wrmsrl(addr, value);
+}
+
+static __initconst const struct x86_pmu_quirk_ops p4_pmu_quirks = {
+ .perfctr_write = p4_perfctr_write,
+};
+
static __initconst const struct x86_pmu p4_pmu = {
.name = "Netburst P4/Xeon",
.handle_irq = p4_pmu_handle_irq,
@@ -850,7 +868,8 @@ static __init int p4_pmu_init(void)
pr_cont("Netburst events, ");
- x86_pmu = p4_pmu;
+ x86_pmu = p4_pmu;
+ x86_pmu.quirks = p4_pmu_quirks;
return 0;
}
* Re: [RFC] perf, x86: Segregate PMU workarounds into x86_pmu_quirk_ops structure
2010-05-29 18:24 [RFC] perf, x86: Segregate PMU workarounds into x86_pmu_quirk_ops structure Cyrill Gorcunov
@ 2010-05-29 18:33 ` Peter Zijlstra
2010-05-31 16:45 ` Cyrill Gorcunov
2010-05-31 13:00 ` Robert Richter
1 sibling, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2010-05-29 18:33 UTC (permalink / raw)
To: Cyrill Gorcunov
Cc: Ingo Molnar, Frédéric Weisbecker, Robert Richter,
Arnaldo Carvalho de Melo, LKML, Stephane Eranian
On Sat, 2010-05-29 at 22:24 +0400, Cyrill Gorcunov wrote:
> @@ -924,7 +930,11 @@ x86_perf_event_set_period(struct perf_ev
> */
> atomic64_set(&hwc->prev_count, (u64)-left);
>
> - wrmsrl(hwc->event_base + idx,
> + if (x86_pmu.quirks.perfctr_write)
> + x86_pmu.quirks.perfctr_write(hwc->event_base + idx,
> + (u64)(-left) & x86_pmu.cntval_mask);
> + else
> + wrmsrl(hwc->event_base + idx,
> (u64)(-left) & x86_pmu.cntval_mask);
This bit is rather ugly... not quite sure how to clean it up, though.
Anybody got a bright idea?
* Re: [RFC] perf, x86: Segregate PMU workarounds into x86_pmu_quirk_ops structure
2010-05-29 18:24 [RFC] perf, x86: Segregate PMU workarounds into x86_pmu_quirk_ops structure Cyrill Gorcunov
2010-05-29 18:33 ` Peter Zijlstra
@ 2010-05-31 13:00 ` Robert Richter
2010-05-31 13:43 ` Peter Zijlstra
2010-05-31 14:09 ` Cyrill Gorcunov
1 sibling, 2 replies; 7+ messages in thread
From: Robert Richter @ 2010-05-31 13:00 UTC (permalink / raw)
To: Cyrill Gorcunov
Cc: Peter Zijlstra, Ingo Molnar, Frédéric Weisbecker,
Arnaldo Carvalho de Melo, LKML
On 29.05.10 14:24:09, Cyrill Gorcunov wrote:
> Hi,
>
> I would appreciate comments/complaints on the following patch. The idea is to implement
> PMU quirks with minimal impact. At the moment two quirks are addressed:
> PEBS disabling on Clovertown and the P4 performance counter double write.
> The PEBS disabling was already there; it is only moved into x86_pmu_quirk_ops. Note
> that I intentionally did not use a pointer to the structure but embedded it into
> x86_pmu; if the structure grows we will need to switch to a pointer.
The quirk functions add additional code and ops structures to the
already existing model-specific code. These quirks would be fine if we
could merge the model-specific code and get unified code. But this
model-specific code cannot be replaced. So I would rather
implement CPU errata in model-specific code.
> @@ -185,6 +185,11 @@ union perf_capabilities {
> u64 capabilities;
> };
>
> +struct x86_pmu_quirk_ops {
> + void (*pmu_init)(void);
This init quirk could be much better handled in the model specific
init code (intel_pmu_init()/amd_pmu_init()). I don't see a reason for
adding the quirk first and then immediately calling it. The quirk
function could be called directly instead.
> + void (*perfctr_write)(unsigned long addr, u64 value);
This one is difficult to avoid ...
> @@ -924,7 +930,11 @@ x86_perf_event_set_period(struct perf_ev
> */
> atomic64_set(&hwc->prev_count, (u64)-left);
>
> - wrmsrl(hwc->event_base + idx,
> + if (x86_pmu.quirks.perfctr_write)
> + x86_pmu.quirks.perfctr_write(hwc->event_base + idx,
> + (u64)(-left) & x86_pmu.cntval_mask);
> + else
> + wrmsrl(hwc->event_base + idx,
... but it introduces another check in the fast path. There are some
options to avoid this. First, we could see whether we should rather
implement this in the model-specific interrupt handlers (there is
p4_pmu_handle_irq()). Or we could implement an optimized check for perf
quirks, maybe using ALTERNATIVE or jump labels.
I think we can handle both quirks, but if we start using and extending
this mechanism more, it will have a performance impact and the code will
also become more complicated. So I think it is rather inappropriate as a
general approach.
-Robert
> (u64)(-left) & x86_pmu.cntval_mask);
>
> perf_event_update_userpage(event);
--
Advanced Micro Devices, Inc.
Operating System Research Center
email: robert.richter@amd.com
* Re: [RFC] perf, x86: Segregate PMU workarounds into x86_pmu_quirk_ops structure
2010-05-31 13:00 ` Robert Richter
@ 2010-05-31 13:43 ` Peter Zijlstra
2010-05-31 14:09 ` Cyrill Gorcunov
1 sibling, 0 replies; 7+ messages in thread
From: Peter Zijlstra @ 2010-05-31 13:43 UTC (permalink / raw)
To: Robert Richter
Cc: Cyrill Gorcunov, Ingo Molnar, Frédéric Weisbecker,
Arnaldo Carvalho de Melo, LKML
On Mon, 2010-05-31 at 15:00 +0200, Robert Richter wrote:
>
> The quirk functions add additional code and ops structures to the
> already existing model-specific code. These quirks would be fine if we
> could merge the model-specific code and get unified code. But this
> model-specific code cannot be replaced. So I would rather
> implement CPU errata in model-specific code.
The reason the existing quirk() function is called after *_pmu_init() is
because of the funny way it all writes to the console.
* Re: [RFC] perf, x86: Segregate PMU workarounds into x86_pmu_quirk_ops structure
2010-05-31 13:00 ` Robert Richter
2010-05-31 13:43 ` Peter Zijlstra
@ 2010-05-31 14:09 ` Cyrill Gorcunov
1 sibling, 0 replies; 7+ messages in thread
From: Cyrill Gorcunov @ 2010-05-31 14:09 UTC (permalink / raw)
To: Robert Richter
Cc: Peter Zijlstra, Ingo Molnar, Frédéric Weisbecker,
Arnaldo Carvalho de Melo, LKML
On Mon, May 31, 2010 at 03:00:58PM +0200, Robert Richter wrote:
> On 29.05.10 14:24:09, Cyrill Gorcunov wrote:
> > Hi,
> >
> > I would appreciate comments/complaints on the following patch. The idea is to implement
> > PMU quirks with minimal impact. At the moment two quirks are addressed:
> > PEBS disabling on Clovertown and the P4 performance counter double write.
> > The PEBS disabling was already there; it is only moved into x86_pmu_quirk_ops. Note
> > that I intentionally did not use a pointer to the structure but embedded it into
> > x86_pmu; if the structure grows we will need to switch to a pointer.
>
> The quirk functions add additional code and ops structures to the
> already existing model-specific code. These quirks would be fine if we
> could merge the model-specific code and get unified code. But this
> model-specific code cannot be replaced. So I would rather
> implement CPU errata in model-specific code.
>
agreed, but these quirk ops looked like the cleaner solution to me,
at least for the sake of initiating the 'find something better' dialogue,
you know :)
> > @@ -185,6 +185,11 @@ union perf_capabilities {
> > u64 capabilities;
> > };
> >
> > +struct x86_pmu_quirk_ops {
> > + void (*pmu_init)(void);
>
> This init quirk could be much better handled in the model specific
> init code (intel_pmu_init()/amd_pmu_init()). I don't see a reason for
> adding the quirk first and then immediately calling it. The quirk
> function could be called directly instead.
>
well, at the moment we have only one caller, but if the number of callers
increases it is _better_ to have it called via a function pointer, since
that makes it easier to find the callers later, and we are allowed to use
such an approach since this is not fast-path code. On the other hand I
rather agree with you -- it's overzealous at the moment.
/me: dropping the idea of *pmu_init
> > + void (*perfctr_write)(unsigned long addr, u64 value);
>
> This one is difficult to avoid ...
>
unfortunately
> > @@ -924,7 +930,11 @@ x86_perf_event_set_period(struct perf_ev
> > */
> > atomic64_set(&hwc->prev_count, (u64)-left);
> >
> > - wrmsrl(hwc->event_base + idx,
> > + if (x86_pmu.quirks.perfctr_write)
> > + x86_pmu.quirks.perfctr_write(hwc->event_base + idx,
> > + (u64)(-left) & x86_pmu.cntval_mask);
> > + else
> > + wrmsrl(hwc->event_base + idx,
>
> > ... but it introduces another check in the fast path. There are some
> > options to avoid this. First, we could see whether we should rather
> > implement this in the model-specific interrupt handlers (there is
> > p4_pmu_handle_irq()).
no, we can't; I thought about that, but this code is called from several
places.
> Or we could implement an optimized check for perf quirks, maybe using
> ALTERNATIVE or jump labels.
>
yes! Robert, I completely forgot about alternatives. I guess that
is exactly what we need! I'll try to implement it.
> I think we can handle both quirks, but if we start using and extending
> this mechanism more, it will have a performance impact and the code will
> also become more complicated. So I think it is rather inappropriate as a
> general approach.
>
> -Robert
>
Thanks a lot for the comments, Robert!
> > (u64)(-left) & x86_pmu.cntval_mask);
> >
> > perf_event_update_userpage(event);
>
> --
> Advanced Micro Devices, Inc.
> Operating System Research Center
> email: robert.richter@amd.com
>
-- Cyrill
* Re: [RFC] perf, x86: Segregate PMU workarounds into x86_pmu_quirk_ops structure
2010-05-29 18:33 ` Peter Zijlstra
@ 2010-05-31 16:45 ` Cyrill Gorcunov
2010-06-01 21:25 ` Cyrill Gorcunov
0 siblings, 1 reply; 7+ messages in thread
From: Cyrill Gorcunov @ 2010-05-31 16:45 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Ingo Molnar, Frédéric Weisbecker, Robert Richter,
Arnaldo Carvalho de Melo, LKML, Stephane Eranian
On Sat, May 29, 2010 at 08:33:10PM +0200, Peter Zijlstra wrote:
>
> On Sat, 2010-05-29 at 22:24 +0400, Cyrill Gorcunov wrote:
> > @@ -924,7 +930,11 @@ x86_perf_event_set_period(struct perf_ev
> > */
> > atomic64_set(&hwc->prev_count, (u64)-left);
> >
> > - wrmsrl(hwc->event_base + idx,
> > + if (x86_pmu.quirks.perfctr_write)
> > + x86_pmu.quirks.perfctr_write(hwc->event_base + idx,
> > + (u64)(-left) & x86_pmu.cntval_mask);
> > + else
> > + wrmsrl(hwc->event_base + idx,
> > (u64)(-left) & x86_pmu.cntval_mask);
>
> This bit is rather ugly,.. not quite sure how to clean it up though.
> Anybody got a bright idea?
>
Yes, I know; the only slightly lighter solution could be something like the
patch below. Alternative instructions bring a mess (and considering we may
have paravirt turned on -- even more mess). As for jump labels, I didn't
find them in the tree; in which file(s) are they? I mean, are they under
review now or merged somewhere?
So I guess a plain test may be more or less fine here, hmm?
-- Cyrill
---
perf, x86: Make a second write to the performance counter when needed
On Netburst CPUs we need a second write to the performance counter to
be sure it is updated properly.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
---
arch/x86/kernel/cpu/perf_event.c | 10 ++++++++++
arch/x86/kernel/cpu/perf_event_p4.c | 9 +++++++++
2 files changed, 19 insertions(+)
Index: linux-2.6.git/arch/x86/kernel/cpu/perf_event.c
=====================================================================
--- linux-2.6.git.orig/arch/x86/kernel/cpu/perf_event.c
+++ linux-2.6.git/arch/x86/kernel/cpu/perf_event.c
@@ -220,6 +220,7 @@ struct x86_pmu {
struct perf_event *event);
struct event_constraint *event_constraints;
void (*quirks)(void);
+ int perfctr_second_write;
int (*cpu_prepare)(int cpu);
void (*cpu_starting)(int cpu);
@@ -926,6 +927,15 @@ x86_perf_event_set_period(struct perf_ev
atomic64_set(&hwc->prev_count, (u64)-left);
wrmsrl(hwc->event_base + idx,
+ (u64)(-left) & x86_pmu.cntval_mask);
+
+ /*
+ * Due to an erratum on certain CPUs we need
+ * a second write to be sure the register
+ * is updated properly
+ */
+ if (x86_pmu.perfctr_second_write)
+ wrmsrl(hwc->event_base + idx,
(u64)(-left) & x86_pmu.cntval_mask);
perf_event_update_userpage(event);
Index: linux-2.6.git/arch/x86/kernel/cpu/perf_event_p4.c
=====================================================================
--- linux-2.6.git.orig/arch/x86/kernel/cpu/perf_event_p4.c
+++ linux-2.6.git/arch/x86/kernel/cpu/perf_event_p4.c
@@ -829,6 +829,15 @@ static __initconst const struct x86_pmu
.max_period = (1ULL << 39) - 1,
.hw_config = p4_hw_config,
.schedule_events = p4_pmu_schedule_events,
+ /*
+ * This handles erratum N15 in Intel doc 249199-029:
+ * the counter may not be updated correctly on write,
+ * so we need a second write operation to do the trick
+ * (the official workaround didn't work)
+ *
+ * the former idea is taken from the OProfile code
+ */
+ .perfctr_second_write = 1,
};
static __init int p4_pmu_init(void)
* Re: [RFC] perf, x86: Segregate PMU workarounds into x86_pmu_quirk_ops structure
2010-05-31 16:45 ` Cyrill Gorcunov
@ 2010-06-01 21:25 ` Cyrill Gorcunov
0 siblings, 0 replies; 7+ messages in thread
From: Cyrill Gorcunov @ 2010-06-01 21:25 UTC (permalink / raw)
To: Peter Zijlstra, Robert Richter
Cc: Ingo Molnar, Frédéric Weisbecker,
Arnaldo Carvalho de Melo, LKML, Stephane Eranian
On Mon, May 31, 2010 at 08:45:53PM +0400, Cyrill Gorcunov wrote:
...
>
> So I guess a plain test may be more or less fine here, hmm?
>
> -- Cyrill
> ---
> perf, x86: Make a second write to the performance counter when needed
>
> On Netburst CPUs we need a second write to the performance counter to
> be sure it is updated properly.
>
> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
> ---
> arch/x86/kernel/cpu/perf_event.c | 10 ++++++++++
> arch/x86/kernel/cpu/perf_event_p4.c | 9 +++++++++
> 2 files changed, 19 insertions(+)
>
...
Ping? Any objections/ideas on this approach?
-- Cyrill