* register_timer_hook use in arch/sh/oprofile
@ 2009-06-24 11:11 Martin Schwidefsky
2009-06-24 11:29 ` Paul Mundt
0 siblings, 1 reply; 19+ messages in thread
From: Martin Schwidefsky @ 2009-06-24 11:11 UTC (permalink / raw)
To: lethal; +Cc: Ingo Molnar, linux-kernel
Hi Paul,
following Ingo's suggestion I'm working on a patch to use a per-cpu
hrtimer instead of the timer_hook for the oprofile ticks. Given that
oprofile is the only user of the timer_hook I want to remove it
completely. That way I came across the register_timer_hook call in
arch/sh/oprofile/op_model_sh7750.c. Did this ever work? The startup
of oprofile is done in two steps, on module load oprofile_init is
called, on profiling start oprofile_start is called.
Now for SH7750 this happens:
oprofile_init()
...
oprofile_arch_init(&oprofile_ops);
...
lmodel->init();
/* init() is sh7750_ppc_init for CPU_SH7750/CPU_7750S */
sh7750_ppc_reset();
register_timer_hook(sh7750_timer_notify);
...
...
oprofile_timer_init(&oprofile_ops);
...
ops->start = timer_start;
...
...
oprofile_start()
...
oprofile_ops.start();
/* start() is timer_start set by oprofile_timer_init */
register_timer_hook(timer_notify);
...
As far as I can see the second register_timer_hook will fail
because sh7750_timer_notify is already registered. That will cause
oprofile_start to fail with -EBUSY, no?
My proposed solution for that particular problem would be to remove the
sh7750_timer_notify function together with the register/unregister
calls, see patch below. Then the sole caller of register_timer_hook
would be in drivers/oprofile/timer_int.c and after that has been
converted to hrtimer the whole timer_hook stuff could go away.
---
diff -urpN linux-2.6/arch/sh/oprofile/op_model_sh7750.c linux-2.6-patched/arch/sh/oprofile/op_model_sh7750.c
--- linux-2.6/arch/sh/oprofile/op_model_sh7750.c 2009-06-10 05:05:27.000000000 +0200
+++ linux-2.6-patched/arch/sh/oprofile/op_model_sh7750.c 2009-06-24 13:07:08.000000000 +0200
@@ -95,12 +95,6 @@ static struct sh7750_ppc_register_config
* behavior.
*/
-static int sh7750_timer_notify(struct pt_regs *regs)
-{
- oprofile_add_sample(regs, 0);
- return 0;
-}
-
static u64 sh7750_read_counter(int counter)
{
return (u64)((u64)(__raw_readl(PMCTRH(counter)) & 0xffff) << 32) |
@@ -231,14 +225,11 @@ static inline void sh7750_ppc_reset(void
static int sh7750_ppc_init(void)
{
sh7750_ppc_reset();
-
- return register_timer_hook(sh7750_timer_notify);
+ return 0;
}
static void sh7750_ppc_exit(void)
{
- unregister_timer_hook(sh7750_timer_notify);
-
sh7750_ppc_reset();
}
---
--
blue skies,
Martin.
"Reality continues to ruin my life." - Calvin.
^ permalink raw reply [flat|nested] 19+ messages in thread* Re: register_timer_hook use in arch/sh/oprofile 2009-06-24 11:11 register_timer_hook use in arch/sh/oprofile Martin Schwidefsky @ 2009-06-24 11:29 ` Paul Mundt 2009-06-24 12:17 ` Ingo Molnar 2009-06-24 12:28 ` Martin Schwidefsky 0 siblings, 2 replies; 19+ messages in thread From: Paul Mundt @ 2009-06-24 11:29 UTC (permalink / raw) To: Martin Schwidefsky; +Cc: Ingo Molnar, linux-kernel On Wed, Jun 24, 2009 at 01:11:04PM +0200, Martin Schwidefsky wrote: > Hi Paul, > following Ingo's suggestion I'm working on a patch to use a per-cpu > hrtimer instead of the timer_hook for the oprofile ticks. Given that > oprofile is the only user of the timer_hook I want to remove it > completely. That way I came across the register_timer_hook call in > arch/sh/oprofile/op_model_sh7750.c. Did this ever work? The startup > of oprofile is done in two steps, on module load oprofile_init is > called, on profiling start oprofile_start is called. > Now for SH7750 this happens: > > oprofile_init() > ... > oprofile_arch_init(&oprofile_ops); > ... > lmodel->init(); > /* init() is sh7750_ppc_init for CPU_SH7750/CPU_7750S */ > sh7750_ppc_reset(); > register_timer_hook(sh7750_timer_notify); > ... > ... > oprofile_timer_init(&oprofile_ops); > ... > ops->start = timer_start; > ... > ... > > oprofile_start() > ... > oprofile_ops.start(); > /* start() is timer_start set by oprofile_timer_init */ > register_timer_hook(timer_notify); > ... > > As far as I can see the second register_timer_hook will fail > because sh7750_timer_notify is already registered. That will cause > oprofile_start to fail with -EBUSY, no? > No. oprofile_timer_init() is only entered if the performance counters fail to register in the SH7750 case, so there is only one timer hook user at a time: static int __init oprofile_init(void) { int err; err = oprofile_arch_init(&oprofile_ops); if (err < 0 || timer) { printk(KERN_INFO "oprofile: using timer interrupt.\n"); oprofile_timer_init(&oprofile_ops); } ... The sh7750 performance counter code has always purposely abused the timer interrupt as its profiling interrupt given that the counters themselves do not generate IRQs on their own. There are a couple of 64-bit counters that can count various events, but have no real event associated with them. As long as they are periodically read, they return accurate statistics, but the granularity is never better than the timer IRQ. In practice oprofile has never been a good fit for these sorts of counters, so this has fairly limited use. If there's a way to wiggle these types of counters in to the new perf_counter API, then I'll convert that over and just kill the old oprofile driver off completely. Barring that, I'll just end up converting it over to hrtimers as well, so don't let that stop you from ripping out the timer hook bits. Most of this code predates hrtimers anyways, and it also predates the timer hook, which is only something that we converted to some years back. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: register_timer_hook use in arch/sh/oprofile 2009-06-24 11:29 ` Paul Mundt @ 2009-06-24 12:17 ` Ingo Molnar 2009-06-24 12:25 ` Paul Mundt 2009-06-24 12:28 ` Martin Schwidefsky 1 sibling, 1 reply; 19+ messages in thread From: Ingo Molnar @ 2009-06-24 12:17 UTC (permalink / raw) To: Paul Mundt, Martin Schwidefsky, linux-kernel, Peter Zijlstra, Paul Mackerras * Paul Mundt <lethal@linux-sh.org> wrote: > In practice oprofile has never been a good fit for these sorts of > counters, so this has fairly limited use. If there's a way to > wiggle these types of counters in to the new perf_counter API, > then I'll convert that over and just kill the old oprofile driver > off completely. Barring that, I'll just end up converting it over > to hrtimers as well, so don't let that stop you from ripping out > the timer hook bits. > > Most of this code predates hrtimers anyways, and it also predates > the timer hook, which is only something that we converted to some > years back. Note, the current initial upstream SH support for perfcounters: arch/sh/include/asm/perf_counter.h:#define set_perf_counter_pending() do { } while (0) arch/sh/include/asm/unistd_32.h:#define __NR_perf_counter_open 336 arch/sh/include/asm/unistd_64.h:#define __NR_perf_counter_open 364 arch/sh/kernel/syscalls_32.S: .long sys_perf_counter_open arch/sh/kernel/syscalls_64.S: .long sys_perf_counter_open Should already give you hrtimers straight away. To test it, could you try to run 'perf top' after: cd tools/perf/ make install It should display a hrtimer driven kernel profile already. You can increase/decrease the frequency of sampling by using -F option - say 'perf top -F 10000' should sample at 10 KHz. Please let me know if any of this does not work as expected. Ingo ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: register_timer_hook use in arch/sh/oprofile 2009-06-24 12:17 ` Ingo Molnar @ 2009-06-24 12:25 ` Paul Mundt 2009-06-24 12:37 ` Ingo Molnar 2009-06-24 12:40 ` Paul Mackerras 0 siblings, 2 replies; 19+ messages in thread From: Paul Mundt @ 2009-06-24 12:25 UTC (permalink / raw) To: Ingo Molnar Cc: Martin Schwidefsky, linux-kernel, Peter Zijlstra, Paul Mackerras On Wed, Jun 24, 2009 at 02:17:50PM +0200, Ingo Molnar wrote: > > * Paul Mundt <lethal@linux-sh.org> wrote: > > > In practice oprofile has never been a good fit for these sorts of > > counters, so this has fairly limited use. If there's a way to > > wiggle these types of counters in to the new perf_counter API, > > then I'll convert that over and just kill the old oprofile driver > > off completely. Barring that, I'll just end up converting it over > > to hrtimers as well, so don't let that stop you from ripping out > > the timer hook bits. > > > > Most of this code predates hrtimers anyways, and it also predates > > the timer hook, which is only something that we converted to some > > years back. > > Note, the current initial upstream SH support for perfcounters: > > arch/sh/include/asm/perf_counter.h:#define set_perf_counter_pending() do { } while (0) > arch/sh/include/asm/unistd_32.h:#define __NR_perf_counter_open 336 > arch/sh/include/asm/unistd_64.h:#define __NR_perf_counter_open 364 > arch/sh/kernel/syscalls_32.S: .long sys_perf_counter_open > arch/sh/kernel/syscalls_64.S: .long sys_perf_counter_open > > Should already give you hrtimers straight away. > > To test it, could you try to run 'perf top' after: > > cd tools/perf/ > make install > > It should display a hrtimer driven kernel profile already. You can > increase/decrease the frequency of sampling by using -F option - say > 'perf top -F 10000' should sample at 10 KHz. > > Please let me know if any of this does not work as expected. > Yes, that all works fine. My comment was more in reference to the hardware performance counters that don't have IRQs of their own, which still need to be tied in to the perf_counter API. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: register_timer_hook use in arch/sh/oprofile 2009-06-24 12:25 ` Paul Mundt @ 2009-06-24 12:37 ` Ingo Molnar 2009-06-24 12:40 ` Paul Mackerras 1 sibling, 0 replies; 19+ messages in thread From: Ingo Molnar @ 2009-06-24 12:37 UTC (permalink / raw) To: Paul Mundt, Martin Schwidefsky, linux-kernel, Peter Zijlstra, Paul Mackerras * Paul Mundt <lethal@linux-sh.org> wrote: > On Wed, Jun 24, 2009 at 02:17:50PM +0200, Ingo Molnar wrote: > > > > * Paul Mundt <lethal@linux-sh.org> wrote: > > > > > In practice oprofile has never been a good fit for these sorts of > > > counters, so this has fairly limited use. If there's a way to > > > wiggle these types of counters in to the new perf_counter API, > > > then I'll convert that over and just kill the old oprofile driver > > > off completely. Barring that, I'll just end up converting it over > > > to hrtimers as well, so don't let that stop you from ripping out > > > the timer hook bits. > > > > > > Most of this code predates hrtimers anyways, and it also predates > > > the timer hook, which is only something that we converted to some > > > years back. > > > > Note, the current initial upstream SH support for perfcounters: > > > > arch/sh/include/asm/perf_counter.h:#define set_perf_counter_pending() do { } while (0) > > arch/sh/include/asm/unistd_32.h:#define __NR_perf_counter_open 336 > > arch/sh/include/asm/unistd_64.h:#define __NR_perf_counter_open 364 > > arch/sh/kernel/syscalls_32.S: .long sys_perf_counter_open > > arch/sh/kernel/syscalls_64.S: .long sys_perf_counter_open > > > > Should already give you hrtimers straight away. > > > > To test it, could you try to run 'perf top' after: > > > > cd tools/perf/ > > make install > > > > It should display a hrtimer driven kernel profile already. You can > > increase/decrease the frequency of sampling by using -F option - say > > 'perf top -F 10000' should sample at 10 KHz. > > > > Please let me know if any of this does not work as expected. > > Yes, that all works fine. [...] Great! :) > [...] My comment was more in reference to the hardware performance > counters that don't have IRQs of their own, which still need to be > tied in to the perf_counter API. Yeah. Note that you dont have to implement explicit interrupts support for that (especially if it's non-existent in the hardware): just implement the enable/disable and read methods, and then you can sample based on that counter by using it together in a counter-group with a sampling software-counter. Each software-counter IRQ (hrtimer driven) will cause a sample of the hw counter to be emitted too. This would work here and today. A step further would be to librarize this in kernel/perf_counter.c and allow architectures to offer such 'hw backed' counters to user-space as a single item, under the generic enumeration: PERF_COUNT_HW_CPU_CYCLES = 0, PERF_COUNT_HW_INSTRUCTIONS = 1, PERF_COUNT_HW_CACHE_REFERENCES = 2, PERF_COUNT_HW_CACHE_MISSES = 3, PERF_COUNT_HW_BRANCH_INSTRUCTIONS = 4, PERF_COUNT_HW_BRANCH_MISSES = 5, Note that with the latest tools it does not skew the results or profiles if the 'period metric' is not the hardware counter itself, but an independent hrtimer. All the tooling/reporting (perf top and perf report) is using event weights, so the period is an invariant. (and especially with self-tuning auto-freq counters the period is never truly constant anyway) So for all tooling and analysis purposes such counters would be fully equivalent to 'real' hw counters that can generate interrupts. Ingo ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: register_timer_hook use in arch/sh/oprofile 2009-06-24 12:25 ` Paul Mundt 2009-06-24 12:37 ` Ingo Molnar @ 2009-06-24 12:40 ` Paul Mackerras 2009-06-24 12:47 ` Ingo Molnar 1 sibling, 1 reply; 19+ messages in thread From: Paul Mackerras @ 2009-06-24 12:40 UTC (permalink / raw) To: Paul Mundt; +Cc: Ingo Molnar, Martin Schwidefsky, linux-kernel, Peter Zijlstra Paul Mundt writes: > Yes, that all works fine. My comment was more in reference to the > hardware performance counters that don't have IRQs of their own, which > still need to be tied in to the perf_counter API. For them you would need to check in hw_perf_counter_init() that counter->attr.sample_period is zero, and fail with an -EINVAL error if it isn't. Paul. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: register_timer_hook use in arch/sh/oprofile 2009-06-24 12:40 ` Paul Mackerras @ 2009-06-24 12:47 ` Ingo Molnar 0 siblings, 0 replies; 19+ messages in thread From: Ingo Molnar @ 2009-06-24 12:47 UTC (permalink / raw) To: Paul Mackerras Cc: Paul Mundt, Martin Schwidefsky, linux-kernel, Peter Zijlstra * Paul Mackerras <paulus@samba.org> wrote: > Paul Mundt writes: > > > Yes, that all works fine. My comment was more in reference to > > the hardware performance counters that don't have IRQs of their > > own, which still need to be tied in to the perf_counter API. > > For them you would need to check in hw_perf_counter_init() that > counter->attr.sample_period is zero, and fail with an -EINVAL > error if it isn't. That would make 'perf stat' work - but 'perf top' and 'perf record' would not. But those can be made to work as well without hw IRQ support, if the period is simply fed into a special hrtimer, with nanosecs units. The resulting sampling wont be "constant period" - but none of the tools really mind about that. Ingo ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: register_timer_hook use in arch/sh/oprofile 2009-06-24 11:29 ` Paul Mundt 2009-06-24 12:17 ` Ingo Molnar @ 2009-06-24 12:28 ` Martin Schwidefsky 2009-06-24 12:34 ` Paul Mundt 1 sibling, 1 reply; 19+ messages in thread From: Martin Schwidefsky @ 2009-06-24 12:28 UTC (permalink / raw) To: Paul Mundt; +Cc: Ingo Molnar, linux-kernel On Wed, 24 Jun 2009 20:29:29 +0900 Paul Mundt <lethal@linux-sh.org> wrote: > On Wed, Jun 24, 2009 at 01:11:04PM +0200, Martin Schwidefsky wrote: > > Hi Paul, > > following Ingo's suggestion I'm working on a patch to use a per-cpu > > hrtimer instead of the timer_hook for the oprofile ticks. Given that > > oprofile is the only user of the timer_hook I want to remove it > > completely. That way I came across the register_timer_hook call in > > arch/sh/oprofile/op_model_sh7750.c. Did this ever work? The startup > > of oprofile is done in two steps, on module load oprofile_init is > > called, on profiling start oprofile_start is called. > > Now for SH7750 this happens: > > > > oprofile_init() > > ... > > oprofile_arch_init(&oprofile_ops); > > ... > > lmodel->init(); > > /* init() is sh7750_ppc_init for CPU_SH7750/CPU_7750S */ > > sh7750_ppc_reset(); > > register_timer_hook(sh7750_timer_notify); > > ... > > ... > > oprofile_timer_init(&oprofile_ops); > > ... > > ops->start = timer_start; > > ... > > ... > > > > oprofile_start() > > ... > > oprofile_ops.start(); > > /* start() is timer_start set by oprofile_timer_init */ > > register_timer_hook(timer_notify); > > ... > > > > As far as I can see the second register_timer_hook will fail > > because sh7750_timer_notify is already registered. That will cause > > oprofile_start to fail with -EBUSY, no? > > > No. oprofile_timer_init() is only entered if the performance counters > fail to register in the SH7750 case, so there is only one timer hook user > at a time: > > static int __init oprofile_init(void) > { > int err; > > err = oprofile_arch_init(&oprofile_ops); > > if (err < 0 || timer) { > printk(KERN_INFO "oprofile: using timer interrupt.\n"); > oprofile_timer_init(&oprofile_ops); > } > ... Oh, I see. That is the reason why the s390 version of oprofile_arch_init returns -ENODEV. It does so to trigger the fallback to the timer_hook. That should work for sh as well, no? > The sh7750 performance counter code has always purposely abused the timer > interrupt as its profiling interrupt given that the counters themselves do not > generate IRQs on their own. There are a couple of 64-bit counters that can > count various events, but have no real event associated with them. As long as > they are periodically read, they return accurate statistics, but the granularity > is never better than the timer IRQ. > > In practice oprofile has never been a good fit for these sorts of counters, so > this has fairly limited use. If there's a way to wiggle these types of counters > in to the new perf_counter API, then I'll convert that over and just kill the > old oprofile driver off completely. Barring that, I'll just end up converting it > over to hrtimers as well, so don't let that stop you from ripping out the timer > hook bits. > > Most of this code predates hrtimers anyways, and it also predates the timer > hook, which is only something that we converted to some years back. Consider the timer_hook as history :-) -- blue skies, Martin. "Reality continues to ruin my life." - Calvin. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: register_timer_hook use in arch/sh/oprofile 2009-06-24 12:28 ` Martin Schwidefsky @ 2009-06-24 12:34 ` Paul Mundt 2009-06-24 12:45 ` Ingo Molnar 2009-06-24 12:54 ` Martin Schwidefsky 0 siblings, 2 replies; 19+ messages in thread From: Paul Mundt @ 2009-06-24 12:34 UTC (permalink / raw) To: Martin Schwidefsky; +Cc: Ingo Molnar, linux-kernel On Wed, Jun 24, 2009 at 02:28:28PM +0200, Martin Schwidefsky wrote: > On Wed, 24 Jun 2009 20:29:29 +0900 > Paul Mundt <lethal@linux-sh.org> wrote: > > No. oprofile_timer_init() is only entered if the performance counters > > fail to register in the SH7750 case, so there is only one timer hook user > > at a time: > > > > static int __init oprofile_init(void) > > { > > int err; > > > > err = oprofile_arch_init(&oprofile_ops); > > > > if (err < 0 || timer) { > > printk(KERN_INFO "oprofile: using timer interrupt.\n"); > > oprofile_timer_init(&oprofile_ops); > > } > > ... > > Oh, I see. That is the reason why the s390 version of > oprofile_arch_init returns -ENODEV. It does so to trigger the fallback > to the timer_hook. That should work for sh as well, no? > It would, yes, but it would also disable access to the SH7750 counters at the same time, so we don't really want to do that. The sh7750 counters are more like timer based profiling with some extra events that can be set and read, so reverting to oprofile_timer_init() would reduce functionality. My current plan is to migrate things over to the perf_counter API and annoy Ingo with my interrupt deprived counters ;-) Given that hrtimers are already generically supported there, it should tie in much cleaner there than in the oprofile case at least. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: register_timer_hook use in arch/sh/oprofile 2009-06-24 12:34 ` Paul Mundt @ 2009-06-24 12:45 ` Ingo Molnar 2009-06-24 13:14 ` Paul Mundt 2009-06-24 12:54 ` Martin Schwidefsky 1 sibling, 1 reply; 19+ messages in thread From: Ingo Molnar @ 2009-06-24 12:45 UTC (permalink / raw) To: Paul Mundt, Martin Schwidefsky, linux-kernel * Paul Mundt <lethal@linux-sh.org> wrote: > My current plan is to migrate things over to the perf_counter API > and annoy Ingo with my interrupt deprived counters ;-) Please do :-) I just wrote a longer explanation about them: i think they can be made full-blown hardware counters, which in the end would be basically just as capable as 'real' IRQ capable counters. The main complication that such counters bring is that in their 'own metric' they do interrupt periods in an 'irregular' way (because they interrupt in the nanosec metric - being hrtimers) - but both the tools can deal with uneven periods just fine and the auto-freq code can auto-balance based on this just fine too. So you should be able to implement this within arch/sh/ with existing perf_counter facilities and hrtimers, or you can help us librarize it in kernel/perf_counter.c some more, to minimize architecture code. [ Of course, given that you are the first architecture to do this, you would be fair to expect some unexpected details along the way as well ;-) ] Ingo ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: register_timer_hook use in arch/sh/oprofile 2009-06-24 12:45 ` Ingo Molnar @ 2009-06-24 13:14 ` Paul Mundt 2009-06-24 13:24 ` Ingo Molnar 0 siblings, 1 reply; 19+ messages in thread From: Paul Mundt @ 2009-06-24 13:14 UTC (permalink / raw) To: Ingo Molnar; +Cc: Martin Schwidefsky, linux-kernel On Wed, Jun 24, 2009 at 02:45:42PM +0200, Ingo Molnar wrote: > > * Paul Mundt <lethal@linux-sh.org> wrote: > > > My current plan is to migrate things over to the perf_counter API > > and annoy Ingo with my interrupt deprived counters ;-) > > Please do :-) > > I just wrote a longer explanation about them: i think they can be > made full-blown hardware counters, which in the end would be > basically just as capable as 'real' IRQ capable counters. > Thanks for the hints, now that all of my -rc1 stuff is out of the way, I've got some spare cycles to start working on the hardware counters. The lack of design awareness for these sorts of use cases has been one of the biggest hassles of trying to deal with oprofile and the various incarnations of perfmon and so on in the past, but perf_counter certainly looks like a good way forward here. > The main complication that such counters bring is that in their 'own > metric' they do interrupt periods in an 'irregular' way (because > they interrupt in the nanosec metric - being hrtimers) - but both > the tools can deal with uneven periods just fine and the auto-freq > code can auto-balance based on this just fine too. > How we have mostly handled these counters in the past have been for long-term workload analysis. A given user wants to measure certain events across the lifetime of their workload to get an idea of where the hot spots are, and then send us numbers later. 64-bit hardware counters give them quite a bit of time to accumulate results, and in these cases precision is not so important as continuous non-interactive monitoring. On the other hand trying to beat these sorts of counters in to a framework where they can leverage existing userspace tools has never really worked as well as it could have, and this too is something that perf_counter seems well suited to accomodate. I knew quite a few other architectures that also have similar counters, so having these tie in cleanly in a generic way will make these a lot more useful, especially since in the past people have either just thrown them at sysfs or just ignored them completely. > So you should be able to implement this within arch/sh/ with > existing perf_counter facilities and hrtimers, or you can help us > librarize it in kernel/perf_counter.c some more, to minimize > architecture code. > Architecturally we have two fairly different counter types that follow a similar design, so they will need separate drivers regardless. I don't have much interest in hiding generic stuff in arch/sh/, especially given that we aren't the only ones with these types of counters, so I'll certainly hack on kernel/perf_counter.c as necessary to get things moving along. I expect between the two types of SH counters we have to handle there will already be quite a bit of variation. > [ Of course, given that you are the first architecture to do this, > you would be fair to expect some unexpected details along the way > as well ;-) ] > That tends to be the way things go more often than not. In any event, it's certainly worth spending time on getting right, and I would rather spend my time being the guinea pig for evolving in-tree code than not having the counters be useful at all. ;-) ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: register_timer_hook use in arch/sh/oprofile 2009-06-24 13:14 ` Paul Mundt @ 2009-06-24 13:24 ` Ingo Molnar 2009-06-26 7:01 ` Peter Zijlstra 0 siblings, 1 reply; 19+ messages in thread From: Ingo Molnar @ 2009-06-24 13:24 UTC (permalink / raw) To: Paul Mundt, Martin Schwidefsky, linux-kernel * Paul Mundt <lethal@linux-sh.org> wrote: > > The main complication that such counters bring is that in their > > 'own metric' they do interrupt periods in an 'irregular' way > > (because they interrupt in the nanosec metric - being hrtimers) > > - but both the tools can deal with uneven periods just fine and > > the auto-freq code can auto-balance based on this just fine too. > > How we have mostly handled these counters in the past have been > for long-term workload analysis. A given user wants to measure > certain events across the lifetime of their workload to get an > idea of where the hot spots are, and then send us numbers later. > 64-bit hardware counters give them quite a bit of time to > accumulate results, and in these cases precision is not so > important as continuous non-interactive monitoring. Note that if i understand the constraints correctly, SH's counters will be _exactly_ as precise as x86 counters or PowerPC counters. Even if SH uses a hrtimer to drive the sampling of them. [*] The reason is that perfcounters are designed to be 'sampling frequency/precision invariant'. We deal with [delta_count,delta_time] pairs in the histograms, etc. and nothing assumes that periods are static. [ And long-term analysis ('perf stat' type of runs) dont need IRQs anyway - perfcounters reads outs the counts and summarizes them across the measured workload. ] Furthermore, -F (auto-freq) counters are by design variable-period and self-adjusting, even on x86 and PowerPC. [ Btw., we plan to switch over the tools to use a 10 KHz auto-freq sampling by default - as that gives us meaningful samples all the time, regardless of how rare the underlying hardware event is. ] Ingo [*] The only tradeoff is that you wont have NMI sampling. But if you have _some_ sort of NMI source in SH (regardless of whether it's the PMU that drives it), you can still expose that and drive your perfcounter sampling via that. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: register_timer_hook use in arch/sh/oprofile 2009-06-24 13:24 ` Ingo Molnar @ 2009-06-26 7:01 ` Peter Zijlstra 2009-06-26 7:23 ` Ingo Molnar 0 siblings, 1 reply; 19+ messages in thread From: Peter Zijlstra @ 2009-06-26 7:01 UTC (permalink / raw) To: Ingo Molnar; +Cc: Paul Mundt, Martin Schwidefsky, linux-kernel On Wed, 2009-06-24 at 15:24 +0200, Ingo Molnar wrote: > > [ And long-term analysis ('perf stat' type of runs) dont need IRQs > anyway - perfcounters reads outs the counts and summarizes them > across the measured workload. ] If the counter width is less than 64 bits we do need to have some interrupt to read them from before they cycle so we can accumulate the deltas into a proper u64. But for proper 64 bit hardware counters there is indeed no need for that. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: register_timer_hook use in arch/sh/oprofile 2009-06-26 7:01 ` Peter Zijlstra @ 2009-06-26 7:23 ` Ingo Molnar 2009-06-26 7:26 ` Paul Mundt 2009-06-26 7:36 ` Peter Zijlstra 0 siblings, 2 replies; 19+ messages in thread From: Ingo Molnar @ 2009-06-26 7:23 UTC (permalink / raw) To: Peter Zijlstra; +Cc: Paul Mundt, Martin Schwidefsky, linux-kernel * Peter Zijlstra <peterz@infradead.org> wrote: > On Wed, 2009-06-24 at 15:24 +0200, Ingo Molnar wrote: > > > > [ And long-term analysis ('perf stat' type of runs) dont need IRQs > > anyway - perfcounters reads outs the counts and summarizes them > > across the measured workload. ] > > If the counter width is less than 64 bits we do need to have some > interrupt to read them from before they cycle so we can accumulate > the deltas into a proper u64. Yeah - but even that can be driven from some housekeeping hrtimer. > But for proper 64 bit hardware counters there is indeed no need > for that. Indeed - although they are rare. Ingo ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: register_timer_hook use in arch/sh/oprofile 2009-06-26 7:23 ` Ingo Molnar @ 2009-06-26 7:26 ` Paul Mundt 2009-06-26 7:36 ` Peter Zijlstra 1 sibling, 0 replies; 19+ messages in thread From: Paul Mundt @ 2009-06-26 7:26 UTC (permalink / raw) To: Ingo Molnar; +Cc: Peter Zijlstra, Martin Schwidefsky, linux-kernel On Fri, Jun 26, 2009 at 09:23:50AM +0200, Ingo Molnar wrote: > > * Peter Zijlstra <peterz@infradead.org> wrote: > > > On Wed, 2009-06-24 at 15:24 +0200, Ingo Molnar wrote: > > > > > > [ And long-term analysis ('perf stat' type of runs) dont need IRQs > > > anyway - perfcounters reads outs the counts and summarizes them > > > across the measured workload. ] > > > > If the counter width is less than 64 bits we do need to have some > > interrupt to read them from before they cycle so we can accumulate > > the deltas into a proper u64. > > Yeah - but even that can be driven from some housekeeping hrtimer. > > > But for proper 64 bit hardware counters there is indeed no need > > for that. > > Indeed - although they are rare. > We have both full 64-bit counters as well as 48-bit, although they do both at least set an overflow bit that can be looked at, even if there is no exception associated with it. Optionally they can be split up in to multiple 32-bit counters and half of them handed over to the bus controller. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: register_timer_hook use in arch/sh/oprofile 2009-06-26 7:23 ` Ingo Molnar 2009-06-26 7:26 ` Paul Mundt @ 2009-06-26 7:36 ` Peter Zijlstra 1 sibling, 0 replies; 19+ messages in thread From: Peter Zijlstra @ 2009-06-26 7:36 UTC (permalink / raw) To: Ingo Molnar; +Cc: Paul Mundt, Martin Schwidefsky, linux-kernel On Fri, 2009-06-26 at 09:23 +0200, Ingo Molnar wrote: > * Peter Zijlstra <peterz@infradead.org> wrote: > > > On Wed, 2009-06-24 at 15:24 +0200, Ingo Molnar wrote: > > > > > > [ And long-term analysis ('perf stat' type of runs) dont need IRQs > > > anyway - perfcounters reads outs the counts and summarizes them > > > across the measured workload. ] > > > > If the counter width is less than 64 bits we do need to have some > > interrupt to read them from before they cycle so we can accumulate > > the deltas into a proper u64. > > Yeah - but even that can be driven from some housekeeping hrtimer. Oh absolutely, but its one of those things one should not forget to do. > > But for proper 64 bit hardware counters there is indeed no need > > for that. > > Indeed - although they are rare. Paul Mundt mentioned he had some. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: register_timer_hook use in arch/sh/oprofile 2009-06-24 12:34 ` Paul Mundt 2009-06-24 12:45 ` Ingo Molnar @ 2009-06-24 12:54 ` Martin Schwidefsky 2009-12-16 4:52 ` Paul Mundt 1 sibling, 1 reply; 19+ messages in thread From: Martin Schwidefsky @ 2009-06-24 12:54 UTC (permalink / raw) To: Paul Mundt; +Cc: Ingo Molnar, linux-kernel On Wed, 24 Jun 2009 21:34:16 +0900 Paul Mundt <lethal@linux-sh.org> wrote: > On Wed, Jun 24, 2009 at 02:28:28PM +0200, Martin Schwidefsky wrote: > > On Wed, 24 Jun 2009 20:29:29 +0900 > > Paul Mundt <lethal@linux-sh.org> wrote: > > > No. oprofile_timer_init() is only entered if the performance counters > > > fail to register in the SH7750 case, so there is only one timer hook user > > > at a time: > > > > > > static int __init oprofile_init(void) > > > { > > > int err; > > > > > > err = oprofile_arch_init(&oprofile_ops); > > > > > > if (err < 0 || timer) { > > > printk(KERN_INFO "oprofile: using timer interrupt.\n"); > > > oprofile_timer_init(&oprofile_ops); > > > } > > > ... > > > > Oh, I see. That is the reason why the s390 version of > > oprofile_arch_init returns -ENODEV. It does so to trigger the fallback > > to the timer_hook. That should work for sh as well, no? > > > It would, yes, but it would also disable access to the SH7750 counters at > the same time, so we don't really want to do that. The sh7750 counters > are more like timer based profiling with some extra events that can be > set and read, so reverting to oprofile_timer_init() would reduce > functionality. > > My current plan is to migrate things over to the perf_counter API and > annoy Ingo with my interrupt deprived counters ;-) > > Given that hrtimers are already generically supported there, it should > tie in much cleaner there than in the oprofile case at least. Ok, that makes sense. So I guess for now I should stop trying to get rid of the timer_hook and concentrate to convert the fallback code in timer_int.c to hrtimer. Then after sh is fully converted to the perf_counter API we can do the cleanup. -- blue skies, Martin. "Reality continues to ruin my life." - Calvin. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: register_timer_hook use in arch/sh/oprofile 2009-06-24 12:54 ` Martin Schwidefsky @ 2009-12-16 4:52 ` Paul Mundt 2009-12-16 7:21 ` Ingo Molnar 0 siblings, 1 reply; 19+ messages in thread From: Paul Mundt @ 2009-12-16 4:52 UTC (permalink / raw) To: Martin Schwidefsky; +Cc: Ingo Molnar, linux-kernel Hi Martin, On Wed, Jun 24, 2009 at 02:54:06PM +0200, Martin Schwidefsky wrote: > On Wed, 24 Jun 2009 21:34:16 +0900 > Paul Mundt <lethal@linux-sh.org> wrote: > > On Wed, Jun 24, 2009 at 02:28:28PM +0200, Martin Schwidefsky wrote: > > > On Wed, 24 Jun 2009 20:29:29 +0900 > > > Paul Mundt <lethal@linux-sh.org> wrote: > > > > No. oprofile_timer_init() is only entered if the performance counters > > > > fail to register in the SH7750 case, so there is only one timer hook user > > > > at a time: > > > > > > > > static int __init oprofile_init(void) > > > > { > > > > int err; > > > > > > > > err = oprofile_arch_init(&oprofile_ops); > > > > > > > > if (err < 0 || timer) { > > > > printk(KERN_INFO "oprofile: using timer interrupt.\n"); > > > > oprofile_timer_init(&oprofile_ops); > > > > } > > > > ... > > > > > > Oh, I see. That is the reason why the s390 version of > > > oprofile_arch_init returns -ENODEV. It does so to trigger the fallback > > > to the timer_hook. That should work for sh as well, no? > > > > > It would, yes, but it would also disable access to the SH7750 counters at > > the same time, so we don't really want to do that. The sh7750 counters > > are more like timer based profiling with some extra events that can be > > set and read, so reverting to oprofile_timer_init() would reduce > > functionality. > > > > My current plan is to migrate things over to the perf_counter API and > > annoy Ingo with my interrupt deprived counters ;-) > > > > Given that hrtimers are already generically supported there, it should > > tie in much cleaner there than in the oprofile case at least. > > Ok, that makes sense. So I guess for now I should stop trying to get > rid of the timer_hook and concentrate to convert the fallback code > in timer_int.c to hrtimer. Then after sh is fully converted to the > perf_counter API we can do the cleanup. > This is just a follow-up to let you know that the first-step transition to the perf_counter API is done. There's still more work to do, but we're already more functional on the perf_counter side than we ever were on oprofile. As such, I've subsequently killed off the bitrotted oprofile bits, which includes the timer hook references. There is now nothing remaining on the SH side blocking the timer hook removal. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: register_timer_hook use in arch/sh/oprofile 2009-12-16 4:52 ` Paul Mundt @ 2009-12-16 7:21 ` Ingo Molnar 0 siblings, 0 replies; 19+ messages in thread From: Ingo Molnar @ 2009-12-16 7:21 UTC (permalink / raw) To: Paul Mundt, Peter Zijlstra, Arnaldo Carvalho de Melo, =?unknown-8bit?B?RnLDqWTDqXJpYw==?= Weisbecker, Paul Mackerras Cc: Martin Schwidefsky, linux-kernel * Paul Mundt <lethal@linux-sh.org> wrote: > Hi Martin, > > On Wed, Jun 24, 2009 at 02:54:06PM +0200, Martin Schwidefsky wrote: > > On Wed, 24 Jun 2009 21:34:16 +0900 > > Paul Mundt <lethal@linux-sh.org> wrote: > > > On Wed, Jun 24, 2009 at 02:28:28PM +0200, Martin Schwidefsky wrote: > > > > On Wed, 24 Jun 2009 20:29:29 +0900 > > > > Paul Mundt <lethal@linux-sh.org> wrote: > > > > > No. oprofile_timer_init() is only entered if the performance counters > > > > > fail to register in the SH7750 case, so there is only one timer hook user > > > > > at a time: > > > > > > > > > > static int __init oprofile_init(void) > > > > > { > > > > > int err; > > > > > > > > > > err = oprofile_arch_init(&oprofile_ops); > > > > > > > > > > if (err < 0 || timer) { > > > > > printk(KERN_INFO "oprofile: using timer interrupt.\n"); > > > > > oprofile_timer_init(&oprofile_ops); > > > > > } > > > > > ... > > > > > > > > Oh, I see. That is the reason why the s390 version of > > > > oprofile_arch_init returns -ENODEV. It does so to trigger the fallback > > > > to the timer_hook. That should work for sh as well, no? > > > > > > > It would, yes, but it would also disable access to the SH7750 counters at > > > the same time, so we don't really want to do that. The sh7750 counters > > > are more like timer based profiling with some extra events that can be > > > set and read, so reverting to oprofile_timer_init() would reduce > > > functionality. > > > > > > My current plan is to migrate things over to the perf_counter API and > > > annoy Ingo with my interrupt deprived counters ;-) > > > > > > Given that hrtimers are already generically supported there, it should > > > tie in much cleaner there than in the oprofile case at least. > > > > Ok, that makes sense. So I guess for now I should stop trying to get > > rid of the timer_hook and concentrate to convert the fallback code > > in timer_int.c to hrtimer. Then after sh is fully converted to the > > perf_counter API we can do the cleanup. > > > > This is just a follow-up to let you know that the first-step transition to > the perf_counter API is done. There's still more work to do, but we're > already more functional on the perf_counter side than we ever were on > oprofile. [...] Nice to hear! Please let us know if you have any problems on the tools/perf/ side as well - in particular everyday usability. Even small details matter. Also, embedded-system usability would be of interest as well. Most of the perf tooling gets tested on x86 so it's quite possible that some aspects are not as slim as they could be. One key point of performance is the bona fide overhead of a default "perf record" profile recording session. On x86, it's roughly in the following range: aldebaran:~> perf stat --repeat 3 -e instructions ./loop_10b_instructions Performance counter stats for './loop_10b_instructions' (3 runs): 10009061648 instructions # 0.000 IPC ( +- 0.067% ) 2.407052637 seconds time elapsed ( +- 1.259% ) aldebaran:~> perf stat --repeat 3 -e instructions perf record -f ./loop_10b_instructions [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.073 MB perf.data (~3191 samples) ] [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.073 MB perf.data (~3183 samples) ] [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.073 MB perf.data (~3190 samples) ] Performance counter stats for 'perf record -f ./loop_10b_instructions' (3 runs): 10029183818 instructions # 0.000 IPC ( +- 0.004% ) 2.377064570 seconds time elapsed ( +- 0.215% ) I.e. 10029183818/10009061648 ~== 0.2% direct overhead. For something very system and task intense (such as hackbench) it can go up to 2.5%. Thanks, Ingo ^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2009-12-16 7:22 UTC | newest] Thread overview: 19+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2009-06-24 11:11 register_timer_hook use in arch/sh/oprofile Martin Schwidefsky 2009-06-24 11:29 ` Paul Mundt 2009-06-24 12:17 ` Ingo Molnar 2009-06-24 12:25 ` Paul Mundt 2009-06-24 12:37 ` Ingo Molnar 2009-06-24 12:40 ` Paul Mackerras 2009-06-24 12:47 ` Ingo Molnar 2009-06-24 12:28 ` Martin Schwidefsky 2009-06-24 12:34 ` Paul Mundt 2009-06-24 12:45 ` Ingo Molnar 2009-06-24 13:14 ` Paul Mundt 2009-06-24 13:24 ` Ingo Molnar 2009-06-26 7:01 ` Peter Zijlstra 2009-06-26 7:23 ` Ingo Molnar 2009-06-26 7:26 ` Paul Mundt 2009-06-26 7:36 ` Peter Zijlstra 2009-06-24 12:54 ` Martin Schwidefsky 2009-12-16 4:52 ` Paul Mundt 2009-12-16 7:21 ` Ingo Molnar
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox