* Attribute spinlock contention ticks to caller.
@ 2005-09-14 22:26 Robin Holt
2005-09-15 0:10 ` Keith Owens
` (15 more replies)
0 siblings, 16 replies; 17+ messages in thread
From: Robin Holt @ 2005-09-14 22:26 UTC (permalink / raw)
To: linux-ia64
On larger systems, ia64_spinlock_contention frequently shows up in
pfmon output. Determining whether it is one frequently contended lock or
numerous different locks is very difficult.
The following patch attributes the ticks received while in
ia64_spinlock_contention to the requestor of the lock.
Signed-off-by: Robin Holt <holt@sgi.com>
Index: linux-2.6/arch/ia64/kernel/head.S
===================================================================
--- linux-2.6.orig/arch/ia64/kernel/head.S 2005-09-09 18:06:16.000000000 -0500
+++ linux-2.6/arch/ia64/kernel/head.S 2005-09-14 14:38:03.946010176 -0500
@@ -1119,6 +1119,8 @@ GLOBAL_ENTRY(ia64_spinlock_contention)
(p14) br.cond.sptk.few .wait
br.ret.sptk.many b6 // lock is now taken
+ .global ia64_spinlock_contention_end // for determining if we are in ia64_spinlock_contention code.
+ia64_spinlock_contention_end:
END(ia64_spinlock_contention)
#endif
Index: linux-2.6/arch/ia64/kernel/ia64_ksyms.c
===================================================================
--- linux-2.6.orig/arch/ia64/kernel/ia64_ksyms.c 2005-09-01 09:30:56.000000000 -0500
+++ linux-2.6/arch/ia64/kernel/ia64_ksyms.c 2005-09-14 14:39:24.411493406 -0500
@@ -109,6 +109,8 @@ EXPORT_SYMBOL(unw_init_running);
*/
extern char ia64_spinlock_contention_pre3_4;
EXPORT_SYMBOL(ia64_spinlock_contention_pre3_4);
+extern char ia64_spinlock_contention_pre3_4_end;
+EXPORT_SYMBOL(ia64_spinlock_contention_pre3_4_end);
# else
/*
* This is not a normal routine and we don't want a function descriptor for it, so we use
@@ -116,6 +118,8 @@ EXPORT_SYMBOL(ia64_spinlock_contention_p
*/
extern char ia64_spinlock_contention;
EXPORT_SYMBOL(ia64_spinlock_contention);
+extern char ia64_spinlock_contention_end;
+EXPORT_SYMBOL(ia64_spinlock_contention_end);
# endif
# endif
#endif
Index: linux-2.6/arch/ia64/kernel/perfmon_default_smpl.c
===================================================================
--- linux-2.6.orig/arch/ia64/kernel/perfmon_default_smpl.c 2005-09-01 09:30:56.000000000 -0500
+++ linux-2.6/arch/ia64/kernel/perfmon_default_smpl.c 2005-09-14 14:43:26.445574995 -0500
@@ -99,6 +99,16 @@ default_init(struct task_struct *task, v
return 0;
}
+#ifdef CONFIG_SMP
+#if __GNUC__ < 3 || (__GNUC__ == 3 && __GNUC_MINOR__ < 3)
+extern char ia64_spinlock_contention_pre3_4[], ia64_spinlock_contention_pre3_4_end[];
+#define ia64_spinlock_contention ia64_spinlock_contention_pre3_4
+#define ia64_spinlock_contention_end ia64_spinlock_contention_pre3_4_end
+#else
+extern char ia64_spinlock_contention[], ia64_spinlock_contention_end[];
+#endif
+#endif
+
static int
default_handler(struct task_struct *task, void *buf, pfm_ovfl_arg_t *arg, struct pt_regs *regs, unsigned long stamp)
{
@@ -165,6 +175,12 @@ default_handler(struct task_struct *task
* where did the fault happen (includes slot number)
*/
ent->ip = regs->cr_iip | ((regs->cr_ipsr >> 41) & 0x3);
+#ifdef CONFIG_SMP
+ /* Fix up the ip for code in the spinlock contention path. */
+ if ((ent->ip >= (unsigned long)ia64_spinlock_contention) &&
+ (ent->ip < (unsigned long)ia64_spinlock_contention_end))
+ ent->ip = regs->b6;
+#endif
ent->tstamp = stamp;
ent->cpu = smp_processor_id();
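For reference, the ip fixup the hunk above adds can be sketched as standalone C. The bounds and b6 value below are plain parameters standing in for the exported contention symbols and pt_regs->b6:

```c
#include <assert.h>

/* Standalone sketch of the sample-ip fixup: ticks taken inside the
 * contention path are billed to the lock requestor, whose return
 * address the contention code keeps in b6.  In the kernel the bounds
 * come from the exported start/end symbols; here they are parameters. */
static unsigned long fixup_sample_ip(unsigned long ip,
                                     unsigned long contention_start,
                                     unsigned long contention_end,
                                     unsigned long b6)
{
    if (ip >= contention_start && ip < contention_end)
        return b6;   /* attribute the tick to the caller */
    return ip;       /* sample outside the contention path: keep ip */
}
```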
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Attribute spinlock contention ticks to caller.
2005-09-14 22:26 Attribute spinlock contention ticks to caller Robin Holt
@ 2005-09-15 0:10 ` Keith Owens
2005-09-15 6:34 ` Stephane Eranian
` (14 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: Keith Owens @ 2005-09-15 0:10 UTC (permalink / raw)
To: linux-ia64
On Wed, 14 Sep 2005 17:26:44 -0500,
Robin Holt <holt@sgi.com> wrote:
>
>On larger systems, ia64_spinlock_contention frequently shows up in
>pfmon output. Determining whether it is a frequently contended lock or
>numerous different locks is very difficult.
>
>The following patch attributes the ticks received while in
>ia64_spinlock_contention to the requestor of the lock.
>...
>@@ -165,6 +175,12 @@ default_handler(struct task_struct *task
> * where did the fault happen (includes slot number)
> */
> ent->ip = regs->cr_iip | ((regs->cr_ipsr >> 41) & 0x3);
>+#ifdef CONFIG_SMP
>+ /* Fix up the ip for code in the spinlock contention path. */
>+ if ((ent->ip >= (unsigned long)ia64_spinlock_contention) &&
>+ (ent->ip < (unsigned long)ia64_spinlock_contention_end))
>+ ent->ip = regs->b6;
>+#endif
>
> ent->tstamp = stamp;
> ent->cpu = smp_processor_id();
There is a small window at the start of ia64_spinlock_contention_pre3_4
where b6 is not defined. It is only about 2 bundles wide and is only
executed once when a lock is contended, so you are unlikely to hit it,
just FYI.
* Re: Attribute spinlock contention ticks to caller.
2005-09-14 22:26 Attribute spinlock contention ticks to caller Robin Holt
2005-09-15 0:10 ` Keith Owens
@ 2005-09-15 6:34 ` Stephane Eranian
2005-09-15 8:19 ` Stephane Eranian
` (13 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: Stephane Eranian @ 2005-09-15 6:34 UTC (permalink / raw)
To: linux-ia64
Robin,
> +
> static int
> default_handler(struct task_struct *task, void *buf, pfm_ovfl_arg_t *arg, struct pt_regs *regs, unsigned long stamp)
> {
> @@ -165,6 +175,12 @@ default_handler(struct task_struct *task
> * where did the fault happen (includes slot number)
> */
> ent->ip = regs->cr_iip | ((regs->cr_ipsr >> 41) & 0x3);
> +#ifdef CONFIG_SMP
> + /* Fix up the ip for code in the spinlock contention path. */
> + if ((ent->ip >= (unsigned long)ia64_spinlock_contention) &&
> + (ent->ip < (unsigned long)ia64_spinlock_contention_end))
> + ent->ip = regs->b6;
> +#endif
I think SGI already submitted something similar for the 2.4 kernel.
I understand your motivations for doing this. Yet it does not look
so clean and error-proof. Keith already mentioned a potential gap.
I also think it is hard to maintain, because if for some reason someone
changes from b6 to b7 there is no way of tracking this from default_handler().
For your purpose, the value of b6 is more interesting than ip. Would that
always be the case for every measurement?
This also opens the door for people submitting other special cases.
This is the reason why I designed sampling formats to be plug-in
modules such that for special needs, people can simply develop their
own format. I understand that your modification does not deeply alter
the default format and integrates seamlessly with existing applications.
But it seems there ought to be a cleaner way of doing this.
--
-stephane
* Re: Attribute spinlock contention ticks to caller.
2005-09-14 22:26 Attribute spinlock contention ticks to caller Robin Holt
2005-09-15 0:10 ` Keith Owens
2005-09-15 6:34 ` Stephane Eranian
@ 2005-09-15 8:19 ` Stephane Eranian
2005-09-15 17:14 ` Robin Holt
` (12 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: Stephane Eranian @ 2005-09-15 8:19 UTC (permalink / raw)
To: linux-ia64
Robin,
On Thu, Sep 15, 2005 at 10:37:50AM -0700, Luck, Tony wrote:
> >This also opens the door for people submitting other special cases.
>
> I'm very sympathetic to getting better performance data. I agree
> 100% that knowing who called spinlock contention is far better than
> just lumping all spinlock contention together.
>
I think you need to clarify this a little bit; I may be missing
something here.
It seems you assume you know something in advance here.
I think you need a two-step process somehow. First you need to
discover that you have contention, i.e., lots of samples
in the contention code. Second you want to know from where
and that's why you record the return from contention rather
than contention. This sequence makes sense.
With your patch, you would skip the first step. If you don't
know you have contention, how would you interpret the samples
you get? For each sample, you have to search backwards to see
if there is a br.call or similar that points to some
spinlock code. Why would you do this costly search systematically?
Unless the tool is designed just to look for this.
So I think that this should somehow be an option. Like Tony suggests,
I would like to have a cleaner mechanism to track code ranges
for which this could be useful. Furthermore, the name of the
register to swap for iip should not be hardcoded because it is hard
to maintain, should the spinlock code evolve.
> But I have to agree with Stephane that this looks like the start
> of a slippery slope of special cases (each of which provides two
> new exported symbols).
>
> We should look to see if there is a better way to flag address
> ranges in the kernel where you'd like to bill time to the caller
> rather than the function (perhaps some sort of tag table like the
> extable used for copyin/copyout fault recovery? Then we can just
> export one table and have the profiler search it ... rather than
> a new pair of symbols for every case.
>
> Or you can try to convince me that spinlock contention is such
> a special one off case, and we will never, ever, want to do this
> anywhere else.
>
--
-Stephane
* Re: Attribute spinlock contention ticks to caller.
2005-09-14 22:26 Attribute spinlock contention ticks to caller Robin Holt
` (2 preceding siblings ...)
2005-09-15 8:19 ` Stephane Eranian
@ 2005-09-15 17:14 ` Robin Holt
2005-09-15 17:23 ` Robin Holt
` (11 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: Robin Holt @ 2005-09-15 17:14 UTC (permalink / raw)
To: linux-ia64
On Thu, Sep 15, 2005 at 10:10:30AM +1000, Keith Owens wrote:
> On Wed, 14 Sep 2005 17:26:44 -0500,
> Robin Holt <holt@sgi.com> wrote:
> >
> >On larger systems, ia64_spinlock_contention frequently shows up in
> >pfmon output. Determining whether it is a frequently contended lock or
> >numerous different locks is very difficult.
> >
> >The following patch attributes the ticks received while in
> >ia64_spinlock_contention to the requestor of the lock.
> >...
> >@@ -165,6 +175,12 @@ default_handler(struct task_struct *task
> > * where did the fault happen (includes slot number)
> > */
> > ent->ip = regs->cr_iip | ((regs->cr_ipsr >> 41) & 0x3);
> >+#ifdef CONFIG_SMP
> >+ /* Fix up the ip for code in the spinlock contention path. */
> >+ if ((ent->ip >= (unsigned long)ia64_spinlock_contention) &&
> >+ (ent->ip < (unsigned long)ia64_spinlock_contention_end))
> >+ ent->ip = regs->b6;
> >+#endif
> >
> > ent->tstamp = stamp;
> > ent->cpu = smp_processor_id();
>
> There is a small window at the start of ia64_spinlock_contention_pre3_4
> where b6 is not defined. It is only about 2 bundles wide and is only
> executed once when a lock is contended, so you are unlikely to hit it,
> just FYI.
Here is a better version with that taken care of. Do you see any other
issues?
Index: linux-2.6/arch/ia64/kernel/head.S
===================================================================
--- linux-2.6.orig/arch/ia64/kernel/head.S 2005-09-09 18:06:16.000000000 -0500
+++ linux-2.6/arch/ia64/kernel/head.S 2005-09-15 12:07:30.019498585 -0500
@@ -1071,6 +1071,8 @@ GLOBAL_ENTRY(ia64_spinlock_contention_pr
tbit.nz p15,p0=r27,IA64_PSR_I_BIT
.restore sp // pop existing prologue after next insn
mov b6 = r28
+ .global ia64_spinlock_contention_pre3_4_beg // for kernprof
+ia64_spinlock_contention_pre3_4_beg:
.prologue
.save ar.pfs, r0
.altrp b6
@@ -1119,6 +1121,8 @@ GLOBAL_ENTRY(ia64_spinlock_contention)
(p14) br.cond.sptk.few .wait
br.ret.sptk.many b6 // lock is now taken
+ .global ia64_spinlock_contention_end // for determining if we are in ia64_spinlock_contention code.
+ia64_spinlock_contention_end:
END(ia64_spinlock_contention)
#endif
Index: linux-2.6/arch/ia64/kernel/ia64_ksyms.c
===================================================================
--- linux-2.6.orig/arch/ia64/kernel/ia64_ksyms.c 2005-09-01 09:30:56.000000000 -0500
+++ linux-2.6/arch/ia64/kernel/ia64_ksyms.c 2005-09-15 12:07:30.019498585 -0500
@@ -109,6 +109,10 @@ EXPORT_SYMBOL(unw_init_running);
*/
extern char ia64_spinlock_contention_pre3_4;
EXPORT_SYMBOL(ia64_spinlock_contention_pre3_4);
+extern char ia64_spinlock_contention_pre3_4_beg;
+EXPORT_SYMBOL(ia64_spinlock_contention_pre3_4_beg);
+extern char ia64_spinlock_contention_pre3_4_end;
+EXPORT_SYMBOL(ia64_spinlock_contention_pre3_4_end);
# else
/*
* This is not a normal routine and we don't want a function descriptor for it, so we use
@@ -116,6 +120,8 @@ EXPORT_SYMBOL(ia64_spinlock_contention_p
*/
extern char ia64_spinlock_contention;
EXPORT_SYMBOL(ia64_spinlock_contention);
+extern char ia64_spinlock_contention_end;
+EXPORT_SYMBOL(ia64_spinlock_contention_end);
# endif
# endif
#endif
Index: linux-2.6/arch/ia64/kernel/perfmon_default_smpl.c
===================================================================
--- linux-2.6.orig/arch/ia64/kernel/perfmon_default_smpl.c 2005-09-01 09:30:56.000000000 -0500
+++ linux-2.6/arch/ia64/kernel/perfmon_default_smpl.c 2005-09-15 12:08:23.772955818 -0500
@@ -99,6 +99,16 @@ default_init(struct task_struct *task, v
return 0;
}
+#ifdef CONFIG_SMP
+#if __GNUC__ < 3 || (__GNUC__ == 3 && __GNUC_MINOR__ < 3)
+extern char ia64_spinlock_contention_pre3_4_beg[], ia64_spinlock_contention_pre3_4_end[];
+#define ia64_spinlock_contention ia64_spinlock_contention_pre3_4_beg
+#define ia64_spinlock_contention_end ia64_spinlock_contention_pre3_4_end
+#else
+extern char ia64_spinlock_contention[], ia64_spinlock_contention_end[];
+#endif
+#endif
+
static int
default_handler(struct task_struct *task, void *buf, pfm_ovfl_arg_t *arg, struct pt_regs *regs, unsigned long stamp)
{
@@ -165,6 +175,12 @@ default_handler(struct task_struct *task
* where did the fault happen (includes slot number)
*/
ent->ip = regs->cr_iip | ((regs->cr_ipsr >> 41) & 0x3);
+#ifdef CONFIG_SMP
+ /* Fix up the ip for code in the spinlock contention path. */
+ if ((ent->ip >= (unsigned long)ia64_spinlock_contention) &&
+ (ent->ip < (unsigned long)ia64_spinlock_contention_end))
+ ent->ip = regs->b6;
+#endif
ent->tstamp = stamp;
ent->cpu = smp_processor_id();
* Re: Attribute spinlock contention ticks to caller.
2005-09-14 22:26 Attribute spinlock contention ticks to caller Robin Holt
` (3 preceding siblings ...)
2005-09-15 17:14 ` Robin Holt
@ 2005-09-15 17:23 ` Robin Holt
2005-09-15 17:37 ` Luck, Tony
` (10 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: Robin Holt @ 2005-09-15 17:23 UTC (permalink / raw)
To: linux-ia64
On Thu, Sep 15, 2005 at 08:34:35AM +0200, Stephane Eranian wrote:
> Robin,
>
> > +
> > static int
> > default_handler(struct task_struct *task, void *buf, pfm_ovfl_arg_t *arg, struct pt_regs *regs, unsigned long stamp)
> > {
> > @@ -165,6 +175,12 @@ default_handler(struct task_struct *task
> > * where did the fault happen (includes slot number)
> > */
> > ent->ip = regs->cr_iip | ((regs->cr_ipsr >> 41) & 0x3);
> > +#ifdef CONFIG_SMP
> > + /* Fix up the ip for code in the spinlock contention path. */
> > + if ((ent->ip >= (unsigned long)ia64_spinlock_contention) &&
> > + (ent->ip < (unsigned long)ia64_spinlock_contention_end))
> > + ent->ip = regs->b6;
> > +#endif
>
> I think SGI already submitted something similar for the 2.4 kernel.
> I understand your motivations for doing this. Yet it does not look
> so clean and error proof. Keith already mentioned a potential gap.
> I also think it is hard to maintain because if for some reasons someone
> changes from b6 to b7 there is no way of tracking this from default_handler().
> For your purpose, the value of b6 is more interesting than ip. Would that
> always be the case for every measurement?
Can you think of a case where monitoring an entire
application, including kernel activity, would benefit from seeing
ia64_spinlock_contention instead of the function doing the work? We made
this change to our distro many years ago and have never found a case
where the less specific information was better. We have found that
knowing which function was causing spinlock contention is always more
helpful than just knowing there is contention.
> This also opens the door for people submitting other special cases.
I don't think that applies to this change, which is arguably just
making the information that the module provides correct.
> This is the reason why I designed sampling formats to be plug-in
> modules such that for special needs, people can simply develop their
> own format. I understand that your modification does not deeply alter
> the default format and integrates seamlessly with existing applications.
> But it seems there ought to be a cleaner way of doing this.
If you are suggesting I check in a second format module just for this,
I could do that, but... I cannot think of a cleaner way than my most
recent patch provides. I am open to suggestions.
Thanks,
Robin
* RE: Attribute spinlock contention ticks to caller.
2005-09-14 22:26 Attribute spinlock contention ticks to caller Robin Holt
` (4 preceding siblings ...)
2005-09-15 17:23 ` Robin Holt
@ 2005-09-15 17:37 ` Luck, Tony
2005-09-15 22:29 ` Robin Holt
` (9 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: Luck, Tony @ 2005-09-15 17:37 UTC (permalink / raw)
To: linux-ia64
>This also opens the door for people submitting other special cases.
I'm very sympathetic to getting better performance data. I agree
100% that knowing who called spinlock contention is far better than
just lumping all spinlock contention together.
But I have to agree with Stephane that this looks like the start
of a slippery slope of special cases (each of which provides two
new exported symbols).
We should look to see if there is a better way to flag address
ranges in the kernel where you'd like to bill time to the caller
rather than the function (perhaps some sort of tag table like the
extable used for copyin/copyout fault recovery? Then we can just
export one table and have the profiler search it ... rather than
a new pair of symbols for every case.
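One way the tag-table idea above could look, as a hedged sketch: a single exported, sorted table of address ranges, searched in the style of the exception table. All names and addresses below are hypothetical, not an existing kernel API:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "attribution table", modeled on the extable used for
 * copyin/copyout fault recovery: a sample whose ip falls inside any
 * [start, end) range is billed to the caller instead of the function.
 * All names and addresses here are illustrative only. */
struct attrib_range {
    unsigned long start;
    unsigned long end;
};

/* One exported, sorted table instead of a new symbol pair per case. */
static const struct attrib_range attrib_table[] = {
    { 0xa000000000100000UL, 0xa000000000100200UL }, /* e.g. spinlock contention */
    { 0xa000000000200000UL, 0xa000000000200080UL }, /* e.g. another helper */
};

/* Binary search, in the style of search_exception_tables().  Returns
 * nonzero if the sample at ip should be re-attributed to its caller. */
static int ip_in_attrib_table(unsigned long ip)
{
    size_t lo = 0, hi = sizeof(attrib_table) / sizeof(attrib_table[0]);

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (ip < attrib_table[mid].start)
            hi = mid;
        else if (ip >= attrib_table[mid].end)
            lo = mid + 1;
        else
            return 1;       /* ip is inside a flagged range */
    }
    return 0;
}
```

The profiler would then search this one table per sample, and adding a new special case means adding a table entry rather than exporting two more symbols.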
Or you can try to convince me that spinlock contention is such
a special one off case, and we will never, ever, want to do this
anywhere else.
-Tony
* Re: Attribute spinlock contention ticks to caller.
2005-09-14 22:26 Attribute spinlock contention ticks to caller Robin Holt
` (5 preceding siblings ...)
2005-09-15 17:37 ` Luck, Tony
@ 2005-09-15 22:29 ` Robin Holt
2005-09-15 22:54 ` Zou Nan hai
` (8 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: Robin Holt @ 2005-09-15 22:29 UTC (permalink / raw)
To: linux-ia64
On Thu, Sep 15, 2005 at 01:19:09AM -0700, Stephane Eranian wrote:
> Robin,
>
> On Thu, Sep 15, 2005 at 10:37:50AM -0700, Luck, Tony wrote:
> > >This also opens the door for people submitting other special cases.
> >
> > I'm very sympathetic to getting better performance data. I agree
> > 100% that knowing who called spinlock contention is far better than
> > just lumping all spinlock contention together.
> >
> I think you need to clarify this a little bit; I may be missing
> something here.
>
> It seems you assume you know something in advance here.
> I think you need a two-step process somehow. First you need to
> discover that you have contention, i.e., lots of samples
> in the contention code. Second you want to know from where
> and that's why you record the return from contention rather
> than contention. This sequence makes sense.
>
> With your patch, you would skip the first step. If you don't
> know you have contention, how would you interpret the samples
> you get? For each sample, you have to search backwards to see
> if there is a br.call or similar that points to some
> spinlock code. Why would you do this costly search systematically?
> Unless the tool is designed just to look for this.
Unfortunately, without this patch, ia64_spinlock_contention becomes the
top-billed issue in nearly every large-cpu-count sample. That does not
mean you are contending on the same lock. I have been fooled many times
into chasing a contention problem when in reality there were many
lightly contended locks which artificially raised the number of ticks
to a significant level.
By looking at the caller, you get rid of that false positive. Usually
it rapidly becomes apparent which lock within a code path is likely to
be the hot lock.
>
> So I think that this should somehow be an option. Like Tony suggests,
> I would like to have a cleaner mechanism to track code ranges
> for which this could be useful. Furthermore, the name of the
> register to swap for iip should not be hardcoded because it is hard
> to maintain, should the spinlock code evolve.
I remain open to suggestions.
Thanks,
Robin
* RE: Attribute spinlock contention ticks to caller.
2005-09-14 22:26 Attribute spinlock contention ticks to caller Robin Holt
` (6 preceding siblings ...)
2005-09-15 22:29 ` Robin Holt
@ 2005-09-15 22:54 ` Zou Nan hai
2005-09-16 9:37 ` Stephane Eranian
` (7 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: Zou Nan hai @ 2005-09-15 22:54 UTC (permalink / raw)
To: linux-ia64
On Fri, 2005-09-16 at 01:37, Luck, Tony wrote:
> >This also opens the door for people submitting other special cases.
>
> I'm very sympathetic to getting better performance data. I agree
> 100% that knowing who called spinlock contention is far better than
> just lumping all spinlock contention together.
>
> But I have to agree with Stephane that this looks like the start
> of a slippery slope of special cases (each of which provides two
> new exported symbols).
>
> We should look to see if there is a better way to flag address
> ranges in the kernel where you'd like to bill time to the caller
> rather than the function (perhaps some sort of tag table like the
> extable used for copyin/copyout fault recovery? Then we can just
> export one table and have the profiler search it ... rather than
> a new pair of symbols for every case.
>
> Or you can try to convince me that spinlock contention is such
> a special one off case, and we will never, ever, want to do this
> anywhere else.
>
> -Tony
There is a function in_lock_function in kernel/spinlock.c.
However, in the latest kernel, ia64_spinlock_contention is not in the
.spinlock.text section.
We could first collect ia64_spinlock_contention into the .spinlock.text
section, then use in_lock_function.
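A sketch of what this suggestion would give the perfmon fixup, assuming the contention code has been collected into .spinlock.text. The bounds here are stand-in constants rather than the real linker-provided section symbols:

```c
#include <assert.h>

/* in_lock_function()-style range check against the .spinlock.text
 * section.  In the kernel the bounds are linker-provided symbols;
 * here they are stand-in constants so the sketch is self-contained. */
static const unsigned long lock_text_start = 0xa000000000300000UL; /* stand-in */
static const unsigned long lock_text_end   = 0xa000000000301000UL; /* stand-in */

static int in_lock_text(unsigned long addr)
{
    return addr >= lock_text_start && addr < lock_text_end;
}

/* The perfmon fixup then needs no per-function exported symbols:
 *     if (in_lock_text(ent->ip))
 *             ent->ip = regs->b6;
 */
```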
Zou Nan hai
* Re: Attribute spinlock contention ticks to caller.
2005-09-14 22:26 Attribute spinlock contention ticks to caller Robin Holt
` (7 preceding siblings ...)
2005-09-15 22:54 ` Zou Nan hai
@ 2005-09-16 9:37 ` Stephane Eranian
2005-09-16 22:29 ` Robin Holt
` (6 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: Stephane Eranian @ 2005-09-16 9:37 UTC (permalink / raw)
To: linux-ia64
Robin,
On Thu, Sep 15, 2005 at 05:29:00PM -0500, Robin Holt wrote:
> > It seems you assume you know something in advance here.
> > I think you need a two-step process somehow. First you need to
> > discover that you have contention, i.e., lots of samples
> > in the contention code. Second you want to know from where
> > and that's why you record the return from contention rather
> > than contention. This sequence makes sense.
> >
> > With your patch, you would skip the first step. If you don't
> > know you have contention, how would you interpret the samples
> > you get? For each sample, you have to search backwards to see
> > if there is a br.call or similar that points to some
> > spinlock code. Why would you do this costly search systematically?
> > Unless the tool is designed just to look for this.
>
> Unfortunately, without this patch, ia64_spinlock_contention become the
> top billing issue on nearly every large cpu count sample. It does not
> mean you are contending on the same lock. I have been fooled many times
> into chasing a contention problem when in reality there were many locks
> lightly contended which artificially raised the number of ticks to a
> significant level.
>
You have another side effect in here: interrupt masking. With the existing
PMU, you cannot take a sample if interrupts are masked. For some
of the kernel code, it means that you get samples attributed to the
bundle(s) just following interrupt unmasking.
I think what you are really after here is kernel call stack unwinding.
Your patch is effectively a quick hack to get this for a specific function
and for one level of unwinding.
I have shown in several presentations (incl. Gelato May 2004) that the existing
infrastructure can be used to sample the kernel call stack. I have written
a prototype perfmon2 sampling format that does just that. You can
say, for instance: "Every 100,000 cycles in the kernel, record the full
(or partial) call stack". The format is just a prototype at this point
but I think it could be useful for your situation. The format was
designed to show the power of the interface in that it allows you to
sample on a PMU event and yet record non-PMU-based information.
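A minimal sketch of what such a sampling-format record might carry, assuming a fixed maximum depth. The struct, names, and stand-in unwinder below are purely illustrative, not the prototype format's actual layout:

```c
#include <stddef.h>

/* Illustrative record for a "partial kernel call stack per sample"
 * format.  A real perfmon2 format module would fill callers[] by
 * walking the kernel unwind information at PMU overflow time. */
#define SAMPLE_STACK_DEPTH 4

struct stack_sample {
    unsigned long ip;                          /* interrupted ip */
    unsigned long callers[SAMPLE_STACK_DEPTH]; /* 0-terminated partial stack */
};

/* Stand-in for the unwinder: copies up to SAMPLE_STACK_DEPTH caller
 * addresses from an already-recovered chain, zero-filling the rest. */
static void record_stack_sample(struct stack_sample *s, unsigned long ip,
                                const unsigned long *chain, size_t n)
{
    size_t i;

    s->ip = ip;
    for (i = 0; i < SAMPLE_STACK_DEPTH; i++)
        s->callers[i] = (i < n) ? chain[i] : 0;
}
```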
--
-Stephane
* Re: Attribute spinlock contention ticks to caller.
2005-09-14 22:26 Attribute spinlock contention ticks to caller Robin Holt
` (8 preceding siblings ...)
2005-09-16 9:37 ` Stephane Eranian
@ 2005-09-16 22:29 ` Robin Holt
2005-09-17 1:08 ` David Mosberger-Tang
` (5 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: Robin Holt @ 2005-09-16 22:29 UTC (permalink / raw)
To: linux-ia64
On Fri, Sep 16, 2005 at 02:37:31AM -0700, Stephane Eranian wrote:
> Robin,
>
> On Thu, Sep 15, 2005 at 05:29:00PM -0500, Robin Holt wrote:
> > > It seems you assume you know something in advance here.
> > > I think you need a two-step process somehow. First you need to
> > > discover that you have contention, i.e., lots of samples
> > > in the contention code. Second you want to know from where
> > > and that's why you record the return from contention rather
> > > than contention. This sequence makes sense.
> > >
> > > With your patch, you would skip the first step. If you don't
> > > know you have contention, how would you interpret the samples
> > > you get? For each sample, you have to search backwards to see
> > > if there is a br.call or similar that points to some
> > > spinlock code. Why would you do this costly search systematically?
> > > Unless the tool is designed just to look for this.
> >
> > Unfortunately, without this patch, ia64_spinlock_contention become the
> > top billing issue on nearly every large cpu count sample. It does not
> > mean you are contending on the same lock. I have been fooled many times
> > into chasing a contention problem when in reality there were many locks
> > lightly contended which artificially raised the number of ticks to a
> > significant level.
> >
> You have another side effect in here: interrupt masking. With the existing
> PMU, you cannot take a sample if interrupts are masked. For some
> of the kernel code, it means that you get samples attributed to the
> bundle(s) just following interrupt unmasking.
At least in those situations, you can clearly see that the samples are
attributable to the function which disabled/enabled interrupts, not some
generic code which has nothing to do with the function.
> I think what you are really after here is kernel call stack unwinding.
> Your patch is effectively a quick hack to get this for a specific function
> and for one level of unwinding.
Are you saying I should add the heavy steps of checking the top of the
call stack and unwinding a single step if I am in ia64_spinlock_contention,
instead of the relatively light check against two global symbols
and a register move?
> I have shown in several presentations (incl. Gelato May 2004) that the existing
> infrastructure can be used to sample the kernel call stack. I have written
> a prototype perfmon2 sampling format that does just that. You can
> say, for instance: "Every 100,000 cycles in the kernel, record the full
> (or partial) call stack". The format is just a prototype at this point
> but I think it could be useful for your situation. The format was
> designed to show the power of the interface in that it allows you
> sample on PMU event and yet record non PMU-based information.
Can you site one instance where it is more helpful to know that _ANY_
lock in the system is hitting ia64_spinlock_contention as opposed to
the function which has the spin_lock() code? I can recall times when
network traffic was contending on locks and causing my application to
appear to have a contended lock. With this change, at least we know
the degradation is due to something external.
Thanks,
Robin
* Re: Attribute spinlock contention ticks to caller.
2005-09-14 22:26 Attribute spinlock contention ticks to caller Robin Holt
` (9 preceding siblings ...)
2005-09-16 22:29 ` Robin Holt
@ 2005-09-17 1:08 ` David Mosberger-Tang
2005-09-18 23:06 ` Robin Holt
` (4 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: David Mosberger-Tang @ 2005-09-17 1:08 UTC (permalink / raw)
To: linux-ia64
On 9/16/05, Robin Holt <holt@sgi.com> wrote:
> Can you site one instance where it is more helpful to know that _ANY_
> lock in the system is hitting ia64_spinlock_contention as opposed to
> the function which has the spin_lock() code? I can recall times when
> network traffic was contending on locks and causing my application to
> appear to have a contended lock. With this change, at least we know
> the degradation is due to something external.
Not sure if I can help siting it, but I can cite one recent example: I
was looking at a profile of a test which wasn't supposed to have any
contention. Yet, ia64_spinlock_contention unexpectedly showed up
fairly high in the profile. Turns out it was due to some left-over
debug code. Of course, I was using q-syscollect so it was easy to see
who was calling the spinlock contention routine a lot...
--david
--
Mosberger Consulting LLC, voice/fax: 510-744-9372,
http://www.mosberger-consulting.com/
35706 Runckel Lane, Fremont, CA 94536
* Re: Attribute spinlock contention ticks to caller.
2005-09-14 22:26 Attribute spinlock contention ticks to caller Robin Holt
` (10 preceding siblings ...)
2005-09-17 1:08 ` David Mosberger-Tang
@ 2005-09-18 23:06 ` Robin Holt
2005-09-19 1:18 ` David Mosberger-Tang
` (3 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: Robin Holt @ 2005-09-18 23:06 UTC (permalink / raw)
To: linux-ia64
On Fri, Sep 16, 2005 at 06:08:45PM -0700, David Mosberger-Tang wrote:
> On 9/16/05, Robin Holt <holt@sgi.com> wrote:
>
> > Can you site one instance where it is more helpful to know that _ANY_
> > lock in the system is hitting ia64_spinlock_contention as opposed to
> > the function which has the spin_lock() code? I can recall times when
> > network traffic was contending on locks and causing my application to
> > appear to have a contended lock. With this change, at least we know
> > the degradation is due to something external.
>
> Not sure if I can help siting it, but I can cite one recent example: I
> was looking at a profile of a test which wasn't supposed to have any
> contention. Yet, ia64_spinlock_contention unexpectedly showed up
> fairly high in the profile. Turns out it was due to some left-over
> debug code. Of course, I was using q-syscollect so it was easy to see
> who was calling the spinlock contention routine a lot...
That would be a case where you are contending on one lock and you would
be given the function that was contending. What I was concerned with
is where two or more locks are contended and ia64_spinlock_contention
artificially appears at the top.
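For anyone following along, the re-attribution the patch makes possible can be sketched roughly like this. This is only an illustration of the range check that the exported ia64_spinlock_contention_end symbol enables; the function name and addresses are invented, not the actual perfmon code:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bounds of the contention routine.  The patch exports
 * ia64_spinlock_contention_end for exactly this kind of range check;
 * the numeric values here are invented for illustration. */
static const uintptr_t contention_start = 0x1000;
static const uintptr_t contention_end   = 0x1080;

/* If the sampled instruction pointer landed inside the shared
 * contention loop, charge the tick to the caller's return address
 * instead, so the profile shows which spin_lock() site was spinning. */
uintptr_t attribute_sample(uintptr_t ip, uintptr_t return_addr)
{
	if (ip >= contention_start && ip < contention_end)
		return return_addr;  /* in contention code: charge the caller */
	return ip;                   /* ordinary sample: charge the IP itself */
}
```

With two different locks contended from two different call sites, the ticks split between the two callers instead of piling up on ia64_spinlock_contention itself.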
Robin
* Re: Attribute spinlock contention ticks to caller.
2005-09-14 22:26 Attribute spinlock contention ticks to caller Robin Holt
` (11 preceding siblings ...)
2005-09-18 23:06 ` Robin Holt
@ 2005-09-19 1:18 ` David Mosberger-Tang
2005-09-19 8:35 ` Stephane Eranian
` (2 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: David Mosberger-Tang @ 2005-09-19 1:18 UTC (permalink / raw)
To: linux-ia64
On 9/18/05, Robin Holt <holt@sgi.com> wrote:
> On Fri, Sep 16, 2005 at 06:08:45PM -0700, David Mosberger-Tang wrote:
> > On 9/16/05, Robin Holt <holt@sgi.com> wrote:
> >
> > > Can you site one instance where it is more helpful to know that _ANY_
> > > lock in the system is hitting ia64_spinlock_contention as opposed to
> > > the function which has the spin_lock() code? I can recall times when
> > > network traffic was contending on locks and causing my application to
> > > appear to have a contended lock. With this change, at least we know
> > > the degradation is due to something external.
> >
> > Not sure if I can help siting it, but I can cite one recent example: I
> > was looking at a profile of a test which wasn't supposed to have any
> > contention. Yet, ia64_spinlock_contention unexpectedly showed up
> > fairly high in the profile. Turns out it was due to some left-over
> > debug code. Of course, I was using q-syscollect so it was easy to see
> > who was calling the spinlock contention routine a lot...
>
> That would be a case where you are contending on one lock and you would
> be given the function that was contending. What I was concerned with
> is where two or more locks are contended and ia64_spinlock_contention
> artificially appears at the top.
Well, it's an example where attributing the spinlock contention time
to the caller would have completely obfuscated the problem.
--david
--
Mosberger Consulting LLC, voice/fax: 510-744-9372,
http://www.mosberger-consulting.com/
35706 Runckel Lane, Fremont, CA 94536
* Re: Attribute spinlock contention ticks to caller.
2005-09-14 22:26 Attribute spinlock contention ticks to caller Robin Holt
` (12 preceding siblings ...)
2005-09-19 1:18 ` David Mosberger-Tang
@ 2005-09-19 8:35 ` Stephane Eranian
2005-09-19 15:17 ` Robin Holt
2005-09-19 17:52 ` David Mosberger-Tang
15 siblings, 0 replies; 17+ messages in thread
From: Stephane Eranian @ 2005-09-19 8:35 UTC (permalink / raw)
To: linux-ia64
Robin,
On Mon, Sep 19, 2005 at 10:52:11AM -0700, David Mosberger-Tang wrote:
> And as Stephane already explained, if you use the right tool, there is
> no need for the hack that you suggest. You can either use a
> q-syscollect-like approach (which will give you call-counts, but not
> necessarily distribute the time accurately) or you can unwind the
> call-stack and even distribute the time correctly. That's all doable
> today without any special-case hacks.
>
If you still have your test case, could you run q-syscollect
on it and see how close you get to the profile produced by
the modified handler? Look at the kernel profile. Would that
be good enough to track down the problem?
The other issue I have with this patch is that it is non-portable.
The next version of perfmon works on multiple architectures. In
particular, the default sampling format is used by i386, x86-64, ia-64,
and ppc64. Your patch would not work with those because it contains
IA-64-specific code, yet I think the same problem exists on those
architectures as well.
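For what it's worth, one portable shape for this would be a per-architecture hook with a no-op default, so the shared sampling format stays generic. Everything below is a hypothetical sketch with invented names, not perfmon code:

```c
#include <assert.h>
#include <stdint.h>

/* Weak default: architectures that don't implement the hook never
 * re-attribute, so the generic sampling format keeps working.
 * An arch such as ia64 would provide a strong definition doing the
 * range check against its contention routine's start/end symbols. */
__attribute__((weak)) int arch_in_lock_contention(uintptr_t ip)
{
	(void)ip;
	return 0;
}

/* Generic sampling path: ask the arch; fall back to the raw IP. */
uintptr_t sample_address(uintptr_t ip, uintptr_t caller)
{
	return arch_in_lock_contention(ip) ? caller : ip;
}
```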
> --david
>
> On 9/19/05, Robin Holt <holt@sgi.com> wrote:
> > On Sun, Sep 18, 2005 at 06:18:20PM -0700, David Mosberger-Tang wrote:
> > > Well, it's an example where attributing the spinlock contention time
> > > to the caller would have completely obfuscated the problem.
> >
> > Either way, we have obfuscation. In the one case (attributing to caller),
> > the obfuscation can be resolved by looking at the code. In the other
> > (multiple paths contending on independent locks), the obfuscation can
> > only be resolved by repeating the test with different sampling.
> >
> > Although that sounds simple, what if it is a difficult-to-execute test?
> > What if this appeared to be a one-time aberration that was captured during
> > one of many iterations? The chance to capture it is gone.
> >
> > For a more complete illustration, I would like to elaborate my previous
> > example. I had a sample file produced by our benchmarkers. They had
> > received the results on their third run after tweaking some app settings
> > and the results were nearly impossible to believe. This happened to be
> > an MPI job where all ranks barrier at the end of a phase so one single
> > rank being slow results in the entire application being slow.
> >
> > After the third run, they repeated with the app settings from the
> > second run and then repeated again with the settings from the third
> > run. Neither run showed any signs of a similar problem. The customer
> > acceptance test continued. Before the customer would accept the results,
> > they needed that anomaly explained.
> >
> > Fortunately, the customer had required a sampling output from every
> > run so data had been taken using perfmon and retained. This was on a
> > 2.4-based system. The system had eight Ethernet adapters spread across
> > the machine. Interrupts for each were targeted to different CPUs.
> >
> > Because sampling was showing the caller, this turned into a simple
> > question: why was there so much network receive activity? On some of
> > the CPUs, we noticed a significant number of processes were trying to
> > enqueue network packets at the same time. The sample IP showed we were
> > in a bundle after a spinlock was acquired.
> >
> > Had we not provided the caller, we would have been left with something
> > that was relatively impossible to diagnose definitively. With the unroll,
> > it became a simple matter of looking at the enabled network services and
> > finding somebody had run a network benchmark using all eight network
> > adapters. We contacted the group responsible for network benchmarks
> > and the problem was isolated and explained to the customer's satisfaction.
> >
> > I hope this illustrates that one way of sampling makes it slightly more
> > difficult to determine that the source of slowdown is contention on
> > a lock where the other way of sampling results in it being impossible
> > to determine the source of a problem. Given the choices, I would say
> > the right way to do the sampling is to not attribute the samples to
> > the caller.
> >
> > Thanks,
> > Robin
> >
>
>
> --
> Mosberger Consulting LLC, voice/fax: 510-744-9372,
> http://www.mosberger-consulting.com/
> 35706 Runckel Lane, Fremont, CA 94536
--
-Stephane
* Re: Attribute spinlock contention ticks to caller.
2005-09-14 22:26 Attribute spinlock contention ticks to caller Robin Holt
` (13 preceding siblings ...)
2005-09-19 8:35 ` Stephane Eranian
@ 2005-09-19 15:17 ` Robin Holt
2005-09-19 17:52 ` David Mosberger-Tang
15 siblings, 0 replies; 17+ messages in thread
From: Robin Holt @ 2005-09-19 15:17 UTC (permalink / raw)
To: linux-ia64
On Sun, Sep 18, 2005 at 06:18:20PM -0700, David Mosberger-Tang wrote:
> Well, it's an example where attributing the spinlock contention time
> to the caller would have completely obfuscated the problem.
Either way, we have obfuscation. In the one case (attributing to caller),
the obfuscation can be resolved by looking at the code. In the other
(multiple paths contending on independent locks), the obfuscation can
only be resolved by repeating the test with different sampling.
Although that sounds simple, what if it is a difficult-to-execute test?
What if this appeared to be a one-time aberration that was captured during
one of many iterations? The chance to capture it is gone.
For a more complete illustration, I would like to elaborate my previous
example. I had a sample file produced by our benchmarkers. They had
received the results on their third run after tweaking some app settings
and the results were nearly impossible to believe. This happened to be
an MPI job where all ranks barrier at the end of a phase so one single
rank being slow results in the entire application being slow.
After the third run, they repeated with the app settings from the
second run and then repeated again with the settings from the third
run. Neither run showed any signs of a similar problem. The customer
acceptance test continued. Before the customer would accept the results,
they needed that anomaly explained.
Fortunately, the customer had required a sampling output from every
run so data had been taken using perfmon and retained. This was on a
2.4-based system. The system had eight Ethernet adapters spread across
the machine. Interrupts for each were targeted to different CPUs.
Because sampling was showing the caller, this turned into a simple
question: why was there so much network receive activity? On some of
the CPUs, we noticed a significant number of processes were trying to
enqueue network packets at the same time. The sample IP showed we were
in a bundle after a spinlock was acquired.
Had we not provided the caller, we would have been left with something
that was relatively impossible to diagnose definitively. With the unroll,
it became a simple matter of looking at the enabled network services and
finding somebody had run a network benchmark using all eight network
adapters. We contacted the group responsible for network benchmarks
and the problem was isolated and explained to the customer's satisfaction.
I hope this illustrates that one way of sampling makes it slightly more
difficult to determine that the source of slowdown is contention on
a lock where the other way of sampling results in it being impossible
to determine the source of a problem. Given the choices, I would say
the right way to do the sampling is to not attribute the samples to
the caller.
Thanks,
Robin
* Re: Attribute spinlock contention ticks to caller.
2005-09-14 22:26 Attribute spinlock contention ticks to caller Robin Holt
` (14 preceding siblings ...)
2005-09-19 15:17 ` Robin Holt
@ 2005-09-19 17:52 ` David Mosberger-Tang
15 siblings, 0 replies; 17+ messages in thread
From: David Mosberger-Tang @ 2005-09-19 17:52 UTC (permalink / raw)
To: linux-ia64
And as Stephane already explained, if you use the right tool, there is
no need for the hack that you suggest. You can either use a
q-syscollect-like approach (which will give you call-counts, but not
necessarily distribute the time accurately) or you can unwind the
call-stack and even distribute the time correctly. That's all doable
today without any special-case hacks.
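The contrast between the two approaches can be illustrated with a toy sketch: instead of rewriting the sample to a single address, an unwinder-style tool records the whole call chain, so both the contention routine and its caller stay visible in the profile. All names below are invented for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DEPTH 8

/* A sample that keeps the full chain instead of one address:
 * chain[0] is the sampled IP, chain[1] its caller, and so on.
 * Nothing is lost: tools can report either the leaf or the caller. */
struct chain_sample {
	uintptr_t chain[MAX_DEPTH];
	size_t depth;
};

/* Copy an (already-unwound, hypothetical) frame list into the sample.
 * A real implementation would walk the stack with the unwinder here. */
size_t record_chain(const uintptr_t *frames, size_t nframes,
		    struct chain_sample *s)
{
	size_t i, n = nframes < MAX_DEPTH ? nframes : MAX_DEPTH;

	for (i = 0; i < n; i++)
		s->chain[i] = frames[i];
	s->depth = n;
	return n;
}
```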
--david
On 9/19/05, Robin Holt <holt@sgi.com> wrote:
> On Sun, Sep 18, 2005 at 06:18:20PM -0700, David Mosberger-Tang wrote:
> > Well, it's an example where attributing the spinlock contention time
> > to the caller would have completely obfuscated the problem.
>
> Either way, we have obfuscation. In the one case (attributing to caller),
> the obfuscation can be resolved by looking at the code. In the other
> (multiple paths contending on independent locks), the obfuscation can
> only be resolved by repeating the test with different sampling.
>
> Although that sounds simple, what if it is a difficult-to-execute test?
> What if this appeared to be a one-time aberration that was captured during
> one of many iterations? The chance to capture it is gone.
>
> For a more complete illustration, I would like to elaborate my previous
> example. I had a sample file produced by our benchmarkers. They had
> received the results on their third run after tweaking some app settings
> and the results were nearly impossible to believe. This happened to be
> an MPI job where all ranks barrier at the end of a phase so one single
> rank being slow results in the entire application being slow.
>
> After the third run, they repeated with the app settings from the
> second run and then repeated again with the settings from the third
> run. Neither run showed any signs of a similar problem. The customer
> acceptance test continued. Before the customer would accept the results,
> they needed that anomaly explained.
>
> Fortunately, the customer had required a sampling output from every
> run so data had been taken using perfmon and retained. This was on a
> 2.4-based system. The system had eight Ethernet adapters spread across
> the machine. Interrupts for each were targeted to different CPUs.
>
> Because sampling was showing the caller, this turned into a simple
> question: why was there so much network receive activity? On some of
> the CPUs, we noticed a significant number of processes were trying to
> enqueue network packets at the same time. The sample IP showed we were
> in a bundle after a spinlock was acquired.
>
> Had we not provided the caller, we would have been left with something
> that was relatively impossible to diagnose definitively. With the unroll,
> it became a simple matter of looking at the enabled network services and
> finding somebody had run a network benchmark using all eight network
> adapters. We contacted the group responsible for network benchmarks
> and the problem was isolated and explained to the customer's satisfaction.
>
> I hope this illustrates that one way of sampling makes it slightly more
> difficult to determine that the source of slowdown is contention on
> a lock where the other way of sampling results in it being impossible
> to determine the source of a problem. Given the choices, I would say
> the right way to do the sampling is to not attribute the samples to
> the caller.
>
> Thanks,
> Robin
>
--
Mosberger Consulting LLC, voice/fax: 510-744-9372,
http://www.mosberger-consulting.com/
35706 Runckel Lane, Fremont, CA 94536
end of thread
Thread overview: 17+ messages
2005-09-14 22:26 Attribute spinlock contention ticks to caller Robin Holt
2005-09-15 0:10 ` Keith Owens
2005-09-15 6:34 ` Stephane Eranian
2005-09-15 8:19 ` Stephane Eranian
2005-09-15 17:14 ` Robin Holt
2005-09-15 17:23 ` Robin Holt
2005-09-15 17:37 ` Luck, Tony
2005-09-15 22:29 ` Robin Holt
2005-09-15 22:54 ` Zou Nan hai
2005-09-16 9:37 ` Stephane Eranian
2005-09-16 22:29 ` Robin Holt
2005-09-17 1:08 ` David Mosberger-Tang
2005-09-18 23:06 ` Robin Holt
2005-09-19 1:18 ` David Mosberger-Tang
2005-09-19 8:35 ` Stephane Eranian
2005-09-19 15:17 ` Robin Holt
2005-09-19 17:52 ` David Mosberger-Tang