From mboxrd@z Thu Jan  1 00:00:00 1970
From: David Mosberger-Tang <David.Mosberger@acm.org>
Date: Mon, 19 Sep 2005 17:52:11 +0000
Subject: Re: Attribute spinlock contention ticks to caller.
Message-Id: <ed5aea4305091910526c0e475@mail.gmail.com>
List-Id: <linux-ia64.vger.kernel.org>
References: <20050914222644.GA5036@lnx-holt.americas.sgi.com>
In-Reply-To: <20050914222644.GA5036@lnx-holt.americas.sgi.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: linux-ia64@vger.kernel.org

And as Stephane already explained, if you use the right tool, there is
no need for the hack that you suggest.  You can either use a
q-syscollect-like approach (which will give you call-counts, but not
necessarily distribute the time accurately) or you can unwind the
call-stack and even distribute the time correctly.  That's all doable
today without any special-case hacks.
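
As a rough illustration of the second approach (a sketch only, not code
from q-syscollect or any other existing tool): if the sampler records the
unwound call-stack with every tick, then both the flat view and the
caller-attributed view can be derived from the same data afterwards.  The
function names, the sample stacks, and the name of the contention routine
below are all hypothetical.

```python
from collections import Counter

# Hypothetical name of the out-of-line spinlock contention routine.
CONTENTION_FUNCS = {"spin_contend"}

# Made-up samples: one call stack per tick, outermost frame first.
SAMPLES = [
    ("rx_interrupt", "enqueue_packet", "spin_contend"),
    ("rx_interrupt", "enqueue_packet", "spin_contend"),
    ("sys_write", "update_inode", "spin_contend"),
    ("sys_write", "update_inode"),
]

def flat_profile(samples):
    """Charge each tick to the innermost frame (what a plain IP sampler sees)."""
    return Counter(stack[-1] for stack in samples)

def caller_profile(samples, skip=CONTENTION_FUNCS):
    """Charge each tick to the innermost frame that is not a contention routine."""
    profile = Counter()
    for stack in samples:
        # Walk from the innermost frame outwards; fall back to the
        # innermost frame if the whole stack consists of skipped routines.
        frame = next((f for f in reversed(stack) if f not in skip), stack[-1])
        profile[frame] += 1
    return profile

def inclusive_profile(samples):
    """Charge each tick once to every distinct function on the stack."""
    profile = Counter()
    for stack in samples:
        profile.update(set(stack))
    return profile

if __name__ == "__main__":
    print("flat:     ", dict(flat_profile(SAMPLES)))
    print("caller:   ", dict(caller_profile(SAMPLES)))
    print("inclusive:", dict(inclusive_profile(SAMPLES)))
```

With stacks like these, the flat view shows how much time went to lock
contention overall, while the caller view shows which code paths were
contending, and neither view requires changing where the kernel reports
the sample.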

  --david

On 9/19/05, Robin Holt <holt@sgi.com> wrote:
> On Sun, Sep 18, 2005 at 06:18:20PM -0700, David Mosberger-Tang wrote:
> > Well, it's an example where attributing the spinlock contention time
> > to the caller would have completely obfuscated the problem.
> 
> Either way, we have obfuscation.  In the one case (attributing to caller),
> the obfuscation can be resolved by looking at the code.  In the other
> (multiple paths contending on independent locks), the obfuscation can
> only be resolved by repeating the test with different sampling.
> 
> Although that sounds simple, what if the test is difficult to execute?
> What if this appeared to be a one-time aberration that was captured during
> one of many iterations?  Then the chance to capture it again is gone.
> 
> For a more complete illustration, I would like to elaborate on my previous
> example.  I had a sample file produced by our benchmarkers.  They had
> received the results on their third run after tweaking some app settings
> and the results were nearly impossible to believe.  This happened to be
> an MPI job where all ranks barrier at the end of a phase so one single
> rank being slow results in the entire application being slow.
> 
> After the third run, they repeated with the app settings from the
> second run and then repeated again with the settings from the third
> run.  Neither run showed any signs of a similar problem.  The customer
> acceptance test continued.  Before the customer would accept the results,
> they needed that anomaly explained.
> 
> Fortunately, the customer had required a sampling output from every
> run, so data had been taken using perfmon and retained.  This was on a
> 2.4-based system.  The system had eight Ethernet adapters spread across
> the machine.  Interrupts for each were targeted to different cpus.
> 
> Because sampling was showing the caller, this turned into a simple
> question: why was there so much network receive activity?  On some of
> the cpus, we noticed a significant number of processes were trying to
> enqueue network packets at the same time.  The sample IP showed we were
> in a bundle after a spinlock was acquired.
> 
> Had we not provided the caller, we would have been left with something
> that was nearly impossible to diagnose definitively.  With the unroll,
> it became a simple matter of looking at the enabled network services and
> finding that somebody had run a network benchmark using all eight network
> adapters.  We contacted the group responsible for network benchmarks
> and the problem was isolated and explained to the customer's satisfaction.
> 
> I hope this illustrates that one way of sampling makes it slightly more
> difficult to determine that the source of slowdown is contention on
> a lock, whereas the other way of sampling makes it impossible
> to determine the source of a problem.  Given the choices, I would say
> the right way to do the sampling is to attribute the samples to
> the caller.
> 
> Thanks,
> Robin
> 


-- 
Mosberger Consulting LLC, voice/fax: 510-744-9372,
http://www.mosberger-consulting.com/
35706 Runckel Lane, Fremont, CA 94536