From mboxrd@z Thu Jan 1 00:00:00 1970
From: David Mosberger-Tang
Date: Mon, 19 Sep 2005 17:52:11 +0000
Subject: Re: Attribute spinlock contention ticks to caller.
Message-Id:
List-Id:
References: <20050914222644.GA5036@lnx-holt.americas.sgi.com>
In-Reply-To: <20050914222644.GA5036@lnx-holt.americas.sgi.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: linux-ia64@vger.kernel.org

And as Stephane already explained, if you use the right tool, there is
no need for the hack that you suggest. You can either use a
q-syscollect-like approach (which will give you call-counts, but not
necessarily distribute the time accurately) or you can unwind the
call-stack and even distribute the time correctly. That's all doable
today without any special-case hacks.

--david

On 9/19/05, Robin Holt wrote:
> On Sun, Sep 18, 2005 at 06:18:20PM -0700, David Mosberger-Tang wrote:
> > Well, it's an example where attributing the spinlock contention time
> > to the caller would have completely obfuscated the problem.
>
> Either way, we have obfuscation. In the one case (attributing to caller),
> the obfuscation can be resolved by looking at the code. In the other
> (multiple paths contending on independent locks), the obfuscation can
> only be resolved by repeating the test with different sampling.
>
> Although that sounds simple, what if it is a difficult to execute test.
> What if this appeared to be a one-time aberration that was captured during
> one of many iterations. The chance to capture is gone.
>
> For a more complete illustration, I would like to elaborate my previous
> example. I had a sample file produced by our benchmarkers. They had
> received the results on their third run after tweaking some app settings
> and the results were nearly impossible to believe. This happened to be
> an MPI job where all ranks barrier at the end of a phase so one single
> rank being slow results in the entire application being slow.
>
> After the third run, they repeated with the app settings from the
> second run and then repeated again with the settings from the third
> run. Neither run showed any signs of a similar problem. The customer
> acceptance test continued. Before the customer would accept the results,
> they needed that anomaly explained.
>
> Fortunately, the customer had required a sampling output from every
> run so data had been taken using perfmon and retained. This was on a
> 2.4 based system. The system had eight Ethernet adapters spread across
> the machine. Interrupts for each were targeted to different cpus.
>
> Because sampling was showing the caller, this turned into a simple
> question, why was there so much network receive activity. On some of
> the cpus, we noticed a significant number of processes were trying to
> en-queue network packets at the same time. The sample IP showed we were
> in a bundle after a spinlock was acquired.
>
> Had we not provided the caller, we would have been left with something
> that was relatively impossible to diagnose definitively. With the unroll,
> it became a simple matter of looking at the enabled network services and
> finding somebody had run a network benchmark using all eight network
> adapters. We contacted the group responsible for network benchmarks
> and the problem was isolated and explained to the customers satisfaction.
>
> I hope this illustrates that one way of sampling makes it slightly more
> difficult to determine that the source of slowdown is contention on
> a lock where the other way of sampling results in it being impossible
> to determine the source of a problem. Given the choices, I would say
> the right way to do the sampling is to not attribute the samples to
> the caller.
>
> Thanks,
> Robin
>

--
Mosberger Consulting LLC, voice/fax: 510-744-9372,
http://www.mosberger-consulting.com/
35706 Runckel Lane, Fremont, CA 94536