From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754380Ab0A2XY0 (ORCPT ); Fri, 29 Jan 2010 18:24:26 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1753575Ab0A2XY0 (ORCPT ); Fri, 29 Jan 2010 18:24:26 -0500 Received: from mail-ew0-f219.google.com ([209.85.219.219]:46761 "EHLO mail-ew0-f219.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753465Ab0A2XYZ (ORCPT ); Fri, 29 Jan 2010 18:24:25 -0500 DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=date:from:to:cc:subject:message-id:mime-version:content-type :content-disposition:user-agent; b=ptCzlqCr9QhQSfn4XX1+E3n2iySkhnVJAXo7WkeRgFAscT6Ww77+8sc7WYzH0C0nAg 0Z07b7hybIIb4y/788n+rfkzv5pKGcEwmx2zp9oVa+A/veYCulno8AxNhAVWOuTAdjxr +er6l+oex75hP8DJBQiTPym0WUVgWCmmL5YAE= Date: Sat, 30 Jan 2010 00:17:24 +0100 From: Frederic Weisbecker To: Hitoshi Mitake , Ingo Molnar , Peter Zijlstra , Paul Mackerras , Arnaldo Carvalho de Melo , Thomas Gleixner , "Paul E. McKenney" Cc: LKML Subject: Lock dependency based tree report in perf lock Message-ID: <20100129231723.GB5052@nowhere> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.18 (2008-05-17) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi, Looking at the report layout we have with perf lock, it gives a cool overview of per lock waittime, acquire time, by avg/max, that's cool. Now to complete the overview through another dimension we could have another mode on top of a lock dependency based tree: -- 125 ms -- lock1 | -- 100 ms -- lock2 | | | -- 99 ms -- lock 3 | | | -- 16 us -- lock 4 | | | ----- [...] | -- 25 ms -- lock 5 Having such view can tell you which lock is delaying another one. We could have this view by average or maximum of acquired time (waittime here would be pointless). And we can also report the outer locking noise time for these reports. Taking the above example: -- 100 ms/84 us -- lock2 | -- 99 ms -- lock 3 | -- 16 us -- lock 4 | ----- [...] 100 ms is the time lock2 has been acquired. Lock 3 has been acquired 99 ms and lock 4 during 16 us, which means the outer locking noise (the code executed outside children locks) is of 84 us. This noise can give a good rough glance of the influence childs locks can have on a parent. This is somehow the child weight, this all can even be expressed using percentages, we could even reproduce the absolute/relative percentage we currently have with the callchains in perf report (-g graph for absolute, -g fractal for relative). Or we can have a ghost child for each locks that represent the outer child locks noise: -- parent percentage -- lock2 | -- 99 % -- lock 3 | -- 0.84 % -- noise // could have its own color for quick distinguishing | -- 0.16 % -- lock 4 | ----- [...] May be just one tricky thing. Say we have lock3 and lock4 that depend on lock2, it doesn't always means lock4 will always be taken after lock3, it doesn't even mean lock4 will ever be taken after lock3. So we shouldn't have one branch per dependency but one branch per practical scenario encountered. So imagine we have the following dependency: lock2 | --- lock3 | --- lock4 | --- lock5 But we can actually have only the following scenarios: lock2 | -- lock3 | -- lock4 lock2 | -- lock4 lock2 | -- lock5 Then we need three branches: -- lock2 percentage -- lock2 | -- 60 % | | | -- 90 % -- lock 3 | | | -- 9 % -- noise | | | -- 1 % -- lock 4 | -- 20 % | | | -- 99 % -- lock4 | | | -- 1 % -- noise | -- 20 % | -- 14 % -- lock5 | -- 86 % -- noise Hopefully we could have such expandable tree using ncurses, but we can probably start like callchains in perf report. There might also be a problem with the accuracy. The max time lock2 is acquired won't always match the path where lock3 and lock4 had their max time too, but this should give, most of the time, a view close to the reality, especially once we enter non-sleepable lock branches. Anyway, that's just an idea, not trivial I must admit.