Re: I made a flame graph renderer for git's trace2 output

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Jeff Hostetler <git@jeffhostetler.com>
Cc: Jeff King <peff@peff.net>, Git Mailing List <git@vger.kernel.org>,
	Derrick Stolee <stolee@gmail.com>,
	Junio C Hamano <gitster@pobox.com>,
	Josh Steadmon <steadmon@google.com>,
	Johannes Schindelin <Johannes.Schindelin@gmx.de>
Subject: Re: I made a flame graph renderer for git's trace2 output
Date: Tue, 21 May 2019 16:19:09 +0200	[thread overview]
Message-ID: <877eajvqjm.fsf@evledraar.gmail.com> (raw)
In-Reply-To: <d1f9a2f4-500d-d5f6-af3d-75d3fdb1323b@jeffhostetler.com>


On Mon, May 20 2019, Jeff Hostetler wrote:

> On 5/10/2019 5:57 PM, Ævar Arnfjörð Bjarmason wrote:
>>
>> On Fri, May 10 2019, Jeff King wrote:
>>
>>> On Fri, May 10, 2019 at 05:09:58PM +0200, Ævar Arnfjörð Bjarmason wrote:
>>>
>>>> As noted in TODOs in the script there's various stuff I'd like to do
>>>> better, and this also shows how we need a lot more trace regions to get
>>>> granular data.
>>>
>>> Hmm. My gut reaction was: doesn't "perf record -g make test" already
>>> give us that granular data? I know "perf" isn't available everywhere,
>>> but the idea of the FlameGraph repo is that it takes input from a lot of
>>> sources (though I don't know if it supports any Windows-specific formats
>>> yet, which is presumably a point of interesting to trace-2 authors).
>>>
>>> But having generated such a flamegraph, it's not all that helpful. It
>>> mainly tells us that we spend a lot of time on fork/exec. Which is no
>>> surprise, since the test suite is geared not towards heavy workloads,
>>> but lots of tiny functionality tests.
>>>
>>> TBH, I'm not sure that flame-graphing the test suite is going to be all
>>> that useful in the long run. It's going to be heavily weighted by the
>>> types of things the test suite does. Flamegraphs are good for
>>> understanding where your time is going for a particular workload, but
>>> the workload of the test suite is not that interesting.
>>>
>>> And once you do have a particular workload of interest that you can
>>> replay, then I think the granular "perf" results really can be helpful.
>>>
>>> I think the trace2 flamegraph would be most useful if you were
>>> collecting across a broad spectrum of workloads done by a user. You
>>> _can_ do that with perf or similar tools, but it can be a bit awkward.
>>> I do wonder how painful it would be to alias "git" to "perf record git"
>>> for a day or something.
>>
>> Yeah I should have mentioned that I'm mainly linking to the test suite
>> rendering as a demo.
>>
>> My actual use-case for this is to see what production nodes are spending
>> their time on, similar to what Microsoft is doing with their use of this
>> facility.
>>
>> The test suite serves as a really good test-case for the output, and to
>> stress-test my aggregation script, since we're pretty much guaranteed to
>> run all our commands, and cover a lot of unusual cases.
>>
>> It also shows that we've got a long way to go in improving the trace2
>> facility, i.e. adding region enter/leave for some of the things we spend
>> the most time on.
>>
>
> Very nice!
>
> Yes, there is more work to do to add more regions to get more
> granular data for interesting/problematic things.  My primary
> goal in this phase has been to get the basic machinery in place
> and be vetted with some universally interesting regions, such as
> reading/writing the index and the phases of status.
>
> Going forward, we can trivially (permanently) add new regions as we
> want.  I tend to use temporary "experimental" regions during my perf
> investigations so that I don't clutter up the mainline source with
> uninteresting noise.

Indeed, a lot more regions are needed.

> WRT the TODO's in your script:
>
> [] I don't think data events will be useful for your usage.  The data
> values are orthogonal to the time values.

I haven't done this, so I'm not asserting that it's useful, but from
some brief grepping a few datapoints are overwhelmingly common, and can
be faked up into regions of sorts for the purposes of a flamegraph.

E.g. "git checkout" will reliably have read/version early on, and then
write/version, in that case mostly/entirely redundant to the
do_{read,write}_index region, but in general I think we'll be able to
loosely plot data points say as "given the median runtime, here's the
median % of time into the command we first encounter this data point".

> [] I would add the child_start/_exit events to the stack.  This will
> give you the names of non-builtin/shell commands and hooks.  The
> various "child_class" and "use_shell" and "hook_name" fields will help
> you avoid duplicate stack frames (which you'll get for builtin
> commands).

Yeah that's very useful. Any reason not to do something like this:

    diff --git a/git.c b/git.c
    index 1bf9c94550..6c926ae013 100644
    --- a/git.c
    +++ b/git.c
    @@ -698 +698 @@ static void execv_dashed_external(const char **argv)
    -       trace2_cmd_name("_run_dashed_");
    +       trace2_cmd_name(is_builtin(argv[0]) ? argv[0] : "_run_dashed_");

I haven't tested, but we e.g. report 'git-submodule' as just
'_run_dashed_', seems we could do better...

next prev parent reply	other threads:[~2019-05-21 14:19 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-10 15:09 I made a flame graph renderer for git's trace2 output Ævar Arnfjörð Bjarmason
2019-05-10 16:38 ` Derrick Stolee
2019-05-10 17:00   ` SZEDER Gábor
2019-05-20 18:49     ` Jeff Hostetler
2019-05-10 21:03 ` Jeff King
2019-05-10 21:57   ` Ævar Arnfjörð Bjarmason
2019-05-20 18:22     ` Jeff Hostetler
2019-05-21 14:19       ` Ævar Arnfjörð Bjarmason [this message]
2019-05-21 20:46         ` Jeff Hostetler

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=877eajvqjm.fsf@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=git@jeffhostetler.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    --cc=steadmon@google.com \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.