All of lore.kernel.org
 help / color / mirror / Atom feed
From: Derrick Stolee <derrickstolee@github.com>
To: Victoria Dye <vdye@github.com>,
	Victoria Dye via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org
Cc: gitster@pobox.com, phillip.wood123@gmail.com, jonathantanmy@google.com
Subject: Re: [PATCH 0/5] Skip 'cache_tree_update()' when 'prime_cache_tree()' is called immediate after
Date: Thu, 10 Nov 2022 09:44:36 -0500	[thread overview]
Message-ID: <acc2a6d9-16aa-2576-d9cb-ca75fd94a2fa@github.com> (raw)
In-Reply-To: <99c1e5e0-d5cd-cf0e-25ba-31bc96a089c6@github.com>

On 11/9/2022 5:18 PM, Victoria Dye wrote:
> Derrick Stolee wrote:
>> On 11/8/2022 5:44 PM, Victoria Dye via GitGitGadget wrote:
>>> Following up on a discussion [1] around cache tree refreshes in 'git reset',
>>> this series updates callers of 'unpack_trees()' to skip its internal
>>> invocation of 'cache_tree_update()' when 'prime_cache_tree()' is called
>>> immediately after 'unpack_trees()'. 'cache_tree_update()' can be an
>>> expensive operation, and it is redundant when 'prime_cache_tree()' clears
>>> and rebuilds the cache tree from scratch immediately after.
>>>
>>> The first patch adds a test directly comparing the execution time of
>>> 'prime_cache_tree()' with that of 'cache_tree_update()'. The results show
>>> that on a fully-valid cache tree, they perform the same, but on a
>>> fully-invalid cache tree, 'prime_cache_tree()' is multiple times faster
>>> (although both are so fast that the total execution time of 100 invocations
>>> is needed to compare the results in the default perf repo).
>>
>> One thing I found interesting is how you needed 200 iterations to show
>> a meaningful change in this test script, but in the case of 'git reset'
>> we can see sizeable improvements even with a single iteration.
> 
> All of the new performance tests run with multiple iterations: 20 for reset
> (10 iterations of two resets each), 100 for read-tree, 200 for the
> comparison of 'cache_tree_update()' & 'prime_cache_tree()'. Those counts
> were picked mostly by trial-and-error, to strike a balance of "the test
> doesn't take too long to run" and "the change in execution time is clearly
> visible in the results."

Thanks for pointing out my misunderstanding. I missed the repeat counts
because 2-3 seconds "seemed right" based on performance tests of large
monorepos, but clearly that's not right when using the Git repository for
performance tests.
>> Is there something about this test that is artificially speeding up
>> these iterations? Perhaps the index has up-to-date filesystem information
>> that allows these methods to avoid filesystem interactions that are
>> necessary in the 'git reset' case?
> 
> I would expect the "cache_tree_update, invalid" test's execution time, when
> scaled to the iterations of 'read-tree' and 'reset', to match the change in
> timing of those commands, but the command tests are reporting *much* larger
> improvements (e.g., I'd expect a 0.27s improvement in 'git read-tree', but
> the results are *consistently* >=0.9s).
> 
> Per trace2 logs, a single invocation of 'read-tree' matching the one added
> in 'p0006' spent 0.010108s in 'cache_tree_update()'. Over 100 iterations,
> the total time would be ~1.01s, which lines up with the 'p0006' test
> results. However, the trace2 results for "test-tool cache-tree --count 3
> --fresh --update" show the first iteration taking 0.013060s (looks good),
> then the next taking 0.003755s, then 0.004026s (_much_ faster than
> expected).
> 
> To be honest, I can't figure out what's going on there. It might be some
> kind of runtime/memory optimization with repeatedly rebuilding the same
> cache tree (doesn't seem to be compiler optimization, since the speedup
> still happens with '-O0'). The only sure-fire way to avoid it seems to be
> moving the iteration outside of 'test-cache-tree.c' and into 'p0090'.
> Unfortunately, the command initialization overhead *really* slows things
> down, but I can add a "control" test (with no cache tree refresh) to
> quantify how long that initialization takes.

Getting unit-level performance tests is always tricky. Sometimes the best
way to do it is to collect a sample using GIT_TRACE2_PERF and then manually
collect the region times. It could be a fun project to integrate region
measurements into the performance test suite instead of only end-to-end
timings.
 
> While looking into this, I found a few other things I'd like to add to/fix
> in that test (add a "partially-invalidated" cache tree case, use the
> original cache tree OID in 'prime_cache_tree()' rather than the OID at
> HEAD), so I'll re-roll with those + the updated iteration logic.

Taking a look now. Thanks!

-Stolee

  reply	other threads:[~2022-11-10 14:44 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-08 22:44 [PATCH 0/5] Skip 'cache_tree_update()' when 'prime_cache_tree()' is called immediate after Victoria Dye via GitGitGadget
2022-11-08 22:44 ` [PATCH 1/5] cache-tree: add perf test comparing update and prime Victoria Dye via GitGitGadget
2022-11-10  7:23   ` SZEDER Gábor
2022-11-08 22:44 ` [PATCH 2/5] unpack-trees: add 'skip_cache_tree_update' option Victoria Dye via GitGitGadget
2022-11-08 22:44 ` [PATCH 3/5] reset: use " Victoria Dye via GitGitGadget
2022-11-08 22:44 ` [PATCH 4/5] read-tree: " Victoria Dye via GitGitGadget
2022-11-08 22:44 ` [PATCH 5/5] rebase: " Victoria Dye via GitGitGadget
2022-11-09 15:23 ` [PATCH 0/5] Skip 'cache_tree_update()' when 'prime_cache_tree()' is called immediate after Derrick Stolee
2022-11-09 22:18   ` Victoria Dye
2022-11-10 14:44     ` Derrick Stolee [this message]
2022-11-09 23:01 ` Taylor Blau
2022-11-10  1:57 ` [PATCH v2 " Victoria Dye via GitGitGadget
2022-11-10  1:57   ` [PATCH v2 1/5] cache-tree: add perf test comparing update and prime Victoria Dye via GitGitGadget
2022-11-10  1:57   ` [PATCH v2 2/5] unpack-trees: add 'skip_cache_tree_update' option Victoria Dye via GitGitGadget
2022-11-10  1:57   ` [PATCH v2 3/5] reset: use " Victoria Dye via GitGitGadget
2022-11-10  1:57   ` [PATCH v2 4/5] read-tree: " Victoria Dye via GitGitGadget
2022-11-10  1:57   ` [PATCH v2 5/5] rebase: " Victoria Dye via GitGitGadget
2022-11-10 14:40     ` Phillip Wood
2022-11-10 18:19       ` Victoria Dye
2022-11-10  2:12   ` [PATCH v2 0/5] Skip 'cache_tree_update()' when 'prime_cache_tree()' is called immediate after Taylor Blau
2022-11-10 17:26   ` Derrick Stolee
2022-11-10 19:06   ` [PATCH v3 " Victoria Dye via GitGitGadget
2022-11-10 19:06     ` [PATCH v3 1/5] cache-tree: add perf test comparing update and prime Victoria Dye via GitGitGadget
2022-11-10 19:06     ` [PATCH v3 2/5] unpack-trees: add 'skip_cache_tree_update' option Victoria Dye via GitGitGadget
2022-11-10 19:06     ` [PATCH v3 3/5] reset: use " Victoria Dye via GitGitGadget
2022-11-10 19:06     ` [PATCH v3 4/5] read-tree: " Victoria Dye via GitGitGadget
2022-11-10 19:06     ` [PATCH v3 5/5] rebase: " Victoria Dye via GitGitGadget
2022-11-10 19:50     ` [PATCH v3 0/5] Skip 'cache_tree_update()' when 'prime_cache_tree()' is called immediate after SZEDER Gábor
2022-11-10 20:54       ` Victoria Dye
2022-11-11  2:50     ` Taylor Blau
2022-11-14  0:08       ` Derrick Stolee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=acc2a6d9-16aa-2576-d9cb-ca75fd94a2fa@github.com \
    --to=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=jonathantanmy@google.com \
    --cc=phillip.wood123@gmail.com \
    --cc=vdye@github.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.