From: Victoria Dye <vdye@github.com>
To: Patrick Steinhardt <ps@pks.im>,
Victoria Dye via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 0/4] Performance improvement & cleanup in loose ref iteration
Date: Mon, 9 Oct 2023 14:49:14 -0700
Message-ID: <28ae03f5-7091-d3f3-8a70-56aba6639640@github.com>
In-Reply-To: <ZSPQI2gkLOSdNWLu@tanuki>

Patrick Steinhardt wrote:
> On Fri, Oct 06, 2023 at 06:09:25PM +0000, Victoria Dye via GitGitGadget wrote:
>> While investigating ref iteration performance in builtins like
>> 'for-each-ref' and 'show-ref', I found two small improvement opportunities.
>>
>> The first patch tweaks the logic around prefix matching in
>> 'cache_ref_iterator_advance' so that we correctly skip refs that do not
>> actually match a given prefix. The unnecessary iteration doesn't seem to be
>> causing any bugs in the ref iteration commands that I've tested, but it
>> doesn't hurt to be more precise (and it helps with some other patches I'm
>> working on ;) ).
>>
>> The next three patches update how 'loose_fill_ref_dir' determines the type
>> of ref cache entry to create (directory or regular). On platforms that
>> include d_type information in 'struct dirent' (as far as I can tell, all
>> except NonStop & certain versions of Cygwin), this allows us to skip calling
>> 'stat'. In ad-hoc testing, this improved performance of 'git for-each-ref'
>> by about 20%.
>
> I've done a small set of benchmarks with my usual test repositories,
> which is linux.git with a bunch of references added. The repository
> comes in four sizes:
>
> - small: 50k references
> - medium: 500k references
> - high: 1.1m references
> - huge: 12m references
>
> Unfortunately, I couldn't really reproduce the performance improvements.
> In fact, the new version runs consistently a tiny bit slower than the
> old version:
>
> # Old version, which is 3a06386e31 (The fifteenth batch, 2023-10-04).
>
> Benchmark 1: git for-each-ref (revision=old,refcount=small)
> Time (mean ± σ): 135.5 ms ± 1.2 ms [User: 76.4 ms, System: 59.0 ms]
> Range (min … max): 134.8 ms … 136.9 ms 3 runs
>
> Benchmark 2: git for-each-ref (revision=old,refcount=medium)
> Time (mean ± σ): 822.7 ms ± 2.2 ms [User: 697.4 ms, System: 125.1 ms]
> Range (min … max): 821.1 ms … 825.2 ms 3 runs
>
> Benchmark 3: git for-each-ref (revision=old,refcount=high)
> Time (mean ± σ): 1.960 s ± 0.015 s [User: 1.702 s, System: 0.257 s]
> Range (min … max): 1.944 s … 1.973 s 3 runs
>
> Benchmark 4: git for-each-ref (revision=old,refcount=huge)
> Time (mean ± σ): 16.815 s ± 0.054 s [User: 15.091 s, System: 1.722 s]
> Range (min … max): 16.760 s … 16.869 s 3 runs
>
> # New version, which is your tip.
>
> Benchmark 5: git for-each-ref (revision=new,refcount=small)
> Time (mean ± σ): 136.0 ms ± 0.2 ms [User: 78.8 ms, System: 57.1 ms]
> Range (min … max): 135.8 ms … 136.2 ms 3 runs
>
> Benchmark 6: git for-each-ref (revision=new,refcount=medium)
> Time (mean ± σ): 830.4 ms ± 21.2 ms [User: 691.3 ms, System: 138.7 ms]
> Range (min … max): 814.2 ms … 854.5 ms 3 runs
>
> Benchmark 7: git for-each-ref (revision=new,refcount=high)
> Time (mean ± σ): 1.966 s ± 0.013 s [User: 1.717 s, System: 0.249 s]
> Range (min … max): 1.952 s … 1.978 s 3 runs
>
> Benchmark 8: git for-each-ref (revision=new,refcount=huge)
> Time (mean ± σ): 16.945 s ± 0.037 s [User: 15.182 s, System: 1.760 s]
> Range (min … max): 16.910 s … 16.983 s 3 runs
>
> Summary
> git for-each-ref (revision=old,refcount=small) ran
> 1.00 ± 0.01 times faster than git for-each-ref (revision=new,refcount=small)
> 6.07 ± 0.06 times faster than git for-each-ref (revision=old,refcount=medium)
> 6.13 ± 0.17 times faster than git for-each-ref (revision=new,refcount=medium)
> 14.46 ± 0.17 times faster than git for-each-ref (revision=old,refcount=high)
> 14.51 ± 0.16 times faster than git for-each-ref (revision=new,refcount=high)
> 124.09 ± 1.15 times faster than git for-each-ref (revision=old,refcount=huge)
> 125.05 ± 1.12 times faster than git for-each-ref (revision=new,refcount=huge)
>
> The performance regression isn't all that concerning, but it makes me
> wonder why I see things becoming slower rather than faster. My guess is
> that this is because all my test repositories are well-packed and don't
> have a lot of loose references. But I just wanted to confirm how you
> benchmarked your change and what the underlying shape of your test repo
> was.

I ran my benchmark on my (Intel) Mac with a test repository (single commit,
one file) containing:

- 10k refs/heads/ references
- 10k refs/tags/ references
- 10k refs/special/ references

All refs in the repository are loose. My Mac has historically been somewhat
slow and inconsistent when it comes to perf testing, though, so I re-ran the
benchmark a bit more formally on an Ubuntu VM (3 warmup iterations followed
by at least 10 iterations per test):

---

Benchmark 1: git for-each-ref (revision=old,refcount=3k)
  Time (mean ± σ): 40.6 ms ± 3.9 ms [User: 13.2 ms, System: 27.1 ms]
  Range (min … max): 37.2 ms … 59.1 ms 76 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 2: git for-each-ref (revision=new,refcount=3k)
  Time (mean ± σ): 38.7 ms ± 4.4 ms [User: 13.8 ms, System: 24.5 ms]
  Range (min … max): 35.1 ms … 57.2 ms 71 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 3: git for-each-ref (revision=old,refcount=30k)
  Time (mean ± σ): 419.4 ms ± 43.9 ms [User: 136.4 ms, System: 274.1 ms]
  Range (min … max): 385.1 ms … 528.7 ms 10 runs

Benchmark 4: git for-each-ref (revision=new,refcount=30k)
  Time (mean ± σ): 390.4 ms ± 27.2 ms [User: 133.1 ms, System: 251.6 ms]
  Range (min … max): 360.3 ms … 447.6 ms 10 runs

Benchmark 5: git for-each-ref (revision=old,refcount=300k)
  Time (mean ± σ): 4.171 s ± 0.052 s [User: 1.400 s, System: 2.715 s]
  Range (min … max): 4.118 s … 4.283 s 10 runs

Benchmark 6: git for-each-ref (revision=new,refcount=300k)
  Time (mean ± σ): 3.939 s ± 0.054 s [User: 1.403 s, System: 2.466 s]
  Range (min … max): 3.858 s … 4.026 s 10 runs

Summary
  'git for-each-ref (revision=new,refcount=3k)' ran
    1.05 ± 0.16 times faster than 'git for-each-ref (revision=old,refcount=3k)'
    10.08 ± 1.34 times faster than 'git for-each-ref (revision=new,refcount=30k)'
    10.83 ± 1.67 times faster than 'git for-each-ref (revision=old,refcount=30k)'
    101.68 ± 11.63 times faster than 'git for-each-ref (revision=new,refcount=300k)'
    107.67 ± 12.30 times faster than 'git for-each-ref (revision=old,refcount=300k)'

---
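
As a rough check on these numbers, the relative change in mean runtime for
each ref count can be worked out as sketched below. This is only an
illustration: the helper name `improvement_pct` is made up, the inputs are
the mean values copied from the output above, and it ignores the reported
standard deviations entirely.

```python
# Mean wall-clock times (old, new) in seconds, copied from the hyperfine
# output above. The dictionary layout is an assumption for this sketch.
MEANS = {
    "3k": (0.0406, 0.0387),
    "30k": (0.4194, 0.3904),
    "300k": (4.171, 3.939),
}


def improvement_pct(old: float, new: float) -> float:
    """Return the percentage reduction in mean runtime from old to new."""
    return (old - new) / old * 100.0


if __name__ == "__main__":
    for refcount, (old, new) in MEANS.items():
        print(f"{refcount:>5}: {improvement_pct(old, new):.1f}% less time")
```
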
So it's not the 20% speedup I saw on my local test repo (it's more like
5-8%), but there does appear to be a consistent improvement. As for your
results, the changes in this series shouldn't affect packed ref operations,
and the difference between old & new doesn't seem to indicate a regression.
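
For anyone who wants to see the general idea behind the 'stat' avoidance
outside of git, here is a minimal sketch in Python rather than C. It is not
the code from this series: `classify_entries` and `make_fake_refs` are
hypothetical names, and the directory layout is only a stand-in for a loose
ref directory. Python's `os.scandir` exposes the entry type that the
directory read returns (d_type on most Unix platforms), so `is_dir()` usually
does not need a separate stat call for entries that are not symlinks.

```python
import os
import tempfile


def classify_entries(dirpath):
    """Split a directory's entries into (subdirs, files), both sorted."""
    subdirs, files = [], []
    with os.scandir(dirpath) as it:
        for entry in it:
            # For non-symlink entries, is_dir() is normally answered from the
            # type information returned while reading the directory. A stat
            # call is only needed for symlinks or when the filesystem reports
            # an unknown type.
            if entry.is_dir():
                subdirs.append(entry.name)
            else:
                files.append(entry.name)
    return sorted(subdirs), sorted(files)


def make_fake_refs(root):
    """Create a tiny loose-ref-like layout (hypothetical names, dummy contents)."""
    os.makedirs(os.path.join(root, "heads", "topic"))
    for name in ("main", "dev"):
        with open(os.path.join(root, "heads", name), "w") as f:
            f.write("0" * 40 + "\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        make_fake_refs(tmp)
        print(classify_entries(os.path.join(tmp, "heads")))
```

Packed refs live in a single file and are not discovered by walking
directories, which is why a change of this kind would not be expected to
help a well-packed repository.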