From: Derrick Stolee <stolee@gmail.com>
To: Lidong Yan <yldhome2d2@gmail.com>
Cc: 502024330056@smail.nju.edu.cn, git@vger.kernel.org,
gitster@pobox.com, toon@iotcl.com
Subject: Re: [PATCH v5 0/4] bloom: enable bloom filter optimization for multiple pathspec elements in revision traversal
Date: Thu, 10 Jul 2025 09:49:50 -0400 [thread overview]
Message-ID: <afb68948-218b-4b56-9faa-29578ef9c73c@gmail.com> (raw)
In-Reply-To: <20250710084829.2171855-1-502024330056@smail.nju.edu.cn>
On 7/10/2025 4:48 AM, Lidong Yan wrote:
> The difference from v4 is:
> - for the bloom_key_* functions, we now pass struct bloom_key *
> as the first parameter.
> - bloom_keyvec_fill_key() and bloom_keyvec_new() have been merged
> into a single function, so that each key is filled during the
> creation of the bloom_keyvec.
>
> Below is a comparison of the time taken to run git log on Git and
> LLVM repositories before and after applying this patch.
>
> Setup commit-graph:
> $ cd ~/src/git && git commit-graph write --split --reachable --changed-paths
> $ cd ~/src/llvm && git commit-graph write --split --reachable --changed-paths
>
> Run git log on Git repository
> $ cd ~/src/git
> $ hash -p /usr/bin/git git # my system git binary is 2.43.0
> $ time git log -100 -- commit.c commit-graph.c >/dev/null
> real 0m0.055s
> user 0m0.040s
> sys 0m0.015s
> $ hash -p ~/bin/git/bin/git git
> $ time git log -100 -- commit.c commit-graph.c >/dev/null
> real 0m0.039s
> user 0m0.020s
> sys 0m0.020s
>
> Run git log in LLVM repository
> $ cd ~/src/llvm
> $ hash -p /usr/bin/git git # my system git binary is 2.43.0
> $ time git log -100 -- llvm/lib/Support/CommandLine.cpp llvm/lib/Support/CommandLine.h >/dev/null
> real 0m2.365s
> user 0m2.252s
> sys 0m0.113s
> $ hash -p ~/bin/git/bin/git git
> $ time git log -100 -- llvm/lib/Support/CommandLine.cpp llvm/lib/Support/CommandLine.h >/dev/null
> real 0m0.240s
> user 0m0.158s
> sys 0m0.064s
Thanks for these stats, though I'd recommend trying hyperfine [1]
which with this setup would let you run these experiments as the
following:
$ $ hyperfine --warmup=3 \
> -n 'old' '~/_git/git-sparse-checkout-clean/git log -100 -- commit.c commit-graph.c' \
> -n 'new' '~/_git/git/git log -100 -- commit.c commit-graph.c'
Benchmark 1: old
Time (mean ± σ): 73.1 ms ± 2.9 ms [User: 48.8 ms, System: 23.9 ms]
Range (min … max): 69.9 ms … 84.5 ms 42 runs
Benchmark 2: new
Time (mean ± σ): 55.1 ms ± 2.9 ms [User: 30.5 ms, System: 24.4 ms]
Range (min … max): 51.1 ms … 61.2 ms 52 runs
Summary
'new' ran
1.33 ± 0.09 times faster than 'old'
And for LLVM:
$ hyperfine --warmup=3 \
> -n 'old' '~/_git/git-sparse-checkout-clean/git log -100 -- llvm/lib/Support/CommandLine.cpp llvm/lib/Support/CommandLine.h' \
> -n 'new' '~/_git/git/git log -100 -- llvm/lib/Support/CommandLine.cpp llvm/lib/Support/CommandLine.h'
Benchmark 1: old
Time (mean ± σ): 1.974 s ± 0.006 s [User: 1.877 s, System: 0.097 s]
Range (min … max): 1.960 s … 1.983 s 10 runs
Benchmark 2: new
Time (mean ± σ): 262.9 ms ± 2.4 ms [User: 214.2 ms, System: 48.4 ms]
Range (min … max): 257.7 ms … 266.2 ms 11 runs
Summary
'new' ran
7.51 ± 0.07 times faster than 'old'
[1] https://github.com/sharkdp/hyperfine
Finally, putting these performance numbers in the commit message will
make the results permanently findable in the repo history.
> 3: c042907b92 ! 3: 60a3b16bbb bloom: replace struct bloom_key * with struct bloom_keyvec
> -+struct bloom_keyvec *bloom_keyvec_new(size_t count)
> ++struct bloom_keyvec *bloom_keyvec_new(const char *path, size_t len,
> ++ const struct bloom_filter_settings *settings)
> +{
> + struct bloom_keyvec *vec;
> -+ size_t sz = sizeof(struct bloom_keyvec);
> -+ sz += count * sizeof(struct bloom_key);
> ++ const char *p;
> ++ size_t sz;
> ++ size_t nr = 1;
> ++
> ++ p = path;
> ++ while (*p) {
> ++ /*
> ++ * At this point, the path is normalized to use Unix-style
> ++ * path separators. This is required due to how the
> ++ * changed-path Bloom filters store the paths.
> ++ */
> ++ if (*p == '/')
> ++ nr++;
> ++ p++;
> ++ }
> ++
> ++ sz = sizeof(struct bloom_keyvec);
> ++ sz += nr * sizeof(struct bloom_key);
> + vec = (struct bloom_keyvec *)xcalloc(1, sz);
> -+ vec->count = count;
> ++ if (!vec)
> ++ return NULL;
> ++ vec->count = nr;
> ++
> ++ bloom_key_fill(&vec->key[0], path, len, settings);
> ++ nr = 1;
> ++ p = path + len - 1;
> ++ while (p > path) {
> ++ if (*p == '/') {
> ++ bloom_key_fill(&vec->key[nr++], path, p - path, settings);
> ++ }
> ++ p--;
> ++ }
> ++ assert(nr == vec->count);
> + return vec;
> +}
These additions to bloom_keyvec_new() certainly help simplify some
of the code movement I was talking about, but there is more that
can be done to simplify patch 4. I'll send a couple example patches
in reply to patch 4 with what I mean.
Overall, the patch series is correct. My complaints are stylistic
more than anything.
Thanks,
-Stolee
next prev parent reply other threads:[~2025-07-10 13:49 UTC|newest]
Thread overview: 72+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-06-25 12:55 [PATCH 0/2] bloom: use bloom filter given multiple pathspec Lidong Yan
2025-06-25 12:55 ` [PATCH 1/2] bloom: replace struct bloom_key * with struct bloom_keyvec Lidong Yan
2025-06-25 17:43 ` Junio C Hamano
2025-06-26 3:44 ` Lidong Yan
2025-06-25 12:55 ` [PATCH 2/2] bloom: enable multiple pathspec bloom keys Lidong Yan
2025-06-27 13:50 ` Junio C Hamano
2025-06-27 14:24 ` Lidong Yan
2025-06-27 18:09 ` Junio C Hamano
2025-07-01 5:52 ` Lidong Yan
2025-07-01 15:19 ` Junio C Hamano
2025-07-02 7:14 ` Lidong Yan
2025-07-02 15:48 ` Junio C Hamano
2025-07-03 1:52 ` Lidong Yan
2025-07-04 12:09 ` Lidong Yan
2025-07-01 8:50 ` SZEDER Gábor
2025-07-01 11:40 ` Lidong Yan
2025-07-01 15:43 ` Junio C Hamano
2025-06-27 20:39 ` Junio C Hamano
2025-06-28 2:54 ` Lidong Yan
2025-06-25 17:32 ` [PATCH 0/2] bloom: use bloom filter given multiple pathspec Junio C Hamano
2025-06-26 3:34 ` Lidong Yan
2025-06-26 14:15 ` Junio C Hamano
2025-06-27 6:21 ` [PATCH v2 0/2] bloom: enable bloom filter optimization for multiple pathspec elements in revision traversal Lidong Yan
2025-06-28 4:21 ` [PATCH v3 " Lidong Yan
2025-07-04 11:14 ` [PATCH v4 0/4] " Lidong Yan
2025-07-04 11:14 ` [PATCH v4 1/4] bloom: add test helper to return murmur3 hash Lidong Yan
2025-07-04 11:14 ` [PATCH v4 2/4] bloom: rename function operates on bloom_key Lidong Yan
2025-07-04 11:14 ` [PATCH v4 3/4] bloom: replace struct bloom_key * with struct bloom_keyvec Lidong Yan
2025-07-07 11:35 ` Derrick Stolee
2025-07-07 14:14 ` Lidong Yan
2025-07-04 11:14 ` [PATCH v4 4/4] bloom: optimize multiple pathspec items in revision traversal Lidong Yan
2025-07-07 11:43 ` Derrick Stolee
2025-07-07 14:18 ` Lidong Yan
2025-07-07 15:14 ` Junio C Hamano
2025-07-10 8:48 ` [PATCH v5 0/4] bloom: enable bloom filter optimization for multiple pathspec elements " Lidong Yan
2025-07-10 8:48 ` [PATCH v5 1/4] bloom: add test helper to return murmur3 hash Lidong Yan
2025-07-10 8:48 ` [PATCH v5 2/4] bloom: rename function operates on bloom_key Lidong Yan
2025-07-10 8:48 ` [PATCH v5 3/4] bloom: replace struct bloom_key * with struct bloom_keyvec Lidong Yan
2025-07-10 16:17 ` Junio C Hamano
2025-07-11 12:46 ` Lidong Yan
2025-07-11 15:06 ` Junio C Hamano
2025-07-10 8:48 ` [PATCH v5 4/4] bloom: optimize multiple pathspec items in revision traversal Lidong Yan
2025-07-10 13:51 ` [PATCH v5.1 3.5/4] revision: make helper for pathspec to bloom key Derrick Stolee
2025-07-10 15:42 ` Lidong Yan
2025-07-10 13:55 ` [PATCH v5.1 4/4] bloom: optimize multiple pathspec items in revision Derrick Stolee
2025-07-10 15:49 ` Lidong Yan
2025-07-10 13:49 ` Derrick Stolee [this message]
2025-07-12 9:35 ` [PATCH v6 0/5] bloom: enable bloom filter optimization for multiple pathspec elements in revision traversal Lidong Yan
2025-07-12 9:35 ` [PATCH v6 1/5] bloom: add test helper to return murmur3 hash Lidong Yan
2025-07-12 9:35 ` [PATCH v6 2/5] bloom: rename function operates on bloom_key Lidong Yan
2025-07-12 9:35 ` [PATCH v6 3/5] bloom: replace struct bloom_key * with struct bloom_keyvec Lidong Yan
2025-07-12 9:35 ` [PATCH v6 4/5] revision: make helper for pathspec to bloom keyvec Lidong Yan
2025-07-12 9:35 ` [PATCH v6 5/5] To enable optimize multiple pathspec items in revision traversal, return 0 if all pathspec item is literal in forbid_bloom_filters(). Add for loops to initialize and check each pathspec item's bloom_keyvec when optimization is possible Lidong Yan
2025-07-12 9:47 ` Lidong Yan
2025-07-12 9:51 ` [PATCH v6 5/5] bloom: optimize multiple pathspec items in revision Lidong Yan
2025-07-14 16:51 ` Derrick Stolee
2025-07-14 17:01 ` Junio C Hamano
2025-07-15 1:37 ` Lidong Yan
2025-07-15 2:56 ` [RESEND][PATCH " Lidong Yan
2025-07-14 16:53 ` [PATCH v6 0/5] bloom: enable bloom filter optimization for multiple pathspec elements in revision traversal Derrick Stolee
2025-07-14 17:02 ` Junio C Hamano
2025-07-15 1:34 ` Lidong Yan
2025-07-15 2:48 ` Derrick Stolee
2025-07-15 15:09 ` Junio C Hamano
2025-06-28 4:21 ` [PATCH v3 1/2] bloom: replace struct bloom_key * with struct bloom_keyvec Lidong Yan
2025-07-02 15:08 ` Patrick Steinhardt
2025-07-02 15:49 ` Lidong Yan
2025-07-02 18:28 ` Junio C Hamano
2025-07-03 1:41 ` Lidong Yan
2025-06-28 4:21 ` [PATCH v3 2/2] bloom: optimize multiple pathspec items in revision traversal Lidong Yan
2025-06-27 6:21 ` [PATCH v2 1/2] bloom: replace struct bloom_key * with struct bloom_keyvec Lidong Yan
2025-06-27 6:21 ` [PATCH v2 2/2] bloom: optimize multiple pathspec items in revision traversal Lidong Yan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=afb68948-218b-4b56-9faa-29578ef9c73c@gmail.com \
--to=stolee@gmail.com \
--cc=502024330056@smail.nju.edu.cn \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=toon@iotcl.com \
--cc=yldhome2d2@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).