git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Derrick Stolee <stolee@gmail.com>
To: Lidong Yan <yldhome2d2@gmail.com>
Cc: 502024330056@smail.nju.edu.cn, git@vger.kernel.org,
	gitster@pobox.com, toon@iotcl.com
Subject: Re: [PATCH v5 0/4] bloom: enable bloom filter optimization for multiple pathspec elements in revision traversal
Date: Thu, 10 Jul 2025 09:49:50 -0400	[thread overview]
Message-ID: <afb68948-218b-4b56-9faa-29578ef9c73c@gmail.com> (raw)
In-Reply-To: <20250710084829.2171855-1-502024330056@smail.nju.edu.cn>

On 7/10/2025 4:48 AM, Lidong Yan wrote:

> The difference from v4 is:
>   - for the bloom_key_* functions, we now pass struct bloom_key *
>     as the first parameter.
>   - bloom_keyvec_fill_key() and bloom_keyvec_new() have been merged
>     into a single function, so that each key is filled during the
>     creation of the bloom_keyvec.
> 
> Below is a comparison of the time taken to run git log on Git and
> LLVM repositories before and after applying this patch.
> 
> Setup commit-graph:
>   $ cd ~/src/git && git commit-graph write --split --reachable --changed-paths
>   $ cd ~/src/llvm && git commit-graph write --split --reachable --changed-paths
> 
> Run git log on Git repository
>   $ cd ~/src/git
>   $ hash -p /usr/bin/git git # my system git binary is 2.43.0
>   $ time git log -100 -- commit.c commit-graph.c >/dev/null
>   real	0m0.055s
>   user	0m0.040s
>   sys	  0m0.015s
>   $ hash -p ~/bin/git/bin/git git
>   $ time git log -100 -- commit.c commit-graph.c >/dev/null
>   real	0m0.039s
>   user	0m0.020s
>   sys	  0m0.020s
> 
> Run git log in LLVM repository
>   $ cd ~/src/llvm
>   $ hash -p /usr/bin/git git # my system git binary is 2.43.0
>   $ time git log -100 -- llvm/lib/Support/CommandLine.cpp llvm/lib/Support/CommandLine.h >/dev/null
>   real	0m2.365s
>   user	0m2.252s
>   sys	  0m0.113s
>   $ hash -p ~/bin/git/bin/git git
>   $ time git log -100 -- llvm/lib/Support/CommandLine.cpp llvm/lib/Support/CommandLine.h >/dev/null
>   real	0m0.240s
>   user	0m0.158s
>   sys	  0m0.064s

Thanks for these stats, though I'd recommend trying hyperfine [1]
which with this setup would let you run these experiments as the
following:

$ $ hyperfine --warmup=3 \
> -n 'old' '~/_git/git-sparse-checkout-clean/git log -100 -- commit.c commit-graph.c' \
> -n 'new' '~/_git/git/git log -100 -- commit.c commit-graph.c'
Benchmark 1: old
  Time (mean ± σ):      73.1 ms ±   2.9 ms    [User: 48.8 ms, System: 23.9 ms]
  Range (min … max):    69.9 ms …  84.5 ms    42 runs
 
Benchmark 2: new
  Time (mean ± σ):      55.1 ms ±   2.9 ms    [User: 30.5 ms, System: 24.4 ms]
  Range (min … max):    51.1 ms …  61.2 ms    52 runs
 
Summary
  'new' ran
    1.33 ± 0.09 times faster than 'old'

And for LLVM:

$ hyperfine --warmup=3 \
> -n 'old' '~/_git/git-sparse-checkout-clean/git log -100 -- llvm/lib/Support/CommandLine.cpp llvm/lib/Support/CommandLine.h' \
> -n 'new' '~/_git/git/git log -100 -- llvm/lib/Support/CommandLine.cpp llvm/lib/Support/CommandLine.h'

Benchmark 1: old
  Time (mean ± σ):      1.974 s ±  0.006 s    [User: 1.877 s, System: 0.097 s]
  Range (min … max):    1.960 s …  1.983 s    10 runs
 
Benchmark 2: new
  Time (mean ± σ):     262.9 ms ±   2.4 ms    [User: 214.2 ms, System: 48.4 ms]
  Range (min … max):   257.7 ms … 266.2 ms    11 runs
 
Summary
  'new' ran
    7.51 ± 0.07 times faster than 'old'

[1] https://github.com/sharkdp/hyperfine

Finally, putting these performance numbers in the commit message will
make the results permanently findable in the repo history.

> 3:  c042907b92 ! 3:  60a3b16bbb bloom: replace struct bloom_key * with struct bloom_keyvec

>     -+struct bloom_keyvec *bloom_keyvec_new(size_t count)
>     ++struct bloom_keyvec *bloom_keyvec_new(const char *path, size_t len,
>     ++				      const struct bloom_filter_settings *settings)
>      +{
>      +	struct bloom_keyvec *vec;
>     -+	size_t sz = sizeof(struct bloom_keyvec);
>     -+	sz += count * sizeof(struct bloom_key);
>     ++	const char *p;
>     ++	size_t sz;
>     ++	size_t nr = 1;
>     ++
>     ++	p = path;
>     ++	while (*p) {
>     ++		/*
>     ++		 * At this point, the path is normalized to use Unix-style
>     ++		 * path separators. This is required due to how the
>     ++		 * changed-path Bloom filters store the paths.
>     ++		 */
>     ++		if (*p == '/')
>     ++			nr++;
>     ++		p++;
>     ++	}
>     ++
>     ++	sz = sizeof(struct bloom_keyvec);
>     ++	sz += nr * sizeof(struct bloom_key);
>      +	vec = (struct bloom_keyvec *)xcalloc(1, sz);
>     -+	vec->count = count;
>     ++	if (!vec)
>     ++		return NULL;
>     ++	vec->count = nr;
>     ++
>     ++	bloom_key_fill(&vec->key[0], path, len, settings);
>     ++	nr = 1;
>     ++	p = path + len - 1;
>     ++	while (p > path) {
>     ++		if (*p == '/') {
>     ++			bloom_key_fill(&vec->key[nr++], path, p - path, settings);
>     ++		}
>     ++		p--;
>     ++	}
>     ++	assert(nr == vec->count);
>      +	return vec;
>      +}

These additions to bloom_keyvec_new() certainly help simplify some
of the code movement I was talking about, but there is more that
can be done to simplify patch 4. I'll send a couple example patches
in reply to patch 4 with what I mean.

Overall, the patch series is correct. My complaints are stylistic
more than anything.

Thanks,
-Stolee


  parent reply	other threads:[~2025-07-10 13:49 UTC|newest]

Thread overview: 72+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-06-25 12:55 [PATCH 0/2] bloom: use bloom filter given multiple pathspec Lidong Yan
2025-06-25 12:55 ` [PATCH 1/2] bloom: replace struct bloom_key * with struct bloom_keyvec Lidong Yan
2025-06-25 17:43   ` Junio C Hamano
2025-06-26  3:44     ` Lidong Yan
2025-06-25 12:55 ` [PATCH 2/2] bloom: enable multiple pathspec bloom keys Lidong Yan
2025-06-27 13:50   ` Junio C Hamano
2025-06-27 14:24     ` Lidong Yan
2025-06-27 18:09       ` Junio C Hamano
2025-07-01  5:52     ` Lidong Yan
2025-07-01 15:19       ` Junio C Hamano
2025-07-02  7:14         ` Lidong Yan
2025-07-02 15:48           ` Junio C Hamano
2025-07-03  1:52             ` Lidong Yan
2025-07-04 12:09             ` Lidong Yan
2025-07-01  8:50     ` SZEDER Gábor
2025-07-01 11:40       ` Lidong Yan
2025-07-01 15:43       ` Junio C Hamano
2025-06-27 20:39   ` Junio C Hamano
2025-06-28  2:54     ` Lidong Yan
2025-06-25 17:32 ` [PATCH 0/2] bloom: use bloom filter given multiple pathspec Junio C Hamano
2025-06-26  3:34   ` Lidong Yan
2025-06-26 14:15     ` Junio C Hamano
2025-06-27  6:21 ` [PATCH v2 0/2] bloom: enable bloom filter optimization for multiple pathspec elements in revision traversal Lidong Yan
2025-06-28  4:21   ` [PATCH v3 " Lidong Yan
2025-07-04 11:14     ` [PATCH v4 0/4] " Lidong Yan
2025-07-04 11:14       ` [PATCH v4 1/4] bloom: add test helper to return murmur3 hash Lidong Yan
2025-07-04 11:14       ` [PATCH v4 2/4] bloom: rename function operates on bloom_key Lidong Yan
2025-07-04 11:14       ` [PATCH v4 3/4] bloom: replace struct bloom_key * with struct bloom_keyvec Lidong Yan
2025-07-07 11:35         ` Derrick Stolee
2025-07-07 14:14           ` Lidong Yan
2025-07-04 11:14       ` [PATCH v4 4/4] bloom: optimize multiple pathspec items in revision traversal Lidong Yan
2025-07-07 11:43         ` Derrick Stolee
2025-07-07 14:18           ` Lidong Yan
2025-07-07 15:14           ` Junio C Hamano
2025-07-10  8:48       ` [PATCH v5 0/4] bloom: enable bloom filter optimization for multiple pathspec elements " Lidong Yan
2025-07-10  8:48         ` [PATCH v5 1/4] bloom: add test helper to return murmur3 hash Lidong Yan
2025-07-10  8:48         ` [PATCH v5 2/4] bloom: rename function operates on bloom_key Lidong Yan
2025-07-10  8:48         ` [PATCH v5 3/4] bloom: replace struct bloom_key * with struct bloom_keyvec Lidong Yan
2025-07-10 16:17           ` Junio C Hamano
2025-07-11 12:46             ` Lidong Yan
2025-07-11 15:06               ` Junio C Hamano
2025-07-10  8:48         ` [PATCH v5 4/4] bloom: optimize multiple pathspec items in revision traversal Lidong Yan
2025-07-10 13:51           ` [PATCH v5.1 3.5/4] revision: make helper for pathspec to bloom key Derrick Stolee
2025-07-10 15:42             ` Lidong Yan
2025-07-10 13:55           ` [PATCH v5.1 4/4] bloom: optimize multiple pathspec items in revision Derrick Stolee
2025-07-10 15:49             ` Lidong Yan
2025-07-10 13:49         ` Derrick Stolee [this message]
2025-07-12  9:35         ` [PATCH v6 0/5] bloom: enable bloom filter optimization for multiple pathspec elements in revision traversal Lidong Yan
2025-07-12  9:35           ` [PATCH v6 1/5] bloom: add test helper to return murmur3 hash Lidong Yan
2025-07-12  9:35           ` [PATCH v6 2/5] bloom: rename function operates on bloom_key Lidong Yan
2025-07-12  9:35           ` [PATCH v6 3/5] bloom: replace struct bloom_key * with struct bloom_keyvec Lidong Yan
2025-07-12  9:35           ` [PATCH v6 4/5] revision: make helper for pathspec to bloom keyvec Lidong Yan
2025-07-12  9:35           ` [PATCH v6 5/5] To enable optimize multiple pathspec items in revision traversal, return 0 if all pathspec item is literal in forbid_bloom_filters(). Add for loops to initialize and check each pathspec item's bloom_keyvec when optimization is possible Lidong Yan
2025-07-12  9:47             ` Lidong Yan
2025-07-12  9:51               ` [PATCH v6 5/5] bloom: optimize multiple pathspec items in revision Lidong Yan
2025-07-14 16:51                 ` Derrick Stolee
2025-07-14 17:01                   ` Junio C Hamano
2025-07-15  1:37                     ` Lidong Yan
2025-07-15  2:56                       ` [RESEND][PATCH " Lidong Yan
2025-07-14 16:53           ` [PATCH v6 0/5] bloom: enable bloom filter optimization for multiple pathspec elements in revision traversal Derrick Stolee
2025-07-14 17:02             ` Junio C Hamano
2025-07-15  1:34             ` Lidong Yan
2025-07-15  2:48               ` Derrick Stolee
2025-07-15 15:09                 ` Junio C Hamano
2025-06-28  4:21   ` [PATCH v3 1/2] bloom: replace struct bloom_key * with struct bloom_keyvec Lidong Yan
2025-07-02 15:08     ` Patrick Steinhardt
2025-07-02 15:49       ` Lidong Yan
2025-07-02 18:28       ` Junio C Hamano
2025-07-03  1:41         ` Lidong Yan
2025-06-28  4:21   ` [PATCH v3 2/2] bloom: optimize multiple pathspec items in revision traversal Lidong Yan
2025-06-27  6:21 ` [PATCH v2 1/2] bloom: replace struct bloom_key * with struct bloom_keyvec Lidong Yan
2025-06-27  6:21 ` [PATCH v2 2/2] bloom: optimize multiple pathspec items in revision traversal Lidong Yan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=afb68948-218b-4b56-9faa-29578ef9c73c@gmail.com \
    --to=stolee@gmail.com \
    --cc=502024330056@smail.nju.edu.cn \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=toon@iotcl.com \
    --cc=yldhome2d2@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).