git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Patrick Steinhardt <ps@pks.im>
To: Lidong Yan <yldhome2d2@gmail.com>
Cc: git@vger.kernel.org, Lidong Yan <502024330056@smail.nju.edu.cn>
Subject: Re: [PATCH v3 1/2] bloom: replace struct bloom_key * with struct bloom_keyvec
Date: Wed, 2 Jul 2025 17:08:23 +0200	[thread overview]
Message-ID: <aGVLZ9VUf2M1sWhL@pks.im> (raw)
In-Reply-To: <20250628042140.1097910-2-502024330056@smail.nju.edu.cn>

On Sat, Jun 28, 2025 at 12:21:39PM +0800, Lidong Yan wrote:
> diff --git a/bloom.h b/bloom.h
> index 6e46489a20..9e4e832c8c 100644
> --- a/bloom.h
> +++ b/bloom.h
> @@ -74,6 +74,11 @@ struct bloom_key {
>  	uint32_t *hashes;
>  };
>  
> +struct bloom_keyvec {
> +	size_t count;
> +	struct bloom_key key[FLEX_ARRAY];
> +};
> +

A short comment would help readers understand what the intent of this
data structure is.

>  int load_bloom_filter_from_graph(struct commit_graph *g,
>  				 struct bloom_filter *filter,
>  				 uint32_t graph_pos);
> @@ -100,6 +105,17 @@ void add_key_to_filter(const struct bloom_key *key,
>  void init_bloom_filters(void);
>  void deinit_bloom_filters(void);
>  
> +struct bloom_keyvec *create_bloom_keyvec(size_t count);
> +void destroy_bloom_keyvec(struct bloom_keyvec *vec);

These functions are named very unusually for us -- the first version of
this patch series was following our coding guidelines, but this version
here isn't anymore.

 - The primary data structure that a subsystem 'S' deals with is called
   `struct S`. Functions that operate on `struct S` are named
   `S_<verb>()` and should generally receive a pointer to `struct S` as
   first parameter. E.g.

Second, the functions should probably be called `*_new()` and `*_free()`
instead of `create_*()` and `destroy_*()`.

> +static inline void fill_bloom_keyvec_key(const char *data, size_t len,
> +					 struct bloom_keyvec *vec, size_t nr,
> +					 const struct bloom_filter_settings *settings)
> +{
> +	assert(nr < vec->count);
> +	fill_bloom_key(data, len, &vec->key[nr], settings);
> +}
> +

Similarly, this should probably be called `bloom_keyvec_fill_key()`.

>  enum bloom_filter_computed {
>  	BLOOM_NOT_COMPUTED = (1 << 0),
>  	BLOOM_COMPUTED     = (1 << 1),
> @@ -137,4 +153,8 @@ int bloom_filter_contains(const struct bloom_filter *filter,
>  			  const struct bloom_key *key,
>  			  const struct bloom_filter_settings *settings);
>  
> +int bloom_filter_contains_vec(const struct bloom_filter *filter,
> +			      const struct bloom_keyvec *v,
> +			      const struct bloom_filter_settings *settings);
> +
>  #endif

This one looks alright though.

> diff --git a/revision.c b/revision.c
> index afee111196..3aa544c137 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -779,11 +782,8 @@ static int check_maybe_different_in_bloom_filter(struct rev_info *revs,
>  		return -1;
>  	}
>  
> -	for (j = 0; result && j < revs->bloom_keys_nr; j++) {
> -		result = bloom_filter_contains(filter,
> -					       &revs->bloom_keys[j],
> -					       revs->bloom_filter_settings);
> -	}
> +	result = bloom_filter_contains_vec(filter, revs->bloom_keyvecs[0],
> +					   revs->bloom_filter_settings);
>  
>  	if (result)
>  		count_bloom_filter_maybe++;

This conversion feels wrong to me. Why don't we end up iterating through
`revs->bloom_keyvecs_nr` here?  We do indeed change it back in the next
patch to use a for loop.

> @@ -3230,10 +3230,10 @@ void release_revisions(struct rev_info *revs)
>  	line_log_free(revs);
>  	oidset_clear(&revs->missing_commits);
>  
> -	for (int i = 0; i < revs->bloom_keys_nr; i++)
> -		clear_bloom_key(&revs->bloom_keys[i]);
> -	FREE_AND_NULL(revs->bloom_keys);
> -	revs->bloom_keys_nr = 0;
> +	for (int i = 0; i < revs->bloom_keyvecs_nr; i++)

It's puzzling that the number of keys is declared as `int`. It's not an
issue introduced by you, but can we maybe fix it while at it?

Patrick

  reply	other threads:[~2025-07-02 15:08 UTC|newest]

Thread overview: 72+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-06-25 12:55 [PATCH 0/2] bloom: use bloom filter given multiple pathspec Lidong Yan
2025-06-25 12:55 ` [PATCH 1/2] bloom: replace struct bloom_key * with struct bloom_keyvec Lidong Yan
2025-06-25 17:43   ` Junio C Hamano
2025-06-26  3:44     ` Lidong Yan
2025-06-25 12:55 ` [PATCH 2/2] bloom: enable multiple pathspec bloom keys Lidong Yan
2025-06-27 13:50   ` Junio C Hamano
2025-06-27 14:24     ` Lidong Yan
2025-06-27 18:09       ` Junio C Hamano
2025-07-01  5:52     ` Lidong Yan
2025-07-01 15:19       ` Junio C Hamano
2025-07-02  7:14         ` Lidong Yan
2025-07-02 15:48           ` Junio C Hamano
2025-07-03  1:52             ` Lidong Yan
2025-07-04 12:09             ` Lidong Yan
2025-07-01  8:50     ` SZEDER Gábor
2025-07-01 11:40       ` Lidong Yan
2025-07-01 15:43       ` Junio C Hamano
2025-06-27 20:39   ` Junio C Hamano
2025-06-28  2:54     ` Lidong Yan
2025-06-25 17:32 ` [PATCH 0/2] bloom: use bloom filter given multiple pathspec Junio C Hamano
2025-06-26  3:34   ` Lidong Yan
2025-06-26 14:15     ` Junio C Hamano
2025-06-27  6:21 ` [PATCH v2 0/2] bloom: enable bloom filter optimization for multiple pathspec elements in revision traversal Lidong Yan
2025-06-28  4:21   ` [PATCH v3 " Lidong Yan
2025-07-04 11:14     ` [PATCH v4 0/4] " Lidong Yan
2025-07-04 11:14       ` [PATCH v4 1/4] bloom: add test helper to return murmur3 hash Lidong Yan
2025-07-04 11:14       ` [PATCH v4 2/4] bloom: rename function operates on bloom_key Lidong Yan
2025-07-04 11:14       ` [PATCH v4 3/4] bloom: replace struct bloom_key * with struct bloom_keyvec Lidong Yan
2025-07-07 11:35         ` Derrick Stolee
2025-07-07 14:14           ` Lidong Yan
2025-07-04 11:14       ` [PATCH v4 4/4] bloom: optimize multiple pathspec items in revision traversal Lidong Yan
2025-07-07 11:43         ` Derrick Stolee
2025-07-07 14:18           ` Lidong Yan
2025-07-07 15:14           ` Junio C Hamano
2025-07-10  8:48       ` [PATCH v5 0/4] bloom: enable bloom filter optimization for multiple pathspec elements " Lidong Yan
2025-07-10  8:48         ` [PATCH v5 1/4] bloom: add test helper to return murmur3 hash Lidong Yan
2025-07-10  8:48         ` [PATCH v5 2/4] bloom: rename function operates on bloom_key Lidong Yan
2025-07-10  8:48         ` [PATCH v5 3/4] bloom: replace struct bloom_key * with struct bloom_keyvec Lidong Yan
2025-07-10 16:17           ` Junio C Hamano
2025-07-11 12:46             ` Lidong Yan
2025-07-11 15:06               ` Junio C Hamano
2025-07-10  8:48         ` [PATCH v5 4/4] bloom: optimize multiple pathspec items in revision traversal Lidong Yan
2025-07-10 13:51           ` [PATCH v5.1 3.5/4] revision: make helper for pathspec to bloom key Derrick Stolee
2025-07-10 15:42             ` Lidong Yan
2025-07-10 13:55           ` [PATCH v5.1 4/4] bloom: optimize multiple pathspec items in revision Derrick Stolee
2025-07-10 15:49             ` Lidong Yan
2025-07-10 13:49         ` [PATCH v5 0/4] bloom: enable bloom filter optimization for multiple pathspec elements in revision traversal Derrick Stolee
2025-07-12  9:35         ` [PATCH v6 0/5] " Lidong Yan
2025-07-12  9:35           ` [PATCH v6 1/5] bloom: add test helper to return murmur3 hash Lidong Yan
2025-07-12  9:35           ` [PATCH v6 2/5] bloom: rename function operates on bloom_key Lidong Yan
2025-07-12  9:35           ` [PATCH v6 3/5] bloom: replace struct bloom_key * with struct bloom_keyvec Lidong Yan
2025-07-12  9:35           ` [PATCH v6 4/5] revision: make helper for pathspec to bloom keyvec Lidong Yan
2025-07-12  9:35           ` [PATCH v6 5/5] To enable optimize multiple pathspec items in revision traversal, return 0 if all pathspec item is literal in forbid_bloom_filters(). Add for loops to initialize and check each pathspec item's bloom_keyvec when optimization is possible Lidong Yan
2025-07-12  9:47             ` Lidong Yan
2025-07-12  9:51               ` [PATCH v6 5/5] bloom: optimize multiple pathspec items in revision Lidong Yan
2025-07-14 16:51                 ` Derrick Stolee
2025-07-14 17:01                   ` Junio C Hamano
2025-07-15  1:37                     ` Lidong Yan
2025-07-15  2:56                       ` [RESEND][PATCH " Lidong Yan
2025-07-14 16:53           ` [PATCH v6 0/5] bloom: enable bloom filter optimization for multiple pathspec elements in revision traversal Derrick Stolee
2025-07-14 17:02             ` Junio C Hamano
2025-07-15  1:34             ` Lidong Yan
2025-07-15  2:48               ` Derrick Stolee
2025-07-15 15:09                 ` Junio C Hamano
2025-06-28  4:21   ` [PATCH v3 1/2] bloom: replace struct bloom_key * with struct bloom_keyvec Lidong Yan
2025-07-02 15:08     ` Patrick Steinhardt [this message]
2025-07-02 15:49       ` Lidong Yan
2025-07-02 18:28       ` Junio C Hamano
2025-07-03  1:41         ` Lidong Yan
2025-06-28  4:21   ` [PATCH v3 2/2] bloom: optimize multiple pathspec items in revision traversal Lidong Yan
2025-06-27  6:21 ` [PATCH v2 1/2] bloom: replace struct bloom_key * with struct bloom_keyvec Lidong Yan
2025-06-27  6:21 ` [PATCH v2 2/2] bloom: optimize multiple pathspec items in revision traversal Lidong Yan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aGVLZ9VUf2M1sWhL@pks.im \
    --to=ps@pks.im \
    --cc=502024330056@smail.nju.edu.cn \
    --cc=git@vger.kernel.org \
    --cc=yldhome2d2@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).