From: Taylor Blau <me@ttaylorr.com>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: git@vger.kernel.org, Derrick Stolee <derrickstolee@github.com>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: [RFC PATCH 1/6] bloom: annotate filters with hash version
Date: Mon, 21 Aug 2023 16:21:49 -0400
Message-ID: <ZOPHXcTMlB77CsCh@nand.local>
In-Reply-To: <ZN57Gsz+wk9n6/Da@nand.local>

On Thu, Aug 17, 2023 at 03:55:06PM -0400, Taylor Blau wrote:
> > Another thing that we might want to track is whether the Bloom filter is
> > a reference to an existing buffer (and thus does not need to be freed)
> > or a reference to a malloc-ed buffer that we must free. But both before
> > and after this patch set, a malloc-ed buffer is never overridden by a
> > reference-to-existing-buffer, so we should still be fine for now. (This
> > patch set does add a scenario in which a reference-to-existing buffer is
> > overridden by a malloc-ed buffer, but that's the only new scenario.)
>
> Yeah, I think there is some opportunity for clean-up here. I'll take a
> look...

The clean-up ended up being pretty reasonable. I'm not sure whether I
should include it in this series or not, since any leaks in the Bloom
subsystem are definitely not new as of this series.

But the patch is relatively straightforward anyway, so I think throwing
it on the end would be OK:

--- 8< ---
diff --git a/bloom.c b/bloom.c
index 24dd874e46..ff131893cd 100644
--- a/bloom.c
+++ b/bloom.c
@@ -59,6 +59,7 @@ int load_bloom_filter_from_graph(struct commit_graph *g,
 					sizeof(unsigned char) * start_index +
 					BLOOMDATA_CHUNK_HEADER_SIZE);
 	filter->version = g->bloom_filter_settings->hash_version;
+	filter->to_free = NULL;

 	return 1;
 }
@@ -231,6 +232,18 @@ void init_bloom_filters(void)
 	init_bloom_filter_slab(&bloom_filters);
 }

+static void free_one_bloom_filter(struct bloom_filter *filter)
+{
+	if (!filter)
+		return;
+	free(filter->to_free);
+}
+
+void deinit_bloom_filters(void)
+{
+	deep_clear_bloom_filter_slab(&bloom_filters, free_one_bloom_filter);
+}
+
 static int pathmap_cmp(const void *hashmap_cmp_fn_data UNUSED,
 		       const struct hashmap_entry *eptr,
 		       const struct hashmap_entry *entry_or_key,
@@ -247,7 +260,7 @@ static int pathmap_cmp(const void *hashmap_cmp_fn_data UNUSED,
 static void init_truncated_large_filter(struct bloom_filter *filter,
 					int version)
 {
-	filter->data = xmalloc(1);
+	filter->data = filter->to_free = xmalloc(1);
 	filter->data[0] = 0xFF;
 	filter->len = 1;
 	filter->version = version;
@@ -449,6 +462,7 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r,
 			filter->len = 1;
 		}
 		CALLOC_ARRAY(filter->data, filter->len);
+		filter->to_free = filter->data;

 		hashmap_for_each_entry(&pathmap, &iter, e, entry) {
 			struct bloom_key key;
diff --git a/bloom.h b/bloom.h
index 4462fc3908..c1d74d63e6 100644
--- a/bloom.h
+++ b/bloom.h
@@ -56,6 +56,8 @@ struct bloom_filter {
 	unsigned char *data;
 	size_t len;
 	int version;
+
+	void *to_free;
 };

 /*
@@ -96,6 +98,7 @@ void add_key_to_filter(const struct bloom_key *key,
 		       const struct bloom_filter_settings *settings);

 void init_bloom_filters(void);
+void deinit_bloom_filters(void);

 enum bloom_filter_computed {
 	BLOOM_NOT_COMPUTED = (1 << 0),
diff --git a/commit-graph.c b/commit-graph.c
index 183ed90b6d..f22f2d350d 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -2532,6 +2532,9 @@ int write_commit_graph(struct object_directory *odb,

 	res = write_commit_graph_file(ctx);

+	if (ctx->changed_paths)
+		deinit_bloom_filters();
+
 	if (ctx->split)
 		mark_commit_graphs(ctx);

--- >8 ---
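
For anyone following along without the tree handy, the ownership
convention the patch relies on can be sketched in isolation. This is a
standalone illustration, not Git code: `struct sketch_filter` and the
`sketch_filter_*()` helpers are hypothetical names made up for this
example, and it uses plain calloc() rather than Git's allocation
wrappers. The idea is that `data` is always what readers use, while
`to_free` is non-NULL only when the filter owns the buffer. Unlike the
patch, the sketch also frees an owned buffer before overwriting it,
which the patch does not need since that case never arises.

```c
#include <stdlib.h>

/* Hypothetical stand-in for Git's struct bloom_filter. */
struct sketch_filter {
	unsigned char *data;	/* what readers use, owned or not */
	size_t len;
	void *to_free;		/* non-NULL only if we own `data` */
};

/* Point the filter at memory owned elsewhere (e.g. a mapped file). */
static void sketch_filter_borrow(struct sketch_filter *f,
				 unsigned char *buf, size_t len)
{
	free(f->to_free);	/* drop any buffer we previously owned */
	f->data = buf;
	f->len = len;
	f->to_free = NULL;	/* borrowed: nothing to free later */
}

/* Give the filter its own zeroed buffer; returns 0 on success. */
static int sketch_filter_alloc(struct sketch_filter *f, size_t len)
{
	unsigned char *buf = calloc(len ? len : 1, 1);

	if (!buf)
		return -1;
	free(f->to_free);
	f->data = buf;
	f->len = len;
	f->to_free = buf;	/* owned: released in sketch_filter_release() */
	return 0;
}

/* Safe for borrowed, owned, and never-initialized (zeroed) filters. */
static void sketch_filter_release(struct sketch_filter *f)
{
	if (!f)
		return;
	free(f->to_free);	/* free(NULL) is a no-op */
	f->data = NULL;
	f->len = 0;
	f->to_free = NULL;
}
```

Keeping a separate `to_free` pointer (instead of a "do I own this?"
flag) means the release path is a single unconditional free(), which is
the same shape as free_one_bloom_filter() above.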

Thanks,
Taylor
