Git development

Git development
 help / color / mirror / Atom feed

* [PATCH v4 09/17] repo-settings: introduce commitgraph.changedPathsVersion
From: Taylor Blau @ 2023-10-18 18:32 UTC (permalink / raw)
  To: git
  Cc: Elijah Newren, Eric W. Biederman, Jeff King, Junio C Hamano,
	Patrick Steinhardt
In-Reply-To: <cover.1697653929.git.me@ttaylorr.com>

From: Jonathan Tan <jonathantanmy@google.com>

A subsequent commit will introduce another version of the changed-path
filter in the commit graph file. In order to control which version to
write (and read), a config variable is needed.

Therefore, introduce this config variable. For forwards compatibility,
teach Git to not read commit graphs when the config variable
is set to an unsupported version. Because we teach Git this,
commitgraph.readChangedPaths is now redundant, so deprecate it and
define its behavior in terms of the config variable we introduce.

This commit does not change the behavior of writing (Git writes changed
path filters when explicitly instructed regardless of any config
variable), but a subsequent commit will restrict Git such that it will
only write when commitgraph.changedPathsVersion is a recognized value.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/config/commitgraph.txt | 23 ++++++++++++++++++++---
 commit-graph.c                       |  2 +-
 oss-fuzz/fuzz-commit-graph.c         |  2 +-
 repo-settings.c                      |  6 +++++-
 repository.h                         |  2 +-
 5 files changed, 28 insertions(+), 7 deletions(-)

diff --git a/Documentation/config/commitgraph.txt b/Documentation/config/commitgraph.txt
index 30604e4a4c..2dc9170622 100644
--- a/Documentation/config/commitgraph.txt
+++ b/Documentation/config/commitgraph.txt
@@ -9,6 +9,23 @@ commitGraph.maxNewFilters::
 	commit-graph write` (c.f., linkgit:git-commit-graph[1]).
 
 commitGraph.readChangedPaths::
-	If true, then git will use the changed-path Bloom filters in the
-	commit-graph file (if it exists, and they are present). Defaults to
-	true. See linkgit:git-commit-graph[1] for more information.
+	Deprecated. Equivalent to commitGraph.changedPathsVersion=-1 if true, and
+	commitGraph.changedPathsVersion=0 if false. (If commitGraph.changedPathVersion
+	is also set, commitGraph.changedPathsVersion takes precedence.)
+
+commitGraph.changedPathsVersion::
+	Specifies the version of the changed-path Bloom filters that Git will read and
+	write. May be -1, 0 or 1.
++
+Defaults to -1.
++
+If -1, Git will use the version of the changed-path Bloom filters in the
+repository, defaulting to 1 if there are none.
++
+If 0, Git will not read any Bloom filters, and will write version 1 Bloom
+filters when instructed to write.
++
+If 1, Git will only read version 1 Bloom filters, and will write version 1
+Bloom filters.
++
+See linkgit:git-commit-graph[1] for more information.
diff --git a/commit-graph.c b/commit-graph.c
index 0ac79aff5a..bcc9a15cfa 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -411,7 +411,7 @@ struct commit_graph *parse_commit_graph(struct repo_settings *s,
 			graph->read_generation_data = 1;
 	}
 
-	if (s->commit_graph_read_changed_paths) {
+	if (s->commit_graph_changed_paths_version) {
 		pair_chunk(cf, GRAPH_CHUNKID_BLOOMINDEXES,
 			   &graph->chunk_bloom_indexes);
 		read_chunk(cf, GRAPH_CHUNKID_BLOOMDATA,
diff --git a/oss-fuzz/fuzz-commit-graph.c b/oss-fuzz/fuzz-commit-graph.c
index 2992079dd9..325c0b991a 100644
--- a/oss-fuzz/fuzz-commit-graph.c
+++ b/oss-fuzz/fuzz-commit-graph.c
@@ -19,7 +19,7 @@ int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
 	 * possible.
 	 */
 	the_repository->settings.commit_graph_generation_version = 2;
-	the_repository->settings.commit_graph_read_changed_paths = 1;
+	the_repository->settings.commit_graph_changed_paths_version = 1;
 	g = parse_commit_graph(&the_repository->settings, (void *)data, size);
 	repo_clear(the_repository);
 	free_commit_graph(g);
diff --git a/repo-settings.c b/repo-settings.c
index 525f69c0c7..db8fe817f3 100644
--- a/repo-settings.c
+++ b/repo-settings.c
@@ -24,6 +24,7 @@ void prepare_repo_settings(struct repository *r)
 	int value;
 	const char *strval;
 	int manyfiles;
+	int read_changed_paths;
 
 	if (!r->gitdir)
 		BUG("Cannot add settings for uninitialized repository");
@@ -54,7 +55,10 @@ void prepare_repo_settings(struct repository *r)
 	/* Commit graph config or default, does not cascade (simple) */
 	repo_cfg_bool(r, "core.commitgraph", &r->settings.core_commit_graph, 1);
 	repo_cfg_int(r, "commitgraph.generationversion", &r->settings.commit_graph_generation_version, 2);
-	repo_cfg_bool(r, "commitgraph.readchangedpaths", &r->settings.commit_graph_read_changed_paths, 1);
+	repo_cfg_bool(r, "commitgraph.readchangedpaths", &read_changed_paths, 1);
+	repo_cfg_int(r, "commitgraph.changedpathsversion",
+		     &r->settings.commit_graph_changed_paths_version,
+		     read_changed_paths ? -1 : 0);
 	repo_cfg_bool(r, "gc.writecommitgraph", &r->settings.gc_write_commit_graph, 1);
 	repo_cfg_bool(r, "fetch.writecommitgraph", &r->settings.fetch_write_commit_graph, 0);
 
diff --git a/repository.h b/repository.h
index 5f18486f64..f71154e12c 100644
--- a/repository.h
+++ b/repository.h
@@ -29,7 +29,7 @@ struct repo_settings {
 
 	int core_commit_graph;
 	int commit_graph_generation_version;
-	int commit_graph_read_changed_paths;
+	int commit_graph_changed_paths_version;
 	int gc_write_commit_graph;
 	int fetch_write_commit_graph;
 	int command_requires_full_index;
-- 
2.42.0.415.g8942f205c8


^ permalink raw reply related

* [PATCH v4 10/17] commit-graph: new filter ver. that fixes murmur3
From: Taylor Blau @ 2023-10-18 18:32 UTC (permalink / raw)
  To: git
  Cc: Elijah Newren, Eric W. Biederman, Jeff King, Junio C Hamano,
	Patrick Steinhardt
In-Reply-To: <cover.1697653929.git.me@ttaylorr.com>

From: Jonathan Tan <jonathantanmy@google.com>

The murmur3 implementation in bloom.c has a bug when converting series
of 4 bytes into network-order integers when char is signed (which is
controllable by a compiler option, and the default signedness of char is
platform-specific). When a string contains characters with the high bit
set, this bug causes results that, although internally consistent within
Git, does not accord with other implementations of murmur3 (thus,
the changed path filters wouldn't be readable by other off-the-shelf
implementatios of murmur3) and even with Git binaries that were compiled
with different signedness of char. This bug affects both how Git writes
changed path filters to disk and how Git interprets changed path filters
on disk.

Therefore, introduce a new version (2) of changed path filters that
corrects this problem. The existing version (1) is still supported and
is still the default, but users should migrate away from it as soon
as possible.

Because this bug only manifests with characters that have the high bit
set, it may be possible that some (or all) commits in a given repo would
have the same changed path filter both before and after this fix is
applied. However, in order to determine whether this is the case, the
changed paths would first have to be computed, at which point it is not
much more expensive to just compute a new changed path filter.

So this patch does not include any mechanism to "salvage" changed path
filters from repositories. There is also no "mixed" mode - for each
invocation of Git, reading and writing changed path filters are done
with the same version number; this version number may be explicitly
stated (typically if the user knows which version they need) or
automatically determined from the version of the existing changed path
filters in the repository.

There is a change in write_commit_graph(). graph_read_bloom_data()
makes it possible for chunk_bloom_data to be non-NULL but
bloom_filter_settings to be NULL, which causes a segfault later on. I
produced such a segfault while developing this patch, but couldn't find
a way to reproduce it neither after this complete patch (or before),
but in any case it seemed like a good thing to include that might help
future patch authors.

The value in t0095 was obtained from another murmur3 implementation
using the following Go source code:

  package main

  import "fmt"
  import "github.com/spaolacci/murmur3"

  func main() {
          fmt.Printf("%x\n", murmur3.Sum32([]byte("Hello world!")))
          fmt.Printf("%x\n", murmur3.Sum32([]byte{0x99, 0xaa, 0xbb, 0xcc, 0xdd, 0xee, 0xff}))
  }

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/config/commitgraph.txt |   5 +-
 bloom.c                              |  69 +++++++++++++++-
 bloom.h                              |   8 +-
 commit-graph.c                       |  32 ++++++--
 t/helper/test-bloom.c                |   9 ++-
 t/t0095-bloom.sh                     |   8 ++
 t/t4216-log-bloom.sh                 | 114 +++++++++++++++++++++++++++
 7 files changed, 232 insertions(+), 13 deletions(-)

diff --git a/Documentation/config/commitgraph.txt b/Documentation/config/commitgraph.txt
index 2dc9170622..acc74a2f27 100644
--- a/Documentation/config/commitgraph.txt
+++ b/Documentation/config/commitgraph.txt
@@ -15,7 +15,7 @@ commitGraph.readChangedPaths::
 
 commitGraph.changedPathsVersion::
 	Specifies the version of the changed-path Bloom filters that Git will read and
-	write. May be -1, 0 or 1.
+	write. May be -1, 0, 1, or 2.
 +
 Defaults to -1.
 +
@@ -28,4 +28,7 @@ filters when instructed to write.
 If 1, Git will only read version 1 Bloom filters, and will write version 1
 Bloom filters.
 +
+If 2, Git will only read version 2 Bloom filters, and will write version 2
+Bloom filters.
++
 See linkgit:git-commit-graph[1] for more information.
diff --git a/bloom.c b/bloom.c
index 3e78cfe79d..ebef5cfd2f 100644
--- a/bloom.c
+++ b/bloom.c
@@ -66,7 +66,64 @@ int load_bloom_filter_from_graph(struct commit_graph *g,
  * Not considered to be cryptographically secure.
  * Implemented as described in https://en.wikipedia.org/wiki/MurmurHash#Algorithm
  */
-uint32_t murmur3_seeded(uint32_t seed, const char *data, size_t len)
+uint32_t murmur3_seeded_v2(uint32_t seed, const char *data, size_t len)
+{
+	const uint32_t c1 = 0xcc9e2d51;
+	const uint32_t c2 = 0x1b873593;
+	const uint32_t r1 = 15;
+	const uint32_t r2 = 13;
+	const uint32_t m = 5;
+	const uint32_t n = 0xe6546b64;
+	int i;
+	uint32_t k1 = 0;
+	const char *tail;
+
+	int len4 = len / sizeof(uint32_t);
+
+	uint32_t k;
+	for (i = 0; i < len4; i++) {
+		uint32_t byte1 = (uint32_t)(unsigned char)data[4*i];
+		uint32_t byte2 = ((uint32_t)(unsigned char)data[4*i + 1]) << 8;
+		uint32_t byte3 = ((uint32_t)(unsigned char)data[4*i + 2]) << 16;
+		uint32_t byte4 = ((uint32_t)(unsigned char)data[4*i + 3]) << 24;
+		k = byte1 | byte2 | byte3 | byte4;
+		k *= c1;
+		k = rotate_left(k, r1);
+		k *= c2;
+
+		seed ^= k;
+		seed = rotate_left(seed, r2) * m + n;
+	}
+
+	tail = (data + len4 * sizeof(uint32_t));
+
+	switch (len & (sizeof(uint32_t) - 1)) {
+	case 3:
+		k1 ^= ((uint32_t)(unsigned char)tail[2]) << 16;
+		/*-fallthrough*/
+	case 2:
+		k1 ^= ((uint32_t)(unsigned char)tail[1]) << 8;
+		/*-fallthrough*/
+	case 1:
+		k1 ^= ((uint32_t)(unsigned char)tail[0]) << 0;
+		k1 *= c1;
+		k1 = rotate_left(k1, r1);
+		k1 *= c2;
+		seed ^= k1;
+		break;
+	}
+
+	seed ^= (uint32_t)len;
+	seed ^= (seed >> 16);
+	seed *= 0x85ebca6b;
+	seed ^= (seed >> 13);
+	seed *= 0xc2b2ae35;
+	seed ^= (seed >> 16);
+
+	return seed;
+}
+
+static uint32_t murmur3_seeded_v1(uint32_t seed, const char *data, size_t len)
 {
 	const uint32_t c1 = 0xcc9e2d51;
 	const uint32_t c2 = 0x1b873593;
@@ -131,8 +188,14 @@ void fill_bloom_key(const char *data,
 	int i;
 	const uint32_t seed0 = 0x293ae76f;
 	const uint32_t seed1 = 0x7e646e2c;
-	const uint32_t hash0 = murmur3_seeded(seed0, data, len);
-	const uint32_t hash1 = murmur3_seeded(seed1, data, len);
+	uint32_t hash0, hash1;
+	if (settings->hash_version == 2) {
+		hash0 = murmur3_seeded_v2(seed0, data, len);
+		hash1 = murmur3_seeded_v2(seed1, data, len);
+	} else {
+		hash0 = murmur3_seeded_v1(seed0, data, len);
+		hash1 = murmur3_seeded_v1(seed1, data, len);
+	}
 
 	key->hashes = (uint32_t *)xcalloc(settings->num_hashes, sizeof(uint32_t));
 	for (i = 0; i < settings->num_hashes; i++)
diff --git a/bloom.h b/bloom.h
index 1e4f612d2c..138d57a86b 100644
--- a/bloom.h
+++ b/bloom.h
@@ -8,9 +8,11 @@ struct commit_graph;
 struct bloom_filter_settings {
 	/*
 	 * The version of the hashing technique being used.
-	 * We currently only support version = 1 which is
+	 * The newest version is 2, which is
 	 * the seeded murmur3 hashing technique implemented
-	 * in bloom.c.
+	 * in bloom.c. Bloom filters of version 1 were created
+	 * with prior versions of Git, which had a bug in the
+	 * implementation of the hash function.
 	 */
 	uint32_t hash_version;
 
@@ -80,7 +82,7 @@ int load_bloom_filter_from_graph(struct commit_graph *g,
  * Not considered to be cryptographically secure.
  * Implemented as described in https://en.wikipedia.org/wiki/MurmurHash#Algorithm
  */
-uint32_t murmur3_seeded(uint32_t seed, const char *data, size_t len);
+uint32_t murmur3_seeded_v2(uint32_t seed, const char *data, size_t len);
 
 void fill_bloom_key(const char *data,
 		    size_t len,
diff --git a/commit-graph.c b/commit-graph.c
index bcc9a15cfa..6b21b17b20 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -314,17 +314,26 @@ static int graph_read_oid_lookup(const unsigned char *chunk_start,
 	return 0;
 }
 
+struct graph_read_bloom_data_context {
+	struct commit_graph *g;
+	int *commit_graph_changed_paths_version;
+};
+
 static int graph_read_bloom_data(const unsigned char *chunk_start,
 				  size_t chunk_size, void *data)
 {
-	struct commit_graph *g = data;
+	struct graph_read_bloom_data_context *c = data;
+	struct commit_graph *g = c->g;
 	uint32_t hash_version;
-	g->chunk_bloom_data = chunk_start;
 	hash_version = get_be32(chunk_start);
 
-	if (hash_version != 1)
+	if (*c->commit_graph_changed_paths_version == -1) {
+		*c->commit_graph_changed_paths_version = hash_version;
+	} else if (hash_version != *c->commit_graph_changed_paths_version) {
 		return 0;
+	}
 
+	g->chunk_bloom_data = chunk_start;
 	g->bloom_filter_settings = xmalloc(sizeof(struct bloom_filter_settings));
 	g->bloom_filter_settings->hash_version = hash_version;
 	g->bloom_filter_settings->num_hashes = get_be32(chunk_start + 4);
@@ -412,10 +421,14 @@ struct commit_graph *parse_commit_graph(struct repo_settings *s,
 	}
 
 	if (s->commit_graph_changed_paths_version) {
+		struct graph_read_bloom_data_context context = {
+			.g = graph,
+			.commit_graph_changed_paths_version = &s->commit_graph_changed_paths_version
+		};
 		pair_chunk(cf, GRAPH_CHUNKID_BLOOMINDEXES,
 			   &graph->chunk_bloom_indexes);
 		read_chunk(cf, GRAPH_CHUNKID_BLOOMDATA,
-			   graph_read_bloom_data, graph);
+			   graph_read_bloom_data, &context);
 	}
 
 	if (graph->chunk_bloom_indexes && graph->chunk_bloom_data) {
@@ -2436,6 +2449,13 @@ int write_commit_graph(struct object_directory *odb,
 	}
 	if (!commit_graph_compatible(r))
 		return 0;
+	if (r->settings.commit_graph_changed_paths_version < -1
+	    || r->settings.commit_graph_changed_paths_version > 2) {
+		warning(_("attempting to write a commit-graph, but "
+			  "'commitgraph.changedPathsVersion' (%d) is not supported"),
+			r->settings.commit_graph_changed_paths_version);
+		return 0;
+	}
 
 	CALLOC_ARRAY(ctx, 1);
 	ctx->r = r;
@@ -2448,6 +2468,8 @@ int write_commit_graph(struct object_directory *odb,
 	ctx->write_generation_data = (get_configured_generation_version(r) == 2);
 	ctx->num_generation_data_overflows = 0;
 
+	bloom_settings.hash_version = r->settings.commit_graph_changed_paths_version == 2
+		? 2 : 1;
 	bloom_settings.bits_per_entry = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_BITS_PER_ENTRY",
 						      bloom_settings.bits_per_entry);
 	bloom_settings.num_hashes = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_NUM_HASHES",
@@ -2477,7 +2499,7 @@ int write_commit_graph(struct object_directory *odb,
 		g = ctx->r->objects->commit_graph;
 
 		/* We have changed-paths already. Keep them in the next graph */
-		if (g && g->chunk_bloom_data) {
+		if (g && g->bloom_filter_settings) {
 			ctx->changed_paths = 1;
 			ctx->bloom_settings = g->bloom_filter_settings;
 		}
diff --git a/t/helper/test-bloom.c b/t/helper/test-bloom.c
index aabe31d724..3cbc0a5b50 100644
--- a/t/helper/test-bloom.c
+++ b/t/helper/test-bloom.c
@@ -50,6 +50,7 @@ static void get_bloom_filter_for_commit(const struct object_id *commit_oid)
 
 static const char *bloom_usage = "\n"
 "  test-tool bloom get_murmur3 <string>\n"
+"  test-tool bloom get_murmur3_seven_highbit\n"
 "  test-tool bloom generate_filter <string> [<string>...]\n"
 "  test-tool bloom get_filter_for_commit <commit-hex>\n";
 
@@ -64,7 +65,13 @@ int cmd__bloom(int argc, const char **argv)
 		uint32_t hashed;
 		if (argc < 3)
 			usage(bloom_usage);
-		hashed = murmur3_seeded(0, argv[2], strlen(argv[2]));
+		hashed = murmur3_seeded_v2(0, argv[2], strlen(argv[2]));
+		printf("Murmur3 Hash with seed=0:0x%08x\n", hashed);
+	}
+
+	if (!strcmp(argv[1], "get_murmur3_seven_highbit")) {
+		uint32_t hashed;
+		hashed = murmur3_seeded_v2(0, "\x99\xaa\xbb\xcc\xdd\xee\xff", 7);
 		printf("Murmur3 Hash with seed=0:0x%08x\n", hashed);
 	}
 
diff --git a/t/t0095-bloom.sh b/t/t0095-bloom.sh
index b567383eb8..c8d84ab606 100755
--- a/t/t0095-bloom.sh
+++ b/t/t0095-bloom.sh
@@ -29,6 +29,14 @@ test_expect_success 'compute unseeded murmur3 hash for test string 2' '
 	test_cmp expect actual
 '
 
+test_expect_success 'compute unseeded murmur3 hash for test string 3' '
+	cat >expect <<-\EOF &&
+	Murmur3 Hash with seed=0:0xa183ccfd
+	EOF
+	test-tool bloom get_murmur3_seven_highbit >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'compute bloom key for empty string' '
 	cat >expect <<-\EOF &&
 	Hashes:0x5615800c|0x5b966560|0x61174ab4|0x66983008|0x6c19155c|0x7199fab0|0x771ae004|
diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh
index 400dce2193..68066b7928 100755
--- a/t/t4216-log-bloom.sh
+++ b/t/t4216-log-bloom.sh
@@ -535,4 +535,118 @@ test_expect_success 'version 1 changed-path used when version 1 requested' '
 	)
 '
 
+test_expect_success 'version 1 changed-path not used when version 2 requested' '
+	(
+		cd highbit1 &&
+		git config --add commitgraph.changedPathsVersion 2 &&
+		test_bloom_filters_not_used "-- another$CENT"
+	)
+'
+
+test_expect_success 'version 1 changed-path used when autodetect requested' '
+	(
+		cd highbit1 &&
+		git config --add commitgraph.changedPathsVersion -1 &&
+		test_bloom_filters_used "-- another$CENT"
+	)
+'
+
+test_expect_success 'when writing another commit graph, preserve existing version 1 of changed-path' '
+	test_commit -C highbit1 c1double "$CENT$CENT" &&
+	git -C highbit1 commit-graph write --reachable --changed-paths &&
+	(
+		cd highbit1 &&
+		git config --add commitgraph.changedPathsVersion -1 &&
+		echo "options: bloom(1,10,7) read_generation_data" >expect &&
+		test-tool read-graph >full &&
+		grep options full >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'set up repo with high bit path, version 2 changed-path' '
+	git init highbit2 &&
+	git -C highbit2 config --add commitgraph.changedPathsVersion 2 &&
+	test_commit -C highbit2 c2 "$CENT" &&
+	git -C highbit2 commit-graph write --reachable --changed-paths
+'
+
+test_expect_success 'check value of version 2 changed-path' '
+	(
+		cd highbit2 &&
+		echo "c01f" >expect &&
+		get_first_changed_path_filter >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'setup make another commit' '
+	# "git log" does not use Bloom filters for root commits - see how, in
+	# revision.c, rev_compare_tree() (the only code path that eventually calls
+	# get_bloom_filter()) is only called by try_to_simplify_commit() when the commit
+	# has one parent. Therefore, make another commit so that we perform the tests on
+	# a non-root commit.
+	test_commit -C highbit2 anotherc2 "another$CENT"
+'
+
+test_expect_success 'version 2 changed-path used when version 2 requested' '
+	(
+		cd highbit2 &&
+		test_bloom_filters_used "-- another$CENT"
+	)
+'
+
+test_expect_success 'version 2 changed-path not used when version 1 requested' '
+	(
+		cd highbit2 &&
+		git config --add commitgraph.changedPathsVersion 1 &&
+		test_bloom_filters_not_used "-- another$CENT"
+	)
+'
+
+test_expect_success 'version 2 changed-path used when autodetect requested' '
+	(
+		cd highbit2 &&
+		git config --add commitgraph.changedPathsVersion -1 &&
+		test_bloom_filters_used "-- another$CENT"
+	)
+'
+
+test_expect_success 'when writing another commit graph, preserve existing version 2 of changed-path' '
+	test_commit -C highbit2 c2double "$CENT$CENT" &&
+	git -C highbit2 commit-graph write --reachable --changed-paths &&
+	(
+		cd highbit2 &&
+		git config --add commitgraph.changedPathsVersion -1 &&
+		echo "options: bloom(2,10,7) read_generation_data" >expect &&
+		test-tool read-graph >full &&
+		grep options full >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success 'when writing commit graph, do not reuse changed-path of another version' '
+	git init doublewrite &&
+	test_commit -C doublewrite c "$CENT" &&
+	git -C doublewrite config --add commitgraph.changedPathsVersion 1 &&
+	git -C doublewrite commit-graph write --reachable --changed-paths &&
+	for v in -2 3
+	do
+		git -C doublewrite config --add commitgraph.changedPathsVersion $v &&
+		git -C doublewrite commit-graph write --reachable --changed-paths 2>err &&
+		cat >expect <<-EOF &&
+		warning: attempting to write a commit-graph, but ${SQ}commitgraph.changedPathsVersion${SQ} ($v) is not supported
+		EOF
+		test_cmp expect err || return 1
+	done &&
+	git -C doublewrite config --add commitgraph.changedPathsVersion 2 &&
+	git -C doublewrite commit-graph write --reachable --changed-paths &&
+	(
+		cd doublewrite &&
+		echo "c01f" >expect &&
+		get_first_changed_path_filter >actual &&
+		test_cmp expect actual
+	)
+'
+
 test_done
-- 
2.42.0.415.g8942f205c8


^ permalink raw reply related

* [PATCH v4 11/17] bloom: annotate filters with hash version
From: Taylor Blau @ 2023-10-18 18:33 UTC (permalink / raw)
  To: git
  Cc: Elijah Newren, Eric W. Biederman, Jeff King, Junio C Hamano,
	Patrick Steinhardt
In-Reply-To: <cover.1697653929.git.me@ttaylorr.com>

In subsequent commits, we will want to load existing Bloom filters out
of a commit-graph, even when the hash version they were computed with
does not match the value of `commitGraph.changedPathVersion`.

In order to differentiate between the two, add a "version" field to each
Bloom filter.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 bloom.c | 11 ++++++++---
 bloom.h |  1 +
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/bloom.c b/bloom.c
index ebef5cfd2f..9b6a30f6f6 100644
--- a/bloom.c
+++ b/bloom.c
@@ -55,6 +55,7 @@ int load_bloom_filter_from_graph(struct commit_graph *g,
 	filter->data = (unsigned char *)(g->chunk_bloom_data +
 					sizeof(unsigned char) * start_index +
 					BLOOMDATA_CHUNK_HEADER_SIZE);
+	filter->version = g->bloom_filter_settings->hash_version;
 
 	return 1;
 }
@@ -240,11 +241,13 @@ static int pathmap_cmp(const void *hashmap_cmp_fn_data UNUSED,
 	return strcmp(e1->path, e2->path);
 }
 
-static void init_truncated_large_filter(struct bloom_filter *filter)
+static void init_truncated_large_filter(struct bloom_filter *filter,
+					int version)
 {
 	filter->data = xmalloc(1);
 	filter->data[0] = 0xFF;
 	filter->len = 1;
+	filter->version = version;
 }
 
 struct bloom_filter *get_or_compute_bloom_filter(struct repository *r,
@@ -329,13 +332,15 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r,
 		}
 
 		if (hashmap_get_size(&pathmap) > settings->max_changed_paths) {
-			init_truncated_large_filter(filter);
+			init_truncated_large_filter(filter,
+						    settings->hash_version);
 			if (computed)
 				*computed |= BLOOM_TRUNC_LARGE;
 			goto cleanup;
 		}
 
 		filter->len = (hashmap_get_size(&pathmap) * settings->bits_per_entry + BITS_PER_WORD - 1) / BITS_PER_WORD;
+		filter->version = settings->hash_version;
 		if (!filter->len) {
 			if (computed)
 				*computed |= BLOOM_TRUNC_EMPTY;
@@ -355,7 +360,7 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r,
 	} else {
 		for (i = 0; i < diff_queued_diff.nr; i++)
 			diff_free_filepair(diff_queued_diff.queue[i]);
-		init_truncated_large_filter(filter);
+		init_truncated_large_filter(filter, settings->hash_version);
 
 		if (computed)
 			*computed |= BLOOM_TRUNC_LARGE;
diff --git a/bloom.h b/bloom.h
index 138d57a86b..330a140520 100644
--- a/bloom.h
+++ b/bloom.h
@@ -55,6 +55,7 @@ struct bloom_filter_settings {
 struct bloom_filter {
 	unsigned char *data;
 	size_t len;
+	int version;
 };
 
 /*
-- 
2.42.0.415.g8942f205c8


^ permalink raw reply related

* [PATCH v4 12/17] bloom: prepare to discard incompatible Bloom filters
From: Taylor Blau @ 2023-10-18 18:33 UTC (permalink / raw)
  To: git
  Cc: Elijah Newren, Eric W. Biederman, Jeff King, Junio C Hamano,
	Patrick Steinhardt
In-Reply-To: <cover.1697653929.git.me@ttaylorr.com>

Callers use the inline `get_bloom_filter()` implementation as a thin
wrapper around `get_or_compute_bloom_filter()`. The former calls the
latter with a value of "0" for `compute_if_not_present`, making
`get_bloom_filter()` the default read-only path for fetching an existing
Bloom filter.

Callers expect the value returned from `get_bloom_filter()` is usable,
that is that it's compatible with the configured value corresponding to
`commitGraph.changedPathsVersion`.

This is OK, since the commit-graph machinery only initializes its BDAT
chunk (thereby enabling it to service Bloom filter queries) when the
Bloom filter hash_version is compatible with our settings. So any value
returned by `get_bloom_filter()` is trivially useable.

However, subsequent commits will load the BDAT chunk even when the Bloom
filters are built with incompatible hash versions. Prepare to handle
this by teaching `get_bloom_filter()` to discard filters that are
incompatible with the configured hash version.

Callers who wish to read incompatible filters (e.g., for upgrading
filters from v1 to v2) may use the lower level routine,
`get_or_compute_bloom_filter()`.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 bloom.c | 20 +++++++++++++++++++-
 bloom.h | 20 ++++++++++++++++++--
 2 files changed, 37 insertions(+), 3 deletions(-)

diff --git a/bloom.c b/bloom.c
index 9b6a30f6f6..739fa093ba 100644
--- a/bloom.c
+++ b/bloom.c
@@ -250,6 +250,23 @@ static void init_truncated_large_filter(struct bloom_filter *filter,
 	filter->version = version;
 }
 
+struct bloom_filter *get_bloom_filter(struct repository *r, struct commit *c)
+{
+	struct bloom_filter *filter;
+	int hash_version;
+
+	filter = get_or_compute_bloom_filter(r, c, 0, NULL, NULL);
+	if (!filter)
+		return NULL;
+
+	prepare_repo_settings(r);
+	hash_version = r->settings.commit_graph_changed_paths_version;
+
+	if (!(hash_version == -1 || hash_version == filter->version))
+		return NULL; /* unusable filter */
+	return filter;
+}
+
 struct bloom_filter *get_or_compute_bloom_filter(struct repository *r,
 						 struct commit *c,
 						 int compute_if_not_present,
@@ -275,7 +292,8 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r,
 						     filter, graph_pos);
 	}
 
-	if (filter->data && filter->len)
+	if ((filter->data && filter->len) &&
+	    (!settings || settings->hash_version == filter->version))
 		return filter;
 	if (!compute_if_not_present)
 		return NULL;
diff --git a/bloom.h b/bloom.h
index 330a140520..bfe389e29c 100644
--- a/bloom.h
+++ b/bloom.h
@@ -110,8 +110,24 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r,
 						 const struct bloom_filter_settings *settings,
 						 enum bloom_filter_computed *computed);
 
-#define get_bloom_filter(r, c) get_or_compute_bloom_filter( \
-	(r), (c), 0, NULL, NULL)
+/*
+ * Find the Bloom filter associated with the given commit "c".
+ *
+ * If any of the following are true
+ *
+ *   - the repository does not have a commit-graph, or
+ *   - the repository disables reading from the commit-graph, or
+ *   - the given commit does not have a Bloom filter computed, or
+ *   - there is a Bloom filter for commit "c", but it cannot be read
+ *     because the filter uses an incompatible version of murmur3
+ *
+ * , then `get_bloom_filter()` will return NULL. Otherwise, the corresponding
+ * Bloom filter will be returned.
+ *
+ * For callers who wish to inspect Bloom filters with incompatible hash
+ * versions, use get_or_compute_bloom_filter().
+ */
+struct bloom_filter *get_bloom_filter(struct repository *r, struct commit *c);
 
 int bloom_filter_contains(const struct bloom_filter *filter,
 			  const struct bloom_key *key,
-- 
2.42.0.415.g8942f205c8


^ permalink raw reply related

* [PATCH v4 13/17] commit-graph.c: unconditionally load Bloom filters
From: Taylor Blau @ 2023-10-18 18:33 UTC (permalink / raw)
  To: git
  Cc: Elijah Newren, Eric W. Biederman, Jeff King, Junio C Hamano,
	Patrick Steinhardt
In-Reply-To: <cover.1697653929.git.me@ttaylorr.com>

In 9e4df4da07 (commit-graph: new filter ver. that fixes murmur3,
2023-08-01), we began ignoring the Bloom data ("BDAT") chunk for
commit-graphs whose Bloom filters were computed using a hash version
incompatible with the value of `commitGraph.changedPathVersion`.

Now that the Bloom API has been hardened to discard these incompatible
filters (with the exception of low-level APIs), we can safely load these
Bloom filters unconditionally.

We no longer want to return early from `graph_read_bloom_data()`, and
similarly do not want to set the bloom_settings' `hash_version` field as
a side-effect. The latter is because we want to wait until we know which
Bloom settings we're using (either the defaults, from the GIT_TEST
variables, or from the previous commit-graph layer) before deciding what
hash_version to use.

If we detect an existing BDAT chunk, we'll infer the rest of the
settings (e.g., number of hashes, bits per entry, and maximum number of
changed paths) from the earlier graph layer. The hash_version will be
inferred from the previous layer as well, unless one has already been
specified via configuration.

Once all of that is done, we normalize the value of the hash_version to
either "1" or "2".

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 commit-graph.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 6b21b17b20..7d0fb32107 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -327,12 +327,6 @@ static int graph_read_bloom_data(const unsigned char *chunk_start,
 	uint32_t hash_version;
 	hash_version = get_be32(chunk_start);

-	if (*c->commit_graph_changed_paths_version == -1) {
-		*c->commit_graph_changed_paths_version = hash_version;
-	} else if (hash_version != *c->commit_graph_changed_paths_version) {
-		return 0;
-	}
-
 	g->chunk_bloom_data = chunk_start;
 	g->bloom_filter_settings = xmalloc(sizeof(struct bloom_filter_settings));
 	g->bloom_filter_settings->hash_version = hash_version;
@@ -2468,8 +2462,7 @@ int write_commit_graph(struct object_directory *odb,
 	ctx->write_generation_data = (get_configured_generation_version(r) == 2);
 	ctx->num_generation_data_overflows = 0;

-	bloom_settings.hash_version = r->settings.commit_graph_changed_paths_version == 2
-		? 2 : 1;
+	bloom_settings.hash_version = r->settings.commit_graph_changed_paths_version;
 	bloom_settings.bits_per_entry = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_BITS_PER_ENTRY",
 						      bloom_settings.bits_per_entry);
 	bloom_settings.num_hashes = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_NUM_HASHES",
@@ -2501,10 +2494,18 @@ int write_commit_graph(struct object_directory *odb,
 		/* We have changed-paths already. Keep them in the next graph */
 		if (g && g->bloom_filter_settings) {
 			ctx->changed_paths = 1;
-			ctx->bloom_settings = g->bloom_filter_settings;
+
+			/* don't propagate the hash_version unless unspecified */
+			if (bloom_settings.hash_version == -1)
+				bloom_settings.hash_version = g->bloom_filter_settings->hash_version;
+			bloom_settings.bits_per_entry = g->bloom_filter_settings->bits_per_entry;
+			bloom_settings.num_hashes = g->bloom_filter_settings->num_hashes;
+			bloom_settings.max_changed_paths = g->bloom_filter_settings->max_changed_paths;
 		}
 	}

+	bloom_settings.hash_version = bloom_settings.hash_version == 2 ? 2 : 1;
+
 	if (ctx->split) {
 		struct commit_graph *g = ctx->r->objects->commit_graph;

-- 
2.42.0.415.g8942f205c8

^ permalink raw reply related

* [PATCH v4 14/17] commit-graph: drop unnecessary `graph_read_bloom_data_context`
From: Taylor Blau @ 2023-10-18 18:33 UTC (permalink / raw)
  To: git
  Cc: Elijah Newren, Eric W. Biederman, Jeff King, Junio C Hamano,
	Patrick Steinhardt
In-Reply-To: <cover.1697653929.git.me@ttaylorr.com>

The `graph_read_bloom_data_context` struct was introduced in an earlier
commit in order to pass pointers to the commit-graph and changed-path
Bloom filter version when reading the BDAT chunk.

The previous commit no longer writes through the changed_paths_version
pointer, making the surrounding context structure unnecessary. Drop it
and pass a pointer to the commit-graph directly when reading the BDAT
chunk.

Noticed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 commit-graph.c | 14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 7d0fb32107..b70d57b085 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -314,16 +314,10 @@ static int graph_read_oid_lookup(const unsigned char *chunk_start,
 	return 0;
 }
 
-struct graph_read_bloom_data_context {
-	struct commit_graph *g;
-	int *commit_graph_changed_paths_version;
-};
-
 static int graph_read_bloom_data(const unsigned char *chunk_start,
 				  size_t chunk_size, void *data)
 {
-	struct graph_read_bloom_data_context *c = data;
-	struct commit_graph *g = c->g;
+	struct commit_graph *g = data;
 	uint32_t hash_version;
 	hash_version = get_be32(chunk_start);
 
@@ -415,14 +409,10 @@ struct commit_graph *parse_commit_graph(struct repo_settings *s,
 	}
 
 	if (s->commit_graph_changed_paths_version) {
-		struct graph_read_bloom_data_context context = {
-			.g = graph,
-			.commit_graph_changed_paths_version = &s->commit_graph_changed_paths_version
-		};
 		pair_chunk(cf, GRAPH_CHUNKID_BLOOMINDEXES,
 			   &graph->chunk_bloom_indexes);
 		read_chunk(cf, GRAPH_CHUNKID_BLOOMDATA,
-			   graph_read_bloom_data, &context);
+			   graph_read_bloom_data, graph);
 	}
 
 	if (graph->chunk_bloom_indexes && graph->chunk_bloom_data) {
-- 
2.42.0.415.g8942f205c8


^ permalink raw reply related

* [PATCH v4 15/17] object.h: fix mis-aligned flag bits table
From: Taylor Blau @ 2023-10-18 18:33 UTC (permalink / raw)
  To: git
  Cc: Elijah Newren, Eric W. Biederman, Jeff King, Junio C Hamano,
	Patrick Steinhardt
In-Reply-To: <cover.1697653929.git.me@ttaylorr.com>

Bit position 23 is one column too far to the left.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 object.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/object.h b/object.h
index 114d45954d..db25714b4e 100644
--- a/object.h
+++ b/object.h
@@ -62,7 +62,7 @@ void object_array_init(struct object_array *array);
 
 /*
  * object flag allocation:
- * revision.h:               0---------10         15             23------27
+ * revision.h:               0---------10         15               23------27
  * fetch-pack.c:             01    67
  * negotiator/default.c:       2--5
  * walker.c:                 0-2
-- 
2.42.0.415.g8942f205c8


^ permalink raw reply related

* [PATCH v4 16/17] commit-graph: reuse existing Bloom filters where possible
From: Taylor Blau @ 2023-10-18 18:33 UTC (permalink / raw)
  To: git
  Cc: Elijah Newren, Eric W. Biederman, Jeff King, Junio C Hamano,
	Patrick Steinhardt
In-Reply-To: <cover.1697653929.git.me@ttaylorr.com>

In 9e4df4da07 (commit-graph: new filter ver. that fixes murmur3,
2023-08-01), a bug was described where it's possible for Git to produce
non-murmur3 hashes when the platform's "char" type is signed, and there
are paths with characters whose highest bit is set (i.e. all characters
>= 0x80).

That patch allows the caller to control which version of Bloom filters
are read and written. However, even on platforms with a signed "char"
type, it is possible to reuse existing Bloom filters if and only if
there are no changed paths in any commit's first parent tree-diff whose
characters have their highest bit set.

When this is the case, we can reuse the existing filter without having
to compute a new one. This is done by marking trees which are known to
have (or not have) any such paths. When a commit's root tree is verified
to not have any such paths, we mark it as such and declare that the
commit's Bloom filter is reusable.

Note that this heuristic only goes in one direction. If neither a commit
nor its first parent have any paths in their trees with non-ASCII
characters, then we know for certain that a path with non-ASCII
characters will not appear in a tree-diff against that commit's first
parent. The reverse isn't necessarily true: just because the tree-diff
doesn't contain any such paths does not imply that no such paths exist
in either tree.

So we end up recomputing some Bloom filters that we don't strictly have
to (i.e. their bits are the same no matter which version of murmur3 we
use). But culling these out is impossible, since we'd have to perform
the full tree-diff, which is the same effort as computing the Bloom
filter from scratch.

But because we can cache our results in each tree's flag bits, we can
often avoid recomputing many filters, thereby reducing the time it takes
to run

    $ git commit-graph write --changed-paths --reachable

when upgrading from v1 to v2 Bloom filters.

To benchmark this, let's generate a commit-graph in linux.git with v1
changed-paths in generation order[^1]:

    $ git clone git@github.com:torvalds/linux.git
    $ cd linux
    $ git commit-graph write --reachable --changed-paths
    $ graph=".git/objects/info/commit-graph"
    $ mv $graph{,.bak}

Then let's time how long it takes to go from v1 to v2 filters (with and
without the upgrade path enabled), resetting the state of the
commit-graph each time:

    $ git config commitGraph.changedPathsVersion 2
    $ hyperfine -p 'cp -f $graph.bak $graph' -L v 0,1 \
        'GIT_TEST_UPGRADE_BLOOM_FILTERS={v} git.compile commit-graph write --reachable --changed-paths'

On linux.git (where there aren't any non-ASCII paths), the timings
indicate that this patch represents a speed-up over recomputing all
Bloom filters from scratch:

    Benchmark 1: GIT_TEST_UPGRADE_BLOOM_FILTERS=0 git.compile commit-graph write --reachable --changed-paths
      Time (mean ± σ):     124.873 s ±  0.316 s    [User: 124.081 s, System: 0.643 s]
      Range (min … max):   124.621 s … 125.227 s    3 runs

    Benchmark 2: GIT_TEST_UPGRADE_BLOOM_FILTERS=1 git.compile commit-graph write --reachable --changed-paths
      Time (mean ± σ):     79.271 s ±  0.163 s    [User: 74.611 s, System: 4.521 s]
      Range (min … max):   79.112 s … 79.437 s    3 runs

    Summary
      'GIT_TEST_UPGRADE_BLOOM_FILTERS=1 git.compile commit-graph write --reachable --changed-paths' ran
        1.58 ± 0.01 times faster than 'GIT_TEST_UPGRADE_BLOOM_FILTERS=0 git.compile commit-graph write --reachable --changed-paths'

On git.git, we do have some non-ASCII paths, giving us a more modest
improvement from 4.163 seconds to 3.348 seconds, for a 1.24x speed-up.
On my machine, the stats for git.git are:

  - 8,285 Bloom filters computed from scratch
  - 10 Bloom filters generated as empty
  - 4 Bloom filters generated as truncated due to too many changed paths
  - 65,114 Bloom filters were reused when transitioning from v1 to v2.

[^1]: Note that this is is important, since `--stdin-packs` or
  `--stdin-commits` orders commits in the commit-graph by their pack
  position (with `--stdin-packs`) or in the raw input (with
  `--stdin-commits`).

  Since we compute Bloom filters in the same order that commits appear
  in the graph, we must see a commit's (first) parent before we process
  the commit itself. This is only guaranteed to happen when sorting
  commits by their generation number.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 bloom.c              | 90 ++++++++++++++++++++++++++++++++++++++++++--
 bloom.h              |  1 +
 commit-graph.c       |  5 +++
 object.h             |  1 +
 t/t4216-log-bloom.sh | 35 ++++++++++++++++-
 5 files changed, 128 insertions(+), 4 deletions(-)

diff --git a/bloom.c b/bloom.c
index 739fa093ba..24dd874e46 100644
--- a/bloom.c
+++ b/bloom.c
@@ -7,6 +7,9 @@
 #include "commit-graph.h"
 #include "commit.h"
 #include "commit-slab.h"
+#include "tree.h"
+#include "tree-walk.h"
+#include "config.h"
 
 define_commit_slab(bloom_filter_slab, struct bloom_filter);
 
@@ -250,6 +253,73 @@ static void init_truncated_large_filter(struct bloom_filter *filter,
 	filter->version = version;
 }
 
+#define VISITED   (1u<<21)
+#define HIGH_BITS (1u<<22)
+
+static int has_entries_with_high_bit(struct repository *r, struct tree *t)
+{
+	if (parse_tree(t))
+		return 1;
+
+	if (!(t->object.flags & VISITED)) {
+		struct tree_desc desc;
+		struct name_entry entry;
+
+		init_tree_desc(&desc, t->buffer, t->size);
+		while (tree_entry(&desc, &entry)) {
+			size_t i;
+			for (i = 0; i < entry.pathlen; i++) {
+				if (entry.path[i] & 0x80) {
+					t->object.flags |= HIGH_BITS;
+					goto done;
+				}
+			}
+
+			if (S_ISDIR(entry.mode)) {
+				struct tree *sub = lookup_tree(r, &entry.oid);
+				if (sub && has_entries_with_high_bit(r, sub)) {
+					t->object.flags |= HIGH_BITS;
+					goto done;
+				}
+			}
+
+		}
+
+done:
+		t->object.flags |= VISITED;
+	}
+
+	return !!(t->object.flags & HIGH_BITS);
+}
+
+static int commit_tree_has_high_bit_paths(struct repository *r,
+					  struct commit *c)
+{
+	struct tree *t;
+	if (repo_parse_commit(r, c))
+		return 1;
+	t = repo_get_commit_tree(r, c);
+	if (!t)
+		return 1;
+	return has_entries_with_high_bit(r, t);
+}
+
+static struct bloom_filter *upgrade_filter(struct repository *r, struct commit *c,
+					   struct bloom_filter *filter,
+					   int hash_version)
+{
+	struct commit_list *p = c->parents;
+	if (commit_tree_has_high_bit_paths(r, c))
+		return NULL;
+
+	if (p && commit_tree_has_high_bit_paths(r, p->item))
+		return NULL;
+
+	filter->version = hash_version;
+
+	return filter;
+}
+
 struct bloom_filter *get_bloom_filter(struct repository *r, struct commit *c)
 {
 	struct bloom_filter *filter;
@@ -292,9 +362,23 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r,
 						     filter, graph_pos);
 	}
 
-	if ((filter->data && filter->len) &&
-	    (!settings || settings->hash_version == filter->version))
-		return filter;
+	if (filter->data && filter->len) {
+		struct bloom_filter *upgrade;
+		if (!settings || settings->hash_version == filter->version)
+			return filter;
+
+		/* version mismatch, see if we can upgrade */
+		if (compute_if_not_present &&
+		    git_env_bool("GIT_TEST_UPGRADE_BLOOM_FILTERS", 1)) {
+			upgrade = upgrade_filter(r, c, filter,
+						 settings->hash_version);
+			if (upgrade) {
+				if (computed)
+					*computed |= BLOOM_UPGRADED;
+				return upgrade;
+			}
+		}
+	}
 	if (!compute_if_not_present)
 		return NULL;
 
diff --git a/bloom.h b/bloom.h
index bfe389e29c..e3a9b68905 100644
--- a/bloom.h
+++ b/bloom.h
@@ -102,6 +102,7 @@ enum bloom_filter_computed {
 	BLOOM_COMPUTED     = (1 << 1),
 	BLOOM_TRUNC_LARGE  = (1 << 2),
 	BLOOM_TRUNC_EMPTY  = (1 << 3),
+	BLOOM_UPGRADED     = (1 << 4),
 };
 
 struct bloom_filter *get_or_compute_bloom_filter(struct repository *r,
diff --git a/commit-graph.c b/commit-graph.c
index b70d57b085..50dcbb4d9b 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1102,6 +1102,7 @@ struct write_commit_graph_context {
 	int count_bloom_filter_not_computed;
 	int count_bloom_filter_trunc_empty;
 	int count_bloom_filter_trunc_large;
+	int count_bloom_filter_upgraded;
 };
 
 static int write_graph_chunk_fanout(struct hashfile *f,
@@ -1709,6 +1710,8 @@ static void trace2_bloom_filter_write_statistics(struct write_commit_graph_conte
 			   ctx->count_bloom_filter_trunc_empty);
 	trace2_data_intmax("commit-graph", ctx->r, "filter-trunc-large",
 			   ctx->count_bloom_filter_trunc_large);
+	trace2_data_intmax("commit-graph", ctx->r, "filter-upgraded",
+			   ctx->count_bloom_filter_upgraded);
 }
 
 static void compute_bloom_filters(struct write_commit_graph_context *ctx)
@@ -1750,6 +1753,8 @@ static void compute_bloom_filters(struct write_commit_graph_context *ctx)
 				ctx->count_bloom_filter_trunc_empty++;
 			if (computed & BLOOM_TRUNC_LARGE)
 				ctx->count_bloom_filter_trunc_large++;
+		} else if (computed & BLOOM_UPGRADED) {
+			ctx->count_bloom_filter_upgraded++;
 		} else if (computed & BLOOM_NOT_COMPUTED)
 			ctx->count_bloom_filter_not_computed++;
 		ctx->total_bloom_filter_data_size += filter
diff --git a/object.h b/object.h
index db25714b4e..2e5e08725f 100644
--- a/object.h
+++ b/object.h
@@ -75,6 +75,7 @@ void object_array_init(struct object_array *array);
  * commit-reach.c:                                  16-----19
  * sha1-name.c:                                              20
  * list-objects-filter.c:                                      21
+ * bloom.c:                                                    2122
  * builtin/fsck.c:           0--3
  * builtin/gc.c:             0
  * builtin/index-pack.c:                                     2021
diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh
index 68066b7928..569f2b6f8b 100755
--- a/t/t4216-log-bloom.sh
+++ b/t/t4216-log-bloom.sh
@@ -221,6 +221,10 @@ test_filter_trunc_large () {
 	grep "\"key\":\"filter-trunc-large\",\"value\":\"$1\"" $2
 }
 
+test_filter_upgraded () {
+	grep "\"key\":\"filter-upgraded\",\"value\":\"$1\"" $2
+}
+
 test_expect_success 'correctly report changes over limit' '
 	git init limits &&
 	(
@@ -628,7 +632,13 @@ test_expect_success 'when writing another commit graph, preserve existing versio
 test_expect_success 'when writing commit graph, do not reuse changed-path of another version' '
 	git init doublewrite &&
 	test_commit -C doublewrite c "$CENT" &&
+
 	git -C doublewrite config --add commitgraph.changedPathsVersion 1 &&
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
+		git -C doublewrite commit-graph write --reachable --changed-paths &&
+	test_filter_computed 1 trace2.txt &&
+	test_filter_upgraded 0 trace2.txt &&
+
 	git -C doublewrite commit-graph write --reachable --changed-paths &&
 	for v in -2 3
 	do
@@ -639,8 +649,13 @@ test_expect_success 'when writing commit graph, do not reuse changed-path of ano
 		EOF
 		test_cmp expect err || return 1
 	done &&
+
 	git -C doublewrite config --add commitgraph.changedPathsVersion 2 &&
-	git -C doublewrite commit-graph write --reachable --changed-paths &&
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
+		git -C doublewrite commit-graph write --reachable --changed-paths &&
+	test_filter_computed 1 trace2.txt &&
+	test_filter_upgraded 0 trace2.txt &&
+
 	(
 		cd doublewrite &&
 		echo "c01f" >expect &&
@@ -649,4 +664,22 @@ test_expect_success 'when writing commit graph, do not reuse changed-path of ano
 	)
 '
 
+test_expect_success 'when writing commit graph, reuse changed-path of another version where possible' '
+	git init upgrade &&
+
+	test_commit -C upgrade base no-high-bits &&
+
+	git -C upgrade config --add commitgraph.changedPathsVersion 1 &&
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
+		git -C upgrade commit-graph write --reachable --changed-paths &&
+	test_filter_computed 1 trace2.txt &&
+	test_filter_upgraded 0 trace2.txt &&
+
+	git -C upgrade config --add commitgraph.changedPathsVersion 2 &&
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
+		git -C upgrade commit-graph write --reachable --changed-paths &&
+	test_filter_computed 0 trace2.txt &&
+	test_filter_upgraded 1 trace2.txt
+'
+
 test_done
-- 
2.42.0.415.g8942f205c8


^ permalink raw reply related

* [PATCH v4 17/17] bloom: introduce `deinit_bloom_filters()`
From: Taylor Blau @ 2023-10-18 18:33 UTC (permalink / raw)
  To: git
  Cc: Elijah Newren, Eric W. Biederman, Jeff King, Junio C Hamano,
	Patrick Steinhardt
In-Reply-To: <cover.1697653929.git.me@ttaylorr.com>

After we are done using Bloom filters, we do not currently clean up any
memory allocated by the commit slab used to store those filters in the
first place.

Besides the bloom_filter structures themselves, there is mostly nothing
to free() in the first place, since in the read-only path all Bloom
filter's `data` members point to a memory mapped region in the
commit-graph file itself.

But when generating Bloom filters from scratch (or initializing
truncated filters) we allocate additional memory to store the filter's
data.

Keep track of when we need to free() this additional chunk of memory by
using an extra pointer `to_free`. Most of the time this will be NULL
(indicating that we are representing an existing Bloom filter stored in
a memory mapped region). When it is non-NULL, free it before discarding
the Bloom filters slab.

Suggested-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 bloom.c        | 16 +++++++++++++++-
 bloom.h        |  3 +++
 commit-graph.c |  4 ++++
 3 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/bloom.c b/bloom.c
index 24dd874e46..ff131893cd 100644
--- a/bloom.c
+++ b/bloom.c
@@ -59,6 +59,7 @@ int load_bloom_filter_from_graph(struct commit_graph *g,
 					sizeof(unsigned char) * start_index +
 					BLOOMDATA_CHUNK_HEADER_SIZE);
 	filter->version = g->bloom_filter_settings->hash_version;
+	filter->to_free = NULL;
 
 	return 1;
 }
@@ -231,6 +232,18 @@ void init_bloom_filters(void)
 	init_bloom_filter_slab(&bloom_filters);
 }
 
+static void free_one_bloom_filter(struct bloom_filter *filter)
+{
+	if (!filter)
+		return;
+	free(filter->to_free);
+}
+
+void deinit_bloom_filters(void)
+{
+	deep_clear_bloom_filter_slab(&bloom_filters, free_one_bloom_filter);
+}
+
 static int pathmap_cmp(const void *hashmap_cmp_fn_data UNUSED,
 		       const struct hashmap_entry *eptr,
 		       const struct hashmap_entry *entry_or_key,
@@ -247,7 +260,7 @@ static int pathmap_cmp(const void *hashmap_cmp_fn_data UNUSED,
 static void init_truncated_large_filter(struct bloom_filter *filter,
 					int version)
 {
-	filter->data = xmalloc(1);
+	filter->data = filter->to_free = xmalloc(1);
 	filter->data[0] = 0xFF;
 	filter->len = 1;
 	filter->version = version;
@@ -449,6 +462,7 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r,
 			filter->len = 1;
 		}
 		CALLOC_ARRAY(filter->data, filter->len);
+		filter->to_free = filter->data;
 
 		hashmap_for_each_entry(&pathmap, &iter, e, entry) {
 			struct bloom_key key;
diff --git a/bloom.h b/bloom.h
index e3a9b68905..d20e64bfbb 100644
--- a/bloom.h
+++ b/bloom.h
@@ -56,6 +56,8 @@ struct bloom_filter {
 	unsigned char *data;
 	size_t len;
 	int version;
+
+	void *to_free;
 };
 
 /*
@@ -96,6 +98,7 @@ void add_key_to_filter(const struct bloom_key *key,
 		       const struct bloom_filter_settings *settings);
 
 void init_bloom_filters(void);
+void deinit_bloom_filters(void);
 
 enum bloom_filter_computed {
 	BLOOM_NOT_COMPUTED = (1 << 0),
diff --git a/commit-graph.c b/commit-graph.c
index 50dcbb4d9b..60fa64d956 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -779,6 +779,7 @@ struct bloom_filter_settings *get_bloom_filter_settings(struct repository *r)
 void close_commit_graph(struct raw_object_store *o)
 {
 	clear_commit_graph_data_slab(&commit_graph_data_slab);
+	deinit_bloom_filters();
 	free_commit_graph(o->commit_graph);
 	o->commit_graph = NULL;
 }
@@ -2583,6 +2584,9 @@ int write_commit_graph(struct object_directory *odb,
 
 	res = write_commit_graph_file(ctx);
 
+	if (ctx->changed_paths)
+		deinit_bloom_filters();
+
 	if (ctx->split)
 		mark_commit_graphs(ctx);
 
-- 
2.42.0.415.g8942f205c8

^ permalink raw reply related

* Re: [PATCH 04/11] t: convert tests to not write references via the filesystem
From: Junio C Hamano @ 2023-10-18 18:34 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Han-Wen Nienhuys
In-Reply-To: <c79431c0bf117d756e1d584f4c9415888d9ff9eb.1697607222.git.ps@pks.im>

Patrick Steinhardt <ps@pks.im> writes:

> -test_expect_success "fail to create $n" '
> -	test_when_finished "rm -f .git/$n_dir" &&
> -	touch .git/$n_dir &&
> -	test_must_fail git update-ref $n $A
> +test_expect_success "fail to create $n due to file/directory conflict" '
> +	test_when_finished "git update-ref -d refs/heads/gu" &&
> +	git update-ref refs/heads/gu $A &&
> +	test_must_fail git update-ref refs/heads/gu/fixes $A
>  '

OK, the original checks "if a random garbage file, which may not
necessarily be a ref, exists at $n_dir, we cannot create a ref at
$n_dir/fixes, due to D/F conflict" more directly, but as long as our
intention is to enforce the D/F restriction across different ref
backends [*], creating a ref at $n_dir and making sure $n_dir/fixes
cannot be created is an equivalent check that is better (because it
can be applied for other backends).

    Side note: there is no fundamental need to, though, and there
         are cases where being able to have the 'seen' branch and
         'seen/ps/ref-test-tools' branches at the same time is
         beneficial---packed-refs and ref-table backends would not
         have such an inherent limitation, but they can of course be
         castrated to match what files-backend can(not) do.

> @@ -222,7 +220,7 @@ test_expect_success 'delete symref without dereference when the referred ref is
>  
>  test_expect_success 'update-ref -d is not confused by self-reference' '
>  	git symbolic-ref refs/heads/self refs/heads/self &&
> -	test_when_finished "rm -f .git/refs/heads/self" &&
> +	test_when_finished "test-tool ref-store main delete-refs REF_NO_DEREF refs/heads/self" &&
>  	test_path_is_file .git/refs/heads/self &&

I trust that this will be corrected to use some wrapper around "git
symbolic-ref" (or an equivalent for it as a test-tool subcommand) in
some future patch, if not in this series?

>  	test_must_fail git update-ref -d refs/heads/self &&
>  	test_path_is_file .git/refs/heads/self

Likewise.

> @@ -230,7 +228,7 @@ test_expect_success 'update-ref -d is not confused by self-reference' '
>  
>  test_expect_success 'update-ref --no-deref -d can delete self-reference' '
>  	git symbolic-ref refs/heads/self refs/heads/self &&
> -	test_when_finished "rm -f .git/refs/heads/self" &&
> +	test_when_finished "test-tool ref-store main delete-refs REF_NO_DEREF refs/heads/self" &&
>  	test_path_is_file .git/refs/heads/self &&
>  	git update-ref --no-deref -d refs/heads/self &&
>  	test_must_fail git show-ref --verify -q refs/heads/self

We already have the "ref is missing" test here.

I'll stop at this point for now; will hopefully continue in a
separate message later.  Thanks.

^ permalink raw reply

* Re: [Outreachy][PATCH] branch.c: adjust error messages to coding guidelines
From: Junio C Hamano @ 2023-10-18 18:44 UTC (permalink / raw)
  To: Rubén Justo; +Cc: Isoken June Ibizugbe, git, christian.couder
In-Reply-To: <e08b2ec4-786a-4c18-b7af-0a6a250ae0f0@gmail.com>

Rubén Justo <rjusto@gmail.com> writes:

> On 18-oct-2023 06:12:22, Isoken June Ibizugbe wrote:
>
>> Signed-off-by: Isoken June Ibizugbe <isokenjune@gmail.com>
>> ---
>>  builtin/branch.c | 66 ++++++++++++++++++++++++------------------------
>>  1 file changed, 33 insertions(+), 33 deletions(-)
>
> Only builtin/branch.c is touched.
>
> The changes in this patch break some tests, therefore this patch must
> also include the fixes for those tests.

Good point.  I also notice the lack of body in the proposed commit
message.  It should say which part(s) of "coding guidelines" this
change wants to make the code adhere to (maybe later we might change
the guideline, and those who find this commit via "git blame" needs
to understand what exact rule this change was aiming to follow,
before they are confident that the change they plan to make is good).

>> @@ -965,11 +965,11 @@ int cmd_branch(int argc, const char **argv, const char *prefix)
>>  		const char *start_name = argc == 2 ? argv[1] : head;
>>  
>>  		if (filter.kind != FILTER_REFS_BRANCHES)
>> -			die(_("The -a, and -r, options to 'git branch' do not take a branch name.\n"
>> +			die(_("the -a, and -r, options to 'git branch' do not take a branch name\n"
>>  				  "Did you mean to use: -a|-r --list <pattern>?"));
>
> OK.  The initial 'T' is fixed, but as Junio explained [1], the full stop
> must stay.

Thanks for pointing out that it will help mentorship applicants to
learn from reviews the other applicants are receiving, as they tend
to make similar mistakes.

>
> Thanks.
>
>  [1] https://lore.kernel.org/git/xmqqttqxkmaq.fsf@gitster.g/

^ permalink raw reply

* Re: [PATCH] git-gui: add support for filenames starting with tilde
From: Junio C Hamano @ 2023-10-18 18:52 UTC (permalink / raw)
  To: Matthias Aßhauer via GitGitGadget
  Cc: git, Pratyush Yadav, Matthias Aßhauer
In-Reply-To: <pull.1599.git.1697619043944.gitgitgadget@gmail.com>

"Matthias Aßhauer via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: =?UTF-8?q?Matthias=20A=C3=9Fhauer?= <mha1993@live.de>
>
> When git-gui encounters a file name starting with a tilde character (~),
> TCL "helpfully" expands that tilde into a (probably non-existing) users
> home directory. But in git-gui we're often not dealing with user supplied
> paths, where such an expansion might be expected, but actual names of files.
>
> Prevent TCL from doing tilde expansion on these literal filenames.

I do not have stake in git-gui (or gitk) and take the changes the
subsystem owners have decided to take from them, but I have to
wonder if this is merely robbing Peter to pay Paul?

If the above description were not "we're often not" but "the only
paths we use are repository relative, and a path that begin with a
tilde NEVER refers to somebody's home directory", then I would buy
into the change (but again, I am not even a user of git-gui, so take
this with a moderate grains of salt).

Thanks.

^ permalink raw reply

* [PATCH] git-p4 shouldn't attempt to store symlinks in LFS
From: Matthew McClain @ 2023-10-18 18:58 UTC (permalink / raw)
  To: git; +Cc: Matthew McClain

If a symlink in your Perforce repository matches
git-p4.largeFileExtensions, git-p4.py will attempt to put the symlink in
LFS but will blow up when it passes a string to generateTempFile.

Git LFS normal behavior does not stash symlinks in LFS.

Importing revision 42889 (100%)Traceback (most recent call last):
  File "./git/git-p4.py", line 4621, in <module>
    main()
  File "./git/git-p4.py", line 4615, in main
    if not cmd.run(args):
  File "./git/git-p4.py", line 4225, in run
    self.importRevisions(args, branch_arg_given)
  File "./git/git-p4.py", line 4002, in importRevisions
    self.importChanges(changes)
  File "./git/git-p4.py", line 3876, in importChanges
    self.initialParent)
  File "./git/git-p4.py", line 3496, in commit
    self.streamP4Files(files)
  File "./git/git-p4.py", line 3336, in streamP4Files
    cb=streamP4FilesCbSelf)
  File "./git/git-p4.py", line 910, in p4CmdList
    cb(entry)
  File "./git/git-p4.py", line 3321, in streamP4FilesCbSelf
    self.streamP4FilesCb(entry)
  File "./git/git-p4.py", line 3266, in streamP4FilesCb
    self.streamOneP4File(self.stream_file, self.stream_contents)
  File "./git/git-p4.py", line 3217, in streamOneP4File
    git_mode, contents = self.largeFileSystem.processContent(git_mode, relPath, contents)
  File "./git/git-p4.py", line 1656, in processContent
    return LargeFileSystem.processContent(self, git_mode, relPath, contents)
  File "./git/git-p4.py", line 1526, in processContent
    contentTempFile = self.generateTempFile(contents)
  File "./git/git-p4.py", line 1488, in generateTempFile
    contentFile.write(d)
  File "/usr/lib64/python3.6/tempfile.py", line 485, in func_wrapper
    return func(*args, **kwargs)
TypeError: a bytes-like object is required, not 'str'

Signed-off-by: Matthew McClain <mmcclain@noprivs.com>
---
 git-p4.py | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/git-p4.py b/git-p4.py
index d26a980e5a..f5fda2a3dc 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -1522,6 +1522,10 @@ def processContent(self, git_mode, relPath, contents):
            file is stored in the large file system and handles all necessary
            steps.
            """
+        # no need to store symlinks in LFS (generateTempFile wants bytes)
+        if git_mode == "120000":
+            return (git_mode, contents)
+
         if self.exceedsLargeFileThreshold(relPath, contents) or self.hasLargeFileExtension(relPath):
             contentTempFile = self.generateTempFile(contents)
             pointer_git_mode, contents, localLargeFile = self.generatePointer(contentTempFile)
-- 
2.39.3


^ permalink raw reply related

* Re: [PATCH] git-p4 shouldn't attempt to store symlinks in LFS
From: Junio C Hamano @ 2023-10-18 19:30 UTC (permalink / raw)
  To: Matthew McClain; +Cc: git, brian m. carlson
In-Reply-To: <20231018185854.857674-1-mmcclain@noprivs.com>

Matthew McClain <mmcclain@noprivs.com> writes:

> If a symlink in your Perforce repository matches
> git-p4.largeFileExtensions, git-p4.py will attempt to put the symlink in
> LFS but will blow up when it passes a string to generateTempFile.
>
> Git LFS normal behavior does not stash symlinks in LFS.

I am not a "git p4" (or "git lfs") user, but if nobody uses LFS to
store symbolic links (which is very much understandable given what
it was invented for), then that alone is a good enough reason to
avoid the regular blob codepath, and that is true even if
generateTempFile accepted a str and silently converted it to bytes
to help callers, no?

So "but will blow up ..." and the stacktrace below are more or less
irrelevant details and do not help convince readers why this change
is desirable.  I'd suggest removing it (and perhaps place more stress
on explaining why storing symbolic links in LFS is a bad practice
nobody uses, but if it is so obvious perhaps the single sentence
explanation you have above would be sufficient).

I know I can ask brian to take a look at this change from LFS angle,
but who's authoritatively reviewing "git p4" related changes these
days (Matthew, this is not a question for you, but to the list)?

Thanks.

> Importing revision 42889 (100%)Traceback (most recent call last):
>   File "./git/git-p4.py", line 4621, in <module>
>     main()
>   File "./git/git-p4.py", line 4615, in main
>     if not cmd.run(args):
>   File "./git/git-p4.py", line 4225, in run
>     self.importRevisions(args, branch_arg_given)
>   File "./git/git-p4.py", line 4002, in importRevisions
>     self.importChanges(changes)
>   File "./git/git-p4.py", line 3876, in importChanges
>     self.initialParent)
>   File "./git/git-p4.py", line 3496, in commit
>     self.streamP4Files(files)
>   File "./git/git-p4.py", line 3336, in streamP4Files
>     cb=streamP4FilesCbSelf)
>   File "./git/git-p4.py", line 910, in p4CmdList
>     cb(entry)
>   File "./git/git-p4.py", line 3321, in streamP4FilesCbSelf
>     self.streamP4FilesCb(entry)
>   File "./git/git-p4.py", line 3266, in streamP4FilesCb
>     self.streamOneP4File(self.stream_file, self.stream_contents)
>   File "./git/git-p4.py", line 3217, in streamOneP4File
>     git_mode, contents = self.largeFileSystem.processContent(git_mode, relPath, contents)
>   File "./git/git-p4.py", line 1656, in processContent
>     return LargeFileSystem.processContent(self, git_mode, relPath, contents)
>   File "./git/git-p4.py", line 1526, in processContent
>     contentTempFile = self.generateTempFile(contents)
>   File "./git/git-p4.py", line 1488, in generateTempFile
>     contentFile.write(d)
>   File "/usr/lib64/python3.6/tempfile.py", line 485, in func_wrapper
>     return func(*args, **kwargs)
> TypeError: a bytes-like object is required, not 'str'
>
> Signed-off-by: Matthew McClain <mmcclain@noprivs.com>
> ---
>  git-p4.py | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/git-p4.py b/git-p4.py
> index d26a980e5a..f5fda2a3dc 100755
> --- a/git-p4.py
> +++ b/git-p4.py
> @@ -1522,6 +1522,10 @@ def processContent(self, git_mode, relPath, contents):
>             file is stored in the large file system and handles all necessary
>             steps.
>             """
> +        # no need to store symlinks in LFS (generateTempFile wants bytes)
> +        if git_mode == "120000":
> +            return (git_mode, contents)
> +
>          if self.exceedsLargeFileThreshold(relPath, contents) or self.hasLargeFileExtension(relPath):
>              contentTempFile = self.generateTempFile(contents)
>              pointer_git_mode, contents, localLargeFile = self.generatePointer(contentTempFile)

^ permalink raw reply

* Re: [PATCH v2 1/5] doc: fix some typos, grammar and wording issues
From: Junio C Hamano @ 2023-10-18 20:21 UTC (permalink / raw)
  To: Štěpán Němec; +Cc: git, Eric Sunshine
In-Reply-To: <xmqqcyxsj02f.fsf@gitster.g>

Junio C Hamano <gitster@pobox.com> writes:

> Štěpán Němec <stepnem@smrk.net> writes:
>
>> Signed-off-by: Štěpán Němec <stepnem@smrk.net>
>> ---
>> v2: Feedback from Eric and Junio, implement most of Eric's suggestions.
>> Range-diff with two inline comments follows:
>
> Looking good.
>
> The above comment was meant to apply to the whole series, but
> because you lack the [0/5] cover letter, I am replying to [1/5].
> You may want to run git-format-patch with --cover-letter when
> sending a series with multiple patches.
>
> Will queue.  Thanks.

I wanted to see an independent review from others before deciding to
merge this to 'next', but unfortunately we seem to lack review
bandwidth.  I've given another scan of the whole thing and did not
spot anything glaringly wrong (which is not surprising), so let's
merge it down to 'next' and if anybody finds a bad rewrite in these
changes later, let's let them update on top.

Thanks.

^ permalink raw reply

* [PATCH v1 0/4] maintenance: use XDG config if it exists
From: Kristoffer Haugsbakk @ 2023-10-18 20:28 UTC (permalink / raw)
  To: git; +Cc: Kristoffer Haugsbakk, ps, stolee

maintenance: use XDG config if it exists

I use the conventional XDG config path for the global configuration. This
path is always used except for `git maintenance register`.

§ Other discussions

While working on this series I found that Phillip Wood[1] had pointed out
that `xdg_config` is never used. However that was on a series where this
was the existing behavior (not new), so this wasn't acted upon.

🔗 1: https://lore.kernel.org/git/448cc6ed-c441-85a3-2780-0c07e56f53f8@dunelm.org.uk/

§ Patches

• 1–3: Preparatory
• 4: The desired change

§ CC

• Patrick Steinhardt: `config` changes
• Derrick Stolee: `maintenance` changes

Kristoffer Haugsbakk (4):
  config: format newlines
  config: rename global config function
  config: factor out global config file retrieval
  maintenance: use XDG config if it exists

 builtin/config.c       | 26 ++------------------------
 builtin/gc.c           | 23 +++++------------------
 builtin/var.c          |  2 +-
 config.c               | 30 ++++++++++++++++++++++++++----
 config.h               |  3 ++-
 t/t7900-maintenance.sh | 21 +++++++++++++++++++++
 6 files changed, 57 insertions(+), 48 deletions(-)

--
2.42.0.2.g879ad04204

^ permalink raw reply

* [PATCH v1 1/4] config: format newlines
From: Kristoffer Haugsbakk @ 2023-10-18 20:28 UTC (permalink / raw)
  To: git; +Cc: Kristoffer Haugsbakk, ps, stolee
In-Reply-To: <cover.1697660181.git.code@khaugsbakk.name>


Remove unneeded newlines according to `clang-format`.

Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
---

Notes (series):
    Honestly the formatter changing these lines over and over again was just
    annoying. And we're visiting the file anyway.

 builtin/config.c | 1 -
 config.c         | 2 --
 2 files changed, 3 deletions(-)

diff --git a/builtin/config.c b/builtin/config.c
index 11a4d4ef141..87d0dc92d99 100644
--- a/builtin/config.c
+++ b/builtin/config.c
@@ -760,7 +760,6 @@ int cmd_config(int argc, const char **argv, const char *prefix)
 		given_config_source.scope = CONFIG_SCOPE_COMMAND;
 	}
 
-
 	if (respect_includes_opt == -1)
 		config_options.respect_includes = !given_config_source.file;
 	else
diff --git a/config.c b/config.c
index 3846a37be97..19f832818f1 100644
--- a/config.c
+++ b/config.c
@@ -96,7 +96,6 @@ static long config_file_ftell(struct config_source *conf)
 	return ftell(conf->u.file);
 }
 
-
 static int config_buf_fgetc(struct config_source *conf)
 {
 	if (conf->u.buf.pos < conf->u.buf.len)
@@ -3564,7 +3563,6 @@ int git_config_set_multivar_in_file_gently(const char *config_filename,
 write_err_out:
 	ret = write_error(get_lock_file_path(&lock));
 	goto out_free;
-
 }
 
 void git_config_set_multivar_in_file(const char *config_filename,
-- 
2.42.0.2.g879ad04204


^ permalink raw reply related

* [PATCH v1 2/4] config: rename global config function
From: Kristoffer Haugsbakk @ 2023-10-18 20:28 UTC (permalink / raw)
  To: git; +Cc: Kristoffer Haugsbakk, ps, stolee
In-Reply-To: <cover.1697660181.git.code@khaugsbakk.name>


Rename this function to a more descriptive name since we want to use the
existing name for a new function.

Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
---
 builtin/config.c | 2 +-
 builtin/gc.c     | 4 ++--
 builtin/var.c    | 2 +-
 config.c         | 4 ++--
 config.h         | 2 +-
 5 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/builtin/config.c b/builtin/config.c
index 87d0dc92d99..6fff2655816 100644
--- a/builtin/config.c
+++ b/builtin/config.c
@@ -710,7 +710,7 @@ int cmd_config(int argc, const char **argv, const char *prefix)
 	if (use_global_config) {
 		char *user_config, *xdg_config;
 
-		git_global_config(&user_config, &xdg_config);
+		git_global_config_paths(&user_config, &xdg_config);
 		if (!user_config)
 			/*
 			 * It is unknown if HOME/.gitconfig exists, so
diff --git a/builtin/gc.c b/builtin/gc.c
index 5c4315f0d81..17fc031f63a 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -1529,7 +1529,7 @@ static int maintenance_register(int argc, const char **argv, const char *prefix)
 		char *user_config = NULL, *xdg_config = NULL;
 
 		if (!config_file) {
-			git_global_config(&user_config, &xdg_config);
+			git_global_config_paths(&user_config, &xdg_config);
 			config_file = user_config;
 			if (!user_config)
 				die(_("$HOME not set"));
@@ -1597,7 +1597,7 @@ static int maintenance_unregister(int argc, const char **argv, const char *prefi
 		int rc;
 		char *user_config = NULL, *xdg_config = NULL;
 		if (!config_file) {
-			git_global_config(&user_config, &xdg_config);
+			git_global_config_paths(&user_config, &xdg_config);
 			config_file = user_config;
 			if (!user_config)
 				die(_("$HOME not set"));
diff --git a/builtin/var.c b/builtin/var.c
index 74161bdf1c6..8e18b50b1e5 100644
--- a/builtin/var.c
+++ b/builtin/var.c
@@ -90,7 +90,7 @@ static char *git_config_val_global(int ident_flag UNUSED)
 	char *user, *xdg;
 	size_t unused;
 
-	git_global_config(&user, &xdg);
+	git_global_config_paths(&user, &xdg);
 	if (xdg && *xdg) {
 		normalize_path_copy(xdg, xdg);
 		strbuf_addf(&buf, "%s\n", xdg);
diff --git a/config.c b/config.c
index 19f832818f1..d2cdda96edd 100644
--- a/config.c
+++ b/config.c
@@ -2111,7 +2111,7 @@ char *git_system_config(void)
 	return system_config;
 }
 
-void git_global_config(char **user_out, char **xdg_out)
+void git_global_config_paths(char **user_out, char **xdg_out)
 {
 	char *user_config = xstrdup_or_null(getenv("GIT_CONFIG_GLOBAL"));
 	char *xdg_config = NULL;
@@ -2186,7 +2186,7 @@ static int do_git_config_sequence(const struct config_options *opts,
 							 data, CONFIG_SCOPE_SYSTEM,
 							 NULL);
 
-	git_global_config(&user_config, &xdg_config);
+	git_global_config_paths(&user_config, &xdg_config);
 
 	if (xdg_config && !access_or_die(xdg_config, R_OK, ACCESS_EACCES_OK))
 		ret += git_config_from_file_with_options(fn, xdg_config, data,
diff --git a/config.h b/config.h
index 6332d749047..9f04de8ee3e 100644
--- a/config.h
+++ b/config.h
@@ -394,7 +394,7 @@ int config_error_nonbool(const char *);
 #endif
 
 char *git_system_config(void);
-void git_global_config(char **user, char **xdg);
+void git_global_config_paths(char **user, char **xdg);
 
 int git_config_parse_parameter(const char *, config_fn_t fn, void *data);
 
-- 
2.42.0.2.g879ad04204


^ permalink raw reply related

* [PATCH v1 3/4] config: factor out global config file retrieval
From: Kristoffer Haugsbakk @ 2023-10-18 20:28 UTC (permalink / raw)
  To: git; +Cc: Kristoffer Haugsbakk, ps, stolee
In-Reply-To: <cover.1697660181.git.code@khaugsbakk.name>


Factor out code that retrieves the global config file so that we can use
it in `gc.c` as well.

Use the old name from the previous commit since this function acts
functionally the same as `git_system_config` but for “global”.

Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
---
 builtin/config.c | 25 ++-----------------------
 config.c         | 24 ++++++++++++++++++++++++
 config.h         |  1 +
 3 files changed, 27 insertions(+), 23 deletions(-)

diff --git a/builtin/config.c b/builtin/config.c
index 6fff2655816..df06b766fad 100644
--- a/builtin/config.c
+++ b/builtin/config.c
@@ -708,30 +708,9 @@ int cmd_config(int argc, const char **argv, const char *prefix)
 	}
 
 	if (use_global_config) {
-		char *user_config, *xdg_config;
-
-		git_global_config_paths(&user_config, &xdg_config);
-		if (!user_config)
-			/*
-			 * It is unknown if HOME/.gitconfig exists, so
-			 * we do not know if we should write to XDG
-			 * location; error out even if XDG_CONFIG_HOME
-			 * is set and points at a sane location.
-			 */
-			die(_("$HOME not set"));
-
+		given_config_source.file = git_global_config();
 		given_config_source.scope = CONFIG_SCOPE_GLOBAL;
-
-		if (access_or_warn(user_config, R_OK, 0) &&
-		    xdg_config && !access_or_warn(xdg_config, R_OK, 0)) {
-			given_config_source.file = xdg_config;
-			free(user_config);
-		} else {
-			given_config_source.file = user_config;
-			free(xdg_config);
-		}
-	}
-	else if (use_system_config) {
+	} else if (use_system_config) {
 		given_config_source.file = git_system_config();
 		given_config_source.scope = CONFIG_SCOPE_SYSTEM;
 	} else if (use_local_config) {
diff --git a/config.c b/config.c
index d2cdda96edd..2ff766c56ff 100644
--- a/config.c
+++ b/config.c
@@ -2111,6 +2111,30 @@ char *git_system_config(void)
 	return system_config;
 }
 
+char *git_global_config(void)
+{
+	char *user_config, *xdg_config;
+
+	git_global_config_paths(&user_config, &xdg_config);
+	if (!user_config)
+		/*
+		 * It is unknown if HOME/.gitconfig exists, so
+		 * we do not know if we should write to XDG
+		 * location; error out even if XDG_CONFIG_HOME
+		 * is set and points at a sane location.
+		 */
+		die(_("$HOME not set"));
+
+	if (access_or_warn(user_config, R_OK, 0) && xdg_config &&
+	    !access_or_warn(xdg_config, R_OK, 0)) {
+		free(user_config);
+		return xdg_config;
+	} else {
+		free(xdg_config);
+		return user_config;
+	}
+}
+
 void git_global_config_paths(char **user_out, char **xdg_out)
 {
 	char *user_config = xstrdup_or_null(getenv("GIT_CONFIG_GLOBAL"));
diff --git a/config.h b/config.h
index 9f04de8ee3e..5cf961b548d 100644
--- a/config.h
+++ b/config.h
@@ -394,6 +394,7 @@ int config_error_nonbool(const char *);
 #endif
 
 char *git_system_config(void);
+char *git_global_config(void);
 void git_global_config_paths(char **user, char **xdg);
 
 int git_config_parse_parameter(const char *, config_fn_t fn, void *data);
-- 
2.42.0.2.g879ad04204


^ permalink raw reply related

* [PATCH v1 4/4] maintenance: use XDG config if it exists
From: Kristoffer Haugsbakk @ 2023-10-18 20:28 UTC (permalink / raw)
  To: git; +Cc: Kristoffer Haugsbakk, ps, stolee
In-Reply-To: <cover.1697660181.git.code@khaugsbakk.name>


`git maintenance register` registers the repository in the user's global
config. `$XDG_CONFIG_HOME/git/config` is supposed to be used if
`~/.gitconfig` does not exist. However, this command creates a
`~/.gitconfig` file and writes to that one even though the XDG variant
exists.

This used to work correctly until 50a044f1e4 (gc: replace config
subprocesses with API calls, 2022-09-27), when the command started calling
the config API instead of git-config(1).

Also change `unregister` accordingly.

Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
---
 builtin/gc.c           | 23 +++++------------------
 t/t7900-maintenance.sh | 21 +++++++++++++++++++++
 2 files changed, 26 insertions(+), 18 deletions(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index 17fc031f63a..7b780f2ab38 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -1526,19 +1526,12 @@ static int maintenance_register(int argc, const char **argv, const char *prefix)
 
 	if (!found) {
 		int rc;
-		char *user_config = NULL, *xdg_config = NULL;
 
-		if (!config_file) {
-			git_global_config_paths(&user_config, &xdg_config);
-			config_file = user_config;
-			if (!user_config)
-				die(_("$HOME not set"));
-		}
+		if (!config_file)
+			config_file = git_global_config();
 		rc = git_config_set_multivar_in_file_gently(
 			config_file, "maintenance.repo", maintpath,
 			CONFIG_REGEX_NONE, 0);
-		free(user_config);
-		free(xdg_config);
 
 		if (rc)
 			die(_("unable to add '%s' value of '%s'"),
@@ -1595,18 +1588,12 @@ static int maintenance_unregister(int argc, const char **argv, const char *prefi
 
 	if (found) {
 		int rc;
-		char *user_config = NULL, *xdg_config = NULL;
-		if (!config_file) {
-			git_global_config_paths(&user_config, &xdg_config);
-			config_file = user_config;
-			if (!user_config)
-				die(_("$HOME not set"));
-		}
+
+		if (!config_file)
+			config_file = git_global_config();
 		rc = git_config_set_multivar_in_file_gently(
 			config_file, key, NULL, maintpath,
 			CONFIG_FLAGS_MULTI_REPLACE | CONFIG_FLAGS_FIXED_VALUE);
-		free(user_config);
-		free(xdg_config);
 
 		if (rc &&
 		    (!force || rc == CONFIG_NOTHING_SET))
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 487e326b3fa..a11e6c61520 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -67,6 +67,27 @@ test_expect_success 'maintenance.auto config option' '
 	test_subcommand ! git maintenance run --auto --quiet  <false
 '
 
+test_expect_success 'register uses XDG_CONFIG_HOME config if it exists' '
+	XDG_CONFIG_HOME=.config &&
+	test_when_finished rm -r "$XDG_CONFIG_HOME"/git/config &&
+	export "XDG_CONFIG_HOME" &&
+	mkdir -p "$XDG_CONFIG_HOME"/git &&
+	touch "$XDG_CONFIG_HOME"/git/config &&
+	git maintenance register &&
+	git config --file="$XDG_CONFIG_HOME"/git/config --get maintenance.repo >actual &&
+	pwd >expect &&
+	test_cmp expect actual
+'
+
+test_expect_success 'register does not need XDG_CONFIG_HOME config to exist' '
+	test_when_finished git maintenance unregister &&
+	test_path_is_missing "$XDG_CONFIG_HOME"/git/config &&
+	git maintenance register &&
+	git config --global --get maintenance.repo >actual &&
+	pwd >expect &&
+	test_cmp expect actual
+'
+
 test_expect_success 'maintenance.<task>.enabled' '
 	git config maintenance.gc.enabled false &&
 	git config maintenance.commit-graph.enabled true &&
-- 
2.42.0.2.g879ad04204


^ permalink raw reply related

* Re: [PATCH 04/11] t: convert tests to not write references via the filesystem
From: Junio C Hamano @ 2023-10-18 21:18 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Han-Wen Nienhuys
In-Reply-To: <c79431c0bf117d756e1d584f4c9415888d9ff9eb.1697607222.git.ps@pks.im>

Patrick Steinhardt <ps@pks.im> writes:

> @@ -434,7 +432,7 @@ test_expect_success 'Query "main@{2005-05-28}" (past end of history)' '
>  	test_i18ngrep -F "warning: log for ref $m unexpectedly ended on $ld" e
>  '
>  
> -rm -f .git/$m .git/logs/$m expect
> +git update-ref -d $m

We are not clearing "expect" file.  I do not know if it matters
here, but I am only recording what I noticed.

> diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
> index 10a539158c4..5cce24f1006 100755
> --- a/t/t1450-fsck.sh
> +++ b/t/t1450-fsck.sh
> @@ -115,15 +115,16 @@ test_expect_success 'zlib corrupt loose object output ' '
>  '
>  
>  test_expect_success 'branch pointing to non-commit' '
> -	git rev-parse HEAD^{tree} >.git/refs/heads/invalid &&
> +	tree_oid=$(git rev-parse --verify HEAD^{tree}) &&
> +	test-tool ref-store main update-ref msg refs/heads/invalid $tree_oid $ZERO_OID REF_SKIP_OID_VERIFICATION &&

I have mixed feelings on this.

In olden days, plumbing commands tended to allow to pass anything
the user told them to use, but in more recent versions of Git, we,
probably by mistake, managed to butcher some of the plumbing
commands to make them unable to deliberately "break" repositories,
one victim being "update-ref", i.e.

    $ git update-ref refs/heads/invalid HEAD^{tree}

is rejected these days (I just checked with v1.3.0 and it allows me
to do this), and that is one of the reasons why we manually broke
the repository in these tests.  We need to have a warning message in
comments near the implementation of "ref-store update-ref" that says
never ever attempt to share code with the production version of
update-ref---otherwise this (or the "safety" given to the plumbing
command, possibly by mistake) will be broken, depending on which
direction such a sharing goes.  On the other hand, forcing us to
keep two separate implementations, one deliberately loose to allow
us corrupt repositories, the other for production and actively used,
would mean the former one that is only used for validation would risk
bitrotting.

>  	test_when_finished "git update-ref -d refs/heads/invalid" &&

Not a problem this patch introduces, but I think it is a better
discipline to have when_finished clean-up routine prepared before we
do actual damage, i.e. if I were writing this test today from scratch,
I would expect it to be before "git rev-parse >.git/refs/heads/invalid"
is done.

>  	test_must_fail git fsck 2>out &&
>  	test_i18ngrep "not a commit" out
>  '

A #leftoverbit that is not relevant to the topic; we should clean
these test_i18ngrep and replace them with a plain "grep".

>  test_expect_success 'HEAD link pointing at a funny object' '
> -	test_when_finished "mv .git/SAVED_HEAD .git/HEAD" &&
> -	mv .git/HEAD .git/SAVED_HEAD &&
> +	saved_head=$(git rev-parse --verify HEAD) &&
> +	test_when_finished "git update-ref HEAD ${saved_head}" &&
>  	echo $ZERO_OID >.git/HEAD &&

Are you sure .git/HEAD when this test is entered is a detached HEAD?
The title of the test says "HEAD link", and I take it to mean HEAD
is a symlink, and we save it away, while we create a loose ref that
points at 0{40} in a detached HEAD state.  Actually, the original
would also work if HEAD is detached on entry.  In either case,
moving SAVED_HEAD back to HEAD would restore the original state.

But the updated one only works if HEAD upon entry is already
detached.  Is this intended?

> @@ -131,8 +132,8 @@ test_expect_success 'HEAD link pointing at a funny object' '
>  '
>  
>  test_expect_success 'HEAD link pointing at a funny place' '
> -	test_when_finished "mv .git/SAVED_HEAD .git/HEAD" &&
> -	mv .git/HEAD .git/SAVED_HEAD &&
> +	saved_head=$(git rev-parse --verify HEAD) &&
> +	test_when_finished "git update-ref --no-deref HEAD ${saved_head}" &&

Likewise.  Use of "update-ref" in the previous one vs "update-ref
--no-deref" in this one to recover from the damage the tests make
makes me feel that we may be assuming too much.

>  	echo "ref: refs/funny/place" >.git/HEAD &&

Even though "git symbolic-ref" refuses to point HEAD outside refs/,
as plumbing command should, it allows it to point it outside refs/heads/.
so this line should probably become

	git symbolic-ref HEAD refs/funny/place

in the same spirit as the rest of the series.

> @@ -391,7 +393,7 @@ test_expect_success 'tag pointing to nonexistent' '
>  
>  	tag=$(git hash-object -t tag -w --stdin <invalid-tag) &&
>  	test_when_finished "remove_object $tag" &&
> -	echo $tag >.git/refs/tags/invalid &&
> +	git update-ref refs/tags/invalid $tag &&

Good (not just this one, but similar ones throughout this patch).

^ permalink raw reply

* Re: [PATCH] Include gettext.h in MyFirstContribution tutorial
From: Junio C Hamano @ 2023-10-18 21:35 UTC (permalink / raw)
  To: git, Emily Shaffer; +Cc: Jacob Stopak
In-Reply-To: <20231017041503.3249-1-jacob@initialcommit.io>

Jacob Stopak <jacob@initialcommit.io> writes:

> The tutorial in Documentation/MyFirstContribution.txt has steps to print
> some text using the "_" function. However, this leads to compiler errors
> when running "make" since "gettext.h" is not #included.
>
> Update docs with a note to #include "gettext.h" in "builtin/psuh.c".
>
> Signed-off-by: Jacob Stopak <jacob@initialcommit.io>
> ---
>  Documentation/MyFirstContribution.txt | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)

Who's the first responder on this document these days?  I think the
"psuh" was Emily's invention, so sending it in her direction.

Thanks.

> diff --git a/Documentation/MyFirstContribution.txt b/Documentation/MyFirstContribution.txt
> index 62d11a5cd7..7cfed60c2e 100644
> --- a/Documentation/MyFirstContribution.txt
> +++ b/Documentation/MyFirstContribution.txt
> @@ -160,10 +160,11 @@ in order to keep the declarations alphabetically sorted:
>  int cmd_psuh(int argc, const char **argv, const char *prefix);
>  ----
>  
> -Be sure to `#include "builtin.h"` in your `psuh.c`.
> +Be sure to `#include "builtin.h"` in your `psuh.c`. You'll also need to
> +`#include "gettext.h"` to use functions related to printing output text.
>  
> -Go ahead and add some throwaway printf to that function. This is a decent
> -starting point as we can now add build rules and register the command.
> +Go ahead and add some throwaway printf to the `cmd_psuh` function. This is a
> +decent starting point as we can now add build rules and register the command.
>  
>  NOTE: Your throwaway text, as well as much of the text you will be adding over
>  the course of this tutorial, is user-facing. That means it needs to be

^ permalink raw reply

* Re: [PATCH] git-p4 shouldn't attempt to store symlinks in LFS
From: brian m. carlson @ 2023-10-18 22:26 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Matthew McClain, git
In-Reply-To: <xmqqil73sp8l.fsf@gitster.g>

[-- Attachment #1: Type: text/plain, Size: 2591 bytes --]

On 2023-10-18 at 19:30:02, Junio C Hamano wrote:
> Matthew McClain <mmcclain@noprivs.com> writes:
> 
> > If a symlink in your Perforce repository matches
> > git-p4.largeFileExtensions, git-p4.py will attempt to put the symlink in
> > LFS but will blow up when it passes a string to generateTempFile.
> >
> > Git LFS normal behavior does not stash symlinks in LFS.
> 
> I am not a "git p4" (or "git lfs") user, but if nobody uses LFS to
> store symbolic links (which is very much understandable given what
> it was invented for), then that alone is a good enough reason to
> avoid the regular blob codepath, and that is true even if
> generateTempFile accepted a str and silently converted it to bytes
> to help callers, no?

Git LFS doesn't store symlinks because smudge/clean filters don't handle
symlinks.  They never get passed to the filter process nor the
smudge/clean filters, nor could that occur without a change to the
protocol or command-line interface.  Unless Git learned how to send them
to the filters, Git LFS would have a hard time using them in any useful
way.

Also, for Git LFS, whose goal is to move large files out of the
repository history, symlinks are typically not an interesting thing to
process because they are functionally limited to 4 KiB or a similar size
on most systems.

> So "but will blow up ..." and the stacktrace below are more or less
> irrelevant details and do not help convince readers why this change
> is desirable.  I'd suggest removing it (and perhaps place more stress
> on explaining why storing symbolic links in LFS is a bad practice
> nobody uses, but if it is so obvious perhaps the single sentence
> explanation you have above would be sufficient).

Hopefully my explanation above can be part of that commit message.

> I know I can ask brian to take a look at this change from LFS angle,
> but who's authoritatively reviewing "git p4" related changes these
> days (Matthew, this is not a question for you, but to the list)?

It looks fine to me from an LFS angle, but I know nothing about Perforce
or git-p4.

Also, as a minor request, it would be great if you could continue to
email me at my personal address, since that's the address with which I
read the list.  My work address appearing on series is more of a
reflection that I'm more frequently finding time to work on things on
work time (hence the work address) versus personal time (where I'd be
using my personal email for patches).  I've fixed it up above.
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

^ permalink raw reply

* Re: [PATCH] git-p4 shouldn't attempt to store symlinks in LFS
From: Junio C Hamano @ 2023-10-18 22:46 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Matthew McClain, git
In-Reply-To: <ZTBbqobbqQpxHPI2@tapette.crustytoothpaste.net>

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> Also, as a minor request, it would be great if you could continue to
> email me at my personal address, since that's the address with which I
> read the list.  My work address appearing on series is more of a
> reflection that I'm more frequently finding time to work on things on
> work time (hence the work address) versus personal time (where I'd be
> using my personal email for patches).  I've fixed it up above.

Thanks.

With

    [alias]
	who = !sh -c 'git log -1 --format=\"%an <%ae>\" --author=\"$1\"' -

I say "git who sandals" to get your e-mail address, but I guess I
should use %aE instead to let .mailmap kick in.

^ permalink raw reply

* Re: [PATCH v3 05/10] bulk-checkin: extract abstract `bulk_checkin_source`
From: Junio C Hamano @ 2023-10-18 23:10 UTC (permalink / raw)
  To: Taylor Blau
  Cc: git, Elijah Newren, Eric W. Biederman, Jeff King,
	Patrick Steinhardt
In-Reply-To: <da52ec838025a59a3f4f4ffaf2e6f9098a37547e.1697648864.git.me@ttaylorr.com>

Taylor Blau <me@ttaylorr.com> writes:

> A future commit will want to implement a very similar routine as in
> `stream_blob_to_pack()` with two notable changes:
>
>   - Instead of streaming just OBJ_BLOBs, this new function may want to
>     stream objects of arbitrary type.
>
>   - Instead of streaming the object's contents from an open
>     file-descriptor, this new function may want to "stream" its contents
>     from memory.
>
> To avoid duplicating a significant chunk of code between the existing
> `stream_blob_to_pack()`, extract an abstract `bulk_checkin_source`. This
> concept currently is a thin layer of `lseek()` and `read_in_full()`, but
> will grow to understand how to perform analogous operations when writing
> out an object's contents from memory.
>
> Suggested-by: Junio C Hamano <gitster@pobox.com>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  bulk-checkin.c | 61 +++++++++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 53 insertions(+), 8 deletions(-)

> diff --git a/bulk-checkin.c b/bulk-checkin.c
> index f4914fb6d1..fc1d902018 100644
> --- a/bulk-checkin.c
> +++ b/bulk-checkin.c
> @@ -140,8 +140,41 @@ static int already_written(struct bulk_checkin_packfile *state, struct object_id
>  	return 0;
>  }
>  
> +struct bulk_checkin_source {
> +	enum { SOURCE_FILE } type;
> +
> +	/* SOURCE_FILE fields */
> +	int fd;
> +
> +	/* common fields */
> +	size_t size;
> +	const char *path;
> +};

Looks OK, even though I expected to see a bit more involved object
orientation with something like

	struct bulk_checkin_source {
		off_t (*read)(struct bulk_checkin_source *, void *, size_t);
		off_t (*seek)(struct bulk_checkin_source *, off_t);
		union {
			struct {
				int fd;
				size_t size;
				const char *path;
			} from_fd;
			struct {
				...
			} incore;
		} data;
	};

As there will only be two subclasses of this thing, it may not
matter all that much right now, but it would be much nicer as your
methods do not have to care about "switch (enum) { case FILE: ... }".


^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox