Git development
 help / color / mirror / Atom feed
* [PATCH v5 05/17] t/helper/test-read-graph.c: extract `dump_graph_info()`
From: Taylor Blau @ 2024-01-16 22:09 UTC (permalink / raw)
  To: git
  Cc: Elijah Newren, Eric W. Biederman, Jeff King, Junio C Hamano,
	Patrick Steinhardt, Jonathan Tan, SZEDER Gábor
In-Reply-To: <cover.1705442923.git.me@ttaylorr.com>

Prepare for the 'read-graph' test helper to perform other tasks besides
dumping high-level information about the commit-graph by extracting its
main routine into a separate function.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/helper/test-read-graph.c | 31 ++++++++++++++++++-------------
 1 file changed, 18 insertions(+), 13 deletions(-)

diff --git a/t/helper/test-read-graph.c b/t/helper/test-read-graph.c
index 8c7a83f578..3375392f6c 100644
--- a/t/helper/test-read-graph.c
+++ b/t/helper/test-read-graph.c
@@ -5,20 +5,8 @@
 #include "bloom.h"
 #include "setup.h"
 
-int cmd__read_graph(int argc UNUSED, const char **argv UNUSED)
+static void dump_graph_info(struct commit_graph *graph)
 {
-	struct commit_graph *graph = NULL;
-	struct object_directory *odb;
-
-	setup_git_directory();
-	odb = the_repository->objects->odb;
-
-	prepare_repo_settings(the_repository);
-
-	graph = read_commit_graph_one(the_repository, odb);
-	if (!graph)
-		return 1;
-
 	printf("header: %08x %d %d %d %d\n",
 		ntohl(*(uint32_t*)graph->data),
 		*(unsigned char*)(graph->data + 4),
@@ -57,6 +45,23 @@ int cmd__read_graph(int argc UNUSED, const char **argv UNUSED)
 	if (graph->topo_levels)
 		printf(" topo_levels");
 	printf("\n");
+}
+
+int cmd__read_graph(int argc UNUSED, const char **argv UNUSED)
+{
+	struct commit_graph *graph = NULL;
+	struct object_directory *odb;
+
+	setup_git_directory();
+	odb = the_repository->objects->odb;
+
+	prepare_repo_settings(the_repository);
+
+	graph = read_commit_graph_one(the_repository, odb);
+	if (!graph)
+		return 1;
+
+	dump_graph_info(graph);
 
 	UNLEAK(graph);
 
-- 
2.43.0.334.gd4dbce1db5.dirty


^ permalink raw reply related

* [PATCH v5 04/17] gitformat-commit-graph: describe version 2 of BDAT
From: Taylor Blau @ 2024-01-16 22:09 UTC (permalink / raw)
  To: git
  Cc: Elijah Newren, Eric W. Biederman, Jeff King, Junio C Hamano,
	Patrick Steinhardt, Jonathan Tan, SZEDER Gábor
In-Reply-To: <cover.1705442923.git.me@ttaylorr.com>

From: Jonathan Tan <jonathantanmy@google.com>

The code change to Git to support version 2 will be done in subsequent
commits.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/gitformat-commit-graph.txt | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/Documentation/gitformat-commit-graph.txt b/Documentation/gitformat-commit-graph.txt
index 31cad585e2..3e906e8030 100644
--- a/Documentation/gitformat-commit-graph.txt
+++ b/Documentation/gitformat-commit-graph.txt
@@ -142,13 +142,16 @@ All multi-byte numbers are in network byte order.
 
 ==== Bloom Filter Data (ID: {'B', 'D', 'A', 'T'}) [Optional]
     * It starts with header consisting of three unsigned 32-bit integers:
-      - Version of the hash algorithm being used. We currently only support
-	value 1 which corresponds to the 32-bit version of the murmur3 hash
+      - Version of the hash algorithm being used. We currently support
+	value 2 which corresponds to the 32-bit version of the murmur3 hash
 	implemented exactly as described in
 	https://en.wikipedia.org/wiki/MurmurHash#Algorithm and the double
 	hashing technique using seed values 0x293ae76f and 0x7e646e2 as
 	described in https://doi.org/10.1007/978-3-540-30494-4_26 "Bloom Filters
-	in Probabilistic Verification"
+	in Probabilistic Verification". Version 1 Bloom filters have a bug that appears
+	when char is signed and the repository has path names that have characters >=
+	0x80; Git supports reading and writing them, but this ability will be removed
+	in a future version of Git.
       - The number of times a path is hashed and hence the number of bit positions
 	      that cumulatively determine whether a file is present in the commit.
       - The minimum number of bits 'b' per entry in the Bloom filter. If the filter
-- 
2.43.0.334.gd4dbce1db5.dirty


^ permalink raw reply related

* [PATCH v5 03/17] commit-graph: ensure Bloom filters are read with consistent settings
From: Taylor Blau @ 2024-01-16 22:09 UTC (permalink / raw)
  To: git
  Cc: Elijah Newren, Eric W. Biederman, Jeff King, Junio C Hamano,
	Patrick Steinhardt, Jonathan Tan, SZEDER Gábor
In-Reply-To: <cover.1705442923.git.me@ttaylorr.com>

The changed-path Bloom filter mechanism is parameterized by a couple of
variables, notably the number of bits per hash (typically "m" in Bloom
filter literature) and the number of hashes themselves (typically "k").

It is critically important that filters are read with the Bloom filter
settings that they were written with. Failing to do so would mean that
each query is liable to compute different fingerprints, meaning that the
filter itself could return a false negative. This goes against a basic
assumption of using Bloom filters (that they may return false positives,
but never false negatives) and can lead to incorrect results.

We have some existing logic to carry forward existing Bloom filter
settings from one layer to the next. In `write_commit_graph()`, we have
something like:

    if (!(flags & COMMIT_GRAPH_NO_WRITE_BLOOM_FILTERS)) {
        struct commit_graph *g = ctx->r->objects->commit_graph;

        /* We have changed-paths already. Keep them in the next graph */
        if (g && g->chunk_bloom_data) {
            ctx->changed_paths = 1;
            ctx->bloom_settings = g->bloom_filter_settings;
        }
    }

, which drags forward Bloom filter settings across adjacent layers.

This doesn't quite address all cases, however, since it is possible for
intermediate layers to contain no Bloom filters at all. For example,
suppose we have two layers in a commit-graph chain, say, {G1, G2}. If G1
contains Bloom filters, but G2 doesn't, a new G3 (whose base graph is
G2) may be written with arbitrary Bloom filter settings, because we only
check the immediately adjacent layer's settings for compatibility.

This behavior has existed since the introduction of changed-path Bloom
filters. But in practice, this is not such a big deal, since the only
way up until this point to modify the Bloom filter settings at write
time is with the undocumented environment variables:

  - GIT_TEST_BLOOM_SETTINGS_BITS_PER_ENTRY
  - GIT_TEST_BLOOM_SETTINGS_NUM_HASHES
  - GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS

(it is still possible to tweak MAX_CHANGED_PATHS between layers, but
this does not affect reads, so is allowed to differ across multiple
graph layers).

But in future commits, we will introduce another parameter to change the
hash algorithm used to compute Bloom fingerprints itself. This will be
exposed via a configuration setting, making this foot-gun easier to use.

To prevent this potential issue, validate that all layers of a split
commit-graph have compatible settings with the newest layer which
contains Bloom filters.

Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
Original-test-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 commit-graph.c       | 25 +++++++++++++++++
 t/t4216-log-bloom.sh | 65 +++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 89 insertions(+), 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index bba316913c..00113b0f62 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -543,6 +543,30 @@ static int validate_mixed_generation_chain(struct commit_graph *g)
 	return 0;
 }
 
+static void validate_mixed_bloom_settings(struct commit_graph *g)
+{
+	struct bloom_filter_settings *settings = NULL;
+	for (; g; g = g->base_graph) {
+		if (!g->bloom_filter_settings)
+			continue;
+		if (!settings) {
+			settings = g->bloom_filter_settings;
+			continue;
+		}
+
+		if (g->bloom_filter_settings->bits_per_entry != settings->bits_per_entry ||
+		    g->bloom_filter_settings->num_hashes != settings->num_hashes) {
+			g->chunk_bloom_indexes = NULL;
+			g->chunk_bloom_data = NULL;
+			FREE_AND_NULL(g->bloom_filter_settings);
+
+			warning(_("disabling Bloom filters for commit-graph "
+				  "layer '%s' due to incompatible settings"),
+				oid_to_hex(&g->oid));
+		}
+	}
+}
+
 static int add_graph_to_chain(struct commit_graph *g,
 			      struct commit_graph *chain,
 			      struct object_id *oids,
@@ -666,6 +690,7 @@ struct commit_graph *load_commit_graph_chain_fd_st(struct repository *r,
 	}
 
 	validate_mixed_generation_chain(graph_chain);
+	validate_mixed_bloom_settings(graph_chain);
 
 	free(oids);
 	fclose(fp);
diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh
index cc6ebc8140..20b0cf0c0e 100755
--- a/t/t4216-log-bloom.sh
+++ b/t/t4216-log-bloom.sh
@@ -421,8 +421,71 @@ test_expect_success 'Bloom generation backfills empty commits' '
 	)
 '
 
+graph=.git/objects/info/commit-graph
+graphdir=.git/objects/info/commit-graphs
+chain=$graphdir/commit-graph-chain
+
+test_expect_success 'setup for mixed Bloom setting tests' '
+	repo=mixed-bloom-settings &&
+
+	git init $repo &&
+	for i in one two three
+	do
+		test_commit -C $repo $i file || return 1
+	done
+'
+
+test_expect_success 'ensure incompatible Bloom filters are ignored' '
+	# Compute Bloom filters with "unusual" settings.
+	git -C $repo rev-parse one >in &&
+	GIT_TEST_BLOOM_SETTINGS_NUM_HASHES=3 git -C $repo commit-graph write \
+		--stdin-commits --changed-paths --split <in &&
+	layer=$(head -n 1 $repo/$chain) &&
+
+	# A commit-graph layer without Bloom filters "hides" the layers
+	# below ...
+	git -C $repo rev-parse two >in &&
+	git -C $repo commit-graph write --stdin-commits --no-changed-paths \
+		--split=no-merge <in &&
+
+	# Another commit-graph layer that has Bloom filters, but with
+	# standard settings, and is thus incompatible with the base
+	# layer written above.
+	git -C $repo rev-parse HEAD >in &&
+	git -C $repo commit-graph write --stdin-commits --changed-paths \
+		--split=no-merge <in &&
+
+	test_line_count = 3 $repo/$chain &&
+
+	# Ensure that incompatible Bloom filters are ignored.
+	git -C $repo -c core.commitGraph=false log --oneline --no-decorate -- file \
+		>expect 2>err &&
+	git -C $repo log --oneline --no-decorate -- file >actual 2>err &&
+	test_cmp expect actual &&
+	grep "disabling Bloom filters for commit-graph layer .$layer." err
+'
+
+test_expect_success 'merge graph layers with incompatible Bloom settings' '
+	# Ensure that incompatible Bloom filters are ignored when
+	# merging existing layers.
+	git -C $repo commit-graph write --reachable --changed-paths 2>err &&
+	grep "disabling Bloom filters for commit-graph layer .$layer." err &&
+
+	test_path_is_file $repo/$graph &&
+	test_dir_is_empty $repo/$graphdir &&
+
+	git -C $repo -c core.commitGraph=false log --oneline --no-decorate -- \
+		file >expect &&
+	trace_out="$(pwd)/trace.perf" &&
+	GIT_TRACE2_PERF="$trace_out" \
+		git -C $repo log --oneline --no-decorate -- file >actual 2>err &&
+
+	test_cmp expect actual &&
+	grep "statistics:{\"filter_not_present\":0," trace.perf &&
+	test_must_be_empty err
+'
+
 corrupt_graph () {
-	graph=.git/objects/info/commit-graph &&
 	test_when_finished "rm -rf $graph" &&
 	git commit-graph write --reachable --changed-paths &&
 	corrupt_chunk_file $graph "$@"
-- 
2.43.0.334.gd4dbce1db5.dirty


^ permalink raw reply related

* [PATCH v5 02/17] revision.c: consult Bloom filters for root commits
From: Taylor Blau @ 2024-01-16 22:09 UTC (permalink / raw)
  To: git
  Cc: Elijah Newren, Eric W. Biederman, Jeff King, Junio C Hamano,
	Patrick Steinhardt, Jonathan Tan, SZEDER Gábor
In-Reply-To: <cover.1705442923.git.me@ttaylorr.com>

The commit-graph stores changed-path Bloom filters which represent the
set of paths included in a tree-level diff between a commit's root tree
and that of its parent.

When a commit has no parents, the tree-diff is computed against that
commit's root tree and the empty tree. In other words, every path in
that commit's tree is stored in the Bloom filter (since they all appear
in the diff).

Consult these filters during pathspec-limited traversals in the function
`rev_same_tree_as_empty()`. Doing so yields a performance improvement
where we can avoid enumerating the full set of paths in a parentless
commit's root tree when we know that the path(s) of interest were not
listed in that commit's changed-path Bloom filter.

Suggested-by: SZEDER Gábor <szeder.dev@gmail.com>
Original-patch-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 revision.c           | 26 ++++++++++++++++++++++----
 t/t4216-log-bloom.sh |  8 ++++++--
 2 files changed, 28 insertions(+), 6 deletions(-)

diff --git a/revision.c b/revision.c
index 2424c9bd67..0e6f7d02b6 100644
--- a/revision.c
+++ b/revision.c
@@ -833,17 +833,28 @@ static int rev_compare_tree(struct rev_info *revs,
 	return tree_difference;
 }
 
-static int rev_same_tree_as_empty(struct rev_info *revs, struct commit *commit)
+static int rev_same_tree_as_empty(struct rev_info *revs, struct commit *commit,
+				  int nth_parent)
 {
 	struct tree *t1 = repo_get_commit_tree(the_repository, commit);
+	int bloom_ret = -1;
 
 	if (!t1)
 		return 0;
 
+	if (!nth_parent && revs->bloom_keys_nr) {
+		bloom_ret = check_maybe_different_in_bloom_filter(revs, commit);
+		if (!bloom_ret)
+			return 1;
+	}
+
 	tree_difference = REV_TREE_SAME;
 	revs->pruning.flags.has_changes = 0;
 	diff_tree_oid(NULL, &t1->object.oid, "", &revs->pruning);
 
+	if (bloom_ret == 1 && tree_difference == REV_TREE_SAME)
+		count_bloom_filter_false_positive++;
+
 	return tree_difference == REV_TREE_SAME;
 }
 
@@ -881,7 +892,7 @@ static int compact_treesame(struct rev_info *revs, struct commit *commit, unsign
 		if (nth_parent != 0)
 			die("compact_treesame %u", nth_parent);
 		old_same = !!(commit->object.flags & TREESAME);
-		if (rev_same_tree_as_empty(revs, commit))
+		if (rev_same_tree_as_empty(revs, commit, nth_parent))
 			commit->object.flags |= TREESAME;
 		else
 			commit->object.flags &= ~TREESAME;
@@ -977,7 +988,14 @@ static void try_to_simplify_commit(struct rev_info *revs, struct commit *commit)
 		return;
 
 	if (!commit->parents) {
-		if (rev_same_tree_as_empty(revs, commit))
+		/*
+		 * Pretend as if we are comparing ourselves to the
+		 * (non-existent) first parent of this commit object. Even
+		 * though no such parent exists, its changed-path Bloom filter
+		 * (if one exists) is relative to the empty tree, using Bloom
+		 * filters is allowed here.
+		 */
+		if (rev_same_tree_as_empty(revs, commit, 0))
 			commit->object.flags |= TREESAME;
 		return;
 	}
@@ -1058,7 +1076,7 @@ static void try_to_simplify_commit(struct rev_info *revs, struct commit *commit)
 
 		case REV_TREE_NEW:
 			if (revs->remove_empty_trees &&
-			    rev_same_tree_as_empty(revs, p)) {
+			    rev_same_tree_as_empty(revs, p, nth_parent)) {
 				/* We are adding all the specified
 				 * paths from this parent, so the
 				 * history beyond this parent is not
diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh
index b7baf49d62..cc6ebc8140 100755
--- a/t/t4216-log-bloom.sh
+++ b/t/t4216-log-bloom.sh
@@ -88,7 +88,11 @@ test_bloom_filters_not_used () {
 		# if the Bloom filter system is initialized, ensure that no
 		# filters were used
 		data="statistics:{"
-		data="$data\"filter_not_present\":0,"
+		# unusable filters (e.g., those computed with a
+		# different value of commitGraph.changedPathsVersion)
+		# are counted in the filter_not_present bucket, so any
+		# value is OK there.
+		data="$data\"filter_not_present\":[0-9][0-9]*,"
 		data="$data\"maybe\":0,"
 		data="$data\"definitely_not\":0,"
 		data="$data\"false_positive\":0}"
@@ -175,7 +179,7 @@ test_expect_success 'setup - add commit-graph to the chain with Bloom filters' '
 
 test_bloom_filters_used_when_some_filters_are_missing () {
 	log_args=$1
-	bloom_trace_prefix="statistics:{\"filter_not_present\":3,\"maybe\":6,\"definitely_not\":9"
+	bloom_trace_prefix="statistics:{\"filter_not_present\":3,\"maybe\":6,\"definitely_not\":10"
 	setup "$log_args" &&
 	grep -q "$bloom_trace_prefix" "$TRASH_DIRECTORY/trace.perf" &&
 	test_cmp log_wo_bloom log_w_bloom
-- 
2.43.0.334.gd4dbce1db5.dirty


^ permalink raw reply related

* [PATCH v5 01/17] t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()`
From: Taylor Blau @ 2024-01-16 22:09 UTC (permalink / raw)
  To: git
  Cc: Elijah Newren, Eric W. Biederman, Jeff King, Junio C Hamano,
	Patrick Steinhardt, Jonathan Tan, SZEDER Gábor
In-Reply-To: <cover.1705442923.git.me@ttaylorr.com>

The existing implementation of test_bloom_filters_not_used() asserts
that the Bloom filter sub-system has not been initialized at all, by
checking for the absence of any data from it from trace2.

In the following commit, it will become possible to load Bloom filters
without using them (e.g., because the `commitGraph.changedPathVersion`
introduced later in this series is incompatible with the hash version
with which the commit-graph's Bloom filters were written).

When this is the case, it's possible to initialize the Bloom filter
sub-system, while still not using any Bloom filters. When this is the
case, check that the data dump from the Bloom sub-system is all zeros,
indicating that no filters were used.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t4216-log-bloom.sh | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh
index 2ba0324a69..b7baf49d62 100755
--- a/t/t4216-log-bloom.sh
+++ b/t/t4216-log-bloom.sh
@@ -82,7 +82,19 @@ test_bloom_filters_used () {
 test_bloom_filters_not_used () {
 	log_args=$1
 	setup "$log_args" &&
-	! grep -q "statistics:{\"filter_not_present\":" "$TRASH_DIRECTORY/trace.perf" &&
+
+	if grep -q "statistics:{\"filter_not_present\":" "$TRASH_DIRECTORY/trace.perf"
+	then
+		# if the Bloom filter system is initialized, ensure that no
+		# filters were used
+		data="statistics:{"
+		data="$data\"filter_not_present\":0,"
+		data="$data\"maybe\":0,"
+		data="$data\"definitely_not\":0,"
+		data="$data\"false_positive\":0}"
+
+		grep -q "$data" "$TRASH_DIRECTORY/trace.perf"
+	fi &&
 	test_cmp log_wo_bloom log_w_bloom
 }
 
-- 
2.43.0.334.gd4dbce1db5.dirty


^ permalink raw reply related

* [PATCH v5 00/17] bloom: changed-path Bloom filters v2 (& sundries)
From: Taylor Blau @ 2024-01-16 22:08 UTC (permalink / raw)
  To: git
  Cc: Elijah Newren, Eric W. Biederman, Jeff King, Junio C Hamano,
	Patrick Steinhardt, Jonathan Tan, SZEDER Gábor
In-Reply-To: <cover.1697653929.git.me@ttaylorr.com>

(Rebased onto the tip of 'master', which is d4dbce1db5 (The seventh
batch, 2024-01-12), at the time of writing).

This series is a reroll of the combined efforts of [1] and [2] to
introduce the v2 changed-path Bloom filters, which fixes a bug in our
existing implementation of murmur3 paths with non-ASCII characters (when
the "char" type is signed).

In large part, this is the same as the previous round. Like last time,
this round addresses the remaining additional issues pointed out by
SZEDER Gábor. The remaining issues which have been addressed by this
series are:

  - Incorrectly reading Bloom filters computed with differing hash
    versions. This has been corrected by discarding them when a version
    mismatch is detected.

  - Added a note about the new `commitGraph.changedPathVersion`
    configuration variable which can cause (un-fixable, see [3])
    issues in earlier versions of Git which do not yet understand them.

Thanks to Jonathan, Peff, and SZEDER who have helped a great deal in
assembling these patches. As usual, a range-diff is included below.

Thanks in advance for your review!

[1]: https://lore.kernel.org/git/cover.1684790529.git.jonathantanmy@google.com/
[2]: https://lore.kernel.org/git/cover.1691426160.git.me@ttaylorr.com/
[3]: https://lore.kernel.org/git/Zabr1Glljjgl%2FUUB@nand.local/

Jonathan Tan (1):
  gitformat-commit-graph: describe version 2 of BDAT

Taylor Blau (16):
  t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()`
  revision.c: consult Bloom filters for root commits
  commit-graph: ensure Bloom filters are read with consistent settings
  t/helper/test-read-graph.c: extract `dump_graph_info()`
  bloom.h: make `load_bloom_filter_from_graph()` public
  t/helper/test-read-graph: implement `bloom-filters` mode
  t4216: test changed path filters with high bit paths
  repo-settings: introduce commitgraph.changedPathsVersion
  commit-graph: new Bloom filter version that fixes murmur3
  bloom: annotate filters with hash version
  bloom: prepare to discard incompatible Bloom filters
  commit-graph.c: unconditionally load Bloom filters
  commit-graph: drop unnecessary `graph_read_bloom_data_context`
  object.h: fix mis-aligned flag bits table
  commit-graph: reuse existing Bloom filters where possible
  bloom: introduce `deinit_bloom_filters()`

 Documentation/config/commitgraph.txt     |  29 ++-
 Documentation/gitformat-commit-graph.txt |   9 +-
 bloom.c                                  | 208 ++++++++++++++-
 bloom.h                                  |  38 ++-
 commit-graph.c                           |  64 ++++-
 object.h                                 |   3 +-
 oss-fuzz/fuzz-commit-graph.c             |   2 +-
 repo-settings.c                          |   6 +-
 repository.h                             |   2 +-
 revision.c                               |  26 +-
 t/helper/test-bloom.c                    |   9 +-
 t/helper/test-read-graph.c               |  67 ++++-
 t/t0095-bloom.sh                         |   8 +
 t/t4216-log-bloom.sh                     | 310 ++++++++++++++++++++++-
 14 files changed, 724 insertions(+), 57 deletions(-)

Range-diff against v4:
 1:  e0fc51c3fb !  1:  c5e1b3e507 t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()`
    @@ Commit message
         checking for the absence of any data from it from trace2.
     
         In the following commit, it will become possible to load Bloom filters
    -    without using them (e.g., because `commitGraph.changedPathVersion` is
    -    incompatible with the hash version with which the commit-graph's Bloom
    -    filters were written).
    +    without using them (e.g., because the `commitGraph.changedPathVersion`
    +    introduced later in this series is incompatible with the hash version
    +    with which the commit-graph's Bloom filters were written).
     
         When this is the case, it's possible to initialize the Bloom filter
         sub-system, while still not using any Bloom filters. When this is the
 2:  87b09e6266 !  2:  8f32fd5f46 revision.c: consult Bloom filters for root commits
    @@ revision.c: static int rev_compare_tree(struct rev_info *revs,
     +				  int nth_parent)
      {
      	struct tree *t1 = repo_get_commit_tree(the_repository, commit);
    -+	int bloom_ret = 1;
    ++	int bloom_ret = -1;
      
      	if (!t1)
      		return 0;
      
    -+	if (nth_parent == 1 && revs->bloom_keys_nr) {
    ++	if (!nth_parent && revs->bloom_keys_nr) {
     +		bloom_ret = check_maybe_different_in_bloom_filter(revs, commit);
     +		if (!bloom_ret)
     +			return 1;
    @@ revision.c: static void try_to_simplify_commit(struct rev_info *revs, struct com
     +		 * (if one exists) is relative to the empty tree, using Bloom
     +		 * filters is allowed here.
     +		 */
    -+		if (rev_same_tree_as_empty(revs, commit, 1))
    ++		if (rev_same_tree_as_empty(revs, commit, 0))
      			commit->object.flags |= TREESAME;
      		return;
      	}
 3:  46d8a41005 !  3:  285b25f1b7 commit-graph: ensure Bloom filters are read with consistent settings
    @@ t/t4216-log-bloom.sh: test_expect_success 'Bloom generation backfills empty comm
     +	test_must_be_empty err
     +'
     +
    - test_done
    + corrupt_graph () {
    +-	graph=.git/objects/info/commit-graph &&
    + 	test_when_finished "rm -rf $graph" &&
    + 	git commit-graph write --reachable --changed-paths &&
    + 	corrupt_chunk_file $graph "$@"
 4:  4d0190a992 =  4:  0cee8078d4 gitformat-commit-graph: describe version 2 of BDAT
 5:  3c2057c11c =  5:  1fc8d2828d t/helper/test-read-graph.c: extract `dump_graph_info()`
 6:  e002e35004 !  6:  03dd7cf30a bloom.h: make `load_bloom_filter_from_graph()` public
    @@ Commit message
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
      ## bloom.c ##
    -@@ bloom.c: static inline unsigned char get_bitmask(uint32_t pos)
    - 	return ((unsigned char)1) << (pos & (BITS_PER_WORD - 1));
    +@@ bloom.c: static int check_bloom_offset(struct commit_graph *g, uint32_t pos,
    + 	return -1;
      }
      
     -static int load_bloom_filter_from_graph(struct commit_graph *g,
 7:  c7016f51cd =  7:  dd9193e404 t/helper/test-read-graph: implement `bloom-filters` mode
 8:  cef2aac8ba !  8:  aa2416795d t4216: test changed path filters with high bit paths
    @@
      ## Metadata ##
    -Author: Jonathan Tan <jonathantanmy@google.com>
    +Author: Taylor Blau <me@ttaylorr.com>
     
      ## Commit message ##
         t4216: test changed path filters with high bit paths
    @@ t/t4216-log-bloom.sh: test_expect_success 'merge graph layers with incompatible
     +	)
     +'
     +
    - test_done
    + corrupt_graph () {
    + 	test_when_finished "rm -rf $graph" &&
    + 	git commit-graph write --reachable --changed-paths &&
 9:  36d4e2202e !  9:  a77dcc99b4 repo-settings: introduce commitgraph.changedPathsVersion
    @@
      ## Metadata ##
    -Author: Jonathan Tan <jonathantanmy@google.com>
    +Author: Taylor Blau <me@ttaylorr.com>
     
      ## Commit message ##
         repo-settings: introduce commitgraph.changedPathsVersion
    @@ Documentation/config/commitgraph.txt: commitGraph.maxNewFilters::
     +
     +commitGraph.changedPathsVersion::
     +	Specifies the version of the changed-path Bloom filters that Git will read and
    -+	write. May be -1, 0 or 1.
    ++	write. May be -1, 0 or 1. Note that values greater than 1 may be
    ++	incompatible with older versions of Git which do not yet understand
    ++	those versions. Use caution when operating in a mixed-version
    ++	environment.
     ++
     +Defaults to -1.
     ++
    @@ commit-graph.c: struct commit_graph *parse_commit_graph(struct repo_settings *s,
      
     -	if (s->commit_graph_read_changed_paths) {
     +	if (s->commit_graph_changed_paths_version) {
    - 		pair_chunk(cf, GRAPH_CHUNKID_BLOOMINDEXES,
    - 			   &graph->chunk_bloom_indexes);
    + 		read_chunk(cf, GRAPH_CHUNKID_BLOOMINDEXES,
    + 			   graph_read_bloom_index, graph);
      		read_chunk(cf, GRAPH_CHUNKID_BLOOMDATA,
    +@@ commit-graph.c: static void validate_mixed_bloom_settings(struct commit_graph *g)
    + 		}
    + 
    + 		if (g->bloom_filter_settings->bits_per_entry != settings->bits_per_entry ||
    +-		    g->bloom_filter_settings->num_hashes != settings->num_hashes) {
    ++		    g->bloom_filter_settings->num_hashes != settings->num_hashes ||
    ++		    g->bloom_filter_settings->hash_version != settings->hash_version) {
    + 			g->chunk_bloom_indexes = NULL;
    + 			g->chunk_bloom_data = NULL;
    + 			FREE_AND_NULL(g->bloom_filter_settings);
     
      ## oss-fuzz/fuzz-commit-graph.c ##
     @@ oss-fuzz/fuzz-commit-graph.c: int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
    @@ repository.h: struct repo_settings {
      	int gc_write_commit_graph;
      	int fetch_write_commit_graph;
      	int command_requires_full_index;
    +
    + ## t/t4216-log-bloom.sh ##
    +@@ t/t4216-log-bloom.sh: test_expect_success 'setup for mixed Bloom setting tests' '
    + 	done
    + '
    + 
    +-test_expect_success 'ensure incompatible Bloom filters are ignored' '
    ++test_expect_success 'ensure Bloom filters with incompatible settings are ignored' '
    + 	# Compute Bloom filters with "unusual" settings.
    + 	git -C $repo rev-parse one >in &&
    + 	GIT_TEST_BLOOM_SETTINGS_NUM_HASHES=3 git -C $repo commit-graph write \
    +@@ t/t4216-log-bloom.sh: test_expect_success 'merge graph layers with incompatible Bloom settings' '
    + 	test_must_be_empty err
    + '
    + 
    ++test_expect_success 'ensure Bloom filter with incompatible versions are ignored' '
    ++	rm "$repo/$graph" &&
    ++
    ++	git -C $repo log --oneline --no-decorate -- $CENT >expect &&
    ++
    ++	# Compute v1 Bloom filters for commits at the bottom.
    ++	git -C $repo rev-parse HEAD^ >in &&
    ++	git -C $repo commit-graph write --stdin-commits --changed-paths \
    ++		--split <in &&
    ++
    ++	# Compute v2 Bloomfilters for the rest of the commits at the top.
    ++	git -C $repo rev-parse HEAD >in &&
    ++	git -C $repo -c commitGraph.changedPathsVersion=2 commit-graph write \
    ++		--stdin-commits --changed-paths --split=no-merge <in &&
    ++
    ++	test_line_count = 2 $repo/$chain &&
    ++
    ++	git -C $repo log --oneline --no-decorate -- $CENT >actual 2>err &&
    ++	test_cmp expect actual &&
    ++
    ++	layer="$(head -n 1 $repo/$chain)" &&
    ++	cat >expect.err <<-EOF &&
    ++	warning: disabling Bloom filters for commit-graph layer $SQ$layer$SQ due to incompatible settings
    ++	EOF
    ++	test_cmp expect.err err
    ++'
    ++
    + get_first_changed_path_filter () {
    + 	test-tool read-graph bloom-filters >filters.dat &&
    + 	head -n 1 filters.dat
10:  f6ab427ead ! 10:  f0f22e852c commit-graph: new filter ver. that fixes murmur3
    @@
      ## Metadata ##
    -Author: Jonathan Tan <jonathantanmy@google.com>
    +Author: Taylor Blau <me@ttaylorr.com>
     
      ## Commit message ##
    -    commit-graph: new filter ver. that fixes murmur3
    +    commit-graph: new Bloom filter version that fixes murmur3
     
         The murmur3 implementation in bloom.c has a bug when converting series
         of 4 bytes into network-order integers when char is signed (which is
    @@ Documentation/config/commitgraph.txt: commitGraph.readChangedPaths::
      
      commitGraph.changedPathsVersion::
      	Specifies the version of the changed-path Bloom filters that Git will read and
    --	write. May be -1, 0 or 1.
    -+	write. May be -1, 0, 1, or 2.
    - +
    - Defaults to -1.
    - +
    +-	write. May be -1, 0 or 1. Note that values greater than 1 may be
    ++	write. May be -1, 0, 1, or 2. Note that values greater than 1 may be
    + 	incompatible with older versions of Git which do not yet understand
    + 	those versions. Use caution when operating in a mixed-version
    + 	environment.
     @@ Documentation/config/commitgraph.txt: filters when instructed to write.
      If 1, Git will only read version 1 Bloom filters, and will write version 1
      Bloom filters.
    @@ bloom.h: int load_bloom_filter_from_graph(struct commit_graph *g,
      		    size_t len,
     
      ## commit-graph.c ##
    -@@ commit-graph.c: static int graph_read_oid_lookup(const unsigned char *chunk_start,
    +@@ commit-graph.c: static int graph_read_bloom_index(const unsigned char *chunk_start,
      	return 0;
      }
      
    @@ commit-graph.c: static int graph_read_oid_lookup(const unsigned char *chunk_star
     +	struct graph_read_bloom_data_context *c = data;
     +	struct commit_graph *g = c->g;
      	uint32_t hash_version;
    --	g->chunk_bloom_data = chunk_start;
    - 	hash_version = get_be32(chunk_start);
      
    --	if (hash_version != 1)
    -+	if (*c->commit_graph_changed_paths_version == -1) {
    + 	if (chunk_size < BLOOMDATA_CHUNK_HEADER_SIZE) {
    +@@ commit-graph.c: static int graph_read_bloom_data(const unsigned char *chunk_start,
    + 		return -1;
    + 	}
    + 
    ++	hash_version = get_be32(chunk_start);
    ++
    ++	if (*c->commit_graph_changed_paths_version == -1)
     +		*c->commit_graph_changed_paths_version = hash_version;
    -+	} else if (hash_version != *c->commit_graph_changed_paths_version) {
    - 		return 0;
    -+	}
    - 
    -+	g->chunk_bloom_data = chunk_start;
    ++	else if (hash_version != *c->commit_graph_changed_paths_version)
    ++		return 0;
    ++
    + 	g->chunk_bloom_data = chunk_start;
    + 	g->chunk_bloom_data_size = chunk_size;
    +-	hash_version = get_be32(chunk_start);
    +-
    +-	if (hash_version != 1)
    +-		return 0;
    +-
      	g->bloom_filter_settings = xmalloc(sizeof(struct bloom_filter_settings));
      	g->bloom_filter_settings->hash_version = hash_version;
      	g->bloom_filter_settings->num_hashes = get_be32(chunk_start + 4);
    @@ commit-graph.c: struct commit_graph *parse_commit_graph(struct repo_settings *s,
     +			.g = graph,
     +			.commit_graph_changed_paths_version = &s->commit_graph_changed_paths_version
     +		};
    - 		pair_chunk(cf, GRAPH_CHUNKID_BLOOMINDEXES,
    - 			   &graph->chunk_bloom_indexes);
    + 		read_chunk(cf, GRAPH_CHUNKID_BLOOMINDEXES,
    + 			   graph_read_bloom_index, graph);
      		read_chunk(cf, GRAPH_CHUNKID_BLOOMDATA,
     -			   graph_read_bloom_data, graph);
     +			   graph_read_bloom_data, &context);
    @@ t/t4216-log-bloom.sh: test_expect_success 'version 1 changed-path used when vers
     +	)
     +'
     +
    - test_done
    + corrupt_graph () {
    + 	test_when_finished "rm -rf $graph" &&
    + 	git commit-graph write --reachable --changed-paths &&
11:  dc69b28329 = 11:  b56e94cad7 bloom: annotate filters with hash version
12:  85dbdc4ed2 = 12:  ddfd1ba32a bloom: prepare to discard incompatible Bloom filters
13:  3ff669a622 ! 13:  72aabd289b commit-graph.c: unconditionally load Bloom filters
    @@ Metadata
      ## Commit message ##
         commit-graph.c: unconditionally load Bloom filters
     
    -    In 9e4df4da07 (commit-graph: new filter ver. that fixes murmur3,
    -    2023-08-01), we began ignoring the Bloom data ("BDAT") chunk for
    -    commit-graphs whose Bloom filters were computed using a hash version
    -    incompatible with the value of `commitGraph.changedPathVersion`.
    +    In an earlier commit, we began ignoring the Bloom data ("BDAT") chunk
    +    for commit-graphs whose Bloom filters were computed using a hash version
    +      incompatible with the value of `commitGraph.changedPathVersion`.
     
         Now that the Bloom API has been hardened to discard these incompatible
         filters (with the exception of low-level APIs), we can safely load these
    @@ Commit message
     
      ## commit-graph.c ##
     @@ commit-graph.c: static int graph_read_bloom_data(const unsigned char *chunk_start,
    - 	uint32_t hash_version;
    + 
      	hash_version = get_be32(chunk_start);
      
    --	if (*c->commit_graph_changed_paths_version == -1) {
    +-	if (*c->commit_graph_changed_paths_version == -1)
     -		*c->commit_graph_changed_paths_version = hash_version;
    --	} else if (hash_version != *c->commit_graph_changed_paths_version) {
    +-	else if (hash_version != *c->commit_graph_changed_paths_version)
     -		return 0;
    --	}
     -
      	g->chunk_bloom_data = chunk_start;
    + 	g->chunk_bloom_data_size = chunk_size;
      	g->bloom_filter_settings = xmalloc(sizeof(struct bloom_filter_settings));
    - 	g->bloom_filter_settings->hash_version = hash_version;
     @@ commit-graph.c: int write_commit_graph(struct object_directory *odb,
      	ctx->write_generation_data = (get_configured_generation_version(r) == 2);
      	ctx->num_generation_data_overflows = 0;
14:  1c78e3d178 ! 14:  526beb9766 commit-graph: drop unnecessary `graph_read_bloom_data_context`
    @@ Commit message
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
      ## commit-graph.c ##
    -@@ commit-graph.c: static int graph_read_oid_lookup(const unsigned char *chunk_start,
    +@@ commit-graph.c: static int graph_read_bloom_index(const unsigned char *chunk_start,
      	return 0;
      }
      
    @@ commit-graph.c: static int graph_read_oid_lookup(const unsigned char *chunk_star
     -	struct commit_graph *g = c->g;
     +	struct commit_graph *g = data;
      	uint32_t hash_version;
    - 	hash_version = get_be32(chunk_start);
      
    + 	if (chunk_size < BLOOMDATA_CHUNK_HEADER_SIZE) {
     @@ commit-graph.c: struct commit_graph *parse_commit_graph(struct repo_settings *s,
      	}
      
    @@ commit-graph.c: struct commit_graph *parse_commit_graph(struct repo_settings *s,
     -			.g = graph,
     -			.commit_graph_changed_paths_version = &s->commit_graph_changed_paths_version
     -		};
    - 		pair_chunk(cf, GRAPH_CHUNKID_BLOOMINDEXES,
    - 			   &graph->chunk_bloom_indexes);
    + 		read_chunk(cf, GRAPH_CHUNKID_BLOOMINDEXES,
    + 			   graph_read_bloom_index, graph);
      		read_chunk(cf, GRAPH_CHUNKID_BLOOMDATA,
     -			   graph_read_bloom_data, &context);
     +			   graph_read_bloom_data, graph);
15:  a289514faa = 15:  c683697efa object.h: fix mis-aligned flag bits table
16:  6a12e39e7f ! 16:  4bf043be9a commit-graph: reuse existing Bloom filters where possible
    @@ Metadata
      ## Commit message ##
         commit-graph: reuse existing Bloom filters where possible
     
    -    In 9e4df4da07 (commit-graph: new filter ver. that fixes murmur3,
    -    2023-08-01), a bug was described where it's possible for Git to produce
    -    non-murmur3 hashes when the platform's "char" type is signed, and there
    -    are paths with characters whose highest bit is set (i.e. all characters
    -    >= 0x80).
    +    In an earlier commit, a bug was described where it's possible for Git to
    +    produce non-murmur3 hashes when the platform's "char" type is signed,
    +    and there are paths with characters whose highest bit is set (i.e. all
    +    characters >= 0x80).
     
         That patch allows the caller to control which version of Bloom filters
         are read and written. However, even on platforms with a signed "char"
    @@ t/t4216-log-bloom.sh: test_expect_success 'when writing commit graph, do not reu
     +	test_filter_upgraded 1 trace2.txt
     +'
     +
    - test_done
    + corrupt_graph () {
    + 	test_when_finished "rm -rf $graph" &&
    + 	git commit-graph write --reachable --changed-paths &&
17:  8942f205c8 = 17:  7daa0d8833 bloom: introduce `deinit_bloom_filters()`

base-commit: d4dbce1db5cd227a57074bcfc7ec9f0655961bba
-- 
2.43.0.334.gd4dbce1db5.dirty

^ permalink raw reply

* Re: Custom merge drivers: accessing the pathnames and revisions of the files being merged
From: Antonin Delpeuch @ 2024-01-16 21:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <xmqqedeh1816.fsf@gitster.g>

Thanks a lot for your insights! Yes I also very much appreciate that the 
extension point is built in such a way that introducing new parameters 
is non-breaking.

On 16/01/2024 18:51, Junio C Hamano wrote:
>   * Whatever letters we choose, they must have mnemonic value that
>     signals two of them are the both sides of the merge that are
>     equal participants, and the other one is the old-file, their
>     common ancestor that plays quite a different from these two in
>     the merge.  I cannot tell which one of the XYZ would be the more
>     special than other two, which is the primary reason why I do not
>     know if XYZ is a good idea.

That makes perfect sense. How about:

- %S for the "source" pathname (corresponding to the %O file)

- %X for the first side of the merge (corresponding to the %A file)

- %Y for the second side of the merge (corresponding to the %B file)

Anyway, I'll try to work on a patch: it should be easy to adapt the 
letters to any other choice.

Best,

Antonin


^ permalink raw reply

* Re: [PATCH v2 4/4] maintenance: use XDG config if it exists
From: Junio C Hamano @ 2024-01-16 21:52 UTC (permalink / raw)
  To: Kristoffer Haugsbakk; +Cc: git, ps, stolee, Eric Sunshine, Taylor Blau
In-Reply-To: <8bd67c5bf01ca10fbf575dfa2cf88f8c88b48276.1705267839.git.code@khaugsbakk.name>

Kristoffer Haugsbakk <code@khaugsbakk.name> writes:

> diff --git a/builtin/gc.c b/builtin/gc.c
> index c078751824c..cb80ced6cb5 100644
> --- a/builtin/gc.c
> +++ b/builtin/gc.c
> @@ -1543,19 +1543,18 @@ static int maintenance_register(int argc, const char **argv, const char *prefix)
>  
>  	if (!found) {
>  		int rc;
> -		char *user_config = NULL, *xdg_config = NULL;
> +		char *global_config_file = NULL;
>  
>  		if (!config_file) {
> -			git_global_config_paths(&user_config, &xdg_config);
> -			config_file = user_config;
> -			if (!user_config)
> -				die(_("$HOME not set"));
> +			global_config_file = git_global_config();
> +			config_file = global_config_file;
>  		}
> +		if (!config_file)
> +			die(_("$HOME not set"));
>  		rc = git_config_set_multivar_in_file_gently(
>  			config_file, "maintenance.repo", maintpath,
>  			CONFIG_REGEX_NONE, 0);

OK.  We used to ask for both user and xdg and without using xdg at
all, as long as $HOME is set, we used $HOME/.gitconfig even if it
did not exist.

What we want to happen is we pick XDG is XDG exists *and* $HOME/.gitconfig
does not.  And that is exactly what git_global_config() gives us.

Nicely done.

> @@ -1612,18 +1611,18 @@ static int maintenance_unregister(int argc, const char **argv, const char *prefi
>  
>  	if (found) {
>  		int rc;
> -		char *user_config = NULL, *xdg_config = NULL;
> +		char *global_config_file = NULL;
> +
>  		if (!config_file) {
> -			git_global_config_paths(&user_config, &xdg_config);
> -			config_file = user_config;
> -			if (!user_config)
> -				die(_("$HOME not set"));
> +			global_config_file = git_global_config();
> +			config_file = global_config_file;
>  		}
> +		if (!config_file)
> +			die(_("$HOME not set"));
>  		rc = git_config_set_multivar_in_file_gently(
>  			config_file, key, NULL, maintpath,
>  			CONFIG_FLAGS_MULTI_REPLACE | CONFIG_FLAGS_FIXED_VALUE);

Ditto.

^ permalink raw reply

* Re: [PATCH v2 3/4] config: factor out global config file retrieval
From: Kristoffer Haugsbakk @ 2024-01-16 21:46 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Patrick Steinhardt, stolee, Eric Sunshine, Taylor Blau
In-Reply-To: <xmqqcyu1yn36.fsf@gitster.g>

On Tue, Jan 16, 2024, at 22:39, Junio C Hamano wrote:
> Kristoffer Haugsbakk <code@khaugsbakk.name> writes:
>>  char *git_system_config(void);
>> +/**
>> + * Returns `NULL` if is uncertain whether or not `HOME/.gitconfig` exists.
>> + */
>
> Sorry, but I am not sure what this comment wants to say.
>
> When $HOME is not set, we do get NULL out of this function.  But
> interpolate_path() that makes git_global_config_paths() to return
> NULL in user_config does not do any existence check with stat() or
> access(), so even when we return a string that is "~/.gitconfig"
> expanded to '/home/user/.gitconfig", we are not certain if the file
> exists.  So,... it is unclear what "uncertain"ty we are talking
> about in this case.

I'll delete it. It was an attempt to refer to the comments about
"It is unknown if HOME/.gitconfig exists".

Cheers

-- 
Kristoffer Haugsbakk

^ permalink raw reply

* Re: [PATCH v2 3/4] config: factor out global config file retrieval
From: Junio C Hamano @ 2024-01-16 21:39 UTC (permalink / raw)
  To: Kristoffer Haugsbakk; +Cc: git, ps, stolee, Eric Sunshine, Taylor Blau
In-Reply-To: <32e5ec7d866ff8fd26554b325812c6e19cb65126.1705267839.git.code@khaugsbakk.name>

Kristoffer Haugsbakk <code@khaugsbakk.name> writes:

>  	if (use_global_config) {
> -		char *user_config, *xdg_config;
> ...
> -	else if (use_system_config) {
> +	} else if (use_system_config) {
>  		given_config_source.file = git_system_config();
>  		given_config_source.scope = CONFIG_SCOPE_SYSTEM;
>  	} else if (use_local_config) {
> diff --git a/config.c b/config.c
> index ebc6a57e1c3..3cfeb3d8bd9 100644
> --- a/config.c
> +++ b/config.c
> @@ -1987,6 +1987,26 @@ char *git_system_config(void)
>  	return system_config;
>  }
>  
> +char *git_global_config(void)
> +{
> +...
> +}
> +
>  void git_global_config_paths(char **user_out, char **xdg_out)
>  {
>  	char *user_config = xstrdup_or_null(getenv("GIT_CONFIG_GLOBAL"));

The conversion above

> diff --git a/config.h b/config.h
> index e5e523553cc..625e932b993 100644
> --- a/config.h
> +++ b/config.h
> @@ -382,6 +382,10 @@ int config_error_nonbool(const char *);
>  #endif
>  
>  char *git_system_config(void);
> +/**
> + * Returns `NULL` if is uncertain whether or not `HOME/.gitconfig` exists.
> + */

Sorry, but I am not sure what this comment wants to say.

When $HOME is not set, we do get NULL out of this function.  But
interpolate_path() that makes git_global_config_paths() to return
NULL in user_config does not do any existence check with stat() or
access(), so even when we return a string that is "~/.gitconfig"
expanded to '/home/user/.gitconfig", we are not certain if the file
exists.  So,... it is unclear what "uncertain"ty we are talking
about in this case.

> +char *git_global_config(void);

^ permalink raw reply

* Re: Make git ls-files omit deleted files
From: Raul Rangel @ 2024-01-16 21:22 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <xmqqply68c87.fsf@gitster.g>

On Fri, Jan 12, 2024 at 2:37 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Raul Rangel <rrangel@google.com> writes:
>
> > I'm trying to copy my current git worktree to a new directory, while
> > including all modified and untracked files, but excluding any ignored
> > files.
>
> Curiously missing from the above is "unmodified".  You only talked
> about modified, untracked, and ignored, but what do you want to do
> with them?
>
> As you are grabbing the files from the working tree, I suspect that
> you do not want to base your decision on what is in the index, which
> means that ls-files might be a wrong tool for the job.

Thanks for this insight. I ended up using `git ls-files` and
`--ignored` to print out all the files that I want to ignore, and I
converted that into a `find` exclusion list.

Here is the final command:
$ git ls-files --others --ignored --exclude-standard --directory |
(printf "%s\n" \( -path ./.git; sed -e 's/^/.\//' -e 's/\/$//' -e
's/^/-o\n-path\n/'; printf "%s\n" \) -prune -o ! -type d -print0 ) |
xargs --exit find . | xargs -0 --max-procs 0 cp --parents
--no-dereference  -t /tmp/clone

This is actually a lot faster than I would have expected.
Parallelizing the `cp` cuts the time down dramatically.

Thanks again!

p.s., Sorry for the resend. I forgot to send a plain text to the mailing list.

^ permalink raw reply

* Re: What's cooking in git.git (Jan 2024, #01; Tue, 2)
From: Taylor Blau @ 2024-01-16 20:49 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Junio C Hamano, git
In-Reply-To: <20240113225157.GD3000857@szeder.dev>

On Sat, Jan 13, 2024 at 11:51:57PM +0100, SZEDER Gábor wrote:
> On a related note, if current git (I tried current master and v2.43.0)
> encounters a commit graph layer containing v2 Bloom filters (created
> by current seen) while writing a new commit graph, then it segfaults
> dereferencing a NULL 'settings' pointer in
> get_or_compute_bloom_filter().
>
> The test below demonstrates this, but it's quite hacky using two
> different git versions: it has to be run by an old git version not yet
> supporting v2 Bloom filters, and a new git version already supporting
> them should be installed at /tmp/git-new/.
>
> diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh
> index 2ba0324a69..0464dd68d5 100755
> --- a/t/t4216-log-bloom.sh
> +++ b/t/t4216-log-bloom.sh
> @@ -454,4 +454,33 @@ test_expect_success 'Bloom reader notices out-of-order index offsets' '
>  	test_cmp expect.err err
>  '
>
> +CENT=$(printf "\302\242")
> +test_expect_success 'split commit graph vs changed paths Bloom filter v2 vs old git' '
> +	git init split-v2-old &&
> +	(
> +		cd split-v2-old &&
> +		git commit --allow-empty -m "Bloom filters are written but still ignored for root commits :(" &&
> +		for i in 1 2 3
> +		do
> +			echo $i >$CENT &&
> +			git add $CENT &&
> +			git commit -m "$i" || return 1
> +		done &&
> +		git log --oneline -- $CENT >expect &&
> +
> +		# Here we write a commit graph layer containing v2 changed
> +		# path Bloom filters using a git binary built from current
> +		# 'seen' branch.
> +		git rev-parse HEAD^ |
> +		/tmp/git-new/bin/git -c commitgraph.changedPathsVersion=2 \
> +			commit-graph write --stdin-commits --changed-paths --split &&
> +
> +		# This is current master, and segfaults.
> +		git commit-graph write --reachable --changed-paths &&
> +
> +		git log --oneline -- $CENT >actual &&
> +		test_cmp expect actual
> +	)
> +'
> +
>  test_done

Thanks. The segfault is reproducible on my end, but I don't think that
it is possible to fix this for existing versions of Git. The problem (as
you note in your backtrace) is here:

    #0  0x000055555569c842 in get_or_compute_bloom_filter (
        r=0x5555559c9ce0 <the_repo>, c=0x5555559dffd0, compute_if_not_present=1,
        settings=0x0, computed=0x7fffffffe0f4) at bloom.c:253

Which tries to dereference ctx->bloom_settings, which is NULL. Note that
we initialize some sensible defaults for ctx->bloom_settings in
commit-graph.c::write_commit_graph():

    struct bloom_filter_settings bloom_settings = DEFAULT_BLOOM_FILTER_SETTINGS;
    /* ... */
    bloom_settings.bits_per_entry = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_BITS_PER_ENTRY",
                                                  bloom_settings.bits_per_entry);
    bloom_settings.num_hashes = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_NUM_HASHES",
                                              bloom_settings.num_hashes);
    bloom_settings.max_changed_paths = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS",
                                                     bloom_settings.max_changed_paths);
    ctx->bloom_settings = &bloom_settings;

...but we'll throw those away in favor of whatever is in the topmost
layer of the existing commit-graph chain later on in that same function:

    if (!(flags & COMMIT_GRAPH_NO_WRITE_BLOOM_FILTERS)) {
      struct commit_graph *g;

      g = ctx->r->objects->commit_graph;

      /* We have changed-paths already. Keep them in the next graph */
      if (g && g->chunk_bloom_data) {
        ctx->changed_paths = 1;
        ctx->bloom_settings = g->bloom_filter_settings;
      }
    }

OK, everything seems fine thus far, until we inspect the value of
g->bloom_filter_settings, which is NULL, becuase of this hunk from
commit-graph.c::graph_read_bloom_data():

    if (hash_version != 1)
      return 0;

which terminates the function before we assign g->bloom_filter_settings
for the existing (written with v2 Bloom filters) graph layer.

I don't think that there is a way to fix this in a backwards compatible
way, but I'm comfortable with that in this instance since we don't
expect users to upgrading to v2 Bloom filters and then writing new graph
layers using a non-v2 compatible version of Git.

We can add a warning in the series that I'm working on indicating this,
but I don't think there's much more we can do besides changing this to
indicate a warning and bailing instead of segfaulting.

Thanks,
Taylor

^ permalink raw reply

* What's cooking in git.git (Jan 2024, #05; Tue, 16)
From: Junio C Hamano @ 2024-01-16 20:45 UTC (permalink / raw)
  To: git

Here are the topics that have been cooking in my tree.  Commits
prefixed with '+' are in 'next' (being in 'next' is a sign that a
topic is stable enough to be used and are candidate to be in a
future release).  Commits prefixed with '-' are only in 'seen', and
aren't considered "accepted" at all and may be annotated with an URL
to a message that raises issues but they are no means exhaustive.  A
topic without enough support may be discarded after a long period of
no activity (of course they can be resubmit when new interests
arise).

Copies of the source code to Git live in many repositories, and the
following is a list of the ones I push into or their mirrors.  Some
repositories have only a subset of branches.

With maint, master, next, seen, todo:

	git://git.kernel.org/pub/scm/git/git.git/
	git://repo.or.cz/alt-git.git/
	https://kernel.googlesource.com/pub/scm/git/git/
	https://github.com/git/git/
	https://gitlab.com/git-vcs/git/

With all the integration branches and topics broken out:

	https://github.com/gitster/git/

Even though the preformatted documentation in HTML and man format
are not sources, they are published in these repositories for
convenience (replace "htmldocs" with "manpages" for the manual
pages):

	git://git.kernel.org/pub/scm/git/git-htmldocs.git/
	https://github.com/gitster/git-htmldocs.git/

Release tarballs are available at:

	https://www.kernel.org/pub/software/scm/git/

--------------------------------------------------
[Graduated to 'master']

* cp/git-flush-is-an-env-bool (2024-01-04) 1 commit
  (merged to 'next' on 2024-01-04 at b435a96ce8)
 + write-or-die: make GIT_FLUSH a Boolean environment variable

 Unlike other environment variables that took the usual
 true/false/yes/no as well as 0/1, GIT_FLUSH only understood 0/1,
 which has been corrected.
 source: <pull.1628.v3.git.1704363617842.gitgitgadget@gmail.com>


* cp/sideband-array-index-comment-fix (2023-12-28) 1 commit
  (merged to 'next' on 2024-01-08 at f906bc86f1)
 + sideband.c: remove redundant 'NEEDSWORK' tag

 In-code comment fix.
 source: <pull.1625.v4.git.1703750460527.gitgitgadget@gmail.com>


* ib/rebase-reschedule-doc (2024-01-05) 1 commit
  (merged to 'next' on 2024-01-08 at d451d1f760)
 + rebase: clarify --reschedule-failed-exec default

 Doc update.
 source: <20240105011424.1443732-2-illia.bobyr@gmail.com>


* jk/commit-graph-slab-clear-fix (2024-01-05) 1 commit
  (merged to 'next' on 2024-01-08 at f78c4fc296)
 + commit-graph: retain commit slab when closing NULL commit_graph

 Clearing in-core repository (happens during e.g., "git fetch
 --recurse-submodules" with commit graph enabled) made in-core
 commit object in an inconsistent state by discarding the necessary
 data from commit-graph too early, which has been corrected.
 source: <20240105054142.GA2035092@coredump.intra.peff.net>


* jk/index-pack-lsan-false-positive-fix (2024-01-05) 1 commit
  (merged to 'next' on 2024-01-08 at 589ed65251)
 + index-pack: spawn threads atomically

 Fix false positive reported by leak sanitizer.
 source: <20240105085034.GA3078476@coredump.intra.peff.net>


* jk/t1006-cat-file-objectsize-disk (2024-01-03) 2 commits
  (merged to 'next' on 2024-01-03 at a492c6355c)
 + t1006: prefer shell loop to awk for packed object sizes
  (merged to 'next' on 2023-12-28 at d82812e636)
 + t1006: add tests for %(objectsize:disk)

 Test update.
 source: <20231221094722.GA570888@coredump.intra.peff.net>
 source: <20240103090152.GB1866508@coredump.intra.peff.net>


* js/contributor-docs-updates (2023-12-27) 9 commits
  (merged to 'next' on 2024-01-02 at 0e072117cd)
 + SubmittingPatches: hyphenate non-ASCII
 + SubmittingPatches: clarify GitHub artifact format
 + SubmittingPatches: clarify GitHub visual
 + SubmittingPatches: provide tag naming advice
 + SubmittingPatches: update extra tags list
 + SubmittingPatches: discourage new trailers
 + SubmittingPatches: drop ref to "What's in git.git"
 + CodingGuidelines: write punctuation marks
 + CodingGuidelines: move period inside parentheses

 Doc update.
 source: <pull.1623.v3.git.1703739324.gitgitgadget@gmail.com>


* jw/builtin-objectmode-attr (2023-12-28) 1 commit
  (merged to 'next' on 2024-01-02 at 4c3784b3a1)
 + attr: add builtin objectmode values support

 The builtin_objectmode attribute is populated for each path
 without adding anything in .gitattributes files, which would be
 useful in magic pathspec, e.g., ":(attr:builtin_objectmode=100755)"
 to limit to executables.
 cf. <xmqq5y0ssknj.fsf@gitster.g>
 source: <20231116054437.2343549-1-jojwang@google.com>


* jx/sideband-chomp-newline-fix (2023-12-18) 3 commits
  (merged to 'next' on 2024-01-04 at 1237898a22)
 + pkt-line: do not chomp newlines for sideband messages
 + pkt-line: memorize sideband fragment in reader
 + test-pkt-line: add option parser for unpack-sideband

 Sideband demultiplexer fixes.
 source: <cover.1702823801.git.zhiyou.jx@alibaba-inc.com>


* ms/rebase-insnformat-doc-fix (2024-01-03) 1 commit
  (merged to 'next' on 2024-01-04 at d68f2be39b)
 + Documentation: fix statement about rebase.instructionFormat

 Docfix.
 source: <pull.1629.git.git.1704305663254.gitgitgadget@gmail.com>


* ps/refstorage-extension (2024-01-02) 13 commits
  (merged to 'next' on 2024-01-08 at f9a034803b)
 + t9500: write "extensions.refstorage" into config
 + builtin/clone: introduce `--ref-format=` value flag
 + builtin/init: introduce `--ref-format=` value flag
 + builtin/rev-parse: introduce `--show-ref-format` flag
 + t: introduce GIT_TEST_DEFAULT_REF_FORMAT envvar
 + setup: introduce GIT_DEFAULT_REF_FORMAT envvar
 + setup: introduce "extensions.refStorage" extension
 + setup: set repository's formats on init
 + setup: start tracking ref storage format
 + refs: refactor logic to look up storage backends
 + worktree: skip reading HEAD when repairing worktrees
 + t: introduce DEFAULT_REPO_FORMAT prereq
 + Merge branch 'ps/clone-into-reftable-repository' into ps/refstorage-extension
 (this branch is used by ps/prompt-parse-HEAD-futureproof and ps/worktree-refdb-initialization.)

 Introduce a new extension "refstorage" so that we can mark a
 repository that uses a non-default ref backend, like reftable.
 source: <cover.1703833818.git.ps@pks.im>


* ps/reftable-fixes-and-optims (2024-01-03) 9 commits
  (merged to 'next' on 2024-01-08 at 167d7685f8)
 + reftable/merged: transfer ownership of records when iterating
 + reftable/merged: really reuse buffers to compute record keys
 + reftable/record: store "val2" hashes as static arrays
 + reftable/record: store "val1" hashes as static arrays
 + reftable/record: constify some parts of the interface
 + reftable/writer: fix index corruption when writing multiple indices
 + reftable/stack: do not auto-compact twice in `reftable_stack_add()`
 + reftable/stack: do not overwrite errors when compacting
 + Merge branch 'ps/reftable-fixes' into ps/reftable-fixes-and-optims

 More fixes and optimizations to the reftable backend.
 source: <cover.1704262787.git.ps@pks.im>


* tb/multi-pack-verbatim-reuse (2023-12-14) 26 commits
  (merged to 'next' on 2024-01-04 at 891ac0fa2c)
 + t/perf: add performance tests for multi-pack reuse
 + pack-bitmap: enable reuse from all bitmapped packs
 + pack-objects: allow setting `pack.allowPackReuse` to "single"
 + t/test-lib-functions.sh: implement `test_trace2_data` helper
 + pack-objects: add tracing for various packfile metrics
 + pack-bitmap: prepare to mark objects from multiple packs for reuse
 + pack-revindex: implement `midx_pair_to_pack_pos()`
 + pack-revindex: factor out `midx_key_to_pack_pos()` helper
 + midx: implement `midx_preferred_pack()`
 + git-compat-util.h: implement checked size_t to uint32_t conversion
 + pack-objects: include number of packs reused in output
 + pack-objects: prepare `write_reused_pack_verbatim()` for multi-pack reuse
 + pack-objects: prepare `write_reused_pack()` for multi-pack reuse
 + pack-objects: pass `bitmapped_pack`'s to pack-reuse functions
 + pack-objects: keep track of `pack_start` for each reuse pack
 + pack-objects: parameterize pack-reuse routines over a single pack
 + pack-bitmap: return multiple packs via `reuse_partial_packfile_from_bitmap()`
 + pack-bitmap: simplify `reuse_partial_packfile_from_bitmap()` signature
 + ewah: implement `bitmap_is_empty()`
 + pack-bitmap: pass `bitmapped_pack` struct to pack-reuse functions
 + midx: implement `midx_locate_pack()`
 + midx: implement `BTMP` chunk
 + midx: factor out `fill_pack_info()`
 + pack-bitmap: plug leak in find_objects()
 + pack-bitmap-write: deep-clear the `bb_commit` slab
 + pack-objects: free packing_data in more places

 Streaming spans of packfile data used to be done only from a
 single, primary, pack in a repository with multiple packfiles.  It
 has been extended to allow reuse from other packfiles, too.
 cf. <ZXurD1NTZ4TAs7WZ@nand.local>
 source: <cover.1702592603.git.me@ttaylorr.com>

--------------------------------------------------
[New Topics]

* es/some-up-to-date-messages-must-stay (2024-01-12) 1 commit
  (merged to 'next' on 2024-01-16 at 2b598f7de2)
 + messages: mark some strings with "up-to-date" not to touch

 Comment updates to help developers not to attempt to modify
 messages from plumbing commands that must stay constant.

 It might make sense to reassess the plumbing needs every few years,
 but that should be done as a separate effort.

 Will merge to 'master'.
 source: <20240112171910.11131-1-ericsunshine@charter.net>


* bk/complete-bisect (2024-01-12) 5 commits
 - completion: custom git-bisect terms
 - completion: custom git-bisect terms
 - completion: move to maintain define-before-use
 - completion: git-log opts to bisect visualize
 - completion: complete new old actions, start opts

 Command line completion support (in contrib/) has been
 updated for "git bisect".

 Needs review.
 source: <20240110020347.673155-1-britton.kerin@gmail.com>


* bk/complete-dirname-for-am-and-format-patch (2024-01-12) 1 commit
 - completion: dir-type optargs for am, format-patch

 Command line completion support (in contrib/) has been
 updated for a few commands to complete directory names where a
 directory name is expected.

 Needs review.
 source: <d37781c3-6af2-409b-95a8-660a9b92d20b@smtp-relay.sendinblue.com>


* bk/complete-send-email (2024-01-12) 1 commit
 - completion: don't complete revs when --no-format-patch

 Command line completion support (in contrib/) has been taught to
 avoid offering revision names as candidates to "git send-email" when
 the command is used to send pre-generated files.

 Needs review.
 source: <a718b5ee-afb0-44bd-a299-3208fac43506@smtp-relay.sendinblue.com>


* gt/test-commit-o-i-options (2024-01-16) 2 commits
 - t7501: add tests for --amend --signoff
 - t7501: add tests for --include and --only

 A few tests to "git commit -o <pathspec>" and "git commit -i
 <pathspec>" has been added.

 Expecting a reroll.
 cf. <xmqq1qah46i0.fsf@gitster.g>
 source: <20240113042254.38602-1-shyamthakkar001@gmail.com>


* jt/tests-with-reftable (2024-01-12) 2 commits
 - t5541: remove lockfile creation
 - t1401: remove lockfile creation

 Tweak a few tests not to manually modify the reference database
 (hence easier to work with other backends like reftable).

 Will merge to 'next'.
 source: <pull.1634.v2.git.1705004670.gitgitgadget@gmail.com>


* la/strvec-comment-fix (2024-01-12) 1 commit
 - strvec: use correct member name in comments

 Comment fix.

 Will merge to 'next'.
 source: <pull.1640.git.1705043195997.gitgitgadget@gmail.com>


* la/trailer-api (2024-01-12) 10 commits
 - trailer: delete obsolete argument handling code from API
 - trailer: move arg handling to interpret-trailers.c
 - trailer: prepare to move parse_trailers_from_command_line_args() to builtin
 - trailer: spread usage of "trailer_block" language
 - trailer: make trailer_info struct private
 - sequencer: use the trailer iterator
 - trailer: delete obsolete formatting functions
 - trailer: unify trailer formatting machinery
 - trailer: include "trailer" term in API functions
 - trailer: move process_trailers() to interpret-trailers.c

 Code clean-up.

 Needs review.
 source: <pull.1632.git.1704869487.gitgitgadget@gmail.com>


* ps/tests-with-ref-files-backend (2024-01-12) 6 commits
 - t: mark tests regarding git-pack-refs(1) to be backend specific
 - t5526: break test submodule differently
 - t1419: mark test suite as files-backend specific
 - t1302: make tests more robust with new extensions
 - t1301: mark test for `core.sharedRepository` as reffiles specific
 - t1300: make tests more robust with non-default ref backends

 Prepare existing tests on refs to work better with non-default
 backends.

 Needs review.
 source: <cover.1704877233.git.ps@pks.im>


* ne/doc-filter-blob-limit-fix (2024-01-16) 1 commit
 - rev-list-options: fix off-by-one in '--filter=blob:limit=<n>' explainer

 Docfix.

 Will merge to 'next'.
 source: <pull.1645.git.git.1705261850650.gitgitgadget@gmail.com>


* ps/commit-graph-write-leakfix (2024-01-15) 1 commit
 - commit-graph: fix memory leak when not writing graph

 Leakfix.

 Will merge to 'next'.
 source: <0feab5e7d5bc6275e2c7671cd8f6786ea86fd610.1702891190.git.ps@pks.im>

--------------------------------------------------
[Cooking]

* cp/t4129-pipefix (2024-01-10) 1 commit
  (merged to 'next' on 2024-01-12 at fd9b72b71a)
 + t4129: prevent loss of exit code due to the use of pipes

 Test update.

 Will merge to 'master'.
 source: <pull.1636.git.1704891257544.gitgitgadget@gmail.com>


* rj/advice-delete-branch-not-fully-merged (2024-01-11) 3 commits
 - branch: make the advice to force-deleting a conditional one
 - advice: fix an unexpected leading space
 - advice: sort the advice related lists
 (this branch is used by rj/advice-disable-how-to-disable.)

 The error message given when "git branch -d branch" fails due to
 commits unique to the branch has been split into an error and a new
 conditional advice message.

 Will merge to 'next'.
 source: <4aedc15c-4b3f-4f5e-abea-581b501600f8@gmail.com>


* en/diffcore-delta-final-line-fix (2024-01-11) 1 commit
 - diffcore-delta: avoid ignoring final 'line' of file

 Rename detection logic ignored the final line of a file if it is an
 incomplete line.

 Expecting a reroll.
 cf. <xmqqedenearc.fsf@gitster.g>
 source: <pull.1637.git.1705006074626.gitgitgadget@gmail.com>


* mj/gitweb-unreadable-config-error (2024-01-10) 1 commit
 - gitweb: die when a configuration file cannot be read

 When given an existing but unreadable file as a configuration file,
 gitweb behaved as if the file did not exist at all, but now it
 errors out.  This is a change that may break backward compatibility.

 Will merge to 'next'.
 source: <20240110225709.30168-1-marcelo.jimenez@gmail.com>


* ps/completion-with-reftable-fix (2024-01-16) 5 commits
 - completion: treat dangling symrefs as existing pseudorefs
 - completion: silence pseudoref existence check
 - completion: improve existence check for pseudo-refs
 - t9902: verify that completion does not print anything
 - completion: discover repo path in `__git_pseudoref_exists ()`

 Completion update to prepare for reftable

 Will merge to 'next'?
 source: <cover.1705314554.git.ps@pks.im>


* ps/p4-use-ref-api (2024-01-11) 1 commit
  (merged to 'next' on 2024-01-12 at 3f89cf25f6)
 + git-p4: stop reaching into the refdb

 "git p4" update to prepare for reftable

 Will merge to 'master'.
 source: <33d6a062ec56be33ed50a42a420be0b023f6f4cf.1704980814.git.ps@pks.im>


* ps/gitlab-ci-static-analysis (2024-01-08) 1 commit
  (merged to 'next' on 2024-01-10 at 71af34de07)
 + ci: add job performing static analysis on GitLab CI

 GitLab CI update.

 Will merge to 'master'.
 source: <1536a5ef07ad24dafb5d685b40099882f89e6cc5.1703761005.git.ps@pks.im>


* ps/prompt-parse-HEAD-futureproof (2024-01-08) 2 commits
  (merged to 'next' on 2024-01-10 at f9515b9d89)
 + git-prompt: stop manually parsing HEAD with unknown ref formats
 + Merge branch 'ps/refstorage-extension' into ps/prompt-parse-HEAD-futureproof

 Futureproof command line prompt support (in contrib/).

 Will merge to 'master'.
 source: <ef4e36a5a40c369da138242a8fdc9e12a846613b.1704356313.git.ps@pks.im>


* ps/reftable-optimize-io (2024-01-11) 5 commits
  (merged to 'next' on 2024-01-12 at 4096e880e0)
 + reftable/blocksource: use mmap to read tables
 + reftable/blocksource: refactor code to match our coding style
 + reftable/stack: use stat info to avoid re-reading stack list
 + reftable/stack: refactor reloading to use file descriptor
 + reftable/stack: refactor stack reloading to have common exit path

 Low-level I/O optimization for reftable.

 Will merge to 'master'.
 source: <cover.1704966670.git.ps@pks.im>


* rj/clarify-branch-doc-m (2024-01-08) 1 commit
  (merged to 'next' on 2024-01-10 at 432eaa2c6b)
 + branch: clarify <oldbranch> term

 Doc update.

 Will merge to 'master'.
 source: <d38e5a18-4d85-48f3-bc8b-8ca02ea683a4@gmail.com>


* tb/fetch-all-configuration (2024-01-08) 1 commit
  (merged to 'next' on 2024-01-12 at 6a05050382)
 + fetch: add new config option fetch.all

 "git fetch" learned to pay attention to "fetch.all" configuration
 variable, which pretends as if "--all" was passed from the command
 line when no remote parameter was given.

 Will merge to 'master'.
 source: <20240108211832.47362-1-dev@tb6.eu>


* rj/advice-disable-how-to-disable (2024-01-15) 1 commit
 - advice: allow disabling the automatic hint in advise_if_enabled()
 (this branch uses rj/advice-delete-branch-not-fully-merged.)

 All conditional "advice" messages show how to turn them off, which
 becomes repetitive.  Add a configuration variable to omit the
 instruction.

 Will requeue with better dependencies before merging to 'next'.
 source: <c870a0b6-9fa8-4d00-a5a6-661ca175805f@gmail.com>


* vd/fsck-submodule-url-test (2024-01-09) 3 commits
 - submodule-config.c: strengthen URL fsck check
 - t7450: test submodule urls
 - submodule-config.h: move check_submodule_url

 Tighten URL checks fsck makes in a URL recorded for submodules.

 Expecting a reroll (and review response).
 cf. <20240110103812.GB16674@coredump.intra.peff.net>
 cf. <ZZ46MrjSocJl-kpU@tanuki>
 source: <pull.1635.git.1704822817.gitgitgadget@gmail.com>


* sd/negotiate-trace-fix (2024-01-03) 1 commit
 - push: region_leave trace for negotiate_using_fetch

 Tracing fix.

 Waiting for a review response.
 cf. <xmqqbka27zu9.fsf@gitster.g>
 source: <20240103224054.1940209-1-delmerico@google.com>


* sk/mingw-owner-check-error-message-improvement (2024-01-10) 1 commit
  (merged to 'next' on 2024-01-12 at 05c56e151b)
 + mingw: give more details about unsafe directory's ownership

 In addition to (rather cryptic) Security Identifiers, show username
 and domain in the error message when we barf on mismatch between
 the Git directory and the current user.

 Will merge to 'master'.
 source: <20240108173837.20480-2-soekkle@freenet.de>


* ps/worktree-refdb-initialization (2024-01-08) 7 commits
 - builtin/worktree: create refdb via ref backend
 - worktree: expose interface to look up worktree by name
 - builtin/worktree: move setup of commondir file earlier
 - refs/files: skip creation of "refs/{heads,tags}" for worktrees
 - setup: move creation of "refs/" into the files backend
 - refs: prepare `refs_init_db()` for initializing worktree refs
 - Merge branch 'ps/refstorage-extension' into ps/worktree-refdb-initialization

 Instead of manually creating refs/ hierarchy on disk upon a
 creation of a secondary worktree, which is only usable via the
 files backend, use the refs API to populate it.

 Will merge to 'next'.
 source: <cover.1704705733.git.ps@pks.im>


* cp/apply-core-filemode (2023-12-26) 3 commits
 - apply: code simplification
 - apply: correctly reverse patch's pre- and post-image mode bits
 - apply: ignore working tree filemode when !core.filemode

 "git apply" on a filesystem without filemode support have learned
 to take a hint from what is in the index for the path, even when
 not working with the "--index" or "--cached" option, when checking
 the executable bit match what is required by the preimage in the
 patch.

 Needs review.
 source: <20231226233218.472054-1-gitster@pobox.com>


* al/unit-test-ctype (2024-01-16) 1 commit
 - unit-tests: rewrite t/helper/test-ctype.c as a unit test

 Move test-ctype helper to the unit-test framework.

 Will merge to 'next'.
 source: <20240112102743.1440-1-ach.lumap@gmail.com>


* bk/bisect-doc-fix (2024-01-10) 2 commits
  (merged to 'next' on 2024-01-12 at bdb3609554)
 + doc: refer to pathspec instead of path
 + doc: use singular form of repeatable path arg

 Synopsis fix.

 Will merge to 'master'.
 source: <20240103040207.661413-1-britton.kerin@gmail.com>


* ja/doc-placeholders-fix (2023-12-26) 2 commits
 - doc: enforce placeholders in documentation
 - doc: enforce dashes in placeholders

 Docfix.

 Needs review.
 source: <pull.1626.git.1703539287.gitgitgadget@gmail.com>


* jc/bisect-doc (2023-12-09) 1 commit
 - bisect: document "terms" subcommand more fully

 Doc update.

 Needs review.
 source: <xmqqzfyjmk02.fsf@gitster.g>


* tb/pair-chunk-expect (2023-11-10) 8 commits
 - midx: read `OOFF` chunk with `pair_chunk_expect()`
 - midx: read `OIDL` chunk with `pair_chunk_expect()`
 - commit-graph: read `BIDX` chunk with `pair_chunk_expect()`
 - commit-graph: read `GDAT` chunk with `pair_chunk_expect()`
 - commit-graph: read `CDAT` chunk with `pair_chunk_expect()`
 - commit-graph: read `OIDL` chunk with `pair_chunk_expect()`
 - chunk-format: introduce `pair_chunk_expect()` helper
 - Merge branch 'jk/chunk-bounds-more' into HEAD

 Further code clean-up.

 Needs review.
 source: <cover.1699569246.git.me@ttaylorr.com>


* tb/path-filter-fix (2023-10-18) 17 commits
 - bloom: introduce `deinit_bloom_filters()`
 - commit-graph: reuse existing Bloom filters where possible
 - object.h: fix mis-aligned flag bits table
 - commit-graph: drop unnecessary `graph_read_bloom_data_context`
 - commit-graph.c: unconditionally load Bloom filters
 - bloom: prepare to discard incompatible Bloom filters
 - bloom: annotate filters with hash version
 - commit-graph: new filter ver. that fixes murmur3
 - repo-settings: introduce commitgraph.changedPathsVersion
 - t4216: test changed path filters with high bit paths
 - t/helper/test-read-graph: implement `bloom-filters` mode
 - bloom.h: make `load_bloom_filter_from_graph()` public
 - t/helper/test-read-graph.c: extract `dump_graph_info()`
 - gitformat-commit-graph: describe version 2 of BDAT
 - commit-graph: ensure Bloom filters are read with consistent settings
 - revision.c: consult Bloom filters for root commits
 - t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()`

 The Bloom filter used for path limited history traversal was broken
 on systems whose "char" is unsigned; update the implementation and
 bump the format version to 2.

 Expecting a reroll?
 cf. <20231023202212.GA5470@szeder.dev>
 source: <cover.1697653929.git.me@ttaylorr.com>


* ak/color-decorate-symbols (2023-10-23) 7 commits
 - log: add color.decorate.pseudoref config variable
 - refs: exempt pseudorefs from pattern prefixing
 - refs: add pseudorefs array and iteration functions
 - log: add color.decorate.ref config variable
 - log: add color.decorate.symbol config variable
 - log: use designated inits for decoration_colors
 - config: restructure color.decorate documentation

 A new config for coloring.

 Needs review.
 source: <20231023221143.72489-1-andy.koppe@gmail.com>


* eb/hash-transition (2023-10-02) 30 commits
 - t1016-compatObjectFormat: add tests to verify the conversion between objects
 - t1006: test oid compatibility with cat-file
 - t1006: rename sha1 to oid
 - test-lib: compute the compatibility hash so tests may use it
 - builtin/ls-tree: let the oid determine the output algorithm
 - object-file: handle compat objects in check_object_signature
 - tree-walk: init_tree_desc take an oid to get the hash algorithm
 - builtin/cat-file: let the oid determine the output algorithm
 - rev-parse: add an --output-object-format parameter
 - repository: implement extensions.compatObjectFormat
 - object-file: update object_info_extended to reencode objects
 - object-file-convert: convert commits that embed signed tags
 - object-file-convert: convert commit objects when writing
 - object-file-convert: don't leak when converting tag objects
 - object-file-convert: convert tag objects when writing
 - object-file-convert: add a function to convert trees between algorithms
 - object: factor out parse_mode out of fast-import and tree-walk into in object.h
 - cache: add a function to read an OID of a specific algorithm
 - tag: sign both hashes
 - commit: export add_header_signature to support handling signatures on tags
 - commit: convert mergetag before computing the signature of a commit
 - commit: write commits for both hashes
 - object-file: add a compat_oid_in parameter to write_object_file_flags
 - object-file: update the loose object map when writing loose objects
 - loose: compatibilty short name support
 - loose: add a mapping between SHA-1 and SHA-256 for loose objects
 - repository: add a compatibility hash algorithm
 - object-names: support input of oids in any supported hash
 - oid-array: teach oid-array to handle multiple kinds of oids
 - object-file-convert: stubs for converting from one object format to another

 Teach a repository to work with both SHA-1 and SHA-256 hash algorithms.

 Needs review.
 source: <878r8l929e.fsf@gmail.froward.int.ebiederm.org>


* jx/remote-archive-over-smart-http (2024-01-16) 6 commits
 - transport-helper: call do_take_over() in process_connect
 - transport-helper: call do_take_over() in connect_helper
 - http-backend: new rpc-service for git-upload-archive
 - transport-helper: protocol-v2 supports upload-archive
 - remote-curl: supports git-upload-archive service
 - transport-helper: no connection restriction in connect_helper

 "git archive --remote=<remote>" learned to talk over the smart
 http (aka stateless) transport.

 Will merge to 'next'?
 source: <cover.1705411391.git.zhiyou.jx@alibaba-inc.com>


* jc/rerere-cleanup (2023-08-25) 4 commits
 - rerere: modernize use of empty strbuf
 - rerere: try_merge() should use LL_MERGE_ERROR when it means an error
 - rerere: fix comment on handle_file() helper
 - rerere: simplify check_one_conflict() helper function

 Code clean-up.

 Not ready to be reviewed yet.
 source: <20230824205456.1231371-1-gitster@pobox.com>

^ permalink raw reply

* Re: What's cooking in git.git (Jan 2024, #01; Tue, 2)
From: Taylor Blau @ 2024-01-16 20:37 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Junio C Hamano, git
In-Reply-To: <20240113234134.GE3000857@szeder.dev>

Hi Gábor,

On Sun, Jan 14, 2024 at 12:41:34AM +0100, SZEDER Gábor wrote:
> > In any case, here's the patch on top (with a lightly modified version of
> > the test you wrote in <20230830200218.GA5147@szeder.dev>:
>
> I certainly hope that I'm just misunderstanding, and you don't
> actually imply that this one test in its current form would qualify as
> thorough testing... :)

I hear what you're saying, though I think that the interesting behavior
that would be most likely to regress is the transition between different
Bloom filter settings/hash-version across split commit-graph layers.

We have extensive tests on either "side" of this transition for both v1
and v2 Bloom filters, so I'm not sure what we'd want to add there. Like
I said, the transition is the primary (previously-)untested area of this
code that I would want to ensure is covered to protect against
regressions.

I think that the most recent round of this series accomplishes that
goal.

Thanks,
Taylor

^ permalink raw reply

* Re: [PATCH] clone: support cloning of filtered bundles
From: Nikolay Edigaryev @ 2024-01-16 20:06 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nikolay Edigaryev via GitGitGadget, git, Derrick Stolee
In-Reply-To: <xmqq8r4r5ovf.fsf@gitster.g>

Hello Junio and Phillip,

Thanks a lot for the explanations of how this is supposed to work. It
seems that to make this work properly, we'd need to:

(1) add an argument (or an option) to 'git bundle create', so that
    the user will be able to explicitly request the inclusion of a
    desired remote's URL

Without such mechanism in place data leak is possible, e.g. remote with
credentials hardcoded in it.

(2) extend the 'gitformat-bundle' to include 'url'

However, a remote can have multiple URLs and other remote-specific
options might be necessary to properly work with it.

(3) add an argument (or an option) to 'git clone', so that the user
    will be able to explicitly request the write of the URL contained
    in the bundle to the repository's config

Otherwise, it's insecure, e.g. someone might craft a bundle with a URL
that collects data from the user.

I don't want waste anyone's time on this anymore because I've toyed with
'git bundle' a bit more and realized that what I'm trying to accomplish
can be done the other way:

1. git init

2. git bundle unbundle <PATH> | <script that swaps hashes and refs in
   'git bundle unbundle output' and feeds them to 'git update-ref'>

Hopefully this discussion will be useful for people looking to
accomplish something similar to what I've described in the initial
message.

On Mon, Jan 15, 2024 at 6:09 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Junio C Hamano <gitster@pobox.com> writes:
>
> > "Nikolay Edigaryev via GitGitGadget" <gitgitgadget@gmail.com>
> > writes:
> >
> >> diff --git a/builtin/clone.c b/builtin/clone.c
> >> index c6357af9498..4b3fedf78ed 100644
> >> --- a/builtin/clone.c
> >> +++ b/builtin/clone.c
> >> @@ -1227,9 +1227,18 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
> >>
> >>              if (fd > 0)
> >>                      close(fd);
> >> +
> >> +            if (has_filter) {
> >> +                    strbuf_addf(&key, "remote.%s.promisor", remote_name);
> >> +                    git_config_set(key.buf, "true");
> >> +                    strbuf_reset(&key);
> >> +
> >> +                    strbuf_addf(&key, "remote.%s.partialclonefilter", remote_name);
> >> +                    git_config_set(key.buf, expand_list_objects_filter_spec(&header.filter));
> >> +                    strbuf_reset(&key);
> >> +            }
> >> +
> >
> >> -# NEEDSWORK: 'git clone --bare' should be able to clone from a filtered
> >> -# bundle, but that requires a change to promisor/filter config options.
> > ...
> > But a bundle that were created with objects _omitted_ already?
> > ... the source of this clone operation, i.e. the bundle file that is
> > pointed at by "remote.$remote_name.url", cannot be that promisor.
>
> Extending the above a bit, one important way a bundle is used is as
> a medium for sneaker-net.  Instead of making a full clone over the
> network, if you can create a bundle that records all objects and all
> refs out of the source repository and then unbundle it in a
> different place to create a repository, you can tweak the resulting
> repository by either adding a separete remote or changing the
> remote.origin.url so that your subsequent fetch goes over the
> network to the repository you took the initial bundle from.
>
> The "tweak the resulting repository" part however MUST be done
> manually with the current system.  If we can optionally record the
> publically reachable URL of the source repository when we create a
> bundle file, and "git clone" on the receiving side can read the URL
> out of the bundle and act on it (e.g., show it to the user and offer
> to record it as remote.origin.url in the resulting repository---I do
> not think it is wise to do this silently without letting the user
> know from security's point of view), then the use of bundle files as
> a medium for sneaker-netting will become even easier.
>
> And once that is done, perhaps allowing a filtered bundle to act as
> a sneaker-net medium to simulate an initial filtered clone would
> make sense.  The promisor as well as the origin will be the network
> reachable URL and subsequent fetches (both deliberate ones via "git
> fetch" as well as lazy on-demand ones that backfills missing objects
> via the "promisor" access) would become possible.
>
> But without such a change to the bundle file format, allowing
> "clone" to finish and pretend the resulting repository is usable is
> somewhat irresponsible to the users.  The on-demand lazy fetch would
> fail after this code cloned from such a filtered bundle, no?

^ permalink raw reply

* Re: [Outreachy][PATCH v5] Port helper/test-ctype.c to unit-tests/t-ctype.c
From: Junio C Hamano @ 2024-01-16 19:52 UTC (permalink / raw)
  To: René Scharfe
  Cc: Phillip Wood, Achu Luma, git, chriscool, christian.couder, me,
	phillip.wood, steadmon
In-Reply-To: <41cf1944-2456-4115-a934-aff2306a26e5@web.de>

René Scharfe <l.s.r@web.de> writes:

> Am 16.01.24 um 16:38 schrieb Junio C Hamano:
>> Phillip Wood <phillip.wood123@gmail.com> writes:
>>
>>> Thanks for adding back the test for EOF, this version looks good to me.
>>
>> Thanks.  Let's merge it to 'next'.
>
> OK.  I'm still interested in replies to my question in
> https://lore.kernel.org/git/a087f57c-ce72-45c7-8182-f38d0aca9030@web.de/,
> i.e. whether we should have one TEST per class or one per class and
> character -- or in a broader sense: What's the ideal scope of a TEST?
> But I can ask it again in the form of a follow-up patch.

I personally do not have a good answer, but those who are interested
in unit-tests more than I do should have their opinions to share ;-)

^ permalink raw reply

* Re: [Outreachy][PATCH v5] Port helper/test-ctype.c to unit-tests/t-ctype.c
From: Christian Couder @ 2024-01-16 19:45 UTC (permalink / raw)
  To: René Scharfe
  Cc: Junio C Hamano, Phillip Wood, Achu Luma, git, chriscool, me,
	phillip.wood, steadmon
In-Reply-To: <41cf1944-2456-4115-a934-aff2306a26e5@web.de>

On Tue, Jan 16, 2024 at 8:27 PM René Scharfe <l.s.r@web.de> wrote:
>
> Am 16.01.24 um 16:38 schrieb Junio C Hamano:
> > Phillip Wood <phillip.wood123@gmail.com> writes:
> >
> >> Thanks for adding back the test for EOF, this version looks good to me.
> >
> > Thanks.  Let's merge it to 'next'.
>
> OK.  I'm still interested in replies to my question in
> https://lore.kernel.org/git/a087f57c-ce72-45c7-8182-f38d0aca9030@web.de/,
> i.e. whether we should have one TEST per class or one per class and
> character -- or in a broader sense: What's the ideal scope of a TEST?
> But I can ask it again in the form of a follow-up patch.

I think one test per character per class would result in too much
detail in the output. Other than that I think it's better to address
your questions to the designers of the unit test framework rather than
to the authors of this patch. And yeah, sending a follow up patch
would perhaps be the best. Thanks.

^ permalink raw reply

* Re: [Outreachy][PATCH v5] Port helper/test-ctype.c to unit-tests/t-ctype.c
From: René Scharfe @ 2024-01-16 19:27 UTC (permalink / raw)
  To: Junio C Hamano, Phillip Wood
  Cc: Achu Luma, git, chriscool, christian.couder, me, phillip.wood,
	steadmon
In-Reply-To: <xmqqply147bj.fsf@gitster.g>

Am 16.01.24 um 16:38 schrieb Junio C Hamano:
> Phillip Wood <phillip.wood123@gmail.com> writes:
>
>> Thanks for adding back the test for EOF, this version looks good to me.
>
> Thanks.  Let's merge it to 'next'.

OK.  I'm still interested in replies to my question in
https://lore.kernel.org/git/a087f57c-ce72-45c7-8182-f38d0aca9030@web.de/,
i.e. whether we should have one TEST per class or one per class and
character -- or in a broader sense: What's the ideal scope of a TEST?
But I can ask it again in the form of a follow-up patch.

René

^ permalink raw reply

* Re: git-bugreport-2024-01-16-2051
From: René Scharfe @ 2024-01-16 19:10 UTC (permalink / raw)
  To: Vaibhav Naik, git
In-Reply-To: <CALZECO2kHf_qPX06kOUPbJT_hNd5Txt9FTh_Pxr+9x_sMJdKOg@mail.gmail.com>

Am 16.01.24 um 16:30 schrieb Vaibhav Naik:
>
> Thank you for filling out a Git bug report!
> Please answer the following questions to help us understand your issue.
>
> What did you do before the bug happened? (Steps to reproduce your issue)
> i typed this command in order:
> git checkout ca35d53
> git tag 1.1.2
> git commit --allow-empty -m "want to commit at any cost"
> (i just wanted to commit to catch the bug)
>
> What did you expect to happen? (Expected behavior)
> when I checkout, it was showing the commit id at the place of HEAD Point:
> vaibh@VaibhavNaik MINGW64 /d/E Drive/MyCodes/Git-Testing ((ca35d53...))
> and then after creating a new tag it was showing the tag name at the place of HEAD Point:
> vaibh@VaibhavNaik MINGW64 /d/E Drive/MyCodes/Git-Testing ((1.1.2))
> which is wrong, that place is to show where the HEAD is pointing at and not the newly created tag name

The "it" in "it was showing the tag name" is your prompt, right?  It
looks like the one provided by contrib/completion/git-prompt.sh.  A
version of that script is at [1].  It is quite long and configurable,
but it shows the output of "git describe --tags --exact-match HEAD" if
the HEAD commit is tagged or otherwise its abbreviated hash.

Which matches what you see: both your (detached) HEAD and your tag
point to commit ca35d53, so the tag name is shown.  The idea being
that a symbolic name is more meaningful to a reader than a hash value.
You just gave that commit a name, so the prompt script uses it.

> What happened instead? (Actual behavior)
> It showed the newly created tag name at the place of the HEAD Point:
> vaibh@VaibhavNaik MINGW64 /d/E Drive/MyCodes/Git-Testing ((1.1.2))
>
> What's different between what you expected and what actually happened?
> It should show the the detached HEAD commit id at the place of HEAD Point and not the newly created tag name
> I proved this wrong by doing a commit which didn't got committed to any branch or tag but it moved the HEAD:
>
> vaibh@VaibhavNaik MINGW64 /d/E Drive/MyCodes/Git-Testing ((newt))
> $ git commit --allow-empty -m "want to commit at any cost"
> [detached HEAD caad8ef] want to commit at any cost
>
> vaibh@VaibhavNaik MINGW64 /d/E Drive/MyCodes/Git-Testing ((caad8ef...))
> $ git status
> HEAD detached from ca35d53
> nothing to commit, working tree clean
>
> it showed the tag name in the brackets

git commit moves the HEAD, even if it is detached, true.  And you
created an untagged commit, so the prompt fell back to showing its
abbreviated hash.  I don't see how this proves that showing the tag name
for a tagged commit is somehow wrong.  What would be the benefit?

I can understand that prompts are a matter of personal preference,
though.  You can adjust what the prompt shows by setting the environment
variable GIT_PS1_DESCRIBE_STYLE.  See git-prompt.sh lines 91 ff. for a
description of the recognized values (or check your local version of the
script).  There's no option to ignore tags and always show the
abbreviated hash, though, yet.

To summarize: I don't think the behavior you saw indicates a bug, and
the prompt style you'd like to use doesn't seem to be implemented, yet.
It shouldn't be difficult to add, but I don't quite get why you'd want
to shun tags.

>
> Anything else you want to add:
>
> Please review the rest of the bug report below.
> You can delete any lines you don't wish to share.
>
>
> [System Info]
> git version:
> git version 2.43.0.windows.1
> cpu: x86_64
> built from commit: 4b968f3ea3b32a7bc50846bab49f3f381841d297
> sizeof-long: 4
> sizeof-size_t: 8
> shell-path: /bin/sh
> feature: fsmonitor--daemon
> uname: Windows 10.0 22631
> compiler info: gnuc: 13.2
> libc info: no libc information available
> $SHELL (typically, interactive shell): C:\Program Files\Git\usr\bin\bash.exe
>
>
> [Enabled Hooks]


^ permalink raw reply

* [PATCH 2/2] pack-objects: enable multi-pack reuse via `feature.experimental`
From: Taylor Blau @ 2024-01-16 19:03 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Junio C Hamano
In-Reply-To: <cover.1705431816.git.me@ttaylorr.com>

Now that multi-pack reuse is supported, enable it via the
feature.experimental configuration in addition to the classic
`pack.allowPackReuse`.

This will allow more users to experiment with the new behavior who might
not otherwise be aware of the existing `pack.allowPackReuse`
configuration option.

The enum with values NO_PACK_REUSE, SINGLE_PACK_REUSE, and
MULTI_PACK_REUSE is defined statically in builtin/pack-objects.c's
compilation unit. We could hoist that enum into a scope visible from the
repository_settings struct, and then use that enum value in
pack-objects. Instead, define a single int that indicates what
pack-objects's default value should be to avoid additional unnecessary
code movement.

Though `feature.experimental` implies `pack.allowPackReuse=multi`, this
can still be overridden by explicitly setting the latter configuration
to either "single" or "false". Tests covering all of these cases are
showin t5332.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/config/feature.txt |  3 +++
 builtin/pack-objects.c           |  2 ++
 repo-settings.c                  |  1 +
 repository.h                     |  1 +
 t/t5332-multi-pack-reuse.sh      | 16 ++++++++++++++++
 5 files changed, 23 insertions(+)

diff --git a/Documentation/config/feature.txt b/Documentation/config/feature.txt
index bf9546fca4..f061b64b74 100644
--- a/Documentation/config/feature.txt
+++ b/Documentation/config/feature.txt
@@ -17,6 +17,9 @@ skipping more commits at a time, reducing the number of round trips.
 +
 * `pack.useBitmapBoundaryTraversal=true` may improve bitmap traversal times by
 walking fewer objects.
++
+* `pack.allowPackReuse=multi` may improve the time it takes to create a pack by
+reusing objects from multiple packs instead of just one.
 
 feature.manyFiles::
 	Enable config options that optimize for repos with many files in the
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index d8c2128a97..329aeac804 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -4396,6 +4396,8 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 		prepare_repo_settings(the_repository);
 		if (sparse < 0)
 			sparse = the_repository->settings.pack_use_sparse;
+		if (the_repository->settings.pack_use_multi_pack_reuse)
+			allow_pack_reuse = MULTI_PACK_REUSE;
 	}
 
 	reset_pack_idx_option(&pack_idx_opts);
diff --git a/repo-settings.c b/repo-settings.c
index 30cd478762..a0b590bc6c 100644
--- a/repo-settings.c
+++ b/repo-settings.c
@@ -43,6 +43,7 @@ void prepare_repo_settings(struct repository *r)
 	if (experimental) {
 		r->settings.fetch_negotiation_algorithm = FETCH_NEGOTIATION_SKIPPING;
 		r->settings.pack_use_bitmap_boundary_traversal = 1;
+		r->settings.pack_use_multi_pack_reuse = 1;
 	}
 	if (manyfiles) {
 		r->settings.index_version = 4;
diff --git a/repository.h b/repository.h
index 5f18486f64..b92881b0a3 100644
--- a/repository.h
+++ b/repository.h
@@ -36,6 +36,7 @@ struct repo_settings {
 	int sparse_index;
 	int pack_read_reverse_index;
 	int pack_use_bitmap_boundary_traversal;
+	int pack_use_multi_pack_reuse;
 
 	/*
 	 * Does this repository have core.useReplaceRefs=true (on by
diff --git a/t/t5332-multi-pack-reuse.sh b/t/t5332-multi-pack-reuse.sh
index b53e821bc0..ccc8735db6 100755
--- a/t/t5332-multi-pack-reuse.sh
+++ b/t/t5332-multi-pack-reuse.sh
@@ -57,6 +57,22 @@ test_expect_success 'preferred pack is reused for single-pack reuse' '
 	test_pack_objects_reused_all 3 1
 '
 
+test_expect_success 'multi-pack reuse is disabled by default' '
+	test_pack_objects_reused_all 3 1
+'
+
+test_expect_success 'feature.experimental implies multi-pack reuse' '
+	test_config feature.experimental true &&
+
+	test_pack_objects_reused_all 6 2
+'
+
+test_expect_success 'multi-pack reuse can be disabled with feature.experimental' '
+	test_config feature.experimental true &&
+	test_config pack.allowPackReuse single &&
+
+	test_pack_objects_reused_all 3 1
+'
 
 test_expect_success 'enable multi-pack reuse' '
 	git config pack.allowPackReuse multi
-- 
2.43.0.334.gd4dbce1db5.dirty

^ permalink raw reply related

* [PATCH 1/2] t5332-multi-pack-reuse.sh: extract pack-objects helper functions
From: Taylor Blau @ 2024-01-16 19:03 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Junio C Hamano
In-Reply-To: <cover.1705431816.git.me@ttaylorr.com>

Most of the tests in t5332 perform some setup before repeating a common
refrain that looks like:

    : >trace2.txt &&
    GIT_TRACE2_EVENT="$PWD/trace2.txt" \
      git pack-objects --stdout --revs --all >/dev/null &&

    test_pack_reused $objects_nr <trace2.txt &&
    test_packs_reused $packs_nr <trace2.txt

The next commit will add more tests which repeat the above refrain.
Avoid duplicating this invocation even further and prepare for the
following commit by wrapping the above in a helper function called
`test_pack_objects_reused_all()`.

Introduce another similar function `test_pack_objects_reused`, which
expects to read a list of revisions over stdin for tests which need more
fine-grained control of the contents of the pack they generate.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t5332-multi-pack-reuse.sh | 70 +++++++++++++++----------------------
 1 file changed, 28 insertions(+), 42 deletions(-)

diff --git a/t/t5332-multi-pack-reuse.sh b/t/t5332-multi-pack-reuse.sh
index 2ba788b042..b53e821bc0 100755
--- a/t/t5332-multi-pack-reuse.sh
+++ b/t/t5332-multi-pack-reuse.sh
@@ -23,6 +23,26 @@ pack_position () {
 	grep "$1" objects | cut -d" " -f1
 }
 
+# test_pack_objects_reused_all <pack-reused> <packs-reused>
+test_pack_objects_reused_all () {
+	: >trace2.txt &&
+	GIT_TRACE2_EVENT="$PWD/trace2.txt" \
+		git pack-objects --stdout --revs --all >/dev/null &&
+
+	test_pack_reused "$1" <trace2.txt &&
+	test_packs_reused "$2" <trace2.txt
+}
+
+# test_pack_objects_reused <pack-reused> <packs-reused>
+test_pack_objects_reused () {
+	: >trace2.txt &&
+	GIT_TRACE2_EVENT="$PWD/trace2.txt" \
+		git pack-objects --stdout --revs >/dev/null &&
+
+	test_pack_reused "$1" <trace2.txt &&
+	test_packs_reused "$2" <trace2.txt
+}
+
 test_expect_success 'preferred pack is reused for single-pack reuse' '
 	test_config pack.allowPackReuse single &&
 
@@ -34,14 +54,10 @@ test_expect_success 'preferred pack is reused for single-pack reuse' '
 
 	git multi-pack-index write --bitmap &&
 
-	: >trace2.txt &&
-	GIT_TRACE2_EVENT="$PWD/trace2.txt" \
-		git pack-objects --stdout --revs --all >/dev/null &&
-
-	test_pack_reused 3 <trace2.txt &&
-	test_packs_reused 1 <trace2.txt
+	test_pack_objects_reused_all 3 1
 '
 
+
 test_expect_success 'enable multi-pack reuse' '
 	git config pack.allowPackReuse multi
 '
@@ -57,21 +73,11 @@ test_expect_success 'reuse all objects from subset of bitmapped packs' '
 	^$(git rev-parse A)
 	EOF
 
-	: >trace2.txt &&
-	GIT_TRACE2_EVENT="$PWD/trace2.txt" \
-		git pack-objects --stdout --revs <in >/dev/null &&
-
-	test_pack_reused 6 <trace2.txt &&
-	test_packs_reused 2 <trace2.txt
+	test_pack_objects_reused 6 2 <in
 '
 
 test_expect_success 'reuse all objects from all packs' '
-	: >trace2.txt &&
-	GIT_TRACE2_EVENT="$PWD/trace2.txt" \
-		git pack-objects --stdout --revs --all >/dev/null &&
-
-	test_pack_reused 9 <trace2.txt &&
-	test_packs_reused 3 <trace2.txt
+	test_pack_objects_reused_all 9 3
 '
 
 test_expect_success 'reuse objects from first pack with middle gap' '
@@ -104,12 +110,7 @@ test_expect_success 'reuse objects from first pack with middle gap' '
 	^$(git rev-parse D)
 	EOF
 
-	: >trace2.txt &&
-	GIT_TRACE2_EVENT="$PWD/trace2.txt" \
-		git pack-objects --stdout --delta-base-offset --revs <in >/dev/null &&
-
-	test_pack_reused 3 <trace2.txt &&
-	test_packs_reused 1 <trace2.txt
+	test_pack_objects_reused 3 1 <in
 '
 
 test_expect_success 'reuse objects from middle pack with middle gap' '
@@ -125,12 +126,7 @@ test_expect_success 'reuse objects from middle pack with middle gap' '
 	^$(git rev-parse D)
 	EOF
 
-	: >trace2.txt &&
-	GIT_TRACE2_EVENT="$PWD/trace2.txt" \
-		git pack-objects --stdout --delta-base-offset --revs <in >/dev/null &&
-
-	test_pack_reused 3 <trace2.txt &&
-	test_packs_reused 1 <trace2.txt
+	test_pack_objects_reused 3 1 <in
 '
 
 test_expect_success 'omit delta with uninteresting base (same pack)' '
@@ -160,10 +156,6 @@ test_expect_success 'omit delta with uninteresting base (same pack)' '
 	^$base
 	EOF
 
-	: >trace2.txt &&
-	GIT_TRACE2_EVENT="$PWD/trace2.txt" \
-		git pack-objects --stdout --delta-base-offset --revs <in >/dev/null &&
-
 	# We can only reuse the 3 objects corresponding to "other" from
 	# the latest pack.
 	#
@@ -175,8 +167,7 @@ test_expect_success 'omit delta with uninteresting base (same pack)' '
 	# The remaining objects from the other pack are similarly not
 	# reused because their objects are on the uninteresting side of
 	# the query.
-	test_pack_reused 3 <trace2.txt &&
-	test_packs_reused 1 <trace2.txt
+	test_pack_objects_reused 3 1 <in
 '
 
 test_expect_success 'omit delta from uninteresting base (cross pack)' '
@@ -189,15 +180,10 @@ test_expect_success 'omit delta from uninteresting base (cross pack)' '
 
 	git multi-pack-index write --bitmap --preferred-pack="pack-$P.idx" &&
 
-	: >trace2.txt &&
-	GIT_TRACE2_EVENT="$PWD/trace2.txt" \
-		git pack-objects --stdout --delta-base-offset --all >/dev/null &&
-
 	packs_nr="$(find $packdir -type f -name "pack-*.pack" | wc -l)" &&
 	objects_nr="$(git rev-list --count --all --objects)" &&
 
-	test_pack_reused $(($objects_nr - 1)) <trace2.txt &&
-	test_packs_reused $packs_nr <trace2.txt
+	test_pack_objects_reused_all $(($objects_nr - 1)) $packs_nr
 '
 
 test_done
-- 
2.43.0.334.gd4dbce1db5.dirty


^ permalink raw reply related

* [PATCH 0/2] pack-objects: enable multi-pack reuse via feature.experimental
From: Taylor Blau @ 2024-01-16 19:03 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Junio C Hamano

This short series teaches pack-objects to perform multi-pack reuse by
way of the feature.experimental configuration. The hope is that this
will expose this new feature to more users who might not otherwise be
aware of lesser-known configuration options for pack-objects's
internals.

Thanks in advance for your review!

Taylor Blau (2):
  t5332-multi-pack-reuse.sh: extract pack-objects helper functions
  pack-objects: enable multi-pack reuse via `feature.experimental`

 Documentation/config/feature.txt |  3 ++
 builtin/pack-objects.c           |  2 +
 repo-settings.c                  |  1 +
 repository.h                     |  1 +
 t/t5332-multi-pack-reuse.sh      | 84 ++++++++++++++++----------------
 5 files changed, 50 insertions(+), 41 deletions(-)


base-commit: d4dbce1db5cd227a57074bcfc7ec9f0655961bba
-- 
2.43.0.334.gd4dbce1db5.dirty

^ permalink raw reply

* Re: [PATCH] rebase: Fix documentation about used shell in -x
From: Junio C Hamano @ 2024-01-16 17:58 UTC (permalink / raw)
  To: Jeff King; +Cc: Nikolay Borisov, git
In-Reply-To: <20240116143757.GA2119690@coredump.intra.peff.net>

Jeff King <peff@peff.net> writes:

> Maybe it makes sense to just say:
>
>   ...in a shell (the default one, usually /bin/sh), ...
>
> It might even make sense to just drop the parenthetical phrase entirely.
> Git executes lots of things using a shell, and it is always "the default
> one", but we don't bother saying so in most places.

Thanks for your archaeological skill, as always.  I like the
deliberate "vagueness" of the above.


^ permalink raw reply

* Re: [PATCH v2 0/6] worktree: initialize refdb via ref backends
From: Junio C Hamano @ 2024-01-16 17:53 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: Patrick Steinhardt, git, Eric Sunshine
In-Reply-To: <CAOLa=ZS3aVP=h9iC2i_4Hbx_-OSdqJ8S6xYT65CPyXd+9_=4Nw@mail.gmail.com>

Karthik Nayak <karthik.188@gmail.com> writes:

> Patrick Steinhardt <ps@pks.im> writes:
>
>> Hi,
>>
>> this is the second version of my patch series that refactors the
>> initialization of worktree refdbs to use the refs API.
>>
>> Changes compared to v1:
>>
>>   - Improved commit messages.
>>
>>   - This series is now based on `ps/refstorage-extension`, 1b2234079b
>>     (t9500: write "extensions.refstorage" into config, 2023-12-29).
>>     While there is no functional dependency between those series,
>>     merging both topics causes a merge conflict.
>>
>
> This looks good to me now. Thanks Patrick!

Thanks, both.

^ permalink raw reply

* Re: Custom merge drivers: accessing the pathnames and revisions of the files being merged
From: Junio C Hamano @ 2024-01-16 17:51 UTC (permalink / raw)
  To: Antonin Delpeuch; +Cc: git
In-Reply-To: <8bb5e41e-4db9-4527-8492-3aca6a0f40bf@delpeuch.eu>

Antonin Delpeuch <antonin@delpeuch.eu> writes:

> So, I wonder: would people be open to exposing additional parameters
> to merge drivers? For instance we could add parameters "%X", "%Y" "%Z"
> to expose those "revision:pathname" strings for each part. (I think
> colons cannot be part of revision names, so this can be parsed
> unambiguously by the custom merge driver to recover the revision and
> pathname independently, if needed.)

The last time this changed was in ef45bb1f (ll-merge: pass the
original path to external drivers, 2015-06-04).  

I may not necessarily endorse the choice of XYZ [*], but I do not
fundamentally oppose to such a new feature existing.  The mechanism
to define a custom merge driver is designed to be future-proof in
that only the parameters it uses is given as the value of
merge.*.driver variable, so it is not a problem that existing merge
drivers will not know what to do with "pathname in the common
ancestor", "pathname on our side", and "pathname on their side".


[Footnote]

 * Whatever letters we choose, they must have mnemonic value that
   signals two of them are the both sides of the merge that are
   equal participants, and the other one is the old-file, their
   common ancestor that plays quite a different from these two in
   the merge.  I cannot tell which one of the XYZ would be the more
   special than other two, which is the primary reason why I do not
   know if XYZ is a good idea.



^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox