Git development
 help / color / mirror / Atom feed
* [PATCH 04/20] commit-graph: check size of oid fanout chunk
From: Jeff King @ 2023-10-09 20:59 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau
In-Reply-To: <20231009205544.GA3281950@coredump.intra.peff.net>

We load the oid fanout chunk with pair_chunk(), which means we never see
the size of the chunk. We just assume the on-disk file uses the
appropriate size, and if it's too small we'll access random memory.

It's easy to check this up-front; the fanout always consists of 256
uint32's, since it is a fanout of the first byte of the hash pointing
into the oid index. These parameters can't be changed without
introducing a new chunk type.

This matches the similar check in the midx OIDF chunk (but note that
rather than checking for the error immediately, the graph code just
leaves parts of the struct NULL and checks for required fields later).

Signed-off-by: Jeff King <peff@peff.net>
---
 commit-graph.c          | 13 +++++++++++--
 t/t5318-commit-graph.sh | 26 ++++++++++++++++++++++++++
 2 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index a689a55b79..9b3b01da61 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -305,6 +305,16 @@ static int verify_commit_graph_lite(struct commit_graph *g)
 	return 0;
 }
 
+static int graph_read_oid_fanout(const unsigned char *chunk_start,
+				 size_t chunk_size, void *data)
+{
+	struct commit_graph *g = data;
+	if (chunk_size != 256 * sizeof(uint32_t))
+		return error("commit-graph oid fanout chunk is wrong size");
+	g->chunk_oid_fanout = (const uint32_t *)chunk_start;
+	return 0;
+}
+
 static int graph_read_oid_lookup(const unsigned char *chunk_start,
 				 size_t chunk_size, void *data)
 {
@@ -394,8 +404,7 @@ struct commit_graph *parse_commit_graph(struct repo_settings *s,
 				   GRAPH_HEADER_SIZE, graph->num_chunks))
 		goto free_and_return;
 
-	pair_chunk_unsafe(cf, GRAPH_CHUNKID_OIDFANOUT,
-		   (const unsigned char **)&graph->chunk_oid_fanout);
+	read_chunk(cf, GRAPH_CHUNKID_OIDFANOUT, graph_read_oid_fanout, graph);
 	read_chunk(cf, GRAPH_CHUNKID_OIDLOOKUP, graph_read_oid_lookup, graph);
 	pair_chunk_unsafe(cf, GRAPH_CHUNKID_DATA, &graph->chunk_commit_data);
 	pair_chunk_unsafe(cf, GRAPH_CHUNKID_EXTRAEDGES, &graph->chunk_extra_edges);
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index ba65f17dd9..d25bea3ec5 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -2,6 +2,7 @@
 
 test_description='commit graph'
 . ./test-lib.sh
+. "$TEST_DIRECTORY"/lib-chunk.sh
 
 GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=0
 
@@ -821,4 +822,29 @@ test_expect_success 'overflow during generation version upgrade' '
 	)
 '
 
+corrupt_chunk () {
+	graph=full/.git/objects/info/commit-graph &&
+	test_when_finished "rm -rf $graph" &&
+	git -C full commit-graph write --reachable &&
+	corrupt_chunk_file $graph "$@"
+}
+
+check_corrupt_chunk () {
+	corrupt_chunk "$@" &&
+	git -C full -c core.commitGraph=false log >expect.out &&
+	git -C full -c core.commitGraph=true log >out 2>err &&
+	test_cmp expect.out out
+}
+
+test_expect_success 'reader notices too-small oid fanout chunk' '
+	# make it big enough that the graph file is plausible,
+	# otherwise we hit an earlier check
+	check_corrupt_chunk OIDF clear $(printf "000000%02x" $(test_seq 250)) &&
+	cat >expect.err <<-\EOF &&
+	error: commit-graph oid fanout chunk is wrong size
+	error: commit-graph is missing the OID Fanout chunk
+	EOF
+	test_cmp expect.err err
+'
+
 test_done
-- 
2.42.0.884.g35e1fe1a6a


^ permalink raw reply related

* Re: [RFC PATCH] Not computing changed path filter for root commits
From: Jonathan Tan @ 2023-10-09 20:59 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Jonathan Tan, SZEDER Gábor, git
In-Reply-To: <ZSQ3s3ZiRcvQIKOa@nand.local>

Taylor Blau <me@ttaylorr.com> writes:
> This only happens when we return REV_TREE_NEW from a call to
> `rev_compare_tree(revs, p, commit, nth_parent)`. But we'll only get
> REV_TREE_NEW back if
> 
>     repo_get_commit_tree(the_repository, p);
> 
> returns NULL. But when we call rev_same_tree_as_empty(revs, p) in the
> REV_TREE_NEW case, we return early as follows:
> 
>     struct tree *t1 = repo_get_commit_tree(revs, p);
>     if (!t1)
>       return 0;
> 
> So we won't even consult the Bloom filter in that case, since t1 is NULL
> for the same reason as what caused rev_compare_tree() to return
> REV_TREE_NEW in the first place.
> 
> I am still dumbfounded by how we would ever get REV_TREE_NEW in the
> first place, but if we did, I think we would be OK here.
> 
> Thanks,
> Taylor

Ah, good point. Your patch in
https://lore.kernel.org/git/ZQnmTXUO94%2FQy8mq@nand.local/ looks good to
me, then.

^ permalink raw reply

* [PATCH 03/20] midx: stop ignoring malformed oid fanout chunk
From: Jeff King @ 2023-10-09 20:59 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau
In-Reply-To: <20231009205544.GA3281950@coredump.intra.peff.net>

When we load the oid-fanout chunk, our callback checks that its size is
reasonable and returns an error if not. However, the caller only checks
our return value against CHUNK_NOT_FOUND, so we end up ignoring the
error completely! Using a too-small fanout table means we end up
accessing random memory for the fanout and segfault.

We can fix this by checking for any non-zero return value, rather than
just CHUNK_NOT_FOUND, and adjusting our error message to cover both
cases. We could handle each error code individually, but there's not
much point for such a rare case. The extra message produced in the
callback makes it clear what is going on.

The same pattern is used in the adjacent code. Those cases are actually
OK for now because they do not use a custom callback, so the only error
they can get is CHUNK_NOT_FOUND. But let's convert them, as this is an
accident waiting to happen (especially as we convert some of them away
from pair_chunk). The error messages are more verbose, but it should be
rare for a user to see these anyway.

Signed-off-by: Jeff King <peff@peff.net>
---
 midx.c                      | 16 ++++++++--------
 t/t5319-multi-pack-index.sh | 20 +++++++++++++++++++-
 2 files changed, 27 insertions(+), 9 deletions(-)

diff --git a/midx.c b/midx.c
index 3165218ab5..21d7dd15ef 100644
--- a/midx.c
+++ b/midx.c
@@ -143,14 +143,14 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local
 				   MIDX_HEADER_SIZE, m->num_chunks))
 		goto cleanup_fail;
 
-	if (pair_chunk_unsafe(cf, MIDX_CHUNKID_PACKNAMES, &m->chunk_pack_names) == CHUNK_NOT_FOUND)
-		die(_("multi-pack-index missing required pack-name chunk"));
-	if (read_chunk(cf, MIDX_CHUNKID_OIDFANOUT, midx_read_oid_fanout, m) == CHUNK_NOT_FOUND)
-		die(_("multi-pack-index missing required OID fanout chunk"));
-	if (pair_chunk_unsafe(cf, MIDX_CHUNKID_OIDLOOKUP, &m->chunk_oid_lookup) == CHUNK_NOT_FOUND)
-		die(_("multi-pack-index missing required OID lookup chunk"));
-	if (pair_chunk_unsafe(cf, MIDX_CHUNKID_OBJECTOFFSETS, &m->chunk_object_offsets) == CHUNK_NOT_FOUND)
-		die(_("multi-pack-index missing required object offsets chunk"));
+	if (pair_chunk_unsafe(cf, MIDX_CHUNKID_PACKNAMES, &m->chunk_pack_names))
+		die(_("multi-pack-index required pack-name chunk missing or corrupted"));
+	if (read_chunk(cf, MIDX_CHUNKID_OIDFANOUT, midx_read_oid_fanout, m))
+		die(_("multi-pack-index required OID fanout chunk missing or corrupted"));
+	if (pair_chunk_unsafe(cf, MIDX_CHUNKID_OIDLOOKUP, &m->chunk_oid_lookup))
+		die(_("multi-pack-index required OID lookup chunk missing or corrupted"));
+	if (pair_chunk_unsafe(cf, MIDX_CHUNKID_OBJECTOFFSETS, &m->chunk_object_offsets))
+		die(_("multi-pack-index required object offsets chunk missing or corrupted"));
 
 	pair_chunk_unsafe(cf, MIDX_CHUNKID_LARGEOFFSETS, &m->chunk_large_offsets);
 
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 1bcc02004d..b8fe85aeba 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -2,6 +2,7 @@
 
 test_description='multi-pack-indexes'
 . ./test-lib.sh
+. "$TEST_DIRECTORY"/lib-chunk.sh
 
 GIT_TEST_MULTI_PACK_INDEX=0
 objdir=.git/objects
@@ -438,7 +439,7 @@ test_expect_success 'verify extended chunk count' '
 
 test_expect_success 'verify missing required chunk' '
 	corrupt_midx_and_verify $MIDX_BYTE_CHUNK_ID "\01" $objdir \
-		"missing required"
+		"required pack-name chunk missing"
 '
 
 test_expect_success 'verify invalid chunk offset' '
@@ -1055,4 +1056,21 @@ test_expect_success 'repack with delta islands' '
 	)
 '
 
+corrupt_chunk () {
+	midx=.git/objects/pack/multi-pack-index &&
+	test_when_finished "rm -rf $midx" &&
+	git repack -ad --write-midx &&
+	corrupt_chunk_file $midx "$@"
+}
+
+test_expect_success 'reader notices too-small oid fanout chunk' '
+	corrupt_chunk OIDF clear 00000000 &&
+	test_must_fail git log 2>err &&
+	cat >expect <<-\EOF &&
+	error: multi-pack-index OID fanout is of the wrong size
+	fatal: multi-pack-index required OID fanout chunk missing or corrupted
+	EOF
+	test_cmp expect err
+'
+
 test_done
-- 
2.42.0.884.g35e1fe1a6a


^ permalink raw reply related

* [PATCH 02/20] t: add library for munging chunk-format files
From: Jeff King @ 2023-10-09 20:58 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau
In-Reply-To: <20231009205544.GA3281950@coredump.intra.peff.net>

When testing corruption of files using the chunk format (like
commit-graphs and midx files), it's helpful to be able to modify bytes
in specific chunks. This requires being able both to read the
table-of-contents (to find the chunk to modify) but also to adjust it
(to account for size changes in the offsets of subsequent chunks).

We have some tests already which corrupt chunk files, but they have some
downsides:

  1. They are very brittle, as they manually compute the expected size
     of a particular instance of the file (e.g., see the definitions
     starting with NUM_OBJECTS in t5319).

  2. Because they rely on manual offsets and don't read the
     table-of-contents, they're limited to overwriting bytes. But there
     are many interesting corruptions that involve changing the sizes of
     chunks (especially smaller-than-expected ones).

This patch adds a perl script which makes such corruptions easy. We'll
use it in subsequent patches.

Note that we could get by with just a big "perl -e" inside the helper
function. I chose to put it in a separate script for two reasons. One,
so we don't have to worry about the extra layer of shell quoting. And
two, the script is kind of big, and running the tests with "-x" would
repeatedly dump it into the log output.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/lib-chunk.sh                    | 17 ++++++++
 t/lib-chunk/corrupt-chunk-file.pl | 66 +++++++++++++++++++++++++++++++
 2 files changed, 83 insertions(+)
 create mode 100644 t/lib-chunk.sh
 create mode 100644 t/lib-chunk/corrupt-chunk-file.pl

diff --git a/t/lib-chunk.sh b/t/lib-chunk.sh
new file mode 100644
index 0000000000..a7cd9c3c6d
--- /dev/null
+++ b/t/lib-chunk.sh
@@ -0,0 +1,17 @@
+# Shell library for working with "chunk" files (commit-graph, midx, etc).
+
+# corrupt_chunk_file <fn> <chunk> <offset> <bytes>
+#
+# Corrupt a chunk-based file (like a commit-graph) by overwriting the bytes
+# found in the chunk specified by the 4-byte <chunk> identifier. If <offset> is
+# "clear", replace the chunk entirely. Otherwise, overwrite data <offset> bytes
+# into the chunk.
+#
+# The <bytes> are interpreted as pairs of hex digits (so "000000FE" would be
+# big-endian 254).
+corrupt_chunk_file () {
+	fn=$1; shift
+	perl "$TEST_DIRECTORY"/lib-chunk/corrupt-chunk-file.pl \
+		"$@" <"$fn" >"$fn.tmp" &&
+	mv "$fn.tmp" "$fn"
+}
diff --git a/t/lib-chunk/corrupt-chunk-file.pl b/t/lib-chunk/corrupt-chunk-file.pl
new file mode 100644
index 0000000000..cd6d386fef
--- /dev/null
+++ b/t/lib-chunk/corrupt-chunk-file.pl
@@ -0,0 +1,66 @@
+#!/usr/bin/perl
+
+my ($chunk, $seek, $bytes) = @ARGV;
+$bytes =~ s/../chr(hex($&))/ge;
+
+binmode STDIN;
+binmode STDOUT;
+
+# A few helpers to read bytes, or read and copy them to the
+# output.
+sub get {
+	my $n = shift;
+	return unless $n;
+	read(STDIN, my $buf, $n)
+		or die "read error or eof: $!\n";
+	return $buf;
+}
+sub copy {
+	my $buf = get(@_);
+	print $buf;
+	return $buf;
+}
+
+# read until we find table-of-contents entry for chunk;
+# note that we cheat a bit by assuming 4-byte alignment and
+# that no ToC entry will accidentally look like a header.
+#
+# If we don't find the entry, copy() will hit EOF and exit
+# (which should cause the caller to fail the test).
+while (copy(4) ne $chunk) { }
+my $offset = unpack("Q>", copy(8));
+
+# In clear mode, our length will change. So figure out
+# the length by comparing to the offset of the next chunk, and
+# then adjust that offset (and all subsequent) ones.
+my $len;
+if ($seek eq "clear") {
+	my $id;
+	do {
+		$id = copy(4);
+		my $next = unpack("Q>", get(8));
+		if (!defined $len) {
+			$len = $next - $offset;
+		}
+		print pack("Q>", $next - $len + length($bytes));
+	} while (unpack("N", $id));
+}
+
+# and now copy up to our existing chunk data
+copy($offset - tell(STDIN));
+if ($seek eq "clear") {
+	# if clearing, skip past existing data
+	get($len);
+} else {
+	# otherwise, copy up to the requested offset,
+	# and skip past the overwritten bytes
+	copy($seek);
+	get(length($bytes));
+}
+
+# now write out the requested bytes, along
+# with any other remaining data
+print $bytes;
+while (read(STDIN, my $buf, 4096)) {
+	print $buf;
+}
-- 
2.42.0.884.g35e1fe1a6a


^ permalink raw reply related

* [PATCH 01/20] chunk-format: note that pair_chunk() is unsafe
From: Jeff King @ 2023-10-09 20:58 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau
In-Reply-To: <20231009205544.GA3281950@coredump.intra.peff.net>

The pair_chunk() function is provided as an easy helper for parsing
chunks that just want a pointer to a set of bytes. But every caller has
a hidden bug: because we return only the pointer without the matching
chunk size, the callers have no clue how many bytes they are allowed to
look at. And as a result, they may read off the end of the mmap'd data
when the on-disk file does not match their expectations.

Since chunk files are typically used for local-repository data like
commit-graph files and midx's, the security implications here are pretty
mild. The worst that can happen is that you hand somebody a corrupted
repository tarball, and running Git on it does an out-of-bounds read and
crashes. So it's worth being more defensive, but we don't need to drop
everything and fix every caller immediately.

I noticed the problem because the pair_chunk_fn() callback does not look
at its chunk_size argument, and wanted to annotate it to silence
-Wunused-parameter. We could do that now, but we'd lose the hint that
this code should be audited and fixed.

So instead, let's set ourselves up for going down that path:

  1. Provide a pair_chunk() function that does return the size, which
     prepares us for fixing these cases.

  2. Rename the existing function to pair_chunk_unsafe(). That gives us
     an easy way to grep for cases which still need to be fixed, and the
     name should cause anybody adding new calls to think twice before
     using it.

There are no callers of the "safe" version yet, but we'll add some in
subsequent patches.

Signed-off-by: Jeff King <peff@peff.net>
---
I originally wasn't sure if I'd convert every case, or if we'd have to
leave some longer-term. But as you'll see at the end, we can get rid of
pair_chunk_unsafe() in the final patch (but we need both forms to
co-exist during the series, since the intermediate state is that we have
some safe and some unsafe calls).

 chunk-format.c | 24 ++++++++++++++++++++----
 chunk-format.h | 19 +++++++++++++++++--
 commit-graph.c | 14 +++++++-------
 midx.c         | 10 +++++-----
 4 files changed, 49 insertions(+), 18 deletions(-)

diff --git a/chunk-format.c b/chunk-format.c
index 140dfa0dcc..8d910e23f6 100644
--- a/chunk-format.c
+++ b/chunk-format.c
@@ -154,20 +154,36 @@ int read_table_of_contents(struct chunkfile *cf,
 	return 0;
 }
 
+struct pair_chunk_data {
+	const unsigned char **p;
+	size_t *size;
+};
+
 static int pair_chunk_fn(const unsigned char *chunk_start,
 			 size_t chunk_size,
 			 void *data)
 {
-	const unsigned char **p = data;
-	*p = chunk_start;
+	struct pair_chunk_data *pcd = data;
+	*pcd->p = chunk_start;
+	*pcd->size = chunk_size;
 	return 0;
 }
 
 int pair_chunk(struct chunkfile *cf,
 	       uint32_t chunk_id,
-	       const unsigned char **p)
+	       const unsigned char **p,
+	       size_t *size)
+{
+	struct pair_chunk_data pcd = { .p = p, .size = size };
+	return read_chunk(cf, chunk_id, pair_chunk_fn, &pcd);
+}
+
+int pair_chunk_unsafe(struct chunkfile *cf,
+		      uint32_t chunk_id,
+		      const unsigned char **p)
 {
-	return read_chunk(cf, chunk_id, pair_chunk_fn, p);
+	size_t dummy;
+	return pair_chunk(cf, chunk_id, p, &dummy);
 }
 
 int read_chunk(struct chunkfile *cf,
diff --git a/chunk-format.h b/chunk-format.h
index c7794e84ad..8dce7967f4 100644
--- a/chunk-format.h
+++ b/chunk-format.h
@@ -43,13 +43,28 @@ int read_table_of_contents(struct chunkfile *cf,
 /*
  * Find 'chunk_id' in the given chunkfile and assign the
  * given pointer to the position in the mmap'd file where
- * that chunk begins.
+ * that chunk begins. Likewise the "size" parameter is filled
+ * with the size of the chunk.
  *
  * Returns CHUNK_NOT_FOUND if the chunk does not exist.
  */
 int pair_chunk(struct chunkfile *cf,
 	       uint32_t chunk_id,
-	       const unsigned char **p);
+	       const unsigned char **p,
+	       size_t *size);
+
+/*
+ * Unsafe version of pair_chunk; it does not return the size,
+ * meaning that the caller cannot possibly be careful about
+ * reading out of bounds from the mapped memory.
+ *
+ * No new callers should use this function, and old callers should
+ * be audited and migrated over to using the regular pair_chunk()
+ * function.
+ */
+int pair_chunk_unsafe(struct chunkfile *cf,
+		      uint32_t chunk_id,
+		      const unsigned char **p);
 
 typedef int (*chunk_read_fn)(const unsigned char *chunk_start,
 			     size_t chunk_size, void *data);
diff --git a/commit-graph.c b/commit-graph.c
index 1a56efcf69..a689a55b79 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -394,25 +394,25 @@ struct commit_graph *parse_commit_graph(struct repo_settings *s,
 				   GRAPH_HEADER_SIZE, graph->num_chunks))
 		goto free_and_return;
 
-	pair_chunk(cf, GRAPH_CHUNKID_OIDFANOUT,
+	pair_chunk_unsafe(cf, GRAPH_CHUNKID_OIDFANOUT,
 		   (const unsigned char **)&graph->chunk_oid_fanout);
 	read_chunk(cf, GRAPH_CHUNKID_OIDLOOKUP, graph_read_oid_lookup, graph);
-	pair_chunk(cf, GRAPH_CHUNKID_DATA, &graph->chunk_commit_data);
-	pair_chunk(cf, GRAPH_CHUNKID_EXTRAEDGES, &graph->chunk_extra_edges);
-	pair_chunk(cf, GRAPH_CHUNKID_BASE, &graph->chunk_base_graphs);
+	pair_chunk_unsafe(cf, GRAPH_CHUNKID_DATA, &graph->chunk_commit_data);
+	pair_chunk_unsafe(cf, GRAPH_CHUNKID_EXTRAEDGES, &graph->chunk_extra_edges);
+	pair_chunk_unsafe(cf, GRAPH_CHUNKID_BASE, &graph->chunk_base_graphs);
 
 	if (s->commit_graph_generation_version >= 2) {
-		pair_chunk(cf, GRAPH_CHUNKID_GENERATION_DATA,
+		pair_chunk_unsafe(cf, GRAPH_CHUNKID_GENERATION_DATA,
 			&graph->chunk_generation_data);
-		pair_chunk(cf, GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW,
+		pair_chunk_unsafe(cf, GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW,
 			&graph->chunk_generation_data_overflow);
 
 		if (graph->chunk_generation_data)
 			graph->read_generation_data = 1;
 	}
 
 	if (s->commit_graph_read_changed_paths) {
-		pair_chunk(cf, GRAPH_CHUNKID_BLOOMINDEXES,
+		pair_chunk_unsafe(cf, GRAPH_CHUNKID_BLOOMINDEXES,
 			   &graph->chunk_bloom_indexes);
 		read_chunk(cf, GRAPH_CHUNKID_BLOOMDATA,
 			   graph_read_bloom_data, graph);
diff --git a/midx.c b/midx.c
index 931f557350..3165218ab5 100644
--- a/midx.c
+++ b/midx.c
@@ -143,19 +143,19 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local
 				   MIDX_HEADER_SIZE, m->num_chunks))
 		goto cleanup_fail;
 
-	if (pair_chunk(cf, MIDX_CHUNKID_PACKNAMES, &m->chunk_pack_names) == CHUNK_NOT_FOUND)
+	if (pair_chunk_unsafe(cf, MIDX_CHUNKID_PACKNAMES, &m->chunk_pack_names) == CHUNK_NOT_FOUND)
 		die(_("multi-pack-index missing required pack-name chunk"));
 	if (read_chunk(cf, MIDX_CHUNKID_OIDFANOUT, midx_read_oid_fanout, m) == CHUNK_NOT_FOUND)
 		die(_("multi-pack-index missing required OID fanout chunk"));
-	if (pair_chunk(cf, MIDX_CHUNKID_OIDLOOKUP, &m->chunk_oid_lookup) == CHUNK_NOT_FOUND)
+	if (pair_chunk_unsafe(cf, MIDX_CHUNKID_OIDLOOKUP, &m->chunk_oid_lookup) == CHUNK_NOT_FOUND)
 		die(_("multi-pack-index missing required OID lookup chunk"));
-	if (pair_chunk(cf, MIDX_CHUNKID_OBJECTOFFSETS, &m->chunk_object_offsets) == CHUNK_NOT_FOUND)
+	if (pair_chunk_unsafe(cf, MIDX_CHUNKID_OBJECTOFFSETS, &m->chunk_object_offsets) == CHUNK_NOT_FOUND)
 		die(_("multi-pack-index missing required object offsets chunk"));
 
-	pair_chunk(cf, MIDX_CHUNKID_LARGEOFFSETS, &m->chunk_large_offsets);
+	pair_chunk_unsafe(cf, MIDX_CHUNKID_LARGEOFFSETS, &m->chunk_large_offsets);
 
 	if (git_env_bool("GIT_TEST_MIDX_READ_RIDX", 1))
-		pair_chunk(cf, MIDX_CHUNKID_REVINDEX, &m->chunk_revindex);
+		pair_chunk_unsafe(cf, MIDX_CHUNKID_REVINDEX, &m->chunk_revindex);
 
 	m->num_objects = ntohl(m->chunk_oid_fanout[255]);
 
-- 
2.42.0.884.g35e1fe1a6a


^ permalink raw reply related

* [PATCH 0/20] bounds-checks for chunk-based files
From: Jeff King @ 2023-10-09 20:55 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau

As part of my -Wunused-parameter series, I noticed that a few callbacks
used with the chunk-format API ignored the "chunk_size" parameter. I had
initially just annotated these with UNUSED under the usual "well, not
all callbacks need all parameters" logic.

But if you think about it, a chunk callback that does not look at the
chunk size _must_ be buggy, as there is no way it could ensure that it
does not read past the end of the chunk. In a well-formed file this
isn't a problem, since most chunks have expected sizes (e.g., a
commit-graph has a fixed-size commit-data record for every commit
mentioned in its index chunk). But it is very easy to get out-of-bounds
reads for files that are not well-formed.

I think the security implications here are pretty minor. The only files
which use the chunk format are multi-pack-index and commit-graph,
neither of which we'd expect to receive over the network (so you'd need
to access an untrusted tarball, etc, to even see a malicious file). And
we'd never try to write to this memory (they're read-only mmaps of the
files). So your worst case is probably an unexpected out-of-bounds read
and segfault, on a file that is hard to put in front of the victim.

But I think it's still worth fixing. The extra checks aren't very
expensive, and let us handle bugs or corruption more gracefully, as
well.

It's a lot of patches because there are a lot of chunk types. ;) But
each one is hopefully pretty straightforward in isolation. I tried to
group similar chunks together (e.g., commit-graph and midx both have
OIDF and OIDL chunks), but otherwise just fixed the midx chunks in the
order they appear in the code, followed by the same for commit-graph.

  [01/20]: chunk-format: note that pair_chunk() is unsafe
  [02/20]: t: add library for munging chunk-format files
  [03/20]: midx: stop ignoring malformed oid fanout chunk
  [04/20]: commit-graph: check size of oid fanout chunk
  [05/20]: midx: check size of oid lookup chunk
  [06/20]: commit-graph: check consistency of fanout table
  [07/20]: midx: check size of pack names chunk
  [08/20]: midx: enforce chunk alignment on reading
  [09/20]: midx: check size of object offset chunk
  [10/20]: midx: bounds-check large offset chunk
  [11/20]: midx: check size of revindex chunk
  [12/20]: commit-graph: check size of commit data chunk
  [13/20]: commit-graph: detect out-of-bounds extra-edges pointers
  [14/20]: commit-graph: bounds-check base graphs chunk
  [15/20]: commit-graph: check size of generations chunk
  [16/20]: commit-graph: bounds-check generation overflow chunk
  [17/20]: commit-graph: check bounds when accessing BDAT chunk
  [18/20]: commit-graph: check bounds when accessing BIDX chunk
  [19/20]: commit-graph: detect out-of-order BIDX offsets
  [20/20]: chunk-format: drop pair_chunk_unsafe()

 bloom.c                            |  34 +++++++++
 chunk-format.c                     |  24 ++++--
 chunk-format.h                     |   9 ++-
 commit-graph.c                     | 119 ++++++++++++++++++++++++-----
 commit-graph.h                     |   4 +
 midx.c                             |  68 +++++++++++++----
 midx.h                             |   3 +
 pack-revindex.c                    |  13 +++-
 t/lib-chunk.sh                     |  17 +++++
 t/lib-chunk/corrupt-chunk-file.pl  |  66 ++++++++++++++++
 t/t4216-log-bloom.sh               |  50 ++++++++++++
 t/t5318-commit-graph.sh            |  76 +++++++++++++++++-
 t/t5319-multi-pack-index.sh        | 102 ++++++++++++++++++++++++-
 t/t5324-split-commit-graph.sh      |  20 ++++-
 t/t5328-commit-graph-64bit-time.sh |  10 +++
 15 files changed, 568 insertions(+), 47 deletions(-)
 create mode 100644 t/lib-chunk.sh
 create mode 100644 t/lib-chunk/corrupt-chunk-file.pl

-Peff

^ permalink raw reply

* Re: gpg.ssh.defaultKeyCommand docs bug?
From: Jeff King @ 2023-10-09 20:43 UTC (permalink / raw)
  To: matthew sporleder; +Cc: Fabian Stelzer, Git Mailing List
In-Reply-To: <CAHKF-AvUxH1Ar3Xijjb4_8N+_kssPHZVHqQSAE9kDGRfTYHyxw@mail.gmail.com>

[+cc Fabian, who wrote this code]

On Fri, Oct 06, 2023 at 01:14:49PM -0400, matthew sporleder wrote:

> https://git-scm.com/docs/git-config#Documentation/git-config.txt-gpgsshdefaultKeyCommand
> 
> This command that will be run when user.signingkey is not set and a
> ssh signature is requested. On successful exit a valid ssh public key
> prefixed with key:: is expected in the first line of its output. This
> allows for a script doing a dynamic lookup of the correct public key
> when it is impractical to statically configure user.signingKey. For
> example when keys or SSH Certificates are rotated frequently or
> selection of the right key depends on external factors unknown to git.
> 
> ---
> 
> The command does not actually work (for me, git version 2.42.0) with
> key:: prefixed.
> 
> It only works if I cat the public key as-is.
> 
> I only figured this out because the docs previously said it took the
> format of ssh-add -L, which also doesn't not contain key::.
> 
> I am using this script for my "dynamic" key discovery:
> #!/bin/sh
> f=$(ssh -G $(git remote get-url $(git remote|head -1)|awk -F':' '{
> print $1 }') |grep -E '^identityfile'|sed 's#^identityfile ##g')
> cat $(eval realpath ${f}.pub)

I'm not very familiar with this part of Git, but looking at the code
which parses the output of gpg.ssh.defaultKeyCommand, it splits it by
line and then calls is_literal_ssh_key() on it, which is:

  static int is_literal_ssh_key(const char *string, const char **key)
  {
          if (skip_prefix(string, "key::", key))
                  return 1;
          if (starts_with(string, "ssh-")) {
                  *key = string;
                  return 1;
          }
          return 0;
  }

So your script works because the pub file starts with "ssh-rsa" or
similar (and so would "ssh-add -L" output).

The user.signingKey docs say:

  For backward compatibility, a raw key which begins with "ssh-", such
  as "ssh-rsa XXXXXX identifier", is treated as "key::ssh-rsa XXXXXX
  identifier", but this form is deprecated; use the key:: form instead.

From reading the commit messages here, I guess this is about supporting
non-ssh key types (e.g., my TPM-based key is ecdsa-sha2-nistp256 in the
"ssh-add -L" output). But I'm not sure who is supposed to be put "key::"
there.

You said it "does not actually work" with "key::" prefixed. What
happens? In the signing code we make a similar call to
is_literal_ssh_key() that wills trip off the "key::" prefix, so I'd
expect it work. But I could also believe there is a bug. :)

-Peff

^ permalink raw reply

* Re: [PATCH 7/7] builtin/merge-tree.c: implement support for `--write-pack`
From: Jeff King @ 2023-10-09 20:21 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Elijah Newren, Junio C Hamano, git, Eric W. Biederman
In-Reply-To: <ZSNZZrWyCqRH+0Bd@nand.local>

On Sun, Oct 08, 2023 at 09:37:42PM -0400, Taylor Blau wrote:

> Very interesting, thanks for running (and documenting!) this experiment.
> I'm mostly with you that it probably doesn't make a huge difference in
> practice here.
> 
> One thing that I'm not entirely clear on is how we'd treat objects that
> could be good delta candidates for each other between two packs. For
> instance, if I write a tree corresponding to the merge between two
> branches, it's likely that the resulting tree would be a good delta
> candidate against either of the trees at the tips of those two refs.
> 
> But we won't pack those trees (the ones at the tips of the refs) in the
> same pack as the tree containing their merge. If we later on tried to
> repack, would we evaluate the tip trees as possible delta candidates
> against the merged tree? Or would we look at the merged tree, realize it
> isn't delta'd with anything, and then not attempt to find any
> candidates?

When we repack (either all-into-one, or in a geometric roll-up), we
should consider those trees as candidates. The only deltas we don't
consider are:

  - if something is already a delta in a pack, then we will usually
    reuse that delta verbatim (so you might get fooled by a mediocre
    delta and not go to the trouble to search again. But I don't think
    that applies here; there is no other tree in your new pack to make
    such a mediocre delta from, and anyway you are skipping deltas
    entirely)

  - if two objects are in the same pack but there's no delta
    relationship, the try_delta() heuristics will skip them immediately
    (under the assumption that we tried during the last repack and
    didn't find anything good).

So yes, if you had a big old pack with the original trees, and then a
new pack with the merge result, we should try to delta the new merge
result tree against the others, just as we would if it were loose.

> > I was going to suggest the same thing. ;) Unfortunately it's a bit
> > tricky to do as we have no room in the file format for an optional flag.
> > You'd have to add a ".mediocre-delta" file or something.
> 
> Yeah, I figured that we'd add a new ".baddeltas" file or something. (As
> an aside, we probably should have an optional flags section in the .pack
> format, since we seem to have a lot of optional pack extensions: .rev,
> .bitmap, .keep, .promisor, etc.)

Yes, though since packv2 is the on-the-wire format, it's very hard to
change now. It might be easier to put an annotation into the .idx file.

-Peff

^ permalink raw reply

* Re: [PATCH] doc/cat-file: make synopsis and description less confusing
From: Junio C Hamano @ 2023-10-09 20:02 UTC (permalink / raw)
  To: Jeff King; +Cc: Štěpán Němec, git, avarab
In-Reply-To: <20231009183352.GA3270793@coredump.intra.peff.net>

Jeff King <peff@peff.net> writes:

> On Mon, Oct 09, 2023 at 07:56:04PM +0200, Štěpán Němec wrote:
>
>> > Yup, that is a good suggestion. Do you want to wrap all of this
>> > discussion up as a patch?
>> 
>> Certainly, here it is.  I just wanted to hash out some of the details
>> first and found the plain text more fit for that purpose.
>
> This looks good to me. Thanks for taking the time to collect and fix all
> of the issues.
>
> -Peff

Thanks, both.  Will take a look.

^ permalink raw reply

* Re: [PATCH v4 0/3] diff-merges: introduce '--dd' option
From: Junio C Hamano @ 2023-10-09 20:02 UTC (permalink / raw)
  To: Sergey Organov; +Cc: git, Elijah Newren
In-Reply-To: <20231009160535.236523-1-sorganov@gmail.com>

Sergey Organov <sorganov@gmail.com> writes:

> Updates in v4:
>
>   * Removed "(which see)" reference from documentation that caused
>     confusion.
>
>   * Removed explanation why it's --dd and not simply -d from commit
>     message.
>
>   * Refined --remerge-diff short description according to Junio and
>     Elijah comments.
>
>   * Added explanation of --dd purpose.
>
>   * Fixed style and syntax of "on,m::" description.

Will replace.  Thanks.


^ permalink raw reply

* Re: [PATCH 07/15] commit-graph: new filter ver. that fixes murmur3
From: Junio C Hamano @ 2023-10-09 19:52 UTC (permalink / raw)
  To: Taylor Blau; +Cc: SZEDER Gábor, Jonathan Tan, git, Jeff King
In-Reply-To: <ZSRVJ3iASRaDGc80@nand.local>

Taylor Blau <me@ttaylorr.com> writes:

> However, I am not entirely sure I agree with you that this is a "new"
> issue. At least in the sense that Git (on 'master') does not currently
> know how to deal with Bloom filters that have different settings across
> commit-graph layers.
>
> IOW, you could produce this problem today using the test you wrote in
> <20201015132147.GB24954@szeder.dev> using different values of the
> GIT_BLOOM_SETTINGS environment variables as a proxy for different values
> of the commitGraph.changedPathsVersion configuration variable introduced
> in this series.
>
> So I think that this series makes it easier to fall into that trap, but
> the trap itself is not new. I think a reasonable stopgap (which IIUC you
> have suggested earlier) is to prevent writing a new commit-graph layer
> with a different hash version than the previous layer.

What we probably want more urgently than that stopgap is to perhaps
teach the code pretend as if commit-graph did not exist when we
detect multiple layers use different hash versions (or perhaps only
use the base layer and ignore the rest as an anti-pessimization), to
protect correctness first, no?



^ permalink raw reply

* Re: [PATCH 07/15] commit-graph: new filter ver. that fixes murmur3
From: Taylor Blau @ 2023-10-09 19:31 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Jonathan Tan, git, Junio C Hamano, Jeff King
In-Reply-To: <ZSRD0tK3bk67aDw4@nand.local>

On Mon, Oct 09, 2023 at 02:17:54PM -0400, Taylor Blau wrote:
> On Sun, Oct 08, 2023 at 04:35:23PM +0200, SZEDER Gábor wrote:
> > > Hmm. I am confused -- are you saying that this series breaks existing
> > > functionality, or merely does not patch an existing breakage? I *think*
> > > that it's the latter,
> >
> > It's neither: the new functionality added in this series is broken.
> >
> > > since this test case fails identically on master,
> > > but I am not sure.
> >
> > Not sure what test you are referring to.  My test demonstrating the
> > breakage succeeds when adaped to master, because master doesn't
> > understand the commitgraph.changedPathsVersion=2 setting, and keeps
> > writing v1 Bloom filter chunks instead, so all commit-graphs layers
> > contain the same version.
>
> I was referring to the test you sent back in:
>
>     https://public-inbox.org/git/20201015132147.GB24954@szeder.dev/
>
> but I think that I should have been looking at the one you sent more
> recently in:
>
>     https://lore.kernel.org/git/20230830200218.GA5147@szeder.dev/

OK, I think that I now have my head wrapped around what you're saying.

However, I am not entirely sure I agree with you that this is a "new"
issue. At least in the sense that Git (on 'master') does not currently
know how to deal with Bloom filters that have different settings across
commit-graph layers.

IOW, you could produce this problem today using the test you wrote in
<20201015132147.GB24954@szeder.dev> using different values of the
GIT_BLOOM_SETTINGS environment variables as a proxy for different values
of the commitGraph.changedPathsVersion configuration variable introduced
in this series.

So I think that this series makes it easier to fall into that trap, but
the trap itself is not new. I think a reasonable stopgap (which IIUC you
have suggested earlier) is to prevent writing a new commit-graph layer
with a different hash version than the previous layer.

How does that sound?

Thanks,
Taylor

^ permalink raw reply

* Re: [PATCH 03/25] documentation: fix typos
From: Junio C Hamano @ 2023-10-09 19:01 UTC (permalink / raw)
  To: Ramsay Jones; +Cc: Elijah Newren via GitGitGadget, git, Elijah Newren
In-Reply-To: <92beac82-f0fd-452b-858f-453cdf21b71f@ramsayjones.plus.com>

Ramsay Jones <ramsay@ramsayjones.plus.com> writes:

> On 08/10/2023 07:45, Elijah Newren via GitGitGadget wrote:
> [snip]
>> diff --git a/Documentation/gitformat-pack.txt b/Documentation/gitformat-pack.txt
>> index 870e00f2982..42afb953e8c 100644
>> --- a/Documentation/gitformat-pack.txt
>> +++ b/Documentation/gitformat-pack.txt
>> @@ -17,8 +17,8 @@ $GIT_DIR/objects/pack/multi-pack-index
>>  DESCRIPTION
>>  -----------
>>  
>> -The Git pack format is now Git stores most of its primary repository
>> -data. Over the lietime af a repository loose objects (if any) and
>> +The Git pack format is how Git stores most of its primary repository
>> +data. Over the lietime of a repository loose objects (if any) and
>
> Hmm, this tyop jumped out at me while this patch
> floated past... (at least I assume it is a typo!):
>
>   s/lietime/lifetime/
>
> ATB,
> Ramsay Jones

Thanks, will squash in.

^ permalink raw reply

* Re: Outreachy Introduction and Interest in Contributing to the Git Community
From: Junio C Hamano @ 2023-10-09 18:54 UTC (permalink / raw)
  To: Doreen Wanyama; +Cc: git
In-Reply-To: <CANhBNnu6jbhNqAowBzyfcKDamXtcqCtZpY=QniTGTEJoCrwWOA@mail.gmail.com>

Doreen Wanyama <doreenwanyama20@gmail.com> writes:

> Dear Git community,
>
> I hope you are all doing well. I am writing to show my interest in
> working in the project titled move existing tests to a unit testing
> framework. This is because I have always been intrigued by the work
> the git community does and hence I am interested in being part of
> this. I have gone through the links provided about getting started on
> this. I spent yesterday evening and a better part of today trying to
> understand the resources. As of now I would like to start working on a
> microproject since I understand this is the first step. I am finding
> it difficult though to start. Someone to please help me understand how
> I should go about this or how I should go about finding my first
> microproject. Just a brief explanation will help.
> Thank you in advance.
>
>
> Best regards,
>
> Doreen Wanyama

Do you have access to the mailing list archive?

Visit

    https://lore.kernel.org/git/?q=Outreachy+d%3A20230920..

with your browser to see how other applicants are starting their
journey and being helped in finding their microprojects.  All the
suggestions they are given equally applies to you.

Welcome to the Git development community.

Have fun.

^ permalink raw reply

* Re: [PATCH v6] merge-tree: add -X strategy option
From: Jeff King @ 2023-10-09 18:52 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: phillip.wood, Izzy via GitGitGadget, git, Elijah Newren, Izzy
In-Reply-To: <xmqq4jiz3env.fsf@gitster.g>

On Mon, Oct 09, 2023 at 10:10:28AM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > I agree that struct-copying is an unusual pattern, and we'd potentially
> > run into problems with duplication. But I think it is even trickier than
> > that here. We also go on to actually _modify_ opt in this function,
> > assigning to various members (both directly, and I think the merge code
> > itself will write to opt->priv).
> >
> > So if we use a pointer (rather than struct assignment), those changes
> > will persist in the merge_options struct that was passed in. Which is
> > also weird.
> >
> > Between the two, I think using a pointer is probably the least-weird.
> > This real_merge() function is only called once, and is a static-local
> > helper for cmd_merge_tree(). So the two functions work as a single unit,
> > and munging "opt" is not a big deal.
> 
> It is called once per --stdin input to perform many merges in a row.
> The most obvious "structure to pointer to structure" conversion
> below seems to break an assertion (which is not very surprising, as
> it happens inside that --stdin loop), so I am tempted to revert the
> whole thing for now.

Oops, I totally missed the loop around the call to real_merge(). So
yeah, I think this is rather tricky.

Before Izzy's patch, real_merge() always makes its own fresh
merge_options. After, we have a template merge_options that we copy, but
we are assuming that a shallow struct copy is OK (probably true, but an
anti-pattern that may bite us later).  If we add Phillip's suggestion on
top, then we do not copy at all, and end up reusing the same options
struct (which is definitely wrong).

I don't think there are any bugs with the state at the current tip of
ty/merge-tree-strategy-options, but if we want to make it safer, I think
we have two options:

  - delay the conversion of the "xopts" list into merge_options until we
    initialize it in real_merge(). This avoids breaking abstraction
    boundaries, but does mean that the sanity-checking of the options
    happens a little later (but not much in practice).

  - provide a copy_merge_options() function, which makes this kind of
    "set up a template and then copy it" pattern official. It can be a
    struct assignment for now, but it at least alerts anybody adding new
    options to the notion that a deep copy might be required.

Option 1 looks something like this (a lot of the hunks are just
reverting the tip of that branch, so squashed in it would be even
smaller):

diff --git a/builtin/merge-tree.c b/builtin/merge-tree.c
index 7024b5ce2e..f9dbbdb867 100644
--- a/builtin/merge-tree.c
+++ b/builtin/merge-tree.c
@@ -415,7 +415,7 @@ struct merge_tree_options {
 	int show_messages;
 	int name_only;
 	int use_stdin;
-	struct merge_options merge_options;
+	struct strvec xopts;
 };
 
 static int real_merge(struct merge_tree_options *o,
@@ -425,7 +425,7 @@ static int real_merge(struct merge_tree_options *o,
 {
 	struct commit *parent1, *parent2;
 	struct commit_list *merge_bases = NULL;
-	struct merge_options opt = o->merge_options;
+	struct merge_options opt;
 	struct merge_result result = { 0 };
 	int show_messages = o->show_messages;
 
@@ -439,11 +439,17 @@ static int real_merge(struct merge_tree_options *o,
 		help_unknown_ref(branch2, "merge-tree",
 				 _("not something we can merge"));
 
+	init_merge_options(&opt, the_repository);
+
 	opt.show_rename_progress = 0;
 
 	opt.branch1 = branch1;
 	opt.branch2 = branch2;
 
+	for (size_t x = 0; x < o->xopts.nr; x++)
+		if (parse_merge_opt(&opt, o->xopts.v[x]))
+			die(_("unknown strategy option: -X%s"), o->xopts.v[x]);
+
 	if (merge_base) {
 		struct commit *base_commit;
 		struct tree *base_tree, *parent1_tree, *parent2_tree;
@@ -512,8 +518,7 @@ static int real_merge(struct merge_tree_options *o,
 
 int cmd_merge_tree(int argc, const char **argv, const char *prefix)
 {
-	struct merge_tree_options o = { .show_messages = -1 };
-	struct strvec xopts = STRVEC_INIT;
+	struct merge_tree_options o = { .show_messages = -1,.xopts = STRVEC_INIT };
 	int expected_remaining_argc;
 	int original_argc;
 	const char *merge_base = NULL;
@@ -549,24 +554,18 @@ int cmd_merge_tree(int argc, const char **argv, const char *prefix)
 			   &merge_base,
 			   N_("commit"),
 			   N_("specify a merge-base for the merge")),
-		OPT_STRVEC('X', "strategy-option", &xopts, N_("option=value"),
+		OPT_STRVEC('X', "strategy-option", &o.xopts, N_("option=value"),
 			N_("option for selected merge strategy")),
 		OPT_END()
 	};
 
-	/* Init merge options */
-	init_merge_options(&o.merge_options, the_repository);
-
 	/* Parse arguments */
 	original_argc = argc - 1; /* ignoring argv[0] */
 	argc = parse_options(argc, argv, prefix, mt_options,
 			     merge_tree_usage, PARSE_OPT_STOP_AT_NON_OPTION);
 
-	if (xopts.nr && o.mode == MODE_TRIVIAL)
+	if (o.xopts.nr && o.mode == MODE_TRIVIAL)
 		die(_("--trivial-merge is incompatible with all other options"));
-	for (int x = 0; x < xopts.nr; x++)
-		if (parse_merge_opt(&o.merge_options, xopts.v[x]))
-			die(_("unknown strategy option: -X%s"), xopts.v[x]);
 
 	/* Handle --stdin */
 	if (o.use_stdin) {

^ permalink raw reply related

* Re: [PATCH 1/1] [OUTREACHY] Fixed add.c file to conform to guidelines when using die() listed in issue #635
From: Junio C Hamano @ 2023-10-09 18:49 UTC (permalink / raw)
  To: Naomi Ibe; +Cc: git
In-Reply-To: <20231009011546.509-1-naomi.ibeh69@gmail.com>

Naomi Ibe <naomi.ibeh69@gmail.com> writes:

> Subject: Re: [PATCH 1/1] [OUTREACHY] Fixed add.c file to conform to guidelines when using d

Subject: [OUTREACHY][PATCH 1/1] builtin/add.c: clean up die() messages

> From: Naomi <naomi.ibeh69@gmail.com>

The name and address on this line come from your commit object,
which in turn would have came from your configuration.  You have

    [user]
	name = Naomi
	email = naomi.ibeh69@gmail.com

somewhere in your configuration file, perhaps in $HOME/.gitconfig or
somewhere.  When contributiong to this project, you want that "name"
line to also include your family name, as it should match what you
write on your Signed-off-by: line.  A focused way to do so without
affecting your author identity for other projects is to add

    [user]
	name = Naomi Ibe

in .git/config of the repository that you use to contribute to this
project, e.g.,

    $ cd ... to the working tree of your clone of git you work in ...
    $ git config user.name "Naomi Ibe"

The space above your Sign off is to fill the details you omitted on
the title of the message (which would say something about "conform
to guidelines" or "clean up" without mentioning what rule the
original violates), making the whole thing something like:

    builtin/add.c: clean up die() messages

    As described in the CodingGuidelines document, a single line
    message given to die() and its friends should not capitalize its
    first word, and should not add full-stop at the end.

    Signed-off-by: ...

> Signed-off-by: Naomi Ibe <naomi.ibeh69@gmail.com>
> ---
>  builtin/add.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)

Thanks.  Otherwise the patch looks good.

>
> diff --git a/builtin/add.c b/builtin/add.c
> index c27254a5cd..5126d2ede3 100644
> --- a/builtin/add.c
> +++ b/builtin/add.c
> @@ -182,7 +182,7 @@ static int edit_patch(int argc, const char **argv, const char *prefix)
>  	git_config(git_diff_basic_config, NULL); /* no "diff" UI options */
>  
>  	if (repo_read_index(the_repository) < 0)
> -		die(_("Could not read the index"));
> +		die(_("could not read the index"));
>  
>  	repo_init_revisions(the_repository, &rev, prefix);
>  	rev.diffopt.context = 7;
> @@ -200,15 +200,15 @@ static int edit_patch(int argc, const char **argv, const char *prefix)
>  		die(_("editing patch failed"));
>  
>  	if (stat(file, &st))
> -		die_errno(_("Could not stat '%s'"), file);
> +		die_errno(_("could not stat '%s'"), file);
>  	if (!st.st_size)
> -		die(_("Empty patch. Aborted."));
> +		die(_("empty patch. aborted"));
>  
>  	child.git_cmd = 1;
>  	strvec_pushl(&child.args, "apply", "--recount", "--cached", file,
>  		     NULL);
>  	if (run_command(&child))
> -		die(_("Could not apply '%s'"), file);
> +		die(_("could not apply '%s'"), file);
>  
>  	unlink(file);
>  	free(file);
> @@ -568,7 +568,7 @@ int cmd_add(int argc, const char **argv, const char *prefix)
>  finish:
>  	if (write_locked_index(&the_index, &lock_file,
>  			       COMMIT_LOCK | SKIP_IF_UNCHANGED))
> -		die(_("Unable to write new index file"));
> +		die(_("unable to write new index file"));
>  
>  	dir_clear(&dir);
>  	clear_pathspec(&pathspec);

^ permalink raw reply

* Re: [PATCH] doc/cat-file: make synopsis and description less confusing
From: Jeff King @ 2023-10-09 18:33 UTC (permalink / raw)
  To: Štěpán Němec; +Cc: git, avarab
In-Reply-To: <20231009175604.2361700-1-stepnem@smrk.net>

On Mon, Oct 09, 2023 at 07:56:04PM +0200, Štěpán Němec wrote:

> > Yup, that is a good suggestion. Do you want to wrap all of this
> > discussion up as a patch?
> 
> Certainly, here it is.  I just wanted to hash out some of the details
> first and found the plain text more fit for that purpose.

This looks good to me. Thanks for taking the time to collect and fix all
of the issues.

-Peff

^ permalink raw reply

* [Outreachy] Introduction and Interest in Contributing to the Git Community
From: Doreen Wanyama @ 2023-10-09 18:31 UTC (permalink / raw)
  To: git

Dear Git community,

I hope you are all doing well. I am writing to show my interest in
working in the project titled move existing tests to a unit testing
framework. This is because I have always been intrigued by the work
the git community does and hence I am interested in being part of
this. I have gone through the links provided about getting started on
this. I spent yesterday evening and a better part of today trying to
understand the resources. As of now I would like to start working on a
microproject since I understand this is the first step. I am finding
it difficult though to start. Someone to please help me understand how
I should go about this or how I should go about finding my first
microproject. Just a brief explanation will help.
Thank you in advance.

Best regards,

Doreen Wanyama

^ permalink raw reply

* Re: [PATCH] pretty: fix ref filtering for %(decorate) formats
From: Junio C Hamano @ 2023-10-09 18:24 UTC (permalink / raw)
  To: Andy Koppe; +Cc: git
In-Reply-To: <20231008202307.1568477-1-andy.koppe@gmail.com>

Andy Koppe <andy.koppe@gmail.com> writes:

> Mark pretty formats containing "%(decorate" as requiring decoration in
> userformat_find_requirements(), same as "%d" and "%D".

Ah, of course.  The patch makes sense.

> diff --git a/t/t4205-log-pretty-formats.sh b/t/t4205-log-pretty-formats.sh
> index 16626e4fe9..5aabc9f7d8 100755
> --- a/t/t4205-log-pretty-formats.sh
> +++ b/t/t4205-log-pretty-formats.sh
> @@ -590,9 +590,9 @@ test_expect_success 'pretty format %decorate' '
>  	git log --format="%(decorate:prefix=,suffix=)" -1 >actual2 &&
>  	test_cmp expect2 actual2 &&
>  
> -	echo "[ HEAD -> foo; tag: bar; qux ]" >expect3 &&
> -	git log --format="%(decorate:prefix=[ ,suffix= ],separator=%x3B )" \
> -		-1 >actual3 &&
> +	echo "[ bar; qux; foo ]" >expect3 &&
> +	git log --format="%(decorate:prefix=[ ,suffix= ],separator=%x3B ,tag=)" \
> +		--decorate-refs=refs/ -1 >actual3 &&
>  	test_cmp expect3 actual3 &&

The original test shares the same, but is the order of multiple
decorations expected to be stable?  I feel a bit uneasy to see a
test that insists multiple things come out in a hardcoded order.

It is not making anything _worse_, so let's take the patch as-is.

Thanks.

>  	# Try with a typo (in "separator"), in which case the placeholder should

^ permalink raw reply

* Re: [PATCH 07/15] commit-graph: new filter ver. that fixes murmur3
From: Taylor Blau @ 2023-10-09 18:17 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Jonathan Tan, git, Junio C Hamano, Jeff King
In-Reply-To: <20231008143523.GA18858@szeder.dev>

On Sun, Oct 08, 2023 at 04:35:23PM +0200, SZEDER Gábor wrote:
> > Hmm. I am confused -- are you saying that this series breaks existing
> > functionality, or merely does not patch an existing breakage? I *think*
> > that it's the latter,
>
> It's neither: the new functionality added in this series is broken.
>
> > since this test case fails identically on master,
> > but I am not sure.
>
> Not sure what test you are referring to.  My test demonstrating the
> breakage succeeds when adaped to master, because master doesn't
> understand the commitgraph.changedPathsVersion=2 setting, and keeps
> writing v1 Bloom filter chunks instead, so all commit-graphs layers
> contain the same version.

I was referring to the test you sent back in:

    https://public-inbox.org/git/20201015132147.GB24954@szeder.dev/

but I think that I should have been looking at the one you sent more
recently in:

    https://lore.kernel.org/git/20230830200218.GA5147@szeder.dev/

Thanks,
Taylor

^ permalink raw reply

* Re: [PATCH 1/4] ref-cache.c: fix prefix matching in ref iteration
From: Junio C Hamano @ 2023-10-09 18:15 UTC (permalink / raw)
  To: Victoria Dye; +Cc: Patrick Steinhardt, Victoria Dye via GitGitGadget, git
In-Reply-To: <3585d72f-9f06-d190-ad5a-bec6db3f647f@github.com>

Victoria Dye <vdye@github.com> writes:

> I originally operated on the assumption that it was the first case, which is
> why I didn't include a test in this patch. Commands like 'for-each-ref',
> 'show-ref', etc. either use an empty prefix or a directory prefix with a
> trailing slash, which won't trigger this issue.

Ah, yes, I didn't mention it but I suspected as such (i.e. the code
is structured in such a way that this broken implementation does not
matter to the current callers).

> I encountered the problem
> while working on a builtin that filtered refs by a user-specified prefix -
> the results included refs that should not have been matched, which led me to
> this fix.

OK, perfectly understandable.

> Scanning through the codebase again, though, I do see a way to replicate the
> issue:
>
> $ git update-ref refs/bisect/b HEAD
> $ git rev-parse --abbrev-ref --bisect
> refs/bisect/b
>
> Because 'rev-parse --bisect' uses the "refs/bisect/bad" prefix (no trailing
> slash) and does no additional filtering in its 'for_each_fullref_in'
> callback, refs like "refs/bisect/b" and "refs/bisect/ba" are (incorrectly)
> matched. I'll re-roll with the added test.

Good find.  Thanks!

^ permalink raw reply

* Re: What's cooking in git.git (Oct 2023, #03; Fri, 6)
From: Taylor Blau @ 2023-10-09 18:13 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <xmqqfs2j1xdo.fsf@gitster.g>

On Mon, Oct 09, 2023 at 11:09:07AM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > On Sat, Oct 07, 2023 at 01:20:02AM -0700, Junio C Hamano wrote:
> >> * tb/repack-max-cruft-size (2023-10-05) 4 commits
> >>   (merged to 'next' on 2023-10-06 at b3ca6df3b9)
> >>  + builtin/repack.c: avoid making cruft packs preferred
> >>  + builtin/repack.c: implement support for `--max-cruft-size`
> >>  + builtin/repack.c: parse `--max-pack-size` with OPT_MAGNITUDE
> >>  + t7700: split cruft-related tests to t7704
> >>
> >>  "git repack" learned "--max-cruft-size" to prevent cruft packs from
> >>  growing without bounds.
> >>
> >>  Will merge to 'master'.
> >>  source: <cover.1696293862.git.me@ttaylorr.com>
> >>  source: <035393935108d02aaf8927189b05102f4f74f340.1696370003.git.me@ttaylorr.com>
> >
> > Thanks. On a semi-related note, did you want to pick up my patch in
> >
> >   https://lore.kernel.org/git/035393935108d02aaf8927189b05102f4f74f340.1696370003.git.me@ttaylorr.com/
> >
> > ? That should address a performance bug that occurs when a repository
> > (incorrectly) chooses a cruft pack as its preferred pack source when
> > writing a MIDX bitmap, significantly impeding the pack-reuse mechanism.
>
> Isn't that in the above list already as b3ca6df3b9^2?

Oops, duh. I hadn't expected you to group that patch in with the rest of
tb/repack-max-cruft-size.

I'll put on my glasses the next time before suggesting you dropped one
of my patches on the floor... ;-)

Thanks,
Taylor

^ permalink raw reply

* Re: What's cooking in git.git (Oct 2023, #03; Fri, 6)
From: Junio C Hamano @ 2023-10-09 18:09 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git
In-Reply-To: <ZSQnVnK0k3bdk5zX@nand.local>

Taylor Blau <me@ttaylorr.com> writes:

> On Sat, Oct 07, 2023 at 01:20:02AM -0700, Junio C Hamano wrote:
>> * tb/repack-max-cruft-size (2023-10-05) 4 commits
>>   (merged to 'next' on 2023-10-06 at b3ca6df3b9)
>>  + builtin/repack.c: avoid making cruft packs preferred
>>  + builtin/repack.c: implement support for `--max-cruft-size`
>>  + builtin/repack.c: parse `--max-pack-size` with OPT_MAGNITUDE
>>  + t7700: split cruft-related tests to t7704
>>
>>  "git repack" learned "--max-cruft-size" to prevent cruft packs from
>>  growing without bounds.
>>
>>  Will merge to 'master'.
>>  source: <cover.1696293862.git.me@ttaylorr.com>
>>  source: <035393935108d02aaf8927189b05102f4f74f340.1696370003.git.me@ttaylorr.com>
>
> Thanks. On a semi-related note, did you want to pick up my patch in
>
>   https://lore.kernel.org/git/035393935108d02aaf8927189b05102f4f74f340.1696370003.git.me@ttaylorr.com/
>
> ? That should address a performance bug that occurs when a repository
> (incorrectly) chooses a cruft pack as its preferred pack source when
> writing a MIDX bitmap, significantly impeding the pack-reuse mechanism.

Isn't that in the above list already as b3ca6df3b9^2?



^ permalink raw reply

* [PATCH] doc/cat-file: make synopsis and description less confusing
From: Štěpán Němec @ 2023-10-09 17:56 UTC (permalink / raw)
  To: peff; +Cc: git, avarab
In-Reply-To: <20231009155603.GA3251575@coredump.intra.peff.net>

The DESCRIPTION's "first form" is actually the 1st, 2nd, 3rd and 5th
form in SYNOPSIS, the "second form" is the 4th one.

Interestingly, this state of affairs was introduced in
97fe7250753b (cat-file docs: fix SYNOPSIS and "-h" output, 2021-12-28)
with the claim of "Now the two will match again." ("the two" being
DESCRIPTION and SYNOPSIS)...

The description also suffers from other correctness and clarity issues,
e.g., the "first form" paragraph discusses -p, -s and -t, but leaves out
-e, which is included in the corresponding SYNOPSIS section; the second
paragraph mentions <format>, which doesn't occur in SYNOPSIS at all, and
of the three batch options, really only describes the behavior of
--batch-check.  Also the mention of "drivers" seems an implementation
detail not adding much clarity in a short summary (and isn't expanded
upon in the rest of the man page, either).

Rather than trying to maintain one-to-one (or N-to-M) correspondence
between the DESCRIPTION and SYNOPSIS forms, creating duplication and
providing opportunities for error, shorten the former into a concise
summary describing the two general modes of operation: batch and
non-batch, leaving details to the subsequent manual sections.

While here, fix a grammar error in the description of -e and make the
following further minor improvements:

  NAME:
    shorten ("content or type and size" isn't the whole story; say
    "details" and leave the actual details to later sections)

  SYNOPSIS and --help:
    move the (--textconv | --filters) form before --batch, closer
    to the other non-batch forms

Signed-off-by: Štěpán Němec <stepnem@smrk.net>
---
On Mon, 09 Oct 2023 11:56:03 -0400
Jeff King wrote:

> On Mon, Oct 09, 2023 at 10:36:51AM +0200, Štěpán Němec wrote:
>
>> > Ah, true, I was thinking that the DESCRIPTION section would be the first
>> > thing users would read, but I didn't notice the headline. I agree that
>> > what it says is probably sufficient (though arguably "type and size" is
>> > slightly inaccurate there; I said "details" in my proposed text but
>> > maybe that is too vague).
>> 
>> We could also leave the NAME vague(r) and put an additional sentence at
>> the beginning of DESCRIPTION:
>
> Yup, that is a good suggestion. Do you want to wrap all of this
> discussion up as a patch?

Certainly, here it is.  I just wanted to hash out some of the details
first and found the plain text more fit for that purpose.

Thanks.

To anyone not following the whole thread: this is a slightly more
ambitious version of [1] ("[PATCH] doc/cat-file: clarify description
regarding various command forms"), intended to supersede it.

[1]
https://lore.kernel.org/git/20231003082513.3003520-1-stepnem@smrk.net/

 Documentation/git-cat-file.txt | 30 ++++++++++++++----------------
 builtin/cat-file.c             |  4 ++--
 2 files changed, 16 insertions(+), 18 deletions(-)

diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
index 0e4936d18263..bd95a6c10a7d 100644
--- a/Documentation/git-cat-file.txt
+++ b/Documentation/git-cat-file.txt
@@ -3,8 +3,7 @@ git-cat-file(1)
 
 NAME
 ----
-git-cat-file - Provide content or type and size information for repository objects
-
+git-cat-file - Provide contents or details of repository objects
 
 SYNOPSIS
 --------
@@ -12,25 +11,24 @@ SYNOPSIS
 'git cat-file' <type> <object>
 'git cat-file' (-e | -p) <object>
 'git cat-file' (-t | -s) [--allow-unknown-type] <object>
+'git cat-file' (--textconv | --filters)
+	     [<rev>:<path|tree-ish> | --path=<path|tree-ish> <rev>]
 'git cat-file' (--batch | --batch-check | --batch-command) [--batch-all-objects]
 	     [--buffer] [--follow-symlinks] [--unordered]
 	     [--textconv | --filters] [-Z]
-'git cat-file' (--textconv | --filters)
-	     [<rev>:<path|tree-ish> | --path=<path|tree-ish> <rev>]
 
 DESCRIPTION
 -----------
-In its first form, the command provides the content or the type of an object in
-the repository. The type is required unless `-t` or `-p` is used to find the
-object type, or `-s` is used to find the object size, or `--textconv` or
-`--filters` is used (which imply type "blob").
+Output the contents or other properties such as size, type or delta
+information of one or more objects.
 
-In the second form, a list of objects (separated by linefeeds) is provided on
-stdin, and the SHA-1, type, and size of each object is printed on stdout. The
-output format can be overridden using the optional `<format>` argument. If
-either `--textconv` or `--filters` was specified, the input is expected to
-list the object names followed by the path name, separated by a single
-whitespace, so that the appropriate drivers can be determined.
+This command can operate in two modes, depending on whether an option
+from the `--batch` family is specified.
+
+In non-batch mode, the command provides information on an object
+named on the command line.
+
+In batch mode, arguments are read from standard input.
 
 OPTIONS
 -------
@@ -51,8 +49,8 @@ OPTIONS
 
 -e::
 	Exit with zero status if `<object>` exists and is a valid
-	object. If `<object>` is of an invalid format exit with non-zero and
-	emits an error on stderr.
+	object. If `<object>` is of an invalid format, exit with non-zero
+	status and emit an error on stderr.
 
 -p::
 	Pretty-print the contents of `<object>` based on its type.
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 694c8538df2f..ea8ad601ecc0 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -922,11 +922,11 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 		N_("git cat-file <type> <object>"),
 		N_("git cat-file (-e | -p) <object>"),
 		N_("git cat-file (-t | -s) [--allow-unknown-type] <object>"),
+		N_("git cat-file (--textconv | --filters)\n"
+		   "             [<rev>:<path|tree-ish> | --path=<path|tree-ish> <rev>]"),
 		N_("git cat-file (--batch | --batch-check | --batch-command) [--batch-all-objects]\n"
 		   "             [--buffer] [--follow-symlinks] [--unordered]\n"
 		   "             [--textconv | --filters] [-Z]"),
-		N_("git cat-file (--textconv | --filters)\n"
-		   "             [<rev>:<path|tree-ish> | --path=<path|tree-ish> <rev>]"),
 		NULL
 	};
 	const struct option options[] = {

base-commit: 3a06386e314565108ad56a9bdb8f7b80ac52fb69
-- 
2.42.0


^ permalink raw reply related

* Outreachy Introduction and Interest in Contributing to the Git Community
From: Doreen Wanyama @ 2023-10-09 17:38 UTC (permalink / raw)
  To: git

Dear Git community,

I hope you are all doing well. I am writing to show my interest in
working in the project titled move existing tests to a unit testing
framework. This is because I have always been intrigued by the work
the git community does and hence I am interested in being part of
this. I have gone through the links provided about getting started on
this. I spent yesterday evening and a better part of today trying to
understand the resources. As of now I would like to start working on a
microproject since I understand this is the first step. I am finding
it difficult though to start. Someone to please help me understand how
I should go about this or how I should go about finding my first
microproject. Just a brief explanation will help.
Thank you in advance.


Best regards,

Doreen Wanyama

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox