Git development
 help / color / mirror / Atom feed
* [PATCH v2 4/5] reftable/writer: fix writing multi-level indices
From: Patrick Steinhardt @ 2024-02-01  7:52 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Toon Claes, Kristoffer Haugsbakk
In-Reply-To: <cover.1706773842.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 4607 bytes --]

When finishing a section we will potentially write an index that makes
it more efficient to look up relevant blocks. The index records written
will encode, for each block of the indexed section, what the offset of
that block is as well as the last key of that block. Thus, the reader
would iterate through the index records to find the first key larger or
equal to the wanted key and then use the encoded offset to look up the
desired block.

When there are a lot of blocks to index though we may end up writing
multiple index blocks, too. To not require a linear search across all
index blocks we instead end up writing a multi-level index. Instead of
referring to the block we are after, an index record may point to
another index block. The reader will then access the highest-level index
and follow down the chain of index blocks until it hits the sought-after
block.

It has been observed though that it is impossible to seek ref records of
the last ref block when using a multi-level index. While the multi-level
index exists and looks fine for most of the part, the highest-level
index was missing an index record pointing to the last block of the next
index. Thus, every additional level made more refs become unseekable at
the end of the ref section.

The root cause is that we are not flushing the last block of the current
level once done writing the level. Consequently, it wasn't recorded in
the blocks that need to be indexed by the next-higher level and thus we
forgot about it.

Fix this bug by flushing blocks after we have written all index records.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/readwrite_test.c | 56 +++++++++++++++++++++++++++++++++++++++
 reftable/writer.c         |  8 +++---
 2 files changed, 60 insertions(+), 4 deletions(-)

diff --git a/reftable/readwrite_test.c b/reftable/readwrite_test.c
index 6b99daeaf2..853923397e 100644
--- a/reftable/readwrite_test.c
+++ b/reftable/readwrite_test.c
@@ -866,6 +866,61 @@ static void test_write_multiple_indices(void)
 	strbuf_release(&buf);
 }
 
+static void test_write_multi_level_index(void)
+{
+	struct reftable_write_options opts = {
+		.block_size = 100,
+	};
+	struct strbuf writer_buf = STRBUF_INIT, buf = STRBUF_INIT;
+	struct reftable_block_source source = { 0 };
+	struct reftable_iterator it = { 0 };
+	const struct reftable_stats *stats;
+	struct reftable_writer *writer;
+	struct reftable_reader *reader;
+	int err;
+
+	writer = reftable_new_writer(&strbuf_add_void, &noop_flush, &writer_buf, &opts);
+	reftable_writer_set_limits(writer, 1, 1);
+	for (size_t i = 0; i < 200; i++) {
+		struct reftable_ref_record ref = {
+			.update_index = 1,
+			.value_type = REFTABLE_REF_VAL1,
+			.value.val1 = {i},
+		};
+
+		strbuf_reset(&buf);
+		strbuf_addf(&buf, "refs/heads/%03" PRIuMAX, (uintmax_t)i);
+		ref.refname = buf.buf,
+
+		err = reftable_writer_add_ref(writer, &ref);
+		EXPECT_ERR(err);
+	}
+	reftable_writer_close(writer);
+
+	/*
+	 * The written refs should be sufficiently large to result in a
+	 * multi-level index.
+	 */
+	stats = reftable_writer_stats(writer);
+	EXPECT(stats->ref_stats.max_index_level == 2);
+
+	block_source_from_strbuf(&source, &writer_buf);
+	err = reftable_new_reader(&reader, &source, "filename");
+	EXPECT_ERR(err);
+
+	/*
+	 * Seeking the last ref should work as expected.
+	 */
+	err = reftable_reader_seek_ref(reader, &it, "refs/heads/199");
+	EXPECT_ERR(err);
+
+	reftable_iterator_destroy(&it);
+	reftable_writer_free(writer);
+	reftable_reader_free(reader);
+	strbuf_release(&writer_buf);
+	strbuf_release(&buf);
+}
+
 static void test_corrupt_table_empty(void)
 {
 	struct strbuf buf = STRBUF_INIT;
@@ -916,5 +971,6 @@ int readwrite_test_main(int argc, const char *argv[])
 	RUN_TEST(test_write_object_id_length);
 	RUN_TEST(test_write_object_id_min_length);
 	RUN_TEST(test_write_multiple_indices);
+	RUN_TEST(test_write_multi_level_index);
 	return 0;
 }
diff --git a/reftable/writer.c b/reftable/writer.c
index b5bcce0292..0349094d29 100644
--- a/reftable/writer.c
+++ b/reftable/writer.c
@@ -418,15 +418,15 @@ static int writer_finish_section(struct reftable_writer *w)
 				return err;
 		}
 
+		err = writer_flush_block(w);
+		if (err < 0)
+			return err;
+
 		for (i = 0; i < idx_len; i++)
 			strbuf_release(&idx[i].last_key);
 		reftable_free(idx);
 	}
 
-	err = writer_flush_block(w);
-	if (err < 0)
-		return err;
-
 	writer_clear_index(w);
 
 	bstats = writer_reftable_block_stats(w, typ);
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* [PATCH v2 5/5] reftable: document reading and writing indices
From: Patrick Steinhardt @ 2024-02-01  7:52 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Toon Claes, Kristoffer Haugsbakk
In-Reply-To: <cover.1706773842.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 3863 bytes --]

The way the index gets written and read is not trivial at all and
requires the reader to piece together a bunch of parts to figure out how
it works. Add some documentation to hopefully make this easier to
understand for the next reader.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/reader.c | 27 +++++++++++++++++++++++++++
 reftable/writer.c | 23 +++++++++++++++++++++++
 2 files changed, 50 insertions(+)

diff --git a/reftable/reader.c b/reftable/reader.c
index 278f727a3d..6011d0aa04 100644
--- a/reftable/reader.c
+++ b/reftable/reader.c
@@ -508,11 +508,38 @@ static int reader_seek_indexed(struct reftable_reader *r,
 	if (err < 0)
 		goto done;
 
+	/*
+	 * The index may consist of multiple levels, where each level may have
+	 * multiple index blocks. We start by doing a linear search in the
+	 * highest layer that identifies the relevant index block as well as
+	 * the record inside that block that corresponds to our wanted key.
+	 */
 	err = reader_seek_linear(&index_iter, &want_index);
 	if (err < 0)
 		goto done;
 
+	/*
+	 * Traverse down the levels until we find a non-index entry.
+	 */
 	while (1) {
+		/*
+		 * In case we seek a record that does not exist the index iter
+		 * will tell us that the iterator is over. This works because
+		 * the last index entry of the current level will contain the
+		 * last key it knows about. So in case our seeked key is larger
+		 * than the last indexed key we know that it won't exist.
+		 *
+		 * There is one subtlety in the layout of the index section
+		 * that makes this work as expected: the highest-level index is
+		 * at end of the section and will point backwards and thus we
+		 * start reading from the end of the index section, not the
+		 * beginning.
+		 *
+		 * If that wasn't the case and the order was reversed then the
+		 * linear seek would seek into the lower levels and traverse
+		 * all levels of the index only to find out that the key does
+		 * not exist.
+		 */
 		err = table_iter_next(&index_iter, &index_result);
 		table_iter_block_done(&index_iter);
 		if (err != 0)
diff --git a/reftable/writer.c b/reftable/writer.c
index 0349094d29..e23953e498 100644
--- a/reftable/writer.c
+++ b/reftable/writer.c
@@ -391,6 +391,24 @@ static int writer_finish_section(struct reftable_writer *w)
 	if (err < 0)
 		return err;
 
+	/*
+	 * When the section we are about to index has a lot of blocks then the
+	 * index itself may span across multiple blocks, as well. This would
+	 * require a linear scan over index blocks only to find the desired
+	 * indexed block, which is inefficient. Instead, we write a multi-level
+	 * index where index records of level N+1 will refer to index blocks of
+	 * level N. This isn't constant time, either, but at least logarithmic.
+	 *
+	 * This loop handles writing this multi-level index. Note that we write
+	 * the lowest-level index pointing to the indexed blocks first. We then
+	 * continue writing additional index levels until the current level has
+	 * less blocks than the threshold so that the highest level will be at
+	 * the end of the index section.
+	 *
+	 * Readers are thus required to start reading the index section from
+	 * its end, which is why we set `index_start` to the beginning of the
+	 * last index section.
+	 */
 	while (w->index_len > threshold) {
 		struct reftable_index_record *idx = NULL;
 		size_t i, idx_len;
@@ -427,6 +445,11 @@ static int writer_finish_section(struct reftable_writer *w)
 		reftable_free(idx);
 	}
 
+	/*
+	 * The index may still contain a number of index blocks lower than the
+	 * threshold. Clear it so that these entries don't leak into the next
+	 * index section.
+	 */
 	writer_clear_index(w);
 
 	bstats = writer_reftable_block_stats(w, typ);
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* Re: [PATCH 3/5] reftable/writer: simplify writing index records
From: Patrick Steinhardt @ 2024-02-01  8:39 UTC (permalink / raw)
  To: Kristoffer Haugsbakk; +Cc: git
In-Reply-To: <b96f8ce1-f597-447b-b410-e135626e03d0@app.fastmail.com>

[-- Attachment #1: Type: text/plain, Size: 939 bytes --]

On Wed, Jan 31, 2024 at 04:55:33PM +0100, Kristoffer Haugsbakk wrote:
> Hi
> 
> It looks like this isn’t in `next` yet so I’ll leave a comment.
> 
> > @@ -405,6 +405,7 @@ static int writer_finish_section(struct reftable_writer *w)
> >  		w->index = NULL;
> >  		w->index_len = 0;
> >  		w->index_cap = 0;
> > +
> 
> This part and the next quoted one seem to be an unrelated edit by
> `clang-format`. They could perhaps be grouped in their own patch.

I'll just drop this newline change here, it indeed does make the reader
wonder what's going on. I'll keep the other one though -- it does not
quite feel sensible to move it into its own patch.

Thanks!

Patrick

> > -				abort();
> > -			}
> >  		}
> > -		for (i = 0; i < idx_len; i++) {
> > +
> > +		for (i = 0; i < idx_len; i++)
> >  			strbuf_release(&idx[i].last_key);
> > -		}
> >  		reftable_free(idx);
> >  	}
> 
> -- 
> Kristoffer Haugsbakk

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH 3/5] reftable/writer: simplify writing index records
From: Patrick Steinhardt @ 2024-02-01  8:39 UTC (permalink / raw)
  To: Toon Claes; +Cc: git
In-Reply-To: <87sf2dppjt.fsf@iotcl.com>

[-- Attachment #1: Type: text/plain, Size: 1117 bytes --]

On Wed, Jan 31, 2024 at 02:44:38PM +0100, Toon Claes wrote:
> 
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > When finishing the current section we may end up writing index records
> > for the section to the table. The logic to do so essentially copies what
> > we already have in `writer_add_record()`, making this more complicated
> > than it really has to be.
> 
> I didn't feel like this commit message made it easier for me to
> understand, because I interpreted words differently than you intended.
> Using "may end up" makes it sound like it's unexpected behavior. Also
> the use of "copies" implies to me it's doing a copy operation. I would
> rephrase it to something like:
> 
>   When finishing the current section some index records might be written
>   for the section to the table. The logic to do so is essentially
>   duplicated from what we already have in `writer_add_record()`, making
>   this more complicated than it really has to be.
> 
> Other than that, I don't have any comments about this patch series.

Thanks, I'll use a slightly adapted version of this.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: Git in GSoC 2024
From: Karthik Nayak @ 2024-02-01  9:00 UTC (permalink / raw)
  To: Kaartic Sivaraam
  Cc: Patrick Steinhardt, Christian Couder, git, Taylor Blau,
	Junio C Hamano, Victoria Dye
In-Reply-To: <c61322de-8cd9-42b8-a04b-a8ae47b25c5e@gmail.com>

Hello all!

On Wed, Jan 31, 2024 at 6:58 PM Kaartic Sivaraam
<kaartic.sivaraam@gmail.com> wrote:
>
> Hi Patrick,
>
> On 31 January 2024 6:40:36 pm IST, Patrick Steinhardt <ps@pks.im> wrote:
> >On Tue, Jan 30, 2024 at 10:08:53AM +0100, Patrick Steinhardt wrote:
> >> On Tue, Jan 30, 2024 at 09:38:48AM +0100, Christian Couder wrote:
> >>
> >> Yes, the tests in t0032-reftable-unittest.sh should be ported over to
> >> the new unit test framework eventually, and I think that this might be a
> >> good GSoC project indeed.
> >>
>
> Nice. Good to hear that.
>
> >> If there is interest I'd also be happy to draft up some more topics in
> >> the context of refs and the reftable backend. I'm sure there should be
> >> some topics here that would be a good fit for the GSoC project, and I'd
> >> be happy to mentor any such project in this context.
> >
>
> Great. Thanks for your interest in willing to mentor!
>
> I created a fairly rough SoC ideas page for now including a barebones
> information about the unit test migration idea:
>
> https://git.github.io/SoC-2024-Ideas/
>
> Note well that the existing idea's description is far from complete and
> I mostly just cooked it up to serve as a template for how the idea entry
> could look like. Kindly contribute towards elaborating the same :-)
>
> Also, feel free to suggest ideas you have around refs and reftable
> backed, Patrick. Those would be helpful.
>
> >I noticed that the starting period falls right into my honeymoon from
> >June 17th until July 19th. This unfortunately makes it quite a lot
> >harder for me to mentor projects alone. Still, I'd be happy to co-mentor
> >or help out in other ways.
> >
>
> I too don't believe your vacation is going to be a deal breaker for you
> being a mentor. It should be totally fine given that we get a backup
> mentor who is also willing to mentor the candidate. (side note: I myself
> have no knowledge about refs backends. So, I suppose I might not be able
> to help co-mentor this one).

I should be able to help out here, happy to co-mentor on the reftable topics.

^ permalink raw reply

* Re: Git in GSoC 2024
From: Patrick Steinhardt @ 2024-02-01  9:38 UTC (permalink / raw)
  To: Kaartic Sivaraam
  Cc: Christian Couder, git, Taylor Blau, Junio C Hamano, Victoria Dye,
	Karthik Nayak
In-Reply-To: <c61322de-8cd9-42b8-a04b-a8ae47b25c5e@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 2796 bytes --]

On Wed, Jan 31, 2024 at 11:27:13PM +0530, Kaartic Sivaraam wrote:
> Hi Patrick,
> 
> On 31 January 2024 6:40:36 pm IST, Patrick Steinhardt <ps@pks.im> wrote:
> > On Tue, Jan 30, 2024 at 10:08:53AM +0100, Patrick Steinhardt wrote:
> > > On Tue, Jan 30, 2024 at 09:38:48AM +0100, Christian Couder wrote:
> > > 
> > > Yes, the tests in t0032-reftable-unittest.sh should be ported over to
> > > the new unit test framework eventually, and I think that this might be a
> > > good GSoC project indeed.
> > > 
> 
> Nice. Good to hear that.
> 
> > > If there is interest I'd also be happy to draft up some more topics in
> > > the context of refs and the reftable backend. I'm sure there should be
> > > some topics here that would be a good fit for the GSoC project, and I'd
> > > be happy to mentor any such project in this context.
> > 
> 
> Great. Thanks for your interest in willing to mentor!
> 
> I created a fairly rough SoC ideas page for now including a barebones
> information about the unit test migration idea:
> 
> https://git.github.io/SoC-2024-Ideas/
> 
> Note well that the existing idea's description is far from complete and I
> mostly just cooked it up to serve as a template for how the idea entry could
> look like. Kindly contribute towards elaborating the same :-)
> 
> Also, feel free to suggest ideas you have around refs and reftable backed,
> Patrick. Those would be helpful.

I'll have a the beginning of next week and will think about topics
meanwhile.

> > I noticed that the starting period falls right into my honeymoon from
> > June 17th until July 19th. This unfortunately makes it quite a lot
> > harder for me to mentor projects alone. Still, I'd be happy to co-mentor
> > or help out in other ways.
> > 
> 
> I too don't believe your vacation is going to be a deal breaker for you
> being a mentor. It should be totally fine given that we get a backup mentor
> who is also willing to mentor the candidate. (side note: I myself have no
> knowledge about refs backends. So, I suppose I might not be able to help
> co-mentor this one).
> 
> Reg. the timline [1] Jun 17th won't be much of the start. The community
> bonding period starts May 1. Many contributors typically start dipping their
> toes into their project right away. So, you would have ample time before the
> start of the vacation to set the contributor up and going with the project.
> 
> [1]: https://developers.google.com/open-source/gsoc/timeline

Yeah, as long as there is a co-mentor that can take over during my
absence I'm happy to do it. Karthik said that he'd be willing to cover
me, which I think would be a good fit given that he's already got quite
a bit of exposure to the reftable backend internally at GitLab. Thanks!

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH v4 0/8] completion: improvements for git-bisect
From: Patrick Steinhardt @ 2024-02-01  9:55 UTC (permalink / raw)
  To: Britton Leo Kerin; +Cc: git, Junio C Hamano
In-Reply-To: <20240128223447.342493-1-britton.kerin@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 441 bytes --]

On Sun, Jan 28, 2024 at 01:34:39PM -0900, Britton Leo Kerin wrote:
> Relative to v3 this reworks the commit contents and descriptions
> according to review suggestions, removes unnecessary case statements and
> precondition, adds option completion for the terms subcommand, and adds
> tests.

Thanks for this new version! I've got a few more comments, but overall I
really like what I see here and think that this is getting close.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH v4 1/8] completion: bisect: complete bad, new, old, and help subcommands
From: Patrick Steinhardt @ 2024-02-01  9:55 UTC (permalink / raw)
  To: Britton Leo Kerin; +Cc: git, Junio C Hamano
In-Reply-To: <20240128223447.342493-2-britton.kerin@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1359 bytes --]

On Sun, Jan 28, 2024 at 01:34:40PM -0900, Britton Leo Kerin wrote:
> The bad, new, old and help subcommands to git-bisect(1) are not
> completed.
> 
> Add the bad, new, old, and help subcommands to the appropriate lists
> such that the commands and their possible ref arguments are completed.
> 
> Signed-off-by: Britton Leo Kerin <britton.kerin@gmail.com>
> ---
>  contrib/completion/git-completion.bash | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash
> index 185b47d802..06d0b156e7 100644
> --- a/contrib/completion/git-completion.bash
> +++ b/contrib/completion/git-completion.bash
> @@ -1449,7 +1449,7 @@ _git_bisect ()
>  {
>  	__git_has_doubledash && return
>  
> -	local subcommands="start bad good skip reset visualize replay log run"
> +	local subcommands="start bad new good old skip reset visualize replay log run help"
>  	local subcommand="$(__git_find_on_cmdline "$subcommands")"
>  	if [ -z "$subcommand" ]; then
>  		__git_find_repo_path
> @@ -1462,7 +1462,7 @@ _git_bisect ()
>  	fi
>  
>  	case "$subcommand" in
> -	bad|good|reset|skip|start)
> +	bad|new|good|old|reset|skip|start)
>  		__git_complete_refs
>  		;;
>  	*)

I didn't even know that `git bisect reset` takes a commit :)

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH v4 2/8] completion: bisect: complete custom terms and related options
From: Patrick Steinhardt @ 2024-02-01  9:55 UTC (permalink / raw)
  To: Britton Leo Kerin; +Cc: git, Junio C Hamano
In-Reply-To: <20240128223447.342493-3-britton.kerin@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 2794 bytes --]

On Sun, Jan 28, 2024 at 01:34:41PM -0900, Britton Leo Kerin wrote:
> git bisect supports the use of custom terms via the --term-(new|bad) and
> --term-(old|good) options, but the completion code doesn't know about
> these options or the new subcommands they define.
> 
> Add support for these options and the custom subcommands by checking for
> them if a bisection is in progress and adding them to the list of
> subcommands.
> 
> Signed-off-by: Britton Leo Kerin <britton.kerin@gmail.com>
> ---
>  contrib/completion/git-completion.bash | 32 ++++++++++++++++++++++++--
>  1 file changed, 30 insertions(+), 2 deletions(-)
> 
> diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash
> index 06d0b156e7..8baf330824 100644
> --- a/contrib/completion/git-completion.bash
> +++ b/contrib/completion/git-completion.bash
> @@ -1449,7 +1449,20 @@ _git_bisect ()
>  {
>  	__git_has_doubledash && return
>  
> -	local subcommands="start bad new good old skip reset visualize replay log run help"
> +	__git_find_repo_path
> +
> +	# If a bisection is in progress get the terms being used.
> +	local term_bad term_good
> +	if [ -f "$__git_repo_path"/BISECT_START ]; then
> +		term_bad=$(__git bisect terms --term-bad)
> +		term_good=$(__git bisect terms --term-good)
> +	fi

Nit: instead of checking for `BISECT_START` we should rather check for
`BISECT_TERMS`. Like that we don't waste two processes for users who
didn't specify any terms, which should be the majority.

We could also parse the terms directly from the file... but I'm not sure
that is really worth it and feels a lot more fragile to me. So I'm not
sure whether or not that is a good idea.

Patrick

> +	# We will complete any custom terms, but still always complete the
> +	# more usual bad/new/good/old because git bisect gives a good error
> +	# message if these are given when not in use, and that's better than
> +	# silent refusal to complete if the user is confused.
> +	local subcommands="start bad new $term_bad good old $term_good terms skip reset visualize replay log run help"
>  	local subcommand="$(__git_find_on_cmdline "$subcommands")"
>  	if [ -z "$subcommand" ]; then
>  		__git_find_repo_path
> @@ -1462,7 +1475,22 @@ _git_bisect ()
>  	fi
>  
>  	case "$subcommand" in
> -	bad|new|good|old|reset|skip|start)
> +	start)
> +		case "$cur" in
> +		--*)
> +			__gitcomp "--term-new --term-bad --term-old --term-good"
> +			return
> +			;;
> +		*)
> +			__git_complete_refs
> +			;;
> +		esac
> +		;;
> +	terms)
> +		__gitcomp "--term-good --term-old --term-bad --term-new"
> +		return
> +		;;
> +	bad|new|"$term_bad"|good|old|"$term_good"|reset|skip)
>  		__git_complete_refs
>  		;;
>  	*)
> -- 
> 2.43.0
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH v4 5/8] completion: log: use __git_complete_log_opts
From: Patrick Steinhardt @ 2024-02-01  9:55 UTC (permalink / raw)
  To: Britton Leo Kerin; +Cc: git, Junio C Hamano
In-Reply-To: <20240128223447.342493-6-britton.kerin@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 3446 bytes --]

On Sun, Jan 28, 2024 at 01:34:44PM -0900, Britton Leo Kerin wrote:
> Use the new __git_complete_log_opts function to handle option and
> optiona rgument completion in _git_log.

I think this commit could be merged with the preceding one to clarify
that this really only is a move of code. Sorry if my comments on the
previous round weren't clear on that.

Patrick

> Signed-off-by: Britton Leo Kerin <britton.kerin@gmail.com>
> ---
>  contrib/completion/git-completion.bash | 95 +-------------------------
>  1 file changed, 3 insertions(+), 92 deletions(-)
> 
> diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash
> index dfd504c37e..41c76c1246 100644
> --- a/contrib/completion/git-completion.bash
> +++ b/contrib/completion/git-completion.bash
> @@ -2195,98 +2195,9 @@ _git_log ()
>  	__git_has_doubledash && return
>  	__git_find_repo_path
>  
> -	local merge=""
> -	if [ -f "$__git_repo_path/MERGE_HEAD" ]; then
> -		merge="--merge"
> -	fi
> -	case "$prev,$cur" in
> -	-L,:*:*)
> -		return	# fall back to Bash filename completion
> -		;;
> -	-L,:*)
> -		__git_complete_symbol --cur="${cur#:}" --sfx=":"
> -		return
> -		;;
> -	-G,*|-S,*)
> -		__git_complete_symbol
> -		return
> -		;;
> -	esac
> -	case "$cur" in
> -	--pretty=*|--format=*)
> -		__gitcomp "$__git_log_pretty_formats $(__git_pretty_aliases)
> -			" "" "${cur#*=}"
> -		return
> -		;;
> -	--date=*)
> -		__gitcomp "$__git_log_date_formats" "" "${cur##--date=}"
> -		return
> -		;;
> -	--decorate=*)
> -		__gitcomp "full short no" "" "${cur##--decorate=}"
> -		return
> -		;;
> -	--diff-algorithm=*)
> -		__gitcomp "$__git_diff_algorithms" "" "${cur##--diff-algorithm=}"
> -		return
> -		;;
> -	--submodule=*)
> -		__gitcomp "$__git_diff_submodule_formats" "" "${cur##--submodule=}"
> -		return
> -		;;
> -	--ws-error-highlight=*)
> -		__gitcomp "$__git_ws_error_highlight_opts" "" "${cur##--ws-error-highlight=}"
> -		return
> -		;;
> -	--no-walk=*)
> -		__gitcomp "sorted unsorted" "" "${cur##--no-walk=}"
> -		return
> -		;;
> -	--diff-merges=*)
> -                __gitcomp "$__git_diff_merges_opts" "" "${cur##--diff-merges=}"
> -                return
> -                ;;
> -	--*)
> -		__gitcomp "
> -			$__git_log_common_options
> -			$__git_log_shortlog_options
> -			$__git_log_gitk_options
> -			$__git_log_show_options
> -			--root --topo-order --date-order --reverse
> -			--follow --full-diff
> -			--abbrev-commit --no-abbrev-commit --abbrev=
> -			--relative-date --date=
> -			--pretty= --format= --oneline
> -			--show-signature
> -			--cherry-mark
> -			--cherry-pick
> -			--graph
> -			--decorate --decorate= --no-decorate
> -			--walk-reflogs
> -			--no-walk --no-walk= --do-walk
> -			--parents --children
> -			--expand-tabs --expand-tabs= --no-expand-tabs
> -			$merge
> -			$__git_diff_common_options
> -			"
> -		return
> -		;;
> -	-L:*:*)
> -		return	# fall back to Bash filename completion
> -		;;
> -	-L:*)
> -		__git_complete_symbol --cur="${cur#-L:}" --sfx=":"
> -		return
> -		;;
> -	-G*)
> -		__git_complete_symbol --pfx="-G" --cur="${cur#-G}"
> -		return
> -		;;
> -	-S*)
> -		__git_complete_symbol --pfx="-S" --cur="${cur#-S}"
> -		return
> -		;;
> -	esac
> +	__git_complete_log_opts
> +	[ -z "$COMPREPLY" ] || return
> +
>  	__git_complete_revlist
>  }
>  
> -- 
> 2.43.0
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH v4 8/8] completion: add tests for git-bisect
From: Patrick Steinhardt @ 2024-02-01  9:55 UTC (permalink / raw)
  To: Britton Leo Kerin; +Cc: git, Junio C Hamano
In-Reply-To: <20240128223447.342493-9-britton.kerin@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 4911 bytes --]

On Sun, Jan 28, 2024 at 01:34:47PM -0900, Britton Leo Kerin wrote:
> There aren't any tests for completion of git bisect and it's
> subcommands.
> 
> Add tests.

I think it would be nice if you added relevant tests directly in the
commits that introduce the new completions. E.g. add a test for the
bisect terms in the same commit where you introduce the completion for
it.

Like that you can easily add tests one by one, which decreases the
review load. Also, it serves to demonstrate both that the functionality
works and helps the reviewer to understand better what exactly you are
adding by having a nice example.

Patrick

> ---
>  t/t9902-completion.sh | 135 ++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 135 insertions(+)
> 
> diff --git a/t/t9902-completion.sh b/t/t9902-completion.sh
> index aa9a614de3..698e278450 100755
> --- a/t/t9902-completion.sh
> +++ b/t/t9902-completion.sh
> @@ -1259,6 +1259,141 @@ test_expect_success 'git switch - with no options, complete local branches and u
>  	EOF
>  '
>  
> +test_expect_success 'git bisect - when not bisecting, complete only replay and start subcommands' '
> +	test_completion "git bisect " <<-\EOF
> +	replay Z
> +	start Z
> +	EOF
> +'
> +
> +test_expect_success 'git bisect - complete options to start subcommand' '
> +	test_completion "git bisect start --" <<-\EOF
> +	--term-new Z
> +	--term-bad Z
> +	--term-old Z
> +	--term-good Z
> +	--no-checkout Z
> +	--first-parent Z
> +	EOF
> +'
> +
> +test_expect_success 'setup for git-bisect tests requiring a repo' '
> +	git init git-bisect &&
> +	(
> +		cd git-bisect &&
> +		echo "initial contents" >file &&
> +		git add file &&
> +		git commit -am "Initial commit" &&
> +		git tag initial &&
> +		echo "new line" >>file &&
> +		git commit -am "First change" &&
> +		echo "another new line" >>file &&
> +		git commit -am "Second change" &&
> +		git tag final
> +	)
> +'
> +
> +test_expect_success 'git bisect - start subcommand arguments before double-dash are completed as revs' '
> +	(
> +		cd git-bisect &&
> +		test_completion "git bisect start " <<-\EOF
> +		HEAD Z
> +		final Z
> +		initial Z
> +		master Z
> +		EOF
> +	)
> +'
> +
> +# Note that these arguments are <pathspec>s, which in practice the fallback
> +# completion (not the git completion) later ends up completing as paths.
> +test_expect_success 'git bisect - start subcommand arguments after double-dash are not completed' '
> +	(
> +		cd git-bisect &&
> +		test_completion "git bisect start final initial -- " ""
> +	)
> +'
> +
> +test_expect_success 'setup for git-bisect tests requiring ongoing bisection' '
> +	(
> +		cd git-bisect &&
> +		git bisect start --term-new=custom_new --term-old=custom_old final initial
> +	)
> +'
> +
> +test_expect_success 'git-bisect - when bisecting all subcommands are candidates' '
> +	(
> +		cd git-bisect &&
> +		test_completion "git bisect " <<-\EOF
> +		start Z
> +		bad Z
> +		custom_new Z
> +		custom_old Z
> +		new Z
> +		good Z
> +		old Z
> +		terms Z
> +		skip Z
> +		reset Z
> +		visualize Z
> +		replay Z
> +		log Z
> +		run Z
> +		help Z
> +		EOF
> +	)
> +'
> +test_expect_success 'git-bisect - options to terms subcommand are candidates' '
> +	(
> +		cd git-bisect &&
> +		test_completion "git bisect terms --" <<-\EOF
> +		--term-bad Z
> +		--term-good Z
> +		--term-new Z
> +		--term-old Z
> +		EOF
> +	)
> +'
> +
> +test_expect_success 'git-bisect - git-log options to visualize subcommand are candidates' '
> +	(
> +		cd git-bisect &&
> +		# The completion used for git-log and here does not complete
> +		# every git-log option, so rather than hope to stay in sync
> +		# with exactly what it does we will just spot-test here.
> +		test_completion "git bisect visualize --sta" <<-\EOF &&
> +		--stat Z
> +		EOF
> +		test_completion "git bisect visualize --summar" <<-\EOF
> +		--summary Z
> +		EOF
> +	)
> +'
> +
> +test_expect_success 'git-bisect - view subcommand is not a candidate' '
> +	(
> +		cd git-bisect &&
> +		test_completion "git bisect vi" <<-\EOF
> +		visualize Z
> +		EOF
> +	)
> +'
> +
> +test_expect_success 'git-bisect - existing view subcommand is recognized and enables completion of git-log options' '
> +	(
> +		cd git-bisect &&
> +		# The completion used for git-log and here does not complete
> +		# every git-log option, so rather than hope to stay in sync
> +		# with exactly what it does we will just spot-test here.
> +		test_completion "git bisect view --sta" <<-\EOF &&
> +		--stat Z
> +		EOF
> +		test_completion "git bisect view --summar" <<-\EOF
> +		--summary Z
> +		EOF
> +	)
> +'
> +
>  test_expect_success 'git checkout - completes refs and unique remote branches for DWIM' '
>  	test_completion "git checkout " <<-\EOF
>  	HEAD Z
> -- 
> 2.43.0
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* [PATCH 0/7] reftable: improve ref iteration performance
From: Patrick Steinhardt @ 2024-02-01 10:24 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 1334 bytes --]

Hi,

this patch series aims to improve performance when iterating over many
refs with the reftable backend. It mostly does so by trying to reduce
the number of allocations and avoiding useless copying of memory where
possible.

I've been careful to avoid conflicts with any in-flight topics, so it
should be fine for all of the topics to go in indepentently of each
other. The benchmarks have all been done with ps/reftable-backend at
066dced0b1 (ci: add jobs to test with the reftable backend, 2024-01-30).

Patrick

Patrick Steinhardt (7):
  reftable/record: introduce function to compare records by key
  reftable/merged: allocation-less dropping of shadowed records
  reftable/merged: skip comparison for records of the same subiter
  reftable/pq: allocation-less comparison of entry keys
  reftable/block: swap buffers instead of copying
  reftable/record: don't try to reallocate ref record name
  reftable/reader: add comments to `table_iter_next()`

 reftable/block.c  |  3 +--
 reftable/merged.c | 19 +++++++-------
 reftable/merged.h |  2 --
 reftable/pq.c     | 13 +--------
 reftable/reader.c | 26 +++++++++++-------
 reftable/record.c | 67 ++++++++++++++++++++++++++++++++++++++++++++---
 reftable/record.h |  7 +++++
 7 files changed, 100 insertions(+), 37 deletions(-)

-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* [PATCH 1/7] reftable/record: introduce function to compare records by key
From: Patrick Steinhardt @ 2024-02-01 10:24 UTC (permalink / raw)
  To: git
In-Reply-To: <cover.1706782841.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 6451 bytes --]

In some places we need to sort reftable records by their keys to
determine their ordering. This is done by first formatting the keys into
a `struct strbuf` and then using `strbuf_cmp()` to compare them. This
logic is needlessly roundabout and can end up costing quite a bit fo CPU
cycles, both due to the allocation and formatting logic.

Introduce a new `reftable_record_cmp()` function that knows to compare
two records with each other without requiring allocations.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/record.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++-
 reftable/record.h |  7 ++++++
 2 files changed, 68 insertions(+), 1 deletion(-)

diff --git a/reftable/record.c b/reftable/record.c
index 5c3fbb7b2a..f1b6a5eac9 100644
--- a/reftable/record.c
+++ b/reftable/record.c
@@ -430,7 +430,6 @@ static int reftable_ref_record_is_deletion_void(const void *p)
 		(const struct reftable_ref_record *)p);
 }
 
-
 static int reftable_ref_record_equal_void(const void *a,
 					  const void *b, int hash_size)
 {
@@ -439,6 +438,13 @@ static int reftable_ref_record_equal_void(const void *a,
 	return reftable_ref_record_equal(ra, rb, hash_size);
 }
 
+static int reftable_ref_record_cmp_void(const void *_a, const void *_b)
+{
+	const struct reftable_ref_record *a = _a;
+	const struct reftable_ref_record *b = _b;
+	return strcmp(a->refname, b->refname);
+}
+
 static void reftable_ref_record_print_void(const void *rec,
 					   int hash_size)
 {
@@ -455,6 +461,7 @@ static struct reftable_record_vtable reftable_ref_record_vtable = {
 	.release = &reftable_ref_record_release_void,
 	.is_deletion = &reftable_ref_record_is_deletion_void,
 	.equal = &reftable_ref_record_equal_void,
+	.cmp = &reftable_ref_record_cmp_void,
 	.print = &reftable_ref_record_print_void,
 };
 
@@ -625,6 +632,25 @@ static int reftable_obj_record_equal_void(const void *a, const void *b, int hash
 	return 1;
 }
 
+static int reftable_obj_record_cmp_void(const void *_a, const void *_b)
+{
+	const struct reftable_obj_record *a = _a;
+	const struct reftable_obj_record *b = _b;
+	int cmp;
+
+	cmp = memcmp(a->hash_prefix, b->hash_prefix,
+		     a->hash_prefix_len > b->hash_prefix_len ?
+		     a->hash_prefix_len : b->hash_prefix_len);
+	if (cmp)
+		return cmp;
+
+	/*
+	 * When the prefix is the same then the object record that is longer is
+	 * considered to be bigger.
+	 */
+	return a->hash_prefix_len - b->hash_prefix_len;
+}
+
 static struct reftable_record_vtable reftable_obj_record_vtable = {
 	.key = &reftable_obj_record_key,
 	.type = BLOCK_TYPE_OBJ,
@@ -635,6 +661,7 @@ static struct reftable_record_vtable reftable_obj_record_vtable = {
 	.release = &reftable_obj_record_release,
 	.is_deletion = &not_a_deletion,
 	.equal = &reftable_obj_record_equal_void,
+	.cmp = &reftable_obj_record_cmp_void,
 	.print = &reftable_obj_record_print,
 };
 
@@ -953,6 +980,22 @@ static int reftable_log_record_equal_void(const void *a,
 					 hash_size);
 }
 
+static int reftable_log_record_cmp_void(const void *_a, const void *_b)
+{
+	const struct reftable_log_record *a = _a;
+	const struct reftable_log_record *b = _b;
+	int cmp = strcmp(a->refname, b->refname);
+	if (cmp)
+		return cmp;
+
+	/*
+	 * Note that the comparison here is reversed. This is because the
+	 * update index is reversed when comparing keys. For reference, see how
+	 * we handle this in reftable_log_record_key()`.
+	 */
+	return b->update_index - a->update_index;
+}
+
 int reftable_log_record_equal(const struct reftable_log_record *a,
 			      const struct reftable_log_record *b, int hash_size)
 {
@@ -1002,6 +1045,7 @@ static struct reftable_record_vtable reftable_log_record_vtable = {
 	.release = &reftable_log_record_release_void,
 	.is_deletion = &reftable_log_record_is_deletion_void,
 	.equal = &reftable_log_record_equal_void,
+	.cmp = &reftable_log_record_cmp_void,
 	.print = &reftable_log_record_print_void,
 };
 
@@ -1077,6 +1121,13 @@ static int reftable_index_record_equal(const void *a, const void *b, int hash_si
 	return ia->offset == ib->offset && !strbuf_cmp(&ia->last_key, &ib->last_key);
 }
 
+static int reftable_index_record_cmp(const void *_a, const void *_b)
+{
+	const struct reftable_index_record *a = _a;
+	const struct reftable_index_record *b = _b;
+	return strbuf_cmp(&a->last_key, &b->last_key);
+}
+
 static void reftable_index_record_print(const void *rec, int hash_size)
 {
 	const struct reftable_index_record *idx = rec;
@@ -1094,6 +1145,7 @@ static struct reftable_record_vtable reftable_index_record_vtable = {
 	.release = &reftable_index_record_release,
 	.is_deletion = &not_a_deletion,
 	.equal = &reftable_index_record_equal,
+	.cmp = &reftable_index_record_cmp,
 	.print = &reftable_index_record_print,
 };
 
@@ -1147,6 +1199,14 @@ int reftable_record_is_deletion(struct reftable_record *rec)
 		reftable_record_data(rec));
 }
 
+int reftable_record_cmp(struct reftable_record *a, struct reftable_record *b)
+{
+	if (a->type != b->type)
+		BUG("cannot compare reftable records of different type");
+	return reftable_record_vtable(a)->cmp(
+		reftable_record_data(a), reftable_record_data(b));
+}
+
 int reftable_record_equal(struct reftable_record *a, struct reftable_record *b, int hash_size)
 {
 	if (a->type != b->type)
diff --git a/reftable/record.h b/reftable/record.h
index fd80cd451d..0d96fbfd1b 100644
--- a/reftable/record.h
+++ b/reftable/record.h
@@ -62,6 +62,12 @@ struct reftable_record_vtable {
 	/* Are two records equal? This assumes they have the same type. Returns 0 for non-equal. */
 	int (*equal)(const void *a, const void *b, int hash_size);
 
+	/*
+	 * Compare keys of two records with each other. The records must have
+	 * the same type.
+	 */
+	int (*cmp)(const void *a, const void *b);
+
 	/* Print on stdout, for debugging. */
 	void (*print)(const void *rec, int hash_size);
 };
@@ -114,6 +120,7 @@ struct reftable_record {
 };
 
 /* see struct record_vtable */
+int reftable_record_cmp(struct reftable_record *a, struct reftable_record *b);
 int reftable_record_equal(struct reftable_record *a, struct reftable_record *b, int hash_size);
 void reftable_record_print(struct reftable_record *rec, int hash_size);
 void reftable_record_key(struct reftable_record *rec, struct strbuf *dest);
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* [PATCH 2/7] reftable/merged: allocation-less dropping of shadowed records
From: Patrick Steinhardt @ 2024-02-01 10:25 UTC (permalink / raw)
  To: git
In-Reply-To: <cover.1706782841.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 4576 bytes --]

The purpose of the merged reftable iterator is to iterate through all
entries of a set of tables in the correct order. This is implemented by
using a sub-iterator for each table, where the next entry of each of
these iterators gets put into a priority queue. For each iteration, we
do roughly the following steps:

  1. Retrieve the top record of the priority queue. This is the entry we
     want to return to the caller.

  2. Retrieve the next record of the sub-iterator that this record came
     from. If any, add it to the priority queue at the correct position.
     The position is determined by comparing the record keys, which e.g.
     corresponds to the refname for ref records.

  3. Keep removing the top record of the priority queue until we hit the
     first entry whose key is larger than the returned record's key.
     This is required to drop "shadowed" records.

The last step will lead to at least one comparison to the next entry,
but may lead to many comparisons in case the reftable stack consists of
many tables with shadowed records. It is thus part of the hot code path
when iterating through records.

The code to compare the entries with each other is quite inefficient
though. Instead of comparing record keys with each other directly, we
first format them into `struct strbuf`s and only then compare them with
each other. While we already optimized this code path to reuse buffers
in 829231dc20 (reftable/merged: reuse buffer to compute record keys,
2023-12-11), the cost to format the keys into the buffers still adds up
quite significantly.

Refactor the code to use `reftable_record_cmp()` instead, which has been
introduced in the preceding commit. This function compares records with
each other directly without requiring any memory allocations or copying
and is thus way more efficient.

The following benchmark uses git-show-ref(1) to print a single ref
matching a pattern out of 1 million refs. This is the most direct way to
exercise ref iteration speed as we remove all overhead of having to show
the refs, too.

    Benchmark 1: show-ref: single matching ref (revision = HEAD~)
      Time (mean ± σ):     180.7 ms ±   4.7 ms    [User: 177.1 ms, System: 3.4 ms]
      Range (min … max):   174.9 ms … 211.7 ms    1000 runs

    Benchmark 2: show-ref: single matching ref (revision = HEAD)
      Time (mean ± σ):     162.1 ms ±   4.4 ms    [User: 158.5 ms, System: 3.4 ms]
      Range (min … max):   155.4 ms … 189.3 ms    1000 runs

    Summary
      show-ref: single matching ref (revision = HEAD) ran
        1.11 ± 0.04 times faster than show-ref: single matching ref (revision = HEAD~)

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/merged.c | 11 ++---------
 reftable/merged.h |  2 --
 2 files changed, 2 insertions(+), 11 deletions(-)

diff --git a/reftable/merged.c b/reftable/merged.c
index c258ce953e..fb9978d798 100644
--- a/reftable/merged.c
+++ b/reftable/merged.c
@@ -51,8 +51,6 @@ static void merged_iter_close(void *p)
 		reftable_iterator_destroy(&mi->stack[i]);
 	}
 	reftable_free(mi->stack);
-	strbuf_release(&mi->key);
-	strbuf_release(&mi->entry_key);
 }
 
 static int merged_iter_advance_nonnull_subiter(struct merged_iter *mi,
@@ -105,14 +103,11 @@ static int merged_iter_next_entry(struct merged_iter *mi,
 	  such a deployment, the loop below must be changed to collect all
 	  entries for the same key, and return new the newest one.
 	*/
-	reftable_record_key(&entry.rec, &mi->entry_key);
 	while (!merged_iter_pqueue_is_empty(mi->pq)) {
 		struct pq_entry top = merged_iter_pqueue_top(mi->pq);
-		int cmp = 0;
+		int cmp;
 
-		reftable_record_key(&top.rec, &mi->key);
-
-		cmp = strbuf_cmp(&mi->key, &mi->entry_key);
+		cmp = reftable_record_cmp(&top.rec, &entry.rec);
 		if (cmp > 0)
 			break;
 
@@ -246,8 +241,6 @@ static int merged_table_seek_record(struct reftable_merged_table *mt,
 		.typ = reftable_record_type(rec),
 		.hash_id = mt->hash_id,
 		.suppress_deletions = mt->suppress_deletions,
-		.key = STRBUF_INIT,
-		.entry_key = STRBUF_INIT,
 	};
 	int n = 0;
 	int err = 0;
diff --git a/reftable/merged.h b/reftable/merged.h
index d5b39dfe7f..7d9f95d27e 100644
--- a/reftable/merged.h
+++ b/reftable/merged.h
@@ -31,8 +31,6 @@ struct merged_iter {
 	uint8_t typ;
 	int suppress_deletions;
 	struct merged_iter_pqueue pq;
-	struct strbuf key;
-	struct strbuf entry_key;
 };
 
 void merged_table_release(struct reftable_merged_table *mt);
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* [PATCH 3/7] reftable/merged: skip comparison for records of the same subiter
From: Patrick Steinhardt @ 2024-02-01 10:25 UTC (permalink / raw)
  To: git
In-Reply-To: <cover.1706782841.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 2552 bytes --]

When retrieving the next entry of a merged iterator we need to drop all
records of other sub-iterators that would be shadowed by the record that
we are about to return. We do this by comparing record keys, dropping
all keys that are smaller or equal to the key of the record we are about
to return.

There is an edge case here where we can skip that comparison: when the
record in the priority queue comes from the same subiterator than the
record we are about to return then we know that its key must be larger
than the key of the record we are about to return. This property is
guaranteed by the sub-iterators, and if it didn't hold then the whole
merged iterator would return records in the wrong order, too.

While this may seem like a very specific edge case it's in fact quite
likely to happen. For most repositories out there you can assume that we
will end up with one large table and several smaller ones on top of it.
Thus, it is very likely that the next entry will sort towards the top of
the priority queue.

Special case this and break out of the loop in that case. The following
benchmark uses git-show-ref(1) to print a single ref matching a pattern
out of 1 million refs:

  Benchmark 1: show-ref: single matching ref (revision = HEAD~)
    Time (mean ± σ):     162.6 ms ±   4.5 ms    [User: 159.0 ms, System: 3.5 ms]
    Range (min … max):   156.6 ms … 188.5 ms    1000 runs

  Benchmark 2: show-ref: single matching ref (revision = HEAD)
    Time (mean ± σ):     156.8 ms ±   4.7 ms    [User: 153.0 ms, System: 3.6 ms]
    Range (min … max):   151.4 ms … 188.4 ms    1000 runs

  Summary
    show-ref: single matching ref (revision = HEAD) ran
      1.04 ± 0.04 times faster than show-ref: single matching ref (revision = HEAD~)

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/merged.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/reftable/merged.c b/reftable/merged.c
index fb9978d798..0f74a14a77 100644
--- a/reftable/merged.c
+++ b/reftable/merged.c
@@ -107,6 +107,14 @@ static int merged_iter_next_entry(struct merged_iter *mi,
 		struct pq_entry top = merged_iter_pqueue_top(mi->pq);
 		int cmp;
 
+		/*
+		 * When the next entry comes from the same queue as the current
+		 * entry then it must by definition be larger. This avoids a
+		 * comparison in the most common case.
+		 */
+		if (top.index == entry.index)
+			break;
+
 		cmp = reftable_record_cmp(&top.rec, &entry.rec);
 		if (cmp > 0)
 			break;
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* [PATCH 4/7] reftable/pq: allocation-less comparison of entry keys
From: Patrick Steinhardt @ 2024-02-01 10:25 UTC (permalink / raw)
  To: git
In-Reply-To: <cover.1706782841.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 2853 bytes --]

The priority queue is used by the merged iterator to iterate over
reftable records from multiple tables in the correct order. The queue
ends up having one record for each table that is being iterated over,
with the record that is supposed to be shown next at the top. For
example, the key of a ref record is equal to its name so that we end up
sorting the priority queue lexicographically by ref name.

To figure out the order we need to compare the reftable record keys with
each other. This comparison is done by formatting them into a `struct
strbuf` and then doing `strbuf_strcmp()` on the result. We then discard
the buffers immediately after the comparison.

This ends up being very expensive. Because the priority queue usually
contains as many records as we have tables, we call the comparison
function `O(log($tablecount))` many times for every record we insert.
Furthermore, when iterating over many refs, we will insert at least one
record for every ref we are iterating over. So ultimately, this ends up
being called `O($refcount * log($tablecount))` many times.

Refactor the code to use the new `refatble_record_cmp()` function that
has been implemented in a preceding commit. This function does not need
to allocate memory and is thus significantly more efficient.

The following benchmark prints a single ref matching a specific pattern
out of 1 million refs via git-show-ref(1), where the reftable stack
consists of three tables:

  Benchmark 1: show-ref: single matching ref (revision = HEAD~)
    Time (mean ± σ):     224.4 ms ±   6.5 ms    [User: 220.6 ms, System: 3.6 ms]
    Range (min … max):   216.5 ms … 261.1 ms    1000 runs

  Benchmark 2: show-ref: single matching ref (revision = HEAD)
    Time (mean ± σ):     172.9 ms ±   4.4 ms    [User: 169.2 ms, System: 3.6 ms]
    Range (min … max):   166.5 ms … 204.6 ms    1000 runs

  Summary
    show-ref: single matching ref (revision = HEAD) ran
      1.30 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~)

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/pq.c | 13 +------------
 1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/reftable/pq.c b/reftable/pq.c
index dcefeb793a..7220efc39a 100644
--- a/reftable/pq.c
+++ b/reftable/pq.c
@@ -14,20 +14,9 @@ license that can be found in the LICENSE file or at
 
 int pq_less(struct pq_entry *a, struct pq_entry *b)
 {
-	struct strbuf ak = STRBUF_INIT;
-	struct strbuf bk = STRBUF_INIT;
-	int cmp = 0;
-	reftable_record_key(&a->rec, &ak);
-	reftable_record_key(&b->rec, &bk);
-
-	cmp = strbuf_cmp(&ak, &bk);
-
-	strbuf_release(&ak);
-	strbuf_release(&bk);
-
+	int cmp = reftable_record_cmp(&a->rec, &b->rec);
 	if (cmp == 0)
 		return a->index > b->index;
-
 	return cmp < 0;
 }
 
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* [PATCH 5/7] reftable/block: swap buffers instead of copying
From: Patrick Steinhardt @ 2024-02-01 10:25 UTC (permalink / raw)
  To: git
In-Reply-To: <cover.1706782841.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 2068 bytes --]

When iterating towards the next record in a reftable block we need to
keep track of the key that the last record had. This is required because
reftable records use prefix compression, where subsequent records may
reuse parts of their preceding record's key.

This key is stored in the `block_iter::last_key`, which we update after
every call to `block_iter_next()`: we simply reset the buffer and then
add the current key to it.

This is a bit inefficient though because it requires us to copy over the
key on every iteration, which adds up when iterating over many records.
Instead, we can make use of the fact that the `block_iter::key` buffer
is basically only a scratch buffer. So instead of copying over contents,
we can just swap both buffers.

The following benchmark prints a single ref matching a specific pattern
out of 1 million refs via git-show-ref(1):

  Benchmark 1: show-ref: single matching ref (revision = HEAD~)
    Time (mean ± σ):     155.7 ms ±   5.0 ms    [User: 152.1 ms, System: 3.4 ms]
    Range (min … max):   150.8 ms … 185.7 ms    1000 runs

  Benchmark 2: show-ref: single matching ref (revision = HEAD)
    Time (mean ± σ):     150.8 ms ±   4.2 ms    [User: 147.1 ms, System: 3.5 ms]
    Range (min … max):   145.1 ms … 180.7 ms    1000 runs

  Summary
    show-ref: single matching ref (revision = HEAD) ran
      1.03 ± 0.04 times faster than show-ref: single matching ref (revision = HEAD~)

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/block.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/reftable/block.c b/reftable/block.c
index 1df3d8a0f0..44381ea6a3 100644
--- a/reftable/block.c
+++ b/reftable/block.c
@@ -342,8 +342,7 @@ int block_iter_next(struct block_iter *it, struct reftable_record *rec)
 		return -1;
 	string_view_consume(&in, n);
 
-	strbuf_reset(&it->last_key);
-	strbuf_addbuf(&it->last_key, &it->key);
+	strbuf_swap(&it->last_key, &it->key);
 	it->next_off += start.len - in.len;
 	return 0;
 }
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* [PATCH 6/7] reftable/record: don't try to reallocate ref record name
From: Patrick Steinhardt @ 2024-02-01 10:25 UTC (permalink / raw)
  To: git
In-Reply-To: <cover.1706782841.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 2046 bytes --]

When decoding reftable ref records we first release the pointer to the
record passed to us and then use realloc(3P) to allocate the refname
array. This is a bit misleading though as we know at that point that the
refname will always be `NULL`, so we would always end up allocating a
new char array anyway.

Refactor the code to use `REFTABLE_ALLOC_ARRAY()` instead. As the
following benchmark demonstrates this is a tiny bit more efficient. But
the bigger selling point really is the gained clarity.

  Benchmark 1: show-ref: single matching ref (revision = HEAD~)
    Time (mean ± σ):     150.1 ms ±   4.1 ms    [User: 146.6 ms, System: 3.3 ms]
    Range (min … max):   144.5 ms … 180.5 ms    1000 runs

  Benchmark 2: show-ref: single matching ref (revision = HEAD)
    Time (mean ± σ):     148.9 ms ±   4.5 ms    [User: 145.2 ms, System: 3.4 ms]
    Range (min … max):   143.0 ms … 185.4 ms    1000 runs

  Summary
    show-ref: single matching ref (revision = HEAD) ran
      1.01 ± 0.04 times faster than show-ref: single matching ref (revision = HEAD~)

Ideally, we should try and reuse the memory of the old record instead of
first freeing and then immediately reallocating it. This requires some
more surgery though and is thus left for a future iteration.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/record.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/reftable/record.c b/reftable/record.c
index f1b6a5eac9..6465a7b8f4 100644
--- a/reftable/record.c
+++ b/reftable/record.c
@@ -377,10 +377,11 @@ static int reftable_ref_record_decode(void *rec, struct strbuf key,
 
 	assert(hash_size > 0);
 
-	r->refname = reftable_realloc(r->refname, key.len + 1);
+	r->refname = reftable_malloc(key.len + 1);
 	memcpy(r->refname, key.buf, key.len);
-	r->update_index = update_index;
 	r->refname[key.len] = 0;
+
+	r->update_index = update_index;
 	r->value_type = val_type;
 	switch (val_type) {
 	case REFTABLE_REF_VAL1:
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* [PATCH 7/7] reftable/reader: add comments to `table_iter_next()`
From: Patrick Steinhardt @ 2024-02-01 10:25 UTC (permalink / raw)
  To: git
In-Reply-To: <cover.1706782841.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 1764 bytes --]

While working on the optimizations in the preceding patches I stumbled
upon `table_iter_next()` multiple times. It is quite easy to miss the
fact that we don't call `table_iter_next_in_block()` twice, but that the
second call is in fact `table_iter_next_block()`.

Add comments to explain what exactly is going on here to make things
more obvious. While at it, touch up the code to conform to our code
style better.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/reader.c | 26 +++++++++++++++++---------
 1 file changed, 17 insertions(+), 9 deletions(-)

diff --git a/reftable/reader.c b/reftable/reader.c
index 64dc366fb1..add7d57f0b 100644
--- a/reftable/reader.c
+++ b/reftable/reader.c
@@ -357,24 +357,32 @@ static int table_iter_next(struct table_iter *ti, struct reftable_record *rec)
 
 	while (1) {
 		struct table_iter next = TABLE_ITER_INIT;
-		int err = 0;
-		if (ti->is_finished) {
+		int err;
+
+		if (ti->is_finished)
 			return 1;
-		}
 
+		/*
+		 * Check whether the current block still has more records. If
+		 * so, return it. If the iterator returns positive then the
+		 * current block has been exhausted.
+		 */
 		err = table_iter_next_in_block(ti, rec);
-		if (err <= 0) {
+		if (err <= 0)
 			return err;
-		}
 
+		/*
+		 * Otherwise, we need to continue to the next block in the
+		 * table and retry. If there are no more blocks then the
+		 * iterator is drained.
+		 */
 		err = table_iter_next_block(&next, ti);
-		if (err != 0) {
-			ti->is_finished = 1;
-		}
 		table_iter_block_done(ti);
-		if (err != 0) {
+		if (err) {
+			ti->is_finished = 1;
 			return err;
 		}
+
 		table_iter_copy_from(ti, &next);
 		block_iter_close(&next.bi);
 	}
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* Re: [PATCH v2 3/3] t/Makefile: get UNIT_TESTS list from C sources
From: Phillip Wood @ 2024-02-01 10:50 UTC (permalink / raw)
  To: Jeff King, git
  Cc: Phillip Wood, SZEDER Gábor, Junio C Hamano, Adam Dinwoodie,
	Patrick Steinhardt
In-Reply-To: <20240130054037.GC166699@coredump.intra.peff.net>

Hi Peff

On 30/01/2024 05:40, Jeff King wrote:
> We decide on the set of unit tests to run by asking make to expand the
> wildcard "t/unit-tests/bin/*". One unfortunate outcome of this is that
> we'll run anything in that directory, even if it is leftover cruft from
> a previous build. This isn't _quite_ as bad as it sounds, since in
> theory the unit tests executables are self-contained (so if they passed
> before, they'll pass even though they now have nothing to do with the
> checked out version of Git). But at the very least it's wasteful, and if
> they _do_ fail it can be quite confusing to understand why they are
> being run at all.
> 
> This wildcarding presumably came from our handling of the regular
> shell-script tests, which use $(wildcard t[0-9][0-9][0-9][0-9]-*.sh).
> But the difference there is that those are actual tracked files. So if
> you checkout a different commit, they'll go away. Whereas the contents
> of unit-tests/bin are ignored (so not only do they stick around, but you
> are not even warned of the stale files via "git status").
> 
> This patch fixes the situation by looking for the actual unit-test
> source files and then massaging those names into the final executable
> names. This has two additional benefits:
> 
>    1. It will notice if we failed to build one or more unit-tests for
>       some reason (wheras the current code just runs whatever made it to
>       the bin/ directory).

The downside to this is that if there are any cruft C files lying about 
t/unit-tests we'll fail to run the unit tests. In the past we've avoided 
using wildcard rules on C sources to avoid problems like this[1]. This 
change may well be the lesser of two evils but a test run that fails due 
to a cruft C file cannot be fixed by "make clean && make" whereas that 
will fix the problem of a stale test executable.

>    2. The wildcard should avoid other build cruft, like the pdb files we
>       worked around in 0df903d402 (unit-tests: do not mistake `.pdb`
>       files for being executable, 2023-09-25).
> 
> Our new wildcard does make an assumption that unit tests are built from
> C sources. It would be a bit cleaner if we consulted UNIT_TEST_PROGRAMS
> from the top-level Makefile. But doing so is tricky unless we reorganize
> that Makefile to split the source file lists into include-able subfiles.
> That might be worth doing in general, but in the meantime, the
> assumptions made by the wildcard here seems reasonable.

Using UNIT_TEST_PROGRAMS would definitely be a nicer approach long term.

Best Wishes

Phillip

[1] https://lore.kernel.org/git/xmqqtugl102l.fsf@gitster.g/


^ permalink raw reply

* Re: [PATCH 1/4] sequencer: Do not require `allow_empty` for redundant commit options
From: Phillip Wood @ 2024-02-01 10:57 UTC (permalink / raw)
  To: Brian Lyles, phillip.wood; +Cc: git, me, newren, gitster
In-Reply-To: <CAHPHrSeKY2Ou9VCq6rtADTOwycc0KCTPaCCwtqf94yLi0wj0OQ@mail.gmail.com>

Hi Brian

On 27/01/2024 23:30, Brian Lyles wrote:
> On Tue, Jan 23, 2024 at 8:23 AM Phillip Wood <phillip.wood123@gmail.com> wrote:
>> On 19/01/2024 05:59, brianmlyles@gmail.com wrote:
>>> From: Brian Lyles <brianmlyles@gmail.com>
>>>
>>> Previously, a consumer of the sequencer that wishes to take advantage of
>>> either the `keep_redundant_commits` or `drop_redundant_commits` feature
>>> must also specify `allow_empty`.
>>>
>>> The only consumer of `drop_redundant_commits` is `git-rebase`, which
>>> already allows empty commits by default and simply always enables
>>> `allow_empty`. `keep_redundant_commits` was also consumed by
>>> `git-cherry-pick`, which had to specify `allow-empty` when
>>> `keep_redundant_commits` was specified in order for the sequencer's
>>> `allow_empty()` to actually respect `keep_redundant_commits`.
>>
>> I think it might be more persuasive to start the commit message by
>> explaining what user visible change you're trying to make and why rather
>> than concentrating on the implementation details.
> 
> I struggled a bit with this initially because the motivation behind the
> change in this particular commit was driven by a technical issue in my
> mind. The side-effect with git-cherry-pick(s) `--allow-empty` and
> `--keep-redundant-commits` was mildly problematic, but less concerning
> that the future problem that we'd have once git-cherry-pick(1) got the
> more robust `--empty` option in a later commit in this series.
> 
> I think my problem came down to this commit trying to solve two problems
> at once -- the underlying technical concern _and_ the git-cherry-pick(1)
> behavior.
> 
> In v2, I intend to break this commit into two:
> 
> - Update `allow_empty()` to not require `allow_empty`, but without
>    actually changing any consumers (and thus without making any
>    functional change)
> - Update git-cherry-pick(1) such that `--keep-redundant-commits` no
>    longer implies `--allow-empty`.
> 
> This allows me to better justify the technical change technically and
> the functional change functionally, while also making it easier to drop
> the functional change if we decide that a breaking change is not
> warranted to address this.

That sounds like a good strategy. I'm wondering if we should not change 
the behavior of `--keep-redundant-commits` to avoid breaking existing 
users but have `--empty=keep|drop` not imply `--allow-empty`. What ever 
we do we'll annoy someone. It is confusing to have subtly different 
behaviors for `--keep-redundant-commits` and `--empty=keep` but it 
avoids breaking existing users. If we change `--keep-redundant-commits` 
we potentially upset existing users but we don't confuse others with the 
subtle difference between the two.

>> Do you have a practical example of where you want to keep the commits
>> that become empty but not the ones that start empty? I agree there is a
>> distinction but I think the common case is that the user wants to keep
>> both types of empty commit or none. I'm not against giving the user the
>> option to keep one or the other if it is useful but I'm wary of changing
>> the default.
> 
> That practical example is documented in the initial discussion[1], which
> I should have ought to have linked in a cover letter for this series
> (and will do so in v2). I'll avoid copying the details here, but we'd
> very much like to be able to programmatically drop the commits that
> become empty when doing the automated cherry-pick described there.
> 
> [1]: https://lore.kernel.org/git/CAHPHrSevBdQF0BisR8VK=jM=wj1dTUYEVrv31gLerAzL9=Cd8Q@mail.gmail.com/

Maybe I've missed something but that seems to be an argument for 
implementing `--empty=drop` which is completely reasonable but doesn't 
explain why someone using `--keep-redundant-commits` would want to keep 
the commits that become empty while dropping the commits that start empty.

> [...]
>> Thank you for being clear about the change in behavior, as I said above
>> I'm wary of changing the default unless there is a compelling reason but
>> I'm happy to support
>>
>>       git cherry-pick --keep-redundant-commits --no-allow-empty
>>
>> if it is needed.
> 
> I totally understand being wary here.
> 
> I've certainly convinced myself that having the future `--empty=drop`
> behavior introduced later in this patch should not imply
> `--allow-empty`.

I agree with that

> I also _think_ that the existing behavior of `--keep-redundant-commits`
> is probably technically not ideal or correct, but could be convinced
> that changing it now is not worthwhile. I will defer to group consensus
> here.

There is definitely an argument that the existing behavior conflates the 
two flavors of empty commit. I think one can also argue that the 
conflation is beneficial as most of the time users don't care about the 
distinction when using `--keep-redundant-commits` and so it is not worth 
changing the behavior of the existing option.

I sounds like you've got a good plan for v2, I look forward to reading it.

Best Wishes

Phillip

^ permalink raw reply

* [PATCH 0/3] rev-list: allow missing tips with --missing
From: Christian Couder @ 2024-02-01 11:58 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Patrick Steinhardt, John Cai, Christian Couder

A recent patch series, kn/rev-list-missing-fix [1] extended the
`--missing` option in git-rev-list(1) to support commit objects.

Unfortunately, git-rev-list(1) with `--missing` set to something other
than 'error' still fails, usually with a "fatal: bad object <oid>"
error message, when a missing object is passed as an argument.

This patch series removes this limitation and when using
`--missing=print` allows all missing objects to be printed including
those that are passed as arguments.

Patches 1/3 (revision: clarify a 'return NULL' in get_reference()) and
2/3 (t6022: fix 'even though' typo in comment) are very small
preparatory cleanups.

Patch 3/3 (rev-list: add --allow-missing-tips to be used with
--missing=...) introduces the new `--allow-missing-tips` option that
allows git-rev-list(1) with `--missing` to not fail when one of the
tips it is passed is missing.

[1] https://lore.kernel.org/git/20231026101109.43110-1-karthik.188@gmail.com/

Christian Couder (3):
  revision: clarify a 'return NULL' in get_reference()
  t6022: fix 'even though' typo in comment
  rev-list: add --allow-missing-tips to be used with --missing=...

 builtin/rev-list.c          | 24 ++++++++++++++++-
 revision.c                  | 11 +++++---
 revision.h                  |  8 ++++++
 t/t6022-rev-list-missing.sh | 53 ++++++++++++++++++++++++++++++++++++-
 4 files changed, 90 insertions(+), 6 deletions(-)

-- 
2.43.0.496.gd667eb0d7d.dirty


^ permalink raw reply

* [PATCH 1/3] revision: clarify a 'return NULL' in get_reference()
From: Christian Couder @ 2024-02-01 11:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Patrick Steinhardt, John Cai, Christian Couder,
	Christian Couder
In-Reply-To: <20240201115809.1177064-1-christian.couder@gmail.com>

In general when we know a pointer variable is NULL, it's clearer to
explicitely return NULL than to return that variable.

In get_reference() when 'object' is NULL, we already return NULL
when 'revs->exclude_promisor_objects && is_promisor_object(oid)' is
true, but we return 'object' when 'revs->ignore_missing' is true.

Let's make the code clearer and more uniform by also explicitely
returning NULL when 'revs->ignore_missing' is true.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 revision.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/revision.c b/revision.c
index 2424c9bd67..4c5cd7c3ce 100644
--- a/revision.c
+++ b/revision.c
@@ -385,7 +385,7 @@ static struct object *get_reference(struct rev_info *revs, const char *name,
 
 	if (!object) {
 		if (revs->ignore_missing)
-			return object;
+			return NULL;
 		if (revs->exclude_promisor_objects && is_promisor_object(oid))
 			return NULL;
 		die("bad object %s", name);
-- 
2.43.0.496.gd667eb0d7d.dirty


^ permalink raw reply related

* [PATCH 2/3] t6022: fix 'even though' typo in comment
From: Christian Couder @ 2024-02-01 11:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Patrick Steinhardt, John Cai, Christian Couder,
	Christian Couder
In-Reply-To: <20240201115809.1177064-1-christian.couder@gmail.com>

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 t/t6022-rev-list-missing.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t6022-rev-list-missing.sh b/t/t6022-rev-list-missing.sh
index 211672759a..527aa94f07 100755
--- a/t/t6022-rev-list-missing.sh
+++ b/t/t6022-rev-list-missing.sh
@@ -46,7 +46,7 @@ do
 			git rev-list --objects --no-object-names \
 				HEAD ^$obj >expect.raw &&
 
-			# Blobs are shared by all commits, so evethough a commit/tree
+			# Blobs are shared by all commits, so even though a commit/tree
 			# might be skipped, its blob must be accounted for.
 			if [ $obj != "HEAD:1.t" ]; then
 				echo $(git rev-parse HEAD:1.t) >>expect.raw &&
-- 
2.43.0.496.gd667eb0d7d.dirty


^ permalink raw reply related

* [PATCH 3/3] rev-list: add --allow-missing-tips to be used with --missing=...
From: Christian Couder @ 2024-02-01 11:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Patrick Steinhardt, John Cai, Christian Couder,
	Christian Couder
In-Reply-To: <20240201115809.1177064-1-christian.couder@gmail.com>

In 9830926c7d (rev-list: add commit object support in `--missing`
option, 2023-10-27) we fixed the `--missing` option in `git rev-list`
so that it now works with commits too.

Unfortunately, such a command would still fail with a "fatal: bad
object <oid>" if it is passed a missing commit, blob or tree as an
argument.

When such a command is used to find the dependencies of some objects,
for example the dependencies of quarantined objects, it would be
better if the command would instead consider such missing objects,
especially commits, in the same way as other missing objects.

If, for example `--missing=print` is used, it would be nice for some
use cases if the missing tips passed as arguments were reported in
the same way as other missing objects instead of the command just
failing.

Let's introduce a new `--allow-missing-tips` option to make it work
like this.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 builtin/rev-list.c          | 24 ++++++++++++++++-
 revision.c                  |  9 ++++---
 revision.h                  |  8 ++++++
 t/t6022-rev-list-missing.sh | 51 +++++++++++++++++++++++++++++++++++++
 4 files changed, 88 insertions(+), 4 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index b3f4783858..ae7bb15478 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -562,6 +562,16 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 				break;
 		}
 	}
+	for (i = 1; i < argc; i++) {
+		const char *arg = argv[i];
+		if (!strcmp(arg, "--allow-missing-tips")) {
+			if (arg_missing_action == MA_ERROR)
+				die(_("option '%s' only makes sense with '%s' set to '%s' or '%s'"),
+				      "--allow-missing-tips", "--missing=", "allow-*", "print");
+			revs.do_not_die_on_missing_tips = 1;
+			break;
+		}
+	}
 
 	if (arg_missing_action)
 		revs.do_not_die_on_missing_objects = 1;
@@ -627,6 +637,8 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 			continue; /* already handled above */
 		if (skip_prefix(arg, "--missing=", &arg))
 			continue; /* already handled above */
+		if (!strcmp(arg, "--allow-missing-tips"))
+			continue; /* already handled above */
 
 		if (!strcmp(arg, ("--no-object-names"))) {
 			arg_show_object_names = 0;
@@ -753,9 +765,19 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 
 	if (arg_print_omitted)
 		oidset_init(&omitted_objects, DEFAULT_OIDSET_SIZE);
-	if (arg_missing_action == MA_PRINT)
+	if (arg_missing_action == MA_PRINT) {
+		struct oidset_iter iter;
+		struct object_id *oid;
+
 		oidset_init(&missing_objects, DEFAULT_OIDSET_SIZE);
 
+		/* Already add missing commits */
+		oidset_iter_init(&revs.missing_commits, &iter);
+		while ((oid = oidset_iter_next(&iter)))
+			oidset_insert(&missing_objects, oid);
+		oidset_clear(&revs.missing_commits);
+	}
+
 	traverse_commit_list_filtered(
 		&revs, show_commit, show_object, &info,
 		(arg_print_omitted ? &omitted_objects : NULL));
diff --git a/revision.c b/revision.c
index 4c5cd7c3ce..9f25faa249 100644
--- a/revision.c
+++ b/revision.c
@@ -388,6 +388,10 @@ static struct object *get_reference(struct rev_info *revs, const char *name,
 			return NULL;
 		if (revs->exclude_promisor_objects && is_promisor_object(oid))
 			return NULL;
+		if (revs->do_not_die_on_missing_tips) {
+			oidset_insert(&revs->missing_commits, oid);
+			return NULL;
+		}
 		die("bad object %s", name);
 	}
 	object->flags |= flags;
@@ -1947,6 +1951,7 @@ void repo_init_revisions(struct repository *r,
 	init_display_notes(&revs->notes_opt);
 	list_objects_filter_init(&revs->filter);
 	init_ref_exclusions(&revs->ref_excludes);
+	oidset_init(&revs->missing_commits, 0);
 }
 
 static void add_pending_commit_list(struct rev_info *revs,
@@ -2184,7 +2189,7 @@ static int handle_revision_arg_1(const char *arg_, struct rev_info *revs, int fl
 		verify_non_filename(revs->prefix, arg);
 	object = get_reference(revs, arg, &oid, flags ^ local_flags);
 	if (!object)
-		return revs->ignore_missing ? 0 : -1;
+		return (revs->ignore_missing || revs->do_not_die_on_missing_tips) ? 0 : -1;
 	add_rev_cmdline(revs, object, arg_, REV_CMD_REV, flags ^ local_flags);
 	add_pending_object_with_path(revs, object, arg, oc.mode, oc.path);
 	free(oc.path);
@@ -3830,8 +3835,6 @@ int prepare_revision_walk(struct rev_info *revs)
 				       FOR_EACH_OBJECT_PROMISOR_ONLY);
 	}
 
-	oidset_init(&revs->missing_commits, 0);
-
 	if (!revs->reflog_info)
 		prepare_to_use_bloom_filter(revs);
 	if (!revs->unsorted_input)
diff --git a/revision.h b/revision.h
index 94c43138bc..67435a5d8a 100644
--- a/revision.h
+++ b/revision.h
@@ -227,6 +227,14 @@ struct rev_info {
 			 */
 			do_not_die_on_missing_objects:1,
 
+			/*
+			 * When the do_not_die_on_missing_objects flag above is set,
+			 * a rev walk could still die with "fatal: bad object <oid>"
+			 * if one of the tips it is passed is missing. With this flag
+			 * such a tip will be reported as missing too.
+			 */
+			 do_not_die_on_missing_tips:1,
+
 			/* for internal use only */
 			exclude_promisor_objects:1;
 
diff --git a/t/t6022-rev-list-missing.sh b/t/t6022-rev-list-missing.sh
index 527aa94f07..283e8fc2c2 100755
--- a/t/t6022-rev-list-missing.sh
+++ b/t/t6022-rev-list-missing.sh
@@ -77,4 +77,55 @@ do
 	done
 done
 
+for obj in "HEAD~1" "HEAD~1^{tree}" "HEAD:1.t"
+do
+	for tip in "" "HEAD"
+	do
+		for action in "allow-any" "print"
+		do
+			test_expect_success "--missing=$action --allow-missing-tips with tip '$obj' missing and tip '$tip'" '
+				oid="$(git rev-parse $obj)" &&
+				path=".git/objects/$(test_oid_to_path $oid)" &&
+
+				# Before the object is made missing, we use rev-list to
+				# get the expected oids.
+				if [ "$tip" = "HEAD" ]; then
+					git rev-list --objects --no-object-names \
+						HEAD ^$obj >expect.raw
+				else
+					>expect.raw
+				fi &&
+
+				# Blobs are shared by all commits, so even though a commit/tree
+				# might be skipped, its blob must be accounted for.
+				if [ "$tip" = "HEAD" ] && [ $obj != "HEAD:1.t" ]; then
+					echo $(git rev-parse HEAD:1.t) >>expect.raw &&
+					echo $(git rev-parse HEAD:2.t) >>expect.raw
+				fi &&
+
+				mv "$path" "$path.hidden" &&
+				test_when_finished "mv $path.hidden $path" &&
+
+				git rev-list --missing=$action --allow-missing-tips \
+				     --objects --no-object-names $oid $tip >actual.raw &&
+
+				# When the action is to print, we should also add the missing
+				# oid to the expect list.
+				case $action in
+				allow-any)
+					;;
+				print)
+					grep ?$oid actual.raw &&
+					echo ?$oid >>expect.raw
+					;;
+				esac &&
+
+				sort actual.raw >actual &&
+				sort expect.raw >expect &&
+				test_cmp expect actual
+			'
+		done
+	done
+done
+
 test_done
-- 
2.43.0.496.gd667eb0d7d.dirty


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox