Git development

Git development
 help / color / mirror / Atom feed

* Re: [PATCH] doc/git-bisect: clarify `git bisect run` syntax
From: Patrick Steinhardt @ 2023-10-23  7:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Eric Sunshine, cousteau via GitGitGadget, git, Javier Mora
In-Reply-To: <xmqqa5sap44i.fsf@gitster.g>

[-- Attachment #1: Type: text/plain, Size: 1973 bytes --]

On Sun, Oct 22, 2023 at 05:35:41PM -0700, Junio C Hamano wrote:
> Eric Sunshine <sunshine@sunshineco.com> writes:
> 
> > On Sun, Oct 22, 2023 at 4:03 PM cousteau via GitGitGadget
> > <gitgitgadget@gmail.com> wrote:
> >> The description of the `git bisect run` command syntax at the beginning
> >> of the manpage is `git bisect run <cmd>...`, which isn't quite clear
> >> about what `<cmd>` is or what the `...` mean; one could think that it is
> >> the whole (quoted) command line with all arguments in a single string,
> >> or that it supports multiple commands, or that it doesn't accept
> >> commands with arguments at all.
> >>
> >> Change to `git bisect run <cmd> [<arg>...]` to clarify the syntax.
> >
> > Okay, makes sense.
> >
> >> Signed-off-by: Javier Mora <cousteaulecommandant@gmail.com>
> >> ---
> >> diff --git a/Documentation/git-bisect.txt b/Documentation/git-bisect.txt
> >> @@ -26,7 +26,7 @@ on the subcommand:
> >> - git bisect run <cmd>...
> >> + git bisect run <cmd> [<arg>...]
> >
> > The output of `git bisect -h` suffers the same problem. Perhaps this
> > patch can fix that, as well?
> 
> Good eyes.
> 
> Not a new problem and obviously can be left outside of this simple
> update, but I wonder if we should eventually move these into the
> proper SYNOPSIS section.  Other multi-modal commands like "git
> checkout", "git rebase", etc. do list different forms all in the
> SYNOPSIS section.
> 
> I also thought at least some commands we know the "-h" output and
> SYNOPSIS match, we had tests to ensure they do not drift apart.  We
> would probably want to cover more subcommands with t0450.
> 
> Thanks.

If we don't want them to drift apart I wonder whether we could instead
generate the synopsis from the output of `-h`? This reduces duplication
at the cost of a more complex build process for our manpages.

Not saying that this is necessarily a good idea, just throwing it out
there.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH 2/2] fetch: no redundant error message for atomic fetch
From: Patrick Steinhardt @ 2023-10-23  8:27 UTC (permalink / raw)
  To: Jiang Xin; +Cc: Git List, Junio C Hamano, Jiang Xin
In-Reply-To: <ced46baeb1c18b416b4b4cc947f498bea2910b1b.1697725898.git.zhiyou.jx@alibaba-inc.com>

[-- Attachment #1: Type: text/plain, Size: 2719 bytes --]

On Thu, Oct 19, 2023 at 10:34:33PM +0800, Jiang Xin wrote:
> From: Jiang Xin <zhiyou.jx@alibaba-inc.com>
> 
> If an error occurs during an atomic fetch, a redundant error message
> will appear at the end of do_fetch(). It was introduced in b3a804663c
> (fetch: make `--atomic` flag cover backfilling of tags, 2022-02-17).
> 
> Instead of displaying the error message unconditionally, the final error
> output should follow the pattern in update-ref.c and files-backend.c as
> follows:
> 
>     if (ref_transaction_abort(transaction, &error))
>         error("abort: %s", error.buf);
> 
> This will fix the test case "fetch porcelain output (atomic)" in t5574.
> 
> Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
> ---
>  builtin/fetch.c         | 4 +---
>  t/t5574-fetch-output.sh | 2 +-
>  2 files changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index fd134ba74d..01a573cf8d 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -1775,10 +1775,8 @@ static int do_fetch(struct transport *transport,
>  	}
>  
>  cleanup:
> -	if (retcode && transaction) {
> -		ref_transaction_abort(transaction, &err);
> +	if (retcode && transaction && ref_transaction_abort(transaction, &err))
>  		error("%s", err.buf);
> -	}

Right. We already call `error()` in all cases where `err` was populated
before we `goto cleanup;`, so calling it unconditionally a second time
here is wrong.

That being said, `ref_transaction_abort()` will end up calling the
respective backend's implementation of `transaction_abort`, and for the
files backend it actually ignores `err` completely. So if the abort
fails, we would still end up calling `error()` with an empty string.
Furthermore, it can happen that `transaction_commit` fails, writes to
the buffer and then prints the error. If the abort now fails as well, we
would end up printing the error message twice.

I wonder whether we should fix this by unifying all calls to `error()`
to only happen in the cleanup block, and only iff the buffer length is
non-zero?

Patrick

>  	display_state_release(&display_state);
>  	close_fetch_head(&fetch_head);
> diff --git a/t/t5574-fetch-output.sh b/t/t5574-fetch-output.sh
> index 1397101629..3c72fc693f 100755
> --- a/t/t5574-fetch-output.sh
> +++ b/t/t5574-fetch-output.sh
> @@ -97,7 +97,7 @@ do
>  		opt=
>  		;;
>  	esac
> -	test_expect_failure "fetch porcelain output ${opt:+(atomic)}" '
> +	test_expect_success "fetch porcelain output ${opt:+(atomic)}" '
>  		test_when_finished "rm -rf porcelain" &&
>  
>  		# Clone and pre-seed the repositories. We fetch references into two
> -- 
> 2.42.0.411.g813d9a9188
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH 2/2] fetch: no redundant error message for atomic fetch
From: Jiang Xin @ 2023-10-23  9:16 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Git List, Junio C Hamano, Jiang Xin
In-Reply-To: <ZTYue-3gAS1aGXNa@tanuki>

On Mon, Oct 23, 2023 at 4:27 PM Patrick Steinhardt <ps@pks.im> wrote:
>
> On Thu, Oct 19, 2023 at 10:34:33PM +0800, Jiang Xin wrote:
> > @@ -1775,10 +1775,8 @@ static int do_fetch(struct transport *transport,
> >       }
> >
> >  cleanup:
> > -     if (retcode && transaction) {
> > -             ref_transaction_abort(transaction, &err);
> > +     if (retcode && transaction && ref_transaction_abort(transaction, &err))
> >               error("%s", err.buf);
> > -     }
>
> Right. We already call `error()` in all cases where `err` was populated
> before we `goto cleanup;`, so calling it unconditionally a second time
> here is wrong.
>
> That being said, `ref_transaction_abort()` will end up calling the
> respective backend's implementation of `transaction_abort`, and for the
> files backend it actually ignores `err` completely. So if the abort
> fails, we would still end up calling `error()` with an empty string.

The transaction_abort implementations of the two builtin refs backends
will not use "err“ because they never fail (always return 0). Some one
may want to implement their own refs backend which may use the "err"
variable in their "transaction_abort". So follow the pattern as
update-ref.c and files-backend.c to call ref_transaction_abort() is
safe.

> Furthermore, it can happen that `transaction_commit` fails, writes to
> the buffer and then prints the error. If the abort now fails as well, we
> would end up printing the error message twice.

The abort never fails so error message from transaction_commit() will
not reach the code.

--
Jiang Xin

^ permalink raw reply

* Re: [PATCH v4 4/7] bulk-checkin: implement `SOURCE_INCORE` mode for `bulk_checkin_source`
From: Patrick Steinhardt @ 2023-10-23  9:19 UTC (permalink / raw)
  To: Taylor Blau
  Cc: git, Elijah Newren, Eric W. Biederman, Jeff King, Junio C Hamano
In-Reply-To: <e427fe6ad383cc238c13f313dc9f11eab37a3840.1697736516.git.me@ttaylorr.com>

[-- Attachment #1: Type: text/plain, Size: 2588 bytes --]

On Thu, Oct 19, 2023 at 01:28:51PM -0400, Taylor Blau wrote:
> Continue to prepare for streaming an object's contents directly from
> memory by teaching `bulk_checkin_source` how to perform reads and seeks
> based on an address in memory.
> 
> Unlike file descriptors, which manage their own offset internally, we
> have to keep track of how many bytes we've read out of the buffer, and
> make sure we don't read past the end of the buffer.
> 
> Suggested-by: Junio C Hamano <gitster@pobox.com>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  bulk-checkin.c | 18 +++++++++++++++++-
>  1 file changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/bulk-checkin.c b/bulk-checkin.c
> index 28bc8d5ab4..60361b3e2e 100644
> --- a/bulk-checkin.c
> +++ b/bulk-checkin.c
> @@ -141,11 +141,15 @@ static int already_written(struct bulk_checkin_packfile *state, struct object_id
>  }
>  
>  struct bulk_checkin_source {
> -	enum { SOURCE_FILE } type;
> +	enum { SOURCE_FILE, SOURCE_INCORE } type;
>  
>  	/* SOURCE_FILE fields */
>  	int fd;
>  
> +	/* SOURCE_INCORE fields */
> +	const void *buf;
> +	size_t read;
> +
>  	/* common fields */
>  	size_t size;
>  	const char *path;
> @@ -157,6 +161,11 @@ static off_t bulk_checkin_source_seek_to(struct bulk_checkin_source *source,
>  	switch (source->type) {
>  	case SOURCE_FILE:
>  		return lseek(source->fd, offset, SEEK_SET);
> +	case SOURCE_INCORE:
> +		if (!(0 <= offset && offset < source->size))
> +			return (off_t)-1;
> +		source->read = offset;
> +		return source->read;
>  	default:
>  		BUG("unknown bulk-checkin source: %d", source->type);
>  	}
> @@ -168,6 +177,13 @@ static ssize_t bulk_checkin_source_read(struct bulk_checkin_source *source,
>  	switch (source->type) {
>  	case SOURCE_FILE:
>  		return read_in_full(source->fd, buf, nr);
> +	case SOURCE_INCORE:
> +		assert(source->read <= source->size);

Is there any guideline around when to use `assert()` vs `BUG()`? I think
that this assertion here is quite critical, because when it does not
hold we can end up performing out-of-bounds reads and writes. But as
asserts are typically missing in non-debug builds, this safeguard would
not do anything for our end users, right?

Patrick

> +		if (nr > source->size - source->read)
> +			nr = source->size - source->read;
> +		memcpy(buf, (unsigned char *)source->buf + source->read, nr);
> +		source->read += nr;
> +		return nr;
>  	default:
>  		BUG("unknown bulk-checkin source: %d", source->type);
>  	}
> -- 
> 2.42.0.405.g86fe3250c2
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH v4 0/7] merge-ort: implement support for packing objects together
From: Patrick Steinhardt @ 2023-10-23  9:19 UTC (permalink / raw)
  To: Taylor Blau
  Cc: git, Elijah Newren, Eric W. Biederman, Jeff King, Junio C Hamano
In-Reply-To: <cover.1697736516.git.me@ttaylorr.com>

[-- Attachment #1: Type: text/plain, Size: 13253 bytes --]

On Thu, Oct 19, 2023 at 01:28:38PM -0400, Taylor Blau wrote:
> (Rebased onto the current tip of 'master', which is 813d9a9188 (The
> nineteenth batch, 2023-10-18) at the time of writing).
> 
> This series implements support for a new merge-tree option,
> `--write-pack`, which causes any newly-written objects to be packed
> together instead of being stored individually as loose.
> 
> I intentionally broke this off from the existing thread, since I
> accidentally rerolled mine and Jonathan Tan's Bloom v2 series into it,
> causing some confusion.
> 
> This is a new round that is significantly simplified thanks to
> another very helpful suggestion[1] from Junio. By factoring out a common
> "deflate object to pack" that takes an abstract bulk_checkin_source as a
> parameter, all of the earlier refactorings can be dropped since we
> retain only a single caller instead of multiple.
> 
> This resulted in a rather satisfying range-diff (included below, as
> usual), and a similarly satisfying inter-diff:
> 
>     $ git diff --stat tb/ort-bulk-checkin.v3..
>      bulk-checkin.c | 203 ++++++++++++++++---------------------------------
>      1 file changed, 64 insertions(+), 139 deletions(-)
> 
> Beyond that, the changes since last time can be viewed in the range-diff
> below. Thanks in advance for any review!
> 
> [1]: https://lore.kernel.org/git/xmqq34y7plj4.fsf@gitster.g/

A single question regarding an `assert()` from my side. Other than that
the series looks good to me, thanks.

Patrick

> Taylor Blau (7):
>   bulk-checkin: extract abstract `bulk_checkin_source`
>   bulk-checkin: generify `stream_blob_to_pack()` for arbitrary types
>   bulk-checkin: refactor deflate routine to accept a
>     `bulk_checkin_source`
>   bulk-checkin: implement `SOURCE_INCORE` mode for `bulk_checkin_source`
>   bulk-checkin: introduce `index_blob_bulk_checkin_incore()`
>   bulk-checkin: introduce `index_tree_bulk_checkin_incore()`
>   builtin/merge-tree.c: implement support for `--write-pack`
> 
>  Documentation/git-merge-tree.txt |   4 +
>  builtin/merge-tree.c             |   5 +
>  bulk-checkin.c                   | 161 ++++++++++++++++++++++++++-----
>  bulk-checkin.h                   |   8 ++
>  merge-ort.c                      |  42 ++++++--
>  merge-recursive.h                |   1 +
>  t/t4301-merge-tree-write-tree.sh |  93 ++++++++++++++++++
>  7 files changed, 280 insertions(+), 34 deletions(-)
> 
> Range-diff against v3:
>  1:  2dffa45183 <  -:  ---------- bulk-checkin: factor out `format_object_header_hash()`
>  2:  7a10dc794a <  -:  ---------- bulk-checkin: factor out `prepare_checkpoint()`
>  3:  20c32d2178 <  -:  ---------- bulk-checkin: factor out `truncate_checkpoint()`
>  4:  893051d0b7 <  -:  ---------- bulk-checkin: factor out `finalize_checkpoint()`
>  5:  da52ec8380 !  1:  97bb6e9f59 bulk-checkin: extract abstract `bulk_checkin_source`
>     @@ bulk-checkin.c: static int stream_blob_to_pack(struct bulk_checkin_packfile *sta
>       			if (*already_hashed_to < offset) {
>       				size_t hsize = offset - *already_hashed_to;
>      @@ bulk-checkin.c: static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
>     - 	git_hash_ctx ctx;
>     + 	unsigned header_len;
>       	struct hashfile_checkpoint checkpoint = {0};
>       	struct pack_idx_entry *idx = NULL;
>      +	struct bulk_checkin_source source = {
>     @@ bulk-checkin.c: static int deflate_blob_to_pack(struct bulk_checkin_packfile *st
>       	seekback = lseek(fd, 0, SEEK_CUR);
>       	if (seekback == (off_t) -1)
>      @@ bulk-checkin.c: static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
>     - 	while (1) {
>     - 		prepare_checkpoint(state, &checkpoint, idx, flags);
>     + 			crc32_begin(state->f);
>     + 		}
>       		if (!stream_blob_to_pack(state, &ctx, &already_hashed_to,
>      -					 fd, size, path, flags))
>      +					 &source, flags))
>       			break;
>     - 		truncate_checkpoint(state, &checkpoint, idx);
>     + 		/*
>     + 		 * Writing this object to the current pack will make
>     +@@ bulk-checkin.c: static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
>     + 		hashfile_truncate(state->f, &checkpoint);
>     + 		state->offset = checkpoint.offset;
>     + 		flush_bulk_checkin_packfile(state);
>      -		if (lseek(fd, seekback, SEEK_SET) == (off_t) -1)
>      +		if (bulk_checkin_source_seek_to(&source, seekback) == (off_t)-1)
>       			return error("cannot seek back");
>       	}
>     - 	finalize_checkpoint(state, &ctx, &checkpoint, idx, result_oid);
>     + 	the_hash_algo->final_oid_fn(result_oid, &ctx);
>  7:  04ec74e357 !  2:  9d633df339 bulk-checkin: generify `stream_blob_to_pack()` for arbitrary types
>     @@ bulk-checkin.c: static int stream_blob_to_pack(struct bulk_checkin_packfile *sta
>       	s.avail_out = sizeof(obuf) - hdrlen;
>       
>      @@ bulk-checkin.c: static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
>     - 
>     - 	while (1) {
>     - 		prepare_checkpoint(state, &checkpoint, idx, flags);
>     + 			idx->offset = state->offset;
>     + 			crc32_begin(state->f);
>     + 		}
>      -		if (!stream_blob_to_pack(state, &ctx, &already_hashed_to,
>      -					 &source, flags))
>      +		if (!stream_obj_to_pack(state, &ctx, &already_hashed_to,
>      +					&source, OBJ_BLOB, flags))
>       			break;
>     - 		truncate_checkpoint(state, &checkpoint, idx);
>     - 		if (bulk_checkin_source_seek_to(&source, seekback) == (off_t)-1)
>     + 		/*
>     + 		 * Writing this object to the current pack will make
>  -:  ---------- >  3:  d5bbd7810e bulk-checkin: refactor deflate routine to accept a `bulk_checkin_source`
>  6:  4e9bac5bc1 =  4:  e427fe6ad3 bulk-checkin: implement `SOURCE_INCORE` mode for `bulk_checkin_source`
>  8:  8667b76365 !  5:  48095afe80 bulk-checkin: introduce `index_blob_bulk_checkin_incore()`
>     @@ Commit message
>          objects individually as loose.
>      
>          Similar to the existing `index_blob_bulk_checkin()` function, the
>     -    entrypoint delegates to `deflate_blob_to_pack_incore()`, which is
>     -    responsible for formatting the pack header and then deflating the
>     -    contents into the pack. The latter is accomplished by calling
>     -    deflate_obj_contents_to_pack_incore(), which takes advantage of the
>     -    earlier refactorings and is responsible for writing the object to the
>     -    pack and handling any overage from pack.packSizeLimit.
>     -
>     -    The bulk of the new functionality is implemented in the function
>     -    `stream_obj_to_pack()`, which can handle streaming objects from memory
>     -    to the bulk-checkin pack as a result of the earlier refactoring.
>     +    entrypoint delegates to `deflate_obj_to_pack_incore()`. That function in
>     +    turn delegates to deflate_obj_to_pack(), which is responsible for
>     +    formatting the pack header and then deflating the contents into the
>     +    pack.
>      
>          Consistent with the rest of the bulk-checkin mechanism, there are no
>          direct tests here. In future commits when we expose this new
>     @@ Commit message
>          Signed-off-by: Taylor Blau <me@ttaylorr.com>
>      
>       ## bulk-checkin.c ##
>     -@@ bulk-checkin.c: static void finalize_checkpoint(struct bulk_checkin_packfile *state,
>     - 	}
>     +@@ bulk-checkin.c: static int deflate_obj_to_pack(struct bulk_checkin_packfile *state,
>     + 	return 0;
>       }
>       
>     -+static int deflate_obj_contents_to_pack_incore(struct bulk_checkin_packfile *state,
>     -+					       git_hash_ctx *ctx,
>     -+					       struct hashfile_checkpoint *checkpoint,
>     -+					       struct object_id *result_oid,
>     -+					       const void *buf, size_t size,
>     -+					       enum object_type type,
>     -+					       const char *path, unsigned flags)
>     ++static int deflate_obj_to_pack_incore(struct bulk_checkin_packfile *state,
>     ++				       struct object_id *result_oid,
>     ++				       const void *buf, size_t size,
>     ++				       const char *path, enum object_type type,
>     ++				       unsigned flags)
>      +{
>     -+	struct pack_idx_entry *idx = NULL;
>     -+	off_t already_hashed_to = 0;
>      +	struct bulk_checkin_source source = {
>      +		.type = SOURCE_INCORE,
>      +		.buf = buf,
>     @@ bulk-checkin.c: static void finalize_checkpoint(struct bulk_checkin_packfile *st
>      +		.path = path,
>      +	};
>      +
>     -+	/* Note: idx is non-NULL when we are writing */
>     -+	if (flags & HASH_WRITE_OBJECT)
>     -+		CALLOC_ARRAY(idx, 1);
>     -+
>     -+	while (1) {
>     -+		prepare_checkpoint(state, checkpoint, idx, flags);
>     -+
>     -+		if (!stream_obj_to_pack(state, ctx, &already_hashed_to, &source,
>     -+					type, flags))
>     -+			break;
>     -+		truncate_checkpoint(state, checkpoint, idx);
>     -+		bulk_checkin_source_seek_to(&source, 0);
>     -+	}
>     -+
>     -+	finalize_checkpoint(state, ctx, checkpoint, idx, result_oid);
>     -+
>     -+	return 0;
>     -+}
>     -+
>     -+static int deflate_blob_to_pack_incore(struct bulk_checkin_packfile *state,
>     -+				       struct object_id *result_oid,
>     -+				       const void *buf, size_t size,
>     -+				       const char *path, unsigned flags)
>     -+{
>     -+	git_hash_ctx ctx;
>     -+	struct hashfile_checkpoint checkpoint = {0};
>     -+
>     -+	format_object_header_hash(the_hash_algo, &ctx, &checkpoint, OBJ_BLOB,
>     -+				  size);
>     -+
>     -+	return deflate_obj_contents_to_pack_incore(state, &ctx, &checkpoint,
>     -+						   result_oid, buf, size,
>     -+						   OBJ_BLOB, path, flags);
>     ++	return deflate_obj_to_pack(state, result_oid, &source, type, 0, flags);
>      +}
>      +
>       static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
>     @@ bulk-checkin.c: int index_blob_bulk_checkin(struct object_id *oid,
>      +				   const void *buf, size_t size,
>      +				   const char *path, unsigned flags)
>      +{
>     -+	int status = deflate_blob_to_pack_incore(&bulk_checkin_packfile, oid,
>     -+						 buf, size, path, flags);
>     ++	int status = deflate_obj_to_pack_incore(&bulk_checkin_packfile, oid,
>     ++						buf, size, path, OBJ_BLOB,
>     ++						flags);
>      +	if (!odb_transaction_nesting)
>      +		flush_bulk_checkin_packfile(&bulk_checkin_packfile);
>      +	return status;
>  9:  cba043ef14 !  6:  60568f9281 bulk-checkin: introduce `index_tree_bulk_checkin_incore()`
>     @@ Commit message
>          machinery will have enough to keep track of the converted object's hash
>          in order to update the compatibility mapping.
>      
>     -    Within `deflate_tree_to_pack_incore()`, the changes should be limited
>     -    to something like:
>     +    Within some thin wrapper around `deflate_obj_to_pack_incore()` (perhaps
>     +    `deflate_tree_to_pack_incore()`), the changes should be limited to
>     +    something like:
>      
>              struct strbuf converted = STRBUF_INIT;
>              if (the_repository->compat_hash_algo) {
>     @@ Commit message
>          Signed-off-by: Taylor Blau <me@ttaylorr.com>
>      
>       ## bulk-checkin.c ##
>     -@@ bulk-checkin.c: static int deflate_blob_to_pack_incore(struct bulk_checkin_packfile *state,
>     - 						   OBJ_BLOB, path, flags);
>     - }
>     - 
>     -+static int deflate_tree_to_pack_incore(struct bulk_checkin_packfile *state,
>     -+				       struct object_id *result_oid,
>     -+				       const void *buf, size_t size,
>     -+				       const char *path, unsigned flags)
>     -+{
>     -+	git_hash_ctx ctx;
>     -+	struct hashfile_checkpoint checkpoint = {0};
>     -+
>     -+	format_object_header_hash(the_hash_algo, &ctx, &checkpoint, OBJ_TREE,
>     -+				  size);
>     -+
>     -+	return deflate_obj_contents_to_pack_incore(state, &ctx, &checkpoint,
>     -+						   result_oid, buf, size,
>     -+						   OBJ_TREE, path, flags);
>     -+}
>     -+
>     - static int deflate_blob_to_pack(struct bulk_checkin_packfile *state,
>     - 				struct object_id *result_oid,
>     - 				int fd, size_t size,
>      @@ bulk-checkin.c: int index_blob_bulk_checkin_incore(struct object_id *oid,
>       	return status;
>       }
>     @@ bulk-checkin.c: int index_blob_bulk_checkin_incore(struct object_id *oid,
>      +				   const void *buf, size_t size,
>      +				   const char *path, unsigned flags)
>      +{
>     -+	int status = deflate_tree_to_pack_incore(&bulk_checkin_packfile, oid,
>     -+						 buf, size, path, flags);
>     ++	int status = deflate_obj_to_pack_incore(&bulk_checkin_packfile, oid,
>     ++						buf, size, path, OBJ_TREE,
>     ++						flags);
>      +	if (!odb_transaction_nesting)
>      +		flush_bulk_checkin_packfile(&bulk_checkin_packfile);
>      +	return status;
> 10:  ae70508037 =  7:  b9be9df122 builtin/merge-tree.c: implement support for `--write-pack`
> -- 
> 2.42.0.405.g86fe3250c2

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH v1 3/4] config: factor out global config file retrieval
From: Patrick Steinhardt @ 2023-10-23  9:58 UTC (permalink / raw)
  To: Kristoffer Haugsbakk; +Cc: git, stolee
In-Reply-To: <147c767443c35b3b4a5516bf40557f41bb201078.1697660181.git.code@khaugsbakk.name>

[-- Attachment #1: Type: text/plain, Size: 3656 bytes --]

On Wed, Oct 18, 2023 at 10:28:40PM +0200, Kristoffer Haugsbakk wrote:
> 
> Factor out code that retrieves the global config file so that we can use
> it in `gc.c` as well.
> 
> Use the old name from the previous commit since this function acts
> functionally the same as `git_system_config` but for “global”.
> 
> Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
> ---
>  builtin/config.c | 25 ++-----------------------
>  config.c         | 24 ++++++++++++++++++++++++
>  config.h         |  1 +
>  3 files changed, 27 insertions(+), 23 deletions(-)
> 
> diff --git a/builtin/config.c b/builtin/config.c
> index 6fff2655816..df06b766fad 100644
> --- a/builtin/config.c
> +++ b/builtin/config.c
> @@ -708,30 +708,9 @@ int cmd_config(int argc, const char **argv, const char *prefix)
>  	}
>  
>  	if (use_global_config) {
> -		char *user_config, *xdg_config;
> -
> -		git_global_config_paths(&user_config, &xdg_config);
> -		if (!user_config)
> -			/*
> -			 * It is unknown if HOME/.gitconfig exists, so
> -			 * we do not know if we should write to XDG
> -			 * location; error out even if XDG_CONFIG_HOME
> -			 * is set and points at a sane location.
> -			 */
> -			die(_("$HOME not set"));
> -
> +		given_config_source.file = git_global_config();
>  		given_config_source.scope = CONFIG_SCOPE_GLOBAL;
> -
> -		if (access_or_warn(user_config, R_OK, 0) &&
> -		    xdg_config && !access_or_warn(xdg_config, R_OK, 0)) {
> -			given_config_source.file = xdg_config;
> -			free(user_config);
> -		} else {
> -			given_config_source.file = user_config;
> -			free(xdg_config);
> -		}
> -	}
> -	else if (use_system_config) {
> +	} else if (use_system_config) {
>  		given_config_source.file = git_system_config();
>  		given_config_source.scope = CONFIG_SCOPE_SYSTEM;
>  	} else if (use_local_config) {
> diff --git a/config.c b/config.c
> index d2cdda96edd..2ff766c56ff 100644
> --- a/config.c
> +++ b/config.c
> @@ -2111,6 +2111,30 @@ char *git_system_config(void)
>  	return system_config;
>  }
>  
> +char *git_global_config(void)
> +{
> +	char *user_config, *xdg_config;
> +
> +	git_global_config_paths(&user_config, &xdg_config);
> +	if (!user_config)
> +		/*
> +		 * It is unknown if HOME/.gitconfig exists, so
> +		 * we do not know if we should write to XDG

Nit: we don't know about the intent of the caller, so they may not want
to write to the file but only read it.

> +		 * location; error out even if XDG_CONFIG_HOME
> +		 * is set and points at a sane location.
> +		 */
> +		die(_("$HOME not set"));

Is it sensible to `die()` here in this new function that behaves more
like a library function? I imagine it would be more sensible to indicate
the error to the user and let them handle it accordingly.

Patrick

> +
> +	if (access_or_warn(user_config, R_OK, 0) && xdg_config &&
> +	    !access_or_warn(xdg_config, R_OK, 0)) {
> +		free(user_config);
> +		return xdg_config;
> +	} else {
> +		free(xdg_config);
> +		return user_config;
> +	}
> +}
> +
>  void git_global_config_paths(char **user_out, char **xdg_out)
>  {
>  	char *user_config = xstrdup_or_null(getenv("GIT_CONFIG_GLOBAL"));
> diff --git a/config.h b/config.h
> index 9f04de8ee3e..5cf961b548d 100644
> --- a/config.h
> +++ b/config.h
> @@ -394,6 +394,7 @@ int config_error_nonbool(const char *);
>  #endif
>  
>  char *git_system_config(void);
> +char *git_global_config(void);
>  void git_global_config_paths(char **user, char **xdg);
>  
>  int git_config_parse_parameter(const char *, config_fn_t fn, void *data);
> -- 
> 2.42.0.2.g879ad04204
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH v1 4/4] maintenance: use XDG config if it exists
From: Patrick Steinhardt @ 2023-10-23  9:58 UTC (permalink / raw)
  To: Kristoffer Haugsbakk; +Cc: git, stolee
In-Reply-To: <1e2376a4b998b5b182cc5f72afc7282134bcdf2c.1697660181.git.code@khaugsbakk.name>

[-- Attachment #1: Type: text/plain, Size: 4194 bytes --]

On Wed, Oct 18, 2023 at 10:28:41PM +0200, Kristoffer Haugsbakk wrote:
> 
> `git maintenance register` registers the repository in the user's global
> config. `$XDG_CONFIG_HOME/git/config` is supposed to be used if
> `~/.gitconfig` does not exist. However, this command creates a
> `~/.gitconfig` file and writes to that one even though the XDG variant
> exists.
> 
> This used to work correctly until 50a044f1e4 (gc: replace config
> subprocesses with API calls, 2022-09-27), when the command started calling
> the config API instead of git-config(1).
> 
> Also change `unregister` accordingly.
> 
> Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
> ---
>  builtin/gc.c           | 23 +++++------------------
>  t/t7900-maintenance.sh | 21 +++++++++++++++++++++
>  2 files changed, 26 insertions(+), 18 deletions(-)
> 
> diff --git a/builtin/gc.c b/builtin/gc.c
> index 17fc031f63a..7b780f2ab38 100644
> --- a/builtin/gc.c
> +++ b/builtin/gc.c
> @@ -1526,19 +1526,12 @@ static int maintenance_register(int argc, const char **argv, const char *prefix)
>  
>  	if (!found) {
>  		int rc;
> -		char *user_config = NULL, *xdg_config = NULL;
>  
> -		if (!config_file) {
> -			git_global_config_paths(&user_config, &xdg_config);
> -			config_file = user_config;
> -			if (!user_config)
> -				die(_("$HOME not set"));
> -		}
> +		if (!config_file)
> +			config_file = git_global_config();
>  		rc = git_config_set_multivar_in_file_gently(
>  			config_file, "maintenance.repo", maintpath,
>  			CONFIG_REGEX_NONE, 0);
> -		free(user_config);
> -		free(xdg_config);

Don't we have to free `config_file` now?

>  		if (rc)
>  			die(_("unable to add '%s' value of '%s'"),
> @@ -1595,18 +1588,12 @@ static int maintenance_unregister(int argc, const char **argv, const char *prefi
>  
>  	if (found) {
>  		int rc;
> -		char *user_config = NULL, *xdg_config = NULL;
> -		if (!config_file) {
> -			git_global_config_paths(&user_config, &xdg_config);
> -			config_file = user_config;
> -			if (!user_config)
> -				die(_("$HOME not set"));
> -		}
> +
> +		if (!config_file)
> +			config_file = git_global_config();
>  		rc = git_config_set_multivar_in_file_gently(
>  			config_file, key, NULL, maintpath,
>  			CONFIG_FLAGS_MULTI_REPLACE | CONFIG_FLAGS_FIXED_VALUE);
> -		free(user_config);
> -		free(xdg_config);

Same here.

>  		if (rc &&
>  		    (!force || rc == CONFIG_NOTHING_SET))
> diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
> index 487e326b3fa..a11e6c61520 100755
> --- a/t/t7900-maintenance.sh
> +++ b/t/t7900-maintenance.sh
> @@ -67,6 +67,27 @@ test_expect_success 'maintenance.auto config option' '
>  	test_subcommand ! git maintenance run --auto --quiet  <false
>  '
>  
> +test_expect_success 'register uses XDG_CONFIG_HOME config if it exists' '
> +	XDG_CONFIG_HOME=.config &&
> +	test_when_finished rm -r "$XDG_CONFIG_HOME"/git/config &&
> +	export "XDG_CONFIG_HOME" &&

Style: there is no need to quote here.

Also, I think we need to unset this variable at the end of this test as
tests don't run in a subshell. In theory, we should also be able to
leave this variable unset completely as it will be derived from HOME
automatically anyway.

> +	mkdir -p "$XDG_CONFIG_HOME"/git &&
> +	touch "$XDG_CONFIG_HOME"/git/config &&
> +	git maintenance register &&
> +	git config --file="$XDG_CONFIG_HOME"/git/config --get maintenance.repo >actual &&
> +	pwd >expect &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'register does not need XDG_CONFIG_HOME config to exist' '
> +	test_when_finished git maintenance unregister &&
> +	test_path_is_missing "$XDG_CONFIG_HOME"/git/config &&
> +	git maintenance register &&
> +	git config --global --get maintenance.repo >actual &&
> +	pwd >expect &&
> +	test_cmp expect actual
> +'
> +

As you also change the code of `git maintenance unregister`, should we
test its behaviour, as well?

Patrick

>  test_expect_success 'maintenance.<task>.enabled' '
>  	git config maintenance.gc.enabled false &&
>  	git config maintenance.commit-graph.enabled true &&
> -- 
> 2.42.0.2.g879ad04204
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH 2/2] fetch: no redundant error message for atomic fetch
From: Patrick Steinhardt @ 2023-10-23 10:07 UTC (permalink / raw)
  To: Jiang Xin; +Cc: Git List, Junio C Hamano, Jiang Xin
In-Reply-To: <CANYiYbEJ_mHdsPM3-huDPFktSWFhrpoz7Cvf000JSfZM2cco9w@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 2248 bytes --]

On Mon, Oct 23, 2023 at 05:16:20PM +0800, Jiang Xin wrote:
> On Mon, Oct 23, 2023 at 4:27 PM Patrick Steinhardt <ps@pks.im> wrote:
> >
> > On Thu, Oct 19, 2023 at 10:34:33PM +0800, Jiang Xin wrote:
> > > @@ -1775,10 +1775,8 @@ static int do_fetch(struct transport *transport,
> > >       }
> > >
> > >  cleanup:
> > > -     if (retcode && transaction) {
> > > -             ref_transaction_abort(transaction, &err);
> > > +     if (retcode && transaction && ref_transaction_abort(transaction, &err))
> > >               error("%s", err.buf);
> > > -     }
> >
> > Right. We already call `error()` in all cases where `err` was populated
> > before we `goto cleanup;`, so calling it unconditionally a second time
> > here is wrong.
> >
> > That being said, `ref_transaction_abort()` will end up calling the
> > respective backend's implementation of `transaction_abort`, and for the
> > files backend it actually ignores `err` completely. So if the abort
> > fails, we would still end up calling `error()` with an empty string.
> 
> The transaction_abort implementations of the two builtin refs backends
> will not use "err“ because they never fail (always return 0). Some one
> may want to implement their own refs backend which may use the "err"
> variable in their "transaction_abort". So follow the pattern as
> update-ref.c and files-backend.c to call ref_transaction_abort() is
> safe.
> 
> > Furthermore, it can happen that `transaction_commit` fails, writes to
> > the buffer and then prints the error. If the abort now fails as well, we
> > would end up printing the error message twice.
> 
> The abort never fails so error message from transaction_commit() will
> not reach the code.

With that reasoning we could get rid of the error handling of abort
completely as it's known not to fail. But only because it does not fail
right now doesn't mean that it won't in the future, as the infra for it
to fail is all in place. And in case it ever does the current code will
run into the bug I described.

So in my opinion, we should either refactor the code to clarify that
this cannot fail indeed. Or do the right thing and handle the error case
correctly, which right now we don't.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH] commit: detect commits that exist in commit-graph but not in the ODB
From: Patrick Steinhardt @ 2023-10-23 10:15 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git, Karthik Nayak, Taylor Blau
In-Reply-To: <20231020100024.GA2194074@coredump.intra.peff.net>

[-- Attachment #1: Type: text/plain, Size: 2634 bytes --]

On Fri, Oct 20, 2023 at 06:00:24AM -0400, Jeff King wrote:
> On Thu, Oct 19, 2023 at 10:16:56AM -0700, Junio C Hamano wrote:
> 
> > Patrick Steinhardt <ps@pks.im> writes:
> > 
> > > There's another way to handle this, which is to conditionally enable the
> > > object existence check. This would be less of a performance hit compared
> > > to disabling commit graphs altogether with `--missing`, but still ensure
> > > that repository corruption was detected. Second, it would not regress
> > > performance for all preexisting users of `repo_parse_commit_gently()`.
> > 
> > The above was what I meant to suggest when you demonstrated that the
> > code with additional check is still much more performant than
> > running without the commit-graph optimization, yet has observable
> > impact on performance for normal codepaths that do not need the
> > extra carefulness.
> > 
> > But I wasn't sure if it is easy to plumb the "do we want to double
> > check?  in other words, are we running something like --missing that
> > care the correctness a bit more than usual cases?" bit down from the
> > caller, because this check is so deep in the callchain.
> 
> I wonder if we would want a "be extra careful" flag that is read from
> the environment? That is largely how GIT_REF_PARANOIA works, and then
> particular operations set it (though actually it is the default these
> days, so they no longer do so explicitly).
> 
> I guess that is really like a global variable but even more gross. ;)
> But it is nice that it can cross process boundaries, because "how
> careful do we want to be" may be in the eye of the caller, especially
> for plumbing commands. E.g., even without --missing, you may want
> "rev-list" to be extra careful about checking the existence of commits.
> The caller in check_connected() could arguably turn that on by default
> (the way we used to turn on GIT_REF_PARANOIA for pruning repacks before
> it became the default).

Good point indeed, I also started to ask myself whether we'd want to
have a config option. An environment variable `GIT_OBJECT_PARANOIA`
would work equally well though and be less "official".

What I like about this idea is that it would also allow us to easily
unify the behaviour of `lookup_commit_in_graph()` and the new sanity
check in `repo_parse_commit_gently()` that I'm introducing here. I'd
personally think that the default for any such toggle should be `true`,
also to keep the current behaviour of `lookup_commit_in_graph()`. But
changing it would be easy enough.

I'll send a v2 of this series with such a toggle.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* malformed commit messages for git version 2.42.0 when "commit.verbose true"
From: matthieu @ 2023-10-23 10:36 UTC (permalink / raw)
  To: git

Hello,

I recently set the option `commit.verbose` to `true` for git version 2.42.0, and I really like it.

However, most of the time, commits are malformed if I don’t remove the diff after the marker 
`# ------------------------ >8 ------------------------`

I would expect that anything after the marker is automatically removed by git, or could it be that some pre-commit script see the raw commit message before it is cleaned-up by git itself?

Best,
Matthieu

^ permalink raw reply

* Re: [RFC PATCH 0/5] Introduce -t, --table for status/add commands
From: Oswald Buddenhagen @ 2023-10-23 10:52 UTC (permalink / raw)
  To: Dragan Simic; +Cc: Jacob Stopak, git
In-Reply-To: <58a6a25a7b2eb82c21d9b87143033cef@manjaro.org>

On Sun, Oct 22, 2023 at 02:55:05PM +0200, Dragan Simic wrote:
>Oh, that's awesome and I'm really happy to be wrong with my broad 
>classification of VCS users.  However, I still need to be convinced 
>further, and I'd assign your example as an exception to the rules, 
>
i don't see myself as exceptional at all in that regard.
in fact, your second user group seems like unicorns, and the first like 
a disparaging attitude from an elitist. in reality, users lie on a 
spectrum of willingness to engage with the details of the tools they 
use, and that willingness is circumstantial. a tool that is forthcoming 
with information has a higher chance of being actively engaged.

> especially because you migrated to git from another VCS,
>
that isn't all that uncommon, esp. in the older cohorts. and unless git 
achieves a total monopoly, it will remain a regular occurence.

but i don't see how that advances your argument anyway.

> which you liked,
>
i didn't say that.

> and because you use the command line a lot.
>
in my experience, this isn't uncommon for users of "discrete" vcs'es at 
all, even if they aren't too interested in the details. they just copy 
"magic incantations" from stackoverflow, etc. - disgusting, right? ;-)

>After using git for a while, I can firmly say that git is awesome, but 
>that it also is a total overkill for many projects that need a VCS, for 
>which choosing Subversion would be a much batter choice.  Why, you'll 
>ask?  Because Subversion is many times simpler, and because many 
>projects actually don't need a distributed VCS.
>
that line of reasoning seems mostly bogus to me. every project can 
benefit from using a dvcs - reviewing and polishing the history prior to 
publication leads to higher quality (primarily of the history, but such 
polishing often also transpires to the code itself), so encouraging it 
is a useful default (of course interested users can just use git-svn, 
but that's a bigger step than just having a closer look at what was 
shoved in their faces).

git is indeed pretty much by definition many times harder than svn, 
simply because it splits the process of submission into three stages.  
however, this is not a _useful_ definition, as everyone can remember to 
use `git commit -a && git push` instead of `svn commit`. the real 
complexity comes from all the interesting things one can do because of 
the split stages, but that's optional (well, until you need to pull and 
you get a merge conflict - unlike with svn, git leaves the repo in a 
"weird" state, and the poor communication of that was in fact the major 
source of frustration for me when i started).

>I also ask myself why would I use git-gui or any other GUI utility?  To 
>me, clicking on something that represents a file is often simply wrong.  
>
that makes you an outlier. most people find point-and-click interaction 
rather intuitive and significantly more efficient than encoding their 
intent into character sequences.

also, i said "add -p". selecting hunks (and even single lines) plays in 
a wholly different league than whole files.

>Though, I understand that many people prefer GUI utilities and I 
>respect that, everyone is free to do anything, but I also expect others 
>to respect my own preferences.
>
where did anyone even suggest disrespecting your preferences?

you should however consider whether your preferences are a good default 
for the wider audience, even within the context of the command line.

i for one think that it would be a perfectly valid experiment to go 
all-in and beyond with jacob's proposal - _and make it the default_ 
(when the output is a tty). more advanced users who feel annoyed would 
be expected to opt out of it via configuration, as they are for the 
advice messages. because it's really the same idea, only thought bigger.

regards

^ permalink raw reply

* [PATCH v2 0/2] commit-graph: detect commits missing in ODB
From: Patrick Steinhardt @ 2023-10-23 11:27 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, Junio C Hamano, Taylor Blau, Jeff King

[-- Attachment #1: Type: text/plain, Size: 6482 bytes --]

Hi,

this is version 2 of my patch series to more readily detect commits
parsed from the commit graph which are missing in the object database.
I've split this out into a separate thread as version 1 was sent in
reply to a different series to extend git-rev-list(1)'s `--missing`
option so that I don't continue to hijack this thread.

Changes compared to v1:

    - I've added a preparatory patch that introduced a new
      GIT_COMMIT_GRAPH_PARANOIA environment variable as suggested by
      Peff. This envvar is retrofitted to the preexisting check in
      `lookup_commit_in_graph()` so that the behaviour for this sanity
      check is consistent.

    - `repo_parse_commit_internal()` now also honors this new envvar.

    - I've amended the commit message in v2 to include the benchmark
      that demonstrates the performance regression.

    - We now un-parse the commit when parsing it via the commit graph
      succeeded, but it doesn't exist in the ODB.

Thanks for all the feedback and discussion around this.

Patrick

[1]: <b0bf576c51a706367a758b8e30eca37edb9c2734.1697200576.git.ps@pks.im>

Patrick Steinhardt (2):
  commit-graph: introduce envvar to disable commit existence checks
  commit: detect commits that exist in commit-graph but not in the ODB

 Documentation/git.txt   |  9 ++++++++
 commit-graph.c          |  6 +++++-
 commit-graph.h          |  6 ++++++
 commit.c                | 16 +++++++++++++-
 t/t5318-commit-graph.sh | 48 +++++++++++++++++++++++++++++++++++++++++
 5 files changed, 83 insertions(+), 2 deletions(-)

Range-diff against v1:
-:  ---------- > 1:  a89c435528 commit-graph: introduce envvar to disable commit existence checks
1:  6ec1e340f8 ! 2:  0476d48555 commit: detect commits that exist in commit-graph but not in the ODB
    @@ Commit message
         behaviour by checking for object existence via the object database, as
         well.
     
    +    This check of course comes with a performance penalty. The following
    +    benchmarks have been executed in a clone of linux.git with stable tags
    +    added:
    +
    +        Benchmark 1: git -c core.commitGraph=true rev-list --topo-order --all (git = master)
    +          Time (mean ± σ):      2.913 s ±  0.018 s    [User: 2.363 s, System: 0.548 s]
    +          Range (min … max):    2.894 s …  2.950 s    10 runs
    +
    +        Benchmark 2: git -c core.commitGraph=true rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
    +          Time (mean ± σ):      3.834 s ±  0.052 s    [User: 3.276 s, System: 0.556 s]
    +          Range (min … max):    3.780 s …  3.961 s    10 runs
    +
    +        Benchmark 3: git -c core.commitGraph=false rev-list --topo-order --all (git = master)
    +          Time (mean ± σ):     13.841 s ±  0.084 s    [User: 13.152 s, System: 0.687 s]
    +          Range (min … max):   13.714 s … 13.995 s    10 runs
    +
    +        Benchmark 4: git -c core.commitGraph=false rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
    +          Time (mean ± σ):     13.762 s ±  0.116 s    [User: 13.094 s, System: 0.667 s]
    +          Range (min … max):   13.645 s … 14.038 s    10 runs
    +
    +        Summary
    +          git -c core.commitGraph=true rev-list --topo-order --all (git = master) ran
    +            1.32 ± 0.02 times faster than git -c core.commitGraph=true rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
    +            4.72 ± 0.05 times faster than git -c core.commitGraph=false rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
    +            4.75 ± 0.04 times faster than git -c core.commitGraph=false rev-list --topo-order --all (git = master)
    +
    +    We look at a ~30% regression in general, but in general we're still a
    +    whole lot faster than without the commit graph. To counteract this, the
    +    new check can be turned off with the `GIT_COMMIT_GRAPH_PARANOIA` envvar.
    +
         Signed-off-by: Patrick Steinhardt <ps@pks.im>
     
      ## commit.c ##
    +@@
    + #include "shallow.h"
    + #include "tree.h"
    + #include "hook.h"
    ++#include "parse.h"
    + 
    + static struct commit_extra_header *read_commit_extra_header_lines(const char *buf, size_t len, const char **);
    + 
     @@ commit.c: int repo_parse_commit_internal(struct repository *r,
      		return -1;
      	if (item->object.parsed)
      		return 0;
     -	if (use_commit_graph && parse_commit_in_graph(r, item))
     +	if (use_commit_graph && parse_commit_in_graph(r, item)) {
    -+		if (!has_object(r, &item->object.oid, 0))
    ++		static int object_paranoia = -1;
    ++
    ++		if (object_paranoia == -1)
    ++			object_paranoia = git_env_bool(GIT_COMMIT_GRAPH_PARANOIA, 1);
    ++
    ++		if (object_paranoia && !has_object(r, &item->object.oid, 0)) {
    ++			unparse_commit(r, &item->object.oid);
     +			return quiet_on_missing ? -1 :
     +				error(_("commit %s exists in commit-graph but not in the object database"),
     +				      oid_to_hex(&item->object.oid));
    ++		}
    ++
      		return 0;
     +	}
      
    @@ commit.c: int repo_parse_commit_internal(struct repository *r,
      		return quiet_on_missing ? -1 :
     
      ## t/t5318-commit-graph.sh ##
    -@@ t/t5318-commit-graph.sh: test_expect_success 'overflow during generation version upgrade' '
    +@@ t/t5318-commit-graph.sh: test_expect_success 'stale commit cannot be parsed when given directly' '
      	)
      '
      
    -+test_expect_success 'commit exists in commit-graph but not in object database' '
    ++test_expect_success 'stale commit cannot be parsed when traversing graph' '
     +	test_when_finished "rm -rf repo" &&
     +	git init repo &&
     +	(
    @@ t/t5318-commit-graph.sh: test_expect_success 'overflow during generation version
     +		oid=$(git rev-parse B) &&
     +		rm .git/objects/"$(test_oid_to_path "$oid")" &&
     +
    ++		# Again, we should be able to parse the commit when not
    ++		# being paranoid about commit graph staleness...
    ++		GIT_COMMIT_GRAPH_PARANOIA=false git rev-parse HEAD~2 &&
    ++		# ... but fail when we are paranoid.
     +		test_must_fail git rev-parse HEAD~2 2>error &&
     +		grep "error: commit $oid exists in commit-graph but not in the object database" error
     +	)
-- 
2.42.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* [PATCH v2 1/2] commit-graph: introduce envvar to disable commit existence checks
From: Patrick Steinhardt @ 2023-10-23 11:27 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, Junio C Hamano, Taylor Blau, Jeff King
In-Reply-To: <cover.1698060036.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 5776 bytes --]

Our `lookup_commit_in_graph()` helper tries to look up commits from the
commit graph and, if it doesn't exist there, falls back to parsing it
from the object database instead. This is intended to speed up the
lookup of any such commit that exists in the database. There is an edge
case though where the commit exists in the graph, but not in the object
database. To avoid returning such stale commits the helper function thus
double checks that any such commit parsed from the graph also exists in
the object database. This makes the function safe to use even when
commit graphs aren't updated regularly.

We're about to introduce the same pattern into other parts of our code
base though, namely `repo_parse_commit_internal()`. Here the extra
sanity check is a bit of a tougher sell: `lookup_commit_in_graph()` was
a newly introduced helper, and as such there was no performance hit by
adding this sanity check. If we added `repo_parse_commit_internal()`
with that sanity check right from the beginning as well, this would
probably never have been an issue to begin with. But by retrofitting it
with this sanity check now we do add a performance regression to
preexisting code, and thus there is a desire to avoid this or at least
give an escape hatch.

In practice, there is no inherent reason why either of those functions
should have the sanity check whereas the other one does not: either both
of them are able to detect this issue or none of them should be. This
also means that the default of whether we do the check should likely be
the same for both. To err on the side of caution, we thus rather want to
make `repo_parse_commit_internal()` stricter than to loosen the checks
that we already have in `lookup_commit_in_graph()`.

The escape hatch is added in the form of a new GIT_COMMIT_GRAPH_PARANOIA
environment variable that mirrors GIT_REF_PARANOIA. If enabled, which is
the default, we will double check that commits looked up in the commit
graph via `lookup_commit_in_graph()` also exist in the object database.
This same check will also be added in `repo_parse_commit_internal()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/git.txt   |  9 +++++++++
 commit-graph.c          |  6 +++++-
 commit-graph.h          |  6 ++++++
 t/t5318-commit-graph.sh | 21 +++++++++++++++++++++
 4 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/Documentation/git.txt b/Documentation/git.txt
index 11228956cd..22c2b537aa 100644
--- a/Documentation/git.txt
+++ b/Documentation/git.txt
@@ -911,6 +911,15 @@ for full details.
 	should not normally need to set this to `0`, but it may be
 	useful when trying to salvage data from a corrupted repository.

+`GIT_COMMIT_GRAPH_PARANOIA`::
+	If this Boolean environment variable is set to false, ignore the
+	case where commits exist in the commit graph but not in the
+	object database. Normally, Git will check whether commits loaded
+	from the commit graph exist in the object database to avoid
+	issues with stale commit graphs, but this check comes with a
+	performance penalty. The default is `1` (i.e., be paranoid about
+	stale commits in the commit graph).
+
 `GIT_ALLOW_PROTOCOL`::
 	If set to a colon-separated list of protocols, behave as if
 	`protocol.allow` is set to `never`, and each of the listed
diff --git a/commit-graph.c b/commit-graph.c
index fd2f700b2e..12ec31902e 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -939,14 +939,18 @@ int repo_find_commit_pos_in_graph(struct repository *r, struct commit *c,

 struct commit *lookup_commit_in_graph(struct repository *repo, const struct object_id *id)
 {
+	static int object_paranoia = -1;
 	struct commit *commit;
 	uint32_t pos;

+	if (object_paranoia == -1)
+		object_paranoia = git_env_bool(GIT_COMMIT_GRAPH_PARANOIA, 1);
+
 	if (!prepare_commit_graph(repo))
 		return NULL;
 	if (!search_commit_pos_in_graph(id, repo->objects->commit_graph, &pos))
 		return NULL;
-	if (!has_object(repo, id, 0))
+	if (object_paranoia && !has_object(repo, id, 0))
 		return NULL;

 	commit = lookup_commit(repo, id);
diff --git a/commit-graph.h b/commit-graph.h
index 20ada7e891..bd4289620c 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -8,6 +8,12 @@
 #define GIT_TEST_COMMIT_GRAPH_DIE_ON_PARSE "GIT_TEST_COMMIT_GRAPH_DIE_ON_PARSE"
 #define GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS "GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS"

+/*
+ * This environment variable controls whether commits looked up via the
+ * commit graph will be double checked to exist in the object database.
+ */
+#define GIT_COMMIT_GRAPH_PARANOIA "GIT_COMMIT_GRAPH_PARANOIA"
+
 /*
  * This method is only used to enhance coverage of the commit-graph
  * feature in the test suite with the GIT_TEST_COMMIT_GRAPH and
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index ba65f17dd9..c0cc454538 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -821,4 +821,25 @@ test_expect_success 'overflow during generation version upgrade' '
 	)
 '

+test_expect_success 'stale commit cannot be parsed when given directly' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit A &&
+		test_commit B &&
+		git commit-graph write --reachable &&
+
+		oid=$(git rev-parse B) &&
+		rm .git/objects/"$(test_oid_to_path "$oid")" &&
+
+		# Verify that it is possible to read the commit from the
+		# commit graph when not being paranoid, ...
+		GIT_COMMIT_GRAPH_PARANOIA=false git rev-list B &&
+		# ... but parsing the commit when double checking that
+		# it actually exists in the object database should fail.
+		test_must_fail git rev-list -1 B
+	)
+'
+
 test_done
-- 
2.42.0

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* [PATCH v2 2/2] commit: detect commits that exist in commit-graph but not in the ODB
From: Patrick Steinhardt @ 2023-10-23 11:27 UTC (permalink / raw)
  To: git; +Cc: Karthik Nayak, Junio C Hamano, Taylor Blau, Jeff King
In-Reply-To: <cover.1698060036.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 6787 bytes --]

Commit graphs can become stale and contain references to commits that do
not exist in the object database anymore. Theoretically, this can lead
to a scenario where we are able to successfully look up any such commit
via the commit graph even though such a lookup would fail if done via
the object database directly.

As the commit graph is mostly intended as a sort of cache to speed up
parsing of commits we do not want to have diverging behaviour in a
repository with and a repository without commit graphs, no matter
whether they are stale or not. As commits are otherwise immutable, the
only thing that we really need to care about is thus the presence or
absence of a commit.

To address potentially stale commit data that may exist in the graph,
our `lookup_commit_in_graph()` function will check for the commit's
existence in both the commit graph, but also in the object database. So
even if we were able to look up the commit's data in the graph, we would
still pretend as if the commit didn't exist if it is missing in the
object database.

We don't have the same safety net in `parse_commit_in_graph_one()`
though. This function is mostly used internally in "commit-graph.c"
itself to validate the commit graph, and this usage is fine. We do
expose its functionality via `parse_commit_in_graph()` though, which
gets called by `repo_parse_commit_internal()`, and that function is in
turn used in many places in our codebase.

For all I can see this function is never used to directly turn an object
ID into a commit object without additional safety checks before or after
this lookup. What it is being used for though is to walk history via the
parent chain of commits. So when commits in the parent chain of a graph
walk are missing it is possible that we wouldn't notice if that missing
commit was part of the commit graph. Thus, a query like `git rev-parse
HEAD~2` can succeed even if the intermittent commit is missing.

It's unclear whether there are additional ways in which such stale
commit graphs can lead to problems. In any case, it feels like this is a
bigger bug waiting to happen when we gain additional direct or indirect
callers of `repo_parse_commit_internal()`. So let's fix the inconsistent
behaviour by checking for object existence via the object database, as
well.

This check of course comes with a performance penalty. The following
benchmarks have been executed in a clone of linux.git with stable tags
added:

    Benchmark 1: git -c core.commitGraph=true rev-list --topo-order --all (git = master)
      Time (mean ± σ):      2.913 s ±  0.018 s    [User: 2.363 s, System: 0.548 s]
      Range (min … max):    2.894 s …  2.950 s    10 runs

    Benchmark 2: git -c core.commitGraph=true rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
      Time (mean ± σ):      3.834 s ±  0.052 s    [User: 3.276 s, System: 0.556 s]
      Range (min … max):    3.780 s …  3.961 s    10 runs

    Benchmark 3: git -c core.commitGraph=false rev-list --topo-order --all (git = master)
      Time (mean ± σ):     13.841 s ±  0.084 s    [User: 13.152 s, System: 0.687 s]
      Range (min … max):   13.714 s … 13.995 s    10 runs

    Benchmark 4: git -c core.commitGraph=false rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
      Time (mean ± σ):     13.762 s ±  0.116 s    [User: 13.094 s, System: 0.667 s]
      Range (min … max):   13.645 s … 14.038 s    10 runs

    Summary
      git -c core.commitGraph=true rev-list --topo-order --all (git = master) ran
        1.32 ± 0.02 times faster than git -c core.commitGraph=true rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
        4.72 ± 0.05 times faster than git -c core.commitGraph=false rev-list --topo-order --all (git = pks-commit-graph-inconsistency)
        4.75 ± 0.04 times faster than git -c core.commitGraph=false rev-list --topo-order --all (git = master)

We look at a ~30% regression in general, but in general we're still a
whole lot faster than without the commit graph. To counteract this, the
new check can be turned off with the `GIT_COMMIT_GRAPH_PARANOIA` envvar.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 commit.c                | 16 +++++++++++++++-
 t/t5318-commit-graph.sh | 27 +++++++++++++++++++++++++++
 2 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/commit.c b/commit.c
index b3223478bc..7399e90212 100644
--- a/commit.c
+++ b/commit.c
@@ -28,6 +28,7 @@
 #include "shallow.h"
 #include "tree.h"
 #include "hook.h"
+#include "parse.h"

 static struct commit_extra_header *read_commit_extra_header_lines(const char *buf, size_t len, const char **);

@@ -572,8 +573,21 @@ int repo_parse_commit_internal(struct repository *r,
 		return -1;
 	if (item->object.parsed)
 		return 0;
-	if (use_commit_graph && parse_commit_in_graph(r, item))
+	if (use_commit_graph && parse_commit_in_graph(r, item)) {
+		static int object_paranoia = -1;
+
+		if (object_paranoia == -1)
+			object_paranoia = git_env_bool(GIT_COMMIT_GRAPH_PARANOIA, 1);
+
+		if (object_paranoia && !has_object(r, &item->object.oid, 0)) {
+			unparse_commit(r, &item->object.oid);
+			return quiet_on_missing ? -1 :
+				error(_("commit %s exists in commit-graph but not in the object database"),
+				      oid_to_hex(&item->object.oid));
+		}
+
 		return 0;
+	}

 	if (oid_object_info_extended(r, &item->object.oid, &oi, flags) < 0)
 		return quiet_on_missing ? -1 :
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index c0cc454538..79467d7926 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -842,4 +842,31 @@ test_expect_success 'stale commit cannot be parsed when given directly' '
 	)
 '

+test_expect_success 'stale commit cannot be parsed when traversing graph' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+
+		test_commit A &&
+		test_commit B &&
+		test_commit C &&
+		git commit-graph write --reachable &&
+
+		# Corrupt the repository by deleting the intermittent commit
+		# object. Commands should notice that this object is absent and
+		# thus that the repository is corrupt even if the commit graph
+		# exists.
+		oid=$(git rev-parse B) &&
+		rm .git/objects/"$(test_oid_to_path "$oid")" &&
+
+		# Again, we should be able to parse the commit when not
+		# being paranoid about commit graph staleness...
+		GIT_COMMIT_GRAPH_PARANOIA=false git rev-parse HEAD~2 &&
+		# ... but fail when we are paranoid.
+		test_must_fail git rev-parse HEAD~2 2>error &&
+		grep "error: commit $oid exists in commit-graph but not in the object database" error
+	)
+'
+
 test_done
-- 
2.42.0

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* Re: [PATCH v1 4/4] maintenance: use XDG config if it exists
From: Eric Sunshine @ 2023-10-23 11:39 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Kristoffer Haugsbakk, git, stolee
In-Reply-To: <ZTZDsIcrB0zwHlFR@tanuki>

On Mon, Oct 23, 2023 at 5:58 AM Patrick Steinhardt <ps@pks.im> wrote:
> On Wed, Oct 18, 2023 at 10:28:41PM +0200, Kristoffer Haugsbakk wrote:
> > `git maintenance register` registers the repository in the user's global
> > config. `$XDG_CONFIG_HOME/git/config` is supposed to be used if
> > `~/.gitconfig` does not exist. However, this command creates a
> > `~/.gitconfig` file and writes to that one even though the XDG variant
> > exists.
> >
> > This used to work correctly until 50a044f1e4 (gc: replace config
> > subprocesses with API calls, 2022-09-27), when the command started calling
> > the config API instead of git-config(1).
> >
> > Also change `unregister` accordingly.
> >
> > Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
> > ---
> > +test_expect_success 'register uses XDG_CONFIG_HOME config if it exists' '
> > +     XDG_CONFIG_HOME=.config &&
> > +     test_when_finished rm -r "$XDG_CONFIG_HOME"/git/config &&
> > +     export "XDG_CONFIG_HOME" &&
>
> Also, I think we need to unset this variable at the end of this test as
> tests don't run in a subshell. [...]

Yup, well spotted. Almost the entire body of this test should be in a
subshell to ensure that the environment variable does not live beyond
the end of this test. But test_when_finished() can't be used in a
subshell, so a little care is needed:

    test_expect_success 'register uses XDG_CONFIG_HOME config if it exists' '
        test_when_finished rm -r .config/git/config &&
        (
            XDG_CONFIG_HOME=.config &&
            ...
        )
    '

> > +     mkdir -p "$XDG_CONFIG_HOME"/git &&
> > +     touch "$XDG_CONFIG_HOME"/git/config &&

If the timestamp of the file is not significant, then we use `>` to
create it rather than `touch`:

    >"$XDG_CONFIG_HOME"/git/config &&

> > +     git maintenance register &&
> > +     git config --file="$XDG_CONFIG_HOME"/git/config --get maintenance.repo >actual &&
> > +     pwd >expect &&
> > +     test_cmp expect actual
> > +'

^ permalink raw reply

* Re: [PATCH 00/11] t: reduce direct disk access to data structures
From: Patrick Steinhardt @ 2023-10-23 11:57 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Han-Wen Nienhuys, Jacob Keller
In-Reply-To: <xmqqlebzzeh8.fsf@gitster.g>

[-- Attachment #1: Type: text/plain, Size: 1721 bytes --]

On Wed, Oct 18, 2023 at 04:40:35PM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > As usual, I forgot to mention that this applies on top of current
> > master, which is at a9ecda2788 (The eighteenth batch, 2023-10-13) at the
> > time of writing. I look forward to the day where I remember to include
> > this information in the cover letter...
> 
> I heard rumors that we have a --base option that records such a
> piece of information in the format-patch output (I haven't used it
> myself).  Would it have helped, or perhaps it still needs some
> usability tweak?

I was aware of the option, but somehow never thought to use it. I guess
others are in the same boat as this is a semi-frequent topic to come up
on the mailing list.

What will fix this for me from now on is that I've now discovered the
`format.useAutoBase` config. To fix this for everybody else without
having to go through the same process we could change the default to
`format.useAutoBase=auto`. This can lead to hard-to-understand errors as
pointed out by 7efba5fa39f (format-patch: teach format.useAutoBase
"whenAble" option, 2020-10-01) though:

```
$ git format-patch -1 <an old commit>
fatal: base commit shouldn't be in revision list
```

But this was fixed via the new "whenAble" value for this configuration,
which will cause Git to ignore the case where it's unable to figure out
the base commit. So maybe we should consider to make it the new default?
I cannot think of any real downside. In the best case we simplify the
life of many maintainers of mailing list based projects. In the worst
case where we are unable to figure out the base commit nothing changes.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* [RESEND v2] git-rebase.txt: rewrite docu for fixup/squash (again)
From: Oswald Buddenhagen @ 2023-10-23 13:00 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Phillip Wood, Christian Couder, Charvi Mendiratta,
	Marc Branchaud

Create a clear top-down structure which makes it hopefully unambiguous
what happens when.

Also mention the timestamp along with the author - this is primarily
meant to include the keywords somebody might be searching for, like I
did a year ago.

Signed-off-by: Oswald Buddenhagen <oswald.buddenhagen@gmx.de>

---
v2:
- slight adjustments inspired by marc. however, i left most things
  unchanged or even went in the opposite direction, because i assume the
  readers to be sufficiently context-sensitive, and the objective is
  merely to be not actively confusing. adding redundancy in the name of
  clarity would just make the text stylistically inferior and arguably
  harder to read.

Cc: Junio C Hamano <gitster@pobox.com>
Cc: Phillip Wood <phillip.wood123@gmail.com>
Cc: Christian Couder <christian.couder@gmail.com>
Cc: Charvi Mendiratta <charvi077@gmail.com>
Cc: Marc Branchaud <marcnarc@xiplink.com>
---
 Documentation/git-rebase.txt | 29 +++++++++++++++--------------
 1 file changed, 15 insertions(+), 14 deletions(-)

diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt
index e7b39ad244..337df9ef2f 100644
--- a/Documentation/git-rebase.txt
+++ b/Documentation/git-rebase.txt
@@ -890,20 +890,21 @@ command "pick" with the command "reword".
 To drop a commit, replace the command "pick" with "drop", or just
 delete the matching line.
 
-If you want to fold two or more commits into one, replace the command
-"pick" for the second and subsequent commits with "squash" or "fixup".
-If the commits had different authors, the folded commit will be
-attributed to the author of the first commit.  The suggested commit
-message for the folded commit is the concatenation of the first
-commit's message with those identified by "squash" commands, omitting the
-messages of commits identified by "fixup" commands, unless "fixup -c"
-is used.  In that case the suggested commit message is only the message
-of the "fixup -c" commit, and an editor is opened allowing you to edit
-the message.  The contents (patch) of the "fixup -c" commit are still
-incorporated into the folded commit. If there is more than one "fixup -c"
-commit, the message from the final one is used.  You can also use
-"fixup -C" to get the same behavior as "fixup -c" except without opening
-an editor.
+If you want to fold two or more commits into one (that is, to combine
+their contents/patches), replace the command "pick" for the second and
+subsequent commits with "squash" or "fixup".
+The commit message for the folded commit is the concatenation of the
+message of the first commit with those of commits identified by "squash"
+commands, omitting those of commits identified by "fixup" commands,
+unless "fixup -c" is used. In the latter case, the message is obtained
+only from the "fixup -c" commit (having more than one of these is
+incorrect).
+If the resulting commit message is a concatenation of multiple messages,
+an editor is opened allowing you to edit it. This is also the case for a
+message obtained via "fixup -c", while using "fixup -C" instead skips
+the editor; this is analogous to the behavior of `git commit`.
+The first commit which contributes to the suggested commit message also
+determines the author, along with the date/timestamp.
 
 `git rebase` will stop when "pick" has been replaced with "edit" or
 when a command fails due to merge errors. When you are done editing
-- 
2.42.0.419.g70bf8a5751


^ permalink raw reply related

* Re: [Outreachy] Move existing tests to a unit testing framework
From: Christian Couder @ 2023-10-23 13:41 UTC (permalink / raw)
  To: Achu Luma; +Cc: Junio C Hamano, git
In-Reply-To: <CAFR+8Dwxr3iV+R7een0t2sYXUWu1XHhQcLVuqMhOsSg9Bt4wrg@mail.gmail.com>

On Fri, Oct 20, 2023 at 3:16 PM Achu Luma <ach.lumap@gmail.com> wrote:
>
> Dear Git Community and Mentors,
>
> I hope this email finds you well. As a follow-up to my previous application, I'd like to provide additional details on the process of migrating existing unit tests from the t/helper/ directory to the new Git unit test framework.

Thanks for these details!

> -- Identify Target Unit Tests: Start by identifying the specific unit tests in the t/helper/ directory that we want to port to the new Git unit test framework. Ensure that the tests are suitable for migration and that the benefits of doing so outweigh the effort(By avoiding integration tests). The following points have been developed with on going work on the unit-tests framework visible here:
>
> 1- https://lore.kernel.org/git/0169ce6fb9ccafc089b74ae406db0d1a8ff8ac65.1688165272.git.steadmon@google.com/
> 2- https://github.com/steadmon/git/blob/unit-tests-asciidoc/Documentation/technical/unit-tests.adoc

Maybe if you have time you could add some descriptions or comments
related to the above emails and documents. For example you could tell
what the new unit test framework will be like, how the unit tests will
look like, etc. Maybe a short overview would be nice.

You could also try to apply the patches in the series that adds the
test framework, or alternatively use the 'seen' branch where the
series has been merged, and start playing with it by writing, or
porting, a small example test.

> -- Create a New C Test File: For each unit test I plan to migrate, create a new C source file (.c) in the Git project's test suite directory(t/unit-tests). Name it appropriately to reflect the purpose of the test.

Could you provide an example of what the new name would be for an
existing test that is worth porting?

> --  Include Necessary Headers:In the new C test file, include the necessary Git unit test framework headers. Typically, this includes headers like "test-lib.h" and others relevant to the specific test.
> #include "test-lib.h"

Maybe you could continue the above example and tell which headers
would be needed for it?

> -- Convert Test Logic: Refactor the test logic from the original Shell script into the new C-based test format. Use the testing macros provided by the Git unit test framework, such as test_expect_success, test_expect_failure, etc., to define the tests.
> test_expect_success("simple progress display", "{
>     // Test logic here...
> }");

Ok, a simple example would be nice too.

> -- Add Test Descriptions: Provide clear and informative descriptions for each test using the testing macros. These descriptions will help in identifying the purpose of each test when the test suite is run.

This would seem to be part of the previous step, as you would have to
provide a description when using the testing macro. But Ok.

> -- Define a Test Entry Point: Create a cmd_main function as the entry point for the C-based tests. Inside this function, include the test functions using the testing macros.
> int cmd_main(int argc, const char **argv) {
>     // Test functions...
>     return test_done();
> }

Yeah, continuing an example would be nice.

> -- Ensure TAP Format Output: Ensure that the C-based tests produce output in the Test Anything Protocol (TAP) format. This format includes the test name, status (ok or not ok), and any diagnostic information.

That means using TEST* macros in the cmd_main() function, as they
should do the right thing or is there more to be done here?

> -- Test Interaction: Ensure that the migrated tests interact correctly with the new Git unit test framework and any other tests that may be relevant. Consider dependencies and interactions with other parts of the Git project.

I am not sure what work would be needed here. Is there more to do than
compiling the test files? Having an example would be nice.

> -- Test Execution: Run the migrated tests to verify that they produce the expected results when executed as part of the Git project's test suite. Use the Git testing framework's test runners to execute the tests.

Ok.

> -- Documentation Update: Update the Git project's documentation to reflect the changes made during the migration. Include a reference to the original unit tests in the t/helper/ directory and indicate that these tests have been ported to the new Git unit test framework.

I am not sure that we would want that. I think we might instead want
to document things in t/helper/ that we don't want to port to the new
unit test framework and why.

> By following these points, I think I can successfully port existing unit tests from the t/helper/ directory to use the new Git unit test framework. This migration helps standardize and streamline the testing process within the Git project, improving code quality and maintainability.

Yeah!

> Next Steps:
>
> I am eager to discuss these suggestions and collaborate with the Git community to ensure the success of this project. I will continue to engage with the community, seek guidance, and refine my proposal as per your suggestions.
>  I look forward to the opportunity to contribute to the Git project and help make it even more robust and reliable.

Thanks for this application and sorry for the late answer!

Best,
Christian.

^ permalink raw reply

* Re: [PATCH 00/11] t: reduce direct disk access to data structures
From: Patrick Steinhardt @ 2023-10-23 13:58 UTC (permalink / raw)
  To: Han-Wen Nienhuys; +Cc: Junio C Hamano, git
In-Reply-To: <CAFQ2z_Om724+o+EG1FAhC9VrvJECnQ5UA+Z04Rzycpi_mXvMHg@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 3406 bytes --]

On Thu, Oct 19, 2023 at 12:13:12PM +0200, Han-Wen Nienhuys wrote:
> On Wed, Oct 18, 2023 at 5:32 PM Junio C Hamano <gitster@pobox.com> wrote:
> > > this patch series refactors a bunch of our tests to perform less direct
> > > disk access to on-disk data structures. Instead, the tests are converted
> > > to use Git tools or our test-tool to access data to the best extent
> > > possible. This serves two benefits:
> >
> > Laudable goal.
> >
> > >     - We increase test coverage of our own code base.
> >
> > Meaning the new code added to test-tool for this series will also
> > get tested and bugs spotted?

For now all the helpers of the test-tool only have implicit test
coverage, but I get your point. If we decide to instead transform this
test tool into production-level code as you suggested (e.g. `git
rev-parse --exists`) then this becomes less of a discussion point as we
would of course have proper test coverage for it.

> > >     - We become less dependent on the actual on-disk format.
> >
> > Yes, this is very desirable.  Without looking at the implementation,
> > I see some issues aiming for this goal may involve. [a] using the
> > production code for validation would mean our expectation to be
> > compared to the reality to be validated can be affected by the same
> > bug, making two wrongs to appear right; [b] using a separate
> > implementation used only for validation would at least mean we will
> > have to make the same mistake in unique part of both implementations
> > that is less likely to miss bugs compared to [a], but bugs in shared
> > part of the production code and validation code will be hidden the
> > same way as [a].
> 
> I think it would be really great if there were separate unittests for
> the ref backend API. Some of the reftable work was needlessly
> difficult because the contract of the API was underspecified. The API
> is well compartmentalized in refs-internal.h, and a lot of the API
> behavior can be tested as a black box, eg.
> 
> * setup symref HEAD pointing to R1
> * setup transaction updating ref R1 from C1 to C2
> * commit transaction, check that it succeeds
> * read ref R1, check if it is C2
> * read reflog for R1, see that it has a C1 => C2 update
> * read reflog for HEAD, see that it has a C1 => C2 update
> 
> Tests for the loose/packed backend could directly mess with the
> on-disk files to test failure scenarios.
> 
> With unittests like that, the tests can zoom in on the functionality
> of the ref backend, and provide more convenient coverage for
> dynamic/static analysis.

Agreed. Ideally, I think our tests should be split up into two layers:

    1. Low-level tests which are backend specific. These allow us to
       assert the on-disk state of the respective backend thoroughly and
       also allow us to explicitly test for edge cases that are only of
       relevance to this specific backend.

       The reftable backend in its current (non-upstream) already has
       such tests, but we don't really have explicit tests for the files
       backend. This is a gap that should ideally be filled at some
       point in time.

    2. Higher-level tests should then be allowed to assume that the
       underlying logic works as expected. These tests are thus free to
       use plumbing tools that tie into the reference backends.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH 02/11] t: allow skipping expected object ID in `ref-store update-ref`
From: Patrick Steinhardt @ 2023-10-23 13:58 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Han-Wen Nienhuys
In-Reply-To: <xmqqil73ud5j.fsf@gitster.g>

[-- Attachment #1: Type: text/plain, Size: 2507 bytes --]

On Wed, Oct 18, 2023 at 09:08:08AM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > We require the caller to pass both the old and new expected object ID to
> > our `test-tool ref-store update-ref` helper. When trying to update a
> > symbolic reference though it's impossible to specify the expected object
> > ID, which means that the test would instead have to force-update the
> > reference. This is currently impossible though.
> >
> > Update the helper to optionally skip verification of the old object ID
> > in case the test passes in an empty old object ID as input.
> >
> > Signed-off-by: Patrick Steinhardt <ps@pks.im>
> > ---
> >  t/helper/test-ref-store.c | 11 +++++++----
> >  1 file changed, 7 insertions(+), 4 deletions(-)
> 
> Good.
> 
> Even better would be to make the old one optional, though.

I was also a bit torn when writing this. We could of course make the
behaviour conditional on whether `argc` is 4 or 5. But I wasn't quite
sure how important it is to provide a nice UI for this test helper, and
we don't have `argc` readily available. It's not hard to count them
manually, but until now I was under the impression that the test helpers
only need to be "good enough".

Patrick

> > diff --git a/t/helper/test-ref-store.c b/t/helper/test-ref-store.c
> > index 7400f560ab6..7dc83137584 100644
> > --- a/t/helper/test-ref-store.c
> > +++ b/t/helper/test-ref-store.c
> > @@ -322,16 +322,19 @@ static int cmd_update_ref(struct ref_store *refs, const char **argv)
> >  	const char *new_sha1_buf = notnull(*argv++, "new-sha1");
> >  	const char *old_sha1_buf = notnull(*argv++, "old-sha1");
> >  	unsigned int flags = arg_flags(*argv++, "flags", transaction_flags);
> > -	struct object_id old_oid;
> > +	struct object_id old_oid, *old_oid_ptr = NULL;
> >  	struct object_id new_oid;
> >  
> > -	if (get_oid_hex(old_sha1_buf, &old_oid))
> > -		die("cannot parse %s as %s", old_sha1_buf, the_hash_algo->name);
> > +	if (*old_sha1_buf) {
> > +		if (get_oid_hex(old_sha1_buf, &old_oid))
> > +			die("cannot parse %s as %s", old_sha1_buf, the_hash_algo->name);
> > +		old_oid_ptr = &old_oid;
> > +	}
> >  	if (get_oid_hex(new_sha1_buf, &new_oid))
> >  		die("cannot parse %s as %s", new_sha1_buf, the_hash_algo->name);
> >  
> >  	return refs_update_ref(refs, msg, refname,
> > -			       &new_oid, &old_oid,
> > +			       &new_oid, old_oid_ptr,
> >  			       flags, UPDATE_REFS_DIE_ON_ERR);
> >  }
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH 04/11] t: convert tests to not write references via the filesystem
From: Patrick Steinhardt @ 2023-10-23 13:58 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Han-Wen Nienhuys
In-Reply-To: <xmqq1qdru6ds.fsf@gitster.g>

[-- Attachment #1: Type: text/plain, Size: 4699 bytes --]

On Wed, Oct 18, 2023 at 11:34:23AM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > -test_expect_success "fail to create $n" '
> > -	test_when_finished "rm -f .git/$n_dir" &&
> > -	touch .git/$n_dir &&
> > -	test_must_fail git update-ref $n $A
> > +test_expect_success "fail to create $n due to file/directory conflict" '
> > +	test_when_finished "git update-ref -d refs/heads/gu" &&
> > +	git update-ref refs/heads/gu $A &&
> > +	test_must_fail git update-ref refs/heads/gu/fixes $A
> >  '
> 
> OK, the original checks "if a random garbage file, which may not
> necessarily be a ref, exists at $n_dir, we cannot create a ref at
> $n_dir/fixes, due to D/F conflict" more directly, but as long as our
> intention is to enforce the D/F restriction across different ref
> backends [*], creating a ref at $n_dir and making sure $n_dir/fixes
> cannot be created is an equivalent check that is better (because it
> can be applied for other backends).
> 
>     Side note: there is no fundamental need to, though, and there
>          are cases where being able to have the 'seen' branch and
>          'seen/ps/ref-test-tools' branches at the same time is
>          beneficial---packed-refs and ref-table backends would not
>          have such an inherent limitation, but they can of course be
>          castrated to match what files-backend can(not) do.

I think initially it is beneficial to keep any such restriction and cut
back new backends to match them, even though it's more work. The main
reason why I think this makes sense is that hosting providers are the
most likely to update to the newer backend fast. It would be detrimental
to the users though if hosting providers that converted to the newer
backend were able to store such references that would be conflicting
with the files backend because they wouldn't be able to clone such
repositories anymore. And this isn't only true for hosting providers,
but essentially whenever two repositories that use different backends
try to interact with each other.

So even though I wish for a future where we don't need to care for D/F
conflicts anymore, I think the time isn't ripe for it and we should for
now aim for matching behaviour.

> > @@ -222,7 +220,7 @@ test_expect_success 'delete symref without dereference when the referred ref is
> >  
> >  test_expect_success 'update-ref -d is not confused by self-reference' '
> >  	git symbolic-ref refs/heads/self refs/heads/self &&
> > -	test_when_finished "rm -f .git/refs/heads/self" &&
> > +	test_when_finished "test-tool ref-store main delete-refs REF_NO_DEREF refs/heads/self" &&
> >  	test_path_is_file .git/refs/heads/self &&
> 
> I trust that this will be corrected to use some wrapper around "git
> symbolic-ref" (or an equivalent for it as a test-tool subcommand) in
> some future patch, if not in this series?

Yup, this is getting fixed in a subsequent patch. I had two different
options to structure this series:

    - Either by test file so that questions like this don't come up when
      a reviewer sees in the context that there are still tests that
      exercise the filesystem directly. This was my first approach.

    - The alternative is to write it such that we address common
      patterns globally, which I then rewrote my first version to.

There were two reasons why I didn't like the first iteration:

    - Commit messages would basically have to reexplain the exact same
      thing for every single testcase as the motivation is the same
      everywhere.

    - I think it's easier to review when you only see the same kind of
      transformation per patch.

But yes, it does have the downside that the reader is now left wondering
why the other call to `test_path_is_file` still exists.

Patrick

> >  	test_must_fail git update-ref -d refs/heads/self &&
> >  	test_path_is_file .git/refs/heads/self
> 
> Likewise.
> 
> > @@ -230,7 +228,7 @@ test_expect_success 'update-ref -d is not confused by self-reference' '
> >  
> >  test_expect_success 'update-ref --no-deref -d can delete self-reference' '
> >  	git symbolic-ref refs/heads/self refs/heads/self &&
> > -	test_when_finished "rm -f .git/refs/heads/self" &&
> > +	test_when_finished "test-tool ref-store main delete-refs REF_NO_DEREF refs/heads/self" &&
> >  	test_path_is_file .git/refs/heads/self &&
> >  	git update-ref --no-deref -d refs/heads/self &&
> >  	test_must_fail git show-ref --verify -q refs/heads/self
> 
> We already have the "ref is missing" test here.
> 
> I'll stop at this point for now; will hopefully continue in a
> separate message later.  Thanks.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH 04/11] t: convert tests to not write references via the filesystem
From: Patrick Steinhardt @ 2023-10-23 13:58 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Han-Wen Nienhuys
In-Reply-To: <xmqqfs27r5ng.fsf@gitster.g>

[-- Attachment #1: Type: text/plain, Size: 5886 bytes --]

On Wed, Oct 18, 2023 at 02:18:27PM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > @@ -434,7 +432,7 @@ test_expect_success 'Query "main@{2005-05-28}" (past end of history)' '
> >  	test_i18ngrep -F "warning: log for ref $m unexpectedly ended on $ld" e
> >  '
> >  
> > -rm -f .git/$m .git/logs/$m expect
> > +git update-ref -d $m
> 
> We are not clearing "expect" file.  I do not know if it matters
> here, but I am only recording what I noticed.

Oops, will fix.

> > diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
> > index 10a539158c4..5cce24f1006 100755
> > --- a/t/t1450-fsck.sh
> > +++ b/t/t1450-fsck.sh
> > @@ -115,15 +115,16 @@ test_expect_success 'zlib corrupt loose object output ' '
> >  '
> >  
> >  test_expect_success 'branch pointing to non-commit' '
> > -	git rev-parse HEAD^{tree} >.git/refs/heads/invalid &&
> > +	tree_oid=$(git rev-parse --verify HEAD^{tree}) &&
> > +	test-tool ref-store main update-ref msg refs/heads/invalid $tree_oid $ZERO_OID REF_SKIP_OID_VERIFICATION &&
> 
> I have mixed feelings on this.
> 
> In olden days, plumbing commands tended to allow to pass anything
> the user told them to use, but in more recent versions of Git, we,
> probably by mistake, managed to butcher some of the plumbing
> commands to make them unable to deliberately "break" repositories,
> one victim being "update-ref", i.e.
> 
>     $ git update-ref refs/heads/invalid HEAD^{tree}
> 
> is rejected these days (I just checked with v1.3.0 and it allows me
> to do this), and that is one of the reasons why we manually broke
> the repository in these tests.

My first try was indeed to use git-update-ref(1) to update the ref to
the tree object, and I was surprised to learn that it would not let me
do so.

> We need to have a warning message in
> comments near the implementation of "ref-store update-ref" that says
> never ever attempt to share code with the production version of
> update-ref---otherwise this (or the "safety" given to the plumbing
> command, possibly by mistake) will be broken, depending on which
> direction such a sharing goes.  On the other hand, forcing us to
> keep two separate implementations, one deliberately loose to allow
> us corrupt repositories, the other for production and actively used,
> would mean the former one that is only used for validation would risk
> bitrotting.

Wouldn't any eventual bitrot be contained by the tests though? As our
use of the test helper grows via this patch series its behaviour will
also be verified more thoroughly.

> >  	test_when_finished "git update-ref -d refs/heads/invalid" &&
> 
> Not a problem this patch introduces, but I think it is a better
> discipline to have when_finished clean-up routine prepared before we
> do actual damage, i.e. if I were writing this test today from scratch,
> I would expect it to be before "git rev-parse >.git/refs/heads/invalid"
> is done.

I'll fix this while at it.

> >  	test_must_fail git fsck 2>out &&
> >  	test_i18ngrep "not a commit" out
> >  '
> 
> A #leftoverbit that is not relevant to the topic; we should clean
> these test_i18ngrep and replace them with a plain "grep".
> 
> >  test_expect_success 'HEAD link pointing at a funny object' '
> > -	test_when_finished "mv .git/SAVED_HEAD .git/HEAD" &&
> > -	mv .git/HEAD .git/SAVED_HEAD &&
> > +	saved_head=$(git rev-parse --verify HEAD) &&
> > +	test_when_finished "git update-ref HEAD ${saved_head}" &&
> >  	echo $ZERO_OID >.git/HEAD &&
> 
> Are you sure .git/HEAD when this test is entered is a detached HEAD?
> The title of the test says "HEAD link", and I take it to mean HEAD
> is a symlink, and we save it away, while we create a loose ref that
> points at 0{40} in a detached HEAD state.  Actually, the original
> would also work if HEAD is detached on entry.  In either case,
> moving SAVED_HEAD back to HEAD would restore the original state.
> 
> But the updated one only works if HEAD upon entry is already
> detached.  Is this intended?

Yes and no -- it's a reflection of the state when this test runs. One
problem in this test suite here is that many of the tests' states are
heavily interwoven with each other, which only makes this harder to
refactor without making any assumptions.

Well. Instead of restoring to whatever the state was previous to the
test we could also restore it to something sane-ish like "master". That
of course breaks other tests though... I'll investigate.

> > @@ -131,8 +132,8 @@ test_expect_success 'HEAD link pointing at a funny object' '
> >  '
> >  
> >  test_expect_success 'HEAD link pointing at a funny place' '
> > -	test_when_finished "mv .git/SAVED_HEAD .git/HEAD" &&
> > -	mv .git/HEAD .git/SAVED_HEAD &&
> > +	saved_head=$(git rev-parse --verify HEAD) &&
> > +	test_when_finished "git update-ref --no-deref HEAD ${saved_head}" &&
> 
> Likewise.  Use of "update-ref" in the previous one vs "update-ref
> --no-deref" in this one to recover from the damage the tests make
> makes me feel that we may be assuming too much.
> 
> >  	echo "ref: refs/funny/place" >.git/HEAD &&
> 
> Even though "git symbolic-ref" refuses to point HEAD outside refs/,
> as plumbing command should, it allows it to point it outside refs/heads/.
> so this line should probably become
> 
> 	git symbolic-ref HEAD refs/funny/place
> 
> in the same spirit as the rest of the series.

Yup, this will be adressed in a later patch.

Patrick

> > @@ -391,7 +393,7 @@ test_expect_success 'tag pointing to nonexistent' '
> >  
> >  	tag=$(git hash-object -t tag -w --stdin <invalid-tag) &&
> >  	test_when_finished "remove_object $tag" &&
> > -	echo $tag >.git/refs/tags/invalid &&
> > +	git update-ref refs/tags/invalid $tag &&
> 
> Good (not just this one, but similar ones throughout this patch).
> 
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH 08/11] t4207: delete replace references via git-update-ref(1)
From: Patrick Steinhardt @ 2023-10-23 13:58 UTC (permalink / raw)
  To: Han-Wen Nienhuys; +Cc: git
In-Reply-To: <CAFQ2z_MiOazk8qLvEn5xfHyMwY=5pa6rEJD3vC3+rnHz4AeQ2Q@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1598 bytes --]

On Wed, Oct 18, 2023 at 03:27:11PM +0200, Han-Wen Nienhuys wrote:
> On Wed, Oct 18, 2023 at 7:35 AM Patrick Steinhardt <ps@pks.im> wrote:
> >
> > In t4207 we set up a set of replace objects via git-replace(1). Because
> > these references should not be impacting subsequent tests we also set up
> > some cleanup logic that deletes the replacement references via a call to
> > `rm -rf`. This reaches into the internal implementation details of the
> > reference backend and will thus break when we grow an alternative refdb
> > implementation.
> >
> > Refactor the tests to delete the replacement refs via Git commands so
> > that we become independent of the actual refdb that's in use. As we
> > don't have a nice way to delete all replacements or all references in a
> > certain namespace, we opt for a combination of git-for-each-ref(1) and
> > git-update-ref(1)'s `--stdin` mode.
> 
> There is a test helper that can directly access the ref database,
> t/helper/test-ref-store.c.
> 
> If you use that manipulate refs for testing purposes, you make the
> test independent of behavior git-for-each-ref/git-update-ref, which is
> what you want for testing replace-objects?

Is there any specific reason why we shouldn't be using
git-for-each-ref(1) or git-update-ref(1) here? Neither of those commands
are part of the system under test, as we rather care about git-log(1)
here. So as those commands are already being verified in other tests I
think it should be fine to assume that they work as intended here.

Happy to hear differing viewpoints though.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH 01/11] t: add helpers to test for reference existence
From: Patrick Steinhardt @ 2023-10-23 13:58 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: git, Han-Wen Nienhuys
In-Reply-To: <CAPig+cTueWuiWiw9_vou0Ud71utVMRkk5tR==tfkkMXBj9qWxA@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 2064 bytes --]

On Wed, Oct 18, 2023 at 01:08:45PM -0400, Eric Sunshine wrote:
> On Wed, Oct 18, 2023 at 1:35 AM Patrick Steinhardt <ps@pks.im> wrote:
> > Introduce a new subcommand for our ref-store test helper that explicitly
> > checks only for the presence or absence of a reference. This addresses
> > these limitations:
> > [...]
> > Signed-off-by: Patrick Steinhardt <ps@pks.im>
> > ---
> > diff --git a/t/helper/test-ref-store.c b/t/helper/test-ref-store.c
> > @@ -221,6 +221,30 @@ static int cmd_verify_ref(struct ref_store *refs, const char **argv)
> > +static int cmd_ref_exists(struct ref_store *refs, const char **argv)
> > +{
> > +       const char *refname = notnull(*argv++, "refname");
> > +       struct strbuf unused_referent = STRBUF_INIT;
> > +       struct object_id unused_oid;
> > +       unsigned int unused_type;
> > +       int failure_errno;
> > +
> > +       if (refs_read_raw_ref(refs, refname, &unused_oid, &unused_referent,
> > +                             &unused_type, &failure_errno)) {
> > +               /*
> > +                * We handle ENOENT separately here such that it is possible to
> > +                * distinguish actually-missing references from any kind of
> > +                * generic error.
> > +                */
> > +               if (failure_errno == ENOENT)
> > +                       return 17;
> > +               return -1;
> > +       }
> > +
> > +       strbuf_release(&unused_referent);
> > +       return 0;
> > +}
> 
> Unless refs_read_raw_ref() guarantees that `unused_referent` remains
> unallocated upon failure[*], then the early returns inside the
> conditional leak the strbuf. True, the program is exiting immediately
> anyhow, so this (potential) leak isn't significant, but it seems odd
> to clean up in one case (return 0) but not in the others (return -1 &
> 17).
> 
> [*] In my (admittedly brief) scan of the code and documentation, I
> didn't see any such promise.

Agreed, let's just be thorough and plug any potential memor leak here.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH 01/11] t: add helpers to test for reference existence
From: Patrick Steinhardt @ 2023-10-23 13:58 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Han-Wen Nienhuys
In-Reply-To: <xmqqmswfud7p.fsf@gitster.g>

[-- Attachment #1: Type: text/plain, Size: 9775 bytes --]

On Wed, Oct 18, 2023 at 09:06:50AM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > There are two major ways to check for the existence of a reference in
> > our tests:
> >
> >     - `git rev-parse --verify` can be used to check for existence of a
> >       reference. This only works in the case where the reference is well
> >       formed though and resolves to an actual object ID. This does not
> >       work with malformed reference names or invalid contents.
> >
> >     - `test_path_is_file` can be used to check for existence of a loose
> >       reference if it is known to not resolve to an actual object ID. It
> >       by necessity reaches into implementation details of the reference
> >       backend though.
> 
> True.  It would be ideal if we can limit the use of latter when we
> _care_ how the ref is stored (e.g., "we expect it to be stored as a
> loose ref, not packed").  "The ref R at must be pointing at the
> commit X" is better asserted by using the former (or "git show-ref")
> as we not just only want to see the .git/refs/R file holding the
> object name X, but also want to see the production git tools observe
> the same---if what rev-parse or show-ref observes is different from
> the expected state and they say ref R does not point at commit X, we
> should complain (rev-parse or show-ref may be broken in the version
> of Git being tested, but we can assume that their breakage will be
> caught elsewhere in the test suite as well, so as long as we trust
> them, using them as the validator is better than going into the
> implementation detail and assuming things like "new refs always
> appear as a loose ref" that we might want to change in the future).
> 
> > Similarly, there are two equivalent ways to check for the absence of a
> > reference:
> >
> >     - `test_must_fail git rev-parse` can be used to check for the
> >       absence of a reference. It could fail due to a number of reasons
> >       though, and all of these reasons will be thrown into the same bag
> >       as an absent reference.
> >
> >     - `test_path_is_missing` can be used to check explicitly for the
> >       absence of a loose reference, but again reaches into internal
> >       implementation details of the reference backend.
> >
> > So both our tooling to check for the presence and for the absence of
> > references in tests is lacking as either failure cases are thrown into
> > the same bag or we need to reach into internal implementation details of
> > the respective reference backend.
> 
> > Introduce a new subcommand for our ref-store test helper that explicitly
> > checks only for the presence or absence of a reference. This addresses
> > these limitations:
> >
> >     - We can check for the presence of references with malformed names.
> 
> But for the purpose of tests, we can control the input.  When we
> perform an operation that we expect a ref R to be created, we would
> know R is well formed and we can validate using a tool that we know
> would be broken when fed a malformed name.  So I do not see this as
> a huge "limitation".

This is explicitly about the case where such a ref R is not well-formed
though. This limitation was mostly a problem in t1430-bad-ref-name.sh,
which verifies many such scenarios.

> >     - We can check for the presence of references that don't resolve.
> 
> Do you mean a dangling symbolic ref?  We are using a wrong tool if
> you are using rev-parse for that, aren't we?  Isn't symbolic-ref
> there for us for this exact use case?  That is

Again, t1430-bad-ref-name.sh has been the inspiration for this:

```
$ git symbolic-ref refs/heads/bad...name refs/heads/master
$ git symbolic-ref refs/heads/bad...name
fatal: No such ref: refs/heads/bad...name
```

The mismatch that you can write but not read the reference is kind of
astonishing though. We could fix this limitation, but I think there were
more usecases than only bad reference names. I honestly can't quite
remember right now.

> >     - We can explicitly handle the case where a reference is missing by
> >       special-casing ENOENT errors.
> 
> You probably know the error conditions refs_read_raw_ref() can be
> broken better than I do, but this feels a bit too intimate with how
> the method for the files backend happens to be implemented, which at
> the same time, can risk that [a] other backends can implement their
> "ref does not resolve to an object name---is it because it is
> missing?" report incorrectly and [b] we would eventually want to
> know error conditions other than "the ref requested is missing" and
> at that point we would need more "special casing", which does not
> smell easy to scale.

We actually rely on some of these error codes to be consistent across
backends. E.g. "refs.c" itself has higher-level logic that verifies
specific error codes when resolving symrefs. And as we explicitly made
these error codes part of the API design with `refs_read_raw_ref()` my
assumption is that any other backend needs to match the behaviour here.

I also think that this is a somewhat sane assumption to make. While it
may not be a good idea to tie this to standard error codes, the backend
should indeed be able to signal specific error cases to the caller. We
could refactor this to be more explicit about the expected failure cases
in the form of specialized error codes. But I'm not sure whether that
would be worth it for now, but it's sure something to keep in mind for
future patch series.

> >     - We don't need to reach into implementation details of the backend,
> >       which would allow us to use this helper for the future reftable
> >       backend.
> 
> This is exactly what we want to aim for.
> 
> > Next to this subcommand we also provide two wrappers `test_ref_exists`
> > and `test_ref_missing` that make the helper easier to use.
> 
> Hmmmm.  This may introduce "who watches the watchers" problem, no?
> I briefly wondered if a better approach is to teach the production
> code, e.g., rev-parse, to optionally give more detailed diag.  It
> essentially may be the same (making the code in test-ref-store.c
> added by this patch available from rev-parse, we would easily get
> there), so I do not think the distinction matters.
> 
> > diff --git a/t/README b/t/README
> > index 61080859899..779f7e7dd86 100644
> > --- a/t/README
> > +++ b/t/README
> > @@ -928,6 +928,15 @@ see test-lib-functions.sh for the full list and their options.
> >     committer times to defined state.  Subsequent calls will
> >     advance the times by a fixed amount.
> >  
> > + - test_ref_exists <ref>, test_ref_missing <ref>
> > +
> > +   Check whether a reference exists or is missing. In contrast to
> > +   git-rev-parse(1), these helpers also work with invalid reference
> > +   names and references whose contents are unresolvable. The latter
> > +   function also distinguishes generic errors from the case where a
> > +   reference explicitly doesn't exist and is thus safer to use than
> > +   `test_must_fail git rev-parse`.
> > +
> >   - test_commit <message> [<filename> [<contents>]]
> >  
> >     Creates a commit with the given message, committing the given
> > diff --git a/t/helper/test-ref-store.c b/t/helper/test-ref-store.c
> > index 48552e6a9e0..7400f560ab6 100644
> > --- a/t/helper/test-ref-store.c
> > +++ b/t/helper/test-ref-store.c
> > @@ -1,6 +1,6 @@
> >  #include "test-tool.h"
> >  #include "hex.h"
> > -#include "refs.h"
> > +#include "refs/refs-internal.h"
> >  #include "setup.h"
> >  #include "worktree.h"
> >  #include "object-store-ll.h"
> > @@ -221,6 +221,30 @@ static int cmd_verify_ref(struct ref_store *refs, const char **argv)
> >  	return ret;
> >  }
> >  
> > +static int cmd_ref_exists(struct ref_store *refs, const char **argv)
> > +{
> > +	const char *refname = notnull(*argv++, "refname");
> > +	struct strbuf unused_referent = STRBUF_INIT;
> > +	struct object_id unused_oid;
> > +	unsigned int unused_type;
> > +	int failure_errno;
> > +
> > +	if (refs_read_raw_ref(refs, refname, &unused_oid, &unused_referent,
> > +			      &unused_type, &failure_errno)) {
> > +		/*
> > +		 * We handle ENOENT separately here such that it is possible to
> > +		 * distinguish actually-missing references from any kind of
> > +		 * generic error.
> > +		 */
> > +		if (failure_errno == ENOENT)
> > +			return 17;
> 
> Can we tell between the cases where the ref itself is missing, and
> the requested ref is symbolic and points at a missing ref?  This
> particular case might be OK, but there may other cases where this
> "special case" may not be narrow enough.

Yes, because `refs_read_raw_ref()` doesn't concern itself with recursive
resolving of the reference, which is done at a higher level. It only
reads and parses the reference without caring whether their target
actually exists.

> As long we are going to spend cycles to refine the classification of
> error conditions, which is a very good thing to aim for the reason
> described in the proposed log message, namely "rev-parse can fail
> for reasons other than the ref being absent", I have to wonder again
> that the fruit of such an effort should become available in the
> production code, instead of being kept only in test-tool.

Fair enough, I'm happy to lift this up into production code. I just
didn't think this would be all that useful in general, but I can see
that somebody might want to use such functionality as part of our
plumbing interfaces.

I wonder what the best spot would be for it. Should we add a new
`--exists` switch to git-rev-parse(1)?

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox