Git development
 help / color / mirror / Atom feed
* Re: [PATCH] mailinfo: fix out-of-bounds memory reads in unquote_quoted_pair()
From: Jeff King @ 2023-12-14 21:40 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Patrick Steinhardt, git, Taylor Blau,
	Carlos Andrés Ramírez Cataño
In-Reply-To: <xmqqy1dynofo.fsf@gitster.g>

On Wed, Dec 13, 2023 at 06:54:03AM -0800, Junio C Hamano wrote:

> I actually had trouble with the proposed update, and wondered if
> 
> -	while ((c = *in++) != 0) {
> +	while ((c = *in)) {
> +		in++;
> 
> is easier to follow, but then was hit by the possibility that the
> same "we have incremented 'in' a bit too early" may exist if such
> a loop wants to use 'in' in its body.  Wouldn't it mean that
> 
> -	while ((c = *in++) != 0) {
> +	for (; c = *in; in++) {
> 
> would be even a better rewrite?

No, the "for" loop wouldn't work, because the loop body actually depends
on "in" having already been incremented. If we find the end of the
comment or quoted string, we return "in", and the caller is expecting it
to have moved past the closing quote. So that would have to become
"return in+1".

IOW, the issue is that the normal end-of-quote parsing and hitting
end-of-string are fundamentally different. So we either need to
differentiate the returns (either with "+1" on one, or "-1" on the
other). Or we need to choose to increment "in" based on which we found
(which is what my patch does).

-Peff

^ permalink raw reply

* Re: [PATCH 2/1] test-lib-functions: add object size functions
From: Jeff King @ 2023-12-14 20:59 UTC (permalink / raw)
  To: René Scharfe
  Cc: git, Ondrej Pohorelsky, brian m . carlson, Junio C Hamano
In-Reply-To: <65557f2d-9de0-49ae-a858-80476aa52b68@web.de>

On Wed, Dec 13, 2023 at 01:28:56PM +0100, René Scharfe wrote:

> Add test_object_size and its helpers test_loose_object_size and
> test_packed_object_size, which allow determining the size of a Git
> object using only the low-level Git commands rev-parse and show-index.
> 
> Use it in t6300 to replace the bare-bones function test_object_file_size
> as a motivating example.  There it provides the expected output of the
> high-level Git command for-each-ref.

This adds a packed-object function, but I doubt anybody actually calls
it. If we're going to do that, it's probably worth adding some tests for
"cat-file --batch-check" or similar.

At which point I wonder if rather than having a function for a single
object, we are better off just testing the result of:

  git cat-file --batch-all-objects --unordered --batch-check='%(objectsize:disk)'

against a single post-processed "show-index" invocation.

> So how about this?  I'm a bit nervous about all the rules about output
> descriptors and error propagation and whatnot in the test library, but
> this implementation seems simple enough and might be useful in more than
> one test.  No idea how to add support for alternate object directories,
> but I doubt we'll ever need it.

I'm not sure that we need to do anything special with output
redirection. Shouldn't these functions just send errors to stderr as
usual? If they are run inside a test_expect block, that goes to
descriptor 4 (which is either /dev/null or the original stderr,
depending on whether "-v" was used).

> +test_loose_object_size () {
> +	test "$#" -ne 1 && BUG "1 param"
> +	local path=$(test_oid_to_path "$1")
> +	test_file_size "$(git rev-parse --git-path "objects/$path")" 2>&4
> +}

OK. We lose the exit code from "rev-parse" but that is probably OK for
our purposes.

> +test_packed_object_size () {
> +	test "$#" -ne 2 && BUG "2 params"
> +	local oid=$1 idx=$2 packsize rawsz end
> +
> +	packsize=$(test_file_size "${idx%.idx}.pack")
> +	rawsz=$(test_oid rawsz)
> +	end=$(($packsize - $rawsz))

OK, this $end is the magic required for the final entry. Makes sense.

> +	git show-index <"$idx" |
> +	awk -v oid="$oid" -v end="$end" '
> +		$2 == oid {start = $1}
> +		{offsets[$1] = 1}
> +		END {
> +			if (!start || start >= end)
> +				exit 1
> +			for (o in offsets)
> +				if (start < o && o < end)
> +					end = o
> +			print end - start
> +		}
> +	' && return 0

I was confused at first, because I didn't see any sorting happening. But
if I understand correctly, you're just looking for the smallest "end"
that comes after the start of the object we're looking for. Which I
think works.

-Peff

^ permalink raw reply

* Re: [PATCH] tests: prefer host Git to verify chainlint self-checks
From: Junio C Hamano @ 2023-12-14 19:13 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Patrick Steinhardt, git
In-Reply-To: <CAPig+cQvcSeSKVE=0kDyNiSztNAgVwhfAzoL5K7uYHEKe=0f_A@mail.gmail.com>

Eric Sunshine <sunshine@sunshineco.com> writes:

> (The only postprocessing of "expect" files which needs to stay is the
> bit which removes the "# LINT:" comments which litter the "expect"
> files explaining to human readers why the linter should insert a
> "???FOO???" annotation at that particular point.)

Yup, that matches my understanding.

> The chainlint self-tests were never meant to be about its general
> output stability. They were intended to ensure that the "???FOO???"
> annotations are: (1) indeed inserted for the set of linting problems
> the tool detects, and (2) inserted at the correct spot in the emitted
> output relative to the shell tokens to which the annotation applies.

Yup.

> Minor differences in the tool's output (whether over time or between
> platforms) should be immaterial in respect to those correctness goals.

But there is no reason to make such immaterial changes to the output
gratuitously when we are updating the tool to improve it, no?

^ permalink raw reply

* Re: completing an existing patch
From: Eric Sunshine @ 2023-12-14 18:37 UTC (permalink / raw)
  To: Marzi Esipreh; +Cc: git
In-Reply-To: <CAN0Ui1RM-R+yX=LK+ir_WEzAYRJnT-WTn427JbNJjTNTiQfa4w@mail.gmail.com>

On Thu, Dec 14, 2023 at 8:21 AM Marzi Esipreh <marzi.esipreh@uber.com> wrote:
> We came across this PR: https://github.com/git/git/pull/1352 that is
> improving git status performance on linux platforms, we tried it out,
> and we are happy with the result.
> I was in contact with the author of this patch, and I addressed the PR
> comments as well.
> Please let me know how I can proceed? Shall I create a new fresh PR,
> and refer to existing one in the descriptions?

The general answer is that you can take over a stalled patch series in
order to move it forward by rerolling the series with changes which
address reviewer comments from the previous rounds, and send the
series to the mailing list with a cover letter explaining the
situation and enumerating the changes you made since the previous
version. Standard practice is to retain the original authorship of the
patches[1] and keep the original author's Signed-off-by:. Add your
Signed-off-by: below the author's Signed-of-by: on all patches, not
just the patches you changed. After submitting, respond to reviewer
comments on the new version, and reroll as necessary to address those
comments.

Somebody more familiar with GitHub and/or GitGitGadget will have to
answer the more specific part of your question about whether you can
push your version to the same PR or if you instead need to open a new
PR. If you are able to push to the existing PR, then you also need to
update the PR's description since that becomes the cover letter for
the series. Or you can just send the patch series directly to the
mailing list, skipping GitGitGadget altogether.

[1]: That is, unless you change a patch so substantially that little
of the author's original work is present, in which case you make
yourself the author and typically credit the original author with an
Original-patch-by: or Helped-by:.

^ permalink raw reply

* Re: [PATCH] tests: prefer host Git to verify chainlint self-checks
From: Eric Sunshine @ 2023-12-14 18:10 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Patrick Steinhardt, git
In-Reply-To: <xmqqbkaspxn6.fsf@gitster.g>

On Thu, Dec 14, 2023 at 11:16 AM Junio C Hamano <gitster@pobox.com> wrote:
> Eric Sunshine <sunshine@sunshineco.com> writes:
> > I sent a reply[1] in the other thread explaining why I'm still leaning
> > toward `sed` to smooth over these minor differences rather than
> > churning the "expect" files, especially since the minor differences
> > are not significant to what is actually being tested.
>
> If it is just one time bulk conversion under t/chainlint/ to match
> what the chainlint.pl script produces, with the possibility of
> similar bulk updates in the future when the script gets updated, I
> tend to agree with Patrick that getting rid of the fuzzy comparison
> will be the best way forward.

Okay, that's fine. If we take this approach, though, then it would
make sense to eliminate _all_ gratuitous postprocessing of the
"expect" files[1] so that we really are comparing the direct output of
chainlint.pl with the "expect" files, rather than merely munging the
inline whitespace of the "expect" files slightly as Patrick's proposed
patch does[2].

(The only postprocessing of "expect" files which needs to stay is the
bit which removes the "# LINT:" comments which litter the "expect"
files explaining to human readers why the linter should insert a
"???FOO???" annotation at that particular point.)

> Especially if the fuzzy comparison is done only to hide differences
> between what the old chainlint.sed used to produce and what the
> current version produces, that is.  If for some reason the script
> started to create subtly different output for other reasons (e.g.,
> it may produce different whitespaces on a particular platform, or
> with a specific version of perl interpreter), we'd better be aware
> of it, instead of blindly ignoring the differences without
> inspecting them and verifying that they are benign.
>
> By going through the single conversion pain, it will force us to
> think twice before breaking its output stability while updating
> chainlint.pl, which would also be a good thing.

The chainlint self-tests were never meant to be about its general
output stability. They were intended to ensure that the "???FOO???"
annotations are: (1) indeed inserted for the set of linting problems
the tool detects, and (2) inserted at the correct spot in the emitted
output relative to the shell tokens to which the annotation applies.
Minor differences in the tool's output (whether over time or between
platforms) should be immaterial in respect to those correctness goals.

[1]: https://lore.kernel.org/git/CAPig+cTZmiXdPZEVO-F2UzV9YaP6c7r2MfPTC3QWksJa+rM7VA@mail.gmail.com/
[2]: https://lore.kernel.org/git/aec86a15c69aa276eee4875fad208ee2fc57635a.1702542564.git.ps@pks.im/

^ permalink raw reply

* Re: What's cooking in git.git (Dec 2023, #01; Sat, 9)
From: Junio C Hamano @ 2023-12-14 16:53 UTC (permalink / raw)
  To: Jeff Hostetler; +Cc: git
In-Reply-To: <e5295dbe-94d2-3186-5663-2466eba4bdde@jeffhostetler.com>

Jeff Hostetler <git@jeffhostetler.com> writes:

>> * jc/diff-cached-fsmonitor-fix (2023-09-15) 3 commits
>> * jc/fake-lstat (2023-09-15) 1 commit
>
> I think these look good.  And yes, it is better to insure that the
> stat struct is always well-defined rather than sometimes uninitialized.

Thanks.

> FWIW, in
> f954c7b8ff3 (fsmonitor: never set CE_FSMONITOR_VALID on submodules,
> 2022-05-26)
> we try to never set the CE_FSMONITOR_VALID bit on submodules (because
> status on a submodule is much more than just an lstat check on the
> submodule root directory and we always should recursively ask Git to
> compute the submodule's status).
>
> I haven't had time to investigate, but I wonder if the original
> complaint on `diff-lib` was due to another code path that allowed
> the CE_FSMONITOR_VALID bit to get set on a submodule entry.

^ permalink raw reply

* Re: [PATCH] tests: drop dependency on `git diff` in check-chainlint
From: Junio C Hamano @ 2023-12-14 16:49 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Eric Sunshine, git, Eric Sunshine, Eric Sunshine
In-Reply-To: <ZXq3YdK2RSKF3npE@tanuki>

Patrick Steinhardt <ps@pks.im> writes:

> This strongly reminds me of the thread at [1], where a similar issue was
> discussed for git-grep(1). Quoting Junio: 
>
>> I actually do not think these "we are allowing Git tools to be used
>> on random garbage" is a good idea to begin with X-<.  If we invented
>> something nice for our variant in "git grep" and wish we can use it
>> outside the repository, contributing the feature to implementations
>> of "grep" would have been the right way to move forward, instead of
>> contaminating the codebase with things that are not related to Git.
>
> So this might not be the best way to go.

That is not a conclusion I want people to draw.

Like it or not, "git diff --no-index" will be with us to stay, and
"--no-index" being "we have abused the rest of Git code to implement
'diff' that works _outside_ a Git repository---now go and do your
thing", we would eventually want to correct it, if it is misbehaving
when a repository it finds is in a shape it does not like, no?

We should have what you quoted in mind as a general principle, and
think twice when we are tempted to hoard useful features for another
tool we initially wrote for Git and allow them to be used with the
"--no-index" option, instead of contributing them to the tool that
does not know or care "git" repositories (like "diff" and "grep").

^ permalink raw reply

* Question regarding Git updater
From: Andreas Scholz @ 2023-12-14 16:44 UTC (permalink / raw)
  To: git

Hi

I hope you can help me with answering my question regarding the update
mechanism for Git after it has been installed.

1) Does the updater autonomously figure out if there is a newer
version than the current one that is installed?

2) Or does the updater only ask, when the user actively uses a command
to ask Git to check for a newer version?

3) In both cases, what information about the user/system is sent with
the request? Is this information stored on a server or database etc.?

BR

Andreas

^ permalink raw reply

* Re: [PATCH v2 1/4] wt-status: read HEAD and ORIG_HEAD via the refdb
From: Junio C Hamano @ 2023-12-14 16:41 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Ramsay Jones, git, Taylor Blau, Phillip Wood
In-Reply-To: <ZXrFOtKNLxyT8Csj@tanuki>

Patrick Steinhardt <ps@pks.im> writes:

>> OK, I somehow got a (wrong) impression that you are very close to
>> the finish line.
>
> You mean with the reftable backend?

It would be a major achievement if we just stop bypassing refs API
to read/write ORIG_HEAD and friends, even if we are still using only
the files backend.  And I meant that I got an impression that you
are very close to that, regardless of the readiness of the reftable
support.

^ permalink raw reply

* Re: [PATCH] tests: prefer host Git to verify chainlint self-checks
From: Junio C Hamano @ 2023-12-14 16:16 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Patrick Steinhardt, git
In-Reply-To: <CAPig+cQ2-PB24n0xfcoSy_1UT-VbEZUXXJ9QbA8FBA8Vfyd6Ng@mail.gmail.com>

Eric Sunshine <sunshine@sunshineco.com> writes:

>>   - Fix the ".expect" files so that we can avoid all of these games.
>>
>> I actually like the last option most. I'll have a go at it and send this
>> third version out in a bit.
>
> I sent a reply[1] in the other thread explaining why I'm still leaning
> toward `sed` to smooth over these minor differences rather than
> churning the "expect" files, especially since the minor differences
> are not significant to what is actually being tested.

If it is just one time bulk conversion under t/chainlint/ to match
what the chainlint.pl script produces, with the possibility of
similar bulk updates in the future when the script gets updated, I
tend to agree with Patrick that getting rid of the fuzzy comparison
will be the best way forward.

Especially if the fuzzy comparison is done only to hide differences
between what the old chainlint.sed used to produce and what the
current version produces, that is.  If for some reason the script
started to create subtly different output for other reasons (e.g.,
it may produce different whitespaces on a particular platform, or
with a specific version of perl interpreter), we'd better be aware
of it, instead of blindly ignoring the differences without
inspecting them and verifying that they are benign.

By going through the single conversion pain, it will force us to
think twice before breaking its output stability while updating
chainlint.pl, which would also be a good thing.

THanks.

^ permalink raw reply

* Re: What's cooking in git.git (Dec 2023, #01; Sat, 9)
From: Jeff Hostetler @ 2023-12-14 15:19 UTC (permalink / raw)
  To: Junio C Hamano, git
In-Reply-To: <xmqqa5qknnej.fsf@gitster.g>



On 12/8/23 9:02 PM, Junio C Hamano wrote:
> --------------------------------------------------
...
> 
> * jc/diff-cached-fsmonitor-fix (2023-09-15) 3 commits
>   - diff-lib: fix check_removed() when fsmonitor is active
>   - Merge branch 'jc/fake-lstat' into jc/diff-cached-fsmonitor-fix
>   - Merge branch 'js/diff-cached-fsmonitor-fix' into jc/diff-cached-fsmonitor-fix
>   (this branch uses jc/fake-lstat.)
> 
>   The optimization based on fsmonitor in the "diff --cached"
>   codepath is resurrected with the "fake-lstat" introduced earlier.
> 
>   It is unknown if the optimization is worth resurrecting, but in case...
>   source: <xmqqr0n0h0tw.fsf@gitster.g>
> 
> --------------------------------------------------
...
> * jc/fake-lstat (2023-09-15) 1 commit
>   - cache: add fake_lstat()
>   (this branch is used by jc/diff-cached-fsmonitor-fix.)
> 
>   A new helper to let us pretend that we called lstat() when we know
>   our cache_entry is up-to-date via fsmonitor.
> 
>   Needs review.
>   source: <xmqqcyykig1l.fsf@gitster.g>
> 
> 

I think these look good.  And yes, it is better to insure that the
stat struct is always well-defined rather than sometimes uninitialized.


FWIW, in
f954c7b8ff3 (fsmonitor: never set CE_FSMONITOR_VALID on submodules, 
2022-05-26)
we try to never set the CE_FSMONITOR_VALID bit on submodules (because
status on a submodule is much more than just an lstat check on the
submodule root directory and we always should recursively ask Git to
compute the submodule's status).

I haven't had time to investigate, but I wonder if the original
complaint on `diff-lib` was due to another code path that allowed
the CE_FSMONITOR_VALID bit to get set on a submodule entry.

Jeff


^ permalink raw reply

* [PATCH v4 4/4] archive: support remote archive from stateless transport
From: Jiang Xin @ 2023-12-14 14:13 UTC (permalink / raw)
  To: Git List, Junio C Hamano; +Cc: Jiang Xin, Eric Sunshine
In-Reply-To: <cover.1702562879.git.zhiyou.jx@alibaba-inc.com>

From: Jiang Xin <zhiyou.jx@alibaba-inc.com>

Even though we can establish a stateless connection, we still cannot
archive the remote repository using a stateless HTTP protocol. Try the
following steps to make it work.

 1. Add support for "git-upload-archive" service in "http-backend".

 2. Use the URL ".../info/refs?service=git-upload-pack" to detect the
    protocol version, instead of use the "git-upload-archive" service.

 3. "git-archive" does not expect to see protocol version and
    capabilities when connecting to remote-helper, so do not send them
    in "remote-curl.c" for the "git-upload-archive" service.

Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
---
 http-backend.c         | 15 +++++++++++----
 remote-curl.c          | 14 +++++++++++---
 t/t5003-archive-zip.sh | 34 ++++++++++++++++++++++++++++++++++
 transport-helper.c     |  3 ++-
 4 files changed, 58 insertions(+), 8 deletions(-)

diff --git a/http-backend.c b/http-backend.c
index ff07b87e64..6a2c919839 100644
--- a/http-backend.c
+++ b/http-backend.c
@@ -38,6 +38,7 @@ struct rpc_service {
 static struct rpc_service rpc_service[] = {
 	{ "upload-pack", "uploadpack", 1, 1 },
 	{ "receive-pack", "receivepack", 0, -1 },
+	{ "upload-archive", "uploadarchive", 0, -1 },
 };
 
 static struct string_list *get_parameters(void)
@@ -639,10 +640,15 @@ static void check_content_type(struct strbuf *hdr, const char *accepted_type)
 
 static void service_rpc(struct strbuf *hdr, char *service_name)
 {
-	const char *argv[] = {NULL, "--stateless-rpc", ".", NULL};
+	struct strvec argv = STRVEC_INIT;
 	struct rpc_service *svc = select_service(hdr, service_name);
 	struct strbuf buf = STRBUF_INIT;
 
+	strvec_push(&argv, svc->name);
+	if (strcmp(service_name, "git-upload-archive"))
+		strvec_push(&argv, "--stateless-rpc");
+	strvec_push(&argv, ".");
+
 	strbuf_reset(&buf);
 	strbuf_addf(&buf, "application/x-git-%s-request", svc->name);
 	check_content_type(hdr, buf.buf);
@@ -655,9 +661,9 @@ static void service_rpc(struct strbuf *hdr, char *service_name)
 
 	end_headers(hdr);
 
-	argv[0] = svc->name;
-	run_service(argv, svc->buffer_input);
+	run_service(argv.v, svc->buffer_input);
 	strbuf_release(&buf);
+	strvec_clear(&argv);
 }
 
 static int dead;
@@ -723,7 +729,8 @@ static struct service_cmd {
 	{"GET", "/objects/pack/pack-[0-9a-f]{64}\\.idx$", get_idx_file},
 
 	{"POST", "/git-upload-pack$", service_rpc},
-	{"POST", "/git-receive-pack$", service_rpc}
+	{"POST", "/git-receive-pack$", service_rpc},
+	{"POST", "/git-upload-archive$", service_rpc}
 };
 
 static int bad_request(struct strbuf *hdr, const struct service_cmd *c)
diff --git a/remote-curl.c b/remote-curl.c
index ef05752ca5..ce6cb8ac05 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -1447,8 +1447,14 @@ static int stateless_connect(const char *service_name)
 	 * establish a stateless connection, otherwise we need to tell the
 	 * client to fallback to using other transport helper functions to
 	 * complete their request.
+	 *
+	 * The "git-upload-archive" service is a read-only operation. Fallback
+	 * to use "git-upload-pack" service to discover protocol version.
 	 */
-	discover = discover_refs(service_name, 0);
+	if (!strcmp(service_name, "git-upload-archive"))
+		discover = discover_refs("git-upload-pack", 0);
+	else
+		discover = discover_refs(service_name, 0);
 	if (discover->version != protocol_v2) {
 		printf("fallback\n");
 		fflush(stdout);
@@ -1486,9 +1492,11 @@ static int stateless_connect(const char *service_name)
 
 	/*
 	 * Dump the capability listing that we got from the server earlier
-	 * during the info/refs request.
+	 * during the info/refs request. This does not work with the
+	 * "git-upload-archive" service.
 	 */
-	write_or_die(rpc.in, discover->buf, discover->len);
+	if (strcmp(service_name, "git-upload-archive"))
+		write_or_die(rpc.in, discover->buf, discover->len);
 
 	/* Until we see EOF keep sending POSTs */
 	while (1) {
diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh
index fc499cdff0..961c6aac25 100755
--- a/t/t5003-archive-zip.sh
+++ b/t/t5003-archive-zip.sh
@@ -239,4 +239,38 @@ check_zip with_untracked2
 check_added with_untracked2 untracked one/untracked
 check_added with_untracked2 untracked two/untracked
 
+# Test remote archive over HTTP protocol.
+#
+# Note: this should be the last part of this test suite, because
+# by including lib-httpd.sh, the test may end early if httpd tests
+# should not be run.
+#
+. "$TEST_DIRECTORY"/lib-httpd.sh
+start_httpd
+
+test_expect_success "setup for HTTP protocol" '
+	cp -R bare.git "$HTTPD_DOCUMENT_ROOT_PATH/bare.git" &&
+	git -C "$HTTPD_DOCUMENT_ROOT_PATH/bare.git" \
+		config http.uploadpack true &&
+	set_askpass user@host pass@host
+'
+
+setup_askpass_helper
+
+test_expect_success 'remote archive does not work with protocol v1' '
+	test_must_fail git -c protocol.version=1 archive \
+		--remote="$HTTPD_URL/auth/smart/bare.git" \
+		--output=remote-http.zip HEAD >actual 2>&1 &&
+	cat >expect <<-EOF &&
+	fatal: can${SQ}t connect to subservice git-upload-archive
+	EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'archive remote http repository' '
+	git archive --remote="$HTTPD_URL/auth/smart/bare.git" \
+		--output=remote-http.zip HEAD &&
+	test_cmp_bin d.zip remote-http.zip
+'
+
 test_done
diff --git a/transport-helper.c b/transport-helper.c
index 3b036ae1ca..566f7473df 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -628,7 +628,8 @@ static int process_connect_service(struct transport *transport,
 		ret = run_connect(transport, &cmdbuf);
 	} else if (data->stateless_connect &&
 		   (get_protocol_version_config() == protocol_v2) &&
-		   !strcmp("git-upload-pack", name)) {
+		   (!strcmp("git-upload-pack", name) ||
+		    !strcmp("git-upload-archive", name))) {
 		strbuf_addf(&cmdbuf, "stateless-connect %s\n", name);
 		ret = run_connect(transport, &cmdbuf);
 		if (ret)
-- 
2.41.0.232.g2f6f0bca4f.agit.8.0.4.dev


^ permalink raw reply related

* [PATCH v4 3/4] transport-helper: call do_take_over() in connect_helper
From: Jiang Xin @ 2023-12-14 14:13 UTC (permalink / raw)
  To: Git List, Junio C Hamano; +Cc: Jiang Xin
In-Reply-To: <cover.1702562879.git.zhiyou.jx@alibaba-inc.com>

From: Jiang Xin <zhiyou.jx@alibaba-inc.com>

After successfully connecting to the smart transport by calling
process_connect_service() in connect_helper(), run do_take_over() to
replace the old vtable with a new one which has methods ready for the
smart transport connection.

The connect_helper() function is used as the connect method of the
vtable in "transport-helper.c", and it is called by transport_connect()
in "transport.c" to setup a connection. The only place that we call
transport_connect() so far is in "builtin/archive.c". Without running
do_take_over(), it may fail to call transport_disconnect() in
run_remote_archiver() of "builtin/archive.c". This is because for a
stateless connection or a service like "git-upload-pack-archive", the
remote helper may receive a SIGPIPE signal and exit early. To have a
graceful disconnect method by calling do_take_over() will solve this
issue.

The subsequent commit will introduce remote archive over a stateless-rpc
connection.

Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
---
 transport-helper.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/transport-helper.c b/transport-helper.c
index 51088cc03a..3b036ae1ca 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -672,6 +672,8 @@ static int connect_helper(struct transport *transport, const char *name,
 
 	fd[0] = data->helper->out;
 	fd[1] = data->helper->in;
+
+	do_take_over(transport);
 	return 0;
 }
 
-- 
2.41.0.232.g2f6f0bca4f.agit.8.0.4.dev


^ permalink raw reply related

* [PATCH v4 2/4] transport-helper: call do_take_over() in process_connect
From: Jiang Xin @ 2023-12-14 14:13 UTC (permalink / raw)
  To: Git List, Junio C Hamano; +Cc: Jiang Xin
In-Reply-To: <cover.1702562879.git.zhiyou.jx@alibaba-inc.com>

From: Jiang Xin <zhiyou.jx@alibaba-inc.com>

The existing pattern among all callers of process_connect() seems to be

        if (process_connect(...)) {
                do_take_over();
                ... dispatch to the underlying method ...
        }
        ... otherwise implement the fallback ...

where the return value from process_connect() is the return value of the
call it makes to process_connect_service().

Move the call of do_take_over() inside process_connect(), so that
calling the process_connect() function is more concise and will not
miss do_take_over().

Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
---
 transport-helper.c | 22 +++++++++-------------
 1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/transport-helper.c b/transport-helper.c
index 2e127d24a5..51088cc03a 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -645,6 +645,7 @@ static int process_connect(struct transport *transport,
 	struct helper_data *data = transport->data;
 	const char *name;
 	const char *exec;
+	int ret;
 
 	name = for_push ? "git-receive-pack" : "git-upload-pack";
 	if (for_push)
@@ -652,7 +653,10 @@ static int process_connect(struct transport *transport,
 	else
 		exec = data->transport_options.uploadpack;
 
-	return process_connect_service(transport, name, exec);
+	ret = process_connect_service(transport, name, exec);
+	if (ret)
+		do_take_over(transport);
+	return ret;
 }
 
 static int connect_helper(struct transport *transport, const char *name,
@@ -682,10 +686,8 @@ static int fetch_refs(struct transport *transport,
 
 	get_helper(transport);
 
-	if (process_connect(transport, 0)) {
-		do_take_over(transport);
+	if (process_connect(transport, 0))
 		return transport->vtable->fetch_refs(transport, nr_heads, to_fetch);
-	}
 
 	/*
 	 * If we reach here, then the server, the client, and/or the transport
@@ -1142,10 +1144,8 @@ static int push_refs(struct transport *transport,
 {
 	struct helper_data *data = transport->data;
 
-	if (process_connect(transport, 1)) {
-		do_take_over(transport);
+	if (process_connect(transport, 1))
 		return transport->vtable->push_refs(transport, remote_refs, flags);
-	}
 
 	if (!remote_refs) {
 		fprintf(stderr,
@@ -1186,11 +1186,9 @@ static struct ref *get_refs_list(struct transport *transport, int for_push,
 {
 	get_helper(transport);
 
-	if (process_connect(transport, for_push)) {
-		do_take_over(transport);
+	if (process_connect(transport, for_push))
 		return transport->vtable->get_refs_list(transport, for_push,
 							transport_options);
-	}
 
 	return get_refs_list_using_list(transport, for_push);
 }
@@ -1274,10 +1272,8 @@ static int get_bundle_uri(struct transport *transport)
 {
 	get_helper(transport);
 
-	if (process_connect(transport, 0)) {
-		do_take_over(transport);
+	if (process_connect(transport, 0))
 		return transport->vtable->get_bundle_uri(transport);
-	}
 
 	return -1;
 }
-- 
2.41.0.232.g2f6f0bca4f.agit.8.0.4.dev


^ permalink raw reply related

* [PATCH v4 1/4] transport-helper: no connection restriction in connect_helper
From: Jiang Xin @ 2023-12-14 14:13 UTC (permalink / raw)
  To: Git List, Junio C Hamano; +Cc: Jiang Xin
In-Reply-To: <cover.1702562879.git.zhiyou.jx@alibaba-inc.com>

From: Jiang Xin <zhiyou.jx@alibaba-inc.com>

When commit b236752a (Support remote archive from all smart transports,
2009-12-09) added "remote archive" support for "smart transports", it
was for transport that supports the ".connect" method. The
"connect_helper()" function protected itself from getting called for a
transport without the method before calling process_connect_service(),
which did not work with such a transport.

Later, commit edc9caf7 (transport-helper: introduce stateless-connect,
2018-03-15) added a way for a transport without the ".connect" method
to establish a "stateless" connection in protocol-v2, which
process_connect_service() was taught to handle the "stateless"
connection, making the old safety valve in its caller that insisted
that ".connect" method must be defined too strict, and forgot to loosen
it.

Remove the restriction in the "connect_helper()" function and give the
function "process_connect_service()" the opportunity to establish a
connection using ".connect" or ".stateless_connect" for protocol v2. So
we can connect with a stateless-rpc and do something useful. E.g., in a
later commit, implements remote archive for a repository over HTTP
protocol.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
---
 transport-helper.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/transport-helper.c b/transport-helper.c
index 49811ef176..2e127d24a5 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -662,8 +662,6 @@ static int connect_helper(struct transport *transport, const char *name,
 
 	/* Get_helper so connect is inited. */
 	get_helper(transport);
-	if (!data->connect)
-		die(_("operation not supported by protocol"));
 
 	if (!process_connect_service(transport, name, exec))
 		die(_("can't connect to subservice %s"), name);
-- 
2.41.0.232.g2f6f0bca4f.agit.8.0.4.dev


^ permalink raw reply related

* [PATCH v4 0/4] support remote archive via stateless transport
From: Jiang Xin @ 2023-12-14 14:13 UTC (permalink / raw)
  To: Git List, Junio C Hamano; +Cc: Jiang Xin
In-Reply-To: <cover.1696432593.git.zhiyou.jx@alibaba-inc.com>

From: Jiang Xin <zhiyou.jx@alibaba-inc.com>

# Change since v3:

1. Update commit message of patch 2/4.
2. Add comments in t5003.

# range-diff v3...v4

1:  1818d8e30e = 1:  d343585cb5 transport-helper: no connection restriction in connect_helper
2:  b57524bc91 ! 2:  65fb67523c transport-helper: call do_take_over() in process_connect
    @@ Commit message
         where the return value from process_connect() is the return value of the
         call it makes to process_connect_service().
     
    -    It is safe to make a refactor by moving the call of do_take_over()
    -    into the function process_connect().
    +    Move the call of do_take_over() inside process_connect(), so that
    +    calling the process_connect() function is more concise and will not
    +    miss do_take_over().
     
         Suggested-by: Junio C Hamano <gitster@pobox.com>
         Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
3:  7ce60e3b9a = 3:  109a1fffde transport-helper: call do_take_over() in connect_helper
4:  626f903508 ! 4:  eb905259fe archive: support remote archive from stateless transport
    @@ Commit message
     
         Helped-by: Eric Sunshine <sunshine@sunshineco.com>
         Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
    @@ t/t5003-archive-zip.sh: check_zip with_untracked2
      check_added with_untracked2 untracked one/untracked
      check_added with_untracked2 untracked two/untracked
      
    ++# Test remote archive over HTTP protocol.
    ++#
    ++# Note: this should be the last part of this test suite, because
    ++# by including lib-httpd.sh, the test may end early if httpd tests
    ++# should not be run.
    ++#
     +. "$TEST_DIRECTORY"/lib-httpd.sh
     +start_httpd
     +
    @@ t/t5003-archive-zip.sh: check_zip with_untracked2
     +setup_askpass_helper
     +
     +test_expect_success 'remote archive does not work with protocol v1' '
    -+	test_when_finished "rm -f d5.zip" &&
     +	test_must_fail git -c protocol.version=1 archive \
     +		--remote="$HTTPD_URL/auth/smart/bare.git" \
    -+		--output=d5.zip HEAD >actual 2>&1 &&
    ++		--output=remote-http.zip HEAD >actual 2>&1 &&
     +	cat >expect <<-EOF &&
     +	fatal: can${SQ}t connect to subservice git-upload-archive
     +	EOF
    @@ t/t5003-archive-zip.sh: check_zip with_untracked2
     +'
     +
     +test_expect_success 'archive remote http repository' '
    -+	test_when_finished "rm -f d5.zip" &&
     +	git archive --remote="$HTTPD_URL/auth/smart/bare.git" \
    -+		--output=d5.zip HEAD &&
    -+	test_cmp_bin d.zip d5.zip
    ++		--output=remote-http.zip HEAD &&
    ++	test_cmp_bin d.zip remote-http.zip
     +'
     +
      test_done

Jiang Xin (4):
  transport-helper: no connection restriction in connect_helper
  transport-helper: call do_take_over() in process_connect
  transport-helper: call do_take_over() in connect_helper
  archive: support remote archive from stateless transport

 http-backend.c         | 15 +++++++++++----
 remote-curl.c          | 14 +++++++++++---
 t/t5003-archive-zip.sh | 34 ++++++++++++++++++++++++++++++++++
 transport-helper.c     | 29 +++++++++++++----------------
 4 files changed, 69 insertions(+), 23 deletions(-)

-- 
2.41.0.232.g2f6f0bca4f.agit.8.0.4.dev


^ permalink raw reply

* [PATCH v3 4/4] bisect: consistently write BISECT_EXPECTED_REV via the refdb
From: Patrick Steinhardt @ 2023-12-14 13:37 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, Phillip Wood, Ramsay Jones
In-Reply-To: <cover.1702560829.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 5649 bytes --]

We're inconsistently writing BISECT_EXPECTED_REV both via the filesystem
and via the refdb, which violates the newly established rules for how
special refs must be treated. This works alright in practice with the
reffiles reference backend, but will cause bugs once we gain additional
backends.

Fix this issue and consistently write BISECT_EXPECTED_REV via the refdb
so that it is no longer a special ref.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 bisect.c                    | 25 ++++---------------------
 builtin/bisect.c            |  8 ++------
 refs.c                      |  3 ++-
 t/t6030-bisect-porcelain.sh |  2 +-
 4 files changed, 9 insertions(+), 29 deletions(-)

diff --git a/bisect.c b/bisect.c
index 1be8e0a271..985b96ed13 100644
--- a/bisect.c
+++ b/bisect.c
@@ -471,7 +471,6 @@ static int read_bisect_refs(void)
 }
 
 static GIT_PATH_FUNC(git_path_bisect_names, "BISECT_NAMES")
-static GIT_PATH_FUNC(git_path_bisect_expected_rev, "BISECT_EXPECTED_REV")
 static GIT_PATH_FUNC(git_path_bisect_ancestors_ok, "BISECT_ANCESTORS_OK")
 static GIT_PATH_FUNC(git_path_bisect_run, "BISECT_RUN")
 static GIT_PATH_FUNC(git_path_bisect_start, "BISECT_START")
@@ -707,26 +706,10 @@ static enum bisect_error error_if_skipped_commits(struct commit_list *tried,
 
 static int is_expected_rev(const struct object_id *oid)
 {
-	const char *filename = git_path_bisect_expected_rev();
-	struct stat st;
-	struct strbuf str = STRBUF_INIT;
-	FILE *fp;
-	int res = 0;
-
-	if (stat(filename, &st) || !S_ISREG(st.st_mode))
+	struct object_id expected_oid;
+	if (read_ref("BISECT_EXPECTED_REV", &expected_oid))
 		return 0;
-
-	fp = fopen_or_warn(filename, "r");
-	if (!fp)
-		return 0;
-
-	if (strbuf_getline_lf(&str, fp) != EOF)
-		res = !strcmp(str.buf, oid_to_hex(oid));
-
-	strbuf_release(&str);
-	fclose(fp);
-
-	return res;
+	return oideq(oid, &expected_oid);
 }
 
 enum bisect_error bisect_checkout(const struct object_id *bisect_rev,
@@ -1185,10 +1168,10 @@ int bisect_clean_state(void)
 	struct string_list refs_for_removal = STRING_LIST_INIT_NODUP;
 	for_each_ref_in("refs/bisect", mark_for_removal, (void *) &refs_for_removal);
 	string_list_append(&refs_for_removal, xstrdup("BISECT_HEAD"));
+	string_list_append(&refs_for_removal, xstrdup("BISECT_EXPECTED_REV"));
 	result = delete_refs("bisect: remove", &refs_for_removal, REF_NO_DEREF);
 	refs_for_removal.strdup_strings = 1;
 	string_list_clear(&refs_for_removal, 0);
-	unlink_or_warn(git_path_bisect_expected_rev());
 	unlink_or_warn(git_path_bisect_ancestors_ok());
 	unlink_or_warn(git_path_bisect_log());
 	unlink_or_warn(git_path_bisect_names());
diff --git a/builtin/bisect.c b/builtin/bisect.c
index 35938b05fd..4e2c43daf5 100644
--- a/builtin/bisect.c
+++ b/builtin/bisect.c
@@ -17,7 +17,6 @@
 #include "revision.h"
 
 static GIT_PATH_FUNC(git_path_bisect_terms, "BISECT_TERMS")
-static GIT_PATH_FUNC(git_path_bisect_expected_rev, "BISECT_EXPECTED_REV")
 static GIT_PATH_FUNC(git_path_bisect_ancestors_ok, "BISECT_ANCESTORS_OK")
 static GIT_PATH_FUNC(git_path_bisect_start, "BISECT_START")
 static GIT_PATH_FUNC(git_path_bisect_log, "BISECT_LOG")
@@ -921,7 +920,6 @@ static enum bisect_error bisect_state(struct bisect_terms *terms, int argc,
 	const char *state;
 	int i, verify_expected = 1;
 	struct object_id oid, expected;
-	struct strbuf buf = STRBUF_INIT;
 	struct oid_array revs = OID_ARRAY_INIT;
 
 	if (!argc)
@@ -976,10 +974,8 @@ static enum bisect_error bisect_state(struct bisect_terms *terms, int argc,
 		oid_array_append(&revs, &commit->object.oid);
 	}
 
-	if (strbuf_read_file(&buf, git_path_bisect_expected_rev(), 0) < the_hash_algo->hexsz ||
-	    get_oid_hex(buf.buf, &expected) < 0)
+	if (read_ref("BISECT_EXPECTED_REV", &expected))
 		verify_expected = 0; /* Ignore invalid file contents */
-	strbuf_release(&buf);
 
 	for (i = 0; i < revs.nr; i++) {
 		if (bisect_write(state, oid_to_hex(&revs.oid[i]), terms, 0)) {
@@ -988,7 +984,7 @@ static enum bisect_error bisect_state(struct bisect_terms *terms, int argc,
 		}
 		if (verify_expected && !oideq(&revs.oid[i], &expected)) {
 			unlink_or_warn(git_path_bisect_ancestors_ok());
-			unlink_or_warn(git_path_bisect_expected_rev());
+			delete_ref(NULL, "BISECT_EXPECTED_REV", NULL, REF_NO_DEREF);
 			verify_expected = 0;
 		}
 	}
diff --git a/refs.c b/refs.c
index 8fe34d51e4..c76ce86bef 100644
--- a/refs.c
+++ b/refs.c
@@ -1840,6 +1840,8 @@ static int is_special_ref(const char *refname)
 	 * There are some exceptions that you might expect to see on this list
 	 * but which are handled exclusively via the reference backend:
 	 *
+	 * - BISECT_EXPECTED_REV
+	 *
 	 * - CHERRY_PICK_HEAD
 	 *
 	 * - HEAD
@@ -1857,7 +1859,6 @@ static int is_special_ref(const char *refname)
 	 */
 	static const char * const special_refs[] = {
 		"AUTO_MERGE",
-		"BISECT_EXPECTED_REV",
 		"FETCH_HEAD",
 		"MERGE_AUTOSTASH",
 		"MERGE_HEAD",
diff --git a/t/t6030-bisect-porcelain.sh b/t/t6030-bisect-porcelain.sh
index 2a5b7d8379..792c1504bc 100755
--- a/t/t6030-bisect-porcelain.sh
+++ b/t/t6030-bisect-porcelain.sh
@@ -1176,7 +1176,7 @@ test_expect_success 'git bisect reset cleans bisection state properly' '
 	git bisect bad $HASH4 &&
 	git bisect reset &&
 	test -z "$(git for-each-ref "refs/bisect/*")" &&
-	test_path_is_missing ".git/BISECT_EXPECTED_REV" &&
+	test_ref_missing BISECT_EXPECTED_REV &&
 	test_path_is_missing ".git/BISECT_ANCESTORS_OK" &&
 	test_path_is_missing ".git/BISECT_LOG" &&
 	test_path_is_missing ".git/BISECT_RUN" &&
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* [PATCH v3 3/4] refs: complete list of special refs
From: Patrick Steinhardt @ 2023-12-14 13:37 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, Phillip Wood, Ramsay Jones
In-Reply-To: <cover.1702560829.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 4342 bytes --]

We have some references that are more special than others. The reason
for them being special is that they either do not follow the usual
format of references, or that they are written to the filesystem
directly by the respective owning subsystem and thus circumvent the
reference backend.

This works perfectly fine right now because the reffiles backend will
know how to read those refs just fine. But with the prospect of gaining
a new reference backend implementation we need to be a lot more careful
here:

  - We need to make sure that we are consistent about how those refs are
    written. They must either always be written via the filesystem, or
    they must always be written via the reference backend. Any mixture
    will lead to inconsistent state.

  - We need to make sure that such special refs are always handled
    specially when reading them.

We're already mostly good with regard to the first item, except for
`BISECT_EXPECTED_REV` which will be addressed in a subsequent commit.
But the current list of special refs is missing some refs that really
should be treated specially. Right now, we only treat `FETCH_HEAD` and
`MERGE_HEAD` specially here.

Introduce a new function `is_special_ref()` that contains all current
instances of special refs to fix the reading path.

Note that this is only a temporary measure where we record and rectify
the current state. Ideally, the list of special refs should in the end
only contain `FETCH_HEAD` and `MERGE_HEAD` again because they both may
reference multiple objects and can contain annotations, so they indeed
are special.

Based-on-patch-by: Han-Wen Nienhuys <hanwenn@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 52 insertions(+), 2 deletions(-)

diff --git a/refs.c b/refs.c
index 00e72a2abf..8fe34d51e4 100644
--- a/refs.c
+++ b/refs.c
@@ -1820,15 +1820,65 @@ static int refs_read_special_head(struct ref_store *ref_store,
 	return result;
 }
 
+static int is_special_ref(const char *refname)
+{
+	/*
+	 * Special references get written and read directly via the filesystem
+	 * by the subsystems that create them. Thus, they must not go through
+	 * the reference backend but must instead be read directly. It is
+	 * arguable whether this behaviour is sensible, or whether it's simply
+	 * a leaky abstraction enabled by us only having a single reference
+	 * backend implementation. But at least for a subset of references it
+	 * indeed does make sense to treat them specially:
+	 *
+	 * - FETCH_HEAD may contain multiple object IDs, and each one of them
+	 *   carries additional metadata like where it came from.
+	 *
+	 * - MERGE_HEAD may contain multiple object IDs when merging multiple
+	 *   heads.
+	 *
+	 * There are some exceptions that you might expect to see on this list
+	 * but which are handled exclusively via the reference backend:
+	 *
+	 * - CHERRY_PICK_HEAD
+	 *
+	 * - HEAD
+	 *
+	 * - ORIG_HEAD
+	 *
+	 * - "rebase-apply/" and "rebase-merge/" contain all of the state for
+	 *   rebases, including some reference-like files. These are
+	 *   exclusively read and written via the filesystem and never go
+	 *   through the refdb.
+	 *
+	 * Writing or deleting references must consistently go either through
+	 * the filesystem (special refs) or through the reference backend
+	 * (normal ones).
+	 */
+	static const char * const special_refs[] = {
+		"AUTO_MERGE",
+		"BISECT_EXPECTED_REV",
+		"FETCH_HEAD",
+		"MERGE_AUTOSTASH",
+		"MERGE_HEAD",
+	};
+	size_t i;
+
+	for (i = 0; i < ARRAY_SIZE(special_refs); i++)
+		if (!strcmp(refname, special_refs[i]))
+			return 1;
+
+	return 0;
+}
+
 int refs_read_raw_ref(struct ref_store *ref_store, const char *refname,
 		      struct object_id *oid, struct strbuf *referent,
 		      unsigned int *type, int *failure_errno)
 {
 	assert(failure_errno);
-	if (!strcmp(refname, "FETCH_HEAD") || !strcmp(refname, "MERGE_HEAD")) {
+	if (is_special_ref(refname))
 		return refs_read_special_head(ref_store, refname, oid, referent,
 					      type, failure_errno);
-	}
 
 	return ref_store->be->read_raw_ref(ref_store, refname, oid, referent,
 					   type, failure_errno);
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* [PATCH v3 2/4] refs: propagate errno when reading special refs fails
From: Patrick Steinhardt @ 2023-12-14 13:37 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, Phillip Wood, Ramsay Jones
In-Reply-To: <cover.1702560829.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 1787 bytes --]

Some refs in Git are more special than others due to reasons explained
in the next commit. These refs are read via `refs_read_special_head()`,
but this function doesn't behave the same as when we try to read a
normal ref. Most importantly, we do not propagate `failure_errno` in the
case where the reference does not exist, which is behaviour that we rely
on in many parts of Git.

Fix this bug by propagating errno when `strbuf_read_file()` fails.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs.c              |  4 +++-
 t/t1403-show-ref.sh | 10 ++++++++++
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/refs.c b/refs.c
index fcae5dddc6..00e72a2abf 100644
--- a/refs.c
+++ b/refs.c
@@ -1806,8 +1806,10 @@ static int refs_read_special_head(struct ref_store *ref_store,
 	int result = -1;
 	strbuf_addf(&full_path, "%s/%s", ref_store->gitdir, refname);
 
-	if (strbuf_read_file(&content, full_path.buf, 0) < 0)
+	if (strbuf_read_file(&content, full_path.buf, 0) < 0) {
+		*failure_errno = errno;
 		goto done;
+	}
 
 	result = parse_loose_ref_contents(content.buf, oid, referent, type,
 					  failure_errno);
diff --git a/t/t1403-show-ref.sh b/t/t1403-show-ref.sh
index b50ae6fcf1..66e6e77fa9 100755
--- a/t/t1403-show-ref.sh
+++ b/t/t1403-show-ref.sh
@@ -266,4 +266,14 @@ test_expect_success '--exists with directory fails with generic error' '
 	test_cmp expect err
 '
 
+test_expect_success '--exists with non-existent special ref' '
+	test_expect_code 2 git show-ref --exists FETCH_HEAD
+'
+
+test_expect_success '--exists with existing special ref' '
+	test_when_finished "rm .git/FETCH_HEAD" &&
+	git rev-parse HEAD >.git/FETCH_HEAD &&
+	git show-ref --exists FETCH_HEAD
+'
+
 test_done
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* [PATCH v3 1/4] wt-status: read HEAD and ORIG_HEAD via the refdb
From: Patrick Steinhardt @ 2023-12-14 13:36 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, Phillip Wood, Ramsay Jones
In-Reply-To: <cover.1702560829.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 2948 bytes --]

We read both the HEAD and ORIG_HEAD references directly from the
filesystem in order to figure out whether we're currently splitting a
commit. If both of the following are true:

  - HEAD points to the same object as "rebase-merge/amend".

  - ORIG_HEAD points to the same object as "rebase-merge/orig-head".

Then we are currently splitting commits.

The current code only works by chance because we only have a single
reference backend implementation. Refactor it to instead read both refs
via the refdb layer so that we'll also be compatible with alternate
reference backends.

There are some subtleties involved here:

  - We pass `RESOLVE_REF_READING` so that a missing ref will cause
    `read_ref_full()` to return an error.

  - We pass `RESOLVE_REF_NO_RECURSE` so that we do not try to resolve
    symrefs. The old code didn't resolve symrefs either, and we only
    ever write object IDs into the refs in "rebase-merge/".

  - In the same spirit we verify that successfully-read refs are not
    symbolic refs.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 wt-status.c | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/wt-status.c b/wt-status.c
index 9f45bf6949..da19923981 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -1295,26 +1295,32 @@ static char *read_line_from_git_path(const char *filename)
 static int split_commit_in_progress(struct wt_status *s)
 {
 	int split_in_progress = 0;
-	char *head, *orig_head, *rebase_amend, *rebase_orig_head;
+	struct object_id head_oid, orig_head_oid;
+	char *rebase_amend, *rebase_orig_head;
+	int head_flags, orig_head_flags;
 
 	if ((!s->amend && !s->nowarn && !s->workdir_dirty) ||
 	    !s->branch || strcmp(s->branch, "HEAD"))
 		return 0;
 
-	head = read_line_from_git_path("HEAD");
-	orig_head = read_line_from_git_path("ORIG_HEAD");
+	if (read_ref_full("HEAD", RESOLVE_REF_READING | RESOLVE_REF_NO_RECURSE,
+			  &head_oid, &head_flags) ||
+	    read_ref_full("ORIG_HEAD", RESOLVE_REF_READING | RESOLVE_REF_NO_RECURSE,
+			  &orig_head_oid, &orig_head_flags))
+		return 0;
+	if (head_flags & REF_ISSYMREF || orig_head_flags & REF_ISSYMREF)
+		return 0;
+
 	rebase_amend = read_line_from_git_path("rebase-merge/amend");
 	rebase_orig_head = read_line_from_git_path("rebase-merge/orig-head");
 
-	if (!head || !orig_head || !rebase_amend || !rebase_orig_head)
+	if (!rebase_amend || !rebase_orig_head)
 		; /* fall through, no split in progress */
 	else if (!strcmp(rebase_amend, rebase_orig_head))
-		split_in_progress = !!strcmp(head, rebase_amend);
-	else if (strcmp(orig_head, rebase_orig_head))
+		split_in_progress = !!strcmp(oid_to_hex(&head_oid), rebase_amend);
+	else if (strcmp(oid_to_hex(&orig_head_oid), rebase_orig_head))
 		split_in_progress = 1;
 
-	free(head);
-	free(orig_head);
 	free(rebase_amend);
 	free(rebase_orig_head);
 
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* [PATCH v3 0/4] refs: improve handling of special refs
From: Patrick Steinhardt @ 2023-12-14 13:36 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, Phillip Wood, Ramsay Jones
In-Reply-To: <cover.1701243201.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 4913 bytes --]

Hi,

this is the third version of my patch series to improve handling of
special refs.

Changes combpared to v2:

  - Patch 1: We're now more careful around missing and symbolic refs.

  - Patch 3: Document in the commit message that the extended list of
    special refs is not intended to stay like this forever.

Thanks for your reviews and the discussion!

Patrick

Patrick Steinhardt (4):
  wt-status: read HEAD and ORIG_HEAD via the refdb
  refs: propagate errno when reading special refs fails
  refs: complete list of special refs
  bisect: consistently write BISECT_EXPECTED_REV via the refdb

 bisect.c                    | 25 +++-------------
 builtin/bisect.c            |  8 ++---
 refs.c                      | 59 +++++++++++++++++++++++++++++++++++--
 t/t1403-show-ref.sh         | 10 +++++++
 t/t6030-bisect-porcelain.sh |  2 +-
 wt-status.c                 | 22 +++++++++-----
 6 files changed, 87 insertions(+), 39 deletions(-)

Range-diff against v2:
1:  1db3eb3945 ! 1:  aea9410bd9 wt-status: read HEAD and ORIG_HEAD via the refdb
    @@ Commit message
         via the refdb layer so that we'll also be compatible with alternate
         reference backends.
     
    -    Note that we pass `RESOLVE_REF_NO_RECURSE` to `read_ref_full()`. This is
    -    because we didn't resolve symrefs before either, and in practice none of
    -    the refs in "rebase-merge/" would be symbolic. We thus don't want to
    -    resolve symrefs with the new code either to retain the old behaviour.
    +    There are some subtleties involved here:
    +
    +      - We pass `RESOLVE_REF_READING` so that a missing ref will cause
    +        `read_ref_full()` to return an error.
    +
    +      - We pass `RESOLVE_REF_NO_RECURSE` so that we do not try to resolve
    +        symrefs. The old code didn't resolve symrefs either, and we only
    +        ever write object IDs into the refs in "rebase-merge/".
    +
    +      - In the same spirit we verify that successfully-read refs are not
    +        symbolic refs.
     
         Signed-off-by: Patrick Steinhardt <ps@pks.im>
     
    @@ wt-status.c: static char *read_line_from_git_path(const char *filename)
     -	char *head, *orig_head, *rebase_amend, *rebase_orig_head;
     +	struct object_id head_oid, orig_head_oid;
     +	char *rebase_amend, *rebase_orig_head;
    ++	int head_flags, orig_head_flags;
      
      	if ((!s->amend && !s->nowarn && !s->workdir_dirty) ||
      	    !s->branch || strcmp(s->branch, "HEAD"))
    @@ wt-status.c: static char *read_line_from_git_path(const char *filename)
      
     -	head = read_line_from_git_path("HEAD");
     -	orig_head = read_line_from_git_path("ORIG_HEAD");
    -+	if (read_ref_full("HEAD", RESOLVE_REF_NO_RECURSE, &head_oid, NULL) ||
    -+	    read_ref_full("ORIG_HEAD", RESOLVE_REF_NO_RECURSE, &orig_head_oid, NULL))
    ++	if (read_ref_full("HEAD", RESOLVE_REF_READING | RESOLVE_REF_NO_RECURSE,
    ++			  &head_oid, &head_flags) ||
    ++	    read_ref_full("ORIG_HEAD", RESOLVE_REF_READING | RESOLVE_REF_NO_RECURSE,
    ++			  &orig_head_oid, &orig_head_flags))
    ++		return 0;
    ++	if (head_flags & REF_ISSYMREF || orig_head_flags & REF_ISSYMREF)
     +		return 0;
     +
      	rebase_amend = read_line_from_git_path("rebase-merge/amend");
2:  24032a62e5 = 2:  455ab69177 refs: propagate errno when reading special refs fails
3:  3dd9089fd5 ! 3:  81ac092281 refs: complete list of special refs
    @@ Commit message
     
         We're already mostly good with regard to the first item, except for
         `BISECT_EXPECTED_REV` which will be addressed in a subsequent commit.
    -    But the current list of special refs is missing a lot of refs that
    -    really should be treated specially. Right now, we only treat
    -    `FETCH_HEAD` and `MERGE_HEAD` specially here.
    +    But the current list of special refs is missing some refs that really
    +    should be treated specially. Right now, we only treat `FETCH_HEAD` and
    +    `MERGE_HEAD` specially here.
     
         Introduce a new function `is_special_ref()` that contains all current
         instances of special refs to fix the reading path.
     
    +    Note that this is only a temporary measure where we record and rectify
    +    the current state. Ideally, the list of special refs should in the end
    +    only contain `FETCH_HEAD` and `MERGE_HEAD` again because they both may
    +    reference multiple objects and can contain annotations, so they indeed
    +    are special.
    +
         Based-on-patch-by: Han-Wen Nienhuys <hanwenn@gmail.com>
         Signed-off-by: Patrick Steinhardt <ps@pks.im>
     
4:  4a4447a3f5 = 4:  3244678161 bisect: consistently write BISECT_EXPECTED_REV via the refdb

base-commit: 1a87c842ece327d03d08096395969aca5e0a6996
-- 
2.43.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH v2 1/4] wt-status: read HEAD and ORIG_HEAD via the refdb
From: Patrick Steinhardt @ 2023-12-14 13:21 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Taylor Blau, Phillip Wood
In-Reply-To: <xmqqle9zqidj.fsf@gitster.g>

[-- Attachment #1: Type: text/plain, Size: 3621 bytes --]

On Tue, Dec 12, 2023 at 12:24:24PM -0800, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
[snip]
> > diff --git a/wt-status.c b/wt-status.c
> > index 9f45bf6949..fe9e590b80 100644
> > --- a/wt-status.c
> > +++ b/wt-status.c
> > @@ -1295,26 +1295,27 @@ static char *read_line_from_git_path(const char *filename)
> >  static int split_commit_in_progress(struct wt_status *s)
> >  {
> >  	int split_in_progress = 0;
> > -	char *head, *orig_head, *rebase_amend, *rebase_orig_head;
> > +	struct object_id head_oid, orig_head_oid;
> > +	char *rebase_amend, *rebase_orig_head;
> >  
> >  	if ((!s->amend && !s->nowarn && !s->workdir_dirty) ||
> >  	    !s->branch || strcmp(s->branch, "HEAD"))
> >  		return 0;
> >  
> > -	head = read_line_from_git_path("HEAD");
> > -	orig_head = read_line_from_git_path("ORIG_HEAD");
> > +	if (read_ref_full("HEAD", RESOLVE_REF_NO_RECURSE, &head_oid, NULL) ||
> > +	    read_ref_full("ORIG_HEAD", RESOLVE_REF_NO_RECURSE, &orig_head_oid, NULL))
> > +		return 0;
> > +
> 
> This made me wonder if we have changed behaviour when on an unborn
> branch.  In such a case, the original most likely would have read
> "ref: blah" in "head" and compared it with "rebase_amend", which
> would be a good way to ensure they would not match.  I would not
> know offhand what the updated code would do, but head_oid would be
> uninitialized in such a case, so ...?

The code flow goes as following:

  1. We call `read_ref_full()`, which itself calls
     `refs_resolve_ref_unsafe()`.

  2. It calls `refs_read_raw_ref()`, which succeeds and finds the
     symref.

  3. We notice that this is a symref and that `RESOLVE_REF_NO_RECURSE`
     is set. We thus clear the object ID and return the name of the
     symref target.

  4. Back in `read_ref_full()` we see that `refs_resolve_ref_unsafe()`
     returns the symref target, which we interpret as successful lookup.
     We thus return `0`.

  5. Now we look up "rebase-merge/{amend,orig-head}" and end up
     comparing whatever they contain with the cleared OID resolved from
     HEAD/ORIG_HEAD.

So the OID would not be uninitialized but the zero OID. Now:

  - "rebase-merge/amend" always contains the result of `repo_get_oid()`
    and never contains the zero OID.

  - "rebase-merge/orig-head" may contain the zero OID when there was no
    ORIG_HEAD at the time of starting a rebase or in case it did not
    resolve

So... if ORIG_HEAD was rewritten to be a symref pointing into nirvana
between starting the rebase and calling into "wt-status.c", and when
ORIG_HEAD didn't exist at the time of starting the rebase, then we might
now wrongly report that splitting was in progress.

In other cases the old code was actually doing the wrong thing. Suppose
that ORIG_HEAD was a dangling symref, then we'd have written the zero
OID into "rebase-merge/orig-head". But when calling into "wt-status.c"
now we read the still-dangling symref value and notice that the zero OID
is different than "ref: refs/heads/dangling".

I dunno. It feels like this is one of many cases where as you start to
think deeply about how things behave you realize that it's been broken
all along. On the other hand, I doubt there was even a single user who
ever experienced this issue. It often just needs to be correct enough.

I think the best way to go about this is to check for `REF_ISSSYMREF`
and exit early in that case. We only want to compare direct refs, so
this is closer to the old behaviour and should even fix edge cases like
the above.

Will update.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* completing an existing patch
From: Marzi Esipreh @ 2023-12-14 13:20 UTC (permalink / raw)
  To: git

Hi all,
I hope you are well.
I'm working in a company where most of our developers are using linux
as a development environment, so the performance of git on linux
platforms is important for us.
We came across this PR: https://github.com/git/git/pull/1352 that is
improving git status performance on linux platforms, we tried it out,
and we are happy with the result.
I was in contact with the author of this patch, and I addressed the PR
comments as well.
Please let me know how I can proceed? Shall I create a new fresh PR,
and refer to existing one in the descriptions?
Best regards,
Marzi

^ permalink raw reply

* Re: [PATCH] Use ^=1 to toggle between 0 and 1
From: René Scharfe @ 2023-12-14 13:08 UTC (permalink / raw)
  To: Jeff King; +Cc: AtariDreams via GitGitGadget, git, Seija Kijin
In-Reply-To: <20231213080143.GA1684525@coredump.intra.peff.net>

Am 13.12.23 um 09:01 schrieb Jeff King:
> On Tue, Dec 12, 2023 at 11:30:03PM +0100, René Scharfe wrote:
>
>> I wonder if
>> it's time to use the C99 type _Bool in our code.  It would allow
>> documenting that only two possible values exist in cases like the one
>> above.  That would be even more useful for function returns, I assume.

> I don't even know that we'd need much of a weather-balloon patch. I
> think it would be valid to do:
>
>   #ifndef bool
>   #define bool int
>
> to handle pre-C99 compilers (if there even are any these days). Of
> course we probably need some conditional magic to try to "#include
> <stdbool.h>" for the actual C99. I guess we could assume C99 by default
> and then add NO_STDBOOL as an escape hatch if anybody complains.

The semantics are slightly different in edge cases, so that fallback
would not be fully watertight.  E.g. consider:

   bool b(bool cond) {return cond == true;}
   bool b2(void) {return b(2);}

b() returns false if you give it false and true for anything else. b2()
returns true.

With int as the fallback this becomes:

   int b(int cond) {return cond == 1;}
   int b2(void) {return b(2);}

Now only 1 is recognized as true, b2() returns 0 (false).

A coding rule to not compare bools could mitigate that.  Or a rule to
only use the values true and false in bool context and to only use
logical operators on them.

René

^ permalink raw reply

* [PATCH v2 2/2] fetch: no redundant error message for atomic fetch
From: Jiang Xin @ 2023-12-14 12:33 UTC (permalink / raw)
  To: Git List, Junio C Hamano, Patrick Steinhardt; +Cc: Jiang Xin
In-Reply-To: <cover.1702556642.git.zhiyou.jx@alibaba-inc.com>

From: Jiang Xin <zhiyou.jx@alibaba-inc.com>

If an error occurs during an atomic fetch, a redundant error message
will appear at the end of do_fetch(). It was introduced in b3a804663c
(fetch: make `--atomic` flag cover backfilling of tags, 2022-02-17).

In function do_fetch(), a failure message is already shown before the
retcode is set, so we should not call additional error() at the end of
this function.

We can remove the redundant error() function, because we know that
the function ref_transaction_abort() never fails. While we can find a
common pattern for calling ref_transaction_abort() by running command
"git grep -A1 ref_transaction_abort", e.g.:

    if (ref_transaction_abort(transaction, &error))
        error("abort: %s", error.buf);

We can fix this issue follow this pattern, and the test case "fetch
porcelain output (atomic)" in t5574 will also be fixed. If in the future
we decide that we don't need to check the return value of the function
ref_transaction_abort(), this change can be fixed along with it.

Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
---
 builtin/fetch.c         | 4 +---
 t/t5574-fetch-output.sh | 2 +-
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index fd134ba74d..01a573cf8d 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1775,10 +1775,8 @@ static int do_fetch(struct transport *transport,
 	}
 
 cleanup:
-	if (retcode && transaction) {
-		ref_transaction_abort(transaction, &err);
+	if (retcode && transaction && ref_transaction_abort(transaction, &err))
 		error("%s", err.buf);
-	}
 
 	display_state_release(&display_state);
 	close_fetch_head(&fetch_head);
diff --git a/t/t5574-fetch-output.sh b/t/t5574-fetch-output.sh
index bc747efefc..8d01e36b3d 100755
--- a/t/t5574-fetch-output.sh
+++ b/t/t5574-fetch-output.sh
@@ -98,7 +98,7 @@ do
 		opt=
 		;;
 	esac
-	test_expect_failure "fetch porcelain output ${opt:+(atomic)}" '
+	test_expect_success "fetch porcelain output ${opt:+(atomic)}" '
 		test_when_finished "rm -rf porcelain" &&
 
 		# Clone and pre-seed the repositories. We fetch references into two
-- 
2.41.0.232.g2f6f0bca4f.agit.8.0.4.dev


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox