Git development
 help / color / mirror / Atom feed
* Re: [PATCH v2] Documentation: render dash correctly
From: Junio C Hamano @ 2023-01-23 17:39 UTC (permalink / raw)
  To: Martin Ågren; +Cc: Andrei Rybak, git
In-Reply-To: <CAN0heSogz0cdhVJdiZhCc2_fcHzJggPjbS0wCAQkRh1uZMxLig@mail.gmail.com>

Martin Ågren <martin.agren@gmail.com> writes:

> On Mon, 23 Jan 2023 at 10:01, Andrei Rybak <rybak.a.v@gmail.com> wrote:
>
>>  highlighted with a `$` sign; if you are trying to recreate these
>> -example by hand, do not cut and paste them---they are there
>> +example by hand, do not cut and paste them--they are there
>>  primarily to highlight extra whitespace at the end of some lines.
>
> OK, so this is one of the new ones compared to v1. I can see the
> argument for adding some spaces around the "--" for consistency and to
> make this a bit easier to read in the resulting manpage (which can of
> course be very subjective), but then I can also see that kind of change
> being left out as orthogonal to this patch.
>
> This v2 patch looks good to me.

Thanks, both.  Will queue.

^ permalink raw reply

* [PATCH 5/5] hook: support a --to-stdin=<path> option for testing
From: Ævar Arnfjörð Bjarmason @ 2023-01-23 17:15 UTC (permalink / raw)
  To: git
  Cc: Emily Shaffer, Junio C Hamano, Eric Sunshine, Felipe Contreras,
	Taylor Blau, Michael Strawbridge,
	Ævar Arnfjörð Bjarmason
In-Reply-To: <cover-0.5-00000000000-20230123T170550Z-avarab@gmail.com>

From: Emily Shaffer <emilyshaffer@google.com>

Expose the "path_to_stdin" API added in the preceding commit in the
"git hook run" command. For now we won't be using this command
interface outside of the tests, but exposing this functionality makes
it easier to test the hook API.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/git-hook.txt |  7 ++++++-
 builtin/hook.c             |  4 +++-
 t/t1800-hook.sh            | 18 ++++++++++++++++++
 3 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-hook.txt b/Documentation/git-hook.txt
index 77c3a8ad909..3407f3c2c07 100644
--- a/Documentation/git-hook.txt
+++ b/Documentation/git-hook.txt
@@ -8,7 +8,7 @@ git-hook - Run git hooks
 SYNOPSIS
 --------
 [verse]
-'git hook' run [--ignore-missing] <hook-name> [-- <hook-args>]
+'git hook' run [--ignore-missing] [--to-stdin=<path>] <hook-name> [-- <hook-args>]
 
 DESCRIPTION
 -----------
@@ -31,6 +31,11 @@ linkgit:githooks[5] for arguments hooks might expect (if any).
 OPTIONS
 -------
 
+--to-stdin::
+	For "run"; Specify a file which will be streamed into the
+	hook's stdin. The hook will receive the entire file from
+	beginning to EOF.
+
 --ignore-missing::
 	Ignore any missing hook by quietly returning zero. Used for
 	tools that want to do a blind one-shot run of a hook that may
diff --git a/builtin/hook.c b/builtin/hook.c
index b6530d189ad..f95b7965c58 100644
--- a/builtin/hook.c
+++ b/builtin/hook.c
@@ -7,7 +7,7 @@
 #include "strvec.h"
 
 #define BUILTIN_HOOK_RUN_USAGE \
-	N_("git hook run [--ignore-missing] <hook-name> [-- <hook-args>]")
+	N_("git hook run [--ignore-missing] [--to-stdin=<path>] <hook-name> [-- <hook-args>]")
 
 static const char * const builtin_hook_usage[] = {
 	BUILTIN_HOOK_RUN_USAGE,
@@ -28,6 +28,8 @@ static int run(int argc, const char **argv, const char *prefix)
 	struct option run_options[] = {
 		OPT_BOOL(0, "ignore-missing", &ignore_missing,
 			 N_("silently ignore missing requested <hook-name>")),
+		OPT_STRING(0, "to-stdin", &opt.path_to_stdin, N_("path"),
+			   N_("file to read into hooks' stdin")),
 		OPT_END(),
 	};
 	int ret;
diff --git a/t/t1800-hook.sh b/t/t1800-hook.sh
index 2ef3579fa7c..3506f627b6c 100755
--- a/t/t1800-hook.sh
+++ b/t/t1800-hook.sh
@@ -177,4 +177,22 @@ test_expect_success 'git hook run a hook with a bad shebang' '
 	test_cmp expect actual
 '
 
+test_expect_success 'stdin to hooks' '
+	write_script .git/hooks/test-hook <<-\EOF &&
+	echo BEGIN stdin
+	cat
+	echo END stdin
+	EOF
+
+	cat >expect <<-EOF &&
+	BEGIN stdin
+	hello
+	END stdin
+	EOF
+
+	echo hello >input &&
+	git hook run --to-stdin=input test-hook 2>actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
2.39.1.1301.gffb37c08dee


^ permalink raw reply related

* [PATCH 3/5] hook API: support passing stdin to hooks, convert am's 'post-rewrite'
From: Ævar Arnfjörð Bjarmason @ 2023-01-23 17:15 UTC (permalink / raw)
  To: git
  Cc: Emily Shaffer, Junio C Hamano, Eric Sunshine, Felipe Contreras,
	Taylor Blau, Michael Strawbridge,
	Ævar Arnfjörð Bjarmason
In-Reply-To: <cover-0.5-00000000000-20230123T170550Z-avarab@gmail.com>

From: Emily Shaffer <emilyshaffer@google.com>

Convert the invocation of the 'post-rewrite' hook run by 'git am' to
use the hook.h library. To do this we need to add a "path_to_stdin"
member to "struct run_hooks_opt".

In our API this is supported by asking for a file path, rather
than by reading stdin. Reading directly from stdin would involve caching
the entire stdin (to memory or to disk) once the hook API is made to
support "jobs" larger than 1, along with support for executing N hooks
at a time (i.e. the upcoming config-based hooks).

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/am.c | 20 ++++----------------
 hook.c       |  8 +++++++-
 hook.h       |  5 +++++
 3 files changed, 16 insertions(+), 17 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index 82a41cbfc4e..8be91617fef 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -495,24 +495,12 @@ static int run_applypatch_msg_hook(struct am_state *state)
  */
 static int run_post_rewrite_hook(const struct am_state *state)
 {
-	struct child_process cp = CHILD_PROCESS_INIT;
-	const char *hook = find_hook("post-rewrite");
-	int ret;
-
-	if (!hook)
-		return 0;
+	struct run_hooks_opt opt = RUN_HOOKS_OPT_INIT;
 
-	strvec_push(&cp.args, hook);
-	strvec_push(&cp.args, "rebase");
+	strvec_push(&opt.args, "rebase");
+	opt.path_to_stdin = am_path(state, "rewritten");
 
-	cp.in = xopen(am_path(state, "rewritten"), O_RDONLY);
-	cp.stdout_to_stderr = 1;
-	cp.trace2_hook_name = "post-rewrite";
-
-	ret = run_command(&cp);
-
-	close(cp.in);
-	return ret;
+	return run_hooks_opt("post-rewrite", &opt);
 }
 
 /**
diff --git a/hook.c b/hook.c
index a4fa1031f28..86c6dc1fe70 100644
--- a/hook.c
+++ b/hook.c
@@ -53,8 +53,14 @@ static int pick_next_hook(struct child_process *cp,
 	if (!hook_path)
 		return 0;
 
-	cp->no_stdin = 1;
 	strvec_pushv(&cp->env, hook_cb->options->env.v);
+	/* reopen the file for stdin; run_command closes it. */
+	if (hook_cb->options->path_to_stdin) {
+		cp->no_stdin = 0;
+		cp->in = xopen(hook_cb->options->path_to_stdin, O_RDONLY);
+	} else {
+		cp->no_stdin = 1;
+	}
 	cp->stdout_to_stderr = 1;
 	cp->trace2_hook_name = hook_cb->hook_name;
 	cp->dir = hook_cb->options->dir;
diff --git a/hook.h b/hook.h
index 4258b13da0d..19ab9a5806e 100644
--- a/hook.h
+++ b/hook.h
@@ -30,6 +30,11 @@ struct run_hooks_opt
 	 * was invoked.
 	 */
 	int *invoked_hook;
+
+	/**
+	 * Path to file which should be piped to stdin for each hook.
+	 */
+	const char *path_to_stdin;
 };
 
 #define RUN_HOOKS_OPT_INIT { \
-- 
2.39.1.1301.gffb37c08dee


^ permalink raw reply related

* [PATCH 4/5] sequencer: use the new hook API for the simpler "post-rewrite" call
From: Ævar Arnfjörð Bjarmason @ 2023-01-23 17:15 UTC (permalink / raw)
  To: git
  Cc: Emily Shaffer, Junio C Hamano, Eric Sunshine, Felipe Contreras,
	Taylor Blau, Michael Strawbridge,
	Ævar Arnfjörð Bjarmason
In-Reply-To: <cover-0.5-00000000000-20230123T170550Z-avarab@gmail.com>

From: Emily Shaffer <emilyshaffer@google.com>

Change the invocation of the "post-rewrite" hook added in
795160457db (sequencer (rebase -i): run the post-rewrite hook, if
needed, 2017-01-02) to use the new hook API.

This leaves the more complex "post-rewrite" invocation added in
a87a6f3c98e (commit: move post-rewrite code to libgit, 2017-11-17)
here in sequencer.c unconverted. That'll be done in a subsequent
commit.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 sequencer.c | 18 ++++--------------
 1 file changed, 4 insertions(+), 14 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index 3e4a1972897..d8d59d05dd4 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -4834,8 +4834,7 @@ static int pick_commits(struct repository *r,
 		if (!stat(rebase_path_rewritten_list(), &st) &&
 				st.st_size > 0) {
 			struct child_process child = CHILD_PROCESS_INIT;
-			const char *post_rewrite_hook =
-				find_hook("post-rewrite");
+			struct run_hooks_opt hook_opt = RUN_HOOKS_OPT_INIT;
 
 			child.in = open(rebase_path_rewritten_list(), O_RDONLY);
 			child.git_cmd = 1;
@@ -4845,18 +4844,9 @@ static int pick_commits(struct repository *r,
 			/* we don't care if this copying failed */
 			run_command(&child);
 
-			if (post_rewrite_hook) {
-				struct child_process hook = CHILD_PROCESS_INIT;
-
-				hook.in = open(rebase_path_rewritten_list(),
-					O_RDONLY);
-				hook.stdout_to_stderr = 1;
-				hook.trace2_hook_name = "post-rewrite";
-				strvec_push(&hook.args, post_rewrite_hook);
-				strvec_push(&hook.args, "rebase");
-				/* we don't care if this hook failed */
-				run_command(&hook);
-			}
+			hook_opt.path_to_stdin = rebase_path_rewritten_list();
+			strvec_push(&hook_opt.args, "rebase");
+			run_hooks_opt("post-rewrite", &hook_opt);
 		}
 		apply_autostash(rebase_path_autostash());
 
-- 
2.39.1.1301.gffb37c08dee


^ permalink raw reply related

* [PATCH 0/5] hook API: support stdin, convert post-rewrite
From: Ævar Arnfjörð Bjarmason @ 2023-01-23 17:15 UTC (permalink / raw)
  To: git
  Cc: Emily Shaffer, Junio C Hamano, Eric Sunshine, Felipe Contreras,
	Taylor Blau, Michael Strawbridge,
	Ævar Arnfjörð Bjarmason

The greater "config-based-hooks" topic has been stalled for a while.

In the last couple of cycles it was held up by run-command.c API
changes (which will make the rest of this much simpler), and lack of
time on my part.

But let's get the ball rolling again. The last time this was
on-list[1] it was part of a 36 patch series, these 5 patches only
convert one hook (implemented by both sequencer & git-am) to the hook
API, but it's an important step: To do so we need so support reading
from stdin, and passing that to the hooks.

The (passing) CI & topic branch for this is at[2].

The immediate motivation for this is to supply a "stdin" interface
that git-send-email.perl can use, see [3].

Changes since the [1] v5:

 * A new 1/5 here, picked from
   https://lore.kernel.org/git/patch-v2-07.22-b90961ae76d-20221012T084850Z-avarab@gmail.com/;
   A small while-at-it cleanup of a related API.

 * Updates for the aforementioned run-command API changes.

 * Commit message updates & clarifications.

 * The previous version of the "sequencer.c" change changed a variable
   name while at it, now it doesn't, making the diff much easier to
   read.

 * Updates for the *.txt and -h usage, so that we'll pass the
   since-added t0450 test.

1. https://lore.kernel.org/git/cover-v5-00.36-00000000000-20210902T125110Z-avarab@gmail.com/
2. https://github.com/avar/git/tree/es-avar/config-based-hooks-the-beginning
3. https://lore.kernel.org/git/230123.86wn5ds602.gmgdl@evledraar.gmail.com/

Emily Shaffer (4):
  run-command: allow stdin for run_processes_parallel
  hook API: support passing stdin to hooks, convert am's 'post-rewrite'
  sequencer: use the new hook API for the simpler "post-rewrite" call
  hook: support a --to-stdin=<path> option for testing

Ævar Arnfjörð Bjarmason (1):
  run-command.c: remove dead assignment in while-loop

 Documentation/git-hook.txt |  7 ++++++-
 builtin/am.c               | 20 ++++----------------
 builtin/hook.c             |  4 +++-
 hook.c                     |  8 +++++++-
 hook.h                     |  5 +++++
 run-command.c              | 13 +++++++++----
 sequencer.c                | 18 ++++--------------
 t/t1800-hook.sh            | 18 ++++++++++++++++++
 8 files changed, 56 insertions(+), 37 deletions(-)

Range-diff:
 1:  ac419613fdc <  -:  ----------- Makefile: mark "check" target as .PHONY
 2:  a161b7f0a5c <  -:  ----------- Makefile: stop hardcoding {command,config}-list.h
 3:  ffef1d3257e <  -:  ----------- Makefile: remove an out-of-date comment
 4:  545e16c6f04 <  -:  ----------- hook.[ch]: move find_hook() from run-command.c to hook.c
 5:  a9bc4519e9a <  -:  ----------- hook.c: add a hook_exists() wrapper and use it in bugreport.c
 6:  e99ec2e6f8f <  -:  ----------- hook.c users: use "hook_exists()" instead of "find_hook()"
 7:  2ffb2332c8a <  -:  ----------- hook-list.h: add a generated list of hooks, like config-list.h
 8:  72dd1010f5b <  -:  ----------- hook: add 'run' subcommand
 9:  821cc9bf11e <  -:  ----------- gc: use hook library for pre-auto-gc hook
10:  d71c90254ea <  -:  ----------- rebase: convert pre-rebase to use hook.h
11:  ea3af2ccc4d <  -:  ----------- am: convert applypatch to use hook.h
12:  fed0b52f88f <  -:  ----------- hooks: convert 'post-checkout' hook to hook library
13:  53d8721a0e3 <  -:  ----------- merge: convert post-merge to use hook.h
14:  d60827a2856 <  -:  ----------- git hook run: add an --ignore-missing flag
15:  d4976a0821f <  -:  ----------- send-email: use 'git hook run' for 'sendemail-validate'
16:  99f3dcd1945 <  -:  ----------- git-p4: use 'git hook' to run hooks
17:  509761454e6 <  -:  ----------- commit: convert {pre-commit,prepare-commit-msg} hook to hook.h
18:  e2c94d95427 <  -:  ----------- read-cache: convert post-index-change to use hook.h
19:  fa7d0d24ea2 <  -:  ----------- receive-pack: convert push-to-checkout hook to hook.h
20:  428bb5a6792 <  -:  ----------- run-command: remove old run_hook_{le,ve}() hook API
 -:  ----------- >  1:  351c6a55a41 run-command.c: remove dead assignment in while-loop
21:  994f6ad8602 !  2:  81eef2f60a0 run-command: allow stdin for run_processes_parallel
    @@ Commit message
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## run-command.c ##
    -@@ run-command.c: static int pp_start_one(struct parallel_processes *pp)
    - 	if (i == pp->max_processes)
    +@@ run-command.c: static int pp_start_one(struct parallel_processes *pp,
    + 	if (i == opts->processes)
      		BUG("bookkeeping is hard");
      
     +	/*
    @@ run-command.c: static int pp_start_one(struct parallel_processes *pp)
     +	 */
     +	pp->children[i].process.no_stdin = 1;
     +
    - 	code = pp->get_next_task(&pp->children[i].process,
    - 				 &pp->children[i].err,
    - 				 pp->data,
    -@@ run-command.c: static int pp_start_one(struct parallel_processes *pp)
    + 	code = opts->get_next_task(&pp->children[i].process,
    + 				   opts->ungroup ? NULL : &pp->children[i].err,
    + 				   opts->data,
    +@@ run-command.c: static int pp_start_one(struct parallel_processes *pp,
    + 		pp->children[i].process.err = -1;
    + 		pp->children[i].process.stdout_to_stderr = 1;
      	}
    - 	pp->children[i].process.err = -1;
    - 	pp->children[i].process.stdout_to_stderr = 1;
     -	pp->children[i].process.no_stdin = 1;
      
      	if (start_command(&pp->children[i].process)) {
    - 		code = pp->start_failure(&pp->children[i].err,
    + 		if (opts->start_failure)
23:  f548e3d15e7 !  3:  c6b9b69c516 am: convert 'post-rewrite' hook to hook.h
    @@ Metadata
     Author: Emily Shaffer <emilyshaffer@google.com>
     
      ## Commit message ##
    -    am: convert 'post-rewrite' hook to hook.h
    +    hook API: support passing stdin to hooks, convert am's 'post-rewrite'
    +
    +    Convert the invocation of the 'post-rewrite' hook run by 'git am' to
    +    use the hook.h library. To do this we need to add a "path_to_stdin"
    +    member to "struct run_hooks_opt".
    +
    +    In our API this is supported by asking for a file path, rather
    +    than by reading stdin. Reading directly from stdin would involve caching
    +    the entire stdin (to memory or to disk) once the hook API is made to
    +    support "jobs" larger than 1, along with support for executing N hooks
    +    at a time (i.e. the upcoming config-based hooks).
     
         Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
    @@ builtin/am.c: static int run_applypatch_msg_hook(struct am_state *state)
     -
     -	if (!hook)
     -		return 0;
    --
    ++	struct run_hooks_opt opt = RUN_HOOKS_OPT_INIT;
    + 
     -	strvec_push(&cp.args, hook);
     -	strvec_push(&cp.args, "rebase");
    --
    ++	strvec_push(&opt.args, "rebase");
    ++	opt.path_to_stdin = am_path(state, "rewritten");
    + 
     -	cp.in = xopen(am_path(state, "rewritten"), O_RDONLY);
     -	cp.stdout_to_stderr = 1;
     -	cp.trace2_hook_name = "post-rewrite";
    -+	struct run_hooks_opt opt = RUN_HOOKS_OPT_INIT;
    - 
    +-
     -	ret = run_command(&cp);
    -+	strvec_push(&opt.args, "rebase");
    -+	opt.path_to_stdin = am_path(state, "rewritten");
    - 
    +-
     -	close(cp.in);
     -	return ret;
    -+	return run_hooks_oneshot("post-rewrite", &opt);
    ++	return run_hooks_opt("post-rewrite", &opt);
      }
      
      /**
    +
    + ## hook.c ##
    +@@ hook.c: static int pick_next_hook(struct child_process *cp,
    + 	if (!hook_path)
    + 		return 0;
    + 
    +-	cp->no_stdin = 1;
    + 	strvec_pushv(&cp->env, hook_cb->options->env.v);
    ++	/* reopen the file for stdin; run_command closes it. */
    ++	if (hook_cb->options->path_to_stdin) {
    ++		cp->no_stdin = 0;
    ++		cp->in = xopen(hook_cb->options->path_to_stdin, O_RDONLY);
    ++	} else {
    ++		cp->no_stdin = 1;
    ++	}
    + 	cp->stdout_to_stderr = 1;
    + 	cp->trace2_hook_name = hook_cb->hook_name;
    + 	cp->dir = hook_cb->options->dir;
    +
    + ## hook.h ##
    +@@ hook.h: struct run_hooks_opt
    + 	 * was invoked.
    + 	 */
    + 	int *invoked_hook;
    ++
    ++	/**
    ++	 * Path to file which should be piped to stdin for each hook.
    ++	 */
    ++	const char *path_to_stdin;
    + };
    + 
    + #define RUN_HOOKS_OPT_INIT { \
 -:  ----------- >  4:  7a55c95f60f sequencer: use the new hook API for the simpler "post-rewrite" call
22:  3ccc654a664 !  5:  cb9ef7a89c4 hook: support passing stdin to hooks
    @@ Metadata
     Author: Emily Shaffer <emilyshaffer@google.com>
     
      ## Commit message ##
    -    hook: support passing stdin to hooks
    +    hook: support a --to-stdin=<path> option for testing
     
    -    Some hooks (such as post-rewrite) need to take input via stdin.
    -    Previously, callers provided stdin to hooks by setting
    -    run-command.h:child_process.in, which takes a FD. Callers would open the
    -    file in question themselves before calling run-command(). However, since
    -    we will now need to seek to the front of the file and read it again for
    -    every hook which runs, hook.h:run_command() takes a path and handles FD
    -    management itself. Since this file is opened for read only, it should
    -    not prevent later parallel execution support.
    -
    -    On the frontend, this is supported by asking for a file path, rather
    -    than by reading stdin. Reading directly from stdin would involve caching
    -    the entire stdin (to memory or to disk) and reading it back from the
    -    beginning to each hook. We'd want to support cases like insufficient
    -    memory or storage for the file. While this may prove useful later, for
    -    now the path of least resistance is to just ask the user to make this
    -    interim file themselves.
    +    Expose the "path_to_stdin" API added in the preceding commit in the
    +    "git hook run" command. For now we won't be using this command
    +    interface outside of the tests, but exposing this functionality makes
    +    it easier to test the hook API.
     
         Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## Documentation/git-hook.txt ##
    -@@ Documentation/git-hook.txt: git-hook - run git hooks
    +@@ Documentation/git-hook.txt: git-hook - Run git hooks
      SYNOPSIS
      --------
      [verse]
     -'git hook' run [--ignore-missing] <hook-name> [-- <hook-args>]
    -+'git hook' run [--to-stdin=<path>] [--ignore-missing] <hook-name> [-- <hook-args>]
    ++'git hook' run [--ignore-missing] [--to-stdin=<path>] <hook-name> [-- <hook-args>]
      
      DESCRIPTION
      -----------
    -@@ Documentation/git-hook.txt: what those are.
    +@@ Documentation/git-hook.txt: linkgit:githooks[5] for arguments hooks might expect (if any).
      OPTIONS
      -------
      
    @@ builtin/hook.c
     @@ builtin/hook.c: static int run(int argc, const char **argv, const char *prefix)
      	struct option run_options[] = {
      		OPT_BOOL(0, "ignore-missing", &ignore_missing,
    - 			 N_("exit quietly with a zero exit code if the requested hook cannot be found")),
    + 			 N_("silently ignore missing requested <hook-name>")),
     +		OPT_STRING(0, "to-stdin", &opt.path_to_stdin, N_("path"),
     +			   N_("file to read into hooks' stdin")),
      		OPT_END(),
      	};
      	int ret;
     
    - ## hook.c ##
    -@@ hook.c: static int pick_next_hook(struct child_process *cp,
    - 	if (!run_me)
    - 		return 0;
    - 
    --	cp->no_stdin = 1;
    -+	/* reopen the file for stdin; run_command closes it. */
    -+	if (hook_cb->options->path_to_stdin) {
    -+		cp->no_stdin = 0;
    -+		cp->in = xopen(hook_cb->options->path_to_stdin, O_RDONLY);
    -+	} else {
    -+		cp->no_stdin = 1;
    -+	}
    - 	cp->env = hook_cb->options->env.v;
    - 	cp->stdout_to_stderr = 1;
    - 	cp->trace2_hook_name = hook_cb->hook_name;
    -
    - ## hook.h ##
    -@@ hook.h: struct run_hooks_opt
    - 
    - 	/* Path to initial working directory for subprocess */
    - 	const char *dir;
    -+
    -+	/* Path to file which should be piped to stdin for each hook */
    -+	const char *path_to_stdin;
    - };
    - 
    - #define RUN_HOOKS_OPT_INIT { \
    -
      ## t/t1800-hook.sh ##
    -@@ t/t1800-hook.sh: test_expect_success 'git -c core.hooksPath=<PATH> hook run' '
    +@@ t/t1800-hook.sh: test_expect_success 'git hook run a hook with a bad shebang' '
      	test_cmp expect actual
      '
      
24:  bb119fa7cc0 <  -:  ----------- run-command: add stdin callback for parallelization
25:  2439f7752b8 <  -:  ----------- hook: provide stdin by string_list or callback
26:  48a380b3a91 <  -:  ----------- hook: convert 'post-rewrite' hook in sequencer.c to hook.h
27:  af6b9292aaa <  -:  ----------- transport: convert pre-push hook to hook.h
28:  957691f0b6d <  -:  ----------- hook tests: test for exact "pre-push" hook input
29:  88fe2621549 <  -:  ----------- hook tests: use a modern style for "pre-push" tests
30:  1d905e81779 <  -:  ----------- reference-transaction: use hook.h to run hooks
31:  fac56a9d8af <  -:  ----------- run-command: allow capturing of collated output
32:  7d185cdf9d1 <  -:  ----------- hooks: allow callers to capture output
33:  c8150e1239f <  -:  ----------- receive-pack: convert 'update' hook to hook.h
34:  a20ad847c14 <  -:  ----------- post-update: use hook.h library
35:  79c380be6ed <  -:  ----------- receive-pack: convert receive hooks to hook.h
36:  fe056098534 <  -:  ----------- hooks: fix a TOCTOU in "did we run a hook?" heuristic
-- 
2.39.1.1301.gffb37c08dee


^ permalink raw reply

* [PATCH 2/5] run-command: allow stdin for run_processes_parallel
From: Ævar Arnfjörð Bjarmason @ 2023-01-23 17:15 UTC (permalink / raw)
  To: git
  Cc: Emily Shaffer, Junio C Hamano, Eric Sunshine, Felipe Contreras,
	Taylor Blau, Michael Strawbridge,
	Ævar Arnfjörð Bjarmason
In-Reply-To: <cover-0.5-00000000000-20230123T170550Z-avarab@gmail.com>

From: Emily Shaffer <emilyshaffer@google.com>

While it makes sense not to inherit stdin from the parent process to
avoid deadlocking, it's not necessary to completely ban stdin to
children. An informed user should be able to configure stdin safely. By
setting `some_child.process.no_stdin=1` before calling `get_next_task()`
we provide a reasonable default behavior but enable users to set up
stdin streaming for themselves during the callback.

`some_child.process.stdout_to_stderr`, however, remains unmodifiable by
`get_next_task()` - the rest of the run_processes_parallel() API depends
on child output in stderr.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 run-command.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/run-command.c b/run-command.c
index b439c7974ca..6bd16acb060 100644
--- a/run-command.c
+++ b/run-command.c
@@ -1586,6 +1586,14 @@ static int pp_start_one(struct parallel_processes *pp,
 	if (i == opts->processes)
 		BUG("bookkeeping is hard");
 
+	/*
+	 * By default, do not inherit stdin from the parent process - otherwise,
+	 * all children would share stdin! Users may overwrite this to provide
+	 * something to the child's stdin by having their 'get_next_task'
+	 * callback assign 0 to .no_stdin and an appropriate integer to .in.
+	 */
+	pp->children[i].process.no_stdin = 1;
+
 	code = opts->get_next_task(&pp->children[i].process,
 				   opts->ungroup ? NULL : &pp->children[i].err,
 				   opts->data,
@@ -1601,7 +1609,6 @@ static int pp_start_one(struct parallel_processes *pp,
 		pp->children[i].process.err = -1;
 		pp->children[i].process.stdout_to_stderr = 1;
 	}
-	pp->children[i].process.no_stdin = 1;
 
 	if (start_command(&pp->children[i].process)) {
 		if (opts->start_failure)
-- 
2.39.1.1301.gffb37c08dee


^ permalink raw reply related

* [PATCH 1/5] run-command.c: remove dead assignment in while-loop
From: Ævar Arnfjörð Bjarmason @ 2023-01-23 17:15 UTC (permalink / raw)
  To: git
  Cc: Emily Shaffer, Junio C Hamano, Eric Sunshine, Felipe Contreras,
	Taylor Blau, Michael Strawbridge,
	Ævar Arnfjörð Bjarmason
In-Reply-To: <cover-0.5-00000000000-20230123T170550Z-avarab@gmail.com>

Remove code that's been unused since it was added in
c553c72eed6 (run-command: add an asynchronous parallel child
processor, 2015-12-15), the next use of "i" in this function is:

	for (i = 0; ...

So we'll always clobber the "i" that's set here. Presumably the "i"
assignment is an artifact of WIP code that made it into our tree.

A subsequent commit will need to adjust the type of the "i" variable
in the otherwise unrelated for-loop, which is why this is being
removed now.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 run-command.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/run-command.c b/run-command.c
index 50cc011654e..b439c7974ca 100644
--- a/run-command.c
+++ b/run-command.c
@@ -1632,9 +1632,7 @@ static void pp_buffer_stderr(struct parallel_processes *pp,
 			     const struct run_process_parallel_opts *opts,
 			     int output_timeout)
 {
-	int i;
-
-	while ((i = poll(pp->pfd, opts->processes, output_timeout) < 0)) {
+	while (poll(pp->pfd, opts->processes, output_timeout) < 0) {
 		if (errno == EINTR)
 			continue;
 		pp_cleanup(pp, opts);
-- 
2.39.1.1301.gffb37c08dee


^ permalink raw reply related

* Re: [CI]: Is t7527 known to be flakey?
From: Jeff Hostetler @ 2023-01-23 16:56 UTC (permalink / raw)
  To: SZEDER Gábor, Junio C Hamano; +Cc: Jeff Hostetler, edecosta, git
In-Reply-To: <20230121102355.GA2155@szeder.dev>



On 1/21/23 5:23 AM, SZEDER Gábor wrote:
> On Thu, Jan 19, 2023 at 06:52:01PM -0800, Junio C Hamano wrote:
>> The said test failed its linux-musl job in its first attempt, but
>> re-running the failed job passed.
>>
>>      https://github.com/git/git/actions/runs/3963948890/jobs/6792356234
>>      (seen@e096683 attempt #1 linux-musl)
>>
>>      https://github.com/git/git/actions/runs/3963948890/jobs/6792850313
>>      (seen@e096683 attempt #2 linux-musl)
> 
> t7527 is quite slow, even with the right selection of test cases, but
> this little tweak makes it much faster:
> 
>    diff --git a/t/t7527-builtin-fsmonitor.sh b/t/t7527-builtin-fsmonitor.sh
>    index 0e497ba98d..4210ef644c 100755
>    --- a/t/t7527-builtin-fsmonitor.sh
>    +++ b/t/t7527-builtin-fsmonitor.sh
>    @@ -676,7 +676,10 @@ test_expect_success 'cleanup worktrees' '
>     # cause incorrect results when the untracked-cache is enabled.
>     
>     test_lazy_prereq UNTRACKED_CACHE '
>    -	git update-index --test-untracked-cache
>    +	# This check takes a very long time, but I know it works on
>    +	# my system, so let's fake it.
>    +	#git update-index --test-untracked-cache
>    +	true
>     '
>     
>     test_expect_success 'Matrix: setup for untracked-cache,fsmonitor matrix' '
> 
> This with the right selection of test cases makes --stress
> practicable, and the test tends to fail after a handful of
> repetitions:
> 
>    ./t7527-builtin-fsmonitor.sh --stress -r 8,23-58
> 
> I saw different failures in multiple test cases, e.g.:
> 
> Unexpected output in case 55:
> 
>    expecting success of 7527.55 'Matrix[uc:true][fsm:true] move_directory_contents_deeper':
>    		matrix_clean_up_repo &&
>    		$fn &&
>    		if test $uc = false && test $fsm = false
>    		then
>    			git status --porcelain=v1 >.git/expect.$fn
>    		else
>    			git status --porcelain=v1 >.git/actual.$fn &&
>    			test_cmp .git/expect.$fn .git/actual.$fn
>    		fi
>    	
>    + matrix_clean_up_repo
>    + git reset --hard HEAD
>    HEAD is now at 1d1edcb initial
>    + git clean -fd
>    + move_directory_contents_deeper
>    + mkdir T1/_new_
>    + mv T1/F1 T1/F2 T1/T2 T1/_new_
>    + test true = false
>    + git status --porcelain=v1
>    error: read error: Connection reset by peer
>    error: could not read IPC response
>    + test_cmp .git/expect.move_directory_contents_deeper .git/actual.move_directory_contents_deeper
>    + test 2 -ne 2
>    + eval diff -u "$@"
>    + diff -u .git/expect.move_directory_contents_deeper .git/actual.move_directory_contents_deeper
>    --- .git/expect.move_directory_contents_deeper	2023-01-21 09:47:12.677410349 +0000
>    +++ .git/actual.move_directory_contents_deeper	2023-01-21 09:47:14.045448573 +0000
>    @@ -7,3 +7,4 @@
>      D T1/T2/T3/T4/F1
>      D T1/T2/T3/T4/F2
>     ?? T1/_new_/
>    +?? dir1
>    error: last command exited with $?=1
>    not ok 55 - Matrix[uc:true][fsm:true] move_directory_contents_deeper
> 
> SIGPIPE in 'git status' cases 42, 43 and 55:
> 
>    expecting success of 7527.42 'Matrix[uc:false][fsm:true] move_directory_contents_deeper':
>                    matrix_clean_up_repo &&
>                    $fn &&
>                    if test $uc = false && test $fsm = false
>                    then
>                            git status --porcelain=v1 >.git/expect.$fn
>                    else
>                            git status --porcelain=v1 >.git/actual.$fn &&
>                            test_cmp .git/expect.$fn .git/actual.$fn
>                    fi
>    
>    + matrix_clean_up_repo
>    + git reset --hard HEAD
>    HEAD is now at 1d1edcb initial
>    + git clean -fd
>    + move_directory_contents_deeper
>    + mkdir T1/_new_
>    + mv T1/F1 T1/F2 T1/T2 T1/_new_
>    + test false = false
>    + test true = false
>    + git status --porcelain=v1
>    error: last command exited with $?=141
>    not ok 42 - Matrix[uc:false][fsm:true] move_directory_contents_deeper
> 
>    expecting success of 7527.43 'Matrix[uc:false][fsm:true] move_directory_up':
>    		matrix_clean_up_repo &&
>    		$fn &&
>    		if test $uc = false && test $fsm = false
>    		then
>    			git status --porcelain=v1 >.git/expect.$fn
>    		else
>    			git status --porcelain=v1 >.git/actual.$fn &&
>    			test_cmp .git/expect.$fn .git/actual.$fn
>    		fi
>    	
>    + matrix_clean_up_repo
>    + git reset --hard HEAD
>    HEAD is now at 1d1edcb initial
>    + git clean -fd
>    Removing T1/_new_/
>    + move_directory_up
>    + mv T1/T2/T3 T1
>    + test false = false
>    + test true = false
>    + git status --porcelain=v1
>    error: last command exited with $?=141
>    not ok 43 - Matrix[uc:false][fsm:true] move_directory_up
> 
>    expecting success of 7527.55 'Matrix[uc:true][fsm:true] move_directory_contents_deeper':
>                    matrix_clean_up_repo &&
>                    $fn &&
>                    if test $uc = false && test $fsm = false
>                    then
>                            git status --porcelain=v1 >.git/expect.$fn
>                    else
>                            git status --porcelain=v1 >.git/actual.$fn &&
>                            test_cmp .git/expect.$fn .git/actual.$fn
>                    fi
>    
>    + matrix_clean_up_repo
>    + git reset --hard HEAD
>    HEAD is now at 1d1edcb initial
>    + git clean -fd
>    + move_directory_contents_deeper
>    + mkdir T1/_new_
>    + mv T1/F1 T1/F2 T1/T2 T1/_new_
>    + test true = false
>    + git status --porcelain=v1
>    error: last command exited with $?=141
> 
> I find it interesting that the output of 'git status' is redirected to
> a file in all these cases, and yet it gets a SIGPIPE.

The `git status` command talks to the running `git fsmonitor--daemon`
on a Unix domain socket (or a Named Pipe on Windows), so a SIGPIPE is
possible.

Was this on Linux or MacOS ?

Jeff




> 
> 
> I also saw the test hang a couple of times, e.g.:
> 
>    $ ./t7527-builtin-fsmonitor.sh --stress -r 8,23-58
>    OK    6.1
>    OK    7.1
>    OK    1.1
>    OK    2.1
>    OK    3.1
>    OK    5.1
>    OK    4.1
>    OK    0.1
>    OK    6.2
>    OK    1.2
>    OK    2.2
>    OK    7.2
>    OK    5.2
>    OK    0.2
>    OK    4.2
>    OK    6.3
>    OK    7.3
>    OK    2.3
>    OK    0.3
>    OK    4.3
>    OK    6.4
>    OK    7.4
>    OK    2.4
>    OK    0.4
>    OK    4.4
>    OK    6.5
>    OK    7.5
>    OK    2.5
>    OK    0.5
>    OK    4.5
>    OK    6.6
>    OK    7.6
>    OK    2.6
>    OK    0.6
>    OK    4.6
>    OK    6.7
>    OK    7.7
>    OK    2.7
>    OK    0.7
>    OK    4.7
>    OK    6.8
>    OK    7.8
>    OK    2.8
>    OK    0.8
>    OK    4.8
>    OK    6.9
>    OK    7.9
>    OK    2.9
>    OK    0.9
>    OK    4.9
>    OK    6.10
>    OK    7.10
>    OK    2.10
>    OK    0.10
>    OK    4.10
>    OK    6.11
>    OK    7.11
>    OK    2.11
>    OK    0.11
>    OK    4.11
>    OK    6.12
>    OK    7.12
>    OK    2.12
>    OK    0.12
>    OK    4.12
>    OK    6.13
>    OK    7.13
>    OK    2.13
>    OK    0.13
>    OK    6.14
>    OK    4.13
>    OK    7.14
>    OK    2.14
>    OK    0.14
>    OK    6.15
>    OK    4.14
>    OK    2.15
>    OK    7.15
>    OK    0.15
>    FAIL  7.16
>    OK    6.16
>    OK    2.16
>    OK    4.15
>    OK    0.16
> 
> At this point the test script should print the log of the failed job,
> but it hangs instead, as there are a number of stuck fsmonitor--daemon
> and status processes (notice how the stress test starts with 8 jobs,
> but the last repetition only has 4):
> 
>    $ ps aux |grep git
>    szeder   1857100  0.0  0.1  72272  4452 pts/2    Sl+  21:40   0:00 /home/szeder/src/git/git fsmonitor--daemon run --detach --ipc-threads=8
>    szeder   1857779  0.0  0.1   6560  4152 pts/2    S+   21:40   0:00 /home/szeder/src/git/git status --porcelain=v1
>    szeder   1860020  0.0  0.1  88664  4312 pts/2    Sl+  21:40   0:00 /home/szeder/src/git/git fsmonitor--daemon run --detach --ipc-threads=8
>    szeder   1860668  0.0  0.1   6560  4040 pts/2    S+   21:40   0:00 /home/szeder/src/git/git status --porcelain=v1
>    szeder   1860749  0.0  0.1  96860  4528 pts/2    Sl+  21:40   0:00 /home/szeder/src/git/git fsmonitor--daemon run --detach --ipc-threads=8
>    szeder   1861281  0.0  0.1   6560  4272 pts/2    S+   21:40   0:00 /home/szeder/src/git/git status --porcelain=v1
> 

^ permalink raw reply

* [PATCH v4] win32: fix thread usage for win32
From: Rose via GitGitGadget @ 2023-01-23 16:48 UTC (permalink / raw)
  To: git; +Cc: Johannes Sixt, Rose, Seija Kijin
In-Reply-To: <pull.1440.v3.git.git.1674492373925.gitgitgadget@gmail.com>

From: Seija Kijin <doremylover123@gmail.com>

Use _beginthreadex instead of CreateThread
since we use the Windows CRT,
as Microsoft recommends _beginthreadex
over CreateThread for these situations.

Finally, check for NULL handles, not "INVALID_HANDLE,"
as _beginthreadex guarantees a valid handle in most cases

Signed-off-by: Seija Kijin <doremylover123@gmail.com>
---
    win32: fix thread usage for win32
    
    Use pthread_exit instead of async_exit.
    
    This means we do not have to deal with Windows's implementation
    requiring an unsigned exit coded despite the POSIX exit code requiring a
    signed exit code.
    
    Use _beginthreadex instead of CreateThread since we use the Windows CRT.
    
    Finally, check for NULL handles, not "INVALID_HANDLE," as _beginthreadex
    guarantees a valid handle in most cases
    
    Signed-off-by: Seija Kijin doremylover123@gmail.com

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1440%2FAtariDreams%2FCreateThread-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1440/AtariDreams/CreateThread-v4
Pull-Request: https://github.com/git/git/pull/1440

Range-diff vs v3:

 1:  68baafba2bd ! 1:  2e2d5ce7745 win32: fix thread usage for win32
     @@ Commit message
      
          Signed-off-by: Seija Kijin <doremylover123@gmail.com>
      
     - ## compat/mingw.c ##
     -@@ compat/mingw.c: static int start_timer_thread(void)
     - 	timer_event = CreateEvent(NULL, FALSE, FALSE, NULL);
     - 	if (timer_event) {
     - 		timer_thread = (HANDLE) _beginthreadex(NULL, 0, ticktack, NULL, 0, NULL);
     --		if (!timer_thread )
     -+		if (!timer_thread)
     - 			return errno = ENOMEM,
     - 				error("cannot start timer thread");
     - 	} else
     -
       ## compat/winansi.c ##
      @@ compat/winansi.c: enum {
       	TEXT = 0, ESCAPE = 033, BRACKET = '['


 compat/winansi.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/compat/winansi.c b/compat/winansi.c
index 3abe8dd5a27..be65b27bd75 100644
--- a/compat/winansi.c
+++ b/compat/winansi.c
@@ -340,7 +340,7 @@ enum {
 	TEXT = 0, ESCAPE = 033, BRACKET = '['
 };
 
-static DWORD WINAPI console_thread(LPVOID unused)
+static unsigned int WINAPI console_thread(LPVOID unused)
 {
 	unsigned char buffer[BUFFER_SIZE];
 	DWORD bytes;
@@ -643,9 +643,9 @@ void winansi_init(void)
 		die_lasterr("CreateFile for named pipe failed");
 
 	/* start console spool thread on the pipe's read end */
-	hthread = CreateThread(NULL, 0, console_thread, NULL, 0, NULL);
-	if (hthread == INVALID_HANDLE_VALUE)
-		die_lasterr("CreateThread(console_thread) failed");
+	hthread = (HANDLE)_beginthreadex(NULL, 0, console_thread, NULL, 0, NULL);
+	if (!hthread)
+		die_lasterr("_beginthreadex(console_thread) failed");
 
 	/* schedule cleanup routine */
 	if (atexit(winansi_exit))

base-commit: 56c8fb1e95377900ec9d53c07886022af0a5d3c2
-- 
gitgitgadget

^ permalink raw reply related

* [PATCH v3] win32: fix thread usage for win32
From: Rose via GitGitGadget @ 2023-01-23 16:46 UTC (permalink / raw)
  To: git; +Cc: Johannes Sixt, Rose, Seija Kijin
In-Reply-To: <pull.1440.v2.git.git.1674491796648.gitgitgadget@gmail.com>

From: Seija Kijin <doremylover123@gmail.com>

Use _beginthreadex instead of CreateThread
since we use the Windows CRT,
as Microsoft recommends _beginthreadex
over CreateThread for these situations.

Finally, check for NULL handles, not "INVALID_HANDLE,"
as _beginthreadex guarantees a valid handle in most cases

Signed-off-by: Seija Kijin <doremylover123@gmail.com>
---
    win32: fix thread usage for win32
    
    Use pthread_exit instead of async_exit.
    
    This means we do not have to deal with Windows's implementation
    requiring an unsigned exit coded despite the POSIX exit code requiring a
    signed exit code.
    
    Use _beginthreadex instead of CreateThread since we use the Windows CRT.
    
    Finally, check for NULL handles, not "INVALID_HANDLE," as _beginthreadex
    guarantees a valid handle in most cases
    
    Signed-off-by: Seija Kijin doremylover123@gmail.com

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1440%2FAtariDreams%2FCreateThread-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1440/AtariDreams/CreateThread-v3
Pull-Request: https://github.com/git/git/pull/1440

Range-diff vs v2:

 1:  4a2c3da9d4c ! 1:  68baafba2bd win32: fix thread usage for win32
     @@ Commit message
          win32: fix thread usage for win32
      
          Use _beginthreadex instead of CreateThread
     -    since we use the Windows CRT.
     +    since we use the Windows CRT,
     +    as Microsoft recommends _beginthreadex
     +    over CreateThread for these situations.
      
          Finally, check for NULL handles, not "INVALID_HANDLE,"
          as _beginthreadex guarantees a valid handle in most cases


 compat/mingw.c   | 2 +-
 compat/winansi.c | 8 ++++----
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/compat/mingw.c b/compat/mingw.c
index e433740381b..715f1c87e11 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -2291,7 +2291,7 @@ static int start_timer_thread(void)
 	timer_event = CreateEvent(NULL, FALSE, FALSE, NULL);
 	if (timer_event) {
 		timer_thread = (HANDLE) _beginthreadex(NULL, 0, ticktack, NULL, 0, NULL);
-		if (!timer_thread )
+		if (!timer_thread)
 			return errno = ENOMEM,
 				error("cannot start timer thread");
 	} else
diff --git a/compat/winansi.c b/compat/winansi.c
index 3abe8dd5a27..be65b27bd75 100644
--- a/compat/winansi.c
+++ b/compat/winansi.c
@@ -340,7 +340,7 @@ enum {
 	TEXT = 0, ESCAPE = 033, BRACKET = '['
 };
 
-static DWORD WINAPI console_thread(LPVOID unused)
+static unsigned int WINAPI console_thread(LPVOID unused)
 {
 	unsigned char buffer[BUFFER_SIZE];
 	DWORD bytes;
@@ -643,9 +643,9 @@ void winansi_init(void)
 		die_lasterr("CreateFile for named pipe failed");
 
 	/* start console spool thread on the pipe's read end */
-	hthread = CreateThread(NULL, 0, console_thread, NULL, 0, NULL);
-	if (hthread == INVALID_HANDLE_VALUE)
-		die_lasterr("CreateThread(console_thread) failed");
+	hthread = (HANDLE)_beginthreadex(NULL, 0, console_thread, NULL, 0, NULL);
+	if (!hthread)
+		die_lasterr("_beginthreadex(console_thread) failed");
 
 	/* schedule cleanup routine */
 	if (atexit(winansi_exit))

base-commit: 56c8fb1e95377900ec9d53c07886022af0a5d3c2
-- 
gitgitgadget

^ permalink raw reply related

* [PATCH v2] win32: fix thread usage for win32
From: Rose via GitGitGadget @ 2023-01-23 16:36 UTC (permalink / raw)
  To: git; +Cc: Johannes Sixt, Rose, Seija Kijin
In-Reply-To: <pull.1440.git.git.1674334159116.gitgitgadget@gmail.com>

From: Seija Kijin <doremylover123@gmail.com>

Use _beginthreadex instead of CreateThread
since we use the Windows CRT.

Finally, check for NULL handles, not "INVALID_HANDLE,"
as _beginthreadex guarantees a valid handle in most cases

Signed-off-by: Seija Kijin <doremylover123@gmail.com>
---
    win32: fix thread usage for win32
    
    Use pthread_exit instead of async_exit.
    
    This means we do not have to deal with Windows's implementation
    requiring an unsigned exit coded despite the POSIX exit code requiring a
    signed exit code.
    
    Use _beginthreadex instead of CreateThread since we use the Windows CRT.
    
    Finally, check for NULL handles, not "INVALID_HANDLE," as _beginthreadex
    guarantees a valid handle in most cases
    
    Signed-off-by: Seija Kijin doremylover123@gmail.com

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1440%2FAtariDreams%2FCreateThread-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1440/AtariDreams/CreateThread-v2
Pull-Request: https://github.com/git/git/pull/1440

Range-diff vs v1:

 1:  f5de6bfb759 ! 1:  4a2c3da9d4c win32: fix thread usage for win32
     @@ Metadata
       ## Commit message ##
          win32: fix thread usage for win32
      
     -    Use pthread_exit instead of async_exit.
     -
     -    This means we do not have
     -    to deal with Windows's implementation
     -    requiring an unsigned exit coded
     -    despite the POSIX exit code requiring
     -    a signed exit code.
     -
          Use _beginthreadex instead of CreateThread
          since we use the Windows CRT.
      
     @@ compat/winansi.c: void winansi_init(void)
       
       	/* schedule cleanup routine */
       	if (atexit(winansi_exit))
     -
     - ## run-command.c ##
     -@@ run-command.c: static void *run_thread(void *data)
     - 	return (void *)ret;
     - }
     - 
     -+int in_async(void)
     -+{
     -+	if (!main_thread_set)
     -+		return 0; /* no asyncs started yet */
     -+	return !pthread_equal(main_thread, pthread_self());
     -+}
     -+
     - static NORETURN void die_async(const char *err, va_list params)
     - {
     - 	report_fn die_message_fn = get_die_message_routine();
     -@@ run-command.c: static int async_die_is_recursing(void)
     - 	return ret != NULL;
     - }
     - 
     --int in_async(void)
     --{
     --	if (!main_thread_set)
     --		return 0; /* no asyncs started yet */
     --	return !pthread_equal(main_thread, pthread_self());
     --}
     --
     --static void NORETURN async_exit(int code)
     --{
     --	pthread_exit((void *)(intptr_t)code);
     --}
     --
     - #else
     - 
     - static struct {
     -@@ run-command.c: int in_async(void)
     - 	return process_is_async;
     - }
     - 
     --static void NORETURN async_exit(int code)
     --{
     --	exit(code);
     --}
     --
     - #endif
     - 
     - void check_pipe(int err)
     - {
     - 	if (err == EPIPE) {
     --		if (in_async())
     --			async_exit(141);
     -+		if (in_async()) {
     -+#ifdef NO_PTHREADS
     -+			exit(141);
     -+#else
     -+			pthread_exit((void *)141);
     -+#endif
     -+		}
     - 
     - 		signal(SIGPIPE, SIG_DFL);
     - 		raise(SIGPIPE);


 compat/mingw.c   | 2 +-
 compat/winansi.c | 8 ++++----
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/compat/mingw.c b/compat/mingw.c
index e433740381b..715f1c87e11 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -2291,7 +2291,7 @@ static int start_timer_thread(void)
 	timer_event = CreateEvent(NULL, FALSE, FALSE, NULL);
 	if (timer_event) {
 		timer_thread = (HANDLE) _beginthreadex(NULL, 0, ticktack, NULL, 0, NULL);
-		if (!timer_thread )
+		if (!timer_thread)
 			return errno = ENOMEM,
 				error("cannot start timer thread");
 	} else
diff --git a/compat/winansi.c b/compat/winansi.c
index 3abe8dd5a27..be65b27bd75 100644
--- a/compat/winansi.c
+++ b/compat/winansi.c
@@ -340,7 +340,7 @@ enum {
 	TEXT = 0, ESCAPE = 033, BRACKET = '['
 };
 
-static DWORD WINAPI console_thread(LPVOID unused)
+static unsigned int WINAPI console_thread(LPVOID unused)
 {
 	unsigned char buffer[BUFFER_SIZE];
 	DWORD bytes;
@@ -643,9 +643,9 @@ void winansi_init(void)
 		die_lasterr("CreateFile for named pipe failed");
 
 	/* start console spool thread on the pipe's read end */
-	hthread = CreateThread(NULL, 0, console_thread, NULL, 0, NULL);
-	if (hthread == INVALID_HANDLE_VALUE)
-		die_lasterr("CreateThread(console_thread) failed");
+	hthread = (HANDLE)_beginthreadex(NULL, 0, console_thread, NULL, 0, NULL);
+	if (!hthread)
+		die_lasterr("_beginthreadex(console_thread) failed");
 
 	/* schedule cleanup routine */
 	if (atexit(winansi_exit))

base-commit: 56c8fb1e95377900ec9d53c07886022af0a5d3c2
-- 
gitgitgadget

^ permalink raw reply related

* Re: [PATCH] ssh signing: better error message when key not in agent
From: Adam Szkoda @ 2023-01-23 16:17 UTC (permalink / raw)
  To: Fabian Stelzer; +Cc: phillip.wood, Adam Szkoda via GitGitGadget, git
In-Reply-To: <20230123100245.3qbscxkgvbnh7ilt@fs>

Hi!  I've pushed a patch that adds `-U` conditional on is_literal_ssh_key().

According to the OpenSSH issue ([1]), that option is backward compatible:

> It should be safe to use -U even for older versions. It won't require the agent (as openssh-9.1 will) but it won't cause an error.

[1]: https://bugzilla.mindrot.org/show_bug.cgi?id=3429

Cheers
— Adam


On Mon, Jan 23, 2023 at 11:02 AM Fabian Stelzer <fs@gigacodes.de> wrote:
>
> On 23.01.2023 09:33, Phillip Wood wrote:
> >On 20/01/2023 09:03, Fabian Stelzer wrote:
> >>On 18.01.2023 16:29, Phillip Wood wrote:
> >>>Hi Adam
> >>>
> >>>I've cc'd Fabian who knows more about the ssh signing code that I do.
> >>>
> >>>On 18/01/2023 15:28, Adam Szkoda wrote:
> >>>>Hi Phillip,
> >>>>
> >>>>Good point!  My first thought is to try doing a stat() syscall on the
> >>>>path from 'user.signingKey' to see if it exists and if not, treat it
> >>>>as a public key (and pass the -U option).  If that sounds reasonable,
> >>>>I can update the patch.
> >>>
> >>>My reading of the documentation is that user.signingKey may point
> >>>to a public or private key so I'm not sure how stat()ing would
> >>>help. Looking at the code in sign_buffer_ssh() we have a function
> >>>is_literal_ssh_key() that checks if the config value is a public
> >>>key. When the user passes the path to a key we could read the file
> >>>check use is_literal_ssh_key() to check if it is a public key (or
> >>>possibly just check if the file begins with "ssh-"). Fabian - does
> >>>that sound reasonable?
> >>
> >>Hi,
> >>I have encountered the mentioned problem before as well and tried to
> >>fix it but did not find a good / reasonable way to do so. Git just
> >>passes the user.signingKey to ssh-keygen which states:
> >>`The key used for signing is specified using the -f option and may
> >>refer to either a private key, or a public key with the private half
> >>available via ssh-agent(1)`
> >>
> >>I don't think it's a good idea for git to parse the key and try to
> >>determine if it's public or private. The fix should probably be in
> >>openssh (different error message) but when looking into it last time
> >>i remember that the logic for using the key is quite deeply embedded
> >>into the ssh code and not easily adjusted for the signing use case.
> >>At the moment I don't have the time to look into it but the openssh
> >>code for signing is quite readable so feel free to give it a try.
> >>Maybe you find a good way.
> >
> >Thanks Fabian, perhaps the easiest way forward is for us to only pass
> >"-U" when we have a literal key in user.signingKey as we know it must
> >a be public key in that case.
>
> Yes, i think that's a good idea as long as the `-U` flag is ignored in older
> ssh versions and shouldn't be too hard to implement. And it should work just
> as well when using `defaultKeyCommand`.
>
> Best,
> Fabian
>
> >
> >Best Wishes
> >
> >Phillip
> >
> >>Best regards,
> >>Fabian
> >>
> >>>
> >>>Best Wishes
> >>>
> >>>Phillip
> >>>
> >>>>Best
> >>>>— Adam
> >>>>
> >>>>
> >>>>On Wed, Jan 18, 2023 at 3:34 PM Phillip Wood
> >>>><phillip.wood123@gmail.com> wrote:
> >>>>>
> >>>>>On 18/01/2023 11:10, Phillip Wood wrote:
> >>>>>>>the agent [1].  A fix is scheduled to be released in OpenSSH 9.1. All
> >>>>>>>that
> >>>>>>>needs to be done is to pass an additional backward-compatible option
> >>>>>>>-U to
> >>>>>>>'ssh-keygen -Y sign' call.  With '-U', ssh-keygen always interprets
> >>>>>>>the file
> >>>>>>>as public key and expects to find the private key in the agent.
> >>>>>>
> >>>>>>The documentation for user.signingKey says
> >>>>>>
> >>>>>>  If gpg.format is set to ssh this can contain the path to either your
> >>>>>>private ssh key or the public key when ssh-agent is used.
> >>>>>>
> >>>>>>If I've understood correctly passing -U will prevent users
> >>>>>>from setting
> >>>>>>this to a private key.
> >>>>>
> >>>>>If there is an easy way to tell if the user has given us a public key
> >>>>>then we could pass "-U" in that case.
> >>>>>
> >>>>>Best Wishes
> >>>>>
> >>>>>Phillip

^ permalink raw reply

* Re: [PATCH v9 0/2] send-email: expose header information to git-send-email's sendemail-validate hook
From: Michael Strawbridge @ 2023-01-23 16:03 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git
In-Reply-To: <230123.86wn5ds602.gmgdl@evledraar.gmail.com>


On 2023-01-23 08:51, Ævar Arnfjörð Bjarmason wrote:
> On Thu, Jan 19 2023, Michael Strawbridge wrote:
>
>> Thanks to Ævar for an idea to simplify these patches further.
>>
>> Michael Strawbridge (2):
>>   send-email: refactor header generation functions
>>   send-email: expose header information to git-send-email's
>>     sendemail-validate hook
>>
>>  Documentation/githooks.txt | 27 +++++++++--
>>  git-send-email.perl        | 95 +++++++++++++++++++++++---------------
>>  t/t9001-send-email.sh      | 27 ++++++++++-
>>  3 files changed, 106 insertions(+), 43 deletions(-)
> Thanks for the update. Aside from any quibbles, I still have some
> fundimental concerns about the implementation here:
>
>  * Other hooks take stdin, not this sort of file argument.
>
>    We discussed that ending in
>    https://public-inbox.org/git/20230117215811.78313-1-michael.strawbridge@amd.com/;
>    but I probably shouldn't have mentioned "git hook" at all.
>
>    I do think though that we shouldn't expose a UX discrepancy like this
>    forever, but the ways forward out of that would seem to be to either
>    to revert a7555304546 (send-email: use 'git hook run' for
>    'sendemail-validate', 2021-12-22) & move forward from there, or to
>    wait for those patches (which I'm currentnly CI-ing).

Ok.  If we are at the point where the change is just trying to pass CI
but the main logic is there I am willing to wait some time.

>
>  * Aside from that, shouldn't we have a new "validate-headers" or
>    whatever hook, instead of assuming that we can add another argument
>    to existing users?...

While it's true we could (and I don't have a super strong opinion here),
I suppose I was foreseeing the potential that a user may want to have
logic that requires both the email headers and contents.  For example,
only checking contents for a specific mailing list.  If we split the
hooks, a user would then need to figure out how to have them coordinate.

>
>  * ...except can we do it safely? Now, it seems to me like you have
>    potential correctness issues here. We call format_2822_time() to make
>    the headers, but that's based on "$time", which we save away earlier.
>
>    But for the rest (e.g. "Message-Id" are we sure that we're giving the
>    hook the same headers as the one we actually end up sending?
>
>    But regardless of that, something that would bypass this entire
>    stdin/potential correctness etc. problem is if we just pass an offset
>    to the the, i.e. currently we have a "validate" which gets the
>    contents, if we had a "validate-raw" or whatever we could just pass:

I think there might be a part missing here: "problem is if we just pass
an offset to the ___."  So there's a chance I may not fully grasp your
suggestion.

> 	<headers>
> 	\n\n
> 	<content>
>
>    Where the current "validate" just gets "content", no? We could then
>    either pass the offset to the "\n\n", or just trust that such a hook
>    knows to find the "\n\n".
>
>    I also think that would be more generally usable, as the tiny
>    addition of some exit code interpretation would allow us to say "I
>    got this, and consider this sent", which would also satisfy some who
>    have wanted e.g. a way to intrecept it before it invokes "sendmail"
>    (I remember a recent thread about that in relation to using "mutt" to
>    send it directly)
>
>    

Are you suggesting to simply add the header to the current
sendemail-validate hook?

I appreciate the feedback.


^ permalink raw reply

* Re: [PATCH v4 0/9] rebase: fix several code/testing/documentation issues around flag incompatibilities
From: Derrick Stolee @ 2023-01-23 15:56 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget, git
  Cc: Elijah Newren, Eric Sunshine, Martin Ågren, Phillip Wood
In-Reply-To: <pull.1466.v4.git.1674367961.gitgitgadget@gmail.com>

On 1/22/2023 1:12 AM, Elijah Newren via GitGitGadget wrote:
> We had a report about --update-refs being ignored when --whitespace=fix was
> passed, confusing an end user. These were already marked as incompatible in
> the manual, but the code didn't check for the incompatibility and provide an
> error to the user.
> 
> Folks brought up other flags tangentially when reviewing an earlier round of
> this series, and I found we had more that were missing checks, and that we
> were missing some testcases, and that the documentation was wrong on some of
> the relevant options. So, I ended up getting lots of little fixes to
> straighten these all out.

Wow, this really expanded since I last looked at it. Thanks for taking on all
of that extra work! (That was not my intention when recommending that you take
over the fix.)
 
> Changes since v3:
> 
>  * Corrected the code surrounding --[no-]reapply-cherry-picks, and extended
>    the testcases (Thanks to Phillip for pointing out my error)
>  * I went ahead and implemented the better error message when the merge
>    backend is triggered solely via config options.

I really appreciate this extra attention to detail. I'm also really happy with
how you implemented it, using different variables to signal how the option was
specified (and using -1 for "unset" in both cases).

While I had not been following version 2 or 3, I read this version in its
entirety and everything looked good to me. These improvements to our docs,
tests, and implementation will be felt by users.

Thanks!
-Stolee

^ permalink raw reply

* Re: [PATCH v2 2/2] ls-files: add %(skipworktree) atom to format option
From: ZheNing Hu @ 2023-01-23 15:35 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Elijah Newren, ZheNing Hu via GitGitGadget, git,
	Ævar Arnfjörð Bjarmason, Derrick Stolee,
	Victoria Dye, Nguyễn Thái Ngọc Duy
In-Reply-To: <xmqqcz79xizc.fsf@gitster.g>

Junio C Hamano <gitster@pobox.com> 于2023年1月21日周六 00:34写道:
>
> Elijah Newren <newren@gmail.com> writes:
>
> > On Thu, Jan 19, 2023 at 9:34 AM ZheNing Hu via GitGitGadget
> > <gitgitgadget@gmail.com> wrote:
> >>
> >> From: ZheNing Hu <adlternative@gmail.com>
> >>
> >> Because sometimes we want to check if the files in the
> >> index match the sparse specification, so introduce
> >> "%(skipworktree)" atom to git ls-files `--format` option.
> >> When we use this option, if the file match the sparse
> >> specification, it will output "1", otherwise, output
> >> empty string "".
> >
> > Why is that output format useful?  It seems like it'll just lead to
> > bugs, or someone re-implementing the same field with a different name
> > to make it useful in the future.  In particular, if there are multiple
> > boolean fields and someone specifies e.g.
> >    git ls-files --format="%(path) %(skipworktree) %(intentToAdd)"
> > and both boolean fields are displayed the same way (either a "1" or a
> > blank string), and we see something like:
> >    foo.c 1
> >    bar.c 1
> > Then how do we know if foo.c and bar.c are SKIP_WORKTREE or
> > INTENT_TO_ADD?  The "1" could have come from either field.
>
> Perhaps it becomes useful in conjunction with %(if) and friends,
> when they become avaiable?
>
> Until then, I agree that the output format looks pretty klunky.
> The calling scripts still can do
>
>         --format='%(path) A=%(A) B=%(B) C=%(C)'
>
> and take an empty value as false, though.

Can this strange design be considered as a bad design of %(if) and
%(else) in ref-filter?

^ permalink raw reply

* Re: [PATCH v2 2/2] ls-files: add %(skipworktree) atom to format option
From: ZheNing Hu @ 2023-01-23 15:33 UTC (permalink / raw)
  To: Elijah Newren
  Cc: ZheNing Hu via GitGitGadget, git,
	Ævar Arnfjörð Bjarmason, Junio C Hamano,
	Derrick Stolee, Victoria Dye,
	Nguyễn Thái Ngọc Duy
In-Reply-To: <CABPp-BGLmhoHAcuLoz_yQ4TmNBvDU6Ehymy_3rh0wguSw0hjGw@mail.gmail.com>

Elijah Newren <newren@gmail.com> 于2023年1月20日周五 13:30写道:
>
> On Thu, Jan 19, 2023 at 9:34 AM ZheNing Hu via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> >
> > From: ZheNing Hu <adlternative@gmail.com>
> >
> > Because sometimes we want to check if the files in the
> > index match the sparse specification, so introduce
> > "%(skipworktree)" atom to git ls-files `--format` option.
> > When we use this option, if the file match the sparse
> > specification, it will output "1", otherwise, output
> > empty string "".
>
> Why is that output format useful?  It seems like it'll just lead to
> bugs, or someone re-implementing the same field with a different name
> to make it useful in the future.  In particular, if there are multiple
> boolean fields and someone specifies e.g.
>    git ls-files --format="%(path) %(skipworktree) %(intentToAdd)"
> and both boolean fields are displayed the same way (either a "1" or a
> blank string), and we see something like:
>    foo.c 1
>    bar.c 1
> Then how do we know if foo.c and bar.c are SKIP_WORKTREE or
> INTENT_TO_ADD?  The "1" could have come from either field.
>

I understand your confusion here. If we need to combine these
boolean values in --format with %(if) %(else) of ref-filter in the future,
we can only do this strange design. Output like "true"/"false" or "1"/"0"
would be better without considering %(if), %(else).

> > Signed-off-by: ZheNing Hu <adlternative@gmail.com>
> > ---
> >  Documentation/git-ls-files.txt |  5 +++++
> >  builtin/ls-files.c             |  3 +++
> >  t/t3013-ls-files-format.sh     | 23 +++++++++++++++++++++++
> >  3 files changed, 31 insertions(+)
> >
> > diff --git a/Documentation/git-ls-files.txt b/Documentation/git-ls-files.txt
> > index 440043cdb8e..2540b404808 100644
> > --- a/Documentation/git-ls-files.txt
> > +++ b/Documentation/git-ls-files.txt
> > @@ -260,6 +260,11 @@ eolattr::
> >         that applies to the path.
> >  path::
> >         The pathname of the file which is recorded in the index.
> > +skipworktree::
> > +       If the file in the index marked with SKIP_WORKTREE bit.
> > +       It means the file do not match the sparse specification.
> > +       See link:technical/sparse-checkout.txt[sparse-checkout]
> > +       for more information.
>
> minor nits: Missing an "is", and "do" should be "does".
>
> I'm curious whether the second sentence is even necessary; we've
> already got the link to the more technical docs.  Perhaps just:
>
> skipworktree::
>     Whether the file in the index has the SKIP_WORKTREE bit set.
>     See link:technical/sparse-checkout.txt[sparse-checkout]
>     for more information.
>

Agree.

> >  EXCLUDE PATTERNS
> >  ----------------
> > diff --git a/builtin/ls-files.c b/builtin/ls-files.c
> > index a03b559ecaa..bbff868ae6b 100644
> > --- a/builtin/ls-files.c
> > +++ b/builtin/ls-files.c
> > @@ -280,6 +280,9 @@ static size_t expand_show_index(struct strbuf *sb, const char *start,
> >                               data->pathname));
> >         else if (skip_prefix(start, "(path)", &p))
> >                 write_name_to_buf(sb, data->pathname);
> > +       else if (skip_prefix(start, "(skipworktree)", &p))
> > +               strbuf_addstr(sb, ce_skip_worktree(data->ce) ?
> > +                             "1" : "");
>
> As I mentioned in response to the commit message, I don't understand
> why having an empty string would be desirable.
>
> >         else
> >                 die(_("bad ls-files format: %%%.*s"), (int)len, start);
> >
> > diff --git a/t/t3013-ls-files-format.sh b/t/t3013-ls-files-format.sh
> > index efb7450bf1e..cd35dba5930 100755
> > --- a/t/t3013-ls-files-format.sh
> > +++ b/t/t3013-ls-files-format.sh
> > @@ -92,4 +92,27 @@ test_expect_success 'git ls-files --format with --debug' '
> >         test_cmp expect actual
> >  '
> >
> > +test_expect_success 'git ls-files --format with skipworktree' '
> > +       test_when_finished "git sparse-checkout disable" &&
> > +       mkdir dir1 dir2 &&
> > +       echo "file1" >dir1/file1.txt &&
> > +       echo "file2" >dir2/file2.txt &&
> > +       git add dir1 dir2 &&
> > +       git commit -m skipworktree &&
> > +       git sparse-checkout set dir1 &&
> > +       git ls-files --format="%(path)%(skipworktree)" >actual &&
> > +       cat >expect <<-\EOF &&
> > +       dir1/file1.txt
> > +       dir2/file2.txt1
> > +       o1.txt
> > +       o2.txt
> > +       o3.txt
> > +       o4.txt
> > +       o5.txt
> > +       o6.txt
> > +       o7.txt
> > +       EOF
> > +       test_cmp expect actual
> > +'
>
> I find this test hard to read; it's just too easy to miss
> "dir2/file2.txt1" vs "dir2/file2.txt".  I'd suggest at least adding a
> space, and likely having the skipworktree attribute come first in the
> format string.  It might also be useful to add "dir*" on the ls-files
> command to limit which paths are shown, just because there's an awful
> lot of files in the root directory and no variance between them, and
> it's easier to notice the binary difference between two items than
> having a full 9 and figuring out which are relevant.

Good idea.

I deliberately removed the space between %(path) and
%(skipworktree) before, because according to the current design,
the "" output by %(skipworktree) is empty, which leads to an extra
space at the end of the output line, which will break github's
 "whitespace" ci tests. Maybe swapping the location of %(path) and
%(skipworktree) will fix this.

^ permalink raw reply

* [PATCH v2 08/10] fetch: fetch from an external bundle URI
From: Derrick Stolee via GitGitGadget @ 2023-01-23 15:21 UTC (permalink / raw)
  To: git
  Cc: gitster, me, vdye, avarab, steadmon, chooglen, Derrick Stolee,
	Derrick Stolee
In-Reply-To: <pull.1454.v2.git.1674487310.gitgitgadget@gmail.com>

From: Derrick Stolee <derrickstolee@github.com>

When a user specifies a URI via 'git clone --bundle-uri', that URI may
be a bundle list that advertises a 'bundle.heuristic' value. In that
case, the Git client stores a 'fetch.bundleURI' config value storing
that URI.

Teach 'git fetch' to check for this config value and download bundles
from that URI before fetching from the Git remote(s). Likely, the bundle
provider has configured a heuristic (such as "creationToken") that will
allow the Git client to download only a portion of the bundles before
continuing the fetch.

Since this URI is completely independent of the remote server, we want
to be sure that we connect to the bundle URI before creating a
connection to the Git remote. We do not want to hold a stateful
connection for too long if we can avoid it.

To test that this works correctly, extend the previous tests that set
'fetch.bundleURI' to do follow-up fetches. The bundle list is updated
incrementally at each phase to demonstrate that the heuristic avoids
downloading older bundles. This includes the middle fetch downloading
the objects in bundle-3.bundle from the Git remote, and therefore not
needing that bundle in the third fetch.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 builtin/fetch.c             |   7 +++
 t/t5558-clone-bundle-uri.sh | 113 +++++++++++++++++++++++++++++++++++-
 2 files changed, 119 insertions(+), 1 deletion(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 7378cafeec9..f101e454dc9 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -29,6 +29,7 @@
 #include "commit-graph.h"
 #include "shallow.h"
 #include "worktree.h"
+#include "bundle-uri.h"
 
 #define FORCED_UPDATES_DELAY_WARNING_IN_MS (10 * 1000)
 
@@ -2109,6 +2110,7 @@ static int fetch_one(struct remote *remote, int argc, const char **argv,
 int cmd_fetch(int argc, const char **argv, const char *prefix)
 {
 	int i;
+	const char *bundle_uri;
 	struct string_list list = STRING_LIST_INIT_DUP;
 	struct remote *remote = NULL;
 	int result = 0;
@@ -2194,6 +2196,11 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 	if (dry_run)
 		write_fetch_head = 0;
 
+	if (!git_config_get_string_tmp("fetch.bundleuri", &bundle_uri)) {
+		if (fetch_bundle_uri(the_repository, bundle_uri, NULL))
+			warning(_("failed to fetch bundles from '%s'"), bundle_uri);
+	}
+
 	if (all) {
 		if (argc == 1)
 			die(_("fetch --all does not take a repository argument"));
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index b2d15e141ca..7deeb4b8ad1 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -440,7 +440,55 @@ test_expect_success 'clone incomplete bundle list (http, creationToken)' '
 	EOF
 
 	test_remote_https_urls <trace-clone.txt >actual &&
-	test_cmp expect actual
+	test_cmp expect actual &&
+
+	# We now have only one bundle ref.
+	git -C clone-token-http for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
+	cat >expect <<-\EOF &&
+	refs/bundles/base
+	EOF
+	test_cmp expect refs &&
+
+	# Add remaining bundles, exercising the "deepening" strategy
+	# for downloading via the creationToken heurisitc.
+	cat >>"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle "bundle-2"]
+		uri = bundle-2.bundle
+		creationToken = 2
+
+	[bundle "bundle-3"]
+		uri = bundle-3.bundle
+		creationToken = 3
+
+	[bundle "bundle-4"]
+		uri = bundle-4.bundle
+		creationToken = 4
+	EOF
+
+	GIT_TRACE2_EVENT="$(pwd)/trace1.txt" \
+		git -C clone-token-http fetch origin --no-tags \
+		refs/heads/merge:refs/heads/merge &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-list
+	$HTTPD_URL/bundle-4.bundle
+	$HTTPD_URL/bundle-3.bundle
+	$HTTPD_URL/bundle-2.bundle
+	EOF
+
+	test_remote_https_urls <trace1.txt >actual &&
+	test_cmp expect actual &&
+
+	# We now have all bundle refs.
+	git -C clone-token-http for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
+
+	cat >expect <<-\EOF &&
+	refs/bundles/base
+	refs/bundles/left
+	refs/bundles/merge
+	refs/bundles/right
+	EOF
+	test_cmp expect refs
 '
 
 test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
@@ -477,6 +525,69 @@ test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
 	cat >expect <<-\EOF &&
 	refs/bundles/base
 	EOF
+	test_cmp expect refs &&
+
+	cat >>"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle "bundle-2"]
+		uri = bundle-2.bundle
+		creationToken = 2
+	EOF
+
+	# Fetch the objects for bundle-2 _and_ bundle-3.
+	GIT_TRACE2_EVENT="$(pwd)/trace1.txt" \
+		git -C fetch-http-4 fetch origin --no-tags \
+		refs/heads/left:refs/heads/left \
+		refs/heads/right:refs/heads/right &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-list
+	$HTTPD_URL/bundle-2.bundle
+	EOF
+
+	test_remote_https_urls <trace1.txt >actual &&
+	test_cmp expect actual &&
+
+	# received left from bundle-2
+	git -C fetch-http-4 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
+	cat >expect <<-\EOF &&
+	refs/bundles/base
+	refs/bundles/left
+	EOF
+	test_cmp expect refs &&
+
+	cat >>"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle "bundle-3"]
+		uri = bundle-3.bundle
+		creationToken = 3
+
+	[bundle "bundle-4"]
+		uri = bundle-4.bundle
+		creationToken = 4
+	EOF
+
+	# This fetch should skip bundle-3.bundle, since its objects are
+	# already local (we have the requisite commits for bundle-4.bundle).
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
+		git -C fetch-http-4 fetch origin --no-tags \
+		refs/heads/merge:refs/heads/merge &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-list
+	$HTTPD_URL/bundle-4.bundle
+	EOF
+
+	test_remote_https_urls <trace2.txt >actual &&
+	test_cmp expect actual &&
+
+	# received merge ref from bundle-4, but right is missing
+	# because we did not download bundle-3.
+	git -C fetch-http-4 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
+
+	cat >expect <<-\EOF &&
+	refs/bundles/base
+	refs/bundles/left
+	refs/bundles/merge
+	EOF
 	test_cmp expect refs
 '
 
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH v2 10/10] bundle-uri: test missing bundles with heuristic
From: Derrick Stolee via GitGitGadget @ 2023-01-23 15:21 UTC (permalink / raw)
  To: git
  Cc: gitster, me, vdye, avarab, steadmon, chooglen, Derrick Stolee,
	Derrick Stolee
In-Reply-To: <pull.1454.v2.git.1674487310.gitgitgadget@gmail.com>

From: Derrick Stolee <derrickstolee@github.com>

The creationToken heuristic uses a different mechanism for downloading
bundles from the "standard" approach. Specifically: it uses a concrete
order based on the creationToken values and attempts to download as few
bundles as possible. It also modifies local config to store a value for
future fetches to avoid downloading bundles, if possible.

However, if any of the individual bundles has a failed download, then
the logic for the ordering comes into question. It is important to avoid
infinite loops, assigning invalid creation token values in config, but
also to be opportunistic as possible when downloading as many bundles as
seem appropriate.

These tests were used to inform the implementation of
fetch_bundles_by_token() in bundle-uri.c, but are being added
independently here to allow focusing on faulty downloads. There may be
more cases that could be added that result in modifications to
fetch_bundles_by_token() as interesting data shapes reveal themselves in
real scenarios.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 t/t5558-clone-bundle-uri.sh | 400 ++++++++++++++++++++++++++++++++++++
 1 file changed, 400 insertions(+)

diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index 9c2b7934b9b..e3ccfe872c4 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -618,6 +618,406 @@ test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
 	test_cmp expect actual
 '
 
+test_expect_success 'creationToken heuristic with failed downloads (clone)' '
+	test_when_finished rm -rf download-* trace*.txt &&
+
+	# Case 1: base bundle does not exist, nothing can unbundle
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = fake.bundle
+		creationToken = 1
+
+	[bundle "bundle-2"]
+		uri = bundle-2.bundle
+		creationToken = 2
+
+	[bundle "bundle-3"]
+		uri = bundle-3.bundle
+		creationToken = 3
+
+	[bundle "bundle-4"]
+		uri = bundle-4.bundle
+		creationToken = 4
+	EOF
+
+	GIT_TRACE2_EVENT="$(pwd)/trace-clone-1.txt" \
+	git clone --single-branch --branch=base \
+		--bundle-uri="$HTTPD_URL/bundle-list" \
+		"$HTTPD_URL/smart/fetch.git" download-1 &&
+
+	# Bundle failure does not set these configs.
+	test_must_fail git -C download-1 config fetch.bundleuri &&
+	test_must_fail git -C download-1 config fetch.bundlecreationtoken &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-list
+	$HTTPD_URL/bundle-4.bundle
+	$HTTPD_URL/bundle-3.bundle
+	$HTTPD_URL/bundle-2.bundle
+	$HTTPD_URL/fake.bundle
+	EOF
+	test_remote_https_urls <trace-clone-1.txt >actual &&
+	test_cmp expect actual &&
+
+	# All bundles failed to unbundle
+	git -C download-1 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
+	test_must_be_empty refs &&
+
+	# Case 2: middle bundle does not exist, only two bundles can unbundle
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = bundle-1.bundle
+		creationToken = 1
+
+	[bundle "bundle-2"]
+		uri = fake.bundle
+		creationToken = 2
+
+	[bundle "bundle-3"]
+		uri = bundle-3.bundle
+		creationToken = 3
+
+	[bundle "bundle-4"]
+		uri = bundle-4.bundle
+		creationToken = 4
+	EOF
+
+	GIT_TRACE2_EVENT="$(pwd)/trace-clone-2.txt" \
+	git clone --single-branch --branch=base \
+		--bundle-uri="$HTTPD_URL/bundle-list" \
+		"$HTTPD_URL/smart/fetch.git" download-2 &&
+
+	# Bundle failure does not set these configs.
+	test_must_fail git -C download-2 config fetch.bundleuri &&
+	test_must_fail git -C download-2 config fetch.bundlecreationtoken &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-list
+	$HTTPD_URL/bundle-4.bundle
+	$HTTPD_URL/bundle-3.bundle
+	$HTTPD_URL/fake.bundle
+	$HTTPD_URL/bundle-1.bundle
+	EOF
+	test_remote_https_urls <trace-clone-2.txt >actual &&
+	test_cmp expect actual &&
+
+	# Only base bundle unbundled.
+	git -C download-2 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
+	cat >expect <<-EOF &&
+	refs/bundles/base
+	refs/bundles/right
+	EOF
+	test_cmp expect refs &&
+
+	# Case 3: top bundle does not exist, rest unbundle fine.
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = bundle-1.bundle
+		creationToken = 1
+
+	[bundle "bundle-2"]
+		uri = bundle-2.bundle
+		creationToken = 2
+
+	[bundle "bundle-3"]
+		uri = bundle-3.bundle
+		creationToken = 3
+
+	[bundle "bundle-4"]
+		uri = fake.bundle
+		creationToken = 4
+	EOF
+
+	GIT_TRACE2_EVENT="$(pwd)/trace-clone-3.txt" \
+	git clone --single-branch --branch=base \
+		--bundle-uri="$HTTPD_URL/bundle-list" \
+		"$HTTPD_URL/smart/fetch.git" download-3 &&
+
+	# As long as we have continguous successful downloads,
+	# we _do_ set these configs.
+	test_cmp_config -C download-3 "$HTTPD_URL/bundle-list" fetch.bundleuri &&
+	test_cmp_config -C download-3 3 fetch.bundlecreationtoken &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-list
+	$HTTPD_URL/fake.bundle
+	$HTTPD_URL/bundle-3.bundle
+	$HTTPD_URL/bundle-2.bundle
+	$HTTPD_URL/bundle-1.bundle
+	EOF
+	test_remote_https_urls <trace-clone-3.txt >actual &&
+	test_cmp expect actual &&
+
+	# All bundles failed to unbundle
+	git -C download-3 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
+	cat >expect <<-EOF &&
+	refs/bundles/base
+	refs/bundles/left
+	refs/bundles/right
+	EOF
+	test_cmp expect refs
+'
+
+# Expand the bundle list to include other interesting shapes, specifically
+# interesting for use when fetching from a previous state.
+#
+# ---------------- bundle-7
+#       7
+#     _/|\_
+# ---/--|--\------ bundle-6
+#   5   |   6
+# --|---|---|----- bundle-4
+#   |   4   |
+#   |  / \  /
+# --|-|---|/------ bundle-3 (the client will be caught up to this point.)
+#   \ |   3
+# ---\|---|------- bundle-2
+#     2   |
+# ----|---|------- bundle-1
+#      \ /
+#       1
+#       |
+# (previous commits)
+test_expect_success 'expand incremental bundle list' '
+	(
+		cd clone-from &&
+		git checkout -b lefter left &&
+		test_commit 5 &&
+		git checkout -b righter right &&
+		test_commit 6 &&
+		git checkout -b top lefter &&
+		git merge -m "7" merge righter &&
+
+		git bundle create bundle-6.bundle lefter righter --not left right &&
+		git bundle create bundle-7.bundle top --not lefter merge righter &&
+
+		cp bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/"
+	) &&
+	git -C "$HTTPD_DOCUMENT_ROOT_PATH/fetch.git" fetch origin +refs/heads/*:refs/heads/*
+'
+
+test_expect_success 'creationToken heuristic with failed downloads (fetch)' '
+	test_when_finished rm -rf download-* trace*.txt &&
+
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = bundle-1.bundle
+		creationToken = 1
+
+	[bundle "bundle-2"]
+		uri = bundle-2.bundle
+		creationToken = 2
+
+	[bundle "bundle-3"]
+		uri = bundle-3.bundle
+		creationToken = 3
+	EOF
+
+	git clone --single-branch --branch=left \
+		--bundle-uri="$HTTPD_URL/bundle-list" \
+		"$HTTPD_URL/smart/fetch.git" fetch-base &&
+	test_cmp_config -C fetch-base "$HTTPD_URL/bundle-list" fetch.bundleURI &&
+	test_cmp_config -C fetch-base 3 fetch.bundleCreationToken &&
+
+	# Case 1: all bundles exist: successful unbundling of all bundles
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = bundle-1.bundle
+		creationToken = 1
+
+	[bundle "bundle-2"]
+		uri = bundle-2.bundle
+		creationToken = 2
+
+	[bundle "bundle-3"]
+		uri = bundle-3.bundle
+		creationToken = 3
+
+	[bundle "bundle-4"]
+		uri = bundle-4.bundle
+		creationToken = 4
+
+	[bundle "bundle-6"]
+		uri = bundle-6.bundle
+		creationToken = 6
+
+	[bundle "bundle-7"]
+		uri = bundle-7.bundle
+		creationToken = 7
+	EOF
+
+	cp -r fetch-base fetch-1 &&
+	GIT_TRACE2_EVENT="$(pwd)/trace-fetch-1.txt" \
+		git -C fetch-1 fetch origin &&
+	test_cmp_config -C fetch-1 7 fetch.bundlecreationtoken &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-list
+	$HTTPD_URL/bundle-7.bundle
+	$HTTPD_URL/bundle-6.bundle
+	$HTTPD_URL/bundle-4.bundle
+	EOF
+	test_remote_https_urls <trace-fetch-1.txt >actual &&
+	test_cmp expect actual &&
+
+	# Check which bundles have unbundled by refs
+	git -C fetch-1 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
+	cat >expect <<-EOF &&
+	refs/bundles/base
+	refs/bundles/left
+	refs/bundles/lefter
+	refs/bundles/merge
+	refs/bundles/right
+	refs/bundles/righter
+	refs/bundles/top
+	EOF
+	test_cmp expect refs &&
+
+	# Case 2: middle bundle does not exist, only bundle-4 can unbundle
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = bundle-1.bundle
+		creationToken = 1
+
+	[bundle "bundle-2"]
+		uri = bundle-2.bundle
+		creationToken = 2
+
+	[bundle "bundle-3"]
+		uri = bundle-3.bundle
+		creationToken = 3
+
+	[bundle "bundle-4"]
+		uri = bundle-4.bundle
+		creationToken = 4
+
+	[bundle "bundle-6"]
+		uri = fake.bundle
+		creationToken = 6
+
+	[bundle "bundle-7"]
+		uri = bundle-7.bundle
+		creationToken = 7
+	EOF
+
+	cp -r fetch-base fetch-2 &&
+	GIT_TRACE2_EVENT="$(pwd)/trace-fetch-2.txt" \
+		git -C fetch-2 fetch origin &&
+
+	# Since bundle-7 fails to unbundle, do not update creation token.
+	test_cmp_config -C fetch-2 3 fetch.bundlecreationtoken &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-list
+	$HTTPD_URL/bundle-7.bundle
+	$HTTPD_URL/fake.bundle
+	$HTTPD_URL/bundle-4.bundle
+	EOF
+	test_remote_https_urls <trace-fetch-2.txt >actual &&
+	test_cmp expect actual &&
+
+	# Check which bundles have unbundled by refs
+	git -C fetch-2 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
+	cat >expect <<-EOF &&
+	refs/bundles/base
+	refs/bundles/left
+	refs/bundles/merge
+	refs/bundles/right
+	EOF
+	test_cmp expect refs &&
+
+	# Case 3: top bundle does not exist, rest unbundle fine.
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = bundle-1.bundle
+		creationToken = 1
+
+	[bundle "bundle-2"]
+		uri = bundle-2.bundle
+		creationToken = 2
+
+	[bundle "bundle-3"]
+		uri = bundle-3.bundle
+		creationToken = 3
+
+	[bundle "bundle-4"]
+		uri = bundle-4.bundle
+		creationToken = 4
+
+	[bundle "bundle-6"]
+		uri = bundle-6.bundle
+		creationToken = 6
+
+	[bundle "bundle-7"]
+		uri = fake.bundle
+		creationToken = 7
+	EOF
+
+	cp -r fetch-base fetch-3 &&
+	GIT_TRACE2_EVENT="$(pwd)/trace-fetch-3.txt" \
+		git -C fetch-3 fetch origin &&
+
+	# As long as we have continguous successful downloads,
+	# we _do_ set the maximum creation token.
+	test_cmp_config -C fetch-3 6 fetch.bundlecreationtoken &&
+
+	# NOTE: the fetch skips bundle-4 since bundle-6 successfully
+	# unbundles itself and bundle-7 failed to download.
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-list
+	$HTTPD_URL/fake.bundle
+	$HTTPD_URL/bundle-6.bundle
+	EOF
+	test_remote_https_urls <trace-fetch-3.txt >actual &&
+	test_cmp expect actual &&
+
+	# Check which bundles have unbundled by refs
+	git -C fetch-3 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
+	cat >expect <<-EOF &&
+	refs/bundles/base
+	refs/bundles/left
+	refs/bundles/lefter
+	refs/bundles/right
+	refs/bundles/righter
+	EOF
+	test_cmp expect refs
+'
+
 # Do not add tests here unless they use the HTTP server, as they will
 # not run unless the HTTP dependencies exist.
 
-- 
gitgitgadget

^ permalink raw reply related

* [PATCH v2 09/10] bundle-uri: store fetch.bundleCreationToken
From: Derrick Stolee via GitGitGadget @ 2023-01-23 15:21 UTC (permalink / raw)
  To: git
  Cc: gitster, me, vdye, avarab, steadmon, chooglen, Derrick Stolee,
	Derrick Stolee
In-Reply-To: <pull.1454.v2.git.1674487310.gitgitgadget@gmail.com>

From: Derrick Stolee <derrickstolee@github.com>

When a bundle list specifies the "creationToken" heuristic, the Git
client downloads the list and then starts downloading bundles in
descending creationToken order. This process stops as soon as all
downloaded bundles can be applied to the repository (because all
required commits are present in the repository or in the downloaded
bundles).

When checking the same bundle list twice, this strategy requires
downloading the bundle with the maximum creationToken again, which is
wasteful. The creationToken heuristic promises that the client will not
have a use for that bundle if its creationToken value is at most the
previous creationToken value.

To prevent these wasteful downloads, create a fetch.bundleCreationToken
config setting that the Git client sets after downloading bundles. This
value allows skipping that maximum bundle download when this config
value is the same value (or larger).

To test that this works correctly, we can insert some "duplicate"
fetches into existing tests and demonstrate that only the bundle list is
downloaded.

The previous logic for downloading bundles by creationToken worked even
if the bundle list was empty, but now we have logic that depends on the
first entry of the list. Terminate early in the (non-sensical) case of
an empty bundle list.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Documentation/config/fetch.txt | 16 ++++++++++++
 bundle-uri.c                   | 48 ++++++++++++++++++++++++++++++++--
 t/t5558-clone-bundle-uri.sh    | 29 +++++++++++++++++++-
 3 files changed, 90 insertions(+), 3 deletions(-)

diff --git a/Documentation/config/fetch.txt b/Documentation/config/fetch.txt
index 244f44d460f..568f0f75b30 100644
--- a/Documentation/config/fetch.txt
+++ b/Documentation/config/fetch.txt
@@ -104,3 +104,19 @@ fetch.bundleURI::
 	linkgit:git-clone[1]. `git clone --bundle-uri` will set the
 	`fetch.bundleURI` value if the supplied bundle URI contains a bundle
 	list that is organized for incremental fetches.
++
+If you modify this value and your repository has a `fetch.bundleCreationToken`
+value, then remove that `fetch.bundleCreationToken` value before fetching from
+the new bundle URI.
+
+fetch.bundleCreationToken::
+	When using `fetch.bundleURI` to fetch incrementally from a bundle
+	list that uses the "creationToken" heuristic, this config value
+	stores the maximum `creationToken` value of the downloaded bundles.
+	This value is used to prevent downloading bundles in the future
+	if the advertised `creationToken` is not strictly larger than this
+	value.
++
+The creation token values are chosen by the provider serving the specific
+bundle URI. If you modify the URI at `fetch.bundleURI`, then be sure to
+remove the value for the `fetch.bundleCreationToken` value before fetching.
diff --git a/bundle-uri.c b/bundle-uri.c
index 162a9276f31..691853b2c56 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -487,6 +487,8 @@ static int fetch_bundles_by_token(struct repository *r,
 {
 	int cur;
 	int move_direction = 0;
+	const char *creationTokenStr;
+	uint64_t maxCreationToken = 0, newMaxCreationToken = 0;
 	struct bundle_list_context ctx = {
 		.r = r,
 		.list = list,
@@ -500,8 +502,27 @@ static int fetch_bundles_by_token(struct repository *r,
 
 	for_all_bundles_in_list(list, append_bundle, &bundles);
 
+	if (!bundles.nr) {
+		free(bundles.items);
+		return 0;
+	}
+
 	QSORT(bundles.items, bundles.nr, compare_creation_token_decreasing);
 
+	/*
+	 * If fetch.bundleCreationToken exists, parses to a uint64t, and
+	 * is not strictly smaller than the maximum creation token in the
+	 * bundle list, then do not download any bundles.
+	 */
+	if (!repo_config_get_value(r,
+				   "fetch.bundlecreationtoken",
+				   &creationTokenStr) &&
+	    sscanf(creationTokenStr, "%"PRIu64, &maxCreationToken) == 1 &&
+	    bundles.items[0]->creationToken <= maxCreationToken) {
+		free(bundles.items);
+		return 0;
+	}
+
 	/*
 	 * Attempt to download and unbundle the minimum number of bundles by
 	 * creationToken in decreasing order. If we fail to unbundle (after
@@ -522,6 +543,16 @@ static int fetch_bundles_by_token(struct repository *r,
 	cur = 0;
 	while (cur >= 0 && cur < bundles.nr) {
 		struct remote_bundle_info *bundle = bundles.items[cur];
+
+		/*
+		 * If we need to dig into bundles below the previous
+		 * creation token value, then likely we are in an erroneous
+		 * state due to missing or invalid bundles. Halt the process
+		 * instead of continuing to download extra data.
+		 */
+		if (bundle->creationToken <= maxCreationToken)
+			break;
+
 		if (!bundle->file) {
 			/*
 			 * Not downloaded yet. Try downloading.
@@ -561,6 +592,9 @@ static int fetch_bundles_by_token(struct repository *r,
 				 */
 				move_direction = -1;
 				bundle->unbundled = 1;
+
+				if (bundle->creationToken > newMaxCreationToken)
+					newMaxCreationToken = bundle->creationToken;
 			}
 		}
 
@@ -575,14 +609,24 @@ stack_operation:
 		cur += move_direction;
 	}
 
-	free(bundles.items);
-
 	/*
 	 * We succeed if the loop terminates because 'cur' drops below
 	 * zero. The other case is that we terminate because 'cur'
 	 * reaches the end of the list, so we have a failure no matter
 	 * which bundles we apply from the list.
 	 */
+	if (cur < 0) {
+		struct strbuf value = STRBUF_INIT;
+		strbuf_addf(&value, "%"PRIu64"", newMaxCreationToken);
+		if (repo_config_set_multivar_gently(ctx.r,
+						    "fetch.bundleCreationToken",
+						    value.buf, NULL, 0))
+			warning(_("failed to store maximum creation token"));
+
+		strbuf_release(&value);
+	}
+
+	free(bundles.items);
 	return cur >= 0;
 }
 
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index 7deeb4b8ad1..9c2b7934b9b 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -433,6 +433,7 @@ test_expect_success 'clone incomplete bundle list (http, creationToken)' '
 		"$HTTPD_URL/smart/fetch.git" clone-token-http &&
 
 	test_cmp_config -C clone-token-http "$HTTPD_URL/bundle-list" fetch.bundleuri &&
+	test_cmp_config -C clone-token-http 1 fetch.bundlecreationtoken &&
 
 	cat >expect <<-EOF &&
 	$HTTPD_URL/bundle-list
@@ -468,6 +469,7 @@ test_expect_success 'clone incomplete bundle list (http, creationToken)' '
 	GIT_TRACE2_EVENT="$(pwd)/trace1.txt" \
 		git -C clone-token-http fetch origin --no-tags \
 		refs/heads/merge:refs/heads/merge &&
+	test_cmp_config -C clone-token-http 4 fetch.bundlecreationtoken &&
 
 	cat >expect <<-EOF &&
 	$HTTPD_URL/bundle-list
@@ -511,6 +513,7 @@ test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
 		"$HTTPD_URL/smart/fetch.git" fetch-http-4 &&
 
 	test_cmp_config -C fetch-http-4 "$HTTPD_URL/bundle-list" fetch.bundleuri &&
+	test_cmp_config -C fetch-http-4 1 fetch.bundlecreationtoken &&
 
 	cat >expect <<-EOF &&
 	$HTTPD_URL/bundle-list
@@ -538,6 +541,7 @@ test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
 		git -C fetch-http-4 fetch origin --no-tags \
 		refs/heads/left:refs/heads/left \
 		refs/heads/right:refs/heads/right &&
+	test_cmp_config -C fetch-http-4 2 fetch.bundlecreationtoken &&
 
 	cat >expect <<-EOF &&
 	$HTTPD_URL/bundle-list
@@ -555,6 +559,18 @@ test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
 	EOF
 	test_cmp expect refs &&
 
+	# No-op fetch
+	GIT_TRACE2_EVENT="$(pwd)/trace1b.txt" \
+		git -C fetch-http-4 fetch origin --no-tags \
+		refs/heads/left:refs/heads/left \
+		refs/heads/right:refs/heads/right &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-list
+	EOF
+	test_remote_https_urls <trace1b.txt >actual &&
+	test_cmp expect actual &&
+
 	cat >>"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
 	[bundle "bundle-3"]
 		uri = bundle-3.bundle
@@ -570,6 +586,7 @@ test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
 	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
 		git -C fetch-http-4 fetch origin --no-tags \
 		refs/heads/merge:refs/heads/merge &&
+	test_cmp_config -C fetch-http-4 4 fetch.bundlecreationtoken &&
 
 	cat >expect <<-EOF &&
 	$HTTPD_URL/bundle-list
@@ -588,7 +605,17 @@ test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
 	refs/bundles/left
 	refs/bundles/merge
 	EOF
-	test_cmp expect refs
+	test_cmp expect refs &&
+
+	# No-op fetch
+	GIT_TRACE2_EVENT="$(pwd)/trace2b.txt" \
+		git -C fetch-http-4 fetch origin &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-list
+	EOF
+	test_remote_https_urls <trace2b.txt >actual &&
+	test_cmp expect actual
 '
 
 # Do not add tests here unless they use the HTTP server, as they will
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH v2 00/10] Bundle URIs V: creationToken heuristic for incremental fetches
From: Derrick Stolee via GitGitGadget @ 2023-01-23 15:21 UTC (permalink / raw)
  To: git; +Cc: gitster, me, vdye, avarab, steadmon, chooglen, Derrick Stolee
In-Reply-To: <pull.1454.git.1673037405.gitgitgadget@gmail.com>

This fifth part to the bundle URIs feature follows part IV (advertising via
protocol v2) which recently merged to 'master', so this series is based on
'master'.

This part introduces the concept of a heuristic that a bundle list can
advertise. The purpose of the heuristic is to hint to the Git client that
the bundles can be downloaded and unbundled in a certain order. In
particular, that order can assist with using the same bundle URI to download
new bundles from an updated bundle list. This allows bundle URIs to assist
with incremental fetches, not just initial clones.

The only planned heuristic is the "creationToken" heuristic where the bundle
list adds a 64-bit unsigned integer "creationToken" value to each bundle in
the list. Those values provide an ordering on the bundles implying that the
bundles can be unbundled in increasing creationToken order and at each point
the required commits for the ith bundle were provided by bundles with lower
creationTokens.

At clone time, the only difference implied by the creationToken order is
that the Git client does not need to guess at the order to apply the
bundles, but instead can use the creationToken order to apply them without
failure and retry. However, this presents an interesting benefit during
fetches: the Git client can check the bundle list and download bundles in
decreasing creationToken order until the required commits for these bundles
are present within the repository's object store. This prevents downloading
more bundle information than required.

The creationToken value is also a promise that the Git client will not need
to download a bundle if its creationToken is less than or equal to the
creationToken of a previously-downloaded bundle. This further improves the
performance during a fetch in that the client does not need to download any
bundles at all if it recognizes that the maximum creationToken is the same
(or smaller than) a previously-downloaded creationToken.

The creationToken concept is documented in the existing design document at
Documentation/technical/bundle-uri.txt, including suggested ways for bundle
providers to organize their bundle lists to take advantage of the heuristic.

This series formalizes the creationToken heuristic and the Git client logic
for understanding it. Further, for bundle lists provided by the git clone
--bundle-uri option, the Git client will recognize the heuristic as being
helpful for incremental fetches and store config values so that future git
fetch commands check the bundle list before communicating with any Git
remotes.

Note that this option does not integrate fetches with bundle lists
advertised via protocol v2. I spent some time working on this, but found the
implementation to be distinct enough that it merited its own attention in a
separate series. In particular, the configuration for indicating that a
fetch should check the bundle-uri protocol v2 command seemed best to be
located within a Git remote instead of a repository-global key such as is
being used for a static URI. Further, the timing of querying the bundle-uri
command during a git fetch command is significantly different and more
complicated than how it is used in git clone.


What Remains?
=============

Originally, I had planned on making this bundle URI work a 5-part series,
and this is part 5. Shouldn't we be done now?

There are two main things that should be done after this series, in any
order:

 * Teach git fetch to check a bundle list advertised by a remote over the
   bundle-uri protocol v2 command.
 * Add the bundle.<id>.filter option to allow advertising bundles and
   partial bundles side-by-side.

There is also room for expanding tests for more error conditions, or for
other tweaks that are not currently part of the design document. I do think
that after this series, the feature will be easier to work on different
parts in parallel.


Patch Outline
=============

 * (New in v2) Patch 1 adds a new VERIFY_BUNDLE_SKIP_REACHABLE flag for
   verify_bundle() which is called by unbundle(); this fixes a probable
   exposed by patch 10 where a bundle would fail to unbundle due to the "are
   the required commits reachable from refs?" check.
 * Patch 2 creates a test setup demonstrating a creationToken heuristic. At
   this point, the Git client ignores the heuristic and uses its ad-hoc
   strategy for ordering the bundles.
 * Patches 3 and 4 teach Git to parse the bundle.heuristic and
   bundle.<id>.creationToken keys in a bundle list.
 * Patch 5 teaches Git to download bundles using the creationToken order.
   This order uses a stack approach to start from the maximum creationToken
   and continue downloading the next bundle in the list until all bundles
   can successfully be unbundled. This is the algorithm required for
   incremental fetches, while initial clones could download in the opposite
   order. Since clones will download all bundles anyway, having a second
   code path just for clones seemed unnecessary.
 * Patch 6 teaches git clone --bundle-uri to set fetch.bundleURI when the
   advertised bundle list includs a heuristic that Git understands.
 * Patch 7 updates the design document to remove reference to a bundle.flag
   option that was previously going to indicate the list was designed for
   fetches, but the bundle.heuristic option already does that.
 * Patch 8 teaches git fetch to check fetch.bundleURI and download bundles
   from that static URI before connecting to remotes via the Git protocol.
 * Patch 9 introduces a new fetch.bundleCreationToken config value to store
   the maximum creationToken of downloaded bundles. This prevents
   downloading the latest bundle on every git fetch command, reducing waste.
 * (New in v2) Patch 10 adds new tests for interesting incremental fetch
   shapes. Along with other test edits in other patches, these revealed
   several issues that required improvement within this series. These tests
   also check extra cases around failed bundle downloads.


Updates in v2
=============

 * Patches 1 and 10 are new.
 * I started making the extra tests in patch 10 due to Victoria's concern
   around failed downloads. I extended the bundle list in a way that exposed
   other issues that are fixed in this version. Unfortunately, the test
   requires the full functionality of the entire series, so the tests are
   not isolated to where the code fixes are made. One thing that I noticed
   in the process is that some of the tests were using the local-clone trick
   to copy full object directories instead of copying only the requested
   object set. This was causing confusion in how the bundles were applying
   or failing to apply, so the tests are updated to use http whenever
   possible.
 * In Patch 2, I created a new test_remote_https_urls helper to get the full
   download list (in order). In this patch, the bundle download order is not
   well-defined, but is modified in later tests when it becomes
   well-defined.
 * In Patch 3, I updated the connection between config value and enum value
   to be an array of pairs instead of faking a hashmap-like interface that
   could be dangerous if the enum values were assigned incorrectly.
 * In Patch 5, the 'sorted' list and its type was renamed to be more
   descriptive. This also included updates to "append_bundle()" and
   "compare_creation_token_decreasing()" to be more descriptive. This had
   some side effects in Patch 8 due to the renames.
 * In Patch 5, I added the interesting bundle shape to the commit message to
   remind us of why the creationToken algorithm needs to be the way it is. I
   also removed the "stack" language in favor of discussing ranges of the
   sorted list. Some renames, such as "pop_or_push" is changed to
   "move_direction", resulted from this change of language.
 * The assignment of heuristic from the local list to global_list was moved
   into Patch 5.
 * In Patch 5, one of the tests removed bundle-2 because it allows a later
   test for git fetch to demonstrate the interesting behavior where bundle-4
   requires both bundle-2 and bundle-3.
 * In Patch 6, the fetch.bundleURI config is described differently,
   including dropping the defunct git fetch --bundle-uri reference and
   discussing that git clone --bundle-uri will set it automatically.
 * Patch 8 no longer refers to a config value starting with "remote:". It
   also expands a test that was previously not expanded in v1.
 * Patch 9 updates the documentation for fetch.bundleURI and
   fetch.bundleCreationToken to describe how the user should unset the
   latter if they edit the former.
 * Much of Patch 9's changes are due to context changes from the renames in
   Patch 5. However, it also adds the restriction that it will not attempt
   to download bundles unless their creationToken is strictly greater than
   the stored token. This ends up being critical to the failed download
   case, preventing an incremental fetch from downloading all bundles just
   because one bundle failed to download (and that case is tested in patch
   10).
 * Patch 10 adds significant testing, including several tests of failed
   bundle downloads in various cases.

Thanks,

 * Stolee

Derrick Stolee (10):
  bundle: optionally skip reachability walk
  t5558: add tests for creationToken heuristic
  bundle-uri: parse bundle.heuristic=creationToken
  bundle-uri: parse bundle.<id>.creationToken values
  bundle-uri: download in creationToken order
  clone: set fetch.bundleURI if appropriate
  bundle-uri: drop bundle.flag from design doc
  fetch: fetch from an external bundle URI
  bundle-uri: store fetch.bundleCreationToken
  bundle-uri: test missing bundles with heuristic

 Documentation/config/bundle.txt        |   7 +
 Documentation/config/fetch.txt         |  24 +
 Documentation/technical/bundle-uri.txt |   8 +-
 builtin/clone.c                        |   6 +-
 builtin/fetch.c                        |   7 +
 bundle-uri.c                           | 257 +++++++++-
 bundle-uri.h                           |  28 +-
 bundle.c                               |   3 +-
 bundle.h                               |   1 +
 t/t5558-clone-bundle-uri.sh            | 672 ++++++++++++++++++++++++-
 t/t5601-clone.sh                       |  46 ++
 t/t5750-bundle-uri-parse.sh            |  37 ++
 t/test-lib-functions.sh                |   8 +
 13 files changed, 1091 insertions(+), 13 deletions(-)


base-commit: 4dbebc36b0893f5094668ddea077d0e235560b16
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1454%2Fderrickstolee%2Fbundle-redo%2FcreationToken-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1454/derrickstolee/bundle-redo/creationToken-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1454

Range-diff vs v1:

  -:  ----------- >  1:  b3828725bc8 bundle: optionally skip reachability walk
  1:  39eed914878 !  2:  427aff4d5e5 t5558: add tests for creationToken heuristic
     @@ Commit message
          meantime, create tests that add creation tokens to the bundle list. For
          now, the Git client correctly ignores these unknown keys.
      
     +    Create a new test helper function, test_remote_https_urls, which filters
     +    GIT_TRACE2_EVENT output to extract a list of URLs passed to
     +    git-remote-https child processes. This can be used to verify the order
     +    of these requests as we implement the creationToken heuristic. For now,
     +    we need to sort the actual output since the current client does not have
     +    a well-defined order that it applies to the bundles.
     +
          Signed-off-by: Derrick Stolee <derrickstolee@github.com>
      
       ## t/t5558-clone-bundle-uri.sh ##
      @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone HTTP bundle' '
     - 	test_config -C clone-http log.excludedecoration refs/bundle/
       '
       
     -+# usage: test_bundle_downloaded <bundle-name> <trace-file>
     -+test_bundle_downloaded () {
     -+	cat >pattern <<-EOF &&
     -+	"event":"child_start".*"argv":\["git-remote-https","$HTTPD_URL/$1"\]
     -+	EOF
     -+	grep -f pattern "$2"
     -+}
     -+
       test_expect_success 'clone bundle list (HTTP, no heuristic)' '
      +	test_when_finished rm -f trace*.txt &&
      +
     @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone bundle list (HTTP, no he
      -	git -C clone-list-http cat-file --batch-check <oids
      +	git -C clone-list-http cat-file --batch-check <oids &&
      +
     -+	for b in 1 2 3 4
     -+	do
     -+		test_bundle_downloaded bundle-$b.bundle trace-clone.txt ||
     -+			return 1
     -+	done
     ++	cat >expect <<-EOF &&
     ++	$HTTPD_URL/bundle-1.bundle
     ++	$HTTPD_URL/bundle-2.bundle
     ++	$HTTPD_URL/bundle-3.bundle
     ++	$HTTPD_URL/bundle-4.bundle
     ++	$HTTPD_URL/bundle-list
     ++	EOF
     ++
     ++	# Sort the list, since the order is not well-defined
     ++	# without a heuristic.
     ++	test_remote_https_urls <trace-clone.txt | sort >actual &&
     ++	test_cmp expect actual
       '
       
       test_expect_success 'clone bundle list (HTTP, any mode)' '
     @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone bundle list (HTTP, any m
       '
       
      +test_expect_success 'clone bundle list (http, creationToken)' '
     ++	test_when_finished rm -f trace*.txt &&
     ++
      +	cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" &&
      +	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
      +	[bundle]
     @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone bundle list (HTTP, any m
      +		creationToken = 4
      +	EOF
      +
     -+	git clone --bundle-uri="$HTTPD_URL/bundle-list" . clone-list-http-2 &&
     ++	GIT_TRACE2_EVENT="$(pwd)/trace-clone.txt" git \
     ++		clone --bundle-uri="$HTTPD_URL/bundle-list" \
     ++		"$HTTPD_URL/smart/fetch.git" clone-list-http-2 &&
      +
      +	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
     -+	git -C clone-list-http-2 cat-file --batch-check <oids
     ++	git -C clone-list-http-2 cat-file --batch-check <oids &&
     ++
     ++	cat >expect <<-EOF &&
     ++	$HTTPD_URL/bundle-1.bundle
     ++	$HTTPD_URL/bundle-2.bundle
     ++	$HTTPD_URL/bundle-3.bundle
     ++	$HTTPD_URL/bundle-4.bundle
     ++	$HTTPD_URL/bundle-list
     ++	EOF
     ++
     ++	# Since the creationToken heuristic is not yet understood by the
     ++	# client, the order cannot be verified at this moment. Sort the
     ++	# list for consistent results.
     ++	test_remote_https_urls <trace-clone.txt | sort >actual &&
     ++	test_cmp expect actual
      +'
      +
       # Do not add tests here unless they use the HTTP server, as they will
       # not run unless the HTTP dependencies exist.
       
     +
     + ## t/test-lib-functions.sh ##
     +@@ t/test-lib-functions.sh: test_region () {
     + 	return 0
     + }
     + 
     ++# Given a GIT_TRACE2_EVENT log over stdin, writes to stdout a list of URLs
     ++# sent to git-remote-https child processes.
     ++test_remote_https_urls() {
     ++	grep -e '"event":"child_start".*"argv":\["git-remote-https",".*"\]' |
     ++		sed -e 's/{"event":"child_start".*"argv":\["git-remote-https","//g' \
     ++		    -e 's/"\]}//g'
     ++}
     ++
     + # Print the destination of symlink(s) provided as arguments. Basically
     + # the same as the readlink command, but it's not available everywhere.
     + test_readlink () {
  2:  9007249b948 !  3:  f6f8197c9cc bundle-uri: parse bundle.heuristic=creationToken
     @@ Commit message
          bundle-uri' to print the heuristic value and verify that the parsing
          works correctly.
      
     +    As an extra precaution, create the internal 'heuristics' array to be a
     +    list of (enum, string) pairs so we can iterate through the array entries
     +    carefully, regardless of the enum values.
     +
          Signed-off-by: Derrick Stolee <derrickstolee@github.com>
      
       ## Documentation/config/bundle.txt ##
     @@ bundle-uri.c
       #include "config.h"
       #include "remote.h"
       
     -+static const char *heuristics[] = {
     -+	[BUNDLE_HEURISTIC_NONE] = "",
     -+	[BUNDLE_HEURISTIC_CREATIONTOKEN] = "creationToken",
     ++static struct {
     ++	enum bundle_list_heuristic heuristic;
     ++	const char *name;
     ++} heuristics[BUNDLE_HEURISTIC__COUNT] = {
     ++	{ BUNDLE_HEURISTIC_NONE, ""},
     ++	{ BUNDLE_HEURISTIC_CREATIONTOKEN, "creationToken" },
      +};
      +
       static int compare_bundles(const void *hashmap_cmp_fn_data,
     @@ bundle-uri.c: void print_bundle_list(FILE *fp, struct bundle_list *list)
       	fprintf(fp, "\tversion = %d\n", list->version);
       	fprintf(fp, "\tmode = %s\n", mode);
       
     -+	if (list->heuristic)
     -+		printf("\theuristic = %s\n", heuristics[list->heuristic]);
     ++	if (list->heuristic) {
     ++		int i;
     ++		for (i = 0; i < BUNDLE_HEURISTIC__COUNT; i++) {
     ++			if (heuristics[i].heuristic == list->heuristic) {
     ++				printf("\theuristic = %s\n",
     ++				       heuristics[list->heuristic].name);
     ++				break;
     ++			}
     ++		}
     ++	}
      +
       	for_all_bundles_in_list(list, summarize_bundle, fp);
       }
     @@ bundle-uri.c: static int bundle_list_update(const char *key, const char *value,
      +		if (!strcmp(subkey, "heuristic")) {
      +			int i;
      +			for (i = 0; i < BUNDLE_HEURISTIC__COUNT; i++) {
     -+				if (!strcmp(value, heuristics[i])) {
     -+					list->heuristic = i;
     ++				if (heuristics[i].heuristic &&
     ++				    heuristics[i].name &&
     ++				    !strcmp(value, heuristics[i].name)) {
     ++					list->heuristic = heuristics[i].heuristic;
      +					return 0;
      +				}
      +			}
     @@ bundle-uri.h: enum bundle_list_mode {
      +	BUNDLE_HEURISTIC_CREATIONTOKEN,
      +
      +	/* Must be last. */
     -+	BUNDLE_HEURISTIC__COUNT,
     ++	BUNDLE_HEURISTIC__COUNT
      +};
      +
       /**
  3:  a1808f0b10c =  4:  12efa228d04 bundle-uri: parse bundle.<id>.creationToken values
  4:  57c0174d375 !  5:  7cfaa3c518c bundle-uri: download in creationToken order
     @@ Commit message
          strategy implemented here provides that short-circuit where the client
          downloads a minimal set of bundles.
      
     +    However, we are not satisfied by the naive approach of downloading
     +    bundles until one successfully unbundles, expecting the earlier bundles
     +    to successfully unbundle now. The example repository in t5558
     +    demonstrates this well:
     +
     +     ---------------- bundle-4
     +
     +           4
     +          / \
     +     ----|---|------- bundle-3
     +         |   |
     +         |   3
     +         |   |
     +     ----|---|------- bundle-2
     +         |   |
     +         2   |
     +         |   |
     +     ----|---|------- bundle-1
     +          \ /
     +           1
     +           |
     +     (previous commits)
     +
     +    In this repository, if we already have the objects for bundle-1 and then
     +    try to fetch from this list, the naive approach will fail. bundle-4
     +    requires both bundle-3 and bundle-2, though bundle-3 will successfully
     +    unbundle without bundle-2. Thus, the algorithm needs to keep this in
     +    mind.
     +
          A later implementation detail will store the maximum creationToken seen
          during such a bundle download, and the client will avoid downloading a
          bundle unless its creationToken is strictly greater than that stored
     @@ bundle-uri.c: static int download_bundle_to_file(struct remote_bundle_info *bund
       	return 0;
       }
       
     -+struct sorted_bundle_list {
     ++struct bundles_for_sorting {
      +	struct remote_bundle_info **items;
      +	size_t alloc;
      +	size_t nr;
      +};
      +
     -+static int insert_bundle(struct remote_bundle_info *bundle, void *data)
     ++static int append_bundle(struct remote_bundle_info *bundle, void *data)
      +{
     -+	struct sorted_bundle_list *list = data;
     ++	struct bundles_for_sorting *list = data;
      +	list->items[list->nr++] = bundle;
      +	return 0;
      +}
      +
     -+static int compare_creation_token(const void *va, const void *vb)
     ++/**
     ++ * For use in QSORT() to get a list sorted by creationToken
     ++ * in decreasing order.
     ++ */
     ++static int compare_creation_token_decreasing(const void *va, const void *vb)
      +{
      +	const struct remote_bundle_info * const *a = va;
      +	const struct remote_bundle_info * const *b = vb;
     @@ bundle-uri.c: static int download_bundle_to_file(struct remote_bundle_info *bund
      +				  struct bundle_list *list)
      +{
      +	int cur;
     -+	int pop_or_push = 0;
     ++	int move_direction = 0;
      +	struct bundle_list_context ctx = {
      +		.r = r,
      +		.list = list,
      +		.mode = list->mode,
      +	};
     -+	struct sorted_bundle_list sorted = {
     ++	struct bundles_for_sorting bundles = {
      +		.alloc = hashmap_get_size(&list->bundles),
      +	};
      +
     -+	ALLOC_ARRAY(sorted.items, sorted.alloc);
     ++	ALLOC_ARRAY(bundles.items, bundles.alloc);
      +
     -+	for_all_bundles_in_list(list, insert_bundle, &sorted);
     ++	for_all_bundles_in_list(list, append_bundle, &bundles);
      +
     -+	QSORT(sorted.items, sorted.nr, compare_creation_token);
     ++	QSORT(bundles.items, bundles.nr, compare_creation_token_decreasing);
      +
      +	/*
     -+	 * Use a stack-based approach to download the bundles and attempt
     -+	 * to unbundle them in decreasing order by creation token. If we
     -+	 * fail to unbundle (after a successful download) then move to the
     -+	 * next non-downloaded bundle (push to the stack) and attempt
     -+	 * downloading. Once we succeed in applying a bundle, move to the
     -+	 * previous unapplied bundle (pop the stack) and attempt to unbundle
     -+	 * it again.
     ++	 * Attempt to download and unbundle the minimum number of bundles by
     ++	 * creationToken in decreasing order. If we fail to unbundle (after
     ++	 * a successful download) then move to the next non-downloaded bundle
     ++	 * and attempt downloading. Once we succeed in applying a bundle,
     ++	 * move to the previous unapplied bundle and attempt to unbundle it
     ++	 * again.
      +	 *
      +	 * In the case of a fresh clone, we will likely download all of the
      +	 * bundles before successfully unbundling the oldest one, then the
     @@ bundle-uri.c: static int download_bundle_to_file(struct remote_bundle_info *bund
      +	 * repo's object store.
      +	 */
      +	cur = 0;
     -+	while (cur >= 0 && cur < sorted.nr) {
     -+		struct remote_bundle_info *bundle = sorted.items[cur];
     ++	while (cur >= 0 && cur < bundles.nr) {
     ++		struct remote_bundle_info *bundle = bundles.items[cur];
      +		if (!bundle->file) {
     -+			/* Not downloaded yet. Try downloading. */
     -+			if (download_bundle_to_file(bundle, &ctx)) {
     -+				/* Failure. Push to the stack. */
     -+				pop_or_push = 1;
     ++			/*
     ++			 * Not downloaded yet. Try downloading.
     ++			 *
     ++			 * Note that bundle->file is non-NULL if a download
     ++			 * was attempted, even if it failed to download.
     ++			 */
     ++			if (fetch_bundle_uri_internal(ctx.r, bundle, ctx.depth + 1, ctx.list)) {
     ++				/* Mark as unbundled so we do not retry. */
     ++				bundle->unbundled = 1;
     ++
     ++				/* Try looking deeper in the list. */
     ++				move_direction = 1;
      +				goto stack_operation;
      +			}
      +
     @@ bundle-uri.c: static int download_bundle_to_file(struct remote_bundle_info *bund
      +			 * unbundled. Try unbundling again.
      +			 */
      +			if (unbundle_from_file(ctx.r, bundle->file)) {
     -+				/* Failed to unbundle. Push to stack. */
     -+				pop_or_push = 1;
     ++				/* Try looking deeper in the list. */
     ++				move_direction = 1;
      +			} else {
     -+				/* Succeeded in unbundle. Pop stack. */
     -+				pop_or_push = -1;
     ++				/*
     ++				 * Succeeded in unbundle. Retry bundles
     ++				 * that previously failed to unbundle.
     ++				 */
     ++				move_direction = -1;
     ++				bundle->unbundled = 1;
      +			}
      +		}
      +
     @@ bundle-uri.c: static int download_bundle_to_file(struct remote_bundle_info *bund
      +
      +stack_operation:
      +		/* Move in the specified direction and repeat. */
     -+		cur += pop_or_push;
     ++		cur += move_direction;
      +	}
      +
     -+	free(sorted.items);
     ++	free(bundles.items);
      +
      +	/*
      +	 * We succeed if the loop terminates because 'cur' drops below
     @@ bundle-uri.c: static int fetch_bundle_list_in_config_format(struct repository *r
      +	 * it advertises are expected to be bundles, not nested lists.
      +	 * We can drop 'global_list' and 'depth'.
      +	 */
     -+	if (list_from_bundle.heuristic == BUNDLE_HEURISTIC_CREATIONTOKEN)
     ++	if (list_from_bundle.heuristic == BUNDLE_HEURISTIC_CREATIONTOKEN) {
      +		result = fetch_bundles_by_token(r, &list_from_bundle);
     -+	else if ((result = download_bundle_list(r, &list_from_bundle,
     ++		global_list->heuristic = BUNDLE_HEURISTIC_CREATIONTOKEN;
     ++	} else if ((result = download_bundle_list(r, &list_from_bundle,
       					   global_list, depth)))
       		goto cleanup;
       
     @@ bundle-uri.c: int fetch_bundle_list(struct repository *r, struct bundle_list *li
       	for_all_bundles_in_list(&global_list, unlink_bundle, NULL);
      
       ## t/t5558-clone-bundle-uri.sh ##
     -@@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone bundle list (HTTP, any mode)' '
     - '
     - 
     - test_expect_success 'clone bundle list (http, creationToken)' '
     -+	test_when_finished rm -f trace*.txt &&
     -+
     - 	cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" &&
     - 	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
     - 	[bundle]
      @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone bundle list (http, creationToken)' '
     - 		creationToken = 4
     - 	EOF
     + 	git -C clone-list-http-2 cat-file --batch-check <oids &&
       
     --	git clone --bundle-uri="$HTTPD_URL/bundle-list" . clone-list-http-2 &&
     -+	GIT_TRACE2_EVENT=$(pwd)/trace-clone.txt \
     -+	git clone --bundle-uri="$HTTPD_URL/bundle-list" \
     -+		clone-from clone-list-http-2 &&
     - 
     - 	git -C clone-from for-each-ref --format="%(objectname)" >oids &&
     --	git -C clone-list-http-2 cat-file --batch-check <oids
     -+	git -C clone-list-http-2 cat-file --batch-check <oids &&
     -+
     -+	for b in 1 2 3 4
     -+	do
     -+		test_bundle_downloaded bundle-$b.bundle trace-clone.txt ||
     -+			return 1
     -+	done
     + 	cat >expect <<-EOF &&
     +-	$HTTPD_URL/bundle-1.bundle
     +-	$HTTPD_URL/bundle-2.bundle
     +-	$HTTPD_URL/bundle-3.bundle
     ++	$HTTPD_URL/bundle-list
     + 	$HTTPD_URL/bundle-4.bundle
     ++	$HTTPD_URL/bundle-3.bundle
     ++	$HTTPD_URL/bundle-2.bundle
     ++	$HTTPD_URL/bundle-1.bundle
     ++	EOF
     ++
     ++	test_remote_https_urls <trace-clone.txt >actual &&
     ++	test_cmp expect actual
      +'
      +
     -+test_expect_success 'clone bundle list (http, creationToken)' '
     ++test_expect_success 'clone incomplete bundle list (http, creationToken)' '
      +	test_when_finished rm -f trace*.txt &&
      +
      +	cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" &&
     @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone bundle list (http, creat
      +	[bundle "bundle-1"]
      +		uri = bundle-1.bundle
      +		creationToken = 1
     -+
     -+	[bundle "bundle-2"]
     -+		uri = bundle-2.bundle
     -+		creationToken = 2
      +	EOF
      +
      +	GIT_TRACE2_EVENT=$(pwd)/trace-clone.txt \
      +	git clone --bundle-uri="$HTTPD_URL/bundle-list" \
     -+		clone-from clone-token-http &&
     ++		--single-branch --branch=base --no-tags \
     ++		"$HTTPD_URL/smart/fetch.git" clone-token-http &&
      +
     -+	test_bundle_downloaded bundle-1.bundle trace-clone.txt &&
     -+	test_bundle_downloaded bundle-2.bundle trace-clone.txt
     ++	cat >expect <<-EOF &&
     + 	$HTTPD_URL/bundle-list
     ++	$HTTPD_URL/bundle-1.bundle
     + 	EOF
     + 
     +-	# Since the creationToken heuristic is not yet understood by the
     +-	# client, the order cannot be verified at this moment. Sort the
     +-	# list for consistent results.
     +-	test_remote_https_urls <trace-clone.txt | sort >actual &&
     ++	test_remote_https_urls <trace-clone.txt >actual &&
     + 	test_cmp expect actual
       '
       
     - # Do not add tests here unless they use the HTTP server, as they will
      
       ## t/t5601-clone.sh ##
      @@ t/t5601-clone.sh: test_expect_success 'auto-discover multiple bundles from HTTP clone' '
       	grep -f pattern trace.txt
       '
       
     -+# Usage: test_bundle_downloaded <bundle-id> <trace-filename>
     -+test_bundle_downloaded () {
     -+	cat >pattern <<-EOF &&
     -+	"event":"child_start".*"argv":\["git-remote-https","$HTTPD_URL/$1.bundle"\]
     -+	EOF
     -+	grep -f pattern "$2"
     -+}
     -+
      +test_expect_success 'auto-discover multiple bundles from HTTP clone: creationToken heuristic' '
      +	test_when_finished rm -rf "$HTTPD_DOCUMENT_ROOT_PATH/repo4.git" &&
      +	test_when_finished rm -rf clone-heuristic trace*.txt &&
     @@ t/t5601-clone.sh: test_expect_success 'auto-discover multiple bundles from HTTP
      +		    -c transfer.bundleURI=true clone \
      +		"$HTTPD_URL/smart/repo4.git" clone-heuristic &&
      +
     -+	# We should fetch all bundles
     -+	for b in everything new newest
     -+	do
     -+		test_bundle_downloaded $b trace-clone.txt || return 1
     -+	done
     ++	cat >expect <<-EOF &&
     ++	$HTTPD_URL/newest.bundle
     ++	$HTTPD_URL/new.bundle
     ++	$HTTPD_URL/everything.bundle
     ++	EOF
     ++
     ++	# We should fetch all bundles in the expected order.
     ++	test_remote_https_urls <trace-clone.txt >actual &&
     ++	test_cmp expect actual
      +'
      +
       # DO NOT add non-httpd-specific tests here, because the last part of this
  5:  d9c6f50e4f2 !  6:  17c404c1b83 clone: set fetch.bundleURI if appropriate
     @@ Documentation/config/fetch.txt: fetch.writeCommitGraph::
       	`git push -f`, and `git log --graph`. Defaults to false.
      +
      +fetch.bundleURI::
     -+	This value stores a URI for fetching Git object data from a bundle URI
     -+	before performing an incremental fetch from the origin Git server. If
     -+	the value is `<uri>` then running `git fetch <args>` is equivalent to
     -+	first running `git fetch --bundle-uri=<uri>` immediately before
     -+	`git fetch <args>`. See details of the `--bundle-uri` option in
     -+	linkgit:git-fetch[1].
     ++	This value stores a URI for downloading Git object data from a bundle
     ++	URI before performing an incremental fetch from the origin Git server.
     ++	This is similar to how the `--bundle-uri` option behaves in
     ++	linkgit:git-clone[1]. `git clone --bundle-uri` will set the
     ++	`fetch.bundleURI` value if the supplied bundle URI contains a bundle
     ++	list that is organized for incremental fetches.
      
       ## builtin/clone.c ##
      @@ builtin/clone.c: int cmd_clone(int argc, const char **argv, const char *prefix)
     @@ builtin/clone.c: int cmd_clone(int argc, const char **argv, const char *prefix)
       	strvec_push(&transport_ls_refs_options.ref_prefixes, "HEAD");
      
       ## bundle-uri.c ##
     -@@ bundle-uri.c: static int fetch_bundle_list_in_config_format(struct repository *r,
     - 	 * it advertises are expected to be bundles, not nested lists.
     - 	 * We can drop 'global_list' and 'depth'.
     - 	 */
     --	if (list_from_bundle.heuristic == BUNDLE_HEURISTIC_CREATIONTOKEN)
     -+	if (list_from_bundle.heuristic == BUNDLE_HEURISTIC_CREATIONTOKEN) {
     - 		result = fetch_bundles_by_token(r, &list_from_bundle);
     --	else if ((result = download_bundle_list(r, &list_from_bundle,
     -+		global_list->heuristic = BUNDLE_HEURISTIC_CREATIONTOKEN;
     -+	} else if ((result = download_bundle_list(r, &list_from_bundle,
     - 					   global_list, depth)))
     - 		goto cleanup;
     - 
      @@ bundle-uri.c: static int unlink_bundle(struct remote_bundle_info *info, void *data)
       	return 0;
       }
     @@ bundle-uri.h: int bundle_uri_parse_config_format(const char *uri,
        * Given a bundle list that was already advertised (likely by the
      
       ## t/t5558-clone-bundle-uri.sh ##
     -@@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone bundle list (http, creationToken)' '
     - 	test_bundle_downloaded bundle-2.bundle trace-clone.txt
     +@@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone incomplete bundle list (http, creationToken)' '
     + 		--single-branch --branch=base --no-tags \
     + 		"$HTTPD_URL/smart/fetch.git" clone-token-http &&
     + 
     ++	test_cmp_config -C clone-token-http "$HTTPD_URL/bundle-list" fetch.bundleuri &&
     ++
     + 	cat >expect <<-EOF &&
     + 	$HTTPD_URL/bundle-list
     + 	$HTTPD_URL/bundle-1.bundle
     +@@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone incomplete bundle list (http, creationToken)' '
     + 	test_cmp expect actual
       '
       
      +test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
     @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone bundle list (http, creat
      +
      +	test_cmp_config -C fetch-http-4 "$HTTPD_URL/bundle-list" fetch.bundleuri &&
      +
     -+	# The clone should copy two files: the list and bundle-1.
     -+	test_bundle_downloaded bundle-list trace-clone.txt &&
     -+	test_bundle_downloaded bundle-1.bundle trace-clone.txt &&
     ++	cat >expect <<-EOF &&
     ++	$HTTPD_URL/bundle-list
     ++	$HTTPD_URL/bundle-1.bundle
     ++	EOF
     ++
     ++	test_remote_https_urls <trace-clone.txt >actual &&
     ++	test_cmp expect actual &&
      +
      +	# only received base ref from bundle-1
      +	git -C fetch-http-4 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
  6:  afcfd27a883 =  7:  d491070efed bundle-uri: drop bundle.flag from design doc
  7:  1627fc158b1 !  8:  59e57e04968 fetch: fetch from an external bundle URI
     @@ builtin/fetch.c: int cmd_fetch(int argc, const char **argv, const char *prefix)
       	if (dry_run)
       		write_fetch_head = 0;
       
     -+	if (!git_config_get_string_tmp("fetch.bundleuri", &bundle_uri) &&
     -+	    !starts_with(bundle_uri, "remote:")) {
     ++	if (!git_config_get_string_tmp("fetch.bundleuri", &bundle_uri)) {
      +		if (fetch_bundle_uri(the_repository, bundle_uri, NULL))
      +			warning(_("failed to fetch bundles from '%s'"), bundle_uri);
      +	}
     @@ builtin/fetch.c: int cmd_fetch(int argc, const char **argv, const char *prefix)
       			die(_("fetch --all does not take a repository argument"));
      
       ## t/t5558-clone-bundle-uri.sh ##
     +@@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone incomplete bundle list (http, creationToken)' '
     + 	EOF
     + 
     + 	test_remote_https_urls <trace-clone.txt >actual &&
     +-	test_cmp expect actual
     ++	test_cmp expect actual &&
     ++
     ++	# We now have only one bundle ref.
     ++	git -C clone-token-http for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
     ++	cat >expect <<-\EOF &&
     ++	refs/bundles/base
     ++	EOF
     ++	test_cmp expect refs &&
     ++
     ++	# Add remaining bundles, exercising the "deepening" strategy
     ++	# for downloading via the creationToken heurisitc.
     ++	cat >>"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
     ++	[bundle "bundle-2"]
     ++		uri = bundle-2.bundle
     ++		creationToken = 2
     ++
     ++	[bundle "bundle-3"]
     ++		uri = bundle-3.bundle
     ++		creationToken = 3
     ++
     ++	[bundle "bundle-4"]
     ++		uri = bundle-4.bundle
     ++		creationToken = 4
     ++	EOF
     ++
     ++	GIT_TRACE2_EVENT="$(pwd)/trace1.txt" \
     ++		git -C clone-token-http fetch origin --no-tags \
     ++		refs/heads/merge:refs/heads/merge &&
     ++
     ++	cat >expect <<-EOF &&
     ++	$HTTPD_URL/bundle-list
     ++	$HTTPD_URL/bundle-4.bundle
     ++	$HTTPD_URL/bundle-3.bundle
     ++	$HTTPD_URL/bundle-2.bundle
     ++	EOF
     ++
     ++	test_remote_https_urls <trace1.txt >actual &&
     ++	test_cmp expect actual &&
     ++
     ++	# We now have all bundle refs.
     ++	git -C clone-token-http for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
     ++
     ++	cat >expect <<-\EOF &&
     ++	refs/bundles/base
     ++	refs/bundles/left
     ++	refs/bundles/merge
     ++	refs/bundles/right
     ++	EOF
     ++	test_cmp expect refs
     + '
     + 
     + test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
      @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
       	cat >expect <<-\EOF &&
       	refs/bundles/base
     @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'http clone with bundle.heurist
      +		refs/heads/left:refs/heads/left \
      +		refs/heads/right:refs/heads/right &&
      +
     -+	# This fetch should copy two files: the list and bundle-2.
     -+	test_bundle_downloaded bundle-list trace1.txt &&
     -+	test_bundle_downloaded bundle-2.bundle trace1.txt &&
     -+	! test_bundle_downloaded bundle-1.bundle trace1.txt &&
     ++	cat >expect <<-EOF &&
     ++	$HTTPD_URL/bundle-list
     ++	$HTTPD_URL/bundle-2.bundle
     ++	EOF
     ++
     ++	test_remote_https_urls <trace1.txt >actual &&
     ++	test_cmp expect actual &&
      +
      +	# received left from bundle-2
      +	git -C fetch-http-4 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
     @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'http clone with bundle.heurist
      +		creationToken = 4
      +	EOF
      +
     -+	# This fetch should skip bundle-3.bundle, since its objets are
     ++	# This fetch should skip bundle-3.bundle, since its objects are
      +	# already local (we have the requisite commits for bundle-4.bundle).
      +	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
      +		git -C fetch-http-4 fetch origin --no-tags \
      +		refs/heads/merge:refs/heads/merge &&
      +
     -+	# This fetch should copy three files: the list, bundle-3, and bundle-4.
     -+	test_bundle_downloaded bundle-list trace2.txt &&
     -+	test_bundle_downloaded bundle-4.bundle trace2.txt &&
     -+	! test_bundle_downloaded bundle-1.bundle trace2.txt &&
     -+	! test_bundle_downloaded bundle-2.bundle trace2.txt &&
     -+	! test_bundle_downloaded bundle-3.bundle trace2.txt &&
     ++	cat >expect <<-EOF &&
     ++	$HTTPD_URL/bundle-list
     ++	$HTTPD_URL/bundle-4.bundle
     ++	EOF
     ++
     ++	test_remote_https_urls <trace2.txt >actual &&
     ++	test_cmp expect actual &&
      +
      +	# received merge ref from bundle-4, but right is missing
      +	# because we did not download bundle-3.
  8:  51f210ddeb4 !  9:  6a1504b1c3a bundle-uri: store fetch.bundleCreationToken
     @@ Commit message
          When checking the same bundle list twice, this strategy requires
          downloading the bundle with the maximum creationToken again, which is
          wasteful. The creationToken heuristic promises that the client will not
     -    have a use for that bundle if its creationToken value is the at most the
     +    have a use for that bundle if its creationToken value is at most the
          previous creationToken value.
      
          To prevent these wasteful downloads, create a fetch.bundleCreationToken
     @@ Commit message
      
       ## Documentation/config/fetch.txt ##
      @@ Documentation/config/fetch.txt: fetch.bundleURI::
     - 	first running `git fetch --bundle-uri=<uri>` immediately before
     - 	`git fetch <args>`. See details of the `--bundle-uri` option in
     - 	linkgit:git-fetch[1].
     + 	linkgit:git-clone[1]. `git clone --bundle-uri` will set the
     + 	`fetch.bundleURI` value if the supplied bundle URI contains a bundle
     + 	list that is organized for incremental fetches.
     +++
     ++If you modify this value and your repository has a `fetch.bundleCreationToken`
     ++value, then remove that `fetch.bundleCreationToken` value before fetching from
     ++the new bundle URI.
      +
      +fetch.bundleCreationToken::
      +	When using `fetch.bundleURI` to fetch incrementally from a bundle
     @@ Documentation/config/fetch.txt: fetch.bundleURI::
      +	This value is used to prevent downloading bundles in the future
      +	if the advertised `creationToken` is not strictly larger than this
      +	value.
     +++
     ++The creation token values are chosen by the provider serving the specific
     ++bundle URI. If you modify the URI at `fetch.bundleURI`, then be sure to
     ++remove the value for the `fetch.bundleCreationToken` value before fetching.
      
       ## bundle-uri.c ##
      @@ bundle-uri.c: static int fetch_bundles_by_token(struct repository *r,
       {
       	int cur;
     - 	int pop_or_push = 0;
     + 	int move_direction = 0;
      +	const char *creationTokenStr;
     -+	uint64_t maxCreationToken;
     ++	uint64_t maxCreationToken = 0, newMaxCreationToken = 0;
       	struct bundle_list_context ctx = {
       		.r = r,
       		.list = list,
      @@ bundle-uri.c: static int fetch_bundles_by_token(struct repository *r,
       
     - 	for_all_bundles_in_list(list, insert_bundle, &sorted);
     + 	for_all_bundles_in_list(list, append_bundle, &bundles);
       
     -+	if (!sorted.nr) {
     -+		free(sorted.items);
     ++	if (!bundles.nr) {
     ++		free(bundles.items);
      +		return 0;
      +	}
      +
     - 	QSORT(sorted.items, sorted.nr, compare_creation_token);
     + 	QSORT(bundles.items, bundles.nr, compare_creation_token_decreasing);
       
      +	/*
      +	 * If fetch.bundleCreationToken exists, parses to a uint64t, and
     @@ bundle-uri.c: static int fetch_bundles_by_token(struct repository *r,
      +				   "fetch.bundlecreationtoken",
      +				   &creationTokenStr) &&
      +	    sscanf(creationTokenStr, "%"PRIu64, &maxCreationToken) == 1 &&
     -+	    sorted.items[0]->creationToken <= maxCreationToken) {
     -+		free(sorted.items);
     ++	    bundles.items[0]->creationToken <= maxCreationToken) {
     ++		free(bundles.items);
      +		return 0;
      +	}
      +
       	/*
     - 	 * Use a stack-based approach to download the bundles and attempt
     - 	 * to unbundle them in decreasing order by creation token. If we
     + 	 * Attempt to download and unbundle the minimum number of bundles by
     + 	 * creationToken in decreasing order. If we fail to unbundle (after
     +@@ bundle-uri.c: static int fetch_bundles_by_token(struct repository *r,
     + 	cur = 0;
     + 	while (cur >= 0 && cur < bundles.nr) {
     + 		struct remote_bundle_info *bundle = bundles.items[cur];
     ++
     ++		/*
     ++		 * If we need to dig into bundles below the previous
     ++		 * creation token value, then likely we are in an erroneous
     ++		 * state due to missing or invalid bundles. Halt the process
     ++		 * instead of continuing to download extra data.
     ++		 */
     ++		if (bundle->creationToken <= maxCreationToken)
     ++			break;
     ++
     + 		if (!bundle->file) {
     + 			/*
     + 			 * Not downloaded yet. Try downloading.
     +@@ bundle-uri.c: static int fetch_bundles_by_token(struct repository *r,
     + 				 */
     + 				move_direction = -1;
     + 				bundle->unbundled = 1;
     ++
     ++				if (bundle->creationToken > newMaxCreationToken)
     ++					newMaxCreationToken = bundle->creationToken;
     + 			}
     + 		}
     + 
      @@ bundle-uri.c: stack_operation:
     - 		cur += pop_or_push;
     + 		cur += move_direction;
       	}
       
     --	free(sorted.items);
     +-	free(bundles.items);
      -
       	/*
       	 * We succeed if the loop terminates because 'cur' drops below
     @@ bundle-uri.c: stack_operation:
       	 */
      +	if (cur < 0) {
      +		struct strbuf value = STRBUF_INIT;
     -+		strbuf_addf(&value, "%"PRIu64"", sorted.items[0]->creationToken);
     ++		strbuf_addf(&value, "%"PRIu64"", newMaxCreationToken);
      +		if (repo_config_set_multivar_gently(ctx.r,
      +						    "fetch.bundleCreationToken",
      +						    value.buf, NULL, 0))
     @@ bundle-uri.c: stack_operation:
      +		strbuf_release(&value);
      +	}
      +
     -+	free(sorted.items);
     ++	free(bundles.items);
       	return cur >= 0;
       }
       
      
       ## t/t5558-clone-bundle-uri.sh ##
     +@@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone incomplete bundle list (http, creationToken)' '
     + 		"$HTTPD_URL/smart/fetch.git" clone-token-http &&
     + 
     + 	test_cmp_config -C clone-token-http "$HTTPD_URL/bundle-list" fetch.bundleuri &&
     ++	test_cmp_config -C clone-token-http 1 fetch.bundlecreationtoken &&
     + 
     + 	cat >expect <<-EOF &&
     + 	$HTTPD_URL/bundle-list
     +@@ t/t5558-clone-bundle-uri.sh: test_expect_success 'clone incomplete bundle list (http, creationToken)' '
     + 	GIT_TRACE2_EVENT="$(pwd)/trace1.txt" \
     + 		git -C clone-token-http fetch origin --no-tags \
     + 		refs/heads/merge:refs/heads/merge &&
     ++	test_cmp_config -C clone-token-http 4 fetch.bundlecreationtoken &&
     + 
     + 	cat >expect <<-EOF &&
     + 	$HTTPD_URL/bundle-list
      @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
       		"$HTTPD_URL/smart/fetch.git" fetch-http-4 &&
       
       	test_cmp_config -C fetch-http-4 "$HTTPD_URL/bundle-list" fetch.bundleuri &&
      +	test_cmp_config -C fetch-http-4 1 fetch.bundlecreationtoken &&
       
     - 	# The clone should copy two files: the list and bundle-1.
     - 	test_bundle_downloaded bundle-list trace-clone.txt &&
     + 	cat >expect <<-EOF &&
     + 	$HTTPD_URL/bundle-list
      @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
     + 		git -C fetch-http-4 fetch origin --no-tags \
       		refs/heads/left:refs/heads/left \
       		refs/heads/right:refs/heads/right &&
     - 
      +	test_cmp_config -C fetch-http-4 2 fetch.bundlecreationtoken &&
     -+
     - 	# This fetch should copy two files: the list and bundle-2.
     - 	test_bundle_downloaded bundle-list trace1.txt &&
     - 	test_bundle_downloaded bundle-2.bundle trace1.txt &&
     + 
     + 	cat >expect <<-EOF &&
     + 	$HTTPD_URL/bundle-list
      @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
       	EOF
       	test_cmp expect refs &&
     @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'http clone with bundle.heurist
      +		git -C fetch-http-4 fetch origin --no-tags \
      +		refs/heads/left:refs/heads/left \
      +		refs/heads/right:refs/heads/right &&
     -+	test_bundle_downloaded bundle-list trace1b.txt &&
     -+	! test_bundle_downloaded bundle-1.bundle trace1b.txt &&
     -+	! test_bundle_downloaded bundle-2.bundle trace1b.txt &&
     ++
     ++	cat >expect <<-EOF &&
     ++	$HTTPD_URL/bundle-list
     ++	EOF
     ++	test_remote_https_urls <trace1b.txt >actual &&
     ++	test_cmp expect actual &&
      +
       	cat >>"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
       	[bundle "bundle-3"]
       		uri = bundle-3.bundle
      @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
     + 	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
       		git -C fetch-http-4 fetch origin --no-tags \
       		refs/heads/merge:refs/heads/merge &&
     - 
      +	test_cmp_config -C fetch-http-4 4 fetch.bundlecreationtoken &&
     -+
     - 	# This fetch should copy three files: the list, bundle-3, and bundle-4.
     - 	test_bundle_downloaded bundle-list trace2.txt &&
     - 	test_bundle_downloaded bundle-4.bundle trace2.txt &&
     + 
     + 	cat >expect <<-EOF &&
     + 	$HTTPD_URL/bundle-list
      @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
       	refs/bundles/left
       	refs/bundles/merge
     @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'http clone with bundle.heurist
      +	# No-op fetch
      +	GIT_TRACE2_EVENT="$(pwd)/trace2b.txt" \
      +		git -C fetch-http-4 fetch origin &&
     -+	test_bundle_downloaded bundle-list trace2b.txt &&
     -+	! test_bundle_downloaded bundle-1.bundle trace2b.txt &&
     -+	! test_bundle_downloaded bundle-2.bundle trace2b.txt &&
     -+	! test_bundle_downloaded bundle-3.bundle trace2b.txt &&
     -+	! test_bundle_downloaded bundle-4.bundle trace2b.txt
     ++
     ++	cat >expect <<-EOF &&
     ++	$HTTPD_URL/bundle-list
     ++	EOF
     ++	test_remote_https_urls <trace2b.txt >actual &&
     ++	test_cmp expect actual
       '
       
       # Do not add tests here unless they use the HTTP server, as they will
  -:  ----------- > 10:  676522615ad bundle-uri: test missing bundles with heuristic

-- 
gitgitgadget

^ permalink raw reply

* [PATCH v2 07/10] bundle-uri: drop bundle.flag from design doc
From: Derrick Stolee via GitGitGadget @ 2023-01-23 15:21 UTC (permalink / raw)
  To: git
  Cc: gitster, me, vdye, avarab, steadmon, chooglen, Derrick Stolee,
	Derrick Stolee
In-Reply-To: <pull.1454.v2.git.1674487310.gitgitgadget@gmail.com>

From: Derrick Stolee <derrickstolee@github.com>

The Implementation Plan section lists a 'bundle.flag' option that is not
documented anywhere else. What is documented elsewhere in the document
and implemented by previous changes is the 'bundle.heuristic' config
key. For now, a heuristic is required to indicate that a bundle list is
organized for use during 'git fetch', and it is also sufficient for all
existing designs.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Documentation/technical/bundle-uri.txt | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/Documentation/technical/bundle-uri.txt b/Documentation/technical/bundle-uri.txt
index b78d01d9adf..91d3a13e327 100644
--- a/Documentation/technical/bundle-uri.txt
+++ b/Documentation/technical/bundle-uri.txt
@@ -479,14 +479,14 @@ outline for submitting these features:
    (This choice is an opt-in via a config option and a command-line
    option.)
 
-4. Allow the client to understand the `bundle.flag=forFetch` configuration
+4. Allow the client to understand the `bundle.heuristic` configuration key
    and the `bundle.<id>.creationToken` heuristic. When `git clone`
-   discovers a bundle URI with `bundle.flag=forFetch`, it configures the
-   client repository to check that bundle URI during later `git fetch <remote>`
+   discovers a bundle URI with `bundle.heuristic`, it configures the client
+   repository to check that bundle URI during later `git fetch <remote>`
    commands.
 
 5. Allow clients to discover bundle URIs during `git fetch` and configure
-   a bundle URI for later fetches if `bundle.flag=forFetch`.
+   a bundle URI for later fetches if `bundle.heuristic` is set.
 
 6. Implement the "inspect headers" heuristic to reduce data downloads when
    the `bundle.<id>.creationToken` heuristic is not available.
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH v2 06/10] clone: set fetch.bundleURI if appropriate
From: Derrick Stolee via GitGitGadget @ 2023-01-23 15:21 UTC (permalink / raw)
  To: git
  Cc: gitster, me, vdye, avarab, steadmon, chooglen, Derrick Stolee,
	Derrick Stolee
In-Reply-To: <pull.1454.v2.git.1674487310.gitgitgadget@gmail.com>

From: Derrick Stolee <derrickstolee@github.com>

Bundle providers may organize their bundle lists in a way that is
intended to improve incremental fetches, not just initial clones.
However, they do need to state that they have organized with that in
mind, or else the client will not expect to save time by downloading
bundles after the initial clone. This is done by specifying a
bundle.heuristic value.

There are two types of bundle lists: those at a static URI and those
that are advertised from a Git remote over protocol v2.

The new fetch.bundleURI config value applies for static bundle URIs that
are not advertised over protocol v2. If the user specifies a static URI
via 'git clone --bundle-uri', then Git can set this config as a reminder
for future 'git fetch' operations to check the bundle list before
connecting to the remote(s).

For lists provided over protocol v2, we will want to take a different
approach and create a property of the remote itself by creating a
remote.<id>.* type config key. That is not implemented in this change.

Later changes will update 'git fetch' to consume this option.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Documentation/config/fetch.txt |  8 +++++++
 builtin/clone.c                |  6 +++++-
 bundle-uri.c                   |  5 ++++-
 bundle-uri.h                   |  8 ++++++-
 t/t5558-clone-bundle-uri.sh    | 39 ++++++++++++++++++++++++++++++++++
 5 files changed, 63 insertions(+), 3 deletions(-)

diff --git a/Documentation/config/fetch.txt b/Documentation/config/fetch.txt
index cd65d236b43..244f44d460f 100644
--- a/Documentation/config/fetch.txt
+++ b/Documentation/config/fetch.txt
@@ -96,3 +96,11 @@ fetch.writeCommitGraph::
 	merge and the write may take longer. Having an updated commit-graph
 	file helps performance of many Git commands, including `git merge-base`,
 	`git push -f`, and `git log --graph`. Defaults to false.
+
+fetch.bundleURI::
+	This value stores a URI for downloading Git object data from a bundle
+	URI before performing an incremental fetch from the origin Git server.
+	This is similar to how the `--bundle-uri` option behaves in
+	linkgit:git-clone[1]. `git clone --bundle-uri` will set the
+	`fetch.bundleURI` value if the supplied bundle URI contains a bundle
+	list that is organized for incremental fetches.
diff --git a/builtin/clone.c b/builtin/clone.c
index 5453ba5277f..5370617664d 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1248,12 +1248,16 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	 * data from the --bundle-uri option.
 	 */
 	if (bundle_uri) {
+		int has_heuristic = 0;
+
 		/* At this point, we need the_repository to match the cloned repo. */
 		if (repo_init(the_repository, git_dir, work_tree))
 			warning(_("failed to initialize the repo, skipping bundle URI"));
-		else if (fetch_bundle_uri(the_repository, bundle_uri))
+		else if (fetch_bundle_uri(the_repository, bundle_uri, &has_heuristic))
 			warning(_("failed to fetch objects from bundle URI '%s'"),
 				bundle_uri);
+		else if (has_heuristic)
+			git_config_set_gently("fetch.bundleuri", bundle_uri);
 	}
 
 	strvec_push(&transport_ls_refs_options.ref_prefixes, "HEAD");
diff --git a/bundle-uri.c b/bundle-uri.c
index 39acd856fb9..162a9276f31 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -742,7 +742,8 @@ static int unlink_bundle(struct remote_bundle_info *info, void *data)
 	return 0;
 }
 
-int fetch_bundle_uri(struct repository *r, const char *uri)
+int fetch_bundle_uri(struct repository *r, const char *uri,
+		     int *has_heuristic)
 {
 	int result;
 	struct bundle_list list;
@@ -762,6 +763,8 @@ int fetch_bundle_uri(struct repository *r, const char *uri)
 	result = unbundle_all_bundles(r, &list);
 
 cleanup:
+	if (has_heuristic)
+		*has_heuristic = (list.heuristic != BUNDLE_HEURISTIC_NONE);
 	for_all_bundles_in_list(&list, unlink_bundle, NULL);
 	clear_bundle_list(&list);
 	clear_remote_bundle_info(&bundle, NULL);
diff --git a/bundle-uri.h b/bundle-uri.h
index ef32840bfa6..6dbc780f661 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -124,8 +124,14 @@ int bundle_uri_parse_config_format(const char *uri,
  * based on that information.
  *
  * Returns non-zero if no bundle information is found at the given 'uri'.
+ *
+ * If the pointer 'has_heuristic' is non-NULL, then the value it points to
+ * will be set to be non-zero if and only if the fetched list has a
+ * heuristic value. Such a value indicates that the list was designed for
+ * incremental fetches.
  */
-int fetch_bundle_uri(struct repository *r, const char *uri);
+int fetch_bundle_uri(struct repository *r, const char *uri,
+		     int *has_heuristic);
 
 /**
  * Given a bundle list that was already advertised (likely by the
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index 6f9417a0afb..b2d15e141ca 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -432,6 +432,8 @@ test_expect_success 'clone incomplete bundle list (http, creationToken)' '
 		--single-branch --branch=base --no-tags \
 		"$HTTPD_URL/smart/fetch.git" clone-token-http &&
 
+	test_cmp_config -C clone-token-http "$HTTPD_URL/bundle-list" fetch.bundleuri &&
+
 	cat >expect <<-EOF &&
 	$HTTPD_URL/bundle-list
 	$HTTPD_URL/bundle-1.bundle
@@ -441,6 +443,43 @@ test_expect_success 'clone incomplete bundle list (http, creationToken)' '
 	test_cmp expect actual
 '
 
+test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
+	test_when_finished rm -rf fetch-http-4 trace*.txt &&
+
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = bundle-1.bundle
+		creationToken = 1
+	EOF
+
+	GIT_TRACE2_EVENT="$(pwd)/trace-clone.txt" \
+	git clone --single-branch --branch=base \
+		--bundle-uri="$HTTPD_URL/bundle-list" \
+		"$HTTPD_URL/smart/fetch.git" fetch-http-4 &&
+
+	test_cmp_config -C fetch-http-4 "$HTTPD_URL/bundle-list" fetch.bundleuri &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-list
+	$HTTPD_URL/bundle-1.bundle
+	EOF
+
+	test_remote_https_urls <trace-clone.txt >actual &&
+	test_cmp expect actual &&
+
+	# only received base ref from bundle-1
+	git -C fetch-http-4 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
+	cat >expect <<-\EOF &&
+	refs/bundles/base
+	EOF
+	test_cmp expect refs
+'
+
 # Do not add tests here unless they use the HTTP server, as they will
 # not run unless the HTTP dependencies exist.
 
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH v2 05/10] bundle-uri: download in creationToken order
From: Derrick Stolee via GitGitGadget @ 2023-01-23 15:21 UTC (permalink / raw)
  To: git
  Cc: gitster, me, vdye, avarab, steadmon, chooglen, Derrick Stolee,
	Derrick Stolee
In-Reply-To: <pull.1454.v2.git.1674487310.gitgitgadget@gmail.com>

From: Derrick Stolee <derrickstolee@github.com>

The creationToken heuristic provides an ordering on the bundles
advertised by a bundle list. Teach the Git client to download bundles
differently when this heuristic is advertised.

The bundles in the list are sorted by their advertised creationToken
values, then downloaded in decreasing order. This avoids the previous
strategy of downloading bundles in an arbitrary order and attempting
to apply them (likely failing in the case of required commits) until
discovering the order through attempted unbundling.

During a fresh 'git clone', it may make sense to download the bundles in
increasing order, since that would prevent the need to attempt
unbundling a bundle with required commits that do not exist in our empty
object store. The cost of testing an unbundle is quite low, and instead
the chosen order is optimizing for a future bundle download during a
'git fetch' operation with a non-empty object store.

Since the Git client continues fetching from the Git remote after
downloading and unbundling bundles, the client's object store can be
ahead of the bundle provider's object store. The next time it attempts
to download from the bundle list, it makes most sense to download only
the most-recent bundles until all tips successfully unbundle. The
strategy implemented here provides that short-circuit where the client
downloads a minimal set of bundles.

However, we are not satisfied by the naive approach of downloading
bundles until one successfully unbundles, expecting the earlier bundles
to successfully unbundle now. The example repository in t5558
demonstrates this well:

 ---------------- bundle-4

       4
      / \
 ----|---|------- bundle-3
     |   |
     |   3
     |   |
 ----|---|------- bundle-2
     |   |
     2   |
     |   |
 ----|---|------- bundle-1
      \ /
       1
       |
 (previous commits)

In this repository, if we already have the objects for bundle-1 and then
try to fetch from this list, the naive approach will fail. bundle-4
requires both bundle-3 and bundle-2, though bundle-3 will successfully
unbundle without bundle-2. Thus, the algorithm needs to keep this in
mind.

A later implementation detail will store the maximum creationToken seen
during such a bundle download, and the client will avoid downloading a
bundle unless its creationToken is strictly greater than that stored
value. For now, if the client seeks to download from an identical
bundle list since its previous download, it will download the
most-recent bundle then stop since its required commits are already in
the object store.

Add tests that exercise this behavior, but we will expand upon these
tests when incremental downloads during 'git fetch' make use of
creationToken values.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c                | 156 +++++++++++++++++++++++++++++++++++-
 t/t5558-clone-bundle-uri.sh |  40 +++++++--
 t/t5601-clone.sh            |  46 +++++++++++
 3 files changed, 233 insertions(+), 9 deletions(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index f46ab5c1743..39acd856fb9 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -453,6 +453,139 @@ static int download_bundle_to_file(struct remote_bundle_info *bundle, void *data
 	return 0;
 }
 
+struct bundles_for_sorting {
+	struct remote_bundle_info **items;
+	size_t alloc;
+	size_t nr;
+};
+
+static int append_bundle(struct remote_bundle_info *bundle, void *data)
+{
+	struct bundles_for_sorting *list = data;
+	list->items[list->nr++] = bundle;
+	return 0;
+}
+
+/**
+ * For use in QSORT() to get a list sorted by creationToken
+ * in decreasing order.
+ */
+static int compare_creation_token_decreasing(const void *va, const void *vb)
+{
+	const struct remote_bundle_info * const *a = va;
+	const struct remote_bundle_info * const *b = vb;
+
+	if ((*a)->creationToken > (*b)->creationToken)
+		return -1;
+	if ((*a)->creationToken < (*b)->creationToken)
+		return 1;
+	return 0;
+}
+
+static int fetch_bundles_by_token(struct repository *r,
+				  struct bundle_list *list)
+{
+	int cur;
+	int move_direction = 0;
+	struct bundle_list_context ctx = {
+		.r = r,
+		.list = list,
+		.mode = list->mode,
+	};
+	struct bundles_for_sorting bundles = {
+		.alloc = hashmap_get_size(&list->bundles),
+	};
+
+	ALLOC_ARRAY(bundles.items, bundles.alloc);
+
+	for_all_bundles_in_list(list, append_bundle, &bundles);
+
+	QSORT(bundles.items, bundles.nr, compare_creation_token_decreasing);
+
+	/*
+	 * Attempt to download and unbundle the minimum number of bundles by
+	 * creationToken in decreasing order. If we fail to unbundle (after
+	 * a successful download) then move to the next non-downloaded bundle
+	 * and attempt downloading. Once we succeed in applying a bundle,
+	 * move to the previous unapplied bundle and attempt to unbundle it
+	 * again.
+	 *
+	 * In the case of a fresh clone, we will likely download all of the
+	 * bundles before successfully unbundling the oldest one, then the
+	 * rest of the bundles unbundle successfully in increasing order
+	 * of creationToken.
+	 *
+	 * If there are existing objects, then this process may terminate
+	 * early when all required commits from "new" bundles exist in the
+	 * repo's object store.
+	 */
+	cur = 0;
+	while (cur >= 0 && cur < bundles.nr) {
+		struct remote_bundle_info *bundle = bundles.items[cur];
+		if (!bundle->file) {
+			/*
+			 * Not downloaded yet. Try downloading.
+			 *
+			 * Note that bundle->file is non-NULL if a download
+			 * was attempted, even if it failed to download.
+			 */
+			if (fetch_bundle_uri_internal(ctx.r, bundle, ctx.depth + 1, ctx.list)) {
+				/* Mark as unbundled so we do not retry. */
+				bundle->unbundled = 1;
+
+				/* Try looking deeper in the list. */
+				move_direction = 1;
+				goto stack_operation;
+			}
+
+			/* We expect bundles when using creationTokens. */
+			if (!is_bundle(bundle->file, 1)) {
+				warning(_("file downloaded from '%s' is not a bundle"),
+					bundle->uri);
+				break;
+			}
+		}
+
+		if (bundle->file && !bundle->unbundled) {
+			/*
+			 * This was downloaded, but not successfully
+			 * unbundled. Try unbundling again.
+			 */
+			if (unbundle_from_file(ctx.r, bundle->file)) {
+				/* Try looking deeper in the list. */
+				move_direction = 1;
+			} else {
+				/*
+				 * Succeeded in unbundle. Retry bundles
+				 * that previously failed to unbundle.
+				 */
+				move_direction = -1;
+				bundle->unbundled = 1;
+			}
+		}
+
+		/*
+		 * Else case: downloaded and unbundled successfully.
+		 * Skip this by moving in the same direction as the
+		 * previous step.
+		 */
+
+stack_operation:
+		/* Move in the specified direction and repeat. */
+		cur += move_direction;
+	}
+
+	free(bundles.items);
+
+	/*
+	 * We succeed if the loop terminates because 'cur' drops below
+	 * zero. The other case is that we terminate because 'cur'
+	 * reaches the end of the list, so we have a failure no matter
+	 * which bundles we apply from the list.
+	 */
+	return cur >= 0;
+}
+
 static int download_bundle_list(struct repository *r,
 				struct bundle_list *local_list,
 				struct bundle_list *global_list,
@@ -490,7 +623,15 @@ static int fetch_bundle_list_in_config_format(struct repository *r,
 		goto cleanup;
 	}
 
-	if ((result = download_bundle_list(r, &list_from_bundle,
+	/*
+	 * If this list uses the creationToken heuristic, then the URIs
+	 * it advertises are expected to be bundles, not nested lists.
+	 * We can drop 'global_list' and 'depth'.
+	 */
+	if (list_from_bundle.heuristic == BUNDLE_HEURISTIC_CREATIONTOKEN) {
+		result = fetch_bundles_by_token(r, &list_from_bundle);
+		global_list->heuristic = BUNDLE_HEURISTIC_CREATIONTOKEN;
+	} else if ((result = download_bundle_list(r, &list_from_bundle,
 					   global_list, depth)))
 		goto cleanup;
 
@@ -632,6 +773,14 @@ int fetch_bundle_list(struct repository *r, struct bundle_list *list)
 	int result;
 	struct bundle_list global_list;
 
+	/*
+	 * If the creationToken heuristic is used, then the URIs
+	 * advertised by 'list' are not nested lists and instead
+	 * direct bundles. We do not need to use global_list.
+	 */
+	if (list->heuristic == BUNDLE_HEURISTIC_CREATIONTOKEN)
+		return fetch_bundles_by_token(r, list);
+
 	init_bundle_list(&global_list);
 
 	/* If a bundle is added to this global list, then it is required. */
@@ -640,7 +789,10 @@ int fetch_bundle_list(struct repository *r, struct bundle_list *list)
 	if ((result = download_bundle_list(r, list, &global_list, 0)))
 		goto cleanup;
 
-	result = unbundle_all_bundles(r, &global_list);
+	if (list->heuristic == BUNDLE_HEURISTIC_CREATIONTOKEN)
+		result = fetch_bundles_by_token(r, list);
+	else
+		result = unbundle_all_bundles(r, &global_list);
 
 cleanup:
 	for_all_bundles_in_list(&global_list, unlink_bundle, NULL);
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index 474432c8ace..6f9417a0afb 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -401,17 +401,43 @@ test_expect_success 'clone bundle list (http, creationToken)' '
 	git -C clone-list-http-2 cat-file --batch-check <oids &&
 
 	cat >expect <<-EOF &&
-	$HTTPD_URL/bundle-1.bundle
-	$HTTPD_URL/bundle-2.bundle
-	$HTTPD_URL/bundle-3.bundle
+	$HTTPD_URL/bundle-list
 	$HTTPD_URL/bundle-4.bundle
+	$HTTPD_URL/bundle-3.bundle
+	$HTTPD_URL/bundle-2.bundle
+	$HTTPD_URL/bundle-1.bundle
+	EOF
+
+	test_remote_https_urls <trace-clone.txt >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'clone incomplete bundle list (http, creationToken)' '
+	test_when_finished rm -f trace*.txt &&
+
+	cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" &&
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = bundle-1.bundle
+		creationToken = 1
+	EOF
+
+	GIT_TRACE2_EVENT=$(pwd)/trace-clone.txt \
+	git clone --bundle-uri="$HTTPD_URL/bundle-list" \
+		--single-branch --branch=base --no-tags \
+		"$HTTPD_URL/smart/fetch.git" clone-token-http &&
+
+	cat >expect <<-EOF &&
 	$HTTPD_URL/bundle-list
+	$HTTPD_URL/bundle-1.bundle
 	EOF
 
-	# Since the creationToken heuristic is not yet understood by the
-	# client, the order cannot be verified at this moment. Sort the
-	# list for consistent results.
-	test_remote_https_urls <trace-clone.txt | sort >actual &&
+	test_remote_https_urls <trace-clone.txt >actual &&
 	test_cmp expect actual
 '
 
diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
index 1928ea1dd7c..b7d5551262c 100755
--- a/t/t5601-clone.sh
+++ b/t/t5601-clone.sh
@@ -831,6 +831,52 @@ test_expect_success 'auto-discover multiple bundles from HTTP clone' '
 	grep -f pattern trace.txt
 '
 
+test_expect_success 'auto-discover multiple bundles from HTTP clone: creationToken heuristic' '
+	test_when_finished rm -rf "$HTTPD_DOCUMENT_ROOT_PATH/repo4.git" &&
+	test_when_finished rm -rf clone-heuristic trace*.txt &&
+
+	test_commit -C src newest &&
+	git -C src bundle create "$HTTPD_DOCUMENT_ROOT_PATH/newest.bundle" HEAD~1..HEAD &&
+	git clone --bare --no-local src "$HTTPD_DOCUMENT_ROOT_PATH/repo4.git" &&
+
+	cat >>"$HTTPD_DOCUMENT_ROOT_PATH/repo4.git/config" <<-EOF &&
+	[uploadPack]
+		advertiseBundleURIs = true
+
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "everything"]
+		uri = $HTTPD_URL/everything.bundle
+		creationtoken = 1
+
+	[bundle "new"]
+		uri = $HTTPD_URL/new.bundle
+		creationtoken = 2
+
+	[bundle "newest"]
+		uri = $HTTPD_URL/newest.bundle
+		creationtoken = 3
+	EOF
+
+	GIT_TRACE2_EVENT="$(pwd)/trace-clone.txt" \
+		git -c protocol.version=2 \
+		    -c transfer.bundleURI=true clone \
+		"$HTTPD_URL/smart/repo4.git" clone-heuristic &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/newest.bundle
+	$HTTPD_URL/new.bundle
+	$HTTPD_URL/everything.bundle
+	EOF
+
+	# We should fetch all bundles in the expected order.
+	test_remote_https_urls <trace-clone.txt >actual &&
+	test_cmp expect actual
+'
+
 # DO NOT add non-httpd-specific tests here, because the last part of this
 # test script is only executed when httpd is available and enabled.
 
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH v2 03/10] bundle-uri: parse bundle.heuristic=creationToken
From: Derrick Stolee via GitGitGadget @ 2023-01-23 15:21 UTC (permalink / raw)
  To: git
  Cc: gitster, me, vdye, avarab, steadmon, chooglen, Derrick Stolee,
	Derrick Stolee
In-Reply-To: <pull.1454.v2.git.1674487310.gitgitgadget@gmail.com>

From: Derrick Stolee <derrickstolee@github.com>

The bundle.heuristic value communicates that the bundle list is
organized to make use of the bundle.<id>.creationToken values that may
be provided in the bundle list. Those values will create a total order
on the bundles, allowing the Git client to download them in a specific
order and even remember previously-downloaded bundles by storing the
maximum creation token value.

Before implementing any logic that parses or uses the
bundle.<id>.creationToken values, teach Git to parse the
bundle.heuristic value from a bundle list. We can use 'test-tool
bundle-uri' to print the heuristic value and verify that the parsing
works correctly.

As an extra precaution, create the internal 'heuristics' array to be a
list of (enum, string) pairs so we can iterate through the array entries
carefully, regardless of the enum values.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Documentation/config/bundle.txt |  7 +++++++
 bundle-uri.c                    | 34 +++++++++++++++++++++++++++++++++
 bundle-uri.h                    | 14 ++++++++++++++
 t/t5750-bundle-uri-parse.sh     | 19 ++++++++++++++++++
 4 files changed, 74 insertions(+)

diff --git a/Documentation/config/bundle.txt b/Documentation/config/bundle.txt
index daa21eb674a..3faae386853 100644
--- a/Documentation/config/bundle.txt
+++ b/Documentation/config/bundle.txt
@@ -15,6 +15,13 @@ bundle.mode::
 	complete understanding of the bundled information (`all`) or if any one
 	of the listed bundle URIs is sufficient (`any`).
 
+bundle.heuristic::
+	If this string-valued key exists, then the bundle list is designed to
+	work well with incremental `git fetch` commands. The heuristic signals
+	that there are additional keys available for each bundle that help
+	determine which subset of bundles the client should download. The
+	only value currently understood is `creationToken`.
+
 bundle.<id>.*::
 	The `bundle.<id>.*` keys are used to describe a single item in the
 	bundle list, grouped under `<id>` for identification purposes.
diff --git a/bundle-uri.c b/bundle-uri.c
index 2f079f713cf..0d64b1d84ba 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -9,6 +9,14 @@
 #include "config.h"
 #include "remote.h"
 
+static struct {
+	enum bundle_list_heuristic heuristic;
+	const char *name;
+} heuristics[BUNDLE_HEURISTIC__COUNT] = {
+	{ BUNDLE_HEURISTIC_NONE, ""},
+	{ BUNDLE_HEURISTIC_CREATIONTOKEN, "creationToken" },
+};
+
 static int compare_bundles(const void *hashmap_cmp_fn_data,
 			   const struct hashmap_entry *he1,
 			   const struct hashmap_entry *he2,
@@ -100,6 +108,17 @@ void print_bundle_list(FILE *fp, struct bundle_list *list)
 	fprintf(fp, "\tversion = %d\n", list->version);
 	fprintf(fp, "\tmode = %s\n", mode);
 
+	if (list->heuristic) {
+		int i;
+		for (i = 0; i < BUNDLE_HEURISTIC__COUNT; i++) {
+			if (heuristics[i].heuristic == list->heuristic) {
+				printf("\theuristic = %s\n",
+				       heuristics[list->heuristic].name);
+				break;
+			}
+		}
+	}
+
 	for_all_bundles_in_list(list, summarize_bundle, fp);
 }
 
@@ -142,6 +161,21 @@ static int bundle_list_update(const char *key, const char *value,
 			return 0;
 		}
 
+		if (!strcmp(subkey, "heuristic")) {
+			int i;
+			for (i = 0; i < BUNDLE_HEURISTIC__COUNT; i++) {
+				if (heuristics[i].heuristic &&
+				    heuristics[i].name &&
+				    !strcmp(value, heuristics[i].name)) {
+					list->heuristic = heuristics[i].heuristic;
+					return 0;
+				}
+			}
+
+			/* Ignore unknown heuristics. */
+			return 0;
+		}
+
 		/* Ignore other unknown global keys. */
 		return 0;
 	}
diff --git a/bundle-uri.h b/bundle-uri.h
index d5e89f1671c..2e44a50a90b 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -52,6 +52,14 @@ enum bundle_list_mode {
 	BUNDLE_MODE_ANY
 };
 
+enum bundle_list_heuristic {
+	BUNDLE_HEURISTIC_NONE = 0,
+	BUNDLE_HEURISTIC_CREATIONTOKEN,
+
+	/* Must be last. */
+	BUNDLE_HEURISTIC__COUNT
+};
+
 /**
  * A bundle_list contains an unordered set of remote_bundle_info structs,
  * as well as information about the bundle listing, such as version and
@@ -75,6 +83,12 @@ struct bundle_list {
 	 * advertised by the bundle list at that location.
 	 */
 	char *baseURI;
+
+	/**
+	 * A list can have a heuristic, which helps reduce the number of
+	 * downloaded bundles.
+	 */
+	enum bundle_list_heuristic heuristic;
 };
 
 void init_bundle_list(struct bundle_list *list);
diff --git a/t/t5750-bundle-uri-parse.sh b/t/t5750-bundle-uri-parse.sh
index 7b4f930e532..6fc92a9c0d4 100755
--- a/t/t5750-bundle-uri-parse.sh
+++ b/t/t5750-bundle-uri-parse.sh
@@ -250,4 +250,23 @@ test_expect_success 'parse config format edge cases: empty key or value' '
 	test_cmp_config_output expect actual
 '
 
+test_expect_success 'parse config format: creationToken heuristic' '
+	cat >expect <<-\EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+	[bundle "one"]
+		uri = http://example.com/bundle.bdl
+	[bundle "two"]
+		uri = https://example.com/bundle.bdl
+	[bundle "three"]
+		uri = file:///usr/share/git/bundle.bdl
+	EOF
+
+	test-tool bundle-uri parse-config expect >actual 2>err &&
+	test_must_be_empty err &&
+	test_cmp_config_output expect actual
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH v2 04/10] bundle-uri: parse bundle.<id>.creationToken values
From: Derrick Stolee via GitGitGadget @ 2023-01-23 15:21 UTC (permalink / raw)
  To: git
  Cc: gitster, me, vdye, avarab, steadmon, chooglen, Derrick Stolee,
	Derrick Stolee
In-Reply-To: <pull.1454.v2.git.1674487310.gitgitgadget@gmail.com>

From: Derrick Stolee <derrickstolee@github.com>

The previous change taught Git to parse the bundle.heuristic value,
especially when its value is "creationToken". Now, teach Git to parse
the bundle.<id>.creationToken values on each bundle in a bundle list.

Before implementing any logic based on creationToken values for the
creationToken heuristic, parse and print these values for testing
purposes.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 bundle-uri.c                | 10 ++++++++++
 bundle-uri.h                |  6 ++++++
 t/t5750-bundle-uri-parse.sh | 18 ++++++++++++++++++
 3 files changed, 34 insertions(+)

diff --git a/bundle-uri.c b/bundle-uri.c
index 0d64b1d84ba..f46ab5c1743 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -83,6 +83,9 @@ static int summarize_bundle(struct remote_bundle_info *info, void *data)
 	FILE *fp = data;
 	fprintf(fp, "[bundle \"%s\"]\n", info->id);
 	fprintf(fp, "\turi = %s\n", info->uri);
+
+	if (info->creationToken)
+		fprintf(fp, "\tcreationToken = %"PRIu64"\n", info->creationToken);
 	return 0;
 }
 
@@ -203,6 +206,13 @@ static int bundle_list_update(const char *key, const char *value,
 		return 0;
 	}
 
+	if (!strcmp(subkey, "creationtoken")) {
+		if (sscanf(value, "%"PRIu64, &bundle->creationToken) != 1)
+			warning(_("could not parse bundle list key %s with value '%s'"),
+				"creationToken", value);
+		return 0;
+	}
+
 	/*
 	 * At this point, we ignore any information that we don't
 	 * understand, assuming it to be hints for a heuristic the client
diff --git a/bundle-uri.h b/bundle-uri.h
index 2e44a50a90b..ef32840bfa6 100644
--- a/bundle-uri.h
+++ b/bundle-uri.h
@@ -42,6 +42,12 @@ struct remote_bundle_info {
 	 * this boolean is true.
 	 */
 	unsigned unbundled:1;
+
+	/**
+	 * If the bundle is part of a list with the creationToken
+	 * heuristic, then we use this member for sorting the bundles.
+	 */
+	uint64_t creationToken;
 };
 
 #define REMOTE_BUNDLE_INFO_INIT { 0 }
diff --git a/t/t5750-bundle-uri-parse.sh b/t/t5750-bundle-uri-parse.sh
index 6fc92a9c0d4..81bdf58b944 100755
--- a/t/t5750-bundle-uri-parse.sh
+++ b/t/t5750-bundle-uri-parse.sh
@@ -258,10 +258,13 @@ test_expect_success 'parse config format: creationToken heuristic' '
 		heuristic = creationToken
 	[bundle "one"]
 		uri = http://example.com/bundle.bdl
+		creationToken = 123456
 	[bundle "two"]
 		uri = https://example.com/bundle.bdl
+		creationToken = 12345678901234567890
 	[bundle "three"]
 		uri = file:///usr/share/git/bundle.bdl
+		creationToken = 1
 	EOF
 
 	test-tool bundle-uri parse-config expect >actual 2>err &&
@@ -269,4 +272,19 @@ test_expect_success 'parse config format: creationToken heuristic' '
 	test_cmp_config_output expect actual
 '
 
+test_expect_success 'parse config format edge cases: creationToken heuristic' '
+	cat >expect <<-\EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+	[bundle "one"]
+		uri = http://example.com/bundle.bdl
+		creationToken = bogus
+	EOF
+
+	test-tool bundle-uri parse-config expect >actual 2>err &&
+	grep "could not parse bundle list key creationToken with value '\''bogus'\''" err
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox