Git development

Git development
 help / color / mirror / Atom feed

* Re: Stability of git-archive, breaking (?) the Github universe, and a possible solution
From: Michal Suchánek @ 2023-01-31 20:45 UTC (permalink / raw)
  To: Eli Schwartz; +Cc: Konstantin Ryabitsev, brian m. carlson, Git List
In-Reply-To: <df7b0b43-efa2-ea04-dc5b-9515e7f1d86f@gmail.com>

On Tue, Jan 31, 2023 at 11:34:59AM -0500, Eli Schwartz wrote:
> On 1/31/23 11:20 AM, Konstantin Ryabitsev wrote:
> > On Tue, Jan 31, 2023 at 10:56:52AM -0500, Eli Schwartz wrote:
> >> And for tarballs that are generated once and uploaded to ftp storage,
> >> not repeatedly generated on the fly, we know the checksum will never
> >> legitimately change, so we *want* to hash the compressed file.
> >> Decompressing kernel.org tarballs in order to run PGP on them is *slow*.
> > 
> > FWIW, the most correct way is:
> > 
> > * download sha256sums.asc and verify its signature (auto-signed by infra)
> > * download the tarball you want and verify that the checksum matches
> > * uncompress and verify the PGP signature (signed by developer)
> > 
> > This script implements this workflow:
> > https://git.kernel.org/pub/scm/linux/kernel/git/mricon/korg-helpers.git/tree/get-verified-tarball
> 
> 
> This is just what I said, but with an additional first step for when you
> are updating to a new tarball and don't have your own checksums
> integrated into your own ecosystem tracking.
> 
> In most contexts, it's utterly unacceptable to not remember the checksum
> of the file you used last time and instead simply trust PGP identity
> verification. This permits upstream the technical means to be malicious,
> and re-upload a totally different tarball with the same name, different
> contents, and different PGP signature, and you will never notice because
> the PGP signature is still okay.

But where is the hash remembered?

The signature is a hash+signature, it you can replace that, you can also
repolace a hash without a signature.

You can store hashesd of anything you want locally, and indeed such
stored hashes in some build systemns did detect some code hosting
corruption but that's not for upstream to do, that's something that only
unrelated third party can do.

Thanks

Michal

^ permalink raw reply

* Re: Stability of git-archive, breaking (?) the Github universe, and a possible solution
From: Konstantin Ryabitsev @ 2023-01-31 20:34 UTC (permalink / raw)
  To: Eli Schwartz; +Cc: brian m. carlson, Git List
In-Reply-To: <df7b0b43-efa2-ea04-dc5b-9515e7f1d86f@gmail.com>

On Tue, Jan 31, 2023 at 11:34:59AM -0500, Eli Schwartz wrote:
> In most contexts, it's utterly unacceptable to not remember the checksum
> of the file you used last time and instead simply trust PGP identity
> verification. This permits upstream the technical means to be malicious,
> and re-upload a totally different tarball with the same name, different
> contents, and different PGP signature, and you will never notice because
> the PGP signature is still okay.

Yes, it's true, and it's something that Sigstore tries to address.

That said, if I wanted to trojan a download and had access to both the
infrastructure and the developer's credentials, I wouldn't pick a months-old
release for this purpose. I would wait until I see a new release coming out
and then swap it mid-flight. This lets me defeat even transparency-log based
solutions like sigstore.

(I'll probably be giving a talk at the Linux Security Summit titled "How to
trojan the Linux Kernel" where I'll go into some of these considerations. :))

-K

^ permalink raw reply

* Re: Bug: Cloning git repositories behind a proxy using the git:// protocol broken since 2.32
From: Jacob Keller @ 2023-01-31 20:31 UTC (permalink / raw)
  To: Florian Bezdeka
  Cc: brian m. carlson, git@vger.kernel.org, gitster@pobox.com,
	greg.pflaum@pnp-hcl.com, peff@peff.net
In-Reply-To: <339359ee8a228ea108109cf852bcb7e145807dcf.camel@siemens.com>

On Tue, Jan 31, 2023 at 4:17 AM Florian Bezdeka
<florian.bezdeka@siemens.com> wrote:
> Thanks for the super fast response, highly appreciated!
>
> I was able to get it running by switching to ncat using the --no-
> shutdown option, but I failed to bring back socat support so far.
>
> For me this is still a regression. We have to change our
> infrastructure/environment because we have a new requirement
> (independent handling of stdin/out) after updating git now. I would
> expect some noise from the yocto/OE community in the future where oe-
> git-proxy is heavily used.
>
> I guess proxy support was forgotten when the referenced change was
> made. Any chance we can avoid closing stdout when running "in proxy
> mode" to restore backward compatibility?
>
> Thanks a lot!

I had this issue in the past and i got it working with socat using "-t
10". You need a timeout that is larger than the keep alive value of 5
seconds which is sent by the git protocol.

Junio pointed out the excellent analysis from Peff regarding the
situation and the fact that socat is wrong here.

What value of -t did you try?

^ permalink raw reply

* Re: [PATCH v5] win32: fix thread usage for win32
From: Johannes Sixt @ 2023-01-31 20:00 UTC (permalink / raw)
  To: Rose via GitGitGadget; +Cc: Jeff Hostetler, Seija Kijin, git
In-Reply-To: <pull.1440.v5.git.git.1675176442381.gitgitgadget@gmail.com>

Am 31.01.23 um 15:47 schrieb Rose via GitGitGadget:
> Range-diff vs v4:
> 
>  1:  2e2d5ce7745 = 1:  6ab79d9275d win32: fix thread usage for win32

If you do not agree with review comments, it would be more polite to
express your thinking than to just resend a patch unchanged and silently.

-- Hannes


^ permalink raw reply

* Join us for Review Club!
From: Calvin Wan @ 2023-01-31 19:41 UTC (permalink / raw)
  To: Git Mailing List, zev

Hi everyone!

Review Club is happening this Wednesday at 14:00 Pacific time (UTC-8).
You can find more info at [1] and on gitcal [2]. We run a session every
other week, and you can find the full schedule on gitcal.

This week, we'll be discussing Zev Weiss's (new contributor yay!) added
flags to git-format-patch [3]. Let me know if you're interested and would
like to join (off-list is fine), and I'll send you an invite :)

See you there!

[1] https://lore.kernel.org/git/Yfl1%2FZN%2FtaYwfGD0@google.com/
[2] http://tinyurl.com/gitcal
[3] https://lore.kernel.org/git/20230119223858.29262-1-zev@bewilderbeest.net/

^ permalink raw reply

* Re: [PATCH v3 02/11] bundle: verify using check_connected()
From: Junio C Hamano @ 2023-01-31 19:36 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, git, me, vdye, avarab, steadmon,
	chooglen
In-Reply-To: <73c1d863-036b-81bb-50d3-2a084c2c2fb5@github.com>

Derrick Stolee <derrickstolee@github.com> writes:

> ... but I expect the old algorithm to be
> slower than the new case, because check_connected() can terminate
> with success (due to seeing all the prerequisite commits) faster
> than the old walk can terminate with failure (due to walking all
> reachable commits).

That is good to hear.  As long as we are not making it worse than
before, I am happy as a clam ;-)

Thanks.

^ permalink raw reply

* Re: [PATCH v3 02/11] bundle: verify using check_connected()
From: Derrick Stolee @ 2023-01-31 19:31 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee via GitGitGadget
  Cc: git, me, vdye, avarab, steadmon, chooglen
In-Reply-To: <xmqqr0vay5cz.fsf@gitster.g>

On 1/31/2023 12:35 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

>> +	/* TODO: preserve this verbose language. */
> 
> I am lost -- aren't we preserving the BUNDLE_VERBOSE code below?

Sorry, I put this in so I wouldn't forget to test that the
verbose message sticks, but I did forget to remove it.

>> diff --git a/t/t6020-bundle-misc.sh b/t/t6020-bundle-misc.sh
>> index 38dbbf89155..7d40994991e 100755
>> --- a/t/t6020-bundle-misc.sh
>> +++ b/t/t6020-bundle-misc.sh
>> @@ -595,14 +595,14 @@ test_expect_success 'verify catches unreachable, broken prerequisites' '
>>  		# Verify should fail
>>  		test_must_fail git bundle verify \
>>  			../clone-from/tip.bundle 2>err &&
>> -		grep "Could not read $BAD_OID" err &&
>> -		grep "Failed to traverse parents of commit $TIP_OID" err &&
>> +		grep "some prerequisite commits .* are not connected" err &&
>> +		test_line_count = 1 err &&
>>  
>>  		# Unbundling should fail
>>  		test_must_fail git bundle unbundle \
>>  			../clone-from/tip.bundle 2>err &&
>> -		grep "Could not read $BAD_OID" err &&
>> -		grep "Failed to traverse parents of commit $TIP_OID" err
>> +		grep "some prerequisite commits .* are not connected" err &&
>> +		test_line_count = 1 err
>>  	)
>>  '
> 
> Especially with the new test added in the previous step, we know we
> are not trading correctness off.  Excellent.
> 
> I wonder how much the performance hit does this version incur over
> the "not safe at all" version and over the "use custom and
> stricter-than-needed" version, by the way?

I was able to simulate this in an extreme situation by taking a clone
of the normal Git fork, creating a ref at every first parent, then
creating a bundle of the difference between git/git and gitster/git.
Finally, I compared the performance of 'git bundle verify' on Git
compiled before this change and after this change:

Benchmark 1: old
  Time (mean ± σ):     324.7 ms ±   7.5 ms    [User: 228.0 ms, System: 95.7 ms]
  Range (min … max):   315.3 ms … 338.0 ms    10 runs
 
Benchmark 2: new
  Time (mean ± σ):     331.2 ms ±  20.2 ms    [User: 230.6 ms, System: 99.7 ms]
  Range (min … max):   302.8 ms … 360.2 ms    10 runs
 
Summary
  'old' ran
    1.02 ± 0.07 times faster than 'new'

So, there is a performance difference in the two situations, but it
is very slight, in what I can detect.

Of course, there is the case where the behavior differs because of
the more permissible behavior, but I expect the old algorithm to be
slower than the new case, because check_connected() can terminate
with success (due to seeing all the prerequisite commits) faster
than the old walk can terminate with failure (due to walking all
reachable commits).

Thanks,
-Stolee

^ permalink raw reply

* Re: gitformat-index.txt has a gap in the "mode" description?
From: Junio C Hamano @ 2023-01-31 19:26 UTC (permalink / raw)
  To: Glen Choo; +Cc: git
In-Reply-To: <kl6l357qy7r2.fsf@chooglen-macbookpro.roam.corp.google.com>

Glen Choo <chooglen@google.com> writes:

> According to gitformat-index.txt [1], "mode" is 32 bits, but we've only
> documented 16 bits. I tried poking around read-cache.c, and my
> impression is that other 16 are just NUL. If so, it would be worth
> documenting that they're unused, especially since we documented
> unused bits right in that section.
>
> [1] https://github.com/git/git/blob/master/Documentation/gitformat-index.txt#L84

Good thinking.  The existing explanation starts with "32-bit mode,
split into (high to low bits)", followed by "4-bit object type", as
if the "4-bit object type" occupies bits 29-32, which is not quite
what we want to say.

By the way, we should avoid saying "unused" but be more explicit and
say "must be zero".  The latter has no ambiguity but the former can
be misinterpreted to allow any garbage.

Thanks.

^ permalink raw reply

* [PATCH v3] grep: fall back to interpreter if JIT memory allocation fails
From: Mathias Krause @ 2023-01-31 18:56 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Ævar Arnfjörð Bjarmason,
	Mathias Krause, Carlo Marcelo Arenas Belón
In-Reply-To: <20230127154952.485913-1-minipli@grsecurity.net>

Under Linux systems with SELinux's 'deny_execmem' or PaX's MPROTECT
enabled, the allocation of PCRE2's JIT rwx memory may be prohibited,
making pcre2_jit_compile() fail with PCRE2_ERROR_NOMEMORY (-48):

  [user@fedora git]$ git grep -c PCRE2_JIT
  grep.c:1

  [user@fedora git]$ # Enable SELinux's W^X policy
  [user@fedora git]$ sudo semanage boolean -m -1 deny_execmem

  [user@fedora git]$ # JIT memory allocation fails, breaking 'git grep'
  [user@fedora git]$ git grep -c PCRE2_JIT
  fatal: Couldn't JIT the PCRE2 pattern 'PCRE2_JIT', got '-48'

Instead of failing hard in this case and making 'git grep' unusable on
such systems, simply fall back to interpreter mode, leading to a much
better user experience.

As having a functional PCRE2 JIT compiler is a legitimate use case for
performance reasons, we'll only do the fallback if the supposedly
available JIT is found to be non-functional by attempting to JIT compile
a very simple pattern. If this fails, JIT is deemed to be non-functional
and we do the interpreter fallback. For all other cases, i.e. the simple
pattern can be compiled but the user provided cannot, we fail hard as we
do now as the reason for the failure must be the pattern itself. To aid
users in helping themselves change the error message to include a hint
about the '(*NO_JIT)' prefix. Also clip the pattern at 64 characters to
ensure the hint will be seen by the user and not internally truncated by
the die() function.

Cc: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Mathias Krause <minipli@grsecurity.net>
---
v2: https://lore.kernel.org/git/20230127154952.485913-1-minipli@grsecurity.net/

Changes in v3:

Mention the possibility to prefix a failing pattern with '(*NO_JIT)' in
case we run into the JIT's limitations, as per Junio. Also clip the
printed pattern to ensure the hint actually gets printed.
---
 grep.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 48 insertions(+), 2 deletions(-)

diff --git a/grep.c b/grep.c
index 06eed694936c..c6fd44e3ec94 100644
--- a/grep.c
+++ b/grep.c
@@ -262,6 +262,31 @@ static void pcre2_free(void *pointer, MAYBE_UNUSED void *memory_data)
 	free(pointer);
 }
 
+static int pcre2_jit_functional(void)
+{
+	static int jit_working = -1;
+	pcre2_code *code;
+	size_t off;
+	int err;
+
+	if (jit_working != -1)
+		return jit_working;
+
+	/*
+	 * Try to JIT compile a simple pattern to probe if the JIT is
+	 * working in general. It might fail for systems where creating
+	 * memory mappings for runtime code generation is restricted.
+	 */
+	code = pcre2_compile((PCRE2_SPTR)".", 1, 0, &err, &off, NULL);
+	if (!code)
+		return 0;
+
+	jit_working = pcre2_jit_compile(code, PCRE2_JIT_COMPLETE) == 0;
+	pcre2_code_free(code);
+
+	return jit_working;
+}
+
 static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt)
 {
 	int error;
@@ -317,8 +342,29 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
 	pcre2_config(PCRE2_CONFIG_JIT, &p->pcre2_jit_on);
 	if (p->pcre2_jit_on) {
 		jitret = pcre2_jit_compile(p->pcre2_pattern, PCRE2_JIT_COMPLETE);
-		if (jitret)
-			die("Couldn't JIT the PCRE2 pattern '%s', got '%d'\n", p->pattern, jitret);
+		if (jitret == PCRE2_ERROR_NOMEMORY && !pcre2_jit_functional()) {
+			/*
+			 * Even though pcre2_config(PCRE2_CONFIG_JIT, ...)
+			 * indicated JIT support, the library might still
+			 * fail to generate JIT code for various reasons,
+			 * e.g. when SELinux's 'deny_execmem' or PaX's
+			 * MPROTECT prevent creating W|X memory mappings.
+			 *
+			 * Instead of faling hard, fall back to interpreter
+			 * mode, just as if the pattern was prefixed with
+			 * '(*NO_JIT)'.
+			 */
+			p->pcre2_jit_on = 0;
+			return;
+		} else if (jitret) {
+			int need_clip = p->patternlen > 64;
+			int clip_len = need_clip ? 64 : p->patternlen;
+			die("Couldn't JIT the PCRE2 pattern '%.*s'%s, got '%d'%s",
+			    clip_len, p->pattern, need_clip ? "..." : "", jitret,
+			    pcre2_jit_functional()
+			    ? "\nPerhaps prefix (*NO_JIT) to your pattern?"
+			    : "");
+		}
 
 		/*
 		 * The pcre2_config(PCRE2_CONFIG_JIT, ...) call just
-- 
2.39.1


^ permalink raw reply related

* Re: [PATCH v2] grep: fall back to interpreter if JIT memory allocation fails
From: Mathias Krause @ 2023-01-31 18:34 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Ævar Arnfjörð Bjarmason,
	Carlo Marcelo Arenas Belón
In-Reply-To: <xmqqtu06zmg2.fsf@gitster.g>

On 31.01.23 17:41, Junio C Hamano wrote:
> Mathias Krause <minipli@grsecurity.net> writes:
> 
>> Looks sensible, but maybe something like below would be even better?
> 
> When I say "in the meantime", I expect it not to be the final one.
> This time, it was meant as a mere reminder to me while I wait for
> the (hopefully final) reroll.

Got it, will send v3 (the final one, as three times is the charm, they
say) integrating the change to die() in a moment.

Thanks,
Mathias

^ permalink raw reply

* Re: [PATCH v3 02/11] bundle: verify using check_connected()
From: Junio C Hamano @ 2023-01-31 17:35 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, me, vdye, avarab, steadmon, chooglen, Derrick Stolee
In-Reply-To: <20c29d37f9c1ba1367145331d25dd27f966312cd.1675171759.git.gitgitgadget@gmail.com>

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Thus, the code in verify_bundle() has previously had the additional
> check that all prerequisite commits are reachable from repository
> references. This is done via a revision walk from all references,
> stopping only if all prerequisite commits are discovered or all commits
> are walked. This uses a custom walk to verify_bundle().
>
> This check is more strict than what Git applies to fetched pack-files.
> In the fetch case, Git guarantees that the new references are closed
> under reachability by walking from the new references until walking
> commits that are reachable from repository refs. This is done through
> the well-used check_connected() method.
>
> To better align with the restrictions required by 'git fetch',
> reimplement this check in verify_bundle() to use check_connected(). This
> also simplifies the code significantly.

As I often say, breaking repository faster is not the kind of
performance gain we want to have in Git, and I am in favor of this
iteration compared to the earlier version that mostly punted on
ensuring the correctness (rather, it relied on assumption that "most
of the time this should be OK").

>  bundle.c               | 75 ++++++++++++++++--------------------------
>  t/t6020-bundle-misc.sh |  8 ++---
>  2 files changed, 33 insertions(+), 50 deletions(-)

The diffstat is very pleasing to see.

Let me read the postimage along aloud (preimage omitted).

> diff --git a/bundle.c b/bundle.c
> index 4ef7256aa11..76c3a904898 100644
> --- a/bundle.c
> +++ b/bundle.c
> @@ -187,6 +188,21 @@ static int list_refs(struct string_list *r, int argc, const char **argv)
>  /* Remember to update object flag allocation in object.h */
>  #define PREREQ_MARK (1u<<16)
>  
> +struct string_list_iterator {
> +	struct string_list *list;
> +	size_t cur;
> +};
> +
> +static const struct object_id *iterate_ref_map(void *cb_data)
> +{
> +	struct string_list_iterator *iter = cb_data;
> +
> +	if (iter->cur >= iter->list->nr)
> +		return NULL;
> +
> +	return iter->list->items[iter->cur++].util;
> +}

This is to let check_connected() collect all the prerequisite object
names.  OK.

>  int verify_bundle(struct repository *r,
>  		  struct bundle_header *header,
>  		  enum verify_bundle_flags flags)
>  {
>  	/*
>  	 * Do fast check, then if any prereqs are missing then go line by line
>  	 * to be verbose about the errors
>  	 */
>  	struct string_list *p = &header->prerequisites;
> +	int i, ret = 0;
>  	const char *message = _("Repository lacks these prerequisite commits:");
> +	struct string_list_iterator iter = {
> +		.list = p,
> +	};
> +	struct check_connected_options opts = {
> +		.quiet = 1,
> +	};
>  
>  	if (!r || !r->objects || !r->objects->odb)
>  		return error(_("need a repository to verify a bundle"));
>  
>  	for (i = 0; i < p->nr; i++) {
>  		struct string_list_item *e = p->items + i;
>  		const char *name = e->string;
>  		struct object_id *oid = e->util;
>  		struct object *o = parse_object(r, oid);
> +		if (o)
>  			continue;
>  		ret++;
>  		if (flags & VERIFY_BUNDLE_QUIET)
>  			continue;
>  		if (ret == 1)
>  			error("%s", message);
>  		error("%s %s", oid_to_hex(oid), name);
>  	}
> +	if (ret)
>  		goto cleanup;

The "quick fail" logic as before.  Looking sensible.

>  
> +	if ((ret = check_connected(iterate_ref_map, &iter, &opts)))
> +		error(_("some prerequisite commits exist in the object store, "
> +			"but are not connected to the repository's history"));

And then we let check_connected() to ensure that traversing from
these prerequisite objects down to the DAG formed by existing refs
will not die from missing objects.  Makes sense.

> +	/* TODO: preserve this verbose language. */

I am lost -- aren't we preserving the BUNDLE_VERBOSE code below?

>  	if (flags & VERIFY_BUNDLE_VERBOSE) {



> diff --git a/t/t6020-bundle-misc.sh b/t/t6020-bundle-misc.sh
> index 38dbbf89155..7d40994991e 100755
> --- a/t/t6020-bundle-misc.sh
> +++ b/t/t6020-bundle-misc.sh
> @@ -595,14 +595,14 @@ test_expect_success 'verify catches unreachable, broken prerequisites' '
>  		# Verify should fail
>  		test_must_fail git bundle verify \
>  			../clone-from/tip.bundle 2>err &&
> -		grep "Could not read $BAD_OID" err &&
> -		grep "Failed to traverse parents of commit $TIP_OID" err &&
> +		grep "some prerequisite commits .* are not connected" err &&
> +		test_line_count = 1 err &&
>  
>  		# Unbundling should fail
>  		test_must_fail git bundle unbundle \
>  			../clone-from/tip.bundle 2>err &&
> -		grep "Could not read $BAD_OID" err &&
> -		grep "Failed to traverse parents of commit $TIP_OID" err
> +		grep "some prerequisite commits .* are not connected" err &&
> +		test_line_count = 1 err
>  	)
>  '

Especially with the new test added in the previous step, we know we
are not trading correctness off.  Excellent.

I wonder how much the performance hit does this version incur over
the "not safe at all" version and over the "use custom and
stricter-than-needed" version, by the way?

Thanks.

^ permalink raw reply

* Re: Bug: Cloning git repositories behind a proxy using the git:// protocol broken since 2.32
From: Junio C Hamano @ 2023-01-31 16:57 UTC (permalink / raw)
  To: Florian Bezdeka
  Cc: brian m. carlson, git@vger.kernel.org, greg.pflaum@pnp-hcl.com,
	peff@peff.net
In-Reply-To: <339359ee8a228ea108109cf852bcb7e145807dcf.camel@siemens.com>

Florian Bezdeka <florian.bezdeka@siemens.com> writes:

> I guess proxy support was forgotten when the referenced change was
> made. Any chance we can avoid closing stdout when running "in proxy
> mode" to restore backward compatibility?

See the last paragraph of
https://lore.kernel.org/git/YS1Bni+QuZBOgkUI@coredump.intra.peff.net/

Nobody brought anything new to the table since then (the original
discussion was from Aug 2021) to change the conclusion that socat
was not doing the right thing in its default form.

Thanks.


^ permalink raw reply

* gitformat-index.txt has a gap in the "mode" description?
From: Glen Choo @ 2023-01-31 16:43 UTC (permalink / raw)
  To: git

According to gitformat-index.txt [1], "mode" is 32 bits, but we've only
documented 16 bits. I tried poking around read-cache.c, and my
impression is that other 16 are just NUL. If so, it would be worth
documenting that they're unused, especially since we documented
unused bits right in that section.

[1] https://github.com/git/git/blob/master/Documentation/gitformat-index.txt#L84

^ permalink raw reply

* Re: [PATCH v2] grep: fall back to interpreter if JIT memory allocation fails
From: Junio C Hamano @ 2023-01-31 16:41 UTC (permalink / raw)
  To: Mathias Krause
  Cc: git, Ævar Arnfjörð Bjarmason,
	Carlo Marcelo Arenas Belón
In-Reply-To: <d1c4cbad-bbb8-d610-5e27-970d96dd7a74@grsecurity.net>

Mathias Krause <minipli@grsecurity.net> writes:

> Looks sensible, but maybe something like below would be even better?

When I say "in the meantime", I expect it not to be the final one.
This time, it was meant as a mere reminder to me while I wait for
the (hopefully final) reroll.

Thanks.

^ permalink raw reply

* Re: Stability of git-archive, breaking (?) the Github universe, and a possible solution
From: Eli Schwartz @ 2023-01-31 16:34 UTC (permalink / raw)
  To: Konstantin Ryabitsev; +Cc: brian m. carlson, Git List
In-Reply-To: <20230131162049.mgqdxcucjesw4afr@meerkat.local>

On 1/31/23 11:20 AM, Konstantin Ryabitsev wrote:
> On Tue, Jan 31, 2023 at 10:56:52AM -0500, Eli Schwartz wrote:
>> And for tarballs that are generated once and uploaded to ftp storage,
>> not repeatedly generated on the fly, we know the checksum will never
>> legitimately change, so we *want* to hash the compressed file.
>> Decompressing kernel.org tarballs in order to run PGP on them is *slow*.
> 
> FWIW, the most correct way is:
> 
> * download sha256sums.asc and verify its signature (auto-signed by infra)
> * download the tarball you want and verify that the checksum matches
> * uncompress and verify the PGP signature (signed by developer)
> 
> This script implements this workflow:
> https://git.kernel.org/pub/scm/linux/kernel/git/mricon/korg-helpers.git/tree/get-verified-tarball

This is just what I said, but with an additional first step for when you
are updating to a new tarball and don't have your own checksums
integrated into your own ecosystem tracking.

In most contexts, it's utterly unacceptable to not remember the checksum
of the file you used last time and instead simply trust PGP identity
verification. This permits upstream the technical means to be malicious,
and re-upload a totally different tarball with the same name, different
contents, and different PGP signature, and you will never notice because
the PGP signature is still okay.

Just because I trust you all doesn't mean I should ignore existing best
practices to make sure that I always use the same reviewed
byte-identical tarball -- or find out exactly why it changed.

-- 
Eli Schwartz

^ permalink raw reply

* Re: Stability of git-archive, breaking (?) the Github universe, and a possible solution
From: Konstantin Ryabitsev @ 2023-01-31 16:20 UTC (permalink / raw)
  To: Eli Schwartz; +Cc: brian m. carlson, Git List
In-Reply-To: <6fc8e122-a190-c291-c347-258a5a2ad9c9@gmail.com>

On Tue, Jan 31, 2023 at 10:56:52AM -0500, Eli Schwartz wrote:
> And for tarballs that are generated once and uploaded to ftp storage,
> not repeatedly generated on the fly, we know the checksum will never
> legitimately change, so we *want* to hash the compressed file.
> Decompressing kernel.org tarballs in order to run PGP on them is *slow*.

FWIW, the most correct way is:

* download sha256sums.asc and verify its signature (auto-signed by infra)
* download the tarball you want and verify that the checksum matches
* uncompress and verify the PGP signature (signed by developer)

This script implements this workflow:
https://git.kernel.org/pub/scm/linux/kernel/git/mricon/korg-helpers.git/tree/get-verified-tarball

-K

^ permalink raw reply

* Re: Stability of git-archive, breaking (?) the Github universe, and a possible solution
From: Eli Schwartz @ 2023-01-31 15:56 UTC (permalink / raw)
  To: brian m. carlson, Git List
In-Reply-To: <Y9jlWYLzZ/yy4NqD@tapette.crustytoothpaste.net>

On 1/31/23 4:54 AM, brian m. carlson wrote:
> Part of the reason I think this is valuable is that once SHA-1 and
> SHA-256 interoperability is present, git archive will change the
> contents of the archive format, since it will embed a SHA-256 hash into
> the file instead of a SHA-1 hash, since that's what's in the repository.
> Thus, we can't produce an archive that's deterministic in the face of
> SHA-1/SHA-256 interoperability concerns, and we need to create a new
> format that doesn't contain that data embedded in it.

I assume that whatever the reason for originally embedding the OID into
the file is still an applicable reason even if a new PAX format is
established for the use of git-archive.

It may not be a great reason -- I don't know. Perhaps there's an
argument to remove it. But can't that be done irrespective of
standardizing the PAX format?

...

I'm not deeply knowledgeable about the SHA-256 transition work -- or
knowledgeable at all about it, frankly. (Also my understanding was it
seems to have stalled as discussed in https://lwn.net/Articles/898522/
-- I understand that you're still enthusiastic about the work? But that
doesn't really answer "is there a timeframe for that to ever happen".)

But I sort of assumed that the transition work would already have to
embed a fair bit of information into the repository about the whole
process? Would it not be possible to determine whether a given tag
started life as SHA-1 or SHA-256? Maybe even just a date when the
repository was converted to work with both, and embed the OID based on
whether the tag is tagging contents that were created after that conversion?

Seems to me like the problem should be solvable if people want to solve it.

...

git-archive run on a commit obviously doesn't have this problem -- it
can simply embed the OID for the same argument it was called with. But I
assume it's far more common to access tag-based github endpoints. :D

> Having said that, I don't think this should be based on the timestamp of
> the file, since that means that two otherwise identical archives
> differing in timestamp aren't ever going to be the same, and we do see
> people who import or vendor other projects. 

The timestamp of the output file? Surely not. But I only suggested the
timestamp of the commit/tag metadata that git-archive is asked to
produce output for. And we would need that in order to solve the problem
that reproducible github API archive endpoints poses.

I'm not sure what the "import or vendor other projects" angle here
means. Do you mean people who copy a directory of files into their
project? Who expects this to be the same to begin with? And doesn't
embedding the OID kill this idea, since the entire point of git commit
sha's is that you shouldn't (it should be prohibitively unrealistic to)
be able to produce the same one twice in different contexts?

I have never said to myself "ah yes, I really would like to be able to
download a git auto-generated tarball for project A, and compare its
hash to the tarball for project B, and have them compare identical even
though they are different projects with different commits". IMHO this
isn't an interesting problem to solve -- the interesting problem to
solve is that a single absolute URL to a downloadable file should be
able to offer documented guarantees that it will always be the same
file, even though it is generated on the fly.

> Nor do I think we should
> attempt to provide consistent compression, since I believe the output of
> things like zlib has changed in the past, and we can't continually carry
> an old, potentially insecure version of zlib just because the output
> changed.  People should be able to implement compression using gzip,
> zlib, pigz, miniz_oxide, or whatever if they want, since people
> implement Git in many different languages, and we won't want to force
> people using memory-safe languages like Go and Rust to explicitly use
> zlib for archives.

I do not think it is realistic or reasonable for people to implement
compression using intentionally incompatible replacements for gzip and
expect interoperability of any sort.

I also don't think people *have* to implement compression in rust using
zlib, but if they are going to make a git-alike that produces archives,
it would be worth it for them to write whatever memory-safe rust is
necessary to memory-safely produce the same output stream of bytes. It's
no less feasible than making sure that busybox gzip and GNU gzip produce
the same output, surely.

Alternatively, they could just not bother with gzip at all, and make
their git-alike produce zstd-compressed tarballs, which change their
byte outputs every time a new zstd release is published. :D Again, why
limit yourself to gzip if you want to be innovative anyway.

> That may mean that it's important for people to actually decompress the
> archive before checking hashes if they want deterministic behaviour, and
> I'm okay with that.  You already have to do that if you're verifying the
> signature on Git tarballs, since only the uncompressed tar archive is
> signed, so I don't think this is out of the question.

This is a very kernel.org-centric view of things, I think. I have rarely
seen PGP signatures applied to the uncompressed tar except in that
context. The vast majority of tarballs with signatures have signed a
single compressed tarball and don't concern themselves with, say,
providing a rotating backdated changeable list of compression formats
with a single signature covering all of them.

Nevertheless, in order to handle kernel.org-style tarballs, you are
entirely correct that one should be able to handle this.

>From experience, I can say that this needs to be selected on a
per-tarball basis. Since signature files have filenames, we can match
their stems and given foo.tar.asc and foo.tar.gz, check the signature of
the output of gzip -dc < foo.tar.gz, but given foo.tar.gz.asc and
foo.tar.gz, simply check the signature of the original foo.tar.gz.

This doesn't really work for checksums, because you need to settle on
one or the other everywhere or else embed decompression information into
your checksum metadata field.

And for tarballs that are generated once and uploaded to ftp storage,
not repeatedly generated on the fly, we know the checksum will never
legitimately change, so we *want* to hash the compressed file.
Decompressing kernel.org tarballs in order to run PGP on them is *slow*.
Although at least one can verify the checksums first without
decompression, which is virtually guaranteed to catch invalid source
code releases, so if you ever progress to the PGP verification stage
it's unlikely to be wasted effort -- that tarball is definitely getting
used to build something.

-- 
Eli Schwartz

^ permalink raw reply

* Re: Stability of git-archive, breaking (?) the Github universe, and a possible solution
From: Konstantin Ryabitsev @ 2023-01-31 15:05 UTC (permalink / raw)
  To: brian m. carlson, Eli Schwartz, Git List
In-Reply-To: <Y9jlWYLzZ/yy4NqD@tapette.crustytoothpaste.net>

On Tue, Jan 31, 2023 at 09:54:58AM +0000, brian m. carlson wrote:
> I'm one of the GitHub employees who chimed in there, and I'm also a Git
> contributor in my own time (and I am speaking here only in my personal
> capacity, since this is a personal address).  I made a change some years
> back to the archive format to fix the permissions on pax headers when
> extracted as files, and kernel.org was relying on that and broke.  Linus
> yelled at me because of that.
> 
> Since then, I've been very opposed to us guaranteeing output format
> consistency without explicitly doing so.  I had sent some patches before
> that I don't think ever got picked up that documented this explicitly.
> I very much don't want people to come to rely on our behaviour unless we
> explicitly guarantee it.

I understand your position, but I also think it's one of those things that
happen despite your best efforts to prevent it. :)

May I suggest adding a "git-archive --stable" that offers this guarantee,
simply as a matter of codifying the fact that the world has built
infrastructure around git's repeatable output. Maybe just for .tar (and
.tar.gz).

I know this complicates the code and makes it more "expensive" to maintain,
but it would be dramatically less expensive than changing the established
practices around the world.

-K

^ permalink raw reply

* [PATCH v2] win32: check for NULL after creating thread
From: Rose via GitGitGadget @ 2023-01-31 14:53 UTC (permalink / raw)
  To: git; +Cc: Rose, Seija Kijin
In-Reply-To: <pull.1445.git.git.1675176581433.gitgitgadget@gmail.com>

From: Seija Kijin <doremylover123@gmail.com>

Check for NULL handles, not "INVALID_HANDLE,"
as CreateThread guarantees a valid handle in most cases.

The return value for failed thread creation is NULL,
not INVALID_HANDLE_VALUE, unlike other Windows
API functions.

Signed-off-by: Seija Kijin <doremylover123@gmail.com>
---
    win32: check for NULL after creating thread
    
    Check for NULL handles, not "INVALID_HANDLE," as CreateThread guarantees
    a valid handle in most cases.
    
    The return value for failed thread creation is NULL, not
    INVALID_HANDLE_VALUE, unlike other Windows API functions.
    
    Signed-off-by: Seija Kijin doremylover123@gmail.com

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1445%2FAtariDreams%2FhThread-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1445/AtariDreams/hThread-v2
Pull-Request: https://github.com/git/git/pull/1445

Range-diff vs v1:

 1:  e75d15e42f4 ! 1:  c956cafdec9 win32: check for NULL when creating thread
     @@ Metadata
      Author: Seija Kijin <doremylover123@gmail.com>
      
       ## Commit message ##
     -    win32: check for NULL when creating thread
     +    win32: check for NULL after creating thread
      
          Check for NULL handles, not "INVALID_HANDLE,"
          as CreateThread guarantees a valid handle in most cases.


 compat/winansi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/compat/winansi.c b/compat/winansi.c
index 3abe8dd5a27..f83610f684d 100644
--- a/compat/winansi.c
+++ b/compat/winansi.c
@@ -644,7 +644,7 @@ void winansi_init(void)
 
 	/* start console spool thread on the pipe's read end */
 	hthread = CreateThread(NULL, 0, console_thread, NULL, 0, NULL);
-	if (hthread == INVALID_HANDLE_VALUE)
+	if (!hthread)
 		die_lasterr("CreateThread(console_thread) failed");
 
 	/* schedule cleanup routine */

base-commit: 2fc9e9ca3c7505bc60069f11e7ef09b1aeeee473
-- 
gitgitgadget

^ permalink raw reply related

* [PATCH] win32: check for NULL when creating thread
From: Rose via GitGitGadget @ 2023-01-31 14:49 UTC (permalink / raw)
  To: git; +Cc: Rose, Seija Kijin

From: Seija Kijin <doremylover123@gmail.com>

Check for NULL handles, not "INVALID_HANDLE,"
as CreateThread guarantees a valid handle in most cases.

The return value for failed thread creation is NULL,
not INVALID_HANDLE_VALUE, unlike other Windows
API functions.

Signed-off-by: Seija Kijin <doremylover123@gmail.com>
---
    win32: check for NULL when creating thread
    
    Check for NULL handles, not "INVALID_HANDLE," as CreateThread guarantees
    a valid handle in most cases.
    
    The return value for failed thread creation is NULL, not
    INVALID_HANDLE_VALUE, unlike other Windows API functions.
    
    Signed-off-by: Seija Kijin doremylover123@gmail.com

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1445%2FAtariDreams%2FhThread-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1445/AtariDreams/hThread-v1
Pull-Request: https://github.com/git/git/pull/1445

 compat/winansi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/compat/winansi.c b/compat/winansi.c
index 3abe8dd5a27..f83610f684d 100644
--- a/compat/winansi.c
+++ b/compat/winansi.c
@@ -644,7 +644,7 @@ void winansi_init(void)
 
 	/* start console spool thread on the pipe's read end */
 	hthread = CreateThread(NULL, 0, console_thread, NULL, 0, NULL);
-	if (hthread == INVALID_HANDLE_VALUE)
+	if (!hthread)
 		die_lasterr("CreateThread(console_thread) failed");
 
 	/* schedule cleanup routine */

base-commit: 2fc9e9ca3c7505bc60069f11e7ef09b1aeeee473
-- 
gitgitgadget

^ permalink raw reply related

* [PATCH v5] win32: fix thread usage for win32
From: Rose via GitGitGadget @ 2023-01-31 14:47 UTC (permalink / raw)
  To: git; +Cc: Johannes Sixt, Jeff Hostetler, Rose, Seija Kijin
In-Reply-To: <pull.1440.v4.git.git.1674492499537.gitgitgadget@gmail.com>

From: Seija Kijin <doremylover123@gmail.com>

Use _beginthreadex instead of CreateThread
since we use the Windows CRT,
as Microsoft recommends _beginthreadex
over CreateThread for these situations.

Finally, check for NULL handles, not "INVALID_HANDLE,"
as _beginthreadex guarantees a valid handle in most cases

Signed-off-by: Seija Kijin <doremylover123@gmail.com>
---
    win32: fix thread usage for win32
    
    Use pthread_exit instead of async_exit.
    
    This means we do not have to deal with Windows's implementation
    requiring an unsigned exit coded despite the POSIX exit code requiring a
    signed exit code.
    
    Use _beginthreadex instead of CreateThread since we use the Windows CRT.
    
    Finally, check for NULL handles, not "INVALID_HANDLE," as _beginthreadex
    guarantees a valid handle in most cases
    
    Signed-off-by: Seija Kijin doremylover123@gmail.com

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1440%2FAtariDreams%2FCreateThread-v5
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1440/AtariDreams/CreateThread-v5
Pull-Request: https://github.com/git/git/pull/1440

Range-diff vs v4:

 1:  2e2d5ce7745 = 1:  6ab79d9275d win32: fix thread usage for win32


 compat/winansi.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/compat/winansi.c b/compat/winansi.c
index 3abe8dd5a27..be65b27bd75 100644
--- a/compat/winansi.c
+++ b/compat/winansi.c
@@ -340,7 +340,7 @@ enum {
 	TEXT = 0, ESCAPE = 033, BRACKET = '['
 };
 
-static DWORD WINAPI console_thread(LPVOID unused)
+static unsigned int WINAPI console_thread(LPVOID unused)
 {
 	unsigned char buffer[BUFFER_SIZE];
 	DWORD bytes;
@@ -643,9 +643,9 @@ void winansi_init(void)
 		die_lasterr("CreateFile for named pipe failed");
 
 	/* start console spool thread on the pipe's read end */
-	hthread = CreateThread(NULL, 0, console_thread, NULL, 0, NULL);
-	if (hthread == INVALID_HANDLE_VALUE)
-		die_lasterr("CreateThread(console_thread) failed");
+	hthread = (HANDLE)_beginthreadex(NULL, 0, console_thread, NULL, 0, NULL);
+	if (!hthread)
+		die_lasterr("_beginthreadex(console_thread) failed");
 
 	/* schedule cleanup routine */
 	if (atexit(winansi_exit))

base-commit: 2fc9e9ca3c7505bc60069f11e7ef09b1aeeee473
-- 
gitgitgadget

^ permalink raw reply related

* [PATCH v3 10/11] bundle-uri: store fetch.bundleCreationToken
From: Derrick Stolee via GitGitGadget @ 2023-01-31 13:29 UTC (permalink / raw)
  To: git
  Cc: gitster, me, vdye, avarab, steadmon, chooglen, Derrick Stolee,
	Derrick Stolee
In-Reply-To: <pull.1454.v3.git.1675171759.gitgitgadget@gmail.com>

From: Derrick Stolee <derrickstolee@github.com>

When a bundle list specifies the "creationToken" heuristic, the Git
client downloads the list and then starts downloading bundles in
descending creationToken order. This process stops as soon as all
downloaded bundles can be applied to the repository (because all
required commits are present in the repository or in the downloaded
bundles).

When checking the same bundle list twice, this strategy requires
downloading the bundle with the maximum creationToken again, which is
wasteful. The creationToken heuristic promises that the client will not
have a use for that bundle if its creationToken value is at most the
previous creationToken value.

To prevent these wasteful downloads, create a fetch.bundleCreationToken
config setting that the Git client sets after downloading bundles. This
value allows skipping that maximum bundle download when this config
value is the same value (or larger).

To test that this works correctly, we can insert some "duplicate"
fetches into existing tests and demonstrate that only the bundle list is
downloaded.

The previous logic for downloading bundles by creationToken worked even
if the bundle list was empty, but now we have logic that depends on the
first entry of the list. Terminate early in the (non-sensical) case of
an empty bundle list.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Documentation/config/fetch.txt | 16 ++++++++++++
 bundle-uri.c                   | 48 ++++++++++++++++++++++++++++++++--
 t/t5558-clone-bundle-uri.sh    | 29 +++++++++++++++++++-
 3 files changed, 90 insertions(+), 3 deletions(-)

diff --git a/Documentation/config/fetch.txt b/Documentation/config/fetch.txt
index 244f44d460f..568f0f75b30 100644
--- a/Documentation/config/fetch.txt
+++ b/Documentation/config/fetch.txt
@@ -104,3 +104,19 @@ fetch.bundleURI::
 	linkgit:git-clone[1]. `git clone --bundle-uri` will set the
 	`fetch.bundleURI` value if the supplied bundle URI contains a bundle
 	list that is organized for incremental fetches.
++
+If you modify this value and your repository has a `fetch.bundleCreationToken`
+value, then remove that `fetch.bundleCreationToken` value before fetching from
+the new bundle URI.
+
+fetch.bundleCreationToken::
+	When using `fetch.bundleURI` to fetch incrementally from a bundle
+	list that uses the "creationToken" heuristic, this config value
+	stores the maximum `creationToken` value of the downloaded bundles.
+	This value is used to prevent downloading bundles in the future
+	if the advertised `creationToken` is not strictly larger than this
+	value.
++
+The creation token values are chosen by the provider serving the specific
+bundle URI. If you modify the URI at `fetch.bundleURI`, then be sure to
+remove the value for the `fetch.bundleCreationToken` value before fetching.
diff --git a/bundle-uri.c b/bundle-uri.c
index 7a1b6d94bf5..d6f7df7350f 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -481,6 +481,8 @@ static int fetch_bundles_by_token(struct repository *r,
 {
 	int cur;
 	int move_direction = 0;
+	const char *creationTokenStr;
+	uint64_t maxCreationToken = 0, newMaxCreationToken = 0;
 	struct bundle_list_context ctx = {
 		.r = r,
 		.list = list,
@@ -494,8 +496,27 @@ static int fetch_bundles_by_token(struct repository *r,
 
 	for_all_bundles_in_list(list, append_bundle, &bundles);
 
+	if (!bundles.nr) {
+		free(bundles.items);
+		return 0;
+	}
+
 	QSORT(bundles.items, bundles.nr, compare_creation_token_decreasing);
 
+	/*
+	 * If fetch.bundleCreationToken exists, parses to a uint64t, and
+	 * is not strictly smaller than the maximum creation token in the
+	 * bundle list, then do not download any bundles.
+	 */
+	if (!repo_config_get_value(r,
+				   "fetch.bundlecreationtoken",
+				   &creationTokenStr) &&
+	    sscanf(creationTokenStr, "%"PRIu64, &maxCreationToken) == 1 &&
+	    bundles.items[0]->creationToken <= maxCreationToken) {
+		free(bundles.items);
+		return 0;
+	}
+
 	/*
 	 * Attempt to download and unbundle the minimum number of bundles by
 	 * creationToken in decreasing order. If we fail to unbundle (after
@@ -516,6 +537,16 @@ static int fetch_bundles_by_token(struct repository *r,
 	cur = 0;
 	while (cur >= 0 && cur < bundles.nr) {
 		struct remote_bundle_info *bundle = bundles.items[cur];
+
+		/*
+		 * If we need to dig into bundles below the previous
+		 * creation token value, then likely we are in an erroneous
+		 * state due to missing or invalid bundles. Halt the process
+		 * instead of continuing to download extra data.
+		 */
+		if (bundle->creationToken <= maxCreationToken)
+			break;
+
 		if (!bundle->file) {
 			/*
 			 * Not downloaded yet. Try downloading.
@@ -555,6 +586,9 @@ static int fetch_bundles_by_token(struct repository *r,
 				 */
 				move_direction = -1;
 				bundle->unbundled = 1;
+
+				if (bundle->creationToken > newMaxCreationToken)
+					newMaxCreationToken = bundle->creationToken;
 			}
 		}
 
@@ -569,14 +603,24 @@ move:
 		cur += move_direction;
 	}
 
-	free(bundles.items);
-
 	/*
 	 * We succeed if the loop terminates because 'cur' drops below
 	 * zero. The other case is that we terminate because 'cur'
 	 * reaches the end of the list, so we have a failure no matter
 	 * which bundles we apply from the list.
 	 */
+	if (cur < 0) {
+		struct strbuf value = STRBUF_INIT;
+		strbuf_addf(&value, "%"PRIu64"", newMaxCreationToken);
+		if (repo_config_set_multivar_gently(ctx.r,
+						    "fetch.bundleCreationToken",
+						    value.buf, NULL, 0))
+			warning(_("failed to store maximum creation token"));
+
+		strbuf_release(&value);
+	}
+
+	free(bundles.items);
 	return cur >= 0;
 }
 
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index 7deeb4b8ad1..9c2b7934b9b 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -433,6 +433,7 @@ test_expect_success 'clone incomplete bundle list (http, creationToken)' '
 		"$HTTPD_URL/smart/fetch.git" clone-token-http &&
 
 	test_cmp_config -C clone-token-http "$HTTPD_URL/bundle-list" fetch.bundleuri &&
+	test_cmp_config -C clone-token-http 1 fetch.bundlecreationtoken &&
 
 	cat >expect <<-EOF &&
 	$HTTPD_URL/bundle-list
@@ -468,6 +469,7 @@ test_expect_success 'clone incomplete bundle list (http, creationToken)' '
 	GIT_TRACE2_EVENT="$(pwd)/trace1.txt" \
 		git -C clone-token-http fetch origin --no-tags \
 		refs/heads/merge:refs/heads/merge &&
+	test_cmp_config -C clone-token-http 4 fetch.bundlecreationtoken &&
 
 	cat >expect <<-EOF &&
 	$HTTPD_URL/bundle-list
@@ -511,6 +513,7 @@ test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
 		"$HTTPD_URL/smart/fetch.git" fetch-http-4 &&
 
 	test_cmp_config -C fetch-http-4 "$HTTPD_URL/bundle-list" fetch.bundleuri &&
+	test_cmp_config -C fetch-http-4 1 fetch.bundlecreationtoken &&
 
 	cat >expect <<-EOF &&
 	$HTTPD_URL/bundle-list
@@ -538,6 +541,7 @@ test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
 		git -C fetch-http-4 fetch origin --no-tags \
 		refs/heads/left:refs/heads/left \
 		refs/heads/right:refs/heads/right &&
+	test_cmp_config -C fetch-http-4 2 fetch.bundlecreationtoken &&
 
 	cat >expect <<-EOF &&
 	$HTTPD_URL/bundle-list
@@ -555,6 +559,18 @@ test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
 	EOF
 	test_cmp expect refs &&
 
+	# No-op fetch
+	GIT_TRACE2_EVENT="$(pwd)/trace1b.txt" \
+		git -C fetch-http-4 fetch origin --no-tags \
+		refs/heads/left:refs/heads/left \
+		refs/heads/right:refs/heads/right &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-list
+	EOF
+	test_remote_https_urls <trace1b.txt >actual &&
+	test_cmp expect actual &&
+
 	cat >>"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
 	[bundle "bundle-3"]
 		uri = bundle-3.bundle
@@ -570,6 +586,7 @@ test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
 	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
 		git -C fetch-http-4 fetch origin --no-tags \
 		refs/heads/merge:refs/heads/merge &&
+	test_cmp_config -C fetch-http-4 4 fetch.bundlecreationtoken &&
 
 	cat >expect <<-EOF &&
 	$HTTPD_URL/bundle-list
@@ -588,7 +605,17 @@ test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
 	refs/bundles/left
 	refs/bundles/merge
 	EOF
-	test_cmp expect refs
+	test_cmp expect refs &&
+
+	# No-op fetch
+	GIT_TRACE2_EVENT="$(pwd)/trace2b.txt" \
+		git -C fetch-http-4 fetch origin &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-list
+	EOF
+	test_remote_https_urls <trace2b.txt >actual &&
+	test_cmp expect actual
 '
 
 # Do not add tests here unless they use the HTTP server, as they will
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH v3 11/11] bundle-uri: test missing bundles with heuristic
From: Derrick Stolee via GitGitGadget @ 2023-01-31 13:29 UTC (permalink / raw)
  To: git
  Cc: gitster, me, vdye, avarab, steadmon, chooglen, Derrick Stolee,
	Derrick Stolee
In-Reply-To: <pull.1454.v3.git.1675171759.gitgitgadget@gmail.com>

From: Derrick Stolee <derrickstolee@github.com>

The creationToken heuristic uses a different mechanism for downloading
bundles from the "standard" approach. Specifically: it uses a concrete
order based on the creationToken values and attempts to download as few
bundles as possible. It also modifies local config to store a value for
future fetches to avoid downloading bundles, if possible.

However, if any of the individual bundles has a failed download, then
the logic for the ordering comes into question. It is important to avoid
infinite loops, assigning invalid creation token values in config, but
also to be opportunistic as possible when downloading as many bundles as
seem appropriate.

These tests were used to inform the implementation of
fetch_bundles_by_token() in bundle-uri.c, but are being added
independently here to allow focusing on faulty downloads. There may be
more cases that could be added that result in modifications to
fetch_bundles_by_token() as interesting data shapes reveal themselves in
real scenarios.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 t/t5558-clone-bundle-uri.sh | 400 ++++++++++++++++++++++++++++++++++++
 1 file changed, 400 insertions(+)

diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index 9c2b7934b9b..afd56926c53 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -618,6 +618,406 @@ test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
 	test_cmp expect actual
 '
 
+test_expect_success 'creationToken heuristic with failed downloads (clone)' '
+	test_when_finished rm -rf download-* trace*.txt &&
+
+	# Case 1: base bundle does not exist, nothing can unbundle
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = fake.bundle
+		creationToken = 1
+
+	[bundle "bundle-2"]
+		uri = bundle-2.bundle
+		creationToken = 2
+
+	[bundle "bundle-3"]
+		uri = bundle-3.bundle
+		creationToken = 3
+
+	[bundle "bundle-4"]
+		uri = bundle-4.bundle
+		creationToken = 4
+	EOF
+
+	GIT_TRACE2_EVENT="$(pwd)/trace-clone-1.txt" \
+	git clone --single-branch --branch=base \
+		--bundle-uri="$HTTPD_URL/bundle-list" \
+		"$HTTPD_URL/smart/fetch.git" download-1 &&
+
+	# Bundle failure does not set these configs.
+	test_must_fail git -C download-1 config fetch.bundleuri &&
+	test_must_fail git -C download-1 config fetch.bundlecreationtoken &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-list
+	$HTTPD_URL/bundle-4.bundle
+	$HTTPD_URL/bundle-3.bundle
+	$HTTPD_URL/bundle-2.bundle
+	$HTTPD_URL/fake.bundle
+	EOF
+	test_remote_https_urls <trace-clone-1.txt >actual &&
+	test_cmp expect actual &&
+
+	# All bundles failed to unbundle
+	git -C download-1 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
+	test_must_be_empty refs &&
+
+	# Case 2: middle bundle does not exist, only two bundles can unbundle
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = bundle-1.bundle
+		creationToken = 1
+
+	[bundle "bundle-2"]
+		uri = fake.bundle
+		creationToken = 2
+
+	[bundle "bundle-3"]
+		uri = bundle-3.bundle
+		creationToken = 3
+
+	[bundle "bundle-4"]
+		uri = bundle-4.bundle
+		creationToken = 4
+	EOF
+
+	GIT_TRACE2_EVENT="$(pwd)/trace-clone-2.txt" \
+	git clone --single-branch --branch=base \
+		--bundle-uri="$HTTPD_URL/bundle-list" \
+		"$HTTPD_URL/smart/fetch.git" download-2 &&
+
+	# Bundle failure does not set these configs.
+	test_must_fail git -C download-2 config fetch.bundleuri &&
+	test_must_fail git -C download-2 config fetch.bundlecreationtoken &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-list
+	$HTTPD_URL/bundle-4.bundle
+	$HTTPD_URL/bundle-3.bundle
+	$HTTPD_URL/fake.bundle
+	$HTTPD_URL/bundle-1.bundle
+	EOF
+	test_remote_https_urls <trace-clone-2.txt >actual &&
+	test_cmp expect actual &&
+
+	# bundle-1 and bundle-3 could unbundle, but bundle-4 could not
+	git -C download-2 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
+	cat >expect <<-EOF &&
+	refs/bundles/base
+	refs/bundles/right
+	EOF
+	test_cmp expect refs &&
+
+	# Case 3: top bundle does not exist, rest unbundle fine.
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = bundle-1.bundle
+		creationToken = 1
+
+	[bundle "bundle-2"]
+		uri = bundle-2.bundle
+		creationToken = 2
+
+	[bundle "bundle-3"]
+		uri = bundle-3.bundle
+		creationToken = 3
+
+	[bundle "bundle-4"]
+		uri = fake.bundle
+		creationToken = 4
+	EOF
+
+	GIT_TRACE2_EVENT="$(pwd)/trace-clone-3.txt" \
+	git clone --single-branch --branch=base \
+		--bundle-uri="$HTTPD_URL/bundle-list" \
+		"$HTTPD_URL/smart/fetch.git" download-3 &&
+
+	# As long as we have continguous successful downloads,
+	# we _do_ set these configs.
+	test_cmp_config -C download-3 "$HTTPD_URL/bundle-list" fetch.bundleuri &&
+	test_cmp_config -C download-3 3 fetch.bundlecreationtoken &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-list
+	$HTTPD_URL/fake.bundle
+	$HTTPD_URL/bundle-3.bundle
+	$HTTPD_URL/bundle-2.bundle
+	$HTTPD_URL/bundle-1.bundle
+	EOF
+	test_remote_https_urls <trace-clone-3.txt >actual &&
+	test_cmp expect actual &&
+
+	# fake.bundle did not unbundle, but the others did.
+	git -C download-3 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
+	cat >expect <<-EOF &&
+	refs/bundles/base
+	refs/bundles/left
+	refs/bundles/right
+	EOF
+	test_cmp expect refs
+'
+
+# Expand the bundle list to include other interesting shapes, specifically
+# interesting for use when fetching from a previous state.
+#
+# ---------------- bundle-7
+#       7
+#     _/|\_
+# ---/--|--\------ bundle-6
+#   5   |   6
+# --|---|---|----- bundle-4
+#   |   4   |
+#   |  / \  /
+# --|-|---|/------ bundle-3 (the client will be caught up to this point.)
+#   \ |   3
+# ---\|---|------- bundle-2
+#     2   |
+# ----|---|------- bundle-1
+#      \ /
+#       1
+#       |
+# (previous commits)
+test_expect_success 'expand incremental bundle list' '
+	(
+		cd clone-from &&
+		git checkout -b lefter left &&
+		test_commit 5 &&
+		git checkout -b righter right &&
+		test_commit 6 &&
+		git checkout -b top lefter &&
+		git merge -m "7" merge righter &&
+
+		git bundle create bundle-6.bundle lefter righter --not left right &&
+		git bundle create bundle-7.bundle top --not lefter merge righter &&
+
+		cp bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/"
+	) &&
+	git -C "$HTTPD_DOCUMENT_ROOT_PATH/fetch.git" fetch origin +refs/heads/*:refs/heads/*
+'
+
+test_expect_success 'creationToken heuristic with failed downloads (fetch)' '
+	test_when_finished rm -rf download-* trace*.txt &&
+
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = bundle-1.bundle
+		creationToken = 1
+
+	[bundle "bundle-2"]
+		uri = bundle-2.bundle
+		creationToken = 2
+
+	[bundle "bundle-3"]
+		uri = bundle-3.bundle
+		creationToken = 3
+	EOF
+
+	git clone --single-branch --branch=left \
+		--bundle-uri="$HTTPD_URL/bundle-list" \
+		"$HTTPD_URL/smart/fetch.git" fetch-base &&
+	test_cmp_config -C fetch-base "$HTTPD_URL/bundle-list" fetch.bundleURI &&
+	test_cmp_config -C fetch-base 3 fetch.bundleCreationToken &&
+
+	# Case 1: all bundles exist: successful unbundling of all bundles
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = bundle-1.bundle
+		creationToken = 1
+
+	[bundle "bundle-2"]
+		uri = bundle-2.bundle
+		creationToken = 2
+
+	[bundle "bundle-3"]
+		uri = bundle-3.bundle
+		creationToken = 3
+
+	[bundle "bundle-4"]
+		uri = bundle-4.bundle
+		creationToken = 4
+
+	[bundle "bundle-6"]
+		uri = bundle-6.bundle
+		creationToken = 6
+
+	[bundle "bundle-7"]
+		uri = bundle-7.bundle
+		creationToken = 7
+	EOF
+
+	cp -r fetch-base fetch-1 &&
+	GIT_TRACE2_EVENT="$(pwd)/trace-fetch-1.txt" \
+		git -C fetch-1 fetch origin &&
+	test_cmp_config -C fetch-1 7 fetch.bundlecreationtoken &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-list
+	$HTTPD_URL/bundle-7.bundle
+	$HTTPD_URL/bundle-6.bundle
+	$HTTPD_URL/bundle-4.bundle
+	EOF
+	test_remote_https_urls <trace-fetch-1.txt >actual &&
+	test_cmp expect actual &&
+
+	# Check which bundles have unbundled by refs
+	git -C fetch-1 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
+	cat >expect <<-EOF &&
+	refs/bundles/base
+	refs/bundles/left
+	refs/bundles/lefter
+	refs/bundles/merge
+	refs/bundles/right
+	refs/bundles/righter
+	refs/bundles/top
+	EOF
+	test_cmp expect refs &&
+
+	# Case 2: middle bundle does not exist, only bundle-4 can unbundle
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = bundle-1.bundle
+		creationToken = 1
+
+	[bundle "bundle-2"]
+		uri = bundle-2.bundle
+		creationToken = 2
+
+	[bundle "bundle-3"]
+		uri = bundle-3.bundle
+		creationToken = 3
+
+	[bundle "bundle-4"]
+		uri = bundle-4.bundle
+		creationToken = 4
+
+	[bundle "bundle-6"]
+		uri = fake.bundle
+		creationToken = 6
+
+	[bundle "bundle-7"]
+		uri = bundle-7.bundle
+		creationToken = 7
+	EOF
+
+	cp -r fetch-base fetch-2 &&
+	GIT_TRACE2_EVENT="$(pwd)/trace-fetch-2.txt" \
+		git -C fetch-2 fetch origin &&
+
+	# Since bundle-7 fails to unbundle, do not update creation token.
+	test_cmp_config -C fetch-2 3 fetch.bundlecreationtoken &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-list
+	$HTTPD_URL/bundle-7.bundle
+	$HTTPD_URL/fake.bundle
+	$HTTPD_URL/bundle-4.bundle
+	EOF
+	test_remote_https_urls <trace-fetch-2.txt >actual &&
+	test_cmp expect actual &&
+
+	# Check which bundles have unbundled by refs
+	git -C fetch-2 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
+	cat >expect <<-EOF &&
+	refs/bundles/base
+	refs/bundles/left
+	refs/bundles/merge
+	refs/bundles/right
+	EOF
+	test_cmp expect refs &&
+
+	# Case 3: top bundle does not exist, rest unbundle fine.
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = bundle-1.bundle
+		creationToken = 1
+
+	[bundle "bundle-2"]
+		uri = bundle-2.bundle
+		creationToken = 2
+
+	[bundle "bundle-3"]
+		uri = bundle-3.bundle
+		creationToken = 3
+
+	[bundle "bundle-4"]
+		uri = bundle-4.bundle
+		creationToken = 4
+
+	[bundle "bundle-6"]
+		uri = bundle-6.bundle
+		creationToken = 6
+
+	[bundle "bundle-7"]
+		uri = fake.bundle
+		creationToken = 7
+	EOF
+
+	cp -r fetch-base fetch-3 &&
+	GIT_TRACE2_EVENT="$(pwd)/trace-fetch-3.txt" \
+		git -C fetch-3 fetch origin &&
+
+	# As long as we have continguous successful downloads,
+	# we _do_ set the maximum creation token.
+	test_cmp_config -C fetch-3 6 fetch.bundlecreationtoken &&
+
+	# NOTE: the fetch skips bundle-4 since bundle-6 successfully
+	# unbundles itself and bundle-7 failed to download.
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-list
+	$HTTPD_URL/fake.bundle
+	$HTTPD_URL/bundle-6.bundle
+	EOF
+	test_remote_https_urls <trace-fetch-3.txt >actual &&
+	test_cmp expect actual &&
+
+	# Check which bundles have unbundled by refs
+	git -C fetch-3 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
+	cat >expect <<-EOF &&
+	refs/bundles/base
+	refs/bundles/left
+	refs/bundles/lefter
+	refs/bundles/right
+	refs/bundles/righter
+	EOF
+	test_cmp expect refs
+'
+
 # Do not add tests here unless they use the HTTP server, as they will
 # not run unless the HTTP dependencies exist.
 
-- 
gitgitgadget

^ permalink raw reply related

* [PATCH v3 09/11] fetch: fetch from an external bundle URI
From: Derrick Stolee via GitGitGadget @ 2023-01-31 13:29 UTC (permalink / raw)
  To: git
  Cc: gitster, me, vdye, avarab, steadmon, chooglen, Derrick Stolee,
	Derrick Stolee
In-Reply-To: <pull.1454.v3.git.1675171759.gitgitgadget@gmail.com>

From: Derrick Stolee <derrickstolee@github.com>

When a user specifies a URI via 'git clone --bundle-uri', that URI may
be a bundle list that advertises a 'bundle.heuristic' value. In that
case, the Git client stores a 'fetch.bundleURI' config value storing
that URI.

Teach 'git fetch' to check for this config value and download bundles
from that URI before fetching from the Git remote(s). Likely, the bundle
provider has configured a heuristic (such as "creationToken") that will
allow the Git client to download only a portion of the bundles before
continuing the fetch.

Since this URI is completely independent of the remote server, we want
to be sure that we connect to the bundle URI before creating a
connection to the Git remote. We do not want to hold a stateful
connection for too long if we can avoid it.

To test that this works correctly, extend the previous tests that set
'fetch.bundleURI' to do follow-up fetches. The bundle list is updated
incrementally at each phase to demonstrate that the heuristic avoids
downloading older bundles. This includes the middle fetch downloading
the objects in bundle-3.bundle from the Git remote, and therefore not
needing that bundle in the third fetch.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 builtin/fetch.c             |   6 ++
 t/t5558-clone-bundle-uri.sh | 113 +++++++++++++++++++++++++++++++++++-
 2 files changed, 118 insertions(+), 1 deletion(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 7378cafeec9..0477c379369 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -29,6 +29,7 @@
 #include "commit-graph.h"
 #include "shallow.h"
 #include "worktree.h"
+#include "bundle-uri.h"
 
 #define FORCED_UPDATES_DELAY_WARNING_IN_MS (10 * 1000)
 
@@ -2109,6 +2110,7 @@ static int fetch_one(struct remote *remote, int argc, const char **argv,
 int cmd_fetch(int argc, const char **argv, const char *prefix)
 {
 	int i;
+	const char *bundle_uri;
 	struct string_list list = STRING_LIST_INIT_DUP;
 	struct remote *remote = NULL;
 	int result = 0;
@@ -2194,6 +2196,10 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
 	if (dry_run)
 		write_fetch_head = 0;
 
+	if (!git_config_get_string_tmp("fetch.bundleuri", &bundle_uri) &&
+	    fetch_bundle_uri(the_repository, bundle_uri, NULL))
+		warning(_("failed to fetch bundles from '%s'"), bundle_uri);
+
 	if (all) {
 		if (argc == 1)
 			die(_("fetch --all does not take a repository argument"));
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index b2d15e141ca..7deeb4b8ad1 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -440,7 +440,55 @@ test_expect_success 'clone incomplete bundle list (http, creationToken)' '
 	EOF
 
 	test_remote_https_urls <trace-clone.txt >actual &&
-	test_cmp expect actual
+	test_cmp expect actual &&
+
+	# We now have only one bundle ref.
+	git -C clone-token-http for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
+	cat >expect <<-\EOF &&
+	refs/bundles/base
+	EOF
+	test_cmp expect refs &&
+
+	# Add remaining bundles, exercising the "deepening" strategy
+	# for downloading via the creationToken heurisitc.
+	cat >>"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle "bundle-2"]
+		uri = bundle-2.bundle
+		creationToken = 2
+
+	[bundle "bundle-3"]
+		uri = bundle-3.bundle
+		creationToken = 3
+
+	[bundle "bundle-4"]
+		uri = bundle-4.bundle
+		creationToken = 4
+	EOF
+
+	GIT_TRACE2_EVENT="$(pwd)/trace1.txt" \
+		git -C clone-token-http fetch origin --no-tags \
+		refs/heads/merge:refs/heads/merge &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-list
+	$HTTPD_URL/bundle-4.bundle
+	$HTTPD_URL/bundle-3.bundle
+	$HTTPD_URL/bundle-2.bundle
+	EOF
+
+	test_remote_https_urls <trace1.txt >actual &&
+	test_cmp expect actual &&
+
+	# We now have all bundle refs.
+	git -C clone-token-http for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
+
+	cat >expect <<-\EOF &&
+	refs/bundles/base
+	refs/bundles/left
+	refs/bundles/merge
+	refs/bundles/right
+	EOF
+	test_cmp expect refs
 '
 
 test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
@@ -477,6 +525,69 @@ test_expect_success 'http clone with bundle.heuristic creates fetch.bundleURI' '
 	cat >expect <<-\EOF &&
 	refs/bundles/base
 	EOF
+	test_cmp expect refs &&
+
+	cat >>"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle "bundle-2"]
+		uri = bundle-2.bundle
+		creationToken = 2
+	EOF
+
+	# Fetch the objects for bundle-2 _and_ bundle-3.
+	GIT_TRACE2_EVENT="$(pwd)/trace1.txt" \
+		git -C fetch-http-4 fetch origin --no-tags \
+		refs/heads/left:refs/heads/left \
+		refs/heads/right:refs/heads/right &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-list
+	$HTTPD_URL/bundle-2.bundle
+	EOF
+
+	test_remote_https_urls <trace1.txt >actual &&
+	test_cmp expect actual &&
+
+	# received left from bundle-2
+	git -C fetch-http-4 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
+	cat >expect <<-\EOF &&
+	refs/bundles/base
+	refs/bundles/left
+	EOF
+	test_cmp expect refs &&
+
+	cat >>"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle "bundle-3"]
+		uri = bundle-3.bundle
+		creationToken = 3
+
+	[bundle "bundle-4"]
+		uri = bundle-4.bundle
+		creationToken = 4
+	EOF
+
+	# This fetch should skip bundle-3.bundle, since its objects are
+	# already local (we have the requisite commits for bundle-4.bundle).
+	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
+		git -C fetch-http-4 fetch origin --no-tags \
+		refs/heads/merge:refs/heads/merge &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/bundle-list
+	$HTTPD_URL/bundle-4.bundle
+	EOF
+
+	test_remote_https_urls <trace2.txt >actual &&
+	test_cmp expect actual &&
+
+	# received merge ref from bundle-4, but right is missing
+	# because we did not download bundle-3.
+	git -C fetch-http-4 for-each-ref --format="%(refname)" "refs/bundles/*" >refs &&
+
+	cat >expect <<-\EOF &&
+	refs/bundles/base
+	refs/bundles/left
+	refs/bundles/merge
+	EOF
 	test_cmp expect refs
 '
 
-- 
gitgitgadget


^ permalink raw reply related

* [PATCH v3 08/11] bundle-uri: drop bundle.flag from design doc
From: Derrick Stolee via GitGitGadget @ 2023-01-31 13:29 UTC (permalink / raw)
  To: git
  Cc: gitster, me, vdye, avarab, steadmon, chooglen, Derrick Stolee,
	Derrick Stolee
In-Reply-To: <pull.1454.v3.git.1675171759.gitgitgadget@gmail.com>

From: Derrick Stolee <derrickstolee@github.com>

The Implementation Plan section lists a 'bundle.flag' option that is not
documented anywhere else. What is documented elsewhere in the document
and implemented by previous changes is the 'bundle.heuristic' config
key. For now, a heuristic is required to indicate that a bundle list is
organized for use during 'git fetch', and it is also sufficient for all
existing designs.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 Documentation/technical/bundle-uri.txt | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/Documentation/technical/bundle-uri.txt b/Documentation/technical/bundle-uri.txt
index b78d01d9adf..91d3a13e327 100644
--- a/Documentation/technical/bundle-uri.txt
+++ b/Documentation/technical/bundle-uri.txt
@@ -479,14 +479,14 @@ outline for submitting these features:
    (This choice is an opt-in via a config option and a command-line
    option.)
 
-4. Allow the client to understand the `bundle.flag=forFetch` configuration
+4. Allow the client to understand the `bundle.heuristic` configuration key
    and the `bundle.<id>.creationToken` heuristic. When `git clone`
-   discovers a bundle URI with `bundle.flag=forFetch`, it configures the
-   client repository to check that bundle URI during later `git fetch <remote>`
+   discovers a bundle URI with `bundle.heuristic`, it configures the client
+   repository to check that bundle URI during later `git fetch <remote>`
    commands.
 
 5. Allow clients to discover bundle URIs during `git fetch` and configure
-   a bundle URI for later fetches if `bundle.flag=forFetch`.
+   a bundle URI for later fetches if `bundle.heuristic` is set.
 
 6. Implement the "inspect headers" heuristic to reduce data downloads when
    the `bundle.<id>.creationToken` heuristic is not available.
-- 
gitgitgadget


^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox