All of lore.kernel.org
 help / color / mirror / Atom feed
From: "blanet via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: blanet <bupt_xingxin@163.com>, Xing Xin <xingxin.xx@bytedance.com>
Subject: [PATCH v2] bundle-uri: verify oid before writing refs
Date: Mon, 20 May 2024 12:36:45 +0000	[thread overview]
Message-ID: <pull.1730.v2.git.1716208605926.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.1730.git.1715742069966.gitgitgadget@gmail.com>

From: Xing Xin <xingxin.xx@bytedance.com>

When using the bundle-uri mechanism with a bundle list containing
multiple interrelated bundles, we encountered a bug where tips from
downloaded bundles were not discovered, thus resulting in rather slow
clones. This was particularly problematic when employing the heuristic
`creationTokens`.

And this is easy to reproduce. Suppose we have a repository with a
single branch `main` pointing to commit `A`, firstly we create a base
bundle with

  git bundle create base.bundle main

Then let's add a new commit `B` on top of `A`, so that an incremental
bundle for `main` can be created with

  git bundle create incr.bundle A..main

Now we can generate a bundle list with the following content:

  [bundle]
      version = 1
      mode = all
      heuristic = creationToken

  [bundle "base"]
      uri = base.bundle
      creationToken = 1

  [bundle "incr"]
      uri = incr.bundle
      creationToken = 2

A fresh clone with the bundle list above would give the expected
`refs/bundles/main` pointing at `B` in new repository, in other words we
already had everything locally from the bundles, but git would still
download everything from server as if we got nothing.

So why the `refs/bundles/main` is not discovered? After some digging I
found that:

1. Bundles in bundle list are downloaded to local files via
   `download_bundle_list` or via `fetch_bundles_by_token` for the
   creationToken heuristic case.
2. Then it tries to unbundle each bundle via `unbundle_from_file`, which
   is called by `unbundle_all_bundles` or called within
   `fetch_bundles_by_token` for the creationToken heuristic case.
3. Here, we first read the bundle header to get all the prerequisites
   for the bundle, this is done in `read_bundle_header`.
4. Then we call `unbundle`, which calls `verify_bundle` to ensure that
   the repository does indeed contain the prerequisites mentioned in the
   bundle.
5. The `verify_bundle` will call `parse_object`, within which the
   `prepare_packed_git` or `reprepare_packed_git` is eventually called,
   which means that the `raw_object_store->packed_git` data gets filled
   in and ``packed_git_initialized` is set. This also means consecutive
   calls to `prepare_packed_git` doesn't re-initiate
   `raw_object_store->packed_git` since `packed_git_initialized` already
   is set.
6. If `unbundle` succeeds, it writes some refs via `refs_update_ref`
   with `REF_SKIP_OID_VERIFICATION` set. So the bundle refs which can
   target arbitrary objects are written to the repository.
7. Finally in `do_fetch_pack_v2`, `mark_complete_and_common_ref` and
   `mark_tips` are called with `OBJECT_INFO_QUICK` set to find local
   tips. Here it won't call `reprepare_packed_git` anymore so it would
   fail to parse oids that only reside in the last bundle.

Back to the example above, when unbunding `incr.bundle`, `base.pack` is
enlisted to `packed_git` bacause of the prerequisites to verify. While
we can not find `B` for negotiation at a latter time because `B` exists
in `incr.pack` which is not enlisted in `packed_git`.

This commit fixes this bug by dropping the `REF_SKIP_OID_VERIFICATION`
flag when writing bundle refs, so we can:

1. Ensure that the bundle refs we are writing are pointing to valid
   objects.
2. Ensure all the tips from bundle refs can be correctly parsed.

And a set of negotiation related tests for bundle-uri are added.

Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
---
    bundle-uri: refresh packed_git if unbundle succeed
    
    cc: Patrick Steinhardt ps@pks.im cc: Karthik Nayak karthik.188@gmail.com

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1730%2Fblanet%2Fxx%2Fbundle-uri-bug-using-bundle-list-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1730/blanet/xx/bundle-uri-bug-using-bundle-list-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1730

Range-diff vs v1:

 1:  d4841eea556 ! 1:  8bdeacf1360 bundle-uri: refresh packed_git if unbundle succeed
     @@ Metadata
      Author: Xing Xin <xingxin.xx@bytedance.com>
      
       ## Commit message ##
     -    bundle-uri: refresh packed_git if unbundle succeed
     +    bundle-uri: verify oid before writing refs
      
          When using the bundle-uri mechanism with a bundle list containing
          multiple interrelated bundles, we encountered a bug where tips from
     -    downloaded bundles were not being discovered, resulting in rather slow
     +    downloaded bundles were not discovered, thus resulting in rather slow
          clones. This was particularly problematic when employing the heuristic
          `creationTokens`.
      
     @@ Commit message
          So why the `refs/bundles/main` is not discovered? After some digging I
          found that:
      
     -    1. when unbundling a downloaded bundle, a `verify_bundle` is called to
     -       check its prerequisites if any. The verify procedure would find oids
     -       so `packed_git` is initialized.
     -
     -    2. after unbundled all bundles, we would enter `do_fetch_pack_v2`,
     -       during which `mark_complete_and_common_ref` and `mark_tips` would
     -       find oids with `OBJECT_INFO_QUICK` flag set, so no new packs would be
     -       enlisted if `packed_git` has already initialized in 1.
     +    1. Bundles in bundle list are downloaded to local files via
     +       `download_bundle_list` or via `fetch_bundles_by_token` for the
     +       creationToken heuristic case.
     +    2. Then it tries to unbundle each bundle via `unbundle_from_file`, which
     +       is called by `unbundle_all_bundles` or called within
     +       `fetch_bundles_by_token` for the creationToken heuristic case.
     +    3. Here, we first read the bundle header to get all the prerequisites
     +       for the bundle, this is done in `read_bundle_header`.
     +    4. Then we call `unbundle`, which calls `verify_bundle` to ensure that
     +       the repository does indeed contain the prerequisites mentioned in the
     +       bundle.
     +    5. The `verify_bundle` will call `parse_object`, within which the
     +       `prepare_packed_git` or `reprepare_packed_git` is eventually called,
     +       which means that the `raw_object_store->packed_git` data gets filled
     +       in and ``packed_git_initialized` is set. This also means consecutive
     +       calls to `prepare_packed_git` doesn't re-initiate
     +       `raw_object_store->packed_git` since `packed_git_initialized` already
     +       is set.
     +    6. If `unbundle` succeeds, it writes some refs via `refs_update_ref`
     +       with `REF_SKIP_OID_VERIFICATION` set. So the bundle refs which can
     +       target arbitrary objects are written to the repository.
     +    7. Finally in `do_fetch_pack_v2`, `mark_complete_and_common_ref` and
     +       `mark_tips` are called with `OBJECT_INFO_QUICK` set to find local
     +       tips. Here it won't call `reprepare_packed_git` anymore so it would
     +       fail to parse oids that only reside in the last bundle.
      
          Back to the example above, when unbunding `incr.bundle`, `base.pack` is
     -    enlisted to `packed_git` bacause of the prerequisites to verify. Then we
     -    can not find `B` for negotiation at a latter time bacause `B` exists in
     -    `incr.pack` which is not enlisted in `packed_git`.
     +    enlisted to `packed_git` bacause of the prerequisites to verify. While
     +    we can not find `B` for negotiation at a latter time because `B` exists
     +    in `incr.pack` which is not enlisted in `packed_git`.
     +
     +    This commit fixes this bug by dropping the `REF_SKIP_OID_VERIFICATION`
     +    flag when writing bundle refs, so we can:
      
     -    This commit fixes this by adding a `reprepare_packed_git` call for every
     -    successfully unbundled bundle, which ensures to enlist all generated
     -    packs from bundle uri. And a set of negotiation related tests are added.
     +    1. Ensure that the bundle refs we are writing are pointing to valid
     +       objects.
     +    2. Ensure all the tips from bundle refs can be correctly parsed.
     +
     +    And a set of negotiation related tests for bundle-uri are added.
      
          Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
      
       ## bundle-uri.c ##
     -@@
     - #include "refs.h"
     - #include "run-command.h"
     - #include "hashmap.h"
     -+#include "packfile.h"
     - #include "pkt-line.h"
     - #include "config.h"
     - #include "remote.h"
      @@ bundle-uri.c: static int unbundle_from_file(struct repository *r, const char *file)
     - 			       VERIFY_BUNDLE_QUIET)))
     - 		return 1;
     + 		refs_update_ref(get_main_ref_store(the_repository),
     + 				"fetched bundle", bundle_ref.buf, oid,
     + 				has_old ? &old_oid : NULL,
     +-				REF_SKIP_OID_VERIFICATION,
     +-				UPDATE_REFS_MSG_ON_ERR);
     ++				0, UPDATE_REFS_MSG_ON_ERR);
     + 	}
       
     -+	reprepare_packed_git(r);
     -+
     - 	/*
     - 	 * Convert all refs/heads/ from the bundle into refs/bundles/
     - 	 * in the local repository.
     + 	bundle_header_release(&header);
      
       ## t/t5558-clone-bundle-uri.sh ##
      @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'fail to clone from non-bundle file' '


 bundle-uri.c                |   3 +-
 t/t5558-clone-bundle-uri.sh | 129 ++++++++++++++++++++++++++++++++++++
 2 files changed, 130 insertions(+), 2 deletions(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index 91b3319a5c1..65666a11d9c 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -400,8 +400,7 @@ static int unbundle_from_file(struct repository *r, const char *file)
 		refs_update_ref(get_main_ref_store(the_repository),
 				"fetched bundle", bundle_ref.buf, oid,
 				has_old ? &old_oid : NULL,
-				REF_SKIP_OID_VERIFICATION,
-				UPDATE_REFS_MSG_ON_ERR);
+				0, UPDATE_REFS_MSG_ON_ERR);
 	}
 
 	bundle_header_release(&header);
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index 1ca5f745e73..a5b04d6f187 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -20,7 +20,10 @@ test_expect_success 'fail to clone from non-bundle file' '
 test_expect_success 'create bundle' '
 	git init clone-from &&
 	git -C clone-from checkout -b topic &&
+
 	test_commit -C clone-from A &&
+	git -C clone-from bundle create A.bundle topic &&
+
 	test_commit -C clone-from B &&
 	git -C clone-from bundle create B.bundle topic
 '
@@ -259,6 +262,132 @@ test_expect_success 'clone bundle list (file, any mode, all failures)' '
 	! grep "refs/bundles/" refs
 '
 
+#########################################################################
+# Clone negotiation related tests begin here
+
+test_expect_success 'negotiation: bundle with part of wanted commits' '
+	test_when_finished rm -rf trace*.txt &&
+	GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
+	git clone --no-local --bundle-uri="clone-from/A.bundle" \
+		clone-from nego-bundle-part &&
+	git -C nego-bundle-part for-each-ref --format="%(refname)" >refs &&
+	grep "refs/bundles/" refs >actual &&
+	cat >expect <<-\EOF &&
+	refs/bundles/topic
+	EOF
+	test_cmp expect actual &&
+	# Ensure that refs/bundles/topic are sent as "have".
+	grep "clone> have $(git -C clone-from rev-parse A)" trace-packet.txt
+'
+
+test_expect_success 'negotiation: bundle with all wanted commits' '
+	test_when_finished rm -rf trace*.txt &&
+	GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
+	git clone --no-local --single-branch --branch=topic --no-tags \
+		--bundle-uri="clone-from/B.bundle" \
+		clone-from nego-bundle-all &&
+	git -C nego-bundle-all for-each-ref --format="%(refname)" >refs &&
+	grep "refs/bundles/" refs >actual &&
+	cat >expect <<-\EOF &&
+	refs/bundles/topic
+	EOF
+	test_cmp expect actual &&
+	# We already have all needed commits so no "want" needed.
+	! grep "clone> want " trace-packet.txt
+'
+
+test_expect_success 'negotiation: bundle list (no heuristic)' '
+	test_when_finished rm -f trace*.txt &&
+	cat >bundle-list <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+
+	[bundle "bundle-1"]
+		uri = file://$(pwd)/clone-from/bundle-1.bundle
+
+	[bundle "bundle-2"]
+		uri = file://$(pwd)/clone-from/bundle-2.bundle
+	EOF
+
+	GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
+	git clone --no-local --bundle-uri="file://$(pwd)/bundle-list" \
+		clone-from nego-bundle-list-no-heuristic &&
+
+	git -C nego-bundle-list-no-heuristic for-each-ref --format="%(refname)" >refs &&
+	grep "refs/bundles/" refs >actual &&
+	cat >expect <<-\EOF &&
+	refs/bundles/base
+	refs/bundles/left
+	EOF
+	test_cmp expect actual &&
+	grep "clone> have $(git -C nego-bundle-list-no-heuristic rev-parse refs/bundles/left)" trace-packet.txt
+'
+
+test_expect_success 'negotiation: bundle list (creationToken)' '
+	test_when_finished rm -f trace*.txt &&
+	cat >bundle-list <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = file://$(pwd)/clone-from/bundle-1.bundle
+		creationToken = 1
+
+	[bundle "bundle-2"]
+		uri = file://$(pwd)/clone-from/bundle-2.bundle
+		creationToken = 2
+	EOF
+
+	GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
+	git clone --no-local --bundle-uri="file://$(pwd)/bundle-list" \
+		clone-from nego-bundle-list-heuristic &&
+
+	git -C nego-bundle-list-heuristic for-each-ref --format="%(refname)" >refs &&
+	grep "refs/bundles/" refs >actual &&
+	cat >expect <<-\EOF &&
+	refs/bundles/base
+	refs/bundles/left
+	EOF
+	test_cmp expect actual &&
+	grep "clone> have $(git -C nego-bundle-list-heuristic rev-parse refs/bundles/left)" trace-packet.txt
+'
+
+test_expect_success 'negotiation: bundle list with all wanted commits' '
+	test_when_finished rm -f trace*.txt &&
+	cat >bundle-list <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = file://$(pwd)/clone-from/bundle-1.bundle
+		creationToken = 1
+
+	[bundle "bundle-2"]
+		uri = file://$(pwd)/clone-from/bundle-2.bundle
+		creationToken = 2
+	EOF
+
+	GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
+	git clone --no-local --single-branch --branch=left --no-tags \
+		--bundle-uri="file://$(pwd)/bundle-list" \
+		clone-from nego-bundle-list-all &&
+
+	git -C nego-bundle-list-all for-each-ref --format="%(refname)" >refs &&
+	grep "refs/bundles/" refs >actual &&
+	cat >expect <<-\EOF &&
+	refs/bundles/base
+	refs/bundles/left
+	EOF
+	test_cmp expect actual &&
+	# We already have all needed commits so no "want" needed.
+	! grep "clone> want " trace-packet.txt
+'
+
 #########################################################################
 # HTTP tests begin here
 

base-commit: d8ab1d464d07baa30e5a180eb33b3f9aa5c93adf
-- 
gitgitgadget

  parent reply	other threads:[~2024-05-20 12:36 UTC|newest]

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-15  3:01 [PATCH] bundle-uri: refresh packed_git if unbundle succeed blanet via GitGitGadget
2024-05-17  5:00 ` Patrick Steinhardt
2024-05-17 16:14   ` Junio C Hamano
2024-05-20 11:48     ` Xing Xin
2024-05-20 17:19       ` Junio C Hamano
2024-05-27 16:04         ` Xing Xin
2024-05-20  9:41   ` Xing Xin
2024-05-17  7:36 ` Karthik Nayak
2024-05-20 10:19   ` Xing Xin
2024-05-20 12:36 ` blanet via GitGitGadget [this message]
2024-05-21 15:41   ` [PATCH v2] bundle-uri: verify oid before writing refs Karthik Nayak
2024-05-27 15:41   ` [PATCH v3 0/4] object checking related additions and fixes for bundles in fetches blanet via GitGitGadget
2024-05-27 15:41     ` [PATCH v3 1/4] bundle-uri: verify oid before writing refs Xing Xin via GitGitGadget
2024-05-28 11:55       ` Patrick Steinhardt
2024-05-30  8:32         ` Xing Xin
2024-05-27 15:41     ` [PATCH v3 2/4] unbundle: introduce unbundle_fsck_flags for fsckobjects handling Xing Xin via GitGitGadget
2024-05-28 12:03       ` Patrick Steinhardt
2024-05-29 18:12         ` Xing Xin
2024-05-30  4:38           ` Patrick Steinhardt
2024-05-30  8:46             ` Xing Xin
2024-05-27 15:41     ` [PATCH v3 3/4] fetch-pack: expose fsckObjects configuration logic Xing Xin via GitGitGadget
2024-05-28 12:03       ` Patrick Steinhardt
2024-05-28 17:10         ` Junio C Hamano
2024-05-28 17:24           ` Junio C Hamano
2024-05-29  5:52             ` Patrick Steinhardt
2024-05-30  8:48             ` Xing Xin
2024-05-29  5:52           ` Patrick Steinhardt
2024-05-27 15:41     ` [PATCH v3 4/4] unbundle: introduce new option UNBUNDLE_FSCK_FOLLOW_FETCH Xing Xin via GitGitGadget
2024-05-28 12:05       ` Patrick Steinhardt
2024-05-30  8:54         ` Xing Xin
2024-05-30  8:21     ` [PATCH v4 0/4] object checking related additions and fixes for bundles in fetches blanet via GitGitGadget
2024-05-30  8:21       ` [PATCH v4 1/4] bundle-uri: verify oid before writing refs Xing Xin via GitGitGadget
2024-05-30  8:21       ` [PATCH v4 2/4] unbundle: extend verify_bundle_flags to support fsck-objects Xing Xin via GitGitGadget
2024-06-06 12:06         ` Patrick Steinhardt
2024-06-11  6:46           ` Xing Xin
2024-05-30  8:21       ` [PATCH v4 3/4] fetch-pack: expose fsckObjects configuration logic Xing Xin via GitGitGadget
2024-05-30  8:21       ` [PATCH v4 4/4] unbundle: introduce option VERIFY_BUNDLE_FSCK_FOLLOW_FETCH Xing Xin via GitGitGadget
2024-06-06 12:06         ` Patrick Steinhardt
2024-06-11  6:46           ` Xing Xin
2024-06-11  6:42       ` [PATCH v5 0/4] object checking related additions and fixes for bundles in fetches blanet via GitGitGadget
2024-06-11  6:42         ` [PATCH v5 1/4] bundle-uri: verify oid before writing refs Xing Xin via GitGitGadget
2024-06-11  6:42         ` [PATCH v5 2/4] fetch-pack: expose fsckObjects configuration logic Xing Xin via GitGitGadget
2024-06-11  6:42         ` [PATCH v5 3/4] unbundle: extend options to support object verification Xing Xin via GitGitGadget
2024-06-11  9:11           ` Patrick Steinhardt
2024-06-11 12:47             ` Xing Xin
2024-06-11  6:42         ` [PATCH v5 4/4] unbundle: use VERIFY_BUNDLE_FSCK_FOLLOW_FETCH for fetches Xing Xin via GitGitGadget
2024-06-11 12:45         ` [PATCH v6 0/3] object checking related additions and fixes for bundles in fetches blanet via GitGitGadget
2024-06-11 12:45           ` [PATCH v6 1/3] bundle-uri: verify oid before writing refs Xing Xin via GitGitGadget
2024-06-11 19:08             ` Junio C Hamano
2024-06-17 13:53               ` Xing Xin
2024-06-11 12:45           ` [PATCH v6 2/3] fetch-pack: expose fsckObjects configuration logic Xing Xin via GitGitGadget
2024-06-11 19:20             ` Junio C Hamano
2024-06-11 12:45           ` [PATCH v6 3/3] unbundle: support object verification for fetches Xing Xin via GitGitGadget
2024-06-11 20:05             ` Junio C Hamano
2024-06-12 18:33               ` Xing Xin
2024-06-11 13:14           ` [PATCH v6 0/3] object checking related additions and fixes for bundles in fetches Patrick Steinhardt
2024-06-17 13:55           ` [PATCH v7 " blanet via GitGitGadget
2024-06-17 13:55             ` [PATCH v7 1/3] bundle-uri: verify oid before writing refs Xing Xin via GitGitGadget
2024-06-18 17:37               ` Junio C Hamano
2024-06-19  6:30                 ` Xing Xin
2024-06-17 13:55             ` [PATCH v7 2/3] fetch-pack: expose fsckObjects configuration logic Xing Xin via GitGitGadget
2024-06-17 13:55             ` [PATCH v7 3/3] unbundle: extend object verification for fetches Xing Xin via GitGitGadget
2024-06-19  4:07             ` [PATCH v8 0/3] object checking related additions and fixes for bundles in fetches blanet via GitGitGadget
2024-06-19  4:07               ` [PATCH v8 1/3] bundle-uri: verify oid before writing refs Xing Xin via GitGitGadget
2024-06-19  4:07               ` [PATCH v8 2/3] fetch-pack: expose fsckObjects configuration logic Xing Xin via GitGitGadget
2024-06-19  4:07               ` [PATCH v8 3/3] unbundle: extend object verification for fetches Xing Xin via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=pull.1730.v2.git.1716208605926.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=bupt_xingxin@163.com \
    --cc=git@vger.kernel.org \
    --cc=xingxin.xx@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.