From: "Scott Chacon via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Scott Chacon <schacon@gmail.com>
Subject: [PATCH] bundle-uri: copy all bundle references into the refs/bundles/ space
Date: Tue, 25 Feb 2025 13:19:45 +0000
Message-ID: <pull.1897.git.git.1740489585344.gitgitgadget@gmail.com>
From: Scott Chacon <schacon@gmail.com>
When downloading bundles via the bundle-uri functionality, we only copy the
references under refs/heads/ in the bundle into the refs/bundles/ namespace.
I'm not sure why this refspec is hardcoded to be so limited, but it makes the
ref negotiation on the subsequent fetch suboptimal: objects that are only
reachable from refs outside the heads of the bundled repository are not used
in the negotiation.

Copy everything under refs/ in the bundle to refs/bundles/ instead. This
significantly helps the subsequent fetch, since nearly all of the bundle's
references are now included in the negotiation.
Signed-off-by: Scott Chacon <schacon@gmail.com>
---
bundle-uri: copy all bundle references into the refs/bundles/ space
This patch probably isn't meant for inclusion, but I wanted to see if
I'm crazy here or missing something.
It appears that the bundle-uri functionality has an issue with ref
negotiation. I hit this because I assumed all the objects I bundled
would be seen in the negotiation. However, only references under
refs/heads/ are copied to refs/bundles/, so those are the only ones
seen during negotiation, which makes it quite inefficient.
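To make the mapping concrete, here is a minimal shell sketch (not Git code) of the prefix check described above. The helper name bundle_ref_name and the three sample refs are hypothetical. The sketch assumes that whatever follows the matched prefix is appended to refs/bundles/, which is what the code around the changed line in unbundle_from_file() appears to do.

```shell
#!/bin/sh
# Hypothetical helper (not part of Git): mimic how a ref name found in a
# bundle is rewritten under refs/bundles/ for a given source prefix.
# Prints the rewritten name, or returns 1 when the ref does not start
# with the prefix and would therefore be skipped.
bundle_ref_name() {
	prefix=$1
	refname=$2
	case "$refname" in
	"$prefix"*)
		# Strip the literal prefix and keep the remainder.
		rest=${refname#"$prefix"}
		printf 'refs/bundles/%s\n' "$rest"
		;;
	*)
		return 1
		;;
	esac
}

# Sample ref names, chosen only for illustration.
for ref in refs/heads/main refs/tags/v1.0 refs/remotes/origin/master
do
	old=$(bundle_ref_name refs/heads/ "$ref") || old="(skipped)"
	new=$(bundle_ref_name refs/ "$ref") || new="(skipped)"
	printf '%s\n  refs/heads/ prefix: %s\n  refs/ prefix:       %s\n' \
		"$ref" "$old" "$new"
done
```

With the refs/heads/ prefix only the branch survives, so the tag and the remote-tracking ref never reach the negotiation. With the refs/ prefix all three are kept, at the cost of the extra path component (heads/, tags/, remotes/) in the rewritten name.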
I ran several experiments trying to create a bundle for which the
subsequent fetch was almost a no-op. That proved frustratingly
impossible, and it took me a while to figure out why the fetch kept
trying to get tons of other objects.
Furthermore, when I bundled just a tag (thinking it would reach most
objects), the bundle did not help at all, because there were no
refs/heads/ refs available for negotiation. The clone downloaded a huge
file and then still started from scratch on the fetch.
However, if I copy all the refs in the bundle, it makes a big
difference.
Here are some benchmarks from the GitLab FOSS repository.
A normal clone pulls down 3,005,985 objects:
❯ time git clone https://gitlab.com/gitlab-org/gitlab-foss.git gl5
Cloning into 'gl5'...
remote: Enumerating objects: 3005985, done.
remote: Counting objects: 100% (314617/314617), done.
remote: Compressing objects: 100% (64278/64278), done.
remote: Total 3005985 (delta 244429), reused 311002 (delta 241404), pack-reused 2691368 (from 1)
Receiving objects: 100% (3005985/3005985), 1.35 GiB | 23.91 MiB/s, done.
Resolving deltas: 100% (2361484/2361484), done.
Updating files: 100% (59972/59972), done.
(*) 162.93s user 37.94s system 128% cpu 2:36.49 total
Then I bundled everything from a fresh clone, including all the refs:
❯ git bundle create gitlab-base.bundle --all
This creates a 1.4G bundle, which I uploaded to a CDN and cloned again
with the bundle-uri:
❯ time git clone --bundle-uri=https://[cdn]/bundle/gitlab-base.bundle https://gitlab.com/gitlab-org/gitlab-foss.git gl4
Cloning into 'gl4'...
remote: Enumerating objects: 1092703, done.
remote: Counting objects: 100% (973405/973405), done.
remote: Compressing objects: 100% (385827/385827), done.
remote: Total 959773 (delta 710976), reused 766809 (delta 554276), pack-reused 0 (from 0)
Receiving objects: 100% (959773/959773), 366.94 MiB | 20.87 MiB/s, done.
Resolving deltas: 100% (710976/710976), completed with 9081 local objects.
Checking objects: 100% (4194304/4194304), done.
Checking connectivity: 959668, done.
Updating files: 100% (59972/59972), done.
(*) 181.98s user 40.23s system 110% cpu 3:20.89 total
This is better from an "objects from the server" perspective, but the
clone still has to download 959,773 objects, about 32% of the total. It
also takes quite a lot longer (3:20 versus 2:36), because it is
downloading most of those objects for a second time.
If I apply this patch, which changes the prefix used for the bundle ref
copy from refs/heads/ to just refs/, and clone with the patched version,
the result is much better:
❯ time ./git clone --bundle-uri=https://[cdn]/bundle/gitlab-base.bundle https://gitlab.com/gitlab-org/gitlab-foss.git gl3
Cloning into 'gl3'...
remote: Enumerating objects: 65538, done.
remote: Counting objects: 100% (56054/56054), done.
remote: Compressing objects: 100% (28950/28950), done.
remote: Total 43877 (delta 27401), reused 25170 (delta 13546), pack-reused 0 (from 0)
Receiving objects: 100% (43877/43877), 40.42 MiB | 22.27 MiB/s, done.
Resolving deltas: 100% (27401/27401), completed with 8564 local objects.
Updating files: 100% (59972/59972), done.
(*) 143.45s user 29.33s system 124% cpu 2:19.27 total
Now I'm only getting an extra 43,877 objects, about 1.5% of the original
total, and the entire operation is a bit faster as well (2:19 versus 2:36).
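For anyone who wants to check the proportions, a small sketch of the arithmetic follows. The three counts are copied from the "Total" lines of the clone runs above. The helper name pct_of_full is hypothetical, and the script assumes awk is available.

```shell
#!/bin/sh
# Object counts taken from the "remote: Total ..." lines of the three runs.
full=3005985        # normal clone
unpatched=959773    # bundle-uri clone, refs/heads/ prefix
patched=43877       # bundle-uri clone, refs/ prefix

# Hypothetical helper: print a count as a percentage of the full clone,
# rounded to one decimal place.
pct_of_full() {
	awk -v n="$1" -v total="$full" \
		'BEGIN { printf "%.1f\n", 100 * n / total }'
}

printf 'unpatched bundle clone: %s%% of a full clone\n' \
	"$(pct_of_full "$unpatched")"
printf 'patched bundle clone:   %s%% of a full clone\n' \
	"$(pct_of_full "$patched")"
```

This prints 31.9% for the unpatched run and 1.5% for the patched run.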
I'm not sure whether there is a downside here; this seems to be clearly
how you would want the negotiation to go. It ends up with many more refs
under refs/bundles/ (for example, there is now
refs/bundles/remotes/origin/master), but that namespace is already being
polluted by the head refs anyhow, right?
Is this a reasonable change?
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1897%2Fschacon%2Fsc-more-bundle-refs-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1897/schacon/sc-more-bundle-refs-v1
Pull-Request: https://github.com/git/git/pull/1897
bundle-uri.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/bundle-uri.c b/bundle-uri.c
index 744257c49c1..3371d56f4ce 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -403,7 +403,7 @@ static int unbundle_from_file(struct repository *r, const char *file)
 		const char *branch_name;
 		int has_old;
 
-		if (!skip_prefix(refname->string, "refs/heads/", &branch_name))
+		if (!skip_prefix(refname->string, "refs/", &branch_name))
 			continue;
 
 		strbuf_setlen(&bundle_ref, bundle_prefix_len);

base-commit: 2d2a71ce85026edcc40f469678a1035df0dfcf57
--
gitgitgadget