git.vger.kernel.org archive mirror
* [PATCH] bundle-uri: copy all bundle references into the refs/bundle space
@ 2025-02-25 13:19 Scott Chacon via GitGitGadget
  2025-02-25 18:14 ` Junio C Hamano
  2025-03-01 10:33 ` [PATCH v2 0/3] " Scott Chacon via GitGitGadget
  0 siblings, 2 replies; 37+ messages in thread
From: Scott Chacon via GitGitGadget @ 2025-02-25 13:19 UTC
  To: git; +Cc: Scott Chacon, Scott Chacon

From: Scott Chacon <schacon@gmail.com>

When downloading bundles via the bundle-uri functionality, we only copy the
references from refs/heads/ into the refs/bundles/ namespace. I'm not sure why
this prefix is hardcoded to be so limited, but it makes the ref negotiation on
the subsequent fetch suboptimal, since it won't use objects that are
referenced only outside of the current heads of the bundled repository.

Copying everything under refs/ in the bundle to refs/bundles/ instead
significantly helps the subsequent fetch, since nearly all of the bundle's
references are now included in the negotiation.

Signed-off-by: Scott Chacon <schacon@gmail.com>
---
    bundle-uri: copy all bundle references into the refs/bundle space
    
    This patch probably isn't meant for inclusion, but I wanted to see if
    I'm crazy here or missing something.
    
    It appears that the bundle-uri functionality has an issue with ref
    negotiation. I hit this because I assumed all the objects I bundled
    would be seen in the negotiation. However, only references under
    refs/heads/ are copied to refs/bundles/, so they are the only ones
    used in the negotiation, which makes it quite inefficient.
    
    I ran several experiments trying to create a bundle that would make the
    subsequent fetch almost a no-op. It was frustratingly impossible, and it
    took me a while to figure out why the fetch kept asking for tons of
    other objects.
    
    Furthermore, when I bundled just a tag (thinking it would cover most
    reachable objects), it failed completely: there were no refs/heads/
    refs available for negotiation, so it downloaded a huge file and then
    still started the fetch from scratch.
    
    However, if I copy all the refs in the bundle, it makes a big
    difference.
    
    Here are some benchmarks using the GitLab FOSS repository.
    
    A normal clone pulls down 3,005,985 objects:
    
    ❯  time git clone https://gitlab.com/gitlab-org/gitlab-foss.git gl5
    Cloning into 'gl5'...
    remote: Enumerating objects: 3005985, done.
    remote: Counting objects: 100% (314617/314617), done.
    remote: Compressing objects: 100% (64278/64278), done.
    remote: Total 3005985 (delta 244429), reused 311002 (delta 241404), pack-reused 2691368 (from 1)
    Receiving objects: 100% (3005985/3005985), 1.35 GiB | 23.91 MiB/s, done.
    Resolving deltas: 100% (2361484/2361484), done.
    Updating files: 100% (59972/59972), done.
    (*) 162.93s user 37.94s system 128% cpu 2:36.49 total
    
    
    Then, I tried to bundle everything from a fresh clone, including all the
    refs.
    
    ❯  git bundle create gitlab-base.bundle --all
    
    
    This creates a 1.4G bundle, which I uploaded to a CDN and then used for
    another clone via --bundle-uri:
    
    ❯  time git clone --bundle-uri=https://[cdn]/bundle/gitlab-base.bundle https://gitlab.com/gitlab-org/gitlab-foss.git gl4
    Cloning into 'gl4'...
    remote: Enumerating objects: 1092703, done.
    remote: Counting objects: 100% (973405/973405), done.
    remote: Compressing objects: 100% (385827/385827), done.
    remote: Total 959773 (delta 710976), reused 766809 (delta 554276), pack-reused 0 (from 0)
    Receiving objects: 100% (959773/959773), 366.94 MiB | 20.87 MiB/s, done.
    Resolving deltas: 100% (710976/710976), completed with 9081 local objects.
    Checking objects: 100% (4194304/4194304), done.
    Checking connectivity: 959668, done.
    Updating files: 100% (59972/59972), done.
    (*) 181.98s user 40.23s system 110% cpu 3:20.89 total
    
    
    This is better from an "objects from the server" perspective, but the
    clone still has to download 959,773 objects, about 32% of the total. It
    also takes noticeably longer (3:20.89 versus 2:36.49 total), because it
    downloads most of those objects a second time.
    
    If I apply this patch, which changes the prefix used to select bundle
    refs for copying from refs/heads/ to just refs/, and clone with the
    patched build, it's much better:
    
    ❯  time ./git clone --bundle-uri=https://[cdn]/bundle/gitlab-base.bundle https://gitlab.com/gitlab-org/gitlab-foss.git gl3
    Cloning into 'gl3'...
    remote: Enumerating objects: 65538, done.
    remote: Counting objects: 100% (56054/56054), done.
    remote: Compressing objects: 100% (28950/28950), done.
    remote: Total 43877 (delta 27401), reused 25170 (delta 13546), pack-reused 0 (from 0)
    Receiving objects: 100% (43877/43877), 40.42 MiB | 22.27 MiB/s, done.
    Resolving deltas: 100% (27401/27401), completed with 8564 local objects.
    Updating files: 100% (59972/59972), done.
    (*) 143.45s user 29.33s system 124% cpu 2:19.27 total
    
    
    Now I'm only fetching an extra 43,877 objects, about 1.5% of the
    original total, and the whole operation is also faster (2:19.27 versus
    2:36.49 for a plain clone).
    
    I'm not sure whether there is a downside here; this seems clearly to be
    how you would want the negotiation to go. It does end up with many more
    refs under refs/bundles/ (for example, refs/bundles/remotes/origin/master),
    but that namespace is already being polluted by the head refs anyway,
    right?
    
    Is this a reasonable change?

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1897%2Fschacon%2Fsc-more-bundle-refs-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1897/schacon/sc-more-bundle-refs-v1
Pull-Request: https://github.com/git/git/pull/1897

 bundle-uri.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index 744257c49c1..3371d56f4ce 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -403,7 +403,7 @@ static int unbundle_from_file(struct repository *r, const char *file)
 		const char *branch_name;
 		int has_old;
 
-		if (!skip_prefix(refname->string, "refs/heads/", &branch_name))
+		if (!skip_prefix(refname->string, "refs/", &branch_name))
 			continue;
 
 		strbuf_setlen(&bundle_ref, bundle_prefix_len);

base-commit: 2d2a71ce85026edcc40f469678a1035df0dfcf57
-- 
gitgitgadget



Thread overview: 37+ messages
2025-02-25 13:19 [PATCH] bundle-uri: copy all bundle references into the refs/bundle space Scott Chacon via GitGitGadget
2025-02-25 18:14 ` Junio C Hamano
2025-02-25 23:36   ` Derrick Stolee
2025-03-01 10:23     ` Scott Chacon
2025-03-03 17:12       ` Junio C Hamano
2025-03-03 18:46         ` Derrick Stolee
2025-03-01 10:33 ` [PATCH v2 0/3] " Scott Chacon via GitGitGadget
2025-03-01 10:33   ` [PATCH v2 1/3] " Scott Chacon via GitGitGadget
2025-03-01 10:33   ` [PATCH v2 2/3] bundle-uri: update bundle clone tests with new refspec path Scott Chacon via GitGitGadget
2025-03-01 10:33   ` [PATCH v2 3/3] bundle-uri: add test for bundle-uri clones with tags Scott Chacon via GitGitGadget
2025-03-03 18:49   ` [PATCH v2 0/3] bundle-uri: copy all bundle references into the refs/bundle space Derrick Stolee
2025-03-18 15:36   ` [PATCH v3 0/2] " Scott Chacon via GitGitGadget
2025-03-18 15:36     ` [PATCH v3 1/2] " Scott Chacon via GitGitGadget
2025-03-19 10:24       ` Phillip Wood
2025-03-18 15:36     ` [PATCH v3 2/2] bundle-uri: add test for bundle-uri clones with tags Scott Chacon via GitGitGadget
2025-03-19 10:33       ` Phillip Wood
2025-03-19 17:50         ` Taylor Blau
2025-04-14 12:19           ` Toon Claes
2025-04-25 13:14             ` Scott Chacon
2025-03-21  6:31         ` Junio C Hamano
2025-04-25 13:17     ` [PATCH v4 0/2] bundle-uri: copy all bundle references into the refs/bundle space Scott Chacon via GitGitGadget
2025-04-25 13:17       ` [PATCH v4 1/2] " Scott Chacon via GitGitGadget
2025-04-25 13:17       ` [PATCH v4 2/2] bundle-uri: add test for bundle-uri clones with tags Scott Chacon via GitGitGadget
2025-04-25 16:32         ` Scott Chacon
2025-04-25 13:53       ` [PATCH v4 0/2] bundle-uri: copy all bundle references into the refs/bundle space Phillip Wood
2025-04-25 16:53         ` Junio C Hamano
2025-04-25 19:06       ` [PATCH v5 " Scott Chacon via GitGitGadget
2025-04-25 19:06         ` [PATCH v5 1/2] " Scott Chacon via GitGitGadget
2025-04-25 19:06         ` [PATCH v5 2/2] bundle-uri: add test for bundle-uri clones with tags Scott Chacon via GitGitGadget
2025-04-25 19:27         ` [PATCH v6 0/2] bundle-uri: copy all bundle references into the refs/bundle space Scott Chacon via GitGitGadget
2025-04-25 19:27           ` [PATCH v6 1/2] " Scott Chacon via GitGitGadget
2025-04-25 19:27           ` [PATCH v6 2/2] bundle-uri: add test for bundle-uri clones with tags Scott Chacon via GitGitGadget
2025-04-25 19:33           ` [PATCH v7 0/2] bundle-uri: copy all bundle references into the refs/bundle space Scott Chacon via GitGitGadget
2025-04-25 19:33             ` [PATCH v7 1/2] " Scott Chacon via GitGitGadget
2025-04-25 19:33             ` [PATCH v7 2/2] bundle-uri: add test for bundle-uri clones with tags Scott Chacon via GitGitGadget
2025-04-25 20:42             ` [PATCH v7 0/2] bundle-uri: copy all bundle references into the refs/bundle space Junio C Hamano
2025-04-29  9:00             ` Phillip Wood
