From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Jacob Vosmaer <jacob@gitlab.com>
Cc: git@vger.kernel.org, Taylor Blau <me@ttaylorr.com>,
Jeff King <peff@peff.net>
Subject: Re: [PATCH 1/1] builtin/pack-objects.c: avoid iterating all refs
Date: Wed, 20 Jan 2021 09:50:19 +0100 [thread overview]
Message-ID: <87lfco801g.fsf@evledraar.gmail.com> (raw)
In-Reply-To: <20210119143348.27535-2-jacob@gitlab.com>
On Tue, Jan 19 2021, Jacob Vosmaer wrote:
> In git-pack-objects, we iterate over all the tags if the --include-tag
> option is passed on the command line. For some reason this uses
> for_each_ref which is expensive if the repo has many refs. We should
> use a prefix iterator instead.
>
> Signed-off-by: Jacob Vosmaer <jacob@gitlab.com>
> ---
> builtin/pack-objects.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 2a00358f34..2b32bc93bd 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -3740,7 +3740,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
> }
> cleanup_preferred_base();
> if (include_tag && nr_result)
> - for_each_ref(add_ref_tag, NULL);
> + for_each_fullref_in("refs/tags/", add_ref_tag, NULL, 0);
> stop_progress(&progress_state);
> trace2_region_leave("pack-objects", "enumerate-objects",
> the_repository);
Not really a new problem, but it would be nice to also have a test in
t5305-include-tag.sh to check what happens with "tags" outside the
refs/tags/ namespace. I.e. if this & the existing prefix in
add_ref_tag() were dropped.
To elaborate on other things that aren't really your problem & Taylor's
E-Mail downthread we originally added this because:
If new annotated tags have been introduced then we can also include
them in the packfile, saving the client from needing to request them
through a second connection.
I've barked up this whole tag fetch tree before 97716d217c (fetch: add a
--prune-tags option and fetch.pruneTags config, 2018-02-09) but really
not this specific area.
I wonder if longer term simply moving this to be work the client does
wouldn't make more sense. I.e. if we just delete this for_each_ref()
loop.
We're just saving a client from saying they "want" a tag. I.e. with the
whole thing removed we need this to make t5503-tagfollow.sh pass (see
[1] at the end for the dialog, the tag is 3253df4d...):
diff --git a/t/t5503-tagfollow.sh b/t/t5503-tagfollow.sh
index 6041a4dd32..1ddc430aef 100755
--- a/t/t5503-tagfollow.sh
+++ b/t/t5503-tagfollow.sh
@@ -134,6 +134,7 @@ test_expect_success 'fetch B, S (commit and tag : 1 connection)' '
test_expect_success 'setup expect' '
cat - <<EOF >expect
want $B
+want $T
want $S
EOF
'
@@ -146,6 +147,7 @@ test_expect_success 'new clone fetch master and tags' '
cd clone2 &&
git init &&
git remote add origin .. &&
+ git config --add remote.origin.fetch "+refs/tags/*:refs/tags/*" &&
GIT_TRACE_PACKET=$UPATH git fetch &&
test $B = $(git rev-parse --verify origin/master) &&
test $S = $(git rev-parse --verify tag2) &&
We're also saving the client the work of having to go through
refs/tags/* and figure out whether there are tags there that aren't on
our main history.
I think that since f0a24aa56e (git-pack-objects: Automatically pack
annotated tags if object was packed, 2008-03-03) was written it's become
clear that in the wild that's almost nobody's use-case. I.e. people with
a set of refs/heads/* branches and a refs/tags/* set that doesn't refer
to those branches.
Or if they do, I think it's such an obscure use-case that if we were
designing this today we could pass that work of figuring out if there's
such tags in refs/tags/* off to the client.
It seems we might just be able to delete this code on the server-side,
per protocol-capabilities.txt:
Clients MUST be prepared for the case where a server has ignored
include-tag and has not actually sent tags in the pack. In such
cases the client SHOULD issue a subsequent fetch to acquire the tags
that include-tag would have otherwise given the client.
I.e. in the case where the server isn't playing along and I haven't set
"+refs/tags/*:refs/tags/*". But as the test shows we don't do that
following ourselves unless refs/tags/* is in the refspec (and then it's
not really "following", we're just getting all the tags).
1. From t5503-tagfollow.sh:
$ grep -C5 3253df4d1cf6fb138b52b1938473bcfec1483223 UPLOAD_LOG
packet: upload-pack< ref-prefix refs/tags/
packet: upload-pack< ref-prefix refs/tags/
packet: upload-pack< 0000
packet: upload-pack> 8e10cf4e007ad7e003463c30c34b1050b039db78 refs/heads/master
packet: fetch< 8e10cf4e007ad7e003463c30c34b1050b039db78 refs/heads/master
packet: upload-pack> 3253df4d1cf6fb138b52b1938473bcfec1483223 refs/tags/tag1 peeled:c06aaaf9c64f9ef1f209bcb6d7bdba68a5e27fab
packet: fetch< 3253df4d1cf6fb138b52b1938473bcfec1483223 refs/tags/tag1 peeled:c06aaaf9c64f9ef1f209bcb6d7bdba68a5e27fab
packet: upload-pack> ddfa4a33562179aca1ace2bcc662244a17d0b503 refs/tags/tag2 peeled:8e10cf4e007ad7e003463c30c34b1050b039db78
packet: upload-pack> 0000
packet: fetch< ddfa4a33562179aca1ace2bcc662244a17d0b503 refs/tags/tag2 peeled:8e10cf4e007ad7e003463c30c34b1050b039db78
packet: fetch< 0000
packet: fetch> command=fetch
--
packet: fetch> 0001
packet: fetch> thin-pack
packet: fetch> include-tag
packet: fetch> ofs-delta
packet: fetch> want 8e10cf4e007ad7e003463c30c34b1050b039db78
packet: fetch> want 3253df4d1cf6fb138b52b1938473bcfec1483223
packet: fetch> want ddfa4a33562179aca1ace2bcc662244a17d0b503
packet: fetch> done
packet: fetch> 0000
packet: upload-pack< object-format=sha1
packet: upload-pack< command=fetch
--
packet: upload-pack< 0001
packet: upload-pack< thin-pack
packet: upload-pack< include-tag
packet: upload-pack< ofs-delta
packet: upload-pack< want 8e10cf4e007ad7e003463c30c34b1050b039db78
packet: upload-pack< want 3253df4d1cf6fb138b52b1938473bcfec1483223
packet: upload-pack< want ddfa4a33562179aca1ace2bcc662244a17d0b503
packet: upload-pack< done
packet: upload-pack< 0000
packet: upload-pack> packfile
packet: fetch< packfile
next prev parent reply other threads:[~2021-01-20 9:00 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-01-19 14:33 [PATCH 0/1] builtin/pack-objects.c: avoid iterating all refs Jacob Vosmaer
2021-01-19 14:33 ` [PATCH 1/1] " Jacob Vosmaer
2021-01-19 23:08 ` Taylor Blau
2021-01-19 23:33 ` Jeff King
2021-01-19 23:54 ` Taylor Blau
2021-01-19 23:15 ` Jeff King
2021-01-20 8:50 ` Ævar Arnfjörð Bjarmason [this message]
2021-01-20 15:02 ` Taylor Blau
2021-01-20 16:32 ` Ævar Arnfjörð Bjarmason
2021-01-20 19:53 ` Jeff King
2021-01-21 10:26 ` Ævar Arnfjörð Bjarmason
2021-01-21 15:34 ` Jeff King
2021-01-19 23:26 ` [PATCH 0/1] " Jeff King
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87lfco801g.fsf@evledraar.gmail.com \
--to=avarab@gmail.com \
--cc=git@vger.kernel.org \
--cc=jacob@gitlab.com \
--cc=me@ttaylorr.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).