git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Jacob Vosmaer <jacob@gitlab.com>
Cc: git@vger.kernel.org, Taylor Blau <me@ttaylorr.com>,
	Jeff King <peff@peff.net>
Subject: Re: [PATCH 1/1] builtin/pack-objects.c: avoid iterating all refs
Date: Wed, 20 Jan 2021 09:50:19 +0100	[thread overview]
Message-ID: <87lfco801g.fsf@evledraar.gmail.com> (raw)
In-Reply-To: <20210119143348.27535-2-jacob@gitlab.com>


On Tue, Jan 19 2021, Jacob Vosmaer wrote:

> In git-pack-objects, we iterate over all the tags if the --include-tag
> option is passed on the command line. For some reason this uses
> for_each_ref which is expensive if the repo has many refs. We should
> use a prefix iterator instead.
>
> Signed-off-by: Jacob Vosmaer <jacob@gitlab.com>
> ---
>  builtin/pack-objects.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 2a00358f34..2b32bc93bd 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -3740,7 +3740,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
>  	}
>  	cleanup_preferred_base();
>  	if (include_tag && nr_result)
> -		for_each_ref(add_ref_tag, NULL);
> +		for_each_fullref_in("refs/tags/", add_ref_tag, NULL, 0);
>  	stop_progress(&progress_state);
>  	trace2_region_leave("pack-objects", "enumerate-objects",
>  			    the_repository);

Not really a new problem, but it would be nice to also have a test in
t5305-include-tag.sh to check what happens with "tags" outside the
refs/tags/ namespace. I.e. if this & the existing prefix in
add_ref_tag() were dropped.

To elaborate on other things that aren't really your problem & Taylor's
E-Mail downthread we originally added this because:

    If new annotated tags have been introduced then we can also include
    them in the packfile, saving the client from needing to request them
    through a second connection.

I've barked up this whole tag fetch tree before 97716d217c (fetch: add a
--prune-tags option and fetch.pruneTags config, 2018-02-09) but really
not this specific area.

I wonder if longer term simply moving this to be work the client does
wouldn't make more sense. I.e. if we just delete this for_each_ref()
loop.
    
We're just saving a client from saying they "want" a tag. I.e. with the
whole thing removed we need this to make t5503-tagfollow.sh pass (see
[1] at the end for the dialog, the tag is 3253df4d...):

    diff --git a/t/t5503-tagfollow.sh b/t/t5503-tagfollow.sh
    index 6041a4dd32..1ddc430aef 100755
    --- a/t/t5503-tagfollow.sh
    +++ b/t/t5503-tagfollow.sh
    @@ -134,6 +134,7 @@ test_expect_success 'fetch B, S (commit and tag : 1 connection)' '
     test_expect_success 'setup expect' '
     cat - <<EOF >expect
     want $B
    +want $T
     want $S
     EOF
     '
    @@ -146,6 +147,7 @@ test_expect_success 'new clone fetch master and tags' '
                    cd clone2 &&
                    git init &&
                    git remote add origin .. &&
    +               git config --add remote.origin.fetch "+refs/tags/*:refs/tags/*" &&
                    GIT_TRACE_PACKET=$UPATH git fetch &&
                    test $B = $(git rev-parse --verify origin/master) &&
                    test $S = $(git rev-parse --verify tag2) &&

We're also saving the client the work of having to go through
refs/tags/* and figure out whether there are tags there that aren't on
our main history.

I think that since f0a24aa56e (git-pack-objects: Automatically pack
annotated tags if object was packed, 2008-03-03) was written it's become
clear that in the wild that's almost nobody's use-case. I.e. people with
a set of refs/heads/* branches and a refs/tags/* set that doesn't refer
to those branches.

Or if they do, I think it's such an obscure use-case that if we were
designing this today we could pass that work of figuring out if there's
such tags in refs/tags/* off to the client.

It seems we might just be able to delete this code on the server-side,
per protocol-capabilities.txt:

    Clients MUST be prepared for the case where a server has ignored
    include-tag and has not actually sent tags in the pack.  In such
    cases the client SHOULD issue a subsequent fetch to acquire the tags
    that include-tag would have otherwise given the client.

I.e. in the case where the server isn't playing along and I haven't set
"+refs/tags/*:refs/tags/*". But as the test shows we don't do that
following ourselves unless refs/tags/* is in the refspec (and then it's
not really "following", we're just getting all the tags).

1. From t5503-tagfollow.sh:
    $ grep -C5 3253df4d1cf6fb138b52b1938473bcfec1483223 UPLOAD_LOG 
    packet:  upload-pack< ref-prefix refs/tags/
    packet:  upload-pack< ref-prefix refs/tags/
    packet:  upload-pack< 0000
    packet:  upload-pack> 8e10cf4e007ad7e003463c30c34b1050b039db78 refs/heads/master
    packet:        fetch< 8e10cf4e007ad7e003463c30c34b1050b039db78 refs/heads/master
    packet:  upload-pack> 3253df4d1cf6fb138b52b1938473bcfec1483223 refs/tags/tag1 peeled:c06aaaf9c64f9ef1f209bcb6d7bdba68a5e27fab
    packet:        fetch< 3253df4d1cf6fb138b52b1938473bcfec1483223 refs/tags/tag1 peeled:c06aaaf9c64f9ef1f209bcb6d7bdba68a5e27fab
    packet:  upload-pack> ddfa4a33562179aca1ace2bcc662244a17d0b503 refs/tags/tag2 peeled:8e10cf4e007ad7e003463c30c34b1050b039db78
    packet:  upload-pack> 0000
    packet:        fetch< ddfa4a33562179aca1ace2bcc662244a17d0b503 refs/tags/tag2 peeled:8e10cf4e007ad7e003463c30c34b1050b039db78
    packet:        fetch< 0000
    packet:        fetch> command=fetch
    --
    packet:        fetch> 0001
    packet:        fetch> thin-pack
    packet:        fetch> include-tag
    packet:        fetch> ofs-delta
    packet:        fetch> want 8e10cf4e007ad7e003463c30c34b1050b039db78
    packet:        fetch> want 3253df4d1cf6fb138b52b1938473bcfec1483223
    packet:        fetch> want ddfa4a33562179aca1ace2bcc662244a17d0b503
    packet:        fetch> done
    packet:        fetch> 0000
    packet:  upload-pack< object-format=sha1
    packet:  upload-pack< command=fetch
    --
    packet:  upload-pack< 0001
    packet:  upload-pack< thin-pack
    packet:  upload-pack< include-tag
    packet:  upload-pack< ofs-delta
    packet:  upload-pack< want 8e10cf4e007ad7e003463c30c34b1050b039db78
    packet:  upload-pack< want 3253df4d1cf6fb138b52b1938473bcfec1483223
    packet:  upload-pack< want ddfa4a33562179aca1ace2bcc662244a17d0b503
    packet:  upload-pack< done
    packet:  upload-pack< 0000
    packet:  upload-pack> packfile
    packet:        fetch< packfile
    

  parent reply	other threads:[~2021-01-20  9:00 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-19 14:33 [PATCH 0/1] builtin/pack-objects.c: avoid iterating all refs Jacob Vosmaer
2021-01-19 14:33 ` [PATCH 1/1] " Jacob Vosmaer
2021-01-19 23:08   ` Taylor Blau
2021-01-19 23:33     ` Jeff King
2021-01-19 23:54       ` Taylor Blau
2021-01-19 23:15   ` Jeff King
2021-01-20  8:50   ` Ævar Arnfjörð Bjarmason [this message]
2021-01-20 15:02     ` Taylor Blau
2021-01-20 16:32       ` Ævar Arnfjörð Bjarmason
2021-01-20 19:53     ` Jeff King
2021-01-21 10:26       ` Ævar Arnfjörð Bjarmason
2021-01-21 15:34         ` Jeff King
2021-01-19 23:26 ` [PATCH 0/1] " Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87lfco801g.fsf@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=jacob@gitlab.com \
    --cc=me@ttaylorr.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).