git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Konstantin Tokarev <annulen@yandex.ru>
Cc: Derrick Stolee <stolee@gmail.com>,
	Jonathan Tan <jonathantanmy@google.com>,
	git@vger.kernel.org
Subject: [PATCH] clone: use "quick" lookup while following tags
Date: Wed, 1 Apr 2020 08:15:37 -0400	[thread overview]
Message-ID: <20200401121537.GA1916590@coredump.intra.peff.net> (raw)
In-Reply-To: <20200328144023.GB1198080@coredump.intra.peff.net>

On Sat, Mar 28, 2020 at 10:40:23AM -0400, Jeff King wrote:

> So I guess the problem is not with shallow clones specifically, but they
> lead us to not having fetched the commits pointed to by tags, which
> leads to us trying to fault in those commits (and their trees) rather
> than realizing that we weren't meant to have them. And the size of the
> local repo balloons because you're fetching all those commits one by
> one, and not getting the benefit of the deltas you would when you do a
> single --filter=blob:none fetch.
> 
> I guess we need something like this:

The issue is actually with --single-branch, which is implied by --depth.
But the fix is the same either way.

Here it is with a commit message and test.

-- >8 --
Subject: [PATCH] clone: use "quick" lookup while following tags

When cloning with --single-branch, we implement git-fetch's usual
tag-following behavior, grabbing any tag objects that point to objects
we have locally.

When we're a partial clone, though, our has_object_file() check will
actually lazy-fetch each tag. That not only defeats the purpose of
--single-branch, but it does it incredibly slowly, potentially kicking
off a new fetch for each tag. This is even worse for a shallow clone,
which implies --single-branch, because even tags which are supersets of
each other will be fetched individually.

We can fix this by passing OBJECT_INFO_SKIP_FETCH_OBJECT to the call,
which is what git-fetch does in this case.

Likewise, let's include OBJECT_INFO_QUICK, as that's what git-fetch
does. The rationale is discussed in 5827a03545 (fetch: use "quick"
has_sha1_file for tag following, 2016-10-13), but here the tradeoff
would apply even more so because clone is very unlikely to be racing
with another process repacking our newly-created repository.

This may provide a very small speedup even in the non-partial case case,
as we'd avoid calling reprepare_packed_git() for each tag (though in
practice, we'd only have a single packfile, so that reprepare should be
quite cheap).

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/clone.c          | 4 +++-
 t/t5616-partial-clone.sh | 8 ++++++++
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index d8b1f413aa..9da6459f1d 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -643,7 +643,9 @@ static void write_followtags(const struct ref *refs, const char *msg)
 			continue;
 		if (ends_with(ref->name, "^{}"))
 			continue;
-		if (!has_object_file(&ref->old_oid))
+		if (!has_object_file_with_flags(&ref->old_oid,
+						OBJECT_INFO_QUICK |
+						OBJECT_INFO_SKIP_FETCH_OBJECT))
 			continue;
 		update_ref(msg, ref->name, &ref->old_oid, NULL, 0,
 			   UPDATE_REFS_DIE_ON_ERR);
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index 77bb91e976..8f0d81a27e 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -415,6 +415,14 @@ test_expect_success 'verify fetch downloads only one pack when updating refs' '
 	test_line_count = 3 pack-list
 '
 
+test_expect_success 'single-branch tag following respects partial clone' '
+	git clone --single-branch -b B --filter=blob:none \
+		"file://$(pwd)/srv.bare" single &&
+	git -C single rev-parse --verify refs/tags/B &&
+	git -C single rev-parse --verify refs/tags/A &&
+	test_must_fail git -C single rev-parse --verify refs/tags/C
+'
+
 . "$TEST_DIRECTORY"/lib-httpd.sh
 start_httpd
 
-- 
2.26.0.408.gebd8a4413c


  parent reply	other threads:[~2020-04-01 12:15 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-27 21:08 Inefficiency of partial shallow clone vs shallow clone + "old-style" sparse checkout Konstantin Tokarev
2020-03-28 14:40 ` Jeff King
2020-03-28 16:58   ` Derrick Stolee
2020-03-31 21:46     ` Taylor Blau
2020-04-01 12:18       ` Jeff King
2020-03-31 22:10     ` Konstantin Tokarev
2020-03-31 22:23       ` Konstantin Tokarev
2020-04-01  0:09         ` Derrick Stolee
2020-04-01  1:49           ` Konstantin Tokarev
2020-04-01 11:44             ` Jeff King
2020-04-01 12:15   ` Jeff King [this message]
2020-04-01 19:12     ` [PATCH] clone: use "quick" lookup while following tags Konstantin Tokarev
2020-04-01 19:25       ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200401121537.GA1916590@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=annulen@yandex.ru \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).