git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: Derrick Stolee <stolee@gmail.com>
Cc: Jeff King <peff@peff.net>, Konstantin Tokarev <annulen@yandex.ru>,
	Jonathan Tan <jonathantanmy@google.com>,
	git@vger.kernel.org
Subject: Re: Inefficiency of partial shallow clone vs shallow clone + "old-style" sparse checkout
Date: Tue, 31 Mar 2020 15:46:53 -0600	[thread overview]
Message-ID: <20200331214653.GA95875@syl.local> (raw)
In-Reply-To: <decf87bb-dffc-e44e-912e-fe51bc2514c3@gmail.com>

On Sat, Mar 28, 2020 at 12:58:41PM -0400, Derrick Stolee wrote:
> On 3/28/2020 10:40 AM, Jeff King wrote:
> > On Sat, Mar 28, 2020 at 12:08:17AM +0300, Konstantin Tokarev wrote:
> >
> >> Is it a known thing that addition of --filter=blob:none to workflow
> >> with shalow clone (e.g. --depth=1) and following sparse checkout may
> >> significantly slow down process and result in much larger .git
> >> repository?
>
> In general, I would recommend not using shallow clones in conjunction
> with partial clone. The blob:none filter will get you what you really
> want from shallow clone without any of the downsides of shallow clone.
>
> You do point out a bug that happens when these features are combined,
> which is helpful. I'm just recommending that you do not combine these
> features as you'll have a better experience (in my opinion).
>
> >> In case anyone is interested, I've posted my measurements at [1].
> >>
> >> I understand this may have something to do with GitHub's server side
> >> implementation, but AFAIK there are some GitHub folks here as well.
> >
> > I think the problem is on the client side. Just with a local git.git
> > clone, try this:
> >
> >   $ git config uploadpack.allowfilter true
> >   $ git clone --no-local --bare --depth=1 --filter=blob:none . both.git
> >   Cloning into bare repository 'both.git'...
> >   remote: Enumerating objects: 197, done.
> >   remote: Counting objects: 100% (197/197), done.
> >   remote: Compressing objects: 100% (153/153), done.
> >   remote: Total 197 (delta 3), reused 171 (delta 3), pack-reused 0
> >   Receiving objects: 100% (197/197), 113.63 KiB | 28.41 MiB/s, done.
> >   Resolving deltas: 100% (3/3), done.
> >   remote: Enumerating objects: 1871, done.
> >   remote: Counting objects: 100% (1871/1871), done.
> >   remote: Compressing objects: 100% (870/870), done.
> >   remote: Total 1871 (delta 1001), reused 1855 (delta 994), pack-reused 0
> >   Receiving objects: 100% (1871/1871), 384.93 KiB | 38.49 MiB/s, done.
> >   Resolving deltas: 100% (1001/1001), done.
> >   remote: Enumerating objects: 1878, done.
> >   remote: Counting objects: 100% (1878/1878), done.
> >   remote: Compressing objects: 100% (872/872), done.
> >   remote: Total 1878 (delta 1004), reused 1864 (delta 999), pack-reused 0
> >   Receiving objects: 100% (1878/1878), 386.41 KiB | 25.76 MiB/s, done.
> >   Resolving deltas: 100% (1004/1004), done.
> >   remote: Enumerating objects: 1903, done.
> >   remote: Counting objects: 100% (1903/1903), done.
> >   remote: Compressing objects: 100% (882/882), done.
> >   remote: Total 1903 (delta 1020), reused 1890 (delta 1014), pack-reused 0
> >   Receiving objects: 100% (1903/1903), 391.05 KiB | 16.29 MiB/s, done.
> >   Resolving deltas: 100% (1020/1020), done.
> >   remote: Enumerating objects: 1975, done.
> >   remote: Counting objects: 100% (1975/1975), done.
> >   remote: Compressing objects: 100% (915/915), done.
> >   remote: Total 1975 (delta 1059), reused 1959 (delta 1052), pack-reused 0
> >   Receiving objects: 100% (1975/1975), 405.58 KiB | 16.90 MiB/s, done.
> >   Resolving deltas: 100% (1059/1059), done.
> >   [...and so on...]
> >
> > Oops. The backtrace for the clone during this process looks like:
> >
> >   [...]
> >   #11 0x000055b980be01dc in fetch_objects (remote_name=0x55b981607620 "origin", oids=0x55b9816217a8, oid_nr=1)
> >       at promisor-remote.c:47
> >   #12 0x000055b980be0812 in promisor_remote_get_direct (repo=0x55b980dcab00 <the_repo>, oids=0x55b9816217a8, oid_nr=1)
> >       at promisor-remote.c:247
> >   #13 0x000055b980c3e475 in do_oid_object_info_extended (r=0x55b980dcab00 <the_repo>, oid=0x55b9816217a8,
> >       oi=0x55b980dcaec0 <blank_oi>, flags=0) at sha1-file.c:1511
> >   #14 0x000055b980c3e579 in oid_object_info_extended (r=0x55b980dcab00 <the_repo>, oid=0x55b9816217a8, oi=0x0, flags=0)
> >       at sha1-file.c:1544
> >   #15 0x000055b980c3f7bc in repo_has_object_file_with_flags (r=0x55b980dcab00 <the_repo>, oid=0x55b9816217a8, flags=0)
> >       at sha1-file.c:1980
> >   #16 0x000055b980c3f7ee in repo_has_object_file (r=0x55b980dcab00 <the_repo>, oid=0x55b9816217a8) at sha1-file.c:1986
> >   #17 0x000055b980a54533 in write_followtags (refs=0x55b981610900,
> >       msg=0x55b981601230 "clone: from /home/peff/compile/git/.") at builtin/clone.c:646
> >   #18 0x000055b980a54723 in update_remote_refs (refs=0x55b981610900, mapped_refs=0x55b98160fe20,
> >       remote_head_points_at=0x0, branch_top=0x55b981601130 "refs/heads/",
> >       msg=0x55b981601230 "clone: from /home/peff/compile/git/.", transport=0x55b98160da90, check_connectivity=1,
> >       check_refs_are_promisor_objects_only=1) at builtin/clone.c:699
> >   #19 0x000055b980a5625b in cmd_clone (argc=2, argv=0x7fff5e0a1e70, prefix=0x0) at builtin/clone.c:1280
> >   [...]
> >
> > So I guess the problem is not with shallow clones specifically, but they
> > lead us to not having fetched the commits pointed to by tags, which
> > leads to us trying to fault in those commits (and their trees) rather
> > than realizing that we weren't meant to have them. And the size of the
> > local repo balloons because you're fetching all those commits one by
> > one, and not getting the benefit of the deltas you would when you do a
> > single --filter=blob:none fetch.
> >
> > I guess we need something like this:
> >
> > diff --git a/builtin/clone.c b/builtin/clone.c
> > index 488bdb0741..a1879994f5 100644
> > --- a/builtin/clone.c
> > +++ b/builtin/clone.c
> > @@ -643,7 +643,8 @@ static void write_followtags(const struct ref *refs, const char *msg)
> >  			continue;
> >  		if (ends_with(ref->name, "^{}"))
> >  			continue;
> > -		if (!has_object_file(&ref->old_oid))
> > +		if (!has_object_file_with_flags(&ref->old_oid,
> > +						OBJECT_INFO_SKIP_FETCH_OBJECT))
> >  			continue;
> >  		update_ref(msg, ref->name, &ref->old_oid, NULL, 0,
> >  			   UPDATE_REFS_DIE_ON_ERR);
> >
> > which seems to produce the desired result.
>
> This is a good find, and I expect we will find more "opportunities"
> to insert OBJECT_INFO_SKIP_FETCH_OBJECT like this.

Should we turn this into a proper patch and have it reviewed? It seems
to be helping the situation, and after thinking about it (only briefly,
but more than not ;-)), this seems like the right direction.

There's an argument to be had about bundling a number of these up
instead of having a slow drip of patches that sprinkle
'SKIP_FETCH_OBJECT' everywhere, but I don't think that we want perfect
to be the enemy of good here.

Peff, if you don't feel like doing this, or have a backlog that is too
long, I'd be happy to polish this for you.

> -Stolee

Thanks,
Taylor

  reply	other threads:[~2020-03-31 21:47 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-27 21:08 Inefficiency of partial shallow clone vs shallow clone + "old-style" sparse checkout Konstantin Tokarev
2020-03-28 14:40 ` Jeff King
2020-03-28 16:58   ` Derrick Stolee
2020-03-31 21:46     ` Taylor Blau [this message]
2020-04-01 12:18       ` Jeff King
2020-03-31 22:10     ` Konstantin Tokarev
2020-03-31 22:23       ` Konstantin Tokarev
2020-04-01  0:09         ` Derrick Stolee
2020-04-01  1:49           ` Konstantin Tokarev
2020-04-01 11:44             ` Jeff King
2020-04-01 12:15   ` [PATCH] clone: use "quick" lookup while following tags Jeff King
2020-04-01 19:12     ` Konstantin Tokarev
2020-04-01 19:25       ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200331214653.GA95875@syl.local \
    --to=me@ttaylorr.com \
    --cc=annulen@yandex.ru \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    --cc=peff@peff.net \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).