All of lore.kernel.org
 help / color / mirror / Atom feed
From: Derrick Stolee <stolee@gmail.com>
To: Jeff King <peff@peff.net>, Konstantin Tokarev <annulen@yandex.ru>
Cc: Jonathan Tan <jonathantanmy@google.com>, git@vger.kernel.org
Subject: Re: Inefficiency of partial shallow clone vs shallow clone + "old-style" sparse checkout
Date: Sat, 28 Mar 2020 12:58:41 -0400	[thread overview]
Message-ID: <decf87bb-dffc-e44e-912e-fe51bc2514c3@gmail.com> (raw)
In-Reply-To: <20200328144023.GB1198080@coredump.intra.peff.net>

On 3/28/2020 10:40 AM, Jeff King wrote:
> On Sat, Mar 28, 2020 at 12:08:17AM +0300, Konstantin Tokarev wrote:
> 
>> Is it a known thing that addition of --filter=blob:none to workflow
>> with shalow clone (e.g. --depth=1) and following sparse checkout may
>> significantly slow down process and result in much larger .git
>> repository?

In general, I would recommend not using shallow clones in conjunction
with partial clone. The blob:none filter will get you what you really
want from shallow clone without any of the downsides of shallow clone.

You do point out a bug that happens when these features are combined,
which is helpful. I'm just recommending that you do not combine these
features as you'll have a better experience (in my opinion).

>> In case anyone is interested, I've posted my measurements at [1].
>>
>> I understand this may have something to do with GitHub's server side
>> implementation, but AFAIK there are some GitHub folks here as well.
> 
> I think the problem is on the client side. Just with a local git.git
> clone, try this:
> 
>   $ git config uploadpack.allowfilter true
>   $ git clone --no-local --bare --depth=1 --filter=blob:none . both.git
>   Cloning into bare repository 'both.git'...
>   remote: Enumerating objects: 197, done.
>   remote: Counting objects: 100% (197/197), done.
>   remote: Compressing objects: 100% (153/153), done.
>   remote: Total 197 (delta 3), reused 171 (delta 3), pack-reused 0
>   Receiving objects: 100% (197/197), 113.63 KiB | 28.41 MiB/s, done.
>   Resolving deltas: 100% (3/3), done.
>   remote: Enumerating objects: 1871, done.
>   remote: Counting objects: 100% (1871/1871), done.
>   remote: Compressing objects: 100% (870/870), done.
>   remote: Total 1871 (delta 1001), reused 1855 (delta 994), pack-reused 0
>   Receiving objects: 100% (1871/1871), 384.93 KiB | 38.49 MiB/s, done.
>   Resolving deltas: 100% (1001/1001), done.
>   remote: Enumerating objects: 1878, done.
>   remote: Counting objects: 100% (1878/1878), done.
>   remote: Compressing objects: 100% (872/872), done.
>   remote: Total 1878 (delta 1004), reused 1864 (delta 999), pack-reused 0
>   Receiving objects: 100% (1878/1878), 386.41 KiB | 25.76 MiB/s, done.
>   Resolving deltas: 100% (1004/1004), done.
>   remote: Enumerating objects: 1903, done.
>   remote: Counting objects: 100% (1903/1903), done.
>   remote: Compressing objects: 100% (882/882), done.
>   remote: Total 1903 (delta 1020), reused 1890 (delta 1014), pack-reused 0
>   Receiving objects: 100% (1903/1903), 391.05 KiB | 16.29 MiB/s, done.
>   Resolving deltas: 100% (1020/1020), done.
>   remote: Enumerating objects: 1975, done.
>   remote: Counting objects: 100% (1975/1975), done.
>   remote: Compressing objects: 100% (915/915), done.
>   remote: Total 1975 (delta 1059), reused 1959 (delta 1052), pack-reused 0
>   Receiving objects: 100% (1975/1975), 405.58 KiB | 16.90 MiB/s, done.
>   Resolving deltas: 100% (1059/1059), done.
>   [...and so on...]
> 
> Oops. The backtrace for the clone during this process looks like:
> 
>   [...]
>   #11 0x000055b980be01dc in fetch_objects (remote_name=0x55b981607620 "origin", oids=0x55b9816217a8, oid_nr=1)
>       at promisor-remote.c:47
>   #12 0x000055b980be0812 in promisor_remote_get_direct (repo=0x55b980dcab00 <the_repo>, oids=0x55b9816217a8, oid_nr=1)
>       at promisor-remote.c:247
>   #13 0x000055b980c3e475 in do_oid_object_info_extended (r=0x55b980dcab00 <the_repo>, oid=0x55b9816217a8, 
>       oi=0x55b980dcaec0 <blank_oi>, flags=0) at sha1-file.c:1511
>   #14 0x000055b980c3e579 in oid_object_info_extended (r=0x55b980dcab00 <the_repo>, oid=0x55b9816217a8, oi=0x0, flags=0)
>       at sha1-file.c:1544
>   #15 0x000055b980c3f7bc in repo_has_object_file_with_flags (r=0x55b980dcab00 <the_repo>, oid=0x55b9816217a8, flags=0)
>       at sha1-file.c:1980
>   #16 0x000055b980c3f7ee in repo_has_object_file (r=0x55b980dcab00 <the_repo>, oid=0x55b9816217a8) at sha1-file.c:1986
>   #17 0x000055b980a54533 in write_followtags (refs=0x55b981610900, 
>       msg=0x55b981601230 "clone: from /home/peff/compile/git/.") at builtin/clone.c:646
>   #18 0x000055b980a54723 in update_remote_refs (refs=0x55b981610900, mapped_refs=0x55b98160fe20, 
>       remote_head_points_at=0x0, branch_top=0x55b981601130 "refs/heads/", 
>       msg=0x55b981601230 "clone: from /home/peff/compile/git/.", transport=0x55b98160da90, check_connectivity=1, 
>       check_refs_are_promisor_objects_only=1) at builtin/clone.c:699
>   #19 0x000055b980a5625b in cmd_clone (argc=2, argv=0x7fff5e0a1e70, prefix=0x0) at builtin/clone.c:1280
>   [...]
> 
> So I guess the problem is not with shallow clones specifically, but they
> lead us to not having fetched the commits pointed to by tags, which
> leads to us trying to fault in those commits (and their trees) rather
> than realizing that we weren't meant to have them. And the size of the
> local repo balloons because you're fetching all those commits one by
> one, and not getting the benefit of the deltas you would when you do a
> single --filter=blob:none fetch.
> 
> I guess we need something like this:
> 
> diff --git a/builtin/clone.c b/builtin/clone.c
> index 488bdb0741..a1879994f5 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -643,7 +643,8 @@ static void write_followtags(const struct ref *refs, const char *msg)
>  			continue;
>  		if (ends_with(ref->name, "^{}"))
>  			continue;
> -		if (!has_object_file(&ref->old_oid))
> +		if (!has_object_file_with_flags(&ref->old_oid,
> +						OBJECT_INFO_SKIP_FETCH_OBJECT))
>  			continue;
>  		update_ref(msg, ref->name, &ref->old_oid, NULL, 0,
>  			   UPDATE_REFS_DIE_ON_ERR);
> 
> which seems to produce the desired result.

This is a good find, and I expect we will find more "opportunities"
to insert OBJECT_INFO_SKIP_FETCH_OBJECT like this.

-Stolee


  reply	other threads:[~2020-03-28 16:58 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-27 21:08 Inefficiency of partial shallow clone vs shallow clone + "old-style" sparse checkout Konstantin Tokarev
2020-03-28 14:40 ` Jeff King
2020-03-28 16:58   ` Derrick Stolee [this message]
2020-03-31 21:46     ` Taylor Blau
2020-04-01 12:18       ` Jeff King
2020-03-31 22:10     ` Konstantin Tokarev
2020-03-31 22:23       ` Konstantin Tokarev
2020-04-01  0:09         ` Derrick Stolee
2020-04-01  1:49           ` Konstantin Tokarev
2020-04-01 11:44             ` Jeff King
2020-04-01 12:15   ` [PATCH] clone: use "quick" lookup while following tags Jeff King
2020-04-01 19:12     ` Konstantin Tokarev
2020-04-01 19:25       ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=decf87bb-dffc-e44e-912e-fe51bc2514c3@gmail.com \
    --to=stolee@gmail.com \
    --cc=annulen@yandex.ru \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.