All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Jonathan Tan <jonathantanmy@google.com>,
	Taylor Blau <me@ttaylorr.com>,
	Derrick Stolee <derrickstolee@github.com>,
	Derrick Stolee <dstolee@microsoft.com>
Subject: [PATCH] clone: --filter=tree:0 implies fetch.recurseSubmodules=no
Date: Fri, 20 Nov 2020 20:36:26 +0000	[thread overview]
Message-ID: <pull.797.git.1605904586929.gitgitgadget@gmail.com> (raw)

From: Derrick Stolee <dstolee@microsoft.com>

The partial clone feature has several modes, but only a few are quick
for a server to process using reachability bitmaps:

* Blobless: --filter=blob:none downloads all commits and trees and
  fetches necessary blobs on-demand.

* Treeless: --filter=tree:0 downloads all commits and fetches necessary
  trees and blobs on demand.

This treeles mode is most similar to a shallow clone in the total size
(it only adds the commit objects for the full history). This makes
treeless clones an interesting replacement for shallow clones. A user
can run more commands in a treeless clone than in a shallow clone,
especially 'git log' (no pathspec).

In particular, servers can still serve 'git fetch' requests quickly by
calculating the difference between commit wants and haves using bitmaps.

I was testing this feature with this in mind, and I knew that some trees
would be downloaded multiple times when checking out a new branch, but I
did not expect to discover a significant issue with 'git fetch', at
least in repostiories with submodules.

I was testing these commands:

	$ git clone --filter=tree:0 --single-branch --branch=master \
	  https://github.com/git/git
	$ git -C git fetch origin "+refs/heads/*:refs/remotes/origin/*"

This fetch command started downloading several pack-files of trees
before completing the command. I never let it finish since I got so
impatient with the repeated downloads. During debugging, I found that
the stack triggering promisor_remote_get_direct() was going through
fetch_populated_submodules(). Notice that I did not recurse my
submodules in the original clone, so the sha1collisiondetection
submodule is not initialized. Even so, my 'git fetch' was scanning
commits for updates to submodules.

I decided that even if I did populate the submodules, the nature of
treeless clones makes me not want to care about the contents of commits
other than those that I am explicitly navigating to.

This loop of tree fetches can be avoided by adding
--no-recurse-submodules to the 'git fetch' command or setting
fetch.recurseSubmodules=no.

To make this as painless as possible for future users of treeless
clones, automatically set fetch.recurseSubmodules=no at clone time.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
    clone: --filter=tree:0 implies fetch.recurseSubmodules=no
    
    While testing different partial clone options, I stumbled across this
    one. My initial thought was that we were parsing commits and loading
    their root trees unnecessarily, but I see that doesn't happen after this
    change.
    
    Here are some recent discussions about using --filter=tree:0:
    
    [1] 
    https://lore.kernel.org/git/aa7b89ee-08aa-7943-6a00-28dcf344426e@syntevo.com/
    [2] https://lore.kernel.org/git/cover.1588633810.git.me@ttaylorr.com/[3] 
    https://lore.kernel.org/git/58274817-7ac6-b6ae-0d10-22485dfe5e0e@syntevo.com/
    
    Thanks, -Stolee

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-797%2Fderrickstolee%2Ftree-0-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-797/derrickstolee/tree-0-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/797

 list-objects-filter-options.c | 4 ++++
 t/t5616-partial-clone.sh      | 6 ++++++
 2 files changed, 10 insertions(+)

diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index defd3dfd10..249939dfa5 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -376,6 +376,10 @@ void partial_clone_register(
 		       expand_list_objects_filter_spec(filter_options));
 	free(filter_name);
 
+	if (filter_options->choice == LOFC_TREE_DEPTH &&
+	    !filter_options->tree_exclude_depth)
+		git_config_set("fetch.recursesubmodules", "no");
+
 	/* Make sure the config info are reset */
 	promisor_remote_reinit();
 }
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index f4d49d8335..b2eaf78069 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -341,6 +341,12 @@ test_expect_success 'partial clone with sparse filter succeeds' '
 	)
 '
 
+test_expect_success '--filter=tree:0 sets fetch.recurseSubmodules=no' '
+	rm -rf dst &&
+	git clone --filter=tree:0 "file://$(pwd)/src" dst &&
+	test_config -C dst fetch.recursesubmodules no
+'
+
 test_expect_success 'partial clone with unresolvable sparse filter fails cleanly' '
 	rm -rf dst.git &&
 	test_must_fail git clone --no-local --bare \

base-commit: faefdd61ec7c7f6f3c8c9907891465ac9a2a1475
-- 
gitgitgadget

             reply	other threads:[~2020-11-20 20:36 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-20 20:36 Derrick Stolee via GitGitGadget [this message]
2020-11-21  0:04 ` [PATCH] clone: --filter=tree:0 implies fetch.recurseSubmodules=no Jeff King
2020-11-23 15:18   ` Derrick Stolee
2020-11-24  8:04     ` Jeff King
2020-11-21 16:19 ` Philippe Blain

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=pull.797.git.1605904586929.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    --cc=me@ttaylorr.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.