From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Jonathan Tan <jonathantanmy@google.com>,
Taylor Blau <me@ttaylorr.com>,
Derrick Stolee <derrickstolee@github.com>,
Derrick Stolee <dstolee@microsoft.com>
Subject: [PATCH] clone: --filter=tree:0 implies fetch.recurseSubmodules=no
Date: Fri, 20 Nov 2020 20:36:26 +0000
Message-ID: <pull.797.git.1605904586929.gitgitgadget@gmail.com>
From: Derrick Stolee <dstolee@microsoft.com>
The partial clone feature has several modes, but only a few are quick
for a server to process using reachability bitmaps:
* Blobless: --filter=blob:none downloads all commits and trees and
fetches necessary blobs on-demand.
* Treeless: --filter=tree:0 downloads all commits and fetches necessary
trees and blobs on demand.
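To make the two modes concrete, here is a small C model of which object
types each clone downloads up front. It is an illustrative sketch only:
the enum, the struct, and eager_objects_for() are hypothetical names,
not git's internals, and the unfiltered clone is included as a baseline.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the partial clone modes described above. */
enum clone_mode { CLONE_FULL, CLONE_BLOBLESS, CLONE_TREELESS };

/* Which object types arrive in the initial clone (true = eagerly). */
struct eager_objects {
	bool commits;
	bool trees;
	bool blobs;
};

static struct eager_objects eager_objects_for(enum clone_mode mode)
{
	static const struct eager_objects table[] = {
		[CLONE_FULL]     = { true, true,  true  },
		[CLONE_BLOBLESS] = { true, true,  false }, /* --filter=blob:none */
		[CLONE_TREELESS] = { true, false, false }, /* --filter=tree:0 */
	};

	/* Reject values outside the table. */
	assert((size_t)mode < sizeof(table) / sizeof(table[0]));
	return table[mode];
}
```

Anything marked false is fetched later from the promisor remote when a
command actually needs it.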
This treeless mode is most similar to a shallow clone in total size
(it only adds the commit objects for the full history). This makes
treeless clones an interesting replacement for shallow clones. A user
can run more commands in a treeless clone than in a shallow clone,
especially 'git log' (no pathspec).
In particular, servers can still serve 'git fetch' requests quickly by
calculating the difference between commit wants and haves using bitmaps.
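The wants/haves computation is a set difference. A toy sketch, assuming
one bit per commit packed into a uint64_t; git's real reachability
bitmaps are EWAH-compressed and index every object in the pack, and
commit_set, commit_bit() and objects_to_send() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Toy bitmap: bit i set means "commit i is reachable". */
typedef uint64_t commit_set;

static commit_set commit_bit(unsigned idx)
{
	assert(idx < 64); /* the toy model only has room for 64 commits */
	return (commit_set)1 << idx;
}

/* Send what is reachable from the wants but not from the haves. */
static commit_set objects_to_send(commit_set reach_wants,
				  commit_set reach_haves)
{
	return reach_wants & ~reach_haves;
}
```

The point is that this is a couple of bitwise operations over
precomputed bitmaps rather than a commit-by-commit walk, which is why
such fetches stay cheap for the server.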
I was testing this feature with this in mind, and I knew that some trees
would be downloaded multiple times when checking out a new branch, but I
did not expect to discover a significant issue with 'git fetch', at
least in repositories with submodules.
I was testing these commands:
$ git clone --filter=tree:0 --single-branch --branch=master \
https://github.com/git/git
$ git -C git fetch origin "+refs/heads/*:refs/remotes/origin/*"
This fetch command started downloading several pack-files of trees
before completing the command. I never let it finish since I got so
impatient with the repeated downloads. During debugging, I found that
the stack triggering promisor_remote_get_direct() was going through
fetch_populated_submodules(). Notice that I did not recurse my
submodules in the original clone, so the sha1collisiondetection
submodule is not initialized. Even so, my 'git fetch' was scanning
commits for updates to submodules.
I decided that even if I did populate the submodules, the nature of
treeless clones makes me not want to care about the contents of commits
other than those that I am explicitly navigating to.
This loop of tree fetches can be avoided by adding
--no-recurse-submodules to the 'git fetch' command or setting
fetch.recurseSubmodules=no.
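The resulting precedence can be sketched as follows, assuming the
command-line flag overrides fetch.recurseSubmodules and that an unset
config falls back to the documented default of on-demand. The
per-submodule submodule.<name>.fetchRecurseSubmodules setting,
submodule.recurse, and boolean synonyms are left out, and every name
here is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical modes; RECURSE_UNSET means "not specified here". */
enum recurse_mode { RECURSE_UNSET, RECURSE_NO, RECURSE_YES, RECURSE_ON_DEMAND };

/* Map a fetch.recurseSubmodules value to a mode (subset of values). */
static enum recurse_mode parse_recurse(const char *value)
{
	if (!value)
		return RECURSE_UNSET;
	if (!strcmp(value, "no"))
		return RECURSE_NO;
	if (!strcmp(value, "yes"))
		return RECURSE_YES;
	if (!strcmp(value, "on-demand"))
		return RECURSE_ON_DEMAND;
	return RECURSE_UNSET;
}

/* Command-line flag wins, then config, then the on-demand default. */
static enum recurse_mode effective_recurse(enum recurse_mode cli,
					   const char *config_value)
{
	enum recurse_mode cfg = parse_recurse(config_value);

	if (cli != RECURSE_UNSET)
		return cli;
	if (cfg != RECURSE_UNSET)
		return cfg;
	return RECURSE_ON_DEMAND;
}
```

With no flag and no config, the on-demand default is what sends 'git
fetch' scanning new commits for submodule changes.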
To make this as painless as possible for future users of treeless
clones, automatically set fetch.recurseSubmodules=no at clone time.
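The clone-time rule comes down to recognizing a depth-0 tree filter. A
hedged sketch that works on the raw filter string; the patch itself
checks the parsed options (choice == LOFC_TREE_DEPTH &&
!tree_exclude_depth), and filter_is_treeless() is a hypothetical helper
that does not handle combined filters.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: true only for "tree:<depth>" with depth == 0. */
static bool filter_is_treeless(const char *spec)
{
	const char *prefix = "tree:";
	const char *digits;
	char *end;
	unsigned long depth;

	assert(spec != NULL);
	if (strncmp(spec, prefix, strlen(prefix)) != 0)
		return false;
	digits = spec + strlen(prefix);
	/* Require a digit so "tree:" and "tree:-1" are rejected. */
	if (!isdigit((unsigned char)*digits))
		return false;
	errno = 0;
	depth = strtoul(digits, &end, 10);
	/* Reject overflow and trailing junk such as "tree:0x". */
	if (errno != 0 || *end != '\0')
		return false;
	return depth == 0;
}
```

A caller that gets true would then write fetch.recurseSubmodules=no into
the new repository's config, which the patch does with git_config_set().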
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
clone: --filter=tree:0 implies fetch.recurseSubmodules=no
While testing different partial clone options, I stumbled across this
one. My initial thought was that we were parsing commits and loading
their root trees unnecessarily, but I see that doesn't happen after this
change.
Here are some recent discussions about using --filter=tree:0:
[1] https://lore.kernel.org/git/aa7b89ee-08aa-7943-6a00-28dcf344426e@syntevo.com/
[2] https://lore.kernel.org/git/cover.1588633810.git.me@ttaylorr.com/
[3] https://lore.kernel.org/git/58274817-7ac6-b6ae-0d10-22485dfe5e0e@syntevo.com/
Thanks,
-Stolee
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-797%2Fderrickstolee%2Ftree-0-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-797/derrickstolee/tree-0-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/797
list-objects-filter-options.c | 4 ++++
t/t5616-partial-clone.sh | 6 ++++++
2 files changed, 10 insertions(+)
diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index defd3dfd10..249939dfa5 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -376,6 +376,10 @@ void partial_clone_register(
expand_list_objects_filter_spec(filter_options));
free(filter_name);
+ if (filter_options->choice == LOFC_TREE_DEPTH &&
+ !filter_options->tree_exclude_depth)
+ git_config_set("fetch.recursesubmodules", "no");
+
/* Make sure the config info are reset */
promisor_remote_reinit();
}
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index f4d49d8335..b2eaf78069 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -341,6 +341,12 @@ test_expect_success 'partial clone with sparse filter succeeds' '
)
'
+test_expect_success '--filter=tree:0 sets fetch.recurseSubmodules=no' '
+ rm -rf dst &&
+ git clone --filter=tree:0 "file://$(pwd)/src" dst &&
+ test_cmp_config -C dst no fetch.recursesubmodules
+'
+
test_expect_success 'partial clone with unresolvable sparse filter fails cleanly' '
rm -rf dst.git &&
test_must_fail git clone --no-local --bare \
base-commit: faefdd61ec7c7f6f3c8c9907891465ac9a2a1475
--
gitgitgadget
Thread overview: 5+ messages
2020-11-20 20:36 Derrick Stolee via GitGitGadget [this message]
2020-11-21 0:04 ` [PATCH] clone: --filter=tree:0 implies fetch.recurseSubmodules=no Jeff King
2020-11-23 15:18 ` Derrick Stolee
2020-11-24 8:04 ` Jeff King
2020-11-21 16:19 ` Philippe Blain