From: "Kevin Lyles via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Kevin Lyles <klyles+github@epic.com>,
    Kevin Lyles <klyles+github@epic.com>
Subject: [PATCH] Mark `cat-file` sparse-index compatible
Date: Mon, 26 Aug 2024 18:08:52 +0000
Message-ID: <pull.1770.git.git.1724695732305.gitgitgadget@gmail.com>

From: Kevin Lyles <klyles+github@epic.com>

`cat-file` will expand a sparse index to a full index when needed, but
is currently marked as needing a full index (or rather, not marked as
not needing a full index). This results in much slower `cat-file`
operations when working within the sparse index, since we expand the
index whether it's needed or not.

Mark `cat-file` as not needing a full index, so that you only pay the
cost of expanding the sparse index to a full index when you request a
file outside of the sparse index.

Add tests to ensure both that:
- `cat-file` returns the correct file contents whether or not the file
  is in the sparse index
- `cat-file` warns about expanding to the full index any time you
  request something outside of the sparse index
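Because `git cat-file --batch` reads object names from standard input,
these tests need every repository to see the same input. The patch
handles this by copying stdin with `tee` in `run_on_sparse` and
`run_on_all`, then redirecting each subshell from its own copy. A
minimal standalone sketch of that idea, assuming a POSIX shell; the
helper name `run_both_with_stdin` is hypothetical, and unlike the patch
it uses `mktemp` for the copies and discards `tee`'s own output:

```shell
#!/bin/sh
# Hypothetical helper: run "$@" twice, giving each run an identical copy
# of this function's standard input (the idea behind run_on_sparse and
# run_on_all). Plain variables are used because POSIX sh has no 'local'.
run_both_with_stdin () {
	first=$(mktemp) && second=$(mktemp) || return 1

	# tee writes stdin into both files; its stdout is discarded so only
	# the output of the two command runs reaches the caller.
	tee "$first" "$second" >/dev/null &&
	"$@" <"$first" &&
	"$@" <"$second"
	status=$?

	rm -f "$first" "$second"
	return $status
}

# Usage: both runs of cat see the same two lines of input.
printf 'HEAD:deep/a\n:deep/a\n' | run_both_with_stdin cat
```

Each run reads a separate file, so the first command cannot consume
input meant for the second, which would happen if both simply shared
the original stdin.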

Signed-off-by: Kevin Lyles <klyles+github@epic.com>
---
Mark cat-file sparse-index compatible

Please note that this is my first contribution to Git. I've tried to
follow the instructions on how to submit a patch correctly (I'm using
GitGitGadget because getting Outlook to send plain-text email correctly
seems impossible), but please let me know if I've missed something.

I've worded the commit message itself as though I'm definitely correct
about how cat-file behaves, since I assume we want the final commit
message to be definite. However, this change felt a little too easy,
and I can't help feeling that I might have missed something. So, even
though this is just one commit, I'm also including this cover letter to
go into more detail about the parts that don't need to be part of the
commit history.

My motivation for this change is purely performance. We have a large
repository with the sparse index enabled, and I am developing a
pre-commit hook that (among other things) uses `git cat-file` to get the
staged contents of certain files. Without this change, getting the
contents of a single small file from the index can take upwards of 10
seconds because of the index expansion. With this change, it takes only
~0.3 seconds unless the file is outside of the sparse index.
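For illustration only, a hook of the kind described above might read
staged contents roughly like this. It is a hypothetical sketch, not the
actual hook: the function name `staged_contents` and the use of
`git diff --cached --name-only --diff-filter=ACM` to pick files are
assumptions. Each `git cat-file -p ":<path>"` call reads the stage-0
index entry, which is the kind of lookup that, per the commit message,
expanded the whole sparse index before this change.

```shell
#!/bin/sh
# Hypothetical pre-commit hook fragment: print the staged (index)
# contents of every added, copied, or modified file, optionally limited
# by pathspecs. Assumes paths contain no characters that git would
# quote in --name-only output (see core.quotePath).
staged_contents () {
	git diff --cached --name-only --diff-filter=ACM -- "$@" |
	while IFS= read -r path
	do
		printf '=== %s ===\n' "$path"
		# ":<path>" names the stage-0 index entry for <path>, i.e. the
		# staged blob rather than the file in the working tree.
		git cat-file -p ":$path" || exit 1
	done
}

# Example: inspect only staged shell scripts before allowing the commit.
# staged_contents '*.sh' || exit 1
```

Reading from the index rather than the working tree matters in a
pre-commit hook, because the working-tree copy may contain unstaged
edits that are not part of the commit.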

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1770%2Fklylesatepic%2Fkl%2Fmark-cat-file-sparse-index-compatible-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1770/klylesatepic/kl/mark-cat-file-sparse-index-compatible-v1
Pull-Request: https://github.com/git/git/pull/1770

 builtin/cat-file.c                       |  3 +
 t/t1092-sparse-checkout-compatibility.sh | 71 ++++++++++++++++++++++--
 2 files changed, 69 insertions(+), 5 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 18fe58d6b8b..1afdfb5cbae 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -1047,6 +1047,9 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix)
 	if (batch.buffer_output < 0)
 		batch.buffer_output = batch.all_objects;
 
+	prepare_repo_settings(the_repository);
+	the_repository->settings.command_requires_full_index = 0;
+
 	/* Return early if we're in batch mode? */
 	if (batch.enabled) {
 		if (opt_cw)
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index a2c0e1b4dcc..0f36246ae84 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -179,22 +179,26 @@ init_repos_as_submodules () {
 }
 
 run_on_sparse () {
+	tee run_on_sparse-checkout run_on_sparse-index &&
+
 	(
 		cd sparse-checkout &&
 		GIT_PROGRESS_DELAY=100000 "$@" >../sparse-checkout-out 2>../sparse-checkout-err
-	) &&
+	) <run_on_sparse-checkout &&
 	(
 		cd sparse-index &&
 		GIT_PROGRESS_DELAY=100000 "$@" >../sparse-index-out 2>../sparse-index-err
-	)
+	) <run_on_sparse-index
 }
 
 run_on_all () {
+	tee run_on_all-full run_on_all-sparse &&
+
 	(
 		cd full-checkout &&
 		GIT_PROGRESS_DELAY=100000 "$@" >../full-checkout-out 2>../full-checkout-err
-	) &&
-	run_on_sparse "$@"
+	) <run_on_all-full &&
+	run_on_sparse "$@" <run_on_all-sparse
 }
 
 test_all_match () {
@@ -221,7 +225,7 @@ test_sparse_unstaged () {
 	done
 }
 
-# Usage: test_sprase_checkout_set "<c1> ... <cN>" "<s1> ... <sM>"
+# Usage: test_sparse_checkout_set "<c1> ... <cN>" "<s1> ... <sM>"
 # Verifies that "git sparse-checkout set <c1> ... <cN>" succeeds and
 # leaves the sparse index in a state where <s1> ... <sM> are sparse
 # directories (and <c1> ... <cN> are not).
@@ -2345,4 +2349,61 @@ test_expect_success 'advice.sparseIndexExpanded' '
 	grep "The sparse index is expanding to a full index" err
 '
 
+test_expect_success 'cat-file -p' '
+	init_repos &&
+	echo "new content" >>full-checkout/deep/a &&
+	echo "new content" >>sparse-checkout/deep/a &&
+	echo "new content" >>sparse-index/deep/a &&
+	run_on_all git add deep/a &&
+
+	test_all_match git cat-file -p HEAD:deep/a &&
+	ensure_not_expanded cat-file -p HEAD:deep/a &&
+	test_all_match git cat-file -p HEAD:folder1/a &&
+	ensure_not_expanded cat-file -p HEAD:folder1/a &&
+
+	test_all_match git cat-file -p :deep/a &&
+	ensure_not_expanded cat-file -p :deep/a &&
+	run_on_all git cat-file -p :folder1/a &&
+	test_cmp full-checkout-out sparse-checkout-out &&
+	test_cmp full-checkout-out sparse-index-out &&
+	test_cmp full-checkout-err sparse-checkout-err &&
+	ensure_expanded cat-file -p :folder1/a'
+
+test_expect_success 'cat-file --batch' '
+	init_repos &&
+	echo "new content" >>full-checkout/deep/a &&
+	echo "new content" >>sparse-checkout/deep/a &&
+	echo "new content" >>sparse-index/deep/a &&
+	run_on_all git add deep/a &&
+
+	cat <<-\EOF | test_all_match git cat-file --batch &&
+	HEAD:deep/a
+	:deep/a
+	EOF
+	cat <<-\EOF | ensure_not_expanded cat-file --batch &&
+	HEAD:deep/a
+	:deep/a
+	EOF
+
+	echo "HEAD:folder1/a" | test_all_match git cat-file --batch &&
+	echo "HEAD:folder1/a" | ensure_not_expanded cat-file --batch &&
+
+	echo ":folder1/a" | run_on_all git cat-file --batch &&
+	test_cmp full-checkout-out sparse-checkout-out &&
+	test_cmp full-checkout-out sparse-index-out &&
+	test_cmp full-checkout-err sparse-checkout-err &&
+	echo ":folder1/a" | ensure_expanded cat-file --batch &&
+
+	cat <<-\EOF | run_on_all git cat-file --batch &&
+	:deep/a
+	:folder1/a
+	EOF
+	test_cmp full-checkout-out sparse-checkout-out &&
+	test_cmp full-checkout-out sparse-index-out &&
+	test_cmp full-checkout-err sparse-checkout-err &&
+	cat <<-\EOF | ensure_expanded cat-file --batch
+	:deep/a
+	:folder1/a
+	EOF'
+
 test_done

base-commit: 80ccd8a2602820fdf896a8e8894305225f86f61d
-- 
gitgitgadget