From: "Kevin Lyles via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Derrick Stolee <stolee@gmail.com>,
Junio C Hamano <gitster@pobox.com>,
Kevin Lyles <klyles+github@epic.com>
Subject: [PATCH v4 0/2] builtin/cat-file: mark 'git cat-file' sparse-index compatible
Date: Tue, 03 Sep 2024 22:06:45 +0000 [thread overview]
Message-ID: <pull.1770.v4.git.git.1725401207.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.1770.v3.git.git.1725386044.gitgitgadget@gmail.com>
I believe I've addressed all issues raised except the question about how to
exhaustively validate the changes. I was under the impression that the
combination of the existing 'git cat-file' test cases and my new cases were
enough, but I have thus far been unable to get code coverage working locally
to validate that myself, due to spurious failures in other tests. I've
kicked off another run with most unrelated tests removed to see if that
helps.
Please note that this is my first contribution to git. I've tried to follow
the instructions about how to correctly submit a patch (I'm using
GitGitGadget as getting Outlook to do plain text e-mail correctly seems
impossible), but please let me know if I've missed something.
My motivation for making this change is purely performance. We have a large
repository that we enable the sparse index for, and I am developing a
pre-commit hook that (among other things) uses git cat-file to get the
staged contents of certain files. Without this change, getting the contents
of a single small file from the index can take upwards of 10 seconds due to
the index expansion. After this change, it only takes ~0.3 seconds unless
the file is outside of the sparse index.
Kevin Lyles (2):
t1092: allow run_on_* functions to use standard input
builtin/cat-file: mark 'git cat-file' sparse-index compatible
builtin/cat-file.c | 3 ++
t/t1092-sparse-checkout-compatibility.sh | 50 +++++++++++++++++++++---
2 files changed, 48 insertions(+), 5 deletions(-)
base-commit: 4590f2e9412378c61eac95966709c78766d326ba
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1770%2Fklylesatepic%2Fkl%2Fmark-cat-file-sparse-index-compatible-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1770/klylesatepic/kl/mark-cat-file-sparse-index-compatible-v4
Pull-Request: https://github.com/git/git/pull/1770
Range-diff vs v3:
1: b310593aec2 ! 1: 73fe71abcd5 Allow using stdin in run_on_* functions
@@ Metadata
Author: Kevin Lyles <klyles@epic.com>
## Commit message ##
- Allow using stdin in run_on_* functions
+ t1092: allow run_on_* functions to use standard input
- The 'run_on_sparse' and 'run_on_all' functions previously did not work
- correctly for commands accepting standard input. This also indirectly
- affected 'test_all_match' and 'test_sparse_match'.
+ The 'run_on_sparse' and 'run_on_all' functions do not work correctly for
+ commands accepting standard input, because they run the same command
+ multiple times and the first instance consumes it. This also indirectly
+ affects 'test_all_match' and 'test_sparse_match'.
- Now, we accept standard input and will send it to each command that we
- run. This does not impact using the functions for commands that don't
- need standard input.
+ To allow these functions to work with commands accepting standard input,
+ first slurp standard input to a temporary file, and then run the command
+ with its standard input redirected from the temporary file. This ensures
+ that each command sees the same contents from its standard input.
+
+ Note that this does not impact commands that do not read from standard
+ input; they continue to ignore it. Additionally, existing uses of the
+ run_on_* functions do not need to do anything differently, as the
+ standard input of the test environment is already connected to
+ /dev/null.
+
+ We do not explicitly clean up the input files because they are cleaned
+ up with the rest of the test repositories and their contents may be
+ useful for figuring out which command failed when a test case fails.
Signed-off-by: Kevin Lyles <klyles@epic.com>
@@ t/t1092-sparse-checkout-compatibility.sh: init_repos_as_submodules () {
}
run_on_sparse () {
-+ cat >run_on_sparse-input &&
++ cat >run-on-sparse-input &&
+
(
cd sparse-checkout &&
GIT_PROGRESS_DELAY=100000 "$@" >../sparse-checkout-out 2>../sparse-checkout-err
- ) &&
-+ ) <run_on_sparse-input &&
++ ) <run-on-sparse-input &&
(
cd sparse-index &&
GIT_PROGRESS_DELAY=100000 "$@" >../sparse-index-out 2>../sparse-index-err
- )
-+ ) <run_on_sparse-input
++ ) <run-on-sparse-input
}
run_on_all () {
-+ cat >run_on_all-input &&
++ cat >run-on-all-input &&
+
(
cd full-checkout &&
GIT_PROGRESS_DELAY=100000 "$@" >../full-checkout-out 2>../full-checkout-err
- ) &&
- run_on_sparse "$@"
-+ ) <run_on_all-input &&
-+ run_on_sparse "$@" <run_on_all-input
++ ) <run-on-all-input &&
++ run_on_sparse "$@" <run-on-all-input
}
test_all_match () {
2: f4d1461b993 ! 2: ac913257309 Mark 'git cat-file' sparse-index compatible
@@ Metadata
Author: Kevin Lyles <klyles+github@epic.com>
## Commit message ##
- Mark 'git cat-file' sparse-index compatible
+ builtin/cat-file: mark 'git cat-file' sparse-index compatible
This change affects how 'git cat-file' works with the index when
specifying an object with the ":<path>" syntax (which will give file
contents from the index).
- 'git cat-file' will expand a sparse index to a full index when needed,
- but is currently marked as needing a full index (or rather, not marked
- as not needing a full index). This results in much slower 'git cat-file'
- operations when working within the sparse index, since we expand the
- index whether it's needed or not.
+ 'git cat-file' expands a sparse index to a full index any time contents
+ are requested from the index by specifying an object with the ":<path>"
+ syntax. This is true even when the requested file is part of the sparse
+ index, and results in much slower 'git cat-file' operations when working
+ within the sparse index.
Mark 'git cat-file' as not needing a full index, so that you only pay
the cost of expanding the sparse index to a full index when you request
--
gitgitgadget
next prev parent reply other threads:[~2024-09-03 22:06 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-08-26 18:08 [PATCH] Mark `cat-file` sparse-index compatible Kevin Lyles via GitGitGadget
2024-08-29 1:59 ` Derrick Stolee
2024-08-30 21:10 ` [PATCH v2 0/2] Mark cat-file " Kevin Lyles via GitGitGadget
2024-08-30 21:10 ` [PATCH v2 1/2] Allow using stdin in run_on_* functions Kevin Lyles via GitGitGadget
2024-08-30 21:10 ` [PATCH v2 2/2] Mark 'git cat-file' sparse-index compatible Kevin Lyles via GitGitGadget
2024-09-03 14:17 ` Derrick Stolee
2024-09-03 17:21 ` Junio C Hamano
2024-09-03 17:54 ` [PATCH v3 0/2] " Kevin Lyles via GitGitGadget
2024-09-03 17:54 ` [PATCH v3 1/2] Allow using stdin in run_on_* functions Kevin Lyles via GitGitGadget
2024-09-03 19:11 ` Junio C Hamano
2024-09-03 17:54 ` [PATCH v3 2/2] Mark 'git cat-file' sparse-index compatible Kevin Lyles via GitGitGadget
2024-09-03 19:19 ` Junio C Hamano
2024-09-03 22:06 ` Kevin Lyles via GitGitGadget [this message]
2024-09-03 22:06 ` [PATCH v4 1/2] t1092: allow run_on_* functions to use standard input Kevin Lyles via GitGitGadget
2024-09-04 16:23 ` Junio C Hamano
2024-09-03 22:06 ` [PATCH v4 2/2] builtin/cat-file: mark 'git cat-file' sparse-index compatible Kevin Lyles via GitGitGadget
2024-09-04 16:35 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=pull.1770.v4.git.git.1725401207.gitgitgadget@gmail.com \
--to=gitgitgadget@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=klyles+github@epic.com \
--cc=stolee@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).