* [PATCH] ls-files: conditionally leave index sparse
@ 2025-08-15 16:12 Derrick Stolee via GitGitGadget
2025-08-26 16:40 ` Derrick Stolee
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2025-08-15 16:12 UTC (permalink / raw)
To: git; +Cc: gitster, Derrick Stolee, Derrick Stolee
From: Derrick Stolee <stolee@gmail.com>
When running 'git ls-files' with a pathspec, the index entries get
filtered according to that pathspec before iterating over them in
show_files(). In 78087097b8 (ls-files: add --sparse option,
2021-12-22), this iteration was prefixed with a check for the '--sparse'
option which allows the command to output directory entries; this
created a pre-loop call to ensure_full_index().
However, when a user runs 'git ls-files' where the pathspec matches
directories that are recursively matched in the sparse-checkout, there
are not any sparse directories that match the pathspec so they would not
be written to the output. The expansion in this case is just a
performance drop for no behavior difference.
Replace this global check to expand the index with a check inside the
loop for a matched sparse directory. If we see one, then expand the
index and continue from the current location. This is safe since the
previous entries in the index did not have any sparse directories and
thus would remain stable in this expansion.
A test in t1092 confirms that this changes the behavior.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
ls-files: conditionally leave index sparse
Here's a small sparse index performance update based on a user report.
Thanks, -Stolee
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1955%2Fderrickstolee%2Fls-files-sparse-index-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1955/derrickstolee/ls-files-sparse-index-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1955
builtin/ls-files.c | 13 ++++++++++---
t/t1092-sparse-checkout-compatibility.sh | 13 +++++++++++++
2 files changed, 23 insertions(+), 3 deletions(-)
diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index c06a6f33e41..b148607f7a1 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -414,14 +414,21 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
if (!(show_cached || show_stage || show_deleted || show_modified))
return;
- if (!show_sparse_dirs)
- ensure_full_index(repo->index);
-
for (i = 0; i < repo->index->cache_nr; i++) {
const struct cache_entry *ce = repo->index->cache[i];
struct stat st;
int stat_err;
+ if (S_ISSPARSEDIR(ce->ce_mode) && !show_sparse_dirs) {
+ /*
+ * This is the first time we've hit a sparse dir,
+ * so expansion will leave the first 'i' entries
+ * alone.
+ */
+ ensure_full_index(repo->index);
+ ce = repo->index->cache[i];
+ }
+
construct_fullname(&fullname, repo, ce);
if ((dir->flags & DIR_SHOW_IGNORED) &&
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index d8101139b40..b0f691c151a 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -1506,6 +1506,8 @@ test_expect_success 'sparse-index is not expanded' '
ensure_not_expanded reset --hard &&
ensure_not_expanded restore -s rename-out-to-out -- deep/deeper1 &&
+ ensure_not_expanded ls-files deep/deeper1 &&
+
echo >>sparse-index/README.md &&
ensure_not_expanded add -A &&
echo >>sparse-index/extra.txt &&
@@ -1607,6 +1609,17 @@ test_expect_success 'describe tested on all' '
test_all_match git describe --dirty
'
+test_expect_success 'ls-files filtering and expansion' '
+ init_repos &&
+
+ # This filtering will hit a sparse directory midway
+ # through the iteration.
+ test_all_match git ls-files deep &&
+
+ # This pathspec will filter the index to only a sparse
+ # directory.
+ test_all_match git ls-files folder1
+'
test_expect_success 'sparse-index is not expanded: describe' '
init_repos &&
base-commit: 724518f3884d8707c5f51428ba98c115818229b8
--
gitgitgadget
^ permalink raw reply related [flat|nested] 6+ messages in thread* Re: [PATCH] ls-files: conditionally leave index sparse
2025-08-15 16:12 [PATCH] ls-files: conditionally leave index sparse Derrick Stolee via GitGitGadget
@ 2025-08-26 16:40 ` Derrick Stolee
2025-08-27 19:36 ` Derrick Stolee
2025-08-26 18:37 ` Elijah Newren
2025-08-26 19:00 ` Junio C Hamano
2 siblings, 1 reply; 6+ messages in thread
From: Derrick Stolee @ 2025-08-26 16:40 UTC (permalink / raw)
To: Derrick Stolee via GitGitGadget, git; +Cc: gitster, Elijah Newren
On 8/15/2025 12:12 PM, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <stolee@gmail.com>
...
> Replace this global check to expand the index with a check inside the
> loop for a matched sparse directory. If we see one, then expand the
> index and continue from the current location. This is safe since the
> previous entries in the index did not have any sparse directories and
> thus would remain stable in this expansion.
...> Here's a small sparse index performance update based on a user report.
I know this is small and somewhat niche, but it hasn't had any review
or been picked up in What's Cooking. Could someone please take a look?
Thanks,
-Stolee
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] ls-files: conditionally leave index sparse
2025-08-26 16:40 ` Derrick Stolee
@ 2025-08-27 19:36 ` Derrick Stolee
2025-08-28 15:02 ` Junio C Hamano
0 siblings, 1 reply; 6+ messages in thread
From: Derrick Stolee @ 2025-08-27 19:36 UTC (permalink / raw)
To: Derrick Stolee via GitGitGadget, git; +Cc: gitster, Elijah Newren
On 8/26/2025 12:40 PM, Derrick Stolee wrote:
> On 8/15/2025 12:12 PM, Derrick Stolee via GitGitGadget wrote:
>> From: Derrick Stolee <stolee@gmail.com>
> ...
>> Replace this global check to expand the index with a check inside the
>> loop for a matched sparse directory. If we see one, then expand the
>> index and continue from the current location. This is safe since the
>> previous entries in the index did not have any sparse directories and
>> thus would remain stable in this expansion.
> ...> Here's a small sparse index performance update based on a user report.
>
> I know this is small and somewhat niche, but it hasn't had any review
> or been picked up in What's Cooking. Could someone please take a look?
Thanks, Elijah and Junio for reviewing.
By coincidence, a user reported an issue where the sparse index was
expanded during "git mergetool" and it was due to a pathspec-focused
ls-files subcommand. So maybe it's less niche than I had thought.
Thanks,
-Stolee
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] ls-files: conditionally leave index sparse
2025-08-27 19:36 ` Derrick Stolee
@ 2025-08-28 15:02 ` Junio C Hamano
0 siblings, 0 replies; 6+ messages in thread
From: Junio C Hamano @ 2025-08-28 15:02 UTC (permalink / raw)
To: Derrick Stolee; +Cc: Derrick Stolee via GitGitGadget, git, Elijah Newren
Derrick Stolee <stolee@gmail.com> writes:
> On 8/26/2025 12:40 PM, Derrick Stolee wrote:
>> On 8/15/2025 12:12 PM, Derrick Stolee via GitGitGadget wrote:
>>> From: Derrick Stolee <stolee@gmail.com>
>> ...
>>> Replace this global check to expand the index with a check inside the
>>> loop for a matched sparse directory. If we see one, then expand the
>>> index and continue from the current location. This is safe since the
>>> previous entries in the index did not have any sparse directories and
>>> thus would remain stable in this expansion.
>> ...> Here's a small sparse index performance update based on a user report.
>>
>> I know this is small and somewhat niche, but it hasn't had any review
>> or been picked up in What's Cooking. Could someone please take a look?
>
> Thanks, Elijah and Junio for reviewing.
>
> By coincidence, a user reported an issue where the sparse index was
> expanded during "git mergetool" and it was due to a pathspec-focused
> ls-files subcommand. So maybe it's less niche than I had thought.
I do not speak for others who did not comment on the thread, but the
reason I did not act on the message was not because I found it niche
or insignificant. I simply missed the single message in the sea of
other threads.
Queued. Thanks for pinging.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] ls-files: conditionally leave index sparse
2025-08-15 16:12 [PATCH] ls-files: conditionally leave index sparse Derrick Stolee via GitGitGadget
2025-08-26 16:40 ` Derrick Stolee
@ 2025-08-26 18:37 ` Elijah Newren
2025-08-26 19:00 ` Junio C Hamano
2 siblings, 0 replies; 6+ messages in thread
From: Elijah Newren @ 2025-08-26 18:37 UTC (permalink / raw)
To: Derrick Stolee via GitGitGadget; +Cc: git, gitster, Derrick Stolee
On Fri, Aug 15, 2025 at 9:13 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <stolee@gmail.com>
>
> When running 'git ls-files' with a pathspec, the index entries get
> filtered according to that pathspec before iterating over them in
When I first read this patch, I missed this part of your commit
message and figured there was no possible way your patch could
actually speed things up. I verified with your testcase that it
worked, though, and had to step through a debugger to find out what I
was missing. It's the prune_index() call in cmd_ls_files() that does
this -- but only when the pathspecs provided have some common prefix.
So, it's not unique to when there's a single pathspec as your commit
message claims, and the pointer to prune_index() may have helped save
me some head-scratching in review the patch.
Perhaps this could be clarified here (and made more explicit for folks
like me that gloss over it), something like
When running 'git ls-files' with pathspecs with a common prefix, the
index entries get
filtered according to that common prefix in prune_index() before
iterating over them in show_files().
> show_files(). In 78087097b8 (ls-files: add --sparse option,
> 2021-12-22), this iteration was prefixed with a check for the '--sparse'
> option which allows the command to output directory entries; this
> created a pre-loop call to ensure_full_index().
>
> However, when a user runs 'git ls-files' where the pathspec matches
> directories that are recursively matched in the sparse-checkout, there
> are not any sparse directories that match the pathspec so they would not
> be written to the output. The expansion in this case is just a
> performance drop for no behavior difference.
>
> Replace this global check to expand the index with a check inside the
> loop for a matched sparse directory. If we see one, then expand the
> index and continue from the current location. This is safe since the
> previous entries in the index did not have any sparse directories and
> thus would remain stable in this expansion.
>
> A test in t1092 confirms that this changes the behavior.
>
> Signed-off-by: Derrick Stolee <stolee@gmail.com>
> ---
> ls-files: conditionally leave index sparse
>
> Here's a small sparse index performance update based on a user report.
>
> Thanks, -Stolee
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1955%2Fderrickstolee%2Fls-files-sparse-index-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1955/derrickstolee/ls-files-sparse-index-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/1955
>
> builtin/ls-files.c | 13 ++++++++++---
> t/t1092-sparse-checkout-compatibility.sh | 13 +++++++++++++
> 2 files changed, 23 insertions(+), 3 deletions(-)
>
> diff --git a/builtin/ls-files.c b/builtin/ls-files.c
> index c06a6f33e41..b148607f7a1 100644
> --- a/builtin/ls-files.c
> +++ b/builtin/ls-files.c
> @@ -414,14 +414,21 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
> if (!(show_cached || show_stage || show_deleted || show_modified))
> return;
>
> - if (!show_sparse_dirs)
> - ensure_full_index(repo->index);
> -
> for (i = 0; i < repo->index->cache_nr; i++) {
> const struct cache_entry *ce = repo->index->cache[i];
> struct stat st;
> int stat_err;
>
> + if (S_ISSPARSEDIR(ce->ce_mode) && !show_sparse_dirs) {
> + /*
> + * This is the first time we've hit a sparse dir,
> + * so expansion will leave the first 'i' entries
> + * alone.
> + */
> + ensure_full_index(repo->index);
> + ce = repo->index->cache[i];
> + }
I see how this is safe. I didn't understand how it helped performance
until I figured out by stepping through that repo->indexc->cache_nr is
much less than I expected, because of the prune_index() call that
happened earlier.
> construct_fullname(&fullname, repo, ce);
>
> if ((dir->flags & DIR_SHOW_IGNORED) &&
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index d8101139b40..b0f691c151a 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -1506,6 +1506,8 @@ test_expect_success 'sparse-index is not expanded' '
> ensure_not_expanded reset --hard &&
> ensure_not_expanded restore -s rename-out-to-out -- deep/deeper1 &&
>
> + ensure_not_expanded ls-files deep/deeper1 &&
> +
Thanks, this testcase is exactly what I needed to figure out what I
was misunderstanding.
> echo >>sparse-index/README.md &&
> ensure_not_expanded add -A &&
> echo >>sparse-index/extra.txt &&
> @@ -1607,6 +1609,17 @@ test_expect_success 'describe tested on all' '
> test_all_match git describe --dirty
> '
>
> +test_expect_success 'ls-files filtering and expansion' '
> + init_repos &&
> +
> + # This filtering will hit a sparse directory midway
> + # through the iteration.
> + test_all_match git ls-files deep &&
> +
> + # This pathspec will filter the index to only a sparse
> + # directory.
> + test_all_match git ls-files folder1
> +'
Looks good.
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: [PATCH] ls-files: conditionally leave index sparse
2025-08-15 16:12 [PATCH] ls-files: conditionally leave index sparse Derrick Stolee via GitGitGadget
2025-08-26 16:40 ` Derrick Stolee
2025-08-26 18:37 ` Elijah Newren
@ 2025-08-26 19:00 ` Junio C Hamano
2 siblings, 0 replies; 6+ messages in thread
From: Junio C Hamano @ 2025-08-26 19:00 UTC (permalink / raw)
To: Derrick Stolee via GitGitGadget; +Cc: git, Derrick Stolee
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> for (i = 0; i < repo->index->cache_nr; i++) {
> const struct cache_entry *ce = repo->index->cache[i];
> struct stat st;
> int stat_err;
>
> + if (S_ISSPARSEDIR(ce->ce_mode) && !show_sparse_dirs) {
> + /*
> + * This is the first time we've hit a sparse dir,
> + * so expansion will leave the first 'i' entries
> + * alone.
> + */
In other words,
(1) we know that the original index entries are sorted
(2) we are looking at a single directory entry that is sparse, say
"D/", and ensure_full_index() will expand it (and other later
entries in the current index).
(3) we assume that the contents of "D/" will never sort before the
original location where "D/" used to sit, iow, we do not have
to rewind to the beginning of index->cache[] array and skip
what we have already processed.
Having bitten by the index sort order number of times, I just wanted
to make sure everybody's assumption is on the same page.
> + ensure_full_index(repo->index);
> + ce = repo->index->cache[i];
and there is no need to say "again" (redo this round of the loop)
here, as grabbing ce was the only thing the loop did, and we just
replaced the entry for the originally folded "D/" with one for the
first subpath in "D/". Sounds sensible.
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2025-08-28 15:02 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-15 16:12 [PATCH] ls-files: conditionally leave index sparse Derrick Stolee via GitGitGadget
2025-08-26 16:40 ` Derrick Stolee
2025-08-27 19:36 ` Derrick Stolee
2025-08-28 15:02 ` Junio C Hamano
2025-08-26 18:37 ` Elijah Newren
2025-08-26 19:00 ` Junio C Hamano
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).