All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2] ls-files: filter pathspec before lstat
@ 2026-06-09  2:37 Tamir Duberstein
  2026-06-09  3:26 ` Junio C Hamano
  0 siblings, 1 reply; 5+ messages in thread
From: Tamir Duberstein @ 2026-06-09  2:37 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Patrick Steinhardt, Junio C Hamano, Jeff King,
	Tamir Duberstein

show_files() checks whether each index entry is deleted or modified
before show_ce() applies the pathspec. prune_index() avoids most of this
work for pathspecs with a common directory prefix, but a top-level name
or leading wildcard leaves every entry to be checked.

For a single pathspec, match it before lstat() in the deleted and
modified modes. Keep the later match in show_ce() so --error-unmatch is
satisfied only by entries that are actually shown.

match_pathspec() is linear in the number of pathspec items. Applying it
early for every item can therefore multiply the work for commands with
many pathspecs, especially when lstat() shows that no entries are
modified. Restrict the early check to one pathspec. Callers with
multiple pathspecs retain the existing lstat()-first order.

On a repository with 859,211 index entries, a 19,931,862-byte index,
and 25,303,439 packed objects occupying 21.13 GiB, I exported $parent
and $this to binaries built from the parent and this commit, then ran:

    hyperfine --warmup 0 --runs 3 \
        --command-name parent \
        '$parent -c core.fsmonitor=false ls-files --deleted -- README.md' \
        --command-name 'this commit' \
        '$this -c core.fsmonitor=false ls-files --deleted -- README.md'

The results were:

             parent       this commit
  elapsed    60.742 s     1.061 s
  user        1.117 s     0.963 s
  system     10.740 s     0.042 s

For an all-matching pathspec, I used a checkout with 859,940 index
entries and ran:

    hyperfine --warmup 0 --runs 3 \
        --command-name parent \
        '$parent -c core.fsmonitor=false ls-files --deleted -- "*"' \
        --command-name 'this commit' \
        '$this -c core.fsmonitor=false ls-files --deleted -- "*"'

I repeated the benchmark with the commands reversed. The results were:

                         parent          this commit
  parent first elapsed    56.807 s        64.618 s
               user        1.256 s         1.270 s
               system     10.633 s        11.068 s
  patched first elapsed   63.361 s        64.316 s
                user       1.238 s         1.280 s
                system    10.296 s        11.864 s

The patched user-time means were 14 ms and 42 ms higher in the two
orderings. Elapsed time changed by several seconds when the order was
reversed, so those results do not show a stable wall-time ordering.

Jeff King pointed out that a preliminary match for each of many literal
pathspecs can be much more expensive. On a generated repository with
10,000 clean files, I recorded the paths with "git ls-files >paths".
With $v1 exported to a binary built from the implementation sent in v1,
I ran:

    hyperfine --warmup 2 --runs 10 \
        --command-name parent \
        '$parent ls-files -m -- $(cat paths) >/dev/null' \
        --command-name 'this commit' \
        '$this ls-files -m -- $(cat paths) >/dev/null'

I replaced $this with $v1 in a second invocation. The wall-clock means
and standard deviations were:

                         mean          standard deviation
  parent, final run     110.1 ms              4.1 ms
  this commit           104.9 ms              2.2 ms
  parent, v1 run        112.5 ms              6.6 ms
  unguarded v1          494.1 ms             17.2 ms

The guarded result matches the parent within the observed variation,
while avoiding the regression in v1.

All three revisions were built with -O3, -mcpu=native, and ThinLTO
using Apple clang 21.0.0 on macOS 26.5. The machine was a MacBook Pro
(Mac16,6) with a 16-core Apple M4 Max (12 performance and four
efficiency cores) and 128 GB RAM.

Link: https://lore.kernel.org/r/20260607-ls-files-pathspec-lstat-v1-1-8cf40b730146@gmail.com
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Tamir Duberstein <tamird@gmail.com>
---
A selective pathspec should let ls-files --deleted and --modified avoid
statting entries that cannot be shown. Match a single pathspec before
accessing the worktree, while preserving the existing lstat-first order
for multiple pathspecs whose matching cost grows linearly.
---
Changes in v2:
- Restrict early matching to one pathspec, avoiding the regression Jeff
  demonstrated with many pathspecs.
- Add all-matching and many-pathspec performance results.
- Drop the Assisted-by trailer.
- Link to v1: https://patch.msgid.link/20260607-ls-files-pathspec-lstat-v1-1-8cf40b730146@gmail.com
---
 builtin/ls-files.c                  | 11 +++++++++++
 t/meson.build                       |  1 +
 t/perf/p3010-ls-files.sh            | 31 +++++++++++++++++++++++++++++++
 t/t3010-ls-files-killed-modified.sh | 18 ++++++++++++++++++
 4 files changed, 61 insertions(+)

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index e1a22b41b9..8d7158652b 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -450,6 +450,17 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
 			continue;
 		if (ce_skip_worktree(ce))
 			continue;
+		/*
+		 * match_pathspec() is linear in pathspec.nr, so prefilter only
+		 * the single-pathspec case. Only entries shown by show_ce()
+		 * satisfy --error-unmatch.
+		 */
+		if (pathspec.nr == 1 &&
+		    !match_pathspec(repo->index, &pathspec, fullname.buf,
+				    fullname.len, max_prefix_len, NULL,
+				    S_ISDIR(ce->ce_mode) ||
+				    S_ISGITLINK(ce->ce_mode)))
+			continue;
 		stat_err = lstat(fullname.buf, &st);
 		if (stat_err && (errno != ENOENT && errno != ENOTDIR))
 			error_errno("cannot lstat '%s'", fullname.buf);
diff --git a/t/meson.build b/t/meson.build
index 2af8d01279..ee8086e6ef 100644
--- a/t/meson.build
+++ b/t/meson.build
@@ -1140,6 +1140,7 @@ benchmarks = [
   'perf/p1500-graph-walks.sh',
   'perf/p1501-rev-parse-oneline.sh',
   'perf/p2000-sparse-operations.sh',
+  'perf/p3010-ls-files.sh',
   'perf/p3400-rebase.sh',
   'perf/p3404-rebase-interactive.sh',
   'perf/p4000-diff-algorithms.sh',
diff --git a/t/perf/p3010-ls-files.sh b/t/perf/p3010-ls-files.sh
new file mode 100755
index 0000000000..ae14449432
--- /dev/null
+++ b/t/perf/p3010-ls-files.sh
@@ -0,0 +1,31 @@
+#!/bin/sh
+
+test_description='Tests ls-files worktree performance'
+
+. ./perf-lib.sh
+
+test_perf_large_repo
+test_checkout_worktree
+
+test_expect_success 'select a zero-prefix pathspec' '
+	tracked_file=$(git ls-files | sed -n 1p) &&
+	test -n "$tracked_file" &&
+	pathspec="?${tracked_file#?}" &&
+	test_export pathspec
+'
+
+test_perf 'ls-files --deleted with pathspec' '
+	git -c core.fsmonitor=false ls-files --deleted \
+		-- "$pathspec" >/dev/null
+'
+
+test_perf 'ls-files --deleted with all-matching pathspec' '
+	git -c core.fsmonitor=false ls-files --deleted -- "*" >/dev/null
+'
+
+test_perf 'ls-files --modified with pathspec' '
+	git -c core.fsmonitor=false ls-files --modified \
+		-- "$pathspec" >/dev/null
+'
+
+test_done
diff --git a/t/t3010-ls-files-killed-modified.sh b/t/t3010-ls-files-killed-modified.sh
index 7af4532cd1..6e38e10219 100755
--- a/t/t3010-ls-files-killed-modified.sh
+++ b/t/t3010-ls-files-killed-modified.sh
@@ -124,4 +124,22 @@ test_expect_success 'validate git ls-files -m output.' '
 	test_cmp .expected .output
 '
 
+test_expect_success 'worktree modes honor wildcard pathspecs' '
+	cat >.expected <<-\EOF &&
+	path2/file2
+	path3/file3
+	EOF
+	git ls-files --deleted -- "path?/file?" >.output &&
+	test_cmp .expected .output &&
+
+	cat >.expected <<-\EOF &&
+	path7
+	path8
+	EOF
+	git ls-files --modified --error-unmatch -- "path[78]" >.output &&
+	test_cmp .expected .output &&
+
+	test_must_fail git ls-files --modified --error-unmatch -- path10
+'
+
 test_done

---
base-commit: 9ac3f193c05c2237e2b14ebaa1149e9fc8a1abe0
change-id: 20260607-ls-files-pathspec-lstat-885125a5d644

Best regards,
--  
Tamir Duberstein <tamird@gmail.com>


^ permalink raw reply related	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2026-06-09  3:48 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-09  2:37 [PATCH v2] ls-files: filter pathspec before lstat Tamir Duberstein
2026-06-09  3:26 ` Junio C Hamano
2026-06-09  3:38   ` Tamir Duberstein
2026-06-09  3:42   ` Junio C Hamano
2026-06-09  3:48     ` Tamir Duberstein

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.