Git development
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Tamir Duberstein <tamird@gmail.com>
Cc: git@vger.kernel.org, "René Scharfe" <l.s.r@web.de>,
	"Patrick Steinhardt" <ps@pks.im>
Subject: Re: [PATCH] ls-files: filter pathspec before lstat
Date: Mon, 08 Jun 2026 06:06:33 -0700	[thread overview]
Message-ID: <xmqqa4t5yyee.fsf@gitster.g> (raw)
In-Reply-To: <20260607-ls-files-pathspec-lstat-v1-1-8cf40b730146@gmail.com> (Tamir Duberstein's message of "Sun, 07 Jun 2026 11:40:56 -0400")

On Sun, Jun 7, 2026 at 11:40, Tamir Duberstein wrote:
> show_files() checks whether each index entry is deleted or modified
> before show_ce() applies the pathspec. prune_index() avoids most of this
> work for pathspecs with a common directory prefix, but a top-level name
> or leading wildcard leaves every entry to be checked.
> 
> Match the pathspec before lstat() for the deleted and modified modes.
> Keep the later match in show_ce() so --error-unmatch is satisfied only
> by entries that are actually shown.

Adding an extra early `match_pathspec()` check before making slow
system calls like `lstat()` makes sense, especially when most of the
index entries need to be skipped.  But if most of them would match,
then we would end up doing the same match_pathspec() calls twice for
each path, and run lstat() anyway, so you may also be able to
construct a perf test that demonstrates a case where this approach
is not a clear win (or even degradation), perhaps?

> diff --git a/builtin/ls-files.c b/builtin/ls-files.c
> index e1a22b41b9..702c607183 100644
> --- a/builtin/ls-files.c
> +++ b/builtin/ls-files.c
> @@ -450,6 +450,13 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
>  			continue;
>  		if (ce_skip_worktree(ce))
>  			continue;
> +		/* Only entries shown by show_ce() satisfy --error-unmatch. */
> +		if (pathspec.nr &&
> +		    !match_pathspec(repo->index, &pathspec, fullname.buf,
> +				    fullname.len, max_prefix_len, NULL,
> +				    S_ISDIR(ce->ce_mode) ||
> +				    S_ISGITLINK(ce->ce_mode)))
> +			continue;
>  		stat_err = lstat(fullname.buf, &st);
>  		if (stat_err && (errno != ENOENT && errno != ENOTDIR))
>  			error_errno("cannot lstat '%s'", fullname.buf);

Hmph.  In the current code, because there is no such pre-filtering,
show_ce() would unconditionally recurse into active submodules when
told to with the "--recurse-submodules" flag, even if your pathspec
coes not match the submodule.  With this change, such a submodule
whose path does not match the pathspec would not even be seen by
show_ce().  Would it cause a change in behaviour?

  parent reply	other threads:[~2026-06-08 13:06 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-07 15:40 [PATCH] ls-files: filter pathspec before lstat Tamir Duberstein
2026-06-07 16:02 ` Kristoffer Haugsbakk
2026-06-07 16:07   ` Tamir Duberstein
2026-06-07 16:17     ` Kristoffer Haugsbakk
2026-06-07 18:24       ` Tamir Duberstein
2026-06-08 13:06 ` Junio C Hamano [this message]
2026-06-08 19:15   ` Tamir Duberstein
2026-06-08 23:03   ` Jeff King
2026-06-08 23:25     ` Jeff King
2026-06-09  0:13       ` Tamir Duberstein

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqa4t5yyee.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=l.s.r@web.de \
    --cc=ps@pks.im \
    --cc=tamird@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox