All of lore.kernel.org
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org,  Derrick Stolee <stolee@gmail.com>
Subject: Re: [PATCH 5/5] path-walk: support wildcard pathspecs for blob filtering
Date: Tue, 17 Mar 2026 15:19:00 -0700	[thread overview]
Message-ID: <xmqqms06hzfv.fsf@gitster.g> (raw)
In-Reply-To: <beb1c92554c76907315a4d1a7983226d2bf5a828.1773707361.git.gitgitgadget@gmail.com> (Derrick Stolee via GitGitGadget's message of "Tue, 17 Mar 2026 00:29:21 +0000")

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <stolee@gmail.com>
>
> Previously, walk_objects_by_path() silently ignored pathspecs containing
> wildcards or magic by clearing them. This caused all blobs to be
> downloaded regardless of the given pathspec. Wildcard pathspecs like
> "d/file.*.txt" are useful for narrowing which blobs to process (e.g.,
> during 'git backfill').
>
> Support wildcard pathspecs by making three changes:
>
>  1. Add an 'exact_pathspecs' flag to path_walk_context. When the
>     pathspec has no wildcards or magic, set this flag and use the
>     existing fast-path prefix matching in add_tree_entries(). When
>     wildcards are present, skip that block since prefix matching
>     cannot handle glob patterns.
>
>  2. Disable revision-level commit pruning (revs->prune = 0) for
>     wildcard pathspecs. The revision walk uses the pathspec to filter
>     commits via TREESAME detection. For exact prefix pathspecs this
>     works well, but wildcard pathspecs may fail to match through
>     TREESAME because fnmatch with WM_PATHNAME does not cross directory
>     boundaries. Disabling pruning ensures all commits are visited and
>     their trees are available for the path-walk to filter.

Hmph, I wonder how significant an impact does it have on the
performance that we have to disable pruning here.  With the bog
standard tree traversal, wouldn't tree_entry_interesting() already
be capable of doing this, even with fnmatch / WM_PATHNAME ?

>  3. Add a match_pathspec() check in walk_path() to filter out blobs
>     whose full path does not match the pathspec. This provides the
>     actual blob-level filtering for wildcard pathspecs.
>
> Signed-off-by: Derrick Stolee <stolee@gmail.com>
> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The latter person cannot sign DCO or vouch for the origin of what
they have written in this patch, can they?

> ---
>  path-walk.c         | 22 ++++++++++++++--------
>  t/t5620-backfill.sh |  7 +++----
>  2 files changed, 17 insertions(+), 12 deletions(-)

  reply	other threads:[~2026-03-17 22:19 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-17  0:29 [PATCH 0/5] backfill: accept revision arguments Derrick Stolee via GitGitGadget
2026-03-17  0:29 ` [PATCH 1/5] revision: include object-name.h Derrick Stolee via GitGitGadget
2026-03-17 21:52   ` Junio C Hamano
2026-03-17  0:29 ` [PATCH 2/5] t5620: prepare branched repo for revision tests Derrick Stolee via GitGitGadget
2026-03-17  0:29 ` [PATCH 3/5] backfill: accept revision arguments Derrick Stolee via GitGitGadget
2026-03-17 22:01   ` Junio C Hamano
2026-03-18 15:37   ` Kristoffer Haugsbakk
2026-03-23  0:31     ` Derrick Stolee
2026-03-19  9:54   ` Patrick Steinhardt
2026-03-23  0:35     ` Derrick Stolee
2026-03-17  0:29 ` [PATCH 4/5] backfill: work with prefix pathspecs Derrick Stolee via GitGitGadget
2026-03-17 22:10   ` Junio C Hamano
2026-03-18 13:15     ` Derrick Stolee
2026-03-19  9:54       ` Patrick Steinhardt
2026-03-19  9:55   ` Patrick Steinhardt
2026-03-19 10:15   ` Patrick Steinhardt
2026-03-23  0:47     ` Derrick Stolee
2026-03-17  0:29 ` [PATCH 5/5] path-walk: support wildcard pathspecs for blob filtering Derrick Stolee via GitGitGadget
2026-03-17 22:19   ` Junio C Hamano [this message]
2026-03-18 13:16     ` Derrick Stolee
2026-03-23  1:33       ` Derrick Stolee
2026-03-17 21:45 ` [PATCH 0/5] backfill: accept revision arguments Junio C Hamano
2026-03-19  9:54 ` Patrick Steinhardt
2026-03-19 12:59   ` Derrick Stolee
2026-03-20  7:35     ` Patrick Steinhardt
2026-03-23 11:40 ` [PATCH v2 0/6] " Derrick Stolee via GitGitGadget
2026-03-23 11:40   ` [PATCH v2 1/6] revision: include object-name.h Derrick Stolee via GitGitGadget
2026-03-23 11:40   ` [PATCH v2 2/6] t5620: prepare branched repo for revision tests Derrick Stolee via GitGitGadget
2026-03-23 11:40   ` [PATCH v2 3/6] backfill: accept revision arguments Derrick Stolee via GitGitGadget
2026-03-24  7:59     ` Patrick Steinhardt
2026-03-26 12:55       ` Derrick Stolee
2026-03-23 11:40   ` [PATCH v2 4/6] backfill: work with prefix pathspecs Derrick Stolee via GitGitGadget
2026-03-24  7:59     ` Patrick Steinhardt
2026-03-26 12:58       ` Derrick Stolee
2026-03-23 11:40   ` [PATCH v2 5/6] path-walk: support wildcard pathspecs for blob filtering Derrick Stolee via GitGitGadget
2026-03-23 11:40   ` [PATCH v2 6/6] t5620: test backfill's unknown argument handling Derrick Stolee via GitGitGadget
2026-03-23 15:29     ` Junio C Hamano
2026-03-23 20:39       ` Derrick Stolee
2026-03-26 15:14   ` [PATCH v3 0/6] backfill: accept revision arguments Derrick Stolee via GitGitGadget
2026-03-26 15:14     ` [PATCH v3 1/6] revision: include object-name.h Derrick Stolee via GitGitGadget
2026-03-26 15:14     ` [PATCH v3 2/6] t5620: prepare branched repo for revision tests Derrick Stolee via GitGitGadget
2026-03-26 15:14     ` [PATCH v3 3/6] backfill: accept revision arguments Derrick Stolee via GitGitGadget
2026-03-26 15:14     ` [PATCH v3 4/6] backfill: work with prefix pathspecs Derrick Stolee via GitGitGadget
2026-03-26 15:14     ` [PATCH v3 5/6] path-walk: support wildcard pathspecs for blob filtering Derrick Stolee via GitGitGadget
2026-03-26 15:14     ` [PATCH v3 6/6] t5620: test backfill's unknown argument handling Derrick Stolee via GitGitGadget
2026-03-27  7:07     ` [PATCH v3 0/6] backfill: accept revision arguments Patrick Steinhardt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqms06hzfv.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.