From: Patrick Steinhardt <ps@pks.im>
To: Toon Claes <toon@iotcl.com>
Cc: git@vger.kernel.org, Jeff King <peff@peff.net>,
Justin Tobler <jltobler@gmail.com>
Subject: Re: [PATCH 2/3] within_depth: fix return for empty path
Date: Wed, 30 Jul 2025 08:20:44 +0200 [thread overview]
Message-ID: <aIm5vMeTLSLD-6Fz@pks.im> (raw)
In-Reply-To: <20250729-toon-max-depth-v1-2-c177e39c40fb@iotcl.com>
On Tue, Jul 29, 2025 at 08:57:43PM +0200, Toon Claes wrote:
> From: Jeff King <peff@peff.net>
>
> The within_depth() function is used to check whether pathspecs limited
> by a max-depth parameter are acceptable. It takes a path to check, a
> maximum depth, and a "base" depth. It counts the components in the
> path (by counting slashes), adds them to the base, and compare them to
s/compare/&s/
> the maximum.
>
> However, if the base does not have any slashes at all, we always return
> `true`. If the base depth is 0, then this is correct; no matter what the
> maximum is, we are always within it. However, if the base depth is
> greater than 0, then we might return an erroneous result.
>
> This ends up not causing any user-visible bugs in the current code. The
> call sites in dir.c always pass a base depth of 0, so are unaffected.
> But tree_entry_interesting() uses this function differently: it will
> pass the prefix of the current entry, along with a `1` if the entry is a
> directory, in essence checking whether items inside the entry would be
> of interest. It turns out not to make a difference in behavior, but the
> reasoning is complex.
>
> Given a tree like:
>
> file
> a/file
> a/b/file
>
> walking the tree and calling tree_entry_interesting() will yield the
> following results:
>
> (with max_depth=0):
> file: yes
> a: yes
> a/file: no
> a/b: no
>
> (with max_depth=1):
> file: yes
> a: yes
> a/file: yes
> a/b: no
>
> So we have inconsistent behavior in considering directories interesting.
> If they are at the edge of our depth but at the root, we will recurse
> into them, but then find all of their entries uninteresting (e.g., in
> the first case, we will look at "a" but find "a/*" uninteresting). But
> if they are at the edge of our depth and not at the root, then we will
> not recurse (in the second example, we do not even bother entering
> "a/b").
>
> This turns out not to matter because the only caller which uses
> max-depth pathspecs is cmd_grep(), which only cares about blob entries.
> From its perspective, it is exactly the same to not recurse into a
> subtree, or to recurse and find that it contains no matching entries.
> Not recursing is merely an optimization.
Okay, well-explained.
> It is debatable whether tree_entry_interesting() should consider such an
> entry interesting. The only caller does not care if it sees the tree
> itself, and can benefit from the optimization. But if we add a
> "max-depth" limiter to regular diffs, then a diff with
> DIFF_OPT_TREE_IN_RECURSIVE would probably want to show the tree itself,
> but not what it contains.
I haven't yet read the cover letter, but I guess that the scenario where
we'll care about this is in git-last-modified(1) if we want to teach
that command a `--max-depth` parameter?
> This patch just fixes within_depth(), which means we consider such
> entries uninteresting (and makes the current caller happy). If we want
> to change that in the future, then this fix is still the correct first
> step, as the current behavior is simply inconsistent.
So... do we end up with this now?
(with max_depth=1):
file: yes
a: yes
a/file: no
a/b: no
I think it would be nice to include that in the message to show the
change in behaviour at a glance.
> Signed-off-by: Jeff King <peff@peff.net>
> Signed-off-by: Toon Claes <toon@iotcl.com>
> ---
> dir.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/dir.c b/dir.c
> index 02873f59ea..900ee5e559 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -277,7 +277,7 @@ int within_depth(const char *name, int namelen,
> if (depth > max_depth)
> return 0;
> }
> - return 1;
> + return depth <= max_depth;
> }
Just for my own understanding: the only difference is when we don't have
even a single matching slash, as we don't verify `depth > max_depth` in
that case. So in theory, we could modify the function to the following
equivalent:
int within_depth(const char *name, int namelen,
int depth, int max_depth)
{
const char *cp = name, *cpe = name + namelen;
if (depth > max_depth)
return 0;
while (cp < cpe) {
if (*cp++ != '/')
continue;
depth++;
if (depth > max_depth)
return 0;
}
return 1;
}
(Not saying we should, I'm just double checking my understanding).
Patrick
next prev parent reply other threads:[~2025-07-30 6:20 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-07-29 18:57 [PATCH 0/3] Teach git-diff-tree(1) option --max-depth Toon Claes
2025-07-29 18:57 ` [PATCH 1/3] combine-diff: zero memory used for callback filepairs Toon Claes
2025-07-29 18:57 ` [PATCH 2/3] within_depth: fix return for empty path Toon Claes
2025-07-30 6:20 ` Patrick Steinhardt [this message]
2025-08-06 14:30 ` Toon Claes
2025-08-07 6:15 ` Patrick Steinhardt
2025-07-29 18:57 ` [PATCH 3/3] diff: teach tree-diff a max-depth parameter Toon Claes
2025-07-30 6:20 ` Patrick Steinhardt
2025-08-06 14:49 ` Toon Claes
2025-08-07 5:55 ` Patrick Steinhardt
2025-08-07 20:52 ` [PATCH v2 0/3] Teach git-diff-tree(1) option --max-depth Toon Claes
2025-08-07 20:52 ` [PATCH v2 1/3] combine-diff: zero memory used for callback filepairs Toon Claes
2025-08-07 20:52 ` [PATCH v2 2/3] within_depth: fix return for empty path Toon Claes
2025-08-07 20:52 ` [PATCH v2 3/3] diff: teach tree-diff a max-depth parameter Toon Claes
2025-08-14 15:15 ` [PATCH v2 0/3] Teach git-diff-tree(1) option --max-depth Junio C Hamano
2025-08-15 5:17 ` Patrick Steinhardt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aIm5vMeTLSLD-6Fz@pks.im \
--to=ps@pks.im \
--cc=git@vger.kernel.org \
--cc=jltobler@gmail.com \
--cc=peff@peff.net \
--cc=toon@iotcl.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).