git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Patrick Steinhardt <ps@pks.im>
To: Toon Claes <toon@iotcl.com>
Cc: git@vger.kernel.org, Jeff King <peff@peff.net>,
	Justin Tobler <jltobler@gmail.com>
Subject: Re: [PATCH 2/3] within_depth: fix return for empty path
Date: Wed, 30 Jul 2025 08:20:44 +0200	[thread overview]
Message-ID: <aIm5vMeTLSLD-6Fz@pks.im> (raw)
In-Reply-To: <20250729-toon-max-depth-v1-2-c177e39c40fb@iotcl.com>

On Tue, Jul 29, 2025 at 08:57:43PM +0200, Toon Claes wrote:
> From: Jeff King <peff@peff.net>
> 
> The within_depth() function is used to check whether pathspecs limited
> by a max-depth parameter are acceptable. It takes a path to check, a
> maximum depth, and a "base" depth. It counts the components in the
> path (by counting slashes), adds them to the base, and compare them to

s/compare/&s/

> the maximum.
> 
> However, if the base does not have any slashes at all, we always return
> `true`. If the base depth is 0, then this is correct; no matter what the
> maximum is, we are always within it. However, if the base depth is
> greater than 0, then we might return an erroneous result.
> 
> This ends up not causing any user-visible bugs in the current code. The
> call sites in dir.c always pass a base depth of 0, so are unaffected.
> But tree_entry_interesting() uses this function differently: it will
> pass the prefix of the current entry, along with a `1` if the entry is a
> directory, in essence checking whether items inside the entry would be
> of interest. It turns out not to make a difference in behavior, but the
> reasoning is complex.
> 
> Given a tree like:
> 
>   file
>   a/file
>   a/b/file
> 
> walking the tree and calling tree_entry_interesting() will yield the
> following results:
> 
>   (with max_depth=0):
>       file: yes
>          a: yes
>     a/file: no
>        a/b: no
> 
>   (with max_depth=1):
>       file: yes
>          a: yes
>     a/file: yes
>        a/b: no
> 
> So we have inconsistent behavior in considering directories interesting.
> If they are at the edge of our depth but at the root, we will recurse
> into them, but then find all of their entries uninteresting (e.g., in
> the first case, we will look at "a" but find "a/*" uninteresting). But
> if they are at the edge of our depth and not at the root, then we will
> not recurse (in the second example, we do not even bother entering
> "a/b").
> 
> This turns out not to matter because the only caller which uses
> max-depth pathspecs is cmd_grep(), which only cares about blob entries.
> From its perspective, it is exactly the same to not recurse into a
> subtree, or to recurse and find that it contains no matching entries.
> Not recursing is merely an optimization.

Okay, well-explained.

> It is debatable whether tree_entry_interesting() should consider such an
> entry interesting. The only caller does not care if it sees the tree
> itself, and can benefit from the optimization. But if we add a
> "max-depth" limiter to regular diffs, then a diff with
> DIFF_OPT_TREE_IN_RECURSIVE would probably want to show the tree itself,
> but not what it contains.

I haven't yet read the cover letter, but I guess that the scenario where
we'll care about this is in git-last-modified(1) if we want to teach
that command a `--max-depth` parameter?

> This patch just fixes within_depth(), which means we consider such
> entries uninteresting (and makes the current caller happy). If we want
> to change that in the future, then this fix is still the correct first
> step, as the current behavior is simply inconsistent.

So... do we end up with this now?

   (with max_depth=1):
       file: yes
          a: yes
     a/file: no
        a/b: no

I think it would be nice to include that in the message to show the
change in behaviour at a glance.

> Signed-off-by: Jeff King <peff@peff.net>
> Signed-off-by: Toon Claes <toon@iotcl.com>
> ---
>  dir.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/dir.c b/dir.c
> index 02873f59ea..900ee5e559 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -277,7 +277,7 @@ int within_depth(const char *name, int namelen,
>  		if (depth > max_depth)
>  			return 0;
>  	}
> -	return 1;
> +	return depth <= max_depth;
>  }

Just for my own understanding: the only difference is when we don't have
even a single matching slash, as we don't verify `depth > max_depth` in
that case. So in theory, we could modify the function to the following
equivalent:

	int within_depth(const char *name, int namelen,
				int depth, int max_depth)
	{
		const char *cp = name, *cpe = name + namelen;

		if (depth > max_depth)
			return 0;

		while (cp < cpe) {
			if (*cp++ != '/')
				continue;
			depth++;
			if (depth > max_depth)
				return 0;
		}
		return 1;
	}

(Not saying we should, I'm just double checking my understanding).

Patrick

  reply	other threads:[~2025-07-30  6:20 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-29 18:57 [PATCH 0/3] Teach git-diff-tree(1) option --max-depth Toon Claes
2025-07-29 18:57 ` [PATCH 1/3] combine-diff: zero memory used for callback filepairs Toon Claes
2025-07-29 18:57 ` [PATCH 2/3] within_depth: fix return for empty path Toon Claes
2025-07-30  6:20   ` Patrick Steinhardt [this message]
2025-08-06 14:30     ` Toon Claes
2025-08-07  6:15       ` Patrick Steinhardt
2025-07-29 18:57 ` [PATCH 3/3] diff: teach tree-diff a max-depth parameter Toon Claes
2025-07-30  6:20   ` Patrick Steinhardt
2025-08-06 14:49     ` Toon Claes
2025-08-07  5:55       ` Patrick Steinhardt
2025-08-07 20:52 ` [PATCH v2 0/3] Teach git-diff-tree(1) option --max-depth Toon Claes
2025-08-07 20:52   ` [PATCH v2 1/3] combine-diff: zero memory used for callback filepairs Toon Claes
2025-08-07 20:52   ` [PATCH v2 2/3] within_depth: fix return for empty path Toon Claes
2025-08-07 20:52   ` [PATCH v2 3/3] diff: teach tree-diff a max-depth parameter Toon Claes
2025-08-14 15:15   ` [PATCH v2 0/3] Teach git-diff-tree(1) option --max-depth Junio C Hamano
2025-08-15  5:17     ` Patrick Steinhardt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aIm5vMeTLSLD-6Fz@pks.im \
    --to=ps@pks.im \
    --cc=git@vger.kernel.org \
    --cc=jltobler@gmail.com \
    --cc=peff@peff.net \
    --cc=toon@iotcl.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).