git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] Teach git-diff-tree(1) option --max-depth
@ 2025-07-29 18:57 Toon Claes
  2025-07-29 18:57 ` [PATCH 1/3] combine-diff: zero memory used for callback filepairs Toon Claes
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Toon Claes @ 2025-07-29 18:57 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Justin Tobler, Toon Claes

Please consider these patches authored by Peff. They add option
`--max-depth` to the diff machinery.

During the process to upstream the git-blame-tree(1), later named
git-last-modified(1), various times[1][2][3] the topic was raised to add
option `--max-depth` to the diff machinery. In this patch series this
change is added as a separate patch series.

The patches were originally written by Peff[4] and I'm crediting him as
the author. I've taken the patches almost verbatim from his fork on
GitHub, with some minor tweaks in the commit messages. Because only tiny
changes were made, I've kept his Signed-off-by trailers, but I can
remove if disagreed on.

The goal of the option `--max-depth` is to stop recursively traversing
the tree if the given depth is reached from the pathspec.

These patches add `max_depth` and `max_depth_valid` to `struct
diff_options`. This is different from what git-grep(1) does, which uses
`max_depth` on `struct pathspec` instead. At the moment I'm on the fence
whether this is an issue: while it probably makes sense to consolidate
them into the same structs, it does not really make sense to reuse these
the struct fields if they are used in two separate code paths.

[1]: https://lore.kernel.org/git/20130318121243.GC14789@sigill.intra.peff.net/
[2]: https://lore.kernel.org/git/20160831054201.ldlwptlmcndjmfwu@sigill.intra.peff.net/
[3]: https://lore.kernel.org/git/Y+%2FmnnJUz75yfWCN@coredump.intra.peff.net/
[4]: https://github.com/peff/git/tree/jk/diff-max-depth

---
Jeff King (3):
      combine-diff: zero memory used for callback filepairs
      within_depth: fix return for empty path
      diff: teach tree-diff a max-depth parameter

 Documentation/diff-options.adoc |  28 +++++++++++
 combine-diff.c                  |   2 +-
 diff-lib.c                      |   5 ++
 diff.c                          |  19 +++++++
 diff.h                          |   9 ++++
 dir.c                           |   2 +-
 t/meson.build                   |   1 +
 t/t4072-diff-max-depth.sh       | 109 ++++++++++++++++++++++++++++++++++++++++
 tree-diff.c                     |  78 ++++++++++++++++++++++++++--
 9 files changed, 248 insertions(+), 5 deletions(-)
---



---

base-commit: e813a0200a7121b97fec535f0d0b460b0a33356c
change-id: 20250724-toon-max-depth-25d3c19e2607

Thanks
--
Toon


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/3] combine-diff: zero memory used for callback filepairs
  2025-07-29 18:57 [PATCH 0/3] Teach git-diff-tree(1) option --max-depth Toon Claes
@ 2025-07-29 18:57 ` Toon Claes
  2025-07-29 18:57 ` [PATCH 2/3] within_depth: fix return for empty path Toon Claes
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 16+ messages in thread
From: Toon Claes @ 2025-07-29 18:57 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Justin Tobler, Toon Claes

From: Jeff King <peff@peff.net>

In commit 25e5e2bf85 (combine-diff: support format_callback,
2011-08-19), the combined-diff code learned how to make a multi-sourced
`diff_filepair` to pass to a diff callback. When we create each
filepair, we do not bother to fill in many of the fields, because they
would make no sense (e.g. there can be no rename score or broken_pair
flag because we do not go through the diffcore filters). However, we did
not even bother to zero them, leading to random values. Let's make sure
everything is blank with xcalloc(), just as the regular diff code does.

We would potentially want to set the `status` flag to
something non-zero, but it is not clear to what. Possibly a
new DIFF_STATUS_COMBINED would make sense, as this is not
strictly a modification, nor does it fit any other category.

Since it is not yet clear what callers would want, this
patch simply leaves it as `0`, the same empty flag that is
seen when `diffcore_std` is not used at all.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Toon Claes <toon@iotcl.com>
---
 combine-diff.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/combine-diff.c b/combine-diff.c
index 4ea2dc93c4..3878faabe7 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -1315,7 +1315,7 @@ static struct diff_filepair *combined_pair(struct combine_diff_path *p,
 	struct diff_filepair *pair;
 	struct diff_filespec *pool;
 
-	pair = xmalloc(sizeof(*pair));
+	CALLOC_ARRAY(pair, 1);
 	CALLOC_ARRAY(pool, st_add(num_parent, 1));
 	pair->one = pool + 1;
 	pair->two = pool;

-- 
2.50.1.327.g047016eb4a


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/3] within_depth: fix return for empty path
  2025-07-29 18:57 [PATCH 0/3] Teach git-diff-tree(1) option --max-depth Toon Claes
  2025-07-29 18:57 ` [PATCH 1/3] combine-diff: zero memory used for callback filepairs Toon Claes
@ 2025-07-29 18:57 ` Toon Claes
  2025-07-30  6:20   ` Patrick Steinhardt
  2025-07-29 18:57 ` [PATCH 3/3] diff: teach tree-diff a max-depth parameter Toon Claes
  2025-08-07 20:52 ` [PATCH v2 0/3] Teach git-diff-tree(1) option --max-depth Toon Claes
  3 siblings, 1 reply; 16+ messages in thread
From: Toon Claes @ 2025-07-29 18:57 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Justin Tobler, Toon Claes

From: Jeff King <peff@peff.net>

The within_depth() function is used to check whether pathspecs limited
by a max-depth parameter are acceptable. It takes a path to check, a
maximum depth, and a "base" depth. It counts the components in the
path (by counting slashes), adds them to the base, and compare them to
the maximum.

However, if the base does not have any slashes at all, we always return
`true`. If the base depth is 0, then this is correct; no matter what the
maximum is, we are always within it. However, if the base depth is
greater than 0, then we might return an erroneous result.

This ends up not causing any user-visible bugs in the current code. The
call sites in dir.c always pass a base depth of 0, so are unaffected.
But tree_entry_interesting() uses this function differently: it will
pass the prefix of the current entry, along with a `1` if the entry is a
directory, in essence checking whether items inside the entry would be
of interest. It turns out not to make a difference in behavior, but the
reasoning is complex.

Given a tree like:

  file
  a/file
  a/b/file

walking the tree and calling tree_entry_interesting() will yield the
following results:

  (with max_depth=0):
      file: yes
         a: yes
    a/file: no
       a/b: no

  (with max_depth=1):
      file: yes
         a: yes
    a/file: yes
       a/b: no

So we have inconsistent behavior in considering directories interesting.
If they are at the edge of our depth but at the root, we will recurse
into them, but then find all of their entries uninteresting (e.g., in
the first case, we will look at "a" but find "a/*" uninteresting). But
if they are at the edge of our depth and not at the root, then we will
not recurse (in the second example, we do not even bother entering
"a/b").

This turns out not to matter because the only caller which uses
max-depth pathspecs is cmd_grep(), which only cares about blob entries.
From its perspective, it is exactly the same to not recurse into a
subtree, or to recurse and find that it contains no matching entries.
Not recursing is merely an optimization.

It is debatable whether tree_entry_interesting() should consider such an
entry interesting. The only caller does not care if it sees the tree
itself, and can benefit from the optimization. But if we add a
"max-depth" limiter to regular diffs, then a diff with
DIFF_OPT_TREE_IN_RECURSIVE would probably want to show the tree itself,
but not what it contains.

This patch just fixes within_depth(), which means we consider such
entries uninteresting (and makes the current caller happy). If we want
to change that in the future, then this fix is still the correct first
step, as the current behavior is simply inconsistent.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Toon Claes <toon@iotcl.com>
---
 dir.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/dir.c b/dir.c
index 02873f59ea..900ee5e559 100644
--- a/dir.c
+++ b/dir.c
@@ -277,7 +277,7 @@ int within_depth(const char *name, int namelen,
 		if (depth > max_depth)
 			return 0;
 	}
-	return 1;
+	return depth <= max_depth;
 }
 
 /*

-- 
2.50.1.327.g047016eb4a


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 3/3] diff: teach tree-diff a max-depth parameter
  2025-07-29 18:57 [PATCH 0/3] Teach git-diff-tree(1) option --max-depth Toon Claes
  2025-07-29 18:57 ` [PATCH 1/3] combine-diff: zero memory used for callback filepairs Toon Claes
  2025-07-29 18:57 ` [PATCH 2/3] within_depth: fix return for empty path Toon Claes
@ 2025-07-29 18:57 ` Toon Claes
  2025-07-30  6:20   ` Patrick Steinhardt
  2025-08-07 20:52 ` [PATCH v2 0/3] Teach git-diff-tree(1) option --max-depth Toon Claes
  3 siblings, 1 reply; 16+ messages in thread
From: Toon Claes @ 2025-07-29 18:57 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Justin Tobler, Toon Claes

From: Jeff King <peff@peff.net>

When you are doing a tree-diff, there are basically two options: do not
recurse into subtrees at all, or recurse indefinitely. While most
callers would want to always recurse and see full pathnames, some may
want the efficiency of looking only at a particular level of the tree.
This is currently easy to do for the top-level (just turn off
recursion), but you cannot say "show me what changed in subdir/, but do
not recurse".

This patch adds a max-depth parameter which is measured from the closest
pathspec match, so that you can do:

  git log --raw --max-depth=1 -- a/b/c

and see the raw output for a/b/c/, but not those of a/b/c/d/
(instead of the raw output you would see for a/b/c/d).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Toon Claes <toon@iotcl.com>
---
 Documentation/diff-options.adoc |  28 +++++++++++
 diff-lib.c                      |   5 ++
 diff.c                          |  19 +++++++
 diff.h                          |   9 ++++
 t/meson.build                   |   1 +
 t/t4072-diff-max-depth.sh       | 109 ++++++++++++++++++++++++++++++++++++++++
 tree-diff.c                     |  78 ++++++++++++++++++++++++++--
 7 files changed, 246 insertions(+), 3 deletions(-)

diff --git a/Documentation/diff-options.adoc b/Documentation/diff-options.adoc
index f3a35d8141..46e6ed2d67 100644
--- a/Documentation/diff-options.adoc
+++ b/Documentation/diff-options.adoc
@@ -893,5 +893,33 @@ endif::git-format-patch[]
 	reverted with `--ita-visible-in-index`. Both options are
 	experimental and could be removed in future.
 
+--max-depth=<n>::
+
+	Limit diff recursion to `<n>` levels (implies `-r`). The depth
+	is measured from the closest pathspec. Given a tree containing
+	`foo/bar/baz`, the following list shows the matches generated by
+	each set of options:
++
+--
+ - `--max-depth=0 -- foo`: `foo`
+
+ - `--max-depth=1 -- foo`: `foo/bar`
+
+ - `--max-depth=1 -- foo/bar`: `foo/bar/baz`
+
+ - `--max-depth=1 -- foo foo/bar`: `foo/bar/baz`
+
+ - `--max-depth=2 -- foo`: `foo/bar/baz`
+--
++
+If no pathspec is given, the depth is measured as if all
+top-level entries were specified. Note that this is different
+than measuring from the root, in that `--max-depth=0` would
+still return `foo`. This allows you to still limit depth while
+asking for a subset of the top-level entries.
++
+Note that this option is only supported for diffs between tree objects,
+not against the index or working tree.
+
 For more detailed explanation on these common options, see also
 linkgit:gitdiffcore[7].
diff --git a/diff-lib.c b/diff-lib.c
index 244468dd1a..fa7c24c267 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -115,6 +115,9 @@ void run_diff_files(struct rev_info *revs, unsigned int option)
 	uint64_t start = getnanotime();
 	struct index_state *istate = revs->diffopt.repo->index;
 
+	if (revs->diffopt.max_depth_valid)
+		die("max-depth is not supported for worktree diffs");
+
 	diff_set_mnemonic_prefix(&revs->diffopt, "i/", "w/");
 
 	refresh_fsmonitor(istate);
@@ -560,6 +563,8 @@ static int diff_cache(struct rev_info *revs,
 	opts.dst_index = NULL;
 	opts.pathspec = &revs->diffopt.pathspec;
 	opts.pathspec->recursive = 1;
+	if (revs->diffopt.max_depth_valid)
+		die("max-depth is not supported for index diffs");
 
 	init_tree_desc(&t, &tree->object.oid, tree->buffer, tree->size);
 	return unpack_trees(1, &t, &opts);
diff --git a/diff.c b/diff.c
index dca87e164f..c03a59ac3b 100644
--- a/diff.c
+++ b/diff.c
@@ -4988,6 +4988,9 @@ void diff_setup_done(struct diff_options *options)
 			options->filter = ~filter_bit[DIFF_STATUS_FILTER_AON];
 		options->filter &= ~options->filter_not;
 	}
+
+	if (options->pathspec.has_wildcard && options->max_depth_valid)
+		die("max-depth cannot be used with wildcard pathspecs");
 }
 
 int parse_long_opt(const char *opt, const char **argv,
@@ -5622,6 +5625,18 @@ static int diff_opt_rotate_to(const struct option *opt, const char *arg, int uns
 	return 0;
 }
 
+static int diff_opt_max_depth(const struct option *opt,
+			      const char *arg, int unset)
+{
+	struct diff_options *options = opt->value;
+
+	BUG_ON_OPT_NEG(unset);
+	options->flags.recursive = 1;
+	options->max_depth = strtol(arg, NULL, 10);
+	options->max_depth_valid = 1;
+	return 0;
+}
+
 /*
  * Consider adding new flags to __git_diff_common_options
  * in contrib/completion/git-completion.bash
@@ -5894,6 +5909,10 @@ struct option *add_diff_options(const struct option *opts,
 		OPT_CALLBACK_F(0, "diff-filter", options, N_("[(A|C|D|M|R|T|U|X|B)...[*]]"),
 			       N_("select files by diff type"),
 			       PARSE_OPT_NONEG, diff_opt_diff_filter),
+		OPT_CALLBACK_F(0, "max-depth", options, N_("<depth>"),
+			       N_("maximum tree depth to recurse"),
+			       PARSE_OPT_NONEG, diff_opt_max_depth),
+
 		{
 			.type = OPTION_CALLBACK,
 			.long_name = "output",
diff --git a/diff.h b/diff.h
index 62e5768a9a..e3767df237 100644
--- a/diff.h
+++ b/diff.h
@@ -404,6 +404,15 @@ struct diff_options {
 	struct strmap *additional_path_headers;
 
 	int no_free;
+
+	/*
+	 * The extra "valid" flag is a slight hack. The value "0" is perfectly
+	 * valid for max-depth. We would normally use -1 to indicate "not set",
+	 * but there are many code paths which assume that assume that just
+	 * zero-ing out a diff_options is enough to initialize it.
+	 */
+	int max_depth;
+	int max_depth_valid;
 };
 
 unsigned diff_filter_bit(char status);
diff --git a/t/meson.build b/t/meson.build
index 660d780dcc..11908dad6f 100644
--- a/t/meson.build
+++ b/t/meson.build
@@ -501,6 +501,7 @@ integration_tests = [
   't4069-remerge-diff.sh',
   't4070-diff-pairs.sh',
   't4071-diff-minimal.sh',
+  't4072-diff-max-depth.sh',
   't4100-apply-stat.sh',
   't4101-apply-nonl.sh',
   't4102-apply-rename.sh',
diff --git a/t/t4072-diff-max-depth.sh b/t/t4072-diff-max-depth.sh
new file mode 100755
index 0000000000..1545ebb869
--- /dev/null
+++ b/t/t4072-diff-max-depth.sh
@@ -0,0 +1,109 @@
+#!/bin/sh
+
+test_description='check that diff --max-depth will limit recursion'
+. ./test-lib.sh
+
+make_dir() {
+	mkdir -p "$1" &&
+	echo "$2" >"$1/file"
+}
+
+make_files() {
+	echo "$1" >file &&
+	make_dir one "$1" &&
+	make_dir one/two "$1" &&
+	make_dir one/two/three "$1"
+}
+
+test_expect_success 'setup' '
+	git commit --allow-empty -m empty &&
+	git tag empty &&
+	make_files added &&
+	git add . &&
+	git commit -m added &&
+	make_files modified &&
+	git add . &&
+	git commit -m modified &&
+	make_files index &&
+	git add . &&
+	make_files worktree
+'
+
+test_expect_success '--max-depth is disallowed with wildcard pathspecs' '
+	test_must_fail git diff-tree --max-depth=0 HEAD^ HEAD -- "f*"
+'
+
+check_one() {
+	type=$1; shift
+	args=$1; shift
+	path=$1; shift
+	depth=$1; shift
+	test_expect_${expect:-success} "diff-$type $args, path=$path, depth=$depth" "
+		for i in $*; do echo \$i; done >expect &&
+		git diff-$type --max-depth=$depth --name-only $args -- $path >actual &&
+		test_cmp expect actual
+	"
+}
+
+# For tree comparisons, we expect to see subtrees at the boundary
+# get their own entry.
+check_trees() {
+	check_one tree "$*" '' 0 file one
+	check_one tree "$*" '' 1 file one/file one/two
+	check_one tree "$*" '' 2 file one/file one/two/file one/two/three
+	check_one tree "$*" '' 3 file one/file one/two/file one/two/three/file
+	check_one tree "$*" one 0 one
+	check_one tree "$*" one 1 one/file one/two
+	check_one tree "$*" one 2 one/file one/two/file one/two/three
+	check_one tree "$*" one 3 one/file one/two/file one/two/three/file
+	check_one tree "$*" one/two 0 one/two
+	check_one tree "$*" one/two 1 one/two/file one/two/three
+	check_one tree "$*" one/two 2 one/two/file one/two/three/file
+	check_one tree "$*" one/two/three 0 one/two/three
+	check_one tree "$*" one/two/three 1 one/two/three/file
+}
+
+# But for index comparisons, we do not store subtrees at all, so we do not
+# expect them.
+check_index() {
+	check_one "$@" '' 0 file
+	check_one "$@" '' 1 file one/file
+	check_one "$@" '' 2 file one/file one/two/file
+	check_one "$@" '' 3 file one/file one/two/file one/two/three/file
+	check_one "$@" one 0
+	check_one "$@" one 1 one/file
+	check_one "$@" one 2 one/file one/two/file
+	check_one "$@" one 3 one/file one/two/file one/two/three/file
+	check_one "$@" one/two 0
+	check_one "$@" one/two 1 one/two/file
+	check_one "$@" one/two 2 one/two/file one/two/three/file
+	check_one "$@" one/two/three 0
+	check_one "$@" one/two/three 1 one/two/three/file
+}
+
+# Check as a modification...
+check_trees HEAD^ HEAD
+# ...and as an addition...
+check_trees empty HEAD
+# ...and as a deletion.
+check_trees HEAD empty
+
+# We currently only implement max-depth for trees.
+expect=failure
+# Check index against a tree
+check_index index "--cached HEAD"
+# and index against the worktree
+check_index files ""
+expect=
+
+test_expect_success 'find shortest path within embedded pathspecs' '
+	cat >expect <<-\EOF &&
+	one/file
+	one/two/file
+	one/two/three/file
+	EOF
+	git diff-tree --max-depth=2 --name-only HEAD^ HEAD -- one one/two >actual &&
+	test_cmp expect actual
+'
+
+test_done
diff --git a/tree-diff.c b/tree-diff.c
index e00fc2f450..acd302500f 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -13,6 +13,7 @@
 #include "tree-walk.h"
 #include "environment.h"
 #include "repository.h"
+#include "dir.h"
 
 /*
  * Some mode bits are also used internally for computations.
@@ -48,6 +49,73 @@
 		free((x)); \
 } while(0)
 
+/* Returns true if and only if "dir" is a leading directory of "path" */
+static int is_dir_prefix(const char *path, const char *dir, int dirlen)
+{
+	return !strncmp(path, dir, dirlen) &&
+		(!path[dirlen] || path[dirlen] == '/');
+}
+
+static int check_recursion_depth(struct strbuf *name,
+				 const struct pathspec *ps,
+				 int max_depth)
+{
+	int i;
+
+	if (!ps->nr)
+		return within_depth(name->buf, name->len, 1, max_depth);
+
+	/*
+	 * We look through the pathspecs in reverse-sorted order, because we
+	 * want to find the longest match first (e.g., "a/b" is better for
+	 * checking depth than "a/b/c").
+	 */
+	for (i = ps->nr - 1; i >= 0; i--) {
+		const struct pathspec_item *item = ps->items+i;
+
+		/*
+		 * If the name to match is longer than the pathspec, then we
+		 * are only interested if the pathspec matches and we are
+		 * within the allowed depth.
+		 */
+		if (name->len >= item->len) {
+			if (!is_dir_prefix(name->buf, item->match, item->len))
+				continue;
+			return within_depth(name->buf + item->len,
+					    name->len - item->len,
+					    1, max_depth);
+		}
+
+		/*
+		 * Otherwise, our name is shorter than the pathspec. We need to
+		 * check if it is a prefix of the pathspec; if so, we must
+		 * always recurse in order to process further (the resulting
+		 * paths we find might or might not match our pathspec, but we
+		 * cannot know until we recurse).
+		 */
+		if (is_dir_prefix(item->match, name->buf, name->len))
+			return 1;
+	}
+	return 0;
+}
+
+static int should_recurse(struct strbuf *name, struct diff_options *opt)
+{
+	if (!opt->flags.recursive)
+		return 0;
+	if (!opt->max_depth_valid)
+		return 1;
+
+	/*
+	 * We catch this during diff_setup_done, but let's double-check
+	 * against any internal munging.
+	 */
+	if (opt->pathspec.has_wildcard)
+		die("BUG: wildcard pathspecs are incompatible with max-depth");
+
+	return check_recursion_depth(name, &opt->pathspec, opt->max_depth);
+}
+
 static void ll_diff_tree_paths(
 	struct combine_diff_path ***tail, const struct object_id *oid,
 	const struct object_id **parents_oid, int nparent,
@@ -170,9 +238,13 @@ static void emit_path(struct combine_diff_path ***tail,
 		mode = 0;
 	}
 
-	if (opt->flags.recursive && isdir) {
-		recurse = 1;
-		emitthis = opt->flags.tree_in_recursive;
+	if (isdir) {
+		strbuf_add(base, path, pathlen);
+		if (should_recurse(base, opt)) {
+			recurse = 1;
+			emitthis = opt->flags.tree_in_recursive;
+		}
+		strbuf_setlen(base, old_baselen);
 	}
 
 	if (emitthis) {

-- 
2.50.1.327.g047016eb4a


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] within_depth: fix return for empty path
  2025-07-29 18:57 ` [PATCH 2/3] within_depth: fix return for empty path Toon Claes
@ 2025-07-30  6:20   ` Patrick Steinhardt
  2025-08-06 14:30     ` Toon Claes
  0 siblings, 1 reply; 16+ messages in thread
From: Patrick Steinhardt @ 2025-07-30  6:20 UTC (permalink / raw)
  To: Toon Claes; +Cc: git, Jeff King, Justin Tobler

On Tue, Jul 29, 2025 at 08:57:43PM +0200, Toon Claes wrote:
> From: Jeff King <peff@peff.net>
> 
> The within_depth() function is used to check whether pathspecs limited
> by a max-depth parameter are acceptable. It takes a path to check, a
> maximum depth, and a "base" depth. It counts the components in the
> path (by counting slashes), adds them to the base, and compare them to

s/compare/&s/

> the maximum.
> 
> However, if the base does not have any slashes at all, we always return
> `true`. If the base depth is 0, then this is correct; no matter what the
> maximum is, we are always within it. However, if the base depth is
> greater than 0, then we might return an erroneous result.
> 
> This ends up not causing any user-visible bugs in the current code. The
> call sites in dir.c always pass a base depth of 0, so are unaffected.
> But tree_entry_interesting() uses this function differently: it will
> pass the prefix of the current entry, along with a `1` if the entry is a
> directory, in essence checking whether items inside the entry would be
> of interest. It turns out not to make a difference in behavior, but the
> reasoning is complex.
> 
> Given a tree like:
> 
>   file
>   a/file
>   a/b/file
> 
> walking the tree and calling tree_entry_interesting() will yield the
> following results:
> 
>   (with max_depth=0):
>       file: yes
>          a: yes
>     a/file: no
>        a/b: no
> 
>   (with max_depth=1):
>       file: yes
>          a: yes
>     a/file: yes
>        a/b: no
> 
> So we have inconsistent behavior in considering directories interesting.
> If they are at the edge of our depth but at the root, we will recurse
> into them, but then find all of their entries uninteresting (e.g., in
> the first case, we will look at "a" but find "a/*" uninteresting). But
> if they are at the edge of our depth and not at the root, then we will
> not recurse (in the second example, we do not even bother entering
> "a/b").
> 
> This turns out not to matter because the only caller which uses
> max-depth pathspecs is cmd_grep(), which only cares about blob entries.
> From its perspective, it is exactly the same to not recurse into a
> subtree, or to recurse and find that it contains no matching entries.
> Not recursing is merely an optimization.

Okay, well-explained.

> It is debatable whether tree_entry_interesting() should consider such an
> entry interesting. The only caller does not care if it sees the tree
> itself, and can benefit from the optimization. But if we add a
> "max-depth" limiter to regular diffs, then a diff with
> DIFF_OPT_TREE_IN_RECURSIVE would probably want to show the tree itself,
> but not what it contains.

I haven't yet read the cover letter, but I guess that the scenario where
we'll care about this is in git-last-modified(1) if we want to teach
that command a `--max-depth` parameter?

> This patch just fixes within_depth(), which means we consider such
> entries uninteresting (and makes the current caller happy). If we want
> to change that in the future, then this fix is still the correct first
> step, as the current behavior is simply inconsistent.

So... do we end up with this now?

   (with max_depth=1):
       file: yes
          a: yes
     a/file: no
        a/b: no

I think it would be nice to include that in the message to show the
change in behaviour at a glance.

> Signed-off-by: Jeff King <peff@peff.net>
> Signed-off-by: Toon Claes <toon@iotcl.com>
> ---
>  dir.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/dir.c b/dir.c
> index 02873f59ea..900ee5e559 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -277,7 +277,7 @@ int within_depth(const char *name, int namelen,
>  		if (depth > max_depth)
>  			return 0;
>  	}
> -	return 1;
> +	return depth <= max_depth;
>  }

Just for my own understanding: the only difference is when we don't have
even a single matching slash, as we don't verify `depth > max_depth` in
that case. So in theory, we could modify the function to the following
equivalent:

	int within_depth(const char *name, int namelen,
				int depth, int max_depth)
	{
		const char *cp = name, *cpe = name + namelen;

		if (depth > max_depth)
			return 0;

		while (cp < cpe) {
			if (*cp++ != '/')
				continue;
			depth++;
			if (depth > max_depth)
				return 0;
		}
		return 1;
	}

(Not saying we should, I'm just double checking my understanding).

Patrick

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] diff: teach tree-diff a max-depth parameter
  2025-07-29 18:57 ` [PATCH 3/3] diff: teach tree-diff a max-depth parameter Toon Claes
@ 2025-07-30  6:20   ` Patrick Steinhardt
  2025-08-06 14:49     ` Toon Claes
  0 siblings, 1 reply; 16+ messages in thread
From: Patrick Steinhardt @ 2025-07-30  6:20 UTC (permalink / raw)
  To: Toon Claes; +Cc: git, Jeff King, Justin Tobler

On Tue, Jul 29, 2025 at 08:57:44PM +0200, Toon Claes wrote:
> From: Jeff King <peff@peff.net>
> 
> When you are doing a tree-diff, there are basically two options: do not
> recurse into subtrees at all, or recurse indefinitely. While most
> callers would want to always recurse and see full pathnames, some may
> want the efficiency of looking only at a particular level of the tree.
> This is currently easy to do for the top-level (just turn off
> recursion), but you cannot say "show me what changed in subdir/, but do
> not recurse".
> 
> This patch adds a max-depth parameter which is measured from the closest
> pathspec match, so that you can do:
> 
>   git log --raw --max-depth=1 -- a/b/c
> 
> and see the raw output for a/b/c/, but not those of a/b/c/d/
> (instead of the raw output you would see for a/b/c/d).

Hm, okay. So what happens if I pass both "a/b" and "a/b/c"? Would I see
the contents of both trees?

> diff --git a/Documentation/diff-options.adoc b/Documentation/diff-options.adoc
> index f3a35d8141..46e6ed2d67 100644
> --- a/Documentation/diff-options.adoc
> +++ b/Documentation/diff-options.adoc
> @@ -893,5 +893,33 @@ endif::git-format-patch[]
>  	reverted with `--ita-visible-in-index`. Both options are
>  	experimental and could be removed in future.
>  
> +--max-depth=<n>::
> +
> +	Limit diff recursion to `<n>` levels (implies `-r`). The depth
> +	is measured from the closest pathspec. Given a tree containing
> +	`foo/bar/baz`, the following list shows the matches generated by
> +	each set of options:
> ++
> +--
> + - `--max-depth=0 -- foo`: `foo`
> +
> + - `--max-depth=1 -- foo`: `foo/bar`
> +
> + - `--max-depth=1 -- foo/bar`: `foo/bar/baz`
> +
> + - `--max-depth=1 -- foo foo/bar`: `foo/bar/baz`

This partially answers my question, as we talk about the scenario where
we have "foo/bar/baz", but no "foo/qux". If we had the latter, would
that also be displayed here because it's 1 level deep from the closest
matching pathspec ("foo")?

> + - `--max-depth=2 -- foo`: `foo/bar/baz`
> +--
> ++
> +If no pathspec is given, the depth is measured as if all
> +top-level entries were specified. Note that this is different
> +than measuring from the root, in that `--max-depth=0` would
> +still return `foo`. This allows you to still limit depth while
> +asking for a subset of the top-level entries.
> ++
> +Note that this option is only supported for diffs between tree objects,
> +not against the index or working tree.

Let's also note that wildcard pathspecs aren't supported.

>  For more detailed explanation on these common options, see also
>  linkgit:gitdiffcore[7].
> diff --git a/diff-lib.c b/diff-lib.c
> index 244468dd1a..fa7c24c267 100644
> --- a/diff-lib.c
> +++ b/diff-lib.c
> @@ -115,6 +115,9 @@ void run_diff_files(struct rev_info *revs, unsigned int option)
>  	uint64_t start = getnanotime();
>  	struct index_state *istate = revs->diffopt.repo->index;
>  
> +	if (revs->diffopt.max_depth_valid)
> +		die("max-depth is not supported for worktree diffs");

This and the following error messages should be made translatable.

> diff --git a/diff.c b/diff.c
> index dca87e164f..c03a59ac3b 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -5622,6 +5625,18 @@ static int diff_opt_rotate_to(const struct option *opt, const char *arg, int uns
>  	return 0;
>  }
>  
> +static int diff_opt_max_depth(const struct option *opt,
> +			      const char *arg, int unset)
> +{
> +	struct diff_options *options = opt->value;
> +
> +	BUG_ON_OPT_NEG(unset);
> +	options->flags.recursive = 1;
> +	options->max_depth = strtol(arg, NULL, 10);

We're missing error handling in case the argument is not an integer. We
should probably use `git_parse_unsigned()` instead.

> @@ -5894,6 +5909,10 @@ struct option *add_diff_options(const struct option *opts,
>  		OPT_CALLBACK_F(0, "diff-filter", options, N_("[(A|C|D|M|R|T|U|X|B)...[*]]"),
>  			       N_("select files by diff type"),
>  			       PARSE_OPT_NONEG, diff_opt_diff_filter),
> +		OPT_CALLBACK_F(0, "max-depth", options, N_("<depth>"),
> +			       N_("maximum tree depth to recurse"),
> +			       PARSE_OPT_NONEG, diff_opt_max_depth),
> +
>  		{
>  			.type = OPTION_CALLBACK,
>  			.long_name = "output",

Okay. We don't use `OPT_UNSIGNED()` because we also want to impliy the
`recursive` flag. Wouldn't it be simpler though to use `OPT_UNSIGNED()`
and then set the flag in `diff_setup_done()` like we already do for a
couple of other options?

> diff --git a/diff.h b/diff.h
> index 62e5768a9a..e3767df237 100644
> --- a/diff.h
> +++ b/diff.h
> @@ -404,6 +404,15 @@ struct diff_options {
>  	struct strmap *additional_path_headers;
>  
>  	int no_free;
> +
> +	/*
> +	 * The extra "valid" flag is a slight hack. The value "0" is perfectly
> +	 * valid for max-depth. We would normally use -1 to indicate "not set",
> +	 * but there are many code paths which assume that assume that just

Double "assume that".

> +	 * zero-ing out a diff_options is enough to initialize it.
> +	 */
> +	int max_depth;
> +	int max_depth_valid;

So in that case, shouldn't we convert `max_depth` to be unsigned and
`max_depth_valid` to be a boolean?

> diff --git a/tree-diff.c b/tree-diff.c
> index e00fc2f450..acd302500f 100644
> --- a/tree-diff.c
> +++ b/tree-diff.c
> @@ -48,6 +49,73 @@
>  		free((x)); \
>  } while(0)
>  
> +/* Returns true if and only if "dir" is a leading directory of "path" */
> +static int is_dir_prefix(const char *path, const char *dir, int dirlen)
> +{
> +	return !strncmp(path, dir, dirlen) &&
> +		(!path[dirlen] || path[dirlen] == '/');
> +}
> +
> +static int check_recursion_depth(struct strbuf *name,

Let's mark the `name` parameter as `const` both here and in
`should_recurse()` so that it becomes clear that it shouldn't be
modified.

> +				 const struct pathspec *ps,
> +				 int max_depth)
> +{
> +	int i;
> +
> +	if (!ps->nr)
> +		return within_depth(name->buf, name->len, 1, max_depth);
> +
> +	/*
> +	 * We look through the pathspecs in reverse-sorted order, because we
> +	 * want to find the longest match first (e.g., "a/b" is better for
> +	 * checking depth than "a/b/c").
> +	 */
> +	for (i = ps->nr - 1; i >= 0; i--) {

`nr` is of type `int` indeed, so the loop index is correct even though
it feels wrong. But that's nothing we have to fix as part of this patch
series. We could inline the declaration though and make it loop-local.

> +		const struct pathspec_item *item = ps->items+i;
> +
> +		/*
> +		 * If the name to match is longer than the pathspec, then we
> +		 * are only interested if the pathspec matches and we are
> +		 * within the allowed depth.
> +		 */
> +		if (name->len >= item->len) {
> +			if (!is_dir_prefix(name->buf, item->match, item->len))
> +				continue;
> +			return within_depth(name->buf + item->len,
> +					    name->len - item->len,
> +					    1, max_depth);
> +		}
> +
> +		/*
> +		 * Otherwise, our name is shorter than the pathspec. We need to
> +		 * check if it is a prefix of the pathspec; if so, we must
> +		 * always recurse in order to process further (the resulting
> +		 * paths we find might or might not match our pathspec, but we
> +		 * cannot know until we recurse).
> +		 */
> +		if (is_dir_prefix(item->match, name->buf, name->len))
> +			return 1;
> +	}
> +	return 0;
> +}
> +
> +static int should_recurse(struct strbuf *name, struct diff_options *opt)
> +{
> +	if (!opt->flags.recursive)
> +		return 0;
> +	if (!opt->max_depth_valid)
> +		return 1;
> +
> +	/*
> +	 * We catch this during diff_setup_done, but let's double-check
> +	 * against any internal munging.
> +	 */
> +	if (opt->pathspec.has_wildcard)
> +		die("BUG: wildcard pathspecs are incompatible with max-depth");

Let's use `BUG()` instead. This patch series must be old, the function
has been introduced 8 years ago in d8193743e0 (usage.c: add BUG()
function, 2017-05-12).

Patrick

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] within_depth: fix return for empty path
  2025-07-30  6:20   ` Patrick Steinhardt
@ 2025-08-06 14:30     ` Toon Claes
  2025-08-07  6:15       ` Patrick Steinhardt
  0 siblings, 1 reply; 16+ messages in thread
From: Toon Claes @ 2025-08-06 14:30 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Jeff King, Justin Tobler

Patrick Steinhardt <ps@pks.im> writes:

> On Tue, Jul 29, 2025 at 08:57:43PM +0200, Toon Claes wrote:
>> From: Jeff King <peff@peff.net>
>> 
>> The within_depth() function is used to check whether pathspecs limited
>> by a max-depth parameter are acceptable. It takes a path to check, a
>> maximum depth, and a "base" depth. It counts the components in the
>> path (by counting slashes), adds them to the base, and compare them to
>
> s/compare/&s/
>
>> the maximum.
>> 
>> However, if the base does not have any slashes at all, we always return
>> `true`. If the base depth is 0, then this is correct; no matter what the
>> maximum is, we are always within it. However, if the base depth is
>> greater than 0, then we might return an erroneous result.
>> 
>> This ends up not causing any user-visible bugs in the current code. The
>> call sites in dir.c always pass a base depth of 0, so are unaffected.
>> But tree_entry_interesting() uses this function differently: it will
>> pass the prefix of the current entry, along with a `1` if the entry is a
>> directory, in essence checking whether items inside the entry would be
>> of interest. It turns out not to make a difference in behavior, but the
>> reasoning is complex.
>> 
>> Given a tree like:
>> 
>>   file
>>   a/file
>>   a/b/file
>> 
>> walking the tree and calling tree_entry_interesting() will yield the
>> following results:
>> 
>>   (with max_depth=0):
>>       file: yes
>>          a: yes
>>     a/file: no
>>        a/b: no
>> 
>>   (with max_depth=1):
>>       file: yes
>>          a: yes
>>     a/file: yes
>>        a/b: no
>> 
>> So we have inconsistent behavior in considering directories interesting.
>> If they are at the edge of our depth but at the root, we will recurse
>> into them, but then find all of their entries uninteresting (e.g., in
>> the first case, we will look at "a" but find "a/*" uninteresting). But
>> if they are at the edge of our depth and not at the root, then we will
>> not recurse (in the second example, we do not even bother entering
>> "a/b").
>> 
>> This turns out not to matter because the only caller which uses
>> max-depth pathspecs is cmd_grep(), which only cares about blob entries.
>> From its perspective, it is exactly the same to not recurse into a
>> subtree, or to recurse and find that it contains no matching entries.
>> Not recursing is merely an optimization.
>
> Okay, well-explained.
>
>> It is debatable whether tree_entry_interesting() should consider such an
>> entry interesting. The only caller does not care if it sees the tree
>> itself, and can benefit from the optimization. But if we add a
>> "max-depth" limiter to regular diffs, then a diff with
>> DIFF_OPT_TREE_IN_RECURSIVE would probably want to show the tree itself,
>> but not what it contains.
>
> I haven't yet read the cover letter, but I guess that the scenario where
> we'll care about this is in git-last-modified(1) if we want to teach
> that command a `--max-depth` parameter?

Well, yes git-last-modified(1) will be able to use this option as well.
And it will be useful there. But those patches are still in flight in a
separate series.

In the commit following this one we see `--max-depth` being used in
git-diff-tree(1). Maybe I can do a better job explaining how this
changes affects that patch. Let me see how I can phrase this.

>> This patch just fixes within_depth(), which means we consider such
>> entries uninteresting (and makes the current caller happy). If we want
>> to change that in the future, then this fix is still the correct first
>> step, as the current behavior is simply inconsistent.
>
> So... do we end up with this now?
>
>    (with max_depth=1):
>        file: yes
>           a: yes
>      a/file: no
>         a/b: no
>
> I think it would be nice to include that in the message to show the
> change in behaviour at a glance.

The change in behavior happens when max_depth equals zero. In that case
the example would be:

    (with max_depth=0):
        file: yes
           a: yes
      a/file: no
         a/b: no

So no change actually. But that's due to how tree_entry_interesting()
calls within_depth(). It doesn't pass the path of the entry, but the
base of it. I'm not sure using tree_entry_interesting() here in the
commit message to explain the change is the best idea, but it's the
callsite that /might/ be affected. But it isn't. So it's pretty involved
trying to put this change into words.

>> Signed-off-by: Jeff King <peff@peff.net>
>> Signed-off-by: Toon Claes <toon@iotcl.com>
>> ---
>>  dir.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/dir.c b/dir.c
>> index 02873f59ea..900ee5e559 100644
>> --- a/dir.c
>> +++ b/dir.c
>> @@ -277,7 +277,7 @@ int within_depth(const char *name, int namelen,
>>  		if (depth > max_depth)
>>  			return 0;
>>  	}
>> -	return 1;
>> +	return depth <= max_depth;
>>  }
>
> Just for my own understanding: the only difference is when we don't have
> even a single matching slash, as we don't verify `depth > max_depth` in
> that case. So in theory, we could modify the function to the following
> equivalent:
>
> 	int within_depth(const char *name, int namelen,
> 				int depth, int max_depth)
> 	{
> 		const char *cp = name, *cpe = name + namelen;
>
> 		if (depth > max_depth)
> 			return 0;
>
> 		while (cp < cpe) {
> 			if (*cp++ != '/')
> 				continue;
> 			depth++;
> 			if (depth > max_depth)
> 				return 0;
> 		}
> 		return 1;
> 	}
>
> (Not saying we should, I'm just double checking my understanding).

To be honest, I wasn't sure no more. I decided to see if I can write a
unit test. This is what I came up with:

    void test_dir__within_depth(void)
    {
    	struct test_case {
    		const char *path;
    		int depth;
    		int max_depth;
    		int expect;
    	} test_cases[] = {
    		/* depth = 0; max_depth = 0 */
    		{ "",         0, 0, 1 },
    		{ "file",     0, 0, 1 },
    		{ "a",        0, 0, 1 },
    		{ "a/file",   0, 0, 0 },
    		{ "a/b",      0, 0, 0 },
    		{ "a/b/file", 0, 0, 0 },

    		/* depth = 0; max_depth = 1 */
    		{ "",         0, 1, 1 },
    		{ "file",     0, 1, 1 },
    		{ "a",        0, 1, 1 },
    		{ "a/file",   0, 1, 1 },
    		{ "a/b",      0, 1, 1 },
    		{ "a/b/file", 0, 1, 0 },

    		/* depth = 1; max_depth = 1 */
    		{ "",         1, 1, 1 },
    		{ "file",     1, 1, 1 },
    		{ "a",        1, 1, 1 },
    		{ "a/file",   1, 1, 0 },
    		{ "a/b",      1, 1, 0 },
    		{ "a/b/file", 1, 1, 0 },

    		/* depth = 1; max_depth = 0 */
    		{ "",         1, 0, 0 },
    		{ "file",     1, 0, 0 },
    		{ "a",        1, 0, 0 },
    		{ "a/file",   1, 0, 0 },
    		{ "a/b",      1, 0, 0 },
    		{ "a/b/file", 1, 0, 0 },
    	};

    	for (size_t i = 0; i < ARRAY_SIZE(test_cases); i++) {
    		int result = within_depth(test_cases[i].path, strlen(test_cases[i].path),
    					  test_cases[i].depth, test_cases[i].max_depth);
    		cl_assert_equal_b(result, test_cases[i].expect);
    	}
    }

The change in this patch affect the last batch of tests (the batch with
'depth = 1; max_depth = 0'). Running the test on both this patch and
your suggestion gives the same results, so yes your suggestion has the
same output.

Do you think it's worth to add such unit test?

I did add `""` because I've seen cmd_grep() might be passing an empty
path.

-- 
Cheers,
Toon

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] diff: teach tree-diff a max-depth parameter
  2025-07-30  6:20   ` Patrick Steinhardt
@ 2025-08-06 14:49     ` Toon Claes
  2025-08-07  5:55       ` Patrick Steinhardt
  0 siblings, 1 reply; 16+ messages in thread
From: Toon Claes @ 2025-08-06 14:49 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Jeff King, Justin Tobler

Patrick Steinhardt <ps@pks.im> writes:

> On Tue, Jul 29, 2025 at 08:57:44PM +0200, Toon Claes wrote:
>> From: Jeff King <peff@peff.net>
>> 
>> When you are doing a tree-diff, there are basically two options: do not
>> recurse into subtrees at all, or recurse indefinitely. While most
>> callers would want to always recurse and see full pathnames, some may
>> want the efficiency of looking only at a particular level of the tree.
>> This is currently easy to do for the top-level (just turn off
>> recursion), but you cannot say "show me what changed in subdir/, but do
>> not recurse".
>> 
>> This patch adds a max-depth parameter which is measured from the closest
>> pathspec match, so that you can do:
>> 
>>   git log --raw --max-depth=1 -- a/b/c
>> 
>> and see the raw output for a/b/c/, but not those of a/b/c/d/
>> (instead of the raw output you would see for a/b/c/d).
>
> Hm, okay. So what happens if I pass both "a/b" and "a/b/c"? Would I see
> the contents of both trees?

Yes, the max-depth applies to both pathspecs and entries that match
either of them are shown.

>
>> diff --git a/Documentation/diff-options.adoc b/Documentation/diff-options.adoc
>> index f3a35d8141..46e6ed2d67 100644
>> --- a/Documentation/diff-options.adoc
>> +++ b/Documentation/diff-options.adoc
>> @@ -893,5 +893,33 @@ endif::git-format-patch[]
>>  	reverted with `--ita-visible-in-index`. Both options are
>>  	experimental and could be removed in future.
>>  
>> +--max-depth=<n>::
>> +
>> +	Limit diff recursion to `<n>` levels (implies `-r`). The depth
>> +	is measured from the closest pathspec. Given a tree containing
>> +	`foo/bar/baz`, the following list shows the matches generated by
>> +	each set of options:
>> ++
>> +--
>> + - `--max-depth=0 -- foo`: `foo`
>> +
>> + - `--max-depth=1 -- foo`: `foo/bar`
>> +
>> + - `--max-depth=1 -- foo/bar`: `foo/bar/baz`
>> +
>> + - `--max-depth=1 -- foo foo/bar`: `foo/bar/baz`
>
> This partially answers my question, as we talk about the scenario where
> we have "foo/bar/baz", but no "foo/qux". If we had the latter, would
> that also be displayed here because it's 1 level deep from the closest
> matching pathspec ("foo")?

"foo/qux" is indeed within 1 level deep from "foo", so yes "foo/qux"
will be shown. "foo/qux/duck" not obviously.

>
>> + - `--max-depth=2 -- foo`: `foo/bar/baz`
>> +--
>> ++
>> +If no pathspec is given, the depth is measured as if all
>> +top-level entries were specified. Note that this is different
>> +than measuring from the root, in that `--max-depth=0` would
>> +still return `foo`. This allows you to still limit depth while
>> +asking for a subset of the top-level entries.
>> ++
>> +Note that this option is only supported for diffs between tree objects,
>> +not against the index or working tree.
>
> Let's also note that wildcard pathspecs aren't supported.

Ah yes, good point. Will add.

>>  For more detailed explanation on these common options, see also
>>  linkgit:gitdiffcore[7].
>> diff --git a/diff-lib.c b/diff-lib.c
>> index 244468dd1a..fa7c24c267 100644
>> --- a/diff-lib.c
>> +++ b/diff-lib.c
>> @@ -115,6 +115,9 @@ void run_diff_files(struct rev_info *revs, unsigned int option)
>>  	uint64_t start = getnanotime();
>>  	struct index_state *istate = revs->diffopt.repo->index;
>>  
>> +	if (revs->diffopt.max_depth_valid)
>> +		die("max-depth is not supported for worktree diffs");
>
> This and the following error messages should be made translatable.

True. Will do.

>> diff --git a/diff.c b/diff.c
>> index dca87e164f..c03a59ac3b 100644
>> --- a/diff.c
>> +++ b/diff.c
>> @@ -5622,6 +5625,18 @@ static int diff_opt_rotate_to(const struct option *opt, const char *arg, int uns
>>  	return 0;
>>  }
>>  
>> +static int diff_opt_max_depth(const struct option *opt,
>> +			      const char *arg, int unset)
>> +{
>> +	struct diff_options *options = opt->value;
>> +
>> +	BUG_ON_OPT_NEG(unset);
>> +	options->flags.recursive = 1;
>> +	options->max_depth = strtol(arg, NULL, 10);
>
> We're missing error handling in case the argument is not an integer. We
> should probably use `git_parse_unsigned()` instead.

Absolutely. Well, TIL!

>
>> @@ -5894,6 +5909,10 @@ struct option *add_diff_options(const struct option *opts,
>>  		OPT_CALLBACK_F(0, "diff-filter", options, N_("[(A|C|D|M|R|T|U|X|B)...[*]]"),
>>  			       N_("select files by diff type"),
>>  			       PARSE_OPT_NONEG, diff_opt_diff_filter),
>> +		OPT_CALLBACK_F(0, "max-depth", options, N_("<depth>"),
>> +			       N_("maximum tree depth to recurse"),
>> +			       PARSE_OPT_NONEG, diff_opt_max_depth),
>> +
>>  		{
>>  			.type = OPTION_CALLBACK,
>>  			.long_name = "output",
>
> Okay. We don't use `OPT_UNSIGNED()` because we also want to impliy the
> `recursive` flag. Wouldn't it be simpler though to use `OPT_UNSIGNED()`
> and then set the flag in `diff_setup_done()` like we already do for a
> couple of other options?

But how would you determine `max_depth_valid`?

And, without realizing, you touch a good point here. Looking at the
git-grep(1) docs, it says:

    For each <pathspec> given on command line, descend at most <depth>
    levels of directories. A value of -1 means no limit.

And:

    -r::
    --recursive::
    	Same as `--max-depth=-1`; this is the default.

We should handle -1 the same, that's currently not the case.

>> diff --git a/diff.h b/diff.h
>> index 62e5768a9a..e3767df237 100644
>> --- a/diff.h
>> +++ b/diff.h
>> @@ -404,6 +404,15 @@ struct diff_options {
>>  	struct strmap *additional_path_headers;
>>  
>>  	int no_free;
>> +
>> +	/*
>> +	 * The extra "valid" flag is a slight hack. The value "0" is perfectly
>> +	 * valid for max-depth. We would normally use -1 to indicate "not set",
>> +	 * but there are many code paths which assume that assume that just
>
> Double "assume that".

Thanks!

>> +	 * zero-ing out a diff_options is enough to initialize it.
>> +	 */
>> +	int max_depth;
>> +	int max_depth_valid;
>
> So in that case, shouldn't we convert `max_depth` to be unsigned and
> `max_depth_valid` to be a boolean?

As I mentioned above, -1 should mean "No limit", but should enable
recursion. Now, that doesn't mean we can't make the changes you suggest,
we just have to take that into account.

>> diff --git a/tree-diff.c b/tree-diff.c
>> index e00fc2f450..acd302500f 100644
>> --- a/tree-diff.c
>> +++ b/tree-diff.c
>> @@ -48,6 +49,73 @@
>>  		free((x)); \
>>  } while(0)
>>  
>> +/* Returns true if and only if "dir" is a leading directory of "path" */
>> +static int is_dir_prefix(const char *path, const char *dir, int dirlen)
>> +{
>> +	return !strncmp(path, dir, dirlen) &&
>> +		(!path[dirlen] || path[dirlen] == '/');
>> +}
>> +
>> +static int check_recursion_depth(struct strbuf *name,
>
> Let's mark the `name` parameter as `const` both here and in
> `should_recurse()` so that it becomes clear that it shouldn't be
> modified.

:+1:

>> +				 const struct pathspec *ps,
>> +				 int max_depth)
>> +{
>> +	int i;
>> +
>> +	if (!ps->nr)
>> +		return within_depth(name->buf, name->len, 1, max_depth);
>> +
>> +	/*
>> +	 * We look through the pathspecs in reverse-sorted order, because we
>> +	 * want to find the longest match first (e.g., "a/b" is better for
>> +	 * checking depth than "a/b/c").
>> +	 */
>> +	for (i = ps->nr - 1; i >= 0; i--) {
>
> `nr` is of type `int` indeed, so the loop index is correct even though
> it feels wrong. But that's nothing we have to fix as part of this patch
> series. We could inline the declaration though and make it loop-local.

Ack.

>> +		const struct pathspec_item *item = ps->items+i;
>> +
>> +		/*
>> +		 * If the name to match is longer than the pathspec, then we
>> +		 * are only interested if the pathspec matches and we are
>> +		 * within the allowed depth.
>> +		 */
>> +		if (name->len >= item->len) {
>> +			if (!is_dir_prefix(name->buf, item->match, item->len))
>> +				continue;
>> +			return within_depth(name->buf + item->len,
>> +					    name->len - item->len,
>> +					    1, max_depth);
>> +		}
>> +
>> +		/*
>> +		 * Otherwise, our name is shorter than the pathspec. We need to
>> +		 * check if it is a prefix of the pathspec; if so, we must
>> +		 * always recurse in order to process further (the resulting
>> +		 * paths we find might or might not match our pathspec, but we
>> +		 * cannot know until we recurse).
>> +		 */
>> +		if (is_dir_prefix(item->match, name->buf, name->len))
>> +			return 1;
>> +	}
>> +	return 0;
>> +}
>> +
>> +static int should_recurse(struct strbuf *name, struct diff_options *opt)
>> +{
>> +	if (!opt->flags.recursive)
>> +		return 0;
>> +	if (!opt->max_depth_valid)
>> +		return 1;
>> +
>> +	/*
>> +	 * We catch this during diff_setup_done, but let's double-check
>> +	 * against any internal munging.
>> +	 */
>> +	if (opt->pathspec.has_wildcard)
>> +		die("BUG: wildcard pathspecs are incompatible with max-depth");
>
> Let's use `BUG()` instead. This patch series must be old, the function
> has been introduced 8 years ago in d8193743e0 (usage.c: add BUG()
> function, 2017-05-12).

Good catch. Even looking at your suggestion, at first I didn't realize
it wasn't using BUG(). ;)

-- 
Cheers,
Toon

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] diff: teach tree-diff a max-depth parameter
  2025-08-06 14:49     ` Toon Claes
@ 2025-08-07  5:55       ` Patrick Steinhardt
  0 siblings, 0 replies; 16+ messages in thread
From: Patrick Steinhardt @ 2025-08-07  5:55 UTC (permalink / raw)
  To: Toon Claes; +Cc: git, Jeff King, Justin Tobler

On Wed, Aug 06, 2025 at 04:49:30PM +0200, Toon Claes wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> > On Tue, Jul 29, 2025 at 08:57:44PM +0200, Toon Claes wrote:
> >> diff --git a/diff.c b/diff.c
> >> index dca87e164f..c03a59ac3b 100644
> >> --- a/diff.c
> >> +++ b/diff.c
> >> @@ -5894,6 +5909,10 @@ struct option *add_diff_options(const struct option *opts,
> >>  		OPT_CALLBACK_F(0, "diff-filter", options, N_("[(A|C|D|M|R|T|U|X|B)...[*]]"),
> >>  			       N_("select files by diff type"),
> >>  			       PARSE_OPT_NONEG, diff_opt_diff_filter),
> >> +		OPT_CALLBACK_F(0, "max-depth", options, N_("<depth>"),
> >> +			       N_("maximum tree depth to recurse"),
> >> +			       PARSE_OPT_NONEG, diff_opt_max_depth),
> >> +
> >>  		{
> >>  			.type = OPTION_CALLBACK,
> >>  			.long_name = "output",
> >
> > Okay. We don't use `OPT_UNSIGNED()` because we also want to impliy the
> > `recursive` flag. Wouldn't it be simpler though to use `OPT_UNSIGNED()`
> > and then set the flag in `diff_setup_done()` like we already do for a
> > couple of other options?
> 
> But how would you determine `max_depth_valid`?

Ideally we wouldn't need it as we could just initialize with a default
value.

> And, without realizing, you touch a good point here. Looking at the
> git-grep(1) docs, it says:
> 
>     For each <pathspec> given on command line, descend at most <depth>
>     levels of directories. A value of -1 means no limit.
> 
> And:
> 
>     -r::
>     --recursive::
>     	Same as `--max-depth=-1`; this is the default.
> 
> We should handle -1 the same, that's currently not the case.

True. Do we have the same default? If so, couldn't we drop
`max_depth_valid` and initialize it with -1 from the start?

If that's not easily possible we can just stick with what you have.

Patrick

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] within_depth: fix return for empty path
  2025-08-06 14:30     ` Toon Claes
@ 2025-08-07  6:15       ` Patrick Steinhardt
  0 siblings, 0 replies; 16+ messages in thread
From: Patrick Steinhardt @ 2025-08-07  6:15 UTC (permalink / raw)
  To: Toon Claes; +Cc: git, Jeff King, Justin Tobler

On Wed, Aug 06, 2025 at 04:30:32PM +0200, Toon Claes wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> > Just for my own understanding: the only difference is when we don't have
> > even a single matching slash, as we don't verify `depth > max_depth` in
> > that case. So in theory, we could modify the function to the following
> > equivalent:
> >
> > 	int within_depth(const char *name, int namelen,
> > 				int depth, int max_depth)
> > 	{
> > 		const char *cp = name, *cpe = name + namelen;
> >
> > 		if (depth > max_depth)
> > 			return 0;
> >
> > 		while (cp < cpe) {
> > 			if (*cp++ != '/')
> > 				continue;
> > 			depth++;
> > 			if (depth > max_depth)
> > 				return 0;
> > 		}
> > 		return 1;
> > 	}
> >
> > (Not saying we should, I'm just double checking my understanding).
> 
> To be honest, I wasn't sure no more. I decided to see if I can write a
> unit test. This is what I came up with:
> 
>     void test_dir__within_depth(void)
>     {
>     	struct test_case {
>     		const char *path;
>     		int depth;
>     		int max_depth;
>     		int expect;
>     	} test_cases[] = {
>     		/* depth = 0; max_depth = 0 */
>     		{ "",         0, 0, 1 },
>     		{ "file",     0, 0, 1 },
>     		{ "a",        0, 0, 1 },
>     		{ "a/file",   0, 0, 0 },
>     		{ "a/b",      0, 0, 0 },
>     		{ "a/b/file", 0, 0, 0 },
> 
>     		/* depth = 0; max_depth = 1 */
>     		{ "",         0, 1, 1 },
>     		{ "file",     0, 1, 1 },
>     		{ "a",        0, 1, 1 },
>     		{ "a/file",   0, 1, 1 },
>     		{ "a/b",      0, 1, 1 },
>     		{ "a/b/file", 0, 1, 0 },
> 
>     		/* depth = 1; max_depth = 1 */
>     		{ "",         1, 1, 1 },
>     		{ "file",     1, 1, 1 },
>     		{ "a",        1, 1, 1 },
>     		{ "a/file",   1, 1, 0 },
>     		{ "a/b",      1, 1, 0 },
>     		{ "a/b/file", 1, 1, 0 },
> 
>     		/* depth = 1; max_depth = 0 */
>     		{ "",         1, 0, 0 },
>     		{ "file",     1, 0, 0 },
>     		{ "a",        1, 0, 0 },
>     		{ "a/file",   1, 0, 0 },
>     		{ "a/b",      1, 0, 0 },
>     		{ "a/b/file", 1, 0, 0 },
>     	};
> 
>     	for (size_t i = 0; i < ARRAY_SIZE(test_cases); i++) {
>     		int result = within_depth(test_cases[i].path, strlen(test_cases[i].path),
>     					  test_cases[i].depth, test_cases[i].max_depth);
>     		cl_assert_equal_b(result, test_cases[i].expect);
>     	}
>     }
> 
> The change in this patch affect the last batch of tests (the batch with
> 'depth = 1; max_depth = 0'). Running the test on both this patch and
> your suggestion gives the same results, so yes your suggestion has the
> same output.
> 
> Do you think it's worth to add such unit test?

Sure, if we can easily test this via such a unit test I think it would
be a good addition.

Patrick

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v2 0/3] Teach git-diff-tree(1) option --max-depth
  2025-07-29 18:57 [PATCH 0/3] Teach git-diff-tree(1) option --max-depth Toon Claes
                   ` (2 preceding siblings ...)
  2025-07-29 18:57 ` [PATCH 3/3] diff: teach tree-diff a max-depth parameter Toon Claes
@ 2025-08-07 20:52 ` Toon Claes
  2025-08-07 20:52   ` [PATCH v2 1/3] combine-diff: zero memory used for callback filepairs Toon Claes
                     ` (3 more replies)
  3 siblings, 4 replies; 16+ messages in thread
From: Toon Claes @ 2025-08-07 20:52 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Justin Tobler, Patrick Steinhardt, Toon Claes

Please consider these patches authored by Peff. They add option
`--max-depth` to the diff machinery.

During the process to upstream the git-blame-tree(1), later named
git-last-modified(1), various times[1][2][3] the topic was raised to add
option `--max-depth` to the diff machinery. In this patch series this
change is added as a separate patch series.

The patches were originally written by Peff[4] and I'm crediting him as
the author. I've taken the patches almost verbatim from his fork on
GitHub, with some minor tweaks in the commit messages. Because only tiny
changes were made, I've kept his Signed-off-by trailers, but I can
remove if disagreed on.

The goal of the option `--max-depth` is to stop recursively traversing
the tree if the given depth is reached from the pathspec.

These patches add `max_depth` and `max_depth_valid` to `struct
diff_options`. This is different from what git-grep(1) does, which uses
`max_depth` on `struct pathspec` instead. At the moment I'm on the fence
whether this is an issue: while it probably makes sense to consolidate
them into the same structs, it does not really make sense to reuse these
the struct fields if they are used in two separate code paths.

[1]: https://lore.kernel.org/git/20130318121243.GC14789@sigill.intra.peff.net/
[2]: https://lore.kernel.org/git/20160831054201.ldlwptlmcndjmfwu@sigill.intra.peff.net/
[3]: https://lore.kernel.org/git/Y+%2FmnnJUz75yfWCN@coredump.intra.peff.net/
[4]: https://github.com/peff/git/tree/jk/diff-max-depth

---
Changes in v2:
- Added unit-tests for within_depth() in dir.c. Originally the patch
  was a oneline change by Peff, but I've added a bunch of code and
  extended the commit message, so I've set myself as the author and set
  a Based-on-patch-by trailer for Peff. I hope that's okay?
- Added support for --max-depth=-1 and extended code comments why we
  need the max_depth_valid flag. With these modification it did no
  longer feel appropriate to keep Peff's Signed-off-by trailer.
- Made die() messages translatable.
- Small tweaks to the docs.
- Added some const-correctness.
- Switched from `die("BUG: ...")` to `BUG(...)`.
- Link to v1: https://lore.kernel.org/r/20250729-toon-max-depth-v1-0-c177e39c40fb@iotcl.com

---
Jeff King (2):
      combine-diff: zero memory used for callback filepairs
      diff: teach tree-diff a max-depth parameter

Toon Claes (1):
      within_depth: fix return for empty path

 Documentation/diff-options.adoc |  28 ++++++++++
 Makefile                        |   1 +
 combine-diff.c                  |   2 +-
 diff-lib.c                      |   5 ++
 diff.c                          |  24 +++++++++
 diff.h                          |   8 +++
 dir.c                           |   2 +-
 t/meson.build                   |   2 +
 t/t4072-diff-max-depth.sh       | 116 ++++++++++++++++++++++++++++++++++++++++
 t/unit-tests/u-dir.c            |  47 ++++++++++++++++
 tree-diff.c                     |  78 +++++++++++++++++++++++++--
 11 files changed, 308 insertions(+), 5 deletions(-)
---

Range-diff versus v1:

1:  b16919a12d = 1:  f551775c58 combine-diff: zero memory used for callback filepairs
2:  c67ed5edee ! 2:  739374a2f8 within_depth: fix return for empty path
    @@
      ## Metadata ##
    -Author: Jeff King <peff@peff.net>
    +Author: Toon Claes <toon@iotcl.com>
     
      ## Commit message ##
         within_depth: fix return for empty path
    @@ Commit message
         The within_depth() function is used to check whether pathspecs limited
         by a max-depth parameter are acceptable. It takes a path to check, a
         maximum depth, and a "base" depth. It counts the components in the
    -    path (by counting slashes), adds them to the base, and compare them to
    +    path (by counting slashes), adds them to the base, and compares them to
         the maximum.
     
         However, if the base does not have any slashes at all, we always return
    @@ Commit message
         to change that in the future, then this fix is still the correct first
         step, as the current behavior is simply inconsistent.
     
    -    Signed-off-by: Jeff King <peff@peff.net>
    +    This has the effect the function tree_entry_interesting() now behaves
    +    like following on the first example:
    +
    +      (with max_depth=0):
    +          file: yes
    +             a: no
    +        a/file: no
    +           a/b: no
    +
    +    Meaning we won't step in "a/" no more to realize all "a/*" entries are
    +    uninterested, but we stop at the tree entry itself.
    +
    +    Based-on-patch-by: Jeff King <peff@peff.net>
         Signed-off-by: Toon Claes <toon@iotcl.com>
     
    + ## Makefile ##
    +@@ Makefile: THIRD_PARTY_SOURCES += $(UNIT_TEST_DIR)/clar/%
    + THIRD_PARTY_SOURCES += $(UNIT_TEST_DIR)/clar/clar/%
    + 
    + CLAR_TEST_SUITES += u-ctype
    ++CLAR_TEST_SUITES += u-dir
    + CLAR_TEST_SUITES += u-example-decorate
    + CLAR_TEST_SUITES += u-hash
    + CLAR_TEST_SUITES += u-hashmap
    +
      ## dir.c ##
     @@ dir.c: int within_depth(const char *name, int namelen,
      		if (depth > max_depth)
    @@ dir.c: int within_depth(const char *name, int namelen,
      }
      
      /*
    +
    + ## t/meson.build ##
    +@@
    + clar_test_suites = [
    +   'unit-tests/u-ctype.c',
    ++  'unit-tests/u-dir.c',
    +   'unit-tests/u-example-decorate.c',
    +   'unit-tests/u-hash.c',
    +   'unit-tests/u-hashmap.c',
    +
    + ## t/unit-tests/u-dir.c (new) ##
    +@@
    ++#include "unit-test.h"
    ++#include "dir.h"
    ++
    ++#define TEST_WITHIN_DEPTH(path, depth, max_depth, expect) do { \
    ++		int actual = within_depth(path, strlen(path), \
    ++					  depth, max_depth); \
    ++		if (actual != expect) \
    ++			cl_failf("path '%s' with depth '%d' and max-depth '%d': expected %d, got %d", \
    ++				 path, depth, max_depth, expect, actual); \
    ++	} while (0)
    ++
    ++void test_dir__within_depth(void)
    ++{
    ++	/* depth = 0; max_depth = 0 */
    ++	TEST_WITHIN_DEPTH("",         0, 0, 1);
    ++	TEST_WITHIN_DEPTH("file",     0, 0, 1);
    ++	TEST_WITHIN_DEPTH("a",        0, 0, 1);
    ++	TEST_WITHIN_DEPTH("a/file",   0, 0, 0);
    ++	TEST_WITHIN_DEPTH("a/b",      0, 0, 0);
    ++	TEST_WITHIN_DEPTH("a/b/file", 0, 0, 0);
    ++
    ++	/* depth = 0; max_depth = 1 */
    ++	TEST_WITHIN_DEPTH("",         0, 1, 1);
    ++	TEST_WITHIN_DEPTH("file",     0, 1, 1);
    ++	TEST_WITHIN_DEPTH("a",        0, 1, 1);
    ++	TEST_WITHIN_DEPTH("a/file",   0, 1, 1);
    ++	TEST_WITHIN_DEPTH("a/b",      0, 1, 1);
    ++	TEST_WITHIN_DEPTH("a/b/file", 0, 1, 0);
    ++
    ++	/* depth = 1; max_depth = 1 */
    ++	TEST_WITHIN_DEPTH("",         1, 1, 1);
    ++	TEST_WITHIN_DEPTH("file",     1, 1, 1);
    ++	TEST_WITHIN_DEPTH("a",        1, 1, 1);
    ++	TEST_WITHIN_DEPTH("a/file",   1, 1, 0);
    ++	TEST_WITHIN_DEPTH("a/b",      1, 1, 0);
    ++	TEST_WITHIN_DEPTH("a/b/file", 1, 1, 0);
    ++
    ++	/* depth = 1; max_depth = 0 */
    ++	TEST_WITHIN_DEPTH("",         1, 0, 0);
    ++	TEST_WITHIN_DEPTH("file",     1, 0, 0);
    ++	TEST_WITHIN_DEPTH("a",        1, 0, 0);
    ++	TEST_WITHIN_DEPTH("a/file",   1, 0, 0);
    ++	TEST_WITHIN_DEPTH("a/b",      1, 0, 0);
    ++	TEST_WITHIN_DEPTH("a/b/file", 1, 0, 0);
    ++
    ++
    ++}
3:  b3eb4863c2 ! 3:  2987dbc007 diff: teach tree-diff a max-depth parameter
    @@ Commit message
         and see the raw output for a/b/c/, but not those of a/b/c/d/
         (instead of the raw output you would see for a/b/c/d).
     
    -    Signed-off-by: Jeff King <peff@peff.net>
    +    Co-authored-by: Toon Claes <toon@iotcl.com>
         Signed-off-by: Toon Claes <toon@iotcl.com>
     
      ## Documentation/diff-options.adoc ##
    @@ Documentation/diff-options.adoc: endif::git-format-patch[]
      	reverted with `--ita-visible-in-index`. Both options are
      	experimental and could be removed in future.
      
    -+--max-depth=<n>::
    -+
    -+	Limit diff recursion to `<n>` levels (implies `-r`). The depth
    -+	is measured from the closest pathspec. Given a tree containing
    -+	`foo/bar/baz`, the following list shows the matches generated by
    -+	each set of options:
    ++--max-depth=<depth>::
    ++	For each pathspec given on command line, descend at most `<depth>`
    ++	levels of directories. A value of `-1` means no limit.
    ++	Cannot be combined with wildcards in the pathspec.
    ++	Given a tree containing `foo/bar/baz`, the following list shows the
    ++	matches generated by each set of options:
     ++
     +--
     + - `--max-depth=0 -- foo`: `foo`
    @@ diff-lib.c: void run_diff_files(struct rev_info *revs, unsigned int option)
      	struct index_state *istate = revs->diffopt.repo->index;
      
     +	if (revs->diffopt.max_depth_valid)
    -+		die("max-depth is not supported for worktree diffs");
    ++		die(_("max-depth is not supported for worktree diffs"));
     +
      	diff_set_mnemonic_prefix(&revs->diffopt, "i/", "w/");
      
    @@ diff-lib.c: static int diff_cache(struct rev_info *revs,
      	opts.pathspec = &revs->diffopt.pathspec;
      	opts.pathspec->recursive = 1;
     +	if (revs->diffopt.max_depth_valid)
    -+		die("max-depth is not supported for index diffs");
    ++		die(_("max-depth is not supported for index diffs"));
      
      	init_tree_desc(&t, &tree->object.oid, tree->buffer, tree->size);
      	return unpack_trees(1, &t, &opts);
    @@ diff.c: static int diff_opt_rotate_to(const struct option *opt, const char *arg,
     +	struct diff_options *options = opt->value;
     +
     +	BUG_ON_OPT_NEG(unset);
    ++
    ++	if (!git_parse_int(arg, &options->max_depth))
    ++		return error(_("invalid value for '%s': '%s'"),
    ++			     "--max-depth", arg);
    ++
     +	options->flags.recursive = 1;
    -+	options->max_depth = strtol(arg, NULL, 10);
    -+	options->max_depth_valid = 1;
    ++	options->max_depth_valid = options->max_depth >= 0;
    ++
     +	return 0;
     +}
     +
    @@ diff.h: struct diff_options {
      	int no_free;
     +
     +	/*
    -+	 * The extra "valid" flag is a slight hack. The value "0" is perfectly
    -+	 * valid for max-depth. We would normally use -1 to indicate "not set",
    -+	 * but there are many code paths which assume that assume that just
    -+	 * zero-ing out a diff_options is enough to initialize it.
    ++	 * The value '0' is a valid max-depth (for no recursion), and value '-1'
    ++	 * also (for unlimited recursion), so the extra "valid" flag is used to
    ++	 * determined whether the user specified option --max-depth.
     +	 */
     +	int max_depth;
     +	int max_depth_valid;
    @@ t/t4072-diff-max-depth.sh (new)
     +	check_one tree "$*" '' 1 file one/file one/two
     +	check_one tree "$*" '' 2 file one/file one/two/file one/two/three
     +	check_one tree "$*" '' 3 file one/file one/two/file one/two/three/file
    ++	check_one tree "$*" '' -1 file one/file one/two/file one/two/three/file
     +	check_one tree "$*" one 0 one
     +	check_one tree "$*" one 1 one/file one/two
     +	check_one tree "$*" one 2 one/file one/two/file one/two/three
    @@ t/t4072-diff-max-depth.sh (new)
     +	check_one tree "$*" one/two 0 one/two
     +	check_one tree "$*" one/two 1 one/two/file one/two/three
     +	check_one tree "$*" one/two 2 one/two/file one/two/three/file
    ++	check_one tree "$*" one/two 2 one/two/file one/two/three/file
     +	check_one tree "$*" one/two/three 0 one/two/three
     +	check_one tree "$*" one/two/three 1 one/two/three/file
     +}
    @@ t/t4072-diff-max-depth.sh (new)
     +	check_one "$@" one/two 2 one/two/file one/two/three/file
     +	check_one "$@" one/two/three 0
     +	check_one "$@" one/two/three 1 one/two/three/file
    ++
    ++	# Value '-1' for '--max-depth is the same as recursion without limit,
    ++	# and thus should always succeed.
    ++	local expect=
    ++	check_one "$@" '' -1 file one/file one/two/file one/two/three/file
     +}
     +
     +# Check as a modification...
    @@ tree-diff.c
     +		(!path[dirlen] || path[dirlen] == '/');
     +}
     +
    -+static int check_recursion_depth(struct strbuf *name,
    ++static int check_recursion_depth(const struct strbuf *name,
     +				 const struct pathspec *ps,
     +				 int max_depth)
     +{
    @@ tree-diff.c
     +	return 0;
     +}
     +
    -+static int should_recurse(struct strbuf *name, struct diff_options *opt)
    ++static int should_recurse(const struct strbuf *name, struct diff_options *opt)
     +{
     +	if (!opt->flags.recursive)
     +		return 0;
    @@ tree-diff.c
     +	 * against any internal munging.
     +	 */
     +	if (opt->pathspec.has_wildcard)
    -+		die("BUG: wildcard pathspecs are incompatible with max-depth");
    ++		BUG("wildcard pathspecs are incompatible with max-depth");
     +
     +	return check_recursion_depth(name, &opt->pathspec, opt->max_depth);
     +}


---

base-commit: 64cbe5e2e8a7b0f92c780b210e602496bd5cad0f
change-id: 20250724-toon-max-depth-25d3c19e2607

Thanks
--
Toon


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v2 1/3] combine-diff: zero memory used for callback filepairs
  2025-08-07 20:52 ` [PATCH v2 0/3] Teach git-diff-tree(1) option --max-depth Toon Claes
@ 2025-08-07 20:52   ` Toon Claes
  2025-08-07 20:52   ` [PATCH v2 2/3] within_depth: fix return for empty path Toon Claes
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 16+ messages in thread
From: Toon Claes @ 2025-08-07 20:52 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Justin Tobler, Patrick Steinhardt, Toon Claes

From: Jeff King <peff@peff.net>

In commit 25e5e2bf85 (combine-diff: support format_callback,
2011-08-19), the combined-diff code learned how to make a multi-sourced
`diff_filepair` to pass to a diff callback. When we create each
filepair, we do not bother to fill in many of the fields, because they
would make no sense (e.g. there can be no rename score or broken_pair
flag because we do not go through the diffcore filters). However, we did
not even bother to zero them, leading to random values. Let's make sure
everything is blank with xcalloc(), just as the regular diff code does.

We would potentially want to set the `status` flag to
something non-zero, but it is not clear to what. Possibly a
new DIFF_STATUS_COMBINED would make sense, as this is not
strictly a modification, nor does it fit any other category.

Since it is not yet clear what callers would want, this
patch simply leaves it as `0`, the same empty flag that is
seen when `diffcore_std` is not used at all.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Toon Claes <toon@iotcl.com>
---
 combine-diff.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/combine-diff.c b/combine-diff.c
index 4ea2dc93c4..3878faabe7 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -1315,7 +1315,7 @@ static struct diff_filepair *combined_pair(struct combine_diff_path *p,
 	struct diff_filepair *pair;
 	struct diff_filespec *pool;
 
-	pair = xmalloc(sizeof(*pair));
+	CALLOC_ARRAY(pair, 1);
 	CALLOC_ARRAY(pool, st_add(num_parent, 1));
 	pair->one = pool + 1;
 	pair->two = pool;

-- 
2.50.1.327.g047016eb4a


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v2 2/3] within_depth: fix return for empty path
  2025-08-07 20:52 ` [PATCH v2 0/3] Teach git-diff-tree(1) option --max-depth Toon Claes
  2025-08-07 20:52   ` [PATCH v2 1/3] combine-diff: zero memory used for callback filepairs Toon Claes
@ 2025-08-07 20:52   ` Toon Claes
  2025-08-07 20:52   ` [PATCH v2 3/3] diff: teach tree-diff a max-depth parameter Toon Claes
  2025-08-14 15:15   ` [PATCH v2 0/3] Teach git-diff-tree(1) option --max-depth Junio C Hamano
  3 siblings, 0 replies; 16+ messages in thread
From: Toon Claes @ 2025-08-07 20:52 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Justin Tobler, Patrick Steinhardt, Toon Claes

The within_depth() function is used to check whether pathspecs limited
by a max-depth parameter are acceptable. It takes a path to check, a
maximum depth, and a "base" depth. It counts the components in the
path (by counting slashes), adds them to the base, and compares them to
the maximum.

However, if the base does not have any slashes at all, we always return
`true`. If the base depth is 0, then this is correct; no matter what the
maximum is, we are always within it. However, if the base depth is
greater than 0, then we might return an erroneous result.

This ends up not causing any user-visible bugs in the current code. The
call sites in dir.c always pass a base depth of 0, so are unaffected.
But tree_entry_interesting() uses this function differently: it will
pass the prefix of the current entry, along with a `1` if the entry is a
directory, in essence checking whether items inside the entry would be
of interest. It turns out not to make a difference in behavior, but the
reasoning is complex.

Given a tree like:

  file
  a/file
  a/b/file

walking the tree and calling tree_entry_interesting() will yield the
following results:

  (with max_depth=0):
      file: yes
         a: yes
    a/file: no
       a/b: no

  (with max_depth=1):
      file: yes
         a: yes
    a/file: yes
       a/b: no

So we have inconsistent behavior in considering directories interesting.
If they are at the edge of our depth but at the root, we will recurse
into them, but then find all of their entries uninteresting (e.g., in
the first case, we will look at "a" but find "a/*" uninteresting). But
if they are at the edge of our depth and not at the root, then we will
not recurse (in the second example, we do not even bother entering
"a/b").

This turns out not to matter because the only caller which uses
max-depth pathspecs is cmd_grep(), which only cares about blob entries.
From its perspective, it is exactly the same to not recurse into a
subtree, or to recurse and find that it contains no matching entries.
Not recursing is merely an optimization.

It is debatable whether tree_entry_interesting() should consider such an
entry interesting. The only caller does not care if it sees the tree
itself, and can benefit from the optimization. But if we add a
"max-depth" limiter to regular diffs, then a diff with
DIFF_OPT_TREE_IN_RECURSIVE would probably want to show the tree itself,
but not what it contains.

This patch just fixes within_depth(), which means we consider such
entries uninteresting (and makes the current caller happy). If we want
to change that in the future, then this fix is still the correct first
step, as the current behavior is simply inconsistent.

This has the effect the function tree_entry_interesting() now behaves
like following on the first example:

  (with max_depth=0):
      file: yes
         a: no
    a/file: no
       a/b: no

Meaning we won't step in "a/" no more to realize all "a/*" entries are
uninterested, but we stop at the tree entry itself.

Based-on-patch-by: Jeff King <peff@peff.net>
Signed-off-by: Toon Claes <toon@iotcl.com>
---
 Makefile             |  1 +
 dir.c                |  2 +-
 t/meson.build        |  1 +
 t/unit-tests/u-dir.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index e11340c1ae..8d403301d9 100644
--- a/Makefile
+++ b/Makefile
@@ -1354,6 +1354,7 @@ THIRD_PARTY_SOURCES += $(UNIT_TEST_DIR)/clar/%
 THIRD_PARTY_SOURCES += $(UNIT_TEST_DIR)/clar/clar/%
 
 CLAR_TEST_SUITES += u-ctype
+CLAR_TEST_SUITES += u-dir
 CLAR_TEST_SUITES += u-example-decorate
 CLAR_TEST_SUITES += u-hash
 CLAR_TEST_SUITES += u-hashmap
diff --git a/dir.c b/dir.c
index dfb4d40103..71108ac79b 100644
--- a/dir.c
+++ b/dir.c
@@ -277,7 +277,7 @@ int within_depth(const char *name, int namelen,
 		if (depth > max_depth)
 			return 0;
 	}
-	return 1;
+	return depth <= max_depth;
 }
 
 /*
diff --git a/t/meson.build b/t/meson.build
index bbeba1a8d5..03245e2160 100644
--- a/t/meson.build
+++ b/t/meson.build
@@ -1,5 +1,6 @@
 clar_test_suites = [
   'unit-tests/u-ctype.c',
+  'unit-tests/u-dir.c',
   'unit-tests/u-example-decorate.c',
   'unit-tests/u-hash.c',
   'unit-tests/u-hashmap.c',
diff --git a/t/unit-tests/u-dir.c b/t/unit-tests/u-dir.c
new file mode 100644
index 0000000000..2d0adaa39e
--- /dev/null
+++ b/t/unit-tests/u-dir.c
@@ -0,0 +1,47 @@
+#include "unit-test.h"
+#include "dir.h"
+
+#define TEST_WITHIN_DEPTH(path, depth, max_depth, expect) do { \
+		int actual = within_depth(path, strlen(path), \
+					  depth, max_depth); \
+		if (actual != expect) \
+			cl_failf("path '%s' with depth '%d' and max-depth '%d': expected %d, got %d", \
+				 path, depth, max_depth, expect, actual); \
+	} while (0)
+
+void test_dir__within_depth(void)
+{
+	/* depth = 0; max_depth = 0 */
+	TEST_WITHIN_DEPTH("",         0, 0, 1);
+	TEST_WITHIN_DEPTH("file",     0, 0, 1);
+	TEST_WITHIN_DEPTH("a",        0, 0, 1);
+	TEST_WITHIN_DEPTH("a/file",   0, 0, 0);
+	TEST_WITHIN_DEPTH("a/b",      0, 0, 0);
+	TEST_WITHIN_DEPTH("a/b/file", 0, 0, 0);
+
+	/* depth = 0; max_depth = 1 */
+	TEST_WITHIN_DEPTH("",         0, 1, 1);
+	TEST_WITHIN_DEPTH("file",     0, 1, 1);
+	TEST_WITHIN_DEPTH("a",        0, 1, 1);
+	TEST_WITHIN_DEPTH("a/file",   0, 1, 1);
+	TEST_WITHIN_DEPTH("a/b",      0, 1, 1);
+	TEST_WITHIN_DEPTH("a/b/file", 0, 1, 0);
+
+	/* depth = 1; max_depth = 1 */
+	TEST_WITHIN_DEPTH("",         1, 1, 1);
+	TEST_WITHIN_DEPTH("file",     1, 1, 1);
+	TEST_WITHIN_DEPTH("a",        1, 1, 1);
+	TEST_WITHIN_DEPTH("a/file",   1, 1, 0);
+	TEST_WITHIN_DEPTH("a/b",      1, 1, 0);
+	TEST_WITHIN_DEPTH("a/b/file", 1, 1, 0);
+
+	/* depth = 1; max_depth = 0 */
+	TEST_WITHIN_DEPTH("",         1, 0, 0);
+	TEST_WITHIN_DEPTH("file",     1, 0, 0);
+	TEST_WITHIN_DEPTH("a",        1, 0, 0);
+	TEST_WITHIN_DEPTH("a/file",   1, 0, 0);
+	TEST_WITHIN_DEPTH("a/b",      1, 0, 0);
+	TEST_WITHIN_DEPTH("a/b/file", 1, 0, 0);
+
+
+}

-- 
2.50.1.327.g047016eb4a


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v2 3/3] diff: teach tree-diff a max-depth parameter
  2025-08-07 20:52 ` [PATCH v2 0/3] Teach git-diff-tree(1) option --max-depth Toon Claes
  2025-08-07 20:52   ` [PATCH v2 1/3] combine-diff: zero memory used for callback filepairs Toon Claes
  2025-08-07 20:52   ` [PATCH v2 2/3] within_depth: fix return for empty path Toon Claes
@ 2025-08-07 20:52   ` Toon Claes
  2025-08-14 15:15   ` [PATCH v2 0/3] Teach git-diff-tree(1) option --max-depth Junio C Hamano
  3 siblings, 0 replies; 16+ messages in thread
From: Toon Claes @ 2025-08-07 20:52 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Justin Tobler, Patrick Steinhardt, Toon Claes

From: Jeff King <peff@peff.net>

When you are doing a tree-diff, there are basically two options: do not
recurse into subtrees at all, or recurse indefinitely. While most
callers would want to always recurse and see full pathnames, some may
want the efficiency of looking only at a particular level of the tree.
This is currently easy to do for the top-level (just turn off
recursion), but you cannot say "show me what changed in subdir/, but do
not recurse".

This patch adds a max-depth parameter which is measured from the closest
pathspec match, so that you can do:

  git log --raw --max-depth=1 -- a/b/c

and see the raw output for a/b/c/, but not those of a/b/c/d/
(instead of the raw output you would see for a/b/c/d).

Co-authored-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Toon Claes <toon@iotcl.com>
---
 Documentation/diff-options.adoc |  28 ++++++++++
 diff-lib.c                      |   5 ++
 diff.c                          |  24 +++++++++
 diff.h                          |   8 +++
 t/meson.build                   |   1 +
 t/t4072-diff-max-depth.sh       | 116 ++++++++++++++++++++++++++++++++++++++++
 tree-diff.c                     |  78 +++++++++++++++++++++++++--
 7 files changed, 257 insertions(+), 3 deletions(-)

diff --git a/Documentation/diff-options.adoc b/Documentation/diff-options.adoc
index f3a35d8141..b4cbbd4290 100644
--- a/Documentation/diff-options.adoc
+++ b/Documentation/diff-options.adoc
@@ -893,5 +893,33 @@ endif::git-format-patch[]
 	reverted with `--ita-visible-in-index`. Both options are
 	experimental and could be removed in future.
 
+--max-depth=<depth>::
+	For each pathspec given on command line, descend at most `<depth>`
+	levels of directories. A value of `-1` means no limit.
+	Cannot be combined with wildcards in the pathspec.
+	Given a tree containing `foo/bar/baz`, the following list shows the
+	matches generated by each set of options:
++
+--
+ - `--max-depth=0 -- foo`: `foo`
+
+ - `--max-depth=1 -- foo`: `foo/bar`
+
+ - `--max-depth=1 -- foo/bar`: `foo/bar/baz`
+
+ - `--max-depth=1 -- foo foo/bar`: `foo/bar/baz`
+
+ - `--max-depth=2 -- foo`: `foo/bar/baz`
+--
++
+If no pathspec is given, the depth is measured as if all
+top-level entries were specified. Note that this is different
+than measuring from the root, in that `--max-depth=0` would
+still return `foo`. This allows you to still limit depth while
+asking for a subset of the top-level entries.
++
+Note that this option is only supported for diffs between tree objects,
+not against the index or working tree.
+
 For more detailed explanation on these common options, see also
 linkgit:gitdiffcore[7].
diff --git a/diff-lib.c b/diff-lib.c
index 244468dd1a..b8f8f3bc31 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -115,6 +115,9 @@ void run_diff_files(struct rev_info *revs, unsigned int option)
 	uint64_t start = getnanotime();
 	struct index_state *istate = revs->diffopt.repo->index;
 
+	if (revs->diffopt.max_depth_valid)
+		die(_("max-depth is not supported for worktree diffs"));
+
 	diff_set_mnemonic_prefix(&revs->diffopt, "i/", "w/");
 
 	refresh_fsmonitor(istate);
@@ -560,6 +563,8 @@ static int diff_cache(struct rev_info *revs,
 	opts.dst_index = NULL;
 	opts.pathspec = &revs->diffopt.pathspec;
 	opts.pathspec->recursive = 1;
+	if (revs->diffopt.max_depth_valid)
+		die(_("max-depth is not supported for index diffs"));
 
 	init_tree_desc(&t, &tree->object.oid, tree->buffer, tree->size);
 	return unpack_trees(1, &t, &opts);
diff --git a/diff.c b/diff.c
index dca87e164f..b548e952fc 100644
--- a/diff.c
+++ b/diff.c
@@ -4988,6 +4988,9 @@ void diff_setup_done(struct diff_options *options)
 			options->filter = ~filter_bit[DIFF_STATUS_FILTER_AON];
 		options->filter &= ~options->filter_not;
 	}
+
+	if (options->pathspec.has_wildcard && options->max_depth_valid)
+		die("max-depth cannot be used with wildcard pathspecs");
 }
 
 int parse_long_opt(const char *opt, const char **argv,
@@ -5622,6 +5625,23 @@ static int diff_opt_rotate_to(const struct option *opt, const char *arg, int uns
 	return 0;
 }
 
+static int diff_opt_max_depth(const struct option *opt,
+			      const char *arg, int unset)
+{
+	struct diff_options *options = opt->value;
+
+	BUG_ON_OPT_NEG(unset);
+
+	if (!git_parse_int(arg, &options->max_depth))
+		return error(_("invalid value for '%s': '%s'"),
+			     "--max-depth", arg);
+
+	options->flags.recursive = 1;
+	options->max_depth_valid = options->max_depth >= 0;
+
+	return 0;
+}
+
 /*
  * Consider adding new flags to __git_diff_common_options
  * in contrib/completion/git-completion.bash
@@ -5894,6 +5914,10 @@ struct option *add_diff_options(const struct option *opts,
 		OPT_CALLBACK_F(0, "diff-filter", options, N_("[(A|C|D|M|R|T|U|X|B)...[*]]"),
 			       N_("select files by diff type"),
 			       PARSE_OPT_NONEG, diff_opt_diff_filter),
+		OPT_CALLBACK_F(0, "max-depth", options, N_("<depth>"),
+			       N_("maximum tree depth to recurse"),
+			       PARSE_OPT_NONEG, diff_opt_max_depth),
+
 		{
 			.type = OPTION_CALLBACK,
 			.long_name = "output",
diff --git a/diff.h b/diff.h
index 62e5768a9a..bbced5f745 100644
--- a/diff.h
+++ b/diff.h
@@ -404,6 +404,14 @@ struct diff_options {
 	struct strmap *additional_path_headers;
 
 	int no_free;
+
+	/*
+	 * The value '0' is a valid max-depth (for no recursion), and value '-1'
+	 * also (for unlimited recursion), so the extra "valid" flag is used to
+	 * determined whether the user specified option --max-depth.
+	 */
+	int max_depth;
+	int max_depth_valid;
 };
 
 unsigned diff_filter_bit(char status);
diff --git a/t/meson.build b/t/meson.build
index 03245e2160..adae038f4a 100644
--- a/t/meson.build
+++ b/t/meson.build
@@ -487,6 +487,7 @@ integration_tests = [
   't4069-remerge-diff.sh',
   't4070-diff-pairs.sh',
   't4071-diff-minimal.sh',
+  't4072-diff-max-depth.sh',
   't4100-apply-stat.sh',
   't4101-apply-nonl.sh',
   't4102-apply-rename.sh',
diff --git a/t/t4072-diff-max-depth.sh b/t/t4072-diff-max-depth.sh
new file mode 100755
index 0000000000..0fbf1321f7
--- /dev/null
+++ b/t/t4072-diff-max-depth.sh
@@ -0,0 +1,116 @@
+#!/bin/sh
+
+test_description='check that diff --max-depth will limit recursion'
+. ./test-lib.sh
+
+make_dir() {
+	mkdir -p "$1" &&
+	echo "$2" >"$1/file"
+}
+
+make_files() {
+	echo "$1" >file &&
+	make_dir one "$1" &&
+	make_dir one/two "$1" &&
+	make_dir one/two/three "$1"
+}
+
+test_expect_success 'setup' '
+	git commit --allow-empty -m empty &&
+	git tag empty &&
+	make_files added &&
+	git add . &&
+	git commit -m added &&
+	make_files modified &&
+	git add . &&
+	git commit -m modified &&
+	make_files index &&
+	git add . &&
+	make_files worktree
+'
+
+test_expect_success '--max-depth is disallowed with wildcard pathspecs' '
+	test_must_fail git diff-tree --max-depth=0 HEAD^ HEAD -- "f*"
+'
+
+check_one() {
+	type=$1; shift
+	args=$1; shift
+	path=$1; shift
+	depth=$1; shift
+	test_expect_${expect:-success} "diff-$type $args, path=$path, depth=$depth" "
+		for i in $*; do echo \$i; done >expect &&
+		git diff-$type --max-depth=$depth --name-only $args -- $path >actual &&
+		test_cmp expect actual
+	"
+}
+
+# For tree comparisons, we expect to see subtrees at the boundary
+# get their own entry.
+check_trees() {
+	check_one tree "$*" '' 0 file one
+	check_one tree "$*" '' 1 file one/file one/two
+	check_one tree "$*" '' 2 file one/file one/two/file one/two/three
+	check_one tree "$*" '' 3 file one/file one/two/file one/two/three/file
+	check_one tree "$*" '' -1 file one/file one/two/file one/two/three/file
+	check_one tree "$*" one 0 one
+	check_one tree "$*" one 1 one/file one/two
+	check_one tree "$*" one 2 one/file one/two/file one/two/three
+	check_one tree "$*" one 3 one/file one/two/file one/two/three/file
+	check_one tree "$*" one/two 0 one/two
+	check_one tree "$*" one/two 1 one/two/file one/two/three
+	check_one tree "$*" one/two 2 one/two/file one/two/three/file
+	check_one tree "$*" one/two 2 one/two/file one/two/three/file
+	check_one tree "$*" one/two/three 0 one/two/three
+	check_one tree "$*" one/two/three 1 one/two/three/file
+}
+
+# But for index comparisons, we do not store subtrees at all, so we do not
+# expect them.
+check_index() {
+	check_one "$@" '' 0 file
+	check_one "$@" '' 1 file one/file
+	check_one "$@" '' 2 file one/file one/two/file
+	check_one "$@" '' 3 file one/file one/two/file one/two/three/file
+	check_one "$@" one 0
+	check_one "$@" one 1 one/file
+	check_one "$@" one 2 one/file one/two/file
+	check_one "$@" one 3 one/file one/two/file one/two/three/file
+	check_one "$@" one/two 0
+	check_one "$@" one/two 1 one/two/file
+	check_one "$@" one/two 2 one/two/file one/two/three/file
+	check_one "$@" one/two/three 0
+	check_one "$@" one/two/three 1 one/two/three/file
+
+	# Value '-1' for '--max-depth is the same as recursion without limit,
+	# and thus should always succeed.
+	local expect=
+	check_one "$@" '' -1 file one/file one/two/file one/two/three/file
+}
+
+# Check as a modification...
+check_trees HEAD^ HEAD
+# ...and as an addition...
+check_trees empty HEAD
+# ...and as a deletion.
+check_trees HEAD empty
+
+# We currently only implement max-depth for trees.
+expect=failure
+# Check index against a tree
+check_index index "--cached HEAD"
+# and index against the worktree
+check_index files ""
+expect=
+
+test_expect_success 'find shortest path within embedded pathspecs' '
+	cat >expect <<-\EOF &&
+	one/file
+	one/two/file
+	one/two/three/file
+	EOF
+	git diff-tree --max-depth=2 --name-only HEAD^ HEAD -- one one/two >actual &&
+	test_cmp expect actual
+'
+
+test_done
diff --git a/tree-diff.c b/tree-diff.c
index e00fc2f450..5988148b60 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -13,6 +13,7 @@
 #include "tree-walk.h"
 #include "environment.h"
 #include "repository.h"
+#include "dir.h"
 
 /*
  * Some mode bits are also used internally for computations.
@@ -48,6 +49,73 @@
 		free((x)); \
 } while(0)
 
+/* Returns true if and only if "dir" is a leading directory of "path" */
+static int is_dir_prefix(const char *path, const char *dir, int dirlen)
+{
+	return !strncmp(path, dir, dirlen) &&
+		(!path[dirlen] || path[dirlen] == '/');
+}
+
+static int check_recursion_depth(const struct strbuf *name,
+				 const struct pathspec *ps,
+				 int max_depth)
+{
+	int i;
+
+	if (!ps->nr)
+		return within_depth(name->buf, name->len, 1, max_depth);
+
+	/*
+	 * We look through the pathspecs in reverse-sorted order, because we
+	 * want to find the longest match first (e.g., "a/b" is better for
+	 * checking depth than "a/b/c").
+	 */
+	for (i = ps->nr - 1; i >= 0; i--) {
+		const struct pathspec_item *item = ps->items+i;
+
+		/*
+		 * If the name to match is longer than the pathspec, then we
+		 * are only interested if the pathspec matches and we are
+		 * within the allowed depth.
+		 */
+		if (name->len >= item->len) {
+			if (!is_dir_prefix(name->buf, item->match, item->len))
+				continue;
+			return within_depth(name->buf + item->len,
+					    name->len - item->len,
+					    1, max_depth);
+		}
+
+		/*
+		 * Otherwise, our name is shorter than the pathspec. We need to
+		 * check if it is a prefix of the pathspec; if so, we must
+		 * always recurse in order to process further (the resulting
+		 * paths we find might or might not match our pathspec, but we
+		 * cannot know until we recurse).
+		 */
+		if (is_dir_prefix(item->match, name->buf, name->len))
+			return 1;
+	}
+	return 0;
+}
+
+static int should_recurse(const struct strbuf *name, struct diff_options *opt)
+{
+	if (!opt->flags.recursive)
+		return 0;
+	if (!opt->max_depth_valid)
+		return 1;
+
+	/*
+	 * We catch this during diff_setup_done, but let's double-check
+	 * against any internal munging.
+	 */
+	if (opt->pathspec.has_wildcard)
+		BUG("wildcard pathspecs are incompatible with max-depth");
+
+	return check_recursion_depth(name, &opt->pathspec, opt->max_depth);
+}
+
 static void ll_diff_tree_paths(
 	struct combine_diff_path ***tail, const struct object_id *oid,
 	const struct object_id **parents_oid, int nparent,
@@ -170,9 +238,13 @@ static void emit_path(struct combine_diff_path ***tail,
 		mode = 0;
 	}
 
-	if (opt->flags.recursive && isdir) {
-		recurse = 1;
-		emitthis = opt->flags.tree_in_recursive;
+	if (isdir) {
+		strbuf_add(base, path, pathlen);
+		if (should_recurse(base, opt)) {
+			recurse = 1;
+			emitthis = opt->flags.tree_in_recursive;
+		}
+		strbuf_setlen(base, old_baselen);
 	}
 
 	if (emitthis) {

-- 
2.50.1.327.g047016eb4a


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 0/3] Teach git-diff-tree(1) option --max-depth
  2025-08-07 20:52 ` [PATCH v2 0/3] Teach git-diff-tree(1) option --max-depth Toon Claes
                     ` (2 preceding siblings ...)
  2025-08-07 20:52   ` [PATCH v2 3/3] diff: teach tree-diff a max-depth parameter Toon Claes
@ 2025-08-14 15:15   ` Junio C Hamano
  2025-08-15  5:17     ` Patrick Steinhardt
  3 siblings, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2025-08-14 15:15 UTC (permalink / raw)
  To: git; +Cc: Toon Claes, Jeff King, Justin Tobler, Patrick Steinhardt

Toon Claes <toon@iotcl.com> writes:

> Changes in v2:
> - Added unit-tests for within_depth() in dir.c. Originally the patch
>   was a oneline change by Peff, but I've added a bunch of code and
>   extended the commit message, so I've set myself as the author and set
>   a Based-on-patch-by trailer for Peff. I hope that's okay?
> - Added support for --max-depth=-1 and extended code comments why we
>   need the max_depth_valid flag. With these modification it did no
>   longer feel appropriate to keep Peff's Signed-off-by trailer.
> - Made die() messages translatable.
> - Small tweaks to the docs.
> - Added some const-correctness.
> - Switched from `die("BUG: ...")` to `BUG(...)`.
> - Link to v1: https://lore.kernel.org/r/20250729-toon-max-depth-v1-0-c177e39c40fb@iotcl.com

This round seems to have attracted no comments, after seeing and
reacting to a few comments in the previous rounds.  Should this be
a part of the first batch after 2.51 final gets tagged?

Thanks.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 0/3] Teach git-diff-tree(1) option --max-depth
  2025-08-14 15:15   ` [PATCH v2 0/3] Teach git-diff-tree(1) option --max-depth Junio C Hamano
@ 2025-08-15  5:17     ` Patrick Steinhardt
  0 siblings, 0 replies; 16+ messages in thread
From: Patrick Steinhardt @ 2025-08-15  5:17 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Toon Claes, Jeff King, Justin Tobler

On Thu, Aug 14, 2025 at 08:15:44AM -0700, Junio C Hamano wrote:
> Toon Claes <toon@iotcl.com> writes:
> 
> > Changes in v2:
> > - Added unit-tests for within_depth() in dir.c. Originally the patch
> >   was a oneline change by Peff, but I've added a bunch of code and
> >   extended the commit message, so I've set myself as the author and set
> >   a Based-on-patch-by trailer for Peff. I hope that's okay?
> > - Added support for --max-depth=-1 and extended code comments why we
> >   need the max_depth_valid flag. With these modification it did no
> >   longer feel appropriate to keep Peff's Signed-off-by trailer.
> > - Made die() messages translatable.
> > - Small tweaks to the docs.
> > - Added some const-correctness.
> > - Switched from `die("BUG: ...")` to `BUG(...)`.
> > - Link to v1: https://lore.kernel.org/r/20250729-toon-max-depth-v1-0-c177e39c40fb@iotcl.com
> 
> This round seems to have attracted no comments, after seeing and
> reacting to a few comments in the previous rounds.  Should this be
> a part of the first batch after 2.51 final gets tagged?

My nits all got addressed judging by the range diff, so I'm happy with
this version.

Thanks!

Patrick

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2025-08-15  5:17 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-29 18:57 [PATCH 0/3] Teach git-diff-tree(1) option --max-depth Toon Claes
2025-07-29 18:57 ` [PATCH 1/3] combine-diff: zero memory used for callback filepairs Toon Claes
2025-07-29 18:57 ` [PATCH 2/3] within_depth: fix return for empty path Toon Claes
2025-07-30  6:20   ` Patrick Steinhardt
2025-08-06 14:30     ` Toon Claes
2025-08-07  6:15       ` Patrick Steinhardt
2025-07-29 18:57 ` [PATCH 3/3] diff: teach tree-diff a max-depth parameter Toon Claes
2025-07-30  6:20   ` Patrick Steinhardt
2025-08-06 14:49     ` Toon Claes
2025-08-07  5:55       ` Patrick Steinhardt
2025-08-07 20:52 ` [PATCH v2 0/3] Teach git-diff-tree(1) option --max-depth Toon Claes
2025-08-07 20:52   ` [PATCH v2 1/3] combine-diff: zero memory used for callback filepairs Toon Claes
2025-08-07 20:52   ` [PATCH v2 2/3] within_depth: fix return for empty path Toon Claes
2025-08-07 20:52   ` [PATCH v2 3/3] diff: teach tree-diff a max-depth parameter Toon Claes
2025-08-14 15:15   ` [PATCH v2 0/3] Teach git-diff-tree(1) option --max-depth Junio C Hamano
2025-08-15  5:17     ` Patrick Steinhardt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).