* [PATCH v4 0/3] diff: add pathspec support to --no-index
@ 2025-05-21 23:29 Jacob Keller
2025-05-21 23:29 ` [PATCH v4 1/3] pathspec: add match_leading_pathspec variant Jacob Keller
` (4 more replies)
0 siblings, 5 replies; 22+ messages in thread
From: Jacob Keller @ 2025-05-21 23:29 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Jacob Keller
From: Jacob Keller <jacob.keller@gmail.com>
This series adds support for using pathspecs to limit the comparison when
using git diff --no-index. This is similar to how you can limit what is
included with pathspecs when comparing inside a repository.
This version uses only one set of pathspecs and instead uses some logic to
skip past the root of each directory tree being scanned. This avoids needing
to parse pathspecs multiple times, and is overall a simpler approach.
I also opted to add a match_leading_pathspec() instead of exposing the
match_pathspec_with_flags(), since I didn't how DO_MATCH_EXCLUDES wasn't
exposed. It felt messy.
I tried a couple of different methods for skipping past the leading portion
of a path, including skip_prefix. Ultimately just the index to skip to
seemed like the simplest solution. I like that it means we only need a
single pathspec array now, and that we no longer have to worry about
changing prefix_path_gently.
Changes since v3:
* Drop the patch modifying prefix_path(_gently).
* Instead of exposing the do_match_pathspec flags, create a
match_leading_pathspec() variant that sets both flags when is_dir is true.
* Use some simple logic to skip past the starting portions of each path
before calling match_leading_pathspec
* Re-write the commit message for the final patch
* Add a couple more test cases
* Simplify existing test cases to use --name-status
* Drop remaining TODOs
Jacob Keller (3):
pathspec: add match_leading_pathspec variant
pathspec: add flag to indicate operation without repository
diff --no-index: support limiting by pathspec
pathspec.h | 11 +++++
builtin/diff.c | 2 +-
diff-no-index.c | 89 ++++++++++++++++++++++++++++++-------
dir.c | 19 ++++++--
pathspec.c | 6 ++-
Documentation/git-diff.adoc | 10 +++--
t/t4053-diff-no-index.sh | 75 +++++++++++++++++++++++++++++++
7 files changed, 187 insertions(+), 25 deletions(-)
--
2.48.1.397.gec9d649cc640
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH v4 1/3] pathspec: add match_leading_pathspec variant
2025-05-21 23:29 [PATCH v4 0/3] diff: add pathspec support to --no-index Jacob Keller
@ 2025-05-21 23:29 ` Jacob Keller
2025-05-21 23:29 ` [PATCH v4 2/3] pathspec: add flag to indicate operation without repository Jacob Keller
` (3 subsequent siblings)
4 siblings, 0 replies; 22+ messages in thread
From: Jacob Keller @ 2025-05-21 23:29 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Jacob Keller
From: Jacob Keller <jacob.keller@gmail.com>
The do_match_pathspec() function has the DO_MATCH_LEADING_PATHSPEC
option to allow pathspecs to match when matching "src" against a
pathspec like "src/path/...". This support is not exposed by
match_pathspec, and the internal flags to do_match_pathspec are not
exposed outside of dir.c
The upcoming support for pathspecs in git diff --no-index need the
LEADING matching behavior when iterating down through a directory with
readdir.
We could try to expose the match_pathspec_with_flags to the public API.
However, DO_MATCH_EXCLUDES really shouldn't be public, and its a bit
weird to only have a few of the flags become public.
Instead, add match_leading_pathspec() as a function which sets both
DO_MATCH_DIRECTORY and DO_MATCH_LEADING_PATHSPEC when is_dir is true.
This will be used in a following change to support pathspec matching in
git diff --no-index.
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
---
pathspec.h | 6 ++++++
dir.c | 10 ++++++++++
2 files changed, 16 insertions(+)
diff --git a/pathspec.h b/pathspec.h
index de537cff3cb6..cda3eb5b91f7 100644
--- a/pathspec.h
+++ b/pathspec.h
@@ -184,6 +184,12 @@ int match_pathspec(struct index_state *istate,
const char *name, int namelen,
int prefix, char *seen, int is_dir);
+/* Set both DO_MATCH_DIRECTORY and DO_MATCH_LEADING_PATHSPEC if is_dir true */
+int match_leading_pathspec(struct index_state *istate,
+ const struct pathspec *ps,
+ const char *name, int namelen,
+ int prefix, char *seen, int is_dir);
+
/*
* Determine whether a pathspec will match only entire index entries (non-sparse
* files and/or entire sparse directories). If the pathspec has the potential to
diff --git a/dir.c b/dir.c
index a374972b6243..86eb77b82a79 100644
--- a/dir.c
+++ b/dir.c
@@ -577,6 +577,16 @@ int match_pathspec(struct index_state *istate,
prefix, seen, flags);
}
+int match_leading_pathspec(struct index_state *istate,
+ const struct pathspec *ps,
+ const char *name, int namelen,
+ int prefix, char *seen, int is_dir)
+{
+ unsigned flags = is_dir ? DO_MATCH_DIRECTORY | DO_MATCH_LEADING_PATHSPEC : 0;
+ return match_pathspec_with_flags(istate, ps, name, namelen,
+ prefix, seen, flags);
+}
+
/**
* Check if a submodule is a superset of the pathspec
*/
--
2.48.1.397.gec9d649cc640
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH v4 2/3] pathspec: add flag to indicate operation without repository
2025-05-21 23:29 [PATCH v4 0/3] diff: add pathspec support to --no-index Jacob Keller
2025-05-21 23:29 ` [PATCH v4 1/3] pathspec: add match_leading_pathspec variant Jacob Keller
@ 2025-05-21 23:29 ` Jacob Keller
2025-05-21 23:29 ` [PATCH v4 3/3] diff --no-index: support limiting by pathspec Jacob Keller
` (2 subsequent siblings)
4 siblings, 0 replies; 22+ messages in thread
From: Jacob Keller @ 2025-05-21 23:29 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Jacob Keller
From: Jacob Keller <jacob.keller@gmail.com>
A following change will add support for pathspecs to the git diff
--no-index command. This mode of git diff does not load any repository.
Add a new PATHSPEC_NO_REPOSITORY flag indicating that we're parsing
pathspecs without a repository.
Both PATHSPEC_ATTR and PATHSPEC_FROMTOP require a repository to
function. Thus, verify that both of these are set in magic_mask to
ensure they won't be accepted when PATHSPEC_NO_REPOSITORY is set.
Check PATHSPEC_NO_REPOSITORY when warning about paths outside the
directory tree. When the flag is set, do not look for a git repository
when generating the warning message.
Finally, add a BUG in match_pathspec_item if the istate is NULL but the
pathspec has PATHSPEC_ATTR set. Callers which support PATHSPEC_ATTR
should always pass a valid istate, and callers which don't pass a valid
istate should have set PATHSPEC_ATTR in the magic_mask field to disable
support for attribute-based pathspecs.
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
---
pathspec.h | 5 +++++
dir.c | 9 ++++++---
pathspec.c | 6 +++++-
3 files changed, 16 insertions(+), 4 deletions(-)
diff --git a/pathspec.h b/pathspec.h
index cda3eb5b91f7..5e3a6f1fe7b7 100644
--- a/pathspec.h
+++ b/pathspec.h
@@ -76,6 +76,11 @@ struct pathspec {
* allowed, then it will automatically set for every pathspec.
*/
#define PATHSPEC_LITERAL_PATH (1<<6)
+/*
+ * For git diff --no-index, indicate that we are operating without
+ * a repository or index.
+ */
+#define PATHSPEC_NO_REPOSITORY (1<<7)
/**
* Given command line arguments and a prefix, convert the input to
diff --git a/dir.c b/dir.c
index 86eb77b82a79..c8c869faed2b 100644
--- a/dir.c
+++ b/dir.c
@@ -397,9 +397,12 @@ static int match_pathspec_item(struct index_state *istate,
strncmp(item->match, name - prefix, item->prefix))
return 0;
- if (item->attr_match_nr &&
- !match_pathspec_attrs(istate, name - prefix, namelen + prefix, item))
- return 0;
+ if (item->attr_match_nr) {
+ if (!istate)
+ BUG("magic PATHSPEC_ATTR requires an index");
+ if (!match_pathspec_attrs(istate, name - prefix, namelen + prefix, item))
+ return 0;
+ }
/* If the match was just the prefix, we matched */
if (!*match)
diff --git a/pathspec.c b/pathspec.c
index 2b4e434bc0aa..a3ddd701c740 100644
--- a/pathspec.c
+++ b/pathspec.c
@@ -492,7 +492,7 @@ static void init_pathspec_item(struct pathspec_item *item, unsigned flags,
if (!match) {
const char *hint_path;
- if (!have_git_dir())
+ if ((flags & PATHSPEC_NO_REPOSITORY) || !have_git_dir())
die(_("'%s' is outside the directory tree"),
copyfrom);
hint_path = repo_get_work_tree(the_repository);
@@ -614,6 +614,10 @@ void parse_pathspec(struct pathspec *pathspec,
(flags & PATHSPEC_PREFER_FULL))
BUG("PATHSPEC_PREFER_CWD and PATHSPEC_PREFER_FULL are incompatible");
+ if ((flags & PATHSPEC_NO_REPOSITORY) &&
+ (~magic_mask & (PATHSPEC_ATTR | PATHSPEC_FROMTOP)))
+ BUG("PATHSPEC_NO_REPOSITORY is incompatible with PATHSPEC_ATTR and PATHSPEC_FROMTOP");
+
/* No arguments with prefix -> prefix pathspec */
if (!entry) {
if (flags & PATHSPEC_PREFER_FULL)
--
2.48.1.397.gec9d649cc640
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH v4 3/3] diff --no-index: support limiting by pathspec
2025-05-21 23:29 [PATCH v4 0/3] diff: add pathspec support to --no-index Jacob Keller
2025-05-21 23:29 ` [PATCH v4 1/3] pathspec: add match_leading_pathspec variant Jacob Keller
2025-05-21 23:29 ` [PATCH v4 2/3] pathspec: add flag to indicate operation without repository Jacob Keller
@ 2025-05-21 23:29 ` Jacob Keller
2025-06-04 2:37 ` Ben Knoble
2025-09-23 14:57 ` Johannes Schindelin
2025-05-22 21:37 ` [PATCH v4 0/3] diff: add pathspec support to --no-index Junio C Hamano
2025-06-03 21:12 ` Junio C Hamano
4 siblings, 2 replies; 22+ messages in thread
From: Jacob Keller @ 2025-05-21 23:29 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Jacob Keller
From: Jacob Keller <jacob.keller@gmail.com>
The --no-index option of git-diff enables using the diff machinery from
git while operating outside of a repository. This mode of git diff is
able to compare directories and produce a diff of their contents.
When operating git diff in a repository, git has the notion of
"pathspecs" which can specify which files to compare. In particular,
when using git to diff two trees, you might invoke:
$ git diff-tree -r <treeish1> <treeish2>.
where the treeish could point to a subdirectory of the repository.
When invoked this way, users can limit the selected paths of the tree by
using a pathspec. Either by providing some list of paths to accept, or
by removing paths via a negative refspec.
The git diff --no-index mode does not support pathspecs, and cannot
limit the diff output in this way. Other diff programs such as GNU
difftools have options for excluding paths based on a pattern match.
However, using git diff as a diff replacement has several advantages
over many popular diff tools, including coloring moved lines, rename
detections, and similar.
Teach git diff --no-index how to handle pathspecs to limit the
comparisons. This will only be supported if both provided paths are
directories.
For comparisons where one path isn't a directory, the --no-index mode
already has some DWIM shortcuts implemented in the fixup_paths()
function.
Modify the fixup_paths function to return 1 if both paths are
directories. If this is the case, interpret any extra arguments to git
diff as pathspecs via parse_pathspec.
Use parse_pathspec to load the remaining arguments (if any) to git diff
--no-index as pathspec items. Disable PATHSPEC_ATTR support since we do
not have a repository to do attribute lookup. Disable PATHSPEC_FROMTOP
since we do not have a repository root. All pathspecs are treated as
rooted at the provided comparison paths.
After loading the pathspec data, calculate skip offsets for skipping
past the root portion of the paths. This is required to ensure that
pathspecs start matching from the provided path, rather than matching
from the absolute path. We could instead pass the paths as prefix values
to parse_pathspec. This is slightly problematic because the paths come
from the command line and don't necessarily have the proper trailing
slash. Additionally, that would require parsing pathspecs multiple
times.
Pass the pathspec object and the skip offsets into queue_diff, which
in-turn must pass them along to read_directory_contents.
Modify read_directory_contents to check against the pathspecs when
scanning the directory. Use the skip offset to skip past the initial
root of the path, and only match against portions that are below the
intended directory structure being compared.
The search algorithm for finding paths is recursive with read_dir. To
make pathspec matching work properly, we must set both
DO_MATCH_DIRECTORY and DO_MATCH_LEADING_PATHSPEC.
Without DO_MATCH_DIRECTORY, paths like "a/b/c/d" will not match against
pathspecs like "a/b/c". This is usually achieved by setting the is_dir
parameter of match_pathspec.
Without DO_MATCH_LEADING_PATHSPEC, paths like "a/b/c" would not match
against pathspecs like "a/b/c/d". This is crucial because we recursively
iterate down the directories. We could simply avoid checking pathspecs
at subdirectories, but this would force recursion down directories
which would simply be skipped.
If we always passed DO_MATCH_LEADING_PATHSPEC, then we will
incorrectly match in certain cases such as matching 'a/c' against
':(glob)**/d'. The match logic will see that a matches the leading part
of the **/ and accept this even tho c doesn't match.
To avoid this, use the match_leading_pathspec() variant recently
introduced. This sets both flags when is_dir is set, but leaves them
both cleared when is_dir is 0.
Add test cases and documentation covering the new functionality. Note
for the documentation I opted not to move the placement of '--' which is
sometimes used to disambiguate arguments. The diff --no-index mode
requires exactly 2 arguments determining what to compare. Any additional
arguments are interpreted as pathspecs and must come afterwards. Use of
'--' would not actually disambiguate anything, since there will never be
ambiguity over which arguments represent paths or pathspecs.
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
---
builtin/diff.c | 2 +-
diff-no-index.c | 85 +++++++++++++++++++++++++++++--------
Documentation/git-diff.adoc | 10 +++--
t/t4053-diff-no-index.sh | 75 ++++++++++++++++++++++++++++++++
4 files changed, 151 insertions(+), 21 deletions(-)
diff --git a/builtin/diff.c b/builtin/diff.c
index fa963808c318..c6231edce4e8 100644
--- a/builtin/diff.c
+++ b/builtin/diff.c
@@ -35,7 +35,7 @@ static const char builtin_diff_usage[] =
" or: git diff [<options>] [--merge-base] <commit> [<commit>...] <commit> [--] [<path>...]\n"
" or: git diff [<options>] <commit>...<commit> [--] [<path>...]\n"
" or: git diff [<options>] <blob> <blob>\n"
-" or: git diff [<options>] --no-index [--] <path> <path>"
+" or: git diff [<options>] --no-index [--] <path> <path> [<pathspec>...]"
"\n"
COMMON_DIFF_OPTIONS_HELP;
diff --git a/diff-no-index.c b/diff-no-index.c
index 9739b2b268b9..4aeeb98cfa8f 100644
--- a/diff-no-index.c
+++ b/diff-no-index.c
@@ -15,20 +15,45 @@
#include "gettext.h"
#include "revision.h"
#include "parse-options.h"
+#include "pathspec.h"
#include "string-list.h"
#include "dir.h"
-static int read_directory_contents(const char *path, struct string_list *list)
+static int read_directory_contents(const char *path, struct string_list *list,
+ const struct pathspec *pathspec,
+ int skip)
{
+ struct strbuf match = STRBUF_INIT;
+ int len;
DIR *dir;
struct dirent *e;
if (!(dir = opendir(path)))
return error("Could not open directory %s", path);
- while ((e = readdir_skip_dot_and_dotdot(dir)))
- string_list_insert(list, e->d_name);
+ if (pathspec) {
+ strbuf_addstr(&match, path);
+ strbuf_complete(&match, '/');
+ strbuf_remove(&match, 0, skip);
+ len = match.len;
+ }
+
+ while ((e = readdir_skip_dot_and_dotdot(dir))) {
+ if (pathspec) {
+ strbuf_setlen(&match, len);
+ strbuf_addstr(&match, e->d_name);
+
+ if (!match_leading_pathspec(NULL, pathspec,
+ match.buf, match.len,
+ 0, NULL, e->d_type == DT_DIR ? 1 : 0))
+ continue;
+ }
+
+ string_list_insert(list, e->d_name);
+ }
+
+ strbuf_release(&match);
closedir(dir);
return 0;
}
@@ -131,7 +156,8 @@ static struct diff_filespec *noindex_filespec(const struct git_hash_algo *algop,
}
static int queue_diff(struct diff_options *o, const struct git_hash_algo *algop,
- const char *name1, const char *name2, int recursing)
+ const char *name1, const char *name2, int recursing,
+ const struct pathspec *ps, int skip1, int skip2)
{
int mode1 = 0, mode2 = 0;
enum special special1 = SPECIAL_NONE, special2 = SPECIAL_NONE;
@@ -171,9 +197,9 @@ static int queue_diff(struct diff_options *o, const struct git_hash_algo *algop,
int i1, i2, ret = 0;
size_t len1 = 0, len2 = 0;
- if (name1 && read_directory_contents(name1, &p1))
+ if (name1 && read_directory_contents(name1, &p1, ps, skip1))
return -1;
- if (name2 && read_directory_contents(name2, &p2)) {
+ if (name2 && read_directory_contents(name2, &p2, ps, skip2)) {
string_list_clear(&p1, 0);
return -1;
}
@@ -218,7 +244,7 @@ static int queue_diff(struct diff_options *o, const struct git_hash_algo *algop,
n2 = buffer2.buf;
}
- ret = queue_diff(o, algop, n1, n2, 1);
+ ret = queue_diff(o, algop, n1, n2, 1, ps, skip1, skip2);
}
string_list_clear(&p1, 0);
string_list_clear(&p2, 0);
@@ -258,8 +284,10 @@ static void append_basename(struct strbuf *path, const char *dir, const char *fi
* DWIM "diff D F" into "diff D/F F" and "diff F D" into "diff F D/F"
* Note that we append the basename of F to D/, so "diff a/b/file D"
* becomes "diff a/b/file D/file", not "diff a/b/file D/a/b/file".
+ *
+ * Return 1 if both paths are directories, 0 otherwise.
*/
-static void fixup_paths(const char **path, struct strbuf *replacement)
+static int fixup_paths(const char **path, struct strbuf *replacement)
{
struct stat st;
unsigned int isdir0 = 0, isdir1 = 0;
@@ -282,26 +310,31 @@ static void fixup_paths(const char **path, struct strbuf *replacement)
if ((isdir0 && ispipe1) || (ispipe0 && isdir1))
die(_("cannot compare a named pipe to a directory"));
- if (isdir0 == isdir1)
- return;
+ /* if both paths are directories, we will enable pathspecs */
+ if (isdir0 && isdir1)
+ return 1;
+
if (isdir0) {
append_basename(replacement, path[0], path[1]);
path[0] = replacement->buf;
- } else {
+ } else if (isdir1) {
append_basename(replacement, path[1], path[0]);
path[1] = replacement->buf;
}
+
+ return 0;
}
static const char * const diff_no_index_usage[] = {
- N_("git diff --no-index [<options>] <path> <path>"),
+ N_("git diff --no-index [<options>] <path> <path> [<pathspec>...]"),
NULL
};
int diff_no_index(struct rev_info *revs, const struct git_hash_algo *algop,
int implicit_no_index, int argc, const char **argv)
{
- int i, no_index;
+ struct pathspec pathspec, *ps = NULL;
+ int i, no_index, skip1 = 0, skip2 = 0;
int ret = 1;
const char *paths[2];
char *to_free[ARRAY_SIZE(paths)] = { 0 };
@@ -317,13 +350,12 @@ int diff_no_index(struct rev_info *revs, const struct git_hash_algo *algop,
options = add_diff_options(no_index_options, &revs->diffopt);
argc = parse_options(argc, argv, revs->prefix, options,
diff_no_index_usage, 0);
- if (argc != 2) {
+ if (argc < 2) {
if (implicit_no_index)
warning(_("Not a git repository. Use --no-index to "
"compare two paths outside a working tree"));
usage_with_options(diff_no_index_usage, options);
}
- FREE_AND_NULL(options);
for (i = 0; i < 2; i++) {
const char *p = argv[i];
if (!strcmp(p, "-"))
@@ -337,7 +369,23 @@ int diff_no_index(struct rev_info *revs, const struct git_hash_algo *algop,
paths[i] = p;
}
- fixup_paths(paths, &replacement);
+ if (fixup_paths(paths, &replacement)) {
+ parse_pathspec(&pathspec, PATHSPEC_FROMTOP | PATHSPEC_ATTR,
+ PATHSPEC_PREFER_FULL | PATHSPEC_NO_REPOSITORY,
+ NULL, &argv[2]);
+ if (pathspec.nr)
+ ps = &pathspec;
+
+ skip1 = strlen(paths[0]);
+ skip1 += paths[0][skip1] == '/' ? 0 : 1;
+ skip2 = strlen(paths[1]);
+ skip2 += paths[1][skip2] == '/' ? 0 : 1;
+ } else if (argc > 2) {
+ warning(_("Limiting comparison with pathspecs is only "
+ "supported if both paths are directories."));
+ usage_with_options(diff_no_index_usage, options);
+ }
+ FREE_AND_NULL(options);
revs->diffopt.skip_stat_unmatch = 1;
if (!revs->diffopt.output_format)
@@ -354,7 +402,8 @@ int diff_no_index(struct rev_info *revs, const struct git_hash_algo *algop,
setup_diff_pager(&revs->diffopt);
revs->diffopt.flags.exit_with_status = 1;
- if (queue_diff(&revs->diffopt, algop, paths[0], paths[1], 0))
+ if (queue_diff(&revs->diffopt, algop, paths[0], paths[1], 0, ps,
+ skip1, skip2))
goto out;
diff_set_mnemonic_prefix(&revs->diffopt, "1/", "2/");
diffcore_std(&revs->diffopt);
@@ -370,5 +419,7 @@ int diff_no_index(struct rev_info *revs, const struct git_hash_algo *algop,
for (i = 0; i < ARRAY_SIZE(to_free); i++)
free(to_free[i]);
strbuf_release(&replacement);
+ if (ps)
+ clear_pathspec(ps);
return ret;
}
diff --git a/Documentation/git-diff.adoc b/Documentation/git-diff.adoc
index dec173a34517..272331afbaec 100644
--- a/Documentation/git-diff.adoc
+++ b/Documentation/git-diff.adoc
@@ -14,7 +14,7 @@ git diff [<options>] --cached [--merge-base] [<commit>] [--] [<path>...]
git diff [<options>] [--merge-base] <commit> [<commit>...] <commit> [--] [<path>...]
git diff [<options>] <commit>...<commit> [--] [<path>...]
git diff [<options>] <blob> <blob>
-git diff [<options>] --no-index [--] <path> <path>
+git diff [<options>] --no-index [--] <path> <path> [<pathspec>...]
DESCRIPTION
-----------
@@ -31,14 +31,18 @@ files on disk.
further add to the index but you still haven't. You can
stage these changes by using linkgit:git-add[1].
-`git diff [<options>] --no-index [--] <path> <path>`::
+`git diff [<options>] --no-index [--] <path> <path> [<pathspec>...]`::
This form is to compare the given two paths on the
filesystem. You can omit the `--no-index` option when
running the command in a working tree controlled by Git and
at least one of the paths points outside the working tree,
or when running the command outside a working tree
- controlled by Git. This form implies `--exit-code`.
+ controlled by Git. This form implies `--exit-code`. If both
+ paths point to directories, additional pathspecs may be
+ provided. These will limit the files included in the
+ difference. All such pathspecs must be relative as they
+ apply to both sides of the diff.
`git diff [<options>] --cached [--merge-base] [<commit>] [--] [<path>...]`::
diff --git a/t/t4053-diff-no-index.sh b/t/t4053-diff-no-index.sh
index 5e5bad61ca1e..01db9243abfe 100755
--- a/t/t4053-diff-no-index.sh
+++ b/t/t4053-diff-no-index.sh
@@ -295,4 +295,79 @@ test_expect_success PIPE,SYMLINKS 'diff --no-index reads from pipes' '
test_cmp expect actual
'
+test_expect_success 'diff --no-index F F rejects pathspecs' '
+ test_must_fail git diff --no-index -- a/1 a/2 a 2>actual.err &&
+ test_grep "usage: git diff --no-index" actual.err
+'
+
+test_expect_success 'diff --no-index D F rejects pathspecs' '
+ test_must_fail git diff --no-index -- a a/2 a 2>actual.err &&
+ test_grep "usage: git diff --no-index" actual.err
+'
+
+test_expect_success 'diff --no-index F D rejects pathspecs' '
+ test_must_fail git diff --no-index -- a/1 b b 2>actual.err &&
+ test_grep "usage: git diff --no-index" actual.err
+'
+
+test_expect_success 'diff --no-index rejects absolute pathspec' '
+ test_must_fail git diff --no-index -- a b $(pwd)/a/1
+'
+
+test_expect_success 'diff --no-index with pathspec' '
+ test_expect_code 1 git diff --name-status --no-index a b 1 >actual &&
+ cat >expect <<-EOF &&
+ D a/1
+ EOF
+ test_cmp expect actual
+'
+
+test_expect_success 'diff --no-index with pathspec no matches' '
+ test_expect_code 0 git diff --name-status --no-index a b missing
+'
+
+test_expect_success 'diff --no-index with negative pathspec' '
+ test_expect_code 1 git diff --name-status --no-index a b ":!2" >actual &&
+ cat >expect <<-EOF &&
+ D a/1
+ EOF
+ test_cmp expect actual
+'
+
+test_expect_success 'setup nested' '
+ mkdir -p c/1/2 &&
+ mkdir -p d/1/2 &&
+ echo 1 >c/1/2/a &&
+ echo 2 >c/1/2/b
+'
+
+test_expect_success 'diff --no-index with pathspec nested negative pathspec' '
+ test_expect_code 0 git diff --no-index c d ":!1"
+'
+
+test_expect_success 'diff --no-index with pathspec nested pathspec' '
+ test_expect_code 1 git diff --name-status --no-index c d 1/2 >actual &&
+ cat >expect <<-EOF &&
+ D c/1/2/a
+ D c/1/2/b
+ EOF
+ test_cmp expect actual
+'
+
+test_expect_success 'diff --no-index with pathspec glob' '
+ test_expect_code 1 git diff --name-status --no-index c d ":(glob)**/a" >actual &&
+ cat >expect <<-EOF &&
+ D c/1/2/a
+ EOF
+ test_cmp expect actual
+'
+
+test_expect_success 'diff --no-index with pathspec glob and exclude' '
+ test_expect_code 1 git diff --name-status --no-index c d ":(glob,exclude)**/a" >actual &&
+ cat >expect <<-EOF &&
+ D c/1/2/b
+ EOF
+ test_cmp expect actual
+'
+
test_done
--
2.48.1.397.gec9d649cc640
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH v4 0/3] diff: add pathspec support to --no-index
2025-05-21 23:29 [PATCH v4 0/3] diff: add pathspec support to --no-index Jacob Keller
` (2 preceding siblings ...)
2025-05-21 23:29 ` [PATCH v4 3/3] diff --no-index: support limiting by pathspec Jacob Keller
@ 2025-05-22 21:37 ` Junio C Hamano
2025-05-22 21:50 ` Jacob Keller
2025-06-03 21:12 ` Junio C Hamano
4 siblings, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2025-05-22 21:37 UTC (permalink / raw)
To: Jacob Keller; +Cc: git, Jacob Keller
Jacob Keller <jacob.e.keller@intel.com> writes:
> From: Jacob Keller <jacob.keller@gmail.com>
>
> This series adds support for using pathspecs to limit the comparison when
> using git diff --no-index. This is similar to how you can limit what is
> included with pathspecs when comparing inside a repository.
>
> This version uses only one set of pathspecs and instead uses some logic to
> skip past the root of each directory tree being scanned. This avoids needing
> to parse pathspecs multiple times, and is overall a simpler approach.
> ...
> I tried a couple of different methods for skipping past the leading portion
> of a path, including skip_prefix. Ultimately just the index to skip to
> seemed like the simplest solution. I like that it means we only need a
> single pathspec array now, and that we no longer have to worry about
> changing prefix_path_gently.
Nice. I kept the previous iteration out of 'seen' primarily because
it seemed to break the tests (even though it passed standalone).
Let me see how well we do with this iteration.
Will queue. Thanks.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4 0/3] diff: add pathspec support to --no-index
2025-05-22 21:37 ` [PATCH v4 0/3] diff: add pathspec support to --no-index Junio C Hamano
@ 2025-05-22 21:50 ` Jacob Keller
2025-05-22 22:04 ` Junio C Hamano
0 siblings, 1 reply; 22+ messages in thread
From: Jacob Keller @ 2025-05-22 21:50 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, Jacob Keller
On 5/22/2025 2:37 PM, Junio C Hamano wrote:
> Jacob Keller <jacob.e.keller@intel.com> writes:
>
>> From: Jacob Keller <jacob.keller@gmail.com>
>>
>> This series adds support for using pathspecs to limit the comparison when
>> using git diff --no-index. This is similar to how you can limit what is
>> included with pathspecs when comparing inside a repository.
>>
>> This version uses only one set of pathspecs and instead uses some logic to
>> skip past the root of each directory tree being scanned. This avoids needing
>> to parse pathspecs multiple times, and is overall a simpler approach.
>> ...
>> I tried a couple of different methods for skipping past the leading portion
>> of a path, including skip_prefix. Ultimately just the index to skip to
>> seemed like the simplest solution. I like that it means we only need a
>> single pathspec array now, and that we no longer have to worry about
>> changing prefix_path_gently.
>
>
> Nice. I kept the previous iteration out of 'seen' primarily because
> it seemed to break the tests (even though it passed standalone).
> Let me see how well we do with this iteration.
>
> Will queue. Thanks.
The tests all passed for me on their own, but maybe something is flaky?
My guess would be that the changes to prefix_path_gently which are now
dropped would be the most likely culprit of some sort of intermittent
failure since the rest of the changes are fairly well separated to just
the diff-no-index.c code.
Thanks,
Jake
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4 0/3] diff: add pathspec support to --no-index
2025-05-22 21:50 ` Jacob Keller
@ 2025-05-22 22:04 ` Junio C Hamano
0 siblings, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2025-05-22 22:04 UTC (permalink / raw)
To: Jacob Keller; +Cc: git, Jacob Keller
Jacob Keller <jacob.e.keller@intel.com> writes:
>> Nice. I kept the previous iteration out of 'seen' primarily because
>> it seemed to break the tests (even though it passed standalone).
>> Let me see how well we do with this iteration.
>>
>> Will queue. Thanks.
>
> The tests all passed for me on their own, but maybe something is flaky?
Do not recall the details, but it is possible there were some
interactions with topics in flight. I am in the middle of day's
second integration cycles, so we'll see how it goes soon.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4 0/3] diff: add pathspec support to --no-index
2025-05-21 23:29 [PATCH v4 0/3] diff: add pathspec support to --no-index Jacob Keller
` (3 preceding siblings ...)
2025-05-22 21:37 ` [PATCH v4 0/3] diff: add pathspec support to --no-index Junio C Hamano
@ 2025-06-03 21:12 ` Junio C Hamano
2025-06-04 2:32 ` Ben Knoble
2025-06-05 15:34 ` Phillip Wood
4 siblings, 2 replies; 22+ messages in thread
From: Junio C Hamano @ 2025-06-03 21:12 UTC (permalink / raw)
To: Jacob Keller; +Cc: git, Jacob Keller
Jacob Keller <jacob.e.keller@intel.com> writes:
> From: Jacob Keller <jacob.keller@gmail.com>
>
> This series adds support for using pathspecs to limit the comparison when
> using git diff --no-index. This is similar to how you can limit what is
> included with pathspecs when comparing inside a repository.
>
> This version uses only one set of pathspecs and instead uses some logic to
> skip past the root of each directory tree being scanned. This avoids needing
> to parse pathspecs multiple times, and is overall a simpler approach.
>
> I also opted to add a match_leading_pathspec() instead of exposing the
> match_pathspec_with_flags(), since I didn't how DO_MATCH_EXCLUDES wasn't
> exposed. It felt messy.
>
> I tried a couple of different methods for skipping past the leading portion
> of a path, including skip_prefix. Ultimately just the index to skip to
> seemed like the simplest solution. I like that it means we only need a
> single pathspec array now, and that we no longer have to worry about
> changing prefix_path_gently.
>
> Changes since v3:
> * Drop the patch modifying prefix_path(_gently).
> * Instead of exposing the do_match_pathspec flags, create a
> match_leading_pathspec() variant that sets both flags when is_dir is true.
> * Use some simple logic to skip past the starting portions of each path
> before calling match_leading_pathspec
> * Re-write the commit message for the final patch
> * Add a couple more test cases
> * Simplify existing test cases to use --name-status
> * Drop remaining TODOs
Anybody, other than Jacob and I, interested in this series? We
haven't seen any support or review and I am considering merging it
down for the next cycle sometime in coming weeks.
Thanks.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4 0/3] diff: add pathspec support to --no-index
2025-06-03 21:12 ` Junio C Hamano
@ 2025-06-04 2:32 ` Ben Knoble
2025-06-05 15:34 ` Phillip Wood
1 sibling, 0 replies; 22+ messages in thread
From: Ben Knoble @ 2025-06-04 2:32 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Jacob Keller, git, Jacob Keller
> Le 3 juin 2025 à 17:12, Junio C Hamano <gitster@pobox.com> a écrit :
>
> Jacob Keller <jacob.e.keller@intel.com> writes:
>
>> From: Jacob Keller <jacob.keller@gmail.com>
>> This series adds support for using pathspecs to limit the comparison when
>> using git diff --no-index. This is similar to how you can limit what is
>> included with pathspecs when comparing inside a repository.
>> This version uses only one set of pathspecs and instead uses some logic to
>> skip past the root of each directory tree being scanned. This avoids needing
>> to parse pathspecs multiple times, and is overall a simpler approach.
>> I also opted to add a match_leading_pathspec() instead of exposing the
>> match_pathspec_with_flags(), since I didn't how DO_MATCH_EXCLUDES wasn't
>> exposed. It felt messy.
>> I tried a couple of different methods for skipping past the leading portion
>> of a path, including skip_prefix. Ultimately just the index to skip to
>> seemed like the simplest solution. I like that it means we only need a
>> single pathspec array now, and that we no longer have to worry about
>> changing prefix_path_gently.
>> Changes since v3:
>> * Drop the patch modifying prefix_path(_gently).
>> * Instead of exposing the do_match_pathspec flags, create a
>> match_leading_pathspec() variant that sets both flags when is_dir is true.
>> * Use some simple logic to skip past the starting portions of each path
>> before calling match_leading_pathspec
>> * Re-write the commit message for the final patch
>> * Add a couple more test cases
>> * Simplify existing test cases to use --name-status
>> * Drop remaining TODOs
>
> Anybody, other than Jacob and I, interested in this series? We
> haven't seen any support or review and I am considering merging it
> down for the next cycle sometime in coming weeks.
>
> Thanks.
I have no code review, and the docs look good. I probably would lean more towards other tools for diffing (non-Git) directories, but I don’t know which offhand, so I don’t mind extending Git’s existing mechanism this way. It certainly allows some interesting commands that I can see myself taking advantage of, and it feels overall like a natural extension of the interface.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4 3/3] diff --no-index: support limiting by pathspec
2025-05-21 23:29 ` [PATCH v4 3/3] diff --no-index: support limiting by pathspec Jacob Keller
@ 2025-06-04 2:37 ` Ben Knoble
2025-06-04 17:22 ` Jacob Keller
2025-06-04 18:27 ` Jacob Keller
2025-09-23 14:57 ` Johannes Schindelin
1 sibling, 2 replies; 22+ messages in thread
From: Ben Knoble @ 2025-06-04 2:37 UTC (permalink / raw)
To: Jacob Keller; +Cc: git, Junio C Hamano, Jacob Keller
Actually, one comment :)
> Le 21 mai 2025 à 19:29, Jacob Keller <jacob.e.keller@intel.com> a écrit :
>
> From: Jacob Keller <jacob.keller@gmail.com>
>
> The --no-index option of git-diff enables using the diff machinery from
> git while operating outside of a repository. This mode of git diff is
> able to compare directories and produce a diff of their contents.
>
> When operating git diff in a repository, git has the notion of
> "pathspecs" which can specify which files to compare. In particular,
> when using git to diff two trees, you might invoke:
>
> $ git diff-tree -r <treeish1> <treeish2>.
I do find it slightly confusing that this series and in particular this patch is all about git-diff(1), but the only example is about git-diff-tree(1). It’s not the best example to me, esp. since it doesn’t actually use the pathspec machinery (deferring that to prose only). But I get the gist, so not really an issue.
Rereading a bit, it seems this message goes to lengths to teach readers about pathspecs for git-diff here; perhaps we can simplify those parts and assume the reader is familiar enough with the details to understand the implications of « no-index mode doesn’t support pathspecs to limit comparison »?
Nit: Should the diff-tree command end with a period?
> where the treeish could point to a subdirectory of the repository.
>
> When invoked this way, users can limit the selected paths of the tree by
> using a pathspec. Either by providing some list of paths to accept, or
> by removing paths via a negative refspec.
>
> The git diff --no-index mode does not support pathspecs, and cannot
> limit the diff output in this way. Other diff programs such as GNU
> difftools have options for excluding paths based on a pattern match.
> However, using git diff as a diff replacement has several advantages
> over many popular diff tools, including coloring moved lines, rename
> detections, and similar.
>
> Teach git diff --no-index how to handle pathspecs to limit the
> comparisons. This will only be supported if both provided paths are
> directories.
>
> For comparisons where one path isn't a directory, the --no-index mode
> already has some DWIM shortcuts implemented in the fixup_paths()
> function.
>
> Modify the fixup_paths function to return 1 if both paths are
> directories. If this is the case, interpret any extra arguments to git
> diff as pathspecs via parse_pathspec.
>
> Use parse_pathspec to load the remaining arguments (if any) to git diff
> --no-index as pathspec items. Disable PATHSPEC_ATTR support since we do
> not have a repository to do attribute lookup. Disable PATHSPEC_FROMTOP
> since we do not have a repository root. All pathspecs are treated as
> rooted at the provided comparison paths.
>
> After loading the pathspec data, calculate skip offsets for skipping
> past the root portion of the paths. This is required to ensure that
> pathspecs start matching from the provided path, rather than matching
> from the absolute path. We could instead pass the paths as prefix values
> to parse_pathspec. This is slightly problematic because the paths come
> from the command line and don't necessarily have the proper trailing
> slash. Additionally, that would require parsing pathspecs multiple
> times.
>
> Pass the pathspec object and the skip offsets into queue_diff, which
> in-turn must pass them along to read_directory_contents.
>
> Modify read_directory_contents to check against the pathspecs when
> scanning the directory. Use the skip offset to skip past the initial
> root of the path, and only match against portions that are below the
> intended directory structure being compared.
>
> The search algorithm for finding paths is recursive with read_dir. To
> make pathspec matching work properly, we must set both
> DO_MATCH_DIRECTORY and DO_MATCH_LEADING_PATHSPEC.
>
> Without DO_MATCH_DIRECTORY, paths like "a/b/c/d" will not match against
> pathspecs like "a/b/c". This is usually achieved by setting the is_dir
> parameter of match_pathspec.
>
> Without DO_MATCH_LEADING_PATHSPEC, paths like "a/b/c" would not match
> against pathspecs like "a/b/c/d". This is crucial because we recursively
> iterate down the directories. We could simply avoid checking pathspecs
> at subdirectories, but this would force recursion down directories
> which would simply be skipped.
>
> If we always passed DO_MATCH_LEADING_PATHSPEC, then we will
> incorrectly match in certain cases such as matching 'a/c' against
> ':(glob)**/d'. The match logic will see that a matches the leading part
> of the **/ and accept this even tho c doesn't match.
>
> To avoid this, use the match_leading_pathspec() variant recently
> introduced. This sets both flags when is_dir is set, but leaves them
> both cleared when is_dir is 0.
>
> Add test cases and documentation covering the new functionality. Note
> for the documentation I opted not to move the placement of '--' which is
> sometimes used to disambiguate arguments. The diff --no-index mode
> requires exactly 2 arguments determining what to compare. Any additional
> arguments are interpreted as pathspecs and must come afterwards. Use of
> '--' would not actually disambiguate anything, since there will never be
> ambiguity over which arguments represent paths or pathspecs.
>
> Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
> ---
> builtin/diff.c | 2 +-
> diff-no-index.c | 85 +++++++++++++++++++++++++++++--------
> Documentation/git-diff.adoc | 10 +++--
> t/t4053-diff-no-index.sh | 75 ++++++++++++++++++++++++++++++++
> 4 files changed, 151 insertions(+), 21 deletions(-)
>
> diff --git a/builtin/diff.c b/builtin/diff.c
> index fa963808c318..c6231edce4e8 100644
> --- a/builtin/diff.c
> +++ b/builtin/diff.c
> @@ -35,7 +35,7 @@ static const char builtin_diff_usage[] =
> " or: git diff [<options>] [--merge-base] <commit> [<commit>...] <commit> [--] [<path>...]\n"
> " or: git diff [<options>] <commit>...<commit> [--] [<path>...]\n"
> " or: git diff [<options>] <blob> <blob>\n"
> -" or: git diff [<options>] --no-index [--] <path> <path>"
> +" or: git diff [<options>] --no-index [--] <path> <path> [<pathspec>...]"
> "\n"
> COMMON_DIFF_OPTIONS_HELP;
>
> diff --git a/diff-no-index.c b/diff-no-index.c
> index 9739b2b268b9..4aeeb98cfa8f 100644
> --- a/diff-no-index.c
> +++ b/diff-no-index.c
> @@ -15,20 +15,45 @@
> #include "gettext.h"
> #include "revision.h"
> #include "parse-options.h"
> +#include "pathspec.h"
> #include "string-list.h"
> #include "dir.h"
>
> -static int read_directory_contents(const char *path, struct string_list *list)
> +static int read_directory_contents(const char *path, struct string_list *list,
> + const struct pathspec *pathspec,
> + int skip)
> {
> + struct strbuf match = STRBUF_INIT;
> + int len;
> DIR *dir;
> struct dirent *e;
>
> if (!(dir = opendir(path)))
> return error("Could not open directory %s", path);
>
> - while ((e = readdir_skip_dot_and_dotdot(dir)))
> - string_list_insert(list, e->d_name);
> + if (pathspec) {
> + strbuf_addstr(&match, path);
> + strbuf_complete(&match, '/');
> + strbuf_remove(&match, 0, skip);
>
> + len = match.len;
> + }
> +
> + while ((e = readdir_skip_dot_and_dotdot(dir))) {
> + if (pathspec) {
> + strbuf_setlen(&match, len);
> + strbuf_addstr(&match, e->d_name);
> +
> + if (!match_leading_pathspec(NULL, pathspec,
> + match.buf, match.len,
> + 0, NULL, e->d_type == DT_DIR ? 1 : 0))
> + continue;
> + }
> +
> + string_list_insert(list, e->d_name);
> + }
> +
> + strbuf_release(&match);
> closedir(dir);
> return 0;
> }
> @@ -131,7 +156,8 @@ static struct diff_filespec *noindex_filespec(const struct git_hash_algo *algop,
> }
>
> static int queue_diff(struct diff_options *o, const struct git_hash_algo *algop,
> - const char *name1, const char *name2, int recursing)
> + const char *name1, const char *name2, int recursing,
> + const struct pathspec *ps, int skip1, int skip2)
> {
> int mode1 = 0, mode2 = 0;
> enum special special1 = SPECIAL_NONE, special2 = SPECIAL_NONE;
> @@ -171,9 +197,9 @@ static int queue_diff(struct diff_options *o, const struct git_hash_algo *algop,
> int i1, i2, ret = 0;
> size_t len1 = 0, len2 = 0;
>
> - if (name1 && read_directory_contents(name1, &p1))
> + if (name1 && read_directory_contents(name1, &p1, ps, skip1))
> return -1;
> - if (name2 && read_directory_contents(name2, &p2)) {
> + if (name2 && read_directory_contents(name2, &p2, ps, skip2)) {
> string_list_clear(&p1, 0);
> return -1;
> }
> @@ -218,7 +244,7 @@ static int queue_diff(struct diff_options *o, const struct git_hash_algo *algop,
> n2 = buffer2.buf;
> }
>
> - ret = queue_diff(o, algop, n1, n2, 1);
> + ret = queue_diff(o, algop, n1, n2, 1, ps, skip1, skip2);
> }
> string_list_clear(&p1, 0);
> string_list_clear(&p2, 0);
> @@ -258,8 +284,10 @@ static void append_basename(struct strbuf *path, const char *dir, const char *fi
> * DWIM "diff D F" into "diff D/F F" and "diff F D" into "diff F D/F"
> * Note that we append the basename of F to D/, so "diff a/b/file D"
> * becomes "diff a/b/file D/file", not "diff a/b/file D/a/b/file".
> + *
> + * Return 1 if both paths are directories, 0 otherwise.
> */
> -static void fixup_paths(const char **path, struct strbuf *replacement)
> +static int fixup_paths(const char **path, struct strbuf *replacement)
> {
> struct stat st;
> unsigned int isdir0 = 0, isdir1 = 0;
> @@ -282,26 +310,31 @@ static void fixup_paths(const char **path, struct strbuf *replacement)
> if ((isdir0 && ispipe1) || (ispipe0 && isdir1))
> die(_("cannot compare a named pipe to a directory"));
>
> - if (isdir0 == isdir1)
> - return;
> + /* if both paths are directories, we will enable pathspecs */
> + if (isdir0 && isdir1)
> + return 1;
> +
> if (isdir0) {
> append_basename(replacement, path[0], path[1]);
> path[0] = replacement->buf;
> - } else {
> + } else if (isdir1) {
> append_basename(replacement, path[1], path[0]);
> path[1] = replacement->buf;
> }
> +
> + return 0;
> }
>
> static const char * const diff_no_index_usage[] = {
> - N_("git diff --no-index [<options>] <path> <path>"),
> + N_("git diff --no-index [<options>] <path> <path> [<pathspec>...]"),
> NULL
> };
>
> int diff_no_index(struct rev_info *revs, const struct git_hash_algo *algop,
> int implicit_no_index, int argc, const char **argv)
> {
> - int i, no_index;
> + struct pathspec pathspec, *ps = NULL;
> + int i, no_index, skip1 = 0, skip2 = 0;
> int ret = 1;
> const char *paths[2];
> char *to_free[ARRAY_SIZE(paths)] = { 0 };
> @@ -317,13 +350,12 @@ int diff_no_index(struct rev_info *revs, const struct git_hash_algo *algop,
> options = add_diff_options(no_index_options, &revs->diffopt);
> argc = parse_options(argc, argv, revs->prefix, options,
> diff_no_index_usage, 0);
> - if (argc != 2) {
> + if (argc < 2) {
> if (implicit_no_index)
> warning(_("Not a git repository. Use --no-index to "
> "compare two paths outside a working tree"));
> usage_with_options(diff_no_index_usage, options);
> }
> - FREE_AND_NULL(options);
> for (i = 0; i < 2; i++) {
> const char *p = argv[i];
> if (!strcmp(p, "-"))
> @@ -337,7 +369,23 @@ int diff_no_index(struct rev_info *revs, const struct git_hash_algo *algop,
> paths[i] = p;
> }
>
> - fixup_paths(paths, &replacement);
> + if (fixup_paths(paths, &replacement)) {
> + parse_pathspec(&pathspec, PATHSPEC_FROMTOP | PATHSPEC_ATTR,
> + PATHSPEC_PREFER_FULL | PATHSPEC_NO_REPOSITORY,
> + NULL, &argv[2]);
> + if (pathspec.nr)
> + ps = &pathspec;
> +
> + skip1 = strlen(paths[0]);
> + skip1 += paths[0][skip1] == '/' ? 0 : 1;
> + skip2 = strlen(paths[1]);
> + skip2 += paths[1][skip2] == '/' ? 0 : 1;
> + } else if (argc > 2) {
> + warning(_("Limiting comparison with pathspecs is only "
> + "supported if both paths are directories."));
> + usage_with_options(diff_no_index_usage, options);
> + }
> + FREE_AND_NULL(options);
>
> revs->diffopt.skip_stat_unmatch = 1;
> if (!revs->diffopt.output_format)
> @@ -354,7 +402,8 @@ int diff_no_index(struct rev_info *revs, const struct git_hash_algo *algop,
> setup_diff_pager(&revs->diffopt);
> revs->diffopt.flags.exit_with_status = 1;
>
> - if (queue_diff(&revs->diffopt, algop, paths[0], paths[1], 0))
> + if (queue_diff(&revs->diffopt, algop, paths[0], paths[1], 0, ps,
> + skip1, skip2))
> goto out;
> diff_set_mnemonic_prefix(&revs->diffopt, "1/", "2/");
> diffcore_std(&revs->diffopt);
> @@ -370,5 +419,7 @@ int diff_no_index(struct rev_info *revs, const struct git_hash_algo *algop,
> for (i = 0; i < ARRAY_SIZE(to_free); i++)
> free(to_free[i]);
> strbuf_release(&replacement);
> + if (ps)
> + clear_pathspec(ps);
> return ret;
> }
> diff --git a/Documentation/git-diff.adoc b/Documentation/git-diff.adoc
> index dec173a34517..272331afbaec 100644
> --- a/Documentation/git-diff.adoc
> +++ b/Documentation/git-diff.adoc
> @@ -14,7 +14,7 @@ git diff [<options>] --cached [--merge-base] [<commit>] [--] [<path>...]
> git diff [<options>] [--merge-base] <commit> [<commit>...] <commit> [--] [<path>...]
> git diff [<options>] <commit>...<commit> [--] [<path>...]
> git diff [<options>] <blob> <blob>
> -git diff [<options>] --no-index [--] <path> <path>
> +git diff [<options>] --no-index [--] <path> <path> [<pathspec>...]
>
> DESCRIPTION
> -----------
> @@ -31,14 +31,18 @@ files on disk.
> further add to the index but you still haven't. You can
> stage these changes by using linkgit:git-add[1].
>
> -`git diff [<options>] --no-index [--] <path> <path>`::
> +`git diff [<options>] --no-index [--] <path> <path> [<pathspec>...]`::
>
> This form is to compare the given two paths on the
> filesystem. You can omit the `--no-index` option when
> running the command in a working tree controlled by Git and
> at least one of the paths points outside the working tree,
> or when running the command outside a working tree
> - controlled by Git. This form implies `--exit-code`.
> + controlled by Git. This form implies `--exit-code`. If both
> + paths point to directories, additional pathspecs may be
> + provided. These will limit the files included in the
> + difference. All such pathspecs must be relative as they
> + apply to both sides of the diff.
>
> `git diff [<options>] --cached [--merge-base] [<commit>] [--] [<path>...]`::
>
> diff --git a/t/t4053-diff-no-index.sh b/t/t4053-diff-no-index.sh
> index 5e5bad61ca1e..01db9243abfe 100755
> --- a/t/t4053-diff-no-index.sh
> +++ b/t/t4053-diff-no-index.sh
> @@ -295,4 +295,79 @@ test_expect_success PIPE,SYMLINKS 'diff --no-index reads from pipes' '
> test_cmp expect actual
> '
>
> +test_expect_success 'diff --no-index F F rejects pathspecs' '
> + test_must_fail git diff --no-index -- a/1 a/2 a 2>actual.err &&
> + test_grep "usage: git diff --no-index" actual.err
> +'
> +
> +test_expect_success 'diff --no-index D F rejects pathspecs' '
> + test_must_fail git diff --no-index -- a a/2 a 2>actual.err &&
> + test_grep "usage: git diff --no-index" actual.err
> +'
> +
> +test_expect_success 'diff --no-index F D rejects pathspecs' '
> + test_must_fail git diff --no-index -- a/1 b b 2>actual.err &&
> + test_grep "usage: git diff --no-index" actual.err
> +'
> +
> +test_expect_success 'diff --no-index rejects absolute pathspec' '
> + test_must_fail git diff --no-index -- a b $(pwd)/a/1
> +'
> +
> +test_expect_success 'diff --no-index with pathspec' '
> + test_expect_code 1 git diff --name-status --no-index a b 1 >actual &&
> + cat >expect <<-EOF &&
> + D a/1
> + EOF
> + test_cmp expect actual
> +'
> +
> +test_expect_success 'diff --no-index with pathspec no matches' '
> + test_expect_code 0 git diff --name-status --no-index a b missing
> +'
> +
> +test_expect_success 'diff --no-index with negative pathspec' '
> + test_expect_code 1 git diff --name-status --no-index a b ":!2" >actual &&
> + cat >expect <<-EOF &&
> + D a/1
> + EOF
> + test_cmp expect actual
> +'
> +
> +test_expect_success 'setup nested' '
> + mkdir -p c/1/2 &&
> + mkdir -p d/1/2 &&
> + echo 1 >c/1/2/a &&
> + echo 2 >c/1/2/b
> +'
> +
> +test_expect_success 'diff --no-index with pathspec nested negative pathspec' '
> + test_expect_code 0 git diff --no-index c d ":!1"
> +'
> +
> +test_expect_success 'diff --no-index with pathspec nested pathspec' '
> + test_expect_code 1 git diff --name-status --no-index c d 1/2 >actual &&
> + cat >expect <<-EOF &&
> + D c/1/2/a
> + D c/1/2/b
> + EOF
> + test_cmp expect actual
> +'
> +
> +test_expect_success 'diff --no-index with pathspec glob' '
> + test_expect_code 1 git diff --name-status --no-index c d ":(glob)**/a" >actual &&
> + cat >expect <<-EOF &&
> + D c/1/2/a
> + EOF
> + test_cmp expect actual
> +'
> +
> +test_expect_success 'diff --no-index with pathspec glob and exclude' '
> + test_expect_code 1 git diff --name-status --no-index c d ":(glob,exclude)**/a" >actual &&
> + cat >expect <<-EOF &&
> + D c/1/2/b
> + EOF
> + test_cmp expect actual
> +'
> +
> test_done
> --
> 2.48.1.397.gec9d649cc640
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4 3/3] diff --no-index: support limiting by pathspec
2025-06-04 2:37 ` Ben Knoble
@ 2025-06-04 17:22 ` Jacob Keller
2025-06-04 18:27 ` Jacob Keller
1 sibling, 0 replies; 22+ messages in thread
From: Jacob Keller @ 2025-06-04 17:22 UTC (permalink / raw)
To: Ben Knoble; +Cc: git, Junio C Hamano, Jacob Keller
On 6/3/2025 7:37 PM, Ben Knoble wrote:
> Actually, one comment :)
>
>> Le 21 mai 2025 à 19:29, Jacob Keller <jacob.e.keller@intel.com> a écrit :
>>
>> From: Jacob Keller <jacob.keller@gmail.com>
>>
>> The --no-index option of git-diff enables using the diff machinery from
>> git while operating outside of a repository. This mode of git diff is
>> able to compare directories and produce a diff of their contents.
>>
>> When operating git diff in a repository, git has the notion of
>> "pathspecs" which can specify which files to compare. In particular,
>> when using git to diff two trees, you might invoke:
>>
>> $ git diff-tree -r <treeish1> <treeish2>.
>
> I do find it slightly confusing that this series and in particular this patch is all about git-diff(1), but the only example is about git-diff-tree(1). It’s not the best example to me, esp. since it doesn’t actually use the pathspec machinery (deferring that to prose only). But I get the gist, so not really an issue.
>
Fair point. I think my brain wires got crossed as I was thinking of git
diff-tree as an example for why pathspecs always come at the end.
> Rereading a bit, it seems this message goes to lengths to teach readers about pathspecs for git-diff here; perhaps we can simplify those parts and assume the reader is familiar enough with the details to understand the implications of « no-index mode doesn’t support pathspecs to limit comparison »?
>
Yea, we could probably simplify this. I usually try to be verbose with
my "this is the reasoning for why to do this" and give background.. but
that can sometimes end up being too much.
Regardless, the example probably should use pathspecs if we keep it..
> Nit: Should the diff-tree command end with a period?
>
It should not.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4 3/3] diff --no-index: support limiting by pathspec
2025-06-04 2:37 ` Ben Knoble
2025-06-04 17:22 ` Jacob Keller
@ 2025-06-04 18:27 ` Jacob Keller
2025-06-04 20:19 ` Junio C Hamano
1 sibling, 1 reply; 22+ messages in thread
From: Jacob Keller @ 2025-06-04 18:27 UTC (permalink / raw)
To: Ben Knoble, Junio C Hamano; +Cc: git, Jacob Keller
On 6/3/2025 7:37 PM, Ben Knoble wrote:
> Actually, one comment :)
>
>> Le 21 mai 2025 à 19:29, Jacob Keller <jacob.e.keller@intel.com> a écrit :
>>
>> From: Jacob Keller <jacob.keller@gmail.com>
>>
>> The --no-index option of git-diff enables using the diff machinery from
>> git while operating outside of a repository. This mode of git diff is
>> able to compare directories and produce a diff of their contents.
>>
>> When operating git diff in a repository, git has the notion of
>> "pathspecs" which can specify which files to compare. In particular,
>> when using git to diff two trees, you might invoke:
>>
>> $ git diff-tree -r <treeish1> <treeish2>.
>
> I do find it slightly confusing that this series and in particular this patch is all about git-diff(1), but the only example is about git-diff-tree(1). It’s not the best example to me, esp. since it doesn’t actually use the pathspec machinery (deferring that to prose only). But I get the gist, so not really an issue.
>
> Rereading a bit, it seems this message goes to lengths to teach readers about pathspecs for git-diff here; perhaps we can simplify those parts and assume the reader is familiar enough with the details to understand the implications of « no-index mode doesn’t support pathspecs to limit comparison »?
>
> Nit: Should the diff-tree command end with a period?
>
@Junio,
Would you like a v5 with an updated commit message?
Thanks,
Jake
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4 3/3] diff --no-index: support limiting by pathspec
2025-06-04 18:27 ` Jacob Keller
@ 2025-06-04 20:19 ` Junio C Hamano
2025-06-04 21:05 ` Jacob Keller
0 siblings, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2025-06-04 20:19 UTC (permalink / raw)
To: Jacob Keller; +Cc: Ben Knoble, git, Jacob Keller
Jacob Keller <jacob.e.keller@intel.com> writes:
> Would you like a v5 with an updated commit message?
What we had was already plenty readable to me, but if you think you
can improve it further, I do not mind waiting for another round of
update.
Thanks.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4 3/3] diff --no-index: support limiting by pathspec
2025-06-04 20:19 ` Junio C Hamano
@ 2025-06-04 21:05 ` Jacob Keller
2025-06-04 21:36 ` D. Ben Knoble
0 siblings, 1 reply; 22+ messages in thread
From: Jacob Keller @ 2025-06-04 21:05 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Ben Knoble, git, Jacob Keller
On 6/4/2025 1:19 PM, Junio C Hamano wrote:
> Jacob Keller <jacob.e.keller@intel.com> writes:
>
>> Would you like a v5 with an updated commit message?
>
> What we had was already plenty readable to me, but if you think you
> can improve it further, I do not mind waiting for another round of
> update.
>
> Thanks.
I'm fine with it as-is. I think the minor nits from Ben aren't worth a
re-roll since there is no functional change, but wanted to confirm my
opinion :)
Thanks,
Jake
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4 3/3] diff --no-index: support limiting by pathspec
2025-06-04 21:05 ` Jacob Keller
@ 2025-06-04 21:36 ` D. Ben Knoble
2025-06-04 23:22 ` Junio C Hamano
0 siblings, 1 reply; 22+ messages in thread
From: D. Ben Knoble @ 2025-06-04 21:36 UTC (permalink / raw)
To: Jacob Keller; +Cc: Junio C Hamano, git, Jacob Keller
On Wed, Jun 4, 2025 at 5:05 PM Jacob Keller <jacob.e.keller@intel.com> wrote:
>
>
>
> On 6/4/2025 1:19 PM, Junio C Hamano wrote:
> > Jacob Keller <jacob.e.keller@intel.com> writes:
> >
> >> Would you like a v5 with an updated commit message?
> >
> > What we had was already plenty readable to me, but if you think you
> > can improve it further, I do not mind waiting for another round of
> > update.
> >
> > Thanks.
>
> I'm fine with it as-is. I think the minor nits from Ben aren't worth a
> re-roll since there is no functional change, but wanted to confirm my
> opinion :)
>
> Thanks,
> Jake
Fine by me as well, thanks.
--
D. Ben Knoble
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4 3/3] diff --no-index: support limiting by pathspec
2025-06-04 21:36 ` D. Ben Knoble
@ 2025-06-04 23:22 ` Junio C Hamano
0 siblings, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2025-06-04 23:22 UTC (permalink / raw)
To: D. Ben Knoble; +Cc: Jacob Keller, git, Jacob Keller
"D. Ben Knoble" <ben.knoble@gmail.com> writes:
> On Wed, Jun 4, 2025 at 5:05 PM Jacob Keller <jacob.e.keller@intel.com> wrote:
>>
>>
>>
>> On 6/4/2025 1:19 PM, Junio C Hamano wrote:
>> > Jacob Keller <jacob.e.keller@intel.com> writes:
>> >
>> >> Would you like a v5 with an updated commit message?
>> >
>> > What we had was already plenty readable to me, but if you think you
>> > can improve it further, I do not mind waiting for another round of
>> > update.
>> >
>> > Thanks.
>>
>> I'm fine with it as-is. I think the minor nits from Ben aren't worth a
>> re-roll since there is no functional change, but wanted to confirm my
>> opinion :)
>>
>> Thanks,
>> Jake
>
> Fine by me as well, thanks.
Thanks. Let's move it forward, then.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4 0/3] diff: add pathspec support to --no-index
2025-06-03 21:12 ` Junio C Hamano
2025-06-04 2:32 ` Ben Knoble
@ 2025-06-05 15:34 ` Phillip Wood
1 sibling, 0 replies; 22+ messages in thread
From: Phillip Wood @ 2025-06-05 15:34 UTC (permalink / raw)
To: Junio C Hamano, Jacob Keller; +Cc: git, Jacob Keller
On 03/06/2025 22:12, Junio C Hamano wrote:
> Jacob Keller <jacob.e.keller@intel.com> writes:
>
> Anybody, other than Jacob and I, interested in this series? We
> haven't seen any support or review and I am considering merging it
> down for the next cycle sometime in coming weeks.
I think this is a useful addition. I've read through the last patch and
didn't spot any issues. I found the commit message is very helpful. I
had a quick play with it by building what you have in seen (09fb155f111
(diff --no-index: support limiting by pathspec, 2025-05-21) is that v3
or this version?) and it all seemed to work as advertised.
Thanks
Phillip
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4 3/3] diff --no-index: support limiting by pathspec
2025-05-21 23:29 ` [PATCH v4 3/3] diff --no-index: support limiting by pathspec Jacob Keller
2025-06-04 2:37 ` Ben Knoble
@ 2025-09-23 14:57 ` Johannes Schindelin
2025-09-23 22:48 ` Jacob Keller
1 sibling, 1 reply; 22+ messages in thread
From: Johannes Schindelin @ 2025-09-23 14:57 UTC (permalink / raw)
To: Jacob Keller; +Cc: git, Junio C Hamano, Jacob Keller
Hi Jacob,
I know this is about a patch that you contributed four months ago, and
the usual feedback required sweeping changes, including this one that was
introduced in v4:
On Wed, 21 May 2025, Jacob Keller wrote:
> diff --git a/diff-no-index.c b/diff-no-index.c
> index 9739b2b268b9..4aeeb98cfa8f 100644
> --- a/diff-no-index.c
> +++ b/diff-no-index.c
> @@ -15,20 +15,45 @@
> #include "gettext.h"
> #include "revision.h"
> #include "parse-options.h"
> +#include "pathspec.h"
> #include "string-list.h"
> #include "dir.h"
>
> -static int read_directory_contents(const char *path, struct string_list *list)
> +static int read_directory_contents(const char *path, struct string_list *list,
> + const struct pathspec *pathspec,
> + int skip)
> {
> + struct strbuf match = STRBUF_INIT;
> + int len;
> DIR *dir;
> struct dirent *e;
>
> if (!(dir = opendir(path)))
> return error("Could not open directory %s", path);
>
> - while ((e = readdir_skip_dot_and_dotdot(dir)))
> - string_list_insert(list, e->d_name);
> + if (pathspec) {
> + strbuf_addstr(&match, path);
> + strbuf_complete(&match, '/');
> + strbuf_remove(&match, 0, skip);
Okay, so here the `read_directory_contents()` function learns to
optionally skip `skip` bytes from the `path` variable, after potentially
appending a `/`.
> [...]
> @@ -337,7 +369,23 @@ int diff_no_index(struct rev_info *revs, const struct git_hash_algo *algop,
> paths[i] = p;
> }
>
> - fixup_paths(paths, &replacement);
> + if (fixup_paths(paths, &replacement)) {
> + parse_pathspec(&pathspec, PATHSPEC_FROMTOP | PATHSPEC_ATTR,
> + PATHSPEC_PREFER_FULL | PATHSPEC_NO_REPOSITORY,
> + NULL, &argv[2]);
> + if (pathspec.nr)
> + ps = &pathspec;
> +
> + skip1 = strlen(paths[0]);
> + skip1 += paths[0][skip1] == '/' ? 0 : 1;
Since `skip1` is defined as the length of `path[0]`, I would expect
`paths[0][skip1]` to always evaluate to NUL, and therefore the `== '/'`
condition to always evaluate to `false`. Did I miss anything?
> + skip2 = strlen(paths[1]);
> + skip2 += paths[1][skip2] == '/' ? 0 : 1;
Same here, `paths[1][skip2]` should always return `NUL`.
This has ramifications where `skip1` and `skip2` are each one larger than
the length of `paths[0]` and `paths[1]`, respectively, and hence the code
in `read_directory_contents()` will now try to remove one more than the
length of the path, after potentially appending a slash.
But what if there is already a slash? The answer is:
$ git diff --no-index -- /tmp/ /tmp/ ':!'
fatal: `pos + len' is too far after the end of the buffer
This has been reported (with Windows paths, don't let that distract) in
https://github.com/git-for-windows/git/issues/5836.
I _think_ that what the patch should have done instead was:
if (skip1 > 0 && paths[0][skip1 - 1] == '/')
skip1--;
and likewise
if (skip2 > 0 && paths[1][skip2 - 1] == '/')
skip2--;
Focusing on the lines' correctness (which I don't think was the primary
concern in the review of your patch), that would be what I would suggest.
However, this makes me wonder whether the logic itself is sound? It is not
immediately obvious to me why the `paths[0]` and `paths[1]` values aren't
matched against the pathspec yet their entirety is seemingly skipped in
`read_directory_contents()`?
Ciao,
Johannes
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4 3/3] diff --no-index: support limiting by pathspec
2025-09-23 14:57 ` Johannes Schindelin
@ 2025-09-23 22:48 ` Jacob Keller
2025-09-24 11:19 ` Johannes Schindelin
0 siblings, 1 reply; 22+ messages in thread
From: Jacob Keller @ 2025-09-23 22:48 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: git, Junio C Hamano, Jacob Keller
[-- Attachment #1.1: Type: text/plain, Size: 7045 bytes --]
On 9/23/2025 7:57 AM, Johannes Schindelin wrote:
> Hi Jacob,
>
> I know this is about a patch that you contributed four months ago, and
> the usual feedback required sweeping changes, including this one that was
> introduced in v4:
>
Hello,
(Replying from my work address since its easier today than trying to
load my personal email up).
It's been quite some time since I did this so my memory is a bit hazy on
the logic. It took me a while to get things working, and its very likely
I missed corner cases.
> On Wed, 21 May 2025, Jacob Keller wrote:
>
>> diff --git a/diff-no-index.c b/diff-no-index.c
>> index 9739b2b268b9..4aeeb98cfa8f 100644
>> --- a/diff-no-index.c
>> +++ b/diff-no-index.c
>> @@ -15,20 +15,45 @@
>> #include "gettext.h"
>> #include "revision.h"
>> #include "parse-options.h"
>> +#include "pathspec.h"
>> #include "string-list.h"
>> #include "dir.h"
>>
>> -static int read_directory_contents(const char *path, struct string_list *list)
>> +static int read_directory_contents(const char *path, struct string_list *list,
>> + const struct pathspec *pathspec,
>> + int skip)
>> {
>> + struct strbuf match = STRBUF_INIT;
>> + int len;
>> DIR *dir;
>> struct dirent *e;
>>
>> if (!(dir = opendir(path)))
>> return error("Could not open directory %s", path);
>>
>> - while ((e = readdir_skip_dot_and_dotdot(dir)))
>> - string_list_insert(list, e->d_name);
>> + if (pathspec) {
>> + strbuf_addstr(&match, path);
>> + strbuf_complete(&match, '/');
>> + strbuf_remove(&match, 0, skip);
>
> Okay, so here the `read_directory_contents()` function learns to
> optionally skip `skip` bytes from the `path` variable, after potentially
> appending a `/`.
>
>> [...]
>> @@ -337,7 +369,23 @@ int diff_no_index(struct rev_info *revs, const struct git_hash_algo *algop,
>> paths[i] = p;
>> }
>>
>> - fixup_paths(paths, &replacement);
>> + if (fixup_paths(paths, &replacement)) {
>> + parse_pathspec(&pathspec, PATHSPEC_FROMTOP | PATHSPEC_ATTR,
>> + PATHSPEC_PREFER_FULL | PATHSPEC_NO_REPOSITORY,
>> + NULL, &argv[2]);
>> + if (pathspec.nr)
>> + ps = &pathspec;
>> +
>> + skip1 = strlen(paths[0]);
>> + skip1 += paths[0][skip1] == '/' ? 0 : 1;
>
> Since `skip1` is defined as the length of `path[0]`, I would expect
> `paths[0][skip1]` to always evaluate to NUL, and therefore the `== '/'`
> condition to always evaluate to `false`. Did I miss anything?
>
You're correct that obviously makes no sense.. :D
>> + skip2 = strlen(paths[1]);
>> + skip2 += paths[1][skip2] == '/' ? 0 : 1;
>
> Same here, `paths[1][skip2]` should always return `NUL`.
>
> This has ramifications where `skip1` and `skip2` are each one larger than
> the length of `paths[0]` and `paths[1]`, respectively, and hence the code
> in `read_directory_contents()` will now try to remove one more than the
> length of the path, after potentially appending a slash.
>
> But what if there is already a slash? The answer is:
>
> $ git diff --no-index -- /tmp/ /tmp/ ':!'
> fatal: `pos + len' is too far after the end of the buffer
>
> This has been reported (with Windows paths, don't let that distract) in
> https://github.com/git-for-windows/git/issues/5836.
>
> I _think_ that what the patch should have done instead was:
>
> if (skip1 > 0 && paths[0][skip1 - 1] == '/')
> skip1--;
>
> and likewise
>
> if (skip2 > 0 && paths[1][skip2 - 1] == '/')
> skip2--;
>
> Focusing on the lines' correctness (which I don't think was the primary
> concern in the review of your patch), that would be what I would suggest.
>
Perhaps. I need to dig into this code to see what the whole point was. I
think the idea is that I'm trying to skip past the common prefix?
> However, this makes me wonder whether the logic itself is sound? It is not
> immediately obvious to me why the `paths[0]` and `paths[1]` values aren't
> matched against the pathspec yet their entirety is seemingly skipped in
> `read_directory_contents()`?
I recall fiddling a lot to try and get this working. The idea here is
that fixup_paths does some conversions to handle the DWIM logic where a
"diff D F" becomes "diff D/F F". It returns true if both paths are
directories, so we only enter this block when both paths are
directories. (Which is required because we only support pathspec
limiting for directory differences).
We are calculating the skip length which is then passed into the
queue_diff function.
We pass the skip value into the queue_diff function, and this is the
length of the prefix of each path to skip. Essentially, we're skipping
everything from start of the path up to the root of what you pass into
the function.
The queue_diff then descends into each directory, and creates new paths
which are all the files and directories recursively under the target
directory. Each of these needs to have its root skipped (everything the
user supplied) in order to allow path spec matching to work, since we
apply pathspec matches as if you're at the root of the two trees being
diffed.
Crucially, we pass the skip value in to the first call of queue_diff,
but it remains unchanged as we descend further in, so we keep cutting
off only the first part of the path structure as we compare.
You're correct that the logic for calculating this incorrectly, as it
apparently doesn't work right if the paths end in a slash already. I'd
have to dig farther to figure out if this proposed patch is correct. I'm
not certain.
Basically, read_directory_contents is given the path to current
directory + the name of a file under the directory. We convert "path" to
be "dir/", then we remove the prefix of everything that was passed by
the user, so that we only check against parts of the dir path which are
recursively below the original passed directory. But we need to be
careful that we strip out the slash as well as the overall path, so that
we don't end up comparing subdirectories as if they're absolute paths.
Basically, if path does contain a slash, then the strlen alone is
sufficient, but if it doesn't contain a slash, then we need to avoid
adding 1. It happens that we incorrectly checked for this, which results
in us always adding 1, so when a string has a slash we skip too much.
It should be pretty easy to add a test case for this, and I think the
correct check is actually something like:
skip1 = strlen(paths[0])
if (!skip1 || paths[0][skip1 - 1] == '/')
skip1++
Basically, we actually want one to be the length of the string plus a
trailing slash. If the string doesn't include a trailing slash, we have
to add 1. If it does include one, we don't add one, since the length
already accounts for the slash. Note that if the string is 0 length,
there's no slash so skip1 should be 1, hence the !skip1 part of the check.
Let me see if I can hook up a test case and confirm this.
>
> Ciao,
> Johannes
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 236 bytes --]
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4 3/3] diff --no-index: support limiting by pathspec
2025-09-23 22:48 ` Jacob Keller
@ 2025-09-24 11:19 ` Johannes Schindelin
2025-09-24 18:19 ` Jacob Keller
0 siblings, 1 reply; 22+ messages in thread
From: Johannes Schindelin @ 2025-09-24 11:19 UTC (permalink / raw)
To: Jacob Keller; +Cc: git, Junio C Hamano, Jacob Keller
Hi Jacob,
On Tue, 23 Sep 2025, Jacob Keller wrote:
> On 9/23/2025 7:57 AM, Johannes Schindelin wrote:
>
> > However, this makes me wonder whether the logic itself is sound? It is
> > not immediately obvious to me why the `paths[0]` and `paths[1]` values
> > aren't matched against the pathspec yet their entirety is seemingly
> > skipped in `read_directory_contents()`?
>
> I recall fiddling a lot to try and get this working. The idea here is
> that fixup_paths does some conversions to handle the DWIM logic where a
> "diff D F" becomes "diff D/F F". It returns true if both paths are
> directories, so we only enter this block when both paths are
> directories. (Which is required because we only support pathspec
> limiting for directory differences).
I do wonder, after seeing that `read_directory_contents()` has to
(re-)construct a complete `strbuf` in every single invocation whether it
would make more sense to construct two `strbuf`s in `diff_no_index()` and
pass those along to `queue_diff()` _instead_ of `skip1`/`skip2`. The
`queue_diff()` function would then have to extend these
`strbuf`s as it already does with `buffer1`/`buffer2`.
That would avoid appending the same prefix only to remove it right away
(with a not exactly cheap `memmove()`) during every
`read_directory_contents()` invocation, too, in addition to allocating and
releasing those `strbuf`s over and over again.
Ciao,
Johannes
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4 3/3] diff --no-index: support limiting by pathspec
2025-09-24 11:19 ` Johannes Schindelin
@ 2025-09-24 18:19 ` Jacob Keller
2025-09-24 18:23 ` Jacob Keller
0 siblings, 1 reply; 22+ messages in thread
From: Jacob Keller @ 2025-09-24 18:19 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: git, Junio C Hamano, Jacob Keller
[-- Attachment #1.1: Type: text/plain, Size: 2065 bytes --]
On 9/24/2025 4:19 AM, Johannes Schindelin wrote:
> Hi Jacob,
>
> On Tue, 23 Sep 2025, Jacob Keller wrote:
>
>> On 9/23/2025 7:57 AM, Johannes Schindelin wrote:
>>
>>> However, this makes me wonder whether the logic itself is sound? It is
>>> not immediately obvious to me why the `paths[0]` and `paths[1]` values
>>> aren't matched against the pathspec yet their entirety is seemingly
>>> skipped in `read_directory_contents()`?
>>
>> I recall fiddling a lot to try and get this working. The idea here is
>> that fixup_paths does some conversions to handle the DWIM logic where a
>> "diff D F" becomes "diff D/F F". It returns true if both paths are
>> directories, so we only enter this block when both paths are
>> directories. (Which is required because we only support pathspec
>> limiting for directory differences).
>
> I do wonder, after seeing that `read_directory_contents()` has to
> (re-)construct a complete `strbuf` in every single invocation whether it
> would make more sense to construct two `strbuf`s in `diff_no_index()` and
> pass those along to `queue_diff()` _instead_ of `skip1`/`skip2`. The
> `queue_diff()` function would then have to extend these
> `strbuf`s as it already does with `buffer1`/`buffer2`.
>
> That would avoid appending the same prefix only to remove it right away
> (with a not exactly cheap `memmove()`) during every
> `read_directory_contents()` invocation, too, in addition to allocating and
> releasing those `strbuf`s over and over again.
>
> Ciao,
> Johannes
Something like that is probably simpler, but I'd need to figure out how
to do things right. I'm also not 100% certain how much it would save on
computation.
What we ultimately want to construct is the ability to figure out that
we're given A/B/C/D path and a filename E and a prefix A/B, we want to
get C/D/E for use with the pathspec matching logic.
So we still need to do some sort of prefix matching here, because
queue_diff gets (and indeed, needs) the full path for its main
non-pathspec purpose.
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 236 bytes --]
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4 3/3] diff --no-index: support limiting by pathspec
2025-09-24 18:19 ` Jacob Keller
@ 2025-09-24 18:23 ` Jacob Keller
0 siblings, 0 replies; 22+ messages in thread
From: Jacob Keller @ 2025-09-24 18:23 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: git, Junio C Hamano, Jacob Keller
[-- Attachment #1.1: Type: text/plain, Size: 1823 bytes --]
On 9/24/2025 11:19 AM, Jacob Keller wrote:
>
>
> On 9/24/2025 4:19 AM, Johannes Schindelin wrote:
>> Hi Jacob,
>>
>> On Tue, 23 Sep 2025, Jacob Keller wrote:
>>
>>> On 9/23/2025 7:57 AM, Johannes Schindelin wrote:
>>>
>>>> However, this makes me wonder whether the logic itself is sound? It is
>>>> not immediately obvious to me why the `paths[0]` and `paths[1]` values
>>>> aren't matched against the pathspec yet their entirety is seemingly
>>>> skipped in `read_directory_contents()`?
>>>
>>> I recall fiddling a lot to try and get this working. The idea here is
>>> that fixup_paths does some conversions to handle the DWIM logic where a
>>> "diff D F" becomes "diff D/F F". It returns true if both paths are
>>> directories, so we only enter this block when both paths are
>>> directories. (Which is required because we only support pathspec
>>> limiting for directory differences).
>>
>> I do wonder, after seeing that `read_directory_contents()` has to
>> (re-)construct a complete `strbuf` in every single invocation whether it
>> would make more sense to construct two `strbuf`s in `diff_no_index()` and
>> pass those along to `queue_diff()` _instead_ of `skip1`/`skip2`. The
>> `queue_diff()` function would then have to extend these
>> `strbuf`s as it already does with `buffer1`/`buffer2`.
>>
Ah, but I see what you meant better after some more thought. Instead of
bothering with skip1 and skip2 at all, we just generate the sub part of
the path along with buffer1/buffer2, but we start these new pathspec
bits empty, so that we can construct the proper C/D/E path in
read_directory_contents directly without needing to do a skip or memmove
etc.
Makes sense. Good suggestion!
I'll try to work on this today and make a test case to confirm this.
Thanks,
Jake
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 236 bytes --]
^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2025-09-24 18:23 UTC | newest]
Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-21 23:29 [PATCH v4 0/3] diff: add pathspec support to --no-index Jacob Keller
2025-05-21 23:29 ` [PATCH v4 1/3] pathspec: add match_leading_pathspec variant Jacob Keller
2025-05-21 23:29 ` [PATCH v4 2/3] pathspec: add flag to indicate operation without repository Jacob Keller
2025-05-21 23:29 ` [PATCH v4 3/3] diff --no-index: support limiting by pathspec Jacob Keller
2025-06-04 2:37 ` Ben Knoble
2025-06-04 17:22 ` Jacob Keller
2025-06-04 18:27 ` Jacob Keller
2025-06-04 20:19 ` Junio C Hamano
2025-06-04 21:05 ` Jacob Keller
2025-06-04 21:36 ` D. Ben Knoble
2025-06-04 23:22 ` Junio C Hamano
2025-09-23 14:57 ` Johannes Schindelin
2025-09-23 22:48 ` Jacob Keller
2025-09-24 11:19 ` Johannes Schindelin
2025-09-24 18:19 ` Jacob Keller
2025-09-24 18:23 ` Jacob Keller
2025-05-22 21:37 ` [PATCH v4 0/3] diff: add pathspec support to --no-index Junio C Hamano
2025-05-22 21:50 ` Jacob Keller
2025-05-22 22:04 ` Junio C Hamano
2025-06-03 21:12 ` Junio C Hamano
2025-06-04 2:32 ` Ben Knoble
2025-06-05 15:34 ` Phillip Wood
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).