git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RFC] diff --no-index: teach option to exclude files by pattern
@ 2025-05-14 20:40 Jacob Keller
  2025-05-14 21:10 ` Junio C Hamano
  0 siblings, 1 reply; 5+ messages in thread
From: Jacob Keller @ 2025-05-14 20:40 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jacob Keller

From: Jacob Keller <jacob.keller@gmail.com>

The --no-index option of git-diff enables using the diff machinery when
operating outside a repository. This mode of git diff is able to compare
directories and produce a diff of their contents. In certain cases, it
may be helpful to ignore certain portions of a directory.

In particular, git diff --no-index does not ignore '.git' or other VCS
'hidden' folders. If you pass a path which happens to be a git
repository, it will attempt to diff the entire repository folder.
Standard diff utilities often include options to exclude files by path.
For example, the GNU diffutils program has the '-x' option to exclude
files by a pattern match.

Teach git diff --no-index the ability to exclude files by wildmatch
pattern when recursing through directories. The '--exclude' option
builds up a string list containing the patterns. These are checked with
wildmatch() in the read_directory_contents function. If any pattern
matches, then the file is not included in the directory contents.

The --exclude option is only supported by the --no-index mode. Standard
diff modes support negative pathspecs which is more powerful. I tried to
see if there was a way to add support for negative pathspecs themselves,
but haven't yet figured out if this is possible.

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
---
I came across an issue when attempting to use git diff --no-index as a diff
replacement. I wanted to compare two folders, and one of them happened to be
a git repository. This caused the diff to recursively go into the .git
folder and try to compare all the paths.

Other diff tools such as the diff from GNU difftools have options to exclude
files by pattern match such as '-x'.

Git has negative pathspecs, but this doesn't work with git diff --no-index,
and making them work seems tricky, so I thought it wouldn't be too hard to
implement an exclude feature for git diff --no-index.

Is something like this worthwhile? I guess I could always try to use a
regular diff tool in cases where I'm operating on a potential repository..
But I like the colorization and other options of git diff and it would be
convenient to be able to still use git diff in this case.

Thoughts? What about trying to extend the tool to read in negative
pathspecs? That seems like its significantly more difficult, but might feel
more natural than the --exclude option.

Sent as RFC since this lacks tests and documentation.

 diff-no-index.c | 32 +++++++++++++++++++++++++-------
 1 file changed, 25 insertions(+), 7 deletions(-)

diff --git a/diff-no-index.c b/diff-no-index.c
index 6f277892d3ae..913b2f3f7869 100644
--- a/diff-no-index.c
+++ b/diff-no-index.c
@@ -17,8 +17,10 @@
 #include "parse-options.h"
 #include "string-list.h"
 #include "dir.h"
+#include "wildmatch.h"
 
-static int read_directory_contents(const char *path, struct string_list *list)
+static int read_directory_contents(const char *path, struct string_list *list,
+				   struct string_list *exclude_patterns)
 {
 	DIR *dir;
 	struct dirent *e;
@@ -26,8 +28,20 @@ static int read_directory_contents(const char *path, struct string_list *list)
 	if (!(dir = opendir(path)))
 		return error("Could not open directory %s", path);
 
-	while ((e = readdir_skip_dot_and_dotdot(dir)))
+	while ((e = readdir_skip_dot_and_dotdot(dir))) {
+		int skip = 0;
+
+		for (int i = 0; i < exclude_patterns->nr; i++) {
+			if (!wildmatch(exclude_patterns->items[i].string, e->d_name, 0)) {
+				skip = 1;
+				break;
+			}
+		}
+		if (skip)
+			continue;
+
 		string_list_insert(list, e->d_name);
+	}
 
 	closedir(dir);
 	return 0;
@@ -130,7 +144,8 @@ static struct diff_filespec *noindex_filespec(const char *name, int mode,
 }
 
 static int queue_diff(struct diff_options *o,
-		      const char *name1, const char *name2, int recursing)
+		      const char *name1, const char *name2, int recursing,
+		      struct string_list *exclude_patterns)
 {
 	int mode1 = 0, mode2 = 0;
 	enum special special1 = SPECIAL_NONE, special2 = SPECIAL_NONE;
@@ -170,9 +185,9 @@ static int queue_diff(struct diff_options *o,
 		int i1, i2, ret = 0;
 		size_t len1 = 0, len2 = 0;
 
-		if (name1 && read_directory_contents(name1, &p1))
+		if (name1 && read_directory_contents(name1, &p1, exclude_patterns))
 			return -1;
-		if (name2 && read_directory_contents(name2, &p2)) {
+		if (name2 && read_directory_contents(name2, &p2, exclude_patterns)) {
 			string_list_clear(&p1, 0);
 			return -1;
 		}
@@ -217,7 +232,7 @@ static int queue_diff(struct diff_options *o,
 				n2 = buffer2.buf;
 			}
 
-			ret = queue_diff(o, n1, n2, 1);
+			ret = queue_diff(o, n1, n2, 1, exclude_patterns);
 		}
 		string_list_clear(&p1, 0);
 		string_list_clear(&p2, 0);
@@ -306,10 +321,13 @@ int diff_no_index(struct rev_info *revs,
 	const char *paths[2];
 	char *to_free[ARRAY_SIZE(paths)] = { 0 };
 	struct strbuf replacement = STRBUF_INIT;
+	struct string_list exclude_patterns = STRING_LIST_INIT_NODUP;
 	const char *prefix = revs->prefix;
 	struct option no_index_options[] = {
 		OPT_BOOL_F(0, "no-index", &no_index, "",
 			   PARSE_OPT_NONEG | PARSE_OPT_HIDDEN),
+		OPT_STRING_LIST(0, "exclude", &exclude_patterns, N_("pattern"),
+				N_("exclude files matching pattern when recursing a directory")),
 		OPT_END(),
 	};
 	struct option *options;
@@ -354,7 +372,7 @@ int diff_no_index(struct rev_info *revs,
 	setup_diff_pager(&revs->diffopt);
 	revs->diffopt.flags.exit_with_status = 1;
 
-	if (queue_diff(&revs->diffopt, paths[0], paths[1], 0))
+	if (queue_diff(&revs->diffopt, paths[0], paths[1], 0, &exclude_patterns))
 		goto out;
 	diff_set_mnemonic_prefix(&revs->diffopt, "1/", "2/");
 	diffcore_std(&revs->diffopt);
-- 
2.48.1.397.gec9d649cc640


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH RFC] diff --no-index: teach option to exclude files by pattern
  2025-05-14 20:40 [PATCH RFC] diff --no-index: teach option to exclude files by pattern Jacob Keller
@ 2025-05-14 21:10 ` Junio C Hamano
  2025-05-15 16:27   ` Jacob Keller
  0 siblings, 1 reply; 5+ messages in thread
From: Junio C Hamano @ 2025-05-14 21:10 UTC (permalink / raw)
  To: Jacob Keller; +Cc: git, Jacob Keller

Jacob Keller <jacob.e.keller@intel.com> writes:

> From: Jacob Keller <jacob.keller@gmail.com>
>
> Teach git diff --no-index the ability to exclude files by wildmatch
> pattern when recursing through directories. The '--exclude' option
> builds up a string list containing the patterns. These are checked with
> wildmatch() in the read_directory_contents function. If any pattern
> matches, then the file is not included in the directory contents.

A quite natural question that comes to mind is:

    How would we do this for the normal "git diff" that is not the
    bolted on '--no-index' mode?

but ...

> The --exclude option is only supported by the --no-index mode. Standard
> diff modes support negative pathspecs which is more powerful. I tried to
> see if there was a way to add support for negative pathspecs themselves,
> but haven't yet figured out if this is possible.

... of course you have thought about it already.  I do agree with
you that we should figure out how and teach this mode to also use
pathspec, not necessarily only the negative ones but positive ones.

After all,

    $ git diff --no-index [<option>...] dirA dirB

is like running

    $ diff -r [<option>...] dirA dirB

after preparing these two directories like so:

    $ git archive revA | ( mkdir dirA && tar Cxf dirA - )
    $ git archive revB | ( mkdir dirB && tar Cxf dirB - )

Hence it is natural for users to expect that anything you can do
with

    $ git diff revA revB

should be doable, in

    $ git diff --no-index dirA dirB

and vice versa.  And as you said, when comparing two revisions,
you'd use pathspec for this kind of thing.

    $ git diff revA revB -- Documentation/ t/ ':!po/'

So, I pretty much agree with the need to be able to exclude some
parts of the tree(s) from comparison in "diff --no-index" mode, but
I doubt it is a good idea to tell what to exclude the "--no-index"
mode in a completely different way.

The last time I looked at it, I got an impression that the command
line argument parsing of "git diff --no-index" was messy (which is
sort of inevitable, since unlike the normal "git diff", it can
compare more than just two "collections"---it can take two paths to
regular files, for example, and in such a case pathspec arguments
can play no role), so teaching it pathspec parsing might be a bit of
work, though.

Thanks for starting an interesting topic.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH RFC] diff --no-index: teach option to exclude files by pattern
  2025-05-14 21:10 ` Junio C Hamano
@ 2025-05-15 16:27   ` Jacob Keller
  2025-05-15 18:09     ` Junio C Hamano
  0 siblings, 1 reply; 5+ messages in thread
From: Jacob Keller @ 2025-05-15 16:27 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jacob Keller, git

On Wed, May 14, 2025 at 2:10 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Jacob Keller <jacob.e.keller@intel.com> writes:
>
> > From: Jacob Keller <jacob.keller@gmail.com>
> >
> > Teach git diff --no-index the ability to exclude files by wildmatch
> > pattern when recursing through directories. The '--exclude' option
> > builds up a string list containing the patterns. These are checked with
> > wildmatch() in the read_directory_contents function. If any pattern
> > matches, then the file is not included in the directory contents.
>
> A quite natural question that comes to mind is:
>
>     How would we do this for the normal "git diff" that is not the
>     bolted on '--no-index' mode?
>
> but ...
>
> > The --exclude option is only supported by the --no-index mode. Standard
> > diff modes support negative pathspecs which is more powerful. I tried to
> > see if there was a way to add support for negative pathspecs themselves,
> > but haven't yet figured out if this is possible.
>
> ... of course you have thought about it already.  I do agree with
> you that we should figure out how and teach this mode to also use
> pathspec, not necessarily only the negative ones but positive ones.
>

Sure, though I think we might need either an option or some other way
to distinguish pathspec vs the existing non-pathspec mode.

> After all,
>
>     $ git diff --no-index [<option>...] dirA dirB
>
> is like running
>
>     $ diff -r [<option>...] dirA dirB
>
> after preparing these two directories like so:
>
>     $ git archive revA | ( mkdir dirA && tar Cxf dirA - )
>     $ git archive revB | ( mkdir dirB && tar Cxf dirB - )
>
> Hence it is natural for users to expect that anything you can do
> with
>
>     $ git diff revA revB
>
> should be doable, in
>
>     $ git diff --no-index dirA dirB
>
> and vice versa.  And as you said, when comparing two revisions,
> you'd use pathspec for this kind of thing.
>
>     $ git diff revA revB -- Documentation/ t/ ':!po/'
>
> So, I pretty much agree with the need to be able to exclude some
> parts of the tree(s) from comparison in "diff --no-index" mode, but
> I doubt it is a good idea to tell what to exclude the "--no-index"
> mode in a completely different way.
>

Right. My main issue was that pathspec seemed to have a bunch of stuff
baked into assuming it has a repository.

> The last time I looked at it, I got an impression that the command
> line argument parsing of "git diff --no-index" was messy (which is
> sort of inevitable, since unlike the normal "git diff", it can
> compare more than just two "collections"---it can take two paths to
> regular files, for example, and in such a case pathspec arguments
> can play no role), so teaching it pathspec parsing might be a bit of
> work, though.
>

Right. It currently requires finding two paths to compare, and some
DWIM logic to make directory and file comparisons work.

pathspec capability is about specifying which things to include or
exclude from a given search. Hmm..

Actually, I think I have a path forward:

we teach git diff --no-index to treat the first 2 arguments as they
are now: pointers to the things to compare.

We check if either or both of those are directories. If they aren't,
then additional arguments won't be accepted.

If we have at least one directory, then instead of rejecting commands
with >2 arguments, we interpret any remaining arguments as pathspecs,
which apply to any directory path provided. These can limit the search
when scanning through a directory, so both positive and negative ones
would apply in the same say.

I guess the one weirdness is that pathspecs must come after the first
2 arguments, since we need to find 2 paths first. But this matches the
way that treeish must come first in git diff-tree -r takes treeish and
then pathspecs, and you can't re-order them arbitrarily either.

Does this sound like a reasonable extension to the existing 2 argument
form of git diff --no-index?

> Thanks for starting an interesting topic.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH RFC] diff --no-index: teach option to exclude files by pattern
  2025-05-15 16:27   ` Jacob Keller
@ 2025-05-15 18:09     ` Junio C Hamano
  2025-05-15 20:24       ` Jacob Keller
  0 siblings, 1 reply; 5+ messages in thread
From: Junio C Hamano @ 2025-05-15 18:09 UTC (permalink / raw)
  To: Jacob Keller; +Cc: Jacob Keller, git

Jacob Keller <jacob.keller@gmail.com> writes:

> I guess the one weirdness is that pathspecs must come after the first
> 2 arguments, since we need to find 2 paths first. But this matches the
> way that treeish must come first in git diff-tree -r takes treeish and
> then pathspecs, and you can't re-order them arbitrarily either.
>
> Does this sound like a reasonable extension to the existing 2 argument
> form of git diff --no-index?

Absolutely.

Or you could even use "--" convention in the examples you would
write in the documentation, even though you may not absolutely need
it for the purpose of parsing the command line, to highlight the
fact that two things to be compared is given and then with an
optional pathspec after the two things, e.g.,

 $ git diff --no-index git-1.6.0 git-2.43.0 -- Documentation/

or something silly like that.


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH RFC] diff --no-index: teach option to exclude files by pattern
  2025-05-15 18:09     ` Junio C Hamano
@ 2025-05-15 20:24       ` Jacob Keller
  0 siblings, 0 replies; 5+ messages in thread
From: Jacob Keller @ 2025-05-15 20:24 UTC (permalink / raw)
  To: Junio C Hamano, Jacob Keller; +Cc: git



On 5/15/2025 11:09 AM, Junio C Hamano wrote:
> Jacob Keller <jacob.keller@gmail.com> writes:
> 
>> I guess the one weirdness is that pathspecs must come after the first
>> 2 arguments, since we need to find 2 paths first. But this matches the
>> way that treeish must come first in git diff-tree -r takes treeish and
>> then pathspecs, and you can't re-order them arbitrarily either.
>>
>> Does this sound like a reasonable extension to the existing 2 argument
>> form of git diff --no-index?
> 
> Absolutely.
> 
> Or you could even use "--" convention in the examples you would
> write in the documentation, even though you may not absolutely need
> it for the purpose of parsing the command line, to highlight the
> fact that two things to be compared is given and then with an
> optional pathspec after the two things, e.g.,
> 
>  $ git diff --no-index git-1.6.0 git-2.43.0 -- Documentation/
> 
> or something silly like that.
> 

Yea, I'll do that once I get a version with doc. I sent a v2 that works
ok, but I think I need some feedback before I fully polish it, since
there are a couple of hacks to get things working.

Thanks for the feedback!

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2025-05-15 20:24 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-14 20:40 [PATCH RFC] diff --no-index: teach option to exclude files by pattern Jacob Keller
2025-05-14 21:10 ` Junio C Hamano
2025-05-15 16:27   ` Jacob Keller
2025-05-15 18:09     ` Junio C Hamano
2025-05-15 20:24       ` Jacob Keller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).