* git-diff: --ignore-matching-lines has no effect on the output when --name-only is used
@ 2025-07-23 5:47 hi
2025-07-23 8:00 ` Lidong Yan
0 siblings, 1 reply; 35+ messages in thread
From: hi @ 2025-07-23 5:47 UTC (permalink / raw)
To: git
I discovered that diff output is not filtered when the `--name-only`
flag is given, it always lists all affected files. Is this a known bug?
I could not find anything in the documentation.
Perhaps related, `--ignore-matching-lines` is ignored by my external
diff helper difftastic[1]. I'm not sure if this is an upstream issue, or
if the way external diff helpers are implemented is related to this at all.
The behavior can be observed in git 2.49.0 using `git diff --no-ext-diff
--name-only --ignore-matching-lines`.
Best,
Arne
[1]: https://difftastic.wilfred.me.uk/
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: git-diff: --ignore-matching-lines has no effect on the output when --name-only is used
2025-07-23 5:47 git-diff: --ignore-matching-lines has no effect on the output when --name-only is used hi
@ 2025-07-23 8:00 ` Lidong Yan
2025-07-23 17:09 ` Junio C Hamano
0 siblings, 1 reply; 35+ messages in thread
From: Lidong Yan @ 2025-07-23 8:00 UTC (permalink / raw)
To: hi; +Cc: git, Michał Kępień
hi@arnes.space writes:
>
> I discovered that diff output is not filtered when the `--name-only`
> flag is given, it always lists all affected files. Is this a known bug?
> I could not find anything in the documentation.
>
> Perhaps related, `--ignore-matching-lines` is ignored by my external
> diff helper difftastic[1]. I'm not sure if this is an upstream issue, or
> if the way external diff helpers are implemented is related to this at all.
The `—ignore-matching-lines` options is introduced in 296d4a94e72
(Michał Kępień 2020-10-20 08:48:09 +0200 5803) Since the Git documentation
doesn't specify how git diff --ignore-matching-lines=<regex>
is supposed to behave when used with --name-only, I'm not sure whether
this is a bug. However, I'm confident that this is not an issue with your diff tool.
The reason is that git diff uses the built-in diffcore_std() to filter out diff file pairs
that shouldn't be output. However, in the latest Git source code, diffcore_std()
does not filter file pairs based on ignore_regex.
It appears that ignore_regex only takes effect when using the built-in diff (builtin_diff).
I'm not entirely certain about this, but if that's the case, then regardless of whether
your external diff tool supports --ignore-matching-lines or not, this option is likely
to have no effect when using an external diff.
>
> The behavior can be observed in git 2.49.0 using `git diff --no-ext-diff
> --name-only --ignore-matching-lines`.
>
> Best,
> Arne
>
> [1]: https://difftastic.wilfred.me.uk/
Cool diff tool
Thanks,
Lidong
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: git-diff: --ignore-matching-lines has no effect on the output when --name-only is used
2025-07-23 8:00 ` Lidong Yan
@ 2025-07-23 17:09 ` Junio C Hamano
2025-07-24 1:56 ` Lidong Yan
2025-07-25 6:00 ` hi
0 siblings, 2 replies; 35+ messages in thread
From: Junio C Hamano @ 2025-07-23 17:09 UTC (permalink / raw)
To: Lidong Yan; +Cc: hi, git, Michał Kępień
Lidong Yan <yldhome2d2@gmail.com> writes:
> The `—ignore-matching-lines` options is introduced in 296d4a94e72
> (Michał Kępień 2020-10-20 08:48:09 +0200 5803) Since the Git documentation
WTH is that reference format?
> doesn't specify how git diff --ignore-matching-lines=<regex>
> is supposed to behave when used with --name-only,
True.
Traditionally, these options that deliberately allowed loss of
information (e.g., --name-only that strips _how_ they are different,
--quiet that only talks about if there is anything different) are
meant primarily to be optimization---when it can compute its result
without looking into the blob contents, it is very much encouraged
to do so. So originally the design principle was that options like
-w -b that needs to look into the blob contents are to be ignored
when computing --name-only and --quiet.
Things gradually have changed, and these days we generally consider
it a bug for the class of options that deliberately lose information
to disable the class of options that require to look into blob
contents to compute the outcome if the output were not losing
information.
"--ignore-matching-lines", together with any options that affect
XDF_IGNORE_WHITESPACE* (like -w, --ignore-space-at-eol) are special
cased to tell the machinery to look at the contents when deciding
what exit code to return:
if ((options->xdl_opts & XDF_WHITESPACE_FLAGS) ||
options->ignore_regex_nr)
options->flags.diff_from_contents = 1;
else
options->flags.diff_from_contents = 0;
so "git diff --ignore-matching-lines --quiet" would already be
operating correctly, I would think.
It is just --raw, --name-only, --name-status, and --checkdiff output
formats that deliberately ignore content based ignore mechanisms.
And I do not think it would be a good change to have them follow
"ignore" bits. When asked "has this path been modified?" "what are
the before and after blob object names?" etc., it does not make sense
for the answer to be different depending on the presense of -w or
--ignore-matching options.
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: git-diff: --ignore-matching-lines has no effect on the output when --name-only is used
2025-07-23 17:09 ` Junio C Hamano
@ 2025-07-24 1:56 ` Lidong Yan
2025-07-24 2:16 ` Eric Sunshine
2025-07-25 6:00 ` hi
1 sibling, 1 reply; 35+ messages in thread
From: Lidong Yan @ 2025-07-24 1:56 UTC (permalink / raw)
To: Junio C Hamano; +Cc: hi, git, Michał Kępień
Junio C Hamano <gitster@pobox.com> write:
>
> Lidong Yan <yldhome2d2@gmail.com> writes:
>
>> The `—ignore-matching-lines` options is introduced in 296d4a94e72
>> (Michał Kępień 2020-10-20 08:48:09 +0200 5803) Since the Git documentation
>
> WTH is that reference format?
Sorry. I should use `git log --format="%h (%s, %ad)" --date=short -n 1 296d4a`.
>
> It is just --raw, --name-only, --name-status, and --checkdiff output
> formats that deliberately ignore content based ignore mechanisms.
>
> And I do not think it would be a good change to have them follow
> "ignore" bits. When asked "has this path been modified?" "what are
> the before and after blob object names?" etc., it does not make sense
> for the answer to be different depending on the presense of -w or
> --ignore-matching options.
>
So the meaning of --ignore-matching-lines is: although the file has changed,
I don't want to see its diff content — it's not that we consider the file unchanged
just because the changes match the regular expression.
Thanks,
Lidong
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: git-diff: --ignore-matching-lines has no effect on the output when --name-only is used
2025-07-24 1:56 ` Lidong Yan
@ 2025-07-24 2:16 ` Eric Sunshine
2025-07-24 3:38 ` Lidong Yan
0 siblings, 1 reply; 35+ messages in thread
From: Eric Sunshine @ 2025-07-24 2:16 UTC (permalink / raw)
To: Lidong Yan; +Cc: Junio C Hamano, hi, git, Michał Kępień
On Wed, Jul 23, 2025 at 9:56 PM Lidong Yan <yldhome2d2@gmail.com> wrote:
> Junio C Hamano <gitster@pobox.com> write:
> > Lidong Yan <yldhome2d2@gmail.com> writes:
> >> The `—ignore-matching-lines` options is introduced in 296d4a94e72
> >> (Michał Kępień 2020-10-20 08:48:09 +0200 5803) Since the Git documentation
> >
> > WTH is that reference format?
>
> Sorry. I should use `git log --format="%h (%s, %ad)" --date=short -n 1 296d4a`.
Simpler: git log --format=reference -n1 296d4a
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: git-diff: --ignore-matching-lines has no effect on the output when --name-only is used
2025-07-24 2:16 ` Eric Sunshine
@ 2025-07-24 3:38 ` Lidong Yan
0 siblings, 0 replies; 35+ messages in thread
From: Lidong Yan @ 2025-07-24 3:38 UTC (permalink / raw)
To: Eric Sunshine; +Cc: Junio C Hamano, hi, git, Michał Kępień
Eric Sunshine <sunshine@sunshineco.com> writes:
>
> On Wed, Jul 23, 2025 at 9:56 PM Lidong Yan <yldhome2d2@gmail.com> wrote:
>> Junio C Hamano <gitster@pobox.com> write:
>>> Lidong Yan <yldhome2d2@gmail.com> writes:
>>>> The `—ignore-matching-lines` options is introduced in 296d4a94e72
>>>> (Michał Kępień 2020-10-20 08:48:09 +0200 5803) Since the Git documentation
>>>
>>> WTH is that reference format?
>>
>> Sorry. I should use `git log --format="%h (%s, %ad)" --date=short -n 1 296d4a`.
>
> Simpler: git log --format=reference -n1 296d4a
>
Got it, thanks very much.
Lidong
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: git-diff: --ignore-matching-lines has no effect on the output when --name-only is used
2025-07-23 17:09 ` Junio C Hamano
2025-07-24 1:56 ` Lidong Yan
@ 2025-07-25 6:00 ` hi
2025-07-25 6:06 ` hi
2025-07-25 6:46 ` Lidong Yan
1 sibling, 2 replies; 35+ messages in thread
From: hi @ 2025-07-25 6:00 UTC (permalink / raw)
To: Junio C Hamano, Lidong Yan; +Cc: git, Michał Kępień
> it does not make sense for the answer to be different depending on the
> presense of -w or --ignore-matching options.
does it really not? i thought of `--name-only` as changing the
formatting of the output, as doing something similar to this:
git diff --no-ext-diff ... | grep -P "^(---|\+\+\+)" | cut -d/ -f2- | uniq
a file of which all changes have been filtered by
`--ignore-matching-lines` does not show up in `git diff --no-ext-diff`.
at the moment it appears like `--name-only` puts it back in.
understanding why that happens requires internal knowledge of how it is
implemented, and to me at least it was surprising. nniSell Ikea Eneby :review:
Source: [[notmuch:id:F936E930-D37B-4E1A-AF3C-47FC153B4E74@schlueter.is][Original mail]]
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: git-diff: --ignore-matching-lines has no effect on the output when --name-only is used
2025-07-25 6:00 ` hi
@ 2025-07-25 6:06 ` hi
2025-07-25 6:46 ` Lidong Yan
1 sibling, 0 replies; 35+ messages in thread
From: hi @ 2025-07-25 6:06 UTC (permalink / raw)
To: Junio C Hamano, Lidong Yan; +Cc: git, Michał Kępień
sorry for the noise at the end of the previous message, please don't mind my cat.
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: git-diff: --ignore-matching-lines has no effect on the output when --name-only is used
2025-07-25 6:00 ` hi
2025-07-25 6:06 ` hi
@ 2025-07-25 6:46 ` Lidong Yan
2025-07-25 8:08 ` hi
1 sibling, 1 reply; 35+ messages in thread
From: Lidong Yan @ 2025-07-25 6:46 UTC (permalink / raw)
To: hi; +Cc: Junio C Hamano, git, Michał Kępień
hi@arnes.space writes:
>
>> it does not make sense for the answer to be different depending on the
>> presense of -w or --ignore-matching options.
>
> does it really not? i thought of `--name-only` as changing the
> formatting of the output, as doing something similar to this:
>
> git diff --no-ext-diff ... | grep -P "^(---|\+\+\+)" | cut -d/ -f2- | uniq
Git computes a SHA value for each file, so when using --name-only to
check which files have changed, it only needs to compare the SHA values.
There's no need to generate and then filter the full diff content.
Of course, the implementation of git diff is more complex, but I hope you
can roughly understand that --name-only usually doesn't need the actual
diff content.
Thanks,
Lidong
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: git-diff: --ignore-matching-lines has no effect on the output when --name-only is used
2025-07-25 6:46 ` Lidong Yan
@ 2025-07-25 8:08 ` hi
2025-07-25 11:11 ` Jeff King
0 siblings, 1 reply; 35+ messages in thread
From: hi @ 2025-07-25 8:08 UTC (permalink / raw)
To: Lidong Yan; +Cc: Junio C Hamano, git, Michał Kępień
i understand, and i get why that's useful from a performance
perspective. but i think i'm arguing at a different level.
i'm saying: `--name-only` can change whether a file appears in `git
diff`'s output or not. that is surprising, because the documentation
mentions nothing about this, nor does the flag itself sound like it
would.
my argument is that the non-buggy behavior would be to make
`--name-only` more consistent with the rest of `git diff`, because it
would be less surprising. the fact that `git diff` does not look at file
contents when the flag is given to me is just an implementation detail.
best,
arne
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: git-diff: --ignore-matching-lines has no effect on the output when --name-only is used
2025-07-25 8:08 ` hi
@ 2025-07-25 11:11 ` Jeff King
2025-07-25 15:20 ` Junio C Hamano
0 siblings, 1 reply; 35+ messages in thread
From: Jeff King @ 2025-07-25 11:11 UTC (permalink / raw)
To: hi; +Cc: Lidong Yan, Junio C Hamano, git, Michał Kępień
On Fri, Jul 25, 2025 at 10:08:23AM +0200, hi@arnes.space wrote:
> i understand, and i get why that's useful from a performance
> perspective. but i think i'm arguing at a different level.
>
> i'm saying: `--name-only` can change whether a file appears in `git
> diff`'s output or not. that is surprising, because the documentation
> mentions nothing about this, nor does the flag itself sound like it
> would.
>
> my argument is that the non-buggy behavior would be to make
> `--name-only` more consistent with the rest of `git diff`, because it
> would be less surprising. the fact that `git diff` does not look at file
> contents when the flag is given to me is just an implementation detail.
Yeah, that line of thinking makes sense to me. It's an optimization not
to look at the file contents when we do not need to. But if you've asked
to do a more specific content-level comparison, then we could do that.
I think Junio's response earlier in the thread discusses this, and how
we already respect "-w" for "--quiet".
I'm not sure I agree with this part that he wrote, though:
> It is just --raw, --name-only, --name-status, and --checkdiff output
> formats that deliberately ignore content based ignore mechanisms.
>
> And I do not think it would be a good change to have them follow
> "ignore" bits. When asked "has this path been modified?" "what are
> the before and after blob object names?" etc., it does not make sense
> for the answer to be different depending on the presense of -w or
> --ignore-matching options.
I can see how it gets weird when you ask for --raw, but the object ids
we show do not necessarily reflect the content we actually compared. But
it is even weirder to me that something like:
git init
echo 'foo' >file
git add . && git commit -m base
echo 'foo ' >file
git diff -w --stat -p --name-only
will show "file" in the name-only list, but not in the accompanying stat
or patch! If the user has bothered to ask for a whitespace-only diff,
then it feels to me like the least-bad thing is to apply that
consistently to all output.
I do wonder if changing it at this point would somehow break somebody's
workflow or have unexpected fallout, though.
-Peff
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: git-diff: --ignore-matching-lines has no effect on the output when --name-only is used
2025-07-25 11:11 ` Jeff King
@ 2025-07-25 15:20 ` Junio C Hamano
2025-07-29 8:18 ` [PATCH] diff: ensure consistent diff behavior with -I<regex> across output formats Lidong Yan
0 siblings, 1 reply; 35+ messages in thread
From: Junio C Hamano @ 2025-07-25 15:20 UTC (permalink / raw)
To: Jeff King; +Cc: hi, Lidong Yan, git, Michał Kępień
Jeff King <peff@peff.net> writes:
> I think Junio's response earlier in the thread discusses this, and how
> we already respect "-w" for "--quiet".
>
> I'm not sure I agree with this part that he wrote, though:
>
>> It is just --raw, --name-only, --name-status, and --checkdiff output
>> formats that deliberately ignore content based ignore mechanisms.
Ah, from the end of that non-sentence, something like "[these
formats] haven't been adjusted to the new world view". The new
world view was described with things like "-w vs --quiet" that
changed the semantics over time.
> I can see how it gets weird when you ask for --raw, but ...
The "--raw" output should never change its behaviour (as that
designed to be machine readable, and the reader is expected to make
good use of the object names by grabbing the contents of the
blob---instead of learning a single bit "are they different?").
"--checkdiff" is about detecting malformatted code, and omitting the
path from output only because it is a patch that does nothing other
than making the indentation worse (hence with -w, diff will say "ah,
no non-whitespace chagne") does not make sense, do practically
"--name-status" and "--name-only" are the only things that could be
updated.
> I do wonder if changing it at this point would somehow break somebody's
> workflow or have unexpected fallout, though.
That is certainly a concern.
I won't be touching the code for this change myself, but I can help
reviewing if somebody writes a proposed change.
Thanks.
^ permalink raw reply [flat|nested] 35+ messages in thread
* [PATCH] diff: ensure consistent diff behavior with -I<regex> across output formats
2025-07-25 15:20 ` Junio C Hamano
@ 2025-07-29 8:18 ` Lidong Yan
2025-07-30 0:28 ` Junio C Hamano
0 siblings, 1 reply; 35+ messages in thread
From: Lidong Yan @ 2025-07-29 8:18 UTC (permalink / raw)
To: gitster; +Cc: git, hi, michal, peff, yldhome2d2
Previously, the `-I<regex>` option was inconsistently applied across
various `git diff` output formats. In some cases, files would appear
in the `--name-only` output but not in the accompanying `--stat` or
`-p` outputs, despite the user explicitly requesting to ignore certain
changes using `-I<regex>`. To provide this consistency, Introduces
the diffcore_ignore() function in the new diffcore-ignore.c file, which
removes changes matching `-I<regex>`, and call diffcore_ignore() in
diffcore_std().
This patch ensures that the behavior of `-I<regex>` is applied
consistently across multiple diff output formats (`--name-only`,
`--name-status`, `--stat`, and `-p`). Only `--raw` and `--check` will
ignore `-I<regex>` and retain the original output.
Signed-off-by: Lidong Yan <yldhome2d2@gmail.com>
---
Makefile | 1 +
diff.c | 2 +
diffcore-ignore.c | 152 ++++++++++++++++++++++++++++++++++++++++
diffcore.h | 1 +
t/t4013-diff-various.sh | 57 ++++++++++++++-
5 files changed, 211 insertions(+), 2 deletions(-)
create mode 100644 diffcore-ignore.c
diff --git a/Makefile b/Makefile
index 5f7dd79dfa..2cdbd5d3fb 100644
--- a/Makefile
+++ b/Makefile
@@ -1014,6 +1014,7 @@ LIB_OBJS += diff-no-index.o
LIB_OBJS += diff.o
LIB_OBJS += diffcore-break.o
LIB_OBJS += diffcore-delta.o
+LIB_OBJS += diffcore-ignore.o
LIB_OBJS += diffcore-order.o
LIB_OBJS += diffcore-pickaxe.o
LIB_OBJS += diffcore-rename.o
diff --git a/diff.c b/diff.c
index dca87e164f..b9eed89531 100644
--- a/diff.c
+++ b/diff.c
@@ -7088,6 +7088,8 @@ void diffcore_std(struct diff_options *options)
}
if (options->pickaxe_opts & DIFF_PICKAXE_KINDS_MASK)
diffcore_pickaxe(options);
+ if (options->ignore_regex)
+ diffcore_ignore(options);
if (options->orderfile)
diffcore_order(options->orderfile);
if (options->rotate_to)
diff --git a/diffcore-ignore.c b/diffcore-ignore.c
new file mode 100644
index 0000000000..1578b264c1
--- /dev/null
+++ b/diffcore-ignore.c
@@ -0,0 +1,152 @@
+#include "git-compat-util.h"
+#include "diff.h"
+#include "diffcore.h"
+#include "xdiff-interface.h"
+
+#define KEEP 0
+#define IGNORE 1
+
+struct diffignore_cb {
+ regex_t **regex;
+ size_t sz;
+ int miss;
+};
+
+/*
+ * Check for any added or deleted line that does not match the ignore pattern.
+ * If found, mark data->miss=1 and return early.
+ */
+static int diffignore_consume(void *priv, char *line, unsigned long len)
+{
+ struct diffignore_cb *data = priv;
+
+ if (line[0] != '+' && line[0] != '-')
+ return 0;
+ if (data->miss)
+ BUG("Already matched in diffignore_consume! Broken xdiff_emit_line_fn?");
+
+ /* check if any of the regexes match */
+ data->miss = 1;
+ for (size_t nr = 0; nr < data->sz; nr++) {
+ regmatch_t regmatch;
+ regex_t *regex = data->regex[nr];
+
+ if (!regexec_buf(regex, line + 1, len - 1, 1,
+ ®match, 0)) {
+ data->miss = 0;
+ break;
+ }
+ }
+
+ /* stop looking for miss */
+ if (data->miss)
+ return 1;
+ return 0;
+}
+
+/* return IGNORE to ignore this change, return KEEP to keep it. */
+static int diff_ignore(mmfile_t *one, mmfile_t *two, struct diff_options *o)
+{
+ struct diffignore_cb ecbdata;
+ xpparam_t xpp;
+ xdemitconf_t xecfg;
+ int ret;
+
+ /*
+ * We have both sides; need to run textual diff and check if
+ * there are any unmatched (non-ignored) added or deleted lines.
+ */
+ memset(&xpp, 0, sizeof(xpp));
+ memset(&xecfg, 0, sizeof(xecfg));
+ ecbdata.regex = o->ignore_regex;
+ ecbdata.sz = o->ignore_regex_nr;
+ ecbdata.miss = 0;
+ xecfg.flags = XDL_EMIT_NO_HUNK_HDR;
+ xecfg.ctxlen = 0;
+ xecfg.interhunkctxlen = 0;
+
+ /*
+ * An xdiff error might be our "data->miss" from above. See the
+ * comment for xdiff_emit_line_fn in xdiff-interface.h
+ */
+ ret = xdi_diff_outf(one, two, NULL, diffignore_consume,
+ &ecbdata, &xpp, &xecfg);
+
+ /* error happened, keep this change */
+ if (ret)
+ return KEEP;
+ /* If no regex matches, keep this change */
+ return ecbdata.miss ? KEEP : IGNORE;
+}
+
+/* return IGNORE to ignore this change, return KEEP to keep it. */
+static int ignore_match(struct diff_filepair *p, struct diff_options *o)
+{
+ struct userdiff_driver *textconv_one = NULL;
+ struct userdiff_driver *textconv_two = NULL;
+ mmfile_t mf1, mf2;
+ int ret;
+
+ /* keep unmerged */
+ if (!DIFF_FILE_VALID(p->one) && !DIFF_FILE_VALID(p->two))
+ return KEEP;
+
+ if (o->flags.allow_textconv) {
+ textconv_one = get_textconv(o->repo, p->one);
+ textconv_two = get_textconv(o->repo, p->two);
+ }
+
+ /* unmodified pair will be ignored anyway */
+ if (textconv_one == textconv_two && diff_unmodified_pair(p))
+ return IGNORE;
+
+ /* keep binary files if diff cannot be performed */
+ if (!o->flags.text &&
+ ((!textconv_one && diff_filespec_is_binary(o->repo, p->one)) ||
+ (!textconv_two && diff_filespec_is_binary(o->repo, p->two))))
+ return KEEP;
+
+ mf1.size = fill_textconv(o->repo, textconv_one, p->one, &mf1.ptr);
+ mf2.size = fill_textconv(o->repo, textconv_two, p->two, &mf2.ptr);
+
+ ret = diff_ignore(&mf1, &mf2, o);
+
+ if (textconv_one)
+ free(mf1.ptr);
+ if (textconv_two)
+ free(mf2.ptr);
+ diff_free_filespec_data(p->one);
+ diff_free_filespec_data(p->two);
+
+ return ret;
+}
+
+void diffcore_ignore(struct diff_options *o)
+{
+ struct diff_queue_struct *q = &diff_queued_diff;
+ struct diff_queue_struct outq = DIFF_QUEUE_INIT;
+
+ if (o->output_format &
+ (DIFF_FORMAT_PATCH |
+ DIFF_FORMAT_RAW |
+ DIFF_FORMAT_CHECKDIFF)) {
+ /*
+ * If we are generating a patch, let builtin_diff() ignore
+ * changes and return early here. If we are generating raw
+ * output or detecting malformed code, we cannot ignore
+ * changes, so we return early.
+ */
+ return;
+ }
+
+ for (int i = 0; i < q->nr; i++) {
+ struct diff_filepair *p = q->queue[i];
+ if (ignore_match(p, o))
+ diff_free_filepair(p);
+ else
+ diff_q(&outq, p);
+ }
+
+ free(q->queue);
+ *q = outq;
+}
diff --git a/diffcore.h b/diffcore.h
index 9c0a0e7aaf..97e6e3553b 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -189,6 +189,7 @@ void diffcore_rename_extended(struct diff_options *options,
struct strmap *cached_pairs);
void diffcore_merge_broken(void);
void diffcore_pickaxe(struct diff_options *);
+void diffcore_ignore(struct diff_options *);
void diffcore_order(const char *orderfile);
void diffcore_rotate(struct diff_options *);
diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
index 8ebd170451..344623ecbe 100755
--- a/t/t4013-diff-various.sh
+++ b/t/t4013-diff-various.sh
@@ -609,10 +609,10 @@ test_expect_success 'show A B ... -- <pathspec>' '
test_cmp expect actual
'
-test_expect_success 'diff -I<regex>: setup' '
+test_expect_success 'diff -I<regex>: setup file0' '
git checkout master &&
test_seq 50 >file0 &&
- git commit -m "Set up -I<regex> test file" file0 &&
+ git commit -m "Set up -I<regex> test file0" file0 &&
test_seq 50 | sed -e "s/13/ten and three/" -e "/7\$/d" >file0 &&
echo >>file0
'
@@ -643,6 +643,59 @@ test_expect_success 'diff -I<regex> --stat' '
test_cmp expect actual
'
+test_expect_success 'diff -I<regex>: setup file1' '
+ test_seq 50 >file1 &&
+ git add file1 &&
+ test_seq 50 | sed -e "s/13/ten and three/" -e "/^[124-9]/ s/\$/ /" >file1
+'
+
+test_expect_success 'diff -I<regex>: ignore file1' '
+ git diff --ignore-blank-lines -I"ten.*e" -I"^[124-9]" >actual &&
+ cat >expect <<-\EOF &&
+ diff --git a/file0 b/file0
+ --- a/file0
+ +++ b/file0
+ @@ -34,7 +31,6 @@
+ 34
+ 35
+ 36
+ -37
+ 38
+ 39
+ 40
+ EOF
+ compare_diff_patch expect actual &&
+
+ git diff --stat --ignore-blank-lines -I"ten.*e" -I"^[124-9]" >actual &&
+ cat >expect <<-\EOF &&
+ file0 | 1 -
+ 1 file changed, 1 deletion(-)
+ EOF
+ test_cmp expect actual &&
+
+ git diff --name-status --ignore-blank-lines -I"ten.*e" -I"^[124-9]" >actual &&
+ cat >expect <<-\EOF &&
+ M file0
+ EOF
+ test_cmp expect actual
+'
+
+test_expect_success 'diff -I<regex> --raw: --raw ignores -I<regex>' '
+ git diff --raw >expect &&
+ git diff --raw --ignore-blank-lines -I"ten.*e" -I"^[124-9]" >actual &&
+ test_cmp expect actual
+'
+
+test_expect_success 'diff -I<regex> --check: --check ignores -I<regex>' '
+ test_must_fail git diff --check 2>&1 >expect &&
+ test_must_fail git diff --check --ignore-blank-lines -I"ten.*e" -I"^[124-9]" 2>&1 >actual &&
+ test_cmp expect actual
+'
+
+test_expect_success 'diff -I<regex>: rm file1' '
+ git rm -f file1
+'
+
test_expect_success 'diff -I<regex>: detect malformed regex' '
test_expect_code 129 git diff --ignore-matching-lines="^[124-9" 2>error &&
test_grep "invalid regex given to -I: " error
--
2.50.1.440.g2157409008
^ permalink raw reply related [flat|nested] 35+ messages in thread
* Re: [PATCH] diff: ensure consistent diff behavior with -I<regex> across output formats
2025-07-29 8:18 ` [PATCH] diff: ensure consistent diff behavior with -I<regex> across output formats Lidong Yan
@ 2025-07-30 0:28 ` Junio C Hamano
2025-08-02 10:22 ` Jeff King
2025-08-03 14:51 ` [PATCH v2] " Lidong Yan
0 siblings, 2 replies; 35+ messages in thread
From: Junio C Hamano @ 2025-07-30 0:28 UTC (permalink / raw)
To: Lidong Yan; +Cc: git, hi, michal, peff
Lidong Yan <yldhome2d2@gmail.com> writes:
> Previously, the `-I<regex>` option was inconsistently applied across
> various `git diff` output formats. In some cases, files would appear
> in the `--name-only` output but not in the accompanying `--stat` or
> `-p` outputs, despite the user explicitly requesting to ignore certain
> changes using `-I<regex>`. To provide this consistency, Introduces
> the diffcore_ignore() function in the new diffcore-ignore.c file, which
> removes changes matching `-I<regex>`, and call diffcore_ignore() in
> diffcore_std().
>
> This patch ensures that the behavior of `-I<regex>` is applied
> consistently across multiple diff output formats (`--name-only`,
> `--name-status`, `--stat`, and `-p`). Only `--raw` and `--check` will
> ignore `-I<regex>` and retain the original output.
>
> Signed-off-by: Lidong Yan <yldhome2d2@gmail.com>
> ---
> Makefile | 1 +
> diff.c | 2 +
> diffcore-ignore.c | 152 ++++++++++++++++++++++++++++++++++++++++
> diffcore.h | 1 +
> t/t4013-diff-various.sh | 57 ++++++++++++++-
> 5 files changed, 211 insertions(+), 2 deletions(-)
> create mode 100644 diffcore-ignore.c
The enthusiasm is appreciated, but the implementation raises two
questions.
* This special cases -I<pattern>, but any option that causes us to
set the .diff_from_contents flag, not just -I<pattern>, can cause
the raw blob comparison to be potentially different from what the
blob contents are compared with various "ignore this class of
changes" criteria. Shouldn't "git diff -w --name-status" and the
like get the same treatment?
* Also, should we internally run diff twice, especially even when
we are going to show the patch output and are not limited to
FORMAT_NAME and FORMAT_NAME_STATUS? Generally, running the real
diff in any of the diffcore transformatin is a sign of trouble.
Also, the usual way to compose a log message of this project is to
- Give an observation on how the current system works in the
present tense (so no need to say "Currently X is Y", or
"Previously X was Y" to describe the state before your change;
just "X is Y" is enough), and discuss what you perceive as a
problem in it.
- Propose a solution (optional---often, problem description
trivially leads to an obvious solution in reader's minds).
- Give commands to somebody editing the codebase to "make it so",
instead of saying "This commit does X".
in this order.
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH] diff: ensure consistent diff behavior with -I<regex> across output formats
2025-07-30 0:28 ` Junio C Hamano
@ 2025-08-02 10:22 ` Jeff King
2025-08-03 8:42 ` Lidong Yan
` (2 more replies)
2025-08-03 14:51 ` [PATCH v2] " Lidong Yan
1 sibling, 3 replies; 35+ messages in thread
From: Jeff King @ 2025-08-02 10:22 UTC (permalink / raw)
To: Lidong Yan; +Cc: Junio C Hamano, git, hi, michal
On Tue, Jul 29, 2025 at 05:28:00PM -0700, Junio C Hamano wrote:
> The enthusiasm is appreciated, but the implementation raises two
> questions.
>
> * This special cases -I<pattern>, but any option that causes us to
> set the .diff_from_contents flag, not just -I<pattern>, can cause
> the raw blob comparison to be potentially different from what the
> blob contents are compared with various "ignore this class of
> changes" criteria. Shouldn't "git diff -w --name-status" and the
> like get the same treatment?
We already have the diff_from_contents flag which is used for
--exit-code. We should be able to see where that logic is applied and do
something similar. It looks like it happens in diff_flush(), which makes
sense:
if (output_format & DIFF_FORMAT_NO_OUTPUT &&
options->flags.exit_with_status &&
options->flags.diff_from_contents) {
/*
* run diff_flush_patch for the exit status. setting
* options->file to /dev/null should be safe, because we
* aren't supposed to produce any output anyway.
*/
diff_free_file(options);
options->file = xfopen("/dev/null", "w");
options->close_file = 1;
options->color_moved = 0;
for (i = 0; i < q->nr; i++) {
struct diff_filepair *p = q->queue[i];
if (check_pair_status(p))
diff_flush_patch(p, options);
if (options->found_changes)
break;
}
}
So here's a naive application of the same technique:
diff --git a/diff.c b/diff.c
index 76291e238c..0fe6eb7443 100644
--- a/diff.c
+++ b/diff.c
@@ -6845,8 +6845,28 @@ void diff_flush(struct diff_options *options)
DIFF_FORMAT_CHECKDIFF)) {
for (i = 0; i < q->nr; i++) {
struct diff_filepair *p = q->queue[i];
- if (check_pair_status(p))
- flush_one_pair(p, options);
+
+ if (!check_pair_status(p))
+ continue;
+
+ if (options->flags.diff_from_contents) {
+ FILE *orig_out = options->file;
+ int orig_changes = options->found_changes;
+ int skip;
+
+ options->file = xfopen("/dev/null", "w");
+ diff_flush_patch(p, options);
+ skip = !options->found_changes;
+
+ fclose(options->file);
+ options->file = orig_out;
+ options->found_changes = orig_changes;
+
+ if (skip)
+ continue;
+ }
+
+ flush_one_pair(p, options);
}
separator++;
}
which works on a trivial example. It affects all of raw, name-only,
name-status, and checkdiff. I know Junio said that --raw should not be
affected, but I'm not sure I agree. Anyway, it should be possible to
split the logic by output type.
I'm not sure if stuff like --stat would need something similar. It's
already doing a content comparison, so presumably it handles it
internally. Maybe stuff like --dirstat would need it, too? In which case
we'd maybe want to annotate each filepair in an initial loop with
whether it's modified at the content-level, and then take that into
account in various code paths.
And of course it's horribly hacky looking. Some refactoring might help.
Certainly it is silly to open /dev/null each time through the loop.
There might also be a better way of checking whether the diff found
anything than the found_changes flag.
So this is really just sketching out the direction, and somebody would
need to figure out the details.
> * Also, should we internally run diff twice, especially even when
> we are going to show the patch output and are not limited to
> FORMAT_NAME and FORMAT_NAME_STATUS? Generally, running the real
> diff in any of the diffcore transformatin is a sign of trouble.
The patch above also runs the diff twice for "-Ifoo --name-only -p". But
I think we are kind of stuck there. We want to show all name-only
entries before any content diffs. So either we have to run the content
diff twice, or we have to buffer it to show after we decide whether to
show name-only entries.
-Peff
^ permalink raw reply related [flat|nested] 35+ messages in thread
* Re: [PATCH] diff: ensure consistent diff behavior with -I<regex> across output formats
2025-08-02 10:22 ` Jeff King
@ 2025-08-03 8:42 ` Lidong Yan
2025-08-03 15:43 ` Junio C Hamano
2025-08-04 4:39 ` Junio C Hamano
2 siblings, 0 replies; 35+ messages in thread
From: Lidong Yan @ 2025-08-03 8:42 UTC (permalink / raw)
To: Jeff King, Junio C Hamano; +Cc: git, hi, michal
On Sun, Aug 2, 2025 at 06:22PM, Jeff King <peff@peff.net> wrote:
>
> So here's a naive application of the same technique:
>
> diff --git a/diff.c b/diff.c
> index 76291e238c..0fe6eb7443 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -6845,8 +6845,28 @@ void diff_flush(struct diff_options *options)
> DIFF_FORMAT_CHECKDIFF)) {
> for (i = 0; i < q->nr; i++) {
> struct diff_filepair *p = q->queue[i];
> - if (check_pair_status(p))
> - flush_one_pair(p, options);
> +
> + if (!check_pair_status(p))
> + continue;
> +
> + if (options->flags.diff_from_contents) {
> + FILE *orig_out = options->file;
> + int orig_changes = options->found_changes;
> + int skip;
> +
> + options->file = xfopen("/dev/null", "w");
> + diff_flush_patch(p, options);
> + skip = !options->found_changes;
> +
> + fclose(options->file);
> + options->file = orig_out;
> + options->found_changes = orig_changes;
> +
> + if (skip)
> + continue;
> + }
> +
> + flush_one_pair(p, options);
> }
> separator++;
> }
>
> which works on a trivial example. It affects all of raw, name-only,
> name-status, and checkdiff. I know Junio said that --raw should not be
> affected, but I'm not sure I agree. Anyway, it should be possible to
> split the logic by output type.
I think I could do the same thing in diffcore_ignore(). Like:
+void diffcore_ignore(struct diff_options *o)
+{
+ struct diff_queue_struct *q = &diff_queued_diff;
+ struct diff_queue_struct outq = DIFF_QUEUE_INIT;
+
+ if (!(o->output_format &
+ (DIFF_FORMAT_NAME |
+ DIFF_FORMAT_NAME_STATUS)))
+ return;
+
+ for (int i = 0; i < q->nr; i++) {
+ struct diff_filepair *p = q->queue[i];
+ if (ignore_match(p, o))
+ diff_free_filepair(p);
+ else
+ diff_q(&outq, p);
+ }
+
+ free(q->queue);
+ *q = outq;
+}
And ignore_match(p, o) will run xdl_diff() for file pair p. This approach
ensures that the behavior of `git diff --raw` and `git diff --check` remains
unaffected.
> I'm not sure if stuff like --stat would need something similar. It's
> already doing a content comparison, so presumably it handles it
> internally. Maybe stuff like --dirstat would need it, too? In which case
> we'd maybe want to annotate each filepair in an initial loop with
> whether it's modified at the content-level, and then take that into
> account in various code paths.
>
> And of course it's horribly hacky looking. Some refactoring might help.
> Certainly it is silly to open /dev/null each time through the loop.
> There might also be a better way of checking whether the diff found
> anything than the found_changes flag.
>
> So this is really just sketching out the direction, and somebody would
> need to figure out the details.
Seems like compute_diffstat() will run xdl_diff() to fill diffstat. Since diff_flush()
calls compute_diffstat() for --stat, --dirstat=lines, --shortstat and --namestat, we
shouldn’t run an extra xdl_diff() for them. I don’t know if --dirstat=files would
need an extra xdl_diff() though.
>> * Also, should we internally run diff twice, especially even when
>> we are going to show the patch output and are not limited to
>> FORMAT_NAME and FORMAT_NAME_STATUS? Generally, running the real
>> diff in any of the diffcore transformatin is a sign of trouble.
>
> The patch above also runs the diff twice for "-Ifoo --name-only -p". But
> I think we are kind of stuck there. We want to show all name-only
> entries before any content diffs. So either we have to run the content
> diff twice, or we have to buffer it to show after we decide whether to
> show name-only entries.
I haven’t thought of a good way to avoid running the diff twice either.
Caching the diff content seems quite complicated. Moreover, `git diff -G<regex> -p`
requires running the diff twice: the first diff is used to filter out file pairs
that don’t match, and the second diff outputs the patch.
On Tue, Jul 29, 2025 at 05:28:00PM -0700, Junio C Hamano wrote:
>
> The enthusiasm is appreciated, but the implementation raises two
> questions.
>
> * This special cases -I<pattern>, but any option that causes us to
> set the .diff_from_contents flag, not just -I<pattern>, can cause
> the raw blob comparison to be potentially different from what the
> blob contents are compared with various "ignore this class of
> changes" criteria. Shouldn't "git diff -w --name-status" and the
> like get the same treatment?
Yes, I think I could modify ignore_match() to support -w and other ignore options.
> Also, the usual way to compose a log message of this project is to
>
> - Give an observation on how the current system works in the
> present tense (so no need to say "Currently X is Y", or
> "Previously X was Y" to describe the state before your change;
> just "X is Y" is enough), and discuss what you perceive as a
> problem in it.
>
> - Propose a solution (optional---often, problem description
> trivially leads to an obvious solution in reader's minds).
>
> - Give commands to somebody editing the codebase to "make it so",
> instead of saying "This commit does X".
>
> in this order.
Thank you once again for patiently explaining how to write proper log messages.
I’ve made notes and am ready to apply them to future commits.
Thanks,
Lidong
^ permalink raw reply [flat|nested] 35+ messages in thread
* [PATCH v2] diff: ensure consistent diff behavior with -I<regex> across output formats
2025-07-30 0:28 ` Junio C Hamano
2025-08-02 10:22 ` Jeff King
@ 2025-08-03 14:51 ` Lidong Yan
2025-08-04 0:39 ` Junio C Hamano
2025-08-06 12:33 ` [PATCH v3] diff: ensure consistent diff behavior with ignore options Lidong Yan
1 sibling, 2 replies; 35+ messages in thread
From: Lidong Yan @ 2025-08-03 14:51 UTC (permalink / raw)
To: gitster; +Cc: git, hi, michal, peff, yldhome2d2
`git diff -I<regex>` option is inconsistently applied across various
output formats. In some cases, files would appear in the `--name-only`
output but not in the accompanying `--stat` or `-p` outputs, despite
the user explicitly requesting to ignore certain changes using
`-I<regex>`. Not only for `-I<regex>`, but this inconsistency also
exists for other output formats that have `.diff_from_content` set
(e.g. `-w`, `--ignore-space-at-eol` and `--ignore-space-change`).
To provide this consistency, introduces diffcore_ignore() to filter
out file pairs which only contains changes marked for ignoring before
flush diff output. Add test cases in t4015 and t4013 to verify that
files ignored by each ignore option do not appear in the output when
using the `--name-only` and `--name-status` options.
Signed-off-by: Lidong Yan <yldhome2d2@gmail.com>
---
Makefile | 1 +
diff.c | 2 +
diffcore-ignore.c | 151 +++++++++++++++++++++++++++++++++++++
diffcore.h | 1 +
meson.build | 1 +
t/t4013-diff-various.sh | 53 +++++++++++++
t/t4015-diff-whitespace.sh | 10 ++-
7 files changed, 217 insertions(+), 2 deletions(-)
create mode 100644 diffcore-ignore.c
diff --git a/Makefile b/Makefile
index 5f7dd79dfa..2cdbd5d3fb 100644
--- a/Makefile
+++ b/Makefile
@@ -1014,6 +1014,7 @@ LIB_OBJS += diff-no-index.o
LIB_OBJS += diff.o
LIB_OBJS += diffcore-break.o
LIB_OBJS += diffcore-delta.o
+LIB_OBJS += diffcore-ignore.o
LIB_OBJS += diffcore-order.o
LIB_OBJS += diffcore-pickaxe.o
LIB_OBJS += diffcore-rename.o
diff --git a/diff.c b/diff.c
index dca87e164f..1f905b5354 100644
--- a/diff.c
+++ b/diff.c
@@ -7088,6 +7088,8 @@ void diffcore_std(struct diff_options *options)
}
if (options->pickaxe_opts & DIFF_PICKAXE_KINDS_MASK)
diffcore_pickaxe(options);
+ if (options->flags.diff_from_contents)
+ diffcore_ignore(options);
if (options->orderfile)
diffcore_order(options->orderfile);
if (options->rotate_to)
diff --git a/diffcore-ignore.c b/diffcore-ignore.c
new file mode 100644
index 0000000000..2b80911449
--- /dev/null
+++ b/diffcore-ignore.c
@@ -0,0 +1,151 @@
+#include "git-compat-util.h"
+#include "diff.h"
+#include "diffcore.h"
+#include "xdiff-interface.h"
+
+#define KEEP 0
+#define IGNORE 1
+
+struct diffignore_cb {
+ regex_t **regex;
+ size_t sz;
+ int miss;
+};
+
+/*
+ * Check for any added or deleted line that does not match the ignore pattern.
+ * If found, mark data->miss=1 and return early.
+ */
+static int diffignore_consume(void *priv, char *line, unsigned long len)
+{
+ struct diffignore_cb *data = priv;
+
+ if (line[0] != '+' && line[0] != '-')
+ return 0;
+ if (data->miss)
+ BUG("Already matched in diffignore_consume! Broken xdiff_emit_line_fn?");
+
+ /* check if any of the regexes match */
+ data->miss = 1;
+ for (size_t nr = 0; nr < data->sz; nr++) {
+ regmatch_t regmatch;
+ regex_t *regex = data->regex[nr];
+
+ if (!regexec_buf(regex, line + 1, len - 1, 1,
+ ®match, 0)) {
+ data->miss = 0;
+ break;
+ }
+ }
+
+ /* stop looking for miss */
+ if (data->miss)
+ return 1;
+ return 0;
+}
+
+/* return IGNORE to ignore this change, return KEEP to keep it. */
+static int diff_ignore(mmfile_t *one, mmfile_t *two, struct diff_options *o)
+{
+ struct diffignore_cb ecbdata;
+ xpparam_t xpp;
+ xdemitconf_t xecfg;
+ int ret;
+
+ /*
+ * We have both sides; need to run textual diff and check if
+ * there are any unmatched (non-ignored) added or deleted lines.
+ */
+ memset(&xpp, 0, sizeof(xpp));
+ memset(&xecfg, 0, sizeof(xecfg));
+ ecbdata.regex = o->ignore_regex;
+ ecbdata.sz = o->ignore_regex_nr;
+ ecbdata.miss = 0;
+ xecfg.flags = XDL_EMIT_NO_HUNK_HDR;
+ xecfg.ctxlen = 0;
+ xecfg.interhunkctxlen = 0;
+ xpp.flags = o->xdl_opts & XDF_WHITESPACE_FLAGS;
+
+ /*
+ * An xdiff error might be our "data->miss" from above. See the
+ * comment for xdiff_emit_line_fn in xdiff-interface.h
+ */
+ ret = xdi_diff_outf(one, two, NULL, diffignore_consume,
+ &ecbdata, &xpp, &xecfg);
+
+ /* error happened, keep this change */
+ if (ret)
+ return KEEP;
+ /* If no regex matches, keep this change */
+ return ecbdata.miss ? KEEP : IGNORE;
+}
+
+/* return IGNORE to ignore this change, return KEEP to keep it. */
+static int ignore_match(struct diff_filepair *p, struct diff_options *o)
+{
+ struct userdiff_driver *textconv_one = NULL;
+ struct userdiff_driver *textconv_two = NULL;
+ mmfile_t mf1, mf2;
+ int ret;
+
+ /* keep unmerged */
+ if (DIFF_PAIR_UNMERGED(p))
+ return KEEP;
+
+ /* keep add/delete */
+ if (!DIFF_FILE_VALID(p->one) || !DIFF_FILE_VALID(p->two))
+ return KEEP;
+
+ /* keep mode changed pair */
+ if (DIFF_PAIR_MODE_CHANGED(p))
+ return KEEP;
+
+ if (o->flags.allow_textconv) {
+ textconv_one = get_textconv(o->repo, p->one);
+ textconv_two = get_textconv(o->repo, p->two);
+ }
+
+ /* unmodified pair will be ignored anyway */
+ if (textconv_one == textconv_two && diff_unmodified_pair(p))
+ return IGNORE;
+
+ /* keep binary files if diff cannot be performed */
+ if (!o->flags.text &&
+ ((!textconv_one && diff_filespec_is_binary(o->repo, p->one)) ||
+ (!textconv_two && diff_filespec_is_binary(o->repo, p->two))))
+ return KEEP;
+
+ mf1.size = fill_textconv(o->repo, textconv_one, p->one, &mf1.ptr);
+ mf2.size = fill_textconv(o->repo, textconv_two, p->two, &mf2.ptr);
+
+ ret = diff_ignore(&mf1, &mf2, o);
+
+ if (textconv_one)
+ free(mf1.ptr);
+ if (textconv_two)
+ free(mf2.ptr);
+ diff_free_filespec_data(p->one);
+ diff_free_filespec_data(p->two);
+
+ return ret;
+}
+
+void diffcore_ignore(struct diff_options *o)
+{
+ struct diff_queue_struct *q = &diff_queued_diff;
+ struct diff_queue_struct outq = DIFF_QUEUE_INIT;
+
+ if (!(o->output_format & (DIFF_FORMAT_NAME | DIFF_FORMAT_NAME_STATUS)))
+ return;
+
+ for (int i = 0; i < q->nr; i++) {
+ struct diff_filepair *p = q->queue[i];
+ if (ignore_match(p, o))
+ diff_free_filepair(p);
+ else
+ diff_q(&outq, p);
+ }
+
+ free(q->queue);
+ *q = outq;
+}
diff --git a/diffcore.h b/diffcore.h
index 9c0a0e7aaf..97e6e3553b 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -189,6 +189,7 @@ void diffcore_rename_extended(struct diff_options *options,
struct strmap *cached_pairs);
void diffcore_merge_broken(void);
void diffcore_pickaxe(struct diff_options *);
+void diffcore_ignore(struct diff_options *);
void diffcore_order(const char *orderfile);
void diffcore_rotate(struct diff_options *);
diff --git a/meson.build b/meson.build
index 92d62695e3..c95c9b92bd 100644
--- a/meson.build
+++ b/meson.build
@@ -329,6 +329,7 @@ libgit_sources = [
'diffcore-delta.c',
'diffcore-order.c',
'diffcore-pickaxe.c',
+ 'diffcore-ignore.c',
'diffcore-rename.c',
'diffcore-rotate.c',
'dir-iterator.c',
diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
index 8ebd170451..72eb7ae703 100755
--- a/t/t4013-diff-various.sh
+++ b/t/t4013-diff-various.sh
@@ -643,6 +643,59 @@ test_expect_success 'diff -I<regex> --stat' '
test_cmp expect actual
'
+test_expect_success 'diff -I<regex>: setup file1' '
+ test_seq 50 >file1 &&
+ git add file1 &&
+ test_seq 50 | sed -e "s/13/ten and three/" -e "s/^[124-9].*/& /" >file1
+'
+
+test_expect_success 'diff -I<regex>: ignore file1' '
+ git diff --ignore-blank-lines -I"ten.*e" -I"^[124-9]" >actual &&
+ cat >expect <<-\EOF &&
+ diff --git a/file0 b/file0
+ --- a/file0
+ +++ b/file0
+ @@ -34,7 +31,6 @@
+ 34
+ 35
+ 36
+ -37
+ 38
+ 39
+ 40
+ EOF
+ compare_diff_patch expect actual &&
+
+ git diff --stat --ignore-blank-lines -I"ten.*e" -I"^[124-9]" >actual &&
+ cat >expect <<-\EOF &&
+ file0 | 1 -
+ 1 file changed, 1 deletion(-)
+ EOF
+ test_cmp expect actual &&
+
+ git diff --name-status --ignore-blank-lines -I"ten.*e" -I"^[124-9]" >actual &&
+ cat >expect <<-\EOF &&
+ M file0
+ EOF
+ test_cmp expect actual
+'
+
+test_expect_success 'diff -I<regex> --raw: --raw ignores -I<regex>' '
+ git diff --raw >expect &&
+ git diff --raw --ignore-blank-lines -I"ten.*e" -I"^[124-9]" >actual &&
+ test_cmp expect actual
+'
+
+test_expect_success 'diff -I<regex> --check: --check ignores -I<regex>' '
+ test_must_fail git diff --check 2>&1 >expect &&
+ test_must_fail git diff --check --ignore-blank-lines -I"ten.*e" -I"^[124-9]" 2>&1 >actual &&
+ test_cmp expect actual
+'
+
+test_expect_success 'diff -I<regex>: rm file1' '
+ git rm -f file1
+'
+
test_expect_success 'diff -I<regex>: detect malformed regex' '
test_expect_code 129 git diff --ignore-matching-lines="^[124-9" 2>error &&
test_grep "invalid regex given to -I: " error
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 52e3e476ff..6e46a070e5 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -11,7 +11,7 @@ test_description='Test special whitespace in diff engine.
. "$TEST_DIRECTORY"/lib-diff.sh
for opt_res in --patch --quiet -s --stat --shortstat --dirstat=lines \
- --raw! --name-only! --name-status!
+ --name-only --name-status --raw!
do
opts=${opt_res%!} expect_failure=
test "$opts" = "$opt_res" ||
@@ -43,7 +43,13 @@ do
echo foo >x &&
git add x &&
echo " foo" >x &&
- $expect_failure git diff -w $opts --exit-code x
+ $expect_failure git diff -w $opts --exit-code x &&
+ echo "foo " >x &&
+ $expect_failure git diff --ignore-space-at-eol $opts --exit-code x &&
+ echo "fo o" >x &&
+ git add x &&
+ echo "fo o " >x &&
+ $expect_failure git diff --ignore-space-change $opts --exit-code x
'
done
--
2.39.5 (Apple Git-154)
^ permalink raw reply related [flat|nested] 35+ messages in thread
* Re: [PATCH] diff: ensure consistent diff behavior with -I<regex> across output formats
2025-08-02 10:22 ` Jeff King
2025-08-03 8:42 ` Lidong Yan
@ 2025-08-03 15:43 ` Junio C Hamano
2025-08-04 4:39 ` Junio C Hamano
2 siblings, 0 replies; 35+ messages in thread
From: Junio C Hamano @ 2025-08-03 15:43 UTC (permalink / raw)
To: Jeff King; +Cc: Lidong Yan, git, hi, michal
Jeff King <peff@peff.net> writes:
> We already have the diff_from_contents flag which is used for
> --exit-code. We should be able to see where that logic is applied and do
> something similar. It looks like it happens in diff_flush(), which makes
> sense:
>
> if (output_format & DIFF_FORMAT_NO_OUTPUT &&
> options->flags.exit_with_status &&
> options->flags.diff_from_contents) {
> /*
> * run diff_flush_patch for the exit status. setting
> * options->file to /dev/null should be safe, because we
> * aren't supposed to produce any output anyway.
> */
> diff_free_file(options);
> options->file = xfopen("/dev/null", "w");
> options->close_file = 1;
> options->color_moved = 0;
> for (i = 0; i < q->nr; i++) {
> struct diff_filepair *p = q->queue[i];
> if (check_pair_status(p))
> diff_flush_patch(p, options);
> if (options->found_changes)
> break;
> }
> }
>
> So here's a naive application of the same technique:
>
> diff --git a/diff.c b/diff.c
> index 76291e238c..0fe6eb7443 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -6845,8 +6845,28 @@ void diff_flush(struct diff_options *options)
> DIFF_FORMAT_CHECKDIFF)) {
> for (i = 0; i < q->nr; i++) {
> struct diff_filepair *p = q->queue[i];
> - if (check_pair_status(p))
> - flush_one_pair(p, options);
> +
> + if (!check_pair_status(p))
> + continue;
> +
> + if (options->flags.diff_from_contents) {
> + FILE *orig_out = options->file;
> + int orig_changes = options->found_changes;
> + int skip;
> +
> + options->file = xfopen("/dev/null", "w");
> + diff_flush_patch(p, options);
> + skip = !options->found_changes;
> +
> + fclose(options->file);
> + options->file = orig_out;
> + options->found_changes = orig_changes;
> +
> + if (skip)
> + continue;
> + }
> +
> + flush_one_pair(p, options);
> }
> separator++;
> }
>
> which works on a trivial example. It affects all of raw, name-only,
> name-status, and checkdiff. I know Junio said that --raw should not be
> affected, but I'm not sure I agree. Anyway, it should be possible to
> split the logic by output type.
This is much more palatable to avoid code duplication and
repetition, compared to what was posted before. A new code that
loops over q->queue[] one more time (which is yucky but cannot be
helped due to output order as you noted earlier) and runs
diff_flush_patch(), like the one we see above, is probably the best
we can do. The beauty of the above approach is that the new code
only needs to know about a single .diff_from_contents bit, and does
not have to care what features cause the .diff_from_contents bit to
be flipped on. Doing anything more would be maintenance nightmare.
Thanks.
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v2] diff: ensure consistent diff behavior with -I<regex> across output formats
2025-08-03 14:51 ` [PATCH v2] " Lidong Yan
@ 2025-08-04 0:39 ` Junio C Hamano
2025-08-04 1:56 ` Lidong Yan
2025-08-06 12:33 ` [PATCH v3] diff: ensure consistent diff behavior with ignore options Lidong Yan
1 sibling, 1 reply; 35+ messages in thread
From: Junio C Hamano @ 2025-08-04 0:39 UTC (permalink / raw)
To: Lidong Yan; +Cc: git, hi, michal, peff
Lidong Yan <yldhome2d2@gmail.com> writes:
> `git diff -I<regex>` option is inconsistently applied across various
> output formats. In some cases, files would appear in the `--name-only`
> output but not in the accompanying `--stat` or `-p` outputs, despite
> the user explicitly requesting to ignore certain changes using
> `-I<regex>`. Not only for `-I<regex>`, but this inconsistency also
> exists for other output formats that have `.diff_from_content` set
> (e.g. `-w`, `--ignore-space-at-eol` and `--ignore-space-change`).
Perhaps the above (and code, like the name of the helper functions
and possibly the name of the new file) should be updated to place
much stress on -I<regex>, as "ignore-regex" is not any more special
than other things that flips .diff_from_content bit in this new
iteration of the patch.
I do not quite get why ignore_match() has to know so much about how
the real code in diff.c that implements -I<regex> works, compared to
the illustration of "here is how to do it" Peff posted, though. It
somehow feels too much duplicated code.
Thanks.
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v2] diff: ensure consistent diff behavior with -I<regex> across output formats
2025-08-04 0:39 ` Junio C Hamano
@ 2025-08-04 1:56 ` Lidong Yan
2025-08-04 4:36 ` Junio C Hamano
0 siblings, 1 reply; 35+ messages in thread
From: Lidong Yan @ 2025-08-04 1:56 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, hi, michal, peff
Junio C Hamano <gitster@pobox.com> writes:
>
> Lidong Yan <yldhome2d2@gmail.com> writes:
>
>> `git diff -I<regex>` option is inconsistently applied across various
>> output formats. In some cases, files would appear in the `--name-only`
>> output but not in the accompanying `--stat` or `-p` outputs, despite
>> the user explicitly requesting to ignore certain changes using
>> `-I<regex>`. Not only for `-I<regex>`, but this inconsistency also
>> exists for other output formats that have `.diff_from_content` set
>> (e.g. `-w`, `--ignore-space-at-eol` and `--ignore-space-change`).
>
> Perhaps the above (and code, like the name of the helper functions
> and possibly the name of the new file) should be updated to place
> much stress on -I<regex>, as "ignore-regex" is not any more special
> than other things that flips .diff_from_content bit in this new
> iteration of the patch.
I will replace ‘-I<regex>’ to ‘options that set .diff_from_content’.
>
> I do not quite get why ignore_match() has to know so much about how
> the real code in diff.c that implements -I<regex> works, compared to
> the illustration of "here is how to do it" Peff posted, though. It
> somehow feels too much duplicated code.
I did copy some code from diffcore-pickaxe.c. I will use Peff's code in the
next patch and try to refactor diff_flush() to make the code simpler. Though
the reason I match the regular expression in ignore_match() is that I want to
return early as soon as an unmatched change is found. And indeed, it's not
worth writing the duplicated code for this unknown performance benefit.
Thanks,
Lidong
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v2] diff: ensure consistent diff behavior with -I<regex> across output formats
2025-08-04 1:56 ` Lidong Yan
@ 2025-08-04 4:36 ` Junio C Hamano
2025-08-05 9:23 ` Lidong Yan
0 siblings, 1 reply; 35+ messages in thread
From: Junio C Hamano @ 2025-08-04 4:36 UTC (permalink / raw)
To: Lidong Yan; +Cc: git, hi, michal, peff
Lidong Yan <yldhome2d2@gmail.com> writes:
>> I do not quite get why ignore_match() has to know so much about how
>> the real code in diff.c that implements -I<regex> works, compared to
>> the illustration of "here is how to do it" Peff posted, though. It
>> somehow feels too much duplicated code.
>
> I did copy some code from diffcore-pickaxe.c. I will use Peff's code in the
> next patch and try to refactor diff_flush() to make the code simpler. Though
> the reason I match the regular expression in ignore_match() is that I want to
> return early as soon as an unmatched change is found. And indeed, it's not
> worth writing the duplicated code for this unknown performance benefit.
In the production code, it would be truly worth doing the
optimization; we want to avoid running diff twice if we can.
But I think the refactoring of diff_flush() codepath would may
involve some new mode (perhaps DIFF_FORMAT_DRYRUN or something) that
(1) does not produce any output, like DIFF_FORMAT_NO_OUTPUT, so
that we do not need to play with /dev/null like Peff's
illustration.
(2) knows that the caller is only interested in each path having
any change worth reporting, so that it can short-circuit once a
change is found for each path.
So, just before you want to decide showing name or name-status,
you'd do this extra diff_flush() that is run only to learn if each
path has changes (with various "ignore" criteria) in the dry-run
mode, and it can do as much short-cut as it needs to.
Hmm?
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH] diff: ensure consistent diff behavior with -I<regex> across output formats
2025-08-02 10:22 ` Jeff King
2025-08-03 8:42 ` Lidong Yan
2025-08-03 15:43 ` Junio C Hamano
@ 2025-08-04 4:39 ` Junio C Hamano
2025-08-04 12:42 ` Jeff King
2 siblings, 1 reply; 35+ messages in thread
From: Junio C Hamano @ 2025-08-04 4:39 UTC (permalink / raw)
To: Jeff King; +Cc: Lidong Yan, git, hi, michal
Jeff King <peff@peff.net> writes:
> .... It affects all of raw, name-only,
> name-status, and checkdiff. I know Junio said that --raw should not be
> affected, but I'm not sure I agree.
I no longer am sure if I agree. I do not mind a raw entry that
would show different object name for preimage and postimage for a
path to be omitted when --ignore-whatever is passed and the blobs
compare "equal" under the specified "ignore" criteria.
The behaviour sounds somewhat incoherent, but that is what the user,
who passes both --raw and --ignore-whatever to the command at the
same time, wants.
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH] diff: ensure consistent diff behavior with -I<regex> across output formats
2025-08-04 4:39 ` Junio C Hamano
@ 2025-08-04 12:42 ` Jeff King
0 siblings, 0 replies; 35+ messages in thread
From: Jeff King @ 2025-08-04 12:42 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Lidong Yan, git, hi, michal
On Sun, Aug 03, 2025 at 09:39:21PM -0700, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
>
> > .... It affects all of raw, name-only,
> > name-status, and checkdiff. I know Junio said that --raw should not be
> > affected, but I'm not sure I agree.
>
> I no longer am sure if I agree. I do not mind a raw entry that
> would show different object name for preimage and postimage for a
> path to be omitted when --ignore-whatever is passed and the blobs
> compare "equal" under the specified "ignore" criteria.
>
> The behaviour sounds somewhat incoherent, but that is what the user,
> who passes both --raw and --ignore-whatever to the command at the
> same time, wants.
Yeah, exactly. It definitely is weird, but it feels like the closest
thing to what the user asked for.
Just trying to play devil's advocate on this whole topic: is there
anybody who could complain about omitting these entries from raw or
name-only lists? IMHO it is weird and a bug that:
git diff --name-only --raw -p -w <commit...>
might show an entry in the name-only and raw lists that doesn't also end
up in the actual patch. For just:
git diff --name-only --raw -w
it is easy to say "well, why did you pass -w if you did not want
content-level inspection?". But when they are combined, could the
current behavior ever be preferred?
The two counterpoints I can think of are:
1. Maybe that is an interesting signal to somebody that the diff _did_
touch that path, but it just had no content-level change. I am
having trouble imagining why that is useful, but it's not outside
the realm of possibility. And certainly you could get that
information separately by running a tree-level raw diff (without
"-w") followed by a "-p -w" diff (and if you use diff-pairs, that
second diff can even skip doing the tree-diff again).
2. 99% of the time, "-w" (or -I, or whatever) will not remove the
entirety of the change from those files. So we will do a whole
content-level diff just to say "yep, we still should mention this
in --raw"). Is the extra computation worth it?
-Peff
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v2] diff: ensure consistent diff behavior with -I<regex> across output formats
2025-08-04 4:36 ` Junio C Hamano
@ 2025-08-05 9:23 ` Lidong Yan
2025-08-05 16:11 ` Junio C Hamano
0 siblings, 1 reply; 35+ messages in thread
From: Lidong Yan @ 2025-08-05 9:23 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, hi, michal, peff
Junio C Hamano <gitster@pobox.com> writes:
>
> But I think the refactoring of diff_flush() codepath would may
> involve some new mode (perhaps DIFF_FORMAT_DRYRUN or something) that
>
> (1) does not produce any output, like DIFF_FORMAT_NO_OUTPUT, so
> that we do not need to play with /dev/null like Peff's
> illustration.
>
> (2) knows that the caller is only interested in each path having
> any change worth reporting, so that it can short-circuit once a
> change is found for each path.
>
> So, just before you want to decide showing name or name-status,
> you'd do this extra diff_flush() that is run only to learn if each
> path has changes (with various "ignore" criteria) in the dry-run
> mode, and it can do as much short-cut as it needs to.
I’m proposing to add a .diff_optimize field to struct diff_options, which
would support three modes: DIFF_OPT_NONE, DIFF_OPT_DRY_RUN,
and DIFF_OPT_BUFFER. The appropriate value would be determined
before calling diff_flush(), potentially in repo_diff_setup().
DIFF_OPT_NONE will be the code Peff provide, DIFF_OPT_DRY_RUN
will optimize for --quiet, --name, --name-status, etc, so that we can return
early if we found any change. DIFF_OPT_BUFFER will first emit changes
and context around changes into a buffer (so there would be a map from file
pair to change buffer), then operations after the buffer is built will use the
buffer instead of calling xdl_diff().
However, I’m concerned that DIFF_OPT_BUFFER could lead to high memory
usage in Git, and I’m not entirely sure if this trade-off is justified.
Thanks,
Lidong
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v2] diff: ensure consistent diff behavior with -I<regex> across output formats
2025-08-05 9:23 ` Lidong Yan
@ 2025-08-05 16:11 ` Junio C Hamano
0 siblings, 0 replies; 35+ messages in thread
From: Junio C Hamano @ 2025-08-05 16:11 UTC (permalink / raw)
To: Lidong Yan; +Cc: git, hi, michal, peff
Lidong Yan <yldhome2d2@gmail.com> writes:
> I’m proposing to add a .diff_optimize field to struct diff_options, which
> would support three modes: DIFF_OPT_NONE, DIFF_OPT_DRY_RUN,
> and DIFF_OPT_BUFFER. The appropriate value would be determined
> before calling diff_flush(), potentially in repo_diff_setup().
> ...
> However, I’m concerned that DIFF_OPT_BUFFER could lead to high memory
> usage in Git, and I’m not entirely sure if this trade-off is justified.
The DRY_RUN mode would make very good sense, and I agree with you
that it is better to leave the BUFFER mode out. It is hard to do
right (like, keeping up to certain small amount of diff in memory
and then spill it to an external temporary file if we are getting
too much), and the value to have it is questionable, especially in
the initial attempt.
Thanks.
^ permalink raw reply [flat|nested] 35+ messages in thread
* [PATCH v3] diff: ensure consistent diff behavior with ignore options
2025-08-03 14:51 ` [PATCH v2] " Lidong Yan
2025-08-04 0:39 ` Junio C Hamano
@ 2025-08-06 12:33 ` Lidong Yan
2025-08-06 17:35 ` Junio C Hamano
` (2 more replies)
1 sibling, 3 replies; 35+ messages in thread
From: Lidong Yan @ 2025-08-06 12:33 UTC (permalink / raw)
To: yldhome2d2; +Cc: git, gitster, hi, michal, peff
In git-diff, options like `-w` and `-I<regex>` require comparing
file contents to determine whether two files are the same, even when
their SHA values differ. For options like `--raw`, `--name-status`,
and `--name-only`, git-diff deliberately compares only the SHA values
to determine whether two files are the same, for performance reasons.
As a result, a file shown in `git diff --name-status` may not appear
in `git diff --patch`.
To quickly determine whether two files are identical, Add helper
function diff_flush_patch_quiet() in diff.c. Add `.diff_optimize`
field in `struct diff_options`. When `.diff_optimize` is set to
`DIFF_OPT_DRY_RUN`, builtin_diff() will return immediately upon
detecting any change. Call diff_flush_patch_quiet() to determine
if we should flush `--raw`, `--name-only` or `--name-status` output.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Lidong Yan <yldhome2d2@gmail.com>
---
diff.c | 67 +++++++++++++++++++++++++++++---------
diff.h | 5 +++
t/t4013-diff-various.sh | 14 ++++++++
t/t4015-diff-whitespace.sh | 2 +-
xdiff-interface.h | 6 ++--
5 files changed, 74 insertions(+), 20 deletions(-)
diff --git a/diff.c b/diff.c
index dca87e164f..5254ef9373 100644
--- a/diff.c
+++ b/diff.c
@@ -2444,6 +2444,15 @@ static int fn_out_consume(void *priv, char *line, unsigned long len)
return 0;
}
+static int quick_consume(void *priv, char *line, unsigned long len)
+{
+ struct emit_callback *ecbdata = priv;
+ struct diff_options *o = ecbdata->opt;
+
+ o->found_changes = 1;
+ return 1;
+}
+
static void pprint_rename(struct strbuf *name, const char *a, const char *b)
{
const char *old_name = a;
@@ -3709,6 +3718,7 @@ static void builtin_diff(const char *name_a,
xdemitconf_t xecfg;
struct emit_callback ecbdata;
const struct userdiff_funcname *pe;
+ int dry_run = o->diff_optimize == DIFF_OPT_DRY_RUN;
if (must_show_header) {
emit_diff_symbol(o, DIFF_SYMBOL_HEADER,
@@ -3741,8 +3751,8 @@ static void builtin_diff(const char *name_a,
xpp.ignore_regex_nr = o->ignore_regex_nr;
xpp.anchors = o->anchors;
xpp.anchors_nr = o->anchors_nr;
- xecfg.ctxlen = o->context;
- xecfg.interhunkctxlen = o->interhunkcontext;
+ xecfg.ctxlen = dry_run ? 0 : o->context;
+ xecfg.interhunkctxlen = dry_run ? 0 : o->interhunkcontext;
xecfg.flags = XDL_EMIT_FUNCNAMES;
if (o->flags.funccontext)
xecfg.flags |= XDL_EMIT_FUNCCONTEXT;
@@ -3750,7 +3760,8 @@ static void builtin_diff(const char *name_a,
xdiff_set_find_func(&xecfg, pe->pattern, pe->cflags);
diffopts = getenv("GIT_DIFF_OPTS");
- if (!diffopts)
+ /* ignore ctxlen if we are in dry run mode */
+ if (!diffopts || dry_run)
;
else if (skip_prefix(diffopts, "--unified=", &v))
xecfg.ctxlen = strtoul(v, NULL, 10);
@@ -3759,8 +3770,11 @@ static void builtin_diff(const char *name_a,
if (o->word_diff)
init_diff_words_data(&ecbdata, o, one, two);
- if (xdi_diff_outf(&mf1, &mf2, NULL, fn_out_consume,
- &ecbdata, &xpp, &xecfg))
+ if (dry_run)
+ xdi_diff_outf(&mf1, &mf2, NULL, quick_consume,
+ &ecbdata, &xpp, &xecfg);
+ else if (xdi_diff_outf(&mf1, &mf2, NULL, fn_out_consume,
+ &ecbdata, &xpp, &xecfg))
die("unable to generate diff for %s", one->path);
if (o->word_diff)
free_diff_words_data(&ecbdata);
@@ -4988,6 +5002,8 @@ void diff_setup_done(struct diff_options *options)
options->filter = ~filter_bit[DIFF_STATUS_FILTER_AON];
options->filter &= ~options->filter_not;
}
+
+ options->diff_optimize = DIFF_OPT_NONE;
}
int parse_long_opt(const char *opt, const char **argv,
@@ -6150,6 +6166,22 @@ static void diff_flush_patch(struct diff_filepair *p, struct diff_options *o)
run_diff(p, o);
}
+/* return 1 if any change is found; otherwise, return 0 */
+static int diff_flush_patch_quiet(struct diff_filepair *p, struct diff_options *o)
+{
+ int diff_opt = o->diff_optimize;
+ int found_changes = o->found_changes;
+ int ret;
+
+ o->diff_optimize = DIFF_OPT_DRY_RUN;
+ o->found_changes = 0;
+ diff_flush_patch(p, o);
+ ret = o->found_changes;
+ o->diff_optimize = diff_opt;
+ o->found_changes |= found_changes;
+ return ret;
+}
+
static void diff_flush_stat(struct diff_filepair *p, struct diff_options *o,
struct diffstat_t *diffstat)
{
@@ -6778,7 +6810,19 @@ void diff_flush(struct diff_options *options)
DIFF_FORMAT_CHECKDIFF)) {
for (i = 0; i < q->nr; i++) {
struct diff_filepair *p = q->queue[i];
- if (check_pair_status(p))
+ int need_flush = 1;
+
+ if (!check_pair_status(p))
+ continue;
+
+ if (options->flags.diff_from_contents) {
+ if (diff_flush_patch_quiet(p, options))
+ need_flush = 1;
+ else
+ need_flush = 0;
+ }
+
+ if (need_flush)
flush_one_pair(p, options);
}
separator++;
@@ -6831,19 +6875,10 @@ void diff_flush(struct diff_options *options)
if (output_format & DIFF_FORMAT_NO_OUTPUT &&
options->flags.exit_with_status &&
options->flags.diff_from_contents) {
- /*
- * run diff_flush_patch for the exit status. setting
- * options->file to /dev/null should be safe, because we
- * aren't supposed to produce any output anyway.
- */
- diff_free_file(options);
- options->file = xfopen("/dev/null", "w");
- options->close_file = 1;
- options->color_moved = 0;
for (i = 0; i < q->nr; i++) {
struct diff_filepair *p = q->queue[i];
if (check_pair_status(p))
- diff_flush_patch(p, options);
+ diff_flush_patch_quiet(p, options);
if (options->found_changes)
break;
}
diff --git a/diff.h b/diff.h
index 62e5768a9a..e165aeebb1 100644
--- a/diff.h
+++ b/diff.h
@@ -400,6 +400,11 @@ struct diff_options {
#define COLOR_MOVED_WS_ERROR (1<<0)
unsigned color_moved_ws_handling;
+ enum {
+ DIFF_OPT_NONE = 0,
+ DIFF_OPT_DRY_RUN = 1,
+ } diff_optimize;
+
struct repository *repo;
struct strmap *additional_path_headers;
diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
index 8ebd170451..b56a79d979 100755
--- a/t/t4013-diff-various.sh
+++ b/t/t4013-diff-various.sh
@@ -648,6 +648,20 @@ test_expect_success 'diff -I<regex>: detect malformed regex' '
test_grep "invalid regex given to -I: " error
'
+test_expect_success 'diff -I<regex>: ignore matching file' '
+ test_seq 50 >file1 &&
+ git add file1 &&
+ test_seq 50 | sed -e "s/13/ten and three/" -e "s/^[124-9].*/& /" >file1 &&
+
+ : >actual &&
+ git diff --raw --ignore-blank-lines -I"ten.*e" -I"^[124-9]" >>actual &&
+ git diff --name-only --ignore-blank-lines -I"ten.*e" -I"^[124-9]" >>actual &&
+ git diff --name-status --ignore-blank-lines -I"ten.*e" -I"^[124-9]" >>actual &&
+ ! grep "file1" actual &&
+
+ git rm -f file1
+'
+
# check_prefix <patch> <src> <dst>
# check only lines with paths to avoid dependency on exact oid/contents
check_prefix () {
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 52e3e476ff..e7be8c5a8f 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -11,7 +11,7 @@ test_description='Test special whitespace in diff engine.
. "$TEST_DIRECTORY"/lib-diff.sh
for opt_res in --patch --quiet -s --stat --shortstat --dirstat=lines \
- --raw! --name-only! --name-status!
+ --raw --name-only --name-status
do
opts=${opt_res%!} expect_failure=
test "$opts" = "$opt_res" ||
diff --git a/xdiff-interface.h b/xdiff-interface.h
index 1ed430b622..dfc55daddf 100644
--- a/xdiff-interface.h
+++ b/xdiff-interface.h
@@ -28,9 +28,9 @@
* from an error internal to xdiff, xdiff itself will see that
* non-zero return and translate it to -1.
*
- * See "diff_grep" in diffcore-pickaxe.c for a trick to work around
- * this, i.e. using the "consume_callback_data" to note the desired
- * early return.
+ * See "diff_grep" in diffcore-pickaxe.c and "quick_consume" in diff.c
+ * for a trick to work around this, i.e. using the "consume_callback_data"
+ * to note the desired early return.
*/
typedef int (*xdiff_emit_line_fn)(void *, char *, unsigned long);
typedef void (*xdiff_emit_hunk_fn)(void *data,
--
2.51.0.rc0.48.g112648dd6b.dirty
^ permalink raw reply related [flat|nested] 35+ messages in thread
* Re: [PATCH v3] diff: ensure consistent diff behavior with ignore options
2025-08-06 12:33 ` [PATCH v3] diff: ensure consistent diff behavior with ignore options Lidong Yan
@ 2025-08-06 17:35 ` Junio C Hamano
2025-08-07 1:23 ` Lidong Yan
2025-08-06 20:56 ` Junio C Hamano
2025-08-07 2:06 ` [PATCH v4] " Lidong Yan
2 siblings, 1 reply; 35+ messages in thread
From: Junio C Hamano @ 2025-08-06 17:35 UTC (permalink / raw)
To: Lidong Yan; +Cc: git, hi, michal, peff
Lidong Yan <yldhome2d2@gmail.com> writes:
> In git-diff, options like `-w` and `-I<regex>` require comparing
> file contents to determine whether two files are the same, even when
> their SHA values differ. For options like `--raw`, `--name-status`,
> and `--name-only`, git-diff deliberately compares only the SHA values
> to determine whether two files are the same, for performance reasons.
> As a result, a file shown in `git diff --name-status` may not appear
> in `git diff --patch`.
>
> To quickly determine whether two files are identical, Add helper
> function diff_flush_patch_quiet() in diff.c. Add `.diff_optimize`
> field in `struct diff_options`. When `.diff_optimize` is set to
> `DIFF_OPT_DRY_RUN`, builtin_diff() will return immediately upon
> detecting any change. Call diff_flush_patch_quiet() to determine
> if we should flush `--raw`, `--name-only` or `--name-status` output.
>
> Signed-off-by: Jeff King <peff@peff.net>
> Signed-off-by: Lidong Yan <yldhome2d2@gmail.com>
> ---
> diff.c | 67 +++++++++++++++++++++++++++++---------
> diff.h | 5 +++
> t/t4013-diff-various.sh | 14 ++++++++
> t/t4015-diff-whitespace.sh | 2 +-
> xdiff-interface.h | 6 ++--
> 5 files changed, 74 insertions(+), 20 deletions(-)
The code looks much easier to reason about than the previous rounds.
A few comments about the design.
- Are there other possible values that might fit in this "optimize"
member, and what kind of behaviour would they trigger, that we
can envision? I do not think of any and that is why the "enum
diff_optimize" member in the diff_options structure smells more
like a "bool dry_run".
By the way, giving a member "diff_" prefix when the enclosing
struct is clearly about "diff" by having a name "diff_options" is
often a waste of readers' time.
- It is unclear why the dry-run need to imply 0-line context.
- diff_flush_patch_quietly() would be a better name for
diff_flush_patch_quiet().
On to the details.
> diff --git a/diff.c b/diff.c
> index dca87e164f..5254ef9373 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -2444,6 +2444,15 @@ static int fn_out_consume(void *priv, char *line, unsigned long len)
> return 0;
> }
>
> +static int quick_consume(void *priv, char *line, unsigned long len)
> +{
> + struct emit_callback *ecbdata = priv;
> + struct diff_options *o = ecbdata->opt;
> +
> + o->found_changes = 1;
> + return 1;
> +}
OK, as a non-zero return value from consume callbacks is supposed
to signal an error and causes xdiff_outf() an early return, this
serves as a short-cut. One downside is that we cannot truly notice
and signal an error to our callers, as we will see in a later hunk.
> @@ -3709,6 +3718,7 @@ static void builtin_diff(const char *name_a,
> xdemitconf_t xecfg;
> struct emit_callback ecbdata;
> const struct userdiff_funcname *pe;
> + int dry_run = o->diff_optimize == DIFF_OPT_DRY_RUN;
And this screams that o->dry_run that is a Boolean may be
sufficient, but I could be missing obvious future enhancement
opportunities.
> @@ -3741,8 +3751,8 @@ static void builtin_diff(const char *name_a,
> xpp.ignore_regex_nr = o->ignore_regex_nr;
> xpp.anchors = o->anchors;
> xpp.anchors_nr = o->anchors_nr;
> - xecfg.ctxlen = o->context;
> - xecfg.interhunkctxlen = o->interhunkcontext;
> + xecfg.ctxlen = dry_run ? 0 : o->context;
> + xecfg.interhunkctxlen = dry_run ? 0 : o->interhunkcontext;
Unclear why. I think you had a similar change with a comment ...
> @@ -3750,7 +3760,8 @@ static void builtin_diff(const char *name_a,
> xdiff_set_find_func(&xecfg, pe->pattern, pe->cflags);
>
> diffopts = getenv("GIT_DIFF_OPTS");
> - if (!diffopts)
> + /* ignore ctxlen if we are in dry run mode */
... here, but this comment is totally useless.
> + if (!diffopts || dry_run)
> ;
> else if (skip_prefix(diffopts, "--unified=", &v))
> xecfg.ctxlen = strtoul(v, NULL, 10);
Anybody who can read the code can tell that dry_run disables the
xecfg.ctxlen handling we have below. What the code does not tell,
hence the author of a patch must help the readers by writing *why*
ignoring patch context is safe, correct, necessary, and desirable
when the main non-dry-run invocation of the diff machinery, which
this dry-run mode is trying to help, may use some context lines.
It does not have to be done in in-code comment. Especially because
the consequence of the design decision to "ignore context" appears
in two different places, the proposed log message would be a better
place to explain why it is a safe, correct, necessary and desirable
thing to do.
> - if (xdi_diff_outf(&mf1, &mf2, NULL, fn_out_consume,
> - &ecbdata, &xpp, &xecfg))
> + if (dry_run)
> + xdi_diff_outf(&mf1, &mf2, NULL, quick_consume,
> + &ecbdata, &xpp, &xecfg);
> + else if (xdi_diff_outf(&mf1, &mf2, NULL, fn_out_consume,
> + &ecbdata, &xpp, &xecfg))
> die("unable to generate diff for %s", one->path);
And this is the consequence of xdi_diff_outf() not having an
extensible way to report different "abnormal" conditions to its
caller.
In the normal case, all exceptional/abnormal conditions are error.
But in the dry-run case, "abnormal return is an error and we should
die" would not fit our purpose. "Assume that non-zero 'success'
return is because our quick_consume() is signalling that it found
what it needs to find, and that is not an error" is what is going on
here.
I'll stop here for now. Will continue later.
Thanks.
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v3] diff: ensure consistent diff behavior with ignore options
2025-08-06 12:33 ` [PATCH v3] diff: ensure consistent diff behavior with ignore options Lidong Yan
2025-08-06 17:35 ` Junio C Hamano
@ 2025-08-06 20:56 ` Junio C Hamano
2025-08-07 1:39 ` Lidong Yan
2025-08-07 2:06 ` [PATCH v4] " Lidong Yan
2 siblings, 1 reply; 35+ messages in thread
From: Junio C Hamano @ 2025-08-06 20:56 UTC (permalink / raw)
To: Lidong Yan; +Cc: git, hi, michal, peff
Lidong Yan <yldhome2d2@gmail.com> writes:
> +static int quick_consume(void *priv, char *line, unsigned long len)
> +{
> + struct emit_callback *ecbdata = priv;
> + struct diff_options *o = ecbdata->opt;
> +
> + o->found_changes = 1;
> + return 1;
> +}
"make DEVELOPER=YesPlease" would not be very happy, without
-static int quick_consume(void *priv, char *line, unsigned long len)
+static int quick_consume(void *priv, char *line UNUSED, unsigned long len UNUSED)
> +/* return 1 if any change is found; otherwise, return 0 */
> +static int diff_flush_patch_quiet(struct diff_filepair *p, struct diff_options *o)
> +{
> + int diff_opt = o->diff_optimize;
> + int found_changes = o->found_changes;
> + int ret;
> +
> + o->diff_optimize = DIFF_OPT_DRY_RUN;
> + o->found_changes = 0;
> + diff_flush_patch(p, o);
> + ret = o->found_changes;
> + o->diff_optimize = diff_opt;
> + o->found_changes |= found_changes;
> + return ret;
> +}
> static void diff_flush_stat(struct diff_filepair *p, struct diff_options *o,
> struct diffstat_t *diffstat)
> {
> @@ -6778,7 +6810,19 @@ void diff_flush(struct diff_options *options)
> DIFF_FORMAT_CHECKDIFF)) {
> for (i = 0; i < q->nr; i++) {
> struct diff_filepair *p = q->queue[i];
> - if (check_pair_status(p))
> + int need_flush = 1;
> +
> + if (!check_pair_status(p))
> + continue;
> +
> + if (options->flags.diff_from_contents) {
> + if (diff_flush_patch_quiet(p, options))
> + need_flush = 1;
> + else
> + need_flush = 0;
> + }
> +
> + if (need_flush)
> flush_one_pair(p, options);
> }
I am having a hard time understanding the logic in this loop. Is it
equivalent to the following loop, and ...
for (i = 0; i < q->nr; i++) {
struct diff_filepair *p = q->queue[i];
if (!check_pair_status(p))
continue;
if (options->flags.diff_from_contents &&
!diff_flush_patch_quietly(p, options))
continue; /* no changes */
flush_one_pair(p, options);
}
... if so, isn't the above rewrite easier to follow?
The idea is that we cannot do anything with DIFF_STATUS_UNKNOWN
pairs (hence we continue), and when we are filtering output by the
"has the pair changed in a way that the user cares about?" criteria,
we check with flush_patch_quietly, and if there is no change worth
talking about, we do not do anything with such a pair, either.
Only when these conditions are not met, we call flush_one_pair() to
show the name or name status or whatever.
> separator++;
> @@ -6831,19 +6875,10 @@ void diff_flush(struct diff_options *options)
> if (output_format & DIFF_FORMAT_NO_OUTPUT &&
> options->flags.exit_with_status &&
> options->flags.diff_from_contents) {
> - /*
> - * run diff_flush_patch for the exit status. setting
> - * options->file to /dev/null should be safe, because we
> - * aren't supposed to produce any output anyway.
> - */
> - diff_free_file(options);
> - options->file = xfopen("/dev/null", "w");
> - options->close_file = 1;
> - options->color_moved = 0;
> for (i = 0; i < q->nr; i++) {
> struct diff_filepair *p = q->queue[i];
> if (check_pair_status(p))
> - diff_flush_patch(p, options);
> + diff_flush_patch_quiet(p, options);
> if (options->found_changes)
> break;
> }
It is a natural consequence of having a helper that does the "quiet"
thing that we can now lose the /dev/null hack, which is very nice.
Overall, the changes are easy to follow and look good.
Thanks.
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v3] diff: ensure consistent diff behavior with ignore options
2025-08-06 17:35 ` Junio C Hamano
@ 2025-08-07 1:23 ` Lidong Yan
0 siblings, 0 replies; 35+ messages in thread
From: Lidong Yan @ 2025-08-07 1:23 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, hi, michal, peff
Junio C Hamano <gitster@pobox.com> writes:
>
> The code looks much easier to reason about than the previous rounds.
>
> A few comments about the design.
>
> - Are there other possible values that might fit in this "optimize"
> member, and what kind of behaviour would they trigger, that we
> can envision? I do not think of any and that is why the "enum
> diff_optimize" member in the diff_options structure smells more
> like a "bool dry_run".
I was thinking about DIFF_OPT_BUFFER , but "bool dry_run" would be good
enough for now.
>
> By the way, giving a member "diff_" prefix when the enclosing
> struct is clearly about "diff" by having a name "diff_options" is
> often a waste of readers' time.
Good advice, field name should avoid overlapping with struct name.
>
> - It is unclear why the dry-run need to imply 0-line context.
I was thinking about matching ignored regex in `quick_consume()`, in which
case we only need to match regex for changes rather than context, so I set
ctxlen to zero. But anyway more `if` reduce both readability and maintainability,
so I won’t set ctxlen in the next version.
>
> - diff_flush_patch_quietly() would be a better name for
> diff_flush_patch_quiet().
Understand, I suppose the idea behind this is to stick to proper grammar as
much as possible when choosing names.
>
> On to the details.
>
>> diff --git a/diff.c b/diff.c
>> index dca87e164f..5254ef9373 100644
>> --- a/diff.c
>> +++ b/diff.c
>> @@ -2444,6 +2444,15 @@ static int fn_out_consume(void *priv, char *line, unsigned long len)
>> return 0;
>> }
>>
>> +static int quick_consume(void *priv, char *line, unsigned long len)
>> +{
>> + struct emit_callback *ecbdata = priv;
>> + struct diff_options *o = ecbdata->opt;
>> +
>> + o->found_changes = 1;
>> + return 1;
>> +}
>
> OK, as a non-zero return value from consume callbacks is supposed
> to signal an error and causes xdiff_outf() an early return, this
> serves as a short-cut. One downside is that we cannot truly notice
> and signal an error to our callers, as we will see in a later hunk.
I want to think about how to solve this problem later, but I believe I shouldn’t
address it in this patch.
Thanks,
Lidong
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v3] diff: ensure consistent diff behavior with ignore options
2025-08-06 20:56 ` Junio C Hamano
@ 2025-08-07 1:39 ` Lidong Yan
0 siblings, 0 replies; 35+ messages in thread
From: Lidong Yan @ 2025-08-07 1:39 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, hi, michal, peff
Junio C Hamano <gitster@pobox.com> writes:
>
> Lidong Yan <yldhome2d2@gmail.com> writes:
>
>> +static int quick_consume(void *priv, char *line, unsigned long len)
>> +{
>> + struct emit_callback *ecbdata = priv;
>> + struct diff_options *o = ecbdata->opt;
>> +
>> + o->found_changes = 1;
>> + return 1;
>> +}
>
> "make DEVELOPER=YesPlease" would not be very happy, without
>
> -static int quick_consume(void *priv, char *line, unsigned long len)
> +static int quick_consume(void *priv, char *line UNUSED, unsigned long len UNUSED)
(I accidentally sent an unfinished email to Junio. Sorry about that.)
Another a good thing to test before commit. Thanks for the reminder.
>> static void diff_flush_stat(struct diff_filepair *p, struct diff_options *o,
>> struct diffstat_t *diffstat)
>> {
>> @@ -6778,7 +6810,19 @@ void diff_flush(struct diff_options *options)
>> DIFF_FORMAT_CHECKDIFF)) {
>> for (i = 0; i < q->nr; i++) {
>> struct diff_filepair *p = q->queue[i];
>> - if (check_pair_status(p))
>> + int need_flush = 1;
>> +
>> + if (!check_pair_status(p))
>> + continue;
>> +
>> + if (options->flags.diff_from_contents) {
>> + if (diff_flush_patch_quiet(p, options))
>> + need_flush = 1;
>> + else
>> + need_flush = 0;
>> + }
>> +
>> + if (need_flush)
>> flush_one_pair(p, options);
>> }
>
> I am having a hard time understanding the logic in this loop. Is it
> equivalent to the following loop, and ...
>
> for (i = 0; i < q->nr; i++) {
> struct diff_filepair *p = q->queue[i];
>
> if (!check_pair_status(p))
> continue;
> if (options->flags.diff_from_contents &&
> !diff_flush_patch_quietly(p, options))
> continue; /* no changes */
>
> flush_one_pair(p, options);
> }
>
> ... if so, isn't the above rewrite easier to follow?
Yes, I was being stupid again. That’s obviously better.
Thanks, will send v4 soon.
Lidong
^ permalink raw reply [flat|nested] 35+ messages in thread
* [PATCH v4] diff: ensure consistent diff behavior with ignore options
2025-08-06 12:33 ` [PATCH v3] diff: ensure consistent diff behavior with ignore options Lidong Yan
2025-08-06 17:35 ` Junio C Hamano
2025-08-06 20:56 ` Junio C Hamano
@ 2025-08-07 2:06 ` Lidong Yan
2025-08-07 21:27 ` Junio C Hamano
2 siblings, 1 reply; 35+ messages in thread
From: Lidong Yan @ 2025-08-07 2:06 UTC (permalink / raw)
To: yldhome2d2; +Cc: git, gitster, hi, michal, peff
In git-diff, options like `-w` and `-I<regex>` require comparing
file contents to determine whether two files are the same, even when
their SHA values differ. For options like `--raw`, `--name-status`,
and `--name-only`, git-diff deliberately compares only the SHA values
to determine whether two files are the same, for performance reasons.
As a result, a file shown in `git diff --name-status` may not appear
in `git diff --patch`.
To quickly determine whether two files are identical, Add helper
function diff_flush_patch_quiet() in diff.c. Add `.diff_optimize`
field in `struct diff_options`. When `.diff_optimize` is set to
`DIFF_OPT_DRY_RUN`, builtin_diff() will return immediately upon
detecting any change. Call diff_flush_patch_quiet() to determine
if we should flush `--raw`, `--name-only` or `--name-status` output.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Lidong Yan <yldhome2d2@gmail.com>
---
diff.c | 55 ++++++++++++++++++++++++++++----------
diff.h | 2 ++
t/t4013-diff-various.sh | 14 ++++++++++
t/t4015-diff-whitespace.sh | 2 +-
xdiff-interface.h | 6 ++---
5 files changed, 61 insertions(+), 18 deletions(-)
diff --git a/diff.c b/diff.c
index dca87e164f..3bd432db32 100644
--- a/diff.c
+++ b/diff.c
@@ -2444,6 +2444,15 @@ static int fn_out_consume(void *priv, char *line, unsigned long len)
return 0;
}
+static int quick_consume(void *priv, char *line UNUSED, unsigned long len UNUSED)
+{
+ struct emit_callback *ecbdata = priv;
+ struct diff_options *o = ecbdata->opt;
+
+ o->found_changes = 1;
+ return 1;
+}
+
static void pprint_rename(struct strbuf *name, const char *a, const char *b)
{
const char *old_name = a;
@@ -3709,6 +3718,7 @@ static void builtin_diff(const char *name_a,
xdemitconf_t xecfg;
struct emit_callback ecbdata;
const struct userdiff_funcname *pe;
+ int dry_run = o->dry_run;
if (must_show_header) {
emit_diff_symbol(o, DIFF_SYMBOL_HEADER,
@@ -3759,8 +3769,11 @@ static void builtin_diff(const char *name_a,
if (o->word_diff)
init_diff_words_data(&ecbdata, o, one, two);
- if (xdi_diff_outf(&mf1, &mf2, NULL, fn_out_consume,
- &ecbdata, &xpp, &xecfg))
+ if (dry_run)
+ xdi_diff_outf(&mf1, &mf2, NULL, quick_consume,
+ &ecbdata, &xpp, &xecfg);
+ else if (xdi_diff_outf(&mf1, &mf2, NULL, fn_out_consume,
+ &ecbdata, &xpp, &xecfg))
die("unable to generate diff for %s", one->path);
if (o->word_diff)
free_diff_words_data(&ecbdata);
@@ -6150,6 +6163,22 @@ static void diff_flush_patch(struct diff_filepair *p, struct diff_options *o)
run_diff(p, o);
}
+/* return 1 if any change is found; otherwise, return 0 */
+static int diff_flush_patch_quietly(struct diff_filepair *p, struct diff_options *o)
+{
+ int dry_run = o->dry_run;
+ int found_changes = o->found_changes;
+ int ret;
+
+ o->dry_run = 1;
+ o->found_changes = 0;
+ diff_flush_patch(p, o);
+ ret = o->found_changes;
+ o->dry_run = dry_run;
+ o->found_changes |= found_changes;
+ return ret;
+}
+
static void diff_flush_stat(struct diff_filepair *p, struct diff_options *o,
struct diffstat_t *diffstat)
{
@@ -6778,8 +6807,15 @@ void diff_flush(struct diff_options *options)
DIFF_FORMAT_CHECKDIFF)) {
for (i = 0; i < q->nr; i++) {
struct diff_filepair *p = q->queue[i];
- if (check_pair_status(p))
- flush_one_pair(p, options);
+
+ if (!check_pair_status(p))
+ continue;
+
+ if (options->flags.diff_from_contents &&
+ !diff_flush_patch_quietly(p, options))
+ continue;
+
+ flush_one_pair(p, options);
}
separator++;
}
@@ -6831,19 +6867,10 @@ void diff_flush(struct diff_options *options)
if (output_format & DIFF_FORMAT_NO_OUTPUT &&
options->flags.exit_with_status &&
options->flags.diff_from_contents) {
- /*
- * run diff_flush_patch for the exit status. setting
- * options->file to /dev/null should be safe, because we
- * aren't supposed to produce any output anyway.
- */
- diff_free_file(options);
- options->file = xfopen("/dev/null", "w");
- options->close_file = 1;
- options->color_moved = 0;
for (i = 0; i < q->nr; i++) {
struct diff_filepair *p = q->queue[i];
if (check_pair_status(p))
- diff_flush_patch(p, options);
+ diff_flush_patch_quietly(p, options);
if (options->found_changes)
break;
}
diff --git a/diff.h b/diff.h
index 62e5768a9a..91b3e1c5cf 100644
--- a/diff.h
+++ b/diff.h
@@ -400,6 +400,8 @@ struct diff_options {
#define COLOR_MOVED_WS_ERROR (1<<0)
unsigned color_moved_ws_handling;
+ bool dry_run;
+
struct repository *repo;
struct strmap *additional_path_headers;
diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
index 8ebd170451..b56a79d979 100755
--- a/t/t4013-diff-various.sh
+++ b/t/t4013-diff-various.sh
@@ -648,6 +648,20 @@ test_expect_success 'diff -I<regex>: detect malformed regex' '
test_grep "invalid regex given to -I: " error
'
+test_expect_success 'diff -I<regex>: ignore matching file' '
+ test_seq 50 >file1 &&
+ git add file1 &&
+ test_seq 50 | sed -e "s/13/ten and three/" -e "s/^[124-9].*/& /" >file1 &&
+
+ : >actual &&
+ git diff --raw --ignore-blank-lines -I"ten.*e" -I"^[124-9]" >>actual &&
+ git diff --name-only --ignore-blank-lines -I"ten.*e" -I"^[124-9]" >>actual &&
+ git diff --name-status --ignore-blank-lines -I"ten.*e" -I"^[124-9]" >>actual &&
+ ! grep "file1" actual &&
+
+ git rm -f file1
+'
+
# check_prefix <patch> <src> <dst>
# check only lines with paths to avoid dependency on exact oid/contents
check_prefix () {
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 52e3e476ff..e7be8c5a8f 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -11,7 +11,7 @@ test_description='Test special whitespace in diff engine.
. "$TEST_DIRECTORY"/lib-diff.sh
for opt_res in --patch --quiet -s --stat --shortstat --dirstat=lines \
- --raw! --name-only! --name-status!
+ --raw --name-only --name-status
do
opts=${opt_res%!} expect_failure=
test "$opts" = "$opt_res" ||
diff --git a/xdiff-interface.h b/xdiff-interface.h
index 1ed430b622..dfc55daddf 100644
--- a/xdiff-interface.h
+++ b/xdiff-interface.h
@@ -28,9 +28,9 @@
* from an error internal to xdiff, xdiff itself will see that
* non-zero return and translate it to -1.
*
- * See "diff_grep" in diffcore-pickaxe.c for a trick to work around
- * this, i.e. using the "consume_callback_data" to note the desired
- * early return.
+ * See "diff_grep" in diffcore-pickaxe.c and "quick_consume" in diff.c
+ * for a trick to work around this, i.e. using the "consume_callback_data"
+ * to note the desired early return.
*/
typedef int (*xdiff_emit_line_fn)(void *, char *, unsigned long);
typedef void (*xdiff_emit_hunk_fn)(void *data,
--
2.51.0.rc0.49.g31121ff5ea
^ permalink raw reply related [flat|nested] 35+ messages in thread
* Re: [PATCH v4] diff: ensure consistent diff behavior with ignore options
2025-08-07 2:06 ` [PATCH v4] " Lidong Yan
@ 2025-08-07 21:27 ` Junio C Hamano
2025-08-08 1:46 ` Lidong Yan
0 siblings, 1 reply; 35+ messages in thread
From: Junio C Hamano @ 2025-08-07 21:27 UTC (permalink / raw)
To: Lidong Yan; +Cc: git, hi, michal, peff
Lidong Yan <yldhome2d2@gmail.com> writes:
> In git-diff, options like `-w` and `-I<regex>` require comparing
> file contents to determine whether two files are the same, even when
> their SHA values differ.
Let's see if we can do something to clarify "the same" here.
Perhaps
... two files are considered equivalent under the specified
"ignore" rules, even when they are not bit-for-bit identical.
> For options like `--raw`, `--name-status`,
> and `--name-only`, git-diff deliberately compares only the SHA values
> to determine whether two files are the same, for performance reasons.
> As a result, a file shown in `git diff --name-status` may not appear
> in `git diff --patch`.
>
> To quickly determine whether two files are identical, Add helper
Following the above, perhaps replace "identical" with "equivalent".
Also, ", Add helper" should be ", add a helper", as that comma is
not finishing a sentence, hence the word that follows it is not at
the beginning of the next sentence.
> function diff_flush_patch_quiet() in diff.c. Add `.diff_optimize`
> field in `struct diff_options`. When `.diff_optimize` is set to
> `DIFF_OPT_DRY_RUN`, builtin_diff() will return immediately upon
> detecting any change. Call diff_flush_patch_quiet() to determine
> if we should flush `--raw`, `--name-only` or `--name-status` output.
Also the implementation details like the name of the .diff_options
member and the name of the helper function have changed, and the
proposed log message should be updated to match.
> Signed-off-by: Jeff King <peff@peff.net>
> Signed-off-by: Lidong Yan <yldhome2d2@gmail.com>
> ---
> diff.c | 55 ++++++++++++++++++++++++++++----------
> diff.h | 2 ++
> t/t4013-diff-various.sh | 14 ++++++++++
> t/t4015-diff-whitespace.sh | 2 +-
> xdiff-interface.h | 6 ++---
> 5 files changed, 61 insertions(+), 18 deletions(-)
>
> diff --git a/diff.c b/diff.c
> index dca87e164f..3bd432db32 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -2444,6 +2444,15 @@ static int fn_out_consume(void *priv, char *line, unsigned long len)
> return 0;
> }
>
> +static int quick_consume(void *priv, char *line UNUSED, unsigned long len UNUSED)
> +{
> + struct emit_callback *ecbdata = priv;
> + struct diff_options *o = ecbdata->opt;
> +
> + o->found_changes = 1;
> + return 1;
> +}
OK.
> @@ -3709,6 +3718,7 @@ static void builtin_diff(const char *name_a,
> xdemitconf_t xecfg;
> struct emit_callback ecbdata;
> const struct userdiff_funcname *pe;
> + int dry_run = o->dry_run;
As the "dry_run" variable is used only once in this block, we
probably do not want to add it.
> if (must_show_header) {
> emit_diff_symbol(o, DIFF_SYMBOL_HEADER,
> @@ -3759,8 +3769,11 @@ static void builtin_diff(const char *name_a,
>
> if (o->word_diff)
> init_diff_words_data(&ecbdata, o, one, two);
> - if (xdi_diff_outf(&mf1, &mf2, NULL, fn_out_consume,
> - &ecbdata, &xpp, &xecfg))
Instead we can check o->dry_run here.
> + if (dry_run)
> + xdi_diff_outf(&mf1, &mf2, NULL, quick_consume,
> + &ecbdata, &xpp, &xecfg);
We may want to leave a comment to explain why we ignore the error
return from xdi_diff_outf()? Perhaps like below?
if (o->dry_run)
/*
* Unlike the !dry_run case, we need to ignore the
* return value from xdi_diff_outf() here, because
* xdi_diff_outf() takes non-zero return from its
* callback function as a sign of error and returns
* early (which is why we return non-zer from our
* callback, quick_consume()). Unfortunately,
* xdi_diff_outf() signals an error by returning
* non-zero.
*/
xdi_diff_outf(&mf1, &mf2, NULL, quick_consume,
&ecbdata, &xpp, &xecfg);
I am undecided.
> + else if (xdi_diff_outf(&mf1, &mf2, NULL, fn_out_consume,
> + &ecbdata, &xpp, &xecfg))
> +/* return 1 if any change is found; otherwise, return 0 */
> +static int diff_flush_patch_quietly(struct diff_filepair *p, struct diff_options *o)
> +{
> + int dry_run = o->dry_run;
> + int found_changes = o->found_changes;
In this codebase, these "original value of the variable X was this, we
tentatively save that original value away, tweak the variable X to do
something, and restore the saved value to variable X" variables are often
called "saved_X".
> + int ret;
> +
> + o->dry_run = 1;
> + o->found_changes = 0;
> + diff_flush_patch(p, o);
> + ret = o->found_changes;
> + o->dry_run = dry_run;
> + o->found_changes |= found_changes;
> + return ret;
> +}
In the previous iteration, .dry_run/.diff_optimize was set and reset in
different places; doing it in a single function here makes it easier to
understand what is going on. Nice improvement.
> diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
> index 8ebd170451..b56a79d979 100755
> --- a/t/t4013-diff-various.sh
> +++ b/t/t4013-diff-various.sh
> @@ -648,6 +648,20 @@ test_expect_success 'diff -I<regex>: detect malformed regex' '
> test_grep "invalid regex given to -I: " error
> '
>
> +test_expect_success 'diff -I<regex>: ignore matching file' '
> + test_seq 50 >file1 &&
> + git add file1 &&
> + test_seq 50 | sed -e "s/13/ten and three/" -e "s/^[124-9].*/& /" >file1 &&
> +
> + : >actual &&
> + git diff --raw --ignore-blank-lines -I"ten.*e" -I"^[124-9]" >>actual &&
> + git diff --name-only --ignore-blank-lines -I"ten.*e" -I"^[124-9]" >>actual &&
> + git diff --name-status --ignore-blank-lines -I"ten.*e" -I"^[124-9]" >>actual &&
> + ! grep "file1" actual &&
Perhaps use test_grep helper shell function, i.e.
test_grep ! "file1" actual &&
> + git rm -f file1
Is this because later tests will break if you leave "file1" in the working
tree and/or in the index? If so, we should use test_when_finished to make
such a clean-up. If you insert
test_when_finished "git rm file1; rm -f file1" &&
at the very beginning, before you create file1 with 1..50, when this test
piece finishes executing (whether it completed successfully, or failed in
the middle of the &&-chain), the specified command will run.
On the other hand, if the later tests won't mind whether "file1" does or
does not exist in the working tree and/or in the index, it is common to
leave it behind without cleaning it. When running the test script with the
"-i" option, i.e.
$ sh t4013-diff-various.sh -i -v
leaving the files that were used in the test, without cleaning up,
sometimes helps your debugging of the test script.
> diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
> index 52e3e476ff..e7be8c5a8f 100755
> --- a/t/t4015-diff-whitespace.sh
> +++ b/t/t4015-diff-whitespace.sh
> @@ -11,7 +11,7 @@ test_description='Test special whitespace in diff engine.
> . "$TEST_DIRECTORY"/lib-diff.sh
>
> for opt_res in --patch --quiet -s --stat --shortstat --dirstat=lines \
> - --raw! --name-only! --name-status!
> + --raw --name-only --name-status
> do
Wouldn't this make the "if the option is marked with !, tweak the test that
notices these two equivalent paths are not-identical" extra code, whose
beginning part we see below, unnecessary? The $expect_failure variable
would always be an empty string, so "different but equivalent" test should
see "git diff --exit-code" exit with status 0, right?
Other than that, looking quite good.
Thanks.
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH v4] diff: ensure consistent diff behavior with ignore options
2025-08-07 21:27 ` Junio C Hamano
@ 2025-08-08 1:46 ` Lidong Yan
2025-08-08 3:30 ` [PATCH v5] " Lidong Yan
0 siblings, 1 reply; 35+ messages in thread
From: Lidong Yan @ 2025-08-08 1:46 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, hi, michal, peff
Junio C Hamano <gitster@pobox.com> writes:
>
> Lidong Yan <yldhome2d2@gmail.com> writes:
>
>> In git-diff, options like `-w` and `-I<regex>` require comparing
>> file contents to determine whether two files are the same, even when
>> their SHA values differ.
>
> Let's see if we can do something to clarify "the same" here.
> Perhaps
>
> ... two files are considered equivalent under the specified
> "ignore" rules, even when they are not bit-for-bit identical.
>
> Following the above, perhaps replace "identical" with "equivalent".
Equivalent seems to be a good definition, will replace.
>
> Also, ", Add helper" should be ", add a helper", as that comma is
> not finishing a sentence, hence the word that follows it is not at
> the beginning of the next sentence.
>
> Also the implementation details like the name of the .diff_options
> member and the name of the helper function have changed, and the
> proposed log message should be updated to match.
Will fix grammar errors and update log message.
>
> As the "dry_run" variable is used only once in this block, we
> probably do not want to add it.
Sure, will fix.
>
> We may want to leave a comment to explain why we ignore the error
> return from xdi_diff_outf()? Perhaps like below?
>
> if (o->dry_run)
> /*
> * Unlike the !dry_run case, we need to ignore the
> * return value from xdi_diff_outf() here, because
> * xdi_diff_outf() takes non-zero return from its
> * callback function as a sign of error and returns
> * early (which is why we return non-zer from our
> * callback, quick_consume()). Unfortunately,
> * xdi_diff_outf() signals an error by returning
> * non-zero.
> */
> xdi_diff_outf(&mf1, &mf2, NULL, quick_consume,
> &ecbdata, &xpp, &xecfg);
>
I think your comment is good enough so I will just copy it.
>
> In this codebase, these "original value of the variable X was this, we
> tentatively save that original value away, tweak the variable X to do
> something, and restore the saved value to variable X" variables are often
> called "saved_X".
Will rename.
>
> Perhaps use test_grep helper shell function, i.e.
>
> test_grep ! "file1" actual &&
I guess this is because test_grep has more useful debugging information.
Will replace.
>
>> + git rm -f file1
>
> Is this because later tests will break if you leave "file1" in the working
> tree and/or in the index? If so, we should use test_when_finished to make
> such a clean-up. If you insert
Yes, exactly.
>
> test_when_finished "git rm file1; rm -f file1" &&
>
> at the very beginning, before you create file1 with 1..50, when this test
> piece finishes executing (whether it completed successfully, or failed in
> the middle of the &&-chain), the specified command will run.
Early on Patrict suggested the same thing, I will read test_when_finish
to see how it works.
>
> On the other hand, if the later tests won't mind whether "file1" does or
> does not exist in the working tree and/or in the index, it is common to
> leave it behind without cleaning it. When running the test script with the
> "-i" option, i.e.
>
> $ sh t4013-diff-various.sh -i -v
Good thing to know. Just curious, I also found that running ./txxx -v causes
failed results and successful results to be mixed together. Do you usually
solve this by using awk to extract the error messages?
>
>> diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
>> index 52e3e476ff..e7be8c5a8f 100755
>> --- a/t/t4015-diff-whitespace.sh
>> +++ b/t/t4015-diff-whitespace.sh
>> @@ -11,7 +11,7 @@ test_description='Test special whitespace in diff engine.
>> . "$TEST_DIRECTORY"/lib-diff.sh
>>
>> for opt_res in --patch --quiet -s --stat --shortstat --dirstat=lines \
>> - --raw! --name-only! --name-status!
>> + --raw --name-only --name-status
>> do
>
> Wouldn't this make the "if the option is marked with !, tweak the test that
> notices these two equivalent paths are not-identical" extra code, whose
> beginning part we see below, unnecessary? The $expect_failure variable
> would always be an empty string, so "different but equivalent" test should
> see "git diff --exit-code" exit with status 0, right?
Yes, I will remove the optionally set expect_failure part.
Thank you very much for your review,
Lidong
^ permalink raw reply [flat|nested] 35+ messages in thread
* [PATCH v5] diff: ensure consistent diff behavior with ignore options
2025-08-08 1:46 ` Lidong Yan
@ 2025-08-08 3:30 ` Lidong Yan
2025-10-16 14:55 ` Johannes Schindelin
0 siblings, 1 reply; 35+ messages in thread
From: Lidong Yan @ 2025-08-08 3:30 UTC (permalink / raw)
To: yldhome2d2; +Cc: git, gitster, hi, michal, peff
In git-diff, options like `-w` and `-I<regex>`, two files are considered
equivalent under the specified "ignore" rules, even when they are not
bit-for-bit identical. For options like `--raw`, `--name-status`,
and `--name-only`, git-diff deliberately compares only the SHA values
to determine whether two files are equivalent, for performance reasons.
As a result, a file shown in `git diff --name-status` may not appear
in `git diff --patch`.
To quickly determine whether two files are equivalent, add a helper
function diff_flush_patch_quietly() in diff.c. Add `.dry_run` field in
`struct diff_options`. When `.dry_run` is true, builtin_diff() returns
immediately upon finding any change. Call diff_flush_patch_quietly()
to determine if we should flush `--raw`, `--name-only` or `--name-status`
output.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Lidong Yan <yldhome2d2@gmail.com>
---
diff.c | 64 +++++++++++++++++++++++++++++---------
diff.h | 2 ++
t/t4013-diff-various.sh | 13 ++++++++
t/t4015-diff-whitespace.sh | 8 ++---
xdiff-interface.h | 6 ++--
5 files changed, 70 insertions(+), 23 deletions(-)
diff --git a/diff.c b/diff.c
index dca87e164f..cb04a9a6f2 100644
--- a/diff.c
+++ b/diff.c
@@ -2444,6 +2444,15 @@ static int fn_out_consume(void *priv, char *line, unsigned long len)
return 0;
}
+static int quick_consume(void *priv, char *line UNUSED, unsigned long len UNUSED)
+{
+ struct emit_callback *ecbdata = priv;
+ struct diff_options *o = ecbdata->opt;
+
+ o->found_changes = 1;
+ return 1;
+}
+
static void pprint_rename(struct strbuf *name, const char *a, const char *b)
{
const char *old_name = a;
@@ -3759,8 +3768,21 @@ static void builtin_diff(const char *name_a,
if (o->word_diff)
init_diff_words_data(&ecbdata, o, one, two);
- if (xdi_diff_outf(&mf1, &mf2, NULL, fn_out_consume,
- &ecbdata, &xpp, &xecfg))
+ if (o->dry_run) {
+ /*
+ * Unlike the !dry_run case, we need to ignore the
+ * return value from xdi_diff_outf() here, because
+ * xdi_diff_outf() takes non-zero return from its
+ * callback function as a sign of error and returns
+ * early (which is why we return non-zero from our
+ * callback, quick_consume()). Unfortunately,
+ * xdi_diff_outf() signals an error by returning
+ * non-zero.
+ */
+ xdi_diff_outf(&mf1, &mf2, NULL, quick_consume,
+ &ecbdata, &xpp, &xecfg);
+ } else if (xdi_diff_outf(&mf1, &mf2, NULL, fn_out_consume,
+ &ecbdata, &xpp, &xecfg))
die("unable to generate diff for %s", one->path);
if (o->word_diff)
free_diff_words_data(&ecbdata);
@@ -6150,6 +6172,22 @@ static void diff_flush_patch(struct diff_filepair *p, struct diff_options *o)
run_diff(p, o);
}
+/* return 1 if any change is found; otherwise, return 0 */
+static int diff_flush_patch_quietly(struct diff_filepair *p, struct diff_options *o)
+{
+ int saved_dry_run = o->dry_run;
+ int saved_found_changes = o->found_changes;
+ int ret;
+
+ o->dry_run = 1;
+ o->found_changes = 0;
+ diff_flush_patch(p, o);
+ ret = o->found_changes;
+ o->dry_run = saved_dry_run;
+ o->found_changes |= saved_found_changes;
+ return ret;
+}
+
static void diff_flush_stat(struct diff_filepair *p, struct diff_options *o,
struct diffstat_t *diffstat)
{
@@ -6778,8 +6816,15 @@ void diff_flush(struct diff_options *options)
DIFF_FORMAT_CHECKDIFF)) {
for (i = 0; i < q->nr; i++) {
struct diff_filepair *p = q->queue[i];
- if (check_pair_status(p))
- flush_one_pair(p, options);
+
+ if (!check_pair_status(p))
+ continue;
+
+ if (options->flags.diff_from_contents &&
+ !diff_flush_patch_quietly(p, options))
+ continue;
+
+ flush_one_pair(p, options);
}
separator++;
}
@@ -6831,19 +6876,10 @@ void diff_flush(struct diff_options *options)
if (output_format & DIFF_FORMAT_NO_OUTPUT &&
options->flags.exit_with_status &&
options->flags.diff_from_contents) {
- /*
- * run diff_flush_patch for the exit status. setting
- * options->file to /dev/null should be safe, because we
- * aren't supposed to produce any output anyway.
- */
- diff_free_file(options);
- options->file = xfopen("/dev/null", "w");
- options->close_file = 1;
- options->color_moved = 0;
for (i = 0; i < q->nr; i++) {
struct diff_filepair *p = q->queue[i];
if (check_pair_status(p))
- diff_flush_patch(p, options);
+ diff_flush_patch_quietly(p, options);
if (options->found_changes)
break;
}
diff --git a/diff.h b/diff.h
index 62e5768a9a..91b3e1c5cf 100644
--- a/diff.h
+++ b/diff.h
@@ -400,6 +400,8 @@ struct diff_options {
#define COLOR_MOVED_WS_ERROR (1<<0)
unsigned color_moved_ws_handling;
+ bool dry_run;
+
struct repository *repo;
struct strmap *additional_path_headers;
diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
index 8ebd170451..cfeec239e0 100755
--- a/t/t4013-diff-various.sh
+++ b/t/t4013-diff-various.sh
@@ -648,6 +648,19 @@ test_expect_success 'diff -I<regex>: detect malformed regex' '
test_grep "invalid regex given to -I: " error
'
+test_expect_success 'diff -I<regex>: ignore matching file' '
+ test_when_finished "git rm -f file1" &&
+ test_seq 50 >file1 &&
+ git add file1 &&
+ test_seq 50 | sed -e "s/13/ten and three/" -e "s/^[124-9].*/& /" >file1 &&
+
+ : >actual &&
+ git diff --raw --ignore-blank-lines -I"ten.*e" -I"^[124-9]" >>actual &&
+ git diff --name-only --ignore-blank-lines -I"ten.*e" -I"^[124-9]" >>actual &&
+ git diff --name-status --ignore-blank-lines -I"ten.*e" -I"^[124-9]" >>actual &&
+ test_grep ! "file1" actual
+'
+
# check_prefix <patch> <src> <dst>
# check only lines with paths to avoid dependency on exact oid/contents
check_prefix () {
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 52e3e476ff..9de7f73f42 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -11,12 +11,8 @@ test_description='Test special whitespace in diff engine.
. "$TEST_DIRECTORY"/lib-diff.sh
for opt_res in --patch --quiet -s --stat --shortstat --dirstat=lines \
- --raw! --name-only! --name-status!
+ --raw --name-only --name-status
do
- opts=${opt_res%!} expect_failure=
- test "$opts" = "$opt_res" ||
- expect_failure="test_expect_code 1"
-
test_expect_success "status with $opts (different)" '
echo foo >x &&
git add x &&
@@ -43,7 +39,7 @@ do
echo foo >x &&
git add x &&
echo " foo" >x &&
- $expect_failure git diff -w $opts --exit-code x
+ git diff -w $opts --exit-code x
'
done
diff --git a/xdiff-interface.h b/xdiff-interface.h
index 1ed430b622..dfc55daddf 100644
--- a/xdiff-interface.h
+++ b/xdiff-interface.h
@@ -28,9 +28,9 @@
* from an error internal to xdiff, xdiff itself will see that
* non-zero return and translate it to -1.
*
- * See "diff_grep" in diffcore-pickaxe.c for a trick to work around
- * this, i.e. using the "consume_callback_data" to note the desired
- * early return.
+ * See "diff_grep" in diffcore-pickaxe.c and "quick_consume" in diff.c
+ * for a trick to work around this, i.e. using the "consume_callback_data"
+ * to note the desired early return.
*/
typedef int (*xdiff_emit_line_fn)(void *, char *, unsigned long);
typedef void (*xdiff_emit_hunk_fn)(void *data,
--
2.39.5 (Apple Git-154)
^ permalink raw reply related [flat|nested] 35+ messages in thread
* Re: [PATCH v5] diff: ensure consistent diff behavior with ignore options
2025-08-08 3:30 ` [PATCH v5] " Lidong Yan
@ 2025-10-16 14:55 ` Johannes Schindelin
0 siblings, 0 replies; 35+ messages in thread
From: Johannes Schindelin @ 2025-10-16 14:55 UTC (permalink / raw)
To: Lidong Yan; +Cc: git, gitster, hi, michal, peff, 李博文
[-- Attachment #1: Type: text/plain, Size: 2120 bytes --]
Hi,
On Fri, 8 Aug 2025, Lidong Yan wrote:
> @@ -6778,8 +6816,15 @@ void diff_flush(struct diff_options *options)
> DIFF_FORMAT_CHECKDIFF)) {
> for (i = 0; i < q->nr; i++) {
> struct diff_filepair *p = q->queue[i];
> - if (check_pair_status(p))
> - flush_one_pair(p, options);
> +
> + if (!check_pair_status(p))
> + continue;
> +
> + if (options->flags.diff_from_contents &&
> + !diff_flush_patch_quietly(p, options))
> + continue;
> +
> + flush_one_pair(p, options);
> }
> separator++;
> }
I'll play Ἑρμῆς here and relay the bug report from
https://github.com/git/git/commit/77f8e1002be5c736e064ed0e656ba51c82d85b5d#r168025843:
fcharlie wrote:
This code will cause git diff --patch --raw to produce garbled output
under certain conditions (e.g., using --ignore-space-change and whitespace
changes). flush_one_pair will output --raw format, but because
diff_flush_patch_quietly has already output a patch, garbled output will
be produced (actually, no diff_flush_patch_quietly should be called here).
[...]
A few exchanges later, brandb97 (AKA Lidong Yan) suggested this patch:
diff --git a/diff.c b/diff.c
index 87fa16b730..4baf9b535e 100644
--- a/diff.c
+++ b/diff.c
@@ -1351,6 +1351,9 @@ static void emit_diff_symbol_from_struct(struct diff_options *o,
int len = eds->len;
unsigned flags = eds->flags;
+ if (o->dry_run)
+ return;
+
switch (s) {
case DIFF_SYMBOL_NO_LF_EOF:
context = diff_get_color_opt(o, DIFF_CONTEXT);
@@ -4420,7 +4423,7 @@ static void run_external_diff(const struct external_diff *pgm,
{
struct child_process cmd = CHILD_PROCESS_INIT;
struct diff_queue_struct *q = &diff_queued_diff;
- int quiet = !(o->output_format & DIFF_FORMAT_PATCH);
+ int quiet = !(o->output_format & DIFF_FORMAT_PATCH) || o->dry_run;
int rc;
/*
fcharlie then responded by promising to test that patch tomorrow.
I will hold off from releasing Git for Windows v2.51.1 to await the result
of this test.
Ciao,
Johannes
^ permalink raw reply [flat|nested] 35+ messages in thread
end of thread, other threads:[~2025-10-16 14:55 UTC | newest]
Thread overview: 35+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-23 5:47 git-diff: --ignore-matching-lines has no effect on the output when --name-only is used hi
2025-07-23 8:00 ` Lidong Yan
2025-07-23 17:09 ` Junio C Hamano
2025-07-24 1:56 ` Lidong Yan
2025-07-24 2:16 ` Eric Sunshine
2025-07-24 3:38 ` Lidong Yan
2025-07-25 6:00 ` hi
2025-07-25 6:06 ` hi
2025-07-25 6:46 ` Lidong Yan
2025-07-25 8:08 ` hi
2025-07-25 11:11 ` Jeff King
2025-07-25 15:20 ` Junio C Hamano
2025-07-29 8:18 ` [PATCH] diff: ensure consistent diff behavior with -I<regex> across output formats Lidong Yan
2025-07-30 0:28 ` Junio C Hamano
2025-08-02 10:22 ` Jeff King
2025-08-03 8:42 ` Lidong Yan
2025-08-03 15:43 ` Junio C Hamano
2025-08-04 4:39 ` Junio C Hamano
2025-08-04 12:42 ` Jeff King
2025-08-03 14:51 ` [PATCH v2] " Lidong Yan
2025-08-04 0:39 ` Junio C Hamano
2025-08-04 1:56 ` Lidong Yan
2025-08-04 4:36 ` Junio C Hamano
2025-08-05 9:23 ` Lidong Yan
2025-08-05 16:11 ` Junio C Hamano
2025-08-06 12:33 ` [PATCH v3] diff: ensure consistent diff behavior with ignore options Lidong Yan
2025-08-06 17:35 ` Junio C Hamano
2025-08-07 1:23 ` Lidong Yan
2025-08-06 20:56 ` Junio C Hamano
2025-08-07 1:39 ` Lidong Yan
2025-08-07 2:06 ` [PATCH v4] " Lidong Yan
2025-08-07 21:27 ` Junio C Hamano
2025-08-08 1:46 ` Lidong Yan
2025-08-08 3:30 ` [PATCH v5] " Lidong Yan
2025-10-16 14:55 ` Johannes Schindelin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).