git.vger.kernel.org archive mirror
From: Junio C Hamano <gitster@pobox.com>
To: Lidong Yan <yldhome2d2@gmail.com>
Cc: git@vger.kernel.org,  hi@arnes.space,  michal@isc.org,  peff@peff.net
Subject: Re: [PATCH v3] diff: ensure consistent diff behavior with ignore options
Date: Wed, 06 Aug 2025 10:35:17 -0700
Message-ID: <xmqqikj0uze2.fsf@gitster.g>
In-Reply-To: <20250806123306.25532-1-yldhome2d2@gmail.com> (Lidong Yan's message of "Wed, 6 Aug 2025 20:33:06 +0800")

Lidong Yan <yldhome2d2@gmail.com> writes:

> In git-diff, options like `-w` and `-I<regex>` require comparing
> file contents to determine whether two files are the same, even when
> their SHA values differ. For options like `--raw`, `--name-status`,
> and `--name-only`, git-diff deliberately compares only the SHA values
> to determine whether two files are the same, for performance reasons.
> As a result, a file shown in `git diff --name-status` may not appear
> in `git diff --patch`.
>
> To quickly determine whether two files are identical, Add helper
> function diff_flush_patch_quiet() in diff.c. Add `.diff_optimize`
> field in `struct diff_options`. When `.diff_optimize` is set to
> `DIFF_OPT_DRY_RUN`, builtin_diff() will return immediately upon
> detecting any change. Call diff_flush_patch_quiet() to determine
> if we should flush `--raw`, `--name-only` or `--name-status` output.
>
> Signed-off-by: Jeff King <peff@peff.net>
> Signed-off-by: Lidong Yan <yldhome2d2@gmail.com>
> ---
>  diff.c                     | 67 +++++++++++++++++++++++++++++---------
>  diff.h                     |  5 +++
>  t/t4013-diff-various.sh    | 14 ++++++++
>  t/t4015-diff-whitespace.sh |  2 +-
>  xdiff-interface.h          |  6 ++--
>  5 files changed, 74 insertions(+), 20 deletions(-)

The code looks much easier to reason about than the previous rounds.

A few comments about the design.

 - Are there other possible values that might fit in this "optimize"
   member, and what kind of behaviour would they trigger, that we
   can envision?  I cannot think of any, and that is why the "enum
   diff_optimize" member in the diff_options structure smells more
   like a "bool dry_run".

   By the way, giving a member "diff_" prefix when the enclosing
   struct is clearly about "diff" by having a name "diff_options" is
   often a waste of readers' time.

 - It is unclear why the dry-run needs to imply 0-line context.

 - diff_flush_patch_quietly() would be a better name for
   diff_flush_patch_quiet().

On to the details.

> diff --git a/diff.c b/diff.c
> index dca87e164f..5254ef9373 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -2444,6 +2444,15 @@ static int fn_out_consume(void *priv, char *line, unsigned long len)
>  	return 0;
>  }
>  
> +static int quick_consume(void *priv, char *line, unsigned long len)
> +{
> +	struct emit_callback *ecbdata = priv;
> +	struct diff_options *o = ecbdata->opt;
> +
> +	o->found_changes = 1;
> +	return 1;
> +}

OK, as a non-zero return value from consume callbacks is supposed
to signal an error and causes xdiff_outf() to return early, this
serves as a short-cut.  One downside is that we cannot truly notice
and signal an error to our callers, as we will see in a later hunk.
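
A minimal sketch of the short-cut described above, not the actual
xdiff code.  Every name here (consume_fn, quick_state,
quick_consume_sketch, feed_lines) is hypothetical; the only thing
taken from the patch is the idea that the callback sets a flag and
returns non-zero so that the driver stops at the first line it is
given.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical consume callback type: a non-zero return means "stop". */
typedef int (*consume_fn)(void *priv, const char *line, size_t len);

/* Hypothetical per-run state; lines_seen is only here for illustration. */
struct quick_state {
	int found_changes;
	int lines_seen;
};

/* Records that some change exists and asks the driver to stop. */
static int quick_consume_sketch(void *priv, const char *line, size_t len)
{
	struct quick_state *st = priv;

	(void)line;
	(void)len;
	st->found_changes = 1;
	st->lines_seen++;
	return 1;
}

/*
 * Hypothetical driver: hands each line to the callback and returns -1
 * as soon as the callback returns non-zero, or 0 once every line has
 * been consumed.  The caller cannot tell "stopped on purpose" from
 * "callback failed" by looking at the return value alone.
 */
static int feed_lines(const char **lines, size_t nr, consume_fn fn, void *priv)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (fn(priv, lines[i], strlen(lines[i])))
			return -1;
	return 0;
}
```

With three output lines queued, the driver in this sketch returns -1
after handing over only the first one, and found_changes is set.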

> @@ -3709,6 +3718,7 @@ static void builtin_diff(const char *name_a,
>  		xdemitconf_t xecfg;
>  		struct emit_callback ecbdata;
>  		const struct userdiff_funcname *pe;
> +		int dry_run = o->diff_optimize == DIFF_OPT_DRY_RUN;

And this screams that a Boolean o->dry_run may be sufficient, but I
could be missing obvious future enhancement opportunities.
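
To make the comparison concrete, here is a small sketch of the two
shapes.  The struct and enum names are hypothetical stand-ins, not
the real "struct diff_options", and the enumerators other than the
dry-run one are assumptions, since the quoted hunk shows only
DIFF_OPT_DRY_RUN.

```c
#include <stdbool.h>

/* Assumed shape of the enum-based member; only the dry-run value is known. */
enum optimize_sketch {
	SKETCH_OPT_NONE = 0,
	SKETCH_OPT_DRY_RUN
};

struct options_with_enum {
	enum optimize_sketch diff_optimize;
};

/* The alternative: a plain Boolean, without a redundant "diff_" prefix. */
struct options_with_bool {
	bool dry_run;
};

/* With the enum, every user has to spell out the comparison. */
static bool is_dry_run_enum(const struct options_with_enum *o)
{
	return o->diff_optimize == SKETCH_OPT_DRY_RUN;
}

/* With the Boolean, the member can be tested directly. */
static bool is_dry_run_bool(const struct options_with_bool *o)
{
	return o->dry_run;
}
```

As long as dry-run is the only mode, the two carry the same
information, and the Boolean reads more directly at each call site.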

> @@ -3741,8 +3751,8 @@ static void builtin_diff(const char *name_a,
>  		xpp.ignore_regex_nr = o->ignore_regex_nr;
>  		xpp.anchors = o->anchors;
>  		xpp.anchors_nr = o->anchors_nr;
> -		xecfg.ctxlen = o->context;
> -		xecfg.interhunkctxlen = o->interhunkcontext;
> +		xecfg.ctxlen = dry_run ? 0 : o->context;
> +		xecfg.interhunkctxlen = dry_run ? 0 : o->interhunkcontext;

Unclear why.  I think you had a similar change with a comment ...

> @@ -3750,7 +3760,8 @@ static void builtin_diff(const char *name_a,
>  			xdiff_set_find_func(&xecfg, pe->pattern, pe->cflags);
>  
>  		diffopts = getenv("GIT_DIFF_OPTS");
> -		if (!diffopts)
> +		/* ignore ctxlen if we are in dry run mode */

... here, but this comment is totally useless.

> +		if (!diffopts || dry_run)
>  			;
>  		else if (skip_prefix(diffopts, "--unified=", &v))
>  			xecfg.ctxlen = strtoul(v, NULL, 10);

Anybody who can read the code can tell that dry_run disables the
xecfg.ctxlen handling we have below.  What the code does not tell,
and hence what the author of a patch must help the readers with, is
*why* ignoring patch context is safe, correct, necessary, and
desirable when the main non-dry-run invocation of the diff machinery,
which this dry-run mode is trying to help, may use some context
lines.

It does not have to be done in an in-code comment.  Especially because
the consequence of the design decision to "ignore context" appears
in two different places, the proposed log message would be a better
place to explain why it is a safe, correct, necessary and desirable
thing to do.

> -		if (xdi_diff_outf(&mf1, &mf2, NULL, fn_out_consume,
> -				  &ecbdata, &xpp, &xecfg))
> +		if (dry_run)
> +			xdi_diff_outf(&mf1, &mf2, NULL, quick_consume,
> +				      &ecbdata, &xpp, &xecfg);
> +		else if (xdi_diff_outf(&mf1, &mf2, NULL, fn_out_consume,
> +				       &ecbdata, &xpp, &xecfg))
>  			die("unable to generate diff for %s", one->path);

And this is the consequence of xdi_diff_outf() not having an
extensible way to report different "abnormal" conditions to its
caller.

In the normal case, all exceptional/abnormal conditions are errors.
But in the dry-run case, "abnormal return is an error and we should
die" would not fit our purpose.  "Assume that non-zero 'success'
return is because our quick_consume() is signalling that it found
what it needs to find, and that is not an error" is what is going on
here.
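
The interpretation above can be sketched as follows.  This is an
illustration under assumptions, not the code in diff.c: run_state,
engine_fn, the engine_* functions and the outcome names are all
hypothetical, and the engine functions merely stand in for a call
like xdi_diff_outf() that returns non-zero on any abnormal exit.

```c
/* Hypothetical result of one invocation, as seen by the caller. */
enum outcome {
	OUTCOME_DONE,      /* normal run finished without error */
	OUTCOME_ERROR,     /* normal run: non-zero return is fatal */
	OUTCOME_CHANGED,   /* dry run: the callback saw a change */
	OUTCOME_UNCHANGED  /* dry run: the flag was never set */
};

struct run_state {
	int dry_run;
	int found_changes;
};

/* Stand-in for the diff engine: non-zero on any abnormal exit. */
typedef int (*engine_fn)(struct run_state *st);

/* The callback found a change and stopped the engine early. */
static int engine_stops_early(struct run_state *st)
{
	st->found_changes = 1;
	return -1;
}

/* A genuine failure: non-zero return, flag untouched. */
static int engine_fails(struct run_state *st)
{
	(void)st;
	return -1;
}

/* Ran to completion without producing any output. */
static int engine_clean(struct run_state *st)
{
	(void)st;
	return 0;
}

static enum outcome interpret(struct run_state *st, engine_fn engine)
{
	int ret = engine(st);

	if (st->dry_run)
		/* The return value is ignored; only the flag is consulted. */
		return st->found_changes ? OUTCOME_CHANGED : OUTCOME_UNCHANGED;
	return ret ? OUTCOME_ERROR : OUTCOME_DONE;
}
```

In this sketch a genuine failure during a dry run is reported as
OUTCOME_UNCHANGED, because a single non-zero return value cannot
carry both "stopped on purpose" and "failed".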

I'll stop here for now.  Will continue later.

Thanks.
