git.vger.kernel.org archive mirror
From: Junio C Hamano <gitster@pobox.com>
To: "Antonin Delpeuch via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org,  Elijah Newren <newren@gmail.com>,
	 Antonin Delpeuch <antonin@delpeuch.eu>
Subject: Re: [PATCH] blame: make diff algorithm configurable
Date: Mon, 20 Oct 2025 09:05:57 -0700
Message-ID: <xmqqldl51rtm.fsf@gitster.g>
In-Reply-To: <pull.2075.git.git.1760972162827.gitgitgadget@gmail.com> (Antonin Delpeuch via GitGitGadget's message of "Mon, 20 Oct 2025 14:56:02 +0000")

"Antonin Delpeuch via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Antonin Delpeuch <antonin@delpeuch.eu>
>
> The diff algorithm used in 'git-blame(1)' can be configured using the
> `--diff-algorithm` option or the `diff.algorithm` config variable.
> Myers diff remains the default.

The usual way to compose a log message of this project is to

 - Give an observation on how the current system works in the
   present tense (so no need to say "Currently X is Y", or
   "Previously X was Y" to describe the state before your change;
   just "X is Y" is enough), and discuss what you perceive as a
   problem in it.

 - Propose a solution (optional---often, problem description
   trivially leads to an obvious solution in readers' minds).

 - Give commands to somebody editing the codebase to "make it so",
   instead of saying "This commit does X".

in this order.  This hasn't changed since your first commit to this
project a few years ago.

And when read with that expectation, I was surprised that "blame"
already paid attention to the command line option and configuration
variable, as that paragraph was supposed to explain what happens
without the patch being proposed.  It was a pleasant surprise that
turned out to be untrue X-<.

> Signed-off-by: Antonin Delpeuch <antonin@delpeuch.eu>
> ---
>     blame: make diff algorithm configurable
>     
>     There has been long-standing interest in changing the default diff
>     algorithm to "histogram", and Git 3.0 was floated as a possible occasion
>     for that: https://lore.kernel.org/git/xmqqed873vgn.fsf@gitster.g/
>     
>     As a preparation, it is worth making sure that the diff algorithm is
>     configurable where useful. It can have significant impact on the output
>     of the git-blame command, so I propose to make it configurable there
>     too. I have followed the convention of other commands (such as git-diff)
>     to introduce a --diff-algorithm option.

All of the above are good materials to be in the proposed log
message, not under the three-dash line, to explain the motivation
behind the change.

>     I understand that this command is a user-facing (porcelain) one, so I
>     think making it honor the diff.algorithm UI config variable is also
>     appropriate. The git-blame command has a machine-readable format that
>     can be enabled with --porcelain (which should be called --plumbing if
>     you ask me) so I wonder if the diff.algorithm variable should still be
>     honored in this case, as there could be the desire to keep it
>     independent from UI config variables (similarly to git-merge-file, a
>     plumbing command which doesn't honor diff.algorithm).

Good consideration and something we should make sure we do the right
thing for our users.  Personally, I would not be concerned---the
only folks who could possibly be affected are those who save old
blame output and want a fresh "git blame" run they make today to
produce bit-for-bit identical output, but if they use newer versions
of Git with an improved xdiff implementation, they cannot expect that
with or without the configuration knob _anyway_.  This is just my
personal opinion.  Others may differ.

>     If the general idea of this patch is judged worthwhile, I would be happy
>     to add tests to demonstrate the impact of the diff algorithm on blame
>     output.

Do not ever say this here.

I've seen from time to time people ask "I am thinking of doing this;
will a patch be accepted?  If so, I'll work on it." before showing
any work, and my response always has been:

 (1) We don't know how useful and interesting your contribution would
     be for our audience, until we see it; and

 (2) If you truly believe in your work (find it useful, find writing
     it fun, etc.), that would be incentive enough for you to work
     on it, whether or not the result will land in my tree.  You
     should instead aim for something so brilliant that we would
     come to you begging for your permission to include it in our
     project.

> diff --git a/Documentation/git-blame.adoc b/Documentation/git-blame.adoc
> index e438d28625..4beb2df551 100644
> --- a/Documentation/git-blame.adoc
> +++ b/Documentation/git-blame.adoc
> @@ -85,6 +85,27 @@ include::blame-options.adoc[]
>  	Ignore whitespace when comparing the parent's version and
>  	the child's to find where the lines came from.
>  
> +`--diff-algorithm=(patience|minimal|histogram|myers)`::
> +	Choose a diff algorithm. The variants are as follows:
> ++
> +--
> +   `default`;;
> +   `myers`;;
> +	The basic greedy diff algorithm. Currently, this is the default.
> +   `minimal`;;
> +	Spend extra time to make sure the smallest possible diff is
> +	produced.
> +   `patience`;;
> +	Use "patience diff" algorithm when generating patches.
> +   `histogram`;;
> +	This algorithm extends the patience algorithm to "support
> +	low-occurrence common elements".
> +--
> ++
> +For instance, if you configured the `diff.algorithm` variable to a
> +non-default value and want to use the default one, then you
> +have to use `--diff-algorithm=default` option.

Is this copied from somewhere else, or did you come up with the
above text yourself?  If the former, perhaps it is a good idea to
reduce the duplication.  Use of "include::line-range-format.adoc[]"
in Documentation/blame-options.adoc (which in turn is included by
Documentation/git-blame.adoc) may serve as a good model to include
the same text in multiple places (the "line-range" syntax thing is
included directly or indirectly and its text appears in a handful of
places as the result).  Copy the original text out into a new file
to be included (say, "diff-algorithm-option.adoc"), replace the
original text with "include::diff-algorithm-option.adoc[]", and then
add another "include::diff-algorithm-option.adoc[]" here in
git-blame documentation instead of duplicating the text like the
above hunk does.

> diff --git a/builtin/blame.c b/builtin/blame.c
> index 2703820258..177b606e81 100644
> --- a/builtin/blame.c
> +++ b/builtin/blame.c
> @@ -779,6 +779,19 @@ static int git_blame_config(const char *var, const char *value,
>  		}
>  	}
>  
> +	if (!strcmp(var, "diff.algorithm")) {
> +		long diff_algorithm;
> +		if (!value)
> +			return config_error_nonbool(var);
> +		diff_algorithm = parse_algorithm_value(value);
> +		if (diff_algorithm < 0)
> +			return error(_("unknown value for config '%s': %s"),
> +				     var, value);

OK, this message is copied from git_diff_ui_config(), which is where
"git log" and 4 commands in the "git diff" family get their error
message when "git -c diff.algorithm=bogus <cmd>" is run.  It is a
bit suboptimal, but users would know how to read the documentation
(even though in practice they never do), so let's say this is OK, at
least for now.

    For future reference (note: this is a #leftoverbits comment that is
    left here for the benefit of those who scan the list archive for
    ideas on what to do when they are absolutely bored without anything
    interesting to do, not meant as a suggestion to do anything of this
    sort before this patch lands), in addition to "git log" and 4
    commands in the "git diff" family,

     - merge-ort.c has the same message.
     - builtin/merge-file.c gives a bit nicer message but that is a bit
       of maintenance burden.
     - curiously "git log" and four commands in the "git diff" family
       give a much nicer message when a --diff-algorithm=bogus is given
       from the command line, but not in the configuration file.

    we may want to consolidate the error message into one place (a
    constant or "extern const char *diff_algorithm_error_message",
    or something else that is i18n friendly) and use it from all these
    places I just identified.
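
To illustrate the consolidation idea above, here is a minimal,
self-contained sketch.  It is not code from git.git: the flag values,
the sketch_* helpers and the message wording are all assumptions made
for illustration, and the parser is a simplified stand-in for
parse_algorithm_value().  Only the name diff_algorithm_error_message
comes from the suggestion above.  In the real tree the string would
also need to be marked for translation, which this sketch leaves out.

```c
#include <stdio.h>
#include <string.h>

/* Stand-in flag values for this sketch; Git defines the real ones in xdiff/xdiff.h. */
#define SKETCH_MYERS     0L
#define SKETCH_MINIMAL   (1L << 0)
#define SKETCH_PATIENCE  (1L << 14)
#define SKETCH_HISTOGRAM (1L << 15)

/* Hypothetical shared message: defined once, used by every caller. */
const char *diff_algorithm_error_message =
	"unknown diff algorithm '%s'; expected \"myers\", \"minimal\", "
	"\"patience\" or \"histogram\"";

/* Simplified stand-in for parse_algorithm_value(); negative means "unknown". */
static long sketch_parse_algorithm_value(const char *value)
{
	if (!value)
		return -1;
	if (!strcmp(value, "myers") || !strcmp(value, "default"))
		return SKETCH_MYERS;
	if (!strcmp(value, "minimal"))
		return SKETCH_MINIMAL;
	if (!strcmp(value, "patience"))
		return SKETCH_PATIENCE;
	if (!strcmp(value, "histogram"))
		return SKETCH_HISTOGRAM;
	return -1;
}

/* Hypothetical helper: format the shared message for a rejected value. */
static int sketch_format_algorithm_error(char *buf, size_t len, const char *value)
{
	return snprintf(buf, len, diff_algorithm_error_message, value);
}
```

With something along these lines, the configuration callback and the
command line option callback could both report a rejected value with
the same text, instead of each carrying its own copy.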

> +		xdl_opts &= ~XDF_DIFF_ALGORITHM_MASK;
> +		xdl_opts |= diff_algorithm;
> +		return 0;
> +	}

>  	if (git_diff_heuristic_config(var, value, cb) < 0)
>  		return -1;

This one relies on git_diff_heuristic_config() to give a message when
it returns negative, so we do not have to do anything.  OK.

    Continuation of the above #leftoverbits may be to see if
    parse_algorithm_value() is a good place to consolidate the error
    message, after auditing all its callers (if such a change turns
    out to be a good idea, they need to lose their own messages).

> @@ -824,6 +837,26 @@ static int blame_move_callback(const struct option *option, const char *arg, int
>  	return 0;
>  }
>  
> +static int blame_diff_algorithm_callback(const struct option *option,
> +					 const char *arg, int unset)
> +{
> +	int *opt = option->value;
> +	long value = parse_algorithm_value(arg);
> +
> +	BUG_ON_OPT_NEG(unset);
> +
> +	if (value < 0)
> +		return error(_("option diff-algorithm accepts \"myers\", "
> +			       "\"minimal\", \"patience\" and \"histogram\""));

You inherited the same "config error gets a message that requires
users to consult the manual, option error gets something a bit more
useful but is a maintenance burden" trait from "git diff" and family
here.  Let's say this is OK, too, at least for now.

> +	// ignore any previous --minimal setting, following git-diff's behavior

We do not do // comments around here, outside borrowed code.
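
As a minimal sketch of what that would look like, the same statements
with a block comment are shown below.  The macro values are
illustrative stand-ins for the definitions in xdiff/xdiff.h and should
be treated as assumptions, and sketch_set_diff_algorithm() is a
hypothetical wrapper used only to keep the example self-contained; the
patch does this inline in its option callback.

```c
/* Illustrative stand-ins for the xdiff/xdiff.h flags; values are assumptions. */
#define XDF_NEED_MINIMAL        (1 << 0)
#define XDF_PATIENCE_DIFF       (1 << 14)
#define XDF_HISTOGRAM_DIFF      (1 << 15)
#define XDF_DIFF_ALGORITHM_MASK (XDF_PATIENCE_DIFF | XDF_HISTOGRAM_DIFF)

/* Hypothetical wrapper around the three statements in the quoted hunk. */
static void sketch_set_diff_algorithm(int *opt, long value)
{
	/* ignore any previous --minimal setting, following git-diff's behavior */
	*opt &= ~XDF_NEED_MINIMAL;
	/* drop whichever algorithm bits were set before, then set the new one */
	*opt &= ~XDF_DIFF_ALGORITHM_MASK;
	*opt |= value;
}
```

Bits outside the two cleared masks (for example a whitespace flag set
by -w) are left alone by this sequence.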

> +	*opt &= ~XDF_NEED_MINIMAL;
> +	*opt &= ~XDF_DIFF_ALGORITHM_MASK;
> +	*opt |= value;
> +
> +	return 0;
> +}
> +
>  static int is_a_rev(const char *name)
>  {
>  	struct object_id oid;
> @@ -908,13 +941,16 @@ int cmd_blame(int argc,
>  		OPT_BIT('f', "show-name", &output_option, N_("show original filename (Default: auto)"), OUTPUT_SHOW_NAME),
>  		OPT_BIT('n', "show-number", &output_option, N_("show original linenumber (Default: off)"), OUTPUT_SHOW_NUMBER),
>  		OPT_BIT('p', "porcelain", &output_option, N_("show in a format designed for machine consumption"), OUTPUT_PORCELAIN),
> -		OPT_BIT(0, "line-porcelain", &output_option, N_("show porcelain format with per-line commit information"), OUTPUT_PORCELAIN|OUTPUT_LINE_PORCELAIN),
> +		OPT_BIT(0, "line-porcelain", &output_option, N_("show porcelain format with per-line commit information"), OUTPUT_PORCELAIN | OUTPUT_LINE_PORCELAIN),

WHY?

>  		OPT_BIT('c', NULL, &output_option, N_("use the same output mode as git-annotate (Default: off)"), OUTPUT_ANNOTATE_COMPAT),
>  		OPT_BIT('t', NULL, &output_option, N_("show raw timestamp (Default: off)"), OUTPUT_RAW_TIMESTAMP),
>  		OPT_BIT('l', NULL, &output_option, N_("show long commit SHA1 (Default: off)"), OUTPUT_LONG_OBJECT_NAME),
>  		OPT_BIT('s', NULL, &output_option, N_("suppress author name and timestamp (Default: off)"), OUTPUT_NO_AUTHOR),
>  		OPT_BIT('e', "show-email", &output_option, N_("show author email instead of name (Default: off)"), OUTPUT_SHOW_EMAIL),
>  		OPT_BIT('w', NULL, &xdl_opts, N_("ignore whitespace differences"), XDF_IGNORE_WHITESPACE),
> +		OPT_CALLBACK_F(0, "diff-algorithm", &xdl_opts, N_("<algorithm>"),
> +			       N_("choose a diff algorithm"),
> +			       PARSE_OPT_NONEG, blame_diff_algorithm_callback),

OK.

>  		OPT_STRING_LIST(0, "ignore-rev", &ignore_rev_list, N_("rev"), N_("ignore <rev> when blaming")),
>  		OPT_STRING_LIST(0, "ignore-revs-file", &ignore_revs_file_list, N_("file"), N_("ignore revisions from <file>")),
>  		OPT_BIT(0, "color-lines", &output_option, N_("color redundant metadata from previous line differently"), OUTPUT_COLOR_LINE),
>
> base-commit: 4253630c6f07a4bdcc9aa62a50e26a4d466219d1

Thread overview: 32+ messages
2025-10-20 14:56 [PATCH] blame: make diff algorithm configurable Antonin Delpeuch via GitGitGadget
2025-10-20 16:05 ` Junio C Hamano [this message]
2025-10-22  9:37   ` Antonin Delpeuch
2025-10-22 20:39     ` Junio C Hamano
2025-10-23 16:03 ` Phillip Wood
2025-10-28 13:37 ` [PATCH v2] " Antonin Delpeuch via GitGitGadget
2025-10-28 15:22   ` Junio C Hamano
2025-10-28 16:00     ` Antonin Delpeuch
2025-10-28 21:14   ` [PATCH v3] " Antonin Delpeuch via GitGitGadget
2025-10-29 10:16     ` Phillip Wood
2025-10-29 18:46       ` Junio C Hamano
2025-10-30  9:22       ` Antonin Delpeuch
2025-10-30 10:47         ` Phillip Wood
2025-11-01 21:57     ` [PATCH v4 0/2] " Antonin Delpeuch via GitGitGadget
2025-11-01 21:57       ` [PATCH v4 1/2] xdiff: add 'minimal' to XDF_DIFF_ALGORITHM_MASK Antonin Delpeuch via GitGitGadget
2025-11-03 14:32         ` Phillip Wood
2025-11-01 21:57       ` [PATCH v4 2/2] blame: make diff algorithm configurable Antonin Delpeuch via GitGitGadget
2025-11-03 14:32         ` Phillip Wood
2025-11-03 16:15           ` Junio C Hamano
2025-11-06 20:29             ` Junio C Hamano
2025-11-06 22:41       ` [PATCH v5 0/2] " Antonin Delpeuch via GitGitGadget
2025-11-06 22:41         ` [PATCH v5 1/2] xdiff: add 'minimal' to XDF_DIFF_ALGORITHM_MASK Antonin Delpeuch via GitGitGadget
2025-11-07 15:52           ` Junio C Hamano
2025-11-06 22:41         ` [PATCH v5 2/2] blame: make diff algorithm configurable Antonin Delpeuch via GitGitGadget
2025-11-07 15:57           ` Junio C Hamano
2025-11-07 15:49         ` [PATCH v5 0/2] " Phillip Wood
2025-11-17  1:12           ` Junio C Hamano
2025-11-17  8:04         ` [PATCH v6 " Antonin Delpeuch via GitGitGadget
2025-11-17  8:04           ` [PATCH v6 1/2] xdiff: add 'minimal' to XDF_DIFF_ALGORITHM_MASK Antonin Delpeuch via GitGitGadget
2025-11-17  8:04           ` [PATCH v6 2/2] blame: make diff algorithm configurable Antonin Delpeuch via GitGitGadget
2025-11-17 14:13           ` [PATCH v6 0/2] " Phillip Wood
2025-11-17 18:24           ` Junio C Hamano
