All of lore.kernel.org
 help / color / mirror / Atom feed
From: Phillip Wood <phillip.wood123@gmail.com>
To: Niels Glodny <n.glodny@campus.lmu.de>, git@vger.kernel.org
Cc: johannes.schindelin@gmx.de, peff@peff.net, phillip.wood@dunelm.org.uk
Subject: Re: [PATCH v2] xdiff: disable cleanup_records heuristic with --minimal
Date: Tue, 29 Apr 2025 10:00:32 +0100	[thread overview]
Message-ID: <10aed56b-8036-4458-97ad-7fa319b18f10@gmail.com> (raw)
In-Reply-To: <20250427220653.2325573-1-n.glodny@campus.lmu.de>

Hi Niels

On 27/04/2025 23:06, Niels Glodny wrote:
> The cleanup_records function marks some lines as changed before running
> the actual diff algorithm. For most lines, this is a good performance
> optimization, but it also marks lines that are surrounded by many
> changed lines as changed as well. This can cause redundant changes and
> longer-than-necessary diffs.
> 
> Whether this results in better-looking diffs is subjective. However, the
> --minimal flag explicitly requests the shortest possible diff.
> 
> A performance impact of this is not measurable, and it results in
> shorter diffs in about 1.3% of diffs in Git's history.

Thanks for re-rolling, the changes all look good to me. As Junio said it 
would be helpful to put the performance numbers in the commit message.

Thanks

Phillip

> Signed-off-by: Niels Glodny <n.glodny@campus.lmu.de>
> ---
>   t/meson.build           |  1 +
>   t/t4071-diff-minimal.sh | 14 ++++++++++++++
>   xdiff/xprepare.c        |  5 +++--
>   3 files changed, 18 insertions(+), 2 deletions(-)
>   create mode 100755 t/t4071-diff-minimal.sh
> 
> diff --git a/t/meson.build b/t/meson.build
> index bfb744e886..8f2e9d2c50 100644
> --- a/t/meson.build
> +++ b/t/meson.build
> @@ -501,6 +501,7 @@ integration_tests = [
>     't4068-diff-symmetric-merge-base.sh',
>     't4069-remerge-diff.sh',
>     't4070-diff-pairs.sh',
> +  't4071-diff-minimal.sh',
>     't4100-apply-stat.sh',
>     't4101-apply-nonl.sh',
>     't4102-apply-rename.sh',
> diff --git a/t/t4071-diff-minimal.sh b/t/t4071-diff-minimal.sh
> new file mode 100755
> index 0000000000..4c484dadfb
> --- /dev/null
> +++ b/t/t4071-diff-minimal.sh
> @@ -0,0 +1,14 @@
> +#!/bin/sh
> +
> +test_description='minimal diff algorithm'
> +
> +. ./test-lib.sh
> +
> +test_expect_success 'minimal diff should not mark changes between changed lines' '
> +	test_write_lines x x x x >pre &&
> +	test_write_lines x x x A B C D x E F G >post &&
> +	test_expect_code 1 git diff --no-index --minimal pre post >diff &&
> +	test_grep ! ^[+-]x diff
> +'
> +
> +test_done
> diff --git a/xdiff/xprepare.c b/xdiff/xprepare.c
> index c84549f6c5..e1d4017b2d 100644
> --- a/xdiff/xprepare.c
> +++ b/xdiff/xprepare.c
> @@ -368,6 +368,7 @@ static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xd
>   	xrecord_t **recs;
>   	xdlclass_t *rcrec;
>   	char *dis, *dis1, *dis2;
> +	int need_min = !!(cf->flags & XDF_NEED_MINIMAL);
>   
>   	if (!XDL_CALLOC_ARRAY(dis, xdf1->nrec + xdf2->nrec + 2))
>   		return -1;
> @@ -379,7 +380,7 @@ static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xd
>   	for (i = xdf1->dstart, recs = &xdf1->recs[xdf1->dstart]; i <= xdf1->dend; i++, recs++) {
>   		rcrec = cf->rcrecs[(*recs)->ha];
>   		nm = rcrec ? rcrec->len2 : 0;
> -		dis1[i] = (nm == 0) ? 0: (nm >= mlim) ? 2: 1;
> +		dis1[i] = (nm == 0) ? 0: (nm >= mlim && !need_min) ? 2: 1;
>   	}
>   
>   	if ((mlim = xdl_bogosqrt(xdf2->nrec)) > XDL_MAX_EQLIMIT)
> @@ -387,7 +388,7 @@ static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xd
>   	for (i = xdf2->dstart, recs = &xdf2->recs[xdf2->dstart]; i <= xdf2->dend; i++, recs++) {
>   		rcrec = cf->rcrecs[(*recs)->ha];
>   		nm = rcrec ? rcrec->len1 : 0;
> -		dis2[i] = (nm == 0) ? 0: (nm >= mlim) ? 2: 1;
> +		dis2[i] = (nm == 0) ? 0: (nm >= mlim && !need_min) ? 2: 1;
>   	}
>   
>   	for (nreff = 0, i = xdf1->dstart, recs = &xdf1->recs[xdf1->dstart];
> 
> base-commit: f65182a99e545d2f2bc22e6c1c2da192133b16a3


  reply	other threads:[~2025-04-29  9:00 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-25 15:59 [PATCH] xdiff: disable cleanup_records heuristic with --minimal Niels Glodny
2025-04-27 15:04 ` Phillip Wood
2025-04-27 21:44   ` Niels Glodny
2025-04-28 17:05     ` Junio C Hamano
2025-04-27 22:06 ` [PATCH v2] " Niels Glodny
2025-04-29  9:00   ` Phillip Wood [this message]
2025-04-29 14:09 ` [PATCH v3] " Niels Glodny
2025-05-06 13:21   ` Phillip Wood
2025-05-06 22:50     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=10aed56b-8036-4458-97ad-7fa319b18f10@gmail.com \
    --to=phillip.wood123@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=johannes.schindelin@gmx.de \
    --cc=n.glodny@campus.lmu.de \
    --cc=peff@peff.net \
    --cc=phillip.wood@dunelm.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.