git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: "Yee Cheng Chin via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org,  Yee Cheng Chin <ychin.git@gmail.com>
Subject: Re: [PATCH] xdiff: optimize patience diff's LCS search
Date: Wed, 26 Nov 2025 10:50:30 -0800	[thread overview]
Message-ID: <xmqqy0nsmxvt.fsf@gitster.g> (raw)
In-Reply-To: <pull.2109.git.git.1764152756908.gitgitgadget@gmail.com> (Yee Cheng Chin via GitGitGadget's message of "Wed, 26 Nov 2025 10:25:56 +0000")

"Yee Cheng Chin via GitGitGadget" <gitgitgadget@gmail.com> writes:

> The find_longest_common_sequence() function in patience diff is
> inefficient as it calls binary_search() for every unique line it
> encounters when deciding where to put it in the sequence. From
> instrumentation (using xctrace) on popular repositories, binary_search()
> takes up 50-60% of the run time within patience_diff() when performing a
> diff.
>
> To optimize this, add a boundary condition check before binary_search()
> is called to see if the encountered unique line is located after the
> entire currently tracked longest subsequence. If so, skip the
> unnecessary binary search and simply append the entry to the end of
> sequence. Given that most files compared in a diff are usually quite
> similar to each other, this condition is very common, and should be hit
> much more frequently than the binary search.

This is a "stupid and obvious" optimization that is quite clever ;-)

> diff --git a/xdiff/xpatience.c b/xdiff/xpatience.c
> index 669b653580..13ab0d591c 100644
> --- a/xdiff/xpatience.c
> +++ b/xdiff/xpatience.c
> @@ -211,7 +211,10 @@ static int find_longest_common_sequence(struct hashmap *map, struct entry **res)
>  	for (entry = map->first; entry; entry = entry->next) {
>  		if (!entry->line2 || entry->line2 == NON_UNIQUE)
>  			continue;
> -		i = binary_search(sequence, longest, entry);
> +		if (longest == 0 || entry->line2 > sequence[longest - 1]->line2)
> +			i = longest - 1;
> +		else
> +			i = binary_search(sequence, longest, entry);

OK.  If we have nothing, or if the thing sorts after the existing
ones, then we do not have to run binsearch to find where to insert
it.  We know we want to append.

Nice.

  reply	other threads:[~2025-11-26 18:50 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-11-26 10:25 [PATCH] xdiff: optimize patience diff's LCS search Yee Cheng Chin via GitGitGadget
2025-11-26 18:50 ` Junio C Hamano [this message]
2025-11-26 20:15   ` Yee Cheng Chin
2025-11-27  2:16 ` [PATCH v2] " Yee Cheng Chin via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqy0nsmxvt.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=ychin.git@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).