From: "Michael Montalbo via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Johannes Schindelin <Johannes.Schindelin@gmx.de>,
	Michael Montalbo <mmontalbo@gmail.com>
Subject: [PATCH v4 1/6] xdiff: support external hunks via xpparam_t
Date: Sun, 14 Jun 2026 18:59:18 +0000
Message-ID: <03f261dfe2ed68194866f286c336db358861d7d5.1781463564.git.gitgitgadget@gmail.com>
In-Reply-To: <pull.2120.v4.git.1781463564.gitgitgadget@gmail.com>

From: Michael Montalbo <mmontalbo@gmail.com>

Add two new xpparam_t fields (external_hunks, external_hunks_nr)
that let callers supply pre-computed hunks.  When set, xdl_diff()
populates the changed[] arrays from these hunks instead of running
the diff algorithm, then continues through compaction and emission
as usual.
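
A rough sketch of what "populates the changed[] arrays from these
hunks" means, using plain bool arrays rather than the real xdfenv_t.
The helper name mark_changed_from_hunks() is hypothetical and the
hunks are assumed to have been validated already:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same layout as the struct this patch adds to xdiff/xdiff.h. */
struct xdl_hunk {
	long old_start, old_count;
	long new_start, new_count;
};

/*
 * Hypothetical stand-in for the marking loop in
 * xdl_populate_hunks_from_external(): flag every line covered by a
 * hunk.  Starts are 1-based, the arrays are 0-based.
 */
void mark_changed_from_hunks(const struct xdl_hunk *hunks, size_t nr,
			     bool *changed_old, bool *changed_new)
{
	size_t i;
	long j;

	for (i = 0; i < nr; i++) {
		const struct xdl_hunk *h = &hunks[i];

		assert(h->old_start >= 1 && h->new_start >= 1);
		for (j = 0; j < h->old_count; j++)
			changed_old[h->old_start - 1 + j] = true;
		for (j = 0; j < h->new_count; j++)
			changed_new[h->new_start - 1 + j] = true;
	}
}
```

A caller of the real API would instead point xpp.external_hunks at
such an array and set xpp.external_hunks_nr before calling xdi_diff().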

Validate supplied hunks before use: reject out-of-bounds line
numbers, overlapping or out-of-order hunks, negative counts, and
violations of the synchronization invariant (unchanged line counts
must match between files).  On validation failure, fall back to
the builtin diff algorithm; this re-runs xdl_prepare_env() since
the first call may have dirtied the changed[] arrays.
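
The checks listed above can be sketched as a standalone function.
This is only an approximation under stated assumptions:
validate_hunks() is a hypothetical name, the line counts are passed
in directly instead of being read from xdfenv_t, and the sketch sums
the hunk counts instead of recounting changed[] as the patch does
(the sum cannot overflow here because bounds and ordering are checked
first).

```c
#include <stddef.h>

/* Same layout as the struct this patch adds to xdiff/xdiff.h. */
struct xdl_hunk {
	long old_start, old_count;
	long new_start, new_count;
};

/* Hypothetical validator: returns 0 if usable, -1 otherwise. */
int validate_hunks(const struct xdl_hunk *hunks, size_t nr,
		   long old_nrec, long new_nrec)
{
	size_t i;
	long prev_old_end = 0, prev_new_end = 0;
	long changed_old = 0, changed_new = 0;

	for (i = 0; i < nr; i++) {
		const struct xdl_hunk *h = &hunks[i];

		if (h->old_count < 0 || h->new_count < 0)
			return -1;	/* negative count */
		if (h->old_start < 1 || h->new_start < 1)
			return -1;	/* starts are 1-based */
		/* start + count - 1 <= nrec, arranged to avoid overflow */
		if (h->old_count > old_nrec - h->old_start + 1 ||
		    h->new_count > new_nrec - h->new_start + 1)
			return -1;	/* out of bounds */
		if (h->old_start < prev_old_end ||
		    h->new_start < prev_new_end)
			return -1;	/* overlapping or out of order */

		prev_old_end = h->old_start + h->old_count;
		prev_new_end = h->new_start + h->new_count;
		changed_old += h->old_count;
		changed_new += h->new_count;
	}

	/* Synchronization invariant: unchanged line counts must agree. */
	if (old_nrec - changed_old != new_nrec - changed_new)
		return -1;
	return 0;
}
```

For example, with a 5-line old file and a 5-line new file, a lone
hunk { 1, 1, 1, 0 } passes the per-hunk checks but fails the
invariant, since it would leave 4 unchanged old lines against 5
unchanged new lines.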

Skip trim_common_tail() in xdi_diff() when external hunks are
present, since external hunks reference line numbers in the
original content.
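
To see why trimming matters, consider the following simplified
sketch.  trimmed_size() is a hypothetical, line-oriented stand-in
for trim_common_tail(), not the real function; it only serves to
show that dropping a shared tail lowers the line count that hunk
ranges are later checked against.

```c
#include <stddef.h>

/* Count newline-terminated lines in buf[0..size). */
long count_lines(const char *buf, size_t size)
{
	long n = 0;
	size_t i;

	for (i = 0; i < size; i++)
		if (buf[i] == '\n')
			n++;
	return n;
}

/*
 * Hypothetical simplification of tail trimming: return the size of
 * `a` after dropping the complete trailing lines it shares with `b`.
 */
size_t trimmed_size(const char *a, size_t asize,
		    const char *b, size_t bsize)
{
	size_t common = 0, i;

	/* Length of the common byte suffix. */
	while (common < asize && common < bsize &&
	       a[asize - 1 - common] == b[bsize - 1 - common])
		common++;

	/* Keep everything up to the first newline inside that suffix. */
	for (i = asize - common; i < asize; i++)
		if (a[i] == '\n')
			return i + 1;
	return asize;
}
```

With old content "x\ny\ny\n" and new content "x\ny\n", an external
tool may report the deletion as old line 3.  After trimming, the old
side has only 2 lines left, so that hunk would be rejected as out of
bounds even though it is correct for the original content.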

Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
---
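The xdl_clear_changed() helper added below clears nrec + 2 flags
starting at changed - 1.  The following sketch illustrates the layout
this implies, assuming (as the helper's comment suggests) that
changed[] is allocated with one sentinel slot on each side.  The
mini_* names and the reduced struct are hypothetical, not xdiff code.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, reduced xdfile_t: only the fields used here. */
struct mini_xdfile {
	bool *changed;	/* points one slot past the allocation start */
	size_t nrec;
};

/* Allocate nrec + 2 flags so changed[-1] and changed[nrec] exist. */
int mini_init(struct mini_xdfile *xdf, size_t nrec)
{
	bool *base = calloc(nrec + 2, sizeof(bool));

	if (!base)
		return -1;
	xdf->changed = base + 1;
	xdf->nrec = nrec;
	return 0;
}

/* Same shape as xdl_clear_changed(): clear flags and both sentinels. */
void mini_clear_changed(struct mini_xdfile *xdf)
{
	memset(xdf->changed - 1, 0, (xdf->nrec + 2) * sizeof(bool));
}

/* Release the allocation, undoing the one-slot offset. */
void mini_free(struct mini_xdfile *xdf)
{
	free(xdf->changed - 1);
	xdf->changed = NULL;
}
```

Clearing only changed[0..nrec) would leave stale sentinel values
behind, which is why the memset starts one slot early and covers two
extra entries.
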
 xdiff-interface.c |   7 ++-
 xdiff/xdiff.h     |  14 ++++++
 xdiff/xdiffi.c    | 123 +++++++++++++++++++++++++++++++++++++++++++++-
 xdiff/xprepare.c  |  10 ++++
 xdiff/xprepare.h  |   1 +
 5 files changed, 152 insertions(+), 3 deletions(-)

diff --git a/xdiff-interface.c b/xdiff-interface.c
index 5ee2b96d0a..76a24fc589 100644
--- a/xdiff-interface.c
+++ b/xdiff-interface.c
@@ -124,7 +124,12 @@ int xdi_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp, xdemitconf_t co
 	if (mf1->size > MAX_XDIFF_SIZE || mf2->size > MAX_XDIFF_SIZE)
 		return -1;
 
-	if (!xecfg->ctxlen && !(xecfg->flags & XDL_EMIT_FUNCCONTEXT))
+	/*
+	 * External hunks reference line numbers in the original content;
+	 * trimming the tail would change line counts and invalidate them.
+	 */
+	if (!xpp->external_hunks &&
+	    !xecfg->ctxlen && !(xecfg->flags & XDL_EMIT_FUNCCONTEXT))
 		trim_common_tail(&a, &b);
 
 	return xdl_diff(&a, &b, xpp, xecfg, xecb);
diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
index dc370712e9..dd4915fe16 100644
--- a/xdiff/xdiff.h
+++ b/xdiff/xdiff.h
@@ -78,6 +78,16 @@ typedef struct s_mmbuffer {
 	long size;
 } mmbuffer_t;
 
+/*
+ * Hunk descriptor for externally computed diffs.
+ * Line numbers are 1-based and start must be >= 1, even when
+ * count is 0 (xdl_diff() rejects a start of 0 and falls back).
+ */
+struct xdl_hunk {
+	long old_start, old_count;
+	long new_start, new_count;
+};
+
 typedef struct s_xpparam {
 	unsigned long flags;
 
@@ -88,6 +98,10 @@ typedef struct s_xpparam {
 	/* See Documentation/diff-options.adoc. */
 	char **anchors;
 	size_t anchors_nr;
+
+	/* Externally computed hunks: bypass the diff algorithm.  Owned by caller. */
+	struct xdl_hunk *external_hunks;
+	size_t external_hunks_nr;
 } xpparam_t;
 
 typedef struct s_xdemitcb {
diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index c5a892f91e..bf820b52e3 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -1085,16 +1085,135 @@ static void xdl_mark_ignorable_regex(xdchange_t *xscr, const xdfenv_t *xe,
 	}
 }
 
+/*
+ * Populate the changed[] arrays from externally supplied hunks,
+ * bypassing the diff algorithm.  Validates that hunks are in order,
+ * non-overlapping, and within bounds.
+ *
+ * Returns 0 on success, -1 on validation failure.
+ */
+static int xdl_populate_hunks_from_external(xdfenv_t *xe,
+					    struct xdl_hunk *hunks,
+					    size_t nr_hunks)
+{
+	size_t i;
+	long j, prev_old_end = 0, prev_new_end = 0;
+	long changed_old = 0, changed_new = 0;
+
+	/*
+	 * xdl_prepare_env() may dirty changed[] via xdl_cleanup_records().
+	 * Clear them so only the external hunks are marked.
+	 */
+	xdl_clear_changed(&xe->xdf1);
+	xdl_clear_changed(&xe->xdf2);
+
+	for (i = 0; i < nr_hunks; i++) {
+		struct xdl_hunk *h = &hunks[i];
+
+		if (h->old_count < 0 || h->new_count < 0) {
+			warning("diff process hunk %"PRIuMAX": "
+				"negative count (old=%ld, new=%ld)",
+				(uintmax_t)(i + 1),
+				h->old_count, h->new_count);
+			return -1;
+		}
+		if (h->old_start < 1 || h->new_start < 1) {
+			warning("diff process hunk %"PRIuMAX": "
+				"start must be >= 1 (old=%ld, new=%ld)",
+				(uintmax_t)(i + 1),
+				h->old_start, h->new_start);
+			return -1;
+		}
+
+		/*
+		 * Range must fit: start + count - 1 <= nrec,
+		 * rewritten to avoid overflow.  Same for both sides.
+		 *
+		 * When count is 0 (pure insert/delete) the check
+		 * reduces to 0 > nrec - start + 1, which rejects
+		 * start > nrec + 1 and allows start == nrec + 1
+		 * (the position after the last line).
+		 */
+		if (h->old_count > (long)xe->xdf1.nrec - h->old_start + 1) {
+			warning("diff process hunk %"PRIuMAX": "
+				"old range %ld+%ld exceeds %lu lines",
+				(uintmax_t)(i + 1),
+				h->old_start, h->old_count,
+				(unsigned long)xe->xdf1.nrec);
+			return -1;
+		}
+		if (h->new_count > (long)xe->xdf2.nrec - h->new_start + 1) {
+			warning("diff process hunk %"PRIuMAX": "
+				"new range %ld+%ld exceeds %lu lines",
+				(uintmax_t)(i + 1),
+				h->new_start, h->new_count,
+				(unsigned long)xe->xdf2.nrec);
+			return -1;
+		}
+
+		/* Ordering: no overlap with previous hunk (adjacent is OK) */
+		if (h->old_start < prev_old_end ||
+		    h->new_start < prev_new_end) {
+			warning("diff process hunk %"PRIuMAX": "
+				"overlaps with previous hunk",
+				(uintmax_t)(i + 1));
+			return -1;
+		}
+
+		for (j = 0; j < h->old_count; j++)
+			xe->xdf1.changed[h->old_start - 1 + j] = true;
+		for (j = 0; j < h->new_count; j++)
+			xe->xdf2.changed[h->new_start - 1 + j] = true;
+
+		prev_old_end = h->old_start + h->old_count;
+		prev_new_end = h->new_start + h->new_count;
+	}
+
+	/*
+	 * Synchronization invariant: unchanged line counts must match.
+	 * Otherwise xdl_build_script() would walk off one array.
+	 *
+	 * Count changed lines from the arrays rather than accumulating
+	 * during the loop to avoid any overflow in the summation.
+	 */
+	for (j = 0; j < (long)xe->xdf1.nrec; j++)
+		if (xe->xdf1.changed[j])
+			changed_old++;
+	for (j = 0; j < (long)xe->xdf2.nrec; j++)
+		if (xe->xdf2.changed[j])
+			changed_new++;
+	if ((long)xe->xdf1.nrec - changed_old !=
+	    (long)xe->xdf2.nrec - changed_new) {
+		warning("diff process: unchanged line count mismatch "
+			"(old: %ld unchanged, new: %ld unchanged)",
+			(long)xe->xdf1.nrec - changed_old,
+			(long)xe->xdf2.nrec - changed_new);
+		return -1;
+	}
+
+	return 0;
+}
+
 int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
 	     xdemitconf_t const *xecfg, xdemitcb_t *ecb) {
 	xdchange_t *xscr;
 	xdfenv_t xe;
 	emit_func_t ef = xecfg->hunk_func ? xdl_call_hunk_func : xdl_emit_diff;
 
-	if (xdl_do_diff(mf1, mf2, xpp, &xe) < 0) {
+	if (xpp->external_hunks) {
+		if (xdl_prepare_env(mf1, mf2, xpp, &xe) < 0)
+			return -1;
+		if (xdl_populate_hunks_from_external(&xe,
+						     xpp->external_hunks,
+						     xpp->external_hunks_nr) == 0)
+			goto diff_done;
+		xdl_free_env(&xe);
+	}
 
+	if (xdl_do_diff(mf1, mf2, xpp, &xe) < 0)
 		return -1;
-	}
+
+diff_done:
 	if (xdl_change_compact(&xe.xdf1, &xe.xdf2, xpp->flags) < 0 ||
 	    xdl_change_compact(&xe.xdf2, &xe.xdf1, xpp->flags) < 0 ||
 	    xdl_build_script(&xe, &xscr) < 0) {
diff --git a/xdiff/xprepare.c b/xdiff/xprepare.c
index 11bada2608..f4ab935332 100644
--- a/xdiff/xprepare.c
+++ b/xdiff/xprepare.c
@@ -471,3 +471,13 @@ int xdl_prepare_env(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
 
 	return 0;
 }
+
+/*
+ * Reset the changed[] array so that no lines are marked as changed.
+ * Also clears the sentinel slots at changed[-1] and changed[nrec]
+ * that xdl_change_compact() relies on during backward scans.
+ */
+void xdl_clear_changed(xdfile_t *xdf)
+{
+	memset(xdf->changed - 1, 0, (xdf->nrec + 2) * sizeof(bool));
+}
diff --git a/xdiff/xprepare.h b/xdiff/xprepare.h
index 947d9fc1bb..0413baf07b 100644
--- a/xdiff/xprepare.h
+++ b/xdiff/xprepare.h
@@ -28,6 +28,7 @@
 int xdl_prepare_env(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
 		    xdfenv_t *xe);
 void xdl_free_env(xdfenv_t *xe);
+void xdl_clear_changed(xdfile_t *xdf);
 
 
 
-- 
gitgitgadget

