From: "Michael Montalbo via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Michael Montalbo <mmontalbo@gmail.com>,
	Michael Montalbo <mmontalbo@gmail.com>
Subject: [PATCH 5/5] diff-process-normalize: add built-in whitespace normalizer
Date: Fri, 22 May 2026 02:11:24 +0000
Message-ID: <8c7359b8a1bb59087947993cb6b09fe3496d1766.1779415884.git.gitgitgadget@gmail.com>
In-Reply-To: <pull.2120.git.1779415884.gitgitgadget@gmail.com>

From: Michael Montalbo <mmontalbo@gmail.com>

Add git diff-process-normalize, a built-in diff process that
detects whitespace-only changes.  It compares the two files line
by line using xdiff_compare_lines() with XDF_IGNORE_WHITESPACE
(the same logic as "git diff -w").  If every pair of lines matches,
it returns zero hunks; otherwise it returns status=error so that
Git falls back to the built-in diff algorithm.

    [diff "cdiff"]
        process = git diff-process-normalize

Update documentation to describe zero-hunk behavior for diff
and blame, and document the built-in normalize tool.

Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
---
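
For readers who want to try the line-by-line comparison outside of
Git, the following is a minimal standalone sketch.  It assumes a
simplified stand-in for xdiff_compare_lines() that skips every
whitespace byte, which only approximates XDF_IGNORE_WHITESPACE.  The
names lines_match_ignoring_ws() and whitespace_equivalent_sketch()
are hypothetical and are not part of this patch.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for xdiff_compare_lines() with
 * XDF_IGNORE_WHITESPACE: skip every whitespace byte on both sides
 * and compare what is left.  Returns 1 on match, 0 otherwise.
 */
static int lines_match_ignoring_ws(const char *a, size_t la,
				   const char *b, size_t lb)
{
	size_t i = 0, j = 0;

	for (;;) {
		while (i < la && isspace((unsigned char)a[i]))
			i++;
		while (j < lb && isspace((unsigned char)b[j]))
			j++;
		if (i == la || j == lb)
			return i == la && j == lb;
		if (a[i++] != b[j++])
			return 0;
	}
}

/*
 * Pair up line N of one buffer with line N of the other, mirroring
 * the loop structure of whitespace_equivalent() in the patch.
 * Returns 1 if all pairs match and both buffers are exhausted.
 */
static int whitespace_equivalent_sketch(const char *a, size_t sa,
					const char *b, size_t sb)
{
	const char *ea = a + sa;
	const char *eb = b + sb;

	while (a < ea && b < eb) {
		const char *nl_a = memchr(a, '\n', (size_t)(ea - a));
		const char *nl_b = memchr(b, '\n', (size_t)(eb - b));
		size_t la = (size_t)((nl_a ? nl_a : ea) - a);
		size_t lb = (size_t)((nl_b ? nl_b : eb) - b);

		if (!lines_match_ignoring_ws(a, la, b, lb))
			return 0;

		a += la + (nl_a ? 1 : 0);
		b += lb + (nl_b ? 1 : 0);
	}

	/* Both sides must be exhausted */
	return a >= ea && b >= eb;
}
```

Because lines are paired strictly by position, a blank line added or
removed shifts the pairing in this sketch, so such a change is
reported as not equivalent even though only whitespace was touched.
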
 Documentation/config/diff.adoc   |   2 +
 Documentation/gitattributes.adoc |  15 ++++
 Makefile                         |   1 +
 builtin.h                        |   1 +
 builtin/diff-process-normalize.c | 143 +++++++++++++++++++++++++++++++
 git.c                            |   1 +
 t/t4080-diff-process.sh          |  60 +++++++++++++
 7 files changed, 223 insertions(+)
 create mode 100644 builtin/diff-process-normalize.c

diff --git a/Documentation/config/diff.adoc b/Documentation/config/diff.adoc
index 4ab5f60df6..475736c6ed 100644
--- a/Documentation/config/diff.adoc
+++ b/Documentation/config/diff.adoc
@@ -224,6 +224,8 @@ endif::git-diff[]
 	hunks that are fed into Git's diff and blame pipelines.
 	If the tool returns zero hunks, the file is treated as
 	unchanged for both diff output and blame attribution.
+	Git provides `git diff-process-normalize` as a built-in
+	tool that detects whitespace-only changes.
 	See linkgit:gitattributes[5] for details.
 
 `diff.indentHeuristic`::
diff --git a/Documentation/gitattributes.adoc b/Documentation/gitattributes.adoc
index 7d66fa3aa1..3f1d7affd8 100644
--- a/Documentation/gitattributes.adoc
+++ b/Documentation/gitattributes.adoc
@@ -861,6 +861,21 @@ the file as having no changes and produces no diff output.
 where it reports zero hunks, attributing lines to earlier commits
 instead.
 
+Git ships with a built-in diff process, `git diff-process-normalize`,
+that detects whitespace-only changes.  Files whose only differences
+are whitespace produce zero hunks; files with non-whitespace changes
+fall back to the built-in diff algorithm.  To use it:
+
+----------------------------------------------------------------
+[diff "cdiff"]
+  process = git diff-process-normalize
+----------------------------------------------------------------
+
+This is useful after running a code formatter: `git diff` shows
+no output for files that only had whitespace changes, and
+`git blame` skips whitespace-only commits automatically without
+requiring a `.git-blame-ignore-revs` file.
+
 Tools should ignore unknown keys in the per-file request to
 remain forward-compatible.
 
diff --git a/Makefile b/Makefile
index 22900368dd..01acfaf7b8 100644
--- a/Makefile
+++ b/Makefile
@@ -1409,6 +1409,7 @@ BUILTIN_OBJS += builtin/diagnose.o
 BUILTIN_OBJS += builtin/diff-files.o
 BUILTIN_OBJS += builtin/diff-index.o
 BUILTIN_OBJS += builtin/diff-pairs.o
+BUILTIN_OBJS += builtin/diff-process-normalize.o
 BUILTIN_OBJS += builtin/diff-tree.o
 BUILTIN_OBJS += builtin/diff.o
 BUILTIN_OBJS += builtin/difftool.o
diff --git a/builtin.h b/builtin.h
index 235c51f30e..c713a0417f 100644
--- a/builtin.h
+++ b/builtin.h
@@ -178,6 +178,7 @@ int cmd_diff_files(int argc, const char **argv, const char *prefix, struct repos
 int cmd_diff_index(int argc, const char **argv, const char *prefix, struct repository *repo);
 int cmd_diff(int argc, const char **argv, const char *prefix, struct repository *repo);
 int cmd_diff_pairs(int argc, const char **argv, const char *prefix, struct repository *repo);
+int cmd_diff_process_normalize(int argc, const char **argv, const char *prefix, struct repository *repo);
 int cmd_diff_tree(int argc, const char **argv, const char *prefix, struct repository *repo);
 int cmd_difftool(int argc, const char **argv, const char *prefix, struct repository *repo);
 int cmd_env__helper(int argc, const char **argv, const char *prefix, struct repository *repo);
diff --git a/builtin/diff-process-normalize.c b/builtin/diff-process-normalize.c
new file mode 100644
index 0000000000..1580f6b7d9
--- /dev/null
+++ b/builtin/diff-process-normalize.c
@@ -0,0 +1,143 @@
+/*
+ * Built-in diff process that returns zero hunks for files whose
+ * only differences are whitespace, and status=error otherwise.
+ * See diff-process.c for the protocol and gitattributes(5) for usage.
+ *
+ * Uses xdiff_compare_lines() with XDF_IGNORE_WHITESPACE to compare
+ * lines, giving the same whitespace handling as "git diff -w".
+ */
+
+#include "builtin.h"
+#include "pkt-line.h"
+#include "strbuf.h"
+#include "xdiff-interface.h"
+
+/*
+ * Read a single pkt-line.  Returns 1 for data, 0 for flush, -1 for EOF.
+ */
+static int read_pkt(int fd, struct strbuf *line)
+{
+	int len;
+	char *data;
+
+	if (packet_read_line_gently(fd, &len, &data) < 0)
+		return -1;
+	if (!data || !len)
+		return 0; /* flush */
+	strbuf_reset(line);
+	strbuf_add(line, data, len);
+	strbuf_rtrim(line);
+	return 1;
+}
+
+/*
+ * Read packetized content until a flush packet.
+ */
+static int read_content(int fd, struct strbuf *out)
+{
+	strbuf_reset(out);
+	if (read_packetized_to_strbuf(fd, out, PACKET_READ_GENTLE_ON_EOF) < 0)
+		return -1;
+	return 0;
+}
+
+/*
+ * Compare two buffers line by line using xdiff_compare_lines() with
+ * XDF_IGNORE_WHITESPACE (same logic as "git diff -w").
+ * Returns 1 if all lines match, 0 otherwise.
+ */
+static int whitespace_equivalent(const char *a, long size_a,
+				 const char *b, long size_b)
+{
+	const char *ea = a + size_a;
+	const char *eb = b + size_b;
+
+	while (a < ea && b < eb) {
+		const char *eol_a = memchr(a, '\n', ea - a);
+		const char *eol_b = memchr(b, '\n', eb - b);
+		long len_a = (eol_a ? eol_a : ea) - a;
+		long len_b = (eol_b ? eol_b : eb) - b;
+
+		if (!xdiff_compare_lines(a, len_a, b, len_b,
+					 XDF_IGNORE_WHITESPACE))
+			return 0;
+
+		a += len_a + (eol_a ? 1 : 0);
+		b += len_b + (eol_b ? 1 : 0);
+	}
+
+	/* Both sides must be exhausted */
+	return a >= ea && b >= eb;
+}
+
+int cmd_diff_process_normalize(int argc UNUSED, const char **argv UNUSED,
+			       const char *prefix UNUSED,
+			       struct repository *repo UNUSED)
+{
+	struct strbuf line = STRBUF_INIT;
+	struct strbuf old_content = STRBUF_INIT;
+	struct strbuf new_content = STRBUF_INIT;
+	int ret;
+
+	/* Handshake: read client greeting */
+	ret = read_pkt(0, &line);
+	if (ret <= 0 || strcmp(line.buf, "git-diff-client"))
+		return 1;
+	ret = read_pkt(0, &line);
+	if (ret <= 0 || strcmp(line.buf, "version=1"))
+		return 1;
+	read_pkt(0, &line); /* flush */
+
+	/* Send server greeting */
+	packet_write_fmt(1, "git-diff-server\n");
+	packet_write_fmt(1, "version=1\n");
+	packet_flush(1);
+
+	/* Read client capabilities until flush */
+	while ((ret = read_pkt(0, &line)) > 0)
+		; /* consume */
+
+	/* Send our capabilities */
+	packet_write_fmt(1, "capability=hunks\n");
+	packet_flush(1);
+
+	/* Main loop: process file pairs */
+	for (;;) {
+		int have_command = 0;
+
+		/* Read request headers until flush */
+		while ((ret = read_pkt(0, &line)) > 0) {
+			if (starts_with(line.buf, "command="))
+				have_command = 1;
+		}
+		if (ret < 0)
+			break; /* EOF: client closed connection */
+		if (!have_command)
+			break;
+
+		/* Read old file content */
+		if (read_content(0, &old_content) < 0)
+			break;
+		/* Read new file content */
+		if (read_content(0, &new_content) < 0)
+			break;
+
+		if (whitespace_equivalent(old_content.buf, old_content.len,
+					  new_content.buf, new_content.len)) {
+			/* Whitespace-only differences */
+			packet_flush(1); /* zero hunks */
+			packet_write_fmt(1, "status=success\n");
+			packet_flush(1);
+		} else {
+			/* Non-whitespace differences: fall back */
+			packet_flush(1);
+			packet_write_fmt(1, "status=error\n");
+			packet_flush(1);
+		}
+	}
+
+	strbuf_release(&line);
+	strbuf_release(&old_content);
+	strbuf_release(&new_content);
+	return 0;
+}
diff --git a/git.c b/git.c
index 5a40eab8a2..6239240b02 100644
--- a/git.c
+++ b/git.c
@@ -568,6 +568,7 @@ static struct cmd_struct commands[] = {
 	{ "diff-files", cmd_diff_files, RUN_SETUP | NEED_WORK_TREE | NO_PARSEOPT },
 	{ "diff-index", cmd_diff_index, RUN_SETUP | NO_PARSEOPT },
 	{ "diff-pairs", cmd_diff_pairs, RUN_SETUP | NO_PARSEOPT },
+	{ "diff-process-normalize", cmd_diff_process_normalize, NO_PARSEOPT },
 	{ "diff-tree", cmd_diff_tree, RUN_SETUP | NO_PARSEOPT },
 	{ "difftool", cmd_difftool, RUN_SETUP_GENTLY },
 	{ "fast-export", cmd_fast_export, RUN_SETUP },
diff --git a/t/t4080-diff-process.sh b/t/t4080-diff-process.sh
index 5ed644b786..a6fa1df456 100755
--- a/t/t4080-diff-process.sh
+++ b/t/t4080-diff-process.sh
@@ -366,5 +366,65 @@ test_expect_success PYTHON 'blame skips commits with zero hunks from diff proces
 	! grep "$BLAME_COMMIT" with
 '
 
+NORMALIZE="git diff-process-normalize"
+
+test_expect_success 'diff-process-normalize setup' '
+	echo "*.c diff=cdiff" >.gitattributes &&
+	git add .gitattributes &&
+	test_commit normalize-base
+'
+
+test_expect_success 'diff-process-normalize suppresses whitespace-only changes' '
+	cat >ws.c <<-\EOF &&
+	int main(void)
+	{
+	    return 0;
+	}
+	EOF
+	git add ws.c &&
+	git commit -m "add ws.c" &&
+
+	cat >ws.c <<-\EOF &&
+	int main(void)
+	{
+	        return 0;
+	}
+	EOF
+
+	git -c diff.cdiff.process="$NORMALIZE" \
+		diff ws.c >actual &&
+	test_must_be_empty actual
+'
+
+test_expect_success 'diff-process-normalize falls back on non-whitespace changes' '
+	cat >ws.c <<-\EOF &&
+	int main(void)
+	{
+	    return 0;
+	}
+
+	int added_function(void)
+	{
+	    return 99;
+	}
+	EOF
+
+	git -c diff.cdiff.process="$NORMALIZE" \
+		diff ws.c >actual &&
+	grep "added_function" actual
+'
+
+test_expect_success 'diff-process-normalize falls back on mixed whitespace and real changes' '
+	cat >ws.c <<-\EOF &&
+	int main(void)
+	{
+	        return 42;
+	}
+	EOF
+
+	git -c diff.cdiff.process="$NORMALIZE" \
+		diff ws.c >actual &&
+	grep "return 42" actual
+'
 
 test_done
-- 
gitgitgadget
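
The handshake and status responses in cmd_diff_process_normalize()
are framed as pkt-lines.  As a rough illustration of that framing,
here is a small sketch that assumes the usual pkt-line layout: four
lowercase hex digits giving the total length (including the four
digits themselves), followed by the payload, with "0000" acting as
the flush packet.  pkt_line_encode() and PKT_MAX_PAYLOAD are
hypothetical names for this example; the patch itself uses
packet_write_fmt() and packet_flush() from pkt-line.h.

```c
#include <stdio.h>
#include <string.h>

/* Assumed payload limit for this sketch: 65520 minus the 4-byte prefix. */
#define PKT_MAX_PAYLOAD 65516

/*
 * Hypothetical helper: write one pkt-line for "payload" into "out"
 * as a NUL-terminated string.  Returns the number of bytes in the
 * packet (prefix plus payload), or -1 if it does not fit.
 */
static int pkt_line_encode(char *out, size_t outsz,
			   const char *payload, size_t len)
{
	if (len > PKT_MAX_PAYLOAD || outsz < len + 5)
		return -1;

	/* The length prefix counts its own four bytes. */
	snprintf(out, 5, "%04zx", len + 4);
	memcpy(out + 4, payload, len);
	out[len + 4] = '\0';
	return (int)(len + 4);
}
```

Under these assumptions, the server greeting "git-diff-server\n"
(16 bytes of payload) would be framed with the prefix "0014", and a
flush would be the bare string "0000" with no payload.
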
