From: "Michael Montalbo via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Michael Montalbo <mmontalbo@gmail.com>
Subject: [PATCH 5/5] diff-process-normalize: add built-in whitespace normalizer
Date: Fri, 22 May 2026 02:11:24 +0000
Message-ID: <8c7359b8a1bb59087947993cb6b09fe3496d1766.1779415884.git.gitgitgadget@gmail.com>
In-Reply-To: <pull.2120.git.1779415884.gitgitgadget@gmail.com>
From: Michael Montalbo <mmontalbo@gmail.com>
Add git diff-process-normalize, a built-in diff process that
detects whitespace-only changes. It compares files line by line
using xdiff_compare_lines() with XDF_IGNORE_WHITESPACE (same
logic as "git diff -w"). If all lines match, it returns zero
hunks; otherwise it returns an error so Git falls back to the
built-in diff algorithm.

    [diff "cdiff"]
        process = git diff-process-normalize

Update documentation to describe zero-hunk behavior for diff
and blame, and document the built-in normalize tool.
Signed-off-by: Michael Montalbo <mmontalbo@gmail.com>
---
Documentation/config/diff.adoc | 2 +
Documentation/gitattributes.adoc | 15 ++++
Makefile | 1 +
builtin.h | 1 +
builtin/diff-process-normalize.c | 143 +++++++++++++++++++++++++++++++
git.c | 1 +
t/t4080-diff-process.sh | 60 +++++++++++++
7 files changed, 223 insertions(+)
create mode 100644 builtin/diff-process-normalize.c
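
For readers unfamiliar with the framing used by the new
builtin/diff-process-normalize.c, the per-file reply can be sketched
as raw bytes. This assumes the standard pkt-line encoding (four hex
digits giving the payload length plus the 4-byte header, with "0000"
as the flush packet). The helpers pkt_append(), pkt_flush() and
build_reply() are hypothetical names for illustration, not git's
pkt-line API, and only the zero-hunk reply shape sent by this patch
is shown.

```c
#include <stdio.h>
#include <string.h>

/*
 * Write one pkt-line into out: four lowercase hex digits holding the
 * total length (payload plus the 4-byte header), then the payload.
 * Returns the number of bytes written, or 0 if it does not fit.
 */
static size_t pkt_append(char *out, size_t cap, const char *payload)
{
	size_t n = strlen(payload) + 4;

	/* The length must fit in four hex digits and in the buffer. */
	if (n > 0xffff || n >= cap)
		return 0;
	snprintf(out, cap, "%04zx%s", n, payload);
	return n;
}

/* A flush packet is the literal "0000" with no payload. */
static size_t pkt_flush(char *out, size_t cap)
{
	if (cap < 5)
		return 0;
	memcpy(out, "0000", 5);
	return 4;
}

/*
 * Build the reply for one file pair: an empty hunk list (a lone
 * flush), the status line, and a closing flush.
 */
static size_t build_reply(char *out, size_t cap, int equivalent)
{
	const char *status = equivalent ? "status=success\n"
					: "status=error\n";
	size_t off = 0, n;

	if (!(n = pkt_flush(out + off, cap - off)))
		return 0;
	off += n;
	if (!(n = pkt_append(out + off, cap - off, status)))
		return 0;
	off += n;
	if (!(n = pkt_flush(out + off, cap - off)))
		return 0;
	return off + n;
}
```

With this encoding a whitespace-only file pair yields
"0000" "0013status=success\n" "0000" on the wire, and any other pair
yields the same shape with "0011status=error\n" in the middle.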
diff --git a/Documentation/config/diff.adoc b/Documentation/config/diff.adoc
index 4ab5f60df6..475736c6ed 100644
--- a/Documentation/config/diff.adoc
+++ b/Documentation/config/diff.adoc
@@ -224,6 +224,8 @@ endif::git-diff[]
hunks that are fed into Git's diff and blame pipelines.
If the tool returns zero hunks, the file is treated as
unchanged for both diff output and blame attribution.
+ Git provides `git diff-process-normalize` as a built-in
+ tool that detects whitespace-only changes.
See linkgit:gitattributes[5] for details.
`diff.indentHeuristic`::
diff --git a/Documentation/gitattributes.adoc b/Documentation/gitattributes.adoc
index 7d66fa3aa1..3f1d7affd8 100644
--- a/Documentation/gitattributes.adoc
+++ b/Documentation/gitattributes.adoc
@@ -861,6 +861,21 @@ the file as having no changes and produces no diff output.
where it reports zero hunks, attributing lines to earlier commits
instead.
+Git ships with a built-in diff process, `git diff-process-normalize`,
+that detects whitespace-only changes. Files whose only differences
+are whitespace produce zero hunks; files with non-whitespace changes
+fall back to the builtin diff algorithm. To use it:
+
+----------------------------------------------------------------
+[diff "cdiff"]
+ process = git diff-process-normalize
+----------------------------------------------------------------
+
+This is useful after running a code formatter: `git diff` shows
+no output for files that only had whitespace changes, and
+`git blame` skips whitespace-only commits automatically without
+requiring a `.git-blame-ignore-revs` file.
+
Tools should ignore unknown keys in the per-file request to
remain forward-compatible.
diff --git a/Makefile b/Makefile
index 22900368dd..01acfaf7b8 100644
--- a/Makefile
+++ b/Makefile
@@ -1409,6 +1409,7 @@ BUILTIN_OBJS += builtin/diagnose.o
BUILTIN_OBJS += builtin/diff-files.o
BUILTIN_OBJS += builtin/diff-index.o
BUILTIN_OBJS += builtin/diff-pairs.o
+BUILTIN_OBJS += builtin/diff-process-normalize.o
BUILTIN_OBJS += builtin/diff-tree.o
BUILTIN_OBJS += builtin/diff.o
BUILTIN_OBJS += builtin/difftool.o
diff --git a/builtin.h b/builtin.h
index 235c51f30e..c713a0417f 100644
--- a/builtin.h
+++ b/builtin.h
@@ -178,6 +178,7 @@ int cmd_diff_files(int argc, const char **argv, const char *prefix, struct repos
int cmd_diff_index(int argc, const char **argv, const char *prefix, struct repository *repo);
int cmd_diff(int argc, const char **argv, const char *prefix, struct repository *repo);
int cmd_diff_pairs(int argc, const char **argv, const char *prefix, struct repository *repo);
+int cmd_diff_process_normalize(int argc, const char **argv, const char *prefix, struct repository *repo);
int cmd_diff_tree(int argc, const char **argv, const char *prefix, struct repository *repo);
int cmd_difftool(int argc, const char **argv, const char *prefix, struct repository *repo);
int cmd_env__helper(int argc, const char **argv, const char *prefix, struct repository *repo);
diff --git a/builtin/diff-process-normalize.c b/builtin/diff-process-normalize.c
new file mode 100644
index 0000000000..1580f6b7d9
--- /dev/null
+++ b/builtin/diff-process-normalize.c
@@ -0,0 +1,143 @@
+/*
+ * Built-in diff process that returns zero hunks for files whose
+ * only differences are whitespace, and status=error otherwise.
+ * See diff-process.c for the protocol and gitattributes(5) for usage.
+ *
+ * Uses xdiff_compare_lines() with XDF_IGNORE_WHITESPACE to compare
+ * lines, giving the same whitespace handling as "git diff -w".
+ */
+
+#include "builtin.h"
+#include "pkt-line.h"
+#include "strbuf.h"
+#include "xdiff-interface.h"
+
+/*
+ * Read a single pkt-line. Returns 1 for data, 0 for flush, -1 for EOF.
+ */
+static int read_pkt(int fd, struct strbuf *line)
+{
+ int len;
+ char *data;
+
+ if (packet_read_line_gently(fd, &len, &data) < 0)
+ return -1;
+ if (!data || !len)
+ return 0; /* flush */
+ strbuf_reset(line);
+ strbuf_add(line, data, len);
+ strbuf_rtrim(line);
+ return 1;
+}
+
+/*
+ * Read packetized content until a flush packet.
+ */
+static int read_content(int fd, struct strbuf *out)
+{
+ strbuf_reset(out);
+ if (read_packetized_to_strbuf(fd, out, PACKET_READ_GENTLE_ON_EOF) < 0)
+ return -1;
+ return 0;
+}
+
+/*
+ * Compare two buffers line by line using xdiff_compare_lines() with
+ * XDF_IGNORE_WHITESPACE (same logic as "git diff -w").
+ * Returns 1 if all lines match, 0 otherwise.
+ */
+static int whitespace_equivalent(const char *a, long size_a,
+ const char *b, long size_b)
+{
+ const char *ea = a + size_a;
+ const char *eb = b + size_b;
+
+ while (a < ea && b < eb) {
+ const char *eol_a = memchr(a, '\n', ea - a);
+ const char *eol_b = memchr(b, '\n', eb - b);
+ long len_a = (eol_a ? eol_a : ea) - a;
+ long len_b = (eol_b ? eol_b : eb) - b;
+
+ if (!xdiff_compare_lines(a, len_a, b, len_b,
+ XDF_IGNORE_WHITESPACE))
+ return 0;
+
+ a += len_a + (eol_a ? 1 : 0);
+ b += len_b + (eol_b ? 1 : 0);
+ }
+
+ /* Both sides must be exhausted */
+ return a >= ea && b >= eb;
+}
+
+int cmd_diff_process_normalize(int argc UNUSED, const char **argv UNUSED,
+ const char *prefix UNUSED,
+ struct repository *repo UNUSED)
+{
+ struct strbuf line = STRBUF_INIT;
+ struct strbuf old_content = STRBUF_INIT;
+ struct strbuf new_content = STRBUF_INIT;
+ int ret;
+
+ /* Handshake: read client greeting */
+ ret = read_pkt(0, &line);
+ if (ret <= 0 || strcmp(line.buf, "git-diff-client"))
+ return 1;
+ ret = read_pkt(0, &line);
+ if (ret <= 0 || strcmp(line.buf, "version=1"))
+ return 1;
+ read_pkt(0, &line); /* flush */
+
+ /* Send server greeting */
+ packet_write_fmt(1, "git-diff-server\n");
+ packet_write_fmt(1, "version=1\n");
+ packet_flush(1);
+
+ /* Read client capabilities until flush */
+ while ((ret = read_pkt(0, &line)) > 0)
+ ; /* consume */
+
+ /* Send our capabilities */
+ packet_write_fmt(1, "capability=hunks\n");
+ packet_flush(1);
+
+ /* Main loop: process file pairs */
+ for (;;) {
+ int have_command = 0;
+
+ /* Read request headers until flush */
+ while ((ret = read_pkt(0, &line)) > 0) {
+ if (starts_with(line.buf, "command="))
+ have_command = 1;
+ }
+ if (ret < 0)
+ break; /* EOF: client closed connection */
+ if (!have_command)
+ break;
+
+ /* Read old file content */
+ if (read_content(0, &old_content) < 0)
+ break;
+ /* Read new file content */
+ if (read_content(0, &new_content) < 0)
+ break;
+
+ if (whitespace_equivalent(old_content.buf, old_content.len,
+ new_content.buf, new_content.len)) {
+ /* Whitespace-only differences */
+ packet_flush(1); /* zero hunks */
+ packet_write_fmt(1, "status=success\n");
+ packet_flush(1);
+ } else {
+ /* Non-whitespace differences: fall back */
+ packet_flush(1);
+ packet_write_fmt(1, "status=error\n");
+ packet_flush(1);
+ }
+ }
+
+ strbuf_release(&line);
+ strbuf_release(&old_content);
+ strbuf_release(&new_content);
+ return 0;
+}
diff --git a/git.c b/git.c
index 5a40eab8a2..6239240b02 100644
--- a/git.c
+++ b/git.c
@@ -568,6 +568,7 @@ static struct cmd_struct commands[] = {
{ "diff-files", cmd_diff_files, RUN_SETUP | NEED_WORK_TREE | NO_PARSEOPT },
{ "diff-index", cmd_diff_index, RUN_SETUP | NO_PARSEOPT },
{ "diff-pairs", cmd_diff_pairs, RUN_SETUP | NO_PARSEOPT },
+ { "diff-process-normalize", cmd_diff_process_normalize, NO_PARSEOPT },
{ "diff-tree", cmd_diff_tree, RUN_SETUP | NO_PARSEOPT },
{ "difftool", cmd_difftool, RUN_SETUP_GENTLY },
{ "fast-export", cmd_fast_export, RUN_SETUP },
diff --git a/t/t4080-diff-process.sh b/t/t4080-diff-process.sh
index 5ed644b786..a6fa1df456 100755
--- a/t/t4080-diff-process.sh
+++ b/t/t4080-diff-process.sh
@@ -366,5 +366,65 @@ test_expect_success PYTHON 'blame skips commits with zero hunks from diff proces
! grep "$BLAME_COMMIT" with
'
+NORMALIZE="git diff-process-normalize"
+
+test_expect_success 'diff-process-normalize setup' '
+ echo "*.c diff=cdiff" >.gitattributes &&
+ git add .gitattributes &&
+ test_commit normalize-base
+'
+
+test_expect_success 'diff-process-normalize suppresses whitespace-only changes' '
+ cat >ws.c <<-\EOF &&
+ int main(void)
+ {
+ return 0;
+ }
+ EOF
+ git add ws.c &&
+ git commit -m "add ws.c" &&
+
+ cat >ws.c <<-\EOF &&
+ int main(void)
+ {
+ return 0;
+ }
+ EOF
+
+ git -c diff.cdiff.process="$NORMALIZE" \
+ diff ws.c >actual &&
+ test_must_be_empty actual
+'
+
+test_expect_success 'diff-process-normalize falls back on non-whitespace changes' '
+ cat >ws.c <<-\EOF &&
+ int main(void)
+ {
+ return 0;
+ }
+
+ int added_function(void)
+ {
+ return 99;
+ }
+ EOF
+
+ git -c diff.cdiff.process="$NORMALIZE" \
+ diff ws.c >actual &&
+ grep "added_function" actual
+'
+
+test_expect_success 'diff-process-normalize falls back on mixed whitespace and real changes' '
+ cat >ws.c <<-\EOF &&
+ int main(void)
+ {
+ return 42;
+ }
+ EOF
+
+ git -c diff.cdiff.process="$NORMALIZE" \
+ diff ws.c >actual &&
+ grep "return 42" actual
+'
test_done
--
gitgitgadget