git.vger.kernel.org archive mirror
* [RFC PATCH] diff: add option to report binary files in raw diffs
@ 2025-11-04  2:14 Justin Tobler
  2025-11-04  2:26 ` Junio C Hamano
From: Justin Tobler @ 2025-11-04  2:14 UTC (permalink / raw)
  To: git; +Cc: karthik.188, Justin Tobler

When generating patch diff output, if either side of a filepair is
detected as binary, Git omits the diff content and instead prints a
"Binary files differ" message. This message reveals that at least one
file in the pair is considered binary, but not which one, or whether
both are.

Add a --report-binary-files diff option that, when enabled, extends the
raw diff output format to explicitly indicate for each file whether it
was considered binary or not.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
---

Greetings,

I have a use case where I would like to know exactly which files in a
diff pair are considered binary by Git when computing diffs. When
computing patch diff output, Git already omits filepair diffs where at
least one side is considered binary and prints a "Binary files differ"
message instead. That message does not tell us which side of the pair
Git considered binary, though.

In this patch, the raw diff format is extended with a
`--report-binary-files` option to explicitly specify which files in the
diff pair were considered binary. The output in this form looks
something like this:

        $ git diff-tree --abbrev=8 --report-binary-files HEAD~ HEAD
        :100644 100644 a1961526 e231acb1 bt M	foo
        :100644 100644 31eedd5c 402a70d7 bb M	bar

With this format, there is a new column before the status that specifies
the binary status for each file. 'b' indicates binary and 't' is used
otherwise.
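
As a rough illustration of how a script might consume this column, here
is a minimal sketch. The helper name `report_binary_sides` is
hypothetical and not part of the patch. It assumes the proposed layout
shown above (the two-character column sits immediately before the
status, and a tab separates the status from the path), and it ignores
renames, copies, and `-z` output.

```shell
#!/bin/sh
# Hypothetical helper, not part of the patch: read raw diff lines that
# carry the proposed binary column on stdin and print one summary line
# per path. Assumes no renames/copies and no tabs in path names.
report_binary_sides () {
	awk -F '\t' '
	function kind(c) { return (c == "b") ? "binary" : "text" }
	{
		# $1 holds everything before the tab: modes, object names,
		# the binary column, and the status letter.
		n = split($1, meta, " ")
		flags = meta[n - 1]
		printf "%s: pre=%s post=%s\n", $2,
			kind(substr(flags, 1, 1)), kind(substr(flags, 2, 1))
	}'
}
```

With the patch applied, this could be used as
`git diff-tree --abbrev=8 --report-binary-files HEAD~ HEAD | report_binary_sides`.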

In an earlier iteration of this patch, I extended the "Binary files
differ" message in patch output to indicate the binary status of each
file in the diff pair, but that did not feel like the best place for it
since I also want the output to be machine friendly. So I ended up
extending the raw diff format instead.

I'm not entirely sure the current implementation is the ideal format
here, so I'm very open to feedback. :)

-Justin

---
 Documentation/diff-format.adoc  | 12 ++++++++++++
 Documentation/diff-options.adoc |  4 ++++
 diff.c                          |  9 +++++++++
 diff.h                          |  6 ++++++
 t/t4012-diff-binary.sh          | 29 +++++++++++++++++++++++++++++
 5 files changed, 60 insertions(+)

diff --git a/Documentation/diff-format.adoc b/Documentation/diff-format.adoc
index 9f7e988241..74c0a064ad 100644
--- a/Documentation/diff-format.adoc
+++ b/Documentation/diff-format.adoc
@@ -83,6 +83,18 @@ quoted as explained for the configuration variable `core.quotePath`
 (see linkgit:git-config[1]).  Using `-z` the filename is output
 verbatim and the line is terminated by a NUL byte.
 
+With the `--report-binary-files` option, a new column is added prior to the
+status indicating for each file if Git considered it binary or not. If
+considered binary, a file is denoted with `b`. Otherwise, `t` is used. This
+column is followed by a space character. Combined diffs do not report binary
+file info.
+
+Example:
+
+------------------------------------------------
+:100644 100644 5be4a4a cc95eb0 bt M file.c
+------------------------------------------------
+
 diff format for merges
 ----------------------
 
diff --git a/Documentation/diff-options.adoc b/Documentation/diff-options.adoc
index ae31520f7f..54eb48c067 100644
--- a/Documentation/diff-options.adoc
+++ b/Documentation/diff-options.adoc
@@ -544,6 +544,10 @@ ifndef::git-format-patch[]
 	Implies `--patch`.
 endif::git-format-patch[]
 
+`--report-binary-files`::
+	Adds a column to raw diff output to report for each file in the pair
+	whether it was considered binary by Git.
+
 `--abbrev[=<n>]`::
 	Instead of showing the full 40-byte hexadecimal object
 	name in diff-raw format output and diff-tree header
diff --git a/diff.c b/diff.c
index a1961526c0..e231acb1a9 100644
--- a/diff.c
+++ b/diff.c
@@ -5747,6 +5747,8 @@ struct option *add_diff_options(const struct option *opts,
 		OPT_CALLBACK_F(0, "binary", options, NULL,
 			       N_("output a binary diff that can be applied"),
 			       PARSE_OPT_NONEG | PARSE_OPT_NOARG, diff_opt_binary),
+		OPT_BOOL(0, "report-binary-files", &options->report_binary_files,
+			 N_("report if pre- and post-image blobs are binary")),
 		OPT_BOOL(0, "full-index", &options->flags.full_index,
 			 N_("show full pre- and post-image object names on the \"index\" lines")),
 		OPT_COLOR_FLAG(0, "color", &options->use_color,
@@ -6111,6 +6113,13 @@ static void diff_flush_raw(struct diff_filepair *p, struct diff_options *opt)
 		fprintf(opt->file, "%s ",
 			diff_aligned_abbrev(&p->two->oid, opt->abbrev));
 	}
+
+	if (opt->report_binary_files) {
+		char one = diff_filespec_is_binary(opt->repo, p->one) ? 'b' : 't';
+		char two = diff_filespec_is_binary(opt->repo, p->two) ? 'b' : 't';
+		fprintf(opt->file, "%c%c ", one, two);
+	}
+
 	if (p->score) {
 		fprintf(opt->file, "%c%03d%c", p->status, similarity_index(p),
 			inter_name_termination);
diff --git a/diff.h b/diff.h
index 31eedd5c0c..402a70d7ad 100644
--- a/diff.h
+++ b/diff.h
@@ -369,6 +369,12 @@ struct diff_options {
 	 */
 	int skip_resolving_statuses;
 
+	/*
+	 * When generating raw diff output, report for each file whether it was
+	 * considered binary.
+	 */
+	int report_binary_files;
+
 	/* Callback which allows tweaking the options in diff_setup_done(). */
 	void (*set_default)(struct diff_options *);
 
diff --git a/t/t4012-diff-binary.sh b/t/t4012-diff-binary.sh
index d1d30ac2a9..e026e1d3a4 100755
--- a/t/t4012-diff-binary.sh
+++ b/t/t4012-diff-binary.sh
@@ -130,4 +130,33 @@ test_expect_success 'diff --stat with binary files and big change count' '
 	test_cmp expect actual
 '
 
+test_expect_success SHA1 'diff --report-binary-files' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+
+		echo foo >foo &&
+		printf "\0bar\0" >bar &&
+		echo baz >baz &&
+		git add foo bar baz &&
+		git commit -m foo &&
+
+		printf "\0foo\0" >foo &&
+		printf "\0bar2\0" >bar &&
+		echo baz2 >baz &&
+		git commit -am "binary foo" &&
+
+		cat >expect <<-\EOF &&
+		:100644 100644 e02d9a3a8aeb904ccc3bb9ed0600f2e963ba1a10 884a24af772a87733e911a3491c0ab576d34c06c bb M	bar
+		:100644 100644 76018072e09c5d31c8c6e3113b8aa0fe625195ca 3414c84ca6b7ca9cbbe40dd44f4d0715c1464f6e tt M	baz
+		:100644 100644 257cc5642cb1a054f08cc83f2d943e56fd3ebe99 a60073ceafeca287824d7b9ac3eebef233b72fce tb M	foo
+		EOF
+
+		git diff-tree --report-binary-files HEAD~ HEAD >out &&
+
+		test_cmp expect out
+	)
+'
+
 test_done

base-commit: 7f278e958afbf9b7e0727631b4c26dcfa1c63d6e
-- 
2.51.0.193.g4975ec3473b



Thread overview (12+ messages):
2025-11-04  2:14 [RFC PATCH] diff: add option to report binary files in raw diffs Justin Tobler
2025-11-04  2:26 ` Junio C Hamano
2025-11-04  4:44   ` Junio C Hamano
2025-11-05  0:17     ` Justin Tobler
2025-11-05  8:04       ` Junio C Hamano
2025-11-06 21:42         ` Justin Tobler
2025-11-07  8:30           ` Torsten Bögershausen
2025-11-07 16:07             ` Junio C Hamano
2025-11-07 17:16             ` Justin Tobler
2025-11-07 17:26               ` Junio C Hamano
2025-11-05 12:14       ` Ben Knoble
2025-11-06 21:52         ` Justin Tobler
