From: Phillip Wood <phillip.wood123@gmail.com>
To: Justin Tobler <jltobler@gmail.com>, git@vger.kernel.org
Cc: peff@peff.net, Patrick Steinhardt <ps@pks.im>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH v2 2/3] builtin: introduce diff-pairs command
Date: Mon, 17 Feb 2025 14:38:11 +0000
Message-ID: <d6d4230e-7b80-4eec-b218-37717ae2e298@gmail.com>
In-Reply-To: <20250212041825.2455031-3-jltobler@gmail.com>

Hi Justin

On 12/02/2025 04:18, Justin Tobler wrote:
> Through git-diff(1), a single diff can be generated from a pair of blob
> revisions directly. Unfortunately, there is not a mechanism to compute
> batches of specific file pair diffs in a single process. Such a feature
> is particularly useful on the server-side where diffing between a large
> set of changes is not feasible all at once due to timeout concerns.
> 
> To facilitate this, introduce git-diff-pairs(1) which takes the
> null-terminated raw diff format as input on stdin and produces diffs in
> other formats. As the raw diff format already contains the necessary
> metadata, it becomes possible to progressively generate batches of diffs
> without having to recompute rename detection or retrieve object context.
> Something like the following:
> 
> 	git diff-tree -r -z -M $old $new |
> 	git diff-pairs -p
> 
> should generate the same output as `git diff-tree -p -M`. Furthermore,
> each line of raw diff formatted input can also be individually fed to a
> separate git-diff-pairs(1) process and still produce the same output.

I like the idea of this. I've left a few comments, mainly around the UI.

> +Here's an incomplete list of things that `diff-pairs` could do, but
> +doesn't (mostly in the name of simplicity):
> +
> + - Only `-z` input is accepted, not normal `--raw` input.

I think only accepting NUL-terminated input is fine, but if we want to
accept other formats we should have a plan for how to do that in a
backwards-compatible way, as we cannot use `-z` to distinguish between
input formats.
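
For context, here is a minimal sketch of the record layout that `-z` input implies. It assumes the usual `git diff-tree -r -z` raw format: a meta field beginning with `:`, a NUL, then a path and a NUL, with renames and copies (status R or C) carrying a second path. The function name `count_raw_records` is hypothetical and this is not the parser from the patch.

```c
#include <stddef.h>
#include <string.h>

/*
 * Sketch only (hypothetical helper, not taken from the patch): count the
 * records in a buffer of NUL-terminated raw diff output.
 *
 * Assumed layout of one record:
 *   ":<mode> <mode> <oid> <oid> <status>" NUL <path> NUL [<path2> NUL]
 * where <path2> is present only for status R (rename) or C (copy).
 *
 * Returns the number of records, or -1 if the input is malformed.
 */
int count_raw_records(const char *buf, size_t len)
{
	const char *p = buf, *end = buf + len;
	int n = 0;

	while (p < end) {
		const char *meta = p;
		const char *nul = memchr(p, '\0', (size_t)(end - p));
		const char *status;
		int paths;

		if (!nul || *meta != ':')
			return -1;
		/* The status letter follows the last space of the meta field. */
		status = strrchr(meta, ' ');
		if (!status)
			return -1;
		paths = (status[1] == 'R' || status[1] == 'C') ? 2 : 1;
		p = nul + 1;

		/* Consume one path, or two for renames and copies. */
		while (paths--) {
			nul = p < end ? memchr(p, '\0', (size_t)(end - p)) : NULL;
			if (!nul)
				return -1;
			p = nul + 1;
		}
		n++;
	}
	return n;
}
```

Because every field boundary is a NUL, nothing in the stream itself says which format is in use, which is why a later newline-terminated mode would need its own explicit switch.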

> +	const char * const usage[] = {
> +		N_("git diff-pairs [diff-options]"),

Normally the option summary printed by "git foo -h" is generated by the 
option parser. In this case we don't define any options and use 
setup_revisions() instead so we need to provide the option summary 
ourselves. Looking at diff-files.c we can add

	"\n"
	COMMON_DIFF_OPTIONS_HELP;

to do that.
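
A minimal sketch of what that concatenation does, written as a single usage string. The macro below is a hypothetical, abbreviated stand-in for git's COMMON_DIFF_OPTIONS_HELP so that the fragment compiles on its own, and the variable name is also an assumption. The real macro lives in git's headers and has different, longer text.

```c
#include <stdio.h>

/*
 * Hypothetical stand-in for git's COMMON_DIFF_OPTIONS_HELP. The real macro
 * expands to a much longer string literal; this text is abbreviated and
 * illustrative only.
 */
#define COMMON_DIFF_OPTIONS_HELP_STANDIN \
	"common diff options:\n" \
	"  -p            output patch format.\n" \
	"  -z            NUL-terminated output.\n"

/*
 * Adjacent string literals are joined by the compiler, so the synopsis
 * line, the newline and the shared option summary become one usage string.
 */
const char diff_pairs_usage_sketch[] =
	"git diff-pairs [diff-options]"
	"\n"
	COMMON_DIFF_OPTIONS_HELP_STANDIN;

/* Print the combined usage text, as "git foo -h" would. */
void print_usage_sketch(void)
{
	fputs(diff_pairs_usage_sketch, stdout);
}
```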

> +	argc = setup_revisions(argc, argv, &revs, NULL);

I think we should check that there are no options left on the
command line after setup_revisions() returns.

> +	/* Don't allow pathspecs at all. */
> +	if (revs.prune_data.nr)
> +		usage_with_options(usage, options);

It is not just pathspecs that we want to reject but all revision related 
options. Looking at diff-files.c we can do

	if (revs.pending.nr ||
	    revs.min_age != -1 || revs.max_age != -1 ||
	    revs.max_count != -1)
		usage_with_options(usage, options);

to catch some of that, but it still accepts things like "--first-parent",
"--merges" and "--ancestry-path". We may just have to live with that as
I don't think it is worth expending a huge amount of effort to prevent them.
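
The leftover-argument check and the revision-option check can be sketched together as one predicate. This is only an illustration: `struct rev_info_sketch` and `should_reject()` are hypothetical stand-ins, not git's real `struct rev_info` or any function in the patch. It also assumes that the argument count returned by setup_revisions() includes argv[0], so anything above 1 means something was left unparsed.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in holding only the fields used by the check.
 * The field names mirror the ones quoted above; -1 is assumed to be the
 * "not set" value for the age and count limits.
 */
struct rev_info_sketch {
	unsigned int pending_nr; /* stands in for revs.pending.nr */
	long min_age;            /* stands in for revs.min_age */
	long max_age;            /* stands in for revs.max_age */
	int max_count;           /* stands in for revs.max_count */
};

/*
 * Return true when the command line should be rejected with a usage
 * message. argc_left is the count remaining after option parsing,
 * assumed to include argv[0].
 */
bool should_reject(int argc_left, const struct rev_info_sketch *revs)
{
	/* Anything beyond argv[0] was not consumed by the parser. */
	if (argc_left > 1)
		return true;

	/* Revisions or revision-limiting options were given. */
	return revs->pending_nr ||
	       revs->min_age != -1 || revs->max_age != -1 ||
	       revs->max_count != -1;
}
```

As noted above, a check of this shape still lets through options such as "--first-parent" that do not touch any of these fields.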

> +	if (!revs.diffopt.output_format)
> +		revs.diffopt.output_format = DIFF_FORMAT_RAW;

This matches the other diff plumbing commands but I'm not sure it is the 
most helpful default for a command that is supposed to transform raw 
diffs into another format. Maybe we should default to DIFF_FORMAT_PATCH?


> +test_expect_success 'split input across multiple diff-pairs' '

This needs a PERL prerequisite I think. I'm a bit unsure what this test 
adds compared to the others.

Best Wishes

Phillip

> +	write_script split-raw-diff "$PERL_PATH" <<-\EOF &&
> +	$/ = "\0";
> +	while (<>) {
> +	  my $meta = $_;
> +	  my $path = <>;
> +	  # renames have an extra path
> +	  my $path2 = <> if $meta =~ /[RC]\d+/;
> +
> +	  open(my $fh, ">", sprintf "diff%03d", $.);
> +	  print $fh $meta, $path, $path2;
> +	}
> +	EOF
> +
> +	git diff-tree -p -M -C -C base new >expect &&
> +
> +	git diff-tree -r -z -M -C -C base new |
> +	./split-raw-diff &&
> +	for i in diff*; do
> +		git diff-pairs -p <$i || return 1
> +	done >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_done

