From: Phillip Wood <phillip.wood123@gmail.com>
To: Justin Tobler <jltobler@gmail.com>, git@vger.kernel.org
Cc: peff@peff.net, Patrick Steinhardt <ps@pks.im>,
Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH v2 2/3] builtin: introduce diff-pairs command
Date: Mon, 17 Feb 2025 14:38:11 +0000
Message-ID: <d6d4230e-7b80-4eec-b218-37717ae2e298@gmail.com>
In-Reply-To: <20250212041825.2455031-3-jltobler@gmail.com>

Hi Justin

On 12/02/2025 04:18, Justin Tobler wrote:
> Through git-diff(1), a single diff can be generated from a pair of blob
> revisions directly. Unfortunately, there is not a mechanism to compute
> batches of specific file pair diffs in a single process. Such a feature
> is particularly useful on the server-side where diffing between a large
> set of changes is not feasible all at once due to timeout concerns.
>
> To facilitate this, introduce git-diff-pairs(1) which takes the
> null-terminated raw diff format as input on stdin and produces diffs in
> other formats. As the raw diff format already contains the necessary
> metadata, it becomes possible to progressively generate batches of diffs
> without having to recompute rename detection or retrieve object context.
> Something like the following:
>
> 	git diff-tree -r -z -M $old $new |
> 	git diff-pairs -p
>
> should generate the same output as `git diff-tree -p -M`. Furthermore,
> each line of raw diff formatted input can also be individually fed to a
> separate git-diff-pairs(1) process and still produce the same output.

I like the idea of this. I've left a few comments, mainly around the UI.

> +Here's an incomplete list of things that `diff-pairs` could do, but
> +doesn't (mostly in the name of simplicity):
> +
> + - Only `-z` input is accepted, not normal `--raw` input.

I think accepting only NUL-terminated input is fine, but if we want to
accept other formats in the future we should have a plan for doing that
in a backwards-compatible way. We cannot use `-z` to distinguish between
input formats because it is already a diff option that controls the
output format.

> +	const char * const usage[] = {
> +		N_("git diff-pairs [diff-options]"),

Normally the option summary printed by "git foo -h" is generated by the
option parser. In this case we don't define any options and use
setup_revisions() instead, so we need to provide the option summary
ourselves. Following diff-files.c, we can append

	"\n"
	COMMON_DIFF_OPTIONS_HELP;

to the usage string to do that.

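To make the suggestion concrete, here is a minimal self-contained sketch of the resulting string, modelled on how diff-files.c builds its usage text. The COMMON_DIFF_OPTIONS_HELP below is an abbreviated placeholder defined only so the example compiles on its own (the real macro comes from git's diff.h), and diff_pairs_usage is a hypothetical name.

```c
/*
 * Placeholder for git's COMMON_DIFF_OPTIONS_HELP (diff.h). The real
 * macro lists every common diff option; this abbreviated stand-in only
 * exists so the sketch compiles on its own.
 */
#define COMMON_DIFF_OPTIONS_HELP \
	"\ncommon diff options:\n" \
	"  -p            output patch format.\n"

/*
 * Hypothetical name. Adjacent string literals are concatenated at
 * compile time, so the command synopsis and the common option summary
 * end up in a single constant, as with diff_files_usage in diff-files.c.
 */
static const char diff_pairs_usage[] =
	"git diff-pairs [diff-options]"
	"\n"
	COMMON_DIFF_OPTIONS_HELP;
```

Because the literals are joined by the compiler, no runtime formatting is needed. This sketch leaves open how the result is then printed (usage() as in diff-files.c, or the usage_with_options() call the patch uses).
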
> +	argc = setup_revisions(argc, argv, &revs, NULL);

I think we should check that there are no arguments left on the
command line after setup_revisions() returns.

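The sketch below shows roughly how that check works. It assumes the convention that setup_revisions() keeps argv[0], packs every argument it did not understand into argv[1..], and returns the new count. fake_setup_revisions() and first_unparsed_arg() are hypothetical stand-ins so the example runs without linking against git. In the builtin itself, the test would simply be whether the returned argc is greater than 1.

```c
#include <string.h>

/*
 * Hypothetical stand-in for setup_revisions(): it "understands" only a
 * few diff options. It keeps argv[0], packs any argument it did not
 * consume into argv[1..], and returns the new argc. A return value
 * greater than 1 therefore means something was left over.
 */
static int fake_setup_revisions(int argc, const char **argv)
{
	int left = 1;
	int i;

	for (i = 1; i < argc; i++) {
		if (!strcmp(argv[i], "-p") || !strcmp(argv[i], "-M") ||
		    !strcmp(argv[i], "--raw"))
			continue; /* consumed as a diff option */
		argv[left++] = argv[i]; /* not understood: keep it */
	}
	return left;
}

/* Hypothetical helper: first argument nobody parsed, or NULL if none. */
static const char *first_unparsed_arg(int argc, const char **argv)
{
	argc = fake_setup_revisions(argc, argv);
	return argc > 1 ? argv[1] : NULL;
}
```

In the command, a non-NULL result here corresponds to calling usage_with_options() (or dying with the offending argument) instead of silently ignoring it.
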
> +	/* Don't allow pathspecs at all. */
> +	if (revs.prune_data.nr)
> +		usage_with_options(usage, options);

It is not just pathspecs that we want to reject but all revision-related
options. Following diff-files.c, we can do

	if (revs.pending.nr ||
	    revs.min_age != -1 || revs.max_age != -1 ||
	    revs.max_count != -1)
		usage_with_options(usage, options);

to catch some of them, but that still accepts things like
"--first-parent", "--merges" and "--ancestry-path". We may just have to
live with that, as I don't think it is worth expending a huge amount of
effort to prevent them.

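Folding the existing pathspec check into the same test could look roughly like the sketch below. struct fake_rev_info and has_revision_args() are hypothetical, trimmed-down stand-ins for struct rev_info and the real check. The one assumption carried over from git is that repo_init_revisions() initialises the age and count limits to -1 ("not given"). That is why the suggested condition compares against -1 rather than 0.

```c
#include <stdbool.h>

/*
 * Hypothetical, trimmed-down stand-in for struct rev_info holding only
 * the fields the suggested check looks at. The limits use -1 as the
 * "not given" value, matching what repo_init_revisions() sets up.
 */
struct fake_rev_info {
	unsigned int pending_nr;	/* revisions named on the command line */
	unsigned int pathspec_nr;	/* stands in for prune_data.nr */
	long min_age, max_age;		/* --before/--until, --since/--after */
	int max_count;			/* -n / --max-count */
};

/* True if any revision-walking argument or pathspec was given. */
static bool has_revision_args(const struct fake_rev_info *rev)
{
	return rev->pending_nr || rev->pathspec_nr ||
	       rev->min_age != -1 || rev->max_age != -1 ||
	       rev->max_count != -1;
}
```

As noted above, this still does not catch walk-shaping flags such as "--first-parent". Those would need extra fields, which is where the effort starts to outweigh the benefit.
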
> +	if (!revs.diffopt.output_format)
> +		revs.diffopt.output_format = DIFF_FORMAT_RAW;

This matches the other diff plumbing commands, but I'm not sure it is the
most helpful default for a command that is supposed to transform raw
diffs into another format. Maybe we should default to DIFF_FORMAT_PATCH?

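output_format is a bitmask in which 0 means no format was requested. Changing the default therefore only affects invocations that pass no format option, and an explicit --raw still wins. Here is a minimal sketch, using hypothetical FMT_* values in place of git's DIFF_FORMAT_* constants (the real values differ).

```c
/*
 * Hypothetical format bits standing in for git's DIFF_FORMAT_*
 * constants; the values are illustrative only.
 */
enum {
	FMT_RAW   = 1 << 0,
	FMT_PATCH = 1 << 1,
	FMT_STAT  = 1 << 2,
};

/*
 * Hypothetical helper: a zero bitmask means the user asked for no
 * particular output, so only that case falls back to the default.
 * Passing FMT_PATCH rather than FMT_RAW as the fallback is the change
 * being suggested.
 */
static unsigned int resolve_output_format(unsigned int requested,
					  unsigned int fallback)
{
	return requested ? requested : fallback;
}
```
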
> +test_expect_success 'split input across multiple diff-pairs' '

I think this needs a PERL prerequisite, as the split-raw-diff helper is
run with $PERL_PATH. I'm also a bit unsure what this test adds compared
to the others.

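For reference, here is a C sketch of the record layout the split-raw-diff script depends on. Each record is a metadata field ending in the status letter, followed by one path, or two for renames and copies, with every field NUL-terminated. next_field() and count_raw_records() are hypothetical helpers, not part of git. They only cover plain two-tree `diff-tree -r -z` output like the one used in the test.

```c
#include <string.h>

/*
 * Hypothetical helper: return the NUL-terminated field starting at
 * *pos and advance *pos past its terminator, or return NULL if the
 * input ends before a NUL is found.
 */
static const char *next_field(const char *buf, size_t len, size_t *pos)
{
	const char *start = buf + *pos;
	const char *nul;

	if (*pos >= len)
		return NULL;
	nul = memchr(start, '\0', len - *pos);
	if (!nul)
		return NULL;
	*pos = (size_t)(nul - buf) + 1;
	return start;
}

/*
 * Hypothetical helper: count filepair records in `-z` raw output.
 * A record is ":<modes> <oids> <status>" followed by one path, or two
 * paths when the status is R (rename) or C (copy). Returns -1 on
 * malformed or truncated input.
 */
static int count_raw_records(const char *buf, size_t len)
{
	size_t pos = 0;
	int records = 0;

	while (pos < len) {
		const char *meta = next_field(buf, len, &pos);
		const char *status;
		int npaths, i;

		if (!meta || meta[0] != ':')
			return -1;
		status = strrchr(meta, ' '); /* status is the last field */
		if (!status)
			return -1;
		status++;
		npaths = (*status == 'R' || *status == 'C') ? 2 : 1;
		for (i = 0; i < npaths; i++)
			if (!next_field(buf, len, &pos))
				return -1;
		records++;
	}
	return records;
}
```

Each record found this way is exactly the unit the test writes to its own diffNNN file and feeds to a separate git-diff-pairs process.
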
Best Wishes
Phillip
> +	write_script split-raw-diff "$PERL_PATH" <<-\EOF &&
> +	$/ = "\0";
> +	while (<>) {
> +		my $meta = $_;
> +		my $path = <>;
> +		# renames have an extra path
> +		my $path2 = <> if $meta =~ /[RC]\d+/;
> +
> +		open(my $fh, ">", sprintf "diff%03d", $.);
> +		print $fh $meta, $path, $path2;
> +	}
> +	EOF
> +
> +	git diff-tree -p -M -C -C base new >expect &&
> +
> +	git diff-tree -r -z -M -C -C base new |
> +	./split-raw-diff &&
> +	for i in diff*; do
> +		git diff-pairs -p <$i || return 1
> +	done >actual &&
> +	test_cmp expect actual
> +
> +test_done