git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Patrick Steinhardt <ps@pks.im>
To: Justin Tobler <jltobler@gmail.com>
Cc: git@vger.kernel.org, karthik.188@gmail.com,
	phillip.wood123@gmail.com, Jeff King <peff@peff.net>
Subject: Re: [PATCH v3 2/3] builtin: introduce diff-pairs command
Date: Thu, 27 Feb 2025 13:56:04 +0100	[thread overview]
Message-ID: <Z8Bg5EAArZVGPaAc@pks.im> (raw)
In-Reply-To: <20250225233925.1345086-3-jltobler@gmail.com>

On Tue, Feb 25, 2025 at 05:39:24PM -0600, Justin Tobler wrote:
> Through git-diff(1), a single diff can be generated from a pair of blob
> revisions directly. Unfortunately, there is not a mechanism to compute
> batches of specific file pair diffs in a single process. Such a feature
> is particularly useful on the server-side where diffing between a large
> set of changes is not feasible all at once due to timeout concerns.
> 
> To facilitate this, introduce git-diff-pairs(1) which acts as a backend
> passing its NUL-terminated raw diff format input from stdin through diff
> machinery to produce various forms of output such as patch or raw.
> 
> The raw format was originally designed as an interchange format and
> represents the contents of the diff_queue_diff list making it possible
> to break the diff pipeline into separate stages. For example,
> git-diff-tree(1) can be used as a frontend to compute file pairs to
> queue and feed its raw output to git-diff-pairs(1) to compute patches.
> With this, batches of diffs can be progessively generated without having
> to recompute rename detection or retrieve object context. Something like

s/rename detection/renames/

> diff --git a/builtin/diff-pairs.c b/builtin/diff-pairs.c
> new file mode 100644
> index 0000000000..9472b10461
> --- /dev/null
> +++ b/builtin/diff-pairs.c
> @@ -0,0 +1,193 @@
> +#include "builtin.h"
> +#include "commit.h"
> +#include "config.h"
> +#include "diff.h"
> +#include "diffcore.h"
> +#include "gettext.h"
> +#include "hex.h"
> +#include "object.h"
> +#include "parse-options.h"
> +#include "revision.h"
> +#include "strbuf.h"
> +
> +static unsigned parse_mode_or_die(const char *mode, const char **endp)
> +{
> +	uint16_t ret;
> +
> +	*endp = parse_mode(mode, &ret);
> +	if (!*endp)
> +		die(_("unable to parse mode: %s"), mode);
> +	return ret;
> +}
> +
> +static void parse_oid_or_die(const char *p, struct object_id *oid,
> +			     const char **endp, const struct git_hash_algo *algop)
> +{
> +	if (parse_oid_hex_algop(p, oid, endp, algop) || *(*endp)++ != ' ')
> +		die(_("unable to parse object id: %s"), p);
> +}
> +
> +static void flush_diff_queue(struct diff_options *options)
> +{
> +	/*
> +	 * If rename detection is not requested, use rename information from the
> +	 * raw diff formatted input. Setting found_follow ensures diffcore_std()
> +	 * does not mess with rename information already present in queued
> +	 * filepairs.
> +	 */
> +	if (!options->detect_rename)
> +		options->found_follow = 1;

It's a bit weird that we set this over here. Shouldn't we have set it up
in the main function already?

> +	diffcore_std(options);
> +	diff_flush(options);
> +}
> +
> +int cmd_diff_pairs(int argc, const char **argv, const char *prefix,
> +		   struct repository *repo)
> +{
> +	struct strbuf path_dst = STRBUF_INIT;
> +	struct strbuf path = STRBUF_INIT;
> +	struct strbuf meta = STRBUF_INIT;
> +	struct rev_info revs;
> +	int ret;
> +
> +	const char * const usage[] = {
> +		N_("git diff-pairs -z [<diff-options>]"),
> +		NULL
> +	};
> +	struct option options[] = {
> +		OPT_END()
> +	};
> +	struct option *parseopts = add_diff_options(options, &revs.diffopt);
> +
> +	show_usage_with_options_if_asked(argc, argv, usage, parseopts);

Don't we also have to call `parse_options()` even though we don't have
our own options yet? Or is this all handled by `setup_revisions()`?

> +	repo_init_revisions(repo, &revs, prefix);
> +	repo_config(repo, git_diff_basic_config, NULL);
> +	revs.disable_stdin = 1;
> +	revs.abbrev = 0;
> +	revs.diff = 1;
> +
> +	if (setup_revisions(argc, argv, &revs, NULL) > 1)
> +		usage_with_options(usage, parseopts);

I think it's discouraged nowadays to use `usage_with_options()` as it
generates a ton of noise while hiding the actual error message. It is
instead recommended to directly call `usage()` with an error message.

In this case here we would say e.g. `usage(_("unrecognized argument:
%s"), argv[0])`, in the cases below we'd use the error messages you
already have.

> +
> +	/*
> +	 * With the -z option, both command input and raw output are
> +	 * NUL-delimited (this mode does not effect patch output). At present
> +	 * only NUL-delimited raw diff formatted input is supported.
> +	 */
> +	if (revs.diffopt.line_termination) {
> +		error(_("working without -z is not supported"));
> +		usage_with_options(usage, parseopts);
> +	}
> +
> +	if (revs.prune_data.nr) {
> +		error(_("pathspec arguments not supported"));
> +		usage_with_options(usage, parseopts);
> +	}
> +
> +	if (revs.pending.nr || revs.max_count != -1 ||
> +	    revs.min_age != (timestamp_t)-1 ||
> +	    revs.max_age != (timestamp_t)-1) {
> +		error(_("revision arguments not allowed"));
> +		usage_with_options(usage, parseopts);
> +	}

Okay. We restrict a bunch of usages, which makes your job simpler right
now, but by dying it keeps the door open to iterate on those in the
future.

> +	if (!revs.diffopt.output_format)
> +		revs.diffopt.output_format = DIFF_FORMAT_PATCH;

Instead of setting this conditionally, can we already set it up as a
default before calling `setup_revisions()`?

> +	while (1) {
> +		struct object_id oid_a, oid_b;
> +		struct diff_filepair *pair;
> +		unsigned mode_a, mode_b;
> +		const char *p;
> +		char status;
> +
> +		if (strbuf_getline_nul(&meta, stdin) == EOF)
> +			break;
> +
> +		p = meta.buf;
> +		if (*p != ':')
> +			die(_("invalid raw diff input"));
> +		p++;
> +
> +		mode_a = parse_mode_or_die(p, &p);
> +		mode_b = parse_mode_or_die(p, &p);
> +
> +		if (S_ISDIR(mode_a) || S_ISDIR(mode_b))
> +			die(_("tree objects not supported"));

I assume submodules aren't supported either, are they? If so, do we also
have to check for `S_ISGITLINK()`? It would be nice to have a test for
them.

Patrick

  parent reply	other threads:[~2025-02-27 12:56 UTC|newest]

Thread overview: 78+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-12-13  4:23 [PATCH 0/3] batch blob diff generation Justin Tobler
2024-12-13  4:23 ` [PATCH 1/3] builtin: introduce diff-blob command Justin Tobler
2024-12-13  4:23 ` [PATCH 2/3] builtin/diff-blob: add "--stdin" option Justin Tobler
2024-12-13  4:23 ` [PATCH 3/3] builtin/diff-blob: Add "-z" option Justin Tobler
2024-12-13  8:12 ` [PATCH 0/3] batch blob diff generation Jeff King
2024-12-13 10:17   ` Junio C Hamano
2024-12-13 10:38     ` Jeff King
2024-12-15  2:07       ` Junio C Hamano
2024-12-15  2:17         ` Junio C Hamano
2024-12-16 11:11           ` Jeff King
2024-12-16 16:29             ` Junio C Hamano
2024-12-18 11:39               ` Jeff King
2024-12-18 14:53                 ` Junio C Hamano
2024-12-20  9:09                   ` Jeff King
2024-12-20  9:10                     ` Jeff King
2024-12-13 16:41   ` Justin Tobler
2024-12-16 11:18     ` Jeff King
2024-12-13 22:34   ` Junio C Hamano
2024-12-15 23:24     ` Junio C Hamano
2024-12-16 11:30       ` Jeff King
2025-02-12  4:18 ` [PATCH v2 " Justin Tobler
2025-02-12  4:18   ` [PATCH v2 1/3] diff: return diff_filepair from diff queue helpers Justin Tobler
2025-02-12  9:06     ` Karthik Nayak
2025-02-12 17:35       ` Justin Tobler
2025-02-12  9:23     ` Patrick Steinhardt
2025-02-12 17:24       ` Justin Tobler
2025-02-13  5:45         ` Patrick Steinhardt
2025-02-12  4:18   ` [PATCH v2 2/3] builtin: introduce diff-pairs command Justin Tobler
2025-02-12  9:23     ` Patrick Steinhardt
2025-02-12  9:51     ` Karthik Nayak
2025-02-25 23:38       ` Justin Tobler
2025-02-12 11:40     ` Jean-Noël Avila
2025-02-12 16:50     ` Junio C Hamano
2025-02-19 22:19       ` Justin Tobler
2025-02-19 23:19         ` Junio C Hamano
2025-02-19 23:47           ` Junio C Hamano
2025-02-20  0:32             ` Justin Tobler
2025-02-20 14:56               ` Justin Tobler
2025-02-20 16:14                 ` Junio C Hamano
2025-02-17 14:38     ` Phillip Wood
2025-02-19 20:51       ` Justin Tobler
2025-02-19 21:57         ` Junio C Hamano
2025-02-19 22:38           ` Justin Tobler
2025-02-26 14:47         ` Phillip Wood
2025-02-12  4:18   ` [PATCH v2 3/3] builtin/diff-pairs: allow explicit diff queue flush Justin Tobler
2025-02-12  9:23     ` Patrick Steinhardt
2025-02-17 14:38     ` Phillip Wood
2025-02-19 23:09       ` Justin Tobler
2025-02-25 23:39   ` [PATCH v3 0/3] batch blob diff generation Justin Tobler
2025-02-25 23:39     ` [PATCH v3 1/3] diff: return diff_filepair from diff queue helpers Justin Tobler
2025-02-26 18:04       ` Junio C Hamano
2025-02-25 23:39     ` [PATCH v3 2/3] builtin: introduce diff-pairs command Justin Tobler
2025-02-26 18:24       ` Junio C Hamano
2025-02-27 22:15         ` Justin Tobler
2025-02-27  9:35       ` Karthik Nayak
2025-02-27 22:36         ` Justin Tobler
2025-02-27 12:56       ` Patrick Steinhardt [this message]
2025-02-27 23:00         ` Justin Tobler
2025-02-25 23:39     ` [PATCH v3 3/3] builtin/diff-pairs: allow explicit diff queue flush Justin Tobler
2025-02-26 14:58     ` [PATCH v3 0/3] batch blob diff generation phillip.wood123
2025-02-27 22:04       ` Justin Tobler
2025-02-28  0:26     ` [PATCH v4 0/4] " Justin Tobler
2025-02-28  0:26       ` [PATCH v4 1/4] diff: return diff_filepair from diff queue helpers Justin Tobler
2025-02-28  0:26       ` [PATCH v4 2/4] diff: add option to skip resolving diff statuses Justin Tobler
2025-02-28  8:29         ` Patrick Steinhardt
2025-02-28 17:10           ` Justin Tobler
2025-02-28  0:26       ` [PATCH v4 3/4] builtin: introduce diff-pairs command Justin Tobler
2025-02-28  8:29         ` Patrick Steinhardt
2025-02-28 17:26           ` Justin Tobler
2025-02-28  0:26       ` [PATCH v4 4/4] builtin/diff-pairs: allow explicit diff queue flush Justin Tobler
2025-02-28 21:33       ` [PATCH v5 0/4] batch blob diff generation Justin Tobler
2025-02-28 21:33         ` [PATCH v5 1/4] diff: return diff_filepair from diff queue helpers Justin Tobler
2025-03-03 16:17           ` Junio C Hamano
2025-02-28 21:33         ` [PATCH v5 2/4] diff: add option to skip resolving diff statuses Justin Tobler
2025-03-03 16:19           ` Junio C Hamano
2025-02-28 21:33         ` [PATCH v5 3/4] builtin: introduce diff-pairs command Justin Tobler
2025-03-03 16:30           ` Junio C Hamano
2025-02-28 21:33         ` [PATCH v5 4/4] builtin/diff-pairs: allow explicit diff queue flush Justin Tobler

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z8Bg5EAArZVGPaAc@pks.im \
    --to=ps@pks.im \
    --cc=git@vger.kernel.org \
    --cc=jltobler@gmail.com \
    --cc=karthik.188@gmail.com \
    --cc=peff@peff.net \
    --cc=phillip.wood123@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).