All of lore.kernel.org
 help / color / mirror / Atom feed
From: Patrick Steinhardt <ps@pks.im>
To: Justin Tobler <jltobler@gmail.com>
Cc: git@vger.kernel.org, karthik.188@gmail.com,
	phillip.wood123@gmail.com, Jeff King <peff@peff.net>
Subject: Re: [PATCH v3 2/3] builtin: introduce diff-pairs command
Date: Thu, 27 Feb 2025 13:56:04 +0100	[thread overview]
Message-ID: <Z8Bg5EAArZVGPaAc@pks.im> (raw)
In-Reply-To: <20250225233925.1345086-3-jltobler@gmail.com>

On Tue, Feb 25, 2025 at 05:39:24PM -0600, Justin Tobler wrote:
> Through git-diff(1), a single diff can be generated from a pair of blob
> revisions directly. Unfortunately, there is not a mechanism to compute
> batches of specific file pair diffs in a single process. Such a feature
> is particularly useful on the server-side where diffing between a large
> set of changes is not feasible all at once due to timeout concerns.
> 
> To facilitate this, introduce git-diff-pairs(1) which acts as a backend
> passing its NUL-terminated raw diff format input from stdin through diff
> machinery to produce various forms of output such as patch or raw.
> 
> The raw format was originally designed as an interchange format and
> represents the contents of the diff_queue_diff list making it possible
> to break the diff pipeline into separate stages. For example,
> git-diff-tree(1) can be used as a frontend to compute file pairs to
> queue and feed its raw output to git-diff-pairs(1) to compute patches.
> With this, batches of diffs can be progessively generated without having
> to recompute rename detection or retrieve object context. Something like

s/rename detection/renames/

> diff --git a/builtin/diff-pairs.c b/builtin/diff-pairs.c
> new file mode 100644
> index 0000000000..9472b10461
> --- /dev/null
> +++ b/builtin/diff-pairs.c
> @@ -0,0 +1,193 @@
> +#include "builtin.h"
> +#include "commit.h"
> +#include "config.h"
> +#include "diff.h"
> +#include "diffcore.h"
> +#include "gettext.h"
> +#include "hex.h"
> +#include "object.h"
> +#include "parse-options.h"
> +#include "revision.h"
> +#include "strbuf.h"
> +
> +static unsigned parse_mode_or_die(const char *mode, const char **endp)
> +{
> +	uint16_t ret;
> +
> +	*endp = parse_mode(mode, &ret);
> +	if (!*endp)
> +		die(_("unable to parse mode: %s"), mode);
> +	return ret;
> +}
> +
> +static void parse_oid_or_die(const char *p, struct object_id *oid,
> +			     const char **endp, const struct git_hash_algo *algop)
> +{
> +	if (parse_oid_hex_algop(p, oid, endp, algop) || *(*endp)++ != ' ')
> +		die(_("unable to parse object id: %s"), p);
> +}
> +
> +static void flush_diff_queue(struct diff_options *options)
> +{
> +	/*
> +	 * If rename detection is not requested, use rename information from the
> +	 * raw diff formatted input. Setting found_follow ensures diffcore_std()
> +	 * does not mess with rename information already present in queued
> +	 * filepairs.
> +	 */
> +	if (!options->detect_rename)
> +		options->found_follow = 1;

It's a bit weird that we set this over here. Shouldn't we have set it up
in the main function already?

> +	diffcore_std(options);
> +	diff_flush(options);
> +}
> +
> +int cmd_diff_pairs(int argc, const char **argv, const char *prefix,
> +		   struct repository *repo)
> +{
> +	struct strbuf path_dst = STRBUF_INIT;
> +	struct strbuf path = STRBUF_INIT;
> +	struct strbuf meta = STRBUF_INIT;
> +	struct rev_info revs;
> +	int ret;
> +
> +	const char * const usage[] = {
> +		N_("git diff-pairs -z [<diff-options>]"),
> +		NULL
> +	};
> +	struct option options[] = {
> +		OPT_END()
> +	};
> +	struct option *parseopts = add_diff_options(options, &revs.diffopt);
> +
> +	show_usage_with_options_if_asked(argc, argv, usage, parseopts);

Don't we also have to call `parse_options()` even though we don't have
our own options yet? Or is this all handled by `setup_revisions()`?

> +	repo_init_revisions(repo, &revs, prefix);
> +	repo_config(repo, git_diff_basic_config, NULL);
> +	revs.disable_stdin = 1;
> +	revs.abbrev = 0;
> +	revs.diff = 1;
> +
> +	if (setup_revisions(argc, argv, &revs, NULL) > 1)
> +		usage_with_options(usage, parseopts);

I think it's discouraged nowadays to use `usage_with_options()` as it
generates a ton of noise while hiding the actual error message. It is
instead recommended to directly call `usage()` with an error message.

In this case here we would say e.g. `usage(_("unrecognized argument:
%s"), argv[0])`, in the cases below we'd use the error messages you
already have.

> +
> +	/*
> +	 * With the -z option, both command input and raw output are
> +	 * NUL-delimited (this mode does not effect patch output). At present
> +	 * only NUL-delimited raw diff formatted input is supported.
> +	 */
> +	if (revs.diffopt.line_termination) {
> +		error(_("working without -z is not supported"));
> +		usage_with_options(usage, parseopts);
> +	}
> +
> +	if (revs.prune_data.nr) {
> +		error(_("pathspec arguments not supported"));
> +		usage_with_options(usage, parseopts);
> +	}
> +
> +	if (revs.pending.nr || revs.max_count != -1 ||
> +	    revs.min_age != (timestamp_t)-1 ||
> +	    revs.max_age != (timestamp_t)-1) {
> +		error(_("revision arguments not allowed"));
> +		usage_with_options(usage, parseopts);
> +	}

Okay. We restrict a bunch of usages, which makes your job simpler right
now, but by dying it keeps the door open to iterate on those in the
future.

> +	if (!revs.diffopt.output_format)
> +		revs.diffopt.output_format = DIFF_FORMAT_PATCH;

Instead of setting this conditionally, can we already set it up as a
default before calling `setup_revisions()`?

> +	while (1) {
> +		struct object_id oid_a, oid_b;
> +		struct diff_filepair *pair;
> +		unsigned mode_a, mode_b;
> +		const char *p;
> +		char status;
> +
> +		if (strbuf_getline_nul(&meta, stdin) == EOF)
> +			break;
> +
> +		p = meta.buf;
> +		if (*p != ':')
> +			die(_("invalid raw diff input"));
> +		p++;
> +
> +		mode_a = parse_mode_or_die(p, &p);
> +		mode_b = parse_mode_or_die(p, &p);
> +
> +		if (S_ISDIR(mode_a) || S_ISDIR(mode_b))
> +			die(_("tree objects not supported"));

I assume submodules aren't supported either, are they? If so, do we also
have to check for `S_ISGITLINK()`? It would be nice to have a test for
them.

Patrick

  parent reply	other threads:[~2025-02-27 12:56 UTC|newest]

Thread overview: 78+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-12-13  4:23 [PATCH 0/3] batch blob diff generation Justin Tobler
2024-12-13  4:23 ` [PATCH 1/3] builtin: introduce diff-blob command Justin Tobler
2024-12-13  4:23 ` [PATCH 2/3] builtin/diff-blob: add "--stdin" option Justin Tobler
2024-12-13  4:23 ` [PATCH 3/3] builtin/diff-blob: Add "-z" option Justin Tobler
2024-12-13  8:12 ` [PATCH 0/3] batch blob diff generation Jeff King
2024-12-13 10:17   ` Junio C Hamano
2024-12-13 10:38     ` Jeff King
2024-12-15  2:07       ` Junio C Hamano
2024-12-15  2:17         ` Junio C Hamano
2024-12-16 11:11           ` Jeff King
2024-12-16 16:29             ` Junio C Hamano
2024-12-18 11:39               ` Jeff King
2024-12-18 14:53                 ` Junio C Hamano
2024-12-20  9:09                   ` Jeff King
2024-12-20  9:10                     ` Jeff King
2024-12-13 16:41   ` Justin Tobler
2024-12-16 11:18     ` Jeff King
2024-12-13 22:34   ` Junio C Hamano
2024-12-15 23:24     ` Junio C Hamano
2024-12-16 11:30       ` Jeff King
2025-02-12  4:18 ` [PATCH v2 " Justin Tobler
2025-02-12  4:18   ` [PATCH v2 1/3] diff: return diff_filepair from diff queue helpers Justin Tobler
2025-02-12  9:06     ` Karthik Nayak
2025-02-12 17:35       ` Justin Tobler
2025-02-12  9:23     ` Patrick Steinhardt
2025-02-12 17:24       ` Justin Tobler
2025-02-13  5:45         ` Patrick Steinhardt
2025-02-12  4:18   ` [PATCH v2 2/3] builtin: introduce diff-pairs command Justin Tobler
2025-02-12  9:23     ` Patrick Steinhardt
2025-02-12  9:51     ` Karthik Nayak
2025-02-25 23:38       ` Justin Tobler
2025-02-12 11:40     ` Jean-Noël Avila
2025-02-12 16:50     ` Junio C Hamano
2025-02-19 22:19       ` Justin Tobler
2025-02-19 23:19         ` Junio C Hamano
2025-02-19 23:47           ` Junio C Hamano
2025-02-20  0:32             ` Justin Tobler
2025-02-20 14:56               ` Justin Tobler
2025-02-20 16:14                 ` Junio C Hamano
2025-02-17 14:38     ` Phillip Wood
2025-02-19 20:51       ` Justin Tobler
2025-02-19 21:57         ` Junio C Hamano
2025-02-19 22:38           ` Justin Tobler
2025-02-26 14:47         ` Phillip Wood
2025-02-12  4:18   ` [PATCH v2 3/3] builtin/diff-pairs: allow explicit diff queue flush Justin Tobler
2025-02-12  9:23     ` Patrick Steinhardt
2025-02-17 14:38     ` Phillip Wood
2025-02-19 23:09       ` Justin Tobler
2025-02-25 23:39   ` [PATCH v3 0/3] batch blob diff generation Justin Tobler
2025-02-25 23:39     ` [PATCH v3 1/3] diff: return diff_filepair from diff queue helpers Justin Tobler
2025-02-26 18:04       ` Junio C Hamano
2025-02-25 23:39     ` [PATCH v3 2/3] builtin: introduce diff-pairs command Justin Tobler
2025-02-26 18:24       ` Junio C Hamano
2025-02-27 22:15         ` Justin Tobler
2025-02-27  9:35       ` Karthik Nayak
2025-02-27 22:36         ` Justin Tobler
2025-02-27 12:56       ` Patrick Steinhardt [this message]
2025-02-27 23:00         ` Justin Tobler
2025-02-25 23:39     ` [PATCH v3 3/3] builtin/diff-pairs: allow explicit diff queue flush Justin Tobler
2025-02-26 14:58     ` [PATCH v3 0/3] batch blob diff generation phillip.wood123
2025-02-27 22:04       ` Justin Tobler
2025-02-28  0:26     ` [PATCH v4 0/4] " Justin Tobler
2025-02-28  0:26       ` [PATCH v4 1/4] diff: return diff_filepair from diff queue helpers Justin Tobler
2025-02-28  0:26       ` [PATCH v4 2/4] diff: add option to skip resolving diff statuses Justin Tobler
2025-02-28  8:29         ` Patrick Steinhardt
2025-02-28 17:10           ` Justin Tobler
2025-02-28  0:26       ` [PATCH v4 3/4] builtin: introduce diff-pairs command Justin Tobler
2025-02-28  8:29         ` Patrick Steinhardt
2025-02-28 17:26           ` Justin Tobler
2025-02-28  0:26       ` [PATCH v4 4/4] builtin/diff-pairs: allow explicit diff queue flush Justin Tobler
2025-02-28 21:33       ` [PATCH v5 0/4] batch blob diff generation Justin Tobler
2025-02-28 21:33         ` [PATCH v5 1/4] diff: return diff_filepair from diff queue helpers Justin Tobler
2025-03-03 16:17           ` Junio C Hamano
2025-02-28 21:33         ` [PATCH v5 2/4] diff: add option to skip resolving diff statuses Justin Tobler
2025-03-03 16:19           ` Junio C Hamano
2025-02-28 21:33         ` [PATCH v5 3/4] builtin: introduce diff-pairs command Justin Tobler
2025-03-03 16:30           ` Junio C Hamano
2025-02-28 21:33         ` [PATCH v5 4/4] builtin/diff-pairs: allow explicit diff queue flush Justin Tobler

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z8Bg5EAArZVGPaAc@pks.im \
    --to=ps@pks.im \
    --cc=git@vger.kernel.org \
    --cc=jltobler@gmail.com \
    --cc=karthik.188@gmail.com \
    --cc=peff@peff.net \
    --cc=phillip.wood123@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.