From: Patrick Steinhardt <ps@pks.im>
To: Justin Tobler <jltobler@gmail.com>
Cc: git@vger.kernel.org, karthik.188@gmail.com,
phillip.wood123@gmail.com, Jeff King <peff@peff.net>
Subject: Re: [PATCH v3 2/3] builtin: introduce diff-pairs command
Date: Thu, 27 Feb 2025 13:56:04 +0100 [thread overview]
Message-ID: <Z8Bg5EAArZVGPaAc@pks.im> (raw)
In-Reply-To: <20250225233925.1345086-3-jltobler@gmail.com>
On Tue, Feb 25, 2025 at 05:39:24PM -0600, Justin Tobler wrote:
> Through git-diff(1), a single diff can be generated from a pair of blob
> revisions directly. Unfortunately, there is not a mechanism to compute
> batches of specific file pair diffs in a single process. Such a feature
> is particularly useful on the server-side where diffing between a large
> set of changes is not feasible all at once due to timeout concerns.
>
> To facilitate this, introduce git-diff-pairs(1) which acts as a backend
> passing its NUL-terminated raw diff format input from stdin through diff
> machinery to produce various forms of output such as patch or raw.
>
> The raw format was originally designed as an interchange format and
> represents the contents of the diff_queue_diff list making it possible
> to break the diff pipeline into separate stages. For example,
> git-diff-tree(1) can be used as a frontend to compute file pairs to
> queue and feed its raw output to git-diff-pairs(1) to compute patches.
> With this, batches of diffs can be progessively generated without having
> to recompute rename detection or retrieve object context. Something like
s/rename detection/renames/
> diff --git a/builtin/diff-pairs.c b/builtin/diff-pairs.c
> new file mode 100644
> index 0000000000..9472b10461
> --- /dev/null
> +++ b/builtin/diff-pairs.c
> @@ -0,0 +1,193 @@
> +#include "builtin.h"
> +#include "commit.h"
> +#include "config.h"
> +#include "diff.h"
> +#include "diffcore.h"
> +#include "gettext.h"
> +#include "hex.h"
> +#include "object.h"
> +#include "parse-options.h"
> +#include "revision.h"
> +#include "strbuf.h"
> +
> +static unsigned parse_mode_or_die(const char *mode, const char **endp)
> +{
> + uint16_t ret;
> +
> + *endp = parse_mode(mode, &ret);
> + if (!*endp)
> + die(_("unable to parse mode: %s"), mode);
> + return ret;
> +}
> +
> +static void parse_oid_or_die(const char *p, struct object_id *oid,
> + const char **endp, const struct git_hash_algo *algop)
> +{
> + if (parse_oid_hex_algop(p, oid, endp, algop) || *(*endp)++ != ' ')
> + die(_("unable to parse object id: %s"), p);
> +}
> +
> +static void flush_diff_queue(struct diff_options *options)
> +{
> + /*
> + * If rename detection is not requested, use rename information from the
> + * raw diff formatted input. Setting found_follow ensures diffcore_std()
> + * does not mess with rename information already present in queued
> + * filepairs.
> + */
> + if (!options->detect_rename)
> + options->found_follow = 1;
It's a bit weird that we set this over here. Shouldn't we have set it up
in the main function already?
> + diffcore_std(options);
> + diff_flush(options);
> +}
> +
> +int cmd_diff_pairs(int argc, const char **argv, const char *prefix,
> + struct repository *repo)
> +{
> + struct strbuf path_dst = STRBUF_INIT;
> + struct strbuf path = STRBUF_INIT;
> + struct strbuf meta = STRBUF_INIT;
> + struct rev_info revs;
> + int ret;
> +
> + const char * const usage[] = {
> + N_("git diff-pairs -z [<diff-options>]"),
> + NULL
> + };
> + struct option options[] = {
> + OPT_END()
> + };
> + struct option *parseopts = add_diff_options(options, &revs.diffopt);
> +
> + show_usage_with_options_if_asked(argc, argv, usage, parseopts);
Don't we also have to call `parse_options()` even though we don't have
our own options yet? Or is this all handled by `setup_revisions()`?
> + repo_init_revisions(repo, &revs, prefix);
> + repo_config(repo, git_diff_basic_config, NULL);
> + revs.disable_stdin = 1;
> + revs.abbrev = 0;
> + revs.diff = 1;
> +
> + if (setup_revisions(argc, argv, &revs, NULL) > 1)
> + usage_with_options(usage, parseopts);
I think it's discouraged nowadays to use `usage_with_options()` as it
generates a ton of noise while hiding the actual error message. It is
instead recommended to directly call `usage()` with an error message.
In this case here we would say e.g. `usage(_("unrecognized argument:
%s"), argv[0])`, in the cases below we'd use the error messages you
already have.
> +
> + /*
> + * With the -z option, both command input and raw output are
> + * NUL-delimited (this mode does not effect patch output). At present
> + * only NUL-delimited raw diff formatted input is supported.
> + */
> + if (revs.diffopt.line_termination) {
> + error(_("working without -z is not supported"));
> + usage_with_options(usage, parseopts);
> + }
> +
> + if (revs.prune_data.nr) {
> + error(_("pathspec arguments not supported"));
> + usage_with_options(usage, parseopts);
> + }
> +
> + if (revs.pending.nr || revs.max_count != -1 ||
> + revs.min_age != (timestamp_t)-1 ||
> + revs.max_age != (timestamp_t)-1) {
> + error(_("revision arguments not allowed"));
> + usage_with_options(usage, parseopts);
> + }
Okay. We restrict a bunch of usages, which makes your job simpler right
now, but by dying it keeps the door open to iterate on those in the
future.
> + if (!revs.diffopt.output_format)
> + revs.diffopt.output_format = DIFF_FORMAT_PATCH;
Instead of setting this conditionally, can we already set it up as a
default before calling `setup_revisions()`?
> + while (1) {
> + struct object_id oid_a, oid_b;
> + struct diff_filepair *pair;
> + unsigned mode_a, mode_b;
> + const char *p;
> + char status;
> +
> + if (strbuf_getline_nul(&meta, stdin) == EOF)
> + break;
> +
> + p = meta.buf;
> + if (*p != ':')
> + die(_("invalid raw diff input"));
> + p++;
> +
> + mode_a = parse_mode_or_die(p, &p);
> + mode_b = parse_mode_or_die(p, &p);
> +
> + if (S_ISDIR(mode_a) || S_ISDIR(mode_b))
> + die(_("tree objects not supported"));
I assume submodules aren't supported either, are they? If so, do we also
have to check for `S_ISGITLINK()`? It would be nice to have a test for
them.
Patrick
next prev parent reply other threads:[~2025-02-27 12:56 UTC|newest]
Thread overview: 78+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-12-13 4:23 [PATCH 0/3] batch blob diff generation Justin Tobler
2024-12-13 4:23 ` [PATCH 1/3] builtin: introduce diff-blob command Justin Tobler
2024-12-13 4:23 ` [PATCH 2/3] builtin/diff-blob: add "--stdin" option Justin Tobler
2024-12-13 4:23 ` [PATCH 3/3] builtin/diff-blob: Add "-z" option Justin Tobler
2024-12-13 8:12 ` [PATCH 0/3] batch blob diff generation Jeff King
2024-12-13 10:17 ` Junio C Hamano
2024-12-13 10:38 ` Jeff King
2024-12-15 2:07 ` Junio C Hamano
2024-12-15 2:17 ` Junio C Hamano
2024-12-16 11:11 ` Jeff King
2024-12-16 16:29 ` Junio C Hamano
2024-12-18 11:39 ` Jeff King
2024-12-18 14:53 ` Junio C Hamano
2024-12-20 9:09 ` Jeff King
2024-12-20 9:10 ` Jeff King
2024-12-13 16:41 ` Justin Tobler
2024-12-16 11:18 ` Jeff King
2024-12-13 22:34 ` Junio C Hamano
2024-12-15 23:24 ` Junio C Hamano
2024-12-16 11:30 ` Jeff King
2025-02-12 4:18 ` [PATCH v2 " Justin Tobler
2025-02-12 4:18 ` [PATCH v2 1/3] diff: return diff_filepair from diff queue helpers Justin Tobler
2025-02-12 9:06 ` Karthik Nayak
2025-02-12 17:35 ` Justin Tobler
2025-02-12 9:23 ` Patrick Steinhardt
2025-02-12 17:24 ` Justin Tobler
2025-02-13 5:45 ` Patrick Steinhardt
2025-02-12 4:18 ` [PATCH v2 2/3] builtin: introduce diff-pairs command Justin Tobler
2025-02-12 9:23 ` Patrick Steinhardt
2025-02-12 9:51 ` Karthik Nayak
2025-02-25 23:38 ` Justin Tobler
2025-02-12 11:40 ` Jean-Noël Avila
2025-02-12 16:50 ` Junio C Hamano
2025-02-19 22:19 ` Justin Tobler
2025-02-19 23:19 ` Junio C Hamano
2025-02-19 23:47 ` Junio C Hamano
2025-02-20 0:32 ` Justin Tobler
2025-02-20 14:56 ` Justin Tobler
2025-02-20 16:14 ` Junio C Hamano
2025-02-17 14:38 ` Phillip Wood
2025-02-19 20:51 ` Justin Tobler
2025-02-19 21:57 ` Junio C Hamano
2025-02-19 22:38 ` Justin Tobler
2025-02-26 14:47 ` Phillip Wood
2025-02-12 4:18 ` [PATCH v2 3/3] builtin/diff-pairs: allow explicit diff queue flush Justin Tobler
2025-02-12 9:23 ` Patrick Steinhardt
2025-02-17 14:38 ` Phillip Wood
2025-02-19 23:09 ` Justin Tobler
2025-02-25 23:39 ` [PATCH v3 0/3] batch blob diff generation Justin Tobler
2025-02-25 23:39 ` [PATCH v3 1/3] diff: return diff_filepair from diff queue helpers Justin Tobler
2025-02-26 18:04 ` Junio C Hamano
2025-02-25 23:39 ` [PATCH v3 2/3] builtin: introduce diff-pairs command Justin Tobler
2025-02-26 18:24 ` Junio C Hamano
2025-02-27 22:15 ` Justin Tobler
2025-02-27 9:35 ` Karthik Nayak
2025-02-27 22:36 ` Justin Tobler
2025-02-27 12:56 ` Patrick Steinhardt [this message]
2025-02-27 23:00 ` Justin Tobler
2025-02-25 23:39 ` [PATCH v3 3/3] builtin/diff-pairs: allow explicit diff queue flush Justin Tobler
2025-02-26 14:58 ` [PATCH v3 0/3] batch blob diff generation phillip.wood123
2025-02-27 22:04 ` Justin Tobler
2025-02-28 0:26 ` [PATCH v4 0/4] " Justin Tobler
2025-02-28 0:26 ` [PATCH v4 1/4] diff: return diff_filepair from diff queue helpers Justin Tobler
2025-02-28 0:26 ` [PATCH v4 2/4] diff: add option to skip resolving diff statuses Justin Tobler
2025-02-28 8:29 ` Patrick Steinhardt
2025-02-28 17:10 ` Justin Tobler
2025-02-28 0:26 ` [PATCH v4 3/4] builtin: introduce diff-pairs command Justin Tobler
2025-02-28 8:29 ` Patrick Steinhardt
2025-02-28 17:26 ` Justin Tobler
2025-02-28 0:26 ` [PATCH v4 4/4] builtin/diff-pairs: allow explicit diff queue flush Justin Tobler
2025-02-28 21:33 ` [PATCH v5 0/4] batch blob diff generation Justin Tobler
2025-02-28 21:33 ` [PATCH v5 1/4] diff: return diff_filepair from diff queue helpers Justin Tobler
2025-03-03 16:17 ` Junio C Hamano
2025-02-28 21:33 ` [PATCH v5 2/4] diff: add option to skip resolving diff statuses Justin Tobler
2025-03-03 16:19 ` Junio C Hamano
2025-02-28 21:33 ` [PATCH v5 3/4] builtin: introduce diff-pairs command Justin Tobler
2025-03-03 16:30 ` Junio C Hamano
2025-02-28 21:33 ` [PATCH v5 4/4] builtin/diff-pairs: allow explicit diff queue flush Justin Tobler
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z8Bg5EAArZVGPaAc@pks.im \
--to=ps@pks.im \
--cc=git@vger.kernel.org \
--cc=jltobler@gmail.com \
--cc=karthik.188@gmail.com \
--cc=peff@peff.net \
--cc=phillip.wood123@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).