From: Junio C Hamano <gitster@pobox.com>
To: "Michael Montalbo via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Michael Montalbo <mmontalbo@gmail.com>
Subject: Re: [PATCH v2 3/4] diff: add long-running diff process via diff.<driver>.process
Date: Tue, 26 May 2026 10:56:32 +0900 [thread overview]
Message-ID: <xmqqpl2jlyr3.fsf@gitster.g> (raw)
In-Reply-To: <c25647c6e571e293fc994e0620ca37709f680f8a.1779733799.git.gitgitgadget@gmail.com> (Michael Montalbo via GitGitGadget's message of "Mon, 25 May 2026 18:29:57 +0000")
"Michael Montalbo via GitGitGadget" <gitgitgadget@gmail.com> writes:
> +struct diff_subprocess {
> + struct subprocess_entry subprocess;
> + unsigned int supported_capabilities;
> +};
> +
> +static int subprocess_map_initialized;
> +static struct hashmap subprocess_map;
Can we avoid introducing new global variables like these? Would
"struct userdiff_driver" or "struct diff_options" be a good place to
hang this hashmap, perhaps?
> +static int send_file_content(int fd, const char *buf, long size)
> +{
> + int ret;
> +
> + if (size > 0)
> + ret = write_packetized_from_buf_no_flush(buf, size, fd);
> + else
> + ret = 0;
Shouldn't "size == -24" be flagged as an invalid input?
> + if (ret)
> + return ret;
> + return packet_flush_gently(fd);
> +}
> +static int parse_hunk_line(const char *line, struct xdl_hunk *hunk)
> +{
> +...
> +}
This gives a silent error diagnosis, which is good for a lower level
helper.
> +int diff_process_get_hunks(struct userdiff_driver *drv,
> + const char *path,
> + const char *old_buf, long old_size,
> + const char *new_buf, long new_size,
> + struct xdl_hunk **hunks_out,
> + size_t *nr_hunks_out)
> +{
> + struct diff_subprocess *backend;
> + struct child_process *process;
> + int fd_in, fd_out;
> + struct strbuf status = STRBUF_INIT;
> + struct xdl_hunk *hunks = NULL;
> + struct xdl_hunk hunk;
> + size_t nr_hunks = 0, alloc_hunks = 0;
> + int len;
> + char *line;
> +
> + if (!drv || !drv->process)
> + return -1;
A driver that does not define process is not an error; it is
perfectly normal in the current world order where nobody has such an
external process and even fi this patch lands, external processes
are optional. So here "return -1" does not mean an error, and
silent return is perfectly fine.
> + backend = find_or_start_process(drv->process);
> + if (!backend)
> + return -1;
This is probably an error; the user specified drv->process, we
either tried to find or start the process and failed. Isn't it an
event that deserves to be reported in an error message?
> + if (!(backend->supported_capabilities & CAP_HUNKS))
> + return -1;
Backend started, but the "hunks" feature is not supported. Perhaps
in a year or two, this external process protocol may have become so
popular that it gained more capabilities, possibly making get_hunks
obsolete. We may be looking at such an external process that uses
other capabilities but not this one. This is not an error, so
silent return is perfectly fine.
> + process = subprocess_get_child_process(&backend->subprocess);
> + fd_in = process->in;
> + fd_out = process->out;
> +
> + /* Send request */
> + if (packet_write_fmt_gently(fd_in, "command=hunks\n") ||
> + packet_write_fmt_gently(fd_in, "pathname=%s\n", path) ||
> + packet_flush_gently(fd_in))
> + goto error;
> +
> + /* Send old file content */
> + if (send_file_content(fd_in, old_buf, old_size))
> + goto error;
> +
> + /* Send new file content */
> + if (send_file_content(fd_in, new_buf, new_size))
> + goto error;
> +
> + /* Read hunks until flush packet */
> + while ((len = packet_read_line_gently(fd_out, NULL, &line)) >= 0 &&
> + line) {
> + if (parse_hunk_line(line, &hunk) < 0)
> + goto error;
> + ALLOC_GROW(hunks, nr_hunks + 1, alloc_hunks);
> + hunks[nr_hunks++] = hunk;
> + }
> + if (len < 0)
> + goto error;
> +
> + /* Read status */
> + if (subprocess_read_status(fd_out, &status))
> + goto error;
> +
> + if (strcmp(status.buf, "success")) {
> + if (!strcmp(status.buf, "abort"))
> + backend->supported_capabilities &= ~CAP_HUNKS;
> + goto error;
> + }
> +
> + *hunks_out = hunks;
> + *nr_hunks_out = nr_hunks;
> + strbuf_release(&status);
> + return 0;
> +
> +error:
All exceptions that lead here look like events that should be
reported to the end-user.
> + free(hunks);
> + strbuf_release(&status);
> + return -1;
> +}
> +/*
> + * Query a diff process for hunks describing the changes
> + * between old_buf and new_buf.
> + *
> + * The backend is a long-running subprocess configured via
> + * diff.<driver>.process. It receives file content via
> + * pkt-line and returns hunks with 1-based line numbers.
> + *
> + * On success, sets *hunks_out and *nr_hunks_out to a newly allocated
> + * array (caller must free) and returns 0.
> + *
> + * On failure, returns -1. The caller should fall back to the
> + * builtin diff algorithm.
> + */
I do not agree with this. If it is a failure, the user should fix
the external process (or disable). It shouldn't be hidden behind a
fallback. As I left comments, in this round of implementation,
there are conditions that returns -1 for soemthing that is not an
error (i.e., not configured, or process not supporting the
particular capability) *and* in those cases the caller should fall
back as if nothing happened. But some error cases, the caller
should't hide them.
next prev parent reply other threads:[~2026-05-26 1:56 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-22 2:11 [PATCH 0/5] [RFC] diff: add diff.<driver>.process for external hunk providers Michael Montalbo via GitGitGadget
2026-05-22 2:11 ` [PATCH 1/5] xdiff: support external hunks via xpparam_t Michael Montalbo via GitGitGadget
2026-05-22 5:29 ` Junio C Hamano
2026-05-22 19:06 ` Michael Montalbo
2026-05-24 8:50 ` Junio C Hamano
2026-05-24 18:01 ` Michael Montalbo
2026-05-22 2:11 ` [PATCH 2/5] userdiff: add diff.<driver>.process config Michael Montalbo via GitGitGadget
2026-05-22 2:11 ` [PATCH 3/5] diff: add long-running diff process via diff.<driver>.process Michael Montalbo via GitGitGadget
2026-05-22 2:11 ` [PATCH 4/5] blame: consult diff process for zero-hunk detection Michael Montalbo via GitGitGadget
2026-05-22 2:11 ` [PATCH 5/5] diff-process-normalize: add built-in whitespace normalizer Michael Montalbo via GitGitGadget
2026-05-22 5:29 ` [PATCH 0/5] [RFC] diff: add diff.<driver>.process for external hunk providers Junio C Hamano
2026-05-22 17:19 ` Michael Montalbo
2026-05-25 18:29 ` [PATCH v2 0/4] " Michael Montalbo via GitGitGadget
2026-05-25 18:29 ` [PATCH v2 1/4] xdiff: support external hunks via xpparam_t Michael Montalbo via GitGitGadget
2026-05-25 18:29 ` [PATCH v2 2/4] userdiff: add diff.<driver>.process config Michael Montalbo via GitGitGadget
2026-05-25 18:29 ` [PATCH v2 3/4] diff: add long-running diff process via diff.<driver>.process Michael Montalbo via GitGitGadget
2026-05-26 1:56 ` Junio C Hamano [this message]
2026-05-29 0:51 ` Michael Montalbo
2026-05-26 2:26 ` Junio C Hamano
2026-05-29 0:55 ` Michael Montalbo
2026-05-25 18:29 ` [PATCH v2 4/4] blame: consult diff process for zero-hunk detection Michael Montalbo via GitGitGadget
2026-05-29 20:48 ` [PATCH v3 0/6] [RFC] diff: add diff.<driver>.process for external hunk providers Michael Montalbo via GitGitGadget
2026-05-29 20:48 ` [PATCH v3 1/6] xdiff: support external hunks via xpparam_t Michael Montalbo via GitGitGadget
2026-05-29 20:48 ` [PATCH v3 2/6] userdiff: add diff.<driver>.process config Michael Montalbo via GitGitGadget
2026-05-29 20:48 ` [PATCH v3 3/6] sub-process: separate process lifecycle from hashmap management Michael Montalbo via GitGitGadget
2026-05-29 20:48 ` [PATCH v3 4/6] diff: add long-running diff process via diff.<driver>.process Michael Montalbo via GitGitGadget
2026-05-29 20:48 ` [PATCH v3 5/6] diff: bypass diff process with --no-ext-diff and in format-patch Michael Montalbo via GitGitGadget
2026-05-29 20:48 ` [PATCH v3 6/6] blame: consult diff process for no-hunk detection Michael Montalbo via GitGitGadget
2026-05-31 10:44 ` [PATCH v3 0/6] [RFC] diff: add diff.<driver>.process for external hunk providers Junio C Hamano
2026-06-01 4:28 ` Michael Montalbo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=xmqqpl2jlyr3.fsf@gitster.g \
--to=gitster@pobox.com \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=mmontalbo@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.