From: Junio C Hamano <gitster@pobox.com>
To: "Michael Montalbo via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Michael Montalbo <mmontalbo@gmail.com>
Subject: Re: [PATCH v2 3/4] diff: add long-running diff process via diff.<driver>.process
Date: Tue, 26 May 2026 10:56:32 +0900 [thread overview]
Message-ID: <xmqqpl2jlyr3.fsf@gitster.g> (raw)
In-Reply-To: <c25647c6e571e293fc994e0620ca37709f680f8a.1779733799.git.gitgitgadget@gmail.com> (Michael Montalbo via GitGitGadget's message of "Mon, 25 May 2026 18:29:57 +0000")
"Michael Montalbo via GitGitGadget" <gitgitgadget@gmail.com> writes:
> +struct diff_subprocess {
> + struct subprocess_entry subprocess;
> + unsigned int supported_capabilities;
> +};
> +
> +static int subprocess_map_initialized;
> +static struct hashmap subprocess_map;
Can we avoid introducing new global variables like these? Would
"struct userdiff_driver" or "struct diff_options" be a good place to
hang this hashmap, perhaps?
> +static int send_file_content(int fd, const char *buf, long size)
> +{
> + int ret;
> +
> + if (size > 0)
> + ret = write_packetized_from_buf_no_flush(buf, size, fd);
> + else
> + ret = 0;
Shouldn't "size == -24" be flagged as an invalid input?
> + if (ret)
> + return ret;
> + return packet_flush_gently(fd);
> +}
> +static int parse_hunk_line(const char *line, struct xdl_hunk *hunk)
> +{
> +...
> +}
This gives a silent error diagnosis, which is good for a lower level
helper.
> +int diff_process_get_hunks(struct userdiff_driver *drv,
> + const char *path,
> + const char *old_buf, long old_size,
> + const char *new_buf, long new_size,
> + struct xdl_hunk **hunks_out,
> + size_t *nr_hunks_out)
> +{
> + struct diff_subprocess *backend;
> + struct child_process *process;
> + int fd_in, fd_out;
> + struct strbuf status = STRBUF_INIT;
> + struct xdl_hunk *hunks = NULL;
> + struct xdl_hunk hunk;
> + size_t nr_hunks = 0, alloc_hunks = 0;
> + int len;
> + char *line;
> +
> + if (!drv || !drv->process)
> + return -1;
A driver that does not define process is not an error; it is
perfectly normal in the current world order where nobody has such an
external process and even fi this patch lands, external processes
are optional. So here "return -1" does not mean an error, and
silent return is perfectly fine.
> + backend = find_or_start_process(drv->process);
> + if (!backend)
> + return -1;
This is probably an error; the user specified drv->process, we
either tried to find or start the process and failed. Isn't it an
event that deserves to be reported in an error message?
> + if (!(backend->supported_capabilities & CAP_HUNKS))
> + return -1;
Backend started, but the "hunks" feature is not supported. Perhaps
in a year or two, this external process protocol may have become so
popular that it gained more capabilities, possibly making get_hunks
obsolete. We may be looking at such an external process that uses
other capabilities but not this one. This is not an error, so
silent return is perfectly fine.
> + process = subprocess_get_child_process(&backend->subprocess);
> + fd_in = process->in;
> + fd_out = process->out;
> +
> + /* Send request */
> + if (packet_write_fmt_gently(fd_in, "command=hunks\n") ||
> + packet_write_fmt_gently(fd_in, "pathname=%s\n", path) ||
> + packet_flush_gently(fd_in))
> + goto error;
> +
> + /* Send old file content */
> + if (send_file_content(fd_in, old_buf, old_size))
> + goto error;
> +
> + /* Send new file content */
> + if (send_file_content(fd_in, new_buf, new_size))
> + goto error;
> +
> + /* Read hunks until flush packet */
> + while ((len = packet_read_line_gently(fd_out, NULL, &line)) >= 0 &&
> + line) {
> + if (parse_hunk_line(line, &hunk) < 0)
> + goto error;
> + ALLOC_GROW(hunks, nr_hunks + 1, alloc_hunks);
> + hunks[nr_hunks++] = hunk;
> + }
> + if (len < 0)
> + goto error;
> +
> + /* Read status */
> + if (subprocess_read_status(fd_out, &status))
> + goto error;
> +
> + if (strcmp(status.buf, "success")) {
> + if (!strcmp(status.buf, "abort"))
> + backend->supported_capabilities &= ~CAP_HUNKS;
> + goto error;
> + }
> +
> + *hunks_out = hunks;
> + *nr_hunks_out = nr_hunks;
> + strbuf_release(&status);
> + return 0;
> +
> +error:
All exceptions that lead here look like events that should be
reported to the end-user.
> + free(hunks);
> + strbuf_release(&status);
> + return -1;
> +}
> +/*
> + * Query a diff process for hunks describing the changes
> + * between old_buf and new_buf.
> + *
> + * The backend is a long-running subprocess configured via
> + * diff.<driver>.process. It receives file content via
> + * pkt-line and returns hunks with 1-based line numbers.
> + *
> + * On success, sets *hunks_out and *nr_hunks_out to a newly allocated
> + * array (caller must free) and returns 0.
> + *
> + * On failure, returns -1. The caller should fall back to the
> + * builtin diff algorithm.
> + */
I do not agree with this. If it is a failure, the user should fix
the external process (or disable). It shouldn't be hidden behind a
fallback. As I left comments, in this round of implementation,
there are conditions that returns -1 for soemthing that is not an
error (i.e., not configured, or process not supporting the
particular capability) *and* in those cases the caller should fall
back as if nothing happened. But some error cases, the caller
should't hide them.
next prev parent reply other threads:[~2026-05-26 1:56 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-22 2:11 [PATCH 0/5] [RFC] diff: add diff.<driver>.process for external hunk providers Michael Montalbo via GitGitGadget
2026-05-22 2:11 ` [PATCH 1/5] xdiff: support external hunks via xpparam_t Michael Montalbo via GitGitGadget
2026-05-22 5:29 ` Junio C Hamano
2026-05-22 19:06 ` Michael Montalbo
2026-05-24 8:50 ` Junio C Hamano
2026-05-24 18:01 ` Michael Montalbo
2026-05-22 2:11 ` [PATCH 2/5] userdiff: add diff.<driver>.process config Michael Montalbo via GitGitGadget
2026-05-22 2:11 ` [PATCH 3/5] diff: add long-running diff process via diff.<driver>.process Michael Montalbo via GitGitGadget
2026-05-22 2:11 ` [PATCH 4/5] blame: consult diff process for zero-hunk detection Michael Montalbo via GitGitGadget
2026-05-22 2:11 ` [PATCH 5/5] diff-process-normalize: add built-in whitespace normalizer Michael Montalbo via GitGitGadget
2026-05-22 5:29 ` [PATCH 0/5] [RFC] diff: add diff.<driver>.process for external hunk providers Junio C Hamano
2026-05-22 17:19 ` Michael Montalbo
2026-05-25 18:29 ` [PATCH v2 0/4] " Michael Montalbo via GitGitGadget
2026-05-25 18:29 ` [PATCH v2 1/4] xdiff: support external hunks via xpparam_t Michael Montalbo via GitGitGadget
2026-05-25 18:29 ` [PATCH v2 2/4] userdiff: add diff.<driver>.process config Michael Montalbo via GitGitGadget
2026-05-25 18:29 ` [PATCH v2 3/4] diff: add long-running diff process via diff.<driver>.process Michael Montalbo via GitGitGadget
2026-05-26 1:56 ` Junio C Hamano [this message]
2026-05-26 2:26 ` Junio C Hamano
2026-05-25 18:29 ` [PATCH v2 4/4] blame: consult diff process for zero-hunk detection Michael Montalbo via GitGitGadget
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=xmqqpl2jlyr3.fsf@gitster.g \
--to=gitster@pobox.com \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=mmontalbo@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox