Git development
From: Junio C Hamano <gitster@pobox.com>
To: "Michael Montalbo via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org,  Michael Montalbo <mmontalbo@gmail.com>
Subject: Re: [PATCH v2 3/4] diff: add long-running diff process via diff.<driver>.process
Date: Tue, 26 May 2026 10:56:32 +0900
Message-ID: <xmqqpl2jlyr3.fsf@gitster.g>
In-Reply-To: <c25647c6e571e293fc994e0620ca37709f680f8a.1779733799.git.gitgitgadget@gmail.com> (Michael Montalbo via GitGitGadget's message of "Mon, 25 May 2026 18:29:57 +0000")

"Michael Montalbo via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +struct diff_subprocess {
> +	struct subprocess_entry subprocess;
> +	unsigned int supported_capabilities;
> +};
> +
> +static int subprocess_map_initialized;
> +static struct hashmap subprocess_map;

Can we avoid introducing new global variables like these?  Would
"struct userdiff_driver" or "struct diff_options" be a good place to
hang this hashmap, perhaps?
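
A minimal sketch of that idea, assuming a hypothetical per-invocation
struct (diff_opts_sketch) owns the process table, so no file-scope
"initialized" flag is needed.  A linked list stands in for Git's
"struct hashmap"; all names here are illustrative, not the patch's API.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a started subprocess entry. */
struct process_entry {
	char *cmd;
	struct process_entry *next;
};

/*
 * Hypothetical per-invocation state; in Git this role might be played
 * by "struct diff_options" or the userdiff driver.  An empty list
 * (NULL) means "nothing started yet", so no separate flag is needed.
 */
struct diff_opts_sketch {
	struct process_entry *processes;
};

static char *copy_string(const char *s)
{
	size_t n = strlen(s) + 1;
	char *p = malloc(n);

	if (p)
		memcpy(p, s, n);
	return p;
}

/* Return the entry for "cmd", creating (recording) it on first use. */
static struct process_entry *find_or_start(struct diff_opts_sketch *opt,
					   const char *cmd)
{
	struct process_entry *e;

	for (e = opt->processes; e; e = e->next)
		if (!strcmp(e->cmd, cmd))
			return e;

	e = calloc(1, sizeof(*e));
	if (!e)
		return NULL;
	e->cmd = copy_string(cmd);
	if (!e->cmd) {
		free(e);
		return NULL;
	}
	e->next = opt->processes;
	opt->processes = e;
	return e;
}

/* Tear down everything owned by this invocation. */
static void release_processes(struct diff_opts_sketch *opt)
{
	struct process_entry *e = opt->processes;

	while (e) {
		struct process_entry *next = e->next;

		free(e->cmd);
		free(e);
		e = next;
	}
	opt->processes = NULL;
}
```

Because the table lives in the struct, two independent diff invocations
get independent caches, and cleanup has an obvious owner.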

> +static int send_file_content(int fd, const char *buf, long size)
> +{
> +	int ret;
> +
> +	if (size > 0)
> +		ret = write_packetized_from_buf_no_flush(buf, size, fd);
> +	else
> +		ret = 0;

A negative size (e.g., "size == -24") is silently sent as if the
file were empty here.  Shouldn't it be flagged as invalid input?

> +	if (ret)
> +		return ret;
> +	return packet_flush_gently(fd);
> +}
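
A minimal sketch of that check, assuming a hypothetical in-memory sink
(sink_write/sink_flush) in place of write_packetized_from_buf_no_flush()
and packet_flush_gently(), so it runs outside Git.  In Git itself,
error() prints the message and returns -1, so a real version would
likely use it instead of fprintf().

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory sink standing in for the pkt-line writers. */
struct sink {
	char buf[256];
	size_t len;
	int flushes;
};

static int sink_write(struct sink *s, const char *data, size_t n)
{
	if (n > sizeof(s->buf) - s->len)
		return -1;
	memcpy(s->buf + s->len, data, n);
	s->len += n;
	return 0;
}

static int sink_flush(struct sink *s)
{
	s->flushes++;
	return 0;
}

/*
 * Like send_file_content() in the patch, but a negative size is
 * rejected (and nothing is sent) instead of being treated as empty.
 */
static int send_file_content_checked(struct sink *s, const char *buf,
				     long size)
{
	if (size < 0) {
		fprintf(stderr, "error: invalid content size %ld\n", size);
		return -1;
	}
	if (size > 0 && sink_write(s, buf, (size_t)size))
		return -1;
	return sink_flush(s);
}
```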

> +static int parse_hunk_line(const char *line, struct xdl_hunk *hunk)
> +{
> +...
> +}

This returns an error silently, without issuing a diagnostic, which
is good for a lower-level helper.

> +int diff_process_get_hunks(struct userdiff_driver *drv,
> +			   const char *path,
> +			   const char *old_buf, long old_size,
> +			   const char *new_buf, long new_size,
> +			   struct xdl_hunk **hunks_out,
> +			   size_t *nr_hunks_out)
> +{
> +	struct diff_subprocess *backend;
> +	struct child_process *process;
> +	int fd_in, fd_out;
> +	struct strbuf status = STRBUF_INIT;
> +	struct xdl_hunk *hunks = NULL;
> +	struct xdl_hunk hunk;
> +	size_t nr_hunks = 0, alloc_hunks = 0;
> +	int len;
> +	char *line;
> +
> +	if (!drv || !drv->process)
> +		return -1;

A driver that does not define process is not an error; it is
perfectly normal in the current world order where nobody has such an
external process, and even if this patch lands, external processes
are optional.  So here "return -1" does not mean an error, and a
silent return is perfectly fine.

> +	backend = find_or_start_process(drv->process);
> +	if (!backend)
> +		return -1;

This is probably an error; the user specified drv->process, we
either tried to find or start the process and failed.  Isn't it an
event that deserves to be reported in an error message?

> +	if (!(backend->supported_capabilities & CAP_HUNKS))
> +		return -1;

Backend started, but the "hunks" feature is not supported.  Perhaps
in a year or two, this external process protocol may have become so
popular that it gained more capabilities, possibly making get_hunks
obsolete.  We may be looking at such an external process that uses
other capabilities but not this one.  This is not an error, so
silent return is perfectly fine.
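
One way to keep the two kinds of early return apart is a three-valued
result.  This is a minimal sketch: the enum names and the CAP_HUNKS bit
value are hypothetical (the patch itself only returns 0 or -1), and
fprintf() stands in for Git's error().

```c
#include <stdio.h>

/* Hypothetical result codes. */
enum hunk_result {
	HUNKS_ERROR = -1,        /* configured but broken: report it */
	HUNKS_OK = 0,            /* go ahead and talk to the process */
	HUNKS_NOT_APPLICABLE = 1 /* not configured/supported: fall back quietly */
};

#define CAP_HUNKS (1u << 0) /* hypothetical bit value */

struct backend_sketch {
	unsigned int supported_capabilities;
};

/*
 * Mirrors the three early exits reviewed above.  "process" is the
 * configured diff.<driver>.process command (NULL if unset), and
 * "backend" is the result of trying to find or start it.
 */
static enum hunk_result check_preconditions(const char *process,
					     const struct backend_sketch *backend)
{
	if (!process)
		return HUNKS_NOT_APPLICABLE;   /* not configured: normal */
	if (!backend) {
		fprintf(stderr, "error: could not start diff process '%s'\n",
			process);
		return HUNKS_ERROR;            /* user asked for it, it failed */
	}
	if (!(backend->supported_capabilities & CAP_HUNKS))
		return HUNKS_NOT_APPLICABLE;   /* capability not offered: normal */
	return HUNKS_OK;
}
```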

> +	process = subprocess_get_child_process(&backend->subprocess);
> +	fd_in = process->in;
> +	fd_out = process->out;
> +
> +	/* Send request */
> +	if (packet_write_fmt_gently(fd_in, "command=hunks\n") ||
> +	    packet_write_fmt_gently(fd_in, "pathname=%s\n", path) ||
> +	    packet_flush_gently(fd_in))
> +		goto error;
> +
> +	/* Send old file content */
> +	if (send_file_content(fd_in, old_buf, old_size))
> +		goto error;
> +
> +	/* Send new file content */
> +	if (send_file_content(fd_in, new_buf, new_size))
> +		goto error;
> +
> +	/* Read hunks until flush packet */
> +	while ((len = packet_read_line_gently(fd_out, NULL, &line)) >= 0 &&
> +	       line) {
> +		if (parse_hunk_line(line, &hunk) < 0)
> +			goto error;
> +		ALLOC_GROW(hunks, nr_hunks + 1, alloc_hunks);
> +		hunks[nr_hunks++] = hunk;
> +	}
> +	if (len < 0)
> +		goto error;
> +
> +	/* Read status */
> +	if (subprocess_read_status(fd_out, &status))
> +		goto error;
> +
> +	if (strcmp(status.buf, "success")) {
> +		if (!strcmp(status.buf, "abort"))
> +			backend->supported_capabilities &= ~CAP_HUNKS;
> +		goto error;
> +	}
> +
> +	*hunks_out = hunks;
> +	*nr_hunks_out = nr_hunks;
> +	strbuf_release(&status);
> +	return 0;
> +
> +error:

All exceptions that lead here look like events that should be
reported to the end-user.

> +	free(hunks);
> +	strbuf_release(&status);
> +	return -1;
> +}
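
A minimal sketch of reporting at the error label, assuming each step
records what it was doing before jumping there.  The step list mirrors
the request/response sequence above; run_exchange() and its "fail_at"
parameter are hypothetical and only simulate which step fails.

```c
#include <stdio.h>

/*
 * Hypothetical driver for the exchange in diff_process_get_hunks().
 * "fail_at" simulates the failing step (-1 for none); a real version
 * would call the pkt-line helpers instead of looping over names.
 */
static int run_exchange(const char *process, const char *path, int fail_at,
			char *msg, size_t msgsz)
{
	static const char *const steps[] = {
		"send request", "send old content", "send new content",
		"read hunks", "read status",
	};
	const char *what = NULL;
	size_t i;

	for (i = 0; i < sizeof(steps) / sizeof(*steps); i++) {
		if ((int)i == fail_at) {
			what = steps[i];   /* remember which step failed */
			goto error;
		}
	}
	return 0;

error:
	/* One place reports every failure, naming process, step and path. */
	snprintf(msg, msgsz, "diff process '%s' failed to %s for '%s'",
		 process, what, path);
	fprintf(stderr, "error: %s\n", msg);
	return -1;
}
```

Recording the step keeps each "goto error" a one-liner while still
telling the user which part of the protocol went wrong.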

> +/*
> + * Query a diff process for hunks describing the changes
> + * between old_buf and new_buf.
> + *
> + * The backend is a long-running subprocess configured via
> + * diff.<driver>.process.  It receives file content via
> + * pkt-line and returns hunks with 1-based line numbers.
> + *
> + * On success, sets *hunks_out and *nr_hunks_out to a newly allocated
> + * array (caller must free) and returns 0.
> + *
> + * On failure, returns -1.  The caller should fall back to the
> + * builtin diff algorithm.
> + */

I do not agree with this.  If it is a failure, the user should fix
the external process (or disable it).  It shouldn't be hidden behind
a fallback.  As I commented above, in this round of the
implementation, there are conditions that return -1 for something
that is not an error (i.e., not configured, or the process not
supporting the particular capability), *and* in those cases the
caller should fall back as if nothing happened.  But the other cases
are real errors, and the caller shouldn't hide them.
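
On the caller side, the policy could look like this minimal sketch,
assuming the same hypothetical three-valued result; compute_hunks(),
enum hunk_source and builtin_diff_sketch() are illustrative names, not
part of the patch.

```c
/* Hypothetical result codes from the diff-process query. */
enum hunk_result { HUNKS_ERROR = -1, HUNKS_OK = 0, HUNKS_NOT_APPLICABLE = 1 };

/* Hypothetical: which source ended up producing the hunks. */
enum hunk_source { SRC_NONE, SRC_PROCESS, SRC_BUILTIN };

/* Stand-in for running the builtin xdiff algorithm. */
static int builtin_diff_sketch(void)
{
	return 0;
}

/*
 * Fall back to the builtin diff only when the process does not apply;
 * a real failure is propagated so the user can fix or disable it.
 */
static int compute_hunks(enum hunk_result r, enum hunk_source *src)
{
	switch (r) {
	case HUNKS_OK:
		*src = SRC_PROCESS;
		return 0;
	case HUNKS_NOT_APPLICABLE:
		*src = SRC_BUILTIN;
		return builtin_diff_sketch();
	case HUNKS_ERROR:
	default:
		*src = SRC_NONE;
		return -1; /* do not hide the failure behind a fallback */
	}
}
```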


Thread overview: 19+ messages
2026-05-22  2:11 [PATCH 0/5] [RFC] diff: add diff.<driver>.process for external hunk providers Michael Montalbo via GitGitGadget
2026-05-22  2:11 ` [PATCH 1/5] xdiff: support external hunks via xpparam_t Michael Montalbo via GitGitGadget
2026-05-22  5:29   ` Junio C Hamano
2026-05-22 19:06     ` Michael Montalbo
2026-05-24  8:50       ` Junio C Hamano
2026-05-24 18:01         ` Michael Montalbo
2026-05-22  2:11 ` [PATCH 2/5] userdiff: add diff.<driver>.process config Michael Montalbo via GitGitGadget
2026-05-22  2:11 ` [PATCH 3/5] diff: add long-running diff process via diff.<driver>.process Michael Montalbo via GitGitGadget
2026-05-22  2:11 ` [PATCH 4/5] blame: consult diff process for zero-hunk detection Michael Montalbo via GitGitGadget
2026-05-22  2:11 ` [PATCH 5/5] diff-process-normalize: add built-in whitespace normalizer Michael Montalbo via GitGitGadget
2026-05-22  5:29 ` [PATCH 0/5] [RFC] diff: add diff.<driver>.process for external hunk providers Junio C Hamano
2026-05-22 17:19   ` Michael Montalbo
2026-05-25 18:29 ` [PATCH v2 0/4] " Michael Montalbo via GitGitGadget
2026-05-25 18:29   ` [PATCH v2 1/4] xdiff: support external hunks via xpparam_t Michael Montalbo via GitGitGadget
2026-05-25 18:29   ` [PATCH v2 2/4] userdiff: add diff.<driver>.process config Michael Montalbo via GitGitGadget
2026-05-25 18:29   ` [PATCH v2 3/4] diff: add long-running diff process via diff.<driver>.process Michael Montalbo via GitGitGadget
2026-05-26  1:56     ` Junio C Hamano [this message]
2026-05-26  2:26     ` Junio C Hamano
2026-05-25 18:29   ` [PATCH v2 4/4] blame: consult diff process for zero-hunk detection Michael Montalbo via GitGitGadget
