All of lore.kernel.org
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: "Michael Montalbo via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org,  Michael Montalbo <mmontalbo@gmail.com>
Subject: Re: [PATCH 0/5] [RFC] diff: add diff.<driver>.process for external hunk providers
Date: Fri, 22 May 2026 14:29:03 +0900	[thread overview]
Message-ID: <xmqq8q9cui5c.fsf@gitster.g> (raw)
In-Reply-To: <pull.2120.git.1779415884.gitgitgadget@gmail.com> (Michael Montalbo via GitGitGadget's message of "Fri, 22 May 2026 02:11:19 +0000")

"Michael Montalbo via GitGitGadget" <gitgitgadget@gmail.com> writes:

> This series adds diff.<driver>.process, a long-running subprocess protocol
> that lets external tools provide hunks to git's diff and blame pipelines.
>
> Over the past 18 years, git's diff pipeline accumulated many features that
> operate on hunks: word diff, function context, color-moved, indent
> heuristic, blame. External tools can replace the pipeline entirely
> (diff.<driver>.command) or select among builtin algorithms
> (diff.<driver>.algorithm), but there is no way for a tool to provide
> line-change information into the pipeline. Tools that understand code
> structure (tree-sitter parsers, format-aware analyzers, tools like
> Difftastic and Mergiraf) must bypass git's pipeline and lose access to
> everything downstream.
>
> The protocol follows filter.<driver>.process: pkt-line over stdin/stdout,
> capability negotiation, one tool invocation per git command. The tool
> receives file pairs and returns hunk descriptors that git feeds into the
> standard xdiff pipeline. All output features work normally.
>
> Zero hunks with status=success means the tool considers the files
> equivalent. git diff shows no output for the file, and git blame skips the
> commit, attributing lines to earlier commits.
>
> On error or tool crash, git falls back silently to the builtin diff
> algorithm. The feature is opt-in via diff.<driver>.process and
> .gitattributes; unconfigured files are unaffected.
>
> The series includes git diff-process-normalize, a built-in tool that
> compares files line by line ignoring whitespace (same logic as "git diff -w"
> via xdiff_compare_lines):

Interesting.

If the goal is purely to normalize content before comparison
(e.g. stripping comments or canonicalizing formatting), we already
have the `textconv` mechanism.  While `textconv` is a "one-shot"
per-file process, it is significantly simpler.

I suspect, however, that the primary focus here is to allow external
tools to provide structural alignment (e.g. for AST- aware diffs
like Difftastic or Mergiraf) without losing the original content in
the display.  Unlike `textconv`, which transforms the text the user
sees, this protocol lets the display remain identical to the source
while using a custom engine for the line-matching logic.

If that is the intent, it should be stated more explicitly in the
documentation and commit messages.  The "whitespace-normalize"
demonstration in [PATCH 5/5] is misleading because it's exactly the
case where `textconv` would be sufficient.

I am afraid that the use of a long-running subprocess for every
diff/blame invocation adds significant complexity and overhead.  In
particular, wouldn't the `blame` implementation performs a
round-trip to the subprocess for every commit in the history?  Even
with a persistent process, the overhead of serializing and
deserializing the entire file content twice (old and new) for every
commit could be prohibitive for large files or deep histories.

So, I dunno.

  parent reply	other threads:[~2026-05-22  5:29 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-22  2:11 [PATCH 0/5] [RFC] diff: add diff.<driver>.process for external hunk providers Michael Montalbo via GitGitGadget
2026-05-22  2:11 ` [PATCH 1/5] xdiff: support external hunks via xpparam_t Michael Montalbo via GitGitGadget
2026-05-22  5:29   ` Junio C Hamano
2026-05-22 19:06     ` Michael Montalbo
2026-05-24  8:50       ` Junio C Hamano
2026-05-24 18:01         ` Michael Montalbo
2026-05-22  2:11 ` [PATCH 2/5] userdiff: add diff.<driver>.process config Michael Montalbo via GitGitGadget
2026-05-22  2:11 ` [PATCH 3/5] diff: add long-running diff process via diff.<driver>.process Michael Montalbo via GitGitGadget
2026-05-22  2:11 ` [PATCH 4/5] blame: consult diff process for zero-hunk detection Michael Montalbo via GitGitGadget
2026-05-22  2:11 ` [PATCH 5/5] diff-process-normalize: add built-in whitespace normalizer Michael Montalbo via GitGitGadget
2026-05-22  5:29 ` Junio C Hamano [this message]
2026-05-22 17:19   ` [PATCH 0/5] [RFC] diff: add diff.<driver>.process for external hunk providers Michael Montalbo
2026-05-25 18:29 ` [PATCH v2 0/4] " Michael Montalbo via GitGitGadget
2026-05-25 18:29   ` [PATCH v2 1/4] xdiff: support external hunks via xpparam_t Michael Montalbo via GitGitGadget
2026-05-25 18:29   ` [PATCH v2 2/4] userdiff: add diff.<driver>.process config Michael Montalbo via GitGitGadget
2026-05-25 18:29   ` [PATCH v2 3/4] diff: add long-running diff process via diff.<driver>.process Michael Montalbo via GitGitGadget
2026-05-26  1:56     ` Junio C Hamano
2026-05-29  0:51       ` Michael Montalbo
2026-05-26  2:26     ` Junio C Hamano
2026-05-29  0:55       ` Michael Montalbo
2026-05-25 18:29   ` [PATCH v2 4/4] blame: consult diff process for zero-hunk detection Michael Montalbo via GitGitGadget
2026-05-29 20:48   ` [PATCH v3 0/6] [RFC] diff: add diff.<driver>.process for external hunk providers Michael Montalbo via GitGitGadget
2026-05-29 20:48     ` [PATCH v3 1/6] xdiff: support external hunks via xpparam_t Michael Montalbo via GitGitGadget
2026-05-29 20:48     ` [PATCH v3 2/6] userdiff: add diff.<driver>.process config Michael Montalbo via GitGitGadget
2026-05-29 20:48     ` [PATCH v3 3/6] sub-process: separate process lifecycle from hashmap management Michael Montalbo via GitGitGadget
2026-05-29 20:48     ` [PATCH v3 4/6] diff: add long-running diff process via diff.<driver>.process Michael Montalbo via GitGitGadget
2026-05-29 20:48     ` [PATCH v3 5/6] diff: bypass diff process with --no-ext-diff and in format-patch Michael Montalbo via GitGitGadget
2026-05-29 20:48     ` [PATCH v3 6/6] blame: consult diff process for no-hunk detection Michael Montalbo via GitGitGadget
2026-05-31 10:44     ` [PATCH v3 0/6] [RFC] diff: add diff.<driver>.process for external hunk providers Junio C Hamano
2026-06-01  4:28       ` Michael Montalbo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqq8q9cui5c.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=mmontalbo@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.